XML sitemap - Developer's Guide | Digital Marketing Agency India



Developer's guide to creating an XML sitemap that is effective for SEO


Quick note to programmers:
These are the steps to follow:
1. Add the required fields to every table that stores URLs for the website (see the section on dynamic websites below).
2. Check that the last-modified date is updated for every page (see the section “What may be considered to be an ‘update’?”).
3. Review the sitemap and sitemap index formats.
4. Decide whether to use one sitemap or several (a single sitemap is enough as long as the site has fewer than 50,000 URLs and the file stays under 50 MB uncompressed).
5. Read the section on dynamic websites.
Sitemap Introduction
A sitemap is a way of organizing a website, identifying the URLs and the data under each section.
Sitemaps were originally aimed primarily at a website's users. Google's XML format, however, was designed for search engines, allowing them to find the data faster and more efficiently.
Google's sitemap protocol was developed in response to the increasing size and complexity of websites. Business websites often listed hundreds of products in their catalogues, the popularity of blogging led webmasters to update their material at least once a day, and community-building tools such as forums and message boards added even more pages. As websites grew, it became difficult for search engines to keep track of all this material, and they sometimes "skipped" information while crawling these rapidly changing pages.
With the XML protocol, search engines can track URLs more efficiently, because all of a site's URLs are listed in one file. A sitemap can also record how frequently each page is updated and when it was last changed.
XML sitemaps are not, as some people think, a tool for search engine optimization. A sitemap does not affect ranking directly, but it helps search engines discover and crawl a site's pages more completely by putting the data they need in one place, which is handy given that there are millions of websites to plough through.
Guidelines for Sitemaps:

  • A Sitemap file can contain no more than 50,000 URLs and must be no larger than 50MB when uncompressed. If your Sitemap is larger than this, break it into several smaller Sitemaps. These limits help ensure that your web server is not overloaded by serving large files to Google.
  • If you have more than one Sitemap, you can list them in a Sitemap index file and then submit the Sitemap index file to Google. You don't need to submit each Sitemap file individually.
  • The Sitemap file must be UTF-8 encoded, and the URLs in it must be escaped appropriately (for example, & must be written as &amp;).
  • If your site is accessible on both the www and non-www versions of your domain, you don’t need to submit a separate Sitemap for each version. However, Google recommends picking either the www or the non-www version and using its recommended canonicalization methods to tell Google which version you are using.
  • If you’re considering hiring a consultant to help you optimize your Sitemaps, read Google's recommendations on working with Search Engine Optimizers (SEOs). You should also be familiar with Google's Webmaster Guidelines and SEO Starter Guide. It can also be useful to check with colleagues who run similar sites or businesses.
  • A Sitemap file is independent of the language of the content. To make sure that each language version can be crawled and indexed, use unique URLs. These URLs can all be included in your Sitemap files.
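To confirm that a generated file stays within the limits quoted above before submitting it, a check like the following can be run. It is a minimal sketch assuming the sitemap is available in memory as bytes; check_sitemap_limits is a hypothetical helper name, and it only counts url entries and measures the uncompressed size.

```python
import xml.etree.ElementTree as ET

# Limits quoted in the guidelines above.
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_limits(xml_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means the file is within limits."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, over {MAX_BYTES}")
    root = ET.fromstring(xml_bytes)
    # Count only direct <url> children of <urlset>.
    url_count = len(root.findall(f"{NS}url"))
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs, over {MAX_URLS}")
    return problems
```

If the list is not empty, the sitemap should be split as described in the next section.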

#1: Multiple sitemaps (when the site has, or may grow to have, more than 50,000 pages):
Generate separate sitemaps for categories, subcategories, products and other pages, named sitemap-category.xml, sitemap-subcategory.xml, sitemap-products.xml, sitemap-pages.xml and so on. The rule is one sitemap per table (see details below): each sitemap file includes the URLs of all pages in that table. If a table exceeds 50,000 URLs, divide it into several sitemaps using a field in the table, such as the category field of the products table (sitemap-products-mobile.xml etc.).
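The one-sitemap-per-table rule, with a split by field once a table passes 50,000 URLs, can be sketched as follows. This assumes each table row is a Python dict and that the caller names the split field (for example category for the products table); plan_sitemaps is a hypothetical name, and the numbered suffix (-1, -2) for a group that is still over the limit is an assumption, not part of the original naming scheme.

```python
from collections import defaultdict
from typing import Optional

MAX_URLS = 50_000


def plan_sitemaps(table: str, rows: list[dict], split_field: Optional[str] = None,
                  limit: int = MAX_URLS) -> dict[str, list[dict]]:
    """Map each sitemap file name to the rows that file should contain."""
    if len(rows) <= limit or split_field is None:
        groups = {None: rows}  # a single group: no split by field
    else:
        groups = defaultdict(list)
        for row in rows:
            groups[row[split_field]].append(row)
    plan = {}
    for key, group in groups.items():
        base = f"sitemap-{table}" if key is None else f"sitemap-{table}-{key}"
        if len(group) <= limit:
            plan[f"{base}.xml"] = group
        else:
            # Hypothetical convention: number the parts of a group that is still too large.
            for start in range(0, len(group), limit):
                plan[f"{base}-{start // limit + 1}.xml"] = group[start:start + limit]
    return plan
```

For a products table under the limit this yields just sitemap-products.xml; above it, files such as sitemap-products-mobile.xml.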
Sitemap Index File:
Once all the sitemaps are complete, create a master sitemap named “Sitemapindex.xml” that links to all the individual sitemaps.
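The syntax of the sitemap index file (details: http://www.google.com/support/webmasters/bin/answer.py?answer=71453) is:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap-category.xml</loc><lastmod>2010-09-06</lastmod></sitemap>
</sitemapindex>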
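A sitemap index like the one above can be generated with Python's standard xml.etree.ElementTree module. This is a sketch: build_sitemap_index and its arguments are hypothetical, and the caller is assumed to already know each sitemap's last-modified date.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url: str, files: dict[str, str]) -> bytes:
    """files maps a sitemap file name to its last-modified date (W3C format)."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name, lastmod in files.items():
        entry = ET.SubElement(root, "sitemap")
        # Absolute URL of each individual sitemap.
        ET.SubElement(entry, "loc").text = base_url.rstrip("/") + "/" + name
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

ElementTree escapes special characters such as & automatically, which covers the escaping rule in the guidelines.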
The standard XML sitemap syntax is:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/cat1/url.html</loc><lastmod>2010-09-06</lastmod></url>
</urlset>
Details can be found here: http://www.sitemaps.org/protocol.php
Each url entry contains the following information:
1. loc: The permalink of the page. There must be no duplicate URLs.
2. lastmod: The trickiest one: what should count as the last modification (explained below).
3. changefreq: This can be ignored. Do not output this value, but keep a column for it in the database table, since we may want some pages to be crawled more often.
4. priority: Leave this blank as well. Do not output it, but keep a column for it in the database table, since we may want to mark some pages as more important than others.
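lastmod values must use the W3C Datetime format: either a plain date such as 2010-09-06 or a full timestamp with a time zone offset. A small formatter is sketched below, assuming the database returns Python datetime objects and that naive values are stored in UTC; w3c_lastmod is a hypothetical name.

```python
from datetime import datetime, timezone


def w3c_lastmod(dt: datetime) -> str:
    """Format a datetime for lastmod, e.g. 2010-09-06T16:45:00+00:00."""
    if dt.tzinfo is None:
        # Assumption: naive timestamps from the database are in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Normalize to UTC and drop fractions of a second.
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")
```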
How it should work for a dynamic website:
Let’s assume that the pages come from the following tables:
1. Category
2. Pages
3. Products
4. Static
5. Table 5 etc
Each table gets its own sitemap, containing the URLs of all pages in that table. If a table exceeds 50,000 URLs, divide it into several sitemaps using a field in the table, such as the category field of the products table (sitemap-products-mobile.xml etc.).
Each of these tables should have the following fields:
1. URL – The sitemap needs the full URL, such as http://www.example.com/cat1/url.html. A good developer will store only the path in the table (cat1/url.html), not the domain, and attach the domain on the fly.
2. Last modified date – Updated whenever the page is updated (see the section “What may be considered to be an ‘update’?”).
3. Change freq – “na” by default; the tag is not added to the XML sitemap when the value is na.
4. Priority – defaults to -1; the tag is not added to the XML sitemap when the value is -1.
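Turning one table row into a url entry, with the domain attached on the fly and the default values left out, might look like this. The column names (url, last_modified, change_freq, priority) and the DOMAIN constant are hypothetical; adjust them to the actual schema. last_modified is assumed to already hold a W3C Datetime string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DOMAIN = "http://www.example.com/"  # hypothetical: the site's base URL


def url_element(row: dict) -> ET.Element:
    """Build one <url> entry from a table row with the four fields listed above."""
    url = ET.Element("url")
    # The table stores only the path (cat1/url.html); the domain is attached here.
    ET.SubElement(url, "loc").text = urljoin(DOMAIN, row["url"])
    ET.SubElement(url, "lastmod").text = row["last_modified"]
    # Default values mean "not set", so the tag is left out of the sitemap.
    if row.get("change_freq", "na") != "na":
        ET.SubElement(url, "changefreq").text = row["change_freq"]
    if row.get("priority", -1) != -1:
        ET.SubElement(url, "priority").text = str(row["priority"])
    return url
```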
Write a script that runs every hour, regenerates the sitemap XML files from the table fields, ordered by last-modified date (latest at the top), and compresses them with gzip.
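The core of the hourly job can be sketched as a function that sorts rows by lastmod and gzips the result. Scheduling is left to the server (for example, a cron entry with the schedule 0 * * * *). The function name gzipped_sitemap and the row keys loc and lastmod are assumptions.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def gzipped_sitemap(rows: list[dict]) -> bytes:
    """Return a gzip-compressed sitemap for rows with 'loc' and 'lastmod' keys."""
    # Latest modification first. W3C dates written in one consistent format
    # and time zone sort correctly as plain strings.
    ordered = sorted(rows, key=lambda r: r["lastmod"], reverse=True)
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for row in ordered:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = row["loc"]
        ET.SubElement(url, "lastmod").text = row["lastmod"]
    xml = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
    return gzip.compress(xml)
```

The caller writes the bytes to a file such as sitemap-products.xml.gz; compressed sitemaps conventionally use the .xml.gz extension, and that is the name the sitemap index should then point to.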
Also create an RSS feed for each table, listing the latest 20 to 200 items by last-modified date, depending on how many items are updated in a day (see http://en.wikipedia.org/wiki/RSS):
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
  <title>RSS Title</title>
  <description>This is an example of an RSS feed</description>
  <link>url of the section</link>
  <lastBuildDate>Mon, 06 Sep 2010 00:01:00 +0000</lastBuildDate>
  <pubDate>Sun, 06 Sep 2009 16:45:00 +0000</pubDate>
  <item>
    <title>Example entry</title>
    <description>Here is some text containing an interesting description.</description>
    <link>url of the page</link>
    <guid>url of the page</guid>
    <pubDate>Sun, 06 Sep 2009 16:45:00 +0000</pubDate>
  </item>
</channel>
</rss>
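A feed with this structure can be generated from a table's most recently modified rows. This is a sketch: build_rss is a hypothetical name, each item is assumed to be a dict with title, description, link and a time-zone-aware datetime under lastmod, and the page URL doubles as the guid, as in the example above. email.utils.format_datetime produces the RFC 822 date format that RSS uses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title: str, link: str, description: str, items: list[dict],
              limit: int = 20) -> bytes:
    """Build an RSS 2.0 feed from the `limit` most recently modified items."""
    latest = sorted(items, key=lambda i: i["lastmod"], reverse=True)[:limit]
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, "lastBuildDate").text = format_datetime(datetime.now(timezone.utc))
    for item in latest:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "description").text = item["description"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "guid").text = item["link"]
        # RFC 822 date, e.g. "Sun, 06 Sep 2009 16:45:00 +0000".
        ET.SubElement(node, "pubDate").text = format_datetime(item["lastmod"])
    return ET.tostring(rss, encoding="UTF-8", xml_declaration=True)
```

The limit parameter is where the 20 to 200 item range from the text would be set per table.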
Create RSS feeds for all tables and for all sections that are updated frequently, such as:
1. News
2. Blog
3. Forums
4. Products
5. etc
Sitemap in the robots.txt
Reference the sitemap index in robots.txt by adding this line to example.com/robots.txt:
Sitemap: http://www.example.com/Sitemapindex.xml
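To confirm that robots.txt actually advertises the sitemap, Python's urllib.robotparser can read the Sitemap lines back (RobotFileParser.site_maps() requires Python 3.8 or later). sitemaps_in_robots is a hypothetical helper; it parses a robots.txt body passed in as a string rather than fetching it over the network.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```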

What may be considered to be an “update”?

  • Any major content change on the page (Add/Del/Edit)
  • A new comment on the page, or a product review submitted by a user
  • Any change in a subcategory that is reflected on its parent page. For example, if an update in a subcategory of the mobile category shows up on the main mobile category page, the category page's last-modified date should be updated as well.
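The subcategory rule can be sketched as a recursive maximum over a category tree. This assumes each page's own last-modified date already reflects content edits, comments and reviews, that every subcategory change is shown on its parent page, and that the hierarchy has no cycles; effective_lastmod and its arguments are hypothetical.

```python
from datetime import date


def effective_lastmod(page_id: str, own_lastmod: dict[str, date],
                      children: dict[str, list[str]]) -> date:
    """Latest of a page's own change date and those of its subcategories, recursively."""
    latest = own_lastmod[page_id]
    for child in children.get(page_id, []):
        latest = max(latest, effective_lastmod(child, own_lastmod, children))
    return latest
```

A category page whose subcategory changed today therefore reports today's date in its lastmod, even if its own content did not change.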

Useful URLs:
Some useful URLs collected from different sources. Note that most of these sources do not describe the best way to create a sitemap.
1) http://www.google.com/support/webmasters/bin/answer.py?answer=183668 – How to create a
2) http://www.google.com/support/webmasters/bin/answer.py?answer=156184
3) http://www.google.com/support/webmasters/bin/answer.py?answer=183669
4) http://www.google.com/support/webmasters/bin/answer.py?answer=71936

TechShu Consultancy Pvt. Ltd