Google has released a new tool/project to aid in the indexing of sites.
According to one of their engineers:
“It’s a beta ‘ecosystem’ that may help webmasters with two current challenges: keeping Google informed about all of your new web pages or updates, and increasing the coverage of your web pages in the Google index.”
They’re calling it Google Sitemap Protocol (GSP) 🙂
The project is hosted on SourceForge and is open source (under a Creative Commons license), with a Python script generating the XML files that are required.
More information is available on the Google help pages and there is also an interview over on Danny Sullivan’s blog which goes into it in some depth.
It’s an interesting idea and I’d like to experiment with it. It certainly has some notable features:
“‘accesslog’ nodes tell the script to scan Apache-style webserver log files to extract URLs on your site.”
So it will learn about pages on your site from your access logs – interesting, but what about the pages nobody ever visits?
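To get a feel for what scanning an access log might involve, here is a rough sketch of my own; it is not the code from Google’s script. It assumes Apache common/combined log lines, only counts GET and HEAD requests that returned a 200, and the function name, the sample log lines and the example.com base URL are all made up for illustration.

```python
import re
from urllib.parse import urljoin

# Matches the request and status part of an Apache-style log line,
# e.g. "GET /about.html HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def urls_from_access_log(lines, base_url):
    """Return the sorted, de-duplicated URLs that were served successfully."""
    found = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line this sketch understands
        path, status = match.group(1), match.group(2)
        if status == "200":  # skip 404s, redirects and so on
            found.add(urljoin(base_url, path))
    return sorted(found)


# Hypothetical log lines, invented for this example.
SAMPLE_LOG = [
    '127.0.0.1 - - [03/Jun/2005:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [03/Jun/2005:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 209',
    '127.0.0.1 - - [03/Jun/2005:10:00:09 +0000] "GET /about.html HTTP/1.0" 200 1120',
    '127.0.0.1 - - [03/Jun/2005:10:00:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
]

for url in urls_from_access_log(SAMPLE_LOG, "http://www.example.com/"):
    print(url)
```

A page that never shows up in the log never makes it into the set, which is exactly the gap mentioned above.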
It also has a new take on robots exclusion:
“Filters specify wild-card patterns that the script compares against all URLs it finds. Filters can be used to exclude certain URLs from your Sitemap, for instance if you have hidden content that you hope the search engines don’t find.”
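So you can basically exclude pages from the sitemap. Bear in mind that this only leaves them out of the list you hand over; it doesn’t stop a spider from finding them through ordinary links.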
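The wild-card idea is easy to try out with Python’s standard fnmatch module. This is only a sketch of the concept under my own assumptions: the function name and the patterns are hypothetical, and the real script’s filter syntax and matching rules may well differ.

```python
from fnmatch import fnmatchcase


def apply_drop_filters(urls, drop_patterns):
    """Keep only the URLs that match none of the wild-card patterns."""
    return [
        url
        for url in urls
        if not any(fnmatchcase(url, pattern) for pattern in drop_patterns)
    ]


# Hypothetical URLs and patterns, invented for this example.
candidate_urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/notes.html",
    "http://www.example.com/backup/index.html~",
]
drop_patterns = ["*/private/*", "*~"]

kept_urls = apply_drop_filters(candidate_urls, drop_patterns)
print(kept_urls)
```

Note that in fnmatch a `*` also matches slashes, so `*/private/*` catches anything with a `private` directory anywhere in the path.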
Both the configuration file and the sitemap it produces are XML, and the schema definition can be seen at:
http://www.google.com/schemas/sitemap/0.84/siteindex.xsd
You basically set up your configuration file to reflect your site, then get the Python script to generate the sitemap for you and “ping” Google with the updated information.
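The generating step could look roughly like the sketch below. It is my own illustration using only the standard library, not the project’s code. It assumes the 0.84 namespace that goes with the schema location above and the urlset/url/loc/lastmod/changefreq/priority elements as I understand the format; the function name and the sample entries are hypothetical. It stops at producing the XML and leaves the “ping” out.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for version 0.84 of the sitemap format.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries):
    """Build a sitemap document from a list of dicts and return it as a string.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are written only when present.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit a default xmlns, no prefix
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, "{%s}%s" % (SITEMAP_NS, tag))
                child.text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, invented for this example.
pages = [
    {"loc": "http://www.example.com/", "lastmod": "2005-06-03",
     "changefreq": "daily", "priority": "1.0"},
    {"loc": "http://www.example.com/about.html"},
]

xml_text = build_sitemap(pages)
print(xml_text)
```

Going through ElementTree rather than pasting strings together means awkward characters in URLs, such as ampersands, get escaped for you.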
As it’s open source, there’s nothing to stop you from hacking it to death and getting it to do the same thing for another spider. Or, if webmasters could agree on a common location for the output file, there would be nothing stopping other spiders, such as Slurp, from grabbing info from the XML.
It holds a wealth of possibilities as it uses an open standard.
And if you’re using WordPress there’s already a plugin available (which I’ve already installed)!
Don’t forget to submit the map at:
https://www.google.com/webmasters/sitemaps/