"Google has been inundated with questions as to why pages are not showing up in the index, only to explore the issue and find out that the only way to get to the pages in question is to submit a form of some type. The most obvious is corporate home pages where the user has to select the country / region in a drop down (Matt's example). Until this new release, Google couldn't crawl the pages from the home page. Other examples include product selection, category information where you have to tell the site, via a form, what you want. Web masters and publishers have be frustrated by their in ability to get a lot of content indexed because managing it requires data driven applications and the use of forms. This is Google's attempt to rectify the problem."
For those really worried about this, blocking the bot from subpages can be done.
Matt Cutts has a good post on this.
I think another aspect of blocking the bot is the robots.txt file. As Matt says, "If you’d prefer that Google not crawl urls like this, you can use robots.txt to block the urls that would be discovered by crawling through a form." These URLs should probably be listed in robots.txt anyway. If they are not, adding them should not be too arduous a task.
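For anyone who wants to check that a form-generated URL really is blocked, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `/search` and `/products/filter` paths, the `example.com` URLs, and the `is_blocked` helper are all made up for illustration; they are not from Matt's post. Note that this parser does simple prefix matching, so it may not agree with Googlebot on rules that use wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules. The paths below are assumptions chosen
# for illustration; substitute the URLs your own forms generate.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
Disallow: /products/filter
"""


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A URL a crawler might discover by submitting a country drop-down form.
    print(is_blocked("https://example.com/search?country=us"))
    # An ordinary page that no rule covers.
    print(is_blocked("https://example.com/about"))
```

The first URL falls under the `Disallow: /search` prefix, while the second matches no rule and stays crawlable.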
Anyway, like so many other Google "things," this seems bigger at first than it will in hindsight.