Search engine related features
Some ideas for controlling search engines:
Deleted blogs’ robots.txt
All deleted blogs should have their robots.txt set to Disallow: /
Stop wp-login.php from being indexed by adding <meta name="robots" content="noindex" /> to its header.
An option to disable archiving by search engines by adding <meta name="robots" content="noarchive" /> and disallowing the ia_archiver robot in robots.txt.
An option to disable pings to ping servers (such as weblogs.com).
An option to disable feeds should help keep out some bots (although it’s quite a sacrifice).
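The two robots.txt ideas above, a blanket rule for deleted blogs and shutting out only ia_archiver, can be checked with a minimal sketch. It assumes Python 3's standard urllib.robotparser as a stand-in for a crawler that honours robots.txt; the rules and the example.wordpress.com address are illustrative, not what WordPress.com actually serves.

```python
from urllib.robotparser import RobotFileParser

# Deleted-blog idea: shut out every crawler from every path.
DELETED_BLOG_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Archiving idea (robots.txt half): shut out ia_archiver only. The empty
# Disallow in the "*" group leaves all other crawlers unrestricted.
NO_ARCHIVER_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

POST_URL = "https://example.wordpress.com/2007/01/some-post/"  # hypothetical


def parser_for(robots_txt):
    """Build a parser straight from robots.txt text, without fetching anything."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


deleted = parser_for(DELETED_BLOG_ROBOTS)
no_archiver = parser_for(NO_ARCHIVER_ROBOTS)

for agent in ("Googlebot", "ia_archiver"):
    print(agent,
          "deleted blog:", deleted.can_fetch(agent, POST_URL),
          "no-archiver blog:", no_archiver.can_fetch(agent, POST_URL))
# Googlebot deleted blog: False no-archiver blog: True
# ia_archiver deleted blog: False no-archiver blog: False
```

Note that robots.txt only asks politely: a crawler that ignores it is not stopped by either file.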
Controlling search engines for a better PageRank?
No, it won’t boost your Google PageRank. It’s about protecting privacy.
Please don’t forget to send these ideas to staff as feedback.
Um, we already have this feature: Dashboard -> Options -> Privacy -> Don’t allow search engines. That turns off all of that.
Thanks drmike :)
@ drmike, people forget to disallow search engines before they delete a blog.
@ timethief, ok, I’ve sent feedback, thanks.
Actually, I believe a deleted blog’s page returns a 404 status in its header, which search engines should pick up on.
Would it really matter, though? Once the blog’s content is gone and the same message is repeated over and over again to the spiders, the site would be dropped fairly quickly.
The internet archive doesn’t drop it, unless the robots.txt is set to disallow.
If IA isn’t obeying ‘noindex,nofollow,’ that’s an issue you may want to bring up with them. When you choose the “Do not let search engines in” option, the following is placed within the header:
<meta name='robots' content='noindex,nofollow' />
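For reference, a crawler that honours this tag has to fetch each page and read the directives out of its HTML. Below is a rough sketch using Python's standard html.parser; the class and function names are hypothetical, and real crawlers handle far more cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; self-closing
        # <meta ... /> tags are also routed here by default.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex,nofollow".
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# The header line quoted above for the "Do not let search engines in" option.
private_head = "<head><meta name='robots' content='noindex,nofollow' /></head>"
print(sorted(robots_directives(private_head)))  # ['nofollow', 'noindex']
```

Because the directives live inside each page, a crawler only sees them when it fetches that page again.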
You can also opt out via email at info at archive dot org.
You can also set your blog to be private with the third option on that page. Gotta admit that even with the privacy setting at 2 or 3, some search engines will still index your site, and that’ll happen even with a robots.txt file. If you want privacy, the private option is probably going to be your best bet. Either that, or find a host, install the software yourself, and password-protect the directory the blog sits in.
IA obeys those tags. The problem is when you delete a blog, IA doesn’t remove it.
An easy way to solve problems 1 and 3 is to disallow ia_archiver from wordpress.com.
Thanks for contributing.
It isn’t contributing if they’re only posting here in the forums. They need to send this in via feedback on Monday.
Again, if IA isn’t obeying ‘noindex,nofollow’ then the issue is with them. If they’re not obeying internet standards, they are the cause of their own problem.
Sorry, I assumed the blogger would follow the instructions I gave in the third post above and send feedback to staff.
@ drmike, IA copies everything from everywhere and keeps every copy publicly and forever. They do obey meta tags, but meta tags are not retroactive in IA, so robots.txt is the only option here.
@ timethief, I’ll send in extra feedback to staff tomorrow.
You can also opt out by email as noted above. It says so on their website.
Thanks for replying and letting us know you have sent in feedback and will send in more.
The Internet Archive also uses archive.org_bot and ia_archiver-web.archive.org as crawler user agents.
Hi, is it possible to get a list of all the search engine terms people used to reach my blog? If so, how? I would like to know.
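If those user agents need blocking too, one robots.txt group can list several User-agent lines ahead of a single Disallow. A sketch, again assuming Python's urllib.robotparser as the checker and a hypothetical blog address. Python's parser matches user-agent names by substring, so the ia_archiver line alone would already cover ia_archiver-web.archive.org there; other crawlers may match differently, which is why naming each one explicitly seems the safer choice.

```python
from urllib.robotparser import RobotFileParser

# Internet Archive crawler names mentioned in this thread.
IA_AGENTS = ("ia_archiver", "archive.org_bot", "ia_archiver-web.archive.org")

# One group: several User-agent lines share the rule that follows them.
IA_BLOCK_ROBOTS = """\
User-agent: ia_archiver
User-agent: archive.org_bot
User-agent: ia_archiver-web.archive.org
Disallow: /
"""

POST_URL = "https://example.wordpress.com/2007/01/some-post/"  # hypothetical

rp = RobotFileParser()
rp.parse(IA_BLOCK_ROBOTS.splitlines())

for agent in IA_AGENTS:
    print(agent, rp.can_fetch(agent, POST_URL))  # False for each
# No group names Googlebot and there is no "*" group, so it is allowed.
print("Googlebot", rp.can_fetch("Googlebot", POST_URL))  # True
```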
The topic ‘Search engine related features’ is closed to new replies.