Search engine related features

  1. Some ideas for controlling search engines:

    Deleted blogs' robots.txt
    All deleted blogs should have their robots.txt set to Disallow: /

    wp-login.php indexing
    Stop wp-login.php from being indexed by adding <meta name="robots" content="noindex" />

    Disable archiving
    An option to disable archiving by search engines by adding <meta name="robots" content="noarchive" /> and disallowing the ia_archiver robot in robots.txt.

    Disable pings
    An option to disable pings to ping servers (such as

    Disable feeds
    An option to disable feeds, which should help keep out some bots (although it's quite a sacrifice).
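    The robots.txt ideas above can be checked with Python's standard urllib.robotparser module. This is a minimal sketch, assuming a deleted blog serves a blanket "Disallow: /" to every robot and a live blog with archiving turned off disallows only ia_archiver. The blog URL is a placeholder, and the parser only shows how a compliant crawler reads these rules, not how WordPress.com would actually serve them.

```python
from urllib.robotparser import RobotFileParser

# Idea 1: a deleted blog shuts out every robot.
DELETED_BLOG_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Idea 3: a live blog keeps ia_archiver out but lets other robots in
# (an empty Disallow line means "nothing is disallowed").
NO_ARCHIVE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a RobotFileParser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


deleted = make_parser(DELETED_BLOG_ROBOTS)
live = make_parser(NO_ARCHIVE_ROBOTS)

# Placeholder post URL; any path on the blog behaves the same way here.
url = "https://example.wordpress.com/2007/05/some-post/"

print(deleted.can_fetch("Googlebot", url))    # False
print(deleted.can_fetch("ia_archiver", url))  # False
print(live.can_fetch("ia_archiver", url))     # False
print(live.can_fetch("Googlebot", url))       # True
```

    Keep in mind that robots.txt is advisory: it only keeps out robots that choose to honour it.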

  2. Controlling search engines for a better PageRank?

  3. No, it won't boost your Google PageRank. It's about protecting privacy.

  4. @derekblog
    Please don't forget to send these ideas to staff as feedback.

  5. Um, we already have this feature: Dashboard -> Options -> Privacy -> Don't allow search engines. It turns all of that off.

  6. Thanks drmike :)

  7. @ drmike, people forget to disallow search engines before they delete a blog.
    @ timethief, OK, I've sent feedback, thanks.

  8. Actually, I believe a deleted blog's page returns a 404 status in its header, which search engines should pick up on.

    Would it really matter, though? Once the blog's content is gone and the same message is repeated over and over to the spiders, the site would be dropped fairly quickly.

  9. The Internet Archive doesn't drop it unless robots.txt is set to disallow it.

  10. If IA isn't obeying 'noindex,nofollow,' that's an issue you may want to bring up with them. When you choose the "Do not let search engines in" option, the following is placed within the header:

    <meta name='robots' content='noindex,nofollow' />

    You can also opt out via email at info at archive dot org.

    You can also set your blog to be private with the third option on that page. I have to admit that even with the privacy setting at 2 or 3, some search engines will index your site. That'll happen even with a robots.txt file. If you want privacy, that's probably your best option. Either that, or find a host, install the software yourself, and password-protect the directory the blog sits in.
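    To see what the "Do not let search engines in" header line above tells a crawler, here is a minimal sketch that pulls the directives out of a robots meta tag with Python's standard html.parser module. The class and function names (RobotsMetaParser, robots_directives) are hypothetical, and the sketch only shows how a compliant crawler might read the tag, not how the Internet Archive or any search engine actually processes it. A tag like this can only affect copies fetched after it is added.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()  # e.g. {"noindex", "nofollow"}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and also routes
        # self-closing tags like <meta ... /> through this method.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex,nofollow".
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return every robots directive declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


private_head = "<head><meta name='robots' content='noindex,nofollow' /></head>"
archive_head = '<head><meta name="robots" content="noarchive" /></head>'

print(sorted(robots_directives(private_head)))  # ['nofollow', 'noindex']
print(sorted(robots_directives(archive_head)))  # ['noarchive']
```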

  11. IA obeys those tags. The problem is when you delete a blog, IA doesn't remove it.

  12. An easy way to solve problems 1 and 3 is to disallow ia_archiver from

  13. Thanks for contributing.

  14. It's not contributing if they're posting here in the forums. They need to send this in via feedback on Monday.

    Again, if IA isn't obeying 'noindex,nofollow', then the issue is with them. If they're not obeying internet standards, they are the cause of their own problem.

  15. Sorry -- I assumed that the blogger would follow the instructions I gave him in the third post above and send in a feedback to staff.

  16. @ drmike, IA copies everything from everywhere and keeps every copy, forever and publicly. They do obey meta tags, but meta tags are not retroactive in IA, so robots.txt is the only option here.
    @ timethief, I'll send in additional feedback to staff tomorrow.

  17. You can also opt out as noted above. It says so on their website.

  18. Thanks for replying and letting us know you have sent in a feedback and will send in another one.

  19. The Internet Archive also uses archive.org_bot and

  20. Hi, is it possible to get a list of all the search engine terms people used to reach my blog? If so, how? I'd like to know.

  21. If you go to your blog stats in the WordPress dashboard (Dashboard > Blog Stats), there's an option to view your search engine terms. Clicking it shows the search engine terms for the last 7 days; if you change 7&blog to something like 100&blog in your browser's address bar, you can see the stats as far back as you like.

    Btw, in future if you have a question which isn't exactly the same as the existing topic, you're better off starting a new thread; if we see 20 or more replies we tend to think the question has already been resolved, so we might not see it.
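    The address-bar trick described above amounts to swapping the number that sits right before "&blog". Here is a minimal sketch in Python, assuming only that the stats URL contains a run of digits immediately followed by "&blog". The helper name with_days, the example URL and its range parameter are hypothetical placeholders, not the real WordPress.com stats address.

```python
import re


def with_days(stats_url: str, days: int) -> str:
    """Replace the day count that directly precedes "&blog" in a stats URL."""
    # \d+(?=&blog) matches the digits (e.g. the "7" in "7&blog") without
    # consuming "&blog" itself, so only the number changes.
    new_url, count = re.subn(r"\d+(?=&blog)", str(days), stats_url, count=1)
    if count == 0:
        raise ValueError("URL has no '<number>&blog' segment to change")
    return new_url


# Hypothetical URL standing in for the search-terms page's address.
url = "https://example.com/stats?range=7&blog=12345"
print(with_days(url, 100))  # https://example.com/stats?range=100&blog=12345
```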

