Robots.txt

  1. Hey all,

    I started a new blog (link-juice.co.uk), and it started to get indexed in Google. All was well, then all of a sudden some of the pages were deindexed, and slowly but surely they're all being deindexed...

    I've taken a look at what Google's doing (using Webmaster Tools) and it says "URLs restricted by robots.txt (25)", which turns out to be everything. The robots.txt file looks like this:

    User-agent: IRLbot
    Crawl-delay: 3600

    User-agent: *
    Disallow: /next/

    User-agent: *
    Disallow:

    From what I can tell this is the default... any thoughts as to what I can do?

    Cheers

  2. Check under Options, Privacy and make sure the first selection, "I would like my blog to appear in search engines like Google and Sphere, and in public listings around WordPress.com," is selected, and then click "update options." If it is already selected, select one of the others and click update, then go back, select the first one again, and click "update options" again.

    I would also contact staff about it (not sure if support will be open for business today or not) at: http://wordpress.com/contact-support/ .

  3. No need to bother staff; that's the default robots.txt everyone has, and it's uneditable. (previous answer here). If Google are eliminating you from their indexes that's Google's responsibility, not wp.com's.

  4. @wank
    Thanks for the clarification.

  5. Thanks for the replies guys

    If Google are "eliminating" me from their index because of a robots.txt which I can't edit, then it is a WP problem...

    I've done a bit more research into this. Webmaster Tools has a robots.txt verification tool that lets you see whether pages are allowed or disallowed based on your current robots.txt file. I've run through all my indexed pages and it's currently saying that they are all "allowed". This completely contradicts what they say in the web crawl stats, where all my pages are "blocked by robots.txt".

    Slightly confused by this...
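
For anyone who wants to repeat that allowed/disallowed check without Webmaster Tools, here is a minimal sketch using Python's standard `urllib.robotparser`. It parses the default file quoted in the first post and asks whether a "Googlebot" user agent may fetch two paths. The helper name `is_allowed` is made up for illustration, and Python's parser is not Google's, so treat the result as a sanity check rather than a statement of what Google actually does.

```python
from urllib.robotparser import RobotFileParser

# The default robots.txt quoted in the first post of this thread.
DEFAULT_ROBOTS_TXT = """\
User-agent: IRLbot
Crawl-delay: 3600

User-agent: *
Disallow: /next/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`.

    Hypothetical helper: it only wraps the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/", "/next/"):
        url = "http://link-juice.co.uk" + path
        print(path, is_allowed(DEFAULT_ROBOTS_TXT, "Googlebot", url))
```

With this file, only paths under `/next/` come back as disallowed, which matches what the verification tool reported.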

  6. If it's showing that they are "allowed" then I would be contacting Google as it appears they've got an error or problem of some sort on their end.

  7. Ok... This is getting more and more baffling as we speak.

    So, a day or so after this discussion, I did a site:link-juice.co.uk search and saw that the site was being indexed again. I went to Webmaster Tools and saw that it had picked up the proper robots.txt (displayed in my first post).

    Today I've done the same search, and we're down to just 2 pages indexed again. I've logged into Webmaster Tools (prepare to be confused)... Google says:
    ---------------------------------------
    robots.txt URL http://link-juice.co.uk/robots.txt
    Last downloaded November 26, 2007 3:55:50 AM PST
    Status 200 (Success) [?]
    ---------------------------------------

    It then displays the robots.txt file (the one above). I can test this file in Webmaster Tools and all pages are allowed. However, when you visit link-juice.co.uk/robots.txt, it's gone back to:
    ---------------------------------
    User-agent: *
    Disallow: /
    ---------------------------------

    I logged into WordPress and went to Privacy to see that it still says:
    "I would like my blog to appear in search engines like Google and Sphere, and in public listings around WordPress.com. "

    I'm not usually one for swearing in three letter acronyms, but WTF?
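
Since the file seems to flip between two versions, one way to catch the bad one is a small check that tells you whether a given robots.txt body shuts a crawler out of the whole site. This is only a sketch: `blocks_everything` is a made-up helper name, "Googlebot" is just an example agent string, and it relies on Python's `urllib.robotparser` rather than on how any real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may not even fetch the site root.

    Hypothetical helper for illustration; it checks only the root path "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


# The version seen when visiting link-juice.co.uk/robots.txt in the post above.
BLOCKING = "User-agent: *\nDisallow: /\n"

# An empty Disallow value places no restriction on the crawler.
OPEN = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    print("blocking version:", blocks_everything(BLOCKING))
    print("open version:", blocks_everything(OPEN))
```

To run the same check against the live file, `RobotFileParser` also offers `set_url()` and `read()` to download it, after which `can_fetch()` works the same way.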

  8. hmm, this http://link-juice.co.uk/robots.txt is really Worse Than a Failure.

    I'm baffled as well.

  9. lol!

    Ummm, I've sent wordpress.com a question and am waiting for a response... here's hoping they get back to me.

    Cheers for all your feedback guys!

  10. tvsportstonight
    Member

    The crawl-delay of 3600 appears to be wordpress.com's way of telling Google to not "webcrawl" more than one page every 3600 seconds on that site. Basically telling the webcrawler to go away.
    Good for reducing load on wordpress.com's servers. Not so good for bloggers wanting to get indexed.
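
One thing worth checking against the file quoted in the first post: the `Crawl-delay: 3600` line sits in the `User-agent: IRLbot` group, so under the usual robots.txt grouping rules it would apply to IRLbot only, not to Google's crawler. The sketch below asks Python's standard `urllib.robotparser` which delay it assigns to each agent. It needs Python 3.6 or later for `crawl_delay()`, the name `delay_for` is made up for illustration, and Python's reading of the file is not necessarily what any particular crawler does.

```python
from urllib.robotparser import RobotFileParser

# The default robots.txt quoted in the first post of this thread.
ROBOTS_TXT = """\
User-agent: IRLbot
Crawl-delay: 3600

User-agent: *
Disallow: /next/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def delay_for(agent: str):
    """Return the Crawl-delay in seconds for `agent`, or None if none is set.

    Hypothetical helper: a thin wrapper around RobotFileParser.crawl_delay().
    """
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    for agent in ("IRLbot", "Googlebot"):
        print(agent, delay_for(agent))
```

Here the parser reports 3600 for IRLbot and no delay at all for Googlebot, so this line on its own would not explain pages dropping out of Google's index.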

  11. Moot, as it seems the OP has moved off WordPress.com.

  12. tvsportstonight
    Member

    I'm thinking of doing the same if I can't get indexed.

  13. If you think your WordPress.com blog's robots.txt is incorrect, please contact support.

    Blogs are indexed by search engines but it sometimes takes a while.

Topic Closed

This topic has been closed to new replies.
