I started a new blog (link-juice.co.uk), and it started to get indexed in Google. So all was well; then all of a sudden some of the pages were unindexed, and slowly but surely they're all being unindexed…
I've taken a look at what Google's doing (using Webmaster Tools), and it says "URLs restricted by robots.txt (25)", which turns out to be everything. The robots.txt file looks like this:
From what I can tell this is the default… any thoughts as to what I can do?
Check under Options → Privacy and make sure the first selection, "I would like my blog to appear in search engines like Google and Sphere, and in public listings around WordPress.com", is selected, then click "update options." If it is already selected, select one of the others and click update, then go back, select the first one again, and click "update options" once more.
I would also contact staff about it (not sure if support will be open for business today or not) at: http://wordpress.com/contact-support/ .
No need to bother staff; that's the default robots.txt everyone has, and it's uneditable (previous answer here). If Google are eliminating you from their indexes, that's Google's responsibility, not wp.com's.
Thanks for the replies, guys.
If Google are "eliminating" me from their index because of a robots.txt that I can't edit, then it is a WP problem…
I've done a bit more research into this. Webmaster Tools has a robots.txt verification tool that lets you see whether pages are allowed or disallowed based on your current robots.txt file. I've run through all my indexed pages and it's currently saying that they are all "allowed". This is a complete contradiction of the webcrawl stats, where all my pages are "blocked by robots.txt".
Slightly confused by this…
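For anyone who wants to reproduce that allowed/disallowed check outside Webmaster Tools, here's a minimal sketch using Python's standard urllib.robotparser. The robots.txt content below is an assumed stand-in (the actual file isn't quoted in this thread), and the post URL is made up:

```python
from urllib.robotparser import RobotFileParser

# Assumed stand-in for the wordpress.com default robots.txt discussed above;
# an empty Disallow line permits everything, and Crawl-delay only throttles,
# it does not block.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Crawl-delay: 3600
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical post URL; any path is "allowed" under an empty Disallow
print(rp.can_fetch("Googlebot", "http://link-juice.co.uk/some-post/"))  # True
```

If that prints True while Google still reports "blocked by robots.txt", the contradiction really is on Google's side (or Google is seeing a different file than you are).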
If it’s showing that they are “allowed” then I would be contacting Google as it appears they’ve got an error or problem of some sort on their end.
Ok… This is getting more and more baffling as we speak.
So, a day or so after this discussion, I did a site:link-juice.co.uk search and saw that the site was being indexed again. I went to Webmaster Tools and saw that it had picked up the proper robots.txt (displayed in my first post).
Today I've done the same search, and we're down to just 2 pages indexed again. I've logged into Webmaster Tools (prepare to be confused)… Google says:
robots.txt URL http://link-juice.co.uk/robots.txt
Last downloaded November 26, 2007 3:55:50 AM PST
Status 200 (Success) [?]
It then displays the robots.txt file (the one above). I can test this file in Webmaster Tools and all pages are allowed. However, when you visit link-juice.co.uk/robots.txt, it's gone back to:
I logged into WordPress and went to Privacy to see that it still says:
“I would like my blog to appear in search engines like Google and Sphere, and in public listings around WordPress.com. “
I’m not usually one for swearing in three letter acronyms, but WTF?
hmm, this http://link-juice.co.uk/robots.txt is really Worse Than a Failure.
I’m baffled as well.
Ummm, I’ve sent wordpress.com a question and am waiting for a response… here’s hoping they get back to me.
Cheers for all your feedback guys!
The Crawl-delay of 3600 appears to be wordpress.com's way of telling Google to crawl no more than one page every 3600 seconds (i.e. one page per hour) on that site. Basically telling the webcrawler to go away.
Good for reducing load on wordpress.com’s servers. Not so good for bloggers wanting to get indexed.
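That arithmetic can be checked with Python's stock urllib.robotparser (crawl_delay support needs Python 3.6+); the two-line robots.txt below is an assumed reconstruction of the directive discussed in this thread, not the verbatim file:

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the relevant directive from the thread
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 3600
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

delay = rp.crawl_delay("Googlebot")  # seconds a polite crawler waits
print(delay)                         # 3600 -> one page per hour
print(86400 // delay)                # at most 24 pages crawled per day
```

At 24 pages a day, a 25-page blog takes more than a full day per complete crawl, which fits the slow, page-by-page indexing behaviour described above.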
Moot, as it seems the OP has moved off WordPress.com.
I’m thinking of doing the same if I can’t get indexed.
The topic ‘Robots.txt’ is closed to new replies.