Access to Raw HTML and Robots.txt
Questions on accessing underlying infrastructure:
1) Is there a way to get access to the raw HTML for each content page on my blog? I want to be able to set meta tags, and then for any images I of course want to be able to customize the ALT tag.
2) Do we have access to robots.txt? I have been told by several people that access to robots.txt is important as well, to customize what search engines access on your site.
3) I assume that I can at minimum set the exact string to be found in the Title of each content page?
The blog I need help with is poetrydude66.wordpress.com.
1) You can read it, but you can’t change it. Of course you can change the alt text on any image right in the upload or edit image box.
2) No. You are reading advice that doesn’t apply to your blog. I suggest you read up on WordPress.COM, not WordPress.ORG, to avoid wasting your time.
3) Of course.
Regarding images, on Edit Image I see a Caption, Alternative Text, and Description. Which options in the IMG tag do those correspond to?
Regarding robots.txt, there really isn’t a good reason they couldn’t give you access to it. I understand you are not allowed to control it, but that’s not a good thing.
Do they generate sitemaps automatically, and what is the naming convention for those?
Do you want to debate whether or not these are “good reasons?” I’m not here for that. I just answer technical questions. Most of the answers are already in the Support documents http://support.wordpress.com/
The alt text on an image is on the LINK.
Won’t debate robots.txt, per your request.
Are the Caption, Alternative Text, and Description fields for each image in the Library put into some kind of meta tag for the image?
Where do they put sitemaps?
Have you gone through the Support documents yet?
Yes, but I wasn’t asking about the semantics of each field. I was asking about the actual implementation and which HTML tags are being set from those values.
No problem, I’ll just create content and inspect the HTML in future.
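For anyone who wants to do that inspection with a script rather than by eye, here is a minimal sketch using Python's standard html.parser module. It only lists whatever attributes appear on each IMG tag in a chunk of HTML. The SAMPLE markup, the class name ImgAttrCollector and the function collect_img_attrs are made up for illustration; the sample is not what WordPress.com actually generates, so paste in the real page source to see which attributes the Caption, Alternative Text and Description fields end up in.

```python
from html.parser import HTMLParser


class ImgAttrCollector(HTMLParser):
    """Collects the attributes of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            self.images.append(dict(attrs))


def collect_img_attrs(html):
    """Return one dict of attributes per <img> tag found in html."""
    parser = ImgAttrCollector()
    parser.feed(html)
    parser.close()
    return parser.images


# Hypothetical markup for illustration only, NOT real WordPress.com output.
SAMPLE = (
    '<p><a href="photo.jpg">'
    '<img src="photo.jpg" alt="A red kite" title="Kite">'
    "</a></p>"
)

if __name__ == "__main__":
    for attrs in collect_img_attrs(SAMPLE):
        print(attrs)
```

Saving a post's page source from the browser and passing it to collect_img_attrs should show which of the three fields, if any, appear as alt or title on the image.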
After making my blog public and adding it to Google Webmaster Tools, I tried to add sitemap.xml to Google Webmaster Tools. Google is telling me that my robots.txt forbids them from gathering the sitemap.xml pages.
I assume my robots.txt was updated when I went public? Maybe Google is referring to an old cached version it might be keeping?
Any ideas on why this happens?
Yes, because your blog was set to Private. It will take a number of weeks, possibly up to two months, for Google to re-index your site, no matter what you do on this end, because their cached version of your site indicates it is not to be crawled.
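To see how a robots.txt can block the sitemap, here is a rough sketch using Python's standard urllib.robotparser module. The two robots.txt bodies below are assumptions standing in for "blocks everything" and "allows everything"; they are not copies of what WordPress.com serves for private or public blogs, and the example.wordpress.com URL and the is_allowed helper are placeholders.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed contents for illustration only, not WordPress.com's real files.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Placeholder URL; substitute your own blog's sitemap address.
SITEMAP_URL = "https://example.wordpress.com/sitemap.xml"

if __name__ == "__main__":
    print("block-all:", is_allowed(BLOCK_ALL, SITEMAP_URL))
    print("allow-all:", is_allowed(ALLOW_ALL, SITEMAP_URL))
```

To check what your blog serves right now, call set_url() with your own robots.txt address and then read() on a RobotFileParser instead of parse(). If that shows the sitemap as allowed while Google still reports it as blocked, Google is most likely working from the older copy it cached.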
To expedite search engine indexing of your content see here > http://onecoolsitebloggingtips.com/2010/01/21/omg-i-cant-find-my-blog-on-google/