blog scraper

    #229482

    inkandpages
    Member

    I know this topic has been discussed before, but I never saw a satisfactory solution to it (if there was one, I missed it, sorry). There is, for example, a “powered by WordPress” blog called “Politics in America” at http://www.truthfulnews.com/ that is one such scraper, and it’s scraping and sucking content from all over WordPress (the scraped posts show up as incoming links rather than comments). Is this just one more indignity we must tolerate? We can mark comments as spam. Why can’t we mark scrapers as such?

    #229707

    inkandpages
    Member

    Oops. Make that URL http://www.truthfulnews.com/2008/05/ (the site doesn’t load without the date after the slash).

    #229709

    I’d go to the blog and mark it as spam, if I were you. Top right corner of the dashboard.

    #229713

    inkandpages
    Member

    Not sure what you mean. I don’t see a place that says “spam” in the top right corner of the dashboard, especially not for incoming links. Could you be a bit more clear please? Thanks

    #229715

    Oh, I did not click on the link. I was under the impression that it was a wordpress.com blog, which it is not.

    #229716

    inkandpages
    Member

    Thanks for trying. I wrote “powered” by WordPress. Can we only mark WordPress blogs as spam then? I guess there’s no remedy for this *%*&^) problem.

    #229719

    Yeah, we can only mark wordpress.com blogs as spam. But you could always contact that blogger, if you were so inclined.

    #229721

    My guess is that it is Obambi Blog, as calling him out on his BS got my posts redirected to the same site you mention.

    #229729

    raincoaster
    Member

    It’s a news aggregator rather than a straight-up blog scraper; they’re getting to be more common. Just contact the admin and ask to be taken off the list of blogs they post. If that doesn’t work you MAY be able to contact the web host, but because the sections posted are so small, he could well have a fair use defence.

    #229870

    inkandpages
    Member

    If I knew how to contact the admin or web host I would do it. But all “contact” and “comment” functions for the ordinary human being are unavailable at this “news aggregator.” So the lesson here is: Move along, nothing to see here. Let all content thieves grab whatever wherever whenever. The web is free, dude, and that’s the ultimate “fair use” disclaimer.

    <rant>Yet Monsanto claims it can copyright and own life itself.</rant>

    #229872

    The administrative information for the truthfulnews.com domain is available via whois: http://www.whois.net/whois_new.cgi?d=truthfulnews&tld=com

    #229880

    Can’t we report it to Google? I think I read somewhere in a forum thread that we can. Not sure, though!

    #229881

    dlager
    Member

    @raincoaster, is this not a splog?
    I will drop the link at splog watch sites if it is…
    But I am not sure what a “news aggregator” is… or if it is intrusive.

    #229882

    I’m not sure, but it wouldn’t surprise me. However, that will (at best) change whether Google shows the site in search results or not. All of the other search engines would need to be contacted as well.

    #229883

    dlager
    Member

    This is a new idea that someone has:

    http://exposingsploggers.wordpress.com/

    I have a blogroll link to them and some other people do too…
    If you think this is content theft from a splog lumineria check it out.

    #229884

    I just took a quick look at the site. Like a couple of others I’ve seen now, it doesn’t actually grab the entire post. What it does is to grab a small excerpt of a post and show it as a teaser; in order to read any more of it, you have to actually go to the original source. In most cases, I expect raincoaster is right and this would qualify as fair use.

    One way to look at it is this: if you treat their pingbacks as spam, you’ll end up with an incoming link that isn’t part of a link exchange (they link to you, but your post doesn’t link back to them). From what I’ve heard, that should (in theory) make your post more appealing to the search engines, and get you more traffic. Since it’s set up as a blog, it’s probably also adding to things like your authority in Technorati. I’m not sure I like the fact that someone operating a bunch of news aggregators can shape opinion on the internet the same way that mainstream media does, but that seems to be a possibility.

    #229931

    inkandpages
    Member

    Thanks to all who have replied to this. No doubt it’s part of the evolution of the ‘net. I guess we can all agree it’s user beware. :) Time Thief sent me to this site.
    http://onecoolsite.wordpress.com/2008/05/10/splog-off-dealing-with-content-theft/
    I haven’t looked at it yet, but maybe it will be helpful to others?

    I understand the “fair use” idea and the theoretical strengthening of “authority” by getting hit by these splogs or news aggregators, or whatever you want to call them, but no matter what you call them, I see them as intrusions by bots, not responses by conscious/intelligent human beings.

    Call me a purist, but IMHO these bots degrade and dehumanize the blogging experience. The only choice I see is to kill my computer along with my television. Gaaaaaah.

    #229940

    dlager
    Member

    Well, I got one of these too, today:

    http://sdaomega.wordpress.com/2008/05/11/spiritualformation/

    It pinged a static page of mine on breathing exercises… (no tags on static pages)

    And my page on breathing exercises, which it took a short excerpt from, has nothing to do with “spirituality.”

    I have no problem with this ping, but I do not understand the motive for these “aggregators.”

    #229941

    dlager
    Member

    In fact, this site is splogging, taking snippets of posts and turning them into downloadable PDF and doc files!! I’ve never seen this kind of splog before!

    #229942

    ellaella
    Member

    I haven’t seen that before either. I get splogged 12 to 30 times per day and while I don’t bother checking them all anymore, I do check randomly. And that’s a new one on me.

The topic ‘blog scraper’ is closed to new replies.