
Topic: All sites that use live ODP data are down now...

  1. #1
    browser007
    Visitor

    All sites that use live ODP data are down now...

    ... while you can easily access dmoz.org itself.

    What's happening? Are we not authorized to use odp data anymore?

    You can test this yourself. Go to:
    http://dmoz.org/Computers/Internet/S...ull-index.html
    and pick a site of your choice: unless they cache some pages on their own server, there's no information from dmoz.org


  2. #2
    Member Editor giz
    Join date
    May 2002
    Posts
    1,556

    Re: All sites that use live ODP data are down now.

    Maybe they should be using http://ch.dmoz.org/ instead?
    ODP Editor g1smd

  3. #3
    browser007
    Visitor

    Re: All sites that use live ODP data are down now.

    Are you suggesting that if all these sites use ch.dmoz.org, there are no problems at all?




  4. #4
    browser007
    Visitor

    Re: All sites that use live ODP data are down now...

    Apparently, a techie has blocked access to sites that use ODP data since Saturday, September 27 (last time my server cached a file was 27-Sep-2003 3:40 PM EST)

    PS. It's not only about my site, but about ALL sites that use ODP live data :-(






  5. #5
    Moderator Meta windharp
    Join date
    Apr 2002
    Location
    Germany
    Posts
    4,363

    Re: All sites that use live ODP data are down now...

    Did you try to access dmoz.org by hand? Most likely it was not a "techie" who blocked access, but server load that prevents you from accessing it.

    ch.dmoz.org is a bit out of date, as is de.dmoz.org. Both are external mirrors hosted elsewhere (hint: those are country codes in front of dmoz.org), so they don't put load on our main server.
    ODP Meta Editor windharp
    Wichtige Links: Deutsche ODP FAQ / Deutsche ODP-Richtlinien
    Important Links: English ODP FAQ / English ODP Guidelines


  6. #6
    Moderator Meta theseeker
    Join date
    Mar 2002
    Location
    Spokane, WA
    Posts
    306

    Re: All sites that use live ODP data are down now...

    When these types of scripts that use live data first showed up, dmoz staff asked the writers of the scripts to aim them at some other server, like the Netscape servers (though I suspect that may not be allowed anymore either). But the programmers making the scripts wanted the most up-to-date data, and so eventually ignored that request.

    Since the dmoz.org system was not made to handle a lot of traffic (that's why the data is distributed through the RDF), the number of robots and screen-scraping programs has been slowing the servers down for quite some time. I'm quite surprised that it's taken this long, but from all the signs, I would say that sites taking data directly from the public servers are going to be blocked now.

    I suggest exploring other avenues, like processing the RDF. The mirror servers are probably not the solution. They are provided free of charge, and the people providing them are probably not going to keep providing the types of resources it would take to satisfy all the sites that want live data.
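For anyone wondering what "processing the RDF" could look like in practice, here is a minimal sketch, not an official method. It streams a dump record by record instead of loading it whole. The tiny SAMPLE document, its element names (ExternalPage, Title, Description, topic) and its namespace URIs are assumptions standing in for a real dump, and iter_pages is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump. The element names and namespaces are
# assumptions about the layout, not a copy of a real file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf">
  <ExternalPage about="http://example.com/">
    <d:Title>Example</d:Title>
    <d:Description>An example site.</d:Description>
    <topic>Top/Computers/Internet</topic>
  </ExternalPage>
  <ExternalPage about="http://example.org/">
    <d:Title>Other</d:Title>
    <d:Description>Another site.</d:Description>
    <topic>Top/Arts</topic>
  </ExternalPage>
</RDF>
"""


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield one dict per ExternalPage element, reading the stream incrementally."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        record = {"url": elem.get("about")}
        for child in elem:
            record[local(child.tag).lower()] = (child.text or "").strip()
        yield record
        # Drop the children and text of the record we just handled.
        elem.clear()


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page in pages:
    print(page["topic"], page["url"], page["title"])
```

With a real dump you would pass an open file object instead of the in-memory sample, and write each record to your own database rather than collecting a list.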


  7. #7
    browser007
    Visitor

    Re: All sites that use live ODP data are down now...

    Thank you for this information.
    I'm trying to find a way to chop these huge RDF dumps into manageable pieces, instead of using ODP live data.
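One possible way to get "manageable pieces" is to write each record to a file named after its top-level category. This is only a sketch under assumptions: the records are taken to be already-parsed dicts with topic, url and title keys, topic paths are assumed to look like "Top/Computers/Internet", and split_by_category and top_category are hypothetical helper names.

```python
import csv
import os
import tempfile


def top_category(topic):
    """Return the segment after 'Top/', or '_other' if the path looks different."""
    parts = topic.split("/")
    return parts[1] if len(parts) > 1 and parts[0] == "Top" else "_other"


def split_by_category(records, out_dir):
    """Write each record to <out_dir>/<category>.tsv and return counts per category."""
    os.makedirs(out_dir, exist_ok=True)
    handles, writers, counts = {}, {}, {}
    try:
        for rec in records:
            cat = top_category(rec["topic"])
            if cat not in writers:
                path = os.path.join(out_dir, cat + ".tsv")
                handles[cat] = open(path, "w", newline="", encoding="utf-8")
                writers[cat] = csv.writer(handles[cat], delimiter="\t")
            writers[cat].writerow([rec["topic"], rec["url"], rec["title"]])
            counts[cat] = counts.get(cat, 0) + 1
    finally:
        # Close every file we opened, even if a record was malformed.
        for fh in handles.values():
            fh.close()
    return counts


# Made-up records for illustration only.
records = [
    {"topic": "Top/Computers/Internet", "url": "http://example.com/", "title": "Example"},
    {"topic": "Top/Arts", "url": "http://example.org/", "title": "Other"},
    {"topic": "Top/Computers/Software", "url": "http://example.net/", "title": "Third"},
]
out_dir = tempfile.mkdtemp()
counts = split_by_category(records, out_dir)
print(counts)
```

Because only one file per top-level category is open at a time, memory use stays flat no matter how large the input is. Splitting one level deeper would work the same way with a longer key.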

    P.S. I know what the problem is with these spiders:

    When Googlebot or another search engine spider crawls my site to index pages, it leaves traces in MY server log files. But every page it requests makes my program request a page from dmoz.org, so the dmoz.org log files show no trace of Googlebot. Instead, my site looks like an unknown robot abusing the dmoz.org server, and they block access to my site :-(
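The pass-through effect described above is smaller when the relaying script keeps its own copies for a while, so repeated crawler hits on the same page trigger only one upstream request. A minimal sketch of that idea follows. CachedFetcher is a hypothetical class, the one-day default lifetime is an arbitrary assumption, and the fetch function is injected so the example runs without touching any real server.

```python
import time


class CachedFetcher:
    """Wrap a fetch function and reuse its results for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=86400, clock=time.time):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # url -> (fetched_at, body)

    def get(self, url):
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no upstream request
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


# Demonstration with a fake upstream and a controllable clock.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "page for " + url


now = [0.0]
fetcher = CachedFetcher(fake_fetch, ttl_seconds=60, clock=lambda: now[0])
first = fetcher.get("http://example.com/a")
second = fetcher.get("http://example.com/a")  # served from the cache
now[0] = 61.0
third = fetcher.get("http://example.com/a")   # expired, fetched again
print(len(calls))
```

This only reduces load on the upstream server. It does not change whether that kind of access is permitted in the first place.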




  8. #8
    Moderator Meta windharp
    Join date
    Apr 2002
    Location
    Germany
    Posts
    4,363

    Re: All sites that use live ODP data are down now...

    At least Googlebot is very obedient regarding robots.txt files. Since it is unnecessary to spider borrowed content anyway, such pages should be excluded by default. [My opinion!]
    ODP Meta Editor windharp
    Wichtige Links: Deutsche ODP FAQ / Deutsche ODP-Richtlinien
    Important Links: English ODP FAQ / English ODP Guidelines
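To illustrate the robots.txt suggestion: the rules below keep compliant spiders out of the part of a site that shows borrowed directory pages. The /odp/ path and the example.com host are assumptions made up for this sketch. The check uses Python's standard urllib.robotparser to confirm how a compliant crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the borrowed directory pages are assumed to live under /odp/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /odp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would skip the borrowed pages but still index the rest.
blocked = parser.can_fetch("Googlebot", "http://example.com/odp/Computers/")
allowed = parser.can_fetch("Googlebot", "http://example.com/about.html")
print(blocked, allowed)
```

The two Disallow lines of text are the part that goes on the server, saved as robots.txt at the site root. The Python code is only there to verify the rules.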


  9. #9
    browser007
    Visitor

    Re: All sites that use live ODP data are down now...

    windharp, I agree with you: spidering "borrowed content" is not quite fair. It's too late now to exclude bots, because my site has no access to dmoz.org anymore.
    Wish I had done that from the beginning...
