Corruption in content.rdf

Discussion in 'Using ODP Data' started by JasonTimmins, Feb 21, 2009.

  1. JasonTimmins

    JasonTimmins Member

    Joined:
    Feb 21, 2009
    Messages:
    18
    Hi There,

    I'm not sure if this is the right place but I thought I'd mention it anyway.

    My import routine for the DMOZ data is blowing up with an XML schema failure around line 26946609 of this week's (2009/02/18) content.rdf file.

    It's to do with an external link to portaljove .com in Top/World/Español/Regional/Europa/España/Comunidades_Autónomas/Comunidad_Valenciana/Educación. The record seems to have two descriptions (one of which is missing its closing tag) and a second title tag inside one of the descriptions. Anyway, it's a bit of a mess. Can an editor take a look at it?
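
    For anyone trying to pin down where a dump stops being well-formed, here is a minimal sketch using Python's standard xml.etree.ElementTree. The function name first_xml_error and the sample record are made up for illustration. The element names only imitate the kind of damage described above (an unclosed description with a stray title inside a second one) and are not copied from the real content.rdf.

```python
import io
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Return (line, column, message) for the first well-formedness
    error in source, or None if the whole document parses.

    source may be a filename or a binary file object.
    """
    try:
        # Stream the document instead of loading it whole.
        for _event, elem in ET.iterparse(source, events=("end",)):
            # Drop text and children of finished elements to reduce memory use.
            elem.clear()
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical record: the first Description is never closed.
sample = b"""<RDF>
<ExternalPage about="http://example.com/">
<Title>Example</Title>
<Description>First description
<Description>Second <Title>stray</Title> description</Description>
</ExternalPage>
</RDF>"""

result = first_xml_error(io.BytesIO(sample))
print(result)
```

    This only reports the first failure, because the parser cannot continue past it. Finding further errors would mean repairing or cutting out the broken chunk and running the check again.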

    Cheers
    Jason
    <URL deleted>
  2. dermotz

    dermotz Member

    Joined:
    Mar 18, 2004
    Messages:
    112
    The DMOZ rdf dump has always contained errors.

    You need to change your software to cope with the corrupted data.
  3. hansfn

    hansfn Moderator DMOZ Meta

    Joined:
    Aug 4, 2005
    Messages:
    22
    dermotz, you shouldn't reply to a thread about a topic you clearly don't have any real knowledge of. The current content.rdf is seriously broken, with incomplete, cut-off, and mixed-up elements that no software can fix. I have reported the problem in the internal DMOZ/ODP bugs forum. Let's hope that the sysadmins/developers can fix this quickly.

    PS! I have found one more in World/Norsk/Kunst_og_kultur/Litteratur/Forfattere/Ø/
  4. JasonTimmins

    JasonTimmins Member

    Joined:
    Feb 21, 2009
    Messages:
    18
    Hi There,

    Thanks for the update. This week's content file is much worse than the previous week's. I found seven errors before I gave up fixing them and admitted defeat.

    Let's hope the DMOZ people get it together soon.

    Bye for now
    Jason.

    PS. If it helps, I have the seven broken XML chunks on file.
  5. cmeerw

    cmeerw Member

    Joined:
    Feb 9, 2008
    Messages:
    10
    This week's RDF dump appears to be OK.


    Christof
