Corruption in content.rdf


4 replies in this topic

#1 JasonTimmins

JasonTimmins

    Member

  • Members
  • 9 posts

Posted 21 February 2009 - 04:45 AM

Hi There,

I'm not sure if this is the right place but I thought I'd mention it anyway.

My import routine for the DMOZ data is blowing up with an XML schema failure around line 26946609 of this week's (2009/02/18) content.rdf file.

It's to do with an external link to portaljove .com in Top/World/Español/Regional/Europa/España/Comunidades_Autónomas/Comunidad_Valenciana/Educación. The record seems to have two descriptions (one of which is missing its closing tag) and a second title tag inside one of the descriptions. Anyway, it's a bit of a mess. Can an editor take a look at it?
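
For anyone trying to pin down a failure like this, here is a minimal Python sketch (not the import routine mentioned above) that streams a file through the standard library's `xml.etree.ElementTree.iterparse` and reports where the parser first gives up. The sample record is made up to mimic the shape described above (an unclosed description and a stray title); the function name `first_xml_error` and the unprefixed element names are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET


def first_xml_error(stream):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Discard each element's contents once it has been parsed.
            elem.clear()
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical record: the first Description is never closed.
SAMPLE = b"""<RDF>
<ExternalPage about="http://example.com/">
<Title>Example</Title>
<Description>First description
<Description>Second <Title>stray</Title> description</Description>
</ExternalPage>
</RDF>
"""

if __name__ == "__main__":
    print(first_xml_error(io.BytesIO(SAMPLE)))
```

For a real dump you would pass `open("content.rdf", "rb")` instead of the in-memory sample. Note that the parser reports the point where it notices the problem (here the `</ExternalPage>` line), which can be a line or more after the tag that is actually missing, and that it stops at the first error.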

Cheers
Jason

Edited by makrhod, 21 February 2009 - 08:16 AM
No link dropping in signatures, please. (QC link altered.)


#2 dermotz

dermotz

    Member

  • Moderated Users
  • 56 posts

Posted 26 February 2009 - 03:49 PM

The DMOZ rdf dump has always contained errors.

You need to change your software to cope with the corrupted data.
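
One way "coping" could look, as a rough Python sketch under stated assumptions: cut the dump into `ExternalPage` chunks, parse each chunk on its own, and set aside the ones that fail instead of aborting the whole import. The wrapper element, the `urn:example:d` namespace URI, the `parse_records` name and the sample records are all hypothetical; a real importer would need the namespace declarations from the dump's own root element and would read the file in pieces rather than holding it in memory.

```python
import re
import xml.etree.ElementTree as ET

# Non-greedy match from a record's opening tag to the next closing tag.
RECORD_RE = re.compile(rb"<ExternalPage\b.*?</ExternalPage>", re.DOTALL)

# Hypothetical wrapper so the d: prefix is bound when a chunk is parsed alone.
WRAP_OPEN = b'<wrapper xmlns:d="urn:example:d">'
WRAP_CLOSE = b"</wrapper>"


def parse_records(data):
    """Parse each ExternalPage chunk separately; return (good, bad)."""
    good, bad = [], []
    for match in RECORD_RE.finditer(data):
        chunk = match.group(0)
        try:
            root = ET.fromstring(WRAP_OPEN + chunk + WRAP_CLOSE)
            good.append(root[0])  # the ExternalPage element itself
        except ET.ParseError:
            bad.append(chunk)  # keep the raw bytes for later inspection
    return good, bad


# Made-up data: the first record has an unclosed description.
SAMPLE = b"""
<ExternalPage about="http://example.com/broken">
<d:Title>Broken</d:Title>
<d:Description>First<d:Description>Second</d:Description>
</ExternalPage>
<ExternalPage about="http://example.org/ok">
<d:Title>Fine</d:Title>
<d:Description>A well-formed record</d:Description>
</ExternalPage>
"""

if __name__ == "__main__":
    good, bad = parse_records(SAMPLE)
    print(len(good), "parsed,", len(bad), "skipped")
```

This only skips damaged records; it does not repair them. If a record's own `</ExternalPage>` tag is missing or cut off, the match runs on into the next record and both are lost as one bad chunk.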

#3 hansfn

hansfn

    Moderator

  • Meta
  • 11 posts
  • Editor Name: hansfn

Posted 27 February 2009 - 01:30 AM

dermotz, you shouldn't reply to a thread about a topic you clearly don't have any real knowledge about. The current content.rdf is seriously broken, with incomplete/broken/cut-off/mixed elements that no software can fix. I have reported the problem in the internal DMOZ/ODP bugs forum. Let's hope that the sysadmins/developers can fix this quickly.

PS! I have found one more in World/Norsk/Kunst_og_kultur/Litteratur/Forfattere/Ø/

#4 JasonTimmins

JasonTimmins

    Member

  • Members
  • 9 posts

Posted 28 February 2009 - 12:26 PM

Hi There,

Thanks for the update. This week's content file is much worse than the previous week's. I found seven errors before I gave up fixing them and admitted defeat.

Let's hope the DMOZ people get it together soon.

Bye for now
Jason.

PS. If it helps, I have the seven broken XML chunks on file.

#5 cmeerw

cmeerw

    Member

  • Inactive
  • 5 posts
  • Editor Name: cmeerw

Posted 04 March 2009 - 03:25 PM

This week's RDF dump appears to be OK.


Christof



