Welcome to Resource Zone.

RDF dump data, fixed?

heroine

Member
Joined
Apr 1, 2007
Hi all,

I have tried parsing the example portion of the structure RDF file from dmoz (rdf.dmoz.org), but the parser reports "Fail to parse RDF" and "XML parsing failed".

I am hoping to parse the whole 'structure' file correctly, but since it is large, I am so far using the example (a portion of structure.rdf.u8) provided on the website.

Has anyone managed to parse the structure example successfully?

If so, would you care to explain or share how, please?
I have made some changes to the code, but it still fails. :mad:

Thanks!
 

windharp

Meta/kMeta
Curlie Meta
Joined
Apr 30, 2002
I usually try my tools on the Kids & Teens files. They are much smaller, so it's a lot easier to parse them. I never tried to parse the example though.

Please note that the RDF dump is not valid RDF code, because it was designed at a time when the RDF specification was not yet fully finished. So using a standard RDF parser might throw a lot of errors, but in general the files should be readable.
 

heroine

Member
Joined
Apr 1, 2007
In that case, what should I do? Is there any tool that helps to fix and debug it?
My project requires the structure and content dumps from dmoz. Once they are parsed, I will have to transfer the data into a relational database.

Please help.
 

tschild

kEditall/kCatmv
Curlie Meta
Joined
May 3, 2002
It is valid XML, so you can use XML tools.

Alternatively, you can process the files line by line by your own script (statefully, as you need to know what sort of item the line in question is part of). That's what I do for my own purposes.
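
A minimal sketch of what such a stateful line-by-line pass could look like, written in Java. It is not tschild's script. The class name `LineScanner`, the method `childrenByTopic`, and both regular expressions are hypothetical, and the line shapes they match (`<Topic r:id="...">`, `<narrow r:resource="...">`, `</Topic>`) are assumptions about the dump layout that should be checked against your own copy of the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineScanner {
    // Assumed line shapes; verify them against the real dump before relying on this.
    private static final Pattern TOPIC_OPEN =
            Pattern.compile("<Topic r:id=\"([^\"]*)\"");
    private static final Pattern CHILD_LINK =
            Pattern.compile("<narrow\\d* r:resource=\"([^\"]*)\"");

    /** Maps each topic id to the child topics listed inside its block. */
    static Map<String, List<String>> childrenByTopic(BufferedReader in) throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String current = null; // the state: which topic the current line belongs to
        String line;
        while ((line = in.readLine()) != null) {
            Matcher open = TOPIC_OPEN.matcher(line);
            if (open.find()) {
                current = open.group(1);
                result.put(current, new ArrayList<>());
            } else if (line.contains("</Topic>")) {
                current = null; // left the topic block
            } else if (current != null) {
                Matcher child = CHILD_LINK.matcher(line);
                if (child.find()) {
                    result.get(current).add(child.group(1));
                }
            }
            // Lines outside any topic block are ignored.
        }
        return result;
    }
}
```

To run it on a real file, wrap a UTF-8 `InputStreamReader` over the dump in a `BufferedReader` and pass that in. For the full structure file it is probably better to write each topic out as soon as its closing tag is seen rather than keep the whole map in memory.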
 

heroine

Member
Joined
Apr 1, 2007
tschild said:
It is valid XML, so you can use XML tools.

Alternatively, you can process the files line by line by your own script (statefully, as you need to know what sort of item the line in question is part of). That's what I do for my own purposes.
Thanks for the input.
Which XML tools would you suggest, then?
Going through the file line by line would be the hard way. Are there any other alternatives? The RDF dump has a lot of errors... Is there a new, fixed RDF dump available now?

/H
 

brmehlman

Member
Joined
Nov 6, 2002
I've had good luck with a SAX parser in Java. Don't try a DOM parser; the tree is too big.

Code:
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
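
The imports above stop short of a handler, so here is a minimal sketch of how they might be put to use. It is not brmehlman's code. The class `TopicCollector` and its static method `topicIds` are hypothetical names, and the element name `Topic` and attribute `r:id` are assumptions about the dump that should be checked against the file. It relies on the default `SAXParserFactory`, which is not namespace-aware, so elements and attributes are matched by their qualified names as written in the file.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class TopicCollector extends DefaultHandler {
    // Ids seen so far; a real import would more likely write rows out as it goes.
    final List<String> ids = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Called once per opening tag, so the document is never held in memory as a tree.
        if ("Topic".equals(qName)) {
            ids.add(attrs.getValue("r:id")); // assumed attribute name
        }
    }

    /** Streams the input through a SAX parser and returns the topic ids in document order. */
    static List<String> topicIds(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        TopicCollector handler = new TopicCollector();
        parser.parse(in, handler);
        return handler.ids;
    }
}
```

Calling `TopicCollector.topicIds(new java.io.FileInputStream("structure.rdf.u8"))` would run it over a local copy of the dump. The file name is only an example.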
 

chaos127

Curlie Admin
Joined
Nov 13, 2003
The RDF dump has a lot of errors...
What do you mean by errors? If you're referring to it not being valid RDF, then this is a known feature. The ODP format was decided before the RDF spec was finalised. (The issue is more that it's erroneously referred to as an "RDF dump", when it should really be described as an "XML Data Dump".) If you are referring to other problems, we might like to know about them...

As for tools to use, have you tried looking at https://curlie.org/Computers/Intern...rectory_Project/Use_of_ODP_Data/Upload_Tools/ ?
 