
Are foreign characters broken in categories?

JeanLucDmoz

Member
Joined
Sep 29, 2010
Hi,

I downloaded http://rdf.dmoz.org/rdf/structure.rdf.u8.gz and http://rdf.dmoz.org/rdf/categories.txt (and other files that contain DMOZ categories), but all foreign characters are replaced by one or two question marks.

Here is an example of what I get:
Code:
<altlang r:resource="French:Top/World/Fran??ais/Arts/Audiovisuel/Animation"></altlang>
where I expect
Code:
<altlang r:resource="French:Top/World/Français/Arts/Audiovisuel/Animation"></altlang>
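I inspected the binary content of the file, and it really does contain hexadecimal 3F (the ASCII question mark) wherever a question mark appears. So I guess this is not just a matter of which encoding is used to read the file: the characters are already lost in the file itself.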
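One possible explanation for the "one or two question marks" (a guess on my part, nothing in this thread confirms how the export actually works) is a lossy conversion to ASCII somewhere in the export step. The sketch below shows two hypothetical paths: encoding straight to ASCII gives one "?" per accented letter, while first misreading the UTF-8 bytes as Latin-1 gives one "?" per byte, i.e. two for "ç" or "é". The function names are made up for illustration.

```python
# Hypothesis only: two plausible ways an export step could turn
# non-ASCII letters into literal "?" bytes (0x3F).

def direct_ascii(text: str) -> str:
    """Encode straight to ASCII: each non-ASCII character becomes one "?"."""
    return text.encode("ascii", errors="replace").decode("ascii")

def lossy_roundtrip(text: str) -> str:
    """Misread UTF-8 bytes as Latin-1, then force to ASCII:
    each UTF-8 byte of a non-ASCII character becomes its own "?"."""
    utf8_bytes = text.encode("utf-8")        # "ç" -> b"\xc3\xa7" (two bytes)
    misread = utf8_bytes.decode("latin-1")   # b"\xc3\xa7" -> "Ã§" (two characters)
    return misread.encode("ascii", errors="replace").decode("ascii")

for word in ["Français", "Météo"]:
    print(f"{word}: {direct_ascii(word)} / {lossy_roundtrip(word)}")
# Français: Fran?ais / Fran??ais
# Météo: M?t?o / M??t??o
```

Both paths write genuine 0x3F bytes, which would match what the hex inspection shows: once the data is in this state, no choice of decoding on the reader's side can bring "ç" back.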

This problem does not exist with the sample at https://curlie.org/docs/en/rdf/structure.example.txt .

As I am new to ODP data, I may have misunderstood something. Please help me sort this out.

Jean-Luc
 

JeanLucDmoz

Member
Joined
Sep 29, 2010
I have downloaded http://rdf.dmoz.org/rdf/archive/2010-09-02/categories.txt . This archived version does not have the above problem with international characters.

So there is a bug in the latest release. Any idea when a fix can be expected?

Thank you.

Jean-Luc
 

chaos127

Curlie Admin
Joined
Nov 13, 2003
Yes, we're already aware of this problem. It's been reported to AOL, but unfortunately we don't yet have any estimated time for a fix to be deployed.
 

JeanLucDmoz

Member
Joined
Sep 29, 2010
Thank you for your answer.

I noted that the version dated September 26 (the one where I discovered the problem) has been replaced by a version dated October 3, but the international characters are still broken. :(

Jean-Luc
 

JeanLucDmoz

Member
Joined
Sep 29, 2010
International characters are still broken in the release dated October 10. It is hard to understand why a company like AOL lets such a basic problem persist from release to release.

Jean-Luc
 

Elper

Curlie Admin
RZ Admin
Joined
Sep 15, 2004
For the same reasons that cars get recalled, I expect ;)
The latest RDF (15 Oct) is supposed to fix the character encoding issue. Let us know if you find anything wrong :)
 

JeanLucDmoz

Member
Joined
Sep 29, 2010
Thank you.

The latest content.rdf.u8.gz I see in http://rdf.dmoz.org/rdf/ is dated October 17 and it still contains Fran??ais and M??t??o where I expect Français and Météo. :(
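To check a new dump without eyeballing it, something like the following could be run on content.rdf.u8.gz or structure.rdf.u8.gz. It is a rough heuristic, not an official DMOZ tool: `scan_categories` is a made-up name, and the check assumes that a literal "?" never legitimately appears inside a "Top/..." category path, so any such "?" is treated as a lost non-ASCII character.

```python
import gzip
import re

# Heuristic assumption: category paths start with "Top/" and should not
# contain a literal "?", so a "?" inside one likely replaced a lost character.
BROKEN_PATH = re.compile(rb'Top/[^"<>\s]*\?')

def scan_categories(path: str, max_examples: int = 5) -> tuple[int, list[str]]:
    """Count lines of a gzip-compressed RDF dump that contain a suspect
    category path, and return a few sample matches."""
    count = 0
    examples: list[str] = []
    with gzip.open(path, "rb") as dump:  # raw bytes: no decoding step can hide 0x3F
        for line in dump:
            match = BROKEN_PATH.search(line)
            if match:
                count += 1
                if len(examples) < max_examples:
                    # Decode leniently only for display purposes.
                    examples.append(match.group().decode("utf-8", errors="replace"))
    return count, examples
```

For example, `count, samples = scan_categories("content.rdf.u8.gz")`: a non-zero count suggests the file is still affected, and `samples` shows paths such as `Top/World/Fran??`. Reading bytes rather than text keeps the check independent of any encoding guess.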

Jean-Luc
 

Elper

Curlie Admin
RZ Admin
Joined
Sep 15, 2004
A new RDF (supposedly free of the UTF-8 issue) has been published (19th October) :)
 