[BUG] Encoding errors in the API response

#1
Aug 17, 2011 3:48 AM

I noticed that sometimes the XML response contains invalid unicode.

For example this call:

http://myanimelist.net/api/anime/search.xml?q=Onegai+My+Melody

returns as one of the titles: "Onegai My Melody Kirara�".
The actual title should be: "Onegai My Melody Sukkiri♪".

Someone on stackoverflow.com commented that this is probably because the wrong PHP function was used (htmlentities() instead of htmlspecialchars()).
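
To illustrate how that kind of mangling can happen, here is a rough Python sketch (not the site's actual PHP code). It assumes the behaviour of older PHP versions, where htmlentities() treated its input as ISO-8859-1 unless told otherwise, so each byte of a UTF-8 character was handled as a separate Latin-1 character. The helper name htmlentities_latin1 is made up for this example.

```python
from html.entities import codepoint2name


def htmlentities_latin1(utf8_bytes: bytes) -> str:
    """Entity-encode UTF-8 bytes as if they were ISO-8859-1 text.

    Hypothetical stand-in for calling an entity encoder with the wrong charset.
    """
    out = []
    # Decoding as Latin-1 turns every single byte into its own character.
    for ch in utf8_bytes.decode("latin-1"):
        name = codepoint2name.get(ord(ch))
        # Characters with a named HTML entity get replaced; the rest pass through.
        out.append(f"&{name};" if name else ch)
    return "".join(out)


note = "♪"  # U+266A, three bytes in UTF-8: E2 99 AA
mangled = htmlentities_latin1(note.encode("utf-8"))
print(repr(mangled))
```

The single note character comes out as two unrelated entities with a stray control byte between them, which a client can no longer decode back into the original title.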

For a more detailed description see: http://stackoverflow.com/questions/7070111/handling-unicode-in-the-http-response-xml
Modified by StackedCrooked, Aug 17, 2011 3:54 AM
 
#2
Aug 17, 2011 3:51 PM

Yeah, the encoding problems need to be addressed. Switching to htmlspecialchars() would certainly be less disruptive to the Unicode portions of the data, and use of htmlentities() is the likely culprit. The XML feed is served as UTF-8, so there is no reason to convert characters other than < > ' " & and possibly invalid Unicode (control characters and such, which could just be stripped with iconv() or mb_convert_encoding(), or even a regular expression).
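
A minimal sketch of that approach, written in Python rather than PHP just to show the idea: escape only the five XML-special characters, strip the control characters XML 1.0 does not allow, and leave every other Unicode character alone. The function name xml_text and the exact character class are assumptions for illustration, not anything taken from the site's code.

```python
import re
from xml.sax.saxutils import escape

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed),
# plus the noncharacters U+FFFE and U+FFFF. Lone surrogates are not handled here.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def xml_text(value: str) -> str:
    """Prepare a string for use as text in a UTF-8 XML document."""
    # Drop characters that may not appear in XML 1.0 at all.
    cleaned = _INVALID_XML_CHARS.sub("", value)
    # escape() handles & < > itself; the dict adds the two quote characters.
    return escape(cleaned, {'"': "&quot;", "'": "&apos;"})


print(xml_text("Onegai My Melody Sukkiri♪"))
print(xml_text('Tom & "Jerry" <3\x00'))
```

Because the note character is not one of the five specials, it passes through untouched and is simply sent as UTF-8 in the response.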

see also: http://myanimelist.net/forum/?topicid=289175