MyAnimeList.net


Special characters encoding in XML

#1
06-20-12, 2:43 AM

Joined: Jan 2010
Posts: 10
I already posted this on the Anime Database forum, but maybe that was the wrong board for it...

I've noticed that in some cases the XML returned by the API contains double-encoded characters. For example, an apostrophe that should be encoded once as an entity is encoded twice, so the leading & of its entity shows up as &amp;. I've seen the same thing happen with ", which should come through as &quot; but arrives as &amp;quot;. I've spotted both cases in the XML for Wolf's Rain and Rockman.EXE.

This doesn't happen for every series, though. For instance, Mai-Hime also has a quote character in its synopsis field, and there it's correctly encoded as &quot;.

Should I work around it, or is it something you guys can fix?
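The symptom matches what happens when text is escaped twice: the first pass turns " into &quot;, and a second pass escapes the & of that entity. A minimal Java sketch of that failure mode follows; the escape helper is hypothetical and illustrative, and whether the API really escapes twice is an assumption, not something confirmed in this thread.

```java
public class DoubleEscapeDemo {
    // Minimal escaper for XML's five predefined entities (illustrative, not the API's code).
    // '&' is replaced first so the entities emitted below are not escaped again in the same pass.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        String once = escape("\"Wolf's Rain\"");
        String twice = escape(once); // the suspected bug: escaping already-escaped text
        System.out.println(once);  // &quot;Wolf&apos;s Rain&quot;
        System.out.println(twice); // &amp;quot;Wolf&amp;apos;s Rain&amp;quot;
    }
}
```

Every entity produced by the first pass gains one extra &amp; prefix in the second, which is exactly the pattern reported for Wolf's Rain and Rockman.EXE.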
 
#2
06-20-12, 11:07 AM

Joined: Aug 2007
Posts: 148
I'm working on an app that uses the API as well, and it's all a mess. As stupid as it sounds, just try decoding the HTML entities several times over; that fixes most of it.
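"Decode several times over" can be written as a loop that stops once a pass no longer changes the text. The Java sketch below uses a deliberately small, hypothetical entity table instead of a full HTML decoder, and a single regex pass so that each pass removes exactly one layer of escaping.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedUnescape {
    // Hypothetical, deliberately small entity table; a real HTML decoder knows hundreds more.
    private static final Map<String, String> ENTITIES = Map.of(
            "quot", "\"", "apos", "'", "#39", "'", "#039", "'",
            "lt", "<", "gt", ">", "amp", "&");
    private static final Pattern ENTITY = Pattern.compile("&(quot|apos|#0?39|lt|gt|amp);");

    // One pass: each entity in the input is decoded once. Text produced by a
    // replacement is not rescanned, so a single pass removes exactly one layer.
    static String unescapeOnce(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(ENTITIES.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Repeats unescapeOnce until the text stops changing (or maxPasses is reached).
    static String unescapeUntilStable(String s, int maxPasses) {
        for (int i = 0; i < maxPasses; i++) {
            String next = unescapeOnce(s);
            if (next.equals(s)) {
                break;
            }
            s = next;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(unescapeUntilStable("&amp;amp;quot;Hi&amp;quot;", 10)); // prints: "Hi"
    }
}
```

The catch is that a fixed-point loop cannot tell a double-encoded entity from text that legitimately contains an escaped ampersand, so it also collapses those.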
 
#3
06-20-12, 11:40 AM

Joined: Jan 2010
Posts: 10
Yeah, I'm already doing that in other cases, but here, if I decode &amp; to &, it also affects the correctly encoded ampersands. Then separating the ampersands that belong to a double-encoded entity from the legitimate ones would be a major pain in the back...

It's not that it can't be done though. I just wanted to see if it was something that could be fixed where fixing is needed. Thanks for the reply though.
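One way to separate the two kinds of ampersand in the raw XML is to collapse &amp; only when it is immediately followed by the rest of an entity, e.g. &amp;quot; or &amp;#039;, and leave a standalone &amp; alone. This Java sketch assumes the double escaping only ever wraps numeric references or XML's five predefined entities; a legitimate ampersand that happens to be followed by text like "quot;" would be misread.

```java
import java.util.regex.Pattern;

public class CollapseDoubleEscapes {
    // "&amp;" followed by the remainder of an XML entity (numeric or one of the five
    // predefined names) is treated as double-escaped. A standalone "&amp;" does not match.
    private static final Pattern DOUBLE_ESCAPED =
            Pattern.compile("&amp;(#[0-9]+|#x[0-9a-fA-F]+|quot|apos|lt|gt|amp);");

    // Removes one level of double escaping from raw XML text; single escapes stay
    // intact, so the result is still XML-escaped text.
    static String collapse(String rawXml) {
        return DOUBLE_ESCAPED.matcher(rawXml).replaceAll("&$1;");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Fate &amp; Stay &amp;quot;Night&amp;quot;"));
        // prints: Fate &amp; Stay &quot;Night&quot;
    }
}
```

Restricting the names to XML's predefined five keeps the output parseable; an HTML-only entity such as &nbsp; would not be defined for an XML parser.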
 
#4
06-21-12, 9:08 AM

Joined: Aug 2007
Posts: 148
Are you working with Android? This is what I did:

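import org.apache.commons.lang.StringEscapeUtils; // Apache Commons Lang 2.x

// rawHtml (hypothetical name): the text read from the API response
// Each pass peels off one layer of escaping (&amp;quot; -> &quot; -> ")
String html = StringEscapeUtils.unescapeHtml(rawHtml);
html = StringEscapeUtils.unescapeHtml(html);
html = StringEscapeUtils.unescapeHtml(html);
// ... more code to deal with whatever is still left over afterwards, if need be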

I'm not exactly sure in which order it unescapes the HTML entities, but it seems to work well for me. I still get some garbage left over, though, mostly because a lot of database entries (particularly synopses) apparently contain characters encoded in all sorts of different character sets... What a pain. I literally have to regex-replace all of them with hard-coded patterns, which I hate doing.

Bah.
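If some of that leftover garbage is mojibake, i.e. UTF-8 text that was decoded as ISO-8859-1 (Latin-1) somewhere along the way, it can sometimes be reversed instead of regex-replaced by hand. That is an assumption: the thread doesn't say which character sets are involved, and text misread as Windows-1252 or another charset needs a different reverse mapping. A Java sketch:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Assumption: the garbage is UTF-8 text that was mis-decoded as ISO-8859-1 (Latin-1).
    static String repairLatin1Mojibake(String s) {
        for (int i = 0; i < s.length(); i++) {
            // A char above U+00FF cannot come from a Latin-1 decode, so leave the text alone.
            if (s.charAt(i) > 0xFF) {
                return s;
            }
        }
        byte[] original = s.getBytes(StandardCharsets.ISO_8859_1); // recover the raw bytes
        String repaired = new String(original, StandardCharsets.UTF_8);
        // Bytes that are not valid UTF-8 decode to U+FFFD; in that case keep the input.
        return repaired.indexOf('\uFFFD') >= 0 ? s : repaired;
    }

    public static void main(String[] args) {
        System.out.println(repairLatin1Mojibake("Caf\u00C3\u00A9")); // "CafÃ©" becomes "Café"
    }
}
```

The U+FFFD check means already-correct text such as "Café" passes through unchanged, because its single é byte is not valid UTF-8 on its own.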
 
#5
06-21-12, 12:43 PM

Joined: Jan 2010
Posts: 10
No, I'm working with C# on Windows. Right now, when I receive the stream with the XML, I convert it to a string, find all the characters that are correctly encoded, and decode them. Then I scan the string again for other apparent encodings and remove them. And because of the tool I'm using to manipulate the XML, I also have to find every unencoded apostrophe and replace it with a placeholder I can identify later. That's needed because when an apostrophe appears in a title (e.g. Wolf's Rain), it isn't encoded.
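For what it's worth, an unescaped apostrophe inside element text is legal XML, and a standard XML parser decodes each entity exactly once, so parsing before any string manipulation may remove the need for placeholders. A Java sketch using the JDK's built-in DOM parser; the firstText helper and the title/synopsis tag names are hypothetical, not taken from the API's documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ParseTitle {
    // Returns the trimmed text of the first element named `tag`, or null if there is none.
    // The parser decodes each entity once; apostrophes in element text need no escaping.
    static String firstText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText("<anime><title>Wolf's Rain</title></anime>", "title"));
        // prints: Wolf's Rain
    }
}
```

A double-encoded value such as &amp;quot; comes out of the parser as &quot;, so only that remaining layer needs one more HTML-unescape pass.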

I'm not even bothering with Japanese characters, at least for now. I just find and remove the codes that are supposed to represent them, but that's a different issue and I don't think it's the API's fault.
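If those codes turn out to be numeric character references such as &#12454; (an assumption; the post doesn't show what they look like), they can be decoded into the actual characters instead of removed. A Java sketch that handles both decimal and hexadecimal forms:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericRefs {
    // Decimal (&#12354;) or hexadecimal (&#x3042;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    // Replaces each numeric reference with the character it names;
    // values outside the Unicode range (or too large for an int) are left as-is.
    static String decodeNumericRefs(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group(0);
            try {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1))
                        : Integer.parseInt(m.group(2), 16);
                if (Character.isValidCodePoint(cp)) {
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // Number too large for an int: keep the original text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Decodes to the katakana for "wolf"
        System.out.println(decodeNumericRefs("&#12454;&#12523;&#x30D5;"));
    }
}
```

Character.toChars returns a surrogate pair for code points above U+FFFF, so characters outside the Basic Multilingual Plane survive too.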

So yeah, it seems that even on a different platform I share your pain. But I shall persevere! I won't give up on this. If the ampersand issue isn't going to be fixed on the API's end, I'll just have to deal with it and work around it on mine.

Also, good luck with your project ;)
 
#6
07-11-12, 10:29 AM

Joined: Jul 2012
Posts: 1
Try reading it as a string:

// C#: xmlnode is an XmlNode; InnerText returns its text content with entities already decoded
string str = xmlnode.ChildNodes.Item(0).InnerText.Trim();

src : http://csharp.net-informations.com/xml/how-to-read-xml.htm

sarc.
 