Special characters encoding in XML
#1
06-20-12, 2:43 AM
Offline Joined: Jan 2010 Posts: 10 |
I already posted this on the Anime Database forum, but maybe that was the wrong board. I've noticed that in some cases the XML returned by the API contains double-encoded characters. For example, the apostrophe entity comes back with its leading ampersand escaped a second time (as &amp;), and I've seen the same happen for &quot;, which is being encoded as &amp;quot;. I've spotted both cases in the XML for Wolf's Rain and Rockman.EXE. This doesn't happen for all series, though. For instance, Mai-Hime also has a quote character in the synopsis field and it's correctly encoded as &quot;. Should I work around it, or is it something you guys can correct?
#2
06-20-12, 11:07 AM
Offline Joined: Aug 2007 Posts: 148 |
I'm working on an app that uses the API as well, and it's all a mess. As stupid as it is, just try decoding the HTML entities multiple times over; it corrects most of it.
#3
06-20-12, 11:40 AM
Offline Joined: Jan 2010 Posts: 10 |
Yeah, I'm already doing that in other cases, but here, if I decode &amp; to &, it will affect the correctly encoded ampersands as well. Separating the ones that belong to a double-encoded entity from the legitimate ones would then be a major pain in the back... It's not that it can't be done, though. I just wanted to see whether it could be fixed where fixing is needed. Thanks for the reply though.
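One way to separate the two kinds of ampersand, sketched in Java since that is the language used elsewhere in this thread: rewrite &amp; to & only when it is immediately followed by something that looks like the rest of an entity (a name, or a # reference, ending in a semicolon). The class name `DoubleEncodingFix` and the method `collapse` are made up for illustration, and the pattern is a heuristic rather than anything the API documents.

```java
import java.util.regex.Pattern;

public class DoubleEncodingFix {
    // Matches "&amp;" only when the text right after it looks like the tail of
    // an entity: a name (quot, apos, ...), a decimal reference (#039) or a
    // hex reference (#x27), followed by ';'. The lookahead is not consumed.
    private static final Pattern DOUBLE_ENCODED = Pattern.compile(
            "&amp;(?=(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    /** Removes one layer of double encoding; a standalone "&amp;" is left alone. */
    public static String collapse(String raw) {
        return DOUBLE_ENCODED.matcher(raw).replaceAll("&");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Wolf&amp;#039;s Rain")); // Wolf&#039;s Rain
        System.out.println(collapse("Tom &amp; Jerry"));      // Tom &amp; Jerry
    }
}
```

The result is still XML-escaped, so a normal parser or a single unescape pass can take it from there. The heuristic will misfire on text that legitimately contains something like "&amp;quot;" (a synopsis that talks about entities, say), so it is a workaround under that assumption, not a general fix.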
#4
06-21-12, 9:08 AM
Offline Joined: Aug 2007 Posts: 148 |
Are you working with Android? This is what I did: html = StringEscapeUtils.unescapeHtml(html); html = StringEscapeUtils.unescapeHtml(html); html = StringEscapeUtils.unescapeHtml(html); ... plus more code to deal with whatever is still left over afterwards, if need be. I'm not exactly sure about the order in which it unescapes the HTML entities, but it seems to work well for me. I still get some garbage left over though, mostly because there are apparently a lot of database entries (particularly synopses) which contain characters encoded from all sorts of different character sets... What a pain. I literally have to regex replace all of them, hard coded... Which I hate doing. Bah.
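Instead of a fixed three calls, the same idea can be written as a loop that stops once a pass changes nothing. The sketch below is self-contained, so it uses a small stand-in decoder (`unescapeOnce`, hypothetical) that only knows the five predefined XML entities and numeric references; StringEscapeUtils.unescapeHtml covers many more named entities and could be dropped in its place. The class name and the `maxPasses` parameter are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedUnescape {
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);");

    /** One decoding pass: each entity found in the input is decoded exactly once. */
    public static String unescapeOnce(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded = decode(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Decodes the part between '&' and ';'. Unknown names and invalid
    // code points are returned unchanged (the whole original match).
    private static String decode(String body, String whole) {
        try {
            if (body.startsWith("#x") || body.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(body.substring(2), 16)));
            }
            if (body.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(body.substring(1))));
            }
        } catch (IllegalArgumentException e) {
            return whole;
        }
        switch (body) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
            default:     return whole;
        }
    }

    /** Repeats unescapeOnce until the text stops changing or maxPasses is reached. */
    public static String unescapeFully(String s, int maxPasses) {
        String current = s;
        for (int i = 0; i < maxPasses; i++) {
            String next = unescapeOnce(current);
            if (next.equals(current)) {
                break;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(unescapeFully("Wolf&amp;amp;#039;s Rain", 5)); // Wolf's Rain
    }
}
```

Decoding to a fixed point has the drawback raised in post #3: text that was meant to show a literal entity gets decoded too. The cap on passes only guards against looping longer than expected; it does not tell legitimate escapes from accidental ones.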
#5
06-21-12, 12:43 PM
Offline Joined: Jan 2010 Posts: 10 |
No, I'm working with C# on Windows. What I'm doing right now is this: when I receive the stream with the XML, I convert it to a string, find all the characters that are correctly encoded, and decode them. After that I scan the string again for other apparent encodings and remove them. And because of the tools I'm using to manipulate the XML file, I also have to find all unencoded apostrophes and replace them with something else that I can identify. That's needed because when an apostrophe appears in a title (e.g. Wolf's Rain), it's not encoded. I'm not even bothering with Japanese characters, at least for now. I just find and remove the codes that are supposed to represent them, but that's a different issue and I don't think it's the API's fault. So yeah, it seems that even on a different platform I share your pain. But I shall persevere! I will not give up on this. If the issue with the ampersand is not going to be resolved on this end, I'll just have to deal with it and work around it on mine. Also, good luck with your project ;)
#6
07-11-12, 10:29 AM
Offline Joined: Jul 2012 Posts: 1 |
Try reading the node's text as a string: str = xmlnode.ChildNodes.Item(0).InnerText.Trim(); Source: http://csharp.net-informations.com/xml/how-to-read-xml.htm sarc.
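For comparison, a rough Java counterpart of reading InnerText, using the JDK's built-in DOM parser. It shows that the parser itself removes exactly one layer of escaping, and that a raw apostrophe inside element text is legal XML and passes through unchanged. The `anime` root element and the `firstText` helper are hypothetical; `title` and `synopsis` are the fields mentioned earlier in the thread, but the real response layout is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InnerTextDemo {
    /** Returns the trimmed text of the first element named tag (assumes one exists). */
    public static String firstText(String xml, String tag) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() returns the text with entities already decoded once.
            return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read <" + tag + ">", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: one correctly encoded quote, one double-encoded quote.
        String xml = "<anime><title>Wolf's Rain</title>"
                + "<synopsis>&quot;Rockman&amp;quot;</synopsis></anime>";
        System.out.println(firstText(xml, "title"));    // Wolf's Rain
        System.out.println(firstText(xml, "synopsis")); // "Rockman&quot;
    }
}
```

So letting the parser do the first decode handles the correctly encoded fields, and only the leftover &quot;-style residue from double encoding needs a second pass.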




