Search API
#1
07-20-09, 10:10 PM
Site Administrator
Offline Joined: Nov 2004 Posts: 9320 |
Discuss search API here. |
#2
07-20-09, 10:30 PM
Offline Joined: Jul 2007 Posts: 3496 |
Maybe add the score as well like in the normal search. Maybe also genres? (guess you'd need to do those changes we discussed first) |
#3
07-20-09, 11:56 PM
Offline Joined: Jan 2009 Posts: 92 |
Good job, very neat! I noticed that you separate synonyms with [;], which is not very XML-like. A very common convention for presenting a list of items in XML is:
<synonyms>
  <synonym>the first</synonym>
  <synonym>the second</synonym>
</synonyms>
This is a lot easier to parse too. As Kotori suggested, the score will be useful, for example to order the results by score :)
Modified by pigoz, 07-21-09, 12:09 AM
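To make the two options concrete, here is a minimal Python sketch that reads synonyms from both layouts. The sample fragments, the `<entry>` wrapper and the function names are assumptions made up for illustration; the real response layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragments. The element names follow the post above,
# but the <entry> wrapper and the sample titles are assumptions.
DELIMITED = "<entry><synonyms>FMA; Hagane no Renkinjutsushi</synonyms></entry>"
NESTED = (
    "<entry><synonyms>"
    "<synonym>FMA</synonym>"
    "<synonym>Hagane no Renkinjutsushi</synonym>"
    "</synonyms></entry>"
)

def synonyms_from_delimited(xml_text):
    """Split the single text node on ';' and trim surrounding whitespace."""
    node = ET.fromstring(xml_text).find("synonyms")
    text = node.text or ""
    return [part.strip() for part in text.split(";") if part.strip()]

def synonyms_from_nested(xml_text):
    """Read one <synonym> child per title; no delimiter knowledge needed."""
    root = ET.fromstring(xml_text)
    return [child.text for child in root.findall("synonyms/synonym")]

print(synonyms_from_delimited(DELIMITED))
print(synonyms_from_nested(NESTED))
```

Both functions yield the same list here. The difference is that the delimited version breaks if a title ever contains the delimiter, while the nested version does not care.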
#4
07-21-09, 12:16 AM
Offline Joined: Jul 2007 Posts: 3496 |
; is much faster to parse than child nodes though, I like it |
#5
07-21-09, 1:59 AM
Offline Joined: Jan 2009 Posts: 92 |
Raw string processing may be faster, but parsing an XML list is not so slow that it needs gimmicky solutions to improve performance. With a serial parser like SAX in particular, there is no substantial performance difference. In my humble opinion, following standards, conventions and best practices is good unless they are unreasonable.
#6
07-21-09, 7:46 AM
Offline Joined: Jul 2007 Posts: 3496 |
This is just a special case for MyAnimeList synonyms, though: none of the names contain a ;. The delimiter used to be a "," before, which was a problem since some anime names contain one. Never heard of SAX; I made my own parser for malu since DOM was crap and very slow. Now it takes ~10-20 ms to parse a list with ~1k entries. It will still be fast with child nodes, but it's a different case for those parsing with PHP etc.
Modified by Kotori, 07-21-09, 7:53 AM
#7
07-21-09, 11:10 AM
Offline Joined: Jan 2009 Posts: 92 |
SAX is very fast and has a very low memory footprint. It is good for parsing an XML file and doing something with the data on the fly (for example displaying it or storing it in your local data structure). DOM is designed to make it easy to edit the XML document, so it loads the whole document into memory and builds a tree data structure from the XML, which of course is overkill if you do not need to edit the original tree. A regex parser would be faster if you compile the regex, but it is not worth the effort imho.
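As a rough illustration of the on-the-fly style described above, this sketch uses Python's standard `xml.sax` module to collect titles without building a tree. The `<anime>`/`<entry>`/`<title>` layout of the sample and the names `TitleCollector` and `collect_titles` are assumptions for the example, not the documented API response.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # list of text chunks while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None

def collect_titles(xml_bytes):
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles

# Hypothetical sample document.
SAMPLE = (
    b"<anime>"
    b"<entry><title>Naruto</title></entry>"
    b"<entry><title>Bleach</title></entry>"
    b"</anime>"
)
print(collect_titles(SAMPLE))
```

Only the current title is held in memory at any time, which is the property that makes this approach attractive for large lists.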
#8
07-21-09, 12:05 PM
Site Administrator
Offline Joined: Nov 2004 Posts: 9320 |
Added in <score>. Debating on the <synonyms> thing. I don't care either way really. Whatever is easiest for you guys.
#9
07-23-09, 3:15 AM
Offline Joined: Jan 2007 Posts: 5 |
+1 for <synonyms>. It would be more XML-like and easier to parse. If there are any changes to the delimiters, our apps will continue running without changing anything.
#10
07-23-09, 9:29 AM
Offline Joined: Apr 2009 Posts: 103 |
The problem with synonyms is that the database is full of junk, including mixed delimiters. As the server has no way to cleanly split that into individual titles it can't transmit those to clients, in any format ^^; The problem with parsing is that the database is full of junk, including invalid characters. As the server directly dumps that into XML all the fast parsers choke and die. Piping through W3C tidy first works fine. Modified by Wile, 07-23-09, 9:44 AM |
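For clients that cannot pipe the response through Tidy, one lighter alternative is to drop characters that XML 1.0 does not allow before handing the text to a strict parser. This is a different technique from what Tidy does, and it only helps with stray control characters, not with broken markup. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Everything outside the XML 1.0 "Char" production: control characters other
# than tab, LF and CR, plus surrogates and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Remove characters that a conforming XML 1.0 parser must reject."""
    return _INVALID_XML_CHARS.sub("", text)

def parse_dirty(text):
    """Sanitize first, then parse with the standard library parser."""
    return ET.fromstring(strip_invalid_xml_chars(text))

# Hypothetical dirty fragment containing a NUL and a vertical tab.
dirty = "<title>Full\x00 Metal\x0b Panic</title>"
print(parse_dirty(dirty).text)
```

Tabs, newlines and carriage returns are kept, since they are legal in XML.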
#11
08-03-09, 1:18 AM
Offline Joined: Jan 2009 Posts: 92 |
The search API is bugged: it returns text with special characters encoded like HTML (stuff like &quot;). This is wrong; it should return special characters encoded as declared by the XML header (in our case UTF-8). This actually causes the Cocoa XML parser to fail unless I clean the XML first. :|
Modified by pigoz, 08-03-09, 2:22 AM
#12
08-03-09, 3:16 AM
Offline Joined: Apr 2009 Posts: 103 |
pigoz said: "This is wrong, it should return special characters encoded as declared by the XML header"
No. There are a few entities XML insists on. The only nuisance is that MAL has them in CDATA as well. (do more research before using bold ;)
pigoz said: "This actually makes the Cocoa XML Parser to fail, unless I clean the xml before."
I doubt it. Probably fails because of all the newlines and tabs outside of tags. But you should clean anyway. Luckily what I mentioned in my last post is already built in as NSXMLDocumentTidyXML.
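The CDATA nuisance mentioned above can be seen with any conforming parser: predefined entities such as &quot; and &amp; are decoded in ordinary element text, but inside a CDATA section they are passed through literally. A small Python sketch, using made-up `<synopsis>` fragments:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the same text, once escaped normally and once
# wrapped in a CDATA section.
ESCAPED = "<synopsis>He said &quot;hi&quot; &amp; left</synopsis>"
IN_CDATA = "<synopsis><![CDATA[He said &quot;hi&quot; &amp; left]]></synopsis>"

def text_of(xml_text):
    """Return the text content of the root element."""
    return ET.fromstring(xml_text).text

print(text_of(ESCAPED))   # entities are decoded by the parser
print(text_of(IN_CDATA))  # entities stay as literal characters
```

So when entities appear inside CDATA, the client ends up with literal `&quot;` in its strings and has to decode them itself.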
#13
08-03-09, 3:49 AM
Offline Joined: Jan 2009 Posts: 92 |
By cleaning I actually mean I'm using NSXMLDocumentTidyXML. |
#14
08-03-09, 7:49 PM
Offline Joined: Jul 2007 Posts: 3496 |
pigoz said: "The search API is bugged, it returns text with special characters encoded like HTML (stuff like &quot;). This is wrong, it should return special characters encoded as declared by the XML header (in our case UTF-8). This actually makes the Cocoa XML Parser to fail, unless I clean the xml before. :|"
Yep, I noticed when the description failed to load. A few modifications in my custom parser fixed it.
@Xinil: if the synonyms delimiter is changed to child nodes, please let us know :)
#15
08-04-09, 6:28 AM
Offline Joined: Jan 2009 Posts: 92 |
Actually, I read that &amp; and similar entities are correct in XML. I apologize for the bold and for being rude. Anyway, there is still some strange stuff going on in the synopsis field. For example, in http://myanimelist.net/api/anime/search.xml?q=full+metal there is stuff like:
&quot;<br />
where it should be
"&lt;br />
Modified by pigoz, 08-04-09, 8:39 AM
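Until the synopsis field is consistent, a client can normalize it after parsing. The sketch below assumes the parsed synopsis string may still contain HTML entities and `<br />` tags (escaped or not) and turns it into plain text. `clean_synopsis` is a hypothetical helper name.

```python
import html
import re

# Matches <br>, <br/> and <br /> in any letter case.
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def clean_synopsis(text):
    """Decode leftover HTML entities, then turn <br /> tags into newlines."""
    unescaped = html.unescape(text)           # &quot; -> ", &lt; -> <, ...
    with_breaks = _BR_TAG.sub("\n", unescaped)
    return with_breaks.strip()

# Hypothetical synopsis text as it might look after XML parsing.
print(clean_synopsis("&quot;Alchemy&quot;<br />Two brothers"))
```

Unescaping before replacing the tags means an escaped `&lt;br /&gt;` is handled the same way as a raw `<br />`.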
#16
08-05-09, 8:56 AM
Offline Joined: Jan 2009 Posts: 92 |
Something is wrong here too: http://myanimelist.net/api/anime/search.xml?q=pokemon Pokémon encoded as Poké |
#17
08-16-09, 5:21 PM
Offline Joined: Aug 2008 Posts: 1 |
This keeps getting better. I was wondering if it's possible to add anime characters/voice actors to the search list. I was listening to music on Rhapsody, and it displays random info about the band artists. So I thought, wouldn't it be cool to watch an anime and then get to see info on the characters? Just an idea.
<anime>
  <title>...</title>
  <...etc other data>
  <Characters>
    <name>...</name>
    <alias>...</alias>
    <voice>...</voice>
    <story> ...</story>
    <pic>...</pic>
  </Characters>
  <Characters>
    <name>...</name>
    <alias>...</alias>
    <voice>...</voice>
    <story> ...</story>
    <pic>...</pic>
  </Characters>
</anime>
Arigato for the nice work Xinil.
#18
08-31-09, 6:51 PM
Offline Joined: Sep 2007 Posts: 8 |
It does look like the encoding is borked for Japanese characters. For example: http://myanimelist.net/api/anime/search.xml?q=naruto returns badly encoded Japanese characters for anime ID 936. Does anyone have a workaround for this? Also, it'd be nice, but not 100% necessary, if the content type returned from the server is "text/xml" instead of "text/html". Otherwise, the search API is working fine, thanks for that Xinil. |
#19
09-18-09, 2:12 AM
Offline Joined: Apr 2008 Posts: 50 |
Indeed, this issue is bugging me as well. The standard XML parsers cry when I try to parse the response. The XML tag says it's UTF-8, but obviously it's not. I'm building a small Java app, so any hints on how to (temporarily) work around this issue would be nice. I wouldn't mind helping out on this project, although my time is limited (full-time job as an IT professional during the day). So just send me a PM. Thanks in advance.
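One possible temporary workaround, sketched in Python rather than Java: check whether the response bytes really are valid UTF-8, and if not, reinterpret them as Latin-1 and re-encode them before parsing. That the stray bytes are Latin-1 is purely an assumption; the thread does not say what the server actually sends. `to_valid_utf8` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def to_valid_utf8(raw):
    """Return bytes that are guaranteed to decode as UTF-8.

    Valid UTF-8 input is returned unchanged. Otherwise the input is assumed
    (not known) to be Latin-1 and is re-encoded as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode("latin-1").encode("utf-8")

# Hypothetical response: declares UTF-8 but contains a lone Latin-1 byte.
raw = b'<?xml version="1.0" encoding="utf-8"?><title>Pok\xe9mon</title>'
print(ET.fromstring(to_valid_utf8(raw)).text)
```

This is all-or-nothing: a response that mixes valid UTF-8 sequences with Latin-1 bytes would have its correct characters mangled by the fallback, so it is a stopgap at best.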
#20
10-24-09, 1:16 PM
Offline Joined: Jul 2008 Posts: 1 |
Regarding the invalid characters in the XML, the PHP extension Tidy worked out for me. Although it just converts the obstructing characters into HTML Unicode, it returns valid XML - enough for me anyway. It utilizes libtidy from the HTML Tidy Library Project (http://tidy.sourceforge.net/), there is also a Java version (http://sourceforge.net/projects/jtidy) linked. |
