MyAnimeList.net

Search API
#1
07-20-09, 10:10 PM
Site Administrator
Offline
Joined: Nov 2004
Posts: 9320
Discuss search API here.
 
#2
07-20-09, 10:30 PM

Offline
Joined: Jul 2007
Posts: 3496
Maybe add the score as well like in the normal search. Maybe also genres? (guess you'd need to do those changes we discussed first)
 
#3
07-20-09, 11:56 PM

Offline
Joined: Jan 2009
Posts: 92
Good job, very neat!

I noticed that you separate synonyms with [;], and that is not very XML-like. A very common convention for presenting a list of items in XML is:

<synonyms>
<synonym>the first</synonym>
<synonym>the second</synonym>
</synonyms>

This is a lot easier to parse too.
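
To make the comparison concrete, here is a minimal Python sketch that reads synonyms from both layouts with the standard library's xml.etree.ElementTree. The <entry> wrapper and the sample titles are made up for illustration; only the <synonyms>/<synonym> element names come from the suggestion above, and the real API response may be shaped differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the <entry> wrapper and titles are illustrative only.
DELIMITED = "<entry><synonyms>the first; the second</synonyms></entry>"
NESTED = (
    "<entry><synonyms>"
    "<synonym>the first</synonym>"
    "<synonym>the second</synonym>"
    "</synonyms></entry>"
)


def synonyms_from_delimited(xml_text):
    """Split a single ';'-separated text node into a list of titles."""
    node = ET.fromstring(xml_text).find("synonyms")
    text = node.text or ""
    return [part.strip() for part in text.split(";") if part.strip()]


def synonyms_from_nested(xml_text):
    """Collect one title per <synonym> child element."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter("synonym")]


print(synonyms_from_delimited(DELIMITED))
print(synonyms_from_nested(NESTED))
```

Both functions return the same list here. The difference is that the nested version never has to know which delimiter the server chose.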

As Kotori suggested, the score will be useful, for example to order the results by score :)
Modified by pigoz, 07-21-09, 12:09 AM
 
#4
07-21-09, 12:16 AM

Offline
Joined: Jul 2007
Posts: 3496
; is much faster to parse than child nodes though, I like it
 
#5
07-21-09, 1:59 AM

Offline
Joined: Jan 2009
Posts: 92
Raw string processing may be faster, but parsing an XML list is not so slow that it needs gimmicky solutions to improve performance. With a serial parser like SAX in particular, there is no substantial performance difference.

In my humble opinion, following standards, conventions and best practices is good unless they are unreasonable.
 
#6
07-21-09, 7:46 AM

Offline
Joined: Jul 2007
Posts: 3496
This is just a special case for MyAnimeList synonyms, though: none of the names contain a ;. It used to be a "," before, which was a problem since some anime names contain one. Never heard of SAX; I made my own parser for malu since DOM was crap and very slow. Now it takes ~10-20 ms to parse a list with ~1k entries. It will still be fast with child nodes, but it's a different case for those parsing with PHP etc.
Modified by Kotori, 07-21-09, 7:53 AM
 
#7
07-21-09, 11:10 AM

Offline
Joined: Jan 2009
Posts: 92
SAX is very fast and has a very low memory footprint. It is good for parsing an XML file and doing something with the data on the fly (for example, displaying it or storing it in your local data structure).

DOM is designed to make it easy to edit the XML document, so it loads the whole document into memory and builds a tree data structure from the XML, which of course is overkill if you do not need to edit the original tree.

A regex parser would be faster if you compile the regex, but it is not worth the effort imho.
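
As a rough illustration of the SAX style described above, here is a small Python sketch using the standard xml.sax module. It collects titles as the document streams past without building a tree. The sample document and the <anime>/<entry>/<title> element names are assumptions for the example, not a confirmed description of the API response.

```python
import xml.sax

# Hypothetical sample response; element names are illustrative only.
SAMPLE = (
    b"<anime>"
    b"<entry><title>First</title></entry>"
    b"<entry><title>Second</title></entry>"
    b"</anime>"
)


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element as it streams past."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means we are not inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def extract_titles(xml_bytes):
    """Run the handler over the bytes and return the collected titles."""
    handler = TitleHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


print(extract_titles(SAMPLE))
```

The handler only keeps the strings it cares about, which is where the low memory footprint comes from.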
 
#8
07-21-09, 12:05 PM
Site Administrator
Offline
Joined: Nov 2004
Posts: 9320
Added in <score>

Debating the <synonyms> thing. I don't care either way, really. Whatever is easiest for you guys.
 
#9
07-23-09, 3:15 AM

Offline
Joined: Jan 2007
Posts: 5
+1 for <synonyms>. It would be more XML-like and easier to parse. If there are any changes to the delimiters, our apps will continue running without changing anything.
 
07-23-09, 9:29 AM

Offline
Joined: Apr 2009
Posts: 103
The problem with synonyms is that the database is full of junk, including mixed delimiters. As the server has no way to cleanly split that into individual titles it can't transmit those to clients, in any format ^^;

The problem with parsing is that the database is full of junk, including invalid characters. As the server directly dumps that into XML all the fast parsers choke and die. Piping through W3C tidy first works fine.
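
For anyone without Tidy at hand, a much cruder alternative is to strip the characters that XML 1.0 forbids before handing the text to a parser. This is not what Tidy does, just a simpler substitute sketched in Python. The sample string and the strip_invalid_xml_chars name are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production is illegal in a document:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Drop characters that no conforming XML 1.0 parser will accept."""
    return _INVALID_XML_CHARS.sub("", text)


# Hypothetical dirty input containing a vertical tab and a backspace.
DIRTY = "<entry><title>Bad\x0bTitle\x08</title></entry>"

clean = strip_invalid_xml_chars(DIRTY)
print(ET.fromstring(clean).find("title").text)
```

This only removes illegal control characters. It does not repair broken markup or undefined entities, so Tidy is still the more thorough option.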
Modified by Wile, 07-23-09, 9:44 AM
 
08-03-09, 1:18 AM

Offline
Joined: Jan 2009
Posts: 92
The search API is bugged, it returns text with special characters encoded like HTML (stuff like &amp;quot;). This is wrong, it should return special characters encoded as declared by the XML header (in our case UTF-8).

This actually causes the Cocoa XML parser to fail unless I clean the XML first. :|
Modified by pigoz, 08-03-09, 2:22 AM
 
08-03-09, 3:16 AM

Offline
Joined: Apr 2009
Posts: 103
pigoz said:
This is wrong, it should return special characters encoded as declared by the XML header

No. There are a few entities XML insists on. The only nuisance is that MAL has them in CDATA as well.

(do more research before using bold ;)

This actually causes the Cocoa XML parser to fail unless I clean the XML first.

I doubt it. Probably fails because of all the newlines and tabs outside of tags.

But you should clean anyway. Luckily what I mentioned in my last post is already built in as NSXMLDocumentTidyXML.
 
08-03-09, 3:49 AM

Offline
Joined: Jan 2009
Posts: 92
By cleaning I actually mean I'm using NSXMLDocumentTidyXML.
 
08-03-09, 7:49 PM

Offline
Joined: Jul 2007
Posts: 3496
pigoz said:
The search API is bugged, it returns text with special characters encoded like HTML (stuff like &amp;quot;). This is wrong, it should return special characters encoded as declared by the XML header (in our case UTF-8).

This actually causes the Cocoa XML parser to fail unless I clean the XML first. :|


Yep, I noticed when the description failed to load. A few modifications to my custom parser fixed it.

@ Xinil: if the synonyms delimiter is changed to child nodes, please let us know :)
 
08-04-09, 6:28 AM

Offline
Joined: Jan 2009
Posts: 92
Actually, I read that &amp; and similar entities are correct in XML. I apologize for the bold and for being rude. Anyway, there is still some strange stuff going on in the synopsis field.

For example, in http://myanimelist.net/api/anime/search.xml?q=full+metal there is stuff like: &amp;quot;&lt;br /&gt;

where it should be &quot;&lt;br /&gt;
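
A small Python sketch of what this double escaping means for a client: the XML parser undoes one level of escaping, and a leftover HTML entity remains in the text. The <synopsis> wrapper is an assumed element name, and running html.unescape a second time is only a possible client-side workaround, not a fix for the API.

```python
import html
import xml.etree.ElementTree as ET

# The double-escaped fragment from the post, wrapped in a hypothetical element.
RAW = "<synopsis>&amp;quot;&lt;br /&gt;</synopsis>"

# The XML parser undoes one level of escaping.
once = ET.fromstring(RAW).text

# A second pass undoes the leftover HTML entity.
twice = html.unescape(once)

print(once)   # &quot;<br />
print(twice)  # "<br />
```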
Modified by pigoz, 08-04-09, 8:39 AM
 
08-05-09, 8:56 AM

Offline
Joined: Jan 2009
Posts: 92
Something is wrong here too:
http://myanimelist.net/api/anime/search.xml?q=pokemon

Pokémon is encoded as Pok&Atilde;&copy;
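
This pattern looks like UTF-8 bytes that were read as Latin-1 and then turned into HTML entities. Here is a hedged Python sketch of a client-side repair under that assumption. The full string "Pok&Atilde;&copy;mon" is my guess at what the response contains, and repair_title is a made-up helper name.

```python
import html
import xml.etree.ElementTree as ET

# Assumed full form of the garbled title; the post only shows the first part.
GARBLED = "Pok&Atilde;&copy;mon"


def repair_title(text):
    """Undo HTML entities, then undo UTF-8 bytes misread as Latin-1."""
    decoded = html.unescape(text)
    try:
        return decoded.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake; leave the text alone.
        return decoded


print(repair_title(GARBLED))

# &Atilde; is not one of XML's five predefined entities,
# so a strict parser rejects the raw text.
try:
    ET.fromstring("<title>" + GARBLED + "</title>")
except ET.ParseError as err:
    print("strict parser fails:", err)
```

The round trip only works when every character fits in Latin-1, so it may not recover titles that were mangled in some other way.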
 
08-16-09, 5:21 PM

Offline
Joined: Aug 2008
Posts: 1
This keeps getting better. I was wondering if it's possible to add anime characters/voice actors to the search list. I was listening to music on Rhapsody, and it displays random info about the band's artists. So I thought, wouldn't it be cool to watch an anime and then get to see info on the characters? Just an idea.

<anime>
<title>...</title>
<...etc other data>

<Characters>
<name>...</name>
<alias>...</alias>
<voice>...</voice>
<story> ...</story>
<pic>...</pic>
</Characters>

<Characters>
<name>...</name>
<alias>...</alias>
<voice>...</voice>
<story> ...</story>
<pic>...</pic>
</Characters>

</anime>

Arigato for the nice work Xinil.
 
08-31-09, 6:51 PM

Offline
Joined: Sep 2007
Posts: 8
It does look like the encoding is borked for Japanese characters. For example: http://myanimelist.net/api/anime/search.xml?q=naruto returns badly encoded Japanese characters for anime ID 936.

Does anyone have a workaround for this?

Also, it'd be nice, but not 100% necessary, if the content type returned from the server is "text/xml" instead of "text/html".

Otherwise, the search API is working fine, thanks for that Xinil.
 
09-18-09, 2:12 AM

Offline
Joined: Apr 2008
Posts: 50
Indeed, this issue is bugging me as well.
The standard XML parsers cry when I try to parse the response. The XML declaration says it's UTF-8, but obviously it's not. I'm building a small Java app, so any hints on how to (temporarily) work around this issue would be nice.

I wouldn't mind helping out on this project, although my time is limited (full-time job as an IT professional during the day).

So just send me a PM.

Thanks in advance.
 
10-24-09, 1:16 PM

Offline
Joined: Jul 2008
Posts: 1
Regarding the invalid characters in the XML, the PHP extension Tidy worked out for me.
Although it just converts the obstructing characters into HTML Unicode, it returns valid XML - enough for me anyway.

It utilizes libtidy from the HTML Tidy Library Project (http://tidy.sourceforge.net/), there is also a Java version (http://sourceforge.net/projects/jtidy) linked.
 