Formulas behind the graphs
#1
03-08-12, 11:28 AM
Offline Joined: Nov 2010 Posts: 47 |
I wanted to elaborate a bit on the formulas we used, hence this topic. First of all, here is an "explanation" of the common notation. It didn't make its way onto MALgraph itself, since it's rather self-explanatory, but I added it here for the sake of completeness. (Sorry for the misaligned images.)

By [image missing] we mean the set of all titles owned by a given user.
[image missing], where [image missing]; all of them are pretty much self-explanatory.
By [image missing] we mean that the user didn't rate the given title.

Score distribution
Involves: completed, completing, dropped, on-hold; rated and unrated.
This is the most basic graph. It renders your score distribution in the form of a bar chart. Each bar's width is proportional to the following: [image missing] ...where: [image missing]
So it is equal to how many entries have the given score. Technically it is then divided by [image missing] and multiplied by a constant to make the bars proportional to some pixel width.

Personally I think the more titles a user has watched, the more this graph should look like a Gaussian curve. (Then again, a lot of users, unlike myself, don't bother to add crap.) In any case, I thought about including the Shapiro-Wilk test. It is used to check whether some distribution is significantly different from a normal distribution. In the end we decided to drop the idea, since it could start a flame war and wouldn't give a very meaningful result (just a boolean - I don't find booleans very exciting).

Score to time distribution
Involves: completed, completing, dropped, on-hold; rated and unrated.
In my opinion this is the second most basic graph. It renders how much time you spent watching titles that you rated with a given score. What's the point of this? Perhaps our alpha title for this graph is helpful here - "How much time do you waste on crap". As a rule of thumb, [image missing].
So we use this formula: [image missing] where: [image missing]
Whether this should or shouldn't look like a Gaussian curve when you force yourself to watch every [image missing] is hard to answer and depends on too many factors for me to comprehend (for example, the 1990s were full of long-ass 50+ ep series, which can be interpreted in multiple ways). Probably yes. With 10% significance. :P

Favorite decades
Involves: completed, completing, dropped, on-hold; rated only; having aired year specified only.
So I wanted to know which decade I like the most. (To be honest, I suspected the 1990s because of SM, NGE and DBZ. Nope.) So we made this line chart, which uses the following: [image missing] ...where: [image missing]
In other words, we compute the mean score of titles from every decade the user has ever watched anything from.

At first we were also going to add some weighting. We thought about cumulative title duration or the number of titles the user watched. But after a short while we realized that there is at least twice as much English-subbed anime available from the 2000s as from the 1990s (not to mention the 1980s). That explains why we didn't like weighting by title count. As for show duration, there are very few shows from the 2000s that have 50+ episodes (leaving out never-ending singletons like One Piece)... at least compared to the 1990s and 1980s. There is also the idea of combining both of the above: a few lengthy titles from the 1990s vs lots of short titles from the 2000s should result in a draw, right? Well, as far as it theoretically works, it ridiculously violates the KISS principle. So we decided to stick with just mean scores.

Status distribution
Involves: completed, completing, dropped, planned, on-hold; rated and unrated.
This one is rather self-contained and there's not much to talk about. We just count how many titles have a given status and present it in the form of a pie chart.

Favorite genres
Involves: completed, completing, dropped, on-hold; rated only.
This is my favorite.
Each genre has an associated internal score; we sort the genres by that score and present the top and bottom 10 genres to the user. This is done using: [image missing] ...where: [image missing]
So, we take the user's mean score of all titles ([image missing] But this is equal to [image missing] So why do we put it this way? I did it this way because I wanted to make genre scores (so far equal to [image missing]

Now, why the weighting? Imagine that a user has watched 300 titles tagged with "demon" and one tagged with "angel". Obviously "demons" are more meaningful than "angels". Now, how should we interpret this? We took the following facts into consideration: [image missing]
Taking into account everything above, we decided to stick with a simple approach: let's use only means (global and genre-specific); then let's make sure that the less a user has watched from a given genre, the less "autonomous" his "opinion" (genre-specific mean score) is. That's how we ended up with the final formula. The rest was just fine-tuning of [image missing] (The difference between [image missing]

We also considered the following: [image missing]
This had many flaws, pointed out in the topic discussion.
[image missing]
I don't remember exactly why, but in the end it turned out to be rather crappy. Possibly for similar reasons as above.
[image missing]
This one behaved almost exactly like your everyday arithmetic mean. [image missing] part because of loldivisionbyzero for best genres.
Finally, there was this: [image missing] ...and this one: [image missing] ...but these worked exactly opposite to what we wanted to achieve. So yeah, square root.

As a final note, I'd like to add that even though our internal score distribution for these genres is non-linear, we still display it as if it were linear (imagine the best genre scored with 300, the next one with 20 and then 19 - they will be displayed successively with 16pt, 15pt and 14pt font).

Episode count chart
Involves: completed only; rated and unrated.
Here we came up with a few length thresholds (1 episode, 2-5 episodes and so forth); we count how many titles fall into each group and then present it to the user in the form of a pie chart.
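The episode-count grouping described above can be sketched roughly like this. It is only an illustration in Python, not MALgraph's actual code: the post names just the first two groups (1 episode, 2-5 episodes), so every boundary after those, and the function names, are made up for the example.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bound of each group. Only "1" and "2-5" come from the
# post; the remaining thresholds are hypothetical.
UPPER_BOUNDS = [1, 5, 13, 26, 52]
LABELS = ["1", "2-5", "6-13", "14-26", "27-52", "53+"]


def episode_group(episodes):
    """Return the label of the group that an episode count falls into."""
    return LABELS[bisect_left(UPPER_BOUNDS, episodes)]


def episode_count_chart(completed_episode_counts):
    """Count completed titles per group (the sizes of the pie slices)."""
    return Counter(episode_group(n) for n in completed_episode_counts)
```

Keeping the bounds in one sorted list means that changing the thresholds only touches the two constants at the top.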
Favorite studios
Involves: completed, completing, dropped, on-hold; rated only; having studio specified only.
Everything in this chart is computed exactly like in favorite genres.

Timeline
Involves: completed only; rated and unrated; having start and finish year specified only.
As in most of the graphs, here we create buckets in the form of year-month and count how many titles were completed in a given month. There was an idea to add duration-based weighting, but it would be unintuitive. We may add it some day as another series rendered in the same chart. I considered removing the current / active month from this graph, but I figured some users may find it useful and want to compare their performance in the current month with their performance in the last month, so I left it intact.

Suggestions
Involves: completed only; rated and unrated.
Not much to talk about. We look at the user's completed titles, sort them by rating and display all related titles that aren't on the user's list. This may change some day and we'll base it on user recommendations, but if that were the case, we'd probably implement another chart instead of removing the existing feature.

Modified by rr-, 04-03-12, 12:07 PM
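Since the formula images did not survive, here is a rough Python sketch of the three simplest aggregations as the prose describes them: entries per score, time per score, and the plain mean score per decade. The dictionary field names are hypothetical, and taking watching time as episodes times episode length is an assumption (the post's own rule of thumb is missing).

```python
from collections import Counter, defaultdict

# Each title is a plain dict; the field names are hypothetical.
# "score" is None when the user did not rate the title.


def score_distribution(titles):
    """Number of entries per score (unrated entries are counted under None)."""
    return Counter(t["score"] for t in titles)


def time_per_score(titles):
    """Total minutes per score, assuming time = episodes * episode length."""
    minutes = defaultdict(int)
    for t in titles:
        minutes[t["score"]] += t["episodes"] * t["minutes_per_episode"]
    return dict(minutes)


def decade_means(titles):
    """Unweighted mean score per decade, over rated titles with a known year."""
    by_decade = defaultdict(list)
    for t in titles:
        if t["score"] is not None and t["year"] is not None:
            by_decade[t["year"] // 10 * 10].append(t["score"])
    return {decade: sum(s) / len(s) for decade, s in by_decade.items()}
```

The decade mean is deliberately unweighted, matching the decision in the post to stick with plain mean scores.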
#2
03-31-12, 3:31 AM
Offline Joined: Nov 2010 Posts: 47 |
[edit] This is a response to kFYatek's post, which I accidentally deleted 。・゚・(ノД`)・゚・。. He suggested modifying the formulas for genre and producer value calculation, namely replacing: [image missing] with: [image missing] and provided a short analysis of how this would work. [/edit]

Your point is obviously correct: the difference gets amplified, not the score. This is intentional. But, de facto, your suggestion makes things kind of worse. Consider the following: [image missing] According to your suggestion: [image missing]
Now this would be interpreted like this... the user loves demons and hates angels as well as humans. In fact, angels are even more hated than humans, even though they received a 7 in contrast to a hundred 1s. In reality I see it like this: demons are somewhat liked since he keeps watching them, angels are rather meh (but might become a point of interest), humans are mega crap. The point is that the 'least liked' genres would cease to make sense, since they would contain the 'most meh' or 'least watched' genres alongside the 'most hated' ones (<-- I added "humans" in order to demonstrate that).

The first thing that comes to my mind that could fix that is using two functions, one for calculating the most liked and a second for the most disliked genres... for example, like this: [image missing] where... [image missing]
But this means using two separate distance functions for the same thing. Crazy. That's why I designed it like this. This isn't perfect, but IMO it's alright enough. Disliked genres are below 'zero' (which is the global mean), liked genres are above zero and we can scale it however we want. The more data you have, the better results you get.

Modified by rr-, 04-03-12, 12:17 PM
#3
03-31-12, 11:10 AM
Offline Joined: Jul 2009 Posts: 288 |
I messaged a few people (too few for it to count as a representative group) asking which version of favorite genres seemed more accurate for them. It ended up in a perfect tie. >_> We'll stick with the "old" formula for now. Thanks for the suggestion. |
#4
03-31-12, 7:29 PM
Offline Joined: Apr 2009 Posts: 147 |
Well, I know that the formula I wrote wasn't perfect, either. But I already had another proposal in mind. How about this: [image missing]
The 10 is there to put it on the same scale as the first term, so that both terms contribute to the final score in equal halves. This way, we can also account for unrated titles (a person who watches tons of demon titles despite not rating them pretty much loves the genre, doesn't he?). I haven't tried it in practice, so maybe some fiddling with it (like adding weights or square-rooting something) would make it better.
#5
04-01-12, 8:39 AM
Offline Joined: Nov 2010 Posts: 47 |
Again, this greatly punishes groups that are small in size by boosting the others :( However, your post inspired me to think about it some more, and this morning I came up with something new. So I asked a question: "What should we do when there's insufficient data?". The answer: extrapolate it! That's how I came up with this:

1. Genres that have relatively few titles should tend to the global average score.
2. Genres that have relatively many titles should tend to their own average score.
3. [bonus] Treat unrated entries as if they were rated with the global average score.

It somewhat realizes the goals. It doesn't work extraordinarily well with lists containing vast amounts of unrated entries, but... oh well. Formally, it goes like this... [image missing] ...where: [image missing]
Originally the weight was going to be just: [image missing] ... but I wanted to have some kind of "logarithmic" scale (like, the more titles, the less difference it makes), hence the square in the final equation. Note that since [image missing]
After fri extensively checked it, he said it's good, so we're probably going to use it in the next release.
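The formula images for this post are lost, but later posts in the thread pin down its shape: a weight of 1 gives the genre's own mean, a weight of 0 gives the user's global mean, and the weight is computed as 1 - (1 - n / n_max)^2, where n is the genre's size and n_max the size of the user's largest genre. A minimal Python sketch under those assumptions follows; the function names and the exact handling of unrated entries (rule 3) are guesses for illustration, not the MALgraph implementation.

```python
def genre_weight(size, largest_size):
    """Weight in [0, 1]: near 0 for tiny genres, 1 for the largest genre."""
    return 1 - (1 - size / largest_size) ** 2


def genre_scores(scores_by_genre, global_mean):
    """Blend each genre's mean with the global mean according to its size.

    scores_by_genre maps a genre name to the user's scores for its titles,
    with None standing for an unrated title. Every genre is assumed to
    contain at least one title.
    """
    largest = max(len(scores) for scores in scores_by_genre.values())
    result = {}
    for genre, scores in scores_by_genre.items():
        # Rule 3: an unrated entry counts as if rated with the global mean.
        filled = [global_mean if s is None else s for s in scores]
        genre_mean = sum(filled) / len(filled)
        w = genre_weight(len(filled), largest)
        # Rules 1 and 2: small genres stay near the global mean,
        # large genres move towards their own mean.
        result[genre] = global_mean * (1 - w) + genre_mean * w
    return result
```

With 300 "demon" titles and a single "angel" title, the angel score stays almost exactly at the global mean, which is the behaviour rules 1 and 2 ask for.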
#6
04-01-12, 1:20 PM
Offline Joined: Jul 2009 Posts: 203 |
It might be a little bit off-topic, but am I the only person in this club who has no idea what the posts above are talking about? xD |
#7
04-01-12, 7:44 PM
Offline Joined: Apr 2009 Posts: 147 |
@loskierek: Judging from how I'm the only one besides the admins who dared to speak in this topic, I think you're in the majority, in fact.

@chrupky: First, about my proposal:
chrupky said: Again, this greatly punishes groups that are small in size by boosting the others :(
I thought this was the idea all along. If someone watches tons of e.g. mecha despite rating it as crap, he must really love that genre, mustn't he? But anyway, thanks for getting inspired by me. We'll see how the updated formula works out in practice. My profile doesn't seem to be updated yet, so I don't really know yet. But it certainly feels much better, judging from calculations on that demon/angel example.

[EDIT] My profile got updated, and I don't really see too much difference... but well, the results seem more or less reasonable when I look at the tooltips. [/EDIT]

PS Why did I only just now realize that everyone here is a fellow countryman? ;)

Modified by kFYatek, 04-02-12, 4:54 AM
#8
04-02-12, 11:03 AM
Offline Joined: Mar 2010 Posts: 420 |
ok, the results yielded by the new formulas seem a lot closer to what I was expecting now. Great job! |
#9
04-02-12, 4:52 PM
Offline Joined: Mar 2009 Posts: 474 |
A clarification on the syntax: is max g' a binding of g' to the largest genre in the genre calculation? If that is the case, it's possible that there's an error in your calculation. Using the count and genre mean given in the hover text doesn't give me the same results when I follow your calculation.

Also, don't you think it might be better to take the square root (or the power of 2/3) of the genre ratio in your weights? As lists grow in length they'll tend towards common genres such as comedy rather than favourite genres, excessively punishing smaller, less common genres.

Discard the above if my reading of max g' is incorrect.
#10
04-02-12, 11:02 PM
Offline Joined: Nov 2010 Posts: 47 |
Your reading of g' is correct: the whole max part gets the largest size of any genre group among the user's titles. If your calculations mismatch, it means you're doing it wrong. Note that the largest group might not even be visible (for example, it can have a perfectly average score).

We already tested changing the exponent to 0.5 (as well as a few other [image missing]
The graphs above show how group size affects the weight, which affects the final score (x). If weight = 1, x will be equal to the mean score of the user's titles for the given genre; if weight = 0, x will be equal to the user's global average.

Further special treatment of outliers is going to be awkward. I was mad with the previous models when I kept seeing 'cars' as 'hated' even though I have watched only 'Redline' and 'Tailenders', so yeah.

Modified by rr-, 04-03-12, 8:38 AM
#11
04-03-12, 5:29 AM
Offline Joined: Mar 2009 Posts: 474 |
That wasn't my intended suggestion. My intended suggestion was http://www.wolframalpha.com/input/?i=plot+1-%281-%28x%5E%282%2F3%29%29%29%5E2%2C+x%3D0..1

Larger lists will always tend towards containing a large number of frequently found genres such as comedy and romance, which can be found in conjunction with almost all other genres. The existence of a particularly large genre is unreasonably punishing to other, smaller genres.

An example from my graph. I have 266 series in the comedy genre; even if this is not my largest genre, I would need to watch ~75 titles of a genre for the evaluated score to be the midpoint of the global and genre means. Considering the Game genre in my list, it has the highest mean of all genres, 1.44 points above my global mean. To get its evaluated score to the midpoint I would need to watch 75 of the 83 MAL entries with that genre.

tl;dr Common genres unreasonably skew results on large lists.

I'm still getting a calculation mismatch though, and it's irrelevant if the largest group isn't visible: if the largest group is even bigger, the result is even more incorrect. If I chuck the following into a Haskell interpreter, using the values given for the Game genre from my graph

    let w = 1-(1-(10/266))^2 in 6*(1-w) + 7.44*w

I get 6.10623551359602 as the output, leaving me to conclude that either the tooltips give incorrect results or your implementation is wrong, as this doesn't match your evaluated result.
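The arithmetic in this post can be re-checked outside Haskell as well. Here is a small Python sketch using the same numbers (global mean 6, Game mean 7.44, 10 Game titles, a largest group of 266), plus the group size at which the weight reaches 0.5. The helper names are made up for the example.

```python
import math


def weight(size, largest_size):
    """The squared weighting used in the Haskell one-liner above."""
    return 1 - (1 - size / largest_size) ** 2


def evaluated_score(global_mean, genre_mean, size, largest_size):
    """Weighted blend of the global mean and the genre mean."""
    w = weight(size, largest_size)
    return global_mean * (1 - w) + genre_mean * w


def midpoint_size(largest_size):
    """Group size at which the weight is exactly 0.5."""
    # 1 - (1 - x)^2 = 0.5  =>  x = 1 - sqrt(0.5)
    return (1 - math.sqrt(0.5)) * largest_size


print(evaluated_score(6, 7.44, 10, 266))  # about 6.106, as quoted in the post
print(midpoint_size(266))                  # roughly 78 titles
```

The midpoint lands at roughly 29% of the largest group regardless of its size, which is the effect the post is complaining about: the bigger the largest genre, the more titles every other genre needs.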
#12
04-03-12, 8:16 AM
Offline Joined: Nov 2010 Posts: 47 |
Your point is? I'll write this again: we are already doing this, using a weighting similar to the one you proposed. See for yourself that it doesn't make much difference: [image missing]
There is also this, which takes the above approach to the extreme: http://www.wolframalpha.com/input/?i=plot+%281-%281-x%29%5E2%29%5E0.5%2C+x%3D0..1 (it is basically a circle centered at (1,0) with radius 1): [image missing]
We're gonna test them again...

As for the mismatch...

Modified by rr-, 04-03-12, 9:16 AM
#13
04-03-12, 9:30 AM
Offline Joined: Jul 2009 Posts: 288 |
So yeah, we tested it again on a few users, starting with 2/3 and going up to 1. It seems that the constant 8/9 yields the best results, boosting small groups a bit while not making it illogical (e.g. there were two series tagged as "Police", and (2/3) boosted it to 2nd place, while (8/9) kept it a little lower, but still higher than before, because those two series were rated with 8s). |
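To see what the exponent does to small groups, here is a Python sketch. It assumes the constant is applied to the genre ratio the same way as in the 2/3 plot linked earlier in the thread, i.e. weight = 1 - (1 - x^p)^2; the thread does not show the final formula, so treat this form as a guess. The ratio of 2 titles against a largest group of 266 is a made-up example.

```python
def weight(ratio, exponent=1.0):
    """Weight for a genre whose size is `ratio` times the largest genre's size."""
    return 1 - (1 - ratio ** exponent) ** 2


# Hypothetical case: 2 "Police" titles against a largest group of 266.
ratio = 2 / 266
for p in (2 / 3, 8 / 9, 1.0):
    print(f"exponent {p:.3f}: weight {weight(ratio, p):.4f}")
```

For any ratio strictly between 0 and 1, a smaller exponent gives a larger weight, so 2/3 boosts tiny groups the most and 8/9 sits between that and the original exponent of 1.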
#14
04-03-12, 10:26 AM
Offline Joined: Mar 2009 Posts: 474 |
Thanks for the extra testing. Was the mismatch an actual error then? Or a mistake on my end regarding the mean? |
#15
04-03-12, 10:43 AM
Offline Joined: Nov 2010 Posts: 47 |
It was an error on our end, but it wasn't very serious; formula remained the same. (The way we calculated average score was wrong.) |