Top Anime: Adjustment to the Weighted Ranking System [Waived]

This topic has been locked and is no longer available for discussion.
Feb 27, 2016 6:12 AM

Offline
May 2013
1289
@Pullman
I'd like to see arguments about why this proposal is unfair to certain types of shows (which the current system is) instead of just people whose ego is hurt by this suggestion for whatever unfathomable reason. Your individual opinion can be fully represented by your list (unless you randomly decide not to include some stuff, making it unrepresentative, which is up to the individual), so lamenting that YOUR individual opinion isn't ALSO strongly represented in a weighted average score taken from 3 million users is just nonsensical. Because it never is. And it shouldn't be. Top lists are about the shows, not about the users.


You guys keep dismissing the arguments against this as if they had no value. Even if you set aside all the "influence" arguments (which are not really about "influence"; their point is quite different), there have been multiple arguments actually explaining why this is unfair to many shows, and they haven't really been addressed by the supporting side.

For once, I'd like to see an argument supporting this system that is not just based on the desire for a different system.
I don't see how you can keep supporting a system that completely disregards context when acquiring data. That's a red flag in any kind of survey, which this basically is, since it takes data from many people.
It arbitrarily decides which data should have more weight and at times just straight up changes them. How does that even remotely sound fair to all shows? It's more unfair than the current system by a large margin, I might add, and you keep supporting it while claiming that it is fairer because it gives older shows a chance. That is blatantly false; you are basing this on what ends up on the top lists rather than on how the math works. Does liking the end result justify the means? Imo, no.
And that is why I'm against this method. I'm for the idea, but against this particular execution.


Also, how is it not about users? What you're saying is factually wrong. Top lists only exist because of the users; take the users away and even this weighted ranking system has no results. Top lists don't just appear out of thin air. There is not a single top list on planet Earth that is not about the users when its results come from the data of many users. I don't know where you guys see top lists about shows without users, but I'd certainly like to see one.
It's like claiming the elections were never about the voters but about the candidates.
The only kind of top lists that don't rely on massive numbers of votes are those made by one person or a small team, and those are almost always disregarded.

Furthermore, MyAnimeList is not a site for critics; it is built around its user base, which mostly consists of casual anime fans. Claiming that just one rating doesn't make a difference, and shouldn't, is not really true, as I explained above. It does make a difference, and that is why this system manages to get those results: it corrupts the data one rating at a time.

The top lists derive from the opinions of users. I don't see how allowing the system to corrupt the data before getting the results is even remotely acceptable.
Feb 27, 2016 7:41 AM
Offline
Apr 2012
11
Eucli said:
You guys keep dismissing the arguments against this as if they had no value. Even if you set aside all the "influence" arguments (which are not really about "influence"; their point is quite different), there have been multiple arguments actually explaining why this is unfair to many shows, and they haven't really been addressed by the supporting side.

For once, I'd like to see an argument supporting this system that is not just based on the desire for a different system.
I don't see how you can keep supporting a system that completely disregards context when acquiring data. That's a red flag in any kind of survey, which this basically is, since it takes data from many people.
It arbitrarily decides which data should have more weight and at times just straight up changes them. How does that even remotely sound fair to all shows? It's more unfair than the current system by a large margin, I might add, and you keep supporting it while claiming that it is fairer because it gives older shows a chance. That is blatantly false; you are basing this on what ends up on the top lists rather than on how the math works. Does liking the end result justify the means? Imo, no.
And that is why I'm against this method. I'm for the idea, but against this particular execution.


Also, how is it not about users? What you're saying is factually wrong. Top lists only exist because of the users; take the users away and even this weighted ranking system has no results. Top lists don't just appear out of thin air. There is not a single top list on planet Earth that is not about the users when its results come from the data of many users. I don't know where you guys see top lists about shows without users, but I'd certainly like to see one.
It's like claiming the elections were never about the voters but about the candidates.
The only kind of top lists that don't rely on massive numbers of votes are those made by one person or a small team, and those are almost always disregarded.

Furthermore, MyAnimeList is not a site for critics; it is built around its user base, which mostly consists of casual anime fans. Claiming that just one rating doesn't make a difference, and shouldn't, is not really true, as I explained above. It does make a difference, and that is why this system manages to get those results: it corrupts the data one rating at a time.

The top lists derive from the opinions of users. I don't see how allowing the system to corrupt the data before getting the results is even remotely acceptable.

I'd like to see a decent number of examples alongside the general reasoning when discussing whether it's unfair or not. I don't think it would be too difficult to find them, and it gives you a lot more credibility and allows you to go further in depth with the point you'd be making. It also helps those who do not see it as unfair when they respond.

Desire for an accurate system derived from the context of a user's personal list which you somehow interpret as having disregarded context and 'arbitrarily deciding which data should have more weight and straight up changing them'. I asked you to elaborate on what you meant by context before and here you mention context again and I really am not sure what context you're getting at here. From what s1rnight posted, context is considered in the form of % of a rating in any user's given list. Even if you think that is not enough, it is certainly not being arbitrary (wouldn't hurt to post your own idea of context if this is the case instead of just using the word 'context' here and there). I've never seen it as giving older shows a chance, but putting all shows on the same playing field which is impossible with the current rankings - this will never change. It just so happens that a lot of these shows happen to be old. Ends vs means debate... I don't think it applies here as there is no debate. This is an experimental process, you keep changing the mathematics behind it (including contexts which would provide different starting points for any given formula) until you reach the most optimal result - thus justifying the means. The question then becomes what is the 'optimal result' which I say is having all shows on equal footing. That if a user was to watch a given show, a weighted system would 'pay-out' at a much more reliable rate than the MAL rankings. Something that I see the site as having already achieved. If it has that much more room for improvement, it goes to show how unreliable MAL is once you get into the 3 digit rankings.

The user base being mostly casual doesn't have an impact really. They'll watch shows based on friends' recommendations, genres, popularity, status, currently airing etc. Sure, they might sometimes watch something based on its rank but it's mostly irrelevant. Then there are critics/elitists/whatever term you want to use who just avoid the rankings in the first place. Who exactly is actively supporting the rankings then?

It's common sense that if User X is removed from the picture in a user base this large, there is no observable change in the rankings. No one user has that much influence. Of course if a number of users are removed then there will be changes.
Feb 27, 2016 8:40 AM

Offline
Nov 2014
406
@Jappy
but putting all shows on the same playing field which is impossible with the current rankings
Why?

until you reach the most optimal result
So what is this? I asked before, but somehow no one is willing to answer.
Feb 27, 2016 8:46 AM

Offline
May 2013
1289
Jappy said:

I'd like to see a decent number of examples alongside the general reasoning when discussing whether it's unfair or not. I don't think it would be too difficult to find them, and it gives you a lot more credibility and allows you to go further in depth with the point you'd be making. It also helps those who do not see it as unfair when they respond.

Desire for an accurate system derived from the context of a user's personal list which you somehow interpret as having disregarded context and 'arbitrarily deciding which data should have more weight and straight up changing them'. I asked you to elaborate on what you meant by context before and here you mention context again and I really am not sure what context you're getting at here. From what s1rnight posted, context is considered in the form of % of a rating in any user's given list. Even if you think that is not enough, it is certainly not being arbitrary (wouldn't hurt to post your own idea of context if this is the case instead of just using the word 'context' here and there). I've never seen it as giving older shows a chance, but putting all shows on the same playing field which is impossible with the current rankings - this will never change. It just so happens that a lot of these shows happen to be old. Ends vs means debate... I don't think it applies here as there is no debate. This is an experimental process, you keep changing the mathematics behind it (including contexts which would provide different starting points for any given formula) until you reach the most optimal result - thus justifying the means. The question then becomes what is the 'optimal result' which I say is having all shows on equal footing. That if a user was to watch a given show, a weighted system would 'pay-out' at a much more reliable rate than the MAL rankings. Something that I see the site as having already achieved. If it has that much more room for improvement, it goes to show how unreliable MAL is once you get into the 3 digit rankings.

The user base being mostly casual doesn't have an impact really. They'll watch shows based on friends' recommendations, genres, popularity, status, currently airing etc. Sure, they might sometimes watch something based on its rank but it's mostly irrelevant. Then there are critics/elitists/whatever term you want to use who just avoid the rankings in the first place. Who exactly is actively supporting the rankings then?

It's common sense that if User X is removed from the picture in a user base this large, there is no observable change in the rankings. No one user has that much influence. Of course if a number of users are removed then there will be changes.


I thought I'd already developed my point enough in previous posts. Maybe I'm confusing this with PMs, though. Anyway, I'll briefly try to explain what I mean by context.

Bear with me here, since I'm generalizing to give the basic idea, don't take everything too literally.

When someone is introduced to anime, chances are they're going to begin with something almost universally accepted as at least good. Following that, their next few encounters with the medium are also likely to be of the same type: something at least good. They are probably getting recommendations from their friends, from a top list they saw on the internet, maybe even from MAL's top lists. In my experience, apart from the occasional odd recommendation from a friend, the vast majority of their first few anime are actually going to be at least good. So a mean score of 8 or 9 doesn't really mean they are relentlessly overrating; what they are watching is, at least to them, very good. Now, I'm not trying to imply that they are NOT overrating at all, only that the difference in scores is likely to be small even if they rated much more harshly.

So, for example, someone who has seen 10 anime, has a mean score of 8.5 and has rated anime "A" an 8 is likely to keep that anime an 8, or maybe drop it to a 7 or even a 6 if they become much stricter with their ratings down the line. However, this system will interpret that 8 as below average (maybe a 2 or a 3 at best), since it's "below the average of this particular user's average encounter with anime", ignoring the context of this guy's watching and picking patterns. He has watched 10 anime, and chances are they are all at least good, since his picking criteria have already been narrowed down by the various sources he takes them from.
For example, if he was picking them from MAL's top 100 anime, he is likely to pick good anime. Imo, the majority of people can agree that at least most of the shows in the top 100 deserve to be called good, even if you don't agree with their current ranking.

Now, imo, this can continue until the user starts digging deeper to find other stuff, and even then he will likely not encounter that many anime he thinks are bad, since almost anybody has at least some slight ability to pick things they are going to like.

Adding to this, I'd like to say that there is no definite number of anime one has to go through before his ratings can be taken as not overrating. This guy could go on to watch 100 titles and his mean score could still remain above 8, since there are so many factors to consider. Half of those titles could be sequels, movies or OVAs of anime he already liked, for example, maintaining that high mean score. He could have a habit of not rating the stuff he dislikes, which I think I saw mentioned somewhere, and that decision would actually negatively affect his other ratings, simply because the system is based on his average encounter and this guy decided not to rate bad things. You could counter this example, but just consider that this system is trying to take an opinion from each user and convert it to a universal scale where everything below his mean score is considered below average (because this guy's data supposedly shows that, since this show is rated a 7 and he has a mean of 7.5, it must be bad according to his average encounter). This is also ignoring context, which, by the way, the current system, due to its simplicity, doesn't do.
Basically, what I'm trying to say is that an anime being below or above a random user's mean score shouldn't be translated as below or above the average in general, because those two are very, very different things.
There are so many factors to consider before even making such assumptions that, if the system isn't at least solid in its calculations, the data are going to be unusable. Which is what is happening here; the only difference is that it displays the results regardless.

It's not a matter of context being "enough". Context isn't countable; it's just context. You can't have enough or too little context, you either account for it or you don't. The % in this case has been misunderstood as context, while in reality it is not. It is a random number that has very little to do with the actual context.
The examples above are just a fraction of the whole thing. It is so complicated that the more you try to account for it, the more complicated it gets. That's why the current system is actually not that bad.

Now, as far as the current system goes, of course it could use many changes, BUT even if this system too is, in a way, ignoring context, it ignores it everywhere equally, cancelling each aspect of it out. The proposed system ignores context only when taking into account high ratings from users with high mean scores, while benefiting low ratings from users with low mean scores. This happens mainly because the matter at hand is a top list, not a bottom list, and because frankly most people don't care what is at the bottom, only what is at the top.

The supporting side of this system has mentioned multiple times that the current system is not fair to all shows. Personally, I don't really see why it's that unfair, so if someone doesn't mind, I'd like you to elaborate on this.
(I get why one could say that it's unfair to older shows, not obscure ones, in a way. Still, from my perspective it is impossible to keep giving the same opportunities to older shows; if you can think of a way to do that, I'm all ears.)

This is an experimental process, you keep changing the mathematics behind it (including contexts which would provide different starting points for any given formula) until you reach the most optimal result - thus justifying the means. The question then becomes what is the 'optimal result' which I say is having all shows on equal footing.

I agree with this, but what is happening right now is not an experimental process, but a debate. There has been no changing of the mathematics so far, and I don't think that will change, since this thread is discussing this particular approach from the user who proposed the system. It would be an experimental process if Xinil just showed up and said, "We're looking to implement a new ranking/rating system; how do you think it should work?", and then a bunch of users tried a bunch of different formulas until the result was satisfying. Here we are just debating whether we want this system or not, not how the system should work.


The user base being mostly casual doesn't have an impact really. They'll watch shows based on friends' recommendations, genres, popularity, status, currently airing etc. Sure, they might sometimes watch something based on its rank but it's mostly irrelevant. Then there are critics/elitists/whatever term you want to use who just avoid the rankings in the first place. Who exactly is actively supporting the rankings then?

It's common sense that if User X is removed from the picture in a user base this large, there is no observable change in the rankings. No one user has that much influence. Of course if a number of users are removed then there will be changes.

As I explained, the user base being mostly casual is actually really important. And the "critics" I mentioned above mean something different from elitists; I meant actual critics. This system wouldn't be half bad if the users actually were only critics, kind of like a more complicated Rotten Tomatoes system.

Of course, just one user being removed doesn't have much impact. Personally, I wouldn't really complain if this system actually removed people, but that's not what it does. Instead of simply disregarding data from casuals, for example, it changes them and produces false results. If it just ignored them, we would get a much more understandable result. Still not valid, but better.
Eucli, Feb 27, 2016 8:57 AM
Feb 27, 2016 10:06 AM
Offline
Apr 2012
11
Delenai said:
@Jappy
but putting all shows on the same playing field which is impossible with the current rankings
Why?

until you reach the most optimal result
So what is this? I asked before, but somehow no one is willing to answer.

I'm convinced that you're either just trolling or have poor comprehension. You straw-man Pullman even though they explicitly argued the opposite of what you implied they said, you ask what the target is (which I interpret as you trying to bait those in favour of this proposal into posting a list of favourites as some sort of trap), and then you make this post. The optimal result is stated not only right above the second quote, but right before you end the second quote as well. As for why not all shows are on equal footing, read the first page or two.

@Eucli I get what you're saying with context, and I guess I phrased it poorly. Not context, but the information which leads to having a full understanding. What I'm saying is that you don't need all of the information about how the user's scores came to be and how they work with each other. Not only that, but a full understanding is unattainable, so you will have to settle for less than perfect. I still don't agree with the % of a rating being irrelevant, though.

The current system is fair in a world where everyone watches every show... kind of - although rankings are primary, the ratings would probably need more significant figures but I digress. I don't think the distinction between old shows and obscure shows is that big. There's a pretty big overlap between the two, and there are outliers - e.g., Hyouge Mono is recent and obscure, while NGE is an older yet still popular title.

If there were such a user who has a strict rating system but also watches only shows that meet their own high standards, then yes, that's a case where the system would fail in its purpose - although a rare one. Off the top of my head, you could change the formula to consider the score of a show in addition to the %-of-a-rating thing it has going on currently, or start from scratch if you want. You could use the MAL score, a weighted score, maybe even the mean score but only considering users with more than 50 days' duration, etc. Implementing these things is not my forte though, and in a way you're right: it's not like there's much that can be done in this thread. I'm not even sure that, if suggestions were made here, s1rnight could actually obtain the data and test them.
Feb 27, 2016 11:32 AM

Offline
Nov 2014
406
@Jappy
Yeah, I saw it, and you still haven't said shit.
Why do you think the current one doesn't give every anime equal footing?
From what you said, it's obvious you have no fcking idea how it should look or what it should be. Nobody in their right mind would use the word "accuracy" about something they do not know. But you are still saying the same shit over and over again without explaining yourself.
You have no proof that the current system is "inaccurate", but you keep claiming it is.


Everything else you believe I did or didn't do is pretty much irrelevant.
BTW, if you look a little further up you can see that Pullman is still stating that popularity gives an anime an advantage. And he did it way before I pointed it out, too. So what are you talking about?
Feb 27, 2016 11:42 AM

Offline
Oct 2010
11839
I am not against the idea of weighing ratings, but I am definitely and very deeply against the idea of accepting this weighted rank system as the only alternative. And yes, it is personal. There seems to be a grudge against personal input here when this is all about how much each one's rating contributes to a show's average. I want my 8s to count as 8s because that is what I thought the show was worth when I rated it. Or my 5s to count as 5s. Or my 2s to count as 2s. Otherwise rating here does NOT make sense.

As I say every time there's a suggestion that may be interesting to a lot of people: make it optional, for people who actually do not give a fuck if their 10 becomes a 7 in the average or the other way around. I'm all for optional tools and ranks. But don't make me accept it as a gold standard. The current system may not be fair, but at least it is equal and representative: I rate, and my rating weighs the same as anybody else's. This proposed system is none of the three. It's just a bunch of assumptions about how things should be according to some virtual statistical value that is arbitrarily given preferential status.
jal90, Feb 27, 2016 11:49 AM
Feb 28, 2016 2:21 AM

Offline
Oct 2014
617
Okay, I want to break this down in incredibly simplified terms, since I still don't see any proper responses to the very basic, fundamental flaws of a weighted system, both in general and in the specific system proposed in this thread.

Jappy said:

From what s1rnight posted, context is considered in the form of % of a rating in any user's given list. Even if you think that is not enough, it is certainly not being arbitrary (wouldn't hurt to post your own idea of context if this is the case instead of just using the word 'context' here and there). I've never seen it as giving older shows a chance, but putting all shows on the same playing field which is impossible with the current rankings - this will never change.


Let's make this incredibly simple. I really don't get this "puts all shows on the same playing field" argument. How?

The current system values each user's input equally. Let's just ignore anomalous outliers and dummy accounts for a second and consider only legitimate individuals scoring their lists. Each person's input is valued equally, and each show is ranked accordingly, with a minimum voting population to avoid anomalous, highly volatile results at low member counts (according to the site, only 50, which is incredibly low).

The very basis of a weighted system assumes each user's credibility is not equal. It has absolutely nothing to do with "putting all shows on a level playing field", since there's nothing in the current system which differentiates between shows, and nothing in the proposed system changes that either. The only thing that changes is the population's relative impact on scores based on an arbitrarily chosen mean and distribution model.

I think "context" mentioned is a little too loose of a term, and is more accurately to do with the underlying factors which directly contribute to the existence of rating bias. As I've mentioned, it's ludicrous to assume a majority of the population simply picks shows purely at random. That's undeniable.

Jappy said:

It's common sense that if User X is removed from the picture in a user base this large, there is no observable change in the rankings. No one user has that much influence. Of course if a number of users are removed then there will be changes.


This is another ridiculous argument. The point here isn't about giving any specific individual a certain amount of power. The point is completely shifting rating weights towards some people more than others, which obviously has a huge impact on the scores.

Now, this per se isn't a problem (we can assume some people have more credibility than others), but my gripe with the system as currently structured is that the values chosen are completely arbitrary and don't attempt to take into account a multitude of other relevant factors.

Here's a very simple breakdown of what each ranking system assumes, so you can get a better handle on what's going on and on why this way of thinking is poisonous if you don't actually put practical thought into it and, as so eloquently put by other posters, "get emotional".

Current MAL ranking system:

1. Every user has the capability to discern a relative value of a certain given show he/she has chosen
2. Every user's input is equal, and an equal amount of thought and consideration is placed into each rating

-> Doesn't matter how or what each person watches, all ratings are accepted as equal input.

Under this specific proposed weighting system:

1. Each user's input is not equally credible, so a relative weight has to be attached to each user's rating, in this case relative to his/her score distribution.
2. The most credible users have a mean of 5

2 -> assumes every person picks shows at random to watch and rate
-> assumes the "true" ratings distribution of all MAL DB shows to approach that of a centralized distribution with a mean of 5.

Now, we can agree that not everyone is equally credible (for example, a 10-year-old kid versus an experienced critic who has reviewed hundreds of shows), hence the positive initial intent of a weighted system.

But there are so many problems with arbitrarily assigning a centralized distribution with a mean of 5 as the basis for all critique. For one, it is mindlessly arbitrary: perfectly legitimate, and likely better, critics can have a mean of 4 or 6, yet they are valued less than someone with a mean of 5, who may well be rating everything 10 and 1.

Why do you believe the bias is significant enough that this proposed system doesn't simply misconstrue the data to an even greater extent? This system ignores the real factor of loss aversion and simply aggregates it as decreased credibility. It then comes down to how we can link credibility to distribution type, and the short answer is: you can't.

Jappy said:

The current system is fair in a world where everyone watches every show... kind of - although rankings are primary, the ratings would probably need more significant figures but I digress.


The current system is fair in the sense that any and every opinion is valid for any given show. If you were to pick up an older show and give it a 10, it would be a 10, not become an 8 because your mean is 6. Every show is given an equal chance to be rated by any individual, regardless of his own list.

Jappy said:
This is an experimental process, you keep changing the mathematics behind it (including contexts which would provide different starting points for any given formula) until you reach the most optimal result - thus justifying the means.


I want to end on this note. The big risk (very apparent ITT) is that people use mathematics to justify their reasoning for why this methodology is a more "accurate" reflection of "true" ratings. As I've mentioned already and will mention again, mathematics is simply a logical process; it's not an explanation of the validity of a system in a social environment. The assumptions the mathematics is based on have to be sound for the results to make sense. This is the very real reason why classical economics and econometrics failed worldwide policy and behavioural economics was born: too many mathematical processes with no real attention paid to the actual agents of the process, the people.

@Eucli
There's one thing I will say: I'm not going to appeal to the casual viewer base or casual viewers in general, since this is indeed an attempt at a more critically evaluative rating system. There are much more deep-rooted issues in this proposed system beyond marginalizing the rating impact of the "casual" viewer.
Feb 28, 2016 4:24 AM
Offline
Apr 2012
11
@Mint

Yes, that's the fundamental premise of a weighted system, but how does that contradict my claim regarding a level playing field? Regarding that claim specifically, to quote my earlier post, with the majority of the user base being classed as 'casual viewers', "they'll watch shows based on friends' recommendations, genres, popularity, status, currently airing etc." It's rather homogeneous, and those who pick up the discarded shows aren't going to belong to that user group. These shows are going to end up consistently lower than a lot of the seasonal-type shows only because the user groups interested in them differ, which goes hand in hand with a different use of one's own rating scale (therefore it is entirely justifiable to factor in the nature of every user's personal rating scale when creating a weighted system). It ends up having nothing to do with the merits of either show. I never assumed users pick shows at random.

The 5.00 argument you've made simply does not hold. Check s1rnight's post on the last page; the curve itself has a larger effect. If you have someone who just rates 10s and 1s for a mean score of 5.00, they have a 12:15 ratio of 10s (44.44%) to 1s (55.56%). Their 10s would only be worth 7.78 (if I've done this correctly) and their 1s 2.78. Even people who rate every third show they watch a 10 still have their 10s weighted at 8.33. It's far more generous than you make it out to be.
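
A quick sanity check of the figures above, assuming the weighting is the percentile-style formula s1rnight posted (quoted in full further down the thread); the filler scores in these hypothetical lists are arbitrary, only the proportions matter:

```python
# Percentile-style weighting (assumed): a rating maps to
# (shows rated lower + half the shows rated equal) / shows rated, times 10.

def percentile_weight(score, scores):
    below = sum(1 for s in scores if s < score)
    equal = scores.count(score)
    return (below + equal / 2) / len(scores) * 10

extremes = [10] * 12 + [1] * 15             # 12 tens, 15 ones -> mean 5.00
print(percentile_weight(10, extremes))      # 7.777... (~7.78)
print(percentile_weight(1, extremes))       # 2.777... (~2.78)

one_in_three = [10] * 10 + [7] * 20         # every third show a 10; the 7s are filler
print(percentile_weight(10, one_in_three))  # 8.333...
```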
Jappy, Feb 28, 2016 4:29 AM
Feb 28, 2016 5:27 AM

Offline
Oct 2014
617
@Jappy

And how exactly does this make it "level"? Let me take a very black-and-white hypothetical. Let's say show X was only watched by these "casuals" (with a list score mean of 10), and all of them rated it 10, while the separate group of "perfect raters" watched only other titles. Suddenly, that show under this system will be rated 5 instead. Now let's imagine that for show Y, half of the votes come from "perfect raters" giving it a 10, while the "casuals" continue rating it 10. Suddenly, the score is only valued at 7.5. Let's now compare it with show Z, where all the "perfect raters" have rated this show 10, but no casual voters have cast their vote. This show is rated 10.

What exactly makes Z > X? More importantly, what makes Z > Y, even though it's supposedly critically acclaimed? The system simply punishes all shows with a high "casual" viewership. It makes no sense. This is of course an oversimplified version.
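
A toy version of this hypothetical. The 5.0 discount for "casual" tens and the equal group sizes for show Y are assumptions made for illustration, mirroring the oversimplification of the hypothetical itself:

```python
# Oversimplified sketch of the X/Y/Z hypothetical above.

def show_score(casual_tens, perfect_tens):
    # A "casual" 10 (from a mean-10 list) is assumed to be discounted to 5,
    # while a "perfect rater" 10 keeps its full value.
    votes = [5.0] * casual_tens + [10.0] * perfect_tens
    return sum(votes) / len(votes)

print(show_score(casual_tens=100, perfect_tens=0))    # show X -> 5.0
print(show_score(casual_tens=100, perfect_tens=100))  # show Y -> 7.5
print(show_score(casual_tens=0, perfect_tens=100))    # show Z -> 10.0
```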



This was the formula I was referring to, which holds true since this is your vision of what a weighted scoring system would look like. It relies on the mean alone, and a 1-and-10 rater would have his 1s and 10s carry full impact on the scores.

As for s1rnight's formula:

s1rnight said:
(((number of shows lower than your score) + ((number of shows equal to your score)/2)) / number of shows you have rated) * 10

so if someone has rated a show a 9, and they've watched 25 shows, and rated 22 shows less than a 9, and 3 shows a 9 (making 9 their maximum rating, given to 3/25 shows), it'd come out to (23.5/25) => 9.4. someone rating the same show a 9, has watched 25 shows, rated 10 shows less than a 9, and 10 shows a 9 (making 5 tens, 10 nines, 10 less than nines) would come out to (15/25) => 6

--


s1rnight said:

since you've rated 2% of anime a 10 (and consequently 98% below a 10), your 10 rating would be roughly equivalent to 98% + 1% (the median of the "tens") to give a weighted rating of 9.9. going downwards, a 9 would be equivalent to 89 + 4.5 => 9.3, and 8 equivalent to 69 + 10 => 7.9, 7 equivalent to 55 + 12 => 6.7, 6 equivalent to 32 + 11.5 => 4.35 etc etc.

your ratings are quite bell-curvey so you can see the ratings stick pretty closely to their out of 10 equivalents, at least until it gets to 6. it's still above 5 at the "sevens" mark but drops below at the "sixes", which makes perfect sense considering your mean's between 6 and 7
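
A minimal sketch of the formula quoted above, run on the worked example from the first quote; the concrete lists are hypothetical fillers that only need to match the stated counts:

```python
# Percentile-style weighting as described by s1rnight: a rating is re-expressed
# by where it sits within the user's own list, scaled back to a 1-10 range.

from collections import Counter

def weighted_rating(score, all_scores):
    counts = Counter(all_scores)
    below = sum(n for s, n in counts.items() if s < score)
    equal = counts[score]
    return (below + equal / 2) / len(all_scores) * 10

# Worked example from the quote: 25 rated shows, 22 below a 9, 3 nines.
list_a = [9] * 3 + [8] * 22        # the 22 lower scores are arbitrary filler
print(weighted_rating(9, list_a))  # -> 9.4

# The same raw 9 on a different list: 5 tens, 10 nines, 10 below a nine.
list_b = [10] * 5 + [9] * 10 + [8] * 10
print(weighted_rating(9, list_b))  # -> 6.0
```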


I was going off the assumption that it approaches a normal distribution, or at the very least that the "true" ranking system mimics such a centralized distribution.

Jappy said:
It ends up having nothing to do with the merits of either show. I never assumed users pick shows at random.


You seem to struggle with the idea of loss aversion. You may not think or assume this, but this system inherently does. It seems to assume a user is indifferent or completely uninformed when picking a show to watch; it's in the way it's calculated. Just as a negative skew would imply a ratings distribution that mimics one with a higher mean, and a positive skew would imply ratings clustered around a lower mean, weighting in this very specific way skews all results as if each list were expected to fit a specific distribution, regardless of what the "actual" distribution currently is.
Feb 28, 2016 5:30 AM

Offline
May 2013
1289
Mint said:


@Eucli
There's one thing I will say: I'm not going to appeal to the casual viewer base or casual viewers in general, since this is indeed an attempt at a more critically evaluative rating system. There are much more deep-rooted issues in this proposed system beyond marginalizing the rating impact of the "casual" viewer.


I agree. I was kinda fixated on that when talking about context, so I built my argument around it.
I should have gone deeper, but you pretty much covered more things than I could have said, so good post!

You seem to struggle with the idea of loss aversion. You may not think or assume this, but this system inherently does. It seems to assume a user is indifferent or completely uninformed when picking a show to watch; it's in the way it's calculated. Just as a negative skew would imply a ratings distribution that mimics one with a higher mean, and a positive skew would imply ratings clustered around a lower mean, weighting in this very specific way skews all results as if each list were expected to fit a specific distribution, regardless of what the "actual" distribution currently is.


Also, this one is a nice summary. I mentioned earlier that this system assumes the user picks anime like an RNG machine, but this explains it way better.
Eucli, Feb 28, 2016 5:33 AM
Feb 28, 2016 6:06 AM
Offline
Apr 2012
11
@Mint

As far as hypothetical scenarios go, that is truly out there. In a scenario like this the consequences should be considered: all three shows would gain exposure through word of mouth, with show Z having the benefit of also being first in the rankings. All three shows end up approaching whatever their value is. Keep in mind that a show being '7.5' means nothing in the context of a weighted system; the highest show on the site is currently LoGH at 'only' 8.88. Also, the term 'perfect rater'... I hope you mean it in the sense of how they use the rating scale and not 'they have perfect judgement'.

That is not 'my vision' of a weighted system. Personally, I'm not tying myself down to any one implementation of a weighted system. It was only a visual representation of what Max had posted at the time which turned out to be wrong.

And yes, I've reread what you've said about loss aversion a few times and don't quite get it.
Feb 28, 2016 7:06 AM

Offline
Oct 2014
617
Jappy said:
@Mint

As far as hypothetical scenarios go, that is truly out there. In a scenario like this the consequences should be considered: all three shows would gain exposure through word of mouth, with show Z having the benefit of also being first in the rankings. All three shows end up approaching whatever their value is. Keep in mind that a show being '7.5' means nothing in the context of a weighted system; the highest show on the site is currently LoGH at 'only' 8.88. Also, the term 'perfect rater'... I hope you mean it in the sense of how they use the rating scale and not 'they have perfect judgement'.

The entire purpose of the hypothetical was to demonstrate that the mere addition of "casual" raters (I'll just use the term unoptimal from now on, since I dislike this label) drops ratings in an illogical manner. 7.5 doesn't mean much, but what it does mean in a ranking system is that the show is ranked lower than 7.6, 7.7, 8 and 10, since the score is the only metric of evaluation.

That's a good point to realize too (as I've pointed out already in almost all of my posts here): the "perfect rater" is simply one who has a mean of 5 and a centralized distribution of ratings. Obviously "the perfect rater" here is a term defined on the basis of the system and, to an extent, used ironically. Any weighted system, unless adjusted upwards in some manner, will have all scores drop, so of course the static value holds little meaning.
Jappy said:

That is not 'my vision' of a weighted system. Personally, I'm not tying myself down to any one implementation of a weighted system. It was only a visual representation of what Max had posted at the time which turned out to be wrong.

Yeah, and it was wrong and inaccurate, and there's no evidence to suggest s1rnight's version is more "accurate" than the current system. I'm arguing based on fundamental mathematical evaluation here, which, like I said, is what people here fallaciously use as a basis of logic rather than analyzing what such a system entails.
Jappy said:

And yes, I've reread what you've said about loss aversion a few times and don't quite get it.

I'll forgive you for that, but here, I'll help you out. Loss aversion is a behavioural theory that arose from the failed assumption that human beings value gains and losses equally. Rather than get into the details, the conclusion was this: losses and gains aren't valued equally. Example: Offer A: you have 50% chance to lose $10, 50% chance to gain $20. So your average payout is $5. B: you have 50% chance nothing happens, 50% chance to gain $8. This gives you an average payout of $4. If people were indifferent to losses, A would always be picked over B. In reality, this is not the case.
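
The two offers as plain expected values; loss aversion is the observation that many people still prefer B despite A's higher average payout:

```python
# Expected payouts of the two offers described above.
offer_a = 0.5 * (-10) + 0.5 * 20  # -> 5.0
offer_b = 0.5 * 0 + 0.5 * 8       # -> 4.0
print(offer_a, offer_b)
```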

Translating this to time: not every person is going to pick shows indifferently (let's just assume for convenience that s1rnight's proposed distribution mimics the "true" rating distribution). Most people are constrained by time and will most likely invest time in deciding whether or not they should watch a show, based on the wealth of preview information available. This suggests a negative skew in ratings for this populace. That stays true even if someone wants to watch something less than good just for the sake of it, as we can assume this happens in a small or insignificant proportion of cases.
Feb 28, 2016 8:26 AM
Offline
Apr 2012
11
I understand the theory itself; I just wasn't making the connection. But isn't it just what Eucli mentioned and I acknowledged in posts #214/215?

As for what Max posted, it's not as if it has no reasoning behind it; it's not just maths for the sake of it. You can argue that those with an unusually high or low mean should not have as much of a say; this, coupled with the belief at the time that the rather promising initial results of the weighted system came from what Max posted, led me to defend it. I still haven't completely given up on the idea of treating outliers specially, either as a part of or as the base for the weighting, although I don't have as much confidence since I can't see results anywhere. As I've already said, what is actually in place also has a logic behind it, although both fall prey to the concept of 'loss aversion', if I'm on the same page as you now and understanding correctly. However, wouldn't that be the case with any weighted system? Regardless of the logic behind it, the outliers/special cases will always generate some wonky or unfavourable results, yet I'm skeptical that this is enough of a reason to put down any suggestion to implement one. It's quite reasonable that if you try to modify whatever formula you have to accommodate these users, it will end up breaking other things. I'm not so eager to jump to a conclusion, though. To me, it becomes a matter of opinion, and thus you weigh the pros and cons; given the initial results, I'll side with a weighted one. No system is going to be perfect.
Jappy, Feb 28, 2016 9:51 AM
Mar 1, 2016 2:19 PM

Offline
Jan 2013
730
The point of the "All Anime" ranking is to show the most popular shows based on their scores, and number of voters, at the given time x. While x is the moment you click on it. Popular new shows that make the top, simply are popular at that given time.

Adjusting the scores as suggested here would be similar to changing the scoring system, which has been waived numerous times. Some think a binary system is fine, others a five-point scale, and so on. MAL will stick to a 1-10 rating system and will base its ranking accordingly.

As already mentioned by others there are several issues with this suggestion.

The first issue being that people tend to watch shows they like. They will watch recommendations based on the shows they like, and thus rate these shows in a similarly positive manner.

People with around 100 entries (which is a large majority of MAL's user base) will most likely not have seen a lot of shows they would actually rate a 1. The more shows you see, the closer you get to an average of 5. The 5 average is the "optimal" average here, but the true average tends to be closer to 6. So the suggestion itself panders more to the few than to the many, which is not the point of the ranking.

Using the mean score also means that a show's score would depend on the other shows in a list. This skews rankings even at the micro level, which in the case below doesn't make much sense.

For example:

User A (Mean Score 10): HxH (10), Death Note (10), FMA:B (9), ...

-> HxH (~5), Death Note (~5), FMA:B (~5), ...

User B (Mean Score 10): HxH (10), Death Note (8), FMA:B (7), ...

-> HxH (~5), Death Note (~4), FMA:B (~4), ...

User C (Mean Score 5): HxH (5), Death Note (6), FMA:B (7), ...

-> HxH (~5), Death Note (~6), FMA:B (~7), ...

With the resulting ranks:

1. FMA:B (~16)
2. Death Note (~15)
2. HxH (~15)


Whereas the original ranking would be:

1. HxH (25)
2. Death Note (24)
3. FMA:B (23)
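
A sketch reproducing this micro-level example. The exact transform isn't spelled out above, so scaling each rating by 5 / (the user's mean score) and rounding half up is an inference from the ~ values shown:

```python
# Reproduces the User A/B/C example under an assumed mean-based weighting:
# each rating is scaled by 5 / (that user's mean score), then rounded half up.
import math

users = [
    {"mean": 10, "scores": {"HxH": 10, "Death Note": 10, "FMA:B": 9}},  # User A
    {"mean": 10, "scores": {"HxH": 10, "Death Note": 8, "FMA:B": 7}},   # User B
    {"mean": 5, "scores": {"HxH": 5, "Death Note": 6, "FMA:B": 7}},     # User C
]

weighted, original = {}, {}
for user in users:
    for show, score in user["scores"].items():
        adjusted = math.floor(score * 5 / user["mean"] + 0.5)  # round half up
        weighted[show] = weighted.get(show, 0) + adjusted
        original[show] = original.get(show, 0) + score

print(sorted(weighted.items(), key=lambda kv: -kv[1]))
# [('FMA:B', 16), ('HxH', 15), ('Death Note', 15)]
print(sorted(original.items(), key=lambda kv: -kv[1]))
# [('HxH', 25), ('Death Note', 24), ('FMA:B', 23)]
```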


The suggestion depends too heavily on a diverse scoring ladder, which simply isn't the natural scoring pattern of users. While the flaw is apparent at the micro level in my example, at the macro level you probably won't see a huge negative impact within the top 100-200 ranked shows, but it becomes more apparent with middle- or low-ranking shows.

At the time of writing this:

Sword Art Online, Ranked 551 on MAL
Sword Art Online, Ranked 10508 on s1rnight's site

Probably to the amusement of some, SAO, an easily recommended series for beginners and #2 in popularity on MAL, sees a drop of nearly 10,000 ranks. Essentially, this is due to exactly its nature as a "beginner show" (meaning its 10s are being evaluated as ~5s) and to not having much backing from the users with averages around 5 (due to being polarizing).

Any adjustment to the formula will produce similar results as long as it depends on the mean score, hence the suggestion is waived.
_Ghost_, Mar 31, 2016 4:38 AM