Suggestion: don't show the number of shared anime and affinity for private lists
An in-depth explantation (and code) can be found here:
https://github.com/stephen-huan/MAL-affinity-attack
The basic idea is that knowing the number of shared anime and the affinity
against a user with a private list leaks information about their list. For
example, one could systematically test whether an anime is in the private
list by creating a list with only that anime, and then check the number of
shared anime. If it's 1, then the anime is in the list, otherwise it isn't.
This seems very inefficient, but there's ways to make it better in practice
--- one doesn't need to go through all the 17,000+ anime in the MAL database,
they can take e.g. the top 1000 which is probably good enough. Also there's a
more efficient binary tree algorithm discussed in the above link that further
reduces the number of trials, especially for small lists.
The more dangerous attack is once the anime in the private list is known,
the ratings for each anime can be determined very efficiently: the number of
affinity checks it takes is just the number of anime in the private list.
This is extremely reasonable and can even be done by hand for small lists.
Thus, there exists a two-phase attack:
1. Determine the anime in a private list with the number of shared anime
2. Determine the ratings of each anime using the affinity
This completely determines a private list and can actually be
done in practice, see the code for a simulation of the situation.
One counter-argument: the statistics take a long time to update, on the
order of ~20 minutes. First, that is just a constant factor difference,
the asymptotic efficiency of the algorithm is linear (e.g. doubling the
private list size just doubles the time to find out all the ratings). All
the ratings in a private list of size 300 (roughly the average list size)
could still be found out in 300/3 = 100 hours, or about 4 days which is
not unreasonable. Second, the algorithm easily parallelizes, which means
each query can be done simultaneously and therefore the attacker only
needs to wait for 20 minutes total.
The ratings attack directly attacks the forum post
"Private ratings but a public list" which suggests implementing the
feature that anime is public but their ratings are private:
https://myanimelist.net/forum/?topicid=1530213
Because the ratings attack is far more efficient than the anime determination
attack, such a feature would be useless in the eyes of an attacker.
The most similar forum post to this one would probably be
"Private anime/manga lists not really private" which discusses how
showing 3 random picks from the private list leaks information:
https://myanimelist.net/forum/?topicid=1204325
This is a slight variation on the "coupon collector's problem":
https://en.wikipedia.org/wiki/Coupon_collector%27s_problem
Even for just 1 random anime shown it would take on average n log n refreshes,
so an expected 385 ln 385 ~= 2,300 refreshes. My ratings attack is n refreshes,
so it is even more efficient than this exploit in terms of refreshes.
Finally, the last forum post I could find that is similar would be
"Private lists are not really private" which describes a direct API exploit:
https://myanimelist.net/forum/?topicid=1525076
Lastly: I don't use private lists and I don't know anyone who does. I don't
have any personal investment in this, I just think that private lists should be
private. This is an extremely minor change and unlikely to affect anyone ---
I doubt anyone has taken advantage of this attack before. But I felt that if
I discovered the attack, I should publicize the fact such an attack exists. |