The goal of this assignment was to test the hypothesis that players who are killed in the game by cheaters become more likely to adopt cheating themselves. In other words, the aim was to test whether cheating is contagious.
We tested the hypothesis by analysing gameplay data gathered from the online game PlayerUnknown's Battlegrounds. The game is a player versus player shooter game in which up to one hundred players fight in a battle royale, a type of large-scale last man standing death-match where players fight to remain the last alive. Players are able to cheat by adopting unapproved software that gives them an advantage, for example being able to see through walls or getting guns that aim automatically.
To test the hypothesis, we counted how many victims of cheaters become cheaters themselves within a certain period of time. In other words, we counted how many victim-cheater motifs we observe. We then simulated alternative universes, in which the players played the same games in the same sequence but happened to be killed by someone else. Finally, we compared the number of victim-cheater motifs observed in the real world to the number observed in the "randomised" worlds. If cheating actually is contagious, we would expect the number of victim-cheater motifs to be higher in the real world.
Games = Networks
Imagine each game as a directed network, in which the players are the nodes, and a kill between two players constitutes an edge, from killer to victim. If cheating is contagious, we expect victims of cheaters to become cheaters more often than players that were never killed by a cheater. 
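As a rough illustration (not the code submitted for the assignment), one match could be stored as a directed network using only a plain dictionary. The player IDs and the `build_kill_network` helper are hypothetical names chosen for this sketch.

```python
# Hypothetical kill records for one match: (killer_id, victim_id).
kills = [("p1", "p2"), ("p1", "p3"), ("p4", "p5"), ("p1", "p4")]


def build_kill_network(kills):
    """Return an adjacency dict: killer -> list of victims (edges killer -> victim)."""
    network = {}
    for killer, victim in kills:
        # setdefault creates the victim list the first time a killer appears.
        network.setdefault(killer, []).append(victim)
    return network


network = build_kill_network(kills)
print(network)  # {'p1': ['p2', 'p3', 'p4'], 'p4': ['p5']}
```

Each key is a node with outgoing edges, and each list entry is the node an edge points to, so a full match would simply have up to 100 distinct player IDs across keys and values.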
Thus, we count how often victims of cheaters become cheaters themselves within a certain period of time. In the example below, players 2-5 are victims of a cheater, player 1. In the next game, two of the victims, player 3 and player 5, become cheaters themselves. 
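The counting step could be sketched roughly as follows. This is an illustration under stated assumptions, not the submitted solution: the record layout `(killer, victim, kill_time)`, the `cheaters` dictionary mapping a player ID to `(start, ban)` datetimes, the `window_days` parameter, and all IDs and dates are made up for the example.

```python
from datetime import datetime, timedelta


def count_victim_cheater_motifs(kills, cheaters, window_days=5):
    """Count distinct victims who start cheating within the window after
    being killed by a player who was cheating at the time of the kill."""
    window = timedelta(days=window_days)
    converted = set()
    for killer, victim, kill_time in kills:
        # Both players must appear in the cheater records to form a motif.
        if killer not in cheaters or victim not in cheaters:
            continue
        killer_start, killer_ban = cheaters[killer]
        victim_start, _ = cheaters[victim]
        killer_was_cheating = killer_start <= kill_time <= killer_ban
        victim_started_after = kill_time < victim_start <= kill_time + window
        if killer_was_cheating and victim_started_after:
            converted.add(victim)
    return len(converted)


# Toy data (hypothetical): player -> (estimated cheating start, ban date).
cheaters = {
    "p1": (datetime(2019, 3, 1), datetime(2019, 3, 10)),
    "p3": (datetime(2019, 3, 4), datetime(2019, 3, 8)),
    "p5": (datetime(2019, 3, 20), datetime(2019, 3, 25)),
}
kills = [
    ("p1", "p2", datetime(2019, 3, 2, 12, 0)),
    ("p1", "p3", datetime(2019, 3, 2, 12, 5)),
    ("p1", "p5", datetime(2019, 3, 2, 12, 10)),
    ("p4", "p3", datetime(2019, 3, 3, 9, 0)),
]

print(count_victim_cheater_motifs(kills, cheaters, window_days=5))   # 1
print(count_victim_cheater_motifs(kills, cheaters, window_days=30))  # 2
```

Collecting victims in a set means a player killed by several cheaters is counted once, and the dictionary lookups keep the pass over the kills linear in the number of kills.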
Then, we simulated alternative universes in which the players played the same games in the same sequence but happened to be killed by someone else. In the example below, we would expect that, in the original network, players 7-10 will have a higher likelihood of cheating in the next games. In the randomised network, we would "expect" that players 8, 11, 15 and 16 are more likely to cheat in upcoming games.
If cheating does spread through contact with cheating opponents, the count of victims that become cheaters in the empirical network should be significantly higher than that in a suitably randomised network. 
In other words, we expect the true victims of a cheater, players 7-10, to become cheaters more often in upcoming games than the "fake" victims of cheaters, players 11, 15 and 16. However, if both the true and the fake victims of cheaters become cheaters at the same rate, there is no evidence for social contagion of cheating.
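The comparison between the real world and the randomised worlds can be summarised with a few lines of plain Python. The sketch below assumes the motif counts have already been computed; the function name `summarise_randomisation` is hypothetical, the numbers are toy values rather than results from the assignment, and the `+1` correction in the empirical p-value is one common convention, not necessarily the one used here.

```python
def summarise_randomisation(observed, randomised_counts):
    """Return the mean randomised count and a one-sided empirical p-value:
    the share of randomised worlds with at least as many motifs as observed."""
    n = len(randomised_counts)
    mean = sum(randomised_counts) / n
    at_least_as_high = sum(1 for count in randomised_counts if count >= observed)
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    p_value = (at_least_as_high + 1) / (n + 1)
    return mean, p_value


# Toy numbers for illustration only.
observed = 12
randomised_counts = [10, 14, 11, 13, 12, 9, 15, 12, 11, 13]

mean, p_value = summarise_randomisation(observed, randomised_counts)
print(mean)               # 12.0
print(round(p_value, 3))  # 0.636
```

A large p-value like the one in this toy example would mean the observed count is typical of the randomised worlds, which is the "no evidence for contagion" case described above.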
We analysed a dataset that contained 6'000 matches, in which over 500'000 kills were made. Each match can consist of up to 100 players, so one has to imagine the networks above with up to 100 nodes. 
In a separate dataset, we got the player account ID of cheaters, the estimated date when the player started cheating, and the date when the player's account was banned due to cheating.
Apart from testing the hypothesis, the goal of this assignment was to write legible, modular, and optimised Python code. We were only allowed to use fundamental Python data types (lists, tuples, dictionaries, NumPy arrays etc.) and were banned from advanced data querying and data analysis packages (including Pandas). Furthermore, we had to think about how to create "suitably" randomised networks. For example, the games must be randomised in such a way that a player cannot kill after they have already been killed.
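One possible way to satisfy that constraint, shown here only as a sketch and not necessarily the scheme used in the assignment, is to shuffle player IDs within a match while leaving the structure and timing of the kills untouched. Because every player is relabelled consistently, nobody can end up killing after their own death. Keeping cheaters in place is an extra assumption of this example, and the function names and data are hypothetical.

```python
import random


def randomise_match(kills, cheaters, rng=random):
    """Relabel the non-cheating players of one match with a random permutation.

    kills: list of (killer, victim, time) tuples for a single match.
    cheaters: set (or dict) of cheater IDs, which keep their positions.
    """
    players, seen = [], set()
    for killer, victim, _ in kills:
        for player in (killer, victim):
            if player not in cheaters and player not in seen:
                seen.add(player)
                players.append(player)
    shuffled = players[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(players, shuffled))
    # Cheaters are not in the mapping, so .get() leaves them unchanged.
    return [(mapping.get(k, k), mapping.get(v, v), t) for k, v, t in kills]


def no_kills_after_death(kills):
    """Check that no player makes a kill after having been killed."""
    dead = set()
    for killer, victim, _ in sorted(kills, key=lambda kill: kill[2]):
        if killer in dead:
            return False
        dead.add(victim)
    return True


# Toy match: "c1" is a cheater; times are seconds since the match started.
match = [("c1", "a", 1), ("b", "c", 2), ("c1", "b", 3), ("d", "e", 4)]
randomised = randomise_match(match, {"c1"}, rng=random.Random(0))

print(no_kills_after_death(match))       # True
print(no_kills_after_death(randomised))  # True
```

Passing a seeded `random.Random` instance makes a randomised world reproducible, which helps when the procedure is repeated many times to build up the comparison distribution.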
I found no evidence for the contagion of cheating in the data provided. This result is in line with the paper that was published using this dataset: the researchers found that social contagion is only likely to exist for those players who both experience and observe cheating multiple times.
My code received full marks for order-of-growth optimisation and legibility. As for modularity, I only got points deducted because I saved all my functions in one .py file instead of splitting them across several files. 
Overall, my code got 83 out of 100 possible points (with a class average of 63 points).
Unfortunately, I am again not allowed to share my code for this assignment, as the school might want to reuse the assignment in the future. 
