In the last part I argued that lumping all data together would diminish the signal we get. Statistical tools allow us to look at a selection of data and generate a p-value that gives us an idea of how significant a finding is.
Most of you will know the famous quote: "Lies, damned lies, and statistics."
Slicing and dicing data to find a significant p-value is rightly frowned upon by anyone doing proper statistical work. To be truly valid, a hypothesis should be formulated before the experiment.
Slicing and dicing data gives you multiple chances to find a significant result, so you need to correct the p-value accordingly: with a Bonferroni correction, for example, testing twelve slices shrinks the usual 0.05 threshold to 0.05/12 ≈ 0.004 per slice. Relying on extreme data points is equally fraught with danger, as you risk basing your findings on possibly false data.
I’m putting all of this out here as a disclaimer. I’m not blind or oblivious to these issues. The data I had access to is flawed. With hindsight, the experiment that gathered the data is flawed too. I therefore can’t just look at a p-value < 0.05 and call it a day. But p-values are still a powerful tool if I put them into context.
So let us start with the most unlucky players. This was some work I did very early on, when I didn’t even have an idea of what had happened; I was mainly looking for a rate that seemed sensible. I define the most unlucky player as the player who needed to check the highest number of Pokemon to find x shiny. I then take the number of shiny found and calculate the p-value associated with this data point.
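To make this concrete, here is a minimal sketch of that calculation, assuming Python with SciPy. For an unlucky player the p-value is the lower-tail binomial probability: the chance of finding *at most* that many shiny in that many checks, given a hypothesised shiny rate.

```python
from scipy.stats import binom

def unlucky_p_value(shiny_found: int, checked: int, rate: float) -> float:
    """P-value for an 'unlucky' player: the probability of finding at most
    this many shiny in this many checks, treating each check as an
    independent trial with the hypothesised shiny rate."""
    return binom.cdf(shiny_found, checked, rate)

# First row of the table below: 20 shiny in 850 checks.
for rate in (1/25, 1/50, 1/100, 1/150):
    print(f"1 in {1/rate:.0f}: p = {unlucky_p_value(20, 850, rate):.3f}")
# Matches the first table row: 0.006, 0.808, 1.000, 1.000
```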
| Shiny found | Pokemon checked | Out of n players | p-value (1 in 25) | p-value (1 in 50) | p-value (1 in 100) | p-value (1 in 150) |
|---|---|---|---|---|---|---|
| 20 | 850 | 1 | 0.006 | 0.808 | 1.000 | 1.000 |
| 11 | 821 | 1 | 0.000 | 0.105 | 0.873 | 0.990 |
| 9 | 1000 | 5 | 0.000 | 0.005 | 0.457 | 0.863 |
| 8 | 680 | 3 | 0.000 | 0.073 | 0.756 | 0.959 |
| 7 | 850 | 7 | 0.000 | 0.005 | 0.385 | 0.789 |
| 6 | 1197 | 10 | 0.000 | 0.000 | 0.046 | 0.315 |
| 5 | 2062 | 19 | 0.000 | 0.000 | 0.000 | 0.006 |
| 4 | 900 | 21 | 0.000 | 0.000 | 0.054 | 0.284 |
| 3 | 1100 | 26 | 0.000 | 0.000 | 0.005 | 0.065 |
| 2 | 992 | 56 | 0.000 | 0.000 | 0.003 | 0.039 |
| 1 | 800 | 79 | 0.000 | 0.000 | 0.003 | 0.030 |
| 0 | 750 | 108 | 0.000 | 0.000 | 0.0005 | 0.007 |
Looking at this data we can clearly rule out drop rates of 1 in 25 and 1 in 50. The poor player who reported 5 shiny out of 2062 checks would have to be among the most unlucky persons who ever played. Assuming a 1-in-50 rate, the chance of this happening is 1 in 997,947,802,242 – yes, you read that correctly – just shy of 1 in a trillion. So either this data point is wrong, or our hypothesis is.
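That figure is straightforward to reproduce as a one-line check, again assuming SciPy:

```python
from scipy.stats import binom

# Chance of finding 5 or fewer shiny in 2062 checks at a 1-in-50 rate.
p = binom.cdf(5, 2062, 1/50)
print(f"1 in {1/p:,.0f}")  # on the order of 1 in 10^12, the figure quoted above
```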
And now we look at the most lucky players. I excluded the last row, as a player who found zero shiny can hardly be called lucky.
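The calculation mirrors the one for the unlucky players, except that the tail flips: a lucky player's p-value is the probability of finding *at least* that many shiny. A sketch, again assuming SciPy:

```python
from scipy.stats import binom

def lucky_p_value(shiny_found: int, checked: int, rate: float) -> float:
    """P-value for a 'lucky' player: the probability of finding at least
    this many shiny in this many checks."""
    return binom.sf(shiny_found - 1, checked, rate)  # sf(k) = P(X > k)

# First row of the table below: 20 shiny in 850 checks at a 1-in-25 rate.
print(round(lucky_p_value(20, 850, 1/25), 3))  # ≈ 0.994
```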
| Shiny found | Pokemon checked | Out of n players | p-value (1 in 25) | p-value (1 in 50) | p-value (1 in 100) | p-value (1 in 150) |
|---|---|---|---|---|---|---|
| 20 | 850 | 1 | 0.994 | 0.192 | 0.000 | 0.000 |
| 11 | 824 | 1 | 1.000 | 0.897 | 0.129 | 0.011 |
| 9 | 182 | 5 | 0.195 | 0.004 | 0.000 | 0.000 |
| 8 | 240 | 3 | 0.624 | 0.054 | 0.001 | 0.000 |
| 7 | 150 | 7 | 0.253 | 0.011 | 0.000 | 0.000 |
| 6 | 200 | 10 | 0.692 | 0.109 | 0.004 | 0.000 |
| 5 | 140 | 19 | 0.490 | 0.063 | 0.003 | 0.000 |
| 4 | 130 | 21 | 0.598 | 0.121 | 0.010 | 0.002 |
| 3 | 39 | 26 | 0.069 | 0.008 | 0.001 | 0.000 |
| 2 | 32 | 56 | 0.135 | 0.026 | 0.004 | 0.001 |
| 1 | 7 | 79 | 0.029 | 0.008 | 0.002 | 0.001 |
These values clearly rule out any shiny rate rarer than 1 in 50. The data is less extreme, but the 20 in 850 and the 9 in 182 still correspond to a 1 in 1.9 million chance.
We now have a problem. If our data is correct, then the shiny rate can be neither high nor low: the unlucky players rule out the common rates (1 in 25, 1 in 50), while the lucky players rule out the rare ones (1 in 100, 1 in 150).
Some of you will point out that this is just false data. After all, this was survey data, so some deliberately false reports and some bias are to be expected. And some of that has surely happened.
But just declaring the data invalid when your hypothesis doesn’t fit is lazy. There is another reason why I believe at least most of the data is correct: the SilphRoad Research team found a very similar overall drop rate. We regard the SilphRoad Research data as the gold standard and tend to trust it. Their data also showed the same patterns as above – some researchers who seemed ‘too lucky’ and some who seemed ‘too unlucky’.