Counting, Part 1

I am tquackperson.

What if you could make a mistake while counting?

It’s an unfortunate reality. We aren’t perfect counters.

Suppose you start counting at 1. You keep counting until you make a mistake. When you make a mistake, you write down the last number you got right. This is your score. Then, you start over.

You repeat this 76 times before getting bored. The scores you wrote down were:

249 331 17 36 29 104 118 301 47 11
88 461 8 12 51 12 201 13 9 97
383 1 3 6 100 194 6 78 5 56
33 67 9 5 12 3 30 1 4 150
430 12 1 1 248 13 27 7 5 239
456 17 43 12 5 2 3 10 1 17
2 15 13 2 1 147 81 3 1 186
29 60 95 66 90 118

Being mathematically minded, you wonder how you can model this. Assume that you have some fixed chance $p$ of making a mistake each time you count. Each time you play this counting game, let the random variable $C$ denote your score for that game. Then, $C \sim \mathrm{NB}(1, p)$, meaning that $C$ follows the negative binomial distribution: it counts the number of correct counts before the first mistake, so $P(C = k) = p (1 - p)^k$. Great! Now we just need to find a good value for $p$ based on the score data.
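To get a feel for this model, here is a minimal simulation sketch in Python (the function name `play_counting_game` is my own, not from the post): each count is wrong independently with probability $p$, and the score is the number of correct counts before the first mistake.

```python
import random

def play_counting_game(p):
    """Play one game: count upward until the first mistake.

    Each count is wrong independently with probability p; the score
    is the last number counted correctly (0 if the first count fails).
    """
    score = 0
    while random.random() >= p:  # with probability 1 - p, this count is correct
        score += 1
    return score
```

For small $p$, scores like the ones above (a few huge runs mixed with many tiny ones) are exactly what this process produces.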

We’ll use maximum likelihood estimation for this task. Given our observations $C_1$, $C_2$, $\ldots$, $C_n$ (where $n = 76$ in this case), what is the value $\hat{p}$ that maximizes the likelihood that the $C_i$ were all drawn from $\mathrm{NB}(1, \hat{p})$? This likelihood, which we’ll denote $L(p)$, is:

\[L(p) = \prod_{i=1}^n P(C = C_i) = \prod_{i=1}^n p (1 - p)^{C_i}\]

We wish to find:

\[\hat{p} = \argmax_p L(p)\]

Since the $\log$ function is monotonically increasing,

\[\argmax_p L(p) = \argmax_p \log L(p)\]

It will be far easier to maximize the log-likelihood. Working through the math,

\[\begin{aligned} \log L(p) &= \sum_{i=1}^n \log \left[ p (1 - p)^{C_i} \right] \\ &= \sum_{i=1}^n \left[ \log p + C_i \log (1 - p) \right] \\ &= n \log p + \log (1 - p) \sum_{i=1}^n C_i \\ &= n \log p + c \log (1 - p) \end{aligned}\]

where $c = \sum C_i$.
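This simplified form translates directly into code. A small sketch (the helper name `log_likelihood` is illustrative):

```python
import math

def log_likelihood(p, scores):
    """Compute log L(p) = n log p + c log(1 - p),
    where n = len(scores) and c = sum(scores)."""
    n, c = len(scores), sum(scores)
    return n * math.log(p) + c * math.log(1 - p)
```

This agrees term by term with summing $\log \left[ p (1 - p)^{C_i} \right]$ over the observations directly, but needs only $n$ and $c$.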

Since $\log L(p)$ is concave in $p$ on $(0, 1)$, $L(p)$ is maximized where $\frac{\partial}{\partial p} \log L(p) = 0$:

\[\begin{aligned} \frac{\partial}{\partial p} \log L(p) = \frac{n}{p} - \frac{c}{1-p} &= 0 \\ \therefore\ n (1 - p) - p c &= 0 \\ \therefore\ n &= p (n + c) \\ \therefore\ p &= \frac{n}{n + c} \end{aligned}\]

Thus, $\hat{p} = \frac{n}{n + c}$.

Remarkably, the individual scores themselves have no bearing on the probability estimate. Only the number of games played, $n$, and the total score, $c$, matter.

For our case, we can compute $\hat{p} = \frac{76}{76 + 5799} \approx 0.012936$. We have a 1.3% chance of failure!
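As a sanity check, here is a short sketch that recomputes $\hat{p}$ from the score data and confirms it beats a few nearby values of $p$ in log-likelihood (all names here are illustrative):

```python
import math

# The 76 scores recorded above.
scores = [
    249, 331, 17, 36, 29, 104, 118, 301, 47, 11,
    88, 461, 8, 12, 51, 12, 201, 13, 9, 97,
    383, 1, 3, 6, 100, 194, 6, 78, 5, 56,
    33, 67, 9, 5, 12, 3, 30, 1, 4, 150,
    430, 12, 1, 1, 248, 13, 27, 7, 5, 239,
    456, 17, 43, 12, 5, 2, 3, 10, 1, 17,
    2, 15, 13, 2, 1, 147, 81, 3, 1, 186,
    29, 60, 95, 66, 90, 118,
]

n, c = len(scores), sum(scores)
p_hat = n / (n + c)
print(n, c, round(p_hat, 6))  # 76 5799 0.012936

def log_L(p):
    """log L(p) = n log p + c log(1 - p)."""
    return n * math.log(p) + c * math.log(1 - p)

# The MLE should score at least as well as any nearby candidate.
assert all(log_L(p_hat) >= log_L(p) for p in (0.005, 0.01, 0.02, 0.05))
```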

