Dvd Avins (**barking_iguana**) wrote 2016-07-07 04:24 pm

### (no subject)

Let's put that question I asked in May in plain language.

I have a bag of unfair coins. Some will come up heads half the time, some less, some more. I don't know if they're wildly spread out, usually stay between, say, 45% and 55%, or what.

I take a good handful of the coins and I flip each of them several times. I'm not systematic, though. So coins don't all have the same number of flips. But at least for each coin I flipped, I know how many heads and how many tails I got.

I want to know just how bad the whole bunch of coins is. Or at least, the best guess I can make from the data I have. If I took the whole bag and flipped every coin millions of times, I could easily calculate the standard deviation of heads probability among the coins. But what's the best I can do from the limited information I actually have?
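To make the question concrete, here is a minimal Python simulation of the setup. The normal distribution of coin probabilities, the function names, and all parameter values are hypothetical choices for illustration. The post deliberately leaves the spread unknown.

```python
import random

def draw_coins(num_coins, mean, sd, seed=None):
    """Hypothetical handful of coins: each coin's true heads probability is
    drawn from a normal(mean, sd) clipped to [0, 1] (an assumed distribution)."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(mean, sd))) for _ in range(num_coins)]

def flip_unsystematically(coins, min_flips, max_flips, seed=None):
    """Flip each coin a random number of times and keep only (heads, flips),
    which is all the data the post says is available."""
    rng = random.Random(seed)
    observations = []
    for p in coins:
        flips = rng.randint(min_flips, max_flips)
        heads = sum(rng.random() < p for _ in range(flips))  # count of heads
        observations.append((heads, flips))
    return observations

handful = draw_coins(num_coins=40, mean=0.5, sd=0.05, seed=1)
observations = flip_unsystematically(handful, min_flips=5, max_flips=30, seed=2)
print(observations[:5])
```

Here the true spread is 0.05 by construction; the question is how well that number can be recovered from `observations` alone.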

## no subject

markgritter.livejournal.com 2016-07-09 04:47 pm (UTC)

The set of coins' true probabilities has a mean and variance. For purposes of building a maximum-likelihood model, we need to assume some distribution or family of distributions, so assume a normal distribution. (This step is already sketchy because we know the values are bounded between 0 and 1, and thus the variance is maximized at 1/4, so maybe research would show a better starting point.)
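The bound of 1/4 follows in one line. Write μ for the mean of the coins' probabilities p; since every p lies in [0, 1], p² ≤ p:

$$
\operatorname{Var}(p) = E[p^2] - E[p]^2 \le E[p] - E[p]^2 = \mu(1-\mu) \le \tfrac{1}{4}
$$

Equality needs μ = 1/2 and p² = p for every coin, i.e. every coin is either all-heads or all-tails.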

Given model parameters M = ( mean probability of heads, variance ) we can then calculate a likelihood score on your observations. (Normally Theta is used for M in the literature.)

We need to work with density functions f( X | M ). Fortunately your example is discrete, so f is really a probability mass function: its value is simply the probability of the outcome. The notation means: given an event X (a measurement of one particular coin, i.e. its count of heads and tails), f( X | M ) is the probability of that outcome under a particular choice of model parameters M.

So, for a trivial example, say M = ( 0, 0 ), meaning every coin always lands tails. Then f( zero heads in 3 samples | M ) = 1.

A less trivial example: say M = ( 0.5, 0 ), meaning every coin is fair. Then f( zero heads in 3 samples | M ) = (0.5)^3 = 1/8.
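With zero variance every coin has the same heads probability p, so f( X | M ) is just the binomial probability of the observed count. A minimal sketch (the function name is hypothetical) that reproduces both examples:

```python
from math import comb

def f_fixed_p(heads, flips, p):
    """f(X | M) for a zero-variance model M = (p, 0): every coin has heads
    probability exactly p, and X is summarized as (heads, flips)."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

print(f_fixed_p(0, 3, 0.0))  # M = (0, 0)   -> 1.0
print(f_fixed_p(0, 3, 0.5))  # M = (0.5, 0) -> 0.125
```

Summarizing X as a head count multiplies the probability of one specific sequence by `comb(flips, heads)`. That factor doesn't depend on M, so it doesn't change which M is most likely.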

I'm assuming you're going to do this numerically, so a Monte Carlo approach may be good enough: pick a bunch of coins at random from C (the bag) with distribution M, flip each of them the same number of times as in X, record the results over many trials, and use the fraction of trials that reproduce X as f( X | M ). (You might want to pick your model for C in such a way as to make calculating f( X | M ) easy.)
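A minimal sketch of that Monte Carlo estimate for a single coin. It assumes (a hypothetical choice) that the distribution in M is a normal clipped to [0, 1]; the function name and trial count are also hypothetical. It estimates one coin's f( X | M ) as the fraction of simulated coins that reproduce the observed head count:

```python
import random

def f_monte_carlo(heads, flips, mean, variance, trials=100_000, seed=0):
    """Monte Carlo estimate of f(X | M) for one coin, M = (mean, variance).
    Assumption: coin probabilities are normal(mean, variance) clipped to [0, 1]."""
    rng = random.Random(seed)
    sd = variance ** 0.5
    matches = 0
    for _ in range(trials):
        p = min(1.0, max(0.0, rng.gauss(mean, sd)))     # pick a coin from C
        h = sum(rng.random() < p for _ in range(flips))  # flip it as often as in X
        if h == heads:
            matches += 1
    return matches / trials

print(f_monte_carlo(0, 3, mean=0.5, variance=0.0))  # approximately 0.125
```

The estimate is noisy (its standard error shrinks like 1/sqrt(trials)), which is one reason to prefer a model where f( X | M ) can be computed directly.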

Then we define likelihood of the observations as

f( X1, X2, ..., XN | M ) = f( X1 | M ) * ... * f( XN | M )

though often log-likelihood is used which lets you add instead of multiply.
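Summing logs also avoids the numerical underflow you would get from multiplying many small probabilities. A minimal sketch, assuming the coins are independent; `log_likelihood` takes the per-coin f( X | M ) as a function, and the zero-variance binomial model used to exercise it is a hypothetical stand-in:

```python
import math
from math import comb

def log_likelihood(observations, f):
    """log f(X1, ..., XN | M) = sum of log f(Xi | M) over independent coins.
    `observations` is a list of (heads, flips); `f(heads, flips)` is the
    per-coin probability under one fixed M."""
    total = 0.0
    for heads, flips in observations:
        prob = f(heads, flips)
        if prob == 0.0:
            return float("-inf")  # this M cannot produce the data at all
        total += math.log(prob)
    return total

def binomial(p):
    # Hypothetical per-coin f(X | M) for M = (p, 0): every coin has probability p.
    return lambda h, n: comb(n, h) * p**h * (1 - p) ** (n - h)

obs = [(0, 3), (2, 3)]
print(log_likelihood(obs, binomial(0.5)))
```

At p = 0.5 the two factors are 1/8 and 3/8, so the result is log(3/64), about -3.06.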

Now that you know how to calculate the likelihood score, you can find the value of M which maximizes the likelihood, keeping your observations fixed. If f( X | M ) can be written in closed form, it may be possible to derive the maximizing M mathematically in terms of the X's, but in practice I think it will be too complicated to do so.

So you will have a lot of computational work to do trying out various M's to find one which has a high likelihood.
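One way to do that "trying out various M's" is a brute-force grid search. This sketch swaps the Monte Carlo estimate for deterministic numerical integration (so the likelihood surface isn't noisy), assumes a normal truncated to [0, 1], and parameterizes M by the standard deviation (variance = sd²). All names, grids, and the simulated data are hypothetical:

```python
import math
import random
from collections import Counter
from functools import lru_cache
from math import comb

GRID = 200
P_GRID = [(i + 0.5) / GRID for i in range(GRID)]  # midpoints of [0, 1]

@lru_cache(maxsize=None)
def coin_probs(heads, flips):
    """Binomial probability of the observed count at each grid value of p."""
    c = comb(flips, heads)
    return tuple(c * p**heads * (1 - p) ** (flips - heads) for p in P_GRID)

def normal_log_likelihood(observations, mean, sd):
    """Sum of log f(X | M) over coins, M = (mean, sd**2). Assumption: coin
    probabilities follow a normal(mean, sd) truncated to [0, 1]; f(X | M) is
    integrated over p with a midpoint rule on P_GRID."""
    weights = [math.exp(-0.5 * ((p - mean) / sd) ** 2) for p in P_GRID]
    total_w = sum(weights)
    ll = 0.0
    for (heads, flips), count in Counter(observations).items():
        f = sum(w * q for w, q in zip(weights, coin_probs(heads, flips))) / total_w
        ll += count * math.log(f)
    return ll

def fit_by_grid(observations, means, sds):
    """Try every (mean, sd) pair and keep the one with the highest log-likelihood."""
    return max((normal_log_likelihood(observations, m, s), m, s)
               for m in means for s in sds)

# Demo on simulated data (true mean 0.55, true sd 0.08; both hypothetical).
rng = random.Random(0)
obs = []
for _ in range(300):
    p = min(1.0, max(0.0, rng.gauss(0.55, 0.08)))
    flips = 20
    obs.append((sum(rng.random() < p for _ in range(flips)), flips))

means = [i / 100 for i in range(30, 71, 2)]  # 0.30, 0.32, ..., 0.70
sds = [i / 100 for i in range(1, 21, 2)]     # 0.01, 0.03, ..., 0.19
ll, mean_hat, sd_hat = fit_by_grid(obs, means, sds)
print(mean_hat, sd_hat)
```

Grouping identical (heads, flips) pairs with `Counter` and caching the binomial terms keeps the loop cheap; for finer answers, a numerical optimizer could replace the nested grid.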

As an example of simplifying the model for your coins, you could assume that they take only a few discrete probabilities. In a simple case, a coin has probability J of being all-heads, K of being all-tails, and 1-J-K of being fair. Then your model is M = (J, K), and the formula for f( X | M ) can be expressed simply:

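If X has all heads, then f( X | M ) = J + (1-J-K)*(0.5)^(num trials of this coin)

If X has all tails, then f( X | M ) = K + (1-J-K)*(0.5)^(num trials)

If X has a mix, then f( X | M ) = (1-J-K)*(0.5)^(num trials)

(An all-heads coin always produces all heads, while a fair coin produces any particular sequence with probability (0.5)^(num trials), so the two routes to an all-heads result add.)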
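A minimal sketch of the three-bucket model as code. A coin is exactly one of the three types, so f( X | M ) is a weighted sum over types: J·(chance an all-heads coin gives X) + K·(chance an all-tails coin gives X) + (1-J-K)·(0.5)^(num trials). The function names, grid resolution, and sample data are hypothetical:

```python
import math

def f_three_type(heads, flips, J, K):
    """f(X | M) for M = (J, K): a coin is all-heads with probability J,
    all-tails with probability K, fair with probability 1 - J - K.
    X is one particular sequence of flips >= 1 flips, summarized by its head count."""
    fair = (1 - J - K) * 0.5 ** flips  # a fair coin gives any one sequence w.p. 0.5**flips
    if heads == flips:
        return J + fair                # all-heads coin, or a fair coin that came up all heads
    if heads == 0:
        return K + fair                # all-tails coin, or a fair coin that came up all tails
    return fair                        # only a fair coin can produce a mix

def three_type_log_likelihood(observations, J, K):
    return sum(math.log(f_three_type(h, n, J, K)) for h, n in observations)

def fit_three_type(observations, steps=100):
    """Grid search over J, K in multiples of 1/steps, keeping J + K < 1 so a
    mixed sequence never has zero probability."""
    best = None
    for a in range(steps):
        for b in range(steps - a):
            J, K = a / steps, b / steps
            ll = three_type_log_likelihood(observations, J, K)
            if best is None or ll > best[0]:
                best = (ll, J, K)
    return best[1], best[2]

data = [(3, 3), (5, 5), (4, 4), (0, 4), (2, 4), (1, 3)]  # hypothetical (heads, flips)
print(fit_three_type(data))
```

Because f( X | M ) is an explicit formula here, no simulation is needed; the search space is just the triangle J + K < 1.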

Unfortunately, this approach also gets more complicated if we add more buckets.

Edited 2016-07-09 16:50 (UTC)

## no subject

markgritter.livejournal.com 2016-07-09 04:52 pm (UTC)

## no subject

markgritter.livejournal.com 2016-07-09 10:01 pm (UTC)