OK, this is out of my comfort zone, but if I had to solve this problem, here's how I'd tackle it.

The set of coins' true probabilities has a mean and a variance. For the purpose of building a maximum-likelihood model we need to assume some distribution or family of distributions, so assume a normal distribution. (This step is already sketchy: we know the values are bounded between 0 and 1, so the variance can be at most 1/4, and a normal distribution always puts some probability outside [0, 1]. Maybe research would show a better starting point.)
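The 1/4 bound takes one line to check. Writing p for a single coin's true probability of heads and μ for the mean over the set of coins (notation introduced here), the fact that 0 ≤ p ≤ 1 implies p² ≤ p, so:

```latex
\operatorname{Var}(p) = \mathbb{E}[p^2] - \mu^2 \le \mathbb{E}[p] - \mu^2 = \mu(1 - \mu) \le \tfrac{1}{4}
```

Equality holds when half the coins always land heads and half always land tails. The middle step also shows that the largest allowed variance depends on the mean, through μ(1 - μ), which is one more way a normal model with freely chosen (mean, variance) fits awkwardly.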

Given model parameters M = ( mean probability of heads, variance ), we can then calculate a likelihood score for your observations. (In the literature the parameters are normally written θ rather than M.)

We need to work with density functions f( X | M ). Fortunately your example is discrete, so the "density" is really a probability mass function: f( X | M ) is just the probability itself. The notation means: given an event X (the measured outcome for one particular coin), calculate the probability of that outcome under a particular choice of model parameters M.

So, for a trivial example, say M = ( 0, 0 ), i.e. every coin always lands tails. Then f( zero heads in 3 samples | M ) = 1.

A less trivial example: say M = ( 0.5, 0 ), so every coin is fair. Then f( zero heads in 3 samples | M ) = (0.5)^3 = 1/8.
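Both examples can be checked directly. With variance 0 every coin has the same heads probability p (the mean), so f( X | M ) is just the binomial probability of the observed head count. A minimal Python sketch, assuming X is represented as a (heads, flips) pair; the function name is hypothetical:

```python
import math


def coin_likelihood(heads, flips, p):
    """f( X | M ) for a zero-variance model: every coin has heads probability p.

    X is summarized as `heads` heads out of `flips` flips, so this is the
    binomial probability of that head count.
    """
    return math.comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


print(coin_likelihood(0, 3, 0.0))  # M = (0, 0):   1.0
print(coin_likelihood(0, 3, 0.5))  # M = (0.5, 0): 0.125
```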

I'm assuming you're going to do this numerically, so a Monte Carlo approach may be good enough: pick a bunch of coins from C at random with distribution M, flip each of them the same number of times as in X, record the results over many trials, and use the fraction of trials that match X as f( X | M ). (You might want to pick your model for C in a way that makes calculating f( X | M ) easy.)
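A minimal sketch of that Monte Carlo estimate in Python. It assumes X is summarized as a head count out of a number of flips, and that M = (mean, variance) describes a normal distribution over coin probabilities that is clipped to [0, 1] (the clipping is an assumption added here, since a normal distribution can produce values outside that range). The function name and the `trials` parameter are hypothetical.

```python
import math
import random


def mc_likelihood(heads, flips, mean, variance, trials=20000, rng=None):
    """Monte Carlo estimate of f( X | M ) for one coin.

    Assumption: coin probabilities follow Normal(mean, variance),
    clipped to [0, 1] so every draw is a valid probability.
    """
    if rng is None:
        rng = random.Random()
    sd = math.sqrt(variance)
    matches = 0
    for _ in range(trials):
        # Draw one coin's true heads probability from the model.
        p = min(1.0, max(0.0, rng.gauss(mean, sd)))
        # Flip it the same number of times as in the observation.
        h = sum(rng.random() < p for _ in range(flips))
        if h == heads:
            matches += 1
    # Fraction of simulated coins that reproduced the observation.
    return matches / trials


print(mc_likelihood(0, 3, 0.5, 0.0, rng=random.Random(1)))  # close to 1/8
```

The estimate is noisy, so when comparing nearby values of M it helps to reuse the same random seed for every candidate or to increase `trials`.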

Then, assuming the coins are independent, we define the likelihood of the observations as

f( X1, X2, ..., XN | M ) = f( X1 | M ) * ... * f( XN | M )

though in practice the log-likelihood is usually used, which turns the product into a sum (and avoids numerical underflow when N is large).
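A minimal sketch of the log-likelihood in Python, assuming each observation X is a (heads, flips) pair and that a per-coin likelihood function for a fixed M is passed in (both the representation and the function names are hypothetical). The example also shows why the log form matters: multiplying 400 probabilities of 1/8 underflows to zero in double precision, while the sum of logs stays finite.

```python
import math


def log_likelihood(observations, f):
    """Sum of log f( X_i | M ) over independent observations.

    `f(heads, flips)` is the per-coin likelihood for one fixed M.
    """
    total = 0.0
    for heads, flips in observations:
        prob = f(heads, flips)
        if prob == 0.0:
            return -math.inf  # an impossible observation rules this M out
        total += math.log(prob)
    return total


def fair(heads, flips):
    """f( X | M ) when every coin is fair, i.e. M = (0.5, 0)."""
    return math.comb(flips, heads) * 0.5**flips


obs = [(0, 3)] * 400
print(math.prod(fair(h, n) for h, n in obs))  # 0.0: the product underflows
print(log_likelihood(obs, fair))              # about -831.78
```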

Now that you know how to calculate the likelihood score, you can find the value of M that maximizes the likelihood, keeping your observations fixed. If f( X | M ) has a closed form, it may be possible to derive the maximizing M analytically in terms of the X's, but in practice I think that will be too complicated.

So you will have a lot of computational work to do, trying out various values of M to find one with a high likelihood.

As an example of simplifying the model for your coins, you could assume that their probabilities take only a few discrete values. In a simple case, each coin is all-heads with probability J, all-tails with probability K, and fair with probability 1-J-K. Then your model is M = (J, K), and the formula for f( X | M ) can be expressed simply:

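If X is all heads, then f( X | M ) = J + (1-J-K)*(0.5)^n, where n is the number of flips of this coin (an all-heads coin always produces this outcome; a fair coin produces it with probability (0.5)^n).

If X is all tails, then f( X | M ) = K + (1-J-K)*(0.5)^n

If X is a mix, then f( X | M ) = (1-J-K)*(0.5)^n, since only a fair coin can produce a mix.

Here f( X | M ) is the probability of the particular sequence of flips observed.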
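The three cases translate directly into code. A minimal Python sketch, assuming X is summarized as `heads` out of `flips` and that the result is the probability of the particular sequence observed; the function name is hypothetical:

```python
def discrete_model_likelihood(heads, flips, J, K):
    """f( X | M ) for M = (J, K): all-heads coins with probability J,
    all-tails coins with probability K, fair coins with probability 1-J-K.
    """
    if flips < 1:
        raise ValueError("need at least one flip")
    if J < 0 or K < 0 or J + K > 1:
        raise ValueError("J and K must be probabilities with J + K <= 1")
    fair_part = (1 - J - K) * 0.5**flips  # a fair coin producing this sequence
    if heads == flips:  # all heads: an all-heads coin or a fair coin
        return J + fair_part
    if heads == 0:      # all tails: an all-tails coin or a fair coin
        return K + fair_part
    return fair_part    # a mix can only come from a fair coin


print(discrete_model_likelihood(3, 3, 0.2, 0.3))  # 0.2 + 0.5 * 0.125
print(discrete_model_likelihood(1, 3, 0.2, 0.3))  # 0.5 * 0.125
```

Summing the log of this over all coins gives the log-likelihood, which can then be maximized over the triangle J, K ≥ 0, J + K ≤ 1.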

Unfortunately, that approach also gets more complicated each time we add another bucket.

## no subject

Date: 2016-07-09 04:47 pm (UTC), markgritter.livejournal.com
