Going Back to the Monkey Typewriter Problem

by Mithrandir

I began this blog with two articles on the Monkey Typewriter Theorem and, in the last one, I said that there should be a way to stop the Monte Carlo simulation at exactly the right point (when the required precision has been attained).

The truth is that there may be at least one method to do this.

Let’s suppose that our simulation is run in order to obtain a single real value (the theory presented here can be extrapolated to the multidimensional case with a little pain and some tricks, but it works). We will treat the Monte Carlo simulation as a process in which we experimentally measure a natural quantity.

Suppose we name the result of the i-th measurement x_i. We’ll denote by a_x the real value (the one we cannot obtain directly) and we will take

\displaystyle\widetilde{x_n} = \frac{1}{n}\sum_{i=1}^{n}{x_i}

to be the average of the n computations (n runs of the Monte Carlo simulation). By the Central Limit Theorem (a close relative of the Law of Large Numbers that stands behind Monte Carlo simulation itself) we can be certain that, with a high probability p,

a_x \in [\widetilde{x_n} - z_l\sigma(\widetilde{x_n}), \widetilde{x_n} + z_l\sigma(\widetilde{x_n})]

where z_l and p are strongly related to each other (z_l is the quantile corresponding to the confidence level p). In fact, we will model a_x as uniformly distributed in this interval, with a pdf of constant value C. By integrating this pdf we obtain

1 = \displaystyle\int_{\widetilde{x_n}-z_l\sigma(\widetilde{x_n})}^{\widetilde{x_n}+z_l\sigma(\widetilde{x_n})}{C\,da_x} = 2Cz_l\sigma(\widetilde{x_n})

from which C = \displaystyle\frac{1}{2z_l\sigma(\widetilde{x_n})}.
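
To make this step concrete, here is a minimal Python sketch that computes the running average \widetilde{x_n}, the standard error \sigma(\widetilde{x_n}), the confidence interval and C. The mc_run() sampler and the choice z_l = 1.96 (roughly p = 0.95 for a normal distribution) are assumptions made for the example only.

```python
import math
import random

def mc_run():
    # Hypothetical single Monte Carlo measurement x_i; stands in for
    # whatever the real simulation produces.
    return random.gauss(3.0, 1.0)

def interval_and_C(samples, z_l=1.96):
    """Return (mean, sigma_mean, (lo, hi), C) for the current samples."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    sigma_mean = math.sqrt(variance / n)   # sigma(x~_n), the standard error
    half_width = z_l * sigma_mean
    C = 1.0 / (2.0 * half_width)           # constant pdf value on the interval
    return mean, sigma_mean, (mean - half_width, mean + half_width), C

samples = [mc_run() for _ in range(1000)]
print(interval_and_C(samples))
```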

Now, let’s turn our attention to a concept known as information entropy. We will use this concept to obtain a relation which will enable us to stop making further simulations. By using its general definition, we have

H(P(x)) = -a\displaystyle\int_{-\infty}^{\infty}P(x)\log_b(P(x)\Delta x)\,dx

H(P(x)) = -a\displaystyle\int_{\widetilde{x_n}-z_l\sigma(\widetilde{x_n})}^{\widetilde{x_n}+z_l\sigma(\widetilde{x_n})}C\log_b(C\Delta x)\,dx

H(P(x)) = -2az_l\sigma(\widetilde{x_n})C\log_b(C\Delta x)

By taking into account the formula for C above, we obtain

H(P(x)) = a\log_b(2z_l\sigma(\widetilde{x_n})) - a\log_b(\Delta x)

H_n = a\log_b\displaystyle\frac{\sigma_n}{\Delta x}

which turns out to be something similar to what we were looking for. (In the last line I have changed the notation slightly, without changing the names and the meanings: \sigma_n stands for \sigma(\widetilde{x_n}) and the constant term a\log_b(2z_l) has been dropped, since it cancels in the differences below; the subscript is there to stress that what we have obtained depends strongly on the number of measurements we have undertaken.) In fact, by turning our attention to the (apparent) information we obtained in the last measurement

I_{app} = H_{n-1} - H_{n}
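
Substituting H_n = a\log_b(\sigma_n/\Delta x) into this difference, the quantization step \Delta x cancels and we are left with

I_{app} = a\log_b\displaystyle\frac{\sigma_{n-1}}{\sigma_n}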

From this form we see that, because \sigma_n > 0 and, typically, \sigma_n < \sigma_{n-1} (the standard error of the mean shrinks roughly like 1/\sqrt{n}), I_{app} is positive and decreases towards zero.

Thus, if one could deduce a relationship between the precision of the measurement and the value of I_{app}, we could stop the Monte Carlo process earlier, or decide, algorithmically and thus without any need for human intervention, to continue for another 1000 (an arbitrary number) runs.
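
As a sketch of what such an algorithmic decision could look like, the following Python loop runs simulations in batches and stops once the apparent information gained by the last batch falls below a threshold. The mc_run() sampler, the batch size and the threshold value are all assumptions made for illustration; finding a principled link between the threshold and the actual precision is precisely the open question.

```python
import math
import random

def mc_run():
    # Hypothetical single Monte Carlo measurement (assumption for the example).
    return random.gauss(3.0, 1.0)

def sigma_of_mean(samples):
    """Standard error sigma(x~_n) of the running average."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(variance / n)

def run_until_uninformative(threshold=1e-4, batch=100, max_runs=1_000_000, a=1.0, b=2.0):
    """Stop when I_app = a * log_b(sigma_{n-1} / sigma_n) drops below threshold."""
    samples = [mc_run() for _ in range(batch)]
    prev_sigma = sigma_of_mean(samples)
    while len(samples) < max_runs:
        samples.extend(mc_run() for _ in range(batch))
        sigma = sigma_of_mean(samples)
        i_app = a * math.log(prev_sigma / sigma, b)  # negative if sigma grew (bad batch)
        if 0 <= i_app < threshold:
            break
        prev_sigma = sigma
    return sum(samples) / len(samples), sigma, len(samples)

print(run_until_uninformative())
```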

However, there’s a glitch: a single piece of malformed data can affect our computation badly (an outlier inflates the sample standard deviation, so \sigma_n can momentarily grow and I_{app} can even turn negative). Thus, we need something to eliminate outliers produced in the early stages of the computation.
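
One possible way to do this (my own suggestion, not something derived above) is a robust filter based on the median absolute deviation, which discards samples that sit implausibly far from the median:

```python
import statistics

def drop_outliers(samples, k=5.0):
    """Keep only samples within k robust standard deviations of the median.

    The median absolute deviation (MAD), scaled by 1.4826, estimates the
    standard deviation for normally distributed data; k is a generous cutoff.
    """
    med = statistics.median(samples)
    mad = statistics.median([abs(x - med) for x in samples])
    if mad == 0:
        return list(samples)
    return [x for x in samples if abs(x - med) <= k * 1.4826 * mad]
```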

And we need some code and some experimental results.
