2,3 Equidistribution

Several things to go into details here: exponential growth (exercise 2) and the multiplicative (Benford) vs. linear (equidistribution), leading to a way to prove something is Benford.

Recall that we want to find the what proportion of a sequence \{a_n\} satisfies d 10^m \leq a_n < (d+1) 10^m for some integer m, with d being the leading digit we choose.

Let’s go over our example of population growth and modify things to give a new interpretation: the probability becomes P(d 10^m \leq 2^n < (d+1) 10^m)

=P(\log [d 10^m] \leq \log 2^n < \log[(d+1) 10^m])

=P(\log d \leq n\log 2-m < \log(d+1) )             (1)

Interpretation: Note that \log d and \log (d+1) lie between 0 and 1, so even though we specify m is whatever integer, it is forced to be the greatest integer that’s less than n\log 2 (for example if n\log 2=23.134..., m has to be 23 and n\log 2-m=0.134...). Notation: n\log 2-m will be denoted n\log 2 \mod 1 and geometrically the interval from 0 to 1 is glued to become a circle and 0.3 is the same as 1.3, 2.3, … and as n increases by 1, n\log 2 \mod 1 looks like rotation around a circle (with an irrational angle of 108.37… degree). Hence the name irrational rotation, meaning \log 2 is irrational (Exercise: why is log 2 irrational? hint).

Notice the answer we seek (\log(d+1)-\log d) is precisely the difference between the left and right bound in \log d \leq n\log 2 \mod 1 < \log(d+1). So what we need to see is that the sequence of points generated by irrational rotation is evenly distributed around the circle:

Definition: \{c_n\} is equidistributed in [0,1] if for any interval (a, b) in [0,1], the proportion of \{c_n\} in (a, b) is b-a.

So we are done if we show \{n\log 2 \mod 1\} is equidistributed.

By the way, if we repeat everything above replacing \{2^n\} by \{a_n\}, we get the following:

Sufficient condition for Benford: if \{ \log a_n\mod 1\} is equidistributed in [0,1], then \{a_n\} is Benford.

In fact this is how many sequences are shown to be Benford.
Example 1: showing geometric Brownian motion is Benford is reduced to showing bell-shape probability distribution mod 1 is equidistributed in [0,1], which is more tractable although still messy. Therefore we won’t do it (showing \sum_{n=-\infty}^{\infty}\int_a^be^{-\pi (x+n)^2/N}/\sqrt{N} dx \rightarrow b-a as N approaches infinity).
Example 2: Fibonacci sequence is Benford. It is defined by recurrence relation a_n=a_{n-1}+a_{n-2} and the first two values are 0 and 1. In fact, most sequences defined by recurrence relation is Benford. The reason is that they are roughly exponential (for Fibonacci, a_n=b^n+c^{-n}\approx b^n where b, c>1), so that taking log gives something that looks like irrational rotation for large n and can be shown to be equidistributed.
(Details for these examples can be found here)

Let’s get back to showing
Theorem: Irrational rotation, \{ n b\mod 1\}, is equidistributed in [0,1], where n=1, 2, … and b is irrational (Exercise: what happens if b is rational, say 3/4?)

One thing special about irrational is the following

Property: For any x in [0,1], there exist some number in \{ n b\mod 1\} as close to x as we want (but not equal to x).
Proof by example: let’s say x=0, b=0.31 and we want to find n so that n b\mod 1 so that we get closer to x by half or .31…/2. From the sequence {0.31…, 0.62…, 0.93…=0.07…, …}, it is not hard as the third number is good enough . It turns out we can keep on repeating, this time getting closer than .07…/2: {.07…, 0.14…, …, .98…=0.01…}. So why does it work? Because as you get close to x, the two numbers in the sequence that sandwich x are distance b away from each other so one of the two numbers has to be at most b/2 away from x. Picking x to be 0, we have find a way to increment be less than b/2, and so we have find a way to continue, this time finding two numbers in the sequence separated by less than b/2 that sandwich x (why doesn’t this work for rational, say b=1/4? It fails during the second step using x=0, because some points land on 0 and so we cannot cut the distance from x by half.)

Going back to the theorem, we want to see why it works for the case [0, 0.5]. We need to show the proportion*

\frac{\textrm{number of points } m b\mod 1\textrm{ is in [0, 0.5] for m=1, ..., n}}{n} \rightarrow 0.5
as n approaches infinity.

Proof (Sketchy): It looks nasty to compute so let’s be crafty. We can ask about the proportion for [0.5, 1] instead. The answer should be the same intuitively and the above property confirms our intuition: by skipping an initial segment of \{ n b\mod 1\}, it will be \{ x+n b\mod 1\} for any x arbitrarily close to what we want, which is 0.5. Although skipping an initial segment will affect the proportion, but letting n go to infinity eliminates the effect.
Finally, since the proportion in [0, 0.5] and [0.5, 1] must add up to 1, each must be 0.5 and we are done. This argument can be repeated for other intervals and we are finally done.

The above concept is called translation invariance, characterizing uniform probability uniquely and is thus useful in proofs. It will be seen later because it is a disguise for scale invariance of log distribution.

Exercise a (generalized Benford’s law): If instead of the leading digit, we specify the first two digits that we want the probability, say what’s the proportion of 2^n with starting with 14, how would you compute it? Can you come up with a general formula? How about if we use base 7 instead of base 10.

Answer: You want to repeat the procedure to get (1). The first one is log 15 – log 14. If you change the base, then again repeat and you will see that the formula only changes from log base 10 to log base 7.

Different View: Dynamical System

The proportion* above is called the time average as you think of the irrational rotation as happening every second. There is also a space average, which is just the length of the interval under consideration (for above it was [0, 0.5]). Equidistribution can be obtained without using the sketchy argument above: it is a consequence of the fact that time average equals space average:

\frac{1}{n}\sum_k f(T^k x)=\int_0^1 f(x) dx

where Tx=x+b, and f(x)=1 if x mod 1 is in the interval ([0, 0.5] for above) and f(x)=0 otherwise.

Exercise: how does this give equidistribution?

For those who know fourier series, the proof of the formula with our specific map T involves showing the formula is true for f(x)= \cos(2 \pi nx) (or sine) and then claim that we are done because f is periodic.

For more detail, click here.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.