Uniformity

by Mithrandir

Suppose we have to make a quiz. The questions have different degrees of difficulty and come from several different chapters. We have the following problem: how can we ensure that the selected questions are spread uniformly over both dimensions, difficulty and chapter?

First, we see that the problem is similar to measuring how uniformly some points are scattered over a rectangle. Yet even this problem has no simple solution. For example, computing the centroid of the points and checking whether it is close to the centre of the rectangle fails: points clustered symmetrically in the corners have the same centroid as an even spread. Any method based on statistical moments fails in the same manner.
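To make the failure of the centroid test concrete, here is a minimal sketch. The point sets `corners` and `grid` are hypothetical data on the unit square, invented for illustration only.

```python
def centroid(points):
    """Return the mean (x, y) position of a list of 2D points."""
    n = float(len(points))
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Hypothetical data: 16 points, all sitting in the four corners of the unit square.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)] * 4

# Hypothetical data: 16 points spread on a regular 4 x 4 grid over the same square.
grid = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]

print(centroid(corners))  # (0.5, 0.5)
print(centroid(grid))     # (0.5, 0.5)
```

Both sets have their centroid exactly in the centre of the square, although only the second one is spread evenly, so the centroid alone cannot tell them apart.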

What is simpler than a 2D problem? A 1D one. Here we must determine whether the graph of a function is horizontal, which can be done with a derivative. However, this solution doesn’t extend properly to higher dimensions, because our problem is discrete and we are forced to use finite differences.
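For the 1D case, the derivative test could be sketched as follows. The helper names `finite_differences` and `is_flat`, the tolerance `tol` and the sample lists are assumptions made for this illustration.

```python
def finite_differences(values):
    """Discrete stand-in for the derivative: differences of neighbouring values."""
    return [b - a for a, b in zip(values, values[1:])]

def is_flat(values, tol=1e-9):
    """True when every neighbouring difference is (almost) zero."""
    return all(abs(d) <= tol for d in finite_differences(values))

flat = [3, 3, 3, 3]   # hypothetical number of questions per chapter
bumpy = [1, 4, 2, 5]  # hypothetical number of questions per chapter

print(is_flat(flat))              # True
print(is_flat(bumpy))             # False
print(finite_differences(bumpy))  # [3, -2, 3]
```

In two or more dimensions there is one set of differences per axis, and it is not obvious how to combine them into a single score.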

Let’s look at the problem from another point of view. Plotting one instance we have:

[Figure: initial distribution]

Without loss of generality, and without changing the shape of the data, we divide each value by the sum of all values, thus normalizing the distribution. We also consider a uniform distribution over the same interval:
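In one dimension, the normalization step might look like the sketch below. The function names `normalize` and `uniform` and the sample `counts` are hypothetical and serve only as an illustration.

```python
def normalize(counts):
    """Divide every count by the total so that the values sum to 1."""
    total = float(sum(counts))
    if total == 0:
        raise ValueError("cannot normalize an empty distribution")
    return [c / total for c in counts]

def uniform(n):
    """Uniform distribution over n bins: every bin gets 1/n."""
    return [1.0 / n] * n

counts = [1, 3, 2, 4]        # hypothetical number of questions per chapter
pdf = normalize(counts)      # [0.1, 0.3, 0.2, 0.4]
flat = uniform(len(counts))  # [0.25, 0.25, 0.25, 0.25]
print(pdf, flat)
```

After this step both lists sum to 1, so they can be compared directly regardless of how many questions there are in total.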

[Figure: normalized distribution]

A solution is now easy to see: the two distributions have the same shape if and only if their common area is 1. Thus, the degree of uniformity is given by the common area of the two normalized distributions:
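Written as a formula, with p_i the normalized values, u_i = 1/n the uniform values over n bins, and U a symbol introduced here for the degree of uniformity, the common area is:

```latex
U(p) = \sum_{i=1}^{n} \min(p_i, u_i), \qquad u_i = \frac{1}{n}

% Since \min(a, b) = \tfrac{1}{2}(a + b - |a - b|) and both distributions sum to 1:
U(p) = 1 - \frac{1}{2} \sum_{i=1}^{n} |p_i - u_i|, \qquad \frac{1}{n} \le U(p) \le 1

% Hypothetical example: p = (0.1, 0.3, 0.2, 0.4), n = 4, u_i = 0.25
U(p) = 0.1 + 0.25 + 0.2 + 0.25 = 0.8
```

The lower bound 1/n is reached when all the values fall into a single bin, and the upper bound 1 only when p is itself uniform.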

[Figure: common area of the two distributions]

We can easily extend this idea to more dimensions. I don’t know about fractal dimensions, but this solution seems to work for all integer ones.
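One possible way to handle any integer number of dimensions is to flatten the table of counts before comparing it with the uniform distribution. This is only a sketch: `flatten`, `uniformity` and the two 3D tables are hypothetical names and data, not part of the script in this post.

```python
def flatten(nested):
    """Yield the numbers of an arbitrarily nested list, depth first."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            for value in flatten(item):
                yield value
        else:
            yield item

def uniformity(counts):
    """Overlap between the normalized counts and the uniform distribution.

    Only the cell values are used, not their arrangement, so the number
    of dimensions of the table does not matter.
    """
    cells = list(flatten(counts))
    total = float(sum(cells))
    if not cells or total == 0:
        raise ValueError("need at least one non-zero count")
    u = 1.0 / len(cells)
    return sum(min(c / total, u) for c in cells)

# Hypothetical 2 x 2 x 2 tables, e.g. difficulty x chapter x question type.
even_3d = [[[2, 2], [2, 2]], [[2, 2], [2, 2]]]
skewed_3d = [[[8, 0], [0, 0]], [[0, 0], [0, 0]]]

print(uniformity(even_3d))    # 1.0
print(uniformity(skewed_3d))  # 0.125
```

The skewed table scores 1/8, the lowest value possible with eight cells, while the even one scores 1.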

Here is a Python script used for one instance of the problem, illustrating the idea:

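def sum_dist(d):
    """Sum all the values of a 2D table given as a list of rows."""
    return sum(sum(line) for line in d)

def get_pdf(md):
    """Normalize a 2D table so that its values sum to 1."""
    s = float(sum_dist(md))
    return [[col / s for col in line] for line in md]

def orig_dist(ids):
    """Count the items stored in each cell of the 2D table."""
    return [[len(col) for col in line] for line in ids]

def overlap(d1, d2):
    """Common area of two normalized tables: the sum of cell-wise minima."""
    return sum_dist([[min(a, b) for (a, b) in zip(l1, l2)]
                     for (l1, l2) in zip(d1, d2)])

def main():
    ids = ...  # elided here: presumably the question ids grouped per cell
    # get_uniform() is not shown in this post; it presumably returns a table
    # of equal values with the same shape as ids.
    U = get_pdf(get_uniform())
    orig = orig_dist(ids)
    # print() with a single argument works in both Python 2 and Python 3
    print("Degree of uniformity: %s" % overlap(get_pdf(orig), U))

if __name__ == "__main__":
    main()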

That’s all.
