Visual Studio 2010 Data Generation Distributions

I hate it when you look up an enum definition and the only help you get is the list of the values.  I already know that!  What does each one do?  This is the case in the Data Generation Plan of Visual Studio 2010 with the Column’s Distribution property.

In this post, I will look at the possible values of the Column’s Distribution property.  The valid values are Uniform, Normal, NormalInverse, Exponential and ExponentialInverse, but you already know that.

To create these graphs, I used a data-generation plan that generated 10000 values from 0 to 100 for each distribution type.  From this I was able to create idealised graphs that show how the data is shaped.  These graphs (except for the uniform distribution) do not include the noise introduced by the data generator to randomise the generation a bit.

Uniform

The uniform distribution is pretty obvious, but you might expect every value to appear exactly the same number of times.  This is not the case; like all the distributions, the uniform distribution is randomised.

[Graph: Uniform distribution]

On average, each value appears the total number of samples divided by the number of possible values, which here is 10000 / 101, or about 99 times.  At first glance, the randomised output looks like it varies by about 20%.
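To get a feel for how much of that variation is plain sampling noise, here is a minimal sketch that repeats the experiment outside Visual Studio.  It assumes Python's random module as a stand-in for the data generator, so the exact noise will differ; the function name uniform_counts is only for illustration.

```python
import random
from collections import Counter


def uniform_counts(samples=10000, low=0, high=100, seed=1):
    """Tally how often each integer in [low, high] is drawn uniformly."""
    rng = random.Random(seed)  # stand-in for the Visual Studio generator
    return Counter(rng.randint(low, high) for _ in range(samples))


counts = uniform_counts()
expected = 10000 / 101  # 101 distinct values from 0 to 100 inclusive, about 99 each
worst = max(abs(counts[v] - expected) for v in range(0, 101))

print(f"expected count per value: {expected:.1f}")
print(f"largest deviation from expected: {worst / expected:.0%}")
```

For a count near 99, pure sampling noise has a standard deviation of roughly the square root of 99, or about 10, so swings of 10% to 20% between values are not surprising.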

Normal

The normal distribution is what you would expect:

[Graph: Normal distribution]

Normal Inverse

This one has me stumped.  I originally thought it was an inverted normal distribution, but that did not match the data.  The best approximation I could make was with a quadratic equation:

[Graph: NormalInverse distribution, quadratic approximation]

This does not fit the sample data well, but it's close.  At least you can get an idea of what it will look like.

Exponential

The exponential distribution is the reverse of what I originally thought; it is actually an exponential decay.

[Graph: Exponential distribution]

Exponential Inverse

This distribution is a mirror of the Exponential distribution.

[Graph: ExponentialInverse distribution]
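The relationship between the two exponential shapes can be sketched in a few lines of Python.  This is only an illustration of "decay" versus "mirror of the decay", not how Visual Studio produces the values: the scale parameter and the helper names decay_sample and mirrored are made up for the example.

```python
import random
from collections import Counter

LOW, HIGH = 0, 100


def decay_sample(rng, scale=20.0):
    """Draw an integer in [LOW, HIGH] whose frequency decays exponentially.

    scale is a hypothetical shape parameter, not a Visual Studio setting.
    """
    while True:
        offset = int(rng.expovariate(1.0 / scale))
        if offset <= HIGH - LOW:  # reject the tail that falls outside the range
            return LOW + offset


def mirrored(value):
    """Reflect a value about the middle of the range."""
    return LOW + HIGH - value


rng = random.Random(1)
exponential = Counter(decay_sample(rng) for _ in range(10000))
inverse = Counter(mirrored(decay_sample(rng)) for _ in range(10000))

print("Exponential, count at 0 vs 100:", exponential[LOW], exponential[HIGH])
print("ExponentialInverse, count at 0 vs 100:", inverse[LOW], inverse[HIGH])
```

In this sketch the decaying version piles most of its values up near 0, and the mirrored version piles them up near 100.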

A Flaw in the PRNG

There appears to be a flaw with the pseudo random number generator.  The first and last values of the range appear at only half the rate that they should.  This should not affect data generation in most cases, but keep it in mind if you are analysing test results.  It may lead you on a wild goose chase.
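One common cause of this kind of pattern, offered here only as a guess and not something I have confirmed for Visual Studio, is rounding a continuous random number to the nearest integer.  The two end values then each cover only half as much of the range as every other value.  The sketch below uses Python's random module to show the effect; rounded_uniform_counts is a hypothetical helper.

```python
import random
from collections import Counter


def rounded_uniform_counts(samples=100000, low=0, high=100, seed=1):
    """Tally integers made by rounding a continuous uniform value."""
    rng = random.Random(seed)
    # Assumed generator: a float in [low, high] rounded to the nearest integer.
    # 0 only catches values below 0.5 and 100 only those above 99.5,
    # while every interior value catches a full unit of the range.
    return Counter(round(rng.uniform(low, high)) for _ in range(samples))


counts = rounded_uniform_counts()
interior = sum(counts[v] for v in range(1, 100)) / 99

print(f"average count for 1 to 99: {interior:.0f}")
print(f"count for 0: {counts[0]}, count for 100: {counts[100]}")
```

If your generated data shows the same halving at both ends, a quick tally like this is an easy way to rule the generator in or out before chasing anything else.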

Well, there you have it: how the Distribution property on the column shapes the data generation.  I hope this saves you some time.