For a while now I’ve been categorizing my posts. When I designed this site I
didn’t really have an idea of what I was going to do with the categories, but
speculated I might build a tag cloud. The problem was I didn’t really understand
the fundamentals of a tag cloud:

Get all categories from posts

Count number of posts per category

Add weighting to category

Format text to distinguish weighting

Link to all posts within that category

On the surface it looks fairly simple. Query the database, do some mathematical
fiddling to get weights, create a list of links and stick a style attribute on
each one to modify the font size based on weighting.
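
To make the first two steps concrete, here is a minimal sketch of counting posts per category. It assumes the query results are already in memory as one list of category names per post; the class name `CategoryCounter` and the method `CountPostsPerCategory` are hypothetical and not taken from this site's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CategoryCounter
{
    // Hypothetical helper: takes one category list per post and returns
    // how many posts carry each category.
    public static Dictionary<string, int> CountPostsPerCategory(
        IEnumerable<IEnumerable<string>> postCategories)
    {
        return postCategories
            // Distinct() guards against a post listing the same category twice.
            .SelectMany(categories => categories.Distinct())
            .GroupBy(category => category)
            .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

The resulting dictionary of counts is the raw input that the weighting step has to work with.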
Well, it’s not quite that easy. The query was fairly straightforward, and creating
the list and styling it was pretty simple. Getting the weights, on the other
hand, was a pain. The problem was figuring out a fair weighting solution.
Standard Deviation and Statistical Analysis
Originally I figured I could get away with ordering the numbers, and then finding
the difference between each increment. Unfortunately some topics I discuss more
frequently than others. For instance, I talk about SQL a whole lot more than
I talk about Open Source stuff. A huge skewing problem would have arisen.
So, going back to high school statistics class (holy crap, I actually can use this
in real life! My teacher, if he were dead, would be rolling over in his grave),
I went back to one of the first things I learned: the standard deviation.
The standard deviation is a measure of the dispersion of a collection of numbers.
It is defined as the root-mean-square (RMS) deviation of the values from their mean,
or as the square root of the variance. To calculate the standard deviation of
a dataset:

Find the mean, \bar{x}, of the values.

For each value x_{i} calculate its deviation (x_{i} - \bar{x}) from the mean.

Calculate the squares of these deviations.

Find the mean of the squared deviations. This quantity is the variance s^{2}.

Take the square root of the variance.
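
In calculation form it looks like this:

s = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - \bar{x})^{2}}

The variable s signifies the standard deviation, \bar{x} is the mean, and N is
the number of values. Thanks to Wikipedia for the pictures.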
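
The steps above can be worked through by hand on a small dataset. The eight values below (2, 4, 4, 4, 5, 5, 7, 9) are made up for illustration and are not real post counts from this site.

```latex
\begin{align*}
\bar{x} &= \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5 \\
\sum_{i=1}^{8} (x_{i} - \bar{x})^{2} &= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32 \\
s^{2} &= \frac{32}{8} = 4 \\
s &= \sqrt{4} = 2
\end{align*}
```

So for this dataset a value of 7 sits one standard deviation above the mean, and a value of 9 sits two above it.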
In C# the code would look like:
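using System;
using System.Collections.Generic;

// Population standard deviation of the values; also returns their mean.
double StdDev(List<double> values, out double mean)
{
    // Statistics.Mean is assumed to be a helper defined elsewhere (not shown here).
    mean = Statistics.Mean(values);
    double sumSquares = 0;
    int count = 0;
    foreach (double d in values)
    {
        // Deviation of this value from the mean.
        double diff = d - mean;
        sumSquares += diff * diff;
        count++;
    }
    // The variance is the mean of the squared deviations; its square root is s.
    return Math.Sqrt(sumSquares / count);
}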
If they taught math in the form of programming (“this is how you do it in code…”),
math might not have sucked! Yes, I do realize math is the precursor to programming,
but a boy can dream.
Back to the Cloud
Once the standard deviation is found, I can apply the weighting. Essentially
I started at a deviation of 0 from the mean and assigned it the default font size
for this site. As a category’s deviation falls below 0 its font size decreases,
and as it rises above 0 its font size increases. Fairly straightforward. Once I
got the weighting, I applied CSS and built a web part accordingly.
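
One way to read the weighting described above is to measure each category’s distance from the mean in units of standard deviation and move the font size up or down from the default by that amount. The sketch below is an interpretation under stated assumptions, not this site’s actual code: the 12pt base size, the 2pt step per standard deviation, the 8pt to 24pt clamp, the `/category/` URL pattern, and the names `TagCloudWeights`, `FontSizes` and `ToLink` are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Net;

public static class TagCloudWeights
{
    // Assumed values, not taken from the site (all in points).
    const double BaseSize = 12.0;         // default font size
    const double StepPerDeviation = 2.0;  // size change per standard deviation
    const double MinSize = 8.0;           // clamp so rare tags stay readable
    const double MaxSize = 24.0;          // clamp so popular tags stay sane

    // Maps each category's post count to a font size.
    public static Dictionary<string, double> FontSizes(IDictionary<string, int> postCounts)
    {
        var sizes = new Dictionary<string, double>();
        if (postCounts.Count == 0) return sizes;

        double mean = postCounts.Values.Average();
        // Population variance: the mean of the squared deviations.
        double variance = postCounts.Values.Average(c => (c - mean) * (c - mean));
        double stdDev = Math.Sqrt(variance);

        foreach (var pair in postCounts)
        {
            // Deviation from the mean in standard deviations;
            // treated as 0 when every category has the same count.
            double deviation = stdDev == 0 ? 0 : (pair.Value - mean) / stdDev;
            double size = BaseSize + StepPerDeviation * deviation;
            sizes[pair.Key] = Math.Max(MinSize, Math.Min(MaxSize, size));
        }
        return sizes;
    }

    // Renders one tag as a link with an inline font-size style.
    // The /category/ URL pattern is a placeholder.
    public static string ToLink(string category, double size)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "<a href=\"/category/{0}\" style=\"font-size: {1:0.#}pt\">{2}</a>",
            Uri.EscapeDataString(category),
            size,
            WebUtility.HtmlEncode(category));
    }
}
```

With these assumed constants, a category whose post count equals the mean gets the 12pt base size, categories below the mean shrink toward 8pt, and heavily used categories grow toward 24pt. The clamp is a design choice to keep one very popular topic from dwarfing everything else.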