A histogram is a way to understand the distribution of data. Assume you have student scores in the form of a list like so:
You can observe that the highest score is 100, the lowest score is 55 and there is a wide distribution of scores in between them. Some scores, like 80 and 91 appear twice, some scores like 77 appear four times, and so on. A histogram is a way to understand these patterns.
In this blogpost we will develop a way to compute and plot a histogram from this data.
The first step is to group data into intervals, buckets, or values. Here we will simply count the number of times each score appears.
Compute the histogram
Let us write a Python function count_occurrences() that takes a list such as above and outputs a dictionary where each key is a specific score and the value associated with that key is the number of times the given score appears.
Here is how such a function might work:
Note that the input is the list (“s”) and the output is the histogram variable, which is a dictionary. We use a for loop to iterate through each element of the list. For each element we see if it already appears as a key in the dictionary. If it does we increment the count. If not we initialize the count to 1.
If we apply it on the above scores list like so:
Printing the histogram
We can now pretty print the dictionary like so (after sorting on the keys):
The output is:
You can see clearly that the scores 77 and 82 appear the highest, namely 4 times each. You can see that the lowest score is 55, which appears once. The highest score is 100, which appears twice, and so on.
Plotting the histogram
Finally we can pretty print these values by using asterisks (stars, “*”) in place of the numbers so we can visually see the bins that are bigger and those that are smaller.
In the above code, we first print the bin value and then print a series of asterisks proportional to the count for that bin. We use “end=’’” to stay on the same line.
The output is:
You will notice that the last key, which is 100 (and has three digits), appears a little offset. To fix this problem we can use a formatted print statement when printing the bin value:
In the above code we use three characters to print the key and thus the keys are all right aligned. This leads to a more elegant output:
Kodeclik is an online coding academy for kids and teens to learn real world programming. Kids are introduced to coding in a fun and exciting way and are challeged to higher levels with engaging, high quality content.