Key Terms to Understand When Plotting a Histogram

What type of graph is in the cover picture above😉?

I am a software engineer with over two years of professional experience. I specialize in backend engineering but also work across the full stack. I use JavaScript/TypeScript, Python, and PHP to solve real-world problems every day.

A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson.

To construct a histogram, the data is divided into intervals called BINS, and the FREQUENCY of DATA POINTS within each bin is depicted by the height of that bin's bar.
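To make that construction concrete, here is a minimal pure-Python sketch that splits a dataset into equal-width bins and counts the data points in each one. The helper `histogram_counts` and the sample data are my own illustration, not a library API. It uses the same edge convention as NumPy's `histogram`: every bin is half-open except the last, which also includes the maximum value.

```python
def histogram_counts(data, num_bins):
    """Split the range of `data` into `num_bins` equal-width bins and count values per bin.

    Every bin is half-open [left, right) except the last one, which also
    includes the maximum value, so no data point is dropped.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Which bin does x fall into? (all values land in bin 0 if width is 0)
        idx = int((x - lo) / width) if width > 0 else 0
        # The maximum value would get index num_bins, so clamp it into the last bin.
        counts[min(idx, num_bins - 1)] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up sample
edges, counts = histogram_counts(data, num_bins=4)
print(edges)   # [1.0, 2.0, 3.0, 4.0, 5.0]
print(counts)  # [1, 2, 3, 3]
```

Each number in `counts` is the height of one bar, and `edges` marks where each bar starts and ends on the x-axis.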

💡
Histograms are useful for visualizing the shape, central tendency, and variability of data, making them a fundamental tool in statistics and data analysis.

Here are the key terms to understand when plotting a histogram (some of which I highlighted in uppercase above):

Bins

A bin is an interval into which data points are grouped. Each bin represents a range of values, and the height of the bar corresponding to each bin indicates the frequency of data points that fall within that range.

Bins help in organizing and visualizing the distribution of data in a histogram.

It is important to note that the range of values that form a bin (known as bin width) significantly influences the appearance of a histogram. If the bin width is too small, the visualization may be cluttered with too much noise, making it difficult to derive meaningful insights. Conversely, if the bin width is too large, the histogram may lack detail, also hindering the extraction of useful information.

💡
Determining the optimal bin width often requires some experimentation.
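One way to run that experiment is to bin the same data with a few different widths and compare the results. The sketch below uses a hypothetical helper, `counts_by_width`, with bins anchored at the smallest value, and a made-up dataset with two clusters of values.

```python
import math
from collections import Counter


def counts_by_width(data, bin_width):
    """Count values in consecutive bins of a fixed width, starting at min(data)."""
    start = min(data)
    # Bin index of each value: how many whole bin widths it sits above the start.
    per_bin = Counter(math.floor((x - start) / bin_width) for x in data)
    # Include empty bins so gaps in the data show up as zero-height bars.
    return [per_bin.get(i, 0) for i in range(max(per_bin) + 1)]


data = [1, 2, 2, 3, 7, 8, 8, 9]  # made-up sample with two clusters

for width in (10, 2, 0.5):
    print(width, counts_by_width(data, width))
# 10 [8]                       -> one bar: both clusters merged
# 2 [3, 1, 0, 3, 1]            -> two groups clearly visible
# 0.5 [1, 0, 2, 0, 1, 0, ...]  -> mostly empty bins, hard to read
```

A width of 10 hides the structure completely, while 0.5 leaves most bins empty; the middle value shows the two groups most clearly for this particular sample.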

Modality

Modality refers to the number of peaks in a histogram and is the first aspect to consider when interpreting it.

  1. A distribution with one peak is called Unimodal.

  2. A distribution with two peaks is called Bimodal.

  3. A distribution with three peaks is called Trimodal.

And so forth...
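As a rough illustration, modality can be estimated from the bin counts by counting local maxima. The `count_peaks` and `modality` helpers below are hypothetical and deliberately naive: on real data, small random bumps would each be counted as a peak, so in practice you would smooth the counts or ignore tiny peaks first.

```python
from itertools import groupby


def count_peaks(counts):
    """Count local maxima in a list of bin counts (a naive peak detector)."""
    # Collapse runs of equal heights so a flat-topped peak is counted once.
    heights = [h for h, _ in groupby(counts)]
    # Pad with zeros so peaks at either edge can still be detected.
    padded = [0] + heights + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )


def modality(counts):
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    peaks = count_peaks(counts)
    return names.get(peaks, f"{peaks} peaks")


print(modality([1, 3, 6, 3, 1]))           # unimodal
print(modality([3, 1, 0, 3, 1]))           # bimodal
print(modality([1, 4, 1, 4, 4, 1, 5, 2]))  # trimodal
```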

Skewness

Skewness is a measure of the asymmetry of the distribution of data points in a histogram. It indicates whether the data points are more concentrated on one side of the mean than the other.

Skewness is divided into two types:

  1. Positive skewness (right-skewed) - This occurs when the tail on the right side of the distribution is longer or fatter than the left side.

  2. Negative skewness (left-skewed) - This occurs when the tail on the left side of the distribution is longer or fatter than the right side.

Skewness helps in understanding the direction and extent of deviation from a normal distribution. It should be the second aspect to consider when interpreting a histogram.
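A common way to put a number on skewness is the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments of the data. This is the same quantity SciPy's `scipy.stats.skew` returns with its default `bias=True`. The sketch below is a plain-Python version; `describe_skew` and its tolerance are my own illustrative choices.

```python
def skewness(data):
    """Sample skewness g1 = m3 / m2**1.5 (the moment coefficient of skewness)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # spread (population variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment: sign shows tail direction
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return m3 / m2 ** 1.5


def describe_skew(data, tol=1e-9):
    g1 = skewness(data)
    if g1 > tol:
        return "positive (right-skewed)"
    if g1 < -tol:
        return "negative (left-skewed)"
    return "approximately symmetric"


right_tail = [1, 1, 1, 2, 2, 3, 10]  # made-up: one large value stretches the right tail
print(describe_skew(right_tail))                # positive (right-skewed)
print(describe_skew([-x for x in right_tail]))  # negative (left-skewed)
print(describe_skew([1, 2, 3, 4, 5]))           # approximately symmetric
```

Because deviations are cubed, values far out in one tail dominate the sum, which is why a single large value is enough to make g1 clearly positive.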

Conclusion

I hope this article has helped enlighten you on some key terms to watch out for when plotting and interpreting histograms. In closing, there is also something called Kurtosis, which describes how heavy the tails of a distribution are, and therefore how prone it is to producing outliers. Based on their kurtosis, distributions are described as leptokurtic (heavier tails than a normal distribution), mesokurtic (similar to a normal distribution), or platykurtic (lighter tails). You can learn more about Kurtosis here.
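If you want to experiment with kurtosis as well, a common measure is excess kurtosis, g2 = m4 / m2^2 - 3, which is about 0 for a normal distribution. Below is a minimal sketch; the `kurtosis_type` helper and its 0.5 cut-off are arbitrary illustrative choices, not a standard threshold.

```python
def excess_kurtosis(data):
    """Excess kurtosis g2 = m4 / m2**2 - 3, so a normal distribution scores about 0."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are identical")
    return m4 / m2 ** 2 - 3


def kurtosis_type(data, tol=0.5):
    # tol is an arbitrary, illustrative cut-off for "close enough to normal".
    g2 = excess_kurtosis(data)
    if g2 > tol:
        return "leptokurtic (heavier tails)"
    if g2 < -tol:
        return "platykurtic (lighter tails)"
    return "mesokurtic (normal-like tails)"


heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # made-up: mostly zeros plus two extremes
print(excess_kurtosis(heavy_tails))  # 2.0
print(kurtosis_type(heavy_tails))    # leptokurtic (heavier tails)
print(kurtosis_type([1, 2, 3, 4, 5]))  # platykurtic (lighter tails)
```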

ByteBite Wisdom

Part 3 of 6

This series breaks the mold of our other series with articles under 600 words. These concise pieces offer bitwise information aimed at creating awareness or presenting alternative approaches to common tasks.
