Scatter Plot Basics: Everything You Need to Know

Understand how scatter plots help show the relationship between two continuous variables.


I am a software engineer with over two years of professional experience. I specialize as a backend engineer but also work in full-stack capabilities. I use JavaScript/TypeScript, Python, and PHP to solve real-world problems every day.

Overview

Data visualization helps us gain a better understanding of our data. It transforms large and complex datasets into visual formats that are easier to understand and interpret, even for non-technical people. With data visualization, we can spot trends and patterns much faster, including ones that are not obvious in the raw data or even after performing some statistical calculations.

What is a Scatter Plot?

A scatter plot provides a way to visualize the relationship between two continuous variables.

💡
A continuous variable is a type of variable that can take on an infinite number of values within a given range. These values are not restricted to whole numbers and can include fractions and decimals.

Continuous variables are often measured rather than counted and can represent quantities such as height, weight, temperature, and time. They are essential in statistical analysis and data visualization for representing data that can vary smoothly and continuously.
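To make this concrete, here is a minimal Python sketch of plotting two continuous variables against each other. It assumes the third-party matplotlib library is installed; the function names `paired_points` and `draw_scatter` and the default file name `scatter.png` are hypothetical and only there for illustration.

```python
def paired_points(xs, ys):
    """Pair two equal-length sequences of measurements as (x, y) points."""
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    return list(zip(xs, ys))


def draw_scatter(xs, ys, x_label, y_label, out_path="scatter.png"):
    """Save a basic scatter plot of ys against xs and return the file path."""
    # matplotlib is a third-party dependency, imported here so that
    # paired_points stays usable without it.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display window needed
    import matplotlib.pyplot as plt

    points = paired_points(xs, ys)
    fig, ax = plt.subplots()
    # One dot per observation: x position from the first variable,
    # y position from the second.
    ax.scatter([x for x, _ in points], [y for _, y in points])
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

With some made-up numbers, calling `draw_scatter([1, 2, 3, 4, 5], [52, 58, 61, 70, 74], "Study hours", "Exam score")` would write the plot to `scatter.png`, with one dot per student.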

Where are Scatter Plots Used?

Scatter plots are particularly useful for:

Identifying Correlations

A correlation is a statistical measure that describes the extent to which two variables are related to each other.

💡
It indicates whether an increase or decrease in one variable corresponds to an increase or decrease in another variable.

In data visualization, correlation asks the question:

How close are the points to forming a straight line?

Correlations can be:

  1. Positive Correlation: Both variables move in the same direction. As one variable increases, the other also increases.

    💡
    There is a direct variation between the two variables.
  2. Negative Correlation: Both variables move in opposite directions. As one variable increases, the other decreases.

    💡
    There is an inverse variation between the two variables.

Correlation is often quantified using a correlation coefficient, which ranges from -1 to 1. A coefficient close to 1 indicates a strong positive correlation, a coefficient close to -1 indicates a strong negative correlation, and a coefficient around 0 indicates little or no linear correlation.
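As a rough sketch of how such a coefficient can be computed, the snippet below implements the Pearson formula with only the Python standard library. The function name `pearson_r` and all sample values are made up for illustration.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # How the two variables vary together, and how each varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)


# Made-up illustration data, not taken from any real dataset.
hours = [1, 2, 3, 4, 5]
scores_up = [52, 58, 61, 70, 74]    # rises with hours
scores_down = [74, 70, 61, 58, 52]  # falls as hours rise
u_shape = [4, 1, 0, 1, 4]           # clear pattern, but not a straight line

print(round(pearson_r(hours, scores_up), 2))             # 0.99
print(round(pearson_r(hours, scores_down), 2))           # -0.99
print(round(pearson_r([-2, -1, 0, 1, 2], u_shape), 2))   # 0.0
```

The last case is worth noticing: the points follow an obvious U-shaped curve, yet the coefficient is 0 because they do not line up along a straight line.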

💡
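In statistics, you can use either the Pearson or the Spearman formula to calculate the correlation coefficient. Pearson measures how closely the points follow a straight line, while Spearman works on the ranks of the values and measures whether the relationship consistently increases or decreases. Feel free to explore both.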
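For a feel of how the two formulas differ, here is a small standard-library sketch that computes Spearman's coefficient as the Pearson coefficient of the ranks. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, and the data is invented.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman coefficient: Pearson applied to the ranks of the values."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Invented data: y always rises with x, but along a curve (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

print(round(pearson_r(xs, ys), 2))     # 0.94
print(round(spearman_rho(xs, ys), 2))  # 1.0
```

Spearman reports a perfect score here because the ordering of the points is perfectly consistent, while Pearson comes out slightly lower because the points bend away from a straight line.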

Detecting Outliers

Scatter plots can help identify outliers, which are data points that deviate significantly from the overall pattern of the data. Outliers can indicate errors in data collection, unusual conditions, or interesting phenomena that warrant further investigation.
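What the eye does on a scatter plot can be roughly imitated in code. The sketch below is one simple heuristic among many, not a standard definition of an outlier: it fits a least-squares line through the points and flags any point whose vertical distance from that line is more than a chosen multiple of the typical distance. The function names, the threshold of 2.0, and the data are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def fit_line(xs, ys):
    """Least-squares line through the points, returned as (slope, intercept)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(xs, ys, threshold=2.0):
    """Return the (x, y) points that sit unusually far from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    # Residual: vertical distance between a point and the fitted line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [
        (x, y)
        for x, y, residual in zip(xs, ys, residuals)
        if abs(residual) > threshold * spread
    ]


# Invented data: y = 2x for x = 1..10, with one point pushed far off the pattern.
xs = list(range(1, 11))
ys = [2 * x for x in xs]
ys[4] = 30  # the point at x = 5 would be 10 if it followed the pattern

print(flag_outliers(xs, ys))  # [(5, 30)]
```

A flagged point is only a candidate. Whether it is a data-entry error or something genuinely interesting still takes a look at where the value came from.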

Understanding Distribution

Scatter plots can provide insights into the distribution of data points. By examining the spread and clustering of points, you can infer the variability and central tendency of the data.

💡
Variability refers to how widely the points spread out on the plot: the wider the spread, the more the data varies. Central tendency refers to where the points cluster: a dense cluster around a particular area gives an idea of where the mean or median of each variable lies.
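The numbers behind these visual impressions are easy to compute. As a small sketch using Python's built-in `statistics` module, the snippet below summarizes each variable with its mean, median, and sample standard deviation. The `summarize` helper and the study-hours and exam-score values are made up for illustration.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Central tendency (mean, median) and variability (stdev) of one variable."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }


# Invented data for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

for name, values in (("hours", hours), ("scores", scores)):
    stats = summarize(values)
    print(name, stats["mean"], stats["median"], round(stats["stdev"], 2))
```

On the plot, the mean and median tell you roughly where the cloud of points is centered along each axis, and the standard deviation tells you how far it stretches along that axis.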

Conclusion

In the last article, I mentioned that a box plot is particularly useful when you need to compare the distribution of multiple data sets side by side. The question I have for you is:

How would you use scatter plots and box plots differently to analyze the relationship between study hours and exam scores across multiple classes?

Thanks for taking the time to read this. I hope it has influenced you in some way.

ByteBite Wisdom

Part 1 of 6

This series breaks the mold of our other series with articles under 600 words. These concise pieces offer bitwise information aimed at creating awareness or presenting alternative approaches to common tasks.
