To create a scatter diagram in Python using Matplotlib:
import matplotlib.pyplot as plt
x_axis = ["value_1", "value_2", "value_3", ...]
y_axis = ["value_1", "value_2", "value_3", ...]
plt.scatter(x_axis, y_axis)
plt.title("title name")
plt.xlabel("x_axis name")
plt.ylabel("y_axis name")
plt.show()
Steps to Create a Scatter Diagram in Python using Matplotlib
Step 1: Install the Matplotlib package
If you haven’t already done so, install Matplotlib using the following command:
pip install matplotlib
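To confirm that the installation succeeded, a short check like the one below can help. It is a minimal sketch that relies only on the standard library; the helper name `installed_version` is a hypothetical name chosen for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if Matplotlib is installed, otherwise None.
print(installed_version("matplotlib"))
```

If the output is `None`, run the pip command again in the same Python environment that you use to run your scripts.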
Step 2: Gather the data for the scatter diagram
Next, gather the data to be used for the scatter diagram.
For example, let’s say that you have the following dataset:
unemployment_rate | index_price
6.1               | 1500
5.8               | 1520
5.7               | 1525
5.7               | 1523
5.8               | 1515
5.6               | 1540
5.5               | 1545
5.3               | 1560
5.2               | 1555
5.2               | 1565
The ultimate goal is to depict the relationship between the unemployment_rate and the index_price.
You can accomplish this goal using a scatter diagram.
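As an optional numeric complement to the diagram (not one of the original steps), you could also compute the Pearson correlation coefficient for the two columns. The sketch below uses only the standard library; `pearson_r` is a hypothetical helper written for this example. For this dataset the coefficient should come out strongly negative, which matches the downward pattern a scatter diagram of these values would show.

```python
import math

unemployment_rate = [6.1, 5.8, 5.7, 5.7, 5.8, 5.6, 5.5, 5.3, 5.2, 5.2]
index_price = [1500, 1520, 1525, 1523, 1515, 1540, 1545, 1560, 1555, 1565]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means.
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(squares_x * squares_y)


r = pearson_r(unemployment_rate, index_price)
print(round(r, 3))
```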
Step 3: Create the scatter diagram in Python using Matplotlib
For this final step, you may use the template below to create a scatter diagram in Python:
import matplotlib.pyplot as plt
x_axis = ["value_1", "value_2", "value_3", ...]
y_axis = ["value_1", "value_2", "value_3", ...]
plt.scatter(x_axis, y_axis)
plt.title("title name")
plt.xlabel("x_axis name")
plt.ylabel("y_axis name")
plt.show()
For our example:
import matplotlib.pyplot as plt
unemployment_rate = [6.1, 5.8, 5.7, 5.7, 5.8, 5.6, 5.5, 5.3, 5.2, 5.2]
index_price = [1500, 1520, 1525, 1523, 1515, 1540, 1545, 1560, 1555, 1565]
plt.scatter(unemployment_rate, index_price, color="green")
plt.title("Unemployment Rate Vs Index Price", fontsize=14)
plt.xlabel("Unemployment Rate", fontsize=14)
plt.ylabel("Index Price", fontsize=14)
plt.grid(True)
plt.show()
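Run the code in Python, and you’ll get the scatter diagram. The points slope downward from left to right: the index_price tends to fall as the unemployment_rate rises.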
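If you would rather save the diagram to a file than display it in a window, one possible approach is sketched below. It assumes the non-interactive "Agg" backend so that the script also runs on a machine without a display, and it uses Matplotlib's object-oriented interface (`plt.subplots`) in place of the `plt.*` calls above. The file name `scatter_diagram.png` and the `dpi` value are arbitrary choices for illustration.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

unemployment_rate = [6.1, 5.8, 5.7, 5.7, 5.8, 5.6, 5.5, 5.3, 5.2, 5.2]
index_price = [1500, 1520, 1525, 1523, 1515, 1540, 1545, 1560, 1555, 1565]

# Build the same diagram on an explicit Figure/Axes pair.
fig, ax = plt.subplots()
ax.scatter(unemployment_rate, index_price, color="green")
ax.set_title("Unemployment Rate Vs Index Price", fontsize=14)
ax.set_xlabel("Unemployment Rate", fontsize=14)
ax.set_ylabel("Index Price", fontsize=14)
ax.grid(True)

output_path = Path("scatter_diagram.png")  # hypothetical file name
fig.savefig(output_path, dpi=150)
plt.close(fig)  # release the figure's memory once it is saved
```

The image is written to the current working directory. Changing the file extension (for example to `.pdf` or `.svg`) changes the output format.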
Optionally: Create the Scatter Diagram using Pandas DataFrame
So far, you have seen how to capture the dataset in Python using lists.
Alternatively, you may capture the data in a Pandas DataFrame. The result is the same in both cases.
Here is the Python code using Pandas DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
data = {
    "unemployment_rate": [6.1, 5.8, 5.7, 5.7, 5.8, 5.6, 5.5, 5.3, 5.2, 5.2],
    "index_price": [1500, 1520, 1525, 1523, 1515, 1540, 1545, 1560, 1555, 1565],
}
df = pd.DataFrame(data)
plt.scatter(df["unemployment_rate"], df["index_price"], color="green")
plt.title("Unemployment Rate Vs Index Price", fontsize=14)
plt.xlabel("Unemployment Rate", fontsize=14)
plt.ylabel("Index Price", fontsize=14)
plt.grid(True)
plt.show()
Running this code produces the same scatter diagram as before.
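Pandas also provides its own plotting shortcut, `DataFrame.plot.scatter`, which draws the diagram with Matplotlib behind the scenes and returns the Axes object. The sketch below is one way to use it with the same data. It assumes the non-interactive "Agg" backend so that it runs without a display; in an interactive session you could drop that line and call `plt.show()` from `matplotlib.pyplot` to view the result.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import pandas as pd

data = {
    "unemployment_rate": [6.1, 5.8, 5.7, 5.7, 5.8, 5.6, 5.5, 5.3, 5.2, 5.2],
    "index_price": [1500, 1520, 1525, 1523, 1515, 1540, 1545, 1560, 1555, 1565],
}
df = pd.DataFrame(data)

# Columns are referenced by name; the call returns a Matplotlib Axes.
ax = df.plot.scatter(x="unemployment_rate", y="index_price", color="green")
ax.set_title("Unemployment Rate Vs Index Price", fontsize=14)
ax.set_xlabel("Unemployment Rate", fontsize=14)
ax.set_ylabel("Index Price", fontsize=14)
ax.grid(True)
```

This form is convenient when the data already lives in a DataFrame, because the columns are passed by name and do not need to be extracted first.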