How to Plot a Histogram in Python using Matplotlib

You may use the following template to plot a histogram in Python using Matplotlib:

import matplotlib.pyplot as plt

x = [value1, value2, value3, ...]
plt.hist(x, bins=number_of_bins)
plt.show()

Next, you’ll see the full steps to plot a histogram in Python using a simple example.

Steps to plot a histogram in Python using Matplotlib

Step 1: Install the Matplotlib package

If you haven’t already done so, install the Matplotlib package by running the following command in your terminal or Command Prompt:

pip install matplotlib

You may refer to the following guide for the instructions to install a package in Python.
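As a quick check that the installation worked, you can import the package and print its version. This is only a minimal sketch; the variable name installed_version is chosen here for illustration.

```python
import matplotlib

# If this import succeeds, Matplotlib is installed in the active environment.
installed_version = matplotlib.__version__
print(installed_version)
```

If the import raises ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.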

Step 2: Collect the data for the histogram

For example, let’s say that you have the following data about the age of 100 individuals:

Age
1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
75, 77, 81, 83, 84, 87, 89, 90, 90, 91

Later you’ll see how to plot the histogram based on the above data.

Step 3: Determine the number of bins

Next, determine the number of bins to be used for the histogram.

For simplicity, set the number of bins to 10. At the end of this guide, you’ll see another way to derive the bins.

Step 4: Plot the histogram in Python using matplotlib

Finally, plot the histogram based on the following template:

import matplotlib.pyplot as plt

x = [value1, value2, value3, ...]
plt.hist(x, bins=number_of_bins)
plt.show()

For our example:

import matplotlib.pyplot as plt

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

plt.hist(x, bins=10)
plt.show()

Run the code, and you’ll get the histogram.
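To see what bins=10 actually does with this data, the sketch below calls numpy.histogram, which Matplotlib's hist uses internally to do the counting. It assumes NumPy is available (it is installed as a Matplotlib dependency); the variable names counts and edges are illustrative.

```python
import numpy as np

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

# With an integer, the span from min(x) to max(x) is split into
# that many equal-width bins.
counts, edges = np.histogram(x, bins=10)

print(edges)   # the 11 bin edges, running from min(x) to max(x)
print(counts)  # the number of observations in each of the 10 bins
```

Because the edges run from the minimum (1) to the maximum (91), each bin is 9 units wide, so the bars will not line up with round numbers such as 0, 10, 20.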

If needed, you can further style your histogram. One way to do so is to apply a built-in style sheet by adding the following line before the call to plt.hist:

plt.style.use('ggplot')

For our example:

import matplotlib.pyplot as plt

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

plt.style.use('ggplot')
plt.hist(x, bins=10)
plt.show()

Run the code, and you’ll get the styled histogram.
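The 'ggplot' style is one of several style sheets bundled with Matplotlib. The sketch below lists the styles available in your installation and falls back to the default style if 'ggplot' is missing; the names available_styles and chosen_style are illustrative, and the exact list depends on your Matplotlib version.

```python
import matplotlib.pyplot as plt

# Names of the style sheets shipped with the installed Matplotlib version.
available_styles = sorted(plt.style.available)
print(available_styles)

# Use 'ggplot' when it is available, otherwise keep the default look.
chosen_style = "ggplot" if "ggplot" in available_styles else "default"
plt.style.use(chosen_style)
```

Any name printed in the list can be passed to plt.style.use in place of 'ggplot'.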

Optionally, you can compute the skew of the data in Python using the SciPy library as follows:

from scipy.stats import skew

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

print(skew(x))

Once you run the code in Python, you’ll get the following skew:

0.4575278444409153
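For readers who want to see what is being computed, here is a minimal sketch of the skew calculation without SciPy. It assumes the default behaviour of scipy.stats.skew (bias=True), which is the third central moment divided by the second central moment raised to the power 1.5. The variable names are illustrative.

```python
x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

n = len(x)
mean = sum(x) / n

# Second and third central moments (dividing by n, not n - 1).
m2 = sum((value - mean) ** 2 for value in x) / n
m3 = sum((value - mean) ** 3 for value in x) / n

# Sample skewness: positive when the right tail is longer.
skewness = m3 / m2 ** 1.5
print(skewness)
```

A positive result indicates that the distribution has a longer tail on the right, towards the higher ages.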

Another way to determine the number of bins

Originally, you set the number of bins to 10 for simplicity.

Alternatively, you may derive the bins using the following formulas:

  • n = number of observations
  • Range = maximum value – minimum value
  • Number of intervals = √n
  • Width of intervals = Range / (Number of intervals)

These formulas can then be used to create the frequency table followed by the histogram.

Recall that our dataset contained the following 100 observations:

Age
1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
75, 77, 81, 83, 84, 87, 89, 90, 90, 91

Using our formulas:

  • n = number of observations = 100
  • Range = maximum value – minimum value = 91 – 1 = 90
  • Number of intervals = √n = √100 = 10
  • Width of intervals = Range / (Number of intervals) = 90 / 10 = 9
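The same calculation can be sketched in Python using only the standard library. Rounding √n to the nearest whole number is an assumption made here for datasets where n is not a perfect square; the variable names are illustrative.

```python
import math

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

n = len(x)                              # number of observations
data_range = max(x) - min(x)            # maximum value - minimum value
num_intervals = round(math.sqrt(n))     # square-root rule (rounded, by assumption)
width = data_range / num_intervals      # width of each interval

print(n, data_range, num_intervals, width)  # 100 90 10 9.0
```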

Based on this information, and rounding the interval width of 9 up to 10 so that the intervals start at round numbers, the frequency table would look like this:

Intervals (bins)    Frequency
0-9                 9
10-19               13
20-29               19
30-39               15
40-49               13
50-59               10
60-69               7
70-79               6
80-89               5
90-99               3
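The frequencies in the table can be reproduced with a short script. This is a minimal sketch that assumes intervals of width 10 starting at 0, as in the table; bin_width, counts, and frequencies are illustrative names.

```python
from collections import Counter

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

bin_width = 10

# Integer division maps each age to its interval index: 0-9 -> 0, 10-19 -> 1, ...
counts = Counter(value // bin_width for value in x)

frequencies = []
for index in range(10):
    low = index * bin_width
    high = low + bin_width - 1
    frequencies.append(counts[index])
    print(f"{low}-{high}: {counts[index]}")
```

The printed frequencies should add up to the 100 observations in the dataset.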

Note that the starting point for the first interval is 0, which is very close to the minimum observation of 1 in our dataset. If, for example, the minimum observation was 20 in another dataset, then the starting point for the first interval should be 20, rather than 0.

For the bins in the Python code below, you’ll need to specify the bin edges, that is, the starting value of each interval (0, 10, 20, and so on), rather than a single number (such as 10, which we used before). Don’t forget to include the last value of 99, which closes the final interval.

This is what the Python code would look like:

import matplotlib.pyplot as plt

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

plt.hist(x, bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99])
plt.show()

Run the code, and you’ll get the histogram.

You’ll notice that this histogram is similar to the one you saw earlier, and that it also shows a positive skew.
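Rather than typing the bin edges by hand, you could also build them programmatically. The sketch below assumes the same edges as above (0 to 90 in steps of 10, plus the closing value 99) and uses numpy.histogram to confirm that they reproduce the frequency table; bin_edges is an illustrative name.

```python
import numpy as np

x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
     10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
     18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
     25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
     29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
     36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
     43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
     51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
     61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
     75, 77, 81, 83, 84, 87, 89, 90, 90, 91
     ]

# Starting value of each interval, followed by the closing edge of the last one.
bin_edges = list(range(0, 100, 10)) + [99]

# Each bin includes its left edge and excludes its right edge,
# except the last bin, which includes both.
counts, edges = np.histogram(x, bins=bin_edges)

print(bin_edges)
print(counts.tolist())
```

The resulting bin_edges list can be passed directly as the bins argument of plt.hist.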