# Example of K-Means Clustering in Python

K-Means Clustering is an Unsupervised Learning algorithm. It finds groups (clusters) within unlabeled data by assigning each observation to one of k clusters, where each cluster is represented by a center point called a centroid.
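To make the idea concrete, here is a minimal from-scratch sketch of the classic K-Means procedure (Lloyd's algorithm) in NumPy. It is a simplified illustration, not scikit-learn's implementation (which, for example, uses k-means++ initialization by default). The function name `kmeans_lloyd`, its parameters, and the toy data are assumptions made for this example. The algorithm alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical helper: a minimal K-Means (Lloyd's algorithm) sketch."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations picked at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every centroid,
        # then label each point with the index of its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (a centroid with no points keeps its previous position)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Toy data (assumed for illustration): two well-separated groups of three points
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
centroids, labels = kmeans_lloyd(X, k=2)
print(centroids)
print(labels)
```

The order of the returned centroids depends on the random starting points, which is also why scikit-learn's output order can vary between runs.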

Topics to be covered:

• Creating a DataFrame for a two-dimensional dataset
• Finding the centroids of 3 clusters, and then of 4 clusters

## Example of K-Means Clustering in Python

To start, here is an example of a two-dimensional dataset:

```python
import pandas as pd

data = {
    "x": [25, 34, 22, 27, 33, 33, 31, 22, 35, 34, 67, 54, 57, 43, 50, 57, 59, 52, 65, 47, 49, 48, 35, 33, 44, 45, 38,
          43, 51, 46],
    "y": [79, 51, 53, 78, 59, 74, 73, 57, 69, 75, 51, 32, 40, 47, 53, 36, 35, 58, 59, 50, 25, 20, 14, 12, 20, 5, 29, 27,
          8, 7],
}

df = pd.DataFrame(data)

print(df)
```

Run the code in Python, and you’ll get the following DataFrame:

```
     x   y
0   25  79
1   34  51
2   22  53
3   27  78
4   33  59
5   33  74
6   31  73
7   22  57
8   35  69
9   34  75
10  67  51
11  54  32
12  57  40
13  43  47
14  50  53
15  57  36
16  59  35
17  52  58
18  65  59
19  47  50
20  49  25
21  48  20
22  35  14
23  33  12
24  44  20
25  45   5
26  38  29
27  43  27
28  51   8
29  46   7
```

### Find the Centroids of 3 Clusters

First, install the Matplotlib package, which will be used to plot the clusters:

`pip install matplotlib`

Then install the scikit-learn package (imported in Python as `sklearn`), which will be used to apply K-Means Clustering:

`pip install scikit-learn`

K-Means requires you to choose the number of clusters in advance, using the `n_clusters` parameter. For example, to look for 3 clusters:

`KMeans(n_clusters=3)`
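To make the result repeatable, you can also pass a couple of optional parameters. A minimal sketch, assuming you want the same output on every run: `n_init` and `random_state` (neither appears in the original code) are standard parameters of scikit-learn's `KMeans`; the values 10 and 0 are arbitrary choices.

```python
import pandas as pd
from sklearn.cluster import KMeans

data = {
    "x": [25, 34, 22, 27, 33, 33, 31, 22, 35, 34, 67, 54, 57, 43, 50, 57, 59, 52, 65, 47, 49, 48, 35, 33, 44, 45, 38,
          43, 51, 46],
    "y": [79, 51, 53, 78, 59, 74, 73, 57, 69, 75, 51, 32, 40, 47, 53, 36, 35, 58, 59, 50, 25, 20, 14, 12, 20, 5, 29, 27,
          8, 7],
}
df = pd.DataFrame(data)

# n_clusters: how many clusters to look for
# n_init: run K-Means 10 times from different starting centroids and keep the run with the lowest inertia
# random_state: fix the random seed so that every run of this script gives the same result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(df)

print(kmeans.cluster_centers_)  # one row per cluster
print(kmeans.labels_)           # cluster label (0, 1 or 2) of each observation
```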

The complete code to find the centroids of 3 clusters:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = {
    "x": [25, 34, 22, 27, 33, 33, 31, 22, 35, 34, 67, 54, 57, 43, 50, 57, 59, 52, 65, 47, 49, 48, 35, 33, 44, 45, 38,
          43, 51, 46],
    "y": [79, 51, 53, 78, 59, 74, 73, 57, 69, 75, 51, 32, 40, 47, 53, 36, 35, 58, 59, 50, 25, 20, 14, 12, 20, 5, 29, 27,
          8, 7],
}

df = pd.DataFrame(data)

# Fit K-Means with 3 clusters
kmeans = KMeans(n_clusters=3)
kmeans.fit(df)

# Coordinates of the cluster centers, one row per cluster
centroids = kmeans.cluster_centers_
print(centroids)

# Plot the observations colored by cluster label, then the centroids in red
plt.scatter(df["x"], df["y"], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red", s=50)
plt.show()
```

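Run the code in Python, and you’ll see a chart of the 3 clusters, with their 3 distinct centroids shown as red dots. The printed centroid coordinates are below; because K-Means starts from randomly chosen centroids, the order of the rows may differ from run to run: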

```
[[29.6  66.8]
 [43.2  16.7]
 [55.1  46.1]]
```

Note that the center of each cluster represents the mean of all the observations that belong to that cluster.
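You can check this directly. The sketch below reuses the dataset above, groups the observations by the label K-Means assigned to them (`kmeans.labels_`), and averages each group with pandas `groupby`. The `n_init` and `random_state` values are assumptions added only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = {
    "x": [25, 34, 22, 27, 33, 33, 31, 22, 35, 34, 67, 54, 57, 43, 50, 57, 59, 52, 65, 47, 49, 48, 35, 33, 44, 45, 38,
          43, 51, 46],
    "y": [79, 51, 53, 78, 59, 74, 73, 57, 69, 75, 51, 32, 40, 47, 53, 36, 35, 58, 59, 50, 25, 20, 14, 12, 20, 5, 29, 27,
          8, 7],
}
df = pd.DataFrame(data)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)

# Average x and y of the observations in each cluster (rows ordered by label 0, 1, 2)
cluster_means = df.groupby(kmeans.labels_).mean()
print(cluster_means)

# Row i of cluster_centers_ is the centroid of label i, so the two should agree
print(np.allclose(cluster_means.to_numpy(), kmeans.cluster_centers_))
```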

Additionally, each observation is assigned to the cluster whose centroid is closest to it (by Euclidean distance), so it is nearer to the center of its own cluster than to the centers of the other clusters.
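The sketch below checks this with scikit-learn's `KMeans.transform`, which returns the distance from each observation to every centroid, and `KMeans.predict`, which assigns new observations to the nearest centroid. The point (30, 70) is a hypothetical new observation, and `n_init`/`random_state` are again assumptions added for repeatability.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = {
    "x": [25, 34, 22, 27, 33, 33, 31, 22, 35, 34, 67, 54, 57, 43, 50, 57, 59, 52, 65, 47, 49, 48, 35, 33, 44, 45, 38,
          43, 51, 46],
    "y": [79, 51, 53, 78, 59, 74, 73, 57, 69, 75, 51, 32, 40, 47, 53, 36, 35, 58, 59, 50, 25, 20, 14, 12, 20, 5, 29, 27,
          8, 7],
}
df = pd.DataFrame(data)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)

# Distance from each observation to each centroid: shape (n_observations, n_clusters)
distances = kmeans.transform(df)

# Index of the closest centroid for every observation
nearest = distances.argmin(axis=1)

# This matches the labels K-Means assigned during fitting
print(np.array_equal(nearest, kmeans.labels_))

# A hypothetical new point is assigned to its nearest centroid as well
new_label = kmeans.predict(pd.DataFrame({"x": [30], "y": [70]}))
print(new_label)
```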

### Find the Centroids of 4 Clusters

To find 4 clusters instead, change `n_clusters` from 3 to 4:

`KMeans(n_clusters=4)`

The full Python code for 4 clusters:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = {
    "x": [25, 34, 22, 27, 33, 33, 31, 22, 35, 34, 67, 54, 57, 43, 50, 57, 59, 52, 65, 47, 49, 48, 35, 33, 44, 45, 38,
          43, 51, 46],
    "y": [79, 51, 53, 78, 59, 74, 73, 57, 69, 75, 51, 32, 40, 47, 53, 36, 35, 58, 59, 50, 25, 20, 14, 12, 20, 5, 29, 27,
          8, 7],
}

df = pd.DataFrame(data)

# Fit K-Means with 4 clusters
kmeans = KMeans(n_clusters=4)
kmeans.fit(df)

# Coordinates of the cluster centers, one row per cluster
centroids = kmeans.cluster_centers_
print(centroids)

# Plot the observations colored by cluster label, then the centroids in red
plt.scatter(df["x"], df["y"], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
plt.scatter(centroids[:, 0], centroids[:, 1], c="red", s=50)
plt.show()
```

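Run the code, and you’ll now see 4 clusters with 4 distinct centroids. Two of the centroids are the same as in the 3-cluster run, while the upper-left cluster (centroid [29.6, 66.8]) has been split in two: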

```
[[27.75       55.        ]
 [43.2        16.7       ]
 [55.1        46.1       ]
 [30.83333333 74.66666667]]
```
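To compare the 3-cluster and 4-cluster solutions numerically, one common heuristic (the "elbow method", not part of the original tutorial) is to look at `inertia_`, the sum of squared distances from each observation to its nearest centroid, for several values of k. Inertia generally decreases as k grows, so the idea is to look for the k after which adding another cluster stops reducing it much. The range 1 to 6 and the `n_init`/`random_state` values are arbitrary assumptions for this sketch.

```python
import pandas as pd
from sklearn.cluster import KMeans

data = {
    "x": [25, 34, 22, 27, 33, 33, 31, 22, 35, 34, 67, 54, 57, 43, 50, 57, 59, 52, 65, 47, 49, 48, 35, 33, 44, 45, 38,
          43, 51, 46],
    "y": [79, 51, 53, 78, 59, 74, 73, 57, 69, 75, 51, 32, 40, 47, 53, 36, 35, 58, 59, 50, 25, 20, 14, 12, 20, 5, 29, 27,
          8, 7],
}
df = pd.DataFrame(data)

# inertia_: sum of squared distances from each observation to its nearest centroid
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(df)
    inertias[k] = km.inertia_
    print(k, round(km.inertia_, 1))
```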

That’s it. You can learn more about applying K-Means Clustering in Python in the scikit-learn documentation.