# Example of K-Means Clustering in Python

K-Means Clustering is a concept that falls under Unsupervised Learning. This algorithm can be used to find groups within unlabeled data. To demonstrate this concept, I’ll review a simple example of K-Means Clustering in Python.

Topics to be covered:

• Creating the DataFrame for a two-dimensional dataset
• Finding the centroids for 3 clusters, and then for 4 clusters
• Adding a graphical user interface (GUI) to display the results

By the end of this tutorial, you’ll be able to create a GUI in Python that displays the clusters and their centroids.

## Example of K-Means Clustering in Python

To start, let’s review a simple example with a two-dimensional dataset of 30 observations, each with an x value and a y value.

You can capture this data in Python using a pandas DataFrame:

```python
from pandas import DataFrame

# Two-dimensional dataset: 30 observations with x and y coordinates
Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }

df = DataFrame(Data, columns=['x', 'y'])
print(df)
```

If you run the code in Python, the printed DataFrame will match the dataset above.

Next, you’ll see how to use sklearn to find the centroids for 3 clusters, and then for 4 clusters.

### K-Means Clustering in Python – 3 clusters

Once you’ve created the DataFrame based on the above data, you’ll need to import 2 additional Python modules:

• matplotlib, for plotting the clusters
• sklearn, for applying K-Means Clustering

In the code below, you can specify the number of clusters. For this example, assign 3 clusters as follows:

KMeans(n_clusters=3).fit(df)

```python
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }

df = DataFrame(Data, columns=['x', 'y'])

# Fit K-Means with 3 clusters and retrieve the centroid coordinates
kmeans = KMeans(n_clusters=3).fit(df)
centroids = kmeans.cluster_centers_
print(centroids)

# Color each observation by its cluster label, then overlay the centroids in red
plt.scatter(df['x'], df['y'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
plt.show()
```

Run the code in Python, and you’ll see 3 clusters with 3 distinct centroids.

Note that the center of each cluster (in red) represents the mean of all the observations that belong to that cluster.

As you may also see, the observations that belong to a given cluster are closer to the center of that cluster, in comparison to the centers of other clusters.
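Both properties can be checked numerically rather than by eye. The snippet below is a minimal sketch, not part of the original example: the `n_init=10` and `random_state=0` arguments are assumptions added only so that repeated runs give the same result, and the variable names `member_means` and `nearest` are illustrative.

```python
import numpy as np
from pandas import DataFrame
from sklearn.cluster import KMeans

Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }
df = DataFrame(Data, columns=['x', 'y'])

# n_init and random_state are assumptions, added here for repeatable runs
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)
centroids = kmeans.cluster_centers_
labels = kmeans.labels_

# Check 1: each centroid is the mean of the observations assigned to it
member_means = np.array([df[labels == k].mean().to_numpy() for k in range(3)])
print(np.allclose(centroids, member_means))

# Check 2: each observation is nearest to the centroid of its own cluster
points = df.to_numpy(dtype=float)
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
nearest = distances.argmin(axis=1)
print(np.array_equal(nearest, labels))
```

If the fit has converged, both checks should print True. Keep in mind that the cluster numbering (0, 1, 2) is arbitrary and can change between runs when `random_state` is not fixed.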

### K-Means Clustering in Python – 4 clusters

Let’s now see what would happen if you use 4 clusters instead. In that case, the only thing that you’ll need to do is to change the n_clusters from 3 to 4:

KMeans(n_clusters=4).fit(df)
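Since `n_clusters` is the only thing that changes, the two fits can also be produced in a loop and compared side by side. This is a rough sketch under a few assumptions: `n_init=10` and `random_state=0` are added for repeatability, the `results` dictionary is an illustrative name, and `inertia_` (sklearn's sum of squared distances from each observation to its closest centroid) is used here as one possible way to compare the fits.

```python
from pandas import DataFrame
from sklearn.cluster import KMeans

Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }
df = DataFrame(Data, columns=['x', 'y'])

results = {}
for k in (3, 4):
    # n_init and random_state are assumptions, added here for repeatable runs
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(df)
    results[k] = model
    print(f"{k} clusters, inertia = {model.inertia_:.1f}")
    print(model.cluster_centers_)
```

Inertia typically falls as the number of clusters grows, so a lower value for 4 clusters does not by itself mean that 4 is the better choice.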

The full Python code for 4 clusters would then look like this:

```python
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }

df = DataFrame(Data, columns=['x', 'y'])

# Fit K-Means with 4 clusters and retrieve the centroid coordinates
kmeans = KMeans(n_clusters=4).fit(df)
centroids = kmeans.cluster_centers_
print(centroids)

# Color each observation by its cluster label, then overlay the centroids in red
plt.scatter(df['x'], df['y'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
plt.show()
```

Run the code for 4 clusters, and you’ll now see 4 clusters with 4 distinct centroids.

### Tkinter GUI to Display the Results

You can use the tkinter module in Python to display the clusters on a simple graphical user interface.

This is the code that you can use (for 3 clusters):

```python
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import tkinter as tk
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg

Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }

df = DataFrame(Data, columns=['x', 'y'])

kmeans = KMeans(n_clusters=3).fit(df)
centroids = kmeans.cluster_centers_

root = tk.Tk()

# Small canvas that holds a label with the centroid coordinates
canvas1 = tk.Canvas(root, width=100, height=100)
canvas1.pack()

label1 = tk.Label(root, text=str(centroids), justify='center')
canvas1.create_window(70, 50, window=label1)

# Create the figure and the axes that the scatter calls draw on
figure1 = plt.Figure(figsize=(5, 4), dpi=100)
ax1 = figure1.add_subplot(111)
ax1.scatter(df['x'], df['y'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
ax1.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)

# Embed the figure in the tkinter window
scatter1 = FigureCanvasTkAgg(figure1, root)
scatter1.get_tk_widget().pack(side=tk.LEFT, fill=tk.BOTH)

root.mainloop()
```
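The GUI code draws on a Figure object and its own axes instead of calling `plt.scatter` directly, which is what allows the plot to be embedded in a window. To try that pattern without opening a window, the sketch below renders a scatter plot off-screen with matplotlib's Agg canvas. The helper name `render_scatter_png` is hypothetical, and only the first few observations of the dataset are used to keep the example short.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# First five observations of the dataset, used only for illustration
x = [25, 34, 22, 27, 33]
y = [79, 51, 53, 78, 59]

def render_scatter_png(x, y):
    """Draw a scatter plot on a standalone Figure and return it as PNG bytes."""
    figure = Figure(figsize=(5, 4), dpi=100)
    FigureCanvasAgg(figure)          # attach an off-screen canvas instead of a Tk one
    ax = figure.add_subplot(111)     # the axes must exist before scatter is called
    ax.scatter(x, y, s=50, alpha=0.5)
    buffer = io.BytesIO()
    figure.savefig(buffer, format='png')
    return buffer.getvalue()

png_bytes = render_scatter_png(x, y)
print(len(png_bytes), 'bytes of PNG data')
```

In the tkinter version, `FigureCanvasTkAgg` plays the role that `FigureCanvasAgg` plays here, while the figure and axes are created in the same way.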

When you run the tkinter code, a window opens showing the centroids and the plot of the clusters.

In the final section of this tutorial, I’ll share the code to create a more advanced tkinter GUI that will allow you to:

• Import an Excel file with a two-dimensional dataset
• Type the number of clusters needed
• Display the clusters and centroids

Here is the full Python code:

```python
import tkinter as tk
from tkinter import filedialog
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg

root = tk.Tk()

canvas1 = tk.Canvas(root, width=400, height=300, relief='raised')
canvas1.pack()

label1 = tk.Label(root, text='k-Means Clustering')
label1.config(font=('helvetica', 14))
canvas1.create_window(200, 25, window=label1)

label2 = tk.Label(root, text='Type Number of Clusters:')
label2.config(font=('helvetica', 8))
canvas1.create_window(200, 120, window=label2)

entry1 = tk.Entry(root)
canvas1.create_window(200, 140, window=entry1)

def getExcel():
    global df
    # Assumed implementation: pick a file in a dialog, then load its x and y columns
    import_file_path = filedialog.askopenfilename()
    read_file = pd.read_excel(import_file_path)
    df = DataFrame(read_file, columns=['x', 'y'])

browseButtonExcel = tk.Button(text=" Import Excel File ", command=getExcel, bg='green', fg='white', font=('helvetica', 10, 'bold'))
canvas1.create_window(200, 70, window=browseButtonExcel)

def getKMeans():
    global df
    global numberOfClusters
    numberOfClusters = int(entry1.get())

    kmeans = KMeans(n_clusters=numberOfClusters).fit(df)
    centroids = kmeans.cluster_centers_

    # Show the centroid coordinates below the buttons
    label3 = tk.Label(root, text=str(centroids))
    canvas1.create_window(200, 250, window=label3)

    # Create the figure and the axes that the scatter calls draw on
    figure1 = plt.Figure(figsize=(4, 3), dpi=100)
    ax1 = figure1.add_subplot(111)
    ax1.scatter(df['x'], df['y'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
    ax1.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
    scatter1 = FigureCanvasTkAgg(figure1, root)
    scatter1.get_tk_widget().pack(side=tk.RIGHT, fill=tk.BOTH)

processButton = tk.Button(text=' Process k-Means ', command=getKMeans, bg='brown', fg='white', font=('helvetica', 10, 'bold'))
canvas1.create_window(200, 170, window=processButton)

root.mainloop()
```
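One fragile spot in the GUI code is `int(entry1.get())`, which raises an error if the entry box is empty or contains text that is not a whole number. The sketch below shows one possible way to validate the input before fitting. The helper `parse_cluster_count` is hypothetical and not part of the original code; it also assumes that the number of clusters should not exceed the number of observations.

```python
def parse_cluster_count(text, n_samples):
    """Return the number of clusters as an int, or raise ValueError with a readable message."""
    try:
        k = int(text.strip())
    except ValueError:
        raise ValueError(f"'{text}' is not a whole number") from None
    if k < 1 or k > n_samples:
        raise ValueError(f"number of clusters must be between 1 and {n_samples}")
    return k

# Valid input, with surrounding spaces as a user might type it
print(parse_cluster_count(" 3 ", 30))

# Inputs that should be rejected for a dataset of 30 observations
for bad in ("", "three", "0", "31"):
    try:
        parse_cluster_count(bad, 30)
    except ValueError as error:
        print(f"rejected {bad!r}: {error}")
```

Inside `getKMeans`, a call such as `parse_cluster_count(entry1.get(), len(df))` could stand in for `int(entry1.get())`, with the error message shown in a label instead of being printed.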

Before you run the GUI code, you’ll need to store your two-dimensional dataset in an Excel file. For example, you may copy the dataset below into an Excel file:

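| x | y |
|---|---|
| 25 | 79 |
| 34 | 51 |
| 22 | 53 |
| 27 | 78 |
| 33 | 59 |
| 33 | 74 |
| 31 | 73 |
| 22 | 57 |
| 35 | 69 |
| 34 | 75 |
| 67 | 51 |
| 54 | 32 |
| 57 | 40 |
| 43 | 47 |
| 50 | 53 |
| 57 | 36 |
| 59 | 35 |
| 52 | 58 |
| 65 | 59 |
| 47 | 50 |
| 49 | 25 |
| 48 | 20 |
| 35 | 14 |
| 33 | 12 |
| 44 | 20 |
| 45 | 5 |
| 38 | 29 |
| 43 | 27 |
| 51 | 8 |
| 46 | 7 |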

Once copied into Excel, the data should occupy two columns, with the headers x and y in the first row.

Next, run the Python code to open the GUI. Press the green button to import your Excel file (a dialog box will open to help you locate and import the file).

Once you’ve imported the Excel file, type the number of clusters in the entry box, and then click the brown "Process k-Means" button. For instance, typing 3 in the entry box displays 3 clusters together with their centroids.

That’s it. You can learn more about the application of K-Means Clustering in Python by visiting the sklearn documentation.