In this guide, you’ll see how to get from SQL to Pandas DataFrame.
Here are the steps that you may follow.
Steps to get from SQL to Pandas DataFrame
Step 1: Create a database and table
For demonstration purposes, let’s create a database in Python using the sqlite3 package, where:
- The database name would be: test_database
- The database would contain a single table called: products
- The ‘products’ table would have 3 columns with the following information:
product_id | product_name | price |
1 | Computer | 800 |
2 | Printer | 200 |
3 | Tablet | 300 |
4 | Desk | 450 |
5 | Chair | 150 |
Here is the complete script to create the database and table in Python:
import sqlite3 conn = sqlite3.connect('test_database') c = conn.cursor() c.execute(''' CREATE TABLE IF NOT EXISTS products ([product_id] INTEGER PRIMARY KEY, [product_name] TEXT, [price] INTEGER) ''') c.execute(''' INSERT INTO products (product_id, product_name, price) VALUES (1,'Computer',800), (2,'Printer',200), (3,'Tablet',300), (4,'Desk',450), (5,'Chair',150) ''') conn.commit()
Once you run the above script in Python, a new database called ‘test_database‘ would be created at the same location where you saved your Python script.
Step 2: Get from SQL to Pandas DataFrame
Now you should be able to get from SQL to Pandas DataFrame using pd.read_sql_query:
import sqlite3
import pandas as pd
conn = sqlite3.connect('test_database')
sql_query = pd.read_sql_query ('''
SELECT
*
FROM products
''', conn)
df = pd.DataFrame(sql_query, columns = ['product_id', 'product_name', 'price'])
print (df)
When applying pd.read_sql_query, don’t forget to place the connection string variable at the end. In our case, the connection string variable is conn.
Once you run the script in Python, you’ll get the following DataFrame:
product_id product_name price
0 1 Computer 800
1 2 Printer 200
2 3 Tablet 300
3 4 Desk 450
4 5 Chair 150
Alternatively, you may use the approach below to get from SQL to a DataFrame:
import sqlite3 import pandas as pd conn = sqlite3.connect('test_database') c = conn.cursor() c.execute(''' SELECT * FROM products ''') df = pd.DataFrame(c.fetchall(), columns = ['product_id', 'product_name', 'price']) print (df)
You’ll now get the same DataFrame:
product_id product_name price
0 1 Computer 800
1 2 Printer 200
2 3 Tablet 300
3 4 Desk 450
4 5 Chair 150
Step 3 (optional): Find the maximum value using Pandas
After you got your DataFrame, you may apply different computations.
For example, you can find the maximum price by adding the following syntax:
max_price = df['price'].max() print (max_price)
Putting everything together:
import sqlite3 import pandas as pd conn = sqlite3.connect('test_database') c = conn.cursor() c.execute(''' SELECT * FROM products ''') df = pd.DataFrame(c.fetchall(), columns = ['product_id', 'product_name', 'price']) max_price = df['price'].max() print (max_price)
As you can see, the maximum price is 800:
800