Need to create a database in Python? If so, I’ll show you an example with the steps to create a database in Python using sqlite3.
But before we begin, here is a template that you can use to create a database in Python using sqlite3:
import sqlite3 sqlite3.connect('Type your DataBase name here.db')
Steps to Create a Database in Python using sqlite3
Step 1: Create the Database and Tables
In this step, you’ll see how to create:
- A new database called: TestDB
- 3 tables called: CLIENTS, COUNTRY, and DAILY_STATUS
Here are the fields to be added for each of the 3 tables:
Table Name | Field Name | Field Format |
CLIENTS | generated_id | Integer – Primary Key |
CLIENTS | Client_Name | Text |
CLIENTS | Country_ID | Integer |
CLIENTS | Date | Date |
COUNTRY | generated_id | Integer – Primary Key |
COUNTRY | Country_ID | Integer |
COUNTRY | Country_Name | Text |
DAILY_STATUS | Client_Name | Text |
DAILY_STATUS | Country_Name | Text |
DAILY_STATUS | Date | Date |
Below is the code that you can use in order to create the database and 3 tables using sqlite3. Additional comments are available within the code itself to provide further insights.
import sqlite3 conn = sqlite3.connect('TestDB.db') # You can create a new database by changing the name within the quotes c = conn.cursor() # The database will be saved in the location where your 'py' file is saved # Create table - CLIENTS c.execute('''CREATE TABLE CLIENTS ([generated_id] INTEGER PRIMARY KEY,[Client_Name] text, [Country_ID] integer, [Date] date)''') # Create table - COUNTRY c.execute('''CREATE TABLE COUNTRY ([generated_id] INTEGER PRIMARY KEY,[Country_ID] integer, [Country_Name] text)''') # Create table - DAILY_STATUS c.execute('''CREATE TABLE DAILY_STATUS ([Client_Name] text, [Country_Name] text, [Date] date)''') conn.commit() # Note that the syntax to create new tables should only be used once in the code (unless you dropped the table/s at the end of the code). # The [generated_id] column is used to set an auto-increment ID for each record # When creating a new table, you can add both the field names as well as the field formats (e.g., Text)
Once you run the above code in Python, a new file, called TestDB, will get created at the same location where you saved your Python script.
Step 2: Import the Data using Pandas
For this step, let’s assume that you have 2 CSV files that you’d like to import into Python:
(1) The first CSV file is called Client_14-JAN-2019 and has the following client data (on 14-Jan-2019):
Client_Name | Country_ID | Date |
Jon Smith | 1 | 14012019 |
Bill Martin | 2 | 14012019 |
Maria Blue | 3 | 14012019 |
Rita Yu | 4 | 14012019 |
Jack Mo | 5 | 14012019 |
(2) The second CSV file is called Country_14-JAN-2019 and has this country data (on 14-Jan-2019):
Country_ID | Country_Name |
1 | Japan |
2 | US |
3 | Canada |
4 | Brazil |
5 | UK |
6 | Spain |
7 | China |
8 | Italy |
9 | Peru |
10 | Russia |
The imported client data will be stored in the CLIENTS table. While the imported country data will be stored in the COUNTRY table. You’ll then be able to link the data from the CLIENTS table to the data from the COUNTRY table via the Country_ID field.
The ultimate goal is to display the daily status (of the client and country names for a given date) after connecting the CLIENTS and COUNTRY tables together.
To accomplish this goal, you’ll also need to install the pandas package. The pandas package is an essential component that will be used to:
- Import the CSV files using the read_csv command
- Assign the values imported from the CSV files into the tables using the to_sql command
- Assign the SQL fields into the DataFrame
- Export the final results into a CSV file using the to_csv command
The 2 files to be imported into Python are currently stored on my machine under the following paths:
C:\Users\Ron\Desktop\Client\Client_14-JAN-2019.csv
C:\Users\Ron\Desktop\Client\Country_14-JAN-2019.csv
You’ll need to adjust the Python code below to the paths where your CSV files are stored on your machine:
import sqlite3 import pandas as pd from pandas import DataFrame conn = sqlite3.connect('TestDB.db') c = conn.cursor() read_clients = pd.read_csv (r'C:\Users\Ron\Desktop\Client\Client_14-JAN-2019.csv') read_clients.to_sql('CLIENTS', conn, if_exists='append', index = False) # Insert the values from the csv file into the table 'CLIENTS' read_country = pd.read_csv (r'C:\Users\Ron\Desktop\Client\Country_14-JAN-2019.csv') read_country.to_sql('COUNTRY', conn, if_exists='replace', index = False) # Replace the values from the csv file into the table 'COUNTRY' # When reading the csv: # - Place 'r' before the path string to read any special characters, such as '\' # - Don't forget to put the file name at the end of the path + '.csv' # - Before running the code, make sure that the column names in the CSV files match with the column names in the tables created and in the query below # - If needed make sure that all the columns are in a TEXT format c.execute(''' INSERT INTO DAILY_STATUS (Client_Name,Country_Name,Date) SELECT DISTINCT clt.Client_Name, ctr.Country_Name, clt.Date FROM CLIENTS clt LEFT JOIN COUNTRY ctr ON clt.Country_ID = ctr.Country_ID ''') c.execute(''' SELECT DISTINCT * FROM DAILY_STATUS WHERE Date = (SELECT max(Date) FROM DAILY_STATUS) ''') #print(c.fetchall()) df = DataFrame(c.fetchall(), columns=['Client_Name','Country_Name','Date']) print (df) # To display the results after an insert query, you'll need to add this type of syntax above: 'c.execute(''' SELECT * from latest table ''') df.to_sql('DAILY_STATUS', conn, if_exists='append', index = False) # Insert the values from the INSERT QUERY into the table 'DAILY_STATUS' # export_csv = df.to_csv (r'C:\Users\Ron\Desktop\Client\export_list.csv', index = None, header=True) # Uncomment this syntax if you wish to export the results to CSV. Make sure to adjust the path name # Don't forget to add '.csv' at the end of the path (as well as r at the beg to address special characters)
When you run the code in Python, you’ll get the results that reflect the daily status on 14-Jan-2019:
In the code above, you’ll notice that we used an INSERT INTO statement to store the linked data (i.e., linked client and country data) into the DAILY_STATUS table.
We also used a WHERE condition to display the records that correspond to the maximum date. These results will constitute our daily status.
Step 3: Run the code for a Subsequent Date
Now imagine that you have 2 new CSV files for the subsequent date of 15-Jan-2019:
(1) The first CSV file is called Client_15-JAN-2019 and has the following client data (on 15-Jan-2019):
Client_Name | Country_ID | Date |
Ron Green | 3 | 15012019 |
Jeff Long | 8 | 15012019 |
Carrie Lan | 3 | 15012019 |
Marry Sig | 5 | 15012019 |
Ben Baker | 9 | 15012019 |
(2) The second CSV file is called Country_15-JAN-2019 and has this country data (on 15-Jan-2019):
Country_ID | Country_Name |
1 | Japan |
2 | US |
3 | Canada |
4 | Brazil |
5 | UK |
6 | Spain |
7 | China |
8 | Italy |
9 | Peru |
10 | Russia |
11 | Mexico |
12 | Germany |
Below is the code that you’ll need to run for the second date.
And as before, you’ll need to adjust the paths to the location where your CSV files are stored on your machine. Make sure that you are using the files for 15-JAN_2019:
import sqlite3 import pandas as pd from pandas import DataFrame conn = sqlite3.connect('TestDB.db') c = conn.cursor() read_clients = pd.read_csv (r'C:\Users\Ron\Desktop\Client\Client_15-JAN-2019.csv') read_clients.to_sql('CLIENTS', conn, if_exists='append', index = False) read_country = pd.read_csv (r'C:\Users\Ron\Desktop\Client\Country_15-JAN-2019.csv') read_country.to_sql('COUNTRY', conn, if_exists='replace', index = False) c.execute(''' INSERT INTO DAILY_STATUS (Client_Name,Country_Name,Date) SELECT DISTINCT clt.Client_Name, ctr.Country_Name, clt.Date FROM CLIENTS clt LEFT JOIN COUNTRY ctr ON clt.Country_ID = ctr.Country_ID ''') c.execute(''' SELECT DISTINCT * FROM DAILY_STATUS WHERE Date = (SELECT max(Date) FROM DAILY_STATUS) ''') df = DataFrame(c.fetchall(), columns=['Client_Name','Country_Name','Date']) print (df) df.to_sql('DAILY_STATUS', conn, if_exists='append', index = False) # export_csv = df.to_csv (r'C:\Users\Ron\Desktop\Client\export_list.csv', index = None, header=True)
Once you run the above code in Python, you’ll get the results that reflect the daily status on 15-Jan-2019:
Addressing Common Errors
After you applied the code to create the tables (see step 1), do not include the syntax to create the tables again. Otherwise, you’ll get the following error:
“OperationalError: table CLIENTS already exists.”
In that case, you may want to start over by creating a new database (e.g., by changing the database name from ‘TestDB.db’ to ‘TestDB1.db’) and then rerunning all the codes from the start.
Also make sure that no formatting is applied when copying the data into the CSV files. You may then save the data in the CSV files using a TEXT format, just in case you run into the following error:
“OperationalError: table CLIENTS has no column named Unnamed: 3.”
Additional Sources
You just saw how to create a database in Python using the sqlite3 package. You may want to check the following articles to learn more about connecting Python with different database applications: