Using Python as SQL with Pandas DataFrame

2 min readNov 28, 2020

SQL is very easy to use, but what if you need to achieve the same operations in Python? Here are the corresponding operations in Python using Pandas DataFrame(DF). All examples are written as in SQL Server.

Below including DISTINCT, WHERE, LIKE, IN/NOT IN, ORDER BY, GROUP BY, CASE WHEN, ROW_NUMBER(), JOIN, UNION ALL/UNION, COUNT, DROP COLUMN, and RENAME COLUMN.

Distinct. This operation drops the duplicates in the specified columns in the DF.

SQL:    select distinct col1, col2 from df
Python: df_dedup = df.drop_duplicates(subset= [‘col1’, ‘col2’])

Where

SQL:    select * from df where col1=value1 and col2=value2
Python: df[ (df[‘col1’]==’value1’) & (df[‘col2’]=='value2’)]

Like. Python is case sensitive, SQL is not case sensitive.

SQL:    select * from df where col like '%abc%'
Python: df[df[‘col’].str.contains(‘ABC’, na=False, regex = False)]

In/Not in

SQL: select * from df where col in (value1, value2)
     select * from df where col not in (value1, value2)Python:
in - df[df[‘col’].isin([‘value1’,’value2’])]
not in - df[~df[‘col’].isin([‘value1’,’value2’])]

Order by

SQL:    select * from df order by col1, col2 desc
Python: df.sort_values(['col1’,’col2’],ascending=[True,False])

Group by

SQL:    select avg(*) from df group by col
Python: df.groupby(‘col’).mean().reset_index()

Case When.

SQL: Select 
     case when ConditionCol <= 550 then 'low'
          when ConditionCol>550 and ConditionCol<720 then 'medium'
          when ConditionCol>=720 then 'high' end as newCOl
     from dfPython:
df['newCol'] = df['Col']
df.loc[df['ConditionCol']<=550, 'newCol'] = 'low'
df.loc[df['ConditionCol'].between(550,720, inclusive = False), 'newCol'] = 'medium'
df.loc[df['ConditionCol']>=720, 'newCol'] = 'high'

Window function row_number()

SQL: select row_number() over(partition by partitionCol order by orderCol) as newCol from dfPython: 
df[‘newCol’] = df.groupby(‘partitionCol’)[‘orderCol’].rank(method = ‘first’)

Join. pd.merge has four types in the argument ‘how’: left, right, inner, outer. Use the arguments left_on, right_on when the common columns are named differently.

SQL:
select * from pd1 full join pd2 on pd1.id = pd2.idPython: 
df_join = pd.merge(df1, df2, how = ‘outer’, on = ‘Id’)

Union all/ Union

SQL: 
select * from df1 union all select * from df2
select * from df1 union select * from df2Python:
# union all
df_unionall = pd.concat([df1, df2]).reset_index(drop = True)
# union
df_union = pd.concat([df1, df2]).reset_index(drop = True).drop_duplicates()

Count. Python DF.value_counts will only return the selected column and the count of each value.

SQL     select col, count(*) from df group by col
Python: df[‘col’].value_counts().rename_axis(‘col’).reset_index(name = ‘counts’)

Drop column

SQL: alter table df
     drop column col1, col2Python:
ColtoDrop = [‘col1’,’col2’]
df.drop(columns=ColtoDrop)

Rename column

SQL: select col newName, col2, col3...from df
Python: df.rename(columns={‘col’:’newName’}) # rename returns a new df with the new columns name. Use parameter "inplace=True" to change the columns name inplace, this will return None.

Hope this helps and I will continue to add to this list.

Using Python as SQL with Pandas DataFrame

Written by Ruoxinli