SQL vs Pandas Operation Comparison (en/de/zh)

In data analysis and processing, Pandas and SQL each have their own strengths and ideal use cases.

SQL excels at efficiently querying, filtering, aggregating, and joining large datasets stored in databases,while Pandas offers flexible, Python-based tools for data cleaning, exploration, and visualization.

In real-world workflows, analysts often switch between the two — for example, using SQL to extract raw data, then Pandas to perform deeper analysis. Some people may feel more comfortable with SQL, others with Pandas, but learning to translate commands/functions between them helps bridge that gap. By comparing their syntax and logic, you can not only understand both tools more intuitively but also choose the most efficient one for each task.

Below is a list of the most common operations, showing their names in English (EN), German (DE), and Chinese (ZH), along with the corresponding SQL and Pandas commands/functions.

🧱 Basic Operations

English	Deutsch	中文	SQL	Pandas
Read Data	Daten lesen	读取数据	`SELECT * FROM table;`	`df = pd.read_csv('file.csv')`
View first rows	Erste Zeilen anzeigen	查看前几行	`SELECT * FROM table LIMIT 5;`	`df.head(5)`
View structure	Struktur anzeigen	查看表结构	`DESCRIBE table;`	`df.info()`
Summary statistics	Zusammenfassung	查看统计摘要	`SELECT AVG(col), MIN(col), MAX(col) FROM table;`	`df.describe()`

🎯 Selection and Filtering

English	Deutsch	中文	SQL	Pandas
Select columns	Spalten auswählen	选择列	`SELECT col1, col2 FROM table;`	`df[['col1', 'col2']]`
Filter rows	Zeilen filtern	条件过滤	`SELECT * FROM table WHERE col > 10;`	`df[df['col'] > 10]`
Multiple conditions	Mehrere Bedingungen	多条件过滤	`WHERE col1 > 10 AND col2 = 'A'`	`df[(df['col1'] > 10) & (df['col2'] == 'A')]`
Sort rows	Sortieren	排序	`ORDER BY col DESC`	`df.sort_values('col', ascending=False)`
Remove duplicates	Duplikate entfernen	去重	`SELECT DISTINCT col FROM table;`	`df['col'].drop_duplicates()`

🔄 Aggregation and Grouping

English	Deutsch	中文	SQL	Pandas
Group and aggregate	Gruppieren und Aggregieren	分组聚合	`SELECT col, AVG(val) FROM table GROUP BY col;`	`df.groupby('col')['val'].mean()`
Multiple aggregations	Mehrere Aggregationen	多重聚合	`SELECT col, COUNT(*), SUM(val) FROM table GROUP BY col;`	`df.groupby('col').agg({'val': ['count', 'sum']})`
Count rows	Zeilen zählen	计数	`SELECT COUNT(*) FROM table;`	`len(df)` or `df.shape[0]`

🔗 Joins

English	Deutsch	中文	SQL	Pandas
Inner join	Inner Join	内连接	`SELECT * FROM A INNER JOIN B ON A.id = B.id;`	`pd.merge(A, B, on='id', how='inner')`
Left join	Left Join	左连接	`LEFT JOIN`	`pd.merge(A, B, on='id', how='left')`
Right join	Right Join	右连接	`RIGHT JOIN`	`pd.merge(A, B, on='id', how='right')`
Full outer join	Vollständiger Join	全连接	`FULL OUTER JOIN`	`pd.merge(A, B, on='id', how='outer')`

🧩 Window Functions

English	Deutsch	中文	SQL	Pandas
Ranking	Rangfolge	排名	`RANK() OVER (PARTITION BY col ORDER BY val)`	`df['rank'] = df.groupby('col')['val'].rank()`
Rolling average	Gleitender Durchschnitt	滚动平均	`AVG(val) OVER (ORDER BY date ROWS 3 PRECEDING)`	`df['rolling_avg'] = df['val'].rolling(3).mean()`

🧮 Data Modification

English	Deutsch	中文	SQL	Pandas
Add column	Spalte hinzufügen	新增列	`ALTER TABLE ADD col_new ...`	`df['col_new'] = df['col1'] + df['col2']`
Update values	Werte aktualisieren	更新列	`UPDATE table SET col = ... WHERE ...`	`df.loc[df['col1']>0, 'col2'] = 'Yes'`
Delete column	Spalte löschen	删除列	`ALTER TABLE DROP COLUMN col;`	`df.drop('col', axis=1, inplace=True)`
Delete rows	Zeilen löschen	删除行	`DELETE FROM table WHERE col < 0;`	`df = df[df['col'] >= 0]`

📦 Merging and Stacking

English	Deutsch	中文	SQL	Pandas
Union (distinct)	Vereinigung ohne Duplikate	UNION（去重）	`SELECT * FROM A UNION SELECT * FROM B;`	`pd.concat([A, B]).drop_duplicates()`
Union all	Vereinigung mit Duplikaten	UNION ALL（保留重复）	`SELECT * FROM A UNION ALL SELECT * FROM B;`	`pd.concat([A, B], ignore_index=True)`

🧹 Miscellaneous Operations

English	Deutsch	中文	SQL	Pandas
Rename column	Spalte umbenennen	重命名列	`SELECT col AS new_name FROM table;`	`df.rename(columns={'col': 'new_name'})`
Subquery	Unterabfrage	子查询	`SELECT * FROM (SELECT ...) AS sub;`	Use intermediate variable: `sub = df[df['x']>0]`
Conditional column	Bedingte Spalte	条件列	`CASE WHEN col>0 THEN 'pos' ELSE 'neg' END`	`np.where(df['col']>0, 'pos', 'neg')`
Null handling	Null-Werte behandeln	空值处理	`IS NULL / IS NOT NULL`	`df.isna()` / `df.notna()`
Fill nulls	Null-Werte auffüllen	填充空值	`COALESCE(col, 0)`	`df['col'].fillna(0)`

⚠️ Note

Most SQL and Pandas operations above correspond closely, but a few points to be aware of:

df.describe() returns additional statistics beyond SQL’s AVG, MIN, and MAX. For exact correspondence, use individual aggregation functions.
Pandas rolling() requires sorting (e.g., by date) to fully match SQL window functions.
df.groupby().rank() may use a different ranking method than SQL’s RANK(). Use method='min' or method='dense' to align results if needed.

✅ Tip:

If you often switch between SQL and Pandas, try libraries like pandasql, duckdb, or polars to query DataFrames using SQL syntax :)