Python data cleaning_merging multiple Excel files with Python

激活谷笔记 • 2025-01-25 13:26 • 124 reads

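In Python, Excel data is usually cleaned with the `pandas` library, which provides a wide range of data-cleaning functions. The basic steps and example code are as follows: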

Install the required libraries

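Make sure the `pandas` and `openpyxl` libraries are installed (`pandas` uses `openpyxl` to read and write `.xlsx` files). If they are not, install them with: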

```shell
pip install pandas openpyxl
```

Read the Excel data

Read the Excel file with the `pandas` function `read_excel`.

```python
import pandas as pd

# Reads the first worksheet by default; pass sheet_name=... to pick another one
df = pd.read_excel('example.xlsx')
print(df.head())
```

Data cleaning

Remove duplicate rows

```python
df.drop_duplicates(inplace=True)
```
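Before dropping anything, it can help to see which rows count as duplicates. A minimal sketch on a small made-up DataFrame (the `name` and `age` columns are illustrative only): `duplicated()` marks every repeat after the first occurrence, and `keep=False` marks all copies so they can be inspected side by side.

```python
import pandas as pd

# Small illustrative table; the values are made up for the example
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "age": [30, 25, 30, 41],
})

# True for each row that repeats an earlier row (all columns compared)
print("duplicate rows:", df.duplicated().sum())

# keep=False flags every copy, which is handy for reviewing them
print(df[df.duplicated(keep=False)])

# Same rule as drop_duplicates(inplace=True), but returning a new DataFrame
deduped = df.drop_duplicates()
print(len(deduped), "rows left")
```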

Drop rows that contain missing values

```python
# how='any': drop a row if any of its cells is empty
df.dropna(how='any', inplace=True)
```
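`how='any'` drops a row as soon as one cell is empty, which can throw away a lot of data. A sketch comparing it with `how='all'`, `subset` and `thresh` on an illustrative DataFrame (the columns and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", None, "Carol", None],
    "age": [30, None, None, 22],
    "city": ["Beijing", None, None, "Shenzhen"],
})

any_missing = df.dropna(how="any")         # keep only fully populated rows
all_missing = df.dropna(how="all")         # drop rows where every cell is empty
name_missing = df.dropna(subset=["name"])  # only check the 'name' column
at_least_two = df.dropna(thresh=2)         # keep rows with at least 2 non-null cells

for label, result in [("any", any_missing), ("all", all_missing),
                      ("subset", name_missing), ("thresh", at_least_two)]:
    print(label, list(result.index))
```

Checking only the columns that matter for the analysis (`subset`) is usually less destructive than `how='any'` on the whole sheet.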

Fill missing values

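```python
# Replace missing ages with the column mean. Assigning the result back avoids
# calling fillna(inplace=True) on df['age'], a chained call that newer pandas
# versions warn about and that no longer updates df under Copy-on-Write.
df['age'] = df['age'].fillna(df['age'].mean())
```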
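When several columns have gaps, each usually needs its own fill value. A sketch that passes a dict to `DataFrame.fillna`; the columns and the `'unknown'` placeholder are illustrative assumptions. The median is used for `age` here as an alternative to the mean, since it is less affected by extreme values.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", None, "Carol"],
    "age": [30, None, 90],
})

# One fill value per column; columns not listed in the dict are left untouched
df = df.fillna({"age": df["age"].median(), "name": "unknown"})
print(df)
```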

Strip whitespace from a field

```python
df['name'] = df['name'].str.strip()
```
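`str.strip()` only removes leading and trailing whitespace. A sketch that also collapses repeated spaces inside a value and cleans the column headers, which may contain stray spaces too; the column name and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({" name ": ["  Alice  Smith ", "Bob", None]})

# Strip stray spaces from the column headers themselves
df.columns = df.columns.str.strip()

# Trim both ends, then collapse any run of inner whitespace to a single space;
# missing values pass through the .str methods as missing
df["name"] = df["name"].str.strip().str.replace(r"\s+", " ", regex=True)
print(df)
```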

Rename columns

```python
df.rename(columns={'name': 'name_new'}, inplace=True)
```
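`rename` takes a mapping, so several columns can be renamed in one call. By default, keys that match no column are silently ignored, so a typo in the mapping does nothing and raises no error. A sketch with hypothetical Chinese headers (not from the original data) showing a multi-column rename and `errors="raise"` to catch such typos:

```python
import pandas as pd

# Hypothetical Chinese headers, as they might appear in a spreadsheet
df = pd.DataFrame({"姓名": ["Alice"], "年龄": [30], "薪资": ["1-1.5万/月"]})

# Rename several columns at once; rename returns a new DataFrame
df = df.rename(columns={"姓名": "name", "年龄": "age", "薪资": "salary"})

# errors="raise" turns a misspelled key into a KeyError instead of a silent no-op
typo_caught = False
try:
    df.rename(columns={"nmae": "name_new"}, errors="raise")
except KeyError:
    typo_caught = True

print(list(df.columns), typo_caught)
```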

Remove rows with duplicate values in one column

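```python
# Drop rows whose 'name' value already appeared, keeping the first occurrence.
# If the column was renamed in the previous step, use its new name ('name_new').
df.drop_duplicates(subset=['name'], inplace=True)
```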
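`subset` restricts the comparison to the listed columns, and `keep` decides which copy survives. A sketch assuming the most recent record per name is the one to keep; the `updated` and `city` columns are hypothetical. Sorting first makes `keep='last'` pick the newest row.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice"],
    "updated": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-06-01"]),
    "city": ["Beijing", "Shanghai", "Shenzhen"],
})

# Sort by the hypothetical 'updated' column, then keep the last (newest) row per name
latest = (
    df.sort_values("updated")
      .drop_duplicates(subset=["name"], keep="last")
      .reset_index(drop=True)
)
print(latest)
```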

Process specific fields

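Extract the monthly salary

The function below converts salary strings such as `1-1.5万/月` (10k yuan per month) or `10-20万/年` (10k yuan per year) into a range in thousand yuan per month: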

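```python
import re

# Matches numbers such as "1", "1.5" or "20". The '-' in "1-1.5万/月" is a
# range separator, so it is deliberately not captured as a minus sign.
NUM_PATTERN = re.compile(r'\d*\.?\d+')


def get_salary(salary):
    """Convert a salary string to a 'low-high' range in thousand yuan per month."""
    if not isinstance(salary, str):  # empty cells are read as NaN
        return None
    numbers = [float(n) for n in NUM_PATTERN.findall(salary)]
    if not numbers:
        return None
    # A range such as "10-20万/年" has a low and a high value;
    # a single value such as "20万以上/年" is used for both
    low_salary = numbers[0]
    high_salary = numbers[1] if '-' in salary and len(numbers) > 1 else numbers[0]
    if '万' in salary and '年' in salary:    # 10k yuan per year
        factor = 10 / 12
    elif '万' in salary and '月' in salary:  # 10k yuan per month
        factor = 10
    else:
        # Other formats, e.g. "100/天" (per day), are not handled yet
        return None
    return f"{low_salary * factor:.2f}-{high_salary * factor:.2f}"


df['salary_range'] = df['salary'].apply(get_salary)
```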
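A row-by-row function that returns text is fine for display, but numeric columns are easier to average or filter. A vectorized sketch with `Series.str.extract`, assuming salary strings of the form `1-1.5万/月` or `10-20万/年` (anything else comes out as NaN); the `salary_low`/`salary_high` column names are illustrative. The factors follow the same rule as above: ×10 for 万/月 and ×10/12 for 万/年, giving thousand yuan per month.

```python
import pandas as pd

df = pd.DataFrame({"salary": ["1-1.5万/月", "10-20万/年", "100元/天", None]})

# Capture low, high and the period (月 = month, 年 = year); non-matching rows give NaN
parts = df["salary"].str.extract(r"(\d*\.?\d+)-(\d*\.?\d+)万/(月|年)")
parts.columns = ["low", "high", "period"]

# Conversion factor to thousand yuan per month
factor = parts["period"].map({"月": 10, "年": 10 / 12})

df["salary_low"] = parts["low"].astype(float) * factor
df["salary_high"] = parts["high"].astype(float) * factor
print(df)
```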

Save the cleaned data

The cleaned data can be written back to an Excel file:

```python
df.to_excel('cleaned_data.xlsx', index=False)
```

The steps and examples above cover basic Excel data cleaning with `pandas`. Depending on your data, you may need to customize the cleaning process further.
