Data Analysis and Processing with Python

By Jianghu (original article)

This is an introductory tutorial on data analysis and processing in Python. It covers the basic usage of NumPy, Pandas, Matplotlib, Seaborn, and SciPy.

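1. NumPy Basics

NumPy is the core library for numerical computing in Python. Its central feature is an efficient multidimensional array object, the ndarray.

1.1. Importing NumPy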

import numpy as np

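1.2. Creating Arrays

# Create a one-dimensional array
arr1 = np.array([1, 2, 3])

# Create a two-dimensional array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# Create an array filled with zeros (shape 2x3)
zeros = np.zeros((2, 3))

# Create an array filled with ones (shape 3x3)
ones = np.ones((3, 3))

# Create an evenly spaced range: start 0, stop 10 (exclusive), step 2
range_arr = np.arange(0, 10, 2)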
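To see what these constructors actually produce, it can help to inspect the shape, dimensionality, and dtype of each array. The following is a minimal sketch that reuses the variable names from the snippet above:

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6]])
zeros = np.zeros((2, 3))
range_arr = np.arange(0, 10, 2)

print(arr2.shape)          # (2, 3): two rows, three columns
print(arr2.ndim)           # 2
print(zeros.dtype)         # float64: np.zeros defaults to floating point
print(range_arr.tolist())  # [0, 2, 4, 6, 8]: the stop value 10 is excluded
```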

1.3. Array Operations

# Element-wise addition
sum_arr = arr1 + arr1

# Multiply every element by a scalar
prod_arr = arr1 * 2

# Dot product of two vectors
dot_product = np.dot(arr1, arr1)
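A common source of confusion is the difference between element-wise multiplication (`*`) and the dot product. The sketch below contrasts the two on the same `arr1`; the names `elementwise`, `matrix`, and `mat_vec` are introduced here purely for illustration:

```python
import numpy as np

arr1 = np.array([1, 2, 3])

elementwise = arr1 * arr1         # multiplies matching elements: [1, 4, 9]
dot_product = np.dot(arr1, arr1)  # sums those products: 1 + 4 + 9 = 14

# The @ operator performs matrix multiplication; for a 2x3 matrix and a
# length-3 vector it returns one dot product per row.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
mat_vec = matrix @ arr1           # [14, 32]

print(elementwise, dot_product, mat_vec)
```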

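2. Pandas Basics

Pandas is a library for data manipulation and analysis. Its two main data structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table).

2.1. Importing Pandas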

import pandas as pd

2.2. Creating a Series and a DataFrame

# Create a Series (np.nan marks a missing value)
series = pd.Series([1, 3, 5, np.nan, 6, 8])

# Create a DataFrame from a dict of columns
data = {'A': [1, 2, 3, 4],
        'B': [5, 6, 7, 8],
        'C': [9, 10, 11, 12]}
df = pd.DataFrame(data)
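The `np.nan` entry affects both the dtype of the Series and how aggregations behave. A short sketch, using the same data as above, shows what to expect:

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 3, 5, np.nan, 6, 8])
print(series.dtype)         # float64: NaN forces a floating-point dtype
print(series.isna().sum())  # 1 missing value
print(series.mean())        # NaN is skipped by default: (1+3+5+6+8) / 5

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8],
                   'C': [9, 10, 11, 12]})
print(df.shape)             # (4, 3): four rows, three columns
print(list(df.columns))     # ['A', 'B', 'C']
```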

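2.3. Working with Data

# View the first rows (5 by default)
df.head()

# Descriptive statistics (count, mean, std, min, quartiles, max)
df.describe()

# Selecting data
df['A']  # select a single column
df.iloc[0:2, 1:3]  # select by position: rows 0-1, columns 1-2

# Cleaning data (both methods return a new DataFrame; df itself is unchanged)
df.dropna()  # drop rows that contain missing values
df.fillna(value=5)  # replace missing values with 5

# Grouping: sum the other columns for each distinct value of 'A'
grouped = df.groupby('A').sum()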
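Because the `df` above contains no missing values and every value in column `A` is unique, the cleaning and grouping calls have no visible effect on it. The following sketch uses a small made-up DataFrame (`raw`, with hypothetical columns `group` and `value`) to show what they do when there is something to clean and group:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value and two repeated group keys
raw = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, np.nan, 3.0, 4.0],
})

dropped = raw.dropna()        # new DataFrame without the NaN row (3 rows)
filled = raw.fillna(value=5)  # new DataFrame with NaN replaced by 5

# raw itself is untouched: neither call modifies it in place
print(raw["value"].isna().sum())  # 1

# Sum 'value' within each group: a -> 1 + 5 = 6, b -> 3 + 4 = 7
totals = filled.groupby("group")["value"].sum()
print(totals)
```

Assigning the result back (for example `raw = raw.dropna()`) is the usual way to keep the cleaned version.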

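3. Matplotlib and Seaborn Basics

Matplotlib is one of the most widely used plotting libraries in Python. Seaborn is a higher-level library built on top of Matplotlib and is well suited to statistical charts.

3.1. Importing Matplotlib and Seaborn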

import matplotlib.pyplot as plt
import seaborn as sns

3.2. Drawing Basic Plots

# A simple line plot (x defaults to the indices 0, 1, 2, 3)
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()

# A scatter plot with a title
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.scatter(x, y)
plt.title("Scatter Plot")
plt.show()
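The snippets above use the implicit `pyplot` state. As an alternative, Matplotlib also offers an object-oriented interface built around explicit `Figure` and `Axes` objects, which tends to be easier to manage once a script has several plots. The sketch below is one possible way to use it; the `Agg` backend and the in-memory buffer are assumptions made here so the example runs without a display, and in an interactive session `plt.show()` would work instead:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")               # line plot
ax.scatter(x[::10], np.sin(x[::10]), color="red",   # every 10th point
           label="every 10th point")
ax.set_title("Sine Wave")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Save the figure as PNG into memory instead of showing it
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```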

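3.3. Visualization with Seaborn

# Load the example "tips" dataset (downloaded on first use, so this needs network access)
tips = sns.load_dataset("tips")

# Bar plot of the mean total_bill for each day
sns.barplot(x="day", y="total_bill", data=tips)
plt.show()

# Heatmap of the correlation matrix
# numeric_only=True skips non-numeric columns such as "day", which would
# otherwise raise an error in pandas 2.0 and later
corr = tips.corr(numeric_only=True)
sns.heatmap(corr, annot=True)
plt.show()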
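The correlation step deserves a closer look, because in pandas 2.0 and later `DataFrame.corr()` raises an error if the frame contains non-numeric columns, and `tips` has several. Passing `numeric_only=True` restricts the calculation to numeric columns. The sketch below demonstrates this on a small made-up DataFrame (`bills` is a hypothetical stand-in, not the real `tips` dataset) so that it runs without downloading anything:

```python
import pandas as pd

# Hypothetical stand-in for the tips data: two numeric columns, one text column
bills = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Only total_bill and tip enter the correlation matrix; "day" is skipped
corr = bills.corr(numeric_only=True)
print(corr)
```

Here `tip` is exactly 10% of `total_bill`, so the two columns are perfectly correlated. The resulting `corr` can be passed to `sns.heatmap` in the same way as above.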

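4. SciPy Basics

SciPy is a library for scientific computing. It provides tools for optimization, integration, interpolation, Fourier transforms, signal processing, and more.

4.1. Importing SciPy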

from scipy import stats, optimize, integrate

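4.2. Statistical Analysis

# Probability density function of the standard normal distribution
x = np.linspace(-5, 5, 100)
pdf = stats.norm.pdf(x)
plt.plot(x, pdf)
plt.title("Normal Distribution PDF")
plt.show()

# One-sample t-test: does the sample mean differ from 0?
# sample_data is a placeholder sample generated here for illustration
sample_data = np.random.default_rng(0).normal(size=30)
t_statistic, p_value = stats.ttest_1samp(a=sample_data, popmean=0)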
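To make the t-test less of a black box, the statistic can be recomputed by hand as the sample mean divided by its standard error, t = (mean - popmean) / (s / sqrt(n)). The sketch below does this on a synthetic sample; the seed, the true mean of 0.5, and the sample size of 50 are arbitrary choices made for this illustration:

```python
import numpy as np
from scipy import stats

# Synthetic sample (assumption: drawn from a normal distribution with mean 0.5)
rng = np.random.default_rng(seed=0)
sample_data = rng.normal(loc=0.5, scale=1.0, size=50)

t_statistic, p_value = stats.ttest_1samp(a=sample_data, popmean=0)

# Manual calculation; ddof=1 gives the sample standard deviation
standard_error = sample_data.std(ddof=1) / np.sqrt(sample_data.size)
manual_t = (sample_data.mean() - 0) / standard_error

print(t_statistic, manual_t)  # the two values agree
print(p_value)                # a small p-value suggests the mean is not 0
```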

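4.3. Optimization and Integration

# Optimization example: find a local minimum of f, starting the search at x0=0
def f(x):
    return x**2 + 10*np.sin(x)

result = optimize.minimize(f, x0=0)
print(result.x)

# Integration example: integrate exp(-x^2) from 0 to infinity
# quad returns the value of the integral and an estimate of the absolute error
integral, error = integrate.quad(lambda x: np.exp(-x**2), 0, np.inf)
print(integral)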
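Two points are worth checking here. First, `optimize.minimize` is a local optimizer, so the answer depends on the starting point: this particular `f` has more than one minimum. Second, the integral has a known closed form, sqrt(pi) / 2, which gives a convenient way to check the numerical result. The sketch below illustrates both; the second starting point `x0=3` and the choice of the `L-BFGS-B` method are assumptions made for this example:

```python
import numpy as np
from scipy import integrate, optimize

def f(x):
    return x**2 + 10 * np.sin(x)

# Starting at 0, the search ends in the minimum near x = -1.306
from_zero = optimize.minimize(f, x0=0)

# Starting at 3, it ends in a different, shallower minimum near x = 3.837
from_three = optimize.minimize(f, x0=3, method="L-BFGS-B")

print(from_zero.x, from_zero.fun)
print(from_three.x, from_three.fun)

# Compare the numerical integral with the closed-form value sqrt(pi) / 2
integral, error = integrate.quad(lambda x: np.exp(-x**2), 0, np.inf)
print(integral, np.sqrt(np.pi) / 2)
```

When the global minimum matters, trying several starting points and keeping the result with the lowest `fun` value is a simple, if not foolproof, strategy.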