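Beyond K-means: Time Series Clustering with DTW

Theoretical Background

1. Dynamic Time Warping (DTW)

DTW is an algorithm for measuring the similarity between two time series. It allows non-linear alignment along the time axis, so it can handle series that unfold at different speeds.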

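DTW formula

Given sequences $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_m)$ with a local cost $d(x_i, y_j)$, the cumulative cost follows the standard recurrence

$$D(i, j) = d(x_i, y_j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\},$$

with $D(0, 0) = 0$ and $D(i, 0) = D(0, j) = \infty$ for $i, j > 0$.

DTW returns the alignment path between the two sequences together with the minimum total alignment cost $D(n, m)$.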
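To make the dynamic-programming idea concrete, here is a minimal sketch of exact DTW in NumPy. It assumes 1-D sequences and uses the absolute difference as the local cost; the function name `dtw_distance` is illustrative, and libraries differ in the local cost and normalization they apply, so the values need not match theirs.

```python
import numpy as np

def dtw_distance(x, y):
    """Exact O(n*m) DTW between two 1-D sequences, local cost |x_i - y_j|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # D[i, j] = minimal cumulative cost of aligning x[:i] with y[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = [0, 1, 2, 1, 0]
b = [0, 0, 1, 2, 1, 0]  # same shape as `a`, delayed by one step

print(dtw_distance(a, b))                            # 0.0
print(np.abs(np.array(a) - np.array(b[:5])).sum())   # 4
```

The delayed copy has zero DTW cost because the first sample of `a` can be matched to the first two samples of `b`, while a point-by-point comparison penalizes the shift.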

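2. K-means Clustering

K-means is a partition-based clustering algorithm. Its goal is to split the data into K clusters so that the sum of squared distances between each point and its cluster center is minimized.

K-means objective function:

$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,$$

where $C_k$ is the $k$-th cluster and $\mu_k$ is its center.

When combined with DTW, the within-cluster distance is measured by DTW, and the cluster center is defined as the sequence that minimizes the total DTW distance to all samples in the cluster. Note that tslearn's TimeSeriesKMeans, used in the example below, computes its centers by DTW Barycenter Averaging (DBA) when metric="dtw", so a center is generally not one of the input series.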
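If the center is restricted to be one of the series in the cluster, the definition above amounts to picking a medoid. The sketch below shows that selection step on a precomputed distance matrix. The helper name `cluster_medoids` and the toy distances are hypothetical, and this illustrates the definition as stated rather than what any particular library does internally.

```python
import numpy as np

def cluster_medoids(distance_matrix, labels):
    """For each cluster, return the index of the member whose summed
    distance to the other members of that cluster is smallest."""
    distance_matrix = np.asarray(distance_matrix, dtype=float)
    labels = np.asarray(labels)
    medoids = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        # Pairwise distances restricted to this cluster
        within = distance_matrix[np.ix_(members, members)]
        medoids[int(k)] = int(members[within.sum(axis=1).argmin()])
    return medoids

# Toy symmetric distance matrix for four series (hypothetical values)
toy_D = np.array([
    [0.0, 1.0, 4.0, 9.0],
    [1.0, 0.0, 1.0, 8.0],
    [4.0, 1.0, 0.0, 7.0],
    [9.0, 8.0, 7.0, 0.0],
])
toy_labels = np.array([0, 0, 0, 1])

print(cluster_medoids(toy_D, toy_labels))  # {0: 1, 1: 3}
```

In practice the matrix would hold pairwise DTW distances, so the medoid is the series that is, in the DTW sense, closest to everything else in its cluster.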

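Complete Example

1. Generating a Synthetic Dataset

Generate three clusters of time series with different characteristics, 50 series per cluster. Gaussian noise is added to every series to mimic real-world data.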

import numpy as np
import matplotlib.pyplot as plt

# Fix the random seed for reproducibility
np.random.seed(42)

# Generate one cluster: noisy copies of a base series
def generate_series(base_series, noise_level=0.2, count=50):
    series = []
    for _ in range(count):
        noise = np.random.normal(0, noise_level, len(base_series))
        series.append(base_series + noise)
    return np.array(series)

# Base series
t = np.linspace(0, 6 * np.pi, 100)
cluster1 = np.sin(t)
cluster2 = np.cos(t)
cluster3 = np.sin(t / 2)

# Generate the three clusters
data_cluster1 = generate_series(cluster1)
data_cluster2 = generate_series(cluster2)
data_cluster3 = generate_series(cluster3)

# Stack into one array of shape (150, 100)
data = np.vstack([data_cluster1, data_cluster2, data_cluster3])

# Plot the raw data
plt.figure(figsize=(10, 6))
for series in data_cluster1:
    plt.plot(series, alpha=0.5, color='red')
for series in data_cluster2:
    plt.plot(series, alpha=0.5, color='blue')
for series in data_cluster3:
    plt.plot(series, alpha=0.5, color='green')
plt.title('Original Time Series Data')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

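The original time series. Red, blue, and green mark the three clusters, each with its own oscillation pattern.

2. Computing DTW Distances

DTW distances are computed with the fastdtw library, which implements FastDTW, an approximation of exact DTW: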

from fastdtw import fastdtw

# Compute the pairwise DTW distance matrix
def compute_dtw_distance_matrix(data):
    n = len(data)
    distance_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Make sure each series is one-dimensional
            series_i = np.squeeze(data[i])  # np.squeeze() drops singleton dimensions
            series_j = np.squeeze(data[j])
            assert series_i.ndim == 1, f"Series {i} is not 1D!"
            assert series_j.ndim == 1, f"Series {j} is not 1D!"

            # For scalar samples the absolute difference equals the Euclidean distance;
            # scipy's euclidean() expects 1-D vectors and may reject scalar inputs
            distance, _ = fastdtw(series_i, series_j, dist=lambda a, b: abs(a - b))
            # The matrix is symmetric, so fill both halves
            distance_matrix[i, j] = distance
            distance_matrix[j, i] = distance
    return distance_matrix

3. K-means Clustering

The modified K-means uses DTW as its distance metric. It is implemented here with TimeSeriesKMeans from the tslearn library:

from tslearn.clustering import TimeSeriesKMeans

# K-means clustering with the DTW metric
n_clusters = 3
model = TimeSeriesKMeans(n_clusters=n_clusters, metric="dtw", verbose=True)
labels = model.fit_predict(data)

# Plot the clustering result, one color per predicted cluster
plt.figure(figsize=(10, 6))
colors = ['red', 'blue', 'green']
for i in range(n_clusters):
    cluster_data = data[labels == i]
    for series in cluster_data:
        plt.plot(series, alpha=0.5, color=colors[i])
plt.title('Clustered Time Series')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

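Time series after clustering. Each color marks one predicted cluster; K-means with the DTW distance groups series of similar shape together. Cluster indices are arbitrary, so a cluster's color here need not match its color in the first plot.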
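Because the data is synthetic, the claim that the clusters were recovered can be checked against the known generating curve of each series. The following sketch builds a contingency table and a purity score with plain NumPy. The helper names are illustrative, and `example_pred` is a hypothetical stand-in for the `labels` array returned by `fit_predict`, not an actual result.

```python
import numpy as np

def contingency_table(true_labels, pred_labels):
    """Rows: true classes, columns: predicted clusters, entries: counts."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    table = np.zeros((len(true_ids), len(pred_ids)), dtype=int)
    for r, t in enumerate(true_ids):
        for c, p in enumerate(pred_ids):
            table[r, c] = np.sum((true_labels == t) & (pred_labels == p))
    return table

def purity(true_labels, pred_labels):
    """Fraction of samples that fall in the majority true class of their cluster."""
    table = contingency_table(true_labels, pred_labels)
    return table.max(axis=0).sum() / table.sum()

# Ground truth: 50 series per base curve, in the order they were stacked
true_labels = np.repeat([0, 1, 2], 50)

# Hypothetical prediction: permuted cluster ids plus two misassigned series
example_pred = np.repeat([2, 0, 1], 50)
example_pred[:2] = 0

print(contingency_table(true_labels, example_pred))
print(purity(true_labels, example_pred))
```

Purity is insensitive to how cluster ids are permuted, which matters here because K-means assigns ids arbitrarily. With the real model, pass `labels` in place of `example_pred`.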

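4. Visualizing Cluster Centers and the Alignment Path

Extract the center series of each cluster and show the DTW alignment path between a sample series and its center.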

# Extract the cluster centers, shape (n_clusters, length, 1)
centers = model.cluster_centers_

# Plot the cluster centers
plt.figure(figsize=(10, 6))
for i, center in enumerate(centers):
    plt.plot(center.ravel(), label=f'Cluster {i+1} Center', linewidth=2)
plt.title('Cluster Centers')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

# Visualize the alignment path
from tslearn.metrics import dtw_path

series_idx = 0  # first series in the dataset (generated from the first base curve)
sample = data[series_idx]
center = centers[labels[series_idx]].ravel()
path, _ = dtw_path(sample, center)  # list of aligned index pairs (i, j)

plt.figure(figsize=(10, 6))
plt.plot(sample, label='Sample Series', color='blue')
plt.plot(center, label='Cluster Center', color='red')
# Connect every third aligned pair; plotting the raw (i, j) indices on the
# value axis would squash both curves
for i, j in path[::3]:
    plt.plot([i, j], [sample[i], center[j]], color='black', alpha=0.3, linewidth=0.5)
plt.title('DTW Alignment Path')
plt.legend()
plt.show()

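Cluster centers: each center captures the dominant pattern of the series in its cluster.

DTW alignment between a single series and its cluster center, illustrating how DTW matches shapes elastically along the time axis.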

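Optimization and Tuning

1. Optimization Points

Computational efficiency: exact DTW is expensive (quadratic in the series length for every pair), so use FastDTW or reduce dimensionality beforehand (e.g., PCA).
Initial cluster centers: use a better initialization such as K-means++ to speed up convergence.
High-dimensional series: for high-dimensional time series, combine clustering with feature extraction (e.g., DWT or FFT) to cut the computational cost.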
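As one way to picture the feature-extraction idea, the sketch below compresses each series to the magnitudes of its first few FFT coefficients. The function name `fft_features` and the choice of 8 coefficients are assumptions made for illustration, not settings taken from the article.

```python
import numpy as np

def fft_features(series_batch, n_coeffs=8):
    """Magnitudes of the first `n_coeffs` real-FFT coefficients of each series.

    series_batch: array of shape (n_series, length).
    """
    series_batch = np.asarray(series_batch, dtype=float)
    spectrum = np.fft.rfft(series_batch, axis=1)
    return np.abs(spectrum[:, :n_coeffs])

t = np.linspace(0, 6 * np.pi, 100)
rng = np.random.default_rng(0)
batch = np.vstack([np.sin(t), np.sin(t / 2)]) + rng.normal(0, 0.2, (2, 100))

features = fft_features(batch)
print(features.shape)  # (2, 8)
```

One caveat on this design: magnitudes discard phase, so series that differ mainly by a time shift, such as the sin(t) and cos(t) clusters in this example, map to nearly the same features. If the shift should count as a difference, keeping the real and imaginary parts is a better fit.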

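2. Tuning Workflow

Number of clusters: choose it with the elbow method or the silhouette coefficient.
Distance metric: compare DTW against Euclidean distance on different datasets.
Initialization: run several random initializations and keep the result with the lowest cost.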
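The silhouette coefficient mentioned above only needs pairwise distances, so it can be computed directly from a DTW distance matrix. Here is a minimal NumPy sketch. The function name `silhouette_from_distances` is illustrative, it assumes at least two clusters, and the toy distances come from four points on a line rather than from real time series.

```python
import numpy as np

def silhouette_from_distances(distance_matrix, labels):
    """Mean silhouette coefficient from a precomputed pairwise distance matrix."""
    D = np.asarray(distance_matrix, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton cluster scores 0
        # a: mean distance to the other members of the same cluster (D[i, i] is 0)
        a = D[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == k].mean() for k in cluster_ids if k != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

# Toy example: four points on a line, distances are absolute differences
points = np.array([0.0, 1.0, 10.0, 11.0])
toy_D = np.abs(points[:, None] - points[None, :])

good = silhouette_from_distances(toy_D, [0, 0, 1, 1])
bad = silhouette_from_distances(toy_D, [0, 1, 0, 1])
print(good, bad)
```

To choose the number of clusters, one would fit the model for each candidate value, pass the resulting labels together with the matrix from `compute_dtw_distance_matrix(data)`, and prefer the value with the highest score.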

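By combining K-means with DTW, this walkthrough clusters time series, separates groups of series with different characteristics, and visualizes both the clustering result and the alignment path. A possible next step is to bring in deep learning (e.g., an LSTM autoencoder) to further improve clustering quality; that will be covered in a future post.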

Reposted from the WeChat official account @深夜努力写Python
