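fix: comprehensive fixes for code quality and report accuracy issues

Code fixes (16 modules):
- GARCH models now consistently use a Student's t distribution and add a convergence check (returns/volatility/anomaly)
- Replace the KS test with the Lilliefors test (returns)
- Fix data leakage: StratifiedKFold → TimeSeriesSplit, scaler fit per fold (anomaly)
- Precursor labels use shift(-1) to predict next-day anomalies (anomaly)
- PSD normalization now includes the sampling frequency and the ×2 factor for the one-sided spectrum (fft)
- Empirical scaling of the AR(1) red-noise baseline (fft)
- Box counting normalizes x and y independently; MF-DFA q=0 (fractal)
- Add ADF stationarity test and remove the double Bonferroni correction (causality)
- R/S Hurst estimate now reports R² goodness of fit (hurst)
- Prophet uses recursive forecasting to avoid information leakage (time_series)
- IC calculation filters out zero signals; neutral patterns get hit_rate=NaN (indicators/patterns)
- Clustering thresholds made adaptive (clustering)
- Calendar effects: robustness check on the first and second halves of the sample (calendar)
- Evidence scoring criteria text aligned with the code (visualization)
- NaN/empty-value guards in the core pipeline (data_loader/preprocessing/main)

Report fixes (docs/REPORT.md, 15 places):
- Disambiguate the scaling exponent H_scaling from the Hurst exponent
- Recompute the values of the GBM 6-month probability cone
- Qualify the CLT claim, soften the wording on halving, fix the scenario probability logic
- Correct the interpretation of the GPD shape parameter; downgrade the anomaly AUC evidence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>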
@@ -250,24 +250,34 @@ def _interpret_clusters(df_clean: pd.DataFrame, labels: np.ndarray,
     print(f"{method_name} cluster feature means")
     print("=" * 60)
 
-    # Automatically label states
+    # Automatically label states (adaptive thresholds based on the data distribution)
     state_labels = {}
+
+    # Compute adaptive thresholds from the standard deviation of the cluster means
+    lr_values = cluster_means["log_return"]
+    abs_r_values = cluster_means["abs_return"]
+    lr_std = lr_values.std() if len(lr_values) > 1 else 0.02
+    abs_r_std = abs_r_values.std() if len(abs_r_values) > 1 else 0.02
+    high_lr_threshold = max(0.005, lr_std)  # floor of at least 0.5%
+    high_abs_threshold = max(0.005, abs_r_std)
+    mild_lr_threshold = max(0.002, high_lr_threshold * 0.25)
+
     for cid in cluster_means.index:
         row = cluster_means.loc[cid]
         lr = row["log_return"]
         vol = row["vol_7d"]
         abs_r = row["abs_return"]
 
-        # Rule-based classification using return and volatility
-        if lr > 0.02 and abs_r > 0.02:
+        # Rule-based classification using the adaptive thresholds
+        if lr > high_lr_threshold and abs_r > high_abs_threshold:
             label = "surge"
-        elif lr < -0.02 and abs_r > 0.02:
+        elif lr < -high_lr_threshold and abs_r > high_abs_threshold:
             label = "crash"
-        elif lr > 0.005:
+        elif lr > mild_lr_threshold:
             label = "mild_up"
-        elif lr < -0.005:
+        elif lr < -mild_lr_threshold:
             label = "mild_down"
-        elif abs_r > 0.015 or vol > cluster_means["vol_7d"].median() * 1.5:
+        elif abs_r > high_abs_threshold * 0.75 or vol > cluster_means["vol_7d"].median() * 1.5:
             label = "high_vol"
         else:
             label = "sideways"
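The adaptive labelling rule in this hunk can be tried in isolation. The following is a minimal sketch, not the project's actual code: the function names `adaptive_thresholds` and `label_clusters` are hypothetical, the cluster means are passed as plain dicts rather than a pandas DataFrame, and the standard library's `statistics.stdev` stands in for `Series.std()` (both compute the sample standard deviation). The numeric constants are the ones shown in the diff.

```python
from statistics import median, stdev


def adaptive_thresholds(log_returns, abs_returns, floor=0.005, fallback=0.02):
    """Derive labelling thresholds from the spread of the per-cluster means.

    With a single cluster there is no spread to measure, so the fallback
    value (0.02, as in the diff) is used instead.
    """
    lr_std = stdev(log_returns) if len(log_returns) > 1 else fallback
    abs_std = stdev(abs_returns) if len(abs_returns) > 1 else fallback
    high_lr = max(floor, lr_std)          # never below 0.5%
    high_abs = max(floor, abs_std)
    mild_lr = max(0.002, high_lr * 0.25)
    return high_lr, high_abs, mild_lr


def label_clusters(cluster_means):
    """Label each cluster; cluster_means maps id -> dict of feature means."""
    rows = list(cluster_means.values())
    high_lr, high_abs, mild_lr = adaptive_thresholds(
        [r["log_return"] for r in rows],
        [r["abs_return"] for r in rows],
    )
    vol_median = median(r["vol_7d"] for r in rows)

    labels = {}
    for cid, r in cluster_means.items():
        lr, abs_r, vol = r["log_return"], r["abs_return"], r["vol_7d"]
        if lr > high_lr and abs_r > high_abs:
            labels[cid] = "surge"
        elif lr < -high_lr and abs_r > high_abs:
            labels[cid] = "crash"
        elif lr > mild_lr:
            labels[cid] = "mild_up"
        elif lr < -mild_lr:
            labels[cid] = "mild_down"
        elif abs_r > high_abs * 0.75 or vol > vol_median * 1.5:
            labels[cid] = "high_vol"
        else:
            labels[cid] = "sideways"
    return labels


# Illustrative (made-up) cluster means, not data from the repository.
example = {
    0: {"log_return": 0.05, "abs_return": 0.06, "vol_7d": 0.03},
    1: {"log_return": -0.05, "abs_return": 0.06, "vol_7d": 0.03},
    2: {"log_return": 0.001, "abs_return": 0.002, "vol_7d": 0.01},
    3: {"log_return": -0.001, "abs_return": 0.03, "vol_7d": 0.02},
}
print(label_clusters(example))
```

One consequence of this design is that the thresholds depend on how far apart the cluster means are: when only two clusters with roughly symmetric returns exist, the standard deviation of their means can exceed either mean in magnitude, so neither would qualify as "surge" or "crash".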