2025 Interpretive Report on the Development of DeepSeek-R1, Kimi 1.5, and Similar Strong-Reasoning Models
➢ In-context Reinforcement Learning with Algorithm Distillation https://arxiv.org/abs/2210.14215
Further References and Materials: Strong Reasoning & DS-R1
➢ https://blog.ml.cmu.edu/2025/01/08/optimizing-llm-test-time-compute-involv
10 points | 76 pages | 8.39 MB | 7 months ago
