Michel Ma
PhD candidate, University of Montreal, Mila
Verified email at mila.quebec
Title
Cited by
Year
When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment
T Ni, M Ma, B Eysenbach, PL Bacon
Advances in Neural Information Processing Systems 36, 2024
Cited by 8
2024
Long-term credit assignment via model-based temporal shortcuts
M Ma, P D'Oro, Y Bengio, PL Bacon
Deep RL Workshop NeurIPS 2021, 2021
Cited by 5
2021
Counterfactual Policy Evaluation and the Conditional Monte Carlo Method
M Ma, PL Bacon
Offline Reinforcement Learning Workshop, NeurIPS, 2020
Cited by 1
2020
Do Transformer World Models Give Better Policy Gradients?
M Ma, T Ni, C Gehring, P D'Oro, PL Bacon
arXiv preprint arXiv:2402.05290, 2024
2024
Bridging State and History Representations: Understanding Self-Predictive RL
T Ni, B Eysenbach, E Seyedsalehi, M Ma, C Gehring, A Mahajan, ...
arXiv preprint arXiv:2401.08898, 2024
2024
A Differentiable Sequence Model Perspective on Policy Gradients
M Ma, P D'Oro, T Ni, C Gehring, PL Bacon
2023
Parsimonious reasoning in reinforcement learning for better credit assignment
M Ma
2022