Kaivalya Hariharan
MIT CSAIL
Verified email at mit.edu
Title
Cited by
Year
Red teaming deep neural networks with feature synthesis tools
S Casper, T Bu, Y Li, J Li, K Zhang, K Hariharan, D Hadfield-Menell
Advances in Neural Information Processing Systems 36, 80470-80516, 2023
Cited by: 10, Year: 2023
Diagnostics for deep neural networks with automated copy/paste attacks
S Casper, K Hariharan, D Hadfield-Menell
arXiv preprint arXiv:2211.10024, 2022
Cited by: 9, Year: 2022
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
TT Wang, M Wang, K Hariharan, N Shavit
arXiv preprint arXiv:2312.08793, 2023
Year: 2023