Publications

Towards Interpretable Reasoning in Large Language Models.
G. Hong, C. Kim, Y. Park.
ACL 2025.

Robust Fine-tuning via Gradient Projection for NLP.
G. Hong, M. Lee, C. Kim.
NeurIPS 2024.

A Survey on Mechanistic Interpretability.
G. Hong, C. Kim.
TMLR 2024.

Understanding Spurious Correlations in Text Classification.
G. Hong, J. Choi, C. Kim.
EMNLP 2023.

On the Robustness of Pre-trained Language Models under Distribution Shift.
G. Hong, C. Kim.
NAACL 2022.