Abstract
Transformer-based language models such as BERT and its variants have found widespread use in natural language processing. A common way of using these models is to fine-tune them to improve their performance on a specific task. However, it is currently unclear how the fine-tuning process affects the underlying structure of the word embeddings these models produce. In this talk, I will discuss recent efforts to explore the topology of these fine-tuned word embeddings.