PhD candidate in Trustworthy and Controllable Large Language Models
Large Language Models are rapidly becoming central tools in search, assistance, creative work, and decision support. Yet even strong models continue to hallucinate, encode hidden biases, or behave inconsistently across tasks and languages. To build systems that are genuinely safe and reliable, we must better understand how LLMs organise information internally, when they succeed or fail, and how their behaviour can be guided.
This PhD project asks a broad but fundamental question: How can we analyse and influence the internal behaviour of LLMs to make them more truthful, safe, and robust?
We are looking for candidates with:
- A Master's degree (completed or near completion) in Computer Science, Artificial Intelligence, Data Science, or another relevant field.
- A solid foundation in machine learning and interest in working with large language models.
- Strong programming skills, preferably in Python, and familiarity with modern deep learning tools.
- Good analytical and problem-solving abilities, with curiosity about topics such as reliability, safety, reasoning, or interpretability in LLMs.
- Very good written and spoken English, as required for scientific communication.
- Motivation to publish in leading NLP/ML venues (e.g., ACL, EMNLP, NAACL, TACL, NeurIPS, ICML, ICLR).
- Previous publications in top-tier NLP or ML venues are a significant advantage.
- Ability to work both independently and collaboratively in an international research environment.
The PhD will focus on one of the following directions or, ideally, on their intersection:
- Understanding internal model representations: exploring how LLMs structure knowledge, how task- or language-specific behaviour emerges, and, where relevant, what internal signals reveal about a model's likelihood to reason correctly (see the probing sketch after this list).
- Detecting and diagnosing failure modes: developing techniques to identify hallucinations, instability, bias, or unreliable reasoning early, before the model produces problematic outputs.
- Steering and controlling model behaviour: designing intervention methods (e.g., activation-level steering, representation editing, lightweight safety techniques) that make models more aligned and reliable while preserving their capabilities (see the steering sketch after this list).
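To give a concrete flavour of the first two directions, here is a minimal probing sketch: it asks whether a simple linear classifier can read a truthfulness signal off intermediate hidden states. The choice of gpt2 as a stand-in model, the layer index, and the toy labelled statements are all illustrative assumptions, not part of the project.

```python
# Minimal probing sketch: can a linear classifier recover a "truthfulness"
# signal from intermediate hidden states? The model (gpt2), the layer index,
# and the toy labelled statements are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def last_token_state(text, layer=6):
    """Hidden state of the final token at the given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1].numpy()

# Toy data: statements labelled true (1) or false (0).
texts = [
    "Paris is the capital of France.",         # true
    "The sun orbits the Earth.",               # false
    "Water boils at 100 C at sea level.",      # true
    "Two plus two equals five.",               # false
]
labels = [1, 0, 1, 0]

features = [last_token_state(t) for t in texts]
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.predict([last_token_state("Berlin is the capital of France.")]))
```

Probes of this kind also connect to the second direction: if such a signal is linearly readable mid-network, it can in principle flag unreliable outputs before generation completes.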
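For the third direction, a minimal sketch of activation-level steering via a forward hook. The random steering vector is a placeholder (in practice a direction would be derived, e.g., from contrastive prompt pairs), and the model, layer, and scale are again assumptions.

```python
# Minimal steering sketch: shift a transformer block's output activations
# along a fixed direction during generation. gpt2, the block index, the
# random direction, and the scale are all placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(0)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

block = model.transformer.h[6]            # block whose output we intervene on
steer = torch.randn(model.config.n_embd)  # placeholder steering direction
steer = steer / steer.norm()

def add_direction(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + 4.0 * steer      # shift activations along the direction
    return (hidden,) + output[1:]

handle = block.register_forward_hook(add_direction)
ids = tokenizer("The city of Paris is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Interventions of this kind leave the model weights untouched, which is what makes the "lightweight" family of techniques mentioned above attractive for preserving capabilities.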
The aim is to produce high-impact research suitable for top-tier NLP and ML venues (e.g., ACL, EMNLP, NAACL, TACL, NeurIPS, ICML, ICLR) and to contribute fundamental insights into building safer, more trustworthy LLMs.
Employed PhD candidates are expected to spend 10% of their working hours on teaching and/or supervision.
Within the Faculty of Science and Engineering, a 4-year PhD position on the topic of Trustworthy & Controllable Large Language Models is available at the Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, starting as soon as possible. The successful candidate will join the Machine Learning Group in the Artificial Intelligence Department and work under the supervision of Dr. Yftah Ziser.