Research Fellow
Max Planck Institute for Software Systems (MPI-SWS)
Kaiserslautern, Germany
B.S. Computer Science, LUMS (2025)
I am an AI researcher and Computer Science graduate focused on the intersection of Machine Learning, Scientific AI, and Computational Biology. My research experience includes LLMs, speech technologies for low-resource languages, benchmark development, molecular property prediction, and generative models for drug discovery.
Currently, I am a Research Fellow at the Max Planck Institute for Software Systems (MPI-SWS) in Kaiserslautern, Germany, exploring Agentic Systems and Scientific AI. Previously, at DFKI, I worked on bioinformatics and state-space modeling techniques for molecular representation learning.
I am driven by the goal of leveraging AI to advance science, healthcare, and multilingual technologies. For my final-year thesis, I created PakBBQ, a bias benchmarking QA dataset with 17,180 Urdu/English examples across 8 sociocultural categories. I evaluated 6 multilingual LLMs, revealing cross-linguistic bias gaps and effective bias mitigation strategies. This work culminated in a paper accepted at the EMNLP 2025 main conference (Core: A*).
June 2026
Joined the Max Planck Institute for Software Systems (MPI-SWS) as a Research Fellow
November 2025
Paper accepted at EMNLP 2025 Main Conference (Core: A*)
June 2025
Started as a Research Assistant at DFKI, Germany
May 2025
Graduated from LUMS with a B.S. in Computer Science
January 2025
Appointed as Teaching Assistant for CS5302 Foundations of Generative AI
January 2025
Appointed as Teaching Assistant for AI600 Machine Learning (MS in AI)
September 2024
Appointed as Teaching Assistant for CS535 Machine Learning
September 2023
Joined AI in Healthcare Initiative (AIHI) as a Research Assistant
September 2023
Joined Center for Speech and Language Technologies (CSaLT) as a Research Assistant
September 2023
Appointed as Teaching Assistant for CS100 Computational Problem Solving
EMNLP 2025 (Main Conference, Core: A*)
With the widespread adoption of Large Language Models (LLMs), it is crucial to ensure fairness across all user communities. Most LLMs are trained on Western-centric data, neglecting low-resource languages and regional contexts. To address this, we introduce PakBBQ, an extension of the Bias Benchmark for Question Answering (BBQ) dataset, featuring 214 templates and 17,180 QA pairs in English and Urdu across 8 bias dimensions: age, disability, appearance, gender, socio-economic status, religion, regional affiliation, and language formality.
Our experiments show: (i) a 12% accuracy gain with disambiguation, (ii) stronger counter-bias behaviors in Urdu, and (iii) framing effects that reduce stereotyped responses when questions are posed negatively.
Culturally adapted QA benchmark for multilingual LLM bias evaluation. 17,180 examples in Urdu/English across 8 bias categories.
Confidence-aware knowledge distillation achieving balanced shape/texture bias. Improved OOD generalization on vision models.
Evaluating LLMs in Sindhi, Pashto, and Urdu. Bias and jailbreak testing on lightweight transformers.
AWS-based platform for stock analysis. S3, Lambda, ECS, Postgres with LLM-powered chatbot.
End-to-end AI chatbot for EDA and data cleaning. Fine-tuned GPT-3.5, deployed on Hugging Face.
2D platformer in Unity with 10+ levels. Hand-drawn pixel art, dynamic lighting, puzzle mechanics.