Research Assistant
DFKI Germany & RPTU Kaiserslautern-Landau
B.S. Computer Science, LUMS (2025)
I am a Computer Science graduate from LUMS specializing in model compression, NLP for low-resource languages, and LLM bias & jailbreak evaluation. I currently work as a Research Assistant at DFKI Germany, in collaboration with RPTU Kaiserslautern-Landau, in the domain of bioinformatics, exploring the use of LLMs and ML models for molecular property prediction.
My research interests include model compression, AI fairness and interpretability, reasoning, LLM jailbreaking, and AI solutions for speech processing and analytics.
For my final-year thesis, I created PakBBQ, a bias-benchmarking QA dataset with 17,180 Urdu/English examples across 8 sociocultural categories. I evaluated 6 multilingual LLMs, revealing cross-linguistic bias gaps and effective bias mitigation strategies. This work culminated in a paper accepted at the EMNLP 2025 main conference (Core: A*).
November 2025
Paper accepted at EMNLP 2025 Main Conference (Core: A*)
June 2025
Started as Research Assistant at DFKI Germany
May 2025
Graduated from LUMS with B.S. in Computer Science
January 2025
Appointed as Teaching Assistant for CS5302 Foundations of Generative AI
January 2025
Appointed as Teaching Assistant for AI600 Machine Learning (MS in AI)
September 2024
Appointed as Teaching Assistant for CS535 Machine Learning
September 2023
Joined AI in Healthcare Initiative (AIHI) as a Research Assistant
September 2023
Joined Center for Speech and Language Technologies (CSaLT) as a Research Assistant
September 2023
Appointed as Teaching Assistant for CS100 Computational Problem Solving
EMNLP 2025 (Main Conference, Core: A*)
With the widespread adoption of Large Language Models (LLMs), ensuring fairness across all user communities is crucial. Most LLMs are trained on Western-centric data, neglecting low-resource languages and regional contexts. To address this, we introduce PakBBQ — an extension of the Bias Benchmark for Question Answering (BBQ) dataset — featuring 214 templates and 17,180 QA pairs in English and Urdu across 8 bias dimensions: age, disability, appearance, gender, socio-economic status, religion, regional affiliation, and language formality.
Our experiments show: (i) a 12% accuracy gain with disambiguation, (ii) stronger counter-bias behaviors in Urdu, and (iii) framing effects, with negatively framed questions eliciting fewer stereotyped responses.
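The evaluation described above can be sketched in code. This is a minimal illustration, not the actual PakBBQ harness: it assumes a BBQ-style record (context, question, answer options, gold label, and an ambiguous/disambiguated condition tag) and computes accuracy per condition, which is how a gain from disambiguation would surface. All names here (`QAExample`, `accuracy_by_condition`, the `predict` callable) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class QAExample:
    # Hypothetical BBQ-style record; field names are illustrative only.
    context: str       # ambiguous or disambiguated context passage
    question: str
    options: list      # e.g. [target group, non-target group, "Unknown"]
    label: int         # index of the correct option
    condition: str     # "ambiguous" or "disambiguated"

def accuracy_by_condition(examples, predict):
    """Return {condition: accuracy}; predict(example) -> chosen option index."""
    totals, correct = {}, {}
    for ex in examples:
        totals[ex.condition] = totals.get(ex.condition, 0) + 1
        if predict(ex) == ex.label:
            correct[ex.condition] = correct.get(ex.condition, 0) + 1
    return {cond: correct.get(cond, 0) / n for cond, n in totals.items()}
```

In practice `predict` would wrap an LLM call that maps a prompt to one of the answer options; comparing the two accuracy figures (and how often "Unknown" is chosen under ambiguity) is the standard BBQ-style bias readout.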
Culturally adapted QA benchmark for multilingual LLM bias evaluation. 17,180 examples in Urdu/English across 8 bias categories.
View Project →
Confidence-aware knowledge distillation achieving balanced shape/texture bias. Improved OOD generalization on vision models.
View Project →
Evaluating LLMs in Sindhi, Pashto, and Urdu. Bias and jailbreak testing on lightweight transformers.
View Project →
AWS-based platform for stock analysis. S3, Lambda, ECS, Postgres with LLM-powered chatbot.
View Project →
End-to-end AI chatbot for EDA and data cleaning. Fine-tuned GPT-3.5, deployed on Hugging Face.
View Project →
2D platformer in Unity with 10+ levels. Hand-drawn pixel art, dynamic lighting, puzzle mechanics.
View Project →