Omar Mahmoud

About

Who I Am

5+ Years of Experience

PhD Candidate, Trustworthy AI

12+ Publications

500+ Users Served

I'm a Senior AI Engineer and Applied Scientist specialising in taking ideas from experimentation through to real-world deployment — fine-tuning LLMs, designing evaluation frameworks, and building reliable AI systems that work at scale.

My research focuses on AI safety, privacy, and model alignment — specifically LLM memorisation, dememorisation, hallucination mitigation, and multilingual behaviour dynamics — with publications at NAACL, EMNLP, EACL, and LREC.

I am currently completing my PhD at Deakin University's Applied AI Institute (A2I2) while taking on freelance AI engineering work.

Expertise

Technical Skills

LLMs & GenAI

Fine-tuning (PEFT, LoRA, QLoRA) · RAG System Design · Agentic AI Workflows · LLM Evaluation (RAGAS, TruLens) · Prompt Engineering

Trustworthy AI

AI Alignment · Privacy-Preserving ML · LLM Dememorisation · Hallucination Mitigation · Adversarial Robustness · Harmful Content Detection

NLP & Speech

Text Classification · NER · Semantic Search · Summarisation · Multilingual NLP · ASR (Whisper, wav2vec2)

ML & Deep Learning

PyTorch · TensorFlow · HuggingFace Transformers · Scikit-learn · Computer Vision · Experiment Design

Data & Retrieval

FAISS · Pinecone · ChromaDB · Elasticsearch · MLflow · Weights & Biases

Engineering & Cloud

Python · FastAPI · Docker · AWS (S3, SageMaker, Lambda) · CI/CD · REST APIs · SQL

Career

Experience

Designed and shipped a production RAG system (LangChain, LlamaIndex) used by 500+ students and researchers, combining hybrid retrieval with re-ranking and achieving a 20% accuracy improvement over baseline.
Fine-tuned open-source LLMs using PEFT/LoRA on domain corpora, reducing inference latency by 15% and improving task accuracy by 10% across classification and summarisation.
Built LLM evaluation pipelines (RAGAS, TruLens) benchmarking hallucination, faithfulness, and alignment — establishing reusable protocols for responsible model deployment.
Developed and evaluated dememorisation techniques to prevent LLMs from leaking training data, contributing findings to EMNLP 2023 and NAACL 2025.
Deployed and managed models via Docker and AWS SageMaker with experiment tracking and automated testing.

Designed and shipped 10+ production AI systems for clients across legal, healthcare, and e-commerce — including document Q&A, semantic search, and content classification pipelines on real-world multilingual data.
Built end-to-end RAG applications with custom chunking, embedding, and retrieval tuning (FAISS, Pinecone), reducing irrelevant results by ~30% over naive baselines.
Fine-tuned and deployed LLMs and NLP models via FastAPI on AWS; implemented monitoring, auth, and data-handling best practices for production reliability.
Delivered ASR pipelines (Whisper, wav2vec2) that achieved a 15–20% improvement in word error rate (WER) over off-the-shelf solutions on client audio data.

Built a real-time NLP-driven recommendation system using embeddings and text classification, increasing user engagement by 25%.
Engineered automated web scraping and preprocessing pipelines for structured and unstructured data, reducing manual effort by 40%.

Designed and executed experimental protocols for model evaluation across Computer Vision (COVID-19 chest X-ray detection — 1st place, UGRF) and NLP tasks using PyTorch and TensorFlow.
Led large-scale data acquisition and preprocessing for image and text datasets, ensuring integrity and feature readiness for model training.

Research

Publications

2026

Omar Mahmoud*, Ali Khalil, Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana

Omar Mahmoud*, Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana

2025

Aly M. Kassem, Omar Mahmoud*, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, Santu Rana

2023

Aly M. Kassem, Omar Mahmoud*, Sherif Saad

Omar Mohamed*, Aly M. Kassem, Ali Ashraf, Salma Jamal, Ensaf Hussein Mohamed

2022

Aly Mostafa, Omar Mohamed Ahmed*, Ali Ashraf

Aly Mostafa, Omar Mohamed Ahmed*

Salma Jamal, Aly Mostafa, Omar Mohamed Ahmed*, Ali Ashraf

2021

Aly Mostafa, Ahmed Elbehery, Ali Ashraf, Omar Mohamed Ahmed*, Ali Mahmoud

Omar Mohamed Ahmed*, Salah Aly Ahmed

Aly Mostafa, Omar Mohamed Ahmed*, Ali Ashraf, Ahmed Elbehery, Salma Jamal, Ghada Khoriba, Amr S. Ghoneim

* Corresponding / lead author

Experience

AI Engineer

AI Engineer

Data Scientist

Undergraduate Research Assistant

Publications

The Unintended Trade-Off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs

Aligning Multilingual Representations: Unveiling Multilingual Behavior Dynamics

Alpaca Against Vicuna: Using LLMs to Uncover Memorization of LLMs

Preserving Privacy Through Dememorization: An Unlearning Technique for Mitigating Memorization Risks in Language Models

An Ensemble Transformer-Based Model for Arabic Sentiment Analysis

GoF at Arabic Hate Speech 2022: Breaking the Loss Function Convention for Data-Imbalanced Arabic Offensive Text Detection

GoF at Qur'an QA 2022: Efficient Question Answering for the Holy Qur'an Using Deep Learning

On the Arabic Dialects' Identification: Overcoming Challenges of Geographical Similarities and Imbalanced Datasets

COVID-19 Patient Chest X-Rays Automatic Detection Using Deep Learning

Arabic Speech Emotion Recognition Employing Wav2Vec2.0 and HuBERT

OCFormer: A Transformer-Based Model for Arabic Handwritten Text Recognition

An End-to-End OCR Framework for Robust Arabic Handwriting Recognition (270M-word corpus)

Education