Master's Thesis in Data & AI: Privacy preserving RAG
Challenging assignment with a monthly compensation of €1000, €500 plus a lease car, or €600 plus housing, along with professional guidance, training sessions, knowledge events, brainstorming with colleagues, and 2 vacation days per month.
We usually respond within three days
Privacy is a critical challenge in deploying Retrieval-Augmented Generation (RAG) systems in sensitive domains. This thesis investigates how privacy-preserving techniques, such as differential privacy and synthetic data, can be integrated into RAG pipelines without degrading output quality. You will analyze trade-offs, enhance a promising method, and validate your approach with a Proof of Concept focused on real-world utility and privacy guarantees.
💡Areas of Interest: Information retrieval, AI, data privacy, NLP, differential privacy
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating related external knowledge into prompts. This mitigates hallucinations and improves output quality, especially when the information falls outside the model’s original training data. However, RAG systems currently offer no guarantees that privacy-sensitive content will remain protected in their outputs, posing significant compliance and ethical risks. Consequently, such sources are often excluded from RAG applications, limiting their effectiveness in privacy-critical sectors like healthcare, legal services, finance, and government. To fully leverage RAG's potential in these domains, we need robust, scalable methods to preserve privacy without compromising performance. This thesis addresses the challenge of preserving privacy in RAG systems.
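To make the retrieval step concrete, here is a minimal sketch of how a RAG pipeline might assemble a prompt. It is an illustration only: the word-overlap retriever stands in for a real embedding-based retriever, and the names `retrieve`, `build_prompt`, and the sample documents are hypothetical. No LLM is called. The point is that retrieved text is copied verbatim into the prompt, which is where privacy-sensitive content can leak.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Insert the retrieved documents verbatim into the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample corpus; the first entry is privacy-sensitive.
DOCS = [
    "Patient 17 was prescribed metformin for type 2 diabetes.",
    "The clinic opens at nine on weekdays.",
    "Invoices are payable within thirty days.",
]

prompt = build_prompt("What was patient 17 prescribed?", DOCS, k=1)
print(prompt)
```

In this toy setup the sensitive record reaches the model unchanged, so any protection has to be applied at retrieval time, at generation time, or to the corpus itself.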
The Assignment
Your research will include two components:
- Literature Study
  - Review state-of-the-art methods for privacy-preserving RAG. Focus areas include:
    - Differentially Private In-Context Learning (e.g., DP-ICL2)
    - Synthetic document generation (e.g., SAGE)
    - Private fine-tuning (e.g., DP-SGD, masking techniques)
  - Analyze trade-offs between privacy guarantees and model utility.
- Proof of Concept (PoC)
  - Select one promising technique and enhance it.
  - Ensure your improvement addresses gaps identified in the literature.
  - Build and evaluate a PoC integrating your privacy method into a RAG pipeline.
  - Evaluation metrics:
    - Privacy: Differential Privacy parameters (ε, δ)
    - Utility: Accuracy, BLEU/ROUGE scores, latency
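
As a rough illustration of the evaluation metrics listed above, the sketch below tracks a privacy budget and scores an answer. Both parts are simplifying assumptions rather than a prescribed method: `compose_basic` uses basic sequential composition (the ε and δ of successive queries add up), which is a loose upper bound compared with the tighter accounting found in DP libraries, and `rouge1_f1` is a simplified unigram-overlap score, not a replacement for an established ROUGE implementation. All function names are hypothetical.

```python
import re
from collections import Counter


def compose_basic(budgets: list[tuple[float, float]]) -> tuple[float, float]:
    """Total (epsilon, delta) under basic sequential composition.

    Running mechanisms with budgets (e_i, d_i) on the same data is
    (sum of e_i, sum of d_i)-differentially private.
    """
    total_epsilon = sum(eps for eps, _ in budgets)
    total_delta = sum(delta for _, delta in budgets)
    return total_epsilon, total_delta


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    ref = Counter(re.findall(r"\w+", reference.lower()))
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    overlap = sum((ref & cand).values())  # per-word minimum of the counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Four private queries, each spending (0.5, 1e-6): about (2.0, 4e-6) in total.
epsilon, delta = compose_basic([(0.5, 1e-6)] * 4)

# Three of four words match in both directions, so the score is 0.75.
score = rouge1_f1("the patient takes metformin", "the patient takes insulin")
```

Reporting the two side by side (budget spent versus utility achieved) is one simple way to make the privacy/utility trade-off visible in a PoC.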
Research Question
You will start with the following broad research question, which you can tailor to your most promising approach later on.
"How can privacy be preserved in Retrieval-Augmented Generation systems without sacrificing model utility?"
Materials
- Baseline project: https://github.com/sarus-tech/dp-rag
  - Paper: RAG with Differential Privacy: https://www.arxiv.org/pdf/2412.19291
  - Medium article: https://medium.com/sarus/introducing-dp-rag-9d4edf3f51c8
- Paper: Privacy-Preserving In-context Learning with Differentially Private Few-shot Generation: https://arxiv.org/pdf/2309.11765
- Paper: Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data: https://arxiv.org/pdf/2406.14773
About Info Support
Info Support specializes in custom software, data/AI solutions, management, and training, and is active in the Finance, Industry, Agriculture, Food & Retail, Mobility & Public, and Healthcare sectors. We provide solid and innovative solutions for complex and critical software issues. Our headquarters are located in Veenendaal (NL) and Mechelen (BE). Info Support currently employs approximately 500 people.
Info Support's working method is characterized by a number of core values: solidity, integrity, craftsmanship, and passion. These core values are intertwined in our work and the way we interact with each other.
To keep all employees up to date with the latest developments, Info Support runs an in-house knowledge center that meets the appetite for more or different knowledge and skills.
B2 language proficiency in Dutch is required.
- Department: Student Master
- Role: Data & AI
- Locations: Info Support Nederland
- Remote status: Hybrid
Why graduate with Info Support?
- 🧑‍🏫 Engaged guidance
» Personal mentors
» Weekly sessions with experts
» Training and knowledge-sharing evenings
- 💰 Choose your compensation per month
» €1000 compensation
» €500 + a lease car
» €600 + living space
- ⚖️ Flexibility & balance
» Hybrid working
» Flexible working hours
» Sole focus on your graduation
Growing in an environment full of knowledge and joy
- 🌞 Welcoming company culture
» An informal and open atmosphere
» You’re part of the team from day one
» Weekly knowledge-sharing sessions
» Engaging community events
» An unforgettable New Year’s party!
- ❤️ Passion for IT & Craftsmanship
» Colleagues with a true passion for their craft
» Learn from teammates who love to share their knowledge
» Work alongside experts who challenge and inspire you
- 🌱 Room to grow
» Graduating is the starting point of your career
» Opportunity to seamlessly transition into a job after graduation
» Clear development paths and growth opportunities
Your journey to Info Support
- 🖥️ Digital introduction
During the digital introduction, you'll share who you are and what you're looking for. We'll tell you more about who we are and what we can offer you. That way, we can discover together whether there's a connection.
- 🔍 Online assessments
Through two short online assessments, we gain a clear picture of who you are and what you're capable of. They cover your personality and motivations, as well as your technical knowledge.
- 🏢 Meeting at our office
Based on the assessments, we gain insight into your profile. We’ll discuss your personality, have a sparring session with a fellow professional, and take the time to truly get to know the person behind the results.
- ✍️ Finishing touches
After the interview, we’ll fine-tune the assignment and make the right match. This way, we lay the foundation for a successful collaboration. The final step is a personal signing moment with our director.