Project portfolio
Detailed breakdowns of delivery scope, architecture, and applied tooling.
Full list with expanded context and outcomes
As the ML/AI engineer, I developed a document summarization pipeline leveraging retrieval-augmented generation (RAG), large language models (LLMs), and NLP techniques for unstructured corpora. Work focused on tuning chunking, retrieval, and prompt templates.
Highlights include traceable prompts, retrieval tuning, and evaluation loops to keep summaries concise and consistent across large document sets. Result: faster review cycles and clearer outputs for stakeholders.
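The chunk-and-retrieve stage described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the window sizes and the toy word-overlap scorer are assumptions, standing in for the embedding-based retrieval and LLM summarization used in practice.

```python
# Sketch of the chunking and retrieval steps of a RAG summarization
# pipeline. Chunk size, overlap, and the word-overlap scorer are
# illustrative stand-ins for embedding similarity.

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]
```

The retrieved chunks would then be placed into a prompt template and summarized by the LLM, with the overlap between windows preserving context across chunk boundaries.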
Enterprise AI copilot built for internal infrastructure and knowledge bases. Designed the RAG architecture, data pipelines, and model tuning to deliver accurate, context-aware assistance.
Focused on secure retrieval over internal code/docs, access controls, and reliable preparation workflows. Result: reduced time spent searching documentation and improved developer throughput.
RiskFocus feature for the RiskSummariser application to strengthen risk assessment workflows, including multi-document data lineage and risk factor management across lines of business.
Implemented a generalized prompt strategy backed by an admin-managed knowledge base and added Q&A functionality to improve decision-making and engagement.
Enterprise chatbot application with optimized data ingestion pipelines and document retrieval. Evaluated and fine-tuned LLMs to improve accuracy and answer consistency.
Emphasis on reliable ingestion, high-quality retrieval, and consistent responses for business users. Result: quicker access to trusted answers and reduced support burden.
Built a multi-source ingestion pipeline for auto-market analytics, pulling from production databases and web-scraped sources into a centralized data lake.
Processed and normalized data with Databricks and Python, then loaded curated datasets into analytics databases for downstream reporting.
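The normalization step above can be illustrated with a small sketch: records from the database and from scraping arrive with different field names and formats, and are mapped onto one curated schema before loading. The field names and source labels here are assumptions for illustration, not the project's actual schema.

```python
# Illustrative normalization of multi-source records onto one curated
# schema. Field names ("vehicle_id", "listing_price", ...) are assumed.

def normalize(record: dict, source: str) -> dict:
    """Map a raw record from a given source onto the shared schema."""
    if source == "db":
        return {
            "vin": record["vehicle_id"].upper(),
            "price_eur": float(record["price"]),
            "listed_at": record["created"],
        }
    if source == "scrape":
        # Scraped prices arrive as strings like "12,500".
        return {
            "vin": record["vin"].upper(),
            "price_eur": float(record["listing_price"].replace(",", "")),
            "listed_at": record["date"],
        }
    raise ValueError(f"unknown source: {source}")
```

In the actual pipeline this mapping ran at scale on Databricks; the point of the sketch is only the shape of the source-to-schema translation.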
Large-scale document ingestion and transformation pipelines on AWS with serverless orchestration. Designed for high-volume, resilient processing.
Focused on load, parse, and transform workflows landing documents in S3. Result: higher throughput, more stable runs, and lower operational overhead.
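A single serverless step in such a pipeline can be sketched as a Lambda-style handler: parse the incoming event, transform the document, and compute its S3 destination key. The event shape and bucket layout are assumptions, and the actual upload (a boto3 `put_object` call) is left as a comment so the sketch stays runnable without AWS credentials.

```python
import json

# Lambda-style handler sketch: parse the event, clean the document,
# and compute the S3 key. Event fields and key layout are assumed.

def handler(event: dict) -> dict:
    doc_id = event["doc_id"]
    text = event["body"]
    # Transform: collapse whitespace noise and count tokens for stats.
    cleaned = " ".join(text.split())
    record = {"doc_id": doc_id, "text": cleaned, "tokens": len(cleaned.split())}
    key = f"processed/{doc_id}.json"
    # In production: boto3.client("s3").put_object(Bucket=..., Key=key, Body=...)
    return {"key": key, "record": json.dumps(record)}
```

Resilience at volume then comes from the orchestration layer (retries, dead-letter queues) rather than from the handler itself, which stays small and stateless.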
Built data extraction and document classification services from scratch, developing deep learning models for unstructured text and leading a team of six engineers.
Delivered production-ready services that automated document workflows and improved classification reliability. Result: reduced manual tagging effort and faster document intake.
Computational platform for in silico clinical trials of familial cardiomyopathies (FCMs), funded under EU Horizon 2020 grant agreement No. 777204. The multi-modular system models how sarcomeric protein mutations affect heart function and drug responses, integrating patient-specific data across genetic, biological, pharmacologic, and clinical dimensions.
Key contributions included ECG and CPET analysis, statistical modeling, and development of patient-specific 3D heart models. The platform aims to optimize treatment strategies while minimizing adverse effects and reducing reliance on animal and human clinical trials.
End-to-end conversational AI system for telephone support with speech-to-text and text-to-speech capabilities. Built to handle customer inquiries, policy questions, and claims status through natural voice interactions.
The system transcribes caller speech in real time, extracts intents and entities with Azure LUIS, generates contextual responses, and synthesizes natural-sounding voice replies. Designed for high availability and seamless integration with existing telephony infrastructure.
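The routing that follows intent recognition can be sketched as below. The intent names, confidence threshold, and canned replies are illustrative assumptions; in the real system the intent and its score come from the LUIS prediction response, and responses are generated rather than looked up.

```python
# Sketch of intent routing downstream of speech recognition. Intent
# names, threshold, and canned replies are assumed for illustration.

CONFIDENCE_THRESHOLD = 0.6

RESPONSES = {
    "claims_status": "Let me look up the status of your claim.",
    "policy_question": "I can help with questions about your policy.",
}

def route(intent: str, score: float) -> str:
    """Pick a reply for a recognized intent, escalating below threshold."""
    if score < CONFIDENCE_THRESHOLD or intent not in RESPONSES:
        return "Let me connect you with an agent."
    return RESPONSES[intent]
```

The threshold fallback matters in voice channels: a low-confidence guess spoken aloud is worse than a clean handoff to a human agent.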
Comprehensive data extraction and transformation platform built on Azure. The pipeline orchestrates ETL workflows from raw data ingestion through classification, embedding generation, and analytics-ready outputs for business intelligence.
Designed to handle diverse data sources with automated pipelines that clean, classify, and enrich data before loading into Data Lake storage. Integrated with Power BI for real-time dashboards and reporting.
End-to-end document processing service designed to extract, classify, and store relevant data from unstructured documents. Built a containerized microservices architecture on Azure for scalable document intake and automated categorization.
The system ingests documents from multiple sources, applies ML-based classification models, and persists structured outputs to downstream storage. Result: streamlined document workflows with consistent labeling and reduced manual intervention.
© 2026 Stefan Seman. All rights reserved.