Project portfolio
Detailed breakdowns of delivery scope, architecture, and applied tooling.
Full list with expanded context and outcomes
As the ML/AI engineer, I developed a document summarization pipeline leveraging retrieval-augmented generation (RAG), large language models (LLMs), and NLP techniques for unstructured corpora. Work focused on tuning chunking, retrieval, and prompt templates.
Highlights include traceable prompts, retrieval tuning, and evaluation loops to keep summaries concise and consistent across large document sets. Result: faster review cycles and clearer outputs for stakeholders.
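The chunk-and-retrieve stage described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the window sizes and the toy word-overlap scorer are assumptions, standing in for the embedding-based retrieval and LLM summarization used in practice.

```python
# Sketch of the chunking and retrieval steps of a RAG summarization
# pipeline. Chunk size, overlap, and the word-overlap scorer are
# illustrative stand-ins for embedding similarity.

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query and keep the top k."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]
```

The retrieved chunks would then be placed into a prompt template and summarized by the LLM, with the overlap between windows preserving context across chunk boundaries.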
Enterprise AI copilot built for internal infrastructure and knowledge bases. Designed the RAG architecture, data pipelines, and model tuning to deliver accurate, context-aware assistance.
Focused on secure retrieval over internal code/docs, access controls, and reliable preparation workflows. Result: reduced time spent searching documentation and improved developer throughput.
RiskFocus feature for the RiskSummariser application to strengthen risk assessment workflows, including multi-document data lineage and risk factor management across lines of business.
Implemented a generalized prompt strategy backed by an admin-managed knowledge base and added Q&A functionality to improve decision-making and engagement.
Enterprise chatbot application with optimized data ingestion pipelines and document retrieval. Evaluated and fine-tuned LLMs to improve accuracy and answer consistency.
Emphasis on reliable ingestion, high-quality retrieval, and consistent responses for business users. Result: quicker access to trusted answers and reduced support burden.
Built a multi-source ingestion pipeline for auto-market analytics, pulling from production databases and web-scraped sources into a centralized data lake.
Processed and normalized data with Databricks and Python, then loaded curated datasets into analytics databases for downstream reporting.
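The normalization step above can be illustrated with a small sketch: records from the database and from scraping arrive with different field names and formats, and are mapped onto one curated schema before loading. The field names and source labels here are assumptions for illustration, not the project's actual schema.

```python
# Illustrative normalization of multi-source records onto one curated
# schema. Field names ("vehicle_id", "listing_price", ...) are assumed.

def normalize(record: dict, source: str) -> dict:
    """Map a raw record from a given source onto the shared schema."""
    if source == "db":
        return {
            "vin": record["vehicle_id"].upper(),
            "price_eur": float(record["price"]),
            "listed_at": record["created"],
        }
    if source == "scrape":
        # Scraped prices arrive as strings like "12,500".
        return {
            "vin": record["vin"].upper(),
            "price_eur": float(record["listing_price"].replace(",", "")),
            "listed_at": record["date"],
        }
    raise ValueError(f"unknown source: {source}")
```

In the actual pipeline this mapping ran at scale on Databricks; the point of the sketch is only the shape of the source-to-schema translation.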
Large-scale document ingestion and transformation pipelines on AWS with serverless orchestration. Designed for high-volume, resilient processing.
Focused on load, parse, and transform workflows landing documents in S3. Result: higher throughput, more stable runs, and lower operational overhead.
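A single serverless step in such a pipeline can be sketched as a Lambda-style handler: parse the incoming event, transform the document, and compute its S3 destination key. The event shape and bucket layout are assumptions, and the actual upload (a boto3 `put_object` call) is left as a comment so the sketch stays runnable without AWS credentials.

```python
import json

# Lambda-style handler sketch: parse the event, clean the document,
# and compute the S3 key. Event fields and key layout are assumed.

def handler(event: dict) -> dict:
    doc_id = event["doc_id"]
    text = event["body"]
    # Transform: collapse whitespace noise and count tokens for stats.
    cleaned = " ".join(text.split())
    record = {"doc_id": doc_id, "text": cleaned, "tokens": len(cleaned.split())}
    key = f"processed/{doc_id}.json"
    # In production: boto3.client("s3").put_object(Bucket=..., Key=key, Body=...)
    return {"key": key, "record": json.dumps(record)}
```

Resilience at volume then comes from the orchestration layer (retries, dead-letter queues) rather than from the handler itself, which stays small and stateless.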
Built data extraction and document classification services from scratch, developing deep learning models for unstructured text and leading a team of six engineers.
Delivered production-ready services that automated document workflows and improved classification reliability. Result: reduced manual tagging effort and faster document intake.
Computational platform for in silico clinical trials of familial cardiomyopathies (FCMs), funded under EU Horizon 2020 grant agreement No. 777204. The multi-modular system models how sarcomeric protein mutations affect heart function and drug responses, integrating patient-specific data across genetic, biological, pharmacologic, and clinical dimensions.
Key contributions included ECG and CPET analysis, statistical modeling, and development of patient-specific 3D heart models. The platform aims to optimize treatment strategies while minimizing adverse effects and reducing reliance on animal and human clinical trials.
End-to-end conversational AI system for telephone support with speech-to-text and text-to-speech capabilities. Built to handle customer inquiries, policy questions, and claims status through natural voice interactions.
The system transcribes caller speech in real time, extracts intents and entities with Azure LUIS, generates contextual responses, and synthesizes natural-sounding voice replies. Designed for high availability and seamless integration with existing telephony infrastructure.
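The routing that follows intent recognition can be sketched as below. The intent names, confidence threshold, and canned replies are illustrative assumptions; in the real system the intent and its score come from the LUIS prediction response, and responses are generated rather than looked up.

```python
# Sketch of intent routing downstream of speech recognition. Intent
# names, threshold, and canned replies are assumed for illustration.

CONFIDENCE_THRESHOLD = 0.6

RESPONSES = {
    "claims_status": "Let me look up the status of your claim.",
    "policy_question": "I can help with questions about your policy.",
}

def route(intent: str, score: float) -> str:
    """Pick a reply for a recognized intent, escalating below threshold."""
    if score < CONFIDENCE_THRESHOLD or intent not in RESPONSES:
        return "Let me connect you with an agent."
    return RESPONSES[intent]
```

The threshold fallback matters in voice channels: a low-confidence guess spoken aloud is worse than a clean handoff to a human agent.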
Comprehensive data extraction and transformation platform built on Azure. The pipeline orchestrates ETL workflows from raw data ingestion through classification, embedding generation, and analytics-ready outputs for business intelligence.
Designed to handle diverse data sources with automated pipelines that clean, classify, and enrich data before loading into Data Lake storage. Integrated with Power BI for real-time dashboards and reporting.
End-to-end document processing service designed to extract, classify, and store relevant data from unstructured documents. Built a containerized microservices architecture on Azure for scalable document intake and automated categorization.
The system ingests documents from multiple sources, applies ML-based classification models, and persists structured outputs to downstream storage. Result: streamlined document workflows with consistent labeling and reduced manual intervention.
© 2026 Stefan Seman. All rights reserved.