Data Science & AI Projects

Explore my portfolio of machine learning, AI, and data engineering projects

19 projects

Detoxifying FLAN-T5 with RLHF (PPO + Hate-Speech Reward Model)
Generative AI, Responsible AI

Fine-tuned an instruction-tuned FLAN-T5 summarization model with Reinforcement Learning from Human Feedback (RLHF) using PPO and a hate-speech reward model to reduce toxicity in generated dialogue summaries.

RLHF, PPO, FLAN-T5, Transformers
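The project combines two parts: a scalar reward from a hate-speech classifier and a PPO policy update. Both can be sketched in plain Python. This is a minimal illustration, not the project's code. It assumes the reward model returns per-class logits that include a `nothate` label (the label name is an assumption). It uses the standard PPO clipped surrogate objective. It also adds a KL penalty against the frozen reference model, a common RLHF choice that the description above does not state. All function names are hypothetical.

```python
import math
from typing import Sequence


def toxicity_reward(logits: Sequence[float], labels: Sequence[str],
                    positive_label: str = "nothate") -> float:
    """Use the reward model's logit for the non-toxic class as the scalar reward.

    The label name "nothate" is an assumption about the classifier's label set.
    """
    return logits[labels.index(positive_label)]


def kl_penalized_reward(reward: float, logp_policy: Sequence[float],
                        logp_ref: Sequence[float], beta: float = 0.1) -> float:
    """Subtract beta * (sum of per-token log-ratios) to keep the policy near the reference model."""
    kl_estimate = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return reward - beta * kl_estimate


def ppo_clipped_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                          advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The min removes any incentive to push the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Toy numbers only; these are not results from the project.
    r = toxicity_reward([-1.5, 2.0], ["hate", "nothate"])
    print(kl_penalized_reward(r, [-0.5, -0.7], [-0.6, -0.9], beta=0.1))
    print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

The clipping is why PPO is a common choice for RLHF. The policy can move toward less toxic summaries, but each update is limited, so the model does not drift far from fluent summarization.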
DeFtunes Music Analytics: End-to-End Data Pipeline on AWS
Data Engineering

Designed an end-to-end data platform for a music streaming company (DeFtunes), ingesting purchase events from APIs and a PostgreSQL catalog into an AWS data lake, transforming them with Glue/Spark into Iceberg tables, modeling a star schema in Redshift with dbt, and orchestrating daily runs and data-quality checks with Airflow.

AWS, Data Lake, S3, Redshift
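A star schema stores measures in a central fact table and descriptive attributes in dimension tables, joined by surrogate keys. The sketch below uses Python's built-in `sqlite3` as a local stand-in for Redshift. The table and column names (`fact_purchases`, `dim_users`, `dim_songs`) and the toy rows are hypothetical, not the project's actual dbt models.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_users (
    user_key INTEGER PRIMARY KEY,
    user_id  TEXT NOT NULL,
    country  TEXT
);
CREATE TABLE dim_songs (
    song_key    INTEGER PRIMARY KEY,
    song_id     TEXT NOT NULL,
    title       TEXT,
    artist_name TEXT
);
-- One row per purchase: measures live here, attributes live in the dimensions.
CREATE TABLE fact_purchases (
    purchase_id   TEXT PRIMARY KEY,
    user_key      INTEGER NOT NULL REFERENCES dim_users(user_key),
    song_key      INTEGER NOT NULL REFERENCES dim_songs(song_key),
    purchase_date TEXT NOT NULL,
    price_usd     REAL NOT NULL
);
"""

# A typical analytical query: aggregate a fact measure by a dimension attribute.
REVENUE_BY_ARTIST = """
SELECT s.artist_name, ROUND(SUM(f.price_usd), 2) AS revenue
FROM fact_purchases AS f
JOIN dim_songs AS s ON f.song_key = s.song_key
GROUP BY s.artist_name
ORDER BY revenue DESC
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema in memory and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_users VALUES (?, ?, ?)",
                     [(1, "u-001", "US"), (2, "u-002", "DE")])
    conn.executemany("INSERT INTO dim_songs VALUES (?, ?, ?, ?)",
                     [(1, "s-001", "Song A", "Artist X"),
                      (2, "s-002", "Song B", "Artist Y")])
    conn.executemany("INSERT INTO fact_purchases VALUES (?, ?, ?, ?, ?)",
                     [("p-1", 1, 1, "2024-01-01", 0.99),
                      ("p-2", 2, 1, "2024-01-01", 0.99),
                      ("p-3", 2, 2, "2024-01-02", 1.29)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for artist, revenue in conn.execute(REVENUE_BY_ARTIST):
        print(artist, revenue)
```

Because dimensions are kept narrow and separate, a new analytical question usually needs only a different join and grouping, not a change to the fact table.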
API & Streaming Ingestion: Spotify REST + AWS Kinesis
Data Engineering

Designed and implemented two complementary ingestion pipelines: a batch/API pipeline that pulls music metadata from the Spotify REST API with authenticated, paginated requests, and a streaming ETL pipeline on AWS Kinesis that processes user activity events in real time and routes them to S3 for downstream recommendation models.

Spotify API, REST, HTTP, OAuth
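The sketch below covers both halves of this card using only the standard library. It assumes the OAuth client-credentials flow, though the project may use another flow. The token is sent as a `Bearer` header, and the pager follows the `next` URL of a Spotify paging object (`items` plus `next`) until it is null. Some endpoints wrap the paging object in a key such as `albums`, so `fetch_page` is assumed to return the unwrapped object. The Kinesis helper assumes records arrive through an AWS Lambda trigger, where each payload is base64-encoded under `record["kinesis"]["data"]`, and that payloads are JSON. All function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

TOKEN_URL = "https://accounts.spotify.com/api/token"


def basic_auth_header(client_id: str, client_secret: str) -> dict:
    """Basic auth header for the token request: base64(client_id:client_secret)."""
    creds = f"{client_id}:{client_secret}".encode()
    return {"Authorization": "Basic " + base64.b64encode(creds).decode()}


def request_token(client_id: str, client_secret: str) -> str:
    """Exchange client credentials for an access token (makes a network call)."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    headers = {**basic_auth_header(client_id, client_secret),
               "Content-Type": "application/x-www-form-urlencoded"}
    req = urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def make_fetcher(access_token: str) -> Callable[[str], dict]:
    """Return a function that GETs a URL with the Bearer token and parses the JSON body."""
    def fetch(url: str) -> dict:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def paginate(fetch_page: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every item across pages, following `next` until it is null."""
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("items", [])
        url = page.get("next")


def decode_kinesis_records(event: dict) -> list:
    """Decode JSON payloads from a Lambda event produced by a Kinesis data stream."""
    return [json.loads(base64.b64decode(record["kinesis"]["data"]))
            for record in event.get("Records", [])]
```

Injecting `fetch_page` keeps the pagination logic testable without network access. In a real run it would be `make_fetcher(request_token(...))`.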
DataOps: Terraform Bastion Host on AWS & Data Quality with Great Expectations
Data Engineering

Designed a DataOps-focused project that uses Terraform to provision a secure bastion-host architecture on AWS (VPC, private RDS PostgreSQL, public EC2, SSH keys) and Great Expectations to implement automated data quality checks and documentation on a PostgreSQL dataset.

Terraform, AWS, VPC, EC2
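Great Expectations expresses data-quality checks as declarative expectations, such as `expect_column_values_to_not_be_null`, `expect_column_values_to_be_unique`, and `expect_column_values_to_be_between`. The plain-Python sketch below mimics that idea without the library so it runs anywhere. It is not the Great Expectations API. The helper names, the `id`/`amount` columns, and the bounds are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class CheckResult:
    expectation: str
    column: str
    success: bool
    unexpected_count: int


def not_null(rows: List[Row], column: str) -> CheckResult:
    bad = sum(1 for row in rows if row.get(column) is None)
    return CheckResult("not_null", column, bad == 0, bad)


def unique(rows: List[Row], column: str) -> CheckResult:
    # Counts surplus copies: [1, 2, 2] has one unexpected value.
    values = [row.get(column) for row in rows if row.get(column) is not None]
    bad = len(values) - len(set(values))
    return CheckResult("unique", column, bad == 0, bad)


def between(rows: List[Row], column: str, min_value: float, max_value: float) -> CheckResult:
    # Nulls are skipped here; not_null is responsible for them.
    bad = sum(1 for row in rows
              if row.get(column) is not None and not (min_value <= row[column] <= max_value))
    return CheckResult("between", column, bad == 0, bad)


def run_suite(rows: List[Row]) -> List[CheckResult]:
    """A hypothetical suite for a table with `id` and `amount` columns."""
    return [
        not_null(rows, "id"),
        unique(rows, "id"),
        between(rows, "amount", 0, 10_000),
    ]


if __name__ == "__main__":
    sample = [{"id": 1, "amount": 25.0}, {"id": 2, "amount": -3.0}, {"id": 2, "amount": None}]
    for result in run_suite(sample):
        print(result)
```

Each check returns a structured result instead of raising an error. A pipeline step can then report every failure at once, much like a validation report, before deciding whether to block the run.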
AWS Data Engineering: DynamoDB, RDS, EC2 & S3
Data Engineering

End-to-end AWS data engineering project designing a NoSQL data layer in DynamoDB and a relational store in RDS PostgreSQL, connected via EC2 bastion hosts and ingesting CSV data from S3 while solving real-world networking, security group, and IAM issues.

AWS, DynamoDB, Python, Boto3
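A common snag when loading CSV data into DynamoDB with boto3 is that the DynamoDB resource rejects Python floats, so numeric columns must be converted to `Decimal`. The sketch below parses CSV text into DynamoDB-ready items and shows the S3 read and batched write with real boto3 calls (`get_object`, `Table.batch_writer`, `put_item`). It is an assumption about how such a load could look, not the project's code. The bucket, key, table name, and column names are hypothetical.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, Iterable, List, Set


def rows_to_items(csv_text: str, numeric_fields: Set[str]) -> List[Dict[str, object]]:
    """Turn CSV text into dicts suitable for DynamoDB put_item.

    Numeric columns become Decimal because boto3's DynamoDB resource rejects floats.
    """
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = {}
        for key, value in row.items():
            if value == "":
                continue  # skip blank cells; Decimal("") would raise
            item[key] = Decimal(value) if key in numeric_fields else value
        items.append(item)
    return items


def read_csv_from_s3(bucket: str, key: str) -> str:
    import boto3  # imported lazily so rows_to_items works without boto3 installed
    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return obj["Body"].read().decode("utf-8")


def load_items(table_name: str, items: Iterable[Dict[str, object]]) -> None:
    import boto3
    table = boto3.resource("dynamodb").Table(table_name)
    # batch_writer buffers puts into batch requests and resends unprocessed items.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
```

Usage would look like `load_items("products", rows_to_items(read_csv_from_s3("my-bucket", "products.csv"), {"price"}))`, with every name hypothetical. It also needs IAM permissions for `s3:GetObject` and `dynamodb:BatchWriteItem` on the calling role.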

Flan-T5 Summarization Fine-Tuning with PEFT (LoRA) on AWS SageMaker

Fine-tuned the Flan-T5 language model for dialogue summarization on AWS SageMaker using two strategies: full instruction fine-tuning and parameter-efficient fine-tuning (PEFT) with LoRA. Compared model quality with ROUGE metrics and demonstrated how LoRA adapters can cut model size and compute cost while retaining performance close to that of full fine-tuning.

NLP, LLM, Flan-T5, Summarization
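The size saving from LoRA comes from replacing the update to a frozen d × k weight W with two small trainable factors, B (d × r) and A (r × k), scaled by alpha / r. The pure-Python sketch below counts trainable parameters and folds a trained adapter back into W. It assumes FLAN-T5-base (hidden size 768) and rank r = 32 purely for illustration; the project's actual model size and rank are not stated. For one 768 × 768 attention projection, full fine-tuning trains 589,824 parameters, while LoRA trains 32 × (768 + 768) = 49,152, about 8.3%.

```python
from typing import List

Matrix = List[List[float]]


def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when updating a d x k weight matrix directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(W: Matrix, B: Matrix, A: Matrix, alpha: float, r: int) -> Matrix:
    """Fold the adapter into the frozen weight: W' = W + (alpha / r) * B @ A."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


if __name__ == "__main__":
    d = k = 768  # assumed FLAN-T5-base hidden size; r = 32 is illustrative
    full, lora = full_finetune_params(d, k), lora_params(d, k, r=32)
    print(full, lora, f"{lora / full:.1%}")
```

Merging is optional. Keeping the adapter separate lets several task-specific adapters share one base model, while merging removes the extra matrix multiply at inference time.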

Energy Data Pipeline

End-to-end data pipeline processing European electricity grid data using cloud-native architecture and modern data engineering practices.

AWS, Terraform, Apache Iceberg, Airflow

Dialogue Summarization with FLAN-T5 & Prompt Engineering

Built a dialogue summarization pipeline using Hugging Face’s FLAN-T5 model, experimenting with zero-shot, one-shot, and few-shot prompt engineering on the DialogSum dataset to improve summaries of multi-turn conversations.

NLP, LLM, Hugging Face, FLAN-T5
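The difference between zero-, one-, and few-shot prompting is only how many solved examples precede the target dialogue. The sketch below builds all three from one function. The template wording ("What was going on?") and the toy dialogues are hypothetical. The `#Person1#` speaker tags follow DialogSum's format.

```python
from typing import Iterable, Tuple

# Hypothetical template; the project's exact prompt wording is not stated.
TEMPLATE = "Dialogue:\n\n{dialogue}\n\nWhat was going on?\n"


def make_prompt(target_dialogue: str,
                examples: Iterable[Tuple[str, str]] = ()) -> str:
    """Zero-shot when `examples` is empty, one-shot with one pair, few-shot with several.

    Each example is rendered as a completed dialogue/summary pair so the model can
    infer the expected format; the target dialogue is left open for generation.
    """
    shots = [TEMPLATE.format(dialogue=d) + s + "\n\n\n" for d, s in examples]
    return "".join(shots) + TEMPLATE.format(dialogue=target_dialogue)


if __name__ == "__main__":
    demo = [("#Person1#: Are you free tonight?\n#Person2#: Yes, let's get dinner.",
             "#Person1# and #Person2# plan to have dinner tonight.")]
    print(make_prompt("#Person1#: The printer is jammed again.\n#Person2#: I'll call IT.", demo))
```

The resulting string would then be tokenized and passed to the model's `generate` method, for example via `AutoTokenizer` and `AutoModelForSeq2SeqLM` from Hugging Face Transformers. Few-shot prompts grow quickly, so the number of examples is bounded by the model's input length.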

Audio to Text Using Generative AI

Explore this comprehensive data science project with detailed analysis and implementation.

LangChain, OpenAI, ChromaDB, HuggingFace

Predict Whether Patients in the Dataset Have Diabetes

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Image Classification using AWS SageMaker

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Tell Bicycles Apart from Motorcycles (Computer Vision)

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Bike Sharing Demand

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Atrial Fibrillation

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Motion Compensated Pulse Rate Estimation

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Pneumonia Detection From Chest X-Rays

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Quantifying Hippocampus Volume for Alzheimer's Progression

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Patient Selection for Diabetes Drug Testing Project

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow

Vibration Sensing for Engine Condition Monitoring and Predictive Maintenance

Explore this comprehensive data science project with detailed analysis and implementation.

AWS, Terraform, Apache Iceberg, Airflow