Detoxifying FLAN-T5 with RLHF (PPO + Hate-Speech Reward Model)
Fine-tuned an instruction-tuned FLAN-T5 summarization model with Reinforcement Learning from Human Feedback (RLHF), using Proximal Policy Optimization (PPO) and a hate-speech classifier as the reward model to reduce toxicity in generated dialogue summaries.
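
The sketch below illustrates one way this training loop can be wired together. It assumes trl's older `PPOTrainer` interface (trl 0.x), a `google/flan-t5-base` policy checkpoint, and `facebook/roberta-hate-speech-dynabench-r4-target` as the hate-speech reward model; the label index, hyperparameters, and helper names are illustrative assumptions, not the project's exact configuration.

```python
# Minimal PPO detoxification sketch (assumptions: trl 0.x PPOTrainer API,
# flan-t5-base policy, RoBERTa hate-speech classifier as reward model).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from trl import (AutoModelForSeq2SeqLMWithValueHead, PPOConfig, PPOTrainer,
                 create_reference_model)

policy_name = "google/flan-t5-base"                               # assumed checkpoint
reward_name = "facebook/roberta-hate-speech-dynabench-r4-target"  # example reward model

tokenizer = AutoTokenizer.from_pretrained(policy_name)
# Policy with a value head for PPO, plus a frozen reference copy for the KL penalty.
policy = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(policy_name)
ref_policy = create_reference_model(policy)

# Reward model: the "nothate" logit is used as the (higher-is-better) reward.
reward_tokenizer = AutoTokenizer.from_pretrained(reward_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_name)
NOT_HATE_INDEX = 0  # assumed index of the "nothate" label for this classifier

config = PPOConfig(model_name=policy_name, learning_rate=1.41e-5,
                   batch_size=16, mini_batch_size=4)
ppo_trainer = PPOTrainer(config=config, model=policy, ref_model=ref_policy,
                         tokenizer=tokenizer)

generation_kwargs = {"max_new_tokens": 128, "do_sample": True, "top_k": 0.0, "top_p": 1.0}

def toxicity_reward(texts):
    """Score each summary with the hate-speech classifier; higher = less toxic."""
    inputs = reward_tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = reward_model(**inputs).logits
    return [logits[i, NOT_HATE_INDEX] for i in range(len(texts))]

def ppo_step(prompts):
    """One PPO update over a batch of dialogue prompts (batch size should match config.batch_size)."""
    query_tensors = [tokenizer(p, return_tensors="pt").input_ids.squeeze(0) for p in prompts]
    response_tensors = [ppo_trainer.generate(q, **generation_kwargs).squeeze(0)
                        for q in query_tensors]
    summaries = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]
    rewards = toxicity_reward(summaries)          # list of scalar tensors
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    return stats
```

In this setup the KL penalty against the frozen reference model keeps the PPO-tuned policy from drifting away from fluent summarization while the reward model pushes generations toward lower toxicity.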

















