I'm
AI · ML · Data Science
Passionate about building intelligent systems and leveraging machine learning to solve real-world problems. Currently pursuing a Master's in Data Science at UC San Diego with expertise in Generative AI, Computer Vision, Deep Learning, and MLOps.
My journey into AI began with purpose, not just passion. In 2020, amid COVID-19, I asked: How can I build something that matters right now? That question led me to develop machine learning pipelines for identifying drug targets. What started as a crisis response evolved into a career engineering AI systems that improve lives, from predicting power outages for 3.6 million SDG&E customers to protecting endangered wildlife through computer vision at San Diego Zoo.
I build AI that ships. My expertise spans deep learning, Generative AI, computer vision, and MLOps, but more importantly, I know how to take models from research to production. I've built multi-agent GenAI platforms with RAG architectures, deployed neural networks on AWS at scale, and optimized inference pipelines to run faster. I thrive in the messy middle ground where research meets engineering, where you need to understand both gradient descent and Docker containers.
But my impact extends beyond the technical. I'm deeply committed to making tech an inclusive space for women. I'm seeking teams that share my vision: AI that's ethical, impactful, scalable, and built by people who represent the world we're building for. I bring production-ready ML skills (PyTorch, LangChain, AWS, RAG systems), published research, real-world deployment experience, and an unwavering commitment to using AI as a force for good.
Ready to discuss how cutting-edge AI research meets production engineering? Find me at GHC 2025!
GPA: 3.82 | San Diego, CA, USA
GPA: 3.99 | Rank: 2nd | Kolkata, India
Building intelligent systems with cutting-edge technologies
LLMs, RAG Systems, Prompt Engineering & AI Integration
Feature Engineering, Statistical Analysis, Data Mining & ETL
Deep Learning, Computer Vision, NLP & Model Optimization
Deploying scalable ML systems with modern DevOps practices
Creating insights through interactive dashboards and analytics
A resilient, autonomous AI platform for supply chain management that processes over 10,000 events per second using four specialized machine learning agents. Built with FastAPI, PostgreSQL, and Kafka for high-throughput data flow, with MLflow and Weights & Biases for comprehensive monitoring. Deployed using microservices architecture with Docker and Kubernetes to simulate and handle supply chain disruptions at scale.
A personal health platform combining computer vision and conversational AI. Uses GPT-4 Vision for food image analysis to automatically calculate nutritional values, making meal logging effortless. Integrates GPT-4 and Qwen-VL with a RAG system using FAISS vector search for real-time nutrition coaching, achieving a 70% reduction in API costs. Built on FastAPI with SQLite caching and an asyncio pipeline for smooth, personalized experiences.
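A minimal sketch of how SQLite caching and asyncio can fit together in front of a paid model API. This is an illustration under stated assumptions, not the platform's actual code: `ResponseCache`, `fake_vision_call`, `analyze`, and the table layout are all hypothetical names, and the model call is a stub.

```python
import asyncio
import hashlib
import sqlite3


class ResponseCache:
    """Tiny SQLite-backed cache keyed by a hash of the request payload."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    @staticmethod
    def make_key(payload: bytes) -> str:
        return hashlib.sha256(payload).hexdigest()

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def put(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()


# Counts how often the (stubbed) paid API is actually hit.
calls = {"count": 0}


async def fake_vision_call(image_bytes: bytes) -> str:
    """Hypothetical stand-in for a paid vision-model request."""
    calls["count"] += 1
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"nutrition estimate for {len(image_bytes)} bytes"


async def analyze(cache: ResponseCache, image_bytes: bytes) -> str:
    """Return the cached answer if present; otherwise call the model and store it."""
    key = cache.make_key(image_bytes)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = await fake_vision_call(image_bytes)
    cache.put(key, result)
    return result


async def main():
    cache = ResponseCache()
    first = await analyze(cache, b"pizza-photo")
    second = await analyze(cache, b"pizza-photo")  # served from SQLite
    return first, second


first, second = asyncio.run(main())
```

The second request for the same image never reaches the stubbed API, which is the mechanism by which a cache can lower per-request cost. The `sqlite3` calls here are blocking; a real service would likely move them off the event loop.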
A scalable audio classifier trained on 50,000+ bird call samples. Engineered 206 parallel XGBoost classifiers to handle class imbalance, achieving 0.911 AUC-ROC. Accepted to CLEF 2025.
A deep dive into audio classification using PyTorch and XGBoost across 50,000+ audio samples. Engineered log-mel and frequency features from raw audio and designed 206 parallel XGBoost binary classifiers to handle severe class imbalance. Achieved significant F1-score improvements for rare species through targeted data augmentation and an overall 0.911 AUC-ROC score. This work was accepted as a paper at CLEF 2025.
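A minimal sketch of the one-classifier-per-species idea, assuming single-label clips for simplicity. Each species gets its own binary target vector, and the negative-to-positive ratio is computed as a candidate value for XGBoost's `scale_pos_weight` parameter. Whether the original project weighted classes this way is an assumption; the species names and helper functions below are hypothetical.

```python
def one_vs_rest_targets(labels):
    """Build one binary target vector per class from multi-class labels."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def positive_weight(binary_targets):
    """Ratio of negatives to positives; larger values mean a rarer class."""
    pos = sum(binary_targets)
    neg = len(binary_targets) - pos
    return neg / pos


# Toy labels: "finch" is common, "kiwi" is rare.
labels = ["owl", "finch", "finch", "owl", "finch", "kiwi", "finch", "finch"]

targets = one_vs_rest_targets(labels)
weights = {species: positive_weight(t) for species, t in targets.items()}
```

Because each binary problem is independent, the per-species models can be trained in parallel, and a rare species can receive its own weighting or augmentation without affecting the others.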
My first research venture and introduction to data science's real-world impact. Developed a machine learning pipeline to identify potential drug targets by analyzing 20,000+ genes, RNA-seq data, and protein networks. Used feature engineering techniques like PCA and Node2Vec to reduce dimensionality by 60%. The final Random Forest classifier achieved 89.4% recall and 75.4% accuracy, outperforming baseline classifiers by 7%.
Ever been mid-game, arguing about a rule? As a board game enthusiast, I built PlaySense to solve exactly that. It's a RAG-powered question-answering system using LangChain that lets you query game manuals instantly. The system parses 20+ structured and unstructured manuals stored in ChromaDB, with automated embedding of 500+ pages using Ollama. Through optimization and a custom validation pipeline, I improved retrieval latency by 28% and achieved 91% QA precision.
A novel cascaded U-Net architecture for medical imaging that segments arteries and veins from retinal scans to aid diabetes detection. Achieved 0.74 sensitivity, a 2% improvement over baselines.
A deep dive into medical imaging where I developed RavNet—a novel architecture with two cascaded U-Nets and three decoders in the second network. Built in PyTorch for artery-vein segmentation from retinal scans, helping identify conditions like diabetes. I curated and preprocessed 400+ retinal scans with advanced augmentations to enhance generalization and reduce overfitting. The model achieved 0.74 sensitivity and 0.98 specificity, outperforming U-Net baselines by 2%.
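For readers less familiar with the two metrics reported above, here is a small sketch of how sensitivity and specificity are computed from binary pixel labels. The toy labels are made up for illustration and are unrelated to the RavNet results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# 1 = vessel pixel, 0 = background pixel (toy data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both numbers matters in segmentation tasks where background pixels typically far outnumber vessel pixels: a model can score high specificity while still missing thin vessels, which shows up as lower sensitivity.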
Ever wondered if you're being paid fairly? Built PayLens to provide data-driven compensation insights. Created an automated ETL pipeline to scrape 50,000+ salary records from Levels.fyi with 99%+ accuracy. Developed a regression model identifying the five primary drivers of compensation. Statistical analysis (t-tests, ANOVA) revealed a 12% gender pay gap after controlling for variables. Presented findings in an interactive Tableau dashboard to help people make informed career decisions.
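A minimal sketch of a two-sample comparison of the kind mentioned above, using Welch's t-test (an assumption; the source does not say which t-test variant was used). The salary figures are invented, and the helper name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a), variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Invented salaries in thousands of dollars.
group_a = [100, 110, 120, 130, 140]
group_b = [90, 100, 110, 120, 130]

t_stat, dof = welch_t(group_a, group_b)
```

This compares raw group means only. A gap measured "after controlling for variables" would normally come from a regression that includes covariates such as role, level, and location alongside gender.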
An interactive dashboard exploring global suicide data to destigmatize mental health discussions. Built during COVID-19 to visualize temporal and demographic patterns for public health insights.
I developed this during the COVID-19 pandemic, when mental health became critically important. Coming from India, where it is often a taboo topic, I wanted to build a visualization that shows how crucial these conversations are. I curated and cleaned global suicide datasets from WHO and the World Bank, performed feature engineering and outlier handling, then built an interactive Streamlit dashboard allowing exploration by country, age, and gender. I also conducted statistical analysis to identify temporal and demographic patterns for actionable public health insights.
As a huge anime fan unable to find good movie recommendation platforms, I created my own. Built a scalable content-based recommendation engine using TF-IDF vectorization and cosine similarity for precise anime recommendations. Scraped 19,000 anime entries using BeautifulSoup and tokenized synopses to power the model.
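A minimal, dependency-free sketch of content-based recommendation with TF-IDF and cosine similarity. The titles and synopses are invented, whitespace tokenization and the smoothed IDF formula are assumptions, and a production version would more likely use a library vectorizer.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many synopses each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(titles, synopses, query_title, k=2):
    """Rank all other titles by synopsis similarity to the query title."""
    vectors = tfidf_vectors(synopses)
    q = titles.index(query_title)
    scores = [
        (cosine(vectors[q], vec), title)
        for i, (vec, title) in enumerate(zip(vectors, titles))
        if i != q
    ]
    scores.sort(reverse=True)
    return [title for _, title in scores[:k]]


# Hypothetical catalogue entries.
titles = ["Space Pirates", "Star Pirates", "Cooking Club"]
synopses = [
    "pirates sail through space hunting treasure",
    "space pirates chase a lost treasure ship",
    "students start a cooking club at school",
]

top = recommend(titles, synopses, "Space Pirates", k=1)
```

Here `top` is the title whose synopsis shares the most distinctive terms with the query, which is the core of a content-based recommender: no user ratings are needed, only item descriptions.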
As a lifelong Potterhead, I wanted to analyze the complex character relationships throughout the books. Used NetworkX to map character relationships from Harry Potter data. Scraped datasets and applied NLP with SpaCy for Named Entity Recognition to identify and connect characters. This project represents where my hobby and passion for data intersected.
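A minimal sketch of turning text into a weighted character network, assuming characters are linked when they appear in the same sentence. That co-occurrence rule is an assumption, simple name matching stands in for SpaCy's Named Entity Recognition, and the sample sentences are invented rather than quoted from the books.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, characters):
    """Count how often each pair of characters shares a sentence."""
    edges = Counter()
    for sentence in sentences:
        # Name matching is a stand-in for a real NER step.
        present = sorted(c for c in characters if c in sentence)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Invented example sentences.
sentences = [
    "Harry and Ron missed the train.",
    "Hermione helped Harry with the potion.",
    "Ron and Hermione argued while Harry watched.",
]
characters = ["Harry", "Ron", "Hermione"]

edges = cooccurrence_edges(sentences, characters)
```

Each `(a, b)` count can then be loaded into a graph library as an edge weight, for example with NetworkX's `G.add_edge(a, b, weight=count)`, to compute centrality or draw the network.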
Suthraye, S., Das, C., Gaikwad, A., Senthilnathan, K., & Sawant, S. P. (2025). University of California, San Diego.
CLEF 2025, LNCS, Springer, Singapore.
Das, C., Swarnendu, G. (2024)
COMSYS 2024, LNNS, Springer, Singapore.
Das, C., Saha, S. (2024)
COMSYS 2023, LNNS, vol 974, Springer, Singapore.
Some things that define me!
Some unfinished artworks, as a painting is never complete
People who matter
Think of coffee as my gradient descent optimizer. It minimizes my grogginess loss function and helps me converge toward productivity faster than any Adam optimizer ever could.
Coming from India, nothing makes me feel more gorgeous than being in a saree. It's complex, beautiful, and somehow makes everything work elegantly together.
Passionate about debugging society's biased algorithms, because equality isn't a feature, it's the foundation every system should be built on.
chdas@ucsd.edu
+1 (619) 953-8848
San Diego, CA, USA
linkedin.com/in/foyie/
github.com/foyie