tharun18
Data Scientist
• Data Scientist with 4+ years of experience in the healthcare and finance industries, skilled in developing LSTM, transfer learning, and imaging
methods for healthcare research, and GLM, anomaly detection, and demand forecasting models for finance.
• Data Science and Analytics graduate with a focus on Big Data and Machine Learning with a GPA of 3.9.
• Experienced in Python, Machine Learning, Data Science, SQL, ETL pipelines, CI/CD, EDA, Azure, and AWS. Three-time hackathon winner, certified
in Tableau and Azure AI. Experiments with Large Language Models (LLMs), LoRA, QLoRA, vector databases, LLM fine-tuning, RAG, and generative AI.
Experience: 4 years
Yearly salary: $136,000
Hourly rate: $0
Nationality: 🇺🇸 United States
Residency: 🇺🇸 United States
Experience
Data Scientist
ELEPHAS 2024 - 2024
Contributed to medical immunology research at a startup specializing in advanced imaging and data-driven immunotherapy. Used multiphoton microscopy (MPM) data to evaluate the performance of immunology treatments, applying PCA for dimensionality reduction and binning to group similar values into custom buckets, which improved data analysis efficiency. Analyzed cytokine data to assess treatment response at the protein level. Experimented with feature selection techniques such as Recursive Feature Elimination with Random Forest and Gradient Boosting, and Genetic Algorithms with autoencoders and GMM, leading to improved model performance. Applied k-means, DBSCAN, hierarchical clustering, and spectral clustering to assay samples to analyze drug responses based on protein levels, resulting in more accurate clustering of response patterns.
Data Science Intern
HONEYWELL 2023 - 2023
Analyzed diverse data tables in MS SQL Server for industrial repair solutions, pinpointing faults and errors. Used Tableau to visualize and communicate insights to cross-functional stakeholders, improving collaboration and decision-making. Employed SAS for advanced statistical analysis and predictive modeling to identify trends and anomalies in repair data, leading to improved maintenance strategies and operational efficiency. Used SAS Viya for cloud-based analytics and machine learning, improving the scalability and performance of data analysis workflows. Used Support Vector Machines (SVMs) to detect anomalies in smoke sensor data, improving the accuracy of fault detection and reducing false alarms. Implemented LSTM models for demand forecasting in critical sensor markets, supporting proactive inventory management and cost optimization. Used the Falcon 7B LLM through Hugging Face to build a question-answering system that delivers data-driven repair suggestions based on historical records, streamlining customer support interactions.
Machine Learning Research Assistant - Healthcare
TRENDS Lab (GEORGIATECH UNIVERSITY) 2022 - 2024
Researched brain disorders using ICA and fMRI functional connectivity, including static and dynamic FNC. Designed, analyzed, and visualized high-dimensional time-series data to understand the neurobiology of neurological diseases. Built an LSTM model to classify Alzheimer's disease (AD) and schizophrenia (SZ) from rs-fMRI time series. Employed stratified k-fold cross-validation on an imbalanced dataset to select the best model and hyperparameters, achieving an accuracy of 80%, sensitivity of 79%, and specificity of 78% with the best model. Used ensembling methods to improve model performance, resulting in higher prediction accuracy for neurological disease classification. Presented research findings at OHBM 2023 in Canada, contributing to the community's understanding of neurological disease modeling. Built a U-Net model on mammographic images using ImageNet-pretrained models such as ResNet and DenseNet as encoders. Optimized model performance through extensive hyperparameter tuning, achieving an AUC-ROC score of 0.94.
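The evaluation approach mentioned above (stratified folds on an imbalanced dataset, scored by accuracy, sensitivity, and specificity) can be sketched roughly as follows. This is an illustrative, standard-library-only sketch and not the lab's actual code; the helper names `stratified_folds` and `binary_metrics` are hypothetical, and the fold assignment is a simple round-robin without shuffling.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps similar class ratios.

    Hypothetical helper: deals each class's samples round-robin across folds
    (no shuffling), which is the core idea behind stratified k-fold.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (recall on positives), and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Toy imbalanced labels: 4 positives, 8 negatives (made-up data).
    labels = [1] * 4 + [0] * 8
    for fold in stratified_folds(labels, k=4):
        print(fold, [labels[i] for i in fold])
    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Reporting sensitivity and specificity alongside accuracy matters on imbalanced data, where accuracy alone can look good even if the minority class is mostly misclassified.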
Data Scientist
ACCENTURE 2021 - 2022
Developed regression models to predict patient recovery time and Customer Lifetime Value using Lasso, Ridge, Support Vector Regression (SVR), and XGBoost. Designed and implemented ETL pipelines using Azure Data Factory and Azure Data Lake Storage Gen2. Established CI/CD pipelines with Docker containers for model deployment, with monitoring through Azure Monitor, supporting efficient scaling and reproducibility. Applied machine learning models, including Generalized Linear Models (GLMs) and XGBoost, achieving 85% accuracy in predicting insurance claim costs and improving risk assessment precision and financial forecasting reliability. Developed an anomaly detection system using the K-Nearest Neighbors algorithm that identified 90% of fraudulent insurance claims, strengthening the company's fraud prevention capabilities. Annotated physician notes with corresponding medical codes to train BERT-based NLP models for code extraction. Integrated the BERT-based NLP models into the claims processing pipeline on Azure Databricks, achieving 92% accuracy in extracting medical codes from physician notes and significantly reducing manual workload and errors. Built a real-time fraud dashboard using Power BI and Azure SQL Database, providing insights into suspicious activity patterns.
Data Scientist
TECHCITI TECHNOLOGIES 2019 - 2021
Developed a personalized news recommendation system using collaborative filtering techniques, including user-based and item-based methods, to offer tailored content suggestions. Implemented similarity measures (cosine similarity, Pearson correlation) to improve recommendation accuracy, resulting in a 15% increase in user retention. Designed and managed ETL workflows using Apache Airflow to process large-scale datasets for model training and updates, and used Apache Spark for scalable model training and recommendation generation, reducing processing time by 40% for large datasets. Implemented a CI/CD pipeline using Azure DevOps for version control and automated model deployment, reducing deployment time by 30% and improving overall development efficiency. Integrated MongoDB for real-time updates to the recommendation engine, enabling dynamic content personalization. Conducted rigorous A/B testing to optimize recommendation strategies, leading to a 20% increase in article click rates. Used Azure Synapse Analytics for comprehensive data analysis and Azure Databricks for scalable model deployment and monitoring. Incorporated SHAP (SHapley Additive exPlanations) to interpret the recommendation model's decision-making process, improving the transparency and trustworthiness of recommendations.
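As an illustration of the item-based collaborative filtering with cosine similarity described above, here is a minimal sketch in plain Python. It is not the production system: the `recommend` helper, the user names, and the toy ratings are all made up for the example, and unseen items are assumed to have an implicit score of 0.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=2):
    """Rank items the user has not seen by similarity to items they have seen.

    ratings: dict of user -> dict of item -> score (hypothetical input format).
    """
    users = sorted(ratings)
    items = sorted({item for scores in ratings.values() for item in scores})
    # Represent each item as a vector of scores across all users.
    vectors = {i: [ratings[u].get(i, 0.0) for u in users] for i in items}

    seen = ratings[user]
    scores = {}
    for candidate in items:
        if candidate in seen:
            continue
        # Weight each similarity by the user's score for the seen item.
        scores[candidate] = sum(
            cosine_similarity(vectors[candidate], vectors[s]) * score
            for s, score in seen.items()
        )
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    toy_ratings = {
        "ana": {"sports": 1, "tech": 1},
        "ben": {"tech": 1, "science": 1},
        "cy": {"sports": 1},
    }
    print(recommend(toy_ratings, "cy"))
```

In this toy data, "tech" ranks above "science" for user "cy" because a reader of "sports" also read "tech", while no "sports" reader read "science".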
Skills
big-data
ci-cd
data viz
data-science
docker
kubernetes
machine-learning
nosql
python
pytorch
sql
stats
english