Projects
Life Expectancy Prediction Using Machine Learning
Built predictive models to estimate life expectancy based on socioeconomic and health indicators across countries. Achieved 91% accuracy using regression models and explored key factors influencing public health.
Tools:
Python · pandas · Seaborn · scikit-learn · Jupyter Notebook

Clinical Concept Normalization with BioBERT & SapBERT
Built a semantic normalization pipeline to map noisy clinical phrases (e.g., misspellings, shorthand) to standardized SNOMED CT concepts. Compared BioBERT and SapBERT embeddings using cosine similarity against curated ground-truth sets, guiding improvements in terminology alignment for clinical NLP applications.
Key methods:
Cosine Similarity · Concept Embedding · SNOMED CT Normalization
Tools:
Python · Hugging Face Transformers · scikit-learn · Google Colab · pandas · matplotlib
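Illustrative sketch:
The matching step described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: character trigram counts stand in for the BioBERT/SapBERT embeddings so the example runs without downloading a model, and the `embed`, `normalize`, and `CONCEPTS` names, along with the three concept labels, are hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a transformer embedding: character n-gram counts.

    Assumption: the real pipeline would use BioBERT or SapBERT vectors here.
    """
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(phrase, concepts):
    """Return the candidate concept most similar to the noisy phrase."""
    query = embed(phrase)
    scored = [(cosine_similarity(query, embed(c)), c) for c in concepts]
    score, best = max(scored)
    return best, score


# Hypothetical candidate labels; a real run would use SNOMED CT terms.
CONCEPTS = ["myocardial infarction", "diabetes mellitus", "hypertension"]

best, score = normalize("myocardial infraction", CONCEPTS)
print(best, round(score, 2))
```

Swapping `embed` for a function that returns model embeddings would leave the ranking logic unchanged, which is what makes a side-by-side comparison of two encoders straightforward.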

Heart Disease Risk Modeling with Logistic Regression
Developed an interpretable logistic regression model using clinical variables. Applied statistical tests and ROC analysis to evaluate model performance and guide data-driven decisions in cardiovascular care.
Tools:
R · R Markdown · Power BI · Statistical Testing (Chi-square, Mann-Whitney, Spearman)

Breast Cancer Outcomes & Clinical DBMS Design
Designed a database normalized to 3NF for analyzing protein expression and treatment outcomes in breast cancer patients. Wrote optimized SQL queries and mapped clinical concepts to standard terminologies to support analysis.
Tools:
SQL · Python · Jupyter Notebook · MySQL · Clinical Terminologies
