• Projects (Complete List)

  • Predictive Analysis of Hospital Costs

    Project Link
    University of Massachusetts Amherst, Spring 2024

    • Implemented 20 different statistical learning techniques in R to predict hospital costs and to compare methods.
    • Estimated test error using 10-fold cross-validation for each method.
    • Conducted a simulation study to assess the impact of the number of neighbors used in the k-nearest neighbors algorithm.

  • Predicting Auto Insurance Claim Costs Using Historical Claim Data

    Travelers Analytics Case Competition 2023
    University of Massachusetts Amherst, Fall 2023

    • Built an ensemble machine learning model in Python to predict claim costs.
    • Assessed the contribution of each feature using feature importance plots and SHAP beeswarm plots.
    • Communicated the business impact of our findings via a presentation to the Travelers team.

  • The Effects of Cement Floors on Maternal Wellbeing

    Project Link
    University of Massachusetts Amherst, Fall 2023

    • Used instrumental variable estimation to assess the causal effect of floor quality on stress and depression levels among mothers who participated in Piso Firme, a Mexican government initiative to install cement floors in homes that previously had dirt floors.
    • Found a statistically significant causal effect, even after adjusting for relevant covariates.

  • Analysis of Flight Delays for Tampa Airport

    Project Link
    University of Massachusetts Amherst, Fall 2023

    • Constructed a logistic regression model to predict whether a flight departing from Tampa Airport would be delayed, using publicly available Bureau of Transportation Statistics data.
    • Employed lasso for variable selection and compared the performance of different models using ROC curves and AUC.
    • Incorporated the optimal model into an R Shiny app and an R package designed to help customers make data-informed decisions when booking flights.

  • Analysis of Potential Predictive Factors of Post-Grad Job Status

    University of Massachusetts Amherst, Fall 2023

    • Fit logistic regression (binary response: has a job 3 months post-grad or not) and Poisson regression (count response: number of months taken to find a job post-grad) models to assess how demographics, field of study, and degree type relate to post-grad career status among students who received a higher education degree in a STEM field, using publicly available IPUMS Higher Ed Data.
    • Found that field of study and degree type are strong predictors of career outcomes, while demographic factors were less strongly associated with career outcomes than we expected.

  • Using Simulation to Assess Strategy Effectiveness in UNO

    Project Link
    University of Massachusetts Amherst, Spring 2023

    • Developed a Python algorithm to simulate the popular card game UNO.
    • Conducted Z-tests to assess effectiveness of four different strategies as compared to a random strategy.
    • Found that several of the strategies increased the probability of winning, although even the most effective did so by only a couple of percentage points.

  • Analysis of Access to Emergency Funds in Sub-Saharan Countries: A Human-Rights Based Approach

    Sponsored by Women at the Table
    Project Link
    Smith College, Fall 2022

    • Trained a decision tree classifier to predict whether an individual has access to emergency funds with 68% accuracy. The model makes predictions using demographic and other financial data sourced from the 2017 Global Findex Database published by the World Bank.
    • Assessed the model's fairness with respect to gender using a variety of group and individual fairness metrics, and implemented de-biasing techniques to improve it.
    • Documented the full analysis in an IPython notebook on Google Colab, structured as an educational resource for more ethical machine learning, with full explanations of each step of the analysis oriented toward a non-technical audience.

  • Trends in Students Studying Early Childhood Education in The Pioneer Valley, MA

    In partnership with Community Action Pioneer Valley Head Start and Early Learning Programs
    Smith College, Fall 2022

    • Collected and analyzed data on the numbers and demographics of students studying Early Childhood Education (ECE) from post-secondary institutions within the Pioneer Valley, MA.
    • Integrated data from IPEDS, the Census Bureau Household Pulse Survey, and the Bureau of Labor Statistics to place the survey data in the context of the ECE labor shortage and the pandemic's impact on it.
    • Summarized findings in a report for Community Action Pioneer Valley Head Start and Early Learning Programs to inform their plans for the upcoming funding cycle.

  • Sex Differences in Depression and Sleep Disturbance as Inter-Related Risk Factors of Diabetes

    Published in Frontiers in Clinical Diabetes and Healthcare
    Publication Link
    Smith College, Spring 2020-Summer 2022

    • Used multiple logistic regression and publicly available National Health Interview Survey data from IPUMS NHIS to analyze depression and sleep as inter-related predictors of diabetes.
    • Submitted the report to the Undergraduate Statistics Class Project Competition (USCLAP) and received first place in the Intermediate Statistics Division, Spring 2020.
    • Presented at the Electronic Undergraduate Statistics Research Conference (eUSR) Fall 2020: https://www.causeweb.org/usproc/eusrc/2020/program/10
    • Substantially revised the report and published it in the peer-reviewed journal Frontiers in Clinical Diabetes and Healthcare, Summer 2022.

  • censusviz R package

    Project Link
    Smith College, Spring 2022

    • Built an R package that provides an interface for exploring and visualizing historical racial demographic census data (1950-2020), sourced from IPUMS, for any region in the United States at the county level.
    • The package supports visualizing the data on leaflet maps and retrieving it in a tidy format so that users can create their own visualizations.