To develop the best medical insurance products, an insurer needs access to historical data to approximate the medical costs of each user. With this data, a medical insurer can develop more accurate pricing models, plan for a particular insurance outcome, or manage large portfolios. In all these cases, the objective is to accurately predict insurance costs.
This dataset contains 1,339 medical insurance records. The target variable, charges, is the individual medical cost billed by health insurance. The remaining columns contain personal information such as age, gender, family status, and whether the patient smokes, among other features.
The objective is to train an ML regression model that predicts the target column charges as accurately as possible. Because this is a regression problem, metrics such as the coefficient of determination and the mean squared error are used to evaluate the model.
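As a minimal sketch of how these two metrics are computed, the following Python snippet implements the mean squared error and the coefficient of determination from their standard definitions. The function names and the sample values are assumptions made up for illustration; they are not taken from the dataset or from any particular library.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical charges, invented for illustration (not real dataset rows).
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]

mse = mean_squared_error(actual, predicted)
r2 = r2_score(actual, predicted)

print(f"MSE: {mse:.1f}")  # MSE: 25000.0
print(f"R^2: {r2:.3f}")   # R^2: 0.980
```

A lower mean squared error and a coefficient of determination closer to 1 both indicate predictions that track the actual charges more closely. In practice, a library implementation of these metrics would normally be used instead of hand-written functions.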
This dataset can improve the financial performance of a medical insurer, but it has some issues that complicate its use. Fortunately, Synthesized data generation tools can address these problems quickly and intuitively.
This dataset is publicly available on Kaggle as the Medical Cost Personal dataset.