Fun with data

Exploratory data analysis

Clustering Asian cuisines

EDA with Tableau

Ames housing dataset

5000 movies from the IMDB database

Pima Indians Diabetes Database

My kernels on Kaggle

A simple linear regression on the Ames housing dataset (see also the EDA above), a random forest model, and a support vector regression model.

A linear model with lots of feature engineering for the Prudential Life Insurance challenge (worth a silver medal in the kernels category).

Extracting plot keywords and genres from the IMDB 5000 movie dataset.

Open data

I published a dataset of star-cluster simulations for your plotting pleasure. More to come!