We have a classification problem. Our data set has 8 independent variables in total, of which one is a factor and 7 are continuous, so we should expect at least 8 plots.
To draw any inferences and leave no stone unturned, the target variable Outcome should be plotted against each independent variable.
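As a quick check on that count, here is a minimal Python sketch that lists one plot per independent variable against Outcome. The column names assume the usual Pima diabetes.csv header, and treating Pregnancies as the single factor is a hypothetical choice; Data.R or the notebook shows how each column is actually typed.

```python
# Assumed column names from the usual Pima diabetes.csv header.
# Which column is "the one factor" is a hypothetical choice (Pregnancies here).
PREDICTOR_KINDS = {
    "Pregnancies": "factor",
    "Glucose": "numeric",
    "BloodPressure": "numeric",
    "SkinThickness": "numeric",
    "Insulin": "numeric",
    "BMI": "numeric",
    "DiabetesPedigreeFunction": "numeric",
    "Age": "numeric",
}
TARGET = "Outcome"


def plots_against_target(predictor_kinds, target):
    """Return one (predictor, target, predictor_kind) triple per independent variable."""
    return [(name, target, kind) for name, kind in predictor_kinds.items()]


pairs = plots_against_target(PREDICTOR_KINDS, TARGET)
n_factors = sum(1 for _, _, kind in pairs if kind == "factor")
print(len(pairs), n_factors)  # 8 1
```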
So when both variables are factors, a stacked bar chart or a mosaic plot is preferable.
When one variable is numeric and the other is a factor, a bar plot seems like a good option.
And for two numeric variables, our faithful scatter plot comes to the rescue.
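What a stacked bar chart draws for two factors can be sketched in plain Python: for each level of one factor, the share of each level of the other, so every bar sums to 1. A mosaic plot shows the same shares but also makes each bar's width proportional to how many rows fall in that level. The rows and the `PregGroup` column below are made up for illustration only and are not taken from diabetes.csv.

```python
from collections import Counter, defaultdict


def stacked_bar_table(rows, x, y):
    """Proportion of each level of y within each level of x.

    Each bar in a stacked bar chart is one level of x; its segments are
    the shares of y's levels, so each bar's segments sum to 1.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[x]][row[y]] += 1
    table = {}
    for level, level_counts in counts.items():
        total = sum(level_counts.values())
        table[level] = {k: v / total for k, v in level_counts.items()}
    return table


# Made-up toy rows; "PregGroup" is a hypothetical binned factor.
rows = [
    {"PregGroup": "0-2", "Outcome": 0},
    {"PregGroup": "0-2", "Outcome": 0},
    {"PregGroup": "0-2", "Outcome": 1},
    {"PregGroup": "3+", "Outcome": 1},
    {"PregGroup": "3+", "Outcome": 0},
]
table = stacked_bar_table(rows, "PregGroup", "Outcome")
print(table)
```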
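One common way to build a bar plot for a numeric variable against a factor is one bar per factor level, with the bar height set to a summary of the numeric variable. The sketch below assumes the mean is that summary (the linked code may use a different one) and uses made-up toy values, not real readings from the data set.

```python
from collections import defaultdict
from statistics import mean


def bar_heights(rows, numeric, factor):
    """Mean of `numeric` for each level of `factor` (one bar per level)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[numeric])
    return {level: mean(values) for level, values in sorted(groups.items())}


rows = [  # made-up toy values for illustration only
    {"Glucose": 90, "Outcome": 0},
    {"Glucose": 110, "Outcome": 0},
    {"Glucose": 150, "Outcome": 1},
    {"Glucose": 170, "Outcome": 1},
]
print(bar_heights(rows, "Glucose", "Outcome"))  # {0: 100, 1: 160}
```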
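The three rules of thumb for choosing a chart can be written as a small lookup. This is a hypothetical helper for illustration, taking each variable's kind as the string "factor" or "numeric".

```python
def choose_plot(kind_a, kind_b):
    """Pick a chart type for a pair of variables.

    kind_a, kind_b: "factor" or "numeric" (order does not matter).
    """
    kinds = {kind_a, kind_b}
    if kinds == {"factor"}:
        return "stacked bar chart or mosaic plot"
    if kinds == {"factor", "numeric"}:
        return "bar plot"
    if kinds == {"numeric"}:
        return "scatter plot"
    raise ValueError(f"unknown variable kinds: {kind_a!r}, {kind_b!r}")


print(choose_plot("numeric", "factor"))  # bar plot
```

Since Outcome is itself a factor, calling `choose_plot(kind, "factor")` for each predictor gives a bar plot for the 7 continuous variables and a stacked bar chart or mosaic plot for the single factor.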
In this blog post I will not dwell much on prose; the focus is on the code and the inferences drawn from it, which are explained and documented in the code itself.
I strongly suggest reading through the code linked below, since that is where the inferences and the overall structure are laid out.
You can download the data and code here:
DATA -> https://github.com/mmd52/Pima_R (the file you need is diabetes.csv)
R Code -> https://github.com/mmd52/Pima_R/blob/master/EDA.R (Note: before running the EDA code in R, you first need to run https://github.com/mmd52/Pima_R/blob/master/Libraries.R and https://github.com/mmd52/Pima_R/blob/master/Data.R)
Python Code -> https://github.com/mmd52/Pima_Python/blob/master/EDA.ipynb (a Jupyter Notebook)