So it's been a long time. We finally have the data just the way we want it.
Great, so the data is ready, and we already have a bit of knowledge of Logistic Regression and Random Forest.
So, going ahead first with Logistic Regression:
On executing this magic line, I end up with an accuracy of 80%. Naaaaah, not what we wanted.
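For anyone who wants to see the shape of such a fit, here is a minimal sketch of a logistic regression in base R using glm(). It is not the original line: the toy data frame, the column names (age, hours, high_income), the 150/50 split and the 0.5 cutoff are all assumptions made up for illustration.

```r
# Hypothetical toy data standing in for the real training set
set.seed(42)
n <- 200
toy <- data.frame(age = runif(n, 18, 65), hours = runif(n, 10, 60))

# Made-up rule: older people working longer hours are more likely to be high income
p <- plogis(-8 + 0.08 * toy$age + 0.1 * toy$hours)
toy$high_income <- rbinom(n, 1, p)

# Hold out the last 50 rows for testing
train <- toy[1:150, ]
test  <- toy[151:200, ]

# Fit the logistic regression (family = binomial gives the logit link)
lr.fit <- glm(high_income ~ age + hours, data = train, family = binomial)

# Predicted probabilities, turned into 0/1 labels with a 0.5 cutoff (an assumption)
prob <- predict(lr.fit, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Accuracy = share of held-out rows where the prediction matches the label
lr.accuracy <- mean(pred == test$high_income)
print(lr.accuracy)
```

The number printed here says nothing about the real data; the point is only the fit, predict, compare pattern.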
So, going ahead with Random Forest:
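library(randomForest)
# Search for a good mtry; column 14 holds the income label
bestmtry <- tuneRF(training_data[,-14], as.factor(training_data[,14]),
                   ntreeTry=100, stepFactor=1.5, improve=0.01, trace=TRUE, plot=TRUE, doBest=FALSE)
# Fit the forest (income must be a factor for classification).
# test= is not a randomForest argument; score held-out rows with predict(rf.fit, x_test).
rf.fit <- randomForest(income ~ ., data=training_data,
                       mtry=4, ntree=1000, keep.forest=TRUE, importance=TRUE)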
This finally returned 86%. It looks like we are doing great. We finally did it!!!!!!
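An accuracy figure is easiest to trust when it comes from held-out rows and sits next to a confusion matrix. Here is a minimal sketch in base R. The two label vectors are made up; in practice they would be something like the test labels and the output of predict(rf.fit, x_test).

```r
# Hypothetical labels: what the test set says vs what a model predicted
actual    <- factor(c("<=50K", "<=50K", ">50K", ">50K",  "<=50K", ">50K", "<=50K", "<=50K"))
predicted <- factor(c("<=50K", ">50K",  ">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"),
                    levels = levels(actual))

# Rows = actual class, columns = predicted class
conf.mat <- table(actual = actual, predicted = predicted)
print(conf.mat)

# Accuracy = correct predictions (the diagonal) / all predictions
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
print(accuracy)  # 0.75 for these made-up labels (6 of 8 correct)
```

The off-diagonal cells show which kind of mistake the model makes more often, something a single accuracy number hides.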
Trying out SVM now.
But wait, what is SVM (Support Vector Machines)?
Think of all the data points plotted in a space with too many dimensions for us to visualise. But imagine we had 2D data: then, in very vague terms, SVM would draw a line for us that helps us clearly classify whether a data point belongs to the group 50K and above or 50K and below.
So SVM works with hyperplanes (a line in 2D, a plane in 3D, and so on). The separating hyperplane is calculated in such a way that it is equidistant from the closest points of both classes.
The margin is the distance from the plane to the nearest data point. In SVM, a plane with the maximum margin is a good plane and a plane with a small margin is a bad plane.
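To make the margin idea concrete: for a line w . x + b = 0, the distance of a point x from the line is |w . x + b| / ||w||, and the margin is the smallest such distance over all points. The sketch below, in base R, compares two candidate lines on made-up 2D points. Both lines are hand-picked for illustration; nothing here runs an actual SVM solver, and the helper name margin is hypothetical.

```r
# Hypothetical 2D points: two well-separated groups
pts <- rbind(c(1, 1), c(2, 1), c(1, 2),   # class -1
             c(4, 4), c(5, 4), c(4, 5))   # class +1
labels <- c(-1, -1, -1, 1, 1, 1)

# Margin of the line w . x + b = 0: signed distance to the closest point.
# The result is negative if the line puts any point on the wrong side.
margin <- function(w, b, pts, labels) {
  signed.dist <- labels * (pts %*% w + b) / sqrt(sum(w^2))
  min(signed.dist)
}

# Two candidate separating lines (hand-picked, not fitted)
m.good <- margin(c(1, 1), -5.5, pts, labels)  # x + y = 5.5, midway between the groups
m.bad  <- margin(c(1, 1), -3.5, pts, labels)  # x + y = 3.5, hugging class -1

print(c(good = m.good, bad = m.bad))
```

Both lines separate the two groups, but the midway line keeps every point further away, which is the sense in which a larger margin makes a better plane.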
With that said, you can find the code for Random Forest and Logistic Regression here ->
and for SVM here ->