Running a Decision Tree on the UCI Adult Data Set with R

One of the simplest and most widely used supervised machine learning algorithms is the decision tree.

Decision trees can become very complex on very large data sets, but that's fine, because they are most useful where the data set is small or where we want to explain to the customer or business how we arrived at a decision.

Here the decision tree is used to make a yes or no (1 or 0) decision. In the case of the UCI Adult data set, we want to predict whether an individual has an income above or below 50K, so the target is simply a factor variable with two levels.

Decision trees work very well when the explanatory variables are categorical. Tree learners such as J48 can also split on numerical variables using thresholds, but I did not use numerical variables here. Instead, I converted the entire data set into categorical variables.
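As a rough illustration of converting a numerical column into a categorical one, here is a minimal sketch using base R's cut(). The toy data frame, the break points and the labels are all assumptions made up for this example; they are not the bins used in the project.

```r
# Hypothetical toy data: column names echo the Adult data set, values are made up.
adult_toy <- data.frame(
  age = c(19, 25, 37, 45, 52, 68),
  hours_per_week = c(20, 40, 45, 60, 38, 10)
)

# Bin age into labelled ranges (assumed break points).
# With right = TRUE each interval includes its upper bound, e.g. (0, 25].
adult_toy$age_group <- cut(
  adult_toy$age,
  breaks = c(0, 25, 45, 65, Inf),
  labels = c("young", "adult", "middle_aged", "senior"),
  right = TRUE
)

# Bin weekly working hours the same way (assumed break points).
adult_toy$hours_group <- cut(
  adult_toy$hours_per_week,
  breaks = c(0, 34, 40, Inf),
  labels = c("part_time", "full_time", "overtime"),
  right = TRUE
)

# Both new columns are factors, which is what a categorical-only model expects.
str(adult_toy)
```

cut() returns a factor, so the binned columns can go straight into a model formula in place of the original numerical ones.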

To run the model I used the RWeka package in R, which provides the function J48, and achieved an accuracy of 83.8472%.
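For readers who want to see the general shape of such a run, below is a minimal sketch of fitting J48 from RWeka on an all-categorical data frame and measuring accuracy on a held-out split. It is not the code from the project: the toy data, the labelling rule, the level names and the 70/30 split are assumptions for illustration, and RWeka needs a working Java installation.

```r
library(RWeka)  # provides J48(); requires Java

set.seed(42)
n <- 200

# Hypothetical all-categorical toy data (not the real Adult data set).
toy <- data.frame(
  education = factor(sample(c("HS", "Bachelors", "Masters"), n, replace = TRUE)),
  hours = factor(sample(c("part_time", "full_time", "overtime"), n, replace = TRUE))
)

# Made-up labelling rule so the tree has a pattern to learn.
toy$income <- factor(ifelse(
  toy$education != "HS" & toy$hours != "part_time",
  "above_50K", "below_50K"
))

# Assumed 70/30 train/test split.
train_idx <- sample(seq_len(n), size = 0.7 * n)
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Fit the tree: income is the factor target, every other column is a predictor.
model <- J48(income ~ ., data = train)
print(model)

# Accuracy is the share of held-out rows whose predicted class matches the true class.
pred <- predict(model, newdata = test)
accuracy <- mean(pred == test$income)
cat("Accuracy:", round(100 * accuracy, 2), "%\n")
```

On the real data the same pattern would apply with the Adult data frame in place of toy; the accuracy printed by this sketch says nothing about the figure reported above.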

For the code and method, please visit my GitHub link below:

https://github.com/mmd52/UCI_ADULT_DATSET_PROJECT
