Midterm

For the titanic data set try the following machine learning classification algorithms.

Use the training and test datasets from the titanic R package.

Note that titanic_train has the Survived variable and titanic_test does not. To select your best model, you therefore need to use titanic_train both to train and to test your models. That means splitting titanic_train into a training dataset and a testing dataset (strictly speaking, a validation dataset) and evaluating the models you try on the testing portion.
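A minimal sketch of such a split, assuming a simple random 80/20 division. The proportion, the seed, and the names train_set and valid_set are illustrative choices, not requirements of the assignment.

```r
library(titanic)
data(titanic_train)

set.seed(652)  # any fixed seed makes the split reproducible

# Randomly assign 80% of the rows to training, the rest to validation.
n         <- nrow(titanic_train)
train_idx <- sample(n, size = floor(0.8 * n))

train_set <- titanic_train[train_idx, ]
valid_set <- titanic_train[-train_idx, ]

# The two pieces should be disjoint and together cover every passenger.
nrow(train_set) + nrow(valid_set) == n
```

Every model is then fit on train_set and scored on valid_set, so the comparison never touches titanic_test.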

I have not demonstrated the use of cross-validation. Once you are comfortable running all of the models, see if you can figure out how to use cross-validation to pick the best model.
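As a starting point, here is a rough sketch of k-fold cross-validation written in base R. It assumes k = 5 and uses an unregularized logistic regression on Pclass and Sex purely as a stand-in model; the fold count, the seed, and the variable names are all illustrative.

```r
library(titanic)
data(titanic_train)

set.seed(652)
k <- 5
# Shuffle fold labels 1..k so each row belongs to exactly one fold.
folds <- sample(rep(seq_len(k), length.out = nrow(titanic_train)))

cv_accuracy <- numeric(k)
for (i in seq_len(k)) {
  fit_data <- titanic_train[folds != i, ]  # k - 1 folds used for fitting
  held_out <- titanic_train[folds == i, ]  # remaining fold used for scoring

  # Stand-in model: logistic regression on two predictors with no missing values.
  fit  <- glm(Survived ~ Pclass + Sex, data = fit_data, family = binomial)
  prob <- predict(fit, newdata = held_out, type = "response")
  pred <- as.integer(prob > 0.5)

  cv_accuracy[i] <- mean(pred == held_out$Survived)
}

mean(cv_accuracy)  # cross-validated accuracy estimate for this model
```

Repeating the loop with each candidate model, on the same folds, gives one averaged score per model to compare.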

Once you have picked the best model you should do the following:

  1. Re-run your chosen model on the full titanic_train dataset.
  2. Then produce predictions for the titanic_test dataset. This is what you would submit as a .csv file to Kaggle in a competition.
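The two steps above might look like the following sketch. The logistic regression is only a placeholder for whichever model you actually chose, the 0.5 cutoff is an assumption, and titanic_submission.csv is a hypothetical file name. The Kaggle Titanic competition expects the two columns PassengerId and Survived.

```r
library(titanic)
data(titanic_train)
data(titanic_test)

# Stand-in for the chosen model, refit on all of titanic_train.
final_fit <- glm(Survived ~ Pclass + Sex, data = titanic_train, family = binomial)

# Predicted survival probabilities for the unlabeled passengers.
test_prob <- predict(final_fit, newdata = titanic_test, type = "response")

submission <- data.frame(
  PassengerId = titanic_test$PassengerId,
  Survived    = as.integer(test_prob > 0.5)
)

# Hypothetical file name; row.names = FALSE keeps the file to two columns.
write.csv(submission, "titanic_submission.csv", row.names = FALSE)
```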

Build classification models for the Survived variable. Pick a model scoring function and determine which model is the best. I would suggest making a confusion matrix and computing the accuracy or kappa.

  1. Null Model
  2. kNN (the sample code given did not scale or normalize the features; if you use this model, you need to do that.)
  3. Boosted C5.0
  4. Random Forest
  5. Logistic Regression using regularization
  6. Naive Bayes
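To illustrate the scoring and the scaling mentioned above, the sketch below compares the null model with a kNN fit on min-max normalized features. It uses knn() from the class package. The helper names score_predictions and min_max, the four features, k = 5, and the 80/20 split are all assumptions made for the example, not the sample code from class.

```r
library(titanic)
library(class)   # provides knn()
data(titanic_train)

# Confusion matrix, accuracy, and Cohen's kappa for a set of predictions.
score_predictions <- function(actual, predicted) {
  lv  <- sort(unique(c(as.character(actual), as.character(predicted))))
  tab <- table(actual    = factor(actual, levels = lv),
               predicted = factor(predicted, levels = lv))
  n        <- sum(tab)
  observed <- sum(diag(tab)) / n                       # accuracy
  expected <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  list(confusion = tab,
       accuracy  = observed,
       kappa     = (observed - expected) / (1 - expected))
}

min_max <- function(x, lo, hi) (x - lo) / (hi - lo)

set.seed(652)
train_idx <- sample(nrow(titanic_train), floor(0.8 * nrow(titanic_train)))
features  <- c("Pclass", "SibSp", "Parch", "Fare")  # numeric, no missing values
train_x   <- titanic_train[train_idx, features]
valid_x   <- titanic_train[-train_idx, features]
train_y   <- titanic_train$Survived[train_idx]
valid_y   <- titanic_train$Survived[-train_idx]

# Rescale each feature with the training minimum and maximum only,
# so nothing about the validation rows leaks into the scaling.
for (f in features) {
  lo <- min(train_x[[f]])
  hi <- max(train_x[[f]])
  train_x[[f]] <- min_max(train_x[[f]], lo, hi)
  valid_x[[f]] <- min_max(valid_x[[f]], lo, hi)
}

# Null model: always predict the most common class in the training rows.
majority  <- names(which.max(table(train_y)))
null_pred <- rep(majority, length(valid_y))

knn_pred <- knn(train = train_x, test = valid_x, cl = factor(train_y), k = 5)

score_predictions(valid_y, null_pred)[c("accuracy", "kappa")]
score_predictions(valid_y, knn_pred)[c("accuracy", "kappa")]
```

The null model always has a kappa of 0, which is why kappa is a useful companion to accuracy: it shows how far a model improves on agreement expected by chance.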

Extra Credit:

Make one plot containing all of the ROC curves for the algorithms trained.
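One way to approach this, sketched in base R: compute the ROC points for each model from its predicted probabilities on the validation rows, then draw them on a single set of axes. The helper roc_points and the two logistic regressions are illustrative stand-ins; packages such as pROC or ROCR provide ready-made ROC tools if you prefer them.

```r
library(titanic)
data(titanic_train)

# ROC points (false and true positive rates) from 0/1 labels and scores.
roc_points <- function(actual, score) {
  ord    <- order(score, decreasing = TRUE)
  actual <- actual[ord]
  score  <- score[ord]
  tpr  <- cumsum(actual == 1) / sum(actual == 1)
  fpr  <- cumsum(actual == 0) / sum(actual == 0)
  keep <- !duplicated(score, fromLast = TRUE)  # one point per distinct score
  data.frame(fpr = c(0, fpr[keep]), tpr = c(0, tpr[keep]))
}

set.seed(652)
train_idx <- sample(nrow(titanic_train), floor(0.8 * nrow(titanic_train)))
train_set <- titanic_train[train_idx, ]
valid_set <- titanic_train[-train_idx, ]

# Two stand-in models; replace these with the fitted models being compared.
formulas <- list(
  "Sex only"            = Survived ~ Sex,
  "Pclass + Sex + Fare" = Survived ~ Pclass + Sex + Fare
)
curves <- lapply(formulas, function(f) {
  fit  <- glm(f, data = train_set, family = binomial)
  prob <- predict(fit, newdata = valid_set, type = "response")
  roc_points(valid_set$Survived, prob)
})

# Chance diagonal first, then one line per model on the same axes.
plot(c(0, 1), c(0, 1), type = "l", lty = 2, col = "grey50",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curves on the validation set")
cols <- c("steelblue", "firebrick")
for (i in seq_along(curves)) {
  lines(curves[[i]]$fpr, curves[[i]]$tpr, col = cols[i], lwd = 2)
}
legend("bottomright", legend = names(curves), col = cols, lwd = 2, bty = "n")
```

Any model that can return a predicted probability of survival can be added to the list in the same way.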

Data

library(titanic)

data(titanic_train)
data(titanic_test)

head(titanic_train)
head(titanic_test)

Final

For the Ozone data from the R package mlbench try the following machine learning prediction algorithm that is useful for feature selection.

Read the paper Feature Selection with the Boruta Package and implement the algorithm.

Which features are most important according to the Boruta random forest algorithm?
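A possible starting point, assuming the Boruta and mlbench packages are installed. It follows the Ozone example in the paper by treating V4 (the daily maximum one-hour-average ozone reading) as the response; the seed and the object names are illustrative.

```r
library(Boruta)
library(mlbench)
data(Ozone)

# Boruta cannot process missing values, so keep complete cases only.
ozone <- na.omit(Ozone)

set.seed(652)
# V4 is the response; every other column is a candidate feature.
boruta_fit <- Boruta(V4 ~ ., data = ozone, doTrace = 0)

print(boruta_fit)                                         # decision summary
getSelectedAttributes(boruta_fit, withTentative = FALSE)  # confirmed features
attStats(boruta_fit)                                      # importance statistics
plot(boruta_fit, las = 2, cex.axis = 0.7)                 # importance boxplots
```

Because Boruta is built on randomized forests, the decisions for borderline features can change from run to run, so it is worth rerunning with a different seed before answering.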
