Workflow 2

  • read data
  • determine percentage of missing values per attribute
  • discard attributes for which percentage of missing values is larger than a user-specified threshold
  • fill in missing values for the remaining attributes using a method appropriate for each attribute, depending on the missing-data mechanism
  • decide which normalization technique to use
  • normalize the data
  • construct training and test sets using sampling with replacement (out-of-bag estimation)
  • determine the ANN parameters (number of hidden layers, number of nodes per hidden layer, etc.)
  • build model using training set
  • evaluate model using test set
  • determine which samples were predicted incorrectly
  • store only these samples for later investigation
  • save model
  • discard intermediate datasets
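
The data-preparation steps of this workflow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: missing values are represented as None, mean imputation and min-max scaling stand in for whichever methods are actually chosen, and all function names and the toy data are hypothetical. Building and saving the ANN itself is left out.

```python
import random

def missing_fraction(rows, col):
    """Fraction of rows whose value in column `col` is missing (None)."""
    return sum(1 for r in rows if r[col] is None) / len(rows)

def drop_sparse_columns(rows, threshold):
    """Keep only columns whose missing fraction does not exceed `threshold`."""
    n_cols = len(rows[0])
    keep = [c for c in range(n_cols) if missing_fraction(rows, c) <= threshold]
    return [[r[c] for c in keep] for r in rows], keep

def impute_mean(rows):
    """Fill missing values with the column mean.

    Assumption: mean imputation is only one possible choice, and every
    column has at least one observed value (true when threshold < 1).
    """
    filled = [list(r) for r in rows]
    for c in range(len(rows[0])):
        observed = [r[c] for r in rows if r[c] is not None]
        mean = sum(observed) / len(observed)
        for r in filled:
            if r[c] is None:
                r[c] = mean
    return filled

def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    n_cols = len(rows[0])
    lo = [min(r[c] for r in rows) for c in range(n_cols)]
    hi = [max(r[c] for r in rows) for c in range(n_cols)]
    return [
        [(r[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0
         for c in range(n_cols)]
        for r in rows
    ]

def bootstrap_split(n, rng):
    """Draw n training indices with replacement; the indices that were
    never drawn form the out-of-bag test set."""
    train_idx = [rng.randrange(n) for _ in range(n)]
    drawn = set(train_idx)
    test_idx = [i for i in range(n) if i not in drawn]
    return train_idx, test_idx

def misclassified(samples, y_true, y_pred):
    """Return only the samples whose prediction differs from the true label."""
    return [s for s, t, p in zip(samples, y_true, y_pred) if t != p]

# Toy data (hypothetical): 4 samples, 3 attributes, None marks a missing value.
rows = [
    [1.0, None, 10.0],
    [2.0, None, None],
    [3.0, 5.0, 30.0],
    [4.0, None, 40.0],
]

reduced, keep = drop_sparse_columns(rows, threshold=0.5)
filled = impute_mean(reduced)
normalized = min_max_normalize(filled)
train_idx, test_idx = bootstrap_split(len(normalized), random.Random(0))
```

With the 0.5 threshold, the second attribute (three of four values missing) is discarded and the other two are kept. Because the bootstrap sample is drawn with replacement, some samples appear several times in the training set and the remainder are held out for evaluation; the out-of-bag set can occasionally be empty on very small data such as this toy example.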