AM by submitting on blackboard

Guideline

This homework is an “individual” data mining experiment. Plagiarism is definitely not allowed. If any classmate or other person helps you with this homework, you need to specify who helped you and with which portion. Your credit will be given to the helper (it is fair, right?). The helper should also mention whom they assisted on this homework. Zero points will be given if your homework is found to be the same as someone else’s without any such mention.

You are required to use a computer language (C, C++, or Java) or computer software (matlab or Weka) to do the data mining experiment and analysis. Other software or languages are allowed with the approval of the instructor. You need to specify which software or program you are using for this homework. If you use another person’s program or any program downloaded from the internet, you need to state where you got it and who the author is. If you decide to write your own program, please submit your source code. Extra credit will be given if you write your own program for any portion of this homework.

No matter which method you choose for this homework, you need to be careful in adjusting its parameters, if there are any. Please run an experiment on how to obtain better parameters and write down your analysis.

You need to submit your homework as an MS Word document through the blackboard system. The homework should be no longer than 10 pages (source code should be put in the appendix).
No late homework is allowed!

I. Congressional Voting Records (50%)
http://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records

Go to the UCI Machine Learning Repository to download the “Congressional Voting Records Data Set”, or download the house-votes-84.csv file from blackboard. Then choose at least two different classification methods (decision tree, rule-based, Bayesian, ANN, SVM, ensemble) to predict party affiliation (democrat or republican).
You can use any kind of statistical software (such as Minitab) or Excel to show the data exploration. Please PLOT it!
How do you handle the missing values?
The reasons for choosing the classification methods
Classification method implementation or software usage: Specify how you do the experiment. Which software package are you using? Or did you write your own program? You also need to specify all the parameters you are using for the chosen methods and explain how you adjusted them.
Result of 10-fold cross validation for each method: Show your best result!
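As an illustration of the missing-value and 10-fold cross validation questions above, here is a minimal sketch. It is written in Python purely for brevity (the guideline lists C, C++, Java, matlab, or Weka, with other languages subject to instructor approval). It assumes that missing votes are encoded as "?" as in the UCI copy of the data; the blackboard CSV may use a different marker. The function names impute_mode and k_fold_indices are hypothetical, and mode imputation is only one of several reasonable choices, not a required method.

```python
import random
from collections import Counter

MISSING = "?"  # assumption: missing votes are encoded as "?" (as in the UCI file)


def impute_mode(rows):
    """Return a copy of rows where each MISSING entry is replaced by the
    most common observed value in the same column."""
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]  # copy so the input is not modified
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] != MISSING]
        if not observed:
            continue  # column is entirely missing; leave it untouched
        mode = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[c] == MISSING:
                r[c] = mode
    return filled


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; the k test folds are disjoint
    and together cover range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for a repeatable split
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, test
```

In an actual experiment it is usually safer to compute the imputation values from the training folds only and then apply them to the held-out fold, so that the test data does not influence preprocessing. Treating "?" as a third vote category instead of imputing it is another option worth comparing.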
II. Wisconsin Diagnostic Breast Cancer (WDBC) (50%)
http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

Go to the UCI Machine Learning Repository to download the “Wisconsin Diagnostic Breast Cancer (WDBC)” dataset, or download the wdbc-data.csv file from blackboard. Please make sure you download wdbc.data, not wpbc.data. Then choose at least two different classification methods (decision tree, rule-based, Bayesian, ANN, SVM, ensemble) to predict the diagnostic result (malignant or benign).
Your homework should contain the following sections.

1. Data exploration
You can use any kind of statistical software (such as Minitab) or Excel to show the data exploration. Please PLOT it!
2. The reasons for choosing the classification methods
3. Apply one dimension reduction technique on the dataset
4. Classification method implementation or software usage
Specify how you do the experiment. Which software package are you using? Or did you write your own program? You also need to specify all the parameters you are using for the chosen methods and explain how you adjusted them.
5. Result of 10-fold cross validation for each method
Show your best result!
6. Model comparison

III. Extra credit (20%)
Review some classification papers (at least one paper for each dataset) that use these two datasets in their experiments. Compare your results with theirs. Summarize what you found.
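For the dimension reduction step in Part II (item 3), one possible technique is principal component analysis (PCA). The sketch below is written in Python purely for illustration and is not a required method. It assumes the numeric WDBC feature columns have already been loaded into a list of rows, with the ID and diagnosis columns removed. The function names are hypothetical, and only the first principal component is computed, using power iteration on the covariance matrix.

```python
import math
import random


def standardize(X):
    """Center each column to mean 0 and scale it to unit (population)
    standard deviation; constant columns are only centered."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in X]


def first_principal_component(Z, iters=200, seed=0):
    """Approximate the leading eigenvector of the covariance matrix of the
    already-centered data Z by power iteration."""
    n, d = len(Z), len(Z[0])
    cov = [[sum(r[a] * r[b] for r in Z) / n for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start direction
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: v lies in the null space of cov
        v = [x / norm for x in w]
    return v


def project(Z, v):
    """Project every row of Z onto direction v (one score per row)."""
    return [sum(z * c for z, c in zip(r, v)) for r in Z]
```

Further components could be obtained by deflating the covariance matrix and repeating the iteration, or by using an eigen-decomposition routine from matlab or another approved package. As with any preprocessing, fitting the projection on the training folds only keeps the cross validation estimate honest.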