MSMIT CSC550: Data Mining
Quiz 1
1. Which of the following is an example of an unsupervised learning algorithm?
A) Affinity analysis
B) Data reduction method
C) Clustering technique
D) All of the above
2. The
...
MSMIT CSC550: Data Mining
Quiz 1
1. Which of the following is an example of an unsupervised learning algorithm?
A) Affinity analysis
B) Data reduction method
C) Clustering technique
D) All of the above
2. The main criticism to using the same dataset to both build and validate a model is that it:
A) Requires the collection of more data.
B) Doesn’t account for missing data.
C) Requires data normalization.
D) Introduces bias.
3. When considering the relationship between the number of variables and the number of records in a dataset, a good rule of thumb for data mining activities is _______.
A) the more data, the better
B) at least ten variables for each record
C) two records for each variable
D) at least ten records for each variable
4. After a model has been initially developed and validated, it is desirable to revalidate the model using a separate _______ dataset before deploying the model.
A) test
B) research
C) validation
D) parallel
5. The discovery that the purchase of tooth paste and dental floss are commonly found together is an example of a(n) _______.
A) structured learning
B) classification
C) association rule
D) validation
6. Which of the following would be best described as a data mining task?
A) Looking up an employee’s current vacation and sick leave record from an HR file.
B) Analyzing supermarket checkout data to see if relationships exist between items purchased and the day of the week.
C) Generating a monthly report of the sales revenues of a chain of bookstores in the northeastern region of the country.
D) Reconciling your monthly checking account.
7. In supervised learning, once an algorithm has learned from the training data, the algorithm is then applied to another sample of data that is known as the _______ data.
A) Validation
B) Test
C) Master
D) Population
8. __________ occurs when a statistical model describes random error or noise instead of the underlying relationship.
A) Validation
B) Cluttering
C) Overfitting
D) Smoothing
9. The process of providing an algorithm (procedure) with records in which an output variable is known and the algorithm “learns” how to predict the value with new records is known as:
A) Regression Analysis
B) Clutter Analysis
C) Supervised learning
D) Unsupervised learning
10. In general, which of the following statements is true about the amount of variables in a model and the data requirements?
A) The more variables that a model includes, the more data will be required to validate the model.
B) The more variables that a model includes, the less data will be required to validate the model.
C) There is no relationship between the number of variables in a model and the amount of data required to validate the model.
D) None of the above.
[Show More]