Data Mining Questions and Answers | DM | MCQ

This Data Mining MCQ set covers topics such as the definition of data mining, the data mining process, data mining techniques and software, data analysis, and related concepts used in research and practice.

Question 1
This clustering algorithm terminates when the mean values computed for the current iteration are identical to the mean values computed for the previous iteration.
Select one:
a. K-Means clustering
b. conceptual clustering
c. expectation maximization
d. agglomerative clustering

Feedback: K-Means clustering

Question 2
This clustering approach initially assumes that each data instance represents a single cluster.
Select one:
a. expectation maximization
b. K-Means clustering
c. agglomerative clustering
d. conceptual clustering
Feedback: agglomerative clustering

Question 3
The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
Select one:
a. The attributes are not linearly related.
b. As the value of one attribute decreases the value of the second attribute increases.
c. As the value of one attribute increases the value of the second attribute also increases.
d. The attributes show a linear relationship
Feedback: As the value of one attribute decreases the value of the second attribute increases.

Question 4
The time complexity of k-means is given by
Select one:
a. O(mn)
b. O(tkn)
c. O(kn)
d. O(t^2 kn)
Feedback: O(tkn)

Question 5
Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
Select one:
a. Y is false when X is known to be false.
b. Y is true when X is known to be true.
c. X is true when Y is known to be true
d. X is false when Y is known to be false.
Feedback: Y is true when X is known to be true.


Question 6
Chameleon is
Select one:
a. Density based clustering algorithm
b. Partitioning based algorithm
c. Model based algorithm
d. Hierarchical clustering algorithm
Feedback: Hierarchical clustering algorithm

Question 7
In _________ clusterings, points may belong to multiple clusters
Select one:
a. Non-exclusive
b. Partial
c. Fuzzy
d. Exclusive
Feedback: Fuzzy

Question 8
Find odd man out
Select one:
a. DBSCAN
b. K-means
c. PAM
d. K-medoids
Feedback: DBSCAN

Question 9
Which statement is true about the K-Means algorithm?
Select one:
a. The output attribute must be categorical.
b. All attribute values must be categorical.
c. All attributes must be numeric
d. Attribute values may be either categorical or numeric
Feedback: All attributes must be numeric




Question 10
This data transformation technique works well when minimum and maximum values for a real-valued attribute are known.
Select one:
a. z-score normalization
b. min-max normalization
c. logarithmic normalization
d. decimal scaling
Feedback: min-max normalization
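Min-max normalization rescales a value linearly from its known [min, max] range into a new range, typically [0, 1], which is why the technique requires the minimum and maximum to be known. A minimal sketch (the sample values are made up):

```python
def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Rescale value from [old_min, old_max] into [new_min, new_max]."""
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

# An attribute known to range over [50, 100]
print(min_max(75, old_min=50, old_max=100))   # 0.5
```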

Question 11
The number of iterations in Apriori ___________
Select one:
a. increases with the size of the data
b. decreases with the increase in size of the data
c. increases with the size of the maximum frequent set
d. decreases with increase in size of the maximum frequent set
Feedback: increases with the size of the maximum frequent set

Question 12
Which of the following are interestingness measures for association rules?
Select one:
a. recall
b. lift
c. accuracy
d. compactness
Feedback: lift

Question 13
Which one of the following is not a major strength of the neural network approach?
Select one:
a. Neural network learning algorithms are guaranteed to converge to an optimal solution
b. Neural networks work well with datasets containing noisy data.
c. Neural networks can be used for both supervised learning and unsupervised clustering
d. Neural networks can be used for applications that require a time element to be included in the data
Feedback: Neural network learning algorithms are guaranteed to converge to an optimal solution

Question 14
Find odd man out
Select one:
a. K-medoids
b. K-means
c. DBSCAN
d. PAM
Feedback: DBSCAN

Question 15
Given a frequent itemset L, if |L| = k, then there are
Select one:
a. 2^k - 1 candidate association rules
b. 2^k candidate association rules
c. 2k - 2 candidate association rules
d. 2^k - 2 candidate association rules
Feedback: 2^k - 2 candidate association rules
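The count 2^k - 2 follows because every non-empty proper subset of L can serve as a rule antecedent, with the remaining items as the consequent. A minimal Python sketch (illustrative only; the function name is made up for this example):

```python
from itertools import combinations

def candidate_rules(itemset):
    """Enumerate candidate rules X -> (L \\ X) for every
    non-empty proper subset X of the frequent itemset L."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):            # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

# |L| = 3 gives 2^3 - 2 = 6 candidate rules
print(len(candidate_rules({"A", "B", "C"})))  # 6
```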

Question 16
_________ is an example of case-based learning
Select one:
a. Decision trees
b. Neural networks
c. Genetic algorithm
d. K-nearest neighbor
Feedback: K-nearest neighbor

Question 17
The average positive difference between computed and desired outcome values.
Select one:
a. mean positive error
b. mean squared error
c. mean absolute error
d. root mean squared error
Feedback: mean absolute error

Question 18
Frequent item sets is
Select one:
a. Superset of only closed frequent item sets
b. Superset of only maximal frequent item sets
c. Subset of maximal frequent item sets
d. Superset of both closed frequent item sets and maximal frequent item sets
Feedback: Superset of both closed frequent item sets and maximal frequent item sets

Question 19
Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule:
IF age < 30 AND credit card insurance = yes THEN life insurance = yes
Rule Accuracy: 70% and Rule Coverage: 63%
How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old?
Select one:
a. 63
b. 30
c. 38
d. 70
Feedback: 38
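The feedback value can be checked arithmetically: coverage gives the number of instances the rule covers, and accuracy gives the fraction of those classified correctly, so the misclassified remainder is the life insurance = no group. A quick sketch in Python:

```python
total = 200
coverage = 0.63   # fraction of all instances covered by the rule
accuracy = 0.70   # fraction of covered instances classified correctly

covered = total * coverage                 # 126 individuals match the antecedent
misclassified = covered * (1 - accuracy)   # covered but life insurance = no
print(round(misclassified))                # 38
```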




Question 20
Use the three-class confusion matrix below to answer: what percent of the instances were correctly classified?

                 Computed Decision
            Class 1   Class 2   Class 3
Class 1        10         5         3
Class 2         5        15         3
Class 3         2         2         5
Select one:
a. 60
b. 40
c. 50
d. 30
Feedback: 60


Question 21
Which of the following is cluster analysis?
Select one:
a. Simple segmentation
b. Grouping similar objects
c. Labeled classification
d. Query results grouping
Feedback: Grouping similar objects

Question 22
A good clustering method will produce high quality clusters with
Select one:
a. high inter-class similarity
b. low intra-class similarity
c. high intra-class similarity
d. no inter-class similarity
Feedback: high intra-class similarity

Question 23
Which two parameters are needed for DBSCAN?
Select one:
a. Min threshold
b. MinPts and Eps
c. Min sup and min confidence
d. Number of centroids
Feedback: MinPts and Eps

Question 24
Which statement is true about neural network and linear regression models?
Select one:
a. Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
b. The output of both models is a categorical attribute value.
c. Both models require numeric attributes to range between 0 and 1.
d. Both models require input attributes to be numeric.
Feedback: Both models require input attributes to be numeric.

Question 25
In the Apriori algorithm, if the number of frequent 1-itemsets is 100, then the number of candidate 2-itemsets is
Select one:
a. 100
b. 4950
c. 200
d. 5000
Feedback: 4950
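Candidate 2-itemsets are unordered pairs drawn from the 100 frequent 1-itemsets, so the count is C(100, 2). A one-line check in Python:

```python
from math import comb

n = 100              # number of frequent 1-itemsets
print(comb(n, 2))    # 100 * 99 / 2 = 4950 candidate 2-itemsets
```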

Question 26
Significant Bottleneck in the Apriori algorithm is
Select one:
a. Finding frequent itemsets
b. Pruning
c. Candidate generation
d. Number of iterations
Feedback: Candidate generation

Question 27
The concepts of core, border and noise points fall into which category?
Select one:
a. DENCLUE
b. Subspace clustering
c. Grid based
d. DBSCAN
Feedback: DBSCAN

Question 28
The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
Select one:
a. The attributes show a linear relationship
b. The attributes are not linearly related.
c. As the value of one attribute increases the value of the second attribute also increases.
d. As the value of one attribute decreases the value of the second attribute increases.
Feedback: As the value of one attribute decreases the value of the second attribute increases.

Question 29
Machine learning techniques differ from statistical techniques in that machine learning methods
Select one:
a. are better able to deal with missing and noisy data
b. typically assume an underlying distribution for the data
c. have trouble with large-sized datasets
d. are not able to explain their behavior.
Feedback: are better able to deal with missing and noisy data




Question 30
The probability of a hypothesis before the presentation of evidence.
Select one:
a. a priori
b. posterior
c. conditional
d. subjective
Feedback: a priori

Question 31
KDD represents extraction of
Select one:
a. data
b. knowledge
c. rules
d. model
Feedback: knowledge

Question 32
Which statement about outliers is true?
Select one:
a. Outliers should be part of the training dataset but should not be present in the test data.
b. Outliers should be identified and removed from a dataset.
c. The nature of the problem determines how outliers are used
d. Outliers should be part of the test dataset but should not be present in the training data.
Feedback: The nature of the problem determines how outliers are used

Question 33
The most general form of distance is
Select one:
a. Manhattan
b. Euclidean
c. Mean
d. Minkowski
Feedback: Minkowski
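Minkowski distance is the most general form because Manhattan (p = 1) and Euclidean (p = 2) distances fall out as special cases. A small Python sketch (the helper name is illustrative):

```python
def minkowski(p, x, y):
    """Minkowski distance of order p between two points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (0, 0), (3, 4)
print(minkowski(1, x, y))   # 7.0  -- Manhattan distance (p = 1)
print(minkowski(2, x, y))   # 5.0  -- Euclidean distance (p = 2)
```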

Question 34
Arbitrary shaped clusters can be found by using
Select one:
a. Density methods
b. Partitional methods
c. Hierarchical methods
d. Agglomerative
Feedback: Density methods

Question 35
Which Association Rule would you prefer
Select one:
a. High support and medium confidence
b. High support and low confidence
c. Low support and high confidence
d. Low support and low confidence
Feedback: Low support and high confidence

Question 36
With Bayes' theorem, the probability of hypothesis H, specified by P(H), is referred to as
Select one:
a. a conditional probability
b. an a priori probability
c. a bidirectional probability
d. a posterior probability
Feedback: an a priori probability

Question 37
In a rule-based classifier, if there is a rule for each combination of attribute values, what do you call that rule set R?
Select one:
a. Exhaustive
b. Inclusive
c. Comprehensive
d. Mutually exclusive
Feedback: Exhaustive

Question 38
The Apriori property means
Select one:
a. If a set cannot pass a test, its supersets will also fail the same test
b. To decrease the efficiency, do level-wise generation of frequent item sets
c. To improve the efficiency, do level-wise generation of frequent item sets
d. If a set can pass a test, its supersets will fail the same test
Feedback: If a set cannot pass a test, its supersets will also fail the same test

Question 39
If an item set 'XYZ' is a frequent item set, then all subsets of that frequent item set are

Select one:
a. Undefined
b. Not frequent
c. Frequent
d. Can not say

Feedback: Frequent




Question 40
Clustering is ___________ and is an example of ____________ learning
Select one:
a. Predictive and supervised
b. Predictive and unsupervised
c. Descriptive and supervised
d. Descriptive and unsupervised
Feedback: Descriptive and unsupervised

Question 41
The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don't subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car.
Select one:
a. 0.0368
b. 0.0396
c. 0.0389
d. 0.0398
Feedback: 0.0396
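The answer comes from the law of total probability followed by Bayes' theorem: P(S) = P(S|M)P(M) + P(S|not M)P(not M), then P(M|S) = P(S|M)P(M) / P(S). A worked sketch in Python:

```python
p_s_given_m = 0.40      # P(sports car | subscribes)
p_m = 0.03              # P(subscribes)
p_s_given_not_m = 0.30  # P(sports car | does not subscribe)

# Total probability of owning a sports car
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)   # 0.012 + 0.291 = 0.303

# Bayes' theorem
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 4))   # 0.0396
```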

Question 42
Simple regression assumes a __________ relationship between the input attribute and output attribute.
Select one:
a. quadratic
b. inverse
c. linear
d. reciprocal
Feedback: linear

Question 43
Which of the following algorithms comes under classification?
Select one:
a. Apriori
b. Brute force
c. DBSCAN
d. K-nearest neighbor
Feedback: K-nearest neighbor

Question 44
Hierarchical agglomerative clustering is typically visualized as?
Select one:
a. Dendrogram
b. Binary trees
c. Block diagram
d. Graph
Feedback: Dendrogram

Question 45
The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent, from being considered for counting support
Select one:
a. Partitioning
b. Candidate generation
c. Itemset eliminations
d. Pruning
Feedback: Pruning

Question 46
To determine association rules from frequent item sets
Select one:
a. Only minimum confidence needed
b. Neither support nor confidence is needed
c. Both minimum support and confidence are needed
d. Minimum support is needed
Feedback: Both minimum support and confidence are needed

Question 47
What is the size of the final clusters produced by the divisive algorithm, which is one of the hierarchical clustering approaches?
Select one:
a. Zero
b. Three
c. singleton
d. Two
Feedback: singleton

Question 48
If {A,B,C,D} is a frequent itemset, the candidate rule which is not possible is
Select one:
a. C -> A
b. D -> ABCD
c. A -> BC
d. B -> ADC
Feedback: D -> ABCD

Question 49
Which Association Rule would you prefer
Select one:
a. High support and low confidence
b. Low support and high confidence
c. Low support and low confidence
d. High support and medium confidence
Feedback: Low support and high confidence



Question 50
The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%. We also know that 3% of the adult population subscribes to automotive magazine. The probability of a person owning a sports car given that they don't subscribe to automotive magazine is 30%. Use this information to compute the probability that a person subscribes to automotive magazine given that they own a sports car.
Select one:
a. 0.0398
b. 0.0389
c. 0.0368
d. 0.0396
Feedback: 0.0396

Question 51
This clustering algorithm terminates when the mean values computed for the current iteration are identical to the mean values computed for the previous iteration.
Select one:
a. conceptual clustering
b. K-Means clustering
c. expectation maximization
d. agglomerative clustering
Feedback: K-Means clustering

Question 52
Simple regression assumes a __________ relationship between the input attribute and output attribute.
Select one:
a. reciprocal
b. quadratic
c. inverse
d. linear
Feedback: linear

Question 53
The distance between two points calculated using the Pythagorean theorem is
Select one:
a. Supremum distance
b. Euclidean distance
c. Linear distance
d. Manhattan Distance
Feedback: Euclidean distance

Question 54
Classification rules are extracted from _____________
Select one:
a. decision tree
b. root node
c. branches
d. siblings
Feedback: decision tree

Question 55
What does K refer to in the K-Means algorithm, which is a non-hierarchical clustering approach?
Select one:
a. Complexity
b. Fixed value
c. No of iterations
d. number of clusters

Feedback: number of clusters