Statistical Analysis Expert Quiz
1) In an experiment to determine whether antibiotics increase the final dressed weight of cattle, the following were measured on each animal in the study: sex, weight gain, and grade of meat, where grade is recorded as A, B, or C. The scales of measurement of these variables are:
- Nominal, interval, nominal
- Nominal, ratio, nominal
- Nominal, ratio, ordinal
- Ordinal, ratio, ordinal
2) Which of the following is NOT CORRECT?
- The scatterplot is the basic graphical tool for investigating relationships between two continuous interval or ratio scaled variables.
- The frequency table is useful for summarizing data from a nominal scaled variable.
- Means and standard deviations of nominal or ordinal scaled variables are useful summary measures.
- Boxplots perform well for comparing groups because it is relatively straightforward to see how the mean and median change over the groups.
3) If most of the measurements in a large data set are of approximately the same magnitude except for a few measurements that are quite a bit larger, how would the mean and median of the data set compare and what shape would a histogram of the data set have?
- The mean would be smaller than the median and the histogram would be skewed with a long left tail.
- The mean would be larger than the median and the histogram would be skewed with a long right tail.
- The mean would be larger than the median and the histogram would be skewed with a long left tail.
- The mean would be equal to the median and the histogram would be symmetrical.
4) In measuring the centre of the data from a skewed distribution, the median would be preferred over the mean for most purposes because
- The median is the most frequent number while the mean is most likely
- The mean may be too heavily influenced by the larger observations and this gives too high an indication of the centre
- The median is less than the mean and smaller numbers are always appropriate for the centre
- The median measures the arithmetic average of the data excluding outliers
5) The chances that you will be ticketed for illegal parking on campus are about 1/3. During the last nine days, you have illegally parked every day and have NOT been ticketed! Today, on the 10th day, you again decide to park illegally. The chances that you will be caught are:
- less than 1/3 because you were not caught in the last nine days.
- still equal to 1/3 because the last nine days do not affect the probability
- equal to 1/10 because you were not caught in the last nine days.
- equal to 9/10 because you were not caught in the last nine days.
6) Cans of soft drinks cost $0.30 in a certain vending machine. What is the expected value and variance of daily revenue (Y) from the machine, if X, the number of cans sold per day has E(X) = 125, and Var(X) = 50?
- E(Y) = 37.5, Var(Y) = 50
- E(Y) = 37.5, Var(Y) = 4.5
- E(Y) = 37.5, Var(Y) = 15
- E(Y) = 125, Var(Y) = 4.5
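The scaling rules behind this question can be checked numerically. The sketch below assumes revenue is Y = 0.30 X and applies E(aX) = a E(X) and Var(aX) = a^2 Var(X); the variable names are illustrative only.

```python
# Daily revenue is Y = price * X, where X is the number of cans sold per day.
price = 0.30
mean_x, var_x = 125, 50

mean_y = price * mean_x       # E(aX) = a * E(X)
var_y = price ** 2 * var_x    # Var(aX) = a^2 * Var(X)

print(round(mean_y, 2), round(var_y, 2))
```

Note that the variance scales with the square of the price, not with the price itself.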
7) There is an approximate linear relationship between the heights of females and their ages (from 5 to 18 years) described by: Height = 50.3 + 6.01 (Age), where height is measured in cm and age in years. Which of the following is NOT correct?
- The estimated slope is 6.01 which implies that children increase by about 6 cm for each year they grow older.
- The estimated height of a child who is 10 years old is about 110 cm.
- The estimated intercept is 50.3 cm which implies that children reach this height when they are 50.3/6.01=8.4 years old.
- The average height of children when they are 5 years old is about 50% of the average height when they are 18 years old.
8) If x + y + z = 12, then the maximum value of x²yz is:
- 64
- 324
- 256
- None of the above
9) Let 20.5, 32, 52.7, 70.09, 43.1 be observations of a random variable X. Define Y = 0.5X - 100 and Z = 300 - 3X. Then:
- Var(Y)>Var(Z)
- Var(Y)=Var(Z)
- Var(Y)<Var(Z)
- Can’t say
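A linear transformation aX + b changes the variance by a factor of a^2 and ignores b. The sketch below checks this on the five observations, treating them as the whole population (an assumption; using the sample variance would give the same comparison).

```python
import statistics

x = [20.5, 32, 52.7, 70.09, 43.1]
y = [0.5 * v - 100 for v in x]   # Y = 0.5X - 100
z = [300 - 3 * v for v in x]     # Z = 300 - 3X

var_x = statistics.pvariance(x)
var_y = statistics.pvariance(y)  # should match 0.25 * var_x
var_z = statistics.pvariance(z)  # should match 9 * var_x

print(var_y < var_z)
```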
10) If E(X) is negative then
- E(X²) < [E(X)]²
- E(X²) ≥ [E(X)]²
- E(X) > E(X²)
- Can’t say
11) If rho is equal to 0.5, what is the approximate value of the Durbin-Watson (DW) statistic?
- 1
- 5
- 2
- 5
12) The geometric mean of 10 observations on a certain variable was calculated as 16.2. It was later discovered that an observation was wrongly recorded as 12.9 instead of 21.9. Apply the appropriate correction and calculate the correct geometric mean.
- 08
- 9
- 2
- 9
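The correction can be sketched as follows, assuming the other nine observations were recorded correctly. Because the geometric mean raised to the power n equals the product of the observations, one factor can be swapped and the n-th root taken again. The variable names are illustrative.

```python
n = 10
gm_wrong = 16.2
wrong, right = 12.9, 21.9

# GM^n is the product of all observations, so replacing one factor
# multiplies the product by (right / wrong) before re-taking the n-th root.
gm_correct = gm_wrong * (right / wrong) ** (1 / n)

print(round(gm_correct, 2))
```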
13) If the price of a commodity doubles in a period of 4 years, what is the average percentage increase per year?
- 7% (approx)
- 25%(approx)
- 100%(approx)
- 10%(approx)
14) The middle value of an ordered array of numbers is the
- mean
- median
- mode
- midpoint
15) Which of the following is correct?
- A distribution is skewed when the mean and the median fall at different points in the distribution
- Skewness actually refers to the symmetry in data
- A positively skewed distribution has mean < median < mode
- In a skewed distribution, two quantiles will still be equidistant from the median
16) If the covariance between X and Y is 10, and the variances of X and Y are 16 and 19 respectively, find the coefficient of correlation.
- 0.433
- 0.574
- 0.938
- 0.554
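The correlation coefficient is the covariance divided by the product of the two standard deviations. A minimal sketch with the values given in the question:

```python
import math

cov_xy = 10
var_x, var_y = 16, 19

# r = Cov(X, Y) / (SD(X) * SD(Y))
r = cov_xy / math.sqrt(var_x * var_y)

print(round(r, 3))
```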
17) The ratio of the standard deviation of a distribution to the mean of that distribution is referred to as
- probability distribution
- expected return
- standard deviation
- coefficient of variation
18) The mean of a distribution is 23, the median is 24, and the mode is 25.5. It is most likely that this distribution is:
- negatively skewed
- positively skewed
- symmetrical
- asymptotic
19) The probability that a boy will get a scholarship is 0.90 and that a girl will get one is 0.80. What is the probability that at least one of them will get the scholarship?
- 0.12
- 0.56
- 0.98
- 0.78
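The question does not say so, but the usual reading assumes the two outcomes are independent. Under that assumption, "at least one" is the complement of "neither", as this sketch shows with exact fractions:

```python
from fractions import Fraction

p_boy = Fraction(9, 10)
p_girl = Fraction(8, 10)

# Assumption: the two events are independent.
# P(at least one) = 1 - P(neither gets it)
p_at_least_one = 1 - (1 - p_boy) * (1 - p_girl)

print(float(p_at_least_one))
```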
20) A box contains 3 red and 7 white balls. One ball is drawn at random, and a ball of the other color is placed in the box in its place. A second ball is then drawn at random. Find the probability that it is red.
- 0.34
- 0.87
- 0.11
- 0.52
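This is a total-probability calculation over the color of the first ball. The sketch below assumes the drawn ball is removed and replaced by one ball of the opposite color, so the box always holds 10 balls.

```python
from fractions import Fraction

red, white = 3, 7
total = red + white

# Case 1: first ball is red (3/10); a white ball replaces it, leaving 2 red.
# Case 2: first ball is white (7/10); a red ball replaces it, giving 4 red.
p_red_second = (Fraction(red, total) * Fraction(red - 1, total)
                + Fraction(white, total) * Fraction(red + 1, total))

print(p_red_second, float(p_red_second))
```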
21) Seven new balls and three old balls are kept in identical boxes on a shelf in a store. Evaluate the probability that a box chosen at random will contain (1) an old ball, (2) a new ball.
- 3/10 and 7/10
- 7/10 and 3/10
- 7/10 and 7/10
- 3/10 and 3/10
22) Which is not an assumption of linear regression?
- The errors are normally distributed.
- The independent variables are mutually uncorrelated.
- The errors are distributed with the same mean 0 and variance 1.
- None of them.
23) The regression equation for healing time (H) in days vs. dose level (D) in millilitres is: H = 1.8 - 0.18 D. If the dose level is increased by 1.5 ml, the healing time will decrease by:
- 0.227 days
- 0.29 days
- 0.273 days
- 0.27 days
24) Durbin-Watson Statistic is calculated for detecting:
- The multicollinearity between two explanatory variables in a regression
- Autoregression
- How strongly the residual series is correlated with itself.
- None of the above
25) The identifier of Poisson Distribution is:
- Mean is greater than variance
- Variance is less than mean
- Third order moment is not defined
- None of the above
26) Mahalanobis D² is a measure of:
- Prediction
- Discrimination
- Dimension Reduction
- All of above
27) The difference between CHAID and C&RT is that:
- CHAID can split a parent node in multiple child nodes but C&RT can split only in binary
- CHAID can only classify, but C&RT can predict
- Tree growing is stopped using the Chi-Square distribution in CHAID, but C&RT can use information gain
- All of above
28) Which one of the following can be used as a Non-Parametric Regression method:
- Kruskal-Wallis Method
- Kolmogorov-Smirnov Method
- Logistic Regression
- Spline Regression
29) In CHAID the entry of an object in a node is driven by:
- Chi-Square Value
- Information Value
- Confusion Matrix
- None of the above
30) Autocorrelation Function helps to identify the:
- Order of Moving Average Component in the model
- Order of AR component in the model
- Order of Seasonal Variation in the data
- All of the above
31) The Logistic Regression is an example of :
- Non-Linear Regression
- Non-Parametric Regression
- Unsupervised Modelling
- All of the above
32) In Regression the Saturated Model means:
- The model with all significant variables
- The model with no variable
- The model with as many variables as the number of observations
- None of the above.
33) Suppose there are two models with AIC 200.14 and 312.22. Which one should you use?
- The model with Higher AIC
- The model with lower AIC
- Both the models are equivalent
- Can’t say anything
34) The Augmented Dickey Fuller Test is used to check the presence of:
- Heteroscedasticity
- Non-Normality
- Non-Stationarity
- White Noise
35) K Means Clustering is an example of a:
- Non-Hierarchical Clustering
- Hierarchical Clustering
- Supervised Learning
- None of the above
36) Two events are mutually exclusive if:
- they are exclusively connected.
- they cannot occur together.
- they exclusively include mutuality.
- none of the above
37) Exponential smoothing is:
- a method to use number exponents to smooth the time series.
- one of the forecasting methods.
- a method of testing linearity.
- none of the above
38) Another expression for constant variance is:
39) What is the mean?
- The mean is a measure of central tendency
- The mean is a measure of variation
- The mean is the number of extreme values
- The mean is a measure of data richness
40) A series of readings was taken of the body temperature of a subject. The mean reading was found to be 30˚C with a standard deviation of 0.3˚C. When converted to ˚F (˚F = ˚C (1.8) + 32), the standard deviation is:
- 0.3
- 0.54
- 0.97
- 1.8
41) Weighing of 30 fish resulted in a mean of 30 g and a standard deviation of 2 g. After the weighing was completed, the scale was found to be misaligned, under-reporting every weight by 2 g. The standard deviation will change by:
- 2 g
- 1 g
- 0 g
- 5 g
42) In general, which of the following statements is FALSE?
- The sample mean is more sensitive to extreme values than the median.
- The sample range is less sensitive to extreme values than the standard deviation.
- The sample standard deviation is a measure of spread around the sample mean.
- If a distribution is symmetric, then the mean will be equal to the median.
43) A sample of 99 distances has a mean of 24 feet and a median of 24.5 feet. Unfortunately, it has just been discovered that an observation which was erroneously recorded as “30” actually had a value of “35”. If we make this correction to the data, then:
- The mean remains the same, but the median is increased
- The mean and median remain the same
- The median remains the same, but the mean is increased
- The mean and median are both increased
44) From tax records, it is relatively easy to determine the amount of liquor consumed per capita and the number of cigarettes consumed per capita for each of the 10 provinces of Canada. These are plotted on a scatter plot and a high positive correlation is found. Which of the following is correct?
- This implies that heavy smoking causes people to drink more
- This implies that heavy drinking causes people to smoke more
- This could be an example of a correlation caused by a common cause because both activities are highly correlated with average family income and average income varies widely among the provinces.
- We cannot conclude cause and effect, but this also implies that the same individuals both smoke and consume liquor.
45) A research study has reported that there is a correlation of r = −0.59 between the eye color (brown, green, blue) of an experimental animal and the amount of nicotine that is fatal to the animal when consumed. This indicates:
- nicotine is less harmful to one eye color than the others
- the lethal dose of nicotine goes down as the eye color of the animal changes
- the researchers need to do further study to explain the causes of this negative correlation
- the research is worthless because correlation is not an appropriate measure of association in this situation
46) The best way to recognize whether or not a variable is growing exponentially over time is by
- plotting the variable against time and looking for a straight-line pattern
- calculating the least squares regression line of the variable against time and examining the residuals
- plotting the logarithm of the variable against time and looking for a straight line pattern
- smoothing the time series by running medians of three or five
47) A random sample of 15 bank customers is selected to see how long (in minutes) they waited in line during the lunch hour. The results are as follows:
4 15 3 8 4 9 3 2 11 1 5 6 6 13 3
Calculate the median.
- 11
- 5
- 3
- 2
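With an odd number of observations, the median is the middle value of the sorted data. A short sketch using Python's standard library on the 15 waiting times from the question:

```python
import statistics

waits = [4, 15, 3, 8, 4, 9, 3, 2, 11, 1, 5, 6, 6, 13, 3]

# With 15 values, the median is the 8th value of the sorted list.
sorted_waits = sorted(waits)
median_wait = statistics.median(waits)

print(sorted_waits)
print(median_wait)
```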
48) Suppose that the 45th percentile for weight is 80 kg. This means
- 45 percent weigh more than 80 kg
- 45 percent weigh less than 80 kg
- 55 percent weigh less than 80 kg
- 80 percent weigh more than 45 kg
49) A series of readings was taken of the body temperature of a subject. The mean reading was found to be 30˚C with a standard deviation of 0.3˚C. When converted to ˚F (˚F = ˚C (1.8) + 32), the mean and standard deviation are:
- 54, 0.30
- 86, 0.54
- 86, 0.97
- 86, 1.80
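Under a linear conversion a·x + b, the mean is both scaled and shifted, while the standard deviation is only scaled by |a|. A minimal sketch with the conversion formula given in the question (variable names are illustrative):

```python
mean_c, sd_c = 30.0, 0.3
scale, shift = 1.8, 32      # F = 1.8 * C + 32

mean_f = scale * mean_c + shift   # the mean is scaled and shifted
sd_f = abs(scale) * sd_c          # the SD is scaled only; the shift drops out

print(round(mean_f, 2), round(sd_f, 2))
```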
50) Weighing of 30 fish resulted in a mean of 30 g and a standard deviation of 2 g. After the weighing was completed, the scale was found to be misaligned, under-reporting every weight by 2 g. What are the mean and standard deviation after correcting for the error in the scale?
- 28 g, 2 g
- 30 g, 4 g
- 32 g, 2 g
- 28 g, 4 g
51) You are allowed to choose four whole numbers from 1 to 10 (inclusive, without repeats). Which of the following is FALSE?
- The numbers 7, 8, 9, 10 have the smallest possible standard deviation.
- The numbers 1, 2, 3, 4 have the smallest possible standard deviation.
- The numbers 1, 5, 6, 10 have the largest possible standard deviation.
- The numbers 1, 2, 9, 10 have the largest possible standard deviation.
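Claims like these can be checked by brute force, since there are only 210 ways to choose four distinct numbers from 1 to 10. The sketch below uses the population standard deviation (an assumption; the sample version gives the same ranking because every selection has the same size).

```python
from itertools import combinations
from statistics import pstdev

# Standard deviation of every selection of four distinct numbers from 1..10.
sd = {c: pstdev(c) for c in combinations(range(1, 11), 4)}

largest = max(sd, key=sd.get)
min_sd = min(sd.values())
smallest = [c for c, s in sd.items() if abs(s - min_sd) < 1e-12]

print("largest SD:", largest)
print("smallest SD (ties):", smallest)
```

Several selections tie for the smallest value, so "the smallest possible standard deviation" does not single out one set.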
52) If the standard deviation of a given data set is equal to zero, what can we say about the data values included in the given data set?
- The data set contains only positive values
- The data set contains only negative values
- The data set contains both positive and negative values
- The data set contains observations that are all the same
53) The height of students is positively correlated with the yield of wheat in a given year. What does this indicate?
- There is a positive correlation between wheat yield and students' height
- Spurious correlation
- Can’t say
- The higher the crop yield, the taller the students
54) A politician who is running for the office of governor of a state with 7 million registered voters commissions a survey. In the survey, 55% of the 10,000 registered voters interviewed say they plan to vote for her. The population of interest is the:
- 45%, or 4,500 voters interviewed who plan not to vote for her
- 10,000 registered voters interviewed
- 7 million registered voters in the state
- 55% of the 7 million registered voters in the state
55) What is the range?
- The range is the difference between the highest and lowest score in a distribution
- Provides a measure of the spread of the middle 50% of the scores
- a measure based on the deviations of individual scores from the mean
- most frequent or common score in the distribution
56) Gender is a categorical variable
- FALSE
- TRUE
57) ___________ is a measure of the strength of the linear association between two quantitative variables
- Relation coefficient
- Correlation coefficient
- All of the above
- None of the above
58) The most frequently occurring value in a distribution is the
- Mean
- Median
- Mode
- None of the above
59) Which one of these statistics is unaffected by outliers?
- Mean
- Inter Quartile Range
- Standard Deviation
- Range
60) If the mean, median and mode of a distribution are 5, 6, 7 respectively, then the distribution is:
- not skewed
- skewed positively
- bimodal
- skewed negatively