MMPC-005
MANAGEMENT PROGRAMME (MP)
Term-End Examination December, 2022
MMPC–005 : QUANTITATIVE ANALYSIS FOR MANAGERIAL APPLICATIONS
Time : 3 Hours
Maximum Marks : 100
Note : Section A has six questions, each carrying 15 marks. Attempt any four questions. Section B is compulsory and carries 40 marks. Attempt both questions. Use of calculator is permissible.
Section—A
Q1. “The arithmetic mean is the most commonly used and readily understood measure of central tendency.” Do you agree ? Comment. Also, explain the mathematical properties of arithmetic mean.
Ans. The arithmetic mean is indeed one of the most commonly used measures of central tendency in statistics. It is widely used in many fields, including economics, finance, and science, to summarize a set of data by a single value. The arithmetic mean is often used because it is straightforward to calculate and easy to understand.
However, it is important to note that the arithmetic mean may not always be the most appropriate measure of central tendency, particularly in cases where the data is skewed or has extreme values. In such situations, other measures like the median or mode may be more appropriate.
Now, let's move on to the mathematical properties of the arithmetic mean:
1. Additive Property: If a constant is added to each value in a set of data, then the arithmetic mean of the new set of data will be equal to the arithmetic mean of the original set of data plus the constant.
2. Multiplicative Property: If each value in a set of data is multiplied by a constant, then the arithmetic mean of the new set of data will be equal to the arithmetic mean of the original set of data multiplied by the constant.
3. Consistency: The arithmetic mean is a consistent estimator of the true mean of the population from which the data was sampled. This means that as the sample size increases, the arithmetic mean will converge to the true mean of the population.
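4. Linear Transformation Property: The arithmetic mean changes in the same way as the data under a linear transformation. If every value x is replaced by a + bx, the new mean equals a + b times the original mean, which combines properties 1 and 2. The mean is therefore not left unchanged when the data is scaled or shifted.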
5. Differentiability: The arithmetic mean is a differentiable function of the data. This means that if we change a single data point slightly, the arithmetic mean will also change, but only by a small amount.
6. Uniqueness: The arithmetic mean is a unique function of the data, which means that for any given set of data, there is only one arithmetic mean.
7. Continuity: The arithmetic mean is a continuous function of the data, which means that if we change the data slightly, the arithmetic mean will also change slightly.
8. Efficiency: When the population is normally distributed, the arithmetic mean is an efficient estimator of the population mean, which means that it has the smallest variance among all unbiased estimators of the mean.
9. Interpretability: The arithmetic mean is easily interpretable as the balance point of the data: the sum of the deviations of the values from the mean is zero. (The value that divides the data into two equal parts is the median, not the mean.)
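The additive and multiplicative properties listed above can be checked numerically. The following is a minimal sketch; the sample values and the constant c are hypothetical and chosen only for illustration. It also checks the standard result that the deviations from the mean sum to zero.

```python
from statistics import mean

# Hypothetical sample and constant, used only to illustrate the properties.
data = [4, 8, 15, 16, 23, 42]
c = 10

m = mean(data)

# Additive property: adding c to every value adds c to the mean.
shifted_mean = mean([x + c for x in data])

# Multiplicative property: multiplying every value by c multiplies the mean by c.
scaled_mean = mean([x * c for x in data])

# Balance-point property: deviations from the mean sum to zero.
deviation_sum = sum(x - m for x in data)

print("mean:", m)
print("mean after adding c:", shifted_mean)
print("mean after multiplying by c:", scaled_mean)
print("sum of deviations from the mean:", deviation_sum)
```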
The CDF of the exponential distribution is given by:
CDF = 1 - e^(-λx)
where λ is the rate parameter, which is equal to the inverse of the MTBF. In this case, λ = 1/1000.
To find the probability that a bulb will burn more than 1000 hours, we need to subtract the CDF evaluated at x=1000 from 1:
P(X > 1000) = 1 - CDF(1000)
Substituting the values, we get:
P(X > 1000) = 1 - (1 - e^(-1))
Simplifying the expression gives:
P(X > 1000) = e^(-1)
So the probability that a bulb will burn more than 1000 hours is approximately 0.3679 or 36.79%.
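The calculation above can be reproduced in a few lines. This is a small sketch under the same assumptions as the answer (exponentially distributed lifetime with a mean of 1000 hours); the function and variable names are illustrative.

```python
import math

mtbf_hours = 1000.0        # mean time between failures, as used in the answer
rate = 1.0 / mtbf_hours    # lambda, the rate parameter


def exponential_cdf(x, lam):
    """Return P(X <= x) for an exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x)


# P(X > 1000) = 1 - CDF(1000) = e^(-1)
p_more_than_1000 = 1.0 - exponential_cdf(1000.0, rate)
print(round(p_more_than_1000, 4))  # 0.3679
```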
Q3. Calculate the standard deviation of the following distribution : Age No. of Persons 20—25 170 25—30 110 30—35 80 35—40 45 40—45 40 45—50 35
Sol. To calculate the standard deviation of the given distribution, we first need to find the mean (average) age. We can do this by using the formula:
mean = (Σfx) / n
where f is the frequency of each age group, x is the midpoint of each age group, and n is the total number of persons (n = 170 + 110 + 80 + 45 + 40 + 35 = 480).
Using this formula, we get:
mean = ((22.5 * 170) + (27.5 * 110) + (32.5 * 80) + (37.5 * 45) + (42.5 * 40) + (47.5 * 35)) / 480 = 14500 / 480 = 30.21
So the mean age is approximately 30.21 years.
Next, we need to calculate the variance of the distribution using the formula:
variance = (Σf(x - mean)^2) / n
Substituting the values, we get:
variance = ((170 * (22.5 - 30.21)^2) + (110 * (27.5 - 30.21)^2) + (80 * (32.5 - 30.21)^2) + (45 * (37.5 - 30.21)^2) + (40 * (42.5 - 30.21)^2) + (35 * (47.5 - 30.21)^2)) / 480 = 30229.17 / 480 = 62.98
So the variance of the distribution is approximately 62.98.
Finally, we can calculate the standard deviation by taking the square root of the variance:
standard deviation = sqrt(variance) = sqrt(62.98) = 7.94
So the standard deviation of the distribution is approximately 7.94 years.
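The arithmetic for a grouped distribution is easy to get wrong by hand, so a short script is a useful check. The sketch below recomputes the mean, variance, and standard deviation from the class limits and frequencies given in the question, using class midpoints and dividing by n as in the formulas above; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each age class in the question.
classes = [
    (20, 25, 170),
    (25, 30, 110),
    (30, 35, 80),
    (35, 40, 45),
    (40, 45, 40),
    (45, 50, 35),
]

n = sum(f for _, _, f in classes)

# Represent each class by its midpoint.
midpoints = [((lo + hi) / 2, f) for lo, hi, f in classes]

mean_age = sum(x * f for x, f in midpoints) / n
variance = sum(f * (x - mean_age) ** 2 for x, f in midpoints) / n
std_dev = math.sqrt(variance)

print("n =", n)
print("mean =", round(mean_age, 2))
print("variance =", round(variance, 2))
print("standard deviation =", round(std_dev, 2))
```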
Q4. What do you understand by ‘Correlation Coefficient’ ? Discuss the different types of association between variables.
Ans. The correlation coefficient is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. It ranges from -1 to +1, with values close to -1 indicating a strong negative correlation (when one variable goes up, the other goes down), values close to +1 indicating a strong positive correlation (when one variable goes up, the other goes up), and a value of 0 indicating no linear correlation between the variables.
There are different types of association between variables:
1. Positive Correlation: This type of association occurs when an increase in one variable is associated with an increase in the other variable. In this case, the correlation coefficient will be positive, with values ranging from 0 to +1. For example, there is a positive correlation between the amount of time spent studying and exam scores.
2. Negative Correlation: This type of association occurs when an increase in one variable is associated with a decrease in the other variable. In this case, the correlation coefficient will be negative, with values ranging from 0 to -1. For example, there is a negative correlation between the amount of sleep and stress levels.
3. Zero Correlation: This type of association occurs when there is no relationship between the two variables. In this case, the correlation coefficient will be 0. For example, there is zero correlation between the height of a person and their favorite color.
4. Curvilinear Correlation: This type of association occurs when the relationship between the two variables is not linear. In this case, the correlation coefficient measures only the linear part of the relationship, so it may be small or even 0 although the variables are clearly related. For example, the relationship between alcohol consumption and reaction time may be curvilinear, with moderate alcohol consumption associated with faster reaction times than either very low or very high levels of consumption.
5. Spurious Correlation: This type of association occurs when two variables appear to be correlated, but the correlation is actually due to a third variable that is related to both of them. For example, there may be a spurious correlation between the number of ice cream sales and the number of drownings, but the real cause of both variables is the temperature.
6. Causal correlation: A causal correlation exists when a change in one variable causes a change in the other variable.
Overall, understanding the type of association between variables is important in determining the strength and direction of the relationship between the variables, and in making meaningful interpretations of statistical results.
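To illustrate why the type of association matters, the sketch below computes the ordinary (Pearson) correlation coefficient for a perfectly curvilinear relationship. The data (y = x squared on a symmetric range) are hypothetical and chosen so that the linear correlation is zero even though y is completely determined by x.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical curvilinear data: y depends on x exactly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curvilinear = pearson_r(xs, ys)
print("r for y = x^2:", r_curvilinear)
```

A scatter plot is therefore a useful companion to the coefficient when a non-linear pattern is suspected.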
Q5. Explain the ‘Chi-square distribution’. Explain how would you use it in testing independence of categorised data and testing the goodness of fit.
Ans. The Chi-square distribution is a probability distribution that is used in hypothesis testing to determine whether observed data differ significantly from expected data. It is a continuous probability distribution that takes only non-negative values and has a single parameter called the degrees of freedom (df).
The Chi-square distribution is used in two types of tests: testing independence of categorized data and testing the goodness of fit.
1. Testing independence of categorized data:
This test is used to determine whether two categorical variables are independent of each other or not. The test involves calculating the Chi-square statistic from the observed and expected frequencies of the categories and comparing it to the critical value of the Chi-square distribution with (r-1)(c-1) degrees of freedom, where r and c are the number of rows and columns in the contingency table, respectively. If the calculated Chi-square statistic is greater than the critical value, then we reject the null hypothesis that the two variables are independent, and conclude that there is a significant association between them.
2. Testing the goodness of fit:
This test is used to determine whether a sample of data fits a particular theoretical distribution or not. The test involves comparing the observed frequencies of the data with the expected frequencies of the data based on a theoretical distribution using the Chi-square statistic. The degrees of freedom for this test are equal to the number of categories minus 1, reduced by one more for each parameter of the theoretical distribution that is estimated from the sample. If the calculated Chi-square statistic is greater than the critical value, then we reject the null hypothesis that the sample fits the theoretical distribution and conclude that there is a significant difference between the observed and expected frequencies.
In both types of tests, the Chi-square statistic provides a measure of the difference between the observed and expected frequencies. A higher value of the Chi-square statistic indicates a greater deviation from the null hypothesis. The critical value of the Chi-square distribution is used to determine whether the deviation is significant or due to chance.
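The test of independence described above can be sketched in code. The 2 x 2 contingency table below is hypothetical, and the critical value 3.841 is the standard tabulated value for 1 degree of freedom at the 5% level of significance; the function name is illustrative.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical observed frequencies (rows: groups, columns: responses).
observed_table = [[30, 20],
                  [20, 30]]

chi_square, df = chi_square_independence(observed_table)
critical_value = 3.841  # tabulated value for df = 1 at the 5% level

print("chi-square =", chi_square, "df =", df)
print("reject independence:", chi_square > critical_value)
```

The goodness-of-fit test uses the same statistic; only the source of the expected frequencies and the degrees of freedom change.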
Q6. Write short notes on any three of the following :
(a) Guidelines for choosing the class
(b) Poisson distribution
(c) Two-tailed test
(d) Type I and Type II error
(e) Rank correlation
Ans. (a) Guidelines for choosing the class:
When constructing a histogram, it is important to choose the appropriate number and width of the classes (bins) in order to accurately represent the data. Here are some guidelines for choosing the class:
- The number of classes should be between 5 and 20, depending on the amount of data.
- The class width should be equal for all classes.
- The class width should be chosen such that each observation falls into only one class.
- The class intervals should be easily understandable and not overlapping.
- The class interval should be chosen such that it highlights the important features of the data set.
(b) Poisson distribution:
The Poisson distribution is a probability distribution used to model the number of occurrences of rare events in a fixed interval of time or space. It is characterized by a single parameter, λ, which represents the average rate of occurrence of the event. The probability of observing k events in the interval is given by the formula:
P(k; λ) = (e^-λ * λ^k) / k!
Where e is the mathematical constant approximately equal to 2.71828.
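The formula can be evaluated directly. The sketch below uses a hypothetical rate of λ = 2 events per interval; the function name is illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(k; lam) = e^(-lam) * lam^k / k! for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 2.0  # hypothetical average number of events per interval

for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))

# The probabilities over all k sum to 1 (checked here up to k = 50).
total_probability = sum(poisson_pmf(k, lam) for k in range(51))
print("total:", round(total_probability, 6))
```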
(c) Two-tailed test:
A two-tailed test is a statistical test in which the null hypothesis is rejected if the test statistic falls in either tail of the distribution of the test statistic. This type of test is used when the researcher is interested in whether the sample mean is significantly different from a hypothesized population mean in either direction. The significance level is typically divided equally between the two tails of the distribution.
(d) Type I and Type II error:
In hypothesis testing, a Type I error occurs when the null hypothesis is rejected even though it is true. This error is also known as a false positive. The probability of making a Type I error is denoted by α, which is the significance level of the test.
A Type II error occurs when the null hypothesis is not rejected even though it is false. This error is also known as a false negative. The probability of making a Type II error is denoted by β, which depends on the sample size, the effect size, and the level of significance.
(e) Rank correlation:
Rank correlation is a type of non-parametric correlation that measures the strength of association between two variables based on the ranks of the observations rather than their numerical values. The most commonly used rank correlation coefficient is Spearman's rank correlation coefficient, which ranges from -1 to +1. A coefficient of +1 indicates a perfect positive rank correlation, while a coefficient of -1 indicates a perfect negative rank correlation. A coefficient of 0 indicates no rank correlation. Rank correlation is used when the data do not follow a normal distribution or when there are outliers in the data.
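As an illustration, Spearman's coefficient can be computed with the formula rs = 1 - 6Σd^2 / (n(n^2 - 1)), where d is the difference between the two ranks of each observation. The sketch below assumes there are no tied values (the formula needs a correction otherwise) and uses the nine pairs from Q7 of this paper as sample data; the function names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [order[v] for v in values]


def spearman_rs(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Data from Q7 of this paper (no tied values in either variable).
x_values = [9, 8, 7, 6, 5, 4, 3, 2, 1]
y_values = [15, 16, 14, 13, 11, 12, 10, 8, 9]

rs = spearman_rs(x_values, y_values)
print("Spearman rs =", round(rs, 2))
```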
Q7. Calculate the coefficient of correlation from the following data :
X : 9 8 7 6 5 4 3 2 1
Y : 15 16 14 13 11 12 10 8 9
Sol. Type 1:
To calculate the coefficient of correlation (r), we first need to calculate the mean, standard deviation, and covariance of both X and Y.
Mean of X (x̄) = (9+8+7+6+5+4+3+2+1)/9 = 5
Mean of Y (ȳ) = (15+16+14+13+11+12+10+8+9)/9 = 12
Using these means, we can calculate the deviations for each data point:
X deviation (x - x̄):
4 3 2 1 0 -1 -2 -3 -4
Y deviation (y - ȳ):
3 4 2 1 -1 0 -2 -4 -3
Next, we need to calculate the standard deviation for X (sX) and Y (sY). The deviations listed above are already measured from the means, so they are squared directly:
sX = sqrt((1/(9-1))*(4^2 + 3^2 + 2^2 + 1^2 + 0^2 + (-1)^2 + (-2)^2 + (-3)^2 + (-4)^2)) = sqrt(60/8) = 2.739
sY = sqrt((1/(9-1))*(3^2 + 4^2 + 2^2 + 1^2 + (-1)^2 + 0^2 + (-2)^2 + (-4)^2 + (-3)^2)) = sqrt(60/8) = 2.739
Then, we can calculate the covariance (Sxy) from the products of the paired deviations:
Sxy = (1/(9-1))*((4*3) + (3*4) + (2*2) + (1*1) + (0*-1) + (-1*0) + (-2*-2) + (-3*-4) + (-4*-3)) = 57/8 = 7.125
Finally, we can calculate the coefficient of correlation (r):
r = Sxy / (sX * sY) = 7.125 / (2.739 * 2.739) = 0.95
Therefore, the coefficient of correlation for the given data is 0.95, indicating a strong positive correlation between X and Y.
Sol. Type 2:
1. Calculate the mean of X and Y:
x̄ = (9+8+7+6+5+4+3+2+1)/9 = 5
ȳ = (15+16+14+13+11+12+10+8+9)/9 = 12
2. Calculate the deviation for each data point for both X and Y by subtracting the mean from each value:
X deviation (x - x̄): 4, 3, 2, 1, 0, -1, -2, -3, -4
Y deviation (y - ȳ): 3, 4, 2, 1, -1, 0, -2, -4, -3
3. Calculate the sum of the product of the deviations for each data point:
Σ((x - x̄) * (y - ȳ)) = (4*3) + (3*4) + (2*2) + (1*1) + (0*-1) + (-1*0) + (-2*-2) + (-3*-4) + (-4*-3) = 57
4. Calculate the sum of the squared deviations for X:
Σ(x - x̄)^2 = 4^2 + 3^2 + 2^2 + 1^2 + 0^2 + (-1)^2 + (-2)^2 + (-3)^2 + (-4)^2 = 60
5. Calculate the sum of the squared deviations for Y:
Σ(y - ȳ)^2 = 3^2 + 4^2 + 2^2 + 1^2 + (-1)^2 + 0^2 + (-2)^2 + (-4)^2 + (-3)^2 = 60
6. Calculate the standard deviation for X:
sX = sqrt(Σ(x - x̄)^2 / (n-1)) = sqrt(60/8) = 2.739
7. Calculate the standard deviation for Y:
sY = sqrt(Σ(y - ȳ)^2 / (n-1)) = sqrt(60/8) = 2.739
8. Finally, calculate the coefficient of correlation (r) as:
r = Σ((x - x̄) * (y - ȳ)) / (sX * sY * (n-1)) = 57 / (2.739 * 2.739 * 8) = 57 / 60 = 0.95
Therefore, the coefficient of correlation for the given data is 0.95, indicating a strong positive correlation between X and Y.
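Because a correlation coefficient must lie between -1 and +1, it is worth verifying the hand calculation. The sketch below recomputes the sums of deviations and r for the data in the question; the variable names are illustrative.

```python
import math

x_values = [9, 8, 7, 6, 5, 4, 3, 2, 1]
y_values = [15, 16, 14, 13, 11, 12, 10, 8, 9]

n = len(x_values)
x_bar = sum(x_values) / n
y_bar = sum(y_values) / n

# Deviations of each observation from its mean.
dx = [x - x_bar for x in x_values]
dy = [y - y_bar for y in y_values]

sum_xy = sum(a * b for a, b in zip(dx, dy))  # sum of products of deviations
sum_xx = sum(a * a for a in dx)              # sum of squared deviations of X
sum_yy = sum(b * b for b in dy)              # sum of squared deviations of Y

# The (n - 1) factors cancel, so r can be computed from the sums directly.
r = sum_xy / math.sqrt(sum_xx * sum_yy)

print("means:", x_bar, y_bar)
print("sums:", sum_xy, sum_xx, sum_yy)
print("r =", round(r, 2))
```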
Q8. Explain the concept of Bernoulli process along with Binomial Distribution in detail.
Ans. The Bernoulli process is a sequence of independent, identically distributed (i.i.d.) random variables, each of which takes on one of two possible outcomes with a fixed probability. This process is named after the Swiss mathematician Jacob Bernoulli, whose work on it was published posthumously in 1713. The two possible outcomes are usually referred to as "success" and "failure," although they can represent any pair of complementary events.
The Bernoulli process is characterized by the following properties:
- Each trial has only two possible outcomes, which we denote by "success" and "failure."
- The probability of success, denoted by p, is constant across all trials.
- The trials are independent, meaning that the outcome of each trial does not depend on the outcomes of previous or future trials.
The Bernoulli process is often used to model real-world situations where there are only two possible outcomes, such as flipping a coin, checking whether a rolled die shows a six, or conducting any experiment recorded simply as success or failure.
The Binomial distribution is a probability distribution that describes the number of successes in a fixed number of independent, identical Bernoulli trials. It is used to calculate the probability of observing a certain number of successes in a given number of trials, given the probability of success on each trial.
The Binomial distribution is characterized by two parameters:
- n: the number of trials
- p: the probability of success on each trial
The random variable X counts the number of successes, and x denotes a particular value of X, which can be 0, 1, ..., n.
The Binomial distribution is defined by the probability mass function:
P(X = x) = (n choose x) * p^x * (1-p)^(n-x)
where "n choose x" is the binomial coefficient, which represents the number of ways to choose x items from a set of n items.
The Binomial distribution has several important properties, including:
- Mean: np
- Variance: np(1-p)
- The shape of the distribution is determined by the values of n and p.
- When n is large and p is not too close to 0 or 1 (a common rule of thumb is that np and n(1-p) are both at least 5), the Binomial distribution can be approximated by a Normal distribution.
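The probability mass function and the mean and variance formulas can be checked with a short script. This is a minimal sketch with hypothetical values n = 10 and p = 0.5; it uses math.comb, which is available in Python 3.8 and later, and the function name is illustrative.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) = (n choose x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.5  # hypothetical number of trials and success probability

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the distribution itself.
mean_from_pmf = sum(x * prob for x, prob in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * prob for x, prob in enumerate(pmf))

print("P(X = 5) =", binomial_pmf(5, n, p))
print("mean:", mean_from_pmf, "expected np =", n * p)
print("variance:", var_from_pmf, "expected np(1-p) =", n * p * (1 - p))
```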
Overall, the Bernoulli process and Binomial distribution are fundamental concepts in probability theory, and they are widely used in various fields such as statistics, finance, and engineering.