Statistics
ANOVA:
(i): Three types of computer disks are selected and the number of defects in each disk is recorded as follows.
Type A: 0, 1, 0, 2, 3, 2, 0, 1, 1, 0.
Type B: 2, 0, 3, 5, 3, 4, 6, 0, 2, 5.
Type C: 1, 0, 1, 1, 0, 2, 0, 0, 1, 2.
Hypothesis: There is no significant difference in the mean number of defects for the three groups.
Solution:
| Type A | Type B | Type C |
| 0 | 2 | 1 |
| 1 | 0 | 0 |
| 0 | 3 | 1 |
| 2 | 5 | 1 |
| 3 | 3 | 0 |
| 2 | 4 | 2 |
| 0 | 6 | 0 |
| 1 | 0 | 0 |
| 1 | 2 | 1 |
| 0 | 5 | 2 |
Column totals: $X_{1.} = 10$, $X_{2.} = 30$, $X_{3.} = 8$
We have to test the hypotheses
H0: There is no significant difference between the mean numbers of defects of the three groups.
H1: There is a significant difference between the mean numbers of defects of the three groups.
Grand total: $G = 10 + 30 + 8 = 48$
Correction factor: $CF = G^2/n = 48^2/30 = 76.8$
Total sum of squares: $TSS = \sum_{i}\sum_{j} X_{ij}^2 - CF = 160 - 76.8 = 83.2$ (i.e., the sum of the squares of all the observations, minus CF)
Treatment sum of squares: $SS_{\text{treat}} = \sum_{i} X_{i.}^2/n_i - CF = 10^2/10 + 30^2/10 + 8^2/10 - 76.8 = 106.4 - 76.8 = 29.6$
Error sum of squares: $ESS = TSS - SS_{\text{treat}} = 83.2 - 29.6 = 53.6$
ANOVA table:
| Source | df | SS | MSS | Calculated F | Tabulated F |
| Treatment | 2 | 29.6 | 29.6/2 = 14.8 | 14.8/1.985 = 7.4559 | F0.05(2, 27) = 3.35 |
| Error | 27 | 53.6 | 53.6/27 = 1.985 | | |
| Total | 29 | 83.2 | | | |
Value of the test statistic: F = 7.4559.
Critical value: $F_{0.05}(2, 27) = 3.35$, obtained from the F-table for the 0.05 level of significance and (2, 27) degrees of freedom.
Decision rule: reject the null hypothesis if the calculated F exceeds $F_{0.05}(2, 27)$.
Since the calculated F (7.4559) is greater than the table value (3.35), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a significant difference between the mean numbers of defects of the three groups.
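The sums of squares in this solution can be checked with a few lines of Python. This is only a sketch of the correction-factor method used here; the function name `one_way_anova` and the variable names are illustrative choices, not part of the original solution.

```python
def one_way_anova(groups):
    """Return (treatment SS, error SS, F) via the correction-factor method."""
    n = sum(len(g) for g in groups)                # total number of observations
    grand_total = sum(sum(g) for g in groups)      # G
    cf = grand_total ** 2 / n                      # correction factor G^2 / n
    total_ss = sum(x * x for g in groups for x in g) - cf
    treat_ss = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    error_ss = total_ss - treat_ss
    df_treat = len(groups) - 1
    df_error = n - len(groups)
    f_stat = (treat_ss / df_treat) / (error_ss / df_error)
    return treat_ss, error_ss, f_stat


type_a = [0, 1, 0, 2, 3, 2, 0, 1, 1, 0]
type_b = [2, 0, 3, 5, 3, 4, 6, 0, 2, 5]
type_c = [1, 0, 1, 1, 0, 2, 0, 0, 1, 2]

treat_ss, error_ss, f_stat = one_way_anova([type_a, type_b, type_c])
print(round(treat_ss, 1), round(error_ss, 1), round(f_stat, 2))
```

Without intermediate rounding the F ratio comes out near 7.455; the 7.4559 in the table arises from rounding the error mean square to 1.985 before dividing. The critical value still has to be read from an F-table, since the standard library has no F distribution.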
(ii): To study the effect of nematodes (microscopic worms) on plant growth, an OSU botanist prepared 16 identical planting pots. A different number of nematodes was introduced into each pot. In each pot a tomato seedling was planted and its height measured (in centimeters) after 16 weeks. The results were:
| Nematodes | 1 | 2 | 3 | 4 |
| 0 | 10.8 | 9.1 | 13.5 | 9.2 |
| 1,000 | 11.1 | 11.1 | 8.2 | 11.3 |
| 5,000 | 5.4 | 4.6 | 7.4 | 5.0 |
| 10,000 | 5.4 | 4.6 | 7.4 | 5.0 |
1. What is the F-value from the test?
2. What is the p-value from the test?
3. From the test results, what is your conclusion?
Solution:
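| I | II | III | IV |
| 10.8 | 11.1 | 5.4 | 5.4 |
| 9.1 | 11.1 | 4.6 | 4.6 |
| 13.5 | 8.2 | 7.4 | 7.4 |
| 9.2 | 11.3 | 5.0 | 5.0 |
Column totals: $X_{1.} = 42.6$, $X_{2.} = 41.7$, $X_{3.} = 22.4$, $X_{4.} = 22.4$ (columns I to IV are the treatments 0, 1,000, 5,000 and 10,000)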
We have to test the hypotheses
H0: There is no significant difference in the mean height of plants.
H1: There is a significant difference in the mean height of plants.
Grand total: $G = 42.6 + 41.7 + 22.4 + 22.4 = 129.1$
Correction factor: $CF = G^2/n = 129.1^2/16 = 1041.68$
Total sum of squares: $TSS = \sum_{i}\sum_{j} X_{ij}^2 - CF = 10.8^2 + 9.1^2 + 13.5^2 + \dots + 7.4^2 + 5^2 - CF = 1167.85 - 1041.68 = 126.17$
Treatment sum of squares: $SS_{\text{treat}} = \sum_{i} X_{i.}^2/n_i - CF = 42.6^2/4 + 41.7^2/4 + 22.4^2/4 + 22.4^2/4 - 1041.68 = 1139.29 - 1041.68 = 97.61$
Error sum of squares: $ESS = TSS - SS_{\text{treat}} = 126.17 - 97.61 = 28.56$
ANOVA table:
| Source | df | SS | MSS | Calculated F | Tabulated F |
| Treatment | 3 | 97.61 | 97.61/3 = 32.5367 | 32.5367/2.38 = 13.67 | F0.05(3, 12) = 3.49 |
| Error | 12 | 28.56 | 28.56/12 = 2.38 | | |
| Total | 15 | 126.17 | | | |
Value of the test statistic: F = 13.67.
Critical value: $F_{0.05}(3, 12) = 3.49$, obtained from the F-table for the 0.05 level of significance and (3, 12) degrees of freedom.
P-value = 0.0004, from a p-value calculator.
Decision rule: reject the null hypothesis if the calculated F exceeds $F_{0.05}(3, 12)$.
Since the calculated F (13.67) is greater than the table value (3.49), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a significant difference in the mean height of plants; that is, the number of nematodes affects plant growth.
Testing of
hypothesis:
(1): An insurance company, based on past experience, estimates that the mean damage for a natural disaster in its area is $5,000. After introducing several plans to prevent loss, it randomly samples 200 policyholders and finds the mean amount per claim was $4,800 with a standard deviation of $1,300. Does it appear the prevention plans were effective in reducing the mean amount of a claim? Use the .05 significance level. State:
1. H0 and H1;
2. the critical value(s);
3. the test statistic;
4. the decision to reject or to fail to reject;
5. the p-value, if requested.
Solution:
Given $\mu = 5000$, $n = 200$, $\bar{X} = 4800$, $S = 1300$ and $\alpha = 0.05$.
The hypotheses to be tested are
H0: The prevention plans do not change the mean amount (i.e., $\mu = 5000$).
H1: The prevention plans were effective in reducing the mean amount (i.e., $\mu < 5000$).
The test statistic is
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{4800 - 5000}{1300/\sqrt{200}} = -2.1757$$
From the standard normal table, the critical value for a left-tailed test at the 0.05 level of significance is $-z_{\alpha} = -1.645$.
The decision rule is: reject H0 if $Z < -z_{\alpha}$.
Since $Z = -2.1757 < -1.645$, we reject the null hypothesis at the 5% level of significance. Hence we conclude that the prevention plans were effective in reducing the mean amount.
P-value $= P(Z < -2.1757) = 0.5 - P(-2.1757 < Z < 0) = 0.5 - P(0 < Z < 2.1757)$, since the normal curve is symmetric,
$= 0.5 - 0.4854 = 0.0146$
P-value = 0.0146
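The same left-tailed test can be reproduced with Python's standard library. This is a sketch under the large-sample normal approximation used in the solution; the helper name `z_test_left_tailed` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def z_test_left_tailed(sample_mean, mu0, sd, n, alpha):
    """Return (z, critical value, p-value) for H1: mu < mu0."""
    z = (sample_mean - mu0) / (sd / sqrt(n))
    critical = NormalDist().inv_cdf(alpha)   # negative, because the test is left-tailed
    p_value = NormalDist().cdf(z)            # area to the left of z
    return z, critical, p_value


z, critical, p_value = z_test_left_tailed(4800, 5000, 1300, 200, 0.05)
print(f"z = {z:.4f}, critical = {critical:.3f}, p = {p_value:.4f}")
print("reject H0" if z < critical else "fail to reject H0")
```

The computed p-value is slightly larger than the table-based 0.0146 because the table lookup rounds z to 2.18; the decision is the same either way.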
(ii): A sample of 65 observations is selected from one population. The sample mean is 2.67 and the sample standard deviation is 0.75. A sample of 50 observations is selected from a second population. The sample mean is 2.59 and the sample standard deviation is 0.66. Conduct the following test of hypothesis using the .08 significance level:
H0: µ1 is less than or equal to µ2
H1: µ1 is greater than µ2
a. Is this a one-tailed or two-tailed test?
b. State the decision rule.
c. Compute the value of the test statistic.
d. What is your decision regarding H0?
e. What is the p-value?
Solution:
Given $n_1 = 65$, $n_2 = 50$, $\bar{X}_1 = 2.67$, $S_1 = 0.75$, $\bar{X}_2 = 2.59$, $S_2 = 0.66$ and $\alpha = 0.08$.
The hypotheses to be tested are
H0: µ1 is less than or equal to µ2
H1: µ1 is greater than µ2
The test statistic is
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} = \frac{2.67 - 2.59}{\sqrt{0.75^2/65 + 0.66^2/50}} = 0.6070$$
From the standard normal table, the critical value for a right-tailed test at the 0.08 level of significance is 1.405.
The decision rule is: reject H0 if Z > 1.405.
Since Z = 0.6070 is less than the critical value, we fail to reject the null hypothesis at the 0.08 level of significance. The data do not provide sufficient evidence that µ1 is greater than µ2.
P-value $= P(Z > 0.6070) = 0.5 - P(0 < Z < 0.6070) = 0.5 - 0.2291 = 0.2709$
Note (answers by part):
(a): It is a one-tailed (right-tailed) test.
(b): The decision rule is: reject H0 if Z > 1.405.
(c): Z = 0.6070.
(d): Since Z is less than the critical value, we fail to reject H0 at the 0.08 level of significance; there is not sufficient evidence that µ1 is greater than µ2.
(e): P-value = 0.2709.
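As a cross-check, the two-sample statistic, the right-tailed critical value and the p-value can be computed directly. This is a sketch assuming the large-sample normal approximation used in the solution; `two_sample_z` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference of two independent sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


alpha = 0.08
z = two_sample_z(2.67, 0.75, 65, 2.59, 0.66, 50)
critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)            # area to the right of z

print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p_value:.3f}")
print("reject H0" if z > critical else "fail to reject H0")
```

The p-value computed this way differs from the table-based 0.2709 only in the third decimal place, because the table rounds z to 0.61.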
(iii): A test of sobriety involves measuring the subject's motor skills. A sample of 20 randomly selected sober subjects has a mean of 41 and a standard deviation of 3.7. At the .01 level of significance, test the claim that the mean score for this test is equal to 35. State H0 and H1, calculate the test statistic and the critical values, and state your conclusion in plain English.
Solution:
Given $n = 20$, $\mu = 35$, $\bar{X} = 41$, $S = 3.7$ and $\alpha = 0.01$.
We have to test the hypotheses
H0: The mean score for the test is 35 (i.e., $\mu = 35$).
H1: The mean score for the test is not equal to 35 (i.e., $\mu \neq 35$).
Since n is small (< 30) and the population standard deviation is not given, we use the t statistic
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n-1}} = \frac{41 - 35}{3.7/\sqrt{19}} = 7.0684$$
From the t-table for the 0.01 level of significance (two-tailed) and 20 - 1 = 19 degrees of freedom, the critical value is $t_{\alpha/2} = 2.861$.
The decision rule is: reject H0 if $|t| > t_{\alpha/2}$.
Clearly $|t| = 7.0684 > 2.861$, so we reject the null hypothesis at the 1% level of significance. Hence we conclude that the mean score of the test is not equal to 35.
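The t statistic above divides by $S/\sqrt{n-1}$, a convention that treats S as computed with divisor n. Many texts take S to be the usual sample standard deviation (divisor n - 1) and divide by $S/\sqrt{n}$ instead. The sketch below evaluates both forms; the function name and the `divisor_n_minus_1` flag are illustrative, and the critical value is the table value quoted in the solution rather than something computed here.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sd, n, divisor_n_minus_1=True):
    """One-sample t statistic under either standard-error convention."""
    standard_error = sd / sqrt(n - 1) if divisor_n_minus_1 else sd / sqrt(n)
    return (sample_mean - mu0) / standard_error


T_CRITICAL = 2.861  # t-table value quoted in the solution: two-tailed alpha = 0.01, 19 df

t_text = t_statistic(41, 35, 3.7, 20)                          # convention used in the solution
t_alt = t_statistic(41, 35, 3.7, 20, divisor_n_minus_1=False)  # S / sqrt(n) convention

print(f"t (n - 1 form) = {t_text:.4f}, t (n form) = {t_alt:.4f}")
print("reject H0 under both conventions:", abs(t_text) > T_CRITICAL and abs(t_alt) > T_CRITICAL)
```

Both values are far beyond the critical value, so the conclusion does not depend on which convention is used.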
Confidence interval
estimation:
(i): A lawyer researched the average
number of years served by 45 different justices on the
Supreme Court. The average number of years served was
13.8 years with a standard deviation of 7.3 years. What
is the 95% confidence interval for the average number of
years served by all Supreme Court
Justices?
Solution:
Given $n = 45$, $\bar{X} = 13.8$, $SD = 7.3$ and level of significance $\alpha = 0.05$.
The confidence interval for the population mean is $\bar{X} \pm z_{\alpha/2} \cdot SD/\sqrt{n}$, where $z_{\alpha/2}$ is obtained from the standard normal table for the given level of significance. For $\alpha = 0.05$ we get $z_{\alpha/2} = 1.96$.
$$\bar{X} \pm z_{\alpha/2} \cdot \frac{SD}{\sqrt{n}} = 13.8 \pm 1.96 \cdot \frac{7.3}{\sqrt{45}} = 13.8 \pm 2.1 = (11.7, 15.9)$$
Hence the 95% confidence interval for the average number of years served by all Supreme Court Justices is (11.7, 15.9) years.
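A normal-based interval of this kind is easy to compute programmatically. The sketch below assumes the large-sample formula used in the solution; `z_interval` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sd, n, confidence=0.95):
    """Normal-based confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


low, high = z_interval(13.8, 7.3, 45)
print(f"({low:.1f}, {high:.1f})")
```

The same function applies to any problem that gives a sample mean, a standard deviation and a sample size large enough for the normal approximation.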
(ii): A
sample of 22 American households had an average monthly
cable bill of $56.50, with a standard deviation of
$22.15. Construct a 95% confidence interval for the
average cable bill of all American
households.
Solution:
Given $n = 22$, $\bar{X} = 56.5$, sample standard deviation $sd = 22.15$ and $\alpha = 0.05$.
Since the sample size is small (< 30) and the population standard deviation is not given, the 95% confidence interval for the mean is
$$\bar{X} \pm t_{\alpha/2} \cdot \frac{sd}{\sqrt{n-1}}$$
where $t_{\alpha/2}$ is obtained from the t-table for the 0.05 level of significance and $n - 1 = 21$ degrees of freedom: $t_{\alpha/2} = 2.08$.
$$\bar{X} \pm t_{\alpha/2} \cdot \frac{sd}{\sqrt{n-1}} = 56.5 \pm 2.08 \cdot \frac{22.15}{\sqrt{21}} = 56.5 \pm 10.0537 = (46.4463, 66.5537)$$
Hence the 95% confidence interval for the average cable bill of all American households is approximately (46.45, 66.55) dollars.
(iii): The mean weight
of trucks traveling on a particular section of I-81 is
not known. A state highway inspector needs an estimate
of the mean. He selects a random sample of 49 trucks
passing the weighing station near Cloverdale and finds
the mean is 15.8 tons, with a standard deviation of the
sample of 4.2 tons. What is the probability that a truck
will weigh less than 14.3 tons? What is the 95 percent
confidence interval for the population
mean?
Solution:
(1): Given $n = 49$, mean $= 15.8$ and $SD = 4.2$. Assuming individual truck weights are approximately normally distributed, the standard normal variable is
$$Z = \frac{X - \text{mean}}{SD}$$
$$P(X < 14.3) = P\left(\frac{X - 15.8}{4.2} < \frac{14.3 - 15.8}{4.2}\right) = P(Z < -0.3571)$$
$= 0.5 - P(-0.3571 < Z < 0) = 0.5 - P(0 < Z < 0.3571)$, since the normal curve is symmetric,
$= 0.5 - 0.1406 = 0.3594$
P(truck will weigh less than 14.3 tons) = 0.3594
(2): Given $n = 49$, mean $= 15.8$, $SD = 4.2$ and $\alpha = 0.05$.
The $100(1 - \alpha)\%$ confidence interval for the mean is $\bar{X} \pm z_{\alpha/2} \cdot SD/\sqrt{n}$, where $z_{\alpha/2}$ is obtained from the standard normal table for the 0.05 level of significance; here $z_{\alpha/2} = 1.96$.
$$\bar{X} \pm z_{\alpha/2} \cdot \frac{SD}{\sqrt{n}} = 15.8 \pm 1.96 \cdot \frac{4.2}{\sqrt{49}} = 15.8 \pm 1.176 = (14.624, 16.976)$$
Hence the 95% confidence interval for the population mean is (14.624, 16.976) tons.
Correlation
and regression:
(i): The manufacturer of Cardio
Glide exercise equipment wants to study the relationship
between the number of months since the glide was
purchased and the length of time the equipment was used
last week.
| Person | Months Owned | Hours Exercised |
| Rupple | 12 | 4 |
| Hall | 2 | 10 |
| Bennett | 6 | 8 |
| Longnecker | 9 | 5 |
| Phillips | 7 | 5 |
| Massa | 2 | 8 |
| Sass | 8 | 3 |
| Karl | 4 | 8 |
| Malrooney | 10 | 2 |
| Veights | 5 | 5 |
a. Plot the information on a scatter diagram. Let hours of exercise be the dependent variable. Comment on the graph.
b. Determine the correlation. Interpret.
c. At the .01 significance level, can we conclude that there is a negative association between the variables?
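Solution:
| Person | Months owned (X) | Hours exercised (Y) | X*X | Y*Y | X*Y |
| Rupple | 12 | 4 | 144 | 16 | 48 |
| Hall | 2 | 10 | 4 | 100 | 20 |
| Bennett | 6 | 8 | 36 | 64 | 48 |
| Longnecker | 9 | 5 | 81 | 25 | 45 |
| Phillips | 7 | 5 | 49 | 25 | 35 |
| Massa | 2 | 8 | 4 | 64 | 16 |
| Sass | 8 | 3 | 64 | 9 | 24 |
| Karl | 4 | 8 | 16 | 64 | 32 |
| Malrooney | 10 | 2 | 100 | 4 | 20 |
| Veights | 5 | 5 | 25 | 25 | 25 |
| Total | 65 | 58 | 523 | 396 | 313 |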
(a): The scatter diagram shows a negative relationship between the two variables, months owned and hours exercised: as the number of months since the glide was purchased increases, the hours of exercise tend to decrease.
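(b): Karl Pearson's coefficient of correlation is
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
Substituting the totals from the table and simplifying, we get
$$r = \frac{10(313) - (65)(58)}{\sqrt{\left[10(523) - 65^2\right]\left[10(396) - 58^2\right]}} = \frac{-640}{\sqrt{(1005)(596)}} = -0.8269$$
That is, there is a strong negative correlation between months owned and hours exercised.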
(c): The hypotheses to be tested are
H0: $\rho = 0$
H1: $\rho < 0$ (a significant negative correlation between the two variables; a one-tailed test)
The test statistic is
$$t = r\sqrt{\frac{n-2}{1-r^2}} = -0.8269\sqrt{\frac{10-2}{1-(-0.8269)^2}} \approx -4.159$$
From the t-table for the 0.01 level of significance and 10 - 2 = 8 degrees of freedom, the critical value is 2.896, so we reject H0 if t < -2.896.
Since the numerical value of the test statistic is greater than the critical value, we reject the null hypothesis at the 1% level of significance. Hence we conclude that there is a negative association between the number of months since the glide was purchased and the length of time the equipment was used last week.
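The correlation and its t statistic can be recomputed from the raw data. This is a sketch using the computational form of Pearson's r; `pearson_r` is an illustrative helper name, and the critical value must still come from a t-table.

```python
from math import sqrt

months = [12, 2, 6, 9, 7, 2, 8, 4, 10, 5]   # months owned (X)
hours = [4, 10, 8, 5, 5, 8, 3, 8, 2, 5]     # hours exercised (Y)


def pearson_r(xs, ys):
    """Pearson correlation from raw sums (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


r = pearson_r(months, hours)
t = r * sqrt((len(months) - 2) / (1 - r ** 2))   # test statistic on n - 2 df
print(f"r = {r:.4f}, t = {t:.2f}")
```

Carrying r at full precision moves t only in the third decimal place relative to the hand calculation, which does not affect the decision.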
(ii): $uper Markets, Inc. is considering expanding into the
Scottsdale, Arizona, area. Ms. Luann Miller, Director of
Planning, must present an analysis of the proposed
expansion to the operating committee of the board of
directors. As a part of her proposal, she needs to
include information on the amount people in the region
spend per month for grocery items. She would also like
to include information on the relationship between the
amount spent for grocery items and income. She gathered
the following sample information.
| Household | Monthly amount (y) | Monthly income (x) | Household | Monthly amount (y) | Monthly income (x) |
| 1 | 555 | 4,388 | 21 | 913 | 6,688 |
| 2 | 489 | 4,558 | 22 | 918 | 6,752 |
| 3 | 458 | 4,793 | 23 | 710 | 6,837 |
| 4 | 613 | 4,856 | 24 | 1,083 | 7,242 |
| 5 | 647 | 4,856 | 25 | 937 | 7,263 |
| 6 | 661 | 4,899 | 26 | 839 | 7,540 |
| 7 | 662 | 4,899 | 27 | 1,030 | 8,009 |
| 8 | 675 | 5,091 | 28 | 1,065 | 8,094 |
| 9 | 549 | 5,133 | 29 | 1,069 | 8,264 |
| 10 | 606 | 5,304 | 30 | 1,064 | 8,392 |
| 11 | 668 | 5,304 | 31 | 1,015 | 8,414 |
| 12 | 740 | 5,304 | 32 | 1,148 | 8,882 |
| 13 | 592 | 5,346 | 33 | 1,125 | 8,925 |
| 14 | 720 | 5,495 | 34 | 1,090 | 8,989 |
| 15 | 680 | 5,581 | 35 | 1,208 | 9,053 |
| 16 | 540 | 5,730 | 36 | 1,217 | 9,138 |
| 17 | 693 | 5,943 | 37 | 1,140 | 9,329 |
| 18 | 541 | 5,943 | 38 | 1,265 | 9,649 |
| 19 | 673 | 6,156 | 39 | 1,206 | 9,862 |
| 20 | 676 | 6,603 | 40 | 1,145 | 9,883 |
a. Let the amount spent be the dependent variable and monthly income the independent variable. Create a scatter diagram, using a software package.
b. Determine the regression equation. Interpret the slope value.
c. Determine the coefficient of correlation. Can you conclude that it is greater than 0?
Solution:
The data can be expressed in the following form:
| Household | Monthly amount (y) | Monthly income (x) |
| 1 | 555 | 4388 |
| 2 | 489 | 4558 |
| 3 | 458 | 4793 |
| 4 | 613 | 4856 |
| 5 | 647 | 4856 |
| 6 | 661 | 4899 |
| 7 | 662 | 4899 |
| 8 | 675 | 5091 |
| 9 | 549 | 5133 |
| 10 | 606 | 5304 |
| 11 | 668 | 5304 |
| 12 | 740 | 5304 |
| 13 | 592 | 5346 |
| 14 | 720 | 5495 |
| 15 | 680 | 5581 |
| 16 | 540 | 5730 |
| 17 | 693 | 5943 |
| 18 | 541 | 5943 |
| 19 | 673 | 6156 |
| 20 | 676 | 6603 |
| 21 | 913 | 6688 |
| 22 | 918 | 6752 |
| 23 | 710 | 6837 |
| 24 | 1083 | 7242 |
| 25 | 937 | 7263 |
| 26 | 839 | 7540 |
| 27 | 1030 | 8009 |
| 28 | 1065 | 8094 |
| 29 | 1069 | 8264 |
| 30 | 1064 | 8392 |
| 31 | 1015 | 8414 |
| 32 | 1148 | 8882 |
| 33 | 1125 | 8925 |
| 34 | 1090 | 8989 |
| 35 | 1208 | 9053 |
| 36 | 1217 | 9138 |
| 37 | 1140 | 9329 |
| 38 | 1265 | 9649 |
| 39 | 1206 | 9862 |
| 40 | 1145 | 9883 |
(a): In the scatter diagram, monthly income (X) is taken on the X-axis and monthly amount spent (Y) on the Y-axis.
Output-1:
| Measure | Value |
| Error measures | |
| Bias (mean error) | 0 |
| MAD (mean absolute deviation) | 67.5928 |
| MSE (mean squared error) | 6,439.08 |
| Standard error (denom = n-2-0 = 38) | 82.3285 |
| Regression line | |
| Dependent variable, Y | -74.3665 + 0.1339 * X |
| Statistics | |
| Correlation coefficient | 0.9447 |
| Coefficient of determination (r^2) | 0.8925 |
(b): The regression equation is Y = -74.3665 + 0.1339 * X.
Here the slope is 0.1339 and the intercept is -74.3665.
The slope is the change in the dependent variable (Y) associated with a one-unit change in the independent variable (X). Here, each additional dollar of monthly income is associated with an increase of about $0.1339 (roughly 13 cents) in the monthly amount spent on grocery items.
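To make the slope interpretation concrete, the fitted line can be evaluated at two incomes. The coefficients are those reported in Output-1; the income values 5,000 and 5,100 and the function name are hypothetical, chosen only for illustration.

```python
INTERCEPT = -74.3665   # from the regression output
SLOPE = 0.1339         # from the regression output


def predicted_grocery_spend(monthly_income):
    """Predicted monthly grocery amount from the fitted line."""
    return INTERCEPT + SLOPE * monthly_income


base = predicted_grocery_spend(5000)            # hypothetical income
step = predicted_grocery_spend(5100) - base     # effect of 100 more in income

print(f"predicted spend at 5000: {base:.2f}")
print(f"change for +100 income: {step:.2f}")
```

The change for a 100-unit rise in income is 100 times the slope, which is the sense in which the slope measures change in Y per unit of X. Predictions are only meaningful for incomes inside the sampled range (roughly 4,400 to 9,900).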
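(c): The coefficient of correlation is r = 0.9447, so there is a very high positive correlation between monthly income and the monthly amount spent on grocery items.
To check whether the population correlation is greater than 0, test H0: $\rho \le 0$ against H1: $\rho > 0$ with
$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.9447\sqrt{\frac{40-2}{1-0.8925}} \approx 17.76$$
on 40 - 2 = 38 degrees of freedom. Taking the 0.05 level (the problem does not state one), the one-tailed critical value is about 1.69. The test statistic is far above it, so we conclude that the correlation is greater than 0.
Coefficient of determination: $r^2 = 0.8925$. It is the proportion of the variation in Y (monthly amount spent) that is explained by its linear relationship with X (monthly income), here about 89%.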
Probability:
(i): A study of vehicle flow characteristics on acceleration lanes found that one out of every six vehicles uses less than one-third of the acceleration lane before merging into the traffic. Suppose we monitor the location of the merge for the next five vehicles that enter the acceleration lane.
a. What is the probability that none of the vehicles will use less than one-third of the acceleration lane?
b. What is the probability that exactly two of the vehicles will use less than one-third of the acceleration lane?
Solution:
P(a vehicle uses less than one-third of the acceleration lane before merging into the traffic) = 1/6
We can use the binomial distribution to solve this problem. It is given by
$$P(x) = \binom{n}{x} p^x (1-p)^{n-x}$$
with $n = 5$ and $p = 1/6$.
(a): P(none of the vehicles will use less than one-third of the acceleration lane) $= P(X = 0) = \binom{5}{0}(1/6)^0(1 - 1/6)^{5-0} = 0.4019$
(b): P(exactly two of the vehicles will use less than one-third of the acceleration lane) $= P(X = 2) = \binom{5}{2}(1/6)^2(1 - 1/6)^{5-2} = 0.1608$
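Both binomial probabilities can be verified with `math.comb`. A minimal sketch, with `binomial_pmf` as an illustrative helper name:

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_none = binomial_pmf(0, 5, 1 / 6)   # no vehicle uses less than one-third of the lane
p_two = binomial_pmf(2, 5, 1 / 6)    # exactly two vehicles do

print(round(p_none, 4), round(p_two, 4))
```

Summing `binomial_pmf(k, 5, 1 / 6)` over k = 0 to 5 gives 1, a quick sanity check on the formula.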
(ii): Assume that men's heights are normally distributed with a mean of 69.0 inches and a standard deviation of 2.8 inches.
a) If 1 man is randomly selected, what is the probability that his height is less than 78.0 inches?
b) If 15 men are randomly selected, what is the probability that their mean height is less than 70.0 inches?
Solution:
Given mean = 69 and SD = 2.8.
(a): Let X denote the height of the person selected. Then we have to find the probability P(X < 78).
$$P(X < 78) = P\left(\frac{X - \text{mean}}{SD} < \frac{78 - 69}{2.8}\right) = P(Z < 3.21)$$
$= 0.5 + P(0 < Z < 3.21)$, using the normal curve,
$= 0.5 + 0.4993 = 0.9993$
P(X < 78) = 0.9993
(b): With n = 15,
$$P(\bar{X} < 70) = P\left(\frac{\bar{X} - \text{mean}}{SD/\sqrt{n}} < \frac{70 - 69}{2.8/\sqrt{15}}\right) = P(Z < 1.383)$$
$= 0.5 + P(0 < Z < 1.383)$, since 70 lies above the mean,
$= 0.5 + 0.4162 = 0.9162$
P(mean height is less than 70 inches) = 0.9162
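Both probabilities can be evaluated with `statistics.NormalDist`, which avoids table lookups. The sketch relies on the standard result that the mean of n independent normal observations is normal with standard deviation SD/sqrt(n). Because 70 is above the population mean of 69, the standardised value in part (b) is positive and the probability must exceed 0.5.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD = 69.0, 2.8

# (a) one randomly selected man
height = NormalDist(MEAN, SD)
p_single = height.cdf(78.0)

# (b) mean height of 15 randomly selected men
mean_of_15 = NormalDist(MEAN, SD / sqrt(15))
p_mean = mean_of_15.cdf(70.0)

z_b = (70.0 - MEAN) / (SD / sqrt(15))   # standardised value for part (b)
print(f"P(X < 78) = {p_single:.4f}")
print(f"z = {z_b:.3f}, P(mean < 70) = {p_mean:.4f}")
```

Values computed this way can differ from table-based answers in the third or fourth decimal place, because tables round z to two decimals.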
(iii): We select a card at random from a deck of 52 playing cards. Suppose we want to find the probability that the selected card is either a spade or a diamond.
P(selected card is either a spade or a diamond) = P(selected card is a spade) + P(selected card is a diamond) = 13/52 + 13/52 = 26/52
In this case the events are mutually exclusive.
Now suppose we want to find the probability that the selected card is a spade or a king.
P(selected card is a spade or a king) = P(spade) + P(king) - P(spade and king)
= 13/52 + 4/52 - 1/52 = 16/52
Here the events are not mutually exclusive, so we use the general addition rule.
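The two addition-rule results can be confirmed by enumerating the deck and counting. This is an illustrative sketch; the rank and suit labels and the `probability` helper are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(RANKS, SUITS))   # 52 (rank, suit) pairs


def probability(event):
    """Exact probability that a randomly drawn card satisfies `event`."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


p_spade_or_diamond = probability(lambda card: card[1] in ("spades", "diamonds"))
p_spade_or_king = probability(lambda card: card[1] == "spades" or card[0] == "K")

print(p_spade_or_diamond, p_spade_or_king)
```

Counting directly handles the overlap (the king of spades) automatically, which is exactly what the subtraction term in the general addition rule accounts for. `Fraction` reduces 26/52 and 16/52 to lowest terms when printed.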