Statistics Zone
 

                                                     Statistics

                                                

ANOVA:

(i): Three types of computer disks are selected and the number of defects in each is as follows. Type A has 0,1,0,2,3,2,0,1,1, and 0. 
Type B has 2,0,3,5,3,4,6,0,2 and 5. 
Type C has 1,0,1,1,0,2,0,0,1, and 2. 
Hypothesis: There is no significant difference in the mean of the number of defects for the three groups. 
Solution:

Type A   Type B   Type C
0        2        1
1        0        0
0        3        1
2        5        1
3        3        0
2        4        2
0        6        0
1        0        0
1        2        1
0        5        2

Totals (Xi.):   10   30   8



Here we have to test the hypothesis 

Ho: There is no significant difference between the mean numbers of defects of the three groups.

H1: There is significant difference between the mean numbers of defects of the three groups.

Grand total, G = 10 + 30 + 8 = 48

Correction Factor (CF) = G²/n = 48*48/30 = 76.8

Total Sum of squares (TSS) = ΣXij² - CF

= 160 - 76.8 (i.e., the sum of the squares of all the observations minus CF)

TSS = 83.2

Treatment sum of squares (tss) = Σ(Xi.²/ni) - CF

= 10²/10 + 30²/10 + 8²/10 - 76.8

tss = 106.4 - 76.8 = 29.6

Error sum of squares (ESS) = TSS - tss = 83.2 - 29.6 = 53.6


ANOVA table:

Source      df   SS     MSS               Cal F                  Tab F
Treatment   2    29.6   29.6/2 = 14.8     14.8/1.985 = 7.4559    F0.05(2, 27) = 3.35
Error       27   53.6   53.6/27 = 1.985
Total       29   83.2

Value of the test statistic, F = 7.4559

Critical value F0.05(2, 27) = 3.35, which is obtained from the F-table for the 0.05 level of significance and (2, 27) degrees of freedom.

The decision rule is: reject the null hypothesis if calculated F > F0.05(2, 27).

Since the calculated F (7.4559) is greater than the table value (3.35), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a significant difference between the mean numbers of defects of the three groups.
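The hand calculation above can be checked with a short script. This is a minimal sketch of the same correction-factor method; the function name `one_way_anova` and the variable names are illustrative, not part of the original solution.

```python
# One-way ANOVA using the correction-factor shortcut:
# CF = G^2/n, TSS = sum of squares - CF, tss = sum(Xi.^2/ni) - CF.
groups = {
    "A": [0, 1, 0, 2, 3, 2, 0, 1, 1, 0],
    "B": [2, 0, 3, 5, 3, 4, 6, 0, 2, 5],
    "C": [1, 0, 1, 1, 0, 2, 0, 0, 1, 2],
}


def one_way_anova(samples):
    """Return (total SS, treatment SS, error SS, F) for a list of samples."""
    n = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    cf = grand_total ** 2 / n                                  # correction factor
    ss_total = sum(x * x for s in samples for x in s) - cf
    ss_treat = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ss_error = ss_total - ss_treat
    df_treat = len(samples) - 1
    df_error = n - len(samples)
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return ss_total, ss_treat, ss_error, f_stat


ss_total, ss_treat, ss_error, f_stat = one_way_anova(list(groups.values()))
print(round(ss_total, 1), round(ss_treat, 1), round(ss_error, 1), round(f_stat, 3))
```

The script works with the unrounded error mean square, so its F value is about 7.455; the table's 7.4559 comes from dividing by the rounded value 1.985.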




(ii): 
To study the effect of nematodes (microscopic worms) on plant growth, an OSU botanist prepared 16 identical planting pots. A different number of nematodes was introduced into each pot. In each pot a tomato seedling was planted and its height measured (in centimeters) after 16 weeks. The results were: 

Nematodes   Seedling height (cm)
0           10.8   9.1    13.5   9.2
1,000       11.1   11.1   8.2    11.3
5,000       5.4    4.6    7.4    5.0
10,000      5.4    4.6    7.4    5.0

1. What is the F-value from the test? 
2. What is the p-value from the test? 
3. From the test results your conclusion is. . . ? 
Solution:

I      II     III    IV
10.8   11.1   5.4    5.4
9.1    11.1   4.6    4.6
13.5   8.2    7.4    7.4
9.2    11.3   5.0    5.0

Totals (Xi.):   42.6   41.7   22.4   22.4

Here we have to test the hypothesis 

Ho: there is no significant difference in the mean height of plants.

Ha: there is significant difference in the mean height of plants.


Grand total, G = 42.6 + 41.7 + 22.4 + 22.4 = 129.1

Correction Factor (CF) = G²/n = 129.1*129.1/16 = 1041.68

Total Sum of squares (TSS) = ΣXij² - CF

= 10.8² + 9.1² + 13.5² + … + 7.4² + 5² - CF

TSS = 1167.85 - 1041.68

= 126.17

Treatment sum of squares (tss) = Σ(Xi.²/ni) - CF

= 42.6²/4 + 41.7²/4 + 22.4²/4 + 22.4²/4 - 1041.68

= 1139.29 - 1041.68

tss = 97.61

Error sum of squares (ESS) = TSS - tss = 126.17 - 97.61

ESS = 28.56

ANOVA table:

Source      df   SS       MSS                  Cal F                   Tab F
Treatment   3    97.61    97.61/3 = 32.5367    32.5367/2.38 = 13.67    F0.05(3, 12) = 3.49
Error       12   28.56    28.56/12 = 2.38
Total       15   126.17

Value of the test statistic, F = 13.67

Critical value F0.05(3, 12) = 3.49, which is obtained from the F-table for the 0.05 level of significance and (3, 12) degrees of freedom. 

P-value = 0.0004, from the p-value calculator.

The decision rule is: reject the null hypothesis if calculated F > F0.05(3, 12).

Since the calculated F (13.67) is greater than the table value (3.49), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a significant difference in the mean height of plants. That is, the number of nematodes affects plant growth.
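The same F value can be reached without the correction factor, by summing squared deviations from the group means and the grand mean. The sketch below uses that deviation form as a cross-check on the arithmetic above; variable names are illustrative, and the data are the four rows exactly as tabulated in this solution.

```python
from statistics import mean

# Seedling heights (cm) for the four groups as tabulated above.
heights = [
    [10.8, 9.1, 13.5, 9.2],
    [11.1, 11.1, 8.2, 11.3],
    [5.4, 4.6, 7.4, 5.0],
    [5.4, 4.6, 7.4, 5.0],
]

grand_mean = mean(x for g in heights for x in g)

# Between-group (treatment) sum of squares: n_i * (group mean - grand mean)^2
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in heights)

# Within-group (error) sum of squares: deviations from each group's own mean
ss_within = sum((x - mean(g)) ** 2 for g in heights for x in g)

df_between = len(heights) - 1
df_within = sum(len(g) for g in heights) - len(heights)

f_value = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 2), round(ss_within, 2), round(f_value, 2))
```

The two forms are algebraically identical, so small differences from the hand-computed sums of squares are only rounding of the correction factor.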



Testing of hypothesis:

(1): An insurance company, based on past experience, estimates the mean damage for a natural disaster 
in its area is $5,000. After introducing several plans to prevent loss, they randomly sample 
200 policyholders and find the mean amount per claim was $4,800 with a standard deviation of 
$1,300. Does it appear the prevention plans were effective in reducing the mean amount of a 
claim? Use the .05 significance level. 

1. H0 and H1; 
2. the critical value(s); 
3. the test statistic; 
4. the decision to reject or to fail to reject; 
5. the p-value, if requested

Solution:

Given µ = 5000, n = 200, X(bar) = 4800, S = 1300 and α = 0.05

The hypothesis to be tested is 

Ho: The prevention plans did not change the mean amount (i.e., µ = 5000)

H1: The prevention plans were effective in reducing the mean amount (i.e., µ < 5000)

The test statistic is 

Z = [X(bar) - µ]/(S/√n)

= [4800 - 5000]/(1300/√200)

= -2.1757

From the standard normal table for the 0.05 level of significance we get Zα = 1.645, so the critical value for this left-tailed test is -Zα = -1.645.

The decision rule is: reject Ho if Z < -Zα.

Since Z = -2.1757 < -1.645, we reject the null hypothesis at the 5% level of significance. Hence we conclude that the prevention plans were effective in reducing the mean amount.

P-value = P(Z < -2.1757)

= 0.5 - P(-2.1757 < Z < 0)
= 0.5 - P(0 < Z < 2.1757), since the normal curve is symmetric.

= 0.5 - 0.4854 = 0.0146

P-value = 0.0146
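As a cross-check, the left-tailed Z test can be reproduced with Python's standard library. This is a sketch under the same large-sample assumption as the solution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, n, xbar, s, alpha = 5000, 200, 4800, 1300, 0.05

z = (xbar - mu0) / (s / sqrt(n))          # test statistic
z_crit = NormalDist().inv_cdf(alpha)      # lower-tail critical value, about -1.645
p_value = NormalDist().cdf(z)             # P(Z < z) for a left-tailed test
reject_h0 = z < z_crit

print(round(z, 4), round(z_crit, 3), round(p_value, 4), reject_h0)
```

The exact normal CDF gives a p-value of about 0.0148; the table lookup above rounds Z to 2.18, which is why it reads 0.0146.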



(ii): A sample of 65 observations is selected from one population. The sample mean is 2.67 and the sample standard deviation is 0.75. A sample of 50 observations is selected from a second population. The sample mean is 2.59 and the sample standard deviation is 0.66. Conduct the following test of hypothesis using the .08 significance level: 
Ho: µ1 is less than or equal to µ2 
H1: µ1 is greater than µ2 

a. Is this a one-tailed or two-tailed test? 
b. State the decision rule. 
c. Compute the value of the test statistic. 
d. What is your decision regarding Ho? 
e. What is the p-value?

Solution:

Given n1 = 65, n2 = 50, X1(bar) = 2.67, S1 = 0.75, X2(bar) = 2.59 and S2 = 0.66.

Also α = 0.08

The hypothesis to be tested is

Ho: µ1 is less than or equal to µ2 
H1: µ1 is greater than µ2 

The test statistic is

Z = [X1(bar) - X2(bar)]/SQRT[S1²/n1 + S2²/n2]

= [2.67 - 2.59]/SQRT[0.75²/65 + 0.66²/50]

= 0.6070

From the standard normal table for the 0.08 level of significance we get critical value = 1.405

The decision rule is: reject Ho if Z > critical value

Since Z < critical value, we fail to reject the null hypothesis at the 8% level of significance. That is, the data do not provide sufficient evidence that the mean of the first population is greater than the mean of the second population.

P-value = P(Z > 0.6070)

= 0.5 - P(0 < Z < 0.6070)

= 0.5 - 0.2291

= 0.2709

Note:

(a): It is a one-tailed (right-tailed) test.

(b): The decision rule is: reject Ho if Z > critical value (1.405).

(c): Z = 0.6070

(d): Since Z < critical value, we fail to reject the null hypothesis at the 8% level of significance. There is not sufficient evidence that µ1 is greater than µ2.

(e): P-value = 0.2709
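The two-sample Z test can be verified in the same way. The sketch below assumes, as the solution does, that both samples are large enough for the normal approximation; names such as `std_error` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, xbar1, s1 = 65, 2.67, 0.75
n2, xbar2, s2 = 50, 2.59, 0.66
alpha = 0.08

std_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of the difference
z = (xbar1 - xbar2) / std_error                 # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)        # upper-tail critical value
p_value = 1 - NormalDist().cdf(z)               # P(Z > z) for a right-tailed test
reject_h0 = z > z_crit

print(round(z, 4), round(z_crit, 3), round(p_value, 4), reject_h0)
```

The exact p-value is about 0.272; the table-based 0.2709 differs only because the table is read at Z = 0.61.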


(iii): A test of sobriety involves measuring the subject's motor skills. A sample of 20 randomly selected sober subjects has a mean of 41 and a standard deviation of 3.7. At the .01 level of significance, test the claim that the mean score for this test is equal to 35. 
For the problem, state H0 and H1. Calculate the test statistic and the critical values. State your conclusion in plain English.

Solution:

Given n = 20, µ = 35, X(bar) = 41, S = 3.7 and α = 0.01

We have to test the hypothesis

Ho: The mean score for the test is 35 (i.e., µ = 35)

H1: The mean score for the test is not equal to 35 (i.e., µ ≠ 35)

Since n is small (< 30) and the population standard deviation is not given, we can use the test statistic

t = [X(bar) - µ]/[S/√(n-1)]

= [41 - 35]/[3.7/√19]

= 7.0684

This form treats S as computed with divisor n. If S was computed with divisor n - 1, the statistic is [X(bar) - µ]/[S/√n] = 7.2521, and the conclusion is the same.

From the t-table for the 0.01 level of significance and 20 - 1 = 19 degrees of freedom we get the critical value, tα/2 = 2.861

The decision rule is: reject Ho if |t| > tα/2

Clearly |t| > tα/2, so we reject the null hypothesis at the 1% level of significance. Hence we conclude that the mean score for the test is not equal to 35.
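The value of t depends on how the sample standard deviation was computed. The sketch below evaluates the statistic under both conventions (S computed with divisor n, and S computed with divisor n - 1) and compares each with the table value quoted above. The critical value is copied from the solution, not computed, because the standard library has no t distribution.

```python
from math import sqrt

n, mu0, xbar, s = 20, 35, 41, 3.7
t_crit = 2.861  # two-tailed critical value for 19 df at the 0.01 level (from the t-table)

# Convention used in the solution: S computed with divisor n
t_divisor_n = (xbar - mu0) / (s / sqrt(n - 1))

# Alternative convention: S computed with divisor n - 1
t_divisor_n_minus_1 = (xbar - mu0) / (s / sqrt(n))

for label, t in (("S with divisor n", t_divisor_n),
                 ("S with divisor n-1", t_divisor_n_minus_1)):
    print(label, round(t, 4), "reject Ho" if abs(t) > t_crit else "fail to reject Ho")
```

Either way the statistic is far beyond 2.861, so the decision does not hinge on the convention.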


Confidence interval estimation:

(i): A lawyer researched the average number of years served by 45 different justices on the Supreme Court. The average number of years served was 13.8 years with a standard deviation of 7.3 years. What is the 95% confidence interval for the average number of years served by all Supreme Court Justices?

Solution:

Given that n = 45, X(bar) = 13.8, SD = 7.3 and level of significance α = 0.05. 

The confidence interval for the population mean is [X(bar) ± Zα/2 * SD/√n], where Zα/2 can be obtained from the standard normal table for a given level of significance. Here, for the 0.05 level of significance, we get Zα/2 = 1.96.

[X(bar) ± Zα/2 * SD/√n] = [13.8 ± 1.96 * 7.3/√45]

= [13.8 ± 2.1]

= (11.7, 15.9)

Hence the 95% confidence interval for the average number of years served by all Supreme Court Justices is (11.7, 15.9) years.
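The interval can be reproduced with a small helper. This is a sketch; `z_interval` is a hypothetical name, and it uses the exact normal quantile (about 1.95996) rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean: xbar +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin


low, high = z_interval(13.8, 7.3, 45)
print(round(low, 2), round(high, 2))
```

To two decimals the limits are about 11.67 and 15.93, which round to the (11.7, 15.9) reported above.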


(ii): A sample of 22 American households had an average monthly cable bill of $56.50, with a standard deviation of $22.15. Construct a 95% confidence interval for the average cable bill of all American households.

Solution:

Given n = 22, X(bar) = 56.5, sample standard deviation, sd = 22.15

Also α = 0.05

Since the sample size is small (< 30) and the population standard deviation is not given, the 95% confidence interval for the mean is given by

[X(bar) ± tα/2 * sd/√(n-1)], where tα/2 can be obtained from the t-table for the 0.05 level of significance and n - 1 = 21 degrees of freedom and is given by tα/2 = 2.08. This form treats sd as computed with divisor n.

[X(bar) ± tα/2 * sd/√(n-1)] = [56.5 ± 2.08 * 22.15/√21]

= [56.5 ± 2.08 * 4.8335]

= [56.5 ± 10.0537]

= (46.4463, 66.5537)

Hence the 95% confidence interval for the average cable bill of all American households is ($46.45, $66.55).
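A quick way to check the margin of error is to evaluate tα/2 * sd/√(n-1) directly. The sketch below takes the critical value 2.080 from the t-table, as the solution does, because the standard library has no t distribution; variable names are illustrative.

```python
from math import sqrt

n, xbar, sd = 22, 56.5, 22.15
t_crit = 2.080  # t-table value for 21 degrees of freedom, two-tailed 0.05

std_error = sd / sqrt(n - 1)       # sd treated as computed with divisor n
margin = t_crit * std_error        # half-width of the interval
interval = (xbar - margin, xbar + margin)

print(round(std_error, 4), round(margin, 4))
print(tuple(round(limit, 2) for limit in interval))
```

With these inputs the standard error is about 4.83 and the margin about 10.05, giving limits near $46.45 and $66.55.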


(iii): The mean weight of trucks traveling on a particular section of I-81 is not known. A state highway inspector needs an estimate of the mean. He selects a random sample of 49 trucks passing the weighing station near Cloverdale and finds the mean is 15.8 tons, with a standard deviation of the sample of 4.2 tons. What is the probability that a truck will weigh less than 14.3 tons? What is the 95 percent confidence interval for the population mean?

Solution:


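(1): Given n = 49, mean = 15.8 and SD = 4.2

Treating the weight X of an individual truck as approximately normal with this mean and standard deviation (an assumption the problem leaves implicit), we have the standard normal variable,

Z = [X - mean]/SD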

P(X < 14.3) = P{ (X – mean)/SD < (14.3 – mean)/SD}

= P{ (X – 15.8)/4.2 < (14.3 – 15.8)/4.2}

= P(Z < - 0.3571)

= 0.5 – P(-0.3571 < Z < 0)

= 0.5 – P(0 < Z < 0.3571), since normal curve is symmetric.

= 0.5 – 0.1406 = 0.3594

P(truck will weigh less than 14.3 tons) = 0.3594

(2): Given n = 49, mean = 15.8 and SD = 4.2 and α = 0.05

The 100(1 - α)% confidence interval for the mean is [X(bar) ± Zα/2 * SD/√n]

Where Zα/2 can be obtained from the standard normal table for the 0.05 level of significance. Here it is obtained as Zα/2 = 1.96

[X(bar) ± Zα/2 * SD/√n] = [15.8 ± 1.96 * 4.2/√49]

= [15.8 ± 1.176]

= (14.624, 16.976)

Hence the 95% confidence interval for the population mean is (14.624, 16.976) tons.
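Both parts of the truck problem can be checked together. The sketch below assumes individual truck weights are normally distributed with the sample mean and standard deviation, which is the working assumption behind part (1); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_w, sd_w, n = 15.8, 4.2, 49

# Part (1): probability that a single truck weighs less than 14.3 tons
weights = NormalDist(mu=mean_w, sigma=sd_w)
p_below = weights.cdf(14.3)

# Part (2): 95% confidence interval for the population mean
z = NormalDist().inv_cdf(0.975)
margin = z * sd_w / sqrt(n)
interval = (mean_w - margin, mean_w + margin)

print(round(p_below, 4))
print(tuple(round(limit, 3) for limit in interval))
```

The exact CDF gives a probability of about 0.3605; the 0.3594 above comes from reading the table at Z = 0.36. Note that part (1) uses SD, because it concerns one truck, while part (2) uses SD/√n, because it concerns the mean.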

Correlation and regression:

(i): The manufacturer of Cardio Glide exercise equipment wants to study the relationship between the number of months since the glide was purchased and the length of time the equipment was used last week.


Person        Months Owned   Hours Exercised
Rupple        12             4
Hall          2              10
Bennett       6              8
Longnecker    9              5
Phillips      7              5
Massa         2              8
Sass          8              3
Karl          4              8
Malrooney     10             2
Veights       5              5


a. Plot the information on a scatter diagram. Let hours of exercise be the dependent variable. Comment on the graph.
b. Determine the correlation. Interpret.
c. At the .01 significance level, can we conclude that there is a negative association between the variables? 

Solution:

Person        Months owned(X)   Hours exercised(Y)   X*X   Y*Y   X*Y
Rupple        12                4                    144   16    48
Hall          2                 10                   4     100   20
Bennett       6                 8                    36    64    48
Longnecker    9                 5                    81    25    45
Phillips      7                 5                    49    25    35
Massa         2                 8                    4     64    16
Sass          8                 3                    64    9     24
Karl          4                 8                    16    64    32
Malrooney     10                2                    100   4     20
Veights       5                 5                    25    25    25
Total         65                58                   523   396   313









(a):

[Scatter diagram: months owned (X) against hours exercised (Y)]

From the scatter diagram we can say that there is a negative correlation between months owned and hours exercised. That is, as the number of months since the glide was purchased increases, the hours of exercise tend to decrease.

(b): We have Karl Pearson's coefficient of correlation,

r = [nΣXY - (ΣX)(ΣY)] / √{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}

= [10(313) - (65)(58)] / √{[10(523) - 65²][10(396) - 58²]}

Substituting and simplifying we get, r = -0.8269

That is, there is a strong negative correlation between months owned and hours exercised.

(c): The hypothesis to be tested is,

Ho: ρ = 0
H1: ρ < 0 (a significant negative correlation between the two variables; a one-tailed test), where ρ is the population correlation coefficient.
The test statistic is,
t = r √[(n-2)/(1-r²)]
= -0.8269 * √[(10-2)/(1-(-0.8269)²)]
= -4.1593
From the t-table for the 0.01 level of significance and 10-2 = 8 degrees of freedom we get the critical value = 2.896
Since t = -4.1593 is less than -2.896, we reject the null hypothesis at the 1% level of significance. Hence we conclude that there is a negative association between the number of months since the glide was purchased and the length of time the equipment was used last week.
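The correlation coefficient and its t statistic can be recomputed from the raw data. This sketch uses the computational form of Karl Pearson's formula built from the column totals; `pearson_r` is an illustrative name.

```python
from math import sqrt

months = [12, 2, 6, 9, 7, 2, 8, 4, 10, 5]   # months owned (X)
hours = [4, 10, 8, 5, 5, 8, 3, 8, 2, 5]     # hours exercised (Y)


def pearson_r(x, y):
    """Pearson correlation from the sums n, ΣX, ΣY, ΣX², ΣY², ΣXY."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


r = pearson_r(months, hours)
n = len(months)
t_stat = r * sqrt((n - 2) / (1 - r ** 2))   # test statistic with n - 2 df

print(round(r, 4), round(t_stat, 2))
```

Using the unrounded r gives t of about -4.16, in line with the hand calculation, which starts from r rounded to four decimals.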

(ii): $uper Markets, Inc. is considering expanding into the Scottsdale, Arizona, area. Ms. Luann Miller, Director of Planning, must present an analysis of the proposed expansion to the operating committee of the board of directors. As a part of her proposal, she needs to include information on the amount people in the region spend per month for grocery items. She would also like to include information on the relationship between the amount spent for grocery items and income. She gathered the following sample information.

Household  Monthly amount(y)  Monthly income(x)    Household  Monthly amount(y)  Monthly income(x)
1          555                4,388                21         913                6,688
2          489                4,558                22         918                6,752
3          458                4,793                23         710                6,837
4          613                4,856                24         1,083              7,242
5          647                4,856                25         937                7,263
6          661                4,899                26         839                7,540
7          662                4,899                27         1,030              8,009
8          675                5,091                28         1,065              8,094
9          549                5,133                29         1,069              8,264
10         606                5,304                30         1,064              8,392
11         668                5,304                31         1,015              8,414
12         740                5,304                32         1,148              8,882
13         592                5,346                33         1,125              8,925
14         720                5,495                34         1,090              8,989
15         680                5,581                35         1,208              9,053
16         540                5,730                36         1,217              9,138
17         693                5,943                37         1,140              9,329
18         541                5,943                38         1,265              9,649
19         673                6,156                39         1,206              9,862
20         676                6,603                40         1,145              9,883


a. Let the amount spent be the dependent variable and monthly income the independent variable. Create a scatter diagram, using a software package. 
b. Determine the regression equation. Interpret the slope value.
c. Determine the coefficient. Can you conclude that it is greater than 0. 


Solution:

The data can be expressed in the following form

Household Monthly amount(y) Monthly income(x)
1 555 4388
2 489 4558
3 458 4793
4 613 4856
5 647 4856
6 661 4899
7 662 4899
8 675 5091
9 549 5133
10 606 5304
11 668 5304
12 740 5304
13 592 5346
14 720 5495
15 680 5581
16 540 5730
17 693 5943
18 541 5943
19 673 6156
20 676 6603
21 913 6688
22 918 6752
23 710 6837
24 1083 7242
25 937 7263
26 839 7540
27 1030 8009
28 1065 8094
29 1069 8264
30 1064 8392
31 1015 8414
32 1148 8882
33 1125 8925
34 1090 8989
35 1208 9053
36 1217 9138
37 1140 9329
38 1265 9649
39 1206 9862
40 1145 9883
     




(a): In the scatter diagram, monthly income (X) is taken on the X-axis and the monthly amount spent (Y) is taken on the Y-axis.




Output-1:

Measure                                Value
Error Measures
Bias (Mean Error)                      0
MAD (Mean Absolute Deviation)          67.5928
MSE (Mean Squared Error)               6,439.08
Standard Error (denom=n-2-0=38)        82.3285
Regression line
Dpndnt var, Y = -74.3665 + 0.1339 * X
Statistics
Correlation coefficient                0.9447
Coefficient of determination (r^2)     0.8925

(b): The regression equation is Y = - 74.3665 + 0.1339 * X

Here the slope = 0.1339 and intercept = - 74.3665.

The slope is the change in the dependent variable (Y) associated with a one-unit change in the independent variable (X). Here, each additional $1 of monthly income is associated with an increase of about $0.1339 in the monthly amount spent on grocery items; equivalently, an extra $100 of income goes with about $13.39 more spent on groceries.
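To make the slope concrete, the fitted line can be evaluated at a couple of incomes. The sketch below takes the intercept and slope from Output-1 as given; the function name `predicted_amount` and the chosen incomes ($5,000 and $5,100) are illustrative.

```python
# Coefficients copied from the regression output above.
intercept, slope = -74.3665, 0.1339


def predicted_amount(income):
    """Predicted monthly grocery spending for a given monthly income."""
    return intercept + slope * income


at_5000 = predicted_amount(5000)
extra_per_100 = predicted_amount(5100) - at_5000   # effect of $100 more income

print(round(at_5000, 2), round(extra_per_100, 2))
```

Predictions are only meaningful within the observed income range (roughly $4,400 to $9,900); the negative intercept is an extrapolation to zero income and should not be read literally.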

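(c): Here the coefficient of correlation, r = 0.9447

Since r = 0.9447, there is a very high positive correlation between monthly income and the monthly amount spent on grocery items.

To check whether the correlation is greater than 0, test Ho: ρ ≤ 0 against H1: ρ > 0 using t = r √[(n-2)/(1-r²)] = 0.9447 * √[38/(1-0.8925)], which is about 17.8. This far exceeds the one-tailed 0.05 critical value of about 1.69 for 38 degrees of freedom, so we conclude that the correlation is greater than 0.

Coefficient of determination, r² = 0.8925

That is, about 89.25% of the variation in the monthly amount spent on grocery items (Y) is explained by monthly income (X).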



Probability:

(i): A study of vehicle flow characteristics on acceleration lanes found that one out of every six vehicles uses less than one-third of the acceleration lane before merging into the traffic. Suppose we monitor the location of the merge for the next five vehicles that enter the acceleration lane. 
a. What is the probability that none of the vehicles will use less than one-third of the acceleration lanes? 
b. What is the probability that exactly two of the vehicles will use less than one-third of the acceleration lane? 
Solution:

P(a vehicle uses less than one-third of the acceleration lane before merging into the traffic) = 1/6

Here we can use the binomial distribution to solve this problem. It is given by,

P(x) = nCx P^x (1-P)^(n-x)

Here we can use n = 5 and P = 1/6

(a): P(none of the vehicles will use less than one-third of the acceleration lane) = P(X = 0)

= 5C0 (1/6)^0 (1-1/6)^(5-0)

= 0.4019

The probability that none of the vehicles will use less than one-third of the acceleration lane = 0.4019

(b): P(exactly two of the vehicles will use less than one-third of the acceleration lane) = P(X = 2)

= 5C2 (1/6)^2 (1-1/6)^(5-2)

= 0.1608

The probability that exactly two of the vehicles will use less than one-third of the acceleration lane = 0.1608
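The two binomial probabilities can be confirmed with `math.comb`. This is a minimal sketch; `binomial_pmf` is an illustrative helper name.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_none = binomial_pmf(0, 5, 1 / 6)   # part (a)
p_two = binomial_pmf(2, 5, 1 / 6)    # part (b)

print(round(p_none, 4), round(p_two, 4))
```

As a sanity check, summing `binomial_pmf(k, 5, 1/6)` over k = 0 to 5 should give 1.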


(ii): Assume that men's heights are normally distributed with a mean of 69.0 inches and a standard deviation of 2.8 inches. 
a) If 1 man is randomly selected, what is the probability that his height is less than 78.0 inches? 
b) If 15 men are randomly selected, what is the probability that their mean height is less than 70.0 inches?
Solution:

Given mean = 69 and SD = 2.8

(a): Let X denote the height of the person selected. Then we have to find the probability 

P(X < 78).

P(X < 78) = P[(X - mean)/SD < (78 - mean)/SD]

= P[Z < (78 - 69)/2.8]

= P(Z < 3.21)

= 0.5 + P(0 < Z < 3.21), using the normal curve.

= 0.5 + 0.4993 = 0.9993

P(X < 78) = 0.9993

(b): The mean of n = 15 heights has standard deviation SD/sqrt(n), so

P[X(bar) < 70] = P{[X(bar) - mean]/[SD/sqrt(n)] < (70 - mean)/[SD/sqrt(n)]}

= P{Z < (70 - 69)/[2.8/sqrt(15)]}

= P(Z < 1.383)

= 0.5 + P(0 < Z < 1.383)

= 0.5 + 0.4162 = 0.9162

P(mean height is less than 70 inches) = 0.9162
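The sign of Z matters here: 70 is above the mean of 69, so the probability must exceed 0.5. A short check with the standard library, using illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mean_h, sd_h = 69.0, 2.8

# (a) one man: height ~ N(69, 2.8)
p_one = NormalDist(mu=mean_h, sigma=sd_h).cdf(78.0)

# (b) mean of 15 men: X(bar) ~ N(69, 2.8/sqrt(15))
n = 15
z_b = (70.0 - mean_h) / (sd_h / sqrt(n))
p_mean = NormalDist(mu=mean_h, sigma=sd_h / sqrt(n)).cdf(70.0)

print(round(p_one, 4), round(z_b, 3), round(p_mean, 4))
```

The exact CDF gives about 0.9167 for part (b); reading the table at Z = 1.38 gives 0.9162.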


(iii): We select a card at random from a deck of 52 playing cards. Suppose we want to find the probability that the selected card is either a spade or a diamond.

P(selected card is either a spade or a diamond) = P(selected card is a spade) + P(selected card is a diamond) = 13/52 + 13/52 = 26/52 = 1/2

In this case the events are mutually exclusive, so the probabilities simply add.

Now suppose we want to find the probability that the selected card is a spade or a king.

P(selected card is a spade or a king) = P(spade) + P(king) - P(spade and king)

= 13/52 + 4/52 - 1/52 = 16/52 = 4/13

Here the events are not mutually exclusive (the king of spades belongs to both), so we use the general addition rule.
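Both results can be confirmed by enumerating the deck and counting favourable cards. This sketch uses exact fractions; the helper `prob` and the card encoding are illustrative choices.

```python
from fractions import Fraction
from itertools import product

suits = ["spade", "heart", "diamond", "club"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event as (favourable cards) / (total cards)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


# Mutually exclusive events: spade or diamond
p_spade_or_diamond = prob(lambda c: c[1] in ("spade", "diamond"))

# Overlapping events: spade or king, by direct count and by the addition rule
p_spade_or_king = prob(lambda c: c[1] == "spade" or c[0] == "K")
p_by_rule = (prob(lambda c: c[1] == "spade")
             + prob(lambda c: c[0] == "K")
             - prob(lambda c: c[1] == "spade" and c[0] == "K"))

print(p_spade_or_diamond, p_spade_or_king, p_by_rule)
```

The direct count and the addition rule agree, which is the point of subtracting P(spade and king): without it the king of spades would be counted twice.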

 

 
   
   Copyright © 2004 Any Time Tutor. All rights reserved.