Community Medicine / Dr. Zahraa / General Medicine, Third Year
Theory lecture (4), 26/3/2017
Statistical Inference
At the end of the lecture the student will be able to:
- Identify the meaning of statistical inference.
- Determine the types of error.
- Classify the Z-test according to the type of data and number of samples.
- Classify the t-test according to the number of samples and equality of variances.
- Practice the use of the X² test in different conditions.
Population:
A large number of subjects or objects with a common observable characteristic (e.g. the babies born in Iraq in 2004).
N → size
µ → mean
σ² → variance
P → proportion
Sample:
A subset (subgroup) of the population which has the same characteristic (e.g. the babies born in Mosul city in 2004).
[A sample is adopted to reduce time, labor and money.]
n → size
x̅ → mean
S² → variance
p → proportion
Statistical Inference:
It is the process by which we infer the properties of a population from the study of the properties of a sample belonging to that population.
Statistical hypothesis:
It is a statement about the parameters (µ, σ²) of the population being sampled, which may or may not be true.
Null Hypothesis (H0):
In general this term refers to the particular hypothesis under test. It is often a hypothesis of no difference. Any other hypothesis is called an alternative hypothesis, and there may be one or several.
We denote the null hypothesis by H0.
We denote the alternative hypothesis by H1 or HA.
Test of significance: It is the procedure which enables us to decide whether to accept or reject the hypothesis.
Types of error:
Type I error:
If the hypothesis is rejected when it is true, that is a type one error; its probability is denoted by α.
Type II error:
If the hypothesis is accepted when it is not true, that is a type two error; its probability is denoted by β.
Level of significance:
It is the maximum probability with which we would be willing to risk a type one error; it is denoted by α.
(In practice α is usually 0.10, 0.05 or 0.01.) E.g. if α = 0.05 = 5%, there are about 5 chances in 100 that we would reject the hypothesis when it should be accepted (we are about 95% confident that we have made the right decision, and may be wrong with probability 5%).
Test Statistic:
A mathematical expression of the sample values which provides a basis for testing a statistical hypothesis.
- Z statistic test (parametric test)
- t statistic test (parametric test)
- X² statistic test (non-parametric test)
One- and two-tailed tests:
If we decide that H0 : µ = x̅, then the alternative hypothesis is either:
H1 : µ ≠ x̅, which is a two-tailed test, or
H1 : µ < x̅, which is a one-tailed test, or
H1 : µ > x̅, which is also a one-tailed test.
If we choose α = 0.05 for a two-tailed test, then the tabulated Z is 1.96, and the region outside the range -1.96 to 1.96 is called the critical region, rejection region or region of significance, while the region inside the range -1.96 to 1.96 is called the region of acceptance of the hypothesis or region of non-significance.
Critical values of Z by level of significance:
α                     0.10    0.05    0.01    0.005
For one-tailed test   1.28    1.645   2.33    2.58
For two-tailed test   1.645   1.96    2.58    2.81
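The tabulated values above can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name z_critical is chosen here for illustration and is not part of the lecture.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Tabulated z for a given level of significance alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf returns the z value with the given area to its left.
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01, 0.005):
    one = z_critical(alpha, two_tailed=False)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha:<6} one-tailed={one:.3f}  two-tailed={two:.3f}")
```

Rounded to the precision of the table, the printed values agree with the rows above.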
Z-Test
(For large n, n > 30, & σ² known)
Problem:
Is it reasonable to conclude that a sample of n observations with mean x̅ could have been taken from a population with mean µ & variance σ²?
Solution:
1. State the null hypothesis & its alternative, where:
H0 : The difference between the sample mean & the population mean is merely due to sampling error (due to chance).
H1 : The difference between the sample mean & the population mean is real.
H0 : x̅ = µ
H1 : x̅ ≠ µ
2. Calculate the Z statistic, where:
Zcal = (x̅ - µ) / (σ/√n)
3. Compare the absolute value of Zcal (|Zcal|) with Ztab, i.e. Z(one- or two-sided test, α or α/2), and decide:
If |Zcal| < Ztab → accept H0; the difference is non-significant.
If |Zcal| ≥ Ztab → reject H0.
4. Draw a conclusion and make a decision.
Example (1)
In an I.Q. test, a population with mean 100 and standard deviation 15 was tested by choosing a sample of 49 persons, and the sample mean was found to be 106. Does this sample represent the population? Use 0.05 as the level of significance.
Solution
µ=100 σ=15 n=49 x̅=106 α=0.05
H0: x̅ = µ
H1: x̅ ≠ µ
n > 30, therefore the test is Z, where Zcal = (x̅ - µ) / SE
and SE = σ/√n = 15/√49 = 2.143
hence Zcal = (106 - 100)/2.143 = 2.80
while Ztab = Z(0.05, two-tailed) = 1.96
Therefore |Zcal| > Ztab, so we reject H0 and accept H1 at the 0.05 level of significance: this sample does not represent that population at the 5% level of significance.
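The arithmetic of Example (1) can be checked with a short script. This is only a sketch of the steps listed in the solution; the function name one_sample_z is illustrative.

```python
import math

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (SE, Zcal) for a one-sample Z-test with known sigma."""
    se = pop_sd / math.sqrt(n)             # SE = sigma / sqrt(n)
    z_cal = (sample_mean - pop_mean) / se  # Zcal = (x̅ - µ) / SE
    return se, z_cal

# Values of Example (1).
se, z_cal = one_sample_z(sample_mean=106, pop_mean=100, pop_sd=15, n=49)
z_tab = 1.96  # two-tailed tabulated Z for alpha = 0.05
reject_h0 = abs(z_cal) >= z_tab
print(f"SE={se:.3f}  Zcal={z_cal:.2f}  reject H0: {reject_h0}")
```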
Test the Proportion of One Sample
State the null hypothesis and its alternative, where
H0 : P = p
H1 : P ≠ p
P is a proportion in the population, where P = A / N, such that A is the number of cases under consideration and N is the population size.
Similarly, p is the sample proportion which corresponds to P, where p = a / n and q = 1 - p (q is the proportion of cases which do not have the characteristic under consideration). Using
SE = √(pq/n)
Zcal = (p - P) / SE
Then compare Zcal with Ztab, draw a conclusion and make a decision.
Example (2)
A manufacturer claimed that a medicine is 90% effective in relieving an allergy for a period of 8 hours. In a sample of 200 persons who had the allergy, the medicine provided relief for 160 persons. Determine whether the manufacturer's claim is true; use 10% as the level of significance.
Solution:
P=0.90 n=200 a=160 α=0.10
H0: P = p
H1: P ≠ p
Because n > 30, the test is a two-sided Z, hence Zcal = (p - P) / SE
where p = 160/200 = 0.8 and SE = s/√n, but s² = pq = 0.80(1 - 0.80) = 0.16, then s = √0.16 = 0.4
so SE = 0.4/√200 = 0.0283, hence:
Zcal = (0.8 - 0.9)/0.0283 = -3.533
Ztab = Z(0.10, two-tailed) = 1.645
hence |Zcal| > Ztab, so we reject H0 and accept H1 at the 0.10 level of significance.
Therefore the manufacturer's claim is not true, with 90% confidence.
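Example (2) can be reproduced in the same way. The sketch below follows the lecture's formula, which builds SE from the sample proportion p; the function name is illustrative. At full precision Zcal comes out near -3.54; the -3.533 of the hand calculation results from rounding SE to 0.0283.

```python
import math

def one_sample_proportion_z(a, n, P):
    """Return (p, SE, Zcal) for the one-sample proportion test."""
    p = a / n                   # sample proportion
    q = 1 - p
    se = math.sqrt(p * q / n)   # SE = sqrt(pq / n), as in the lecture
    z_cal = (p - P) / se
    return p, se, z_cal

# Values of Example (2).
p, se, z_cal = one_sample_proportion_z(a=160, n=200, P=0.90)
z_tab = 1.645  # two-tailed tabulated Z for alpha = 0.10
reject_h0 = abs(z_cal) >= z_tab
print(f"p={p:.2f}  SE={se:.4f}  Zcal={z_cal:.3f}  reject H0: {reject_h0}")
```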
Test the Two Independent Means
State the null hypothesis and its alternative, where
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
If the two populations, of sizes N1 and N2 respectively, have means µ1 and µ2 and the two populations are uncorrelated, then the test is:
Zcal = (µ1 - µ2) / SEPooled, where SEPooled = σp √(1/N1 + 1/N2)
and σp² is the pooled variance, where σp² = (N1 σ1² + N2 σ2²) / (N1 + N2)
Then compare Zcal with Ztab, draw a conclusion and make a decision.
t-Test
(For small n, n ≤ 30, & σ² unknown)
Problem:
Is it reasonable to conclude that a sample of n observations with mean x̅ could have been taken from a population with mean µ & variance σ²?
Solution:
1. State the null hypothesis & its alternative, where:
H0 : The difference between the sample mean & the population mean is merely due to sampling error (due to chance).
H1 : The difference between the sample mean & the population mean is real.
H0 : x̅ = µ
H1 : x̅ ≠ µ
2. Calculate the t statistic, where:
tcal = (x̅ - µ) / (s/√n)
3. Compare the absolute value of tcal (|tcal|) with ttab for d.f = n-1 and level of significance α, i.e. t(d.f, α or α/2), and decide:
If |tcal| < ttab → accept H0; the difference is non-significant.
If |tcal| ≥ ttab → reject H0.
4. Draw a conclusion and make a decision.
Example (3)
On a math test which takes 150 min., a new method of teaching was used on 25 students, and it was found that their mean time was 135 min. with a standard deviation of 10 min. Does the new method give a real result, or is the difference in means due to chance? Use the 10% level of significance.
Solution
µ=150 n=25 x̅=135 S=10 α=0.10
H0: x̅ = µ
H1: x̅ ≠ µ
n < 30, therefore the test is t, where tcal = (x̅ - µ) / SE
and SE = s/√n = 10/√25 = 2.0
hence tcal = (135 - 150)/2 = -7.5
For ttab, d.f = n-1 = 25-1 = 24, so for a two-tailed test ttab = t(d.f, α/2) = t(24, 0.05) = 1.71
Therefore |tcal| > ttab, so we reject H0 and accept H1 at the 0.10 level of significance: this sample does not represent that population at the 10% level of significance.
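A short sketch of Example (3) follows. The tabulated t is entered by hand from the t table, since the Python standard library has no t distribution; the function name one_sample_t is illustrative.

```python
import math

def one_sample_t(sample_mean, pop_mean, s, n):
    """Return (SE, tcal, d.f) for a one-sample t-test."""
    se = s / math.sqrt(n)                  # SE = s / sqrt(n)
    t_cal = (sample_mean - pop_mean) / se  # tcal = (x̅ - µ) / SE
    return se, t_cal, n - 1

# Values of Example (3).
se, t_cal, df = one_sample_t(sample_mean=135, pop_mean=150, s=10, n=25)
t_tab = 1.71  # t(24, 0.05), read from the t table
reject_h0 = abs(t_cal) >= t_tab
print(f"SE={se}  tcal={t_cal}  d.f={df}  reject H0: {reject_h0}")
```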
Test the Proportion of One Sample when n ≤ 30
State the null hypothesis and its alternative, where
H0 : P = p
H1 : P ≠ p
P is a proportion in the population, where P = A / N, such that A is the number of cases under consideration and N is the population size.
Similarly, p is the sample proportion which corresponds to P, where p = a / n and q = 1 - p (q is the proportion of cases which do not have the characteristic under consideration). Using
SE = √(pq/n)
tcal = (p - P) / SE
Then compare tcal with ttab for d.f = n-1, draw a conclusion and make a decision.
Test the Two Independent Means when n ≤ 30
State the null hypothesis and its alternative, where
H0 : x̅1 = x̅2
H1 : x̅1 ≠ x̅2
If the two samples, of sizes n1 and n2 respectively, have means x̅1 and x̅2 and the two samples are uncorrelated, then the test is:
tcal = (x̅1 - x̅2) / SEPooled, where SEPooled = Sp √(1/n1 + 1/n2)
and Sp² is the pooled variance, where Sp² = ((n1-1) S1² + (n2-1) S2²) / (n1+n2-2)
Then, if n1 and n2 are less than 30, compare tcal with ttab, which is found under d.f = n1 + n2 - 2 and the chosen α; after that draw a conclusion and make a decision.
Example (4)
Two samples were drawn and the data were as follows:
Sample I : 2 5 1 2 2 3 0 2 1
Sample II : 5 7 3 2 3
Are the two samples from the same population, with 95% confidence?
Solution:
n1=9 n2=5 α=0.05
H0 : x̅1 = x̅2
H1 : x̅1 ≠ x̅2
Because we have two small samples, the test is t with d.f = n1+n2-2, hence:
x̅1 = 18/9 = 2, x̅2 = 20/5 = 4 and S1² = 16/8 = 2, S2² = 16/4 = 4
Sp² = (8*2 + 4*4) / (9+5-2) = (16+16)/12 = 32/12 = 2.667
then Sp = √2.667 = 1.633 and SEPooled = 1.633 √(1/9 + 1/5) = 1.633 × 0.558 = 0.911
and tcal = (2 - 4)/0.911 = -2.20
while ttab = t(d.f, α/2) = t(12, 0.025) = 2.179
Therefore |tcal| > ttab, hence we reject H0 and accept H1 at the 5% level of significance: the two samples are not from the same population, with 95% confidence.
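The pooled-variance calculation of Example (4) can be checked from the raw data. This is a minimal sketch; the function name pooled_t is illustrative. Carrying the arithmetic at full precision, √(1/9 + 1/5) ≈ 0.558, so SEPooled ≈ 0.911 and tcal ≈ -2.20, which still exceeds the tabulated 2.179 in absolute value.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (Sp², SEPooled, tcal, d.f) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the n-1 denominator, i.e. it gives S².
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)
    t_cal = (mean(sample1) - mean(sample2)) / se_pooled
    return sp_sq, se_pooled, t_cal, n1 + n2 - 2

# Data of Example (4).
sample_1 = [2, 5, 1, 2, 2, 3, 0, 2, 1]
sample_2 = [5, 7, 3, 2, 3]
sp_sq, se_pooled, t_cal, df = pooled_t(sample_1, sample_2)
t_tab = 2.179  # t(12, 0.025), read from the t table
reject_h0 = abs(t_cal) >= t_tab
print(f"Sp²={sp_sq:.3f}  SE={se_pooled:.3f}  tcal={t_cal:.3f}  d.f={df}")
```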
Test the Two Correlated Samples
If a sample of size n is measured in its control condition and then the same sample is put under experiment, the differences between the cases before and after the experiment are tested; the mean of these differences gives an idea of the effect of the experiment. Some call this the test of two dependent samples.
So if d represents the difference between before and after, then d̅ is the mean of the differences and SEd is the standard error of the differences, where SEd = σd/√n and σd is the standard deviation of the differences.
Hence:
If n is large (n > 30) then the test is Z, where Zcal = d̅ / SEd
while if n is small (n ≤ 30) then the test is t, where tcal = d̅ / SEd
Note that the tabulated value for t is found under d.f = n-1 and level of significance α.
Example (5)
An experiment was done on 5 persons after taking their measurements in their control condition, and the results were as follows:
Control : 3 5 3 2 2
Experiment : 7 8 3 3 4
Give your conclusion about the effect of the experiment at the 0.10 level of significance.
Solution:
n = 5 α = 0.10
H0: d̅ = 0
H1: d̅ ≠ 0
d : 4 3 0 1 2
d̅ = 10/5 = 2 Sd² = 10/4 = 2.5 Sd = 1.581 SEd = 1.581/√5 = 0.707
tcal = 2/0.707 = 2.828, while for a two-tailed test ttab = t(d.f, α/2) = t(4, 0.05) = 2.132
so tcal > ttab, that is, we reject H0 and accept H1 at the 10% level of significance. Therefore the experiment has an effect on the sample (it increases its measurements) at the 0.10 level of significance.
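The paired calculation of Example (5) can be sketched as follows, working on the differences d = experiment - control. The function name paired_t is illustrative.

```python
import math
from statistics import mean, stdev

def paired_t(control, experiment):
    """Return (d, d̅, SEd, tcal, d.f) for two correlated samples."""
    d = [e - c for c, e in zip(control, experiment)]  # after minus before
    n = len(d)
    d_bar = mean(d)
    se_d = stdev(d) / math.sqrt(n)  # stdev uses the n-1 denominator
    return d, d_bar, se_d, d_bar / se_d, n - 1

# Data of Example (5).
control = [3, 5, 3, 2, 2]
experiment = [7, 8, 3, 3, 4]
d, d_bar, se_d, t_cal, df = paired_t(control, experiment)
t_tab = 2.132  # t(4, 0.05), read from the t table
reject_h0 = abs(t_cal) >= t_tab
print(f"d={d}  mean={d_bar}  SEd={se_d:.3f}  tcal={t_cal:.3f}  d.f={df}")
```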
The Chi-Square Test
X² test
It is one of the non-parametric tests; it is used to test the frequencies of a frequency table.
Suppose that in a sample a set of possible events (arranged in classes) occur with frequencies o1, o2, …, ok (the observed frequencies), and that according to the probabilities p1, p2, …, pk they are expected to occur with the frequencies e1, e2, …, ek (the expected or theoretical frequencies).
We wish to know whether the observed frequencies differ significantly from the expected frequencies. This can be done by using the chi-square test.
The chi-square statistic is denoted by X², and it assumes:
Σoi = Σei = n
For a single table:
If a frequency table has k classes, that is, the frequencies are in one row of size (1×k) or in one column of size (k×1), then the computed chi-square statistic is expressed as follows:
X² = Σ ((oi - ei)² / ei)
where each ei = n pi and pi is the probability of the i-th class.
H0 usually assumes that oi = ei for each i, that is, that the probabilities assumed in finding the ei's are true.
The X²cal above is compared with X²tab for d.f = k-1 and level of significance α; that is, the tabulated value is X²(d.f, α).
Hence if X²cal ≤ X²(d.f, α) then H0 is accepted.
Example (6)
An experiment concerned with blood groups was carried out. A sample of 100 patients was drawn, and the data were as follows:
Blood groups      AB   A    B    O
No. of patients   50   20   20   10
Give evidence as to whether the blood groups (AB, A, B, O) appear according to the proportions (9:3:3:1) respectively. Make your decision at the 10% level of significance.
Solution
n = Σf = 100, the proportions (9:3:3:1), α = 0.10
H0: The assumed proportions are true.
H1: The assumed proportions are not true.
Class   Frequency o   Proportion p     Expected e = np   o - e    (o-e)²   (o-e)²/e
AB      50            9/16 = 0.5625    56.25             -6.25    39.1     0.69
A       20            3/16 = 0.1875    18.75             1.25     1.6      0.08
B       20            3/16 = 0.1875    18.75             1.25     1.6      0.08
O       10            1/16 = 0.0625    6.25              3.75     14.1     2.26
Total   100           1.0              100               0.0               3.11
So X²cal = 3.11, while for X²tab d.f = k-1 = 4-1 = 3 and α = 0.10:
X²(d.f, α) = X²(3, 0.10) = 6.251
Hence X²cal ≤ X²(3, 0.10), so H0 is accepted at the 0.10 level of significance. Therefore the proportions we assumed are true at the 10% level of significance.
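The single-table calculation of Example (6) can be sketched as below. The function name is illustrative, and the tabulated value 6.251 for d.f = 3 and α = 0.10 is entered by hand from a standard X² table.

```python
def chi_square_single_table(observed, ratios):
    """Return (expected, X²cal, d.f) for a single frequency table."""
    n = sum(observed)
    total_ratio = sum(ratios)
    # e_i = n * p_i, where p_i is the assumed probability of class i.
    expected = [n * r / total_ratio for r in ratios]
    x2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, x2_cal, len(observed) - 1

# Data of Example (6): classes AB, A, B, O with assumed ratio 9:3:3:1.
observed = [50, 20, 20, 10]
expected, x2_cal, df = chi_square_single_table(observed, ratios=[9, 3, 3, 1])
x2_tab = 6.251  # X²(3, 0.10), read from a standard X² table
accept_h0 = x2_cal <= x2_tab
print(f"e={expected}  X²cal={x2_cal:.2f}  d.f={df}  accept H0: {accept_h0}")
```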
For a contingency table:
If the frequency table is arranged in R rows and C columns, the frequencies are distributed in an (R×C) table:
                 The column's classes
Row              1      2      …    C      Row sum Ri
1                O11    O12    …    O1c    R1
2                O21    O22    …    O2c    R2
:                :      :           :      :
r                Or1    Or2    …    Orc    Rr
Column sum Cj    C1     C2     …    Cc     T
where each of the expected values is computed as:
eij = (Ri × Cj) / T
where Ri is the sum of row i,
Cj is the sum of column j,
T is the total sum of all values in the table.
H0 assumes there is no correlation between the row variable and the column variable.
The tabulated X² is found under d.f = (R-1)(C-1) and level of significance α; that is, the tabulated value is X²(d.f, α).
Therefore if X²cal ≤ X²(d.f, α) then H0 is accepted at the α level of significance.
Example (7)
In testing the effectiveness of a drug, two groups of patients were used and the results were tabulated as follows:
                 Recovery
Drug groups      Fast   Slow   None
Control          60     32     28
Experiment       28     17     45
Give your decision at the 0.05 level of significance.
Solution
α = 0.05, Σf = T = 210
                 Recovery
Drug groups      Fast   Slow   None   Ri
Control          60     32     28     120
Experiment       28     17     45     90
Cj               88     49     73     210
H0: There is no correlation between the drug and the recovery.
H1: There is a correlation, i.e. the drug has an effect on the sample.
o       e                        o - e                  (o-e)²               (o-e)²/e
60      (120×88)/210 = 50.3      60 - 50.3 = 9.7        (9.7)² = 94.09       1.87
28      (90×88)/210 = 37.7       28 - 37.7 = -9.7       (-9.7)² = 94.09      2.50
32      (120×49)/210 = 28.0      32 - 28.0 = 4          (4)² = 16            0.57
17      (90×49)/210 = 21.0       17 - 21.0 = -4         (-4)² = 16           0.76
28      (120×73)/210 = 41.7      28 - 41.7 = -13.7      (-13.7)² = 187.69    4.50
45      (90×73)/210 = 31.3       45 - 31.3 = 13.7       (13.7)² = 187.69     6.00
Total   210     210              0.0                                         16.20
X²cal = 16.20
d.f = (R-1)(C-1) = (2-1)(3-1) = 2, then the tabulated value is X²(d.f, α):
X²(2, 0.05) = 5.99. So X²cal ≥ X²tab, therefore H0 is rejected and H1 is accepted at the 0.05 level of significance. Hence there is a correlation between the recovery and the drug, that is, the drug does have an effect at the 0.05 level of significance.
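The contingency-table calculation of Example (7) can be sketched as below, with eij = (Ri × Cj)/T computed for every cell and all six contributions summed. The function name is illustrative. With unrounded expected frequencies the sum is about 16.23, slightly different from a hand calculation with rounded intermediates, and well above the tabulated 5.99.

```python
def chi_square_contingency(table):
    """Return (X²cal, d.f) for an R x C table of observed frequencies."""
    row_sums = [sum(row) for row in table]        # Ri
    col_sums = [sum(col) for col in zip(*table)]  # Cj
    total = sum(row_sums)                         # T
    x2_cal = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total  # eij = (Ri x Cj) / T
            x2_cal += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2_cal, df

# Data of Example (7): rows = Control, Experiment; columns as tabulated.
table = [[60, 32, 28],
         [28, 17, 45]]
x2_cal, df = chi_square_contingency(table)
x2_tab = 5.99  # X²(2, 0.05), read from the X² table
reject_h0 = x2_cal >= x2_tab
print(f"X²cal={x2_cal:.2f}  d.f={df}  reject H0: {reject_h0}")
```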
Notes:
If d.f = 1, that is, when the single table has two classes or the contingency table is (2×2), then X² should be corrected (a continuity correction, because the continuous X² distribution is being used to approximate discrete data), so X²cal takes the form:
X²cal = Σ((|o-e| - 0.5)² / e)
Example (8)
The B.C.G. vaccine was given to 100 students, 50 males and 50 females. The response was as follows:
          Response
Sex       +Ve    -Ve
Male      20     30
Female    24     26
Test whether the response to B.C.G. depends on sex at the 10% level of significance.
Solution
n = Σf = T = 100, α = 0.10
H0: There is no correlation between the B.C.G. response and sex.
H1: There is a correlation between the B.C.G. response and sex.
d.f = (2-1)(2-1) = 1, hence X² needs correction.
       +Ve   -Ve   Ri
M      20    30    50
F      24    26    50
Cj     44    56    100
o       e                      o - e   |o-e| - 0.5   (|o-e| - 0.5)²/e
20      (50×44)/100 = 22       -2      1.5           2.25/22 = 0.102
24      (50×44)/100 = 22       2       1.5           2.25/22 = 0.102
30      (50×56)/100 = 28       2       1.5           2.25/28 = 0.080
26      (50×56)/100 = 28       -2      1.5           2.25/28 = 0.080
Total   100     100            0                     0.365
X²cal = 0.365 and X²(1, 0.10) = 2.71, so X²cal < X²tab. Therefore H0 is accepted at the 10% level of significance; that is, there is no correlation between the variables, and sex does not affect the response to B.C.G., with 90% confidence.
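The corrected statistic of Example (8) can be sketched as below, applying the 0.5 correction to every cell of the (2×2) table. The function name is illustrative.

```python
def chi_square_corrected(table):
    """Return the corrected X²cal for a 2 x 2 table (d.f = 1)."""
    row_sums = [sum(row) for row in table]        # Ri
    col_sums = [sum(col) for col in zip(*table)]  # Cj
    total = sum(row_sums)                         # T
    x2_cal = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total
            # Correction for d.f = 1: subtract 0.5 from |o - e|.
            x2_cal += (abs(o - e) - 0.5) ** 2 / e
    return x2_cal

# Data of Example (8): rows = Male, Female; columns = +Ve, -Ve.
table = [[20, 30],
         [24, 26]]
x2_cal = chi_square_corrected(table)
x2_tab = 2.71  # X²(1, 0.10), read from the X² table
accept_h0 = x2_cal < x2_tab
print(f"X²cal={x2_cal:.3f}  accept H0: {accept_h0}")
```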