Community Medicine / Dr. Zahraa / General Medicine, Third Year

Theory lecture (4), 26/3/2017

Statistical Inference

At the end of the lecture, the student will be able to:

- Identify the meaning of statistical inference.

- Determine the types of error.

- Classify the Z-test according to the type of data and number of samples.

- Classify the t-test according to the number of samples and equality of variances.

- Practice the use of the χ² test in different conditions.

Population: A large number of subjects or objects sharing a common observable characteristic (e.g. the babies born in Iraq in 2004). Its parameters are denoted:

N → size

µ → mean

σ² → variance

P → proportion

Sample:

A subset (subgroup) of the population that has the same characteristic (e.g. the babies born in Mosul city in 2004).
[A sample is used to reduce time, labor and cost.] Its statistics are denoted:

n → size

x̅ → mean

S² → variance

p → proportion

Statistical Inference:

It is the process by which we infer the properties of a population by studying the properties of a sample belonging to that population.

Statistical hypothesis:

It is a statement about the parameters (µ, σ²) of the population being sampled, which may or may not be true.

Null Hypothesis (H0):

In general, this term refers to the particular hypothesis under test. It is often a hypothesis of no difference. Any hypothesis that differs from it is called an alternative hypothesis, and there may be one or several.

We denote the null hypothesis by H0.

We denote the alternative hypothesis by H1 or HA.

Test of significance: It is the procedure that enables us to decide whether to accept or reject the hypothesis.
Types of error:

Type I error:
If the hypothesis is rejected when it is true, that is a type I error; its probability is denoted by α.

Type II error:
If the hypothesis is accepted when it is not true, that is a type II error; its probability is denoted by β.

Level of significance:
It is the maximum probability with which we are willing to risk a type I error, and it is denoted by α.
(In practice α is usually 0.10, 0.05 or 0.01.) E.g. if α = 0.05 = 5%, then there are about 5 chances in 100 that we would reject the hypothesis when it should be accepted (we are about 95% confident that we have made the right decision, and may be wrong with probability 5%).

Test Statistic:

A mathematical expression of the sample values which provides a basis for testing a statistical hypothesis.

- Z statistic test (parametric test)

- t statistic test (parametric test)

- χ² statistic test (non-parametric test)

One- and two-tailed tests:

If we decide that:

H0 : µ = x̅ then the alternative hypothesis is one of:

H1 : µ ≠ x̅ which is a two-tailed test, or
H1 : µ < x̅ which is a one-tailed test, or
H1 : µ > x̅ which is also a one-tailed test.

If we choose α = 0.05 for a two-tailed test, then the tabulated Z is 1.96, and the region outside the range -1.96 to 1.96 is called the critical region, rejection region, or region of significance. The region inside the range -1.96 to 1.96 is called the region of acceptance of the hypothesis, or region of non-significance.

Critical values of Z by level of significance:

α                       0.10     0.05     0.01     0.005
For one-tailed test     1.28     1.645    2.33     2.58
For two-tailed test     1.645    1.96     2.58     2.81
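
The tabulated critical values can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical and not part of the lecture.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_tailed: bool) -> float:
    """Return the critical Z for significance level alpha (hypothetical helper)."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf gives the Z value with the requested area to its left.
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01, 0.005):
    one = z_critical(alpha, two_tailed=False)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

The printed values agree with the table up to rounding.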

Z-Test

(For large samples, n > 30, and σ² known)

Problem

Is it reasonable to conclude that a sample of n observations with mean x̅ could have been taken from a population with mean µ and variance σ²?

Solution:
1. State the null hypothesis and its alternative, where:
H0 : The difference between the sample mean and the population mean is merely due to sampling error (due to chance).
H1 : The difference between the sample mean and the population mean is real.

H0 : x̅ = µ
H1 : x̅ ≠ µ

2. Calculate the Z statistic, where:

Zcal = (x̅ - µ) / (σ/√n)

3. Compare the absolute value of Zcal (|Zcal|) with Ztab, i.e. Z(α) for a one-sided test or Z(α/2) for a two-sided test, and decide:

If |Zcal| < Ztab → accept H0; the difference is non-significant.

If |Zcal| ≥ Ztab → reject H0; the difference is significant.

4. Draw a conclusion and make a decision.

Example (1)

In an I.Q. test, a population with mean 100 and standard deviation 15 was tested by choosing a sample of 49 persons, and the sample mean was found to be 106. Does this sample represent the population? Use 0.05 as the level of significance.
Solution
µ=100 σ=15 n=49 x̅=106 α=0.05
H0: x̅ = µ
H1: x̅ ≠ µ
n > 30, therefore the test is Z, where Zcal = (x̅ - µ)/SE
and SE = σ/√n = 15/√49 = 2.143
hence Zcal = (106 - 100)/2.143 = 2.80
while Ztab = Z(0.05, two-tailed) = 1.96
Therefore |Zcal| > Ztab, so we reject H0 and accept H1 at the 0.05 level of significance; this sample does not represent that population at the 5% level of significance.
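
The arithmetic of Example (1) can be reproduced with a short script. This is only a sketch under the lecture's assumptions (σ known, two-tailed test); the function name `one_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar, mu, sigma, n, alpha=0.05):
    """Two-tailed one-sample Z test with known sigma (illustrative helper)."""
    se = sigma / sqrt(n)                 # standard error of the mean
    z_cal = (x_bar - mu) / se            # calculated Z
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    reject_h0 = abs(z_cal) >= z_tab
    return z_cal, z_tab, reject_h0

# Data of Example (1): mu=100, sigma=15, n=49, x_bar=106, alpha=0.05
z_cal, z_tab, reject_h0 = one_sample_z(106, 100, 15, 49, alpha=0.05)
print(f"Zcal={z_cal:.2f}, Ztab={z_tab:.2f}, reject H0: {reject_h0}")
```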

Testing the Proportion of One Sample

State the null hypothesis and its alternative, where
H0 : P = p
H1 : P ≠ p

P is the proportion in the population, where P = A/N, such that A is the number of cases under consideration and N is the population size.
Similarly, p is the sample proportion corresponding to P, where p = a/n and q = 1 - p (q is the proportion of cases which do not have the characteristic under consideration). Using

SE = √(pq/n)

Zcal = (p - P)/SE

Then compare Zcal with Ztab, draw a conclusion and make a decision.

Example (2)

A manufacturer claimed that a medicine is 90% effective in relieving an allergy for a period of 8 hours. In a sample of 200 persons who had the allergy, the medicine provided relief for 160 persons. Determine whether the manufacturer's claim is true, using 10% as the level of significance.

Solution:

P=0.90 n=200 a=160 α=0.10
H0: P = p
H1: P ≠ p

Because n > 30, the test is a two-sided Z test, hence: Zcal = (p - P)/SE

where p = 160/200 = 0.8 and SE = s/√n,
but s² = pq = 0.80(1 - 0.80) = 0.16, so s = √0.16 = 0.4
so SE = 0.4/√200 = 0.0283, hence:
Zcal = (0.8 - 0.9)/0.0283 = -3.53
Ztab = Z(0.10, two-tailed) = 1.645
hence |Zcal| > Ztab, so we reject H0 and accept H1 at the 0.10 level of significance.
Therefore the manufacturer's claim is not true, with 90% confidence.
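
As a check on Example (2), the sketch below follows the lecture's formula, with the standard error built from the sample proportion (SE = √(pq/n)). The function name `one_sample_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(a, n, P, alpha):
    """Two-tailed Z test for one proportion, using SE = sqrt(p*q/n) as in the lecture."""
    p = a / n                  # sample proportion
    q = 1 - p                  # proportion without the characteristic
    se = sqrt(p * q / n)       # standard error of the sample proportion
    z_cal = (p - P) / se
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)
    return z_cal, z_tab, abs(z_cal) >= z_tab

# Data of Example (2): 160 relieved out of 200, claimed P = 0.90, alpha = 0.10
z_cal, z_tab, reject_h0 = one_sample_proportion_z(a=160, n=200, P=0.90, alpha=0.10)
print(f"Zcal={z_cal:.3f}, Ztab={z_tab:.3f}, reject H0: {reject_h0}")
```

Without intermediate rounding of SE, Zcal comes out near -3.54, and the decision is unchanged.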

Testing Two Independent Means

State the null hypothesis and its alternative, where
H0 : µ1 = µ2
H1 : µ1 ≠ µ2

If two large samples of sizes n1, n2 with means x̅1, x̅2 are drawn from two uncorrelated populations with means µ1, µ2 and variances σ1², σ2², then the test is:

Zcal = (x̅1 - x̅2)/SEpooled, where SEpooled = σp√(1/n1 + 1/n2)

and σp² is the pooled variance, where σp² = (n1σ1² + n2σ2²)/(n1 + n2)

Then compare Zcal with Ztab, draw a conclusion and make a decision.

t-Test

(For small samples, n ≤ 30, and σ² unknown)

Problem

Is it reasonable to conclude that a sample of n observations with mean x̅ could have been taken from a population with mean µ and unknown variance σ²?

Solution:

1. State the null hypothesis and its alternative, where:
H0 : The difference between the sample mean and the population mean is merely due to sampling error (due to chance).
H1 : The difference between the sample mean and the population mean is real.

H0 : x̅ = µ
H1 : x̅ ≠ µ

2. Calculate the t statistic, where:

tcal = (x̅ - µ) / (s/√n)

3. Compare the absolute value of tcal (|tcal|) with ttab for d.f = n-1 and level of significance α, i.e. t(d.f, α) for a one-sided test or t(d.f, α/2) for a two-sided test, and decide:

If |tcal| < ttab → accept H0; the difference is non-significant.

If |tcal| ≥ ttab → reject H0; the difference is significant.

4. Draw a conclusion and make a decision.

Example (3)

On a math test which normally takes 150 min., a new method of teaching was used on 25 students, and their mean time was found to be 135 min. with a standard deviation of 10 min. Does the new method give a real result, or is the difference in means due to chance? Use the 10% level of significance.

Solution

µ=150 n=25 x̅=135 S=10 α=0.10

H0: x̅ = µ

H1: x̅ ≠ µ

n < 30, therefore the test is t, where tcal = (x̅ - µ)/SE

and SE = S/√n = 10/√25 = 2.0
hence tcal = (135 - 150)/2 = -7.5
For ttab, d.f = n-1 = 25-1 = 24, so ttab = t(d.f, α/2) = t(24, 0.05) = 1.71
Therefore |tcal| > ttab, so we reject H0 and accept H1 at the 0.10 level of significance; this sample does not represent that population at the 10% level of significance.
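
The same calculation can be sketched in code. Python's standard library has no t distribution, so the tabulated value 1.71 is taken from the example and passed in as a constant; the function name `one_sample_t` is hypothetical.

```python
from math import sqrt

def one_sample_t(x_bar, mu, s, n):
    """Return (t_cal, degrees of freedom) for a one-sample t test (illustrative helper)."""
    se = s / sqrt(n)              # standard error from the sample standard deviation
    t_cal = (x_bar - mu) / se
    return t_cal, n - 1

# Data of Example (3); T_TAB is the tabulated t(24, 0.05) quoted in the example.
T_TAB = 1.71
t_cal, df = one_sample_t(x_bar=135, mu=150, s=10, n=25)
reject_h0 = abs(t_cal) >= T_TAB
print(f"tcal={t_cal}, d.f={df}, reject H0: {reject_h0}")
```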

Testing the Proportion of One Sample when n ≤ 30

State the null hypothesis and its alternative, where
H0 : P = p
H1 : P ≠ p

P is the proportion in the population, where P = A/N, such that A is the number of cases under consideration and N is the population size.
Similarly, p is the sample proportion corresponding to P, where p = a/n and q = 1 - p (q is the proportion of cases which do not have the characteristic under consideration). Using

SE = √(pq/n)

tcal = (p - P)/SE

Then compare tcal with ttab for d.f = n-1, draw a conclusion and make a decision.

Testing Two Independent Means when n ≤ 30

State the null hypothesis and its alternative, where

H0 : x̅1 = x̅2
H1 : x̅1 ≠ x̅2

If two samples of sizes n1, n2 have means x̅1, x̅2 respectively, and the two populations are uncorrelated, then the test is:

tcal = (x̅1 - x̅2)/SEpooled, where SEpooled = Sp√(1/n1 + 1/n2)

and Sp² is the pooled variance, where Sp² = ((n1-1)S1² + (n2-1)S2²)/(n1 + n2 - 2)

Then, if n1 and n2 are both less than 30, compare tcal with ttab, which is found under d.f = n1 + n2 - 2 and the chosen α; then draw a conclusion and make a decision.

Example (4)

Two samples were drawn and the data were as follows:

Sample I 2 5 1 2 2 3 0 2 1

Sample II 5 7 3 2 3
Are the two samples from the same population, with 95% confidence?

Solution:

n1=9 n2=5 α=0.05

H0 : x̅1 = x̅2

H1 : x̅1 ≠ x̅2

Because we have two small samples, the test is t with d.f = n1 + n2 - 2,

hence: x̅1 = 18/9 = 2, x̅2 = 20/5 = 4, and S1² = 16/8 = 2, S2² = 16/4 = 4
Sp² = (8×2 + 4×4)/(9 + 5 - 2) = (16 + 16)/12 = 32/12 = 2.667

then Sp = √2.667 = 1.633 and SEpooled = 1.633 √(1/9 + 1/5) = 1.633 × 0.558 = 0.911

and tcal = (2 - 4)/0.911 = -2.196
while ttab = t(d.f, α/2) = t(12, 0.025) = 2.179
Therefore |tcal| > ttab, hence we reject H0 and accept H1 at the 5% level of significance, and the two samples are not from the same population, with 95% confidence.
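
The pooled-variance calculation can be verified from the raw data. The sketch below assumes Python's standard-library `statistics.variance` (which divides by n-1, matching S²); the function name `pooled_t` is hypothetical, and the tabulated t is taken from the example.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t_cal, degrees of freedom) for two independent small samples."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)   # sample variances (n-1)
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se_pooled = sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
    t_cal = (mean(sample1) - mean(sample2)) / se_pooled
    return t_cal, n1 + n2 - 2

sample_1 = [2, 5, 1, 2, 2, 3, 0, 2, 1]
sample_2 = [5, 7, 3, 2, 3]
T_TAB = 2.179   # tabulated t(12, 0.025), as quoted in the example

t_cal, df = pooled_t(sample_1, sample_2)
print(f"tcal={t_cal:.3f}, d.f={df}, reject H0: {abs(t_cal) >= T_TAB}")
```

Without intermediate rounding, the pooled standard error is about 0.911 and tcal is about -2.196.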

Testing Two Correlated Samples

If a sample of size n is measured in its control condition and the same sample is then put under experiment, the differences between the cases before and after the experiment are tested; the mean of these differences gives an idea of the effect of the experiment. Some call this the test of two dependent samples.
So if d represents the difference between before and after, then d̅ is the mean of the differences and SEd is the standard error of the differences, where SEd = Sd/√n and Sd is the standard deviation of the differences.
Hence:
If n is large (n > 30), the test is Z, where Zcal = d̅/SEd
while if n is small (n ≤ 30), the test is t, where tcal = d̅/SEd
Note that the tabulated value for t is found under d.f = n-1 and α as the level of significance.

Example (5)

An experiment was done on 5 persons after taking their measurements in their control condition, and the results were as follows:
Control : 3 5 3 2 2
Experiment : 7 8 3 3 4
Give your conclusion about the effect of the experiment at the 0.10 level of significance.

Solution:

n=5 α=0.10

H0: d̅ = 0
H1: d̅ ≠ 0
d (experiment - control) : 4 3 0 1 2
d̅ = 10/5 = 2, Sd² = 10/4 = 2.5, Sd = 1.581, SEd = 1.581/√5 = 0.707
tcal = 2/0.707 = 2.828, while ttab = t(d.f, α/2) = t(4, 0.05) = 2.132
so tcal > ttab, that is, we reject H0 and accept H1 at the 10% level of significance. Therefore the experiment has an effect on the sample (it increases its measurements) at the 0.10 level of significance.
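
The paired calculation can be sketched as follows, taking d as experiment minus control. The function name `paired_t` is hypothetical, and the tabulated t is the one quoted in the example.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t_cal, degrees of freedom) for two correlated (paired) samples."""
    d = [a - b for b, a in zip(before, after)]   # per-subject differences
    n = len(d)
    se_d = stdev(d) / sqrt(n)                    # standard error of the mean difference
    return mean(d) / se_d, n - 1

control = [3, 5, 3, 2, 2]
experiment = [7, 8, 3, 3, 4]
T_TAB = 2.132   # tabulated t(4, 0.05), as quoted in the example

t_cal, df = paired_t(control, experiment)
print(f"tcal={t_cal:.3f}, d.f={df}, reject H0: {abs(t_cal) >= T_TAB}")
```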

The Chi-Square Test

χ² test
It is one of the non-parametric tests; it is used to test the frequencies of a frequency table.
Suppose that in a sample a set of possible events (arranged in classes) occur with frequencies o1, o2, ..., ok (the observed frequencies), and that according to the probabilities p1, p2, ..., pk they are expected to occur with frequencies e1, e2, ..., ek (the expected or theoretical frequencies).
We wish to know whether the observed frequencies differ significantly from the expected frequencies. This can be done by using the chi-square test.
The chi-square test is denoted by χ², and it assumes:
Σoi = Σei = n
For a single table:
If a frequency table has k classes, that is, the frequencies are in one row of size (1×k) or in one column of size (k×1), then the computed chi-square is expressed as follows:
χ² = Σ((oi - ei)² / ei)
where each ei = n·pi and pi is the probability of the i-th class.
H0 usually assumes that oi = ei for each i, i.e. that the probabilities used to find the ei's are true.
The χ²cal above is compared with χ²tab for d.f = k-1 and α as the level of significance; that is, the tabulated value is χ²(d.f, α).
Hence if χ²cal ≤ χ²(d.f, α), H0 is accepted; otherwise it is rejected.

Example (6)
An experiment concerned with blood groups was carried out. A sample of 100 patients was drawn, and the data were as follows:

Blood group        AB    A     B     O
No. of patients    50    20    20    10

Test whether the blood groups (AB, A, B, O) appear according to the proportions (9:3:3:1) respectively. Make your decision at the 10% level of significance.
Solution
n = Σf = 100, the proportions are (9:3:3:1), α = 0.10
H0: The assumed proportions are true.
H1: The assumed proportions are not true.

Class    Frequency o    Proportion P      Expected e=np    o - e    (o-e)²    (o-e)²/e
AB       50             9/16 = 0.5625     56.25            -6.25    39.1      0.69
A        20             3/16 = 0.1875     18.75            1.25     1.6       0.08
B        20             3/16 = 0.1875     18.75            1.25     1.6       0.08
O        10             1/16 = 0.0625     6.25             3.75     14.1      2.26
Total    100            1.0               100              0.0                3.11

So χ²cal = 3.11, while for χ²tab, d.f = k-1 = 4-1 = 3 and α = 0.10:

χ²(d.f, α) = χ²(3, 0.10) = 6.251
Hence χ²cal ≤ χ²(3, 0.10), so H0 is accepted at the 0.10 level of significance. Therefore the assumed proportions are true at the 10% level of significance.
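
The goodness-of-fit computation for a single table can be sketched in a few lines. The function name `chi_square_gof` is hypothetical; the critical value 6.251 for χ²(3, 0.10) is taken from a standard chi-square table, since the Python standard library has no chi-square distribution.

```python
def chi_square_gof(observed, ratios):
    """Return (chi2_cal, degrees of freedom) for observed counts against given ratios."""
    n = sum(observed)
    total_ratio = sum(ratios)
    # Expected frequency of each class: e_i = n * p_i, with p_i = ratio_i / sum of ratios.
    expected = [n * r / total_ratio for r in ratios]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Blood groups AB, A, B, O from Example (6), tested against 9:3:3:1
CHI2_TAB = 6.251   # tabulated chi-square for d.f = 3, alpha = 0.10
chi2_cal, df = chi_square_gof([50, 20, 20, 10], [9, 3, 3, 1])
print(f"chi2={chi2_cal:.2f}, d.f={df}, accept H0: {chi2_cal <= CHI2_TAB}")
```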

For a contingency table:

If the frequency table is arranged in R rows and C columns, the frequencies are distributed in an (R×C) table:

                     Column classes
Row classes     1      2      ...    C      Row sum
1               O11    O12    ...    O1c    R1
2               O21    O22    ...    O2c    R2
:               :      :             :      :
r               Or1    Or2    ...    Orc    Rr
Column sum      C1     C2     ...    Cc     T
Where each of the expected values is computed as:
eij = (Ri × Cj)/T
where Ri is the sum of row i,
Cj is the sum of column j,
T is the total sum of all values in the table.

H0 assumes there is no correlation between the row variable and the column variable.
The tabulated χ² is found under d.f = (R-1)(C-1) and α as the level of significance; that is, the tabulated value is χ²(d.f, α).
Therefore if χ²cal ≤ χ²(d.f, α), H0 is accepted at the α level of significance; otherwise it is rejected.
Example (7)
In testing the effectiveness of a drug, two groups of patients were used and the results were tabulated as follows:

                  Recovery
Drug group     Fast    Slow    None
Control        60      32      28
Experiment     28      17      45

Give your decision at the 0.05 level of significance.

Solution
α = 0.05, Σf = T = 210

                  Recovery
Drug group     Fast    Slow    None    Ri
Control        60      32      28      120
Experiment     28      17      45      90
Cj             88      49      73      210

H0: There is no correlation between the drug and recovery.
H1: There is a correlation, i.e. the drug has an effect on the sample.
O      E                        (o-e)                (o-e)²               (o-e)²/e
60     (120×88)/210 = 50.3      60-50.3 = 9.7        (9.7)² = 94.09       1.87
28     (90×88)/210 = 37.7       28-37.7 = -9.7       (-9.7)² = 94.09      2.49
32     (120×49)/210 = 28.0      32-28.0 = 4          (4)² = 16            0.57
17     (90×49)/210 = 21.0       17-21.0 = -4         (-4)² = 16           0.76
28     (120×73)/210 = 41.7      28-41.7 = -13.7      (-13.7)² = 187.69    4.50
45     (90×73)/210 = 31.3       45-31.3 = 13.7       (13.7)² = 187.69     5.99
210    210                      0.0                                       16.18

χ²cal = 16.18
d.f = (R-1)(C-1) = (2-1)(3-1) = 2, so the tabulated value is χ²(d.f, α) = χ²(2, 0.05) = 5.99. So χ²cal ≥ χ²tab, therefore H0 is rejected and H1 is accepted at the 0.05 level of significance. Hence there is a correlation between recovery and the drug, that is, the drug does have an effect at the 0.05 level of significance.
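
The contingency-table computation (eij = Ri × Cj / T, then Σ(o-e)²/e) can be sketched as below. The function name `chi_square_contingency` is hypothetical, and the critical value 5.99 is the tabulated χ²(2, 0.05) quoted in the example.

```python
def chi_square_contingency(table):
    """Return (chi2_cal, degrees of freedom) for an R x C table of observed counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total   # expected count e_ij = Ri*Cj/T
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: Control, Experiment; columns: the three recovery classes of Example (7)
CHI2_TAB = 5.99   # tabulated chi-square for d.f = 2, alpha = 0.05
chi2_cal, df = chi_square_contingency([[60, 32, 28], [28, 17, 45]])
print(f"chi2={chi2_cal:.2f}, d.f={df}, reject H0: {chi2_cal >= CHI2_TAB}")
```

Because the code does not round the expected values, it gives about 16.23 rather than the hand-computed total; the decision is the same.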
Notes:
If d.f = 1, which occurs when the single table has two classes or the contingency table is (2×2), then χ² should be corrected for continuity (the observed frequencies are discrete, while the χ² distribution is continuous), so χ²cal takes the form: χ²cal = Σ((|o-e| - 0.5)² / e)
Example (8)
The B.C.G. vaccine was given to 100 students, 50 males and 50 females. The response was as follows:

             Response
Sex       +Ve    -Ve
Male      20     30
Female    24     26

Test whether the response to B.C.G. depends on sex, at the 10% level of significance.
Solution
n = Σf = T = 100, α = 0.10
H0 : There is no correlation between the B.C.G. response and sex.
H1 : There is a correlation between the B.C.G. response and sex.
d.f = (2-1)(2-1) = 1, hence χ² needs correction.

Sex    +Ve    -Ve    Ri
M      20     30     50
F      24     26     50
Cj     44     56     100
o      E                    o-e    |o-e| - 0.5    (|o-e| - 0.5)²/e
20     (50×44)/100 = 22     -2     1.5            2.25/22 = 0.102
24     (50×44)/100 = 22     2      1.5            2.25/22 = 0.102
30     (50×56)/100 = 28     2      1.5            2.25/28 = 0.080
26     (50×56)/100 = 28     -2     1.5            2.25/28 = 0.080
100    100                  0                     0.365

χ²cal = 0.365 and χ²(1, 0.10) = 2.71, so χ²cal < χ²tab. Therefore H0 is accepted at the 10% level of significance, that is, there is no correlation between the variables, and sex does not affect the response to B.C.G., with 90% confidence.
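
The corrected statistic for a (2×2) table can be sketched in the same way as the uncorrected one, subtracting 0.5 from each |o-e| before squaring. The function name `chi_square_yates_2x2` is hypothetical, and 2.71 is the tabulated χ²(1, 0.10) quoted in the example.

```python
def chi_square_yates_2x2(table):
    """Return the continuity-corrected chi-square for a 2 x 2 table of observed counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / total     # expected count e_ij = Ri*Cj/T
            chi2 += (abs(o - e) - 0.5) ** 2 / e       # correction applied to |o - e|
    return chi2

# Rows: Male, Female; columns: +Ve, -Ve (Example (8))
CHI2_TAB = 2.71   # tabulated chi-square for d.f = 1, alpha = 0.10
chi2_cal = chi_square_yates_2x2([[20, 30], [24, 26]])
print(f"chi2={chi2_cal:.3f}, accept H0: {chi2_cal < CHI2_TAB}")
```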