Chi-Squared Test
Chi-Squared Tests
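Chi-squared tests are hypothesis tests for qualitative data. The observations are categories rather than scores, and the tests are based on counts (frequencies).
The chi-squared statistic measures the difference between the actual (observed) frequencies and the expected frequencies, i.e. the frequencies expected under the null hypothesis H0. It is computed as:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E the expected frequency of each cell (category), and the sum runs over all cells.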
The closer the observed frequencies are to the expected frequencies, the smaller χ² is, and the more consistent the data are with H0.

Chi-Squared tests
A. Chi-Squared Test for Independence
B. Chi-Squared Goodness-of-Fit Test
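Both tests use the same statistic. A minimal Python sketch of the sum, assuming the observed and expected counts are passed as equal-length lists in the same category order (`chi_squared_statistic` is a hypothetical helper name, not a library function):

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells (categories)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(chi_squared_statistic([10, 20, 30], [10, 20, 30]))  # 0.0 (perfect agreement)
print(chi_squared_statistic([15, 15], [10, 20]))          # 3.75 = 25/10 + 25/20
```

Each cell adds a non-negative term, so χ² is 0 only when every observed count equals its expected count.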
The chi-squared test for independence
The χ² test of independence is used to test for a relationship between two categorical variables.
The data: a table (contingency table) giving the counts for each combination of categories of two qualitative variables.
The hypotheses:
H0: The two variables are independent of one another.
H1: The two variables are associated; they are not independent.
χ² Test of Independence
The assumptions:
1. The data set is a random sample from the population of interest.
2. The observations are independent.
3. The measurement classes (categories) are mutually exclusive.
4. The average expected cell frequency should be ≥ 5.

χ² Test of Independence: Procedure
1. Set the hypotheses:
   H0: There is no relationship between the two variables.
   H1: There is a relationship between the two variables.
2. Set up the contingency table (a table of the counts for each combination of categories of the two qualitative variables).
3. Compute the expected frequencies. The expected frequencies tell what the counts would have been, on average, if the variables were independent:
   Expected = (Row total × Column total) / Total sample size n, i.e. $E_{ij} = \frac{R_i \times C_j}{n}$
4. Check the conditions:
   A) All expected counts should be at least 1.
   B) At least 80% of the expected counts should be at least 5.
5. Calculate the test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
6. Determine the degrees of freedom: df = (r - 1)(c - 1), where r is the number of rows and c the number of columns.
7. Compare the calculated χ² with the tabulated (critical) value and make a decision. The test result is significant if the chi-squared statistic is larger than the critical value from the table.
If the chi-squared statistic is larger than the critical value from the table, the p-value is < α and the result is statistically significant: reject H0 and conclude that the two variables are related.
If the chi-squared statistic is smaller than the critical value from the table, the p-value is > α and the result is not statistically significant: we cannot reject H0, and we cannot conclude that the two variables are related.
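The procedure above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not a reference implementation: the table is a list of rows of observed counts, all helper names are hypothetical, and instead of looking up a printed table the sketch computes the p-value from the closed-form upper tail of the χ² distribution for integer degrees of freedom, which gives the same decision as comparing χ² with the critical value.

```python
import math


def expected_counts(table):
    """Expected count for each cell = row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def conditions_met(expected):
    """Every expected count >= 1 and at least 80% of them >= 5."""
    cells = [e for row in expected for e in row]
    return min(cells) >= 1 and sum(e >= 5 for e in cells) >= 0.8 * len(cells)


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-squared variable with integer df (closed forms)."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def independence_test(table, alpha=0.05):
    """Chi-squared test of independence for a list of rows of observed counts."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    p_value = chi2_upper_tail(stat, df)
    return {
        "statistic": stat,
        "df": df,
        "p_value": p_value,
        "conditions_met": conditions_met(expected),  # reported, not enforced
        "reject_h0": p_value < alpha,  # same decision as stat > critical value
    }


# Residence (rows) x smoking No/Yes (columns) from the detailed example
result = independence_test([[21, 65], [11, 130], [18, 198], [37, 345]])
print(round(result["statistic"], 3), result["df"], result["reject_h0"])  # 20.091 3 True
```

In practice, `scipy.stats.chi2_contingency(table, correction=False)` returns the same statistic, p-value, degrees of freedom and expected table; its default `correction=True` applies Yates' continuity correction, but only when df = 1 (2x2 tables).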
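χ² Distribution
The χ² distribution is positively (right) skewed. χ² values are never negative; the minimum is 0. A χ² close to 0 means the observed counts are close to the expected counts, i.e. the data are consistent with the variables being independent of one another.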
Detailed Example
Below are the results obtained from a random sample of college students, showing the distribution of the students according to their place of residence and whether they smoke cigarettes. Is there an association between place of residence and cigarette smoking?

Place of residence   Smoking: No   Smoking: Yes   Total
Big City                      21             65      86
Rural                         11            130     141
Small Town                    18            198     216
Suburban                      37            345     382
Total                         87            738     825
χ² Test (solution)
1. Set the hypotheses:
   H0: The two variables (place of residence and smoking) are independent, i.e. there is no relationship between them.
   H1: The two variables are related (there is a relationship between place of residence and smoking habit).
2. Compute the expected frequencies: multiply the row total by the column total, then divide by the grand total.
   Expected frequency = (Row total × Column total) / Grand total
   E1,1 = (R1 × C1)/n = (86 × 87)/825 = 9.07
   E1,2 = (R1 × C2)/n = (86 × 738)/825 = 76.93
Observed counts, with expected counts in parentheses:

Place of residence   No            Yes             All
Big City             21 (9.07)     65 (76.93)      86 (86.0)
Rural                11 (14.87)    130 (126.13)    141 (141.0)
Small Town           18 (22.78)    198 (193.22)    216 (216.0)
Suburban             37 (40.28)    345 (341.72)    382 (382.0)
All                  87 (87.0)     738 (738.0)     825 (825.0)
3. Calculate the chi-square statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(21-9.07)^2}{9.07} + \frac{(65-76.93)^2}{76.93} + \frac{(11-14.87)^2}{14.87} + \frac{(130-126.13)^2}{126.13} + \frac{(18-22.78)^2}{22.78} + \frac{(198-193.22)^2}{193.22} + \frac{(37-40.28)^2}{40.28} + \frac{(345-341.72)^2}{341.72} = 20.091$$
4. Degrees of freedom: df = (rows - 1)(columns - 1) = (4 - 1)(2 - 1) = 3.
Conclusion:
Compare the calculated χ² value with the critical value at the 95% level (7.815): 20.091 > 7.815, so p < 0.05.
Compare the calculated χ² value with the critical value at the 99% level (11.345): 20.091 > 11.345, so p < 0.01.
Reject H0 (p < 0.01). There is a relationship between the students' place of residence and their smoking habit.
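To see where the 20.091 comes from, the sketch below recomputes each cell's contribution (O - E)^2/E from the observed counts in the table and compares the total with the df = 3 critical values quoted above (7.815 and 11.345). Running it shows that the Big City row contributes most of the statistic (about 17.5 of the 20.091). The variable names are illustrative only.

```python
# Observed counts from the detailed example: place of residence -> (No, Yes)
observed = {
    "Big City": (21, 65),
    "Rural": (11, 130),
    "Small Town": (18, 198),
    "Suburban": (37, 345),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]  # [87, 738]
n = sum(col_totals)  # 825

chi2 = 0.0
row_contributions = {}
for place, row in observed.items():
    row_total = sum(row)
    row_contributions[place] = 0.0
    for label, o, c in zip(("No", "Yes"), row, col_totals):
        e = row_total * c / n             # expected count under H0
        term = (o - e) ** 2 / e           # this cell's (O - E)^2 / E
        row_contributions[place] += term
        chi2 += term
        print(f"{place:<10} {label:<3} O={o:>3} E={e:7.2f} (O-E)^2/E={term:6.3f}")

print(f"chi-squared = {chi2:.3f}")
# Critical values for df = 3 as quoted in the text
for alpha, critical in ((0.05, 7.815), (0.01, 11.345)):
    print(f"alpha = {alpha}: reject H0? {chi2 > critical}")
```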
χ² Test of Independence
1. It is based on counts or frequencies, not on percentages or proportions.
2. The expected frequency in each cell should not be less than 5.
3. It shows whether a relationship exists between two variables of interest.
4. It does not show the nature of the relationship.
5. It does not measure the strength of the association between two variables.
6. It does not show causality.

Testing Goodness-of-Fit
Testing whether population proportions are equal to known reference values: the chi-square test for equality of percentages or probabilities.
Could a table of observed counts reasonably have come from a population with known proportions (the reference values)?
The data: a table giving the frequency of each category of a single qualitative variable.
The hypotheses:
H0: The population proportions are equal to a set of known, fixed reference values.
H1: The population proportions are not equal to a set of known, fixed reference values.
The expected frequencies: for each category, multiply the population reference proportion by the sample size n, i.e. $E_j = n p_j$.
The assumptions:
1. The data set is a random sample from the population of interest.
2. At least 5 counts are expected in each category.
The chi-square statistic: $\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$, where k is the number of categories.
The degrees of freedom: the number of categories minus 1, df = k - 1.
The test result: significant if the chi-square statistic is larger than the critical value from the table.
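A minimal sketch of the goodness-of-fit calculation, assuming observed counts and reference proportions are given as lists in the same category order. `goodness_of_fit` is a hypothetical name, and the sketch only reports (does not enforce) the at-least-5-expected assumption.

```python
def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit statistic against known reference proportions.

    Returns (statistic, degrees of freedom, expected counts, assumption flag).
    """
    if len(observed) != len(proportions):
        raise ValueError("need one reference proportion per category")
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("reference proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]                  # E_j = n * p_j
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                   # categories minus 1
    at_least_5_expected = all(e >= 5 for e in expected)      # assumption check (flag only)
    return stat, df, expected, at_least_5_expected
```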
Example: Chi-Square Goodness of Fit Test
Are children's toothpaste brand preferences the same as those historically observed in the general population? (n = 400)
Brand   Known proportion (general population)   Children's frequency
1       20%                                     102
2       35%                                     121
3       30%                                     120
4       15%                                      57
H0: p1 = 0.20, p2 = 0.35, p3 = 0.30, p4 = 0.15
H1: Not all pj are as stated.
Brand   General population   Children's frequency   Expected frequency
1       20%                  102                     80
2       35%                  121                    140
3       30%                  120                    120
4       15%                   57                     60
df = k - 1 = 4 - 1 = 3
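The example stops at the degrees of freedom. Finishing the arithmetic with the goodness-of-fit statistic gives χ² ≈ 8.78; compared with the df = 3 critical values used earlier in this section (7.815 at α = 0.05, 11.345 at α = 0.01), this would be significant at the 5% level but not at the 1% level. A short sketch of the calculation:

```python
# Toothpaste example: children's counts and general-population proportions
observed = [102, 121, 120, 57]
proportions = [0.20, 0.35, 0.30, 0.15]
n = sum(observed)                                  # 400

expected = [n * p for p in proportions]            # about 80, 140, 120, 60
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)                                  # 6.05 + 2.579 + 0 + 0.15
df = len(observed) - 1                             # 4 - 1 = 3

print(f"chi-squared = {chi2:.3f}, df = {df}")
# Critical values for df = 3 quoted earlier in this section
for alpha, critical in ((0.05, 7.815), (0.01, 11.345)):
    print(f"alpha = {alpha}: reject H0? {chi2 > critical}")
```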
Special Case - Analyzing 2x2 tables
The general form of a 2x2 table is:

          Column 1   Column 2   Total
Row 1     A          B          R1
Row 2     C          D          R2
Total     C1         C2         n

In this case, the chi-square statistic has the following simplified form:
$$\chi^2 = \frac{n (AD - BC)^2}{R_1 R_2 C_1 C_2}$$
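A quick numerical check that the 2x2 shortcut agrees with the general sum of (O - E)^2/E. The counts A = 30, B = 20, C = 10, D = 40 are hypothetical, chosen only for illustration; both functions are illustrative helpers, not library calls.

```python
def chi2_general(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells of the 2x2 table [[A, B], [C, D]]."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]      # R1, R2
    col_totals = [a + c, b + d]      # C1, C2
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - e) ** 2 / e
    return total


def chi2_shortcut(a, b, c, d):
    """Simplified 2x2 form: n (AD - BC)^2 / (R1 R2 C1 C2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts; both values are 50/3 (about 16.67) up to rounding
print(chi2_general(30, 20, 10, 40), chi2_shortcut(30, 20, 10, 40))
```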
Relationship Between Chi-Squared and Z test of 2 Proportions
When do we use chi-squared, and when do we use the z test of 2 proportions?

Situation 1: Both categorical variables of interest have exactly 2 levels.
Question: Is there a relationship between the variables, or is there a difference in the proportions?
Answer: Either the chi-squared test or the two-tailed z test of 2 proportions; both lead to the same conclusion. In this case, the χ² statistic = (z statistic)², and the p-values of the two tests are equal.

Situation 2: Both categorical variables of interest have exactly 2 levels.
Question: Is one proportion greater (or smaller) than the other?
Answer: This is a one-tailed z test, and you must use a test of 2 proportions.

Situation 3: At least one of the two categorical variables of interest has more than 2 levels.
Question: Is there a relationship between the variables?
Answer: You must use a chi-squared test.
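A sketch illustrating Situations 1 and 2 with hypothetical data (30 of 50 vs 10 of 50; illustrative, not from the source). It assumes the pooled two-proportion z statistic without continuity correction; under that assumption z² equals the 2x2 χ², and the two-tailed z p-value equals the df = 1 χ² p-value, because the df = 1 upper tail is erfc(sqrt(x/2)). The helper names are hypothetical.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic, no continuity correction."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi2_2x2(a, b, c, d):
    """Shortcut chi-squared for the 2x2 table [[A, B], [C, D]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical data: group 1 has 30 "yes" out of 50, group 2 has 10 "yes" out of 50
z = pooled_z(30, 50, 10, 50)
chi2 = chi2_2x2(30, 20, 10, 40)                      # rows = groups, columns = yes / no

p_two_tailed_z = math.erfc(abs(z) / math.sqrt(2))    # Situation 1: two-tailed z test
p_chi2_df1 = math.erfc(math.sqrt(chi2 / 2))          # Situation 1: chi-squared, df = 1
p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))     # Situation 2: H1 is p1 > p2

print(z ** 2, chi2)
print(p_two_tailed_z, p_chi2_df1, p_one_tailed)
```

Only the z test yields the one-tailed p-value directly, which is why Situation 2 calls for the test of 2 proportions.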