
Example 6.8: Testing the Independence of Alcohol Dependence and Education Level in Young Adults (Ages 18–28) Using the NCS-R Data

Categorical Data Analysis

Table 6.6
Design-Based Analysis of the Association between NCS-R Alcohol Dependence and Education Level for Young Adults Aged 18–28

Alcohol Dependence Row Proportions (Linearized SE)

Education Level (Grades)    0 = No           1 = Yes          Total
0–11                        0.909 (0.029)    0.091 (0.029)    1.000
12                          0.951 (0.014)    0.049 (0.014)    1.000
13–15                       0.951 (0.010)    0.049 (0.010)    1.000
16+                         0.931 (0.014)    0.069 (0.014)    1.000
Total                       0.940 (0.009)    0.060 (0.009)    1.000

Tests of Independence

                            Unadjusted X2                        Rao–Scott F
Test statistic              $X^2_{Pearson} = 27.21$              $F_{R-S,Pearson} = 1.64$
Reference distribution      $P(\chi^2_3 > X^2_{Pearson})$        $P(F_{2.75,\,115.53} > F_{R-S})$
p-value                     p < 0.0001                           p = 0.18

$n_{18-28}$ = 1,275

Parameters of the Rao–Scott Design-Adjusted Test

Design df = 42        GDEFF = 6.62        a = 0.56

school, 3 = some college, 4 = college and above). The analysis is restricted to the

subpopulation of NCS-R Part II respondents 18–28 years of age. After identifying

the complex design features to Stata, we request the cross-tabulation analysis and

any related design-adjusted test statistics by using the svy: tab command:

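* Declare the complex design: cluster, sampling weight, and stratum variables
svyset seclustr [pweight = ncsrwtlg], strata(sestrat)

* Chained inequalities are not range tests in Stata, so both age bounds are stated explicitly
svy, subpop(if age >= 18 & age < 29): tab ed4cat ald, row se ci deff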

ED4CAT is specified as the row (factor) variable and ALD as the column

(response) variable. Weighted estimates of the row proportions are requested

using the row option. Table 6.6 summarizes the estimated row proportions and

standard errors for the ALD × ED4CAT crosstabulation along with the Rao–Scott

F-test of independence.

An estimated 9.1% of young adults in the lowest education group have been

diagnosed with alcohol dependence at some point in their lifetime (95% CI = 4.7%,

17.0%), while an estimated 6.9% of young adults in the highest education group

have been diagnosed with alcohol dependence (95% CI = 4.6%, 10.2%). By default,

Stata reports the standard uncorrected Pearson chi-square test statistic ($X^2_{Pearson}$ = 27.21, p < 0.0001) and then reports the (second-order) design-adjusted Rao–Scott F-test statistic ($F_{R-S,Pearson}$ = 1.64, p = 0.18) (see Table 6.5). The standard Pearson $X^2$ test rejects the null hypothesis of independence at α = 0.05; however, when the corrections for the complex sample design are introduced, the Rao–Scott design-adjusted test statistic fails to reject the null hypothesis of independence between education and a lifetime diagnosis of alcohol dependence in this younger population.

The appropriate inference in this case would thus be that there is no evidence of a

bivariate association between these two categorical factors in this subpopulation.

Multivariate analyses that include additional potential predictors of alcohol dependence could certainly be conducted at this point (see Chapter 8 for examples).
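
The two p-values discussed above can be checked directly from the test statistics and their reference distributions. The following is a minimal sketch using Stata's built-in upper-tail functions chi2tail() and Ftail(); the numeric inputs are copied from Table 6.6, and no survey data are needed to run it.

```stata
* Unadjusted Pearson statistic referred to a chi-square
* distribution with (4-1)*(2-1) = 3 degrees of freedom
display "Pearson p-value:   " chi2tail(3, 27.21)

* Rao-Scott F statistic referred to an F distribution with the
* fractional degrees of freedom reported in Table 6.6
display "Rao-Scott p-value: " Ftail(2.75, 115.53, 1.64)

* The denominator df is close to the numerator df times the design df
display "2.75 * 42 = " 2.75*42
```

The first value should fall far below 0.0001 and the second near 0.18, which is the contrast between the naive and design-adjusted inferences described in the text.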

© 2010 by Taylor and Francis Group, LLC


Applied Survey Data Analysis

We remind readers that Stata is using a second-order design correction for the

test statistic, which is why the results of these analyses may differ from those

found using other software packages (note the decimal degrees of freedom for

the design-adjusted F-statistic in Table 6.6, due to the second-order correction).

If a user specifies the deff option, Stata also reports both the mean generalized design effect (GDEFF = 6.63) used in the first-order correction and the

coefficient of variation of the generalized design effects (a = 0.56) used in the

second-order correction.

Additional test statistics, including design-adjusted likelihood ratio and Wald

test statistics, can be requested in Stata by using the lr and wald options for the

svy: tab command. These options do not lead to substantially different conclusions in this illustration and will generally not lead to different inferences about

associations between two categorical variables. As mentioned previously, Stata

developers advocate the use of the second-order design-adjusted Pearson chi-square statistic (or the Rao–Scott chi-square statistic and its F-transformed version)

in all situations involving crosstabulations of two categorical variables measured

in complex sample surveys (Sribney, 1998).
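
As a sketch of how these additional statistics might be requested, the Example 6.8 command can be rerun with the pearson, lr, and wald options listed together. This assumes the NCS-R analysis file containing the variables used above (seclustr, sestrat, ncsrwtlg, age, ed4cat, ald) is already in memory; it has not been run against the data here.

```stata
* Declare the complex design (same settings as in Example 6.8)
svyset seclustr [pweight = ncsrwtlg], strata(sestrat)

* Request the design-adjusted Pearson, likelihood ratio, and Wald tests
* for the education-by-alcohol-dependence table in the 18-28 subpopulation
svy, subpop(if age >= 18 & age < 29): tab ed4cat ald, row pearson lr wald
```

Listing the three options in one call places the competing test statistics side by side in the output, which makes it easy to confirm that they support the same conclusion.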

6.4.5 Odds Ratios and Relative Risks

The odds ratio, which we denote by ψ, can be used to quantify the association

between the levels of a response variable and a categorical factor. Figure 6.7

displays NCS-R weighted estimates (row proportions) of the prevalence of

one or more lifetime episodes of major depression by gender.

The odds ratio compares the odds that the response variable takes a specific value across two levels of the factor variable. If the response variable

is truly independent of the chosen factor, then ψ = 1.0. For example, from

Figure 6.7, the estimated male (A)/female (B) odds ratio for MDE is

$$\hat{\psi} = \frac{\text{Odds}(MDE = 1 \mid \text{Male})}{\text{Odds}(MDE = 1 \mid \text{Female})} = \frac{p_{1|A}/(1 - p_{1|A})}{p_{1|B}/(1 - p_{1|B})} = \frac{p_{1|A}/p_{0|A}}{p_{1|B}/p_{0|B}} = \frac{0.151/0.849}{0.230/0.770} = 0.595$$

SEX          MDE = 0                                         MDE = 1                                         Total
A: Male      $p_{0|A} = \hat{N}_{A0}/\hat{N}_{A+} = 0.849$   $p_{1|A} = \hat{N}_{A1}/\hat{N}_{A+} = 0.151$   $p_{A+} = 1.0$
B: Female    $p_{0|B} = \hat{N}_{B0}/\hat{N}_{B+} = 0.770$   $p_{1|B} = \hat{N}_{B1}/\hat{N}_{B+} = 0.230$   $p_{B+} = 1.0$

Figure 6.7
Estimates of row proportions for MDE by gender.


Note that although this estimate of ψ is computed using the estimated row

proportions for the SEX × MDE table, the same estimate would be obtained

if the estimated total proportions had been used (Table 6.4):

$$\hat{\psi} = \frac{p_{A1}/p_{A0}}{p_{B1}/p_{B0}} = \frac{0.072/0.407}{0.120/0.402} = 0.595$$

Since this odds ratio is estimated with no additional controls for other

factors such as age or education, it is labeled as an unadjusted odds ratio.

Note that a correct description of this result is the following: “The odds that

adult men experience major depression in their lifetime are estimated to be

only 59.5% as large as the odds for women.” A common mistake in reporting

results for estimated odds ratios is to make a statement like the following:

“The probability that a man experiences an episode of major depression in

their lifetime is 59% of that for women.”

The latter statement is confusing the odds ratio statistic with a related, yet

different, comparative measure, the relative risk (computed here using the

estimates in Table 6.4):

$$\widehat{RR} = \frac{\text{Prob}(MDE = 1 \mid \text{Male})}{\text{Prob}(MDE = 1 \mid \text{Female})} = \frac{p_{1|A}}{p_{1|B}} = \frac{0.151}{0.230} = 0.656$$

The relative risk is the ratio of two conditional probabilities: the probability of MDE for males and the probability of MDE for females. Although

both the odds ratio and the relative risk measure the association of a categorical response and a factor variable, they should be distinguished. Only

in instances where the prevalence of the response of interest is very small for

all levels of the factor (i.e., $p_{1|A}$ and $p_{1|B}$ < 0.01) will the odds ratio and relative

risk statistics converge to similar numerical values.
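
To make the distinction concrete, both measures can be computed from the same two row proportions. This is a minimal sketch using Stata scalars; the scalar names (p1A, p1B, psi, rr) are illustrative choices, and the inputs are the rounded estimates shown in Figure 6.7.

```stata
* Estimated row proportions of lifetime MDE from Figure 6.7
* p1A = Prob(MDE = 1 | Male), p1B = Prob(MDE = 1 | Female)
scalar p1A = 0.151
scalar p1B = 0.230

* Odds ratio: male odds of MDE divided by female odds of MDE
scalar psi = (p1A/(1 - p1A)) / (p1B/(1 - p1B))

* Relative risk: ratio of the two conditional probabilities
scalar rr = p1A/p1B

display "Odds ratio    = " %6.4f psi
display "Relative risk = " %6.4f rr
```

The two results should agree with the values in the text up to rounding of the inputs, and the gap between them shows why the odds ratio should not be reported as a ratio of probabilities when the outcome is this common.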

If the response and factor variables are independent, then

ψ = 1.0 (and RR = 1.0). Therefore, to test if categorical response and factor

variables are independent, it would be reasonable to construct a confidence

interval of the form $\hat{\psi} \pm t_{1-\alpha/2,\,df} \cdot se(\hat{\psi})$ and establish whether the null value of ψ = 1 is contained within the interval. Although a Taylor series linearization (TSL) approximation to $se(\hat{\psi})$ can be derived directly, a CI for ψ is generally obtained from the technique of simple logistic regression.

6.4.6 Simple Logistic Regression to Estimate the Odds Ratio

Logistic regression for binary dependent variables will be covered in depth

in Chapter 8. Here, the logit function and simple logistic regression models

are briefly introduced to demonstrate their application to estimation of the

unadjusted odds ratio and its confidence interval.


The natural logarithm of the odds is termed a logit function. Again, using

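the NCS-R MDE example in Table 6.4, the logits of the probabilities of MDE for the male and female factor levels are

$$\text{logit}(p_{1|A}) = \ln(\text{Odds}(MDE = 1 \mid \text{Male})) = \ln\left(\frac{p_{1|A}}{1 - p_{1|A}}\right) = \ln\left(\frac{0.151}{0.849}\right) = -1.727$$

$$\text{logit}(p_{1|B}) = \ln(\text{Odds}(MDE = 1 \mid \text{Female})) = \ln\left(\frac{p_{1|B}}{1 - p_{1|B}}\right) = \ln\left(\frac{0.230}{0.770}\right) = -1.208$$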

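Consider a single indicator variable, $I_{male}$, coded 1 = male and 0 = female, that distinguishes the two levels of SEX. The outcome MDE is coded 1 = yes, 0 = no. A simple logistic regression model for these data is written as follows:

$$\text{logit}(\text{Prob}(MDE = 1)) = \beta_0 + \beta_1 \cdot I_{male}$$

Evaluating the fitted model at $I_{male} = 1$ (males, level A) and $I_{male} = 0$ (females, level B) expresses the estimated odds ratio in terms of the slope:

$$\hat{\psi} = \frac{p_{1|A}/(1 - p_{1|A})}{p_{1|B}/(1 - p_{1|B})} = \frac{\exp(\text{logit}(p_{1|A}))}{\exp(\text{logit}(p_{1|B}))} = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1 \cdot 1)}{\exp(\hat{\beta}_0 + \hat{\beta}_1 \cdot 0)} = \exp(\hat{\beta}_1)$$

Then, we can derive the following result:

$$CI(\psi) = \left(\exp\left(\hat{\beta}_1 - t_{1-\alpha/2,\,df} \cdot se(\hat{\beta}_1)\right),\ \exp\left(\hat{\beta}_1 + t_{1-\alpha/2,\,df} \cdot se(\hat{\beta}_1)\right)\right)$$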

The resulting confidence interval is not symmetric about the estimated

odds ratio but has been shown to provide more accurate coverage of the true

population value for a specified level of Type I error (α).
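
The link between the two logits and the odds ratio can be verified numerically. The sketch below relies on Stata's built-in logit() function, which returns ln(p/(1 − p)); the scalar names are illustrative, and the inputs are the rounded row proportions for males (0.151) and females (0.230).

```stata
* Logits of the estimated MDE probabilities for males (A) and females (B)
scalar logitA = logit(0.151)
scalar logitB = logit(0.230)
display "logit(p1|A) = " %7.3f logitA
display "logit(p1|B) = " %7.3f logitB

* With the indicator coded 1 = male, 0 = female, the slope equals the
* difference of the two logits, and exponentiating it gives the odds ratio
scalar b1 = logitA - logitB
display "b1      = " %7.4f b1
display "exp(b1) = " %7.4f exp(b1)
```

Because the model has one binary predictor, it reproduces the two group logits exactly, so exp(b1) here should match the direct odds ratio calculation up to rounding.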

Example 6.9: Simple Logistic Regression to Estimate the NCS-R Male/Female Odds Ratio for Lifetime Major Depressive Episode

As mentioned previously, logistic regression will be covered in detail in later chapters. Here, a simple logistic regression of the NCS-R MDE variable on the indicator

of male gender (SEXM) is used to illustrate the technique for estimating the unadjusted Male/Female odds ratio for MDE and a 95% CI for that odds ratio:

svyset seclustr [pweight = ncsrwtsh], strata(sestrat)

svy: logistic mde sexm

From the output provided by the svy: logistic command, the estimated

odds ratio and a 95% CI for the population odds ratio are as follows:

$\hat{\psi}_{MDE}$ (SE)      $CI_{.95}(\psi)$
0.597 (0.041)                (0.520, 0.685)

Based on this analysis, the odds that an adult male has experienced a lifetime

MDE are only 59.7% as large as the odds of MDE for adult females, which agrees

(allowing for some rounding error) with the simple direct calculation. Since the

95% CI does not include ψ = 1, we would reject the null hypothesis that MDE

status is independent of gender.
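
The asymmetric interval reported above can be approximately rebuilt on the log-odds scale from the published estimate and standard error. This is only a sketch under two assumptions that are not stated in the example: that $se(\hat{\beta}_1)$ can be approximated by the delta-method ratio $se(\hat{\psi})/\hat{\psi}$, and that the 42 design degrees of freedom listed in Table 6.6 also apply to this full-sample analysis.

```stata
* Reported odds ratio and its standard error from Example 6.9
scalar psi    = 0.597
scalar se_psi = 0.041

* Assumption: delta-method approximation se(b1) ~ se(psi)/psi
scalar b1    = ln(psi)
scalar se_b1 = se_psi/psi

* Assumption: 42 design degrees of freedom, as listed in Table 6.6
scalar tcrit = invttail(42, 0.025)

* Build the interval on the log-odds scale, then exponentiate the limits
display "Lower limit = " %5.3f exp(b1 - tcrit*se_b1)
display "Upper limit = " %5.3f exp(b1 + tcrit*se_b1)
```

The limits should land close to the reported (0.520, 0.685); any small difference in the third decimal reflects rounding of the published inputs.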

6.4.7 Bivariate Graphical Analysis

Graphical displays also are useful tools to describe the bivariate distribution

of two categorical variables. The following Stata graphics command generates gender-specific vertical bar charts for the BP_CAT variable generated

in Example 6.3 (note that the pweight option is used to specify the survey

weights, and the over() option is used to generate a plot for each level of

gender). The output is shown in Figure 6.8.

* Weighted, gender-specific bar chart of blood pressure status
* (run from a do-file; /// continues a command onto the next line)
graph bar (mean) bp_cat1 bp_cat2 bp_cat3 bp_cat4 ///
    [pweight=wtmec2yr] if age18p==1, ///
    blabel(bar, format(%9.1f) color(none)) ///
    bar(1, color(gs12)) bar(2, color(gs4)) ///
    bar(3, color(gs8)) bar(4, color(black)) ///
    bargap(7) scheme(s2mono) over(riagendr) percentages ///
    legend(label(1 "Normal") label(2 "Pre-Hypertensive") ///
    label(3 "Stage 1 Hypertensive") label(4 "Stage 2 Hypertensive")) ///
    ytitle("Percentage")

[Figure 6.8: vertical bar chart. Y-axis: Percentage (0 to 60). Groups: Male, Female. Bars within each group: Normal, Pre-Hypertensive, Stage 1 Hypertensive, Stage 2 Hypertensive. Bar labels shown in the plot: 53.7%, 49.9%, 40.0%, 34.4%, 8.9%, 8.4%, 3.0%, 1.7%.]

Figure 6.8
Bar chart of the estimated distribution of blood pressure status of U.S. adult men and women. (Modified from the 2005–2006 NHANES data.)
