Answer the questions below to the best of your ability. Type your responses in the space provided below each question.

Explain your answers thoroughly and precisely to demonstrate command of the material.


You will need access to Stata and the bitdata.dta file with its associated codebook. All statistical analysis must be done using Stata. The appropriate variable names appear in the exam questions.

I. Short Answer

1.  Imagine that you wanted to research the role of greed in whether individuals are likely to be prejudiced towards minority social groups in the United States. In a paragraph, explain how you might conduct either an experiment or an observational study of this research question. Briefly explain how you would set up the experiment or observational study, but spend most of your answer explaining the specific strengths and weaknesses of your approach (experiment or observational study) to the research question. Remember to include some of the issues we read about or discussed on this topic this semester, and apply them directly to your specific research approach.

II. Reading and Critically Thinking

Answer the two questions below based on this short article.

Access to Nature Trails Helps Combat Childhood Obesity, Research Shows

A new study might have found a solution to the growing problem of childhood obesity. The researchers found that counties with more non-motorized nature trails and forest lands have higher levels of youth activity and lower youth obesity, while counties with more nature preserves have lower activity levels.

“More non-motorized nature trails available for use in a particular county lead to an increase in the physical activity rates as well as lower youth obesity rates,” Sonja Wilhelm Stanis, an associate professor of recreation and tourism in the Missouri College of Agriculture, said in a press statement. “This was in contrast to counties with more nature preserves, which showed decreased levels of physical activity among youth, and parklands, which did not show any relationship with obesity levels and physical activity of youth. Overall, this research shows how local policymakers can impact the health of their youth through land-use decisions.”

For the study, researchers analyzed data from every county in Minnesota. They compared youth activity rates and youth obesity rates to the amount of public non-motorized nature trails, motorized nature trails, nature preserves, parklands and forest land available. The researchers also found that though public land was associated with higher activity rates, there was no association between parklands and activity levels or obesity rates.

2a.  What is one of the central causal claims of the article above? Be sure to identify clearly the independent (IV) and dependent (DV) variables.

2b. To what degree is the causal claim you identified above justified based on the evidence presented? Explain your answer with specific reference to the criteria/hurdles for causality that we learned this semester.

III. Multiple Choice

Select the one correct answer for the questions below.

3. ___ What does it mean to say that a relationship between two variables is spurious?

  1. The relationship is so complicated that research into the relationship is futile.
  2. The relationship that seems to be true in a bivariate examination is not, in fact, the true relationship.
  3. The relationship is causal.
  4. The relationship is not scientifically interesting.

4. ___ Imagine you want to conduct a bivariate test of significance. Your dependent variable is measured as a continuous variable, and your independent variable is measured as a binary variable. Assuming you do not want to recode the variables, which test do you conduct?

  1. Tabular analysis with chi-squared test of significance.
  2. Pearson’s r with a t-statistic.
  3. Difference of means with a t-test.
  4. Flux capacitor with a warp drive test.

5. ___ Which of the following accurately describes content validity for measuring a concept?

  1. Degree to which the measure is related to other measures that the theory requires them to be related to.
  2. Degree to which a measure contains all of the critical elements, that, as a group, define the concept we wish to measure.
  3. Extent to which applying the same measurement rules to the same case or observation will produce identical results.
  4. None of the above.

6. ___ Which of the following accurately describes a categorical variable?

  1. A one-unit increase in the variable value always means the same thing.
  2. Different values mean different things.
  3. The values are ordered from least to greatest.
  4. Both 2 and 3 are accurate.

7. ___ The central limit theorem states and implies:

  1. There is no difference between a random sample distribution of values (regardless of sample size) and the true population values.
  2. A sampling distribution for a variable is normally distributed only if the underlying population is distributed normally for that variable.
  3. Sampling distributions are often observed in real life.
  4. If one were to collect an infinite number of random samples and plot the resulting sample means, those sample means would be distributed normally around the true population mean.

IV. Interpreting Quantitative Research

A researcher hypothesizes that an individual's level of education is related to his or her support for open international trade. Based on a survey of 150 people, she finds the following frequencies:

Table 1: Surveyed Relationship between Education and Support for Free Trade as Frequency

                          Level of education
Attitude on free trade    Low    High    Total

8a.  Is there any evidence that education is related to support for free trade? Explain.

8b. What is the appropriate statistical test for evaluating hypotheses about the relationship between these variables in the population?

8c. A researcher conducts a statistical test to evaluate this bivariate relationship. The test yields a p-value of 0.077. What specifically does that p-value indicate? How do you interpret it in relation to the proposed hypothesis?

V. Applied Quantitative Research and Stata

Use the bitdata.dta dataset and codebook for questions 9 and 10. Paste any Stata code you use to answer these questions at the end of this exam sheet.
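A minimal sketch of a session setup is shown here. It assumes that bitdata.dta sits in Stata's current working directory; the log file name exam_replication.log is a hypothetical choice and not part of the exam materials. The variable names are the ones that appear in the questions below.

```stata
* Close any log left open from an earlier session (no error if none is open)
capture log close

* Record the session in a plain-text log; the file name is hypothetical
log using "exam_replication.log", replace text

* Load the exam dataset (assumed to be in the current working directory)
use "bitdata.dta", clear

* List variable names, storage types, and labels
describe

* Compact overview of the variables named in the exam questions
codebook fdi_host fdi_home min_govexp month_count minpolity log_gapgdppc comcol, compact

* ... analysis commands for questions 9 and 10 go here ...

log close
```

Keeping a log this way gives a record of the commands run, which can then be pasted into the replication section at the end of the exam sheet.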

9a. What is the unit of analysis in these data?

9b. Is the level of FDI inflows more heterogeneous (i.e., more variable) in host states (fdi_host) or home states (fdi_home)? Cite relevant statistics to justify your answer.

9c. Using an appropriate visual and two or three sentences, describe the data on government spending as a share of GDP (min_govexp).  Be sure to cite the most appropriate summary statistics for this variable.

[paste appropriate visual here]

[keep scrolling for final questions]

The variable fdi_home records FDI outflows (% GDP). The variable month_count records the number of months to ratify the BIT. The variables minpolity and log_gapgdppc record the minimum democracy score and the gap between signees in per capita GDP, respectively. The variable comcol records whether the countries share a common colonial heritage.

H0: There is no relationship between FDI outflows (%GDP) and months to ratification of the BIT.

HA: An increase in FDI outflows is associated with a decrease in the number of months to ratify the BIT.

10a. (25 pts) Use linear regression to estimate two models. First, estimate the effect of FDI outflows on the time to ratification (Model 1 below). Then estimate the same relationship conditional on colonial heritage, level of democracy and relative wealth (Model 2). Use the estimates to fill in the table below (round to the 2nd decimal place).

                           DV: Months to ratification
                           Model 1          Model 2
Independent Variables      Est     SE       Est     SE
FDI outflows (%GDP)
Democracy score
Relative GDP per capita
Common colonial tie

OLS estimates with standard errors. * p < 0.05, ** p < 0.01, *** p < 0.001

10b.  Interpret the results of the bivariate estimate (Model 1). Be sure to explain the estimate itself and interpret its significance. [Bonus points if you can also explain the results in plain language based on the actual units of the dependent variable.]

10c.  Interpret the effect of FDI outflows in Model 2. Again, present the estimate and interpret the result of the significance test.

10d. What are some potential reasons that the multivariate estimate is different from the bivariate estimate? Which model is a “better fit” (more of the variation in the outcome is explained) and how do you know?

REPLICATION FILE: paste below the Stata commands you used to complete the exam.