Writing the Introduction to an Epidemiology Paper

This is some brief guidance from my advisor on how to write the introduction section of a scientific epidemiology paper. When addressing previous papers in the introduction, do so only briefly; generally, save the thorough literature review for the discussion.


Paragraph 1

What is the public health or clinical importance of the topic? What is the primary problem that will be addressed? How many people will be affected? What level of impact does this problem have? Statistics from the World Health Organization are often cited here.


Paragraph 2

What is currently known about the problem?

For example, what has been published on health-related quality of life (HRQOL) in type 2 diabetes mellitus (T2DM) patients?

Briefly describe a variety of primary literature papers on the topic. State the knowledge gap that will be addressed by the rest of the paper.

Much is known about HRQOL in T2DM among White Americans, but no studies to date have described HRQOL in Pacific Islanders diagnosed with T2DM.

Address challenges unique to this study.

Are there variations in HRQOL perceptions among different cultures?


Paragraph 3

Clearly and concisely state the primary aim of this study.

For example, in the current analysis we will study the impact of T2DM on HRQOL in a population of Pacific Islanders living on Oahu, Hawaii.

Say something specific about the population being studied.

The Pacific Islander Cohort of Hawaiians is a longitudinal, population-based cohort that has been ongoing since 1999, with follow-up every 4 years.

Explain why this study is novel, and state what you are going to show.

Hemoglobin A1c (HbA1c) is a validated clinical measure of T2DM severity (citation here), and the SF-36 is a validated health questionnaire measuring HRQOL (citation here). To the best of our knowledge, this is the first study to examine a potential quantitative association between HbA1c and the SF-36 in a population-based cohort of Pacific Islanders.


Mendeley Reference Management

Mendeley is a convenient, free research tool for managing primary literature references. Its Citation Plugin lets you insert citations from your library directly into Microsoft Word while drafting scientific papers.

Online Population-Based Cohort Study

An Internet Survey in a Population-Based Cohort Study

“Consumption of ultra-processed foods and cancer risk: results from NutriNet-Santé prospective cohort” is a web-based study examining the association between ultra-processed food consumption and cancer risk among survey respondents in France. Population-based cohort studies were previously done by calling people’s landlines, asking them to fill out surveys, and requesting that they drive to the clinic for a health examination.

Perhaps future epidemiological studies will be conducted primarily through online surveys, as the authors did in this paper. That would make epidemiological studies much less expensive and more accessible. But the validity of the results has not yet been verified.

Using the internet selects for younger respondents, who may not be representative of the larger population. But as these generations age, internet-based data collection may become a more broadly useful tool.

The internet is an anonymous place, and it is difficult to characterize the population being studied when the World Wide Web is the only data collection vehicle. This may be a worthwhile sacrifice for the convenience of bypassing what has historically been the most arduous part of studying the public’s health.


Source

NCBI: Consumption of ultra-processed foods and cancer risk: results from NutriNet-Santé prospective cohort

The 7 Deadly Sins of Data Analysis

In her final lecture, my statistics professor described the “7 deadly sins” of statistics in cartoon form. Enjoy!


1. Correlation ≠ Causation

xkcd: Correlation

Dilbert: Correlation


2. Displaying Data Badly

xkcd: Convincing

Further reading on displaying data badly

The American Statistician: How to Display Data Badly by Howard Wainer

Johns Hopkins Bloomberg School of Public Health: How to Display Data Badly by Karl Broman


3. Failing to Assess Model Assumptions

DavidMLane.com: Statistics Cartoons by Ben Shabad


4. Over-Reliance on Hypothesis Testing

xkcd: Null Hypothesis

While we’re on the topic of hypothesis testing, don’t forget…

We can fail to reject the null hypothesis.

But we never accept the null hypothesis.
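To make that distinction concrete, here is a small sketch (my own illustration, assuming SciPy is installed; the lecture did not include code). A large p-value from a two-sample t-test means we fail to reject the null hypothesis of equal means; it is not evidence that the means are equal.

```python
# A toy illustration of "fail to reject" vs. "accept", assuming SciPy is installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.1, scale=1.0, size=50)  # tiny true difference

t_stat, p_value = stats.ttest_ind(group_a, group_b)
if p_value < 0.05:
    print(f"p = {p_value:.2f}: reject the null hypothesis of equal means.")
else:
    # A large p-value is absence of evidence, not evidence of absence.
    print(f"p = {p_value:.2f}: fail to reject the null hypothesis.")
```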


5. Drawing Inference from Biased Samples

Dilbert: Inferences


6. Data Dredging

If you try hard enough, eventually you can build a model that fits your data set.

Steve Moore: Got one

The key is to test the model on a new set of data, called a validation set. This can be done by splitting your data before building the model. Build the model using 80% of your original data, called the training set. Validate the model on the remaining 20% that you set aside at the beginning. Compare how the model performs on each of the two sets.

For example, let’s say you built a regression model on your training set (80% of the original data). Maybe it produces an R-squared value of 0.50, suggesting that your model explains 50% of the variation observed in the training set. In other words, the R-squared value is a way to assess how “good” the model is at describing the data, and at 50% it’s not that great.

Then, let’s say you try the model on the validation set (20% of the original data), and it produces an R-squared value of 0.25, suggesting your model explains only 25% of the variation observed in the validation set. The predictive ability of the model depends on which data set is used; it performs better on the training set (R-squared 0.50) than on the validation set (R-squared 0.25). This is called overfitting of the model to the training set. It gives the impression that the model is more accurate than it really is. The true ability of the model can only be assessed once it has been validated on new data.
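Here is a minimal sketch of that 80/20 workflow in Python, assuming scikit-learn is available. The data are synthetic and the numbers will differ from the hypothetical R-squared values above; only the comparison between the two scores matters.

```python
# A minimal sketch of the 80/20 train/validation split using scikit-learn.
# The data are synthetic; only the workflow matters.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 10 candidate predictors
y = X[:, 0] + rng.normal(size=200)      # only the first predictor is real

# Set aside 20% of the data before any model building.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# R-squared on the data the model has seen vs. data it has not.
print("Training R-squared:  ", round(model.score(X_train, y_train), 2))
print("Validation R-squared:", round(model.score(X_val, y_val), 2))
# A training R-squared well above the validation R-squared suggests overfitting.
```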


7. Extrapolating Beyond Range of Data

xkcd: Extrapolating


Similar Ideas Elsewhere

Columbia: “Lies, damned lies, and statistics”: the seven deadly sins

Child Neuropsychology: Statistical practices: the seven deadly sins

Annals of Plastic Surgery: The seven deadly sins of statistical analysis

Statistics Done Wrong


Sources

xkcd: Correlation

Dilbert: Correlation

xkcd: Convincing

The American Statistician: How to Display Data Badly by Howard Wainer

Johns Hopkins Bloomberg School of Public Health: How to Display Data Badly by Karl Broman

DavidMLane.com: Statistics Cartoons by Ben Shabad

xkcd: Null Hypothesis

Dilbert: Inferences

Steve Moore: Got one

Wiki: Overfitting

xkcd: Extrapolating

Columbia: “Lies, damned lies, and statistics”: the seven deadly sins

Child Neuropsychology: Statistical practices: the seven deadly sins

Annals of Plastic Surgery: The seven deadly sins of statistical analysis

Statistics Done Wrong

Fixed Effects vs Random Effects Models

What is a fixed effects model? What is a random effects model? What is the difference between them? Many people around me have been using these terms over and over in the past few weeks, so I finally compiled several 5-10 minute videos of people answering these questions well.

[Figure: fixed vs random effects models, from the Indian Journal of Dermatology]

If I had to answer the question of what fixed and random effects models are in one image, I would choose this one from the Indian Journal of Dermatology. Watch the videos and come back to this image for a quick reminder of these concepts.


Motivating Example: Meta-Analysis of Bieber Fever

This silly example is a simplistic demonstration of when fixed and random effects models should be used in designing a meta-analysis. This video is for the medical student and clinician.


Summary of Fixed and Random Effects Models

This summary video is a bit more technical and is aimed at a student of epidemiology or biostatistics.


What is Heterogeneity?

The concept of heterogeneity kept coming up in these videos. How is it different from random chance? This video clearly explains the difference and defines concepts alluded to in the previous videos.
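Since the videos are not embedded here, a small numerical sketch may help fix the ideas (my own illustration, not from the videos; the study estimates are made up). It pools hypothetical study effects with inverse-variance (fixed-effect) weights, then with DerSimonian-Laird random-effects weights, and reports Cochran’s Q and I-squared as measures of heterogeneity beyond chance.

```python
# Fixed-effect vs. DerSimonian-Laird random-effects pooling, with Cochran's Q and I^2.
# The study estimates and variances below are hypothetical, for illustration only.
import numpy as np

effects = np.array([0.30, 0.10, 0.55, 0.20, 0.45])     # per-study effect estimates
variances = np.array([0.01, 0.02, 0.015, 0.03, 0.01])  # per-study sampling variances

# Fixed-effect model: one true effect; weight each study by 1 / variance.
w = 1.0 / variances
fixed = np.sum(w * effects) / np.sum(w)

# Cochran's Q measures spread of study effects beyond sampling error.
q = np.sum(w * (effects - fixed) ** 2)
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100  # % of variation due to heterogeneity

# DerSimonian-Laird estimate of between-study variance tau^2.
tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects model: true effects vary; add tau^2 to each study's variance.
w_re = 1.0 / (variances + tau2)
random_eff = np.sum(w_re * effects) / np.sum(w_re)

print(f"Fixed-effect pooled estimate:   {fixed:.3f}")
print(f"Random-effects pooled estimate: {random_eff:.3f}")
print(f"Q = {q:.2f} on {df} df, I^2 = {i_squared:.0f}%, tau^2 = {tau2:.4f}")
```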


Sources

Indian Journal of Dermatology: Understanding and evaluating systematic reviews and meta-analyses

Brian Cohn: Fixed and Random Effects Models and Bieber Fever

Terry Shaneyfelt: Fixed Effects and Random Effects Models

Terry Shaneyfelt: What is Heterogeneity?

Inigo Montoya & OpenIntro Statistics

I do not think it means what you think it means.

Statistics Done Wrong mentions a couple of resources at the end of the book. One is a journal article by a sassy pediatric orthopedist who quotes Inigo Montoya, challenging people to understand p-values and to apply and interpret them correctly. The other is a free, open-source introductory statistics textbook, allowing anyone to learn about p-values and other statistical concepts.


Sources

Statistics Done Wrong

NCBI

OpenIntro Statistics
