“Highly publicized cases of fabrication or falsification of data in clinical trials have occurred in recent years and it is likely that there are additional undetected or unreported cases. We review the available evidence on the incidence of data fraud in clinical trials, describe several prominent cases, present information on motivation and contributing factors and discuss cost-effective ways of early detection of data fraud as part of routine central statistical monitoring of data quality. Adoption of these clinical trial monitoring procedures can identify potential data fraud not detected by conventional on-site monitoring and can improve overall data quality.”

# Barbershop-Based Healthcare Study Lowers High Blood Pressure in African-American Men

“New England Journal of Medicine: Nearly 64% Reduced Their Blood Pressure to Healthy Levels After Barbers Promoted Follow-Up With Pharmacists in the Barbershops.

“African-American men lowered their high blood pressure to healthy levels when aided by a pharmacist and their barber, according to a new study from the Smidt Heart Institute.”

#### Source

Cedars Sinai: Barbershop-Based Healthcare Study Lowers High Blood Pressure in African-American Men

# MOOC: Learning From Data – Machine Learning

“This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications. ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applications. It enables computational systems to adaptively improve their performance with experience accumulated from the observed data. ML has become one of the hottest fields of study today, taken up by undergraduate and graduate students from 15 different majors at Caltech. This course balances theory and practice, and covers the mathematical as well as the heuristic aspects. The lectures below follow each other in a story-like fashion:

- What is learning?
- Can a machine learn?
- How to do it?
- How to do it well?
- Take-home lessons.

“The 18 lectures are about 60 minutes each plus Q&A.”

#### Source

Caltech: Learning From Data Machine Learning Course by Yaser S. Abu-Mostafa

Textbook: Learning From Data by Yaser S. Abu-Mostafa and Malik Magdon-Ismail

# The Age That Women Have Babies: How a Gap Divides America

#### Source

The New York Times: The Age That Women Have Babies: How a Gap Divides America by Quoctrung Bui and Claire Cain Miller

# What’s the Point? Centering Independent Variable on Mean in Regression Models

Centering continuous independent variables was one of the earliest lessons in my linear regression class. I was recently asked to explain the point of going through the trouble of centering. I was at a loss, and realized I had been assuming the answer was obvious when it was not.

After a quick Google search, I found an article by Karen Grace-Martin that explains the answer well. In short, centering is useful when interpreting the intercept is important. Her example involves the age at which infants speak their first word. Her original article is copied below.

## Should You Always Center a Predictor on the Mean?

by Karen Grace-Martin

Centering predictor variables is one of those simple but extremely useful practices that is easily overlooked.

It’s almost too simple.

Centering simply means subtracting a constant from every value of a variable. What it does is redefine the 0 point for that predictor to be whatever value you subtracted. It shifts the scale over, but retains the units.

The effect is that the slope between that predictor and the response variable doesn’t change at all. But the interpretation of the intercept does.

The intercept is just the mean of the response when all predictors = 0. So when 0 is out of the range of data, that value is meaningless. But when you center X so that a value within the dataset becomes 0, the intercept becomes the mean of Y at the value you centered on.
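A short worked equation may make this concrete. The notation here is assumed for illustration and is not from the original article: $b_0$ is the intercept, $b_1$ the slope, and $c$ the constant subtracted from $X$.

$$
\hat{Y} = b_0 + b_1 X = (b_0 + b_1 c) + b_1 (X - c)
$$

The slope $b_1$ is the same in both forms. Only the intercept changes, from $b_0$ (the fitted mean of $Y$ at $X = 0$) to $b_0 + b_1 c$ (the fitted mean of $Y$ at $X = c$).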

What’s the point? Who cares about interpreting the intercept?

It’s true. In many models, you’re not really interested in the intercept. In those models, there isn’t really a point, so don’t worry about it.

But, and there’s always a but, in many models interpreting the intercept becomes really, really important. So whether and where you center becomes important too.

A few examples include models with a dummy-coded predictor, models with a polynomial (curvature) term, and random slope models.

Let’s look more closely at one of these examples.

In models with a dummy-coded predictor, the intercept is the mean of Y for the reference category—the category numbered 0. If there’s also a continuous predictor in the model, X2, that intercept is the mean of Y for the reference category only when X2=0.

If 0 is a meaningful value for X2 and within the data set, then there’s no reason to center. But if neither is true, centering will help you interpret the intercept.

For example, let’s say you’re doing a study on language development in infants. X1, the dummy-coded categorical predictor, is whether the child is bilingual (X1=1) or monolingual (X1=0). X2 is the age in months when the child spoke their first word, and Y is the number of words in their vocabulary for their primary language at 24 months.

If we don’t center X2, the intercept in this model will be the mean number of words in the vocabulary of monolingual children who uttered their first word at birth (X2=0).

And since infants never speak at birth, it’s meaningless.

A better approach is to center age at some value that is actually in the range of the data. One option, often a good one, is to use the mean age of first spoken word of all children in the data set.

This would make the intercept the mean number of words in the vocabulary of monolingual children for those children who uttered their first word at the mean age that all children uttered their first word.

One problem is that the mean age at which infants utter their first word may differ from one sample to another. This means you’re not always evaluating that mean at the exact same age. It’s not comparable across samples.

So another option is to choose a meaningful value of age that is within the values in the data set. One example may be at 12 months.

Under this option the interpretation of the intercept is the mean number of words in the vocabulary of monolingual children for those children who uttered their first word at 12 months.

The exact value you center on doesn’t matter as long as it’s meaningful, holds the same meaning across samples, and is within the range of the data. You may find that choosing the lowest value or the highest value of age is the best option. It’s up to you to decide the age at which it’s most meaningful to interpret the intercept.
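The language-development example can be sketched in Python as a rough illustration. Everything here is an assumption made for demonstration: the data are simulated, the coefficients used to generate them are arbitrary, and the variable names (`bilingual`, `age_first_word`, `vocab`) and the helper `fit` are hypothetical. The model is fit by ordinary least squares with NumPy, once with raw age and once with age centered at 12 months.

```python
import numpy as np

# Simulated, made-up data purely for illustration (not from the article).
rng = np.random.default_rng(0)
n = 200
bilingual = rng.integers(0, 2, size=n)        # X1: 1 = bilingual, 0 = monolingual
age_first_word = rng.uniform(9, 18, size=n)   # X2: age in months at first word
vocab = (300 - 40 * bilingual - 12 * (age_first_word - 12)
         + rng.normal(0, 20, size=n))         # Y: vocabulary size at 24 months


def fit(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns [b0, b1, b2]."""
    X = np.column_stack([np.ones_like(x2), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


raw = fit(bilingual, age_first_word, vocab)            # age not centered
centered = fit(bilingual, age_first_word - 12, vocab)  # age centered at 12 months

print("raw      [b0, b1, b2]:", raw)
print("centered [b0, b1, b2]:", centered)
```

In this sketch the two slope coefficients agree between the fits, while the intercept moves from the fitted vocabulary of monolingual children at an age of 0 months to the fitted vocabulary of monolingual children who spoke their first word at 12 months.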

#### Source

The Analysis Factor: Should You Always Center a Predictor on the Mean? by Karen Grace-Martin

# 100 years of the FDA

“The 1906 pure food and drug act was set up to protect US citizens from unregulated and potentially harmful products. Implementing the regulation has presented the US Food and Drug Administration with many high-profile challenges, as Fiona Case finds out.”

#### Source

Chemistry World: 100 years of the FDA (2006) by Fiona Case

# Biostatistics vs. Lab Research

“How not to collaborate with a biostatistician. This is what happens when two people are speaking different research languages! My current workplace is nothing like this, but I think most biostatisticians have had some kind of similar experiences like this in the past!”

#### Source

YouTube: Biostatistics vs. Lab Research by JavaMama926

# Reporting Guidelines for Main Study Types

This site has reporting guidelines for all types of studies. These are checklists for writing all parts of a paper on these various study types.

CONSORT is for clinical trials and STROBE is for observational studies.

#### Source

Equator Network: Reporting guidelines for main study types

# Covariance vs. Correlation

Covariance and correlation are two statistical concepts that are closely related, both conceptually and by their name. The excerpts below are from a concise article that differentiates them.

## Difference Between Covariance and Correlation

“Correlation is a special case of covariance which can be obtained when the data is standardised. Now, when it comes to making a choice, which is a better measure of the relationship between two variables, correlation is preferred over covariance, because it remains unaffected by the change in location and scale, and can also be used to make a comparison between two pairs of variables.”
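For reference, the standard textbook definitions behind this statement can be written as follows. The notation is an assumption added here, not part of the excerpt: $\mu$ denotes a mean and $\sigma$ a standard deviation.

$$
\operatorname{cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big], \qquad
\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
$$

Dividing by $\sigma_X \sigma_Y$ is the standardisation step: it is the same as computing the covariance of the two variables after each has been rescaled to unit standard deviation.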

## Key Differences Between Covariance and Correlation

“The following points are noteworthy so far as the difference between covariance and correlation is concerned:

- “A measure used to indicate the extent to which two random variables change in tandem is known as covariance. A measure used to represent how strongly two random variables are related is known as correlation.
- “Covariance is nothing but a measure of correlation. On the contrary, correlation refers to the scaled form of covariance.
- “The value of correlation takes place between -1 and +1. Conversely, the value of covariance lies between -∞ and +∞.
- “Covariance is affected by the change in scale, i.e. if all the value of one variable is multiplied by a constant and all the value of another variable are multiplied, by a similar or different constant, then the covariance is changed. As against this, correlation is not influenced by the change in scale.
- “Correlation is dimensionless, i.e. it is a unit-free measure of the relationship between variables. Unlike covariance, where the value is obtained by the product of the units of the two variables.”
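The point about scale can be checked numerically. The snippet below is a minimal sketch using small made-up arrays and a hypothetical helper `cov_and_corr`. It rescales and shifts both variables with positive constants, then compares the sample covariance and the Pearson correlation before and after.

```python
import numpy as np

# Small made-up data set, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])


def cov_and_corr(a, b):
    """Return (sample covariance, Pearson correlation) of two 1-D arrays."""
    cov = np.cov(a, b)[0, 1]        # off-diagonal entry of the 2x2 covariance matrix
    corr = np.corrcoef(a, b)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
    return cov, corr


cov_raw, corr_raw = cov_and_corr(x, y)

# Change units: multiply x by 10 and y by 0.5, and shift both.
cov_scaled, corr_scaled = cov_and_corr(10 * x + 3, 0.5 * y - 7)

print("covariance : ", cov_raw, "->", cov_scaled)
print("correlation: ", corr_raw, "->", corr_scaled)
```

The covariance is multiplied by the product of the two scale factors (10 × 0.5 = 5), while the correlation is unchanged. Note that this holds for positive scale factors; multiplying one variable by a negative constant flips the sign of both measures.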