Part 5: Multiplicity and Data Mining
Session Chair(s)
James Whitmore, PhD, MS
Vice President, Biometrics
Kite Pharma, United States
Multiple comparisons arise when, within a study, we want to compare multiple treatment groups on a single endpoint, compare two treatment groups on multiple endpoints, or compare two treatment groups on an endpoint at multiple evaluation time points. They can also result from conducting interim analyses or from comparing treatments within important subgroups that will affect the claim in the target package insert. More importantly, multiple comparisons can result from a combination of the above scenarios. In this session, we will examine the challenges associated with multiple comparisons and what one can do to address them.

“Data mining” has become a hot topic of late in many different fields. From telecommunications to retail sales to online merchandising to pharmaceuticals, researchers and decision makers are looking toward something new and exciting called “data mining” to take them to places where they have never been. However, data mining is nothing more than statistics: using data to answer questions. In this module, we will explore three areas that separate data mining exercises from controlled experiments. We will examine statistical techniques commonly used in data mining. We will also examine what can and cannot be done from a valid scientific perspective.
- Situations where multiple comparisons arise; why bother?
- The impact of multiple comparisons on the study-level false positive rate
- Common approaches to adjust for multiple comparisons
  - Bonferroni approach
  - Holm and Hochberg approach
  - Hierarchy (step-down, gatekeeping) approach
  - Hailperin-Ruger approach
  - Re-sampling approach
  - Dunnett’s and Hierarchy approach
- Multiple comparisons as a result of interim analysis
  - O’Brien and Fleming approach
  - Pocock and Haybittle-Peto approach
- Alternative approaches to controlling for multiplicity, including composite endpoints, global tests and re-randomization approaches
- Data mining versus controlled experiments
  - Primary vs secondary uses of data
  - Sizes of data sets
  - Disparate data sources
- Data wrangling
- “Supervised” learning (prediction) and “Unsupervised” learning
- Data mining of safety data
- Examples
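
To make the first few outline topics concrete, here is a minimal Python sketch of how the study-level false positive rate grows with the number of comparisons, and of the Bonferroni, Holm, and Hochberg adjustments. It is an illustration only, not course material: the function names and the p-values are hypothetical, and the false positive rate formula assumes the tests are independent and each is run at the same level.

```python
from typing import List


def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m tests.

    Assumes (for illustration) that the m tests are independent and
    that every null hypothesis is true, each tested at level alpha.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply each by m, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multipliers m, m-1, ..., 1; a running maximum keeps the
        # adjusted values monotone in the sorted order.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def hochberg(pvalues: List[float]) -> List[float]:
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        # Multipliers 1, 2, ..., m starting from the largest p-value;
        # a running minimum keeps the adjusted values monotone.
        candidate = min(1.0, (rank + 1) * pvalues[idx])
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four endpoints (made up for illustration).
    example = [0.01, 0.04, 0.03, 0.005]
    print("FWER for 10 tests at 0.05:", round(familywise_error_rate(0.05, 10), 4))
    print("Bonferroni:", bonferroni(example))
    print("Holm:      ", holm(example))
    print("Hochberg:  ", hochberg(example))
```

In this sketch a hypothesis is rejected when its adjusted p-value is at or below the chosen overall level. Holm is never less powerful than Bonferroni, and Hochberg is never less powerful than Holm, although Hochberg relies on additional assumptions about the dependence among the test statistics.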
Exercise