Analysis of cohort studies with Stata and R
Compared to other study designs, cohort studies are somewhat more difficult to analyze because of the presence of time-dependent variables, including calendar period, age, time since first exposure, duration of exposure, and cumulative exposure. Since individuals typically pass through several categories of these variables during follow-up, the primary pre-analytic step is to allocate person-time at risk, and the events of interest, to the correct categories of those variables. This usually amounts to creating a new dataset with many records per subject. Statistical analysis then proceeds by calculating one or more frequency and association measures, including incidence or mortality rates, rate ratios (RR) with Poisson regression models, standardized mortality ratios (SMR), or hazard ratios (HR) with Cox regression models. When calculating cumulative exposure (lagged or unlagged, or within a specified time window), an added complication is the need to link individual data with external exposure data (exposure matrices).
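To make the person-time allocation step concrete, here is a minimal sketch of splitting one subject's follow-up into one record per age band. It is written in Python only to show the logic; it is not the Stata or R command taught in the course. The function name `split_by_age`, the record layout, the 365.25-day year, and the example subject are all assumptions made for illustration.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed conversion from days to person-years


def add_years(d, years):
    """Return the date `years` whole years after d (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def split_by_age(subject_id, birth, entry, exit_, event, age_cuts):
    """Split follow-up [entry, exit_) into one record per age band.

    age_cuts holds the lower limits of the age bands in ascending order;
    the last band is open-ended. Follow-up before the first cut is dropped.
    """
    # Birthdays at which each age band starts, as calendar dates.
    bounds = [add_years(birth, a) for a in age_cuts]
    records = []
    for i, lo in enumerate(bounds):
        hi = bounds[i + 1] if i + 1 < len(bounds) else date.max
        start, stop = max(entry, lo), min(exit_, hi)
        if start >= stop:
            continue  # the subject spent no follow-up time in this band
        records.append({
            "id": subject_id,
            "age_band": age_cuts[i],
            "start": start,
            "stop": stop,
            "pyrs": (stop - start).days / DAYS_PER_YEAR,
            # The event is assigned only to the record that ends follow-up.
            "event": int(event and stop == exit_),
        })
    return records


# Hypothetical subject: born 1950, followed from 1995 until an event in 2004.
records = split_by_age(1, date(1950, 3, 15), date(1995, 1, 1),
                       date(2004, 6, 30), True, [40, 50, 60])
for r in records:
    print(r["age_band"], r["start"], r["stop"], round(r["pyrs"], 2), r["event"])
```

The subject contributes two records, one for ages 40 to 49 and one for ages 50 to 59, and the event is counted only in the second. Summing `pyrs` and `event` over all subjects within each band gives the denominators and numerators for rates. Splitting on a second time scale, such as calendar period, would repeat the same step on each resulting record.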
The purposes of the course are to give:
1) a theoretical overview of the analysis of cohort data;
2) practical guidance on how to carry it out with available software.
Learning objectives are:
1) to deal with time-dependent variables, with a focus on exposure matrices and cumulative exposure;
2) to work with time and dates in statistical software;
3) to calculate person-time at risk correctly, avoiding mistakes such as counting immortal (immune) person-time;
4) to become familiar with specific commands for person-time calculation and cohort analysis available in statistical software such as Stata and R.
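
As an illustration of the first objective, the following sketch links a work history to an exposure matrix and computes lagged cumulative exposure. It is a simplified Python example under stated assumptions, not the course's Stata or R code: time is handled in whole calendar years, the matrix is keyed by job and year, job-years missing from the matrix count as zero exposure, and the function name `cumulative_exposure`, the job titles, and the intensity values are hypothetical.

```python
def cumulative_exposure(job_history, exposure_matrix, at_year, lag=0):
    """Cumulative exposure (intensity x years) accrued before `at_year - lag`.

    job_history: list of (job, start_year, stop_year), stop_year exclusive.
    exposure_matrix: dict mapping (job, calendar_year) -> exposure intensity.
    """
    cutoff = at_year - lag  # exposure from this year onward is ignored
    total = 0.0
    for job, start, stop in job_history:
        for year in range(start, min(stop, cutoff)):
            # Assumption: a job-year absent from the matrix is unexposed.
            total += exposure_matrix.get((job, year), 0.0)
    return total


# Hypothetical exposure matrix: intensity for welders falls after 1984.
exposure_matrix = {}
for year in range(1980, 1985):
    exposure_matrix[("welder", year)] = 2.0
for year in range(1985, 1990):
    exposure_matrix[("welder", year)] = 1.0

# Hypothetical work history: welder 1980-1987, then clerk (not in the matrix).
history = [("welder", 1980, 1988), ("clerk", 1988, 1995)]

print(cumulative_exposure(history, exposure_matrix, at_year=1990))         # 13.0
print(cumulative_exposure(history, exposure_matrix, at_year=1990, lag=5))  # 10.0
```

Because cumulative exposure changes during follow-up, it would be evaluated at the start of each person-time record rather than once per subject, so that person-time is assigned to the exposure category the subject was actually in at that moment.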
