Part of Ovarian Cancer Audit Feasibility Pilot (OCAFP) - Profile and treatment report
Appendix 6 - Statistical analysis for geographic variation in treatment
Descriptive statistics
The statistical significance of differences in the crude distribution of treatment groups by patient demographics and tumour characteristics was estimated using the chi-squared test.
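As a minimal sketch of the kind of test described above, the following computes a Pearson chi-squared test of independence on a table of counts. The counts, the function names `chi_squared_test` and `chi2_sf`, and the use of Python are illustrative assumptions only; the report's analyses were run in R and its actual code is not reproduced here.

```python
import math


def chi_squared_test(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi2_sf(statistic, df)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    z = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # ... then apply the incomplete-gamma recurrence up to df / 2
    for _ in range((df - 1) // 2):
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical counts: rows = two patient groups, columns = two treatment groups
counts = [[10, 20], [20, 10]]
statistic, df, p_value = chi_squared_test(counts)
print(f"chi-squared = {statistic:.3f}, df = {df}, p = {p_value:.4f}")
```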
Linear probability models
Each of the four binary treatment comparison groups detailed above was added as an outcome variable in a separate linear probability model. Covariates were then introduced as explanatory variables in three stages:
- Model 1: Cancer Alliance and patient age at diagnosis
- Model 2: as Model 1, plus adjustment for differences between Cancer Alliances in the distribution of tumour morphology and tumour stage
- Model 3: as Model 2, plus area deprivation and Charlson comorbidity score.
Linear probability models are equivalent to linear regression with a binary outcome, where standard errors, confidence intervals and p-values are adjusted for heteroskedasticity (residual variance that is not constant across observations, which arises because the outcome for each tumour takes only one of two values). A linear approximation of probabilities for a binary outcome is considered appropriate when probabilities fall between 0.2 and 0.8, the range within which the logistic function is approximately linear [1]. This requirement held for all models under study. Importantly, in contrast to logistic regression models, which are conventionally used to analyse binary outcome data, linear regression permits the direct comparison of estimates across nested models, allowing readers to assess the impact of adjustment as new covariates are introduced [2].
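To illustrate the idea of a linear probability model with heteroskedasticity-adjusted standard errors, the sketch below fits an ordinary least squares line to a binary outcome with a single covariate and computes a robust standard error for the slope. The choice of the HC0 (White) estimator, the function name `fit_lpm` and the data are assumptions made for illustration; the report does not state which robust estimator was used, and the real models included several covariates.

```python
import math


def fit_lpm(x, y):
    """Fit y = intercept + slope * x by least squares, with a robust slope SE.

    y is a 0/1 outcome. The standard error uses the HC0 (White) sandwich
    formula for simple regression, an assumption made for this sketch.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Each squared residual is weighted by its own leverage term, so no
    # common residual variance is assumed across observations.
    robust_var = sum(
        (xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, residuals)
    ) / sxx ** 2
    return intercept, slope, math.sqrt(robust_var)


# Hypothetical data: x is a binary covariate, y indicates treatment received.
# 3 of 10 tumours are treated when x = 0 and 6 of 10 when x = 1.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
intercept, slope, robust_se = fit_lpm(x, y)
print(f"slope = {100 * slope:.1f} percentage points, robust SE = {100 * robust_se:.1f}")
```

With a binary covariate, the slope is simply the difference between the two group proportions, which is why linear probability model estimates can be read directly as percentage-point differences.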
Weighted effect coding [3] was applied to each linear probability model such that, for each categorical variable, the estimates for its categories sum to zero when weighted by the number of observations in each category. Estimates are then interpretable as percentage-point deviations from the sample mean (that is, from the average probability for the tumour cohort, weighted according to the number of observations within each category reported by the respective model).
Estimates should therefore be interpreted as percentage-point differences from the national average.
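The interpretation of weighted effect coding can be sketched for the simplest case, a single categorical variable with no other covariates. In that case each weighted-effect-coded estimate equals the category's observed proportion minus the overall sample proportion. The category labels, counts and function name `weighted_effect_deviations` below are hypothetical, and the adjusted models in the report would not reduce to this simple form.

```python
def weighted_effect_deviations(counts):
    """Percentage-point deviation of each category from the overall sample mean.

    counts maps category -> (number of tumours, number treated).
    """
    total_n = sum(n for n, _ in counts.values())
    total_treated = sum(t for _, t in counts.values())
    overall = total_treated / total_n
    return {cat: 100.0 * (t / n - overall) for cat, (n, t) in counts.items()}


# Hypothetical counts for three areas of unequal size
example = {
    "Alliance A": (100, 60),
    "Alliance B": (50, 20),
    "Alliance C": (50, 30),
}
deviations = weighted_effect_deviations(example)

# Weighting each deviation by its category size gives a sum of zero,
# whereas the unweighted sum generally does not.
weighted_sum = sum(example[cat][0] * dev for cat, dev in deviations.items())
unweighted_sum = sum(deviations.values())

for cat, dev in deviations.items():
    print(f"{cat}: {dev:+.1f} percentage points")
print(f"weighted sum = {weighted_sum:.6f}, unweighted sum = {unweighted_sum:.1f}")
```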
Analyses were undertaken using R version 4.2.1.
Notes:
1. Zhao L, Chen Y, Schaffner DW. Comparison of logistic regression and linear regression in modeling percentage data. Appl Environ Microbiol. 2001;67(5):2129-35.
2. Breen R, Karlson KB, Holm A. Interpreting and Understanding Logits, Probits, and other Non-Linear Probability Models. Annu Rev Sociol. 2018;44:39-54.
3. Te Grotenhuis M, Pelzer B, Eisinga R, Nieuwenhuis R, Schmidt-Catran A, Konig R. When size matters: advantages of weighted effect coding in observational studies. Int J Public Health. 2017;62(1):163-167.
Funnel plots
For each binary treatment comparison group, Cancer Alliance estimates from Model 1 (age adjusted) and Model 3 (maximally adjusted) were extracted and presented on funnel plots. Each point on a funnel plot represents a Cancer Alliance. The standard error is shown on the horizontal axis and provides an indication of the number of tumours diagnosed within the Cancer Alliance. Estimates from Cancer Alliances with a greater number of tumours are more precise, appearing further to the right-hand side of the plot. Each Cancer Alliance is plotted as a point with a radius proportional to the inverse of its estimate’s standard error, so that larger points indicate Cancer Alliances with more tumours.
The percentage-point difference in the probability of treatment (overall or a particular combination) is shown on the vertical axis relative to the population average (all tumours combined). An estimate above the middle line suggests that tumours within that Cancer Alliance were more likely to receive treatment than the population average, while an estimate below the line indicates a lower probability.
Two pairs of dashed lines are included on each funnel plot that represent the bounds of statistical confidence around the average value. The inner set of dashed lines represents two standard deviations (SD) from the population average and the outer set represents three SD, being approximately equivalent to 95.0% and 99.7% confidence intervals, respectively. Any observation plotted outside of these dashed lines will have a confidence interval that does not include the average value and may therefore indicate a systematic deviation in clinical practice that warrants further investigation. However, some random variation in the probability of treatment is expected between regions such that some points will sit outside the dashed lines through chance alone. This should be taken into consideration when interpreting funnel plots (for example, five out of every 100 observations are likely to lie outside the two SD funnel).
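The funnel limits described above can be sketched as follows. This assumes that each limit is placed at the population average (zero on the vertical axis) plus or minus two or three times the standard error of the plotted estimate, and that the estimates are approximately normally distributed; the function names and example values are hypothetical.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k SD of its mean."""
    return math.erf(k / math.sqrt(2.0))


def funnel_limits(se, k):
    """Lower and upper funnel limits around an average of zero."""
    return -k * se, k * se


def classify(estimate, se):
    """Place an estimate relative to the two-SD and three-SD funnels."""
    z = abs(estimate) / se
    if z > 3:
        return "outside 3 SD"
    if z > 2:
        return "outside 2 SD"
    return "within 2 SD"


def point_radius(se, scale=1.0):
    """Plotting radius proportional to the inverse of the standard error."""
    return scale / se


# Hypothetical estimates (percentage points) and standard errors
points = {"Alliance A": (1.0, 2.0), "Alliance B": (5.0, 2.0), "Alliance C": (-7.0, 2.0)}
for name, (estimate, se) in points.items():
    low, high = funnel_limits(se, 2)
    print(f"{name}: {estimate:+.1f} pp, 2 SD limits ({low:+.1f}, {high:+.1f}), {classify(estimate, se)}")

print(f"coverage within 2 SD = {normal_coverage(2):.4f}, within 3 SD = {normal_coverage(3):.4f}")
```

Under the normality assumption, the coverage values come out close to the 95% and 99.7% figures quoted above, which is why a small number of points is expected to fall outside the inner funnel by chance.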
Last edited: 25 May 2023 11:52 am