Sunday, September 08, 2019

Cox proportional hazards regression model: univariate, multivariate, adjusted, stratified

Cox proportional hazards regression model has been called different names (Cox model, Cox regression model, Proportional hazards model, ... can be used interchangeably). The original paper by D.R. Cox "Regression models and life tables" is one of the most cited papers. Paired with the Kaplan-Meier method (and the log-rank test), the Cox proportional hazards model is the cornerstone for the survival analyses or all analyses with time to event endpoints.

With Cox regression model, we are now able to analyze the time to event data just like the linear regressions and logistic regressions to compare the treatment difference and to investigate the explanatory variables. The explanatory variables may be called independent variables, covariates,  confounding factors,...

Depending on whether the explanatory variables are continuous or categorical and depending on how the explanatory variables are used in the Cox regression model (as covariates, as stratification factors), Cox regression model can be fitted in different ways:

Univariate Cox regression
Unstratified and unadjusted
Multivariate Cox regression
Unstratified and adjusted
Stratified Cox regression
Stratified and unadjusted
Stratified Multivariate Cox regression
Stratified and adjusted

The variables used in adjusted Cox regression can be categorical or continuous, but the variables used in stratified Cox regression should be categorical.

The following are compiled from various sources listed below:



Univariate Cox Proportional Hazards Regression Model (Unstratified Unadjusted Proportional Hazards Regression Model)

The unstratified, unadjusted proportional hazards regression model is more commonly called Univariate Cox proportional hazards regression model and its assumptions are illustrated by:


h (.) hazard function as a function of time (relative to the start date), the patient’s
treatment and the unknown regression parameter
h0(t) unspecified baseline hazard function at time t
xj1 treatment for patient j coded as 0 (placebo group) or 1 (treatment group); this coding leads to a hazard ratio less than 1 if the estimated effect suggests a lower hazard in the treatment group relative to the placebo group
β1 unknown regression parameter for the treatment effect on the log-scale (the log hazard ratio).
The hazard ratio (HR) is a multiplicative constant λ1 = exp(β1) comparing the hazard function in the treatment group relative to the hazard function in the placebo group (identical to the baseline hazard function). A HR less than 1 indicates decreased hazard for the event of interest in the treatment group compared to the placebo group.

SAS PROC PHREG (with the TIES=EXACT option for the “exact” handling of ties) is used to estimate the hazard ratio based on the partial maximum likelihood function; a Wald-test based two-sided CI is requested by the RISKLIMITS option (additional options controlling the output may be added):
PROC PHREG DATA=dataset;
MODEL tte*cnsr(1)=treat / TIES=EXACT RISKLIMITS ALPHA=;
RUN;
* tte represents variable containing event/censoring times;
* cnsr represents censoring variable (0=event, 1=censored);
* treat represents treatment group variable (0=placebo, 1=treatment);

Stratified (Unadjusted) Proportional Hazards Regression

The stratified unadjusted proportional hazards regression model and its assumptions are described by:



h (.) hazard function as a function of time (relative to the start date), the patient’s
stratum and treatment and the unknown regression parameter
h0k(t) unspecified baseline hazard function for stratum k at time t
xjk1 treatment for patient j in stratum k coded as 0 (placebo group) or 1 (treatment group); this coding leads to a hazard ratio less than 1 if the estimated effect suggests a lower hazard in the treatment group relative to the placebo group
β1 unknown regression parameter for the treatment effect on the log-scale (the log hazard ratio).
The hazard ratio is a multiplicative constant λ1 = exp(β1) comparing the hazard functions in the treatment group relative to the hazard functions in the placebo group. The latter are allowed varying across strata. However, the hazard ratio is assumed to be common across strata. An HR less than 1 indicates decreased hazard for the event of interest in the treatment group compared to the placebo group.

SAS PROC PHREG (with the TIES=EXACT option for the “exact” handling of ties) is
used to estimate the hazard ratio based on the partial maximum likelihood function; a Wald-test based two-sided CI is requested by the RISKLIMITS option (additional options controlling the output may be added):
PROC PHREG DATA=dataset;
MODEL tte*cnsr(1)=treat / TIES=EXACT RISKLIMITS ALPHA=;
STRATA strat1 .. stratj;
RUN;
* tte represents variable containing event/censoring times;
* cnsr represents censoring variable (0=event, 1=censored);
* treat represents treatment group variable (0=placebo, 1=treatment);
* strat1 to stratj represent stratification variables;
Adjusted Cox Proportional Hazards Regression Model (including Univariate Cox Proportional Hazards Regression Model and Multivariate Cox Proportional Hazards Regression Model)
The purpose of the model is to evaluate the effect of a single factor (univariate) or simultaneously the effect of several factors (multivariate) on survival. In other words, it allows us to examine how specified factor(s) influence the rate of a particular event happening (e.g., infection, death) at a particular point in time. This rate is commonly referred as the hazard rate. Predictor variable(s) (or factor(s)) are usually termed covariates in the survival-analysis literature.
The Cox model is expressed by the hazard function denoted by h(t). Briefly, the hazard function can be interpreted as the risk of dying at time t. It can be estimated as follow:
where,
·         t represents the survival time
·         h(t) is the hazard function determined by a set of p covariates (x1,x2,...,xp)
·         the coefficients (b1,b2,...,bp) measure the impact (i.e., the effect size) of covariates.
·         the term h0 is called the baseline hazard. It corresponds to the value of the hazard if all the xi are equal to zero (the quantity exp(0) equals 1). The ‘t’ in h(t) reminds us that the hazard may vary over time.

The Cox model can be written as a multiple linear regression of the logarithm of the hazard on the variables xi, with the baseline hazard being an ‘intercept’ term that varies with time.

The quantities exp(bi) are called hazard ratios (HR). A value of bi greater than zero, or equivalently a hazard ratio greater than one, indicates that as the value of the ith covariate increases, the event hazard increases and thus the length of survival decreases.

Put another way, a hazard ratio above 1 indicates a covariate that is positively associated with the event probability, and thus negatively associated with the length of survival.
      Univariate Cox proportional hazards regression model
PROC PHREG DATA=dataset;
MODEL tte*cnsr(1)=treat /
TIES=EXACT RISKLIMITS ALPHA;RUN;

      Multivariate Cox proportional hazards regression model
PROC PHREG DATA=dataset;
MODEL tte*cnsr(1)=treat cov1 .. covk /
TIES=EXACT RISKLIMITS ALPHA;
RUN;
* tte represents variable containing event/censoring times;
* cnsr represents censoring variable (0=event, 1=censored);
* treat represents treatment group variable (0=placebo, 1=treatment);
* cov1 to covk represent covariates other than treatment group;

Stratified Multivariate Cox Regression (Stratified Adjusted Proportional Hazards Regression)

The stratified adjusted proportional hazards regression model and its assumptions are illustrated by:


h (.) hazard function as a function of time (relative to the start date), the patient’s
stratum and covariates, and the unknown vector of regression parameters
h0k(t) unspecified baseline hazard function for patients with covariate vector (0, .., 0) for stratum k at time t
xjk vector of Baseline covariates for patient j in stratum k, with xjk1 representing
treatment coded as 0 (placebo) or 1 (treatment)
β: unknown regression parameter vector on the log-scale (β1 represents the log hazard ratio for treatment).

The hazard ratio for the i-th covariate is a multiplicative constant exp(βi) comparing the hazard functions between the levels of the i-th covariate. Baseline hazard functions are allowed varying across strata without any restrictions. However, the hazard ratio comparing a covariate (including treatment) is assumed to be common across strata. The estimates of βi are adjusted for the other covariates in the model.

SAS PROC PHREG (with the TIES=EXACT option for the “exact” handling of ties) is
used to estimate the hazard ratios based on the partial maximum likelihood function; a Wald-test based two-sided CI is requested by the RISKLIMITS option (additional options controlling the output may be added):
PROC PHREG DATA=dataset;
MODEL tte*cnsr(1)=treat cov1 .. covk /
TIES=EXACT RISKLIMITS ALPHA;
STRATA strat1 .. stratj;
RUN;
* tte represents variable containing event/censoring times;
* cnsr represents censoring variable (0=event, 1=censored);
* treat represents treatment group variable (0=placebo, 1=treatment);
* cov1 to covk represent covariates other than treatment group;
* strat1 to stratj represent stratification variables;

2 comments:

Anonymous said...

Hi, Dr. Deng, in the multivariate cox model, do we have to define the categorical covariates in the CLASS statement?

Anonymous said...

it is always good to include the categorical predictor (or covariate) in the CLASS statement. It is even better to include the 'ref=' option to specify which category will be served as the comparator. See the example below:

https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_phreg_sect048.htm

it is also true that the results may be the same with or without a categorical predictor variable in the CLASS statement if the numeric values are used as categories.