Prospectively Defined.

In the next few weeks Celyad (née Cardio3) will announce their results from CHART-1, the first of two phase 3 trials examining their cell therapy for the heart. Previous entries on this site have expressed skepticism about the chances of success for these trials, and that stance remains unchanged. However, with data release imminent, intellectual honesty requires that we set some prospective expectations for the dataset. Such expectations are particularly important for biotech investors, because companies can be very industrious and creative during data analyses.

To their credit, Celyad have published a paper prospectively defining their primary endpoint. They define the primary and secondary objectives:

 

The primary objective of CHART-1 ... is to evaluate the efficacy of cardiopoietic stem cells ... delivered using an endoventricular injection catheter (C-Cathez®) in comparison with a sham procedure on a hierarchical outcome comprising measures of mortality, morbidity, and changes in quality of life, 6MWT distance, functional capacity, and LV structure and function at 39 weeks (9 months) post-procedure.
The secondary objective is to assess safety by comparing the incidence of serious adverse events between study groups at 52 weeks (12 months) and all-cause mortality at 104 weeks (24 months) post-procedure.
Follow-up will occur at 4, 13, 26, 39, 52, and 104 weeks post-procedure. The primary efficacy endpoint is a Finkelstein-Schoenfeld hierarchical composite endpoint comprising all-cause mortality, worsening HF (WHF) events, and changes in Minnesota Living with Heart Failure Questionnaire (MLHFQ) score, 6MWT distance, LVESV, and LVEF by transoesophageal echocardiography (TTE) at 39 weeks.

 

The secondary objective of the trial will not be assessable at the time of the upcoming data analysis. Enrollment was completed in Q3 of 2015, and with the primary objective analysis due 9 months later (~Q2 of 2016), safety data at 12 and 24 months will not yet be available. That leaves us with the responsibility of setting prospective expectations for the primary objective. To provide some additional clarity on what exactly the primary endpoint is, we can add the description from the NIH to our notebook:

 

“Efficacy between groups post-index procedure [ Time Frame: 39 weeks post-index ]
Change between groups from baseline and 39 weeks in a hierarchical composite outcome comprising, from most to least severe outcome, days to death from any cause, number of worsening of heart failure events, change in score for the Minnesota Living with Heart Failure Questionnaire (MLHFQ) (10-point deterioration, no meaningful change, 10-point improvement), change in six-minute walk distance (40-m deterioration, no meaningful change, 40-m improvement) and change in left ventricular end systolic volume (15-mL deterioration, no meaningful change, 15-mL improvement), and left ventricular ejection fraction (4% absolute deterioration, no meaningful change, 4% absolute improvement).”

 

This is corroborated by the description in the published paper, wherein the parameters included in the hierarchical analysis are narrowed down more clearly:

  1. Mortality = days alive out of 39 weeks
  2. Number of WHF events: 0, 1, or ≥ 2
  3. MLHFQ ≥10 point improvement, no meaningful change, ≥10 point deterioration
  4. 6MWT ≥40 m improvement, no meaningful change
  5. LVESV ≥15 mL improvement, no meaningful change
  6. LVEF ≥4% absolute improvement, no meaningful change, ≥4% absolute deterioration
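
To make the hierarchy above concrete, here is a minimal sketch of the pairwise scoring idea behind a Finkelstein-Schoenfeld style analysis. It is an illustration under stated assumptions, not the company's analysis code: the function names, the -1/0/+1 coding of the soft endpoints, and the four example patients are all hypothetical, and the sketch ignores censoring and missing data, which a real analysis has to handle. Every patient is compared with every other patient, working down the list until one of the two is ahead; each patient's score is wins minus losses, and the test statistic is the sum of scores in the treated arm.

```python
import math

def as_key(days_alive, whf_events, mlhfq, walk, lvesv, lvef):
    """Build a tuple ordered from most to least severe outcome.

    Soft endpoints are coded -1 (deterioration), 0 (no meaningful change),
    +1 (improvement). WHF events are capped at 2 and negated so that a
    larger tuple always means a better outcome. (Hypothetical coding.)
    """
    return (days_alive, -min(whf_events, 2), mlhfq, walk, lvesv, lvef)

def compare(a, b):
    """+1 if a did better than b, -1 if worse, 0 if tied on every level.

    Python compares tuples element by element, which is exactly the
    hierarchical rule: later levels only matter when earlier ones tie.
    """
    return (a > b) - (a < b)

def fs_statistic(treated, control):
    """Return (scores, t, z) for the pooled pairwise comparison."""
    pooled = treated + control
    # Score of each patient: wins minus losses against everyone else.
    scores = [sum(compare(p, q) for q in pooled) for p in pooled]
    t = sum(scores[:len(treated)])
    n_t, n_c = len(treated), len(control)
    n = n_t + n_c
    # Permutation variance of t when treatment labels are exchangeable.
    var = n_t * n_c / (n * (n - 1)) * sum(u * u for u in scores)
    z = t / math.sqrt(var) if var > 0 else 0.0
    return scores, t, z

# Four made-up patients, 273 days = 39 weeks of follow-up.
treated = [as_key(273, 0, 1, 1, 1, 1), as_key(273, 1, 0, 0, 0, 0)]
control = [as_key(200, 2, -1, 0, 0, -1), as_key(273, 0, 0, 0, 0, 0)]
scores, t, z = fs_statistic(treated, control)
print(scores, t, round(z, 3))
```

Note how the first treated patient beats the second control patient only at the third level (MLHFQ), because the two tie on survival and WHF events. That is the mechanism by which the softer measures enter the result.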

Based on a method outlined by Finkelstein and Schoenfeld, the company aims to analyze the data hierarchically, giving priority to item 1, which is the most clinically relevant hard endpoint, while also taking input from the softer, longitudinal measures. The CHART-1 trial planners themselves note that these endpoints are not robust:

 

The relevance of the specified changes in MLHFQ, 6MWT, LVESV, and LVEF merits discussion. Some parameters, including the 6 MWT and MLHFQ, are subjective and may be particularly influenced by knowledge of the treatment assignment. The absolute change in MLHFQ score that would indicate a clinical meaningful outcome is not certain. The selected 10-point change excludes chance variability, and was associated with substantially increased risks of death and re-hospitalization in patients with advanced chronic HF followed for 18 months on average. Regarding the 6MWT, a 43 m improvement was found to be statistically significant in the COMPANION study and accompanied by reduced risks of death or HF rehospitalization at 6 months. We have therefore considered a 40 m change as meaningful. Lastly, LVESV and LVEF are more objective measures of response, which, when considered in conjunction with more subjective parameters, provide a clinically valuable readout of regenerative impact. However, when taken alone, in a patient population with baseline NYHA class II/III symptoms and LVESV of 200 mL, it has been suggested that a change of 10 mL is clinically meaningful, informing the CHART-1 design.

 

Whether this hierarchical measurement increases or decreases the chances of a statistically significant outcome is left for the reader to judge. Nonetheless, Finkelstein and Schoenfeld suggest caution when longitudinal measures are included:

 

If the true treatment effect is to improve a longitudinal measure but make survival worse, then a combined efficacy measure may lead to the conclusion that the treatment with worse survival is superior. The fact that mortality is included in the efficacy measure makes this less likely than an analysis that ignores mortality, but it can still happen, especially if the effect on the longitudinal measure is strong and occurs early in the trial.
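
A toy calculation shows how this can happen. The numbers below are entirely hypothetical and chosen only to illustrate the caution; they say nothing about what CHART-1 will show. The treated arm has twice as many deaths as the control arm, but every treated survivor reports a meaningful MLHFQ improvement, and the hierarchical pairwise comparison still comes out in favor of the treatment.

```python
FOLLOW_UP = 273  # 39 weeks, in days

def key(days_alive, whf_events=0, mlhfq=0, walk=0, lvesv=0, lvef=0):
    # Larger tuple = better outcome; compared level by level, mortality first.
    return (days_alive, -min(whf_events, 2), mlhfq, walk, lvesv, lvef)

# Hypothetical arms of 10 patients each.
# Treated: 2 deaths (day 100 and day 150), 8 survivors who improve on MLHFQ.
treated = [key(100), key(150)] + [key(FOLLOW_UP, mlhfq=1)] * 8
# Control: 1 death (day 120), 9 survivors with no meaningful change.
control = [key(120)] + [key(FOLLOW_UP)] * 9

# Compare every treated patient with every control patient.
wins = sum(t > c for t in treated for c in control)
losses = sum(t < c for t in treated for c in control)

print(f"treated wins: {wins}, treated losses: {losses}")
```

Out of 100 treated-versus-control pairs, the treated patient wins 81 and loses 19, even though mortality is worse in the treated arm. Most pairs involve two survivors, so the comparison falls through to the softer measures.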

 

All told, for me to consider CHART-1 a success, the results would have to meet these two criteria:

  1. A statistically significant improvement on the composite endpoint when analyzed on an intent-to-treat (ITT) basis and
  2. An accompanying statistically significant reduction in mortality on an ITT basis

The reason that I would like to see both the overall composite *and* the mortality endpoints improve is to ensure that the soft endpoints (#3-6 above) aren’t driving the outcome. Additionally, the analysis must be completed on an ITT basis.

 

The most important red flag to look for in the data release will be any mention of an analysis based on a per-protocol (PP) population rather than an ITT population. Per-protocol analyses tend to omit patients who were randomized but unable to undergo the procedure. In this case, those would primarily be patients from whom the cell preparation could not be reliably extracted and/or propagated for reinjection. There is a long history demonstrating the importance of ITT analyses over PP analyses, and this blog won’t try to rehash those lessons.
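
For readers who want to see the mechanics, here is a small sketch with made-up numbers (none of them come from CHART-1, and the record layout and function name are hypothetical). Both arms have identical mortality among everyone randomized, but dropping the randomized patients who never received cells makes the cell arm look better.

```python
# Hypothetical records: (arm, received_procedure, died_by_39_weeks)
records = (
    [("cell", True, False)] * 40 + [("cell", True, True)] * 4
    # Randomized to cells, but the cell product could not be prepared:
    + [("cell", False, True)] * 3 + [("cell", False, False)] * 3
    + [("sham", True, False)] * 43 + [("sham", True, True)] * 7
)

def mortality(records, arm, per_protocol=False):
    """Fraction of patients in `arm` who died.

    ITT (default) counts everyone randomized to the arm; per-protocol
    keeps only those who actually received the assigned procedure.
    """
    rows = [r for r in records
            if r[0] == arm and (r[1] or not per_protocol)]
    return sum(r[2] for r in rows) / len(rows)

for label, pp in (("ITT", False), ("PP", True)):
    cell = mortality(records, "cell", per_protocol=pp)
    sham = mortality(records, "sham", per_protocol=pp)
    print(f"{label}: cell {cell:.1%} vs sham {sham:.1%}")
```

On an ITT basis both arms show 14% mortality. On a per-protocol basis the cell arm drops to about 9% simply because three of its deaths were removed from the denominator along with the patients who could not be treated.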

An additional red flag will be isolated mentions of statistical significance for parameters 2 through 6 listed above, with no mention of (or only a trend in) parameter 1. Further still, if the primary endpoint is not explicitly disclosed and statistical significance is reported in a subgroup of the CHART-1 trial, that should be taken as an implicit sign that the primary endpoint has failed.

To readers, this may seem an unnecessarily skeptical view of a data release. As the company has prospectively published their analysis plan for CHART-1, it would seem that data release should be a simple matter of completing 9 months of follow-up on the last enrollee and publishing the results. However, the company recently noted that a former member of its board of directors, Prof. William Wijns, will oversee data analysis and dissemination. The company notes that Dr. Wijns is a co-founder of the company.

One has to ask why a prospectively defined statistical analysis plan needs a co-founder and ex-BOD member appointed to oversee its execution and dissemination. In any case, being armed with clear prospective expectations of what is and isn’t a success will allow an objective assessment of the technology.