
A Systematic Review of Outcome Reporting, Definition and Measurement Heterogeneity in Non-Muscle Invasive Bladder Cancer Effectiveness Trials of Adjuvant, Prophylactic Treatment After Transurethral Resection

Abstract:

BACKGROUND:

Heterogeneous outcome reporting in non-muscle-invasive bladder cancer (NMIBC) effectiveness trials of adjuvant treatment after transurethral resection of bladder tumour (TURBT) has been noted in systematic reviews (SRs). This hinders comparing results across trials, combining them in meta-analyses, and evidence-based decision-making for patients and clinicians.

OBJECTIVE:

We aimed to systematically review the extent of reporting and definition heterogeneity.

METHODS:

We included randomized controlled trials (RCTs) identified from SRs comparing adjuvant treatments after TURBT or TURBT alone in patients with NMIBC (with or without carcinoma in situ) published between 2000–2020. Abstracts and full texts were screened independently by two reviewers. Data were extracted by one reviewer and checked by another.

RESULTS:

We screened 807 abstracts; from 15 SRs, 57 RCTs were included. Verbatim outcome names were coded to standard outcome names and organised using the Williamson and Clarke taxonomy. Recurrence (98%), progression (74%), treatment response (in CIS studies) (40%), and adverse events (77%) were frequently reported across studies. However, overall (33%) and cancer-specific (33%) survival, treatment completion (17%) and treatment change (37%) were less often reported. Quality of Life (3%) and economic outcomes (2%) were rarely reported. Heterogeneity was evident throughout, particularly in the definitions of progression and recurrence, and how CIS patients were handled in the analysis of studies with predominantly papillary patients, highlighting further issues with the definition of recurrence and progression vs treatment response for CIS patients. Data reporting was also inconsistent, with some trials reporting event rates at various time-points and others reporting time-to-event with or without Hazard Ratios. Adverse events were inconsistently reported. QoL data was absent in most trials.

CONCLUSIONS:

Heterogeneous outcome reporting is evident in NMIBC effectiveness trials. This has profound implications for meta-analyses, SRs and evidence-based treatment decisions. A core outcome set is required to reduce heterogeneity.

PATIENT SUMMARY:

This systematic review found inconsistencies in outcome definitions and reporting, pointing out the urgent need for a core outcome set to help improve evidence-based treatment decisions.

INTRODUCTION

Description of the condition

Bladder cancer is the sixth most common cancer in men and the 17th most common in women globally, with the highest incidence rates observed in Europe and North America [1]. The disease is categorised into two broad stage groupings, non-muscle invasive (NMIBC) and muscle-invasive (MIBC) bladder cancer. Most cases (75–85%) present as NMIBC, and these patients typically have higher long-term survival and lower cancer-specific mortality than those with MIBC [2].

NMIBC is defined as tumour(s) confined to the mucosa or invading the lamina propria [3]. Using the TNM staging system, they are classified as Ta-T1 or Tis (or CIS) N0 M0 [4]. NMIBC tumours may be graded using the WHO 1973 or WHO 2004 grading systems, both indicating worse prognosis with increasing grade. Most patients diagnosed with NMIBC are initially treated conservatively (sparing the bladder) with curative intent by transurethral resection of bladder tumour (TURBT). NMIBC is seen as a chronic disease requiring frequent follow-up and repeated TURBTs, making it the most expensive of all cancers to treat from diagnosis to death [5–8] with additional productivity losses and informal care costs [9]. Cumulative costs of care are especially high in intermediate- and high-risk NMIBC due to higher risk of progression to MIBC requiring definitive treatment [7].

Given the high recurrence rates and the risk of progression to MIBC, NMIBC treatment usually involves adjuvant intravesical instillations with chemotherapy or immunotherapy. The timing, treatment duration, and choice of agent for intravesical therapy are guided by a risk categorisation system based on clinical and pathological factors [3]. For instance, evidence from high-quality systematic reviews and meta-analyses shows that a single immediate post-operative instillation of chemotherapy (IPOIC) is well tolerated and clinically effective in reducing recurrences in low-risk patients [10–12]. The European Association of Urology (EAU) [3] and the National Institute for Health and Care Excellence (NICE) [13] both recommend that eligible patients receive IPOIC. It is considered cost-effective for the NHS [13]. Intermediate-risk patients may also be given repeated chemotherapy instillations, but their optimal timing and frequency remain undefined [14]. It is recommended that high-risk patients are treated with intravesical bacillus Calmette-Guérin (BCG) immunotherapy or be considered for immediate cystectomy [3]. Five-year recurrence and progression rates for patients with stage Ta-T1 bladder cancer treated with 1 to 3 years of maintenance BCG are 28–51% and 7–20%, respectively [15].

Why it is important to do this review

Inconsistent outcome reporting (different outcomes in different trials) and variability in outcome reporting (same outcomes reported, but different definitions used) become acutely evident when many bladder cancer trials are included in systematic reviews of intervention effectiveness [16–18]. Outcome reporting heterogeneity has been highlighted as a concern within evidence-based medicine generally [19–22] and has been emphasised as an area for improvement in NMIBC trials by the International Bladder Cancer Group [23]. Heterogeneous outcome reporting and the potential for selective outcome reporting bias in NMIBC trials hinder comparing and contrasting the results of individual trials as well as the publication of unbiased systematic reviews and meta-analyses of the evidence base. As a consequence, making evidence-based recommendations in clinical practice guidelines, translating them into health care policy, and decision-making by clinicians and patients are all hampered.

Developing a core outcome set (COS) is one way to reduce outcome heterogeneity and selective outcome reporting bias, and to help ensure that all trials contribute usable information to the evidence base. A COS is an agreed standardised collection of outcomes which should be measured and reported, as a minimum, in all trials for a specific clinical area [22]. Our group has registered a bladder cancer COS development project (B-COS) with the Core Outcome Measures for Effectiveness Trials initiative COS register (http://www.comet-initiative.org/studies/details/1135), with the intent to create separate COS for three broad categories of disease: NMIBC, MIBC, and metastatic BC. Within each COS we define the scope with regard to the applicable populations and treatments. After defining the scope of a COS, the next step is to identify existing knowledge regarding outcomes. To meet this requirement, we have aimed to systematically review the outcomes reported in NMIBC effectiveness trials. Our systematic review protocol was registered with PROSPERO (https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=91820). The reviews for the other parts of the project will be reported separately, as will the subsequent phases of the COS development projects. These phases involve qualitative interview studies with patients, and consensus studies using Delphi methods with key stakeholders such as patients and healthcare professionals, to agree on the core outcomes to be measured in future bladder cancer effectiveness trials and audits.

METHODS

Aims and objectives

The aim was to systematically review outcomes reported in NMIBC effectiveness trials of adjuvant, prophylactic treatment after TURBT.

The objectives were to systematically review:

1. Outcomes reported

2. Outcome definitions (including time points)

3. Outcome assessment methods

Eligibility criteria

Types of studies

We included phase III randomised controlled trials (RCTs) comparing different adjuvant instillation treatments after TURBT or trials with TURBT alone as a control arm. We limited inclusion to RCTs included in systematic reviews of intervention effectiveness as a pragmatic and efficient way to identify studies and overview potentially important outcomes. This strategy has been used in published systematic reviews of outcome reporting heterogeneity where the aim is to overview outcome reporting heterogeneity rather than to find every outcome previously reported [22, 24]. All pre-phase III trials and all non-randomised designs were excluded. Studies reported only as abstracts were excluded a priori because an abstract was unlikely to report all outcomes or to provide enough information on how the reported outcomes were defined and measured.

Types of participants

We included studies with adult (≥18 years) males and females with histologically confirmed urothelial NMIBC, stage Ta or T1 N0 M0, with or without carcinoma in situ (CIS), and all tumour grades (using any grading system). Studies including paediatric patients and patients with MIBC, clinical N+ or M+ were excluded unless outcomes were separately reported and defined for NMIBC patients.

Types of interventions and comparators

We included RCTs comparing any type of intravesical adjuvant prophylactic treatments after TURBT and RCTs comparing intravesical treatment after TURBT versus TURBT alone. Studies of oral vitamins or mineral supplements were excluded.
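The study, participant and intervention criteria above can be read as a simple screening rule. The following is a minimal sketch only: the `TrialRecord` fields and the `is_eligible` function are hypothetical names invented for illustration and are not part of the review's actual screening form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialRecord:
    # Hypothetical screening fields; none of these names come from the review.
    phase: int                       # 3 means phase III
    randomised: bool
    abstract_only: bool
    adult_nmibc_only: bool           # adults (>= 18 years), Ta/T1 N0 M0, with or without CIS
    nmibc_reported_separately: bool  # NMIBC outcomes separately reported and defined
    intravesical_after_turbt: bool
    oral_supplement_study: bool


def is_eligible(trial: TrialRecord) -> bool:
    """Apply the stated design, population and intervention criteria in turn."""
    design_ok = trial.phase == 3 and trial.randomised and not trial.abstract_only
    population_ok = trial.adult_nmibc_only or trial.nmibc_reported_separately
    intervention_ok = trial.intravesical_after_turbt and not trial.oral_supplement_study
    return design_ok and population_ok and intervention_ok


# Two invented records: a phase III RCT and an otherwise identical phase II trial.
example = TrialRecord(3, True, False, True, False, True, False)
phase_two = TrialRecord(2, True, False, True, False, True, False)
print(is_eligible(example), is_eligible(phase_two))
```

In practice, screening was done by reviewers using a standardised form; the sketch only makes the logical structure of the criteria explicit.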

Types of outcomes

We report on all outcomes related to clinical effectiveness including, for example, outcomes related to recurrence, progression, survival and cause of death, local and systemic adverse events and quality of life/patient reported outcomes. Outcome definitions, timepoints, and assessment methods are also reported.

We do not report any estimates of treatment effect for any individual trials and there was no attempt to synthesise aggregated quantitative data.

Literature search

The literature search was undertaken by an experienced information specialist (CY) using the search criteria specified in Appendix 1. Medline, Embase and Cochrane Database of Systematic Reviews (CDSR) were searched for relevant systematic reviews. We also hand-searched the reference sections of relevant international clinical practice guidelines. We restricted to systematic reviews and RCTs published after 2000 to reflect outcomes reported in current clinical practice. We excluded non-English studies as a pragmatic consideration due to resource restrictions.

An update search was done on 15th January 2020.

Data collection and analysis

Selection of studies

Following de-duplication, at least two review authors (DC, SM, SS, IO, EV, RC) independently screened the titles and abstracts of identified systematic reviews for eligibility. The full texts of all potentially eligible publications were retrieved and screened independently by two review authors (DC, SM, SS, IO, EV, RC) using a standardised form, linking together multiple records of the same study in the process. Any disagreements were resolved by discussion or by consulting a senior review author (RS). Once the list of systematic reviews meeting the inclusion criteria was finalised, a second screening process was initiated whereby the studies included in the systematic reviews were screened against our inclusion criteria. Where lists of studies excluded from the systematic reviews were available, we also screened these in case the studies had been excluded for not reporting on outcomes of interest. In such instances the trial may still have met inclusion criteria for our review. The study selection process is described in the PRISMA flow diagram (Fig. 1) [25].

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram of studies. SR, systematic review; RCT, randomized controlled trial.

Data extraction and management

A standardised data extraction form was developed and piloted. One review author extracted data and a second review author checked data extractions for accuracy (DC, SM, SS, IO, EV, RC). Any disagreements were resolved by discussion or by consulting a third review author.

Data that were extracted included: the study design; countries and institutions where the data were collected; dates defining start and end of patient recruitment and follow-up; how intervention comparator groups were formed; participant demographic and clinical characteristics; eligibility criteria for participants; the numbers of participants who were included in the study, assigned to each intervention comparator group; description of interventions; study funding sources; and ethical approval. All primary and secondary effectiveness outcomes reported, their definitions, and any outcome measurement instruments used were extracted verbatim.

Assessment of risk of bias in included studies

Risk of bias assessment is not necessary for systematic reviews undertaken for COS development. Some outcomes may be at risk of detection bias depending on whether they are relatively subjective or objective. Although these aspects were extracted under the ‘definition’ or ‘measurement’ fields in the data extraction form, this is out of the scope of this phase of our project. They will be investigated in a subsequent phase whereby we will assess the psychometric properties of the various outcome measurements and seek consensus on the most appropriate and feasible definitions and measurements [26, 27].

Data synthesis

Verbatim outcome names were recoded to common names. This was done by categorising outcomes referring to the same underlying constructs under a common term. For example, “survival rates”, “overall survival”, “number of deaths at median follow up” and “mortality rate” all refer to the concept of ‘overall survival’ and were coded as such. The outcome and domain coding process was inductive and iterative. Coded outcomes were further grouped in broader domains using the standardised Williamson and Clarke Taxonomy (W/C Taxonomy) [28].
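The recoding step can be pictured as a lookup from verbatim names to a standard name. This is a minimal sketch, assuming a hand-built codebook: only the four verbatim names quoted above come from the review, while `CODEBOOK`, `normalise`, `recode` and the uncoded example name are hypothetical.

```python
# Hypothetical codebook; only these four verbatim names are quoted in the review.
CODEBOOK = {
    "survival rates": "overall survival",
    "overall survival": "overall survival",
    "number of deaths at median follow up": "overall survival",
    "mortality rate": "overall survival",
}


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def recode(verbatim_names):
    """Group verbatim names under standard names; collect names not yet coded."""
    coded, uncoded = {}, []
    for name in verbatim_names:
        standard = CODEBOOK.get(normalise(name))
        if standard is None:
            uncoded.append(name)  # needs a reviewer decision in the next iteration
        else:
            coded.setdefault(standard, []).append(name)
    return coded, uncoded


# "time to first relapse" is an invented name used to show the uncoded path.
coded, uncoded = recode(["Survival rates", "Mortality rate", "time to first relapse"])
print(coded, uncoded)
```

Names left in `uncoded` would go back to the reviewers, which mirrors the iterative nature of the coding process described above.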

EVIDENCE SYNTHESIS

Characteristics of the included studies

Our initial search for relevant systematic reviews yielded 807 abstracts, of which 639 remained after removing duplicates. In total, 100 full-text SRs were assessed and 19 SRs, including 14 meta-analyses, were included. Four SRs included only previously identified RCTs and these SRs were not utilised further (Supplemental Table 1). From 15 SRs published between 2010 and 2018, 106 full-text RCT reports were screened and 57 eligible RCTs were finally included (see PRISMA flow diagram, Fig. 1).

An overview of the included studies’ populations, stage and grade, instillation treatments and number of outcome domains reported is shown in Table 1. Overall, 32 studies included patients with papillary only tumors, while 25 studies included a mixed population of patients with CIS with/without papillary tumors. There were 11 “single-instillation” trials, 12 “single instillation followed by induction course” trials, 27 “maintenance instillation” trials and 7 trials comparing instillations with different schedules.

Table 1

Baseline characteristics of the study population, instillation treatments and outcome domains

Columns: Study | Population (pTa low grade, pTa high grade, pT1 low grade, pT1 high grade; CIS as authors have reported: primary CIS, secondary CIS, concomitant CIS) | Instillation treatment (S, single; I, induction; M, maintenance) | Number of outcome domains (n/10)

CIS+/- PAPILLARY
Lamm 2000 | x x x x x x | M | 7
Palou 2001 | G3 G3 x x | M | 6
Au 2001 | x x x x x x | I | 2
Sekine 2001 | x x x x x x x | I | 5
Martinez-Pinneiro 2002 | G2G3 G1 G2G3 x x | I | 7
Di Stasi 2003 | x x x x | M | 7
Kaasinen 2003 | x x x x x x x | M | 5
Martinez-Pinneiro 2005 | G3 G3 x x | I | 4
de Reijke 2005 | x x x x x x x | M | 5
Di Stasi 2006 | G2G3 x | M | 6
Gårdmark 2007 | x x x x x x | M | 4
Cai 2008 | G2G3 G2 x | M | 4
Neple 2010 | x x x x x x | M | 2
Porena 2010 | G3 G3 x x | M | 3
Koga 2010 | x x x x | M | 5
Gülpınar 2012 | x x x x x x | I | 4
Järvinen 2012 | x x x | M | 5
Sengiku 2013 | x x x x x x x | I | 3
Inamoto 2013 | x x x x x x x | I | 4
Rentsch 2014 | x x x x x | I | 5
Hemdan 2014 | G2G3 x | M | 4
Martinez-Pineiro 2015 | G1G2 + cis G3 G1G2 + cis G3 x | I vs M | 5
Solsona 2015 | G1 + cis G2G3 G1 G2G3 x x | I | 5
Arends 2016 | x x x x x x x | M | 4
Nakai 2016 | x x x x x x | I vs M | 5

PAPILLARY
Kaasinen 2000 | G1G2 G1G2 | M | 2
Bilen 2000 | x | I | 4
Van der Meijden 2001 | x x x x | M | 4
Nomata 2002 | G1G2 G1G2 | M | 2
Okamura 2002 | x x x x | S | 3
Rajala 2002 | x x x x | S | 1
Kuroda 2004 | G1G2 G1G2 | M | 4
Koga 2004 | x x x x | I vs M | 2
Mitsumori 2004 | x x x x | I | 2
Cheng 2005 | x x x x | M | 5
Vijjan 2006 | x x x x | I | 5
Hinotsu 2006 | x x x x | I vs M | 3
Barghi 2006 | G1G2 G1 | S | 3
Ojea 2007 | G2 G1G2 | M | 4
El-Ghobashy 2007 | G1G2 G1G2 | S | 4
Agrawal 2007 | x x x x | M | 2
Friedrich 2007 | x x x x | M | 2
Hendricksen 2008 | x x x x | M | 3
Berrum-Svennung 2008 | G1G2 G1G2 | S | 3
Isbarn 2008 | x x x x | I vs M | 2
Böhle 2009 | x x x x | S | 4
Gudjonsson 2009 | G1G2 G1G2 | S | 1
Järvinen 2009 | x x x x | M | 5
Seretta 2010 | G1G2 G1G2 | I vs M | 3
Sylvester 2010 | x x x x | M | 6
De Nunzio 2011 | G1G2 | S | 4
Di Stasi 2011 | x x x x | S | 5
Hinotsu 2011 | x x x x | I vs M | 4
Oddens 2013 | x x x x | M | 6
Huang 2015 | x x x x | M | 4
Onishi 2017 | G1G2 G1G2 | S | 3
Bijalwan 2017 | x x x x | S | 3

In all studies, patients were followed up at regular intervals in a similar, widely accepted manner: urinary cytology, cystoscopy and, if necessary, bladder biopsies [3].

Heterogeneity in outcome reporting, detection, and definitions

The outcomes were organised into the 10 domains in the W/C taxonomy [27]: “recurrence”, “progression”, “treatment response” (for CIS), “cancer-specific survival”, “overall survival”, “adverse events”, “completion/adherence”, “treatment failure/change of treatment”, “quality of life” and “health economics” (Table 2).

Table 2

Outcome domains reported in 57 randomised controlled trials classified using the Williamson and Clarke Taxonomy

Column groups: CLINICAL (tumor related outcomes: Recurrence, Progression, Treatment response (for CIS)); DEATH (survival: Overall survival, Cancer-specific survival); ADVERSE EVENTS (Adverse events); LIFE IMPACT (delivery of care: Completion/adherence, Treatment failure/change of treatment reported (RC, RT); global quality of life: Quality of life); RESOURCE USE (economic: Health economics)

CIS+/- PAPILLARY
Lamm 2000 | x x x x x x x
Palou 2001 | x x x x x x
Au 2001 | x x
Sekine 2001 | x x x x x
Martinez-Pinneiro 2002 | x x x x x x x
Di Stasi 2003 | x x x x x x x
Kaasinen 2003 | x x x x x
Martinez-Pinneiro 2005 | x x x x
de Reijke 2005 | x x x x x
di Stasi 2006 | x x x x x x
Gårdmark 2007 | x x x x
Cai 2008 | x x x x
Neple 2010 | x x
Porena 2010 | x x x
Koga 2010 | x x x x x
Gülpınar 2012 | x x x x
Järvinen 2012 | x x x x x
Sengiku 2013 | x x x
Inamoto 2013 | x x x x
Rentsch 2014 | x x x x x
Hemdan 2014 | x x x x
Martinez-Pineiro 2015 | x x x x x
Solsona 2015 | x x x x x
Arends 2016 | x x x x
Nakai 2016 | x x x x x

PAPILLARY (NA: treatment response for CIS not applicable)
Kaasinen 2000 | x x NA x
Bilen 2000 | x x NA x x
Van der Meijden 2001 | x x NA x x
Nomata 2002 | x NA x
Okamura 2002 | x x NA x
Rajala 2002 | x NA
Kuroda 2004 | x NA x x x
Koga 2004 | x NA x
Mitsumori 2004 | x NA x
Cheng 2005 | x x NA x x x
Vijjan 2006 | x x NA x x x
Hinotsu 2006 | x x NA x
Barghi 2006 | x x NA x
Ojea 2007 | x x NA x x
El-Ghobashy 2007 | x x NA x
Agrawal 2007 | x NA x
Friedrich 2007 | x NA x
Hendricksen 2008 | x x NA x
Berrum-Svennung 2008 | x x NA x
Isbarn 2008 | x NA x
Böhle 2009 | x x NA x x
Gudjonsson 2009 | x NA
Järvinen 2009 | x x NA x x x
Seretta 2010 | x x NA x
Sylvester 2010 | x x NA x x x x
De Nunzio 2011 | x x NA x x
Di Stasi 2011 | x x NA x x x
Hinotsu 2011 | x x NA x x
Oddens 2013 | x x NA x x x x
Huang 2015 | x NA x x x
Onishi 2017 | x x NA x
Bijalwan 2017 | x x NA x

Outcome domain | TOTAL (n/%) | Number of individual verbatim outcomes
Recurrence | 56/57 (98%) | 35
Progression | 42/57 (74%) | 20
Treatment response (for CIS) | 10/25 (40%) | 14
Overall survival | 19/57 (33%) | 10
Cancer-specific survival | 19/57 (33%) | 11
Adverse events | 44/57 (77%) | 36
Completion/adherence | 10/57 (17%) | 14
Treatment failure/change of treatment | 21/57 (37%) | 6
Quality of life | 2/57 (3%) | 2
Health economics | 1/57 (2%) | 2

As seen in Table 2, tumor related outcomes such as recurrence (98%), progression (74%), treatment response (in CIS studies) (40%), and adverse events (77%) were frequently reported across studies. However, overall (33%) and cancer specific (33%) survival, treatment completion (17%) and treatment change (37%) were less often reported. Quality of Life (3%) and economic outcomes (2%) were rarely reported.

Tumor related outcomes

The heterogeneity in the definition and reporting of recurrence and progression in studies that recruited patients with papillary tumors only, and also treatment response in patients with CIS with or without papillary tumors, are shown in Tables 3 and 4, respectively.

Table 3

Definitions and reporting of bladder cancer recurrence and progression in RCTs with papillary-only tumors. RFS, recurrence-free survival; NMIBC, non-muscle invasive bladder cancer; MIBC, muscle-invasive bladder cancer; PFR, progression-free survival; DFR, disease-free survival

Recurrence columns: Time to recurrence; RFS; Recurrence rate; 2 yr recurrence-free rates; 3 yr recurrence-free rates; Recurrence per year (n); NMIBC recurrence (pTa, pT1); Early recurrence (0-2 yrs); Late recurrence (> 2 yrs); 50% recurrence-time (days); Recurrence index/100 patients per month; Recurrence-rate reduction
Progression columns: Definition (T-stage; Grade; MIBC; Metastases); Progression rate; Progression rate at the time of first recurrence; Time to progression to MIBC; Time to progression to distant metastasis; 5-yr PFR; Progression as the first event included as recurrence?; Progression-free survival

Study ID | Entries
Kaasinen 2000 | x x x
Bilen 2000 | x x ≥pT2 x x
Van der Meijden 2001 | x x x x x x x
Nomata 2002 | x NA
Okamura 2002 | x x x x
Rajala 2002 | x x NA
Kuroda 2004 | x x NA
Koga 2004 | x x x NA
Mitsumori 2004 | x x NA
Cheng 2005 | x x x x x x x
Vijjan 2006 | x x Ta->T1, T1->MIBC x
Hinotsu 2006 | x x x x ≥pT2 x x
Barghi 2006 | x x x
Ojea 2007 | x x x x x x x
El-Ghobashy 2007 | x x x x x ≥pT2 x x
Agrawal 2007 | x NA
Friedrich 2007 | x x x x NA
Hendricksen 2008 | x x x ≥pT2, cis x x x x x x
Berrum-Svennung 2008 | x x x
Isbarn 2008 | x x x
Böhle 2009 | x x x x x
Gudjonsson 2009 | x x x
Järvinen 2009 | x x ≥pT2 x x x x
Seretta 2010 | x x x x x ≥pT2 x x
Sylvester 2010 | x ≥pT2 x x x x x
De Nunzio 2011 | x x (0-1yrs) x (> 1yr) x ≥pT2 x
Di Stasi 2011 | x x ≥pT2 x x x x
Hinotsu 2011 | x x x ≥pT2 x x x x
Oddens 2013 | x x 5yr DFR ≥pT2, cis x x x x x
Huang 2015 | x x x NA
Onishi 2017 | x x x not specified x x x
Bijalwan 2017 | x x Ta->T1 LG->HG x

Table 4A

Definitions and reporting of A) recurrence, B) progression and C) treatment response in patients with CIS with or without papillary tumours

RECURRENCE
Columns: Recurrence rate; Disease-free interval; Recurrence-free survival; Time to recurrence; 5-year Recurrence-free survival; 1-year Recurrence-free survival; Interval before recurrence; Recurrence rate at 5 years; Regression of grade/stage; Worsening-free survival (mo); Low-grade relapse; High-grade superficial relapse

Study ID | Entries
Lamm 2000 | x x x x
Palou 2001 | x x x
Au 2001 | x x
Sekine 2001 | -
Martinez-Pinneiro 2002 | x x x x
Di Stasi 2003 | x x x
Kaasinen 2003 | x
Martinez-Pinneiro 2005 | x x
de Reijke 2005 | -
di Stasi 2006 | x (for cis pts) x
Gårdmark 2007 | -
Cai 2008 | x x x x
Neple 2010 | x x
Porena 2010 | x x
Koga 2010 | x
Gülpınar 2012 | x x x
Järvinen 2012 | x x
Sengiku 2013 | x x
Inamoto 2013 | x x
Rentsch 2014 | x
Hemdan 2014 | x x
Martinez-Pineiro 2015 | x x x
Solsona 2015 | x
Arends 2016 | x
Nakai 2016 | x

Table 4B
PROGRESSION
Columns: Definition (T-stage; Grade; MIBC; Metastases); Progression rate; Time to progression; Progression-free time; Progression-free survival; 5 year progression-free survival; Other?; Not defined; Progression as the first event included as recurrence?

Study ID | Entries
Lamm 2000 | ≥pT2
Palou 2001 | x x x
Au 2001 | NA
Sekine 2001 | x x x
Martinez-Pinneiro 2002 | ≥pT2 x x x x for cis -> extravesical extension
Di Stasi 2003 | x x
Kaasinen 2003 | ≥pT1 x x
Martinez-Pinneiro 2005 | ≥pT2 x x
de Reijke 2005 | ≥pT2 x
di Stasi 2006 | x x x x
Gårdmark 2007 | Ta->T1; T1->T2 x x (stage) NA
Cai 2008 | x x
Neple 2010 | x
Porena 2010 | x NA
Koga 2010 | x x x
Gülpınar 2012 | x x
Järvinen 2012 | ≥pT2 x x x x
Sengiku 2013 | NA
Inamoto 2013 | x
Rentsch 2014 | x x x x
Hemdan 2014 | x x x Free of progression (%)
Martinez-Pineiro 2015 | x x x x
Solsona 2015 | x x x x
Arends 2016 | x x
Nakai 2016 | x x x

Table 4C
Carcinoma in Situ Response
Column headers (in source order): Complete response Partial response no Cis, but Ta/T1 persists No change Cis or Ta/T1 persists No Cis, no Ta/T1 at 9 mo no progression; at 12 mo no recurrence Complete response in cis patients Time to recurrence in complete responders Recurrence after complete response First recurrence type after complete response (papillary, cis, papillary + cis)

Study ID | Entries
Lamm 2000 | x x
Palou 2001 | -
Au 2001 | -
Sekine 2001 | x x
Martinez-Pinneiro 2002 | -
Di Stasi 2003 | x
Kaasinen 2003 | x
Martinez-Pinneiro 2005 | -
de Reijke 2005 | x x x x
di Stasi 2006 | x
Gårdmark 2007 | -
Cai 2008 | x
Neple 2010 | -
Porena 2010 | -
Koga 2010 | x
Gülpınar 2012 | -
Järvinen 2012 | -
Sengiku 2013 | x
Inamoto 2013 | -
Rentsch 2014 | -
Hemdan 2014 | -
Martinez-Pineiro 2015 | -
Solsona 2015 | -
Arends 2016 | x
Nakai 2016 | -

Recurrence

Recurrence was reported in 56 (98%) of 57 trials (Tables 1, 3, 4), with 35 different verbatim names (Table 5), often related to the definition. A definition of recurrence was missing in 8/56 (14%) studies. The others reported either the percentage of recurrences at a given time point or a time-to-event outcome, but no consistent approach to defining and measuring recurrence was used overall. Furthermore, in studies that reported both progression and recurrence, progression as the first event was regarded as a recurrence event in 12 studies and in 34 others it was not.

Table 5

Verbatim outcome name and definition heterogeneity

Progression

Of 57 studies, 42 (74%) reported bladder cancer progression. A definition of progression was given in 41/42 (97%) studies, but the definitions varied widely. A common threshold for “progression” was ≥pT2 in 16 (38%) studies, 2 of which also classified CIS as progression. As an example of inconsistency in the verbatim terms used, “progression to MIBC” was used in the definition in 31/42 (74%) studies, 22 of which further included metastases. Ta->T1 and T1->MIBC were also considered progression in 4/42 (9%) studies (Tables 3 and 4).

Treatment response

Treatment response in patients with CIS was reported in 10 (40%) of 25 studies (Table 2). The time-point at which response to treatment was assessed differed between studies. de Reijke et al defined and reported “complete response”, “partial response”, “no change” and “progression” [29]. The rest of the studies reported only complete response to treatment.

The time-point to assess complete response varied widely, ranging from 3 months from enrollment up to 12 months.

Eight different outcomes were included in the “Treatment response (for CIS)” domain (Tables 4 and 5).

Treatment relapse after complete response was described in three trials (Table 4).

Death

A survival outcome was reported in 23/57 (40%) studies; cancer-specific survival and overall survival were equally common, each reported in 19 (33%) studies. Ten and eleven different verbatim names were used to report overall survival and cancer-specific survival, respectively (Tables 2, 5).

Adverse events

Adverse events (AEs) were heterogeneously defined. In 12 of the 44 studies (27%) reporting AEs, there was no definition of an AE, and overall 24 different definitions/instruments were used. Most studies reporting AEs used their own systems to categorise the type of AE or grade its severity, and made no reference to a standardised reporting system. Across 10 studies, 3 standardised AE reporting instruments were used, but these did not include some of the most relevant AEs for intravesical instillations:

• NCI-CTCAE (Common Terminology Criteria for Adverse Events),

• WHO-ART (1979 WHO Adverse Reaction Terminology)

Adverse events were further grouped in numerous ways, e.g. local or systemic toxicity, constitutional symptoms, laboratory abnormality, death, and treatment interruption due to AEs (Table 5). Detailed lists of how AEs were described and reported are provided in Supplementary Table 2.

In 25 of the 44 studies (57%) specific AEs were not listed; instead, authors reported either only local toxicities, or major/severe/more common side-effects or AEs that resulted in treatment interruption. Five of these 25 studies did not report the list of individual toxicities at all; instead, authors presented only the frequency and percentage [n (%)] of any AEs which occurred. Furthermore, poor treatment compliance related to AEs was not consistently reported.

Adherence to completion of all planned instillations was at least partially reported in 10/57 (17%) studies: six studies concerning maintenance instillations, two “induction course” studies, and two studies comparing induction to maintenance. None of the single-instillation studies reported completion rates. Four studies gave a comprehensive overview of the reasons for treatment discontinuation. The author definitions for instillation treatment completion are reported in Table 5.

TREATMENT FAILURE/CHANGE OF TREATMENT

21/57 studies (37%) reported treatment failure and/or the need to change from instillations to a different treatment. 21 studies specified the treatment that was given after instillations were discontinued:

• radical cystectomy (RC) (14/21 studies)

• RC and/or radiotherapy (RT) (4/21 studies)

• TURBT (1/21)

• RC, TURBT + RT, chemotherapy (1/21)

• “non-allowed instillations” (1/21)

GLOBAL QUALITY OF LIFE

Two studies measured and reported patient experience during the instillations: Koga et al by measuring QoL, and Huang et al by evaluating instillation-related pain/irritation [30, 31].

In the study by Koga et al, QoL was assessed according to the Japanese version of the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 (EORTC QLQ-C30) v2.0. QoL was assessed before induction therapy, after the 5th instillation of induction therapy, 4 weeks after the completion of induction therapy, and 14 months after randomization [30].

Huang et al evaluated the effect of hyaluronic acid in reducing pirarubicin instillation related side-effects. A visual analog scale (VAS) was used daily to evaluate pain [31].

RESOURCE USE (HEALTH ECONOMICS)

Only one study evaluated the costs related to the treatment. Berrum-Svennung et al randomized NMIBC patients to one instillation of epirubicin or placebo after TURBT and evaluated cancer recurrences. They also calculated the cost of delivering a single instillation during the initial treatment and as first recurrences occurred [32].

DISCUSSION

This is the first study to systematically and comprehensively overview the extent of outcome reporting, measurement, and definition heterogeneity in the setting of adjuvant treatments for NMIBC.

Recurrence was frequently reported in the included RCTs; yet, some studies did not define it. In those that did, there was variability in the names that were used, the definitions and the reporting. Most concerning, however, was the variation in how progression was handled in the analysis of recurrence. In studies where progression as the first event was counted as a recurrence, the measure provided is qualitatively and quantitatively different from those where recurrence was more narrowly defined as the re-appearance of a non-muscle invasive tumour. To overlook this subtlety runs the risk of not comparing like with like across studies, or statistically pooling aggregated results in a potentially misleading way.
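The effect of the two recurrence definitions can be illustrated with a toy calculation. This is a sketch under stated assumptions: the ten first events below are invented, not trial data, and the crude proportion ignores follow-up time and censoring.

```python
# Invented first events for ten hypothetical patients (not trial data).
# "nmibc" = first event is a non-muscle invasive recurrence,
# "progression" = first event is progression, None = no event observed.
first_events = ["nmibc", "nmibc", "progression", None, None,
                "nmibc", None, "progression", None, None]


def recurrence_rate(events, count_progression: bool) -> float:
    """Crude proportion of patients counted as having a recurrence."""
    counted = {"nmibc", "progression"} if count_progression else {"nmibc"}
    return sum(event in counted for event in events) / len(events)


broad = recurrence_rate(first_events, count_progression=True)    # progression counts as recurrence
narrow = recurrence_rate(first_events, count_progression=False)  # NMIBC re-appearance only
print(f"broad definition: {broad:.0%}, narrow definition: {narrow:.0%}")
```

The same cohort yields 50% or 30% depending only on the definition, which is why pooling the two kinds of estimate without adjustment can mislead.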

Progression was also frequently reported, but again the definitions were inconsistent across trials. Worsening of the disease leads to a change in treatment strategy, and that was also inconsistently reported. It is also crucial to state whether progression was assessed on the basis of imaging (e.g. CT or MRI), TURBT or radical cystectomy. As only four studies gave a comprehensive overview of reasons to change the treatment strategy, there is a high risk that reported progression rates are misleading. If, prior to progression, patients die from an unrelated cause or undergo cystectomy (for example due to recurrent high-grade T1 disease), then the progression rates at specific time points will differ according to whether death and cystectomy have been counted as competing risks (cumulative incidence function) or simply censored (Kaplan-Meier curve). It is equally important to describe how patients are followed for the efficacy outcomes when treatment has been stopped due to side-effects. There may also be a difference in outcomes according to whether the results are reported in all randomized patients (intention-to-treat analysis) or only in eligible patients who have been treated according to the protocol (per-protocol analysis).
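The difference between censoring competing events and treating them as competing risks can be shown on a small invented cohort. This is a minimal sketch in plain Python, assuming no tied event times; the data, function names and status codes are hypothetical, and a real analysis would use an established survival library.

```python
# Invented follow-up data for ten hypothetical patients: (months, status).
# status: 0 = censored, 1 = progression, 2 = competing event
# (death from another cause, or cystectomy before progression).
cohort = [(6, 2), (12, 1), (18, 2), (24, 1), (30, 0),
          (36, 1), (42, 2), (48, 0), (54, 1), (60, 0)]


def at_risk(data, t):
    """Number of patients still under observation just before time t."""
    return sum(1 for u, _ in data if u >= t)


def one_minus_km(data, horizon):
    """1 - Kaplan-Meier estimate, with competing events treated as censored."""
    surv = 1.0
    for t in sorted({u for u, s in data if s == 1 and u <= horizon}):
        d = sum(1 for u, s in data if u == t and s == 1)
        surv *= 1 - d / at_risk(data, t)
    return 1 - surv


def cumulative_incidence(data, horizon):
    """Aalen-Johansen cumulative incidence of progression."""
    event_free = 1.0  # probability of no event of either type so far
    cif = 0.0
    for t in sorted({u for u, s in data if s != 0 and u <= horizon}):
        n = at_risk(data, t)
        d_prog = sum(1 for u, s in data if u == t and s == 1)
        d_any = sum(1 for u, s in data if u == t and s != 0)
        cif += event_free * d_prog / n
        event_free *= 1 - d_any / n
    return cif


km_estimate = one_minus_km(cohort, 60)
cif_estimate = cumulative_incidence(cohort, 60)
print(f"1 - KM: {km_estimate:.2f}, CIF: {cif_estimate:.2f}")
```

On this toy cohort the Kaplan-Meier complement gives a 60-month progression estimate of about 0.70, whereas the cumulative incidence function gives 0.50, so two trials with identical data could report noticeably different progression rates.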

Treatment response specifically in patients with CIS was evaluated and reported in only 40% of studies. The remaining studies that recruited patients with CIS evaluated CIS in the same way as papillary tumors and reported only recurrence and/or progression. However, CIS poses additional diagnostic challenges and may follow a very different disease course from papillary tumors; as such, separate approaches to measuring and defining its outcomes should be applied [23].

The most heterogeneous outcome was AEs, as is evident from the many categorizations and instruments used to record AEs and from the system-level subgroupings chosen by trialists. Unfortunately, many of these were not optimal for instillation-related AEs. Whilst in some instances it may be possible for systematic reviewers to recode lists of AEs (if they are provided) to a common standardized toxicity classification system, this is a poor excuse for the lack of standardization in primary trials and needlessly adds time and complexity to the critical interpretation of the evidence base. Poor reporting of treatment compliance is likely to confound other cancer-related outcomes such as recurrence, progression and overall survival.

Perhaps the most alarming finding is that QoL is conspicuously missing. Instillation treatments are demanding for patients, and it is important to understand all of their consequences (both oncological and QoL-related) before a treatment decision is made. A recent investigation of QoL in bladder cancer patients compared to a matched sample of older adults without bladder cancer in a US population found significant declines in health-related QoL (HRQoL) scores over time in the physical, mental and social components of the SF-36 [33]. The EORTC Quality of Life Group has also developed an externally validated questionnaire for NMIBC, the QLQ-BLS24 [34]. In a systematic review, Mason and colleagues used the COSMIN checklist to evaluate the psychometric properties of patient-reported outcome measures (PROMs) used in bladder cancer populations; two of the 15 included PROMs were NMIBC-specific (QLQ-BLS24 and CAVICAVEMNI) [35, 36]. Of note, they found that no existing PROM stood out as the most appropriate measure of QoL in any bladder cancer population and that, although further validation studies are required, generic PROMs, cancer-generic PROMs and bladder cancer-specific PROMs will currently provide the most robust picture. Mason et al’s study [35] is very important for a subsequent phase of our COS development, as most existing cancer COS have included QoL, and it is anticipated that NMIBC patients will also prioritise this, encompassing urinary, bowel and sexual function, as critically important outcome domains.

Without having included NMIBC patients in a qualitative study of their experiences of bladder cancer and its treatments, it cannot yet be known which outcomes are of most importance to them, or if they are adequately captured in current trials, but it is discouraging that so few trials routinely include PROMs.

Health economics was considered in only one RCT, which calculated the costs of a single instillation [32]. Bladder cancer, especially NMIBC, contributes significantly to healthcare costs due to intense surveillance strategies and its potential to recur and progress [8, 37]. This should be considered when treatments and outcomes are compared.

Kamat et al provided recommendations on NMIBC intervention trial designs, eligibility criteria, and ‘clinically meaningful’ effect size thresholds for outcomes [23]. Likewise, Lamm et al suggested a change in the definition of progression in NMIBC [38]. These initiatives are important to bear in mind for subsequent phases of our project. Once the outcomes considered core by all stakeholders (e.g. patients, urologists, oncologists, nurses, payers, methodologists) are known (i.e. what to measure) [22], we will turn our attention to definitions and measurement tools (i.e. how to measure) [39], again including key stakeholders. Importantly, these initiatives, in conjunction with ours, show that problems with the evidence base are acknowledged and that there is a desire to improve it.

LIMITATIONS

The decision to exclude phase I and II trials (phases that precede determination of the therapeutic effect of a drug) and all non-randomised designs may have limited the chance to capture longer-term and patient-reported outcomes relating to function and QoL. However, in subsequent phases of the project, such as the Delphi survey and consensus meetings, participants will have an opportunity to propose ‘new’ outcomes not already considered for prioritisation. We therefore consider that the risk of having missed outcomes is minimal and that we have made a pragmatic trade-off against the resource implications of including all study designs.

CONCLUSIONS

We have shown that there is inconsistency in outcome reporting and variation in definitions in randomized trials comparing adjuvant treatments in NMIBC patients. This makes comparing the results of individual studies difficult and makes their statistical combination challenging, impossible or inappropriate. The resulting summaries of the evidence are, at best, unwieldy and, at worst, misleading, which makes evidence-based treatment recommendations difficult. A core outcome set, incorporating the views of a variety of stakeholders such as urologists, oncologists, methodologists and, most importantly, patients, is urgently required.

ACKNOWLEDGMENTS

The authors have no acknowledgements.

FUNDING

The authors report no funding.

AUTHOR CONTRIBUTIONS

Erik Veskimae – performance of work; interpretation or analysis of data; writing the article. Selvarani Subbarayan – performance of work; interpretation or analysis of data; writing the article. Riccardo Campi – performance of work; interpretation or analysis of data; writing the article. Domitille Carron – performance of work; interpretation or analysis of data; writing the article. Muhammad Imran Omar – conception; performance of work; interpretation or analysis of data; writing the article. Cathy Yuan – performance of work; interpretation or analysis of data; writing the article. Konstantinos Dimitropoulos – performance of work; interpretation or analysis of data; writing the article. Mieke Van Hemelrijck – interpretation or analysis of data; writing the article. Richard T. Bryan – interpretation or analysis of data; writing the article. James N’Dow – interpretation or analysis of data; writing the article. Marek Babjuk – interpretation or analysis of data; writing the article. J. Alfred Witjes – interpretation or analysis of data; writing the article. Richard Sylvester – conception; performance of work; interpretation or analysis of data; writing the article. Steven MacLennan – conception; performance of work; interpretation or analysis of data; writing the article.

ETHICAL CONSIDERATIONS

This study, as a literature review, is exempt from any requirement for Institutional Review Board approval. No human or animal research was involved in the elaboration of this manuscript.

CONFLICT OF INTEREST

Erik Veskimae – Has no conflict of interest to report. Selvarani Subbarayan – Has no conflict of interest to report. Riccardo Campi – Has no conflict of interest to report. Domitille Carron – Has no conflict of interest to report. Muhammad Imran Omar – Has no conflict of interest to report. Cathy Yuan – Has no conflict of interest to report. Konstantinos Dimitropoulos – Has no conflict of interest to report. Mieke Van Hemelrijck – Has no conflict of interest to report. Richard T. Bryan – Reports other from Janssen EMEA, grants from UroGen Pharma, grants from QED Therapeutics, outside the submitted work. James N’Dow – Has no conflict of interest to report. Marek Babjuk – Has no conflict of interest to report. J. Alfred Witjes – Has no conflict of interest to report. Richard Sylvester – Has no conflict of interest to report. Steven MacLennan – Has no conflict of interest to report.

Appendices

APPENDIX 1

A systematic review of outcome reporting, definition and measurement heterogeneity in Non-Muscle Invasive Bladder Cancer effectiveness trials of adjuvant, prophylactic treatment after transurethral resection

Search strategies

OVID Medline Epub Ahead of Print, In-Process & Other Non-Indexed Citations, Ovid MEDLINE(R) Daily and Ovid MEDLINE(R) 1946 to Jan 15, 2020.

(n = 341)

• 1. exp Urinary Bladder Neoplasms/

• 2. ((bladder or vesical) adj3 (cancer* or carcin* or malign* or tumor* or tumour* or neoplasm* or papilloma)).tw,kw.

• 3. exp Carcinoma, Transitional Cell/

• 4. (transitional cell adj3 (carcinoma* or cancer* or tumor* or tumour*)).tw,kw.

• 6. or/1-5

• 7. ((transurethral or trans-urethral) and resect* and bladder).tw,kw.

• 8. (TURBT or TUR or TURB).tw,kw.

• 9. exp Prophylactic Surgical Procedures/

• 11. (adjuvant or prophylaxis or prophylactic or prevent* or intravesical or intra-vesical or instillation*).tw,kw.

• 12. or/7-11

• 13. 6 and 12

• 14. randomized controlled trial.pt.

• 15. controlled clinical trial.pt.

• 16. random*.mp.

• 17. placebo.ab.

• 18. drug therapy.fs.

• 19. trial.ab.

• 20. groups.ab.

• 21. or/14-20

• 22. exp animals/ not humans.sh.

• 23. 21 not 22

• 24. 13 and 23

• 25. exp meta-analysis as topic/

• 26. exp Meta-Analysis/

• 27. (Systematic review or meta-analysis).tw,kw. or (Medline or Embase or Pubmed or Cochrane or literature search or literature review).ab.

• 28. or/25-27

• 29. 24 and 28

• 30. Congresses as Topic/ or “Journal: Conference Abstract”.pt.

• 31. 29 not 30

Embase <1974 to 2020 January 15> (via Ovid):

(n = 375)

• 2 ((bladder or vesical) adj3 (cancer* or carcin* or malign* or tumor* or tumour* or neoplasm* or papilloma)).tw,kw.

• 3 exp transitional cell carcinoma/

• 4 (transitional cell adj3 (carcinoma* or cancer* or tumor* or tumour*)).tw,kw.

• 6 or/1-5

• 7 exp transurethral resection/

• 8 ((transurethral or trans-urethral) and resect* and bladder).tw,kw.

• 9 (TURBT or TUR or TURB).tw,kw.

• 10 exp prophylaxis/

• 11 exp cancer adjuvant therapy/

• 12 (adjuvant or prophylaxis or prophylactic or prevent* or intravesical or intra-vesical or instillation*).tw,kw.

• 13 or/7-12

• 14 6 and 13

• 15 Randomized controlled trial/

• 16 Random$.ti,ab.

• 17 randomization/

• 18 intermethod comparison/

• 19 placebo.ti,ab.

• 20 (compare or compared or comparison).ti.

• 21 ((evaluated or evaluate or evaluating or assessed or assess) and (compare or compared or comparing or comparison)).ab.

• 22 (open adj label).ti,ab.

• 23 ((double or single or doubly or singly) adj (blind or blinded or blindly)).ti,ab.

• 24 double blind procedure/

• 25 parallel group$1.ti,ab.

• 26 (crossover or cross over).ti,ab.

• 27 ((assign$ or match or matched or allocation) adj5 (alternate or group$1 or intervention$1 or patient$1 or subject$1 or participant$1)).ti,ab.

• 28 (assigned or allocated).ti,ab.

• 29 (controlled adj7 (study or design or trial)).ti,ab.

• 30 (volunteer or volunteers).ti,ab.

• 31 human experiment/

• 32 trial.ti.

• 33 or/15-32

• 34 (random$ adj sampl$ adj7 (“cross section$” or questionnaire$1 or survey$ or database$1)).ti,ab. not (comparative study/ or controlled study/ or randomi?ed controlled.ti,ab. or randomly assigned.ti,ab.)

• 35 Cross-sectional study/ not (randomized controlled trial/ or controlled clinical study/ or controlled study/ or randomi?ed controlled.ti,ab. or control group$1.ti,ab.)

• 36 (((case adj control$) and random$) not randomi?ed controlled).ti,ab.

• 37 (nonrandom$ not random$).ti,ab.

• 38 “Random field$”.ti,ab.

• 39 (random cluster adj3 sampl$).ti,ab.

• 40 (rat or rats or mouse or mice or swine or porcine or murine or sheep or lambs or pigs or piglets or rabbit or rabbits or cat or cats or dog or dogs or cattle or bovine or monkey or monkeys or trout or marmoset$1).ti. and animal experiment/

• 41 Animal experiment/ not (human experiment/ or human/)

• 42 or/34-41

• 43 33 not 42

• 44 14 and 43

• 45 exp “systematic review”/

• 46 exp meta analysis/

• 47 (Systematic review or meta-analysis).tw,kw. or (Medline or Embase or Pubmed or Cochrane or literature search or literature review).ab.

• 48 or/45-47

• 49 44 and 48

• 50 conference abstract.pt.

• 51 Conference Review.pt.

• 52 50 or 51

• 53 49 not 52

Cochrane Database of Systematic Reviews <2005 to January 15, 2020> (via Ovid):

(n = 91)

• 1 ((bladder or vesical) adj3 (cancer* or carcin* or malign* or tumor* or tumour* or neoplasm* or papilloma)).tw,kw.

• 2 (transitional cell adj3 (carcinoma* or cancer* or tumor* or tumour*)).tw,kw.