The benefits of exercise for preventing and managing chronic disease have been well described.1 Indeed, ‘with the possible exception of diet modification, we know of no single intervention with greater promise than physical exercise to reduce the risk of virtually all chronic diseases simultaneously’.2
Systematic reviews and meta-analyses show exercise to be similarly effective to medications for managing several chronic conditions,3–7 adding credibility to the notion that ‘exercise is medicine’.8 However, compared with trials of medicines, exercise trials tend to be of lower quality, at higher risk of bias and less likely to report adverse events.3–7 Admittedly, some of the reduced quality and higher risk of bias of exercise trials arises from the difficulty of blinding participants and intervention providers. However, other important methodological features that are feasible (eg, allocation concealment, intention-to-treat analysis and blinding of assessors for objective measures) are often not used.9 Together, these methodological weaknesses limit confidence in the findings of exercise trials.3–7
Notably, the reporting of interventions in exercise trials is also often poor,10 especially when compared with similar trials of medicines.11 This is significant because poor reporting of interventions in clinical trials impairs quality appraisal, evidence synthesis and replication, and limits the ability of stakeholders (eg, patients, clinicians, policymakers) to implement interventions in clinical practice.12 If an intervention is poorly reported, the intervention itself, or its ‘dose’, is effectively unknown. To combat poor reporting of exercise interventions in clinical trials, the Consensus on Exercise Reporting Template (CERT), a 16-item minimum data set considered necessary to report exercise interventions, was developed in 2016.13 The CERT adds to other intervention-specific reporting guidelines such as the Template for Intervention Description and Replication (TIDieR),12 developed in 2014, which aim to improve the reporting of interventions in clinical trials. Despite the advent of these checklists, and several others,13–16 reporting of exercise interventions remains poor and does not appear to have improved over time.17–20
To illustrate the importance of reporting, from research through to clinical practice, take the example of a clinician who wants to prescribe an evidence-based exercise programme for their patient with patellofemoral pain.19 They find a methodologically rigorous systematic review showing that, based on moderate certainty evidence, exercise reduces patellofemoral pain compared with usual care. Hence, they deem the exercise to be effective and wish to replicate the intervention in practice. However, on reading the review, they discover the exercise interventions were poorly described, with little information on the type and dose, level of supervision and what co-interventions, if any, were delivered. The clinician is now uncertain what the ‘effective’ exercise programme was. The same could be said for clinicians working in exercise oncology, where key principles of training (eg, progression and reversibility) and prescriptive components of exercise (eg, frequency, intensity, time and type) are poorly reported.21
Several systematic reviews have investigated the reporting quality of exercise interventions for various health conditions (eg, cardiovascular, musculoskeletal, neurological),19 22–24 but these are yet to be synthesised. These individual reviews describe the quality of reporting within specific health conditions but do not inform on reporting across the field as a whole; hence, the quality of reporting across the exercise medicine literature remains unknown. The aim of this overview of systematic reviews was therefore to determine how well exercise interventions have been reported in clinical trials of exercise for health and disease. For clinical research to be translated into practice, clinicians must be able to identify intervention components in sufficient detail to replicate them. This is particularly important for complex interventions like exercise, given the many modifiable variables that may impact its effectiveness.25–27
This overview of systematic reviews was conducted in accordance with the Cochrane Handbook for Systematic Reviews of Interventions (Chapter V—Overviews) recommendations28 and reported in line with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 statement.29 The protocol was registered prospectively on the Open Science Framework (OSF) (osf.io/my3ec/) and PROSPERO (CRD42021261285) prior to conducting searches. All data and code are publicly available on OSF (osf.io/my3ec/).
We included systematic reviews of exercise interventions that specifically examined, as a primary aim, how well the exercise interventions were reported.
We searched electronic databases (PubMed, EMBASE, CINAHL, SPORTDiscus and PsycINFO) up to June 2021, using search terms relating to ‘exercise’ or ‘physical activity’ and ‘reporting’ (eg, CERT or TIDieR). We did not restrict the inclusion of reviews by year of publication, publication status or language. The search strategy for PubMed was as follows: (exercis*[Title] OR sport*[Title] OR physical activity[Title] OR train*[Title] OR aerobic[Title] OR resistance[Title] OR physical training[Title] OR active[Title] OR move*[Title] OR rehab*[Title]) AND (cert[TiAb] OR tidier[TiAb] OR “template for intervention description and replication”[TiAb] OR report*[Title] OR complet*[TiAb] OR describ*[TiAb] OR replic*[TiAb] OR characteristics[TiAb] OR design[TiAb] OR program[TiAb]) OR (consensus on exercise reporting template) with a filter for systematic reviews in humans. The search strategy for all other databases is shown in online supplemental appendix 1. We also identified systematic reviews previously known to the authors and conducted forward and backward citation tracking up to July 2021 using Google Scholar,30 to identify any other relevant reviews not discovered in the initial search.
Record management and screening
Results of electronic database searches were imported to Covidence31 where duplicate records were removed automatically. Two reviewers independently conducted two stages of eligibility screening: (1) title and abstract; (2) full text. Any disagreements on screening were resolved through discussion, with arbitration of a third author if required.
Data from included reviews were extracted in duplicate by independent authors using Covidence extraction V.2.0.31 Disagreements were resolved through discussion or arbitration from a third author if required.
Data items: characteristics of included reviews
We extracted data relating to the review characteristics (included study design, health condition(s), sample size, number of interventions, exercise intervention type, comparator(s)) and reporting guideline used (CERT and/or TIDieR). If investigated by an included review, we also extracted data related to changes in reporting quality over time.
Data items: reporting characteristics
From the included systematic reviews, we extracted the completeness of reporting (primary outcome), expressed as the percentage of interventions that reported each item in sufficient detail for replication, according to the relevant checklist (TIDieR or CERT). All items of both checklists were considered in this overview (table 1). If a combined or hybrid checklist was used, we separated the items from CERT and TIDieR and considered them as separate checklists. We did not evaluate completeness of reporting ourselves; rather, completeness of reporting was judged by the authors of each included systematic review. As a result, we relied on the sources from which those authors chose to obtain data (ie, the primary trial publication and its supplemental material only). Similarly, we did not evaluate changes in reporting quality over time ourselves but instead used the judgements of the authors of the included systematic reviews. We contacted authors of the included systematic reviews when items on the relevant guideline were not reported. If a review included studies of multiple interventions, we extracted the completeness of reporting relative to the number of interventions, rather than the number of studies. We did not extract risk of bias ratings of individual studies as these were unlikely to affect the quality of reporting.
We used R32 to conduct all analyses. From each review, we extracted the number or percentage of studies that appropriately described each item of the respective scale(s) (CERT or TIDieR). When data were presented in systematic reviews as a number of studies (eg, 6 of 24 studies reported the item sufficiently), they were converted to a percentage of studies to allow comparability between reviews. Data were synthesised using simple descriptive statistics (median, IQR and range) for each item of each relevant tool. Data were visually inspected for normality. Most data were not normally distributed; therefore, for consistency, the median was chosen as the summary statistic. We performed subgroup analyses on the completeness of reporting within different health and disease areas when >3 reviews of the same area were identified (eg, cardiovascular, musculoskeletal, neurological). Studies were grouped into these areas based on the domains described in Exercise and Sports Science Australia’s standards.33 As several methods were used by review authors to analyse changes in quality of reporting over time (eg, correlations, linear regression, completeness of reporting across different time periods or before and after the introduction of CERT and TIDieR), we did not pool these results; instead, changes in reporting over time were described narratively. We did not assess the certainty of evidence as this was not relevant to the purpose of our overview of systematic reviews. CERT and TIDieR do not define ‘good’ or ‘poor’ reporting13 14; however, post hoc, we categorised reporting quality as ‘good’ when ≥80% of interventions included in the reviews reported the item(s) sufficiently, ‘moderate’ when 50% to 79% did so, and ‘poor’ when <50% did so, in line with cut-offs used by included reviews.34 35
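The analyses above were conducted in R; as an illustrative sketch of the synthesis steps (count-to-percentage conversion, descriptive summary, and the post hoc quality cut-offs), here is a minimal Python version. All function names and data are hypothetical, not drawn from the included reviews:

```python
from statistics import median, quantiles

def to_percentage(reported, total):
    # Counts such as "6 of 24 studies reported the item sufficiently"
    # are converted to percentages so reviews of different sizes are comparable.
    return 100 * reported / total

def summarise(item_percentages):
    # Descriptive synthesis per checklist item: median, IQR and range
    # (median chosen because most data were not normally distributed).
    q1, _, q3 = quantiles(item_percentages, n=4)
    return {
        "median": median(item_percentages),
        "iqr": q3 - q1,
        "range": (min(item_percentages), max(item_percentages)),
    }

def categorise(pct):
    # Post hoc cut-offs used in the overview:
    # >=80% 'good', 50-79% 'moderate', <50% 'poor'.
    if pct >= 80:
        return "good"
    if pct >= 50:
        return "moderate"
    return "poor"

# Hypothetical per-review percentages for one checklist item:
item_scores = [to_percentage(6, 24), 62.0, 80.0, 10.0, 45.0]
```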
Quality of systematic reviews
Review quality was assessed independently and in duplicate using a modified version of A MeaSurement Tool to Assess systematic Reviews (AMSTAR 2)36 (online supplemental appendix 3). In our modified version, we excluded the items pertaining to meta-analysis or risk of bias within individual studies (items 9, 11, 12, 13, 14 and 15) as these were not relevant to our review question. The quality of each review was deemed ‘high’, ‘moderate’, ‘low’ or ‘critically low’ based on its number of critical flaws (a rating of ‘no’ in item 2, 4 or 7)36 and non-critical weaknesses (a ‘no’ or ‘partial yes’ in any other domain) (online supplemental appendix 4).
We identified 7804 studies and included 28 systematic reviews18–20 22 24 34 35 37–57 (figure 1). These 28 reviews included 1467 studies comprising 1724 interventions. We found only one article published in a language other than English (German), which was translated using Google Translate.58 A list of the studies excluded during full-text assessment, with reasons, is provided in online supplemental appendix 5. Ten reviews used only CERT,35 38 41 42 44 48 52–54 56 11 reviews used only TIDieR,18–20 24 39 45–47 50 51 55 and 6 reviews22 34 37 40 43 57 used both CERT and TIDieR (table 2). One review38 used a hybrid tool comprising items 1–5, 7, and 9–12 from TIDieR, complemented and expanded upon by items 6 and 8 of CERT. The median number of interventions included in the reviews was 24 (range 3–287, IQR 67). Twelve reviews assessed quality of reporting in musculoskeletal conditions,19 34 35 37 38 40–42 45 48 56 57 four in neurological conditions,39 47 51 53 six in cardiovascular conditions,18 20 22 24 49 55 one in cancer50 and five in ‘other’ conditions, including urinary dysfunction (n=1),44 pelvic organ prolapse (n=1),43 organ transplant patients (n=2)46 52 and older adults (n=1).54 See table 2 for all characteristics of included reviews. We contacted two review authors, who provided data not reported in the review manuscripts.
Quality of included reviews (AMSTAR 2)
Nine reviews were rated moderate quality, 11 low quality and 8 critically low quality. The most common methodological shortcomings were item 10, Reporting sources of funding of included studies, where 28 reviews (100%) did not report the item sufficiently, and item 3, Rationale for selection of study designs, where 22 reviews (79%) did not report the item sufficiently. The most adhered to item was item 6, Conducting data extraction in duplicate, with 23 reviews (82%) reporting this sufficiently (see online supplemental appendix 6 for the complete results of the AMSTAR 2 assessment).
Quality of reporting: CERT
Sixteen reviews used CERT to assess quality of reporting (n=643 studies, n=757 interventions). The median percentage of all CERT items appropriately reported was 24% (range 5%–68%, IQR 19). The median score for each CERT item across the 16 reviews can be seen in figure 2. Item 4, Describe whether exercises are supervised or unsupervised and how they are delivered (median=68%, range 0%–100%, IQR 89) and Item 14, Describe whether the exercises are generic (one size fits all) or tailored to the individual (median=59%, range 0%–100%, IQR 70) were the best reported. In contrast, item 16a, Describe how adherence or fidelity to the exercise intervention is assessed/measured (median=5%, range 0%–93%, IQR 21) and item 16b, Describe the extent to which the intervention was delivered as planned (median=5%, range 0%–77%, IQR 38) were the most poorly reported. Some of the items most important for replication, item 8, Description of each exercise to enable replication (median=23%, range 0%–95%, IQR 44) and item 13, Detailed description of the exercise intervention, including sets, reps, duration, etc (median=24%, range 0%–100%, IQR 66) were also poorly reported. There were no obvious differences in CERT scores between health condition subgroups (online supplemental appendix 7).
Quality of reporting: TIDieR
Eighteen reviews used TIDieR to assess quality of reporting (n=1099 studies, n=1353 interventions). The median percentage of all TIDieR items appropriately reported was 49% (range 0%–100%, IQR 33). The median score for each TIDieR item across the 18 reviews can be seen in figure 3. Item 1, Brief name (median=100%, range 0%–100%, IQR 4) and item 2, Why (median=98%, range 0%–100%, IQR 6) were the best reported. In contrast, item 10, Modifications (median=0%, range 0%–55%, IQR 12) and item 11, How well (planned) (median=23%, range 0%–70%, IQR 26) were the most poorly reported. The item most relevant to the ‘dose’ of exercise, item 8, When and how much, was moderately well reported (median=62%, range 0%–100%, IQR 68). Subgroup analyses (online supplemental appendix 8) showed the neurological area had the highest median score (65% (range 2%–100%, IQR 62)), followed by the cardiovascular area (48% (range 0%–100%, IQR 23)) and the ‘other’ area (43% (range 0%–100%, IQR 30)).
Changes in reporting over time
Five reviews18–20 40 57 investigated changes in reporting quality over time, but the findings were mixed. Three reviews18–20 found no changes over time. One review57 found slight decreases in reporting quality over time, whereas another40 found improvements in reporting quality over time (table 3).
Our overview of systematic reviews revealed that exercise interventions are poorly reported across all health and disease areas. This was true regardless of the reporting template used, though completeness of reporting was slightly higher according to TIDieR than CERT. Completeness of reporting does not appear to have improved over time, and most reviews were of low quality. Based on these findings, if exercise is medicine, then how it is prescribed and delivered is unclear, potentially limiting its translation from research to practice.
Maintaining a high quality of intervention reporting is important in all fields of medicine, including exercise. Poor reporting of interventions may limit the ability of clinicians and policymakers to implement interventions in clinical practice, as it may be unclear how interventions should be delivered.12 For example, if an intervention was shown to improve an important aspect of health (eg, blood pressure) or fitness (eg, aerobic capacity or muscle strength), it is important to know which characteristics of the intervention led to this improvement. Further, in an increasingly global field, the many different naming conventions for exercises, within and between disciplines both nationally and internationally, can cause confusion. Descriptions of exercises, including pictures, could help combat this issue and enhance the quality of reporting.59 60 Evidence synthesis is also impaired by poor reporting, as comparators and interventions may not be pooled for meta-analysis if the content of the treatments is unclear.61 High-quality reporting is needed in the field of exercise to promote clinical translation, evidence synthesis and clear appraisal of studies.
Poor reporting of interventions is not unique to exercise. Indeed, similar issues have been observed across a range of medical interventions,62 but exercise studies appear to more poorly report interventions.11 Our results show that the names of, and rationale for, exercise (TIDieR items 1 and 2) were very well reported, but this is of little use for researchers or clinicians trying to replicate the intervention. In contrast, key intervention components needed to optimise translation to practice, for example, detailed description of exercises to enable replication and, perhaps most crucially, detailed description of the exercise prescription, were poorly reported (figure 2). Moreover, items crucial to assessing intervention fidelity, adherence and adverse events were also poorly reported (figure 2). Intervention fidelity has important implications for the internal validity of a study,63 whereas reporting of adherence and adverse events is crucial to enable assessment of how tolerable and feasible the intervention was. To improve quality appraisal, evidence synthesis, replication and translation of exercise interventions to practice, reporting of exercise interventions must improve.
Several templates have been developed to assist in improving the reporting of exercise interventions. These include condition-specific tools, such as CERT-PFMT64 for pelvic floor muscle training, and more general templates.13–16 We chose to use CERT and TIDieR for this overview of systematic reviews as they are valid and reliable14 65 and focus on key intervention variables such as how, how much and how well, among others.13 14 The CERT was designed to build upon TIDieR to provide additional detail on important exercise intervention components.13 Interestingly, while the included reviews scored moderately well on TIDieR (median=49%, IQR 33), they scored much worse on CERT (median=24%, IQR 19). This disparity may be explained by the broad nature of TIDieR whereby, in trying to cover all healthcare interventions,62 it is too general for exercise. Given the specificity of CERT to exercise, we recommend that authors use CERT to guide reporting of their exercise interventions. Our overview of systematic reviews showed that when important intervention components are defined and examined with more scrutiny, as exercise is with CERT, items crucial to the replicability of exercise interventions are poorly reported.
Despite the advent of TIDieR and CERT, there has been little change in the quality of reporting of exercise interventions over time. The reason for this is not clear. It may be that authors are unaware of these templates. Indeed, it can be difficult to navigate the hundreds of reporting guidelines available on the Enhancing the QUAlity and Transparency Of health Research Network (equator-network.org). Alternatively, authors may be aware of these templates but simply choose not to use them. We acknowledge that full adherence to reporting guidelines can be difficult, particularly with the stringent word limits of many journals. In this instance, we suggest authors provide as much detail as possible within the manuscript and provide all other additional information required by CERT or TIDieR (or other relevant reporting guideline) as supplemental material. There may also be other methods to improve reporting of exercise trials. Journals have previously mandated the use of reporting guidelines such as Consolidated Standards of Reporting Trials66 and PRISMA,67 which significantly improved the reporting of trials and systematic reviews, respectively.67–70 Therefore, exercise medicine journals may be well positioned to improve the reporting quality of the research they publish by encouraging, or preferably requiring, submission of a completed CERT checklist when exercise trials are submitted. Without this, the quality of reporting of exercise interventions may remain poor, limiting the possibility of potentially impactful exercise interventions being implemented in clinical practice.
A noted limitation of the evidence included in our overview of systematic reviews was that, using a modified AMSTAR 2 tool, no reviews were deemed to be of high quality, with most deemed moderate (n=9) or low (n=11) quality. We chose only to include data that the authors of the included systematic reviews could extract from the main articles and supplemental materials, rather than information gathered by seeking out protocols or contacting trial authors. This may have reduced the completeness of reporting observed in our overview, as reporting does improve when these additional sources are used: seeking them out was not often done in the included systematic reviews, but when it was, reporting improved by 12%–34%.18 24 However, it has been argued13 14 that contacting authors for more information about an intervention should not be necessary, given the impact the intervention has on a study’s findings. We believe that the manuscript and supplemental information should, at a minimum, describe all items of the reporting guidelines to allow replication. We did not predefine a cut-off to categorise the quality of reporting (eg, as good or poor), and CERT and TIDieR do not provide criteria to do so.13 14 However, we did make these categorisations post hoc, using cut-offs suggested by included systematic reviews,34 35 to support our claims. While outside the scope of this overview of systematic reviews, it is also important to note that many trials do not sufficiently report comparators,62 which is important for assessing the internal validity of the trial.
Exercise is an intervention with widespread positive effects on many health conditions. But, across all fields involving exercise medicine, the quality of exercise intervention reporting is poor. High-quality reporting is needed to improve quality appraisal, enable evidence synthesis and replication, and improve translation in clinical settings. There has been little change in quality of reporting over time despite the presence of reporting checklists. Researchers, and the journals they submit to, have the opportunity to improve intervention reporting in exercise medicine by following TIDieR or CERT and encouraging or requiring inclusion of a completed checklist as part of standard practice when submitting exercise studies. This would likely lead to improved reporting quality over time, and a better understanding of the ‘dose’ of exercise medicine needed to optimise health outcomes.
What is already known
Exercise is effective for improving a range of health conditions, although exercise interventions are often poorly reported.
Poor reporting of interventions can reduce the ability of readers and researchers to assess quality, synthesise evidence, and replicate and implement potentially effective interventions in practice.
The quality of reporting across studies of exercise medicine is unknown.
What are the new findings
Exercise interventions are poorly reported across all health areas of exercise medicine.
The quality of intervention reporting has not improved over time.
If exercise is medicine, then how it is prescribed and delivered is unclear, potentially limiting its translation from research to practice.
Patient consent for publication
This study does not involve human participants.