A review of studies published in JAMA Network Open found few randomized clinical trials (RCTs) for medical machine learning algorithms, and the researchers noted quality issues in many of the published trials they reviewed.
The review included 41 RCTs of machine learning interventions. It revealed that 39% were published in the last year and that more than half were conducted at single sites. Fifteen trials took place in the United States, while 13 were conducted in China. Six studies were conducted across several countries.
Only 11 trials collected data on race and ethnicity. Of these, a median of 21% of participants belonged to underrepresented minority groups.
None of the trials fully complied with the Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT-AI), a set of guidelines developed for clinical trials evaluating medical interventions that include AI. Thirteen trials met at least eight of the 11 CONSORT-AI criteria.
The researchers noted some common reasons why trials did not meet these standards, including failing to assess low-quality or unavailable input data, failing to analyze performance errors, and failing to include information about the availability of the code or algorithm.
Using the Cochrane Risk of Bias tool to assess potential bias in the RCTs, the review also found that the overall risk of bias was high in seven of the clinical trials.
“This systematic review found that despite the large number of machine learning–based medical algorithms being developed, few RCTs for these technologies have been conducted. Among the published RCTs, there was great variability in adherence to reporting standards and risk of bias, and a lack of participants from underrepresented minority groups. These findings deserve attention and should be considered in the design and reporting of future RCTs,” the study’s authors wrote.
WHY IT MATTERS
The researchers said there were some limitations to their review. They only reviewed studies evaluating machine learning tools that had a direct impact on clinical decision-making, so future research could examine a wider range of interventions, such as those for workflow efficiency or patient stratification. The review also only assessed studies through October 2021, and further reviews will be needed as new machine learning interventions are developed and studied.
However, the study authors said their review demonstrated that more high-quality RCTs of machine learning algorithms in healthcare need to be conducted. While hundreds of machine learning–enabled devices have been cleared by the FDA, the review suggests that the vast majority were not evaluated in RCTs.
“It is impractical to formally evaluate every potential iteration of a new technology through an RCT (e.g., a machine learning algorithm used in a hospital system and then used for the same clinical scenario in another geographic location),” the researchers wrote.
“A baseline RCT of an intervention’s effectiveness would help determine whether a new tool provides clinical utility and value. This baseline evaluation could be followed by retrospective or prospective external validation studies to demonstrate how the effectiveness of an intervention generalizes over time and across clinical settings.”