Machine learning has become a heavyweight trend in medical research. So why don’t we see practical applications of machine learning and artificial intelligence in our doctors’ offices?
Machine learning (ML) programs computers to learn the way we do – by continually assessing data and identifying patterns based on past outcomes. ML can quickly pick out trends in big datasets, operate with little to no human interaction and improve its predictions over time. Thanks to these abilities, it is rapidly finding its way into medical research.
![Why is Machine Learning Not in Our Doctor’s Offices?](https://www.technology.org/texorgwp/wp-content/uploads/2023/06/doctors-surgeons-medical-team.jpg)
Doctors, medical team – illustrative photo. Machine learning is now routinely used in medical research. Image credit: NCI via Unsplash, free license
People with breast cancer may soon be diagnosed through machine learning faster than through a biopsy. Those suffering from depression might be able to predict mood changes from smartphone recordings of daily activities, such as the time they wake up and the amount of time they spend exercising.
Machine learning may also help paralyzed people regain autonomy using prosthetics controlled by patterns identified in brain scan data. ML research promises these and many other possibilities to help people lead healthier lives.
But while the number of machine learning studies grows, its actual use in doctors’ offices has not expanded much beyond simple functions such as converting voice to text for notetaking.
The limitations lie in medical research’s small sample sizes and unique datasets. Such small data makes it hard for machines to identify meaningful patterns; the more data, the more accurate ML diagnoses and predictions become.
For many diagnostic uses, thousands of subjects would be needed, but most studies work with only dozens.
![Doctor analyzing medical information - illustrative photo.](https://www.technology.org/texorgwp/wp-content/uploads/2023/06/doctor-surgeon-analyzing-information.jpg)
Doctor analyzing medical information – illustrative photo. Image credit: Irwan @tweetbyirwan via Unsplash, free license
But there are ways to find significant results from small datasets if you know how to manipulate the numbers. Running statistical tests over and over again with different subsets of your data can suggest significance in a dataset where there are really only random outliers.
This tactic, known as P-hacking or feature hacking in machine learning, leads to the creation of predictive models that are too limited to be useful in the real world. What looks good on paper doesn’t translate to a doctor’s ability to diagnose or treat us.
These statistical mistakes, oftentimes done unknowingly, can lead to dangerous conclusions.
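As an illustration (not drawn from any particular study), the Python sketch below shows how feature hacking can manufacture an impressive-looking model from pure noise: with a small sample and thousands of candidate features, selecting the features that happen to correlate with the labels on the full dataset yields accuracy well above chance, even though there is nothing real to predict. The sample size, feature count and model choice here are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_features = 34, 2000              # small sample, many candidate features
X = rng.normal(size=(n_subjects, n_features))  # pure noise "measurements"
y = rng.integers(0, 2, size=n_subjects)        # random labels: nothing real to predict

# Biased workflow: pick the 10 "best" features using ALL the data,
# then evaluate a classifier on those same features.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)
biased_acc = cross_val_score(LogisticRegression(), X_selected, y, cv=5).mean()
print(f"Accuracy after selecting features on the full dataset: {biased_acc:.2f}")
# Typically lands well above the 0.5 chance level, despite the data being noise.
```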
To help scientists avoid these mistakes and push ML applications forward, Konrad Kording, Nathan Francis Mossell University Professor with appointments in the Departments of Bioengineering and Computer and Information Science in Penn Engineering and the Department of Neuroscience at Penn’s Perelman School of Medicine, is leading an aspect of a large, NIH-funded program known as CENTER – Creating an Educational Nexus for Training in Experimental Rigor.
Kording will lead Penn’s cohort by creating the Community for Rigor, which will provide open-access resources on conducting sound science. Members of this inclusive scientific community will be able to engage with ML simulations and discussion-based courses.
“The reason for the lack of machine learning in real-world scenarios is due to statistical misuse rather than the limitations of the tool itself,” says Kording. “If a study publishes a claim that seems too good to be true, it usually is, and many times we can track that back to their use of statistics.”
![A nurse wearing a protective face mask.](https://www.technology.org/texorgwp/wp-content/uploads/2023/06/nurse-doctor-mask.jpg)
A nurse wearing a protective face mask. Image credit: Ömer Yıldız via Unsplash, free license
Studies like this make their way into peer-reviewed journals more often than one might expect, contributing to misinformation and mistrust in science.
A recent publication grabbed Kording’s attention. The study, which used machine learning on data from MRI scans of the brain, claimed to have created a model that could detect suicidal ideation with an accuracy of 91 percent – a model that would surely transform certain diagnostic procedures.
But upon repeating the study’s data analysis, Kording and colleague Tim Verstynen, Associate Professor of Psychology at the Neuroscience Institute at Carnegie Mellon University, found many instances of feature hacking that had led the researchers to cherry-pick data points, creating a highly specific predictive model.
“With only 34 patients, their study started off with too small of a sample size to result in sound science,” says Kording.
“The data they used were a combination of words relating to mortality and corresponding regions of the brain that lit up in MRI scans. Instead of using all of the data from each patient, they chose specific words and regions.”
![Preparation for an MRI scan - illustrative photo.](https://www.technology.org/texorgwp/wp-content/uploads/2023/05/MRI-patient-doctor-hospital.jpg)
Preparation for an MRI scan – illustrative photo. Image credit: Accuray via Unsplash, free license
Those choices produced a model that performed very well when tested against that specific dataset. But if it were used to predict suicidal ideation in real patients, it would not be accurate.
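Continuing the hypothetical noise example above, a minimal sketch of the unbiased workflow makes the contrast concrete: when feature selection is performed inside each cross-validation fold, so the held-out subjects never influence which features are chosen, the apparent predictive power on random data collapses back to chance. The numbers and model are again illustrative, not taken from the retracted study.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(34, 2000))   # same kind of noise "measurements" as before
y = rng.integers(0, 2, size=34)   # random labels

# Unbiased workflow: the pipeline re-selects features within each training fold,
# so the held-out subjects never influence which features are chosen.
pipeline = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
honest_acc = cross_val_score(pipeline, X, y, cv=5).mean()
print(f"Accuracy with selection kept inside each fold: {honest_acc:.2f}")
# Hovers around the 0.5 chance level: once the cherry-picking is removed,
# the apparent predictive power disappears.
```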
Research based on the manipulated data points of 34 people won’t serve mental health practitioners looking for diagnostic tools. After Kording’s re-analysis, the study was retracted from the journal.
To make meaningful advancements in the field of ML in biomedical research, it will be necessary to raise awareness of these issues, help researchers understand how to identify them and limit them, and create a stronger culture around scientific rigor in the research community.
Kording aims to communicate that just because incorporating machine learning into biomedical research can introduce room for bias doesn’t mean scientists should avoid it. They just need to understand how to use it in a meaningful way.
![Operating room in a hospital.](https://www.technology.org/texorgwp/wp-content/uploads/2021/10/various-doctor-hospital-medical-crop.jpg)
Operating room in a hospital. Image credit: sasint | Free image via Pixabay
The Community for Rigor aims to address these challenges, with specific plans to create a module on machine learning in biomedical research that will guide participants through datasets and statistical tests, pinpointing exactly where bias is commonly introduced.
The Community is still in its infancy, but Kording and colleagues plan to publish resources as soon as this fall. One of the first ways to get involved in this effort is to follow The Community for Rigor on Twitter and join the conversation by anonymously sharing your own scientific rigor mistakes and challenges.
“While it would be extremely helpful to have easy and accurate ways of diagnosing and treating medical conditions, our own human bias can get in the way of what the data is actually saying or not saying,” warns Kording. “That’s what this community aims to improve.”
Source: University of Pennsylvania