By Ruthie Fogel


What if there were a way to prevent cancer and other devastating diseases? What if there were a way to discover new drugs, predict their efficacy and success rate, and find the right therapy path for each patient? Artificial intelligence (AI) has the potential to do all of this and more, and it is well on its way to being implemented in healthcare.

Machine learning (ML) is a branch of computer science and AI in which algorithms learn patterns from data, loosely mimicking the way people learn from experience. According to a paper written by Park et al., it was first described by computer scientist Arthur Samuel in 1959 as “a field of computer science that gives computers the ability to learn without being explicitly programmed…and creates algorithms that can learn from data and make predictions on the data.” For example, say you wake up and ask your virtual assistant (such as Google Home) what your schedule is for the day. When you ask, “Hey Google, what’s my schedule for today?”, the assistant recalls your past requests and pulls information from applications such as Google Calendar. It uses machine learning to find patterns in that data and turn them into results tailored to your preferences.


Machine Learning in Biomedical Research

In biomedical research, there are two main branches of ML: supervised learning and unsupervised learning. Supervised learning tells the machine precisely what patterns to identify using a pre-existing set of labeled data known as a training set; this is similar to a dog sniffing out something when it knows exactly what the scent is. Unsupervised learning is not told what to look for; instead, it identifies whatever patterns it can detect in the data. Here, the same dog sniffs a number of smells and groups together the ones that are similar.
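
The difference between the two branches can be sketched in a few lines of Python. This is only a toy illustration: the measurements, the labels "healthy" and "disease", and the helper names `fit_centroids`, `predict`, and `two_means` are all made up for the example and do not come from any real dataset or library.

```python
# Toy measurements (hypothetical values): one number per sample.
samples = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]

# Supervised: the labels are given, so we learn the average value for each
# label and classify a new sample by the nearest average.
def fit_centroids(xs, ys):
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    return min(centroids, key=lambda y: abs(centroids[y] - x))

centroids = fit_centroids(samples, labels)
print(predict(centroids, 4.5))  # prints "disease"

# Unsupervised: no labels are given, so we only group similar values
# together (a two-cluster k-means on one-dimensional data).
# Assumes the data contains at least two distinct values.
def two_means(xs, iterations=10):
    c0, c1 = min(xs), max(xs)  # start the two cluster centers at the extremes
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

print(two_means(samples))  # two groups, found without using any labels
```

The supervised half needs the `labels` list to work at all, while the unsupervised half never looks at it; it finds the two groups purely from how close the values are to each other.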


The Roadmap to Curing Cancer with Neural Networks 

Models trained with supervised learning can perform a wide array of tasks, including predicting outputs from input data. Researchers can use this type of ML to sift through tremendous amounts of information and better understand how different cancers develop and which treatments can best eradicate them.

According to Real Engineering, one of the most interesting types of supervised ML is the artificial neural network. Artificial neural networks are inspired by the biological neural networks in the human brain and are used as tools for finding relationships between variables in a data set that are too complex for a human to recognize. This type of machine learning is very powerful, as it may allow scientists to predict a cancer diagnosis before onset. It does this by learning how the interactions among genes, nutrients, and demographic indicators relate to cancer development.

These neural networks require large amounts of data for training. For cancer ML, the training set may include parameters such as genetics, demographics, nutrition, and body mass index (BMI). The training set also needs information about the patients' outcomes: the data is sorted to record which patients have cancer and which do not. Around 75% of the data is used to train the neural network, and the remaining 25% is used to test how well the network performs. A neural network is composed of an input layer, one or more hidden layers, and an output layer. During the training stage, you first feed the network your input variables at the input layer. The output variable would be whether or not the patient had cancer, represented as a zero for no and a one for yes.
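
To make the 75/25 split and the input, hidden, and output layers more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the patient records are randomly generated, the rule that assigns the 0/1 label is arbitrary, and the network size (four inputs, three hidden units), learning rate, and number of passes were picked only to keep the example small. It is not a medical model and not the method used in any study mentioned here.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

# Made-up record: four inputs scaled to 0..1 (standing in for genetics,
# demographics, nutrition, BMI) and a 0/1 label (1 = cancer, 0 = no cancer).
def make_patient():
    x = [random.random() for _ in range(4)]
    y = 1 if (x[0] + x[3]) > 1.0 else 0  # arbitrary rule, for the demo only
    return x, y

patients = [make_patient() for _ in range(200)]

# About 75% of the data trains the network; the other 25% is held out for testing.
random.shuffle(patients)
cut = int(0.75 * len(patients))
train, test = patients[:cut], patients[cut:]

# Input layer (4 values) -> one hidden layer (3 units) -> output layer (1 unit).
N_IN, N_HID = 4, 3
w1 = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [0.0] * N_HID
w2 = [random.uniform(-1, 1) for _ in range(N_HID)]
b2 = 0.0

def forward(x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
              for j in range(N_HID)]
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, out  # out is the estimated probability of label 1

def train_epoch(data, lr=0.5):
    """One pass of stochastic gradient descent with a cross-entropy loss."""
    global b2
    for x, y in data:
        hidden, out = forward(x)
        d_out = out - y  # error signal at the output unit
        for j in range(N_HID):
            # Error passed back to hidden unit j (uses w2[j] before its update).
            d_hid = d_out * w2[j] * hidden[j] * (1 - hidden[j])
            w2[j] -= lr * d_out * hidden[j]
            for i in range(N_IN):
                w1[j][i] -= lr * d_hid * x[i]
            b1[j] -= lr * d_hid
        b2 -= lr * d_out

def accuracy(data):
    correct = sum(1 for x, y in data if (forward(x)[1] >= 0.5) == (y == 1))
    return correct / len(data)

for _ in range(200):
    train_epoch(train)

print(f"train accuracy: {accuracy(train):.2f}, test accuracy: {accuracy(test):.2f}")
```

The test accuracy is the number to watch: it is measured on patients the network never saw during training, which is why the 25% is set aside before training begins.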

A number of ML algorithms have been created to help predict and detect diseases such as cancer. Although this technique is helpful for cancer prediction, parameters such as these do not, on their own, provide sufficient information for making robust decisions. New molecular information is becoming available through advances in genomic, proteomic, and imaging technologies. Over the years, an extensive amount of cancer data has been collected and made readily available to professionals in the medical field.

The Cancer Genome Atlas (TCGA) Research Network works to discover major cancer-causing genomic alterations in order to create a comprehensive “atlas” of cancer genomic profiles. TCGA has molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types and has generated over 2.5 petabytes of genomic data. In cancer detection, ML methods are used to model disease progression and to identify informative factors that are then used in a classification scheme. The success of a disease prognosis depends on the quality of the medical diagnosis; however, a prognostic prediction should take into account more than a simple diagnostic decision.


AI Used for Healthcare Diagnosis

One of the leading causes of life-threatening medical emergencies in healthcare is sepsis. According to the CDC, sepsis is the body’s extreme response to an infection, which triggers a chain reaction that can lead to tissue damage and death. Though sepsis can be treated if it is caught early, it is difficult for doctors to recognize its early signs in patients. Dr. Suchi Saria, an Associate Professor at Johns Hopkins University, designed an early warning algorithm called the Targeted Real-time Early Warning System (TREWS). Essentially, TREWS learns from the data of past patients to detect conditions such as sepsis. Using metrics such as lab tests and vital signs, TREWS can identify whether a person presents with early signs of sepsis. TREWS also interprets each measurement in the context of other symptoms and lab results. For example, a sepsis patient could have elevated levels of creatinine (a waste product filtered by the kidneys). However, a number of other health conditions, such as chronic kidney disease (CKD), can also raise creatinine levels. The TREWS algorithm needs to identify whether high creatinine levels are caused by sepsis, CKD, or any of a number of other factors, and it does the same for other electronic lab results, deriving a pattern of symptoms and signals seen in sepsis patients.

TREWS detects signs of sepsis early, giving doctors more time to stabilize and treat patients before the condition becomes fatal. Using data from 16,000 patients, TREWS, on average, detected sepsis 24 hours prior to shock onset. In two-thirds of these patients, it was detected prior to any organ dysfunction, a 60% increase in detection performance.

However, TREWS is not yet implemented clinically in every hospital, as certain requirements must be met to make it globally accessible. First, it requires engineers who work in healthcare to build and scale these technologies. Second, it requires policy makers to create incentives to integrate TREWS within electronic medical records (EMRs). Third, the healthcare system needs to prioritize quality patient care. Though the TREWS algorithm is designed for sepsis, the approach has applications for a wide array of medical conditions and emergencies, with the capacity to tell doctors what, when, and whom to treat before it is too late.
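
The creatinine example earlier in this section shows why a lab result has to be read in context. The short Python sketch below illustrates only that one idea. It is not the TREWS algorithm, which learns its patterns from patient data rather than following hand-written rules. The `Patient` fields, the helper names, and every cutoff value are hypothetical placeholders chosen for the illustration, not clinical guidance.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Patient:
    creatinine_elevated: bool  # lab result flagged as above the normal range
    heart_rate: int            # beats per minute
    temperature_c: float       # degrees Celsius
    has_ckd: bool              # known chronic kidney disease

# All cutoffs below are made-up placeholders, not clinical values.
def warning_signals(p: Patient) -> List[str]:
    signals = []
    # High creatinine only counts as a warning sign when a known
    # condition (here, CKD) does not already explain it.
    if p.creatinine_elevated and not p.has_ckd:
        signals.append("creatinine")
    if p.heart_rate > 100:
        signals.append("heart_rate")
    if p.temperature_c > 38.0:
        signals.append("temperature")
    return signals

def should_alert(p: Patient, min_signals: int = 2) -> bool:
    # Raise an alert only when several independent signals agree.
    return len(warning_signals(p)) >= min_signals

without_ckd = Patient(True, 110, 37.0, has_ckd=False)
with_ckd = Patient(True, 110, 37.0, has_ckd=True)
print(should_alert(without_ckd))  # True: creatinine and heart rate both count
print(should_alert(with_ckd))     # False: CKD already explains the creatinine
```

The two example patients have identical measurements and differ only in their medical history, yet the sketch treats them differently. That is the kind of context-dependent reading the article describes, reduced here to a single hand-written rule.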



These wide-ranging applications of ML in biomedical research are improving the quality of healthcare, as well as the quality of life, for many people. The future of biomedical research and the healthcare industry depends on technological advances in computer science, artificial intelligence, and machine learning.

Published On: September 10th, 2021 / Categories: STEM Fellowship Journal