By: Candice Tang, ORT Times Writer and UHN Trainee
Imagine a computer that can diagnose patients with great accuracy. Like a trained specialist, the computer has learned to recognize patterns in biological data by “seeing” hundreds of patients, yet it only took the computer several hours to become a master. How is this possible?
The answer is machine learning algorithms. They are designed to analyze large datasets and use what they learn to draw conclusions about similar but new data. This technology has many applications in healthcare, yet few researchers have the experience necessary to use it.
Dr. Davide Chicco, a postdoctoral fellow in the Hoffman lab at the Princess Margaret Cancer Centre, has observed common malpractices that can lead to incorrect conclusions. He shares ten helpful tips for running a successful machine learning project. His suggestions mimic how a student learns in a classroom.
Set yourself up for success:
Davide places special emphasis on preparing one’s dataset before running the algorithm. He explains that properly arranging and shuffling a dataset, and removing outliers as needed, can improve processing speed and performance. Similarly, a student writing an essay must include a set of points in their argument but can discard ones that don’t make sense.
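As a rough illustration of this preparation step, the sketch below shuffles a small dataset and drops outliers. The article does not say which outlier rule Davide recommends; the z-score cutoff used here, along with the names `remove_outliers`, `prepare_dataset` and `max_z`, are assumptions made for the example.

```python
import random
import statistics


def remove_outliers(rows, column, max_z=3.0):
    """Drop rows whose value in `column` is more than `max_z` standard
    deviations from the mean (an assumed rule, not one from the article)."""
    values = [row[column] for row in rows]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values are identical, so there is nothing to remove.
        return list(rows)
    return [row for row in rows if abs(row[column] - mean) / stdev <= max_z]


def prepare_dataset(rows, column, seed=0, max_z=3.0):
    """Remove outliers, then shuffle the remaining rows reproducibly."""
    cleaned = remove_outliers(rows, column, max_z)
    rng = random.Random(seed)  # fixed seed so the shuffle can be repeated
    rng.shuffle(cleaned)
    return cleaned
```

Shuffling with a fixed seed keeps the order random but repeatable, which makes it easier to compare runs. Whether a given outlier should be removed at all depends on the data, so the threshold is something to choose deliberately rather than copy.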
Trial and error:
Hyper-parameters are properties that can affect the complexity of the algorithm and the speed at which it processes data, so optimizing them is a critical step in machine learning. Similarly, a student’s choice of study environment can affect both productivity and performance.
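To make the trial-and-error idea concrete, here is a minimal sketch that tries several values of one hyper-parameter and keeps the one that scores best on held-out validation data. The choice of a one-dimensional k-nearest-neighbours classifier, and the names `knn_predict`, `accuracy` and `tune_k`, are assumptions for illustration; the article does not name a specific algorithm or search strategy.

```python
from collections import Counter


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training examples.

    `train` is a list of (value, label) pairs with numeric values.
    """
    nearest = sorted(train, key=lambda example: abs(example[0] - point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation examples that are classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in validation)
    return correct / len(validation)


def tune_k(train, validation, candidates):
    """Try each candidate k and return the best one with all scores."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

The key design choice is that each candidate is scored on validation data the algorithm did not train on; scoring on the training data itself would tend to reward overly complex settings.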
Pay it forward:
Students often learn more from discussions with their peers or teachers than from studying on their own. For those starting out in machine learning, Davide recommends talking with a local computational biologist or asking questions on online forums. This may open up opportunities for collaboration.
In an era of big data, machine learning software can help us process more information with greater speed, accuracy, and precision than ever before. But it is up to the user to teach the algorithm to do its job.
The ORT spoke with Davide.
How are machine learning technologies impacting biology and medicine?
Machine learning has already been helping biologists and health care researchers in thousands of ways: it has been used to identify tumor traits in breast cancer images, for example, and to recognize clusters of complementary DNA microarray datasets. In our lab, we are currently working on a project that uses machine learning to predict the diagnosis of mesothelioma patients: by reading a patient’s data profile, our machine-learning-trained software will be able to state whether the patient has mesothelioma or is healthy. All in approximately one minute! I hope we will have much more to say about this soon.
Dr. Davide Chicco, Author and UHN Trainee