In the article titled “Johann Gregor Mendel: the victory of statistics over human imagination”, the authors argue that big data analysis using machine learning, deep learning, and similar methods can identify patterns that human beings can’t, and that these methods therefore become “an important tool for developing more effective therapeutic strategies for complex diseases”.
Let us examine whether the claim that data-driven science is going to take center stage is true. Let us also examine whether it is worth the hype.
How the nature of science changes with technological advancements
MD Madhusudan once narrated to me how the practice of wildlife conservation has evolved.
When he was entering the field, people would travel to forests, look at paw marks, note them down in notebooks, and do analysis with pen and paper.
A little while later, camera traps and spreadsheet software entered the scene. The number of data points generated increased, and the range of analyses that could be performed using GIS software and the like expanded too.
Today drone photography and image recognition generate millions of data points, and much of the analysis is done through modeling, machine learning, and deep learning.
It would be inappropriate not to use the technology that’s available to us to generate more rigorous science.
The limits of resolution
Understanding problems with very high precision need not give us ways to solve them. This is why the victory of statistics often ends at diagnosis.
For example, take sickle cell disease. We have known the exact mutation that causes it since 1956 (70 years ago). Yet severe anemia due to sickle cell disease, and the related morbidity and mortality, remain routine.
While the precision that’s enabled by data analysis is real, it is not always important.
It’s like having bigger and bigger microscopes. Yes, that allows us to see more and more clearly. But the benefits plateau after a while.
It is also important to remember the garbage in, garbage out principle in any kind of machine learning. It is human imagination that allows non-garbage data to be collected and fed into models. The hope that we can collect all kinds of data and have unsupervised machine learning automatically solve all diseases is wishful thinking. Even if it were made practical, it would not necessarily translate directly into alleviation of the suffering those diseases cause.
All of what the authors write is true — except for the optimism.
Thanks:
- MK Shahzad, who shared the article (without comments) in the Swades🇮🇳 WhatsApp group
- MD Madhusudan, for the insight from his long career