In the article titled “Johann Gregor Mendel: the victory of statistics over human imagination”, the authors argue that big data analysis using machine learning, deep learning, and similar methods can identify patterns that human beings can’t, and hence that these methods become “an important tool for developing more effective therapeutic strategies for complex diseases”.
Let us examine whether the claim that data-driven science is going to take center stage is true. Let us also examine whether it is worth the hype.
How the nature of science changes with technological advancements
M D Madhusudan once narrated to me the evolution of how wildlife conservation is practiced.
When he was entering the field, people would travel to forests, look at paw marks, note them down in notebooks, and do analysis with pen and paper.
A little while later, camera traps and spreadsheet software entered the scene. The number of data points generated increased, and the kinds of analysis that could be performed using GIS software and the like expanded too.
Today drone photography and image recognition generate millions of data points, and much of the analysis is done through modeling, machine learning, and deep learning.
It would be inappropriate to not use technology that’s available to us to generate more rigorous science.
The limits of resolution
Understanding problems with very high precision need not give us ways to solve them. This is why the victory of statistics often ends at diagnosis.
For example, take sickle cell disease. We have known the exact mutation that causes it since 1956 (70 years ago). But severe anemia due to sickle cell disease, and the related morbidity and mortality, are still routine.
While the precision that’s enabled by data analysis is real, it is not always important.
It’s like having bigger and bigger microscopes. Yes, that allows us to see more and more clearly. But the benefits plateau after a while.
It is also important to remember the garbage-in garbage-out principle in any kind of machine learning. It is human imagination that allows non-garbage data to be collected and fed into models. The wishful thinking that it is possible to collect all kinds of data and have unsupervised machine learning automatically solve all diseases is impractical. Even if it is made practical, it need not directly translate to alleviation of suffering from those diseases.
All of what the authors write is true — except for the optimism.
Thanks:
- MK Shahzad, who shared the article (without comments) in the Swades🇮🇳 WhatsApp group
- MD Madhusudan, for the insight from his long career