The Mystery of Data
Data, we have been told, is the new oil.
In some ways that’s exactly right. Almost everything we rely on today—from Google Maps for finding your destination to the keycards you use to access your building—has become reliable because it is “powered” by data. The algorithms that determine the best route have learned that you should go left instead of right at the light because those before you went left and got there sooner than those who went right. In its most basic form, this process, called machine learning, is how a machine learns patterns in existing datasets to make predictions on unseen data points. In more advanced models, often called “deep learning,” the machine does its learning in relative independence from humans. But whether it’s machine learning or deep learning, none of this happens without data.
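To make that basic pattern concrete, here is a minimal sketch in Python, assuming scikit-learn is available. The route scenario, feature names, and data are invented for illustration; real routing models are far more elaborate.

```python
# Toy illustration of machine learning in its most basic form: a model
# learns from past trips (the existing dataset) and predicts whether an
# unseen trip configuration will arrive faster. All data is invented.
from sklearn.tree import DecisionTreeClassifier

# Each past trip: [hour_of_day, is_weekend, turned_left]
past_trips = [
    [8, 0, 1], [8, 0, 1], [9, 0, 0], [17, 0, 1],
    [17, 0, 0], [12, 1, 0], [12, 1, 1], [9, 1, 0],
]
# Label for each trip: 1 if it arrived faster than typical, else 0
arrived_faster = [1, 1, 0, 1, 0, 1, 0, 0]

model = DecisionTreeClassifier(max_depth=2).fit(past_trips, arrived_faster)

# An unseen data point: 8 a.m. on a weekday, turning left at the light
print(model.predict([[8, 0, 1]]))  # e.g. [1], i.e. "left looks faster"
```

Deep learning follows the same learn-from-examples logic; it simply replaces the little decision tree with many-layered networks whose internal features are not hand-specified by humans.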
The analysis of large datasets is transforming how we see the world: it diagnoses diseases before symptoms emerge, predicts when and where environmental degradation is likely to occur, and identifies unseen systemic injustice in sentencing decisions. How and what we study won’t be the same: the 0s and 1s whizzing in and around our machines and bodies are shedding light on previous perplexities and confusions. Indeed, the most bullish techno-futurists have claimed that “big data” will make the scientific method obsolete, as Chris Anderson, the former Wired editor, once (in)famously suggested.
Algorithmic Rhythms
Yet, as we know all too well, this vision of scientific and social progress is not quite as neat in reality. The advances of machine learning have made weapons more lethal, hate speech more virulent, and resource extraction more ruinous. Moreover, many algorithms reflect racial and gender-based biases because the algorithms are trained on insufficiently diverse and representative datasets. The world that big data has wrought is not the utopia envisioned by its boosters. To some extent, the analogy with oil is apt: it powers our new economic age while, in some cases, quickening the pace of environmental, political, and social decay.
For the last several years, I have had a front-row seat to the role data plays in our increasingly algorithmically paced lives. The company I founded, FilterLabs.AI, manages one of the largest databases of conversational and behavioral data from Russia and other similarly closed and hard-to-reach countries: we capture what is said openly online and pair it with the economic and social statistics that help our algorithms give a more contextualized analysis and identify contradictions and tensions between political rhetoric and real-life conditions. This allows us to track, for governments, journalists, and NGOs, the flow of propaganda around the world and the ways in which it blends, sometimes seamlessly, into the everyday discourse and life of people in far-flung locales.
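As a purely hypothetical sketch of that pairing (not FilterLabs’ actual pipeline, whose methods are not described here), one might join a sentiment signal derived from online discussion with an official economic statistic and flag the places where they pull apart, assuming pandas is available:

```python
# Hypothetical sketch: pair online-discourse sentiment with an official
# economic statistic by region and flag divergence between rhetoric and
# reported real-life conditions. All figures are invented.
import pandas as pd

online_sentiment = pd.DataFrame({
    "region": ["A", "B", "C"],
    "sentiment": [0.70, 0.65, -0.40],  # scaled from -1 (negative) to 1
})
official_stats = pd.DataFrame({
    "region": ["A", "B", "C"],
    "real_income_change": [0.02, -0.08, -0.01],  # year-over-year fraction
})

merged = online_sentiment.merge(official_stats, on="region")
# Crude divergence score: upbeat talk amid worsening conditions scores high
merged["divergence"] = merged["sentiment"] - 10 * merged["real_income_change"]
print(merged.sort_values("divergence", ascending=False))
```

In this toy example, region B would surface first: the talk is positive while incomes fall, exactly the kind of tension between rhetoric and conditions described above.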
Human, All Too Human
In my experience, most of the moral risks in data analytics don’t initially appear as “moral” or even legal—generally speaking, it is pretty clear what is or should be out of bounds—but rather emerge in the ways we mistakenly treat our data as complete when it is actually reductive, limited, and in some cases simply misleading. Data shares all the limitations of the human beings who can only look at a problem or answer a question from a specific socially, politically, and historically formed vantage point.
The challenge is not only to be constantly aware that behind the millions of data points we ingest every day are humans with complicated lives, often in political or social circumstances that provide very limited space for something we might call human agency or free will, but also to remember that it is still humans who are writing both the underlying code and the queries that provoke the response from the databases.
My way into this work was unusual, to say the least. Before I founded FilterLabs, I held a faculty position at the University of Virginia, where my main job was to manage a collaboration between the Department of Religious Studies and what was then the Institute of Data Science (it is now the School of Data Science). I was, for this project, the bridge between scholars of religion and data scientists.
One of our projects was to develop natural language processing signals to predict which extremist religious groups were potentially violent and which were just peaceably fundamentalist. We worked with scholars of, and researchers indigenous to, the regions and faiths we were studying to identify relevant data and to determine how it might be machine-analyzed to yield meaningful results. What became increasingly clear was that the scholars of religion and the data scientists had radically different approaches to what might be meant by the term “data.”
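For a sense of what such a signal can look like, here is a deliberately simple stand-in, assuming scikit-learn; the project’s actual features, data, and methods are not described here, and these texts and labels are invented:

```python
# A hypothetical, simplified NLP signal: a bag-of-words classifier trained
# on texts that regional experts have labeled. Invented data throughout.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "our community gathers to pray and feed the poor",
    "strict observance is the only path, but we harm no one",
    "the unfaithful must be driven out by force",
    "prepare for the day of reckoning with weapons in hand",
]
labels = [0, 0, 1, 1]  # 0 = peaceably fundamentalist, 1 = potentially violent

signal = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
signal.fit(texts, labels)

# Probability the model assigns to "potentially violent" for a new text
print(signal.predict_proba(["we reject the modern world but preach peace"])[0, 1])
```

Even this toy version makes the disciplinary gulf visible: everything turns on who chose the texts, who assigned the labels, and what “violent” was taken to mean.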
The (De)Construction of Data
All data is, in some sense, constructed—it is a version or measure of a phenomenon, but it is not the phenomenon itself. How and to what extent data represents a trend or phenomenon is highly contested—but not typically by practitioners of data science, who talk about data as something “given,” as straightforward facts that can be compressed, stored, encrypted, or hidden. The gulf between those thinking about the basic, unsettled, mysterious nature of data and the practitioners who manipulate and mold it, often without much conceptual engagement, has the potential, we found, to obscure some fundamental questions about data’s relationship with the world.
Our projects at UVA carried on, and those at FilterLabs carry on, as nearly all data science projects do: without having to contend conceptually with what data is. Despite what the optimistic rhetoric of data science and AI suggests (and what many companies want you to think), this is possible because humans are still very much in the loop: data analysis is only as good as the questions we have when we query the data or initiate the exploration of the algorithm. It can be said of data analytics what the British historian and philosopher R.G. Collingwood said, when speaking about what he learned from participating in archaeological digs, that “one found nothing at all except in answer to a question … What one learnt depended not merely on what turned up in one’s trenches but also on what questions one was asking.”[1] And it is humans who have those questions, questions that are driven by desires, well- or mal-formed as they may be.
Here is the critical point where, for me, these questions of data analysis intersect with the practice of theology: to formulate questions of data requires that we make explicit both our assumptions and what we predict or hope will be the outcome. Whatever the endeavor, building theories and models should force us to clarify more precisely what is often merely implied.
Theology to the Rescue
This can be tedious, as much of systematic theology is tedious. But it ought to be done in all fields—including data analysis. Theology offers particular analytical strengths and stamina—also self-criticism and humility—for facing the sheer messiness of whatever we are attempting to think about and with. The practice of reflecting theoretically on practices as disparate and inscrutable as, say, prayer or charity helps us come to terms with the nature and consequences of our efforts to pray, give, bless, and care, and even to spot habits or stratagems that pervert practices and virtues in the pursuit of power and domination. Our sources for this are mysterious or obscure ancient texts (and some more historically proximate ones) as well as highly dynamic communities that try to stay accountable to each other for their practices. When done well, theological reflection keeps a vital pragmatism in mind: it knows that what it theorizes about are the beliefs and practices of actual communities.
Even when done well, our theological efforts to understand via observation, participation, reading, and contemplation miss as much as they capture, or perhaps more. It’s hard not to have some kind of intellectual humility when attempting to grasp all the dynamics of religious expression and practice and history, as any student of religion or theology well knows. A seasoned humility may be the most important thing theological inquiry can offer to the science of data.
Data analytics has, in recent years, had a hard time acknowledging its limitations. Part of this is due to what the market demands: we sell our data analytics on what can be known and discovered, not on data’s limitations. The market is also driven by a public conversation that seems to ascribe power and insight to data without acknowledging that much of that power and insight is still crafted by humans. This is why one of the greatest dangers lurking in our moment is ascribing too much power and potential to artificial intelligence.
Obtuse Abstractions
We are most likely to get into trouble—ethical, legal, financial—when we refuse or ignore the flawed human element, both in the development of artificial intelligence and in how we interpret its outputs. Writer-critic-farmer Wendell Berry once remarked that agribusiness executives “don’t imagine farms or farmers.”[2] As he saw it, managers immersed themselves in statistical knowledge that was remote from the actual work of planting, watering, caring, and harvesting. But the remoteness wasn’t just in the executive’s distance and insulation from the farmer: the executive himself became remote and isolated from his own product. Berry’s agriboss is the commercial equivalent of Dante’s Lucifer: frozen and inert, isolated from all that makes him vital.
But neither is it accurate to think that simply getting into the dirt with the farmer would provide some kind of final, definitive perspective. In fact, one would find, as many do in data analytics, that once one starts peeling back the analytical layers and interacting with whatever counts as the data, more, not fewer, questions arise. The mysteries of life and society don’t disappear once one gets sufficiently close to the ground, as it were: no, they typically multiply, and quickly. Indeed, what Augustine said in the Confessions about the nature of time could be said about data: “What is time? Provided no one asks me, I know. If I want to explain it to an enquirer, I do not know.”
Collective Corrective
It is, in one respect, truly terrifying that we have such a loose grasp on the fundamental inputs of artificial intelligence, especially as these emerging technologies seem to inform, govern, and, in some cases, control our lives today. But I have found that this very perplexity is what can (and should) remind us of the human element that is, in another respect, ineradicable. While philosophers of information are most certainly right when they argue that there are multiple meaningful concepts of data, it is, at least for the practitioner, helpful to remember that the sophistication of how we analyze, store, compress, encrypt, and hide data does not absolve us from understanding how data relates to life and appraising what machines will do with it.
Studying religion and theology has been helpful to me in sharpening this perspective, and I still find it hard to imagine doing the work that I do apart from that foundation. Yet all disciplines must continually confront the challenge of remaining in touch with these very human questions and problems. And staying close to the local communities one’s data is purporting to represent is, I suspect, about the only way we might keep the human use of data from dehumanizing us.
Jonathan D. Teubner ’09 M.A.R. is the Founder and CEO of FilterLabs.AI, a data analytics company based in Cambridge, MA, and is research faculty at Harvard’s Human Flourishing Program. He is the author of Prayer after Augustine: A Study in the Development of the Latin Tradition (Oxford University Press, 2018), which received the Manfred Lautenschlaeger Award for Theological Promise.
1. R.G. Collingwood, An Autobiography (Oxford University Press, 1939), pp. 24-25.
2. Wendell Berry, It All Turns on Affection: The Jefferson Lecture and Other Essays (Counterpoint, 2012), p. 16. Berry’s Jefferson Lecture can be viewed on the National Endowment for the Humanities website.