You may have heard buzz on campus about text analysis, artificial intelligence, or big data. But if you’re a scholar, librarian, student, or staff member who has never used data in your research, it may not be obvious why text analysis matters. Many disciplines have eschewed text analysis for decades (and we all know a few faculty members likely to continue for decades more), but there is no doubt that interest in the field is growing. Why should you learn text analysis when you could be doing traditional research right now?

The Future of Humanities Research Depends on Data Literacy

Twenty years ago, working with data was mostly reserved for the sciences and social sciences, but today it is clear that digital data has become the primary publishing and archival format for humankind. The ability to search and manipulate that record is what will allow future researchers to find important information. Historians of the future looking at our present moment will need to be able to analyze emails, text messages, websites, social media, and other digital files. How well can we really understand history and society if we are not prepared to search, read, and interpret digital records as primary sources?

Data Skills Are in Demand

In a world where tenure-track positions in the humanities are the exception rather than the norm, humanities scholars need a flexible skill set that prepares them for work inside and outside the academy, and text analysis is one such skill. (There is a reason Glassdoor ranked Data Scientist the best job in 2016, 2017, 2018, and 2019.)

Text analysis is behind the auto-suggest on your phone; it filters the spam out of your email; and it helps suggest movies for you on Netflix. In this information age, with the onset of “big data,” text analysis helps us “read” and “interpret” more than is humanly possible. Text analysis skills are valuable in the academic world and the commercial sector.

Assessing the Social Effects of Data-Driven Decision-Making Is a Humanities Problem

Even if you don’t use text analysis for your own research, it is important to understand a little about how it works, because text analysis already drives the way decisions are made in research, in business, and in government. Text analysis and machine-learning algorithms decide which webpages you see, who gets a loan from the bank, and how politicians make policy decisions. The issues surrounding text analysis are humanistic issues: not merely technical, but social, ethical, and legal.

Text Analysis Enables New Research Insights

For researchers, the primary advantage that text and data mining offer is the ability to consider knowledge at non-human scales (both very big and very small). Text analysis can enable us to consider a million books across a hundred-dimensional space, revealing aspects of our records that are not obvious to human readers, whether those aspects are imperceptibly small, diffused across centuries, or simply contained in records no one has ever read. What does that mean in practice, though?

The short answer is more evidence (and more kinds of evidence) for interrogating humanities problems. Dan Cohen, Professor of History at Northeastern University, asks how much evidence is enough:

Many humanities scholars have been satisfied, perhaps unconsciously, with the use of a limited number of cases or examples to prove a thesis. Shouldn’t we ask, like the Victorians, what can we do to be most certain about a theory or interpretation? If we use intuition based on close reading, for instance, is that enough?
Should we be worrying that our scholarship might be anecdotally correct but comprehensively wrong? Is 1 or 10 or 100 or 1000 books an adequate sample to know the Victorians? What might we do with all of Victorian literature—not a sample, or a few canonical texts, as in Houghton’s work, but all of it. "Searching for the Victorians" (2010)

To operate as a researcher in the 21st century is to be confronted with both the challenges and the opportunities of data: overwhelmed by too much of it, yet often lacking enough of the right kind.

As Miriam Posner, Assistant Professor in Information Studies at UCLA, has pointed out,

even if they don’t call their sources data, traditional humanists do have pretty pressing data-management needs. ("Humanities Data: A Necessary Contradiction" 2015)

Tom Scheinfeldt, Associate Professor of Digital Humanities at the University of Connecticut, suggests that questions of data and method are becoming the central concern of the humanities:

The new technology of the Internet has shifted the work of a rapidly growing number of scholars away from thinking big thoughts to forging new tools, methods, materials, techniques, and modes or work that will enable us to harness the still unwieldy, but obviously game-changing, information technologies now sitting on our desktops and in our pockets. These concerns touch all scholars. "Sunset for Ideology, Sunrise for Methodology" (2008)

Indeed, humanists cannot afford to ignore computational methods, since they are, for better or worse, at the heart of modern culture and industry. Future humanists will not be able to study our digital present without becoming adept at reading and manipulating the burgeoning data of our historical record. Ted Underwood, Associate Professor in Information Science at the University of Illinois, describes this new horizon:

It is becoming clear that we have narrated literary history as a sequence of discrete movements and periods because chunks of that size are about as much of the past as a single person could remember and discuss at one time. Apparently, longer arcs of change have been hidden from us by their sheer scale—just as you can drive across a continent noticing mountains and political boundaries but never the curvature of the earth. A single pair of eyes at ground level can't grasp the curve of the horizon, and arguments limited by a single reader's memory can't reveal the largest patterns organizing literary history.
Distant Horizons: Digital Evidence and Literary Change (2019)

For many scholars, text analysis sounds powerful and useful, but learning it is not a trivial task. Most learners must teach themselves, since few college courses cover text analysis outside of data science programs. As a result, most learners lack access to good learning resources and struggle to find community support when things get difficult.

The good news is that text analysis, like any skill, can be learned to a greater or lesser degree. For historians of the early modern period, a command of Latin is very helpful. Still, there are plenty of successful early modern scholars who never learn the language (or who learn only enough to navigate the resources significant to their research).

Depending on your research question, you may not need to learn much coding to do text analysis. The problem for many scholars is that the possible applications of text analysis are not clear to them, so they are not in a good position to decide what to learn (and how much). At the same time, the sophistication needed for text analysis is a moving target. Topic modeling was once a very complicated task that required an understanding of the command line. Today, it can be accomplished in minutes using just a mouse.