Why Learn Text Analysis? for Digital Humanities
You may have heard buzz on campus about text analysis, artificial intelligence, or big data. But if you’re a scholar, librarian, student, or staff member who has never used data in their research, it is not obvious why text analysis matters. Many disciplines have eschewed text analysis for decades
What Text Analysis Methods Should I Learn? for Digital Humanities
This introduction explains the various kinds of text analysis for a humanities audience. What are they? Why would you use them? How long will it take to apply them? (The methods presented here are among the most well-known but certainly not exhaustive.) Afterward, you'll be better prepared to decide how much
Key Terms for Text Analysis
Application Programming Interface (API): A protocol that defines communication between a client and a server, often used to request data. APIs can help retrieve data from remote repositories, anything from weather data to Twitter and Facebook posts.
Argument (in Python): An input that is passed into a function. For example, print('Hello World')
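To make the term "argument" concrete, here is a minimal Python sketch. The function name `greet` and its parameter `name` are hypothetical and exist only for illustration. In `print('Hello World')`, the string `'Hello World'` plays the same role as `"World"` does below.

```python
def greet(name):
    """Return a greeting built from the single argument passed in."""
    return f"Hello {name}"

# "World" is the argument passed into greet(); inside the function
# it is bound to the parameter called `name`.
message = greet("World")
print(message)  # prints: Hello World
```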
Can I use R?
Absolutely. Our Binder environment supports using R in Jupyter notebooks and RStudio. We have chosen to start developing Python Jupyter notebooks first, but we would love to see community-developed, open educational resources created in R. If you'd like to help us get started, see How can I create/adapt a
What is a Jupyter Notebook?
Jupyter notebooks are documents that contain both executable computer code (in a language like Python or R) and rich explanatory elements (e.g., text, images, videos, links). Jupyter notebooks are very popular for teaching and learning to code because they offer several advantages:
Minimal Setup: Traditional code editors may require
What is the data file format?
CSV vs. JSON Lines Files
The dataset builder creates two files:
- A CSV file containing only metadata
- A JSON Lines file containing metadata and the textual data
The textual data includes:
- Unigrams
- Bigrams
- Trigrams
- Full Text (where available)
The metadata includes:
Column Name | Description
id | a unique item ID (In
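To illustrate the difference between the two files, the sketch below computes unigram, bigram, and trigram counts for a tiny sample text. It stores them in a one-line JSON Lines record and writes only the metadata to a CSV row. Several details are assumptions for illustration and may not match your dataset: the field names (`title`, `unigramCount`, `bigramCount`, `trigramCount`), the sample text, and the ID. Check the metadata table for the actual column names.

```python
import csv
import io
import json
from collections import Counter

def ngram_counts(tokens, n):
    """Count n-grams (runs of n consecutive tokens), joined by spaces."""
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return dict(Counter(grams))

# Hypothetical sample document; a real dataset supplies these counts for you.
text = "the cat sat on the mat"
tokens = text.split()

record = {
    "id": "example-item-1",        # hypothetical item ID
    "title": "A Sample Document",  # assumed metadata field
    "unigramCount": ngram_counts(tokens, 1),  # assumed field names
    "bigramCount": ngram_counts(tokens, 2),
    "trigramCount": ngram_counts(tokens, 3),
}

# JSON Lines file: one complete JSON object (metadata + text data) per line.
jsonl_buffer = io.StringIO()
jsonl_buffer.write(json.dumps(record) + "\n")

# CSV file: metadata columns only, no n-gram counts.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "title"])
writer.writeheader()
writer.writerow({"id": record["id"], "title": record["title"]})

# Read the JSON Lines data back one line at a time, as you would a large file.
jsonl_buffer.seek(0)
for line in jsonl_buffer:
    doc = json.loads(line)
    print(doc["id"], doc["unigramCount"]["the"])  # prints: example-item-1 2
```

Reading JSON Lines one line at a time means a large dataset never has to fit in memory all at once.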
How can I create/adapt a lesson?
We are excited to see the first open community-created lessons! Our top goal is to help create more open educational resources (OER) that make learning text analysis easier. The following information will help you create your own lessons (either from scratch or by adapting one of our existing lessons for your
Datasets by Discipline
Archaeology
- American Journal of Archaeology (1897-2020): 02b8c5c7-64bd-efe3-01d8-88c9efe7d17c
Classics
- Classical Quarterly (1907-2014): 82014740-8ed9-3c34-5716-d0879b8317f6
English
- Negro American Literature Forum (1967-1976) + Black American Literature Forum (1976-1991) + African American Review (1992-2016): b4668c50-a970-c4d7-eb2c-bb6d04313542
- Shakespeare Quarterly (1950-2013): f6ae29d4-3a70-36ee-d601-20a8c0311273
- ELH (1934-2014): 4999901a-fa17-31da-cfe5-2abf3a429df7
- College English (1939-2016): a161f384-720b-b6bf-a0cc-4d7d3b857e1c
- PMLA (1889-2014): 1aea53b9-26d5-fe54-e35c-8259156ce6cd
History
- Past & Present (1952-2014): 5e117960-e384-b705-b143-5a667fe614f0
- English Historical
Can I Upload a Dataset?
You can download your dataset from the corpus builder using the link shown below. (You may also have a link to your dataset in your email.) If you wish, you can modify your dataset on your local machine before the next upload phase. This gives you some more flexibility than
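If you do modify the dataset locally, a short script is usually more reliable than editing by hand. The sketch below keeps only metadata rows whose year falls in a chosen range and writes them to a new CSV. The `publicationYear` column name, the sample rows, and the year range are assumptions for illustration. Substitute the columns your downloaded file actually contains.

```python
import csv
import io

def filter_by_year(rows, start, end, year_field="publicationYear"):
    """Yield rows whose year_field holds an integer between start and end, inclusive."""
    for row in rows:
        year = (row.get(year_field) or "").strip()  # missing/blank years are skipped
        if year.isdigit() and start <= int(year) <= end:
            yield row

# Hypothetical metadata; in practice, open your downloaded CSV file instead.
sample_csv = """id,title,publicationYear
item-1,First Article,1955
item-2,Second Article,1972
item-3,Third Article,
item-4,Fourth Article,1990
"""

reader = csv.DictReader(io.StringIO(sample_csv))
kept = list(filter_by_year(reader, 1960, 1995))

# Write the filtered rows, keeping the original column order.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(kept)
print([row["id"] for row in kept])  # prints: ['item-2', 'item-4']
```

To write a real file, replace the `StringIO` objects with `open(path, newline="")` for reading and `open(path, "w", newline="")` for writing. The `newline=""` argument is what Python's `csv` module expects.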
Why Learn Text Analysis? for Business and Data Science
You may have heard buzz in the business world about text analysis, artificial intelligence, or big data. But if you’re skilling up to advance your career or to solve an existing company problem, it is not obvious why text analysis matters. Many businesses operate without data science insights from
What Text Analysis Methods Should I Learn? for Business and Data Science
This introduction explains the various kinds of text analysis methods for a business and data science audience. What are they? Why would you use them? How long will it take to apply them? (The methods presented here are among the most well-known but certainly not exhaustive.) Afterward, you'll be better prepared
JSTOR
License: Non-Consumptive (Unigrams, Bigrams, Trigrams)
Size: ~13.8 million documents
Historical Range: 1665-present, primarily 20th-21st centuries
Metadata Quality: High
Text Accuracy: High
Website: https://www.jstor.org
JSTOR academic sources, primarily academic journal articles in the humanities, mathematics, sciences, and business.
Browse by discipline
Browse by publisher
Portico
License: