JSTOR

License: Non-Consumptive (Unigrams, Bigrams, Trigrams)
Size: ~13.8 million documents
Historical Range: 1665-present, primarily 20th-21st century
Metadata Quality: High
Text Accuracy: High
Website: https://www.jstor.org

Academic sources from JSTOR, primarily journal articles in the humanities, mathematics, sciences, and business.

Portico

License: Non-Consumptive (Unigrams, Bigrams, Trigrams)
Size: ~12.5 million documents
Historical Range: primarily 20th-21st century
Metadata Quality: Variable
Text Accuracy: High
Website: https://www.portico.org/

A community-supported archive of e-journals, e-books, and digital collections.

Chronicling America

License: Open Access (Full Text)
Size: ~1.1 million documents
Historical Range: 1789-1963
Metadata Quality: High
Text Accuracy: Variable
Website: https://chroniclingamerica.loc.gov/

A collection of historic American newspapers from the Library of Congress.

CORD-19

License: Open Access (Full Text)
Size: ~8,000 documents
Historical Range: 1970s-present
Metadata Quality: High
Text Accuracy: High
Website: https://www.semanticscholar.org/cord19/

The commercial use subset of the CORD-19 dataset from March 25th, 2020. This dataset consists primarily of peer-reviewed academic journal articles focused on COVID-19 research, along with preprints such as those found in bioRxiv, medRxiv, and others.
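
The "Non-Consumptive (Unigrams, Bigrams, Trigrams)" license listed for some datasets above suggests that those collections are delivered as word and phrase counts rather than readable full text. As a rough illustration of what such counts look like, the sketch below derives them from a short sample string. The function name `ngram_counts`, the sample sentence, and the naive lowercase whitespace tokenization are all assumptions made for this example; they do not describe how any dataset provider actually builds its n-gram files.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count n-grams in text using naive lowercase whitespace tokenization.

    Hypothetical helper for illustration only; real datasets may tokenize
    and normalize text differently.
    """
    tokens = text.lower().split()
    # Slide a window of length n over the tokens by zipping shifted copies.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text, not taken from any of the datasets.
sample = "the cat sat on the mat the cat slept"

unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)

print(unigrams.most_common(2))
print(bigrams.most_common(1))
print(len(trigrams))
```

Counts like these support frequency-based analysis but do not preserve the original word order beyond three tokens, which is one way to think about the difference from the "Open Access (Full Text)" datasets.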