Research Notebooks

Finding Significant Terms for Research

Description: Discover the significant words in a corpus using Gensim TF-IDF. The following code is included: filtering based on a pre-processed ID list; filtering based on a stop words list; token cleaning; computing ...
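To make the idea concrete, here is a minimal sketch of TF-IDF scoring after stop word filtering. It is a hand-rolled stand-in written with the standard library only, not Gensim's implementation, so scores from the notebook may differ. The toy corpus, the stop words, and the function name `tfidf_scores` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {token: score} dict per tokenized document.

    Score = raw count in the document * log2(N / document frequency).
    No length normalization is applied (a simplifying assumption).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each token.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            token: count * math.log2(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return scores


# Hypothetical, already-cleaned toy corpus (one token list per document).
docs = [
    ["whale", "sea", "ship"],
    ["sea", "storm", "ship"],
    ["garden", "sea", "rose"],
]
stop_words = {"ship"}  # hypothetical stop words list

# Filter stop words before scoring.
filtered = [[t for t in doc if t not in stop_words] for doc in docs]
scores = tfidf_scores(filtered)

# The highest-scoring token in each document.
top_terms = [max(s, key=s.get) for s in scores]
```

A token that appears in every document, such as "sea" here, scores zero, which is why TF-IDF surfaces terms that are distinctive to a document rather than merely frequent.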

Create a Stopwords List for Research

Description: This notebook creates a stopwords list and exports it into a CSV file. The following processes are described: loading the NLTK stopwords list; modifying the stopwords list in Python; saving a ...
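The modify-and-export steps might look roughly like the sketch below. To keep it runnable without downloads, it starts from a tiny hard-coded list rather than the NLTK stopwords list that the notebook loads. The added and removed words, the one-word-per-row CSV layout, and the file name mentioned in the comment are assumptions.

```python
import csv
import io

# Tiny stand-in starter list (assumption); the notebook loads NLTK's list.
stop_words = ["the", "and", "of", "to"]

# Modify the list: add corpus-specific words, remove ones worth keeping.
stop_words.extend(["chapter", "vol"])
stop_words.remove("of")


def write_stopwords(words, handle):
    """Write one stop word per row, sorted and de-duplicated."""
    writer = csv.writer(handle)
    for word in sorted(set(words)):
        writer.writerow([word])


# An in-memory buffer keeps the sketch free of side effects; to write a real
# file, pass open("stop_words.csv", "w", newline="") instead (hypothetical name).
buffer = io.StringIO()
write_stopwords(stop_words, buffer)
rows = buffer.getvalue().split()
```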

Exploring Metadata and Pre-Processing for Research

Description of methods in this notebook: This notebook helps researchers generate a list of IDs and export them into a CSV file. The code below is a starting point for: importing a CSV ...

Exploring Word Frequencies for Research

Description: This notebook finds the word frequencies for a dataset. Optionally, this notebook can take the following inputs: filtering based on a pre-processed ID list; filtering based on a stop words list; use ...
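As a rough illustration, word frequencies with the two optional filters can be sketched with `collections.Counter`. The document IDs, tokens, and the function name `word_frequencies` are hypothetical, and the notebook's own data format is not assumed here.

```python
from collections import Counter

# Hypothetical documents keyed by ID, already tokenized.
documents = {
    "doc-1": ["the", "whale", "and", "the", "sea"],
    "doc-2": ["the", "sea", "and", "the", "storm"],
    "doc-3": ["a", "garden", "rose"],
}
id_list = {"doc-1", "doc-2"}      # hypothetical pre-processed ID list
stop_words = {"the", "and", "a"}  # hypothetical stop words list


def word_frequencies(documents, id_list=None, stop_words=frozenset()):
    """Count tokens across documents, applying both filters if given."""
    counts = Counter()
    for doc_id, tokens in documents.items():
        # Skip documents that are not in the ID list, when one is supplied.
        if id_list is not None and doc_id not in id_list:
            continue
        counts.update(
            t.lower() for t in tokens if t.lower() not in stop_words
        )
    return counts


frequencies = word_frequencies(documents, id_list, stop_words)
```

Passing `id_list=None` counts every document, which mirrors the optional nature of the inputs described above.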

Tokenize Text Files with NLTK for Research

Description: This notebook takes as input: plain text files (.txt) in a zipped folder called 'texts' in the data folder; a metadata CSV file called 'metadata.csv' in the data folder (optional). It outputs a single JSON-L file containing the unigrams, bigrams, trigrams, full-text, and ...
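For orientation, the sketch below builds one JSON-L line (a single JSON object on one line) holding n-gram counts and the full text for a document. It uses a simple regular expression in place of the NLTK tokenizer, so token boundaries may differ from the notebook's. The field names, the document ID, and the choice to store counts are assumptions rather than the notebook's actual schema.

```python
import json
import re
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a token list, joined with single spaces."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def build_record(doc_id, text):
    """Build one record; all field names here are hypothetical."""
    # Simple regex tokenization, a stand-in for an NLTK tokenizer.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {
        "id": doc_id,
        "unigrams": ngram_counts(tokens, 1),
        "bigrams": ngram_counts(tokens, 2),
        "trigrams": ngram_counts(tokens, 3),
        "fullText": text,
    }


record = build_record("doc-1", "The sea is the sea.")

# One JSON object per line is what makes the output JSON-L.
line = json.dumps(record)
```

Writing `line + "\n"` for each document to the same file would produce a single JSON-L file with one record per line.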

