Intermediate Lessons

Exploring Metadata and Pre-Processing

Description of methods in this notebook: This notebook shows how to explore and pre-process the metadata of a dataset using Pandas. The following processes are described:
- Importing a CSV file containing the metadata for a given dataset ID
- Creating a Pandas DataFrame to view the metadata
- Pre-processing your dataset by ...

Exploring Word Frequencies

Description: This notebook shows how to find the most common words in a dataset. The following processes are described:
- Using the tdm_client to create a Pandas DataFrame
- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Using a Counter() object to get the most ...
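As a rough illustration of the last two steps, the sketch below filters tokens against a stop words list and counts what remains with a Counter. The sample text and the stop words are made up for this example; the notebook itself works from a dataset retrieved with the tdm_client, which is not reproduced here.

```python
from collections import Counter

# Hypothetical sample text and stop words list, for illustration only.
text = "the cat sat on the mat and the dog sat on the log"
stop_words = {"the", "on", "and"}

# Lowercase, split on whitespace, and drop any token in the stop words list.
tokens = [word for word in text.lower().split() if word not in stop_words]

# Count the remaining tokens and pull out the most frequent ones.
word_counts = Counter(tokens)
top_words = word_counts.most_common(2)
print(top_words)
```

A set is used for the stop words because membership checks against a set stay fast even when the list grows to hundreds of words.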

Finding Significant Words using TF/IDF

Description: This notebook shows how to discover significant words using TF-IDF. The following processes are described:
- An educational overview of TF-IDF, including how it is calculated
- Using the tdm_client to retrieve a dataset
- Filtering based on a pre-processed ID list
- Filtering based on ...
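To give a feel for how a TF-IDF score can be calculated, here is a minimal sketch in plain Python. It assumes one common variant: term frequency is the term's count divided by the document length, and inverse document frequency is the natural log of (number of documents / number of documents containing the term). The notebook may use a different weighting, and the three tiny documents are invented for this example.

```python
import math
from collections import Counter

# Hypothetical tokenized documents, for illustration only.
docs = {
    "doc1": ["whale", "ship", "sea", "whale"],
    "doc2": ["ship", "sea", "storm"],
    "doc3": ["garden", "sea", "rose"],
}


def tf_idf(docs):
    """Return {doc_id: {term: score}} using tf = count/len and idf = ln(N/df)."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        scores[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


scores = tf_idf(docs)

# The highest-scoring term in doc1 is the one that is frequent there and rare elsewhere.
top_term = max(scores["doc1"], key=scores["doc1"].get)
print(top_term)
```

Under this variant, a word that appears in every document ("sea" here) scores zero, which is why TF-IDF tends to surface words that are distinctive to a document rather than merely frequent.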

LDA Topic Modeling

Description: This notebook demonstrates how to do topic modeling. The following processes are described:
- Using the tdm_client to retrieve a dataset
- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
- Cleaning the tokens in the dataset
- Creating a gensim dictionary
- Creating a gensim bag ...

Working with Dataset Files

Description: This notebook describes how to:
- Read and write files (.txt, .csv, .json)
- Use the tdm_client to read in metadata
- Use the tdm_client to read in data
This notebook describes how to read and write text, CSV, and JSON files using Python. Additionally, it explains how the tdm_ ...
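The file-handling part of this lesson can be sketched with the Python standard library alone. The file names and sample records below are hypothetical, and a temporary directory is used so the example leaves nothing behind; the tdm_client steps are not shown.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Plain text: write a string, then read it back.
    txt_path = folder / "notes.txt"
    txt_path.write_text("first line\nsecond line\n", encoding="utf-8")
    text_back = txt_path.read_text(encoding="utf-8")

    # CSV: write a header plus rows, then read each row back as a dict.
    csv_path = folder / "metadata.csv"
    rows = [{"id": "a1", "title": "First"}, {"id": "b2", "title": "Second"}]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "title"])
        writer.writeheader()
        writer.writerows(rows)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows_back = list(csv.DictReader(f))

    # JSON: dump a dictionary to disk, then load it back.
    json_path = folder / "record.json"
    record = {"id": "a1", "wordCount": 2}
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    with open(json_path, encoding="utf-8") as f:
        record_back = json.load(f)

print(text_back.splitlines(), rows_back, record_back)
```

Note that values read back from a CSV file are always strings, whereas JSON preserves numbers, which is one reason to prefer JSON for structured records.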

Pandas I

Description: This notebook describes how to:
- Create a Pandas Series or DataFrame
- Access data rows, columns, and elements using .loc and .iloc
- Create filters using boolean operators
- Change data in rows, columns, and elements
This is the first notebook in a series on learning to use Pandas.
Use Case: For Learners
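A minimal sketch of these operations, assuming pandas is installed, might look like the following. The column names, index labels, and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: three documents indexed by a made-up ID.
df = pd.DataFrame(
    {
        "title": ["First", "Second", "Third"],
        "year": [1990, 2005, 2018],
        "pages": [12, 30, 7],
    },
    index=["a1", "b2", "c3"],
)

# .loc selects by label; .iloc selects by integer position.
by_label = df.loc["b2", "title"]
by_position = df.iloc[0, 1]

# Boolean filter: combine conditions with & and wrap each one in parentheses.
recent_short = df[(df["year"] > 2000) & (df["pages"] < 20)]

# Change a single element in place using .loc.
df.loc["a1", "pages"] = 15

print(by_label, by_position, list(recent_short.index))
```

The parentheses around each condition matter because `&` binds more tightly than the comparison operators in Python.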

Creating a Stopwords List

Description: This notebook explains what a stopwords list is and how to create one. The following processes are described:
- Loading the NLTK stopwords list
- Modifying the stopwords list in Python
- Saving a stopwords list to a .csv file
- Loading a stopwords list from a .csv file
Use Case: For Learners
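The modify, save, and load steps can be sketched as below. To keep the example self-contained, a short hand-written list stands in for the NLTK stopwords list (an assumption of this sketch, not the notebook's starting point), and the file name is hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the NLTK list: a few hand-picked words, for illustration only.
stop_words = ["the", "and", "of", "to"]

# Modify the list: add corpus-specific words and remove one we want to keep.
stop_words.extend(["journal", "vol"])
stop_words.remove("to")

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "stop_words.csv"

    # Save the whole list as a single CSV row.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(stop_words)

    # Load it back: the first (and only) row is the list of words.
    with open(csv_path, newline="", encoding="utf-8") as f:
        loaded_stop_words = next(csv.reader(f))

print(loaded_stop_words)
```

Storing the words in one row is a design choice made here for brevity; writing one word per row works just as well and can be easier to edit by hand.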

Counter Objects

Description: This notebook describes:
- What a Counter object is
- The difference between counters and dictionaries
- Using Counter objects for finding the most common elements
Use Case: For Learners (Detailed explanation, not ideal for researchers)
Difficulty: Intermediate
Completion Time: 20 minutes
Knowledge Required: Python Basics Series (Start Python Basics I)
Knowledge ...
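The difference between a Counter and a plain dictionary is easiest to see side by side. The sketch below uses a made-up list of letters and only the standard library.

```python
from collections import Counter

# Hypothetical data, for illustration only.
letters = ["a", "b", "a", "c", "a", "b"]

# Counting with a plain dictionary needs an explicit default for unseen keys.
plain_counts = {}
for letter in letters:
    plain_counts[letter] = plain_counts.get(letter, 0) + 1

# A Counter does the same tallying in one call.
counter = Counter(letters)

# Looking up a missing key: a Counter returns 0, a dictionary raises KeyError.
missing_from_counter = counter["z"]
try:
    plain_counts["z"]
    dict_raised = False
except KeyError:
    dict_raised = True

# Counters also know how to rank their contents.
top_two = counter.most_common(2)
print(top_two, missing_from_counter, dict_raised)
```

Looking up a missing key in a Counter does not add that key, so the lookup is safe to use inside conditions and loops.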

