Description:
This notebook computes the word frequencies for a dataset. It can optionally take the following inputs:
- Filtering based on a pre-processed ID list
- Filtering based on a stop words list
Use Case: For Researchers (mostly code without explanation; not ideal for learners)
Difficulty: Intermediate
Completion time: 5-10 minutes
Knowledge Required:
- Python Basics (start with Python Basics I)
Knowledge Recommended:
Data Format: JSON Lines (.jsonl)
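As a rough illustration of the data format, a JSON Lines file holds one JSON object per line, so it can be read one document at a time with the standard library. The sketch below is not the notebook's own loading code (the notebook uses tdm_client for that). The field names "id" and "unigramCount" and the two sample documents are assumptions made for the example.

```python
import io
import json

def read_jsonl(stream):
    """Yield one parsed document (a dict) per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Two hypothetical documents, inlined so the sketch runs without a dataset file.
# The "id" and "unigramCount" field names are assumptions for illustration.
sample = io.StringIO(
    '{"id": "doc-1", "unigramCount": {"the": 3, "whale": 2}}\n'
    '{"id": "doc-2", "unigramCount": {"the": 1, "ship": 4}}\n'
)
documents = list(read_jsonl(sample))
```

With a real dataset, the same function would be given an open file handle instead of the in-memory sample.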
Libraries Used:
- tdm_client to collect, unzip, and read our dataset
- NLTK to help clean up our dataset
- Counter from collections to help tally our word frequencies
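The counting step, together with the two optional filters listed in the description, might be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the function name word_frequencies, the "id" and "unigramCount" fields, and the sample documents are all hypothetical, and the stop words are passed in as a plain list rather than loaded from NLTK.

```python
from collections import Counter

def word_frequencies(documents, keep_ids=None, stop_words=None):
    """Sum per-document unigram counts into one Counter.

    keep_ids: optional iterable of document IDs to include (None keeps all).
    stop_words: optional iterable of words to exclude.
    """
    keep = set(keep_ids) if keep_ids is not None else None
    stops = {w.lower() for w in (stop_words or [])}
    totals = Counter()
    for doc in documents:
        # Skip documents that are not in the pre-processed ID list.
        if keep is not None and doc["id"] not in keep:
            continue
        for token, count in doc["unigramCount"].items():
            word = token.lower()
            # Drop non-alphabetic tokens and stop words.
            if not word.isalpha() or word in stops:
                continue
            totals[word] += count
    return totals

# Hypothetical documents; field names are assumptions for illustration.
docs = [
    {"id": "doc-1", "unigramCount": {"The": 3, "whale": 2, "1851": 1}},
    {"id": "doc-2", "unigramCount": {"the": 1, "ship": 4}},
    {"id": "doc-3", "unigramCount": {"whale": 5}},
]
freqs = word_frequencies(docs, keep_ids=["doc-1", "doc-2"], stop_words=["the"])
print(freqs.most_common(2))  # [('ship', 4), ('whale', 2)]
```

Leaving keep_ids and stop_words at their defaults counts every alphabetic token in every document, which mirrors running the notebook without either optional input.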
Research Pipeline:
- Build a dataset
- Create a "Pre-Processing CSV" with Exploring Metadata (Optional)
- Create a "Custom Stopwords List" with Creating a Stopwords List (Optional)
- Create the word frequencies analysis with this notebook