We are excited to see the first open community-created lessons! Our top goal is to help create more open educational resources (OER) that make learning text analysis easier. The following information will help you create your own lessons, either from scratch or by adapting one of our existing lessons for your teaching. There is a growing body of new lessons being developed for digital humanities, data science, and library science (particularly collections as data).

Datasets for Learning

We encourage lesson writers to use our dataset builder, since it makes ingesting datasets easy for learners. They only need to insert a dataset ID and the tdm_client module pulls in their dataset automatically.

The easiest datasets for writing lessons are those created by our dataset builder that already have dataset IDs. You can also find example datasets by discipline and explore datasets created by others that may fit your lessons.

Datasets for Research

Researchers may want to bring in their own datasets through an existing API. In the future, we would like to develop notebooks that can help researchers migrate their data to work with our existing notebooks. Read more about our standard dataset format if you would like to try migrating your data into the format. Having a dataset in this format will allow you to apply any of the methods in our existing notebooks (such as TF-IDF and topic modeling). Otherwise, you'll need to write your own notebook for the analysis or adapt an existing one.

Lesson Hosting

To create your own lessons, you'll need to save your own version of the lesson files (such as Jupyter notebook .ipynb files) to a git repository on a hosting service like GitHub or GitLab. If you're not familiar with git, it is a system for saving versions of computer code. We recommend learning git through the curriculum from the Digital Humanities Research Institutes. You'll want to finish these two lessons:

There is also a desktop application for GitHub that does not require the command line. It is called GitHub Desktop.

We recommend cloning our lesson repository and then modifying it to your own needs. You can clone the repository with the following command:

git clone https://github.com/ithaka/tdm-notebooks

The repository does not (as of June 2020) include the tdm_client that downloads datasets. In the near future, this will be included by default, but for now you can download it from here:

https://gist.github.com/lawlesst/9ccb340f15c1aab6846983738cffb4cc#file-tdm_client-py

and place it in your repository:

/tdm-notebooks/tdm_client.py
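
Before launching a lesson, it can help to confirm that the file landed in the right place. The following is a minimal sketch, assuming the repository was cloned into a folder named tdm-notebooks as in the command above; the helper name has_tdm_client is hypothetical and is not part of the repository.

```python
from pathlib import Path


def has_tdm_client(repo_dir: str) -> bool:
    """Return True if tdm_client.py sits at the top level of repo_dir."""
    return (Path(repo_dir) / "tdm_client.py").is_file()


if __name__ == "__main__":
    # "tdm-notebooks" is the folder created by the git clone command;
    # change it if you cloned the repository under a different name.
    if has_tdm_client("tdm-notebooks"):
        print("tdm_client.py found")
    else:
        print("tdm_client.py is missing; download it and copy it into the repository")
```

Notebooks in the repository import the module by name, so the file needs to sit where Python can find it, which is why the check looks at the top level of the repository.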

Depending on the libraries your notebook will use, you may need to include some additional information in your repository that describes the build parameters (such as a requirements.txt). The official Binder website has more information on build requirements including repository samples for Python and R.
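
For a Python lesson, the build information is often just a requirements.txt file in the top level of the repository, with one package per line. Here is a minimal sketch of writing such a file; the helper name write_requirements and the package names in the usage comment are placeholders chosen for illustration, not a list of what any particular lesson needs.

```python
from pathlib import Path
from typing import Iterable


def write_requirements(repo_dir: str, packages: Iterable[str]) -> Path:
    """Write one package name per line to requirements.txt in repo_dir."""
    target = Path(repo_dir) / "requirements.txt"
    target.write_text("\n".join(packages) + "\n", encoding="utf-8")
    return target


# Example usage (package names are placeholders; list whatever your
# notebook actually imports):
# write_requirements("tdm-notebooks", ["pandas", "nltk", "gensim"])
```

You can also create the file by hand in a text editor; the result is the same. Remember to commit it to your repository so Binder can read it when building the environment.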

Running Your Lesson

To launch your lesson/notebook from your own repository, simply fill in your:

  • GitHub repository name or URL
  • Git branch, tag, or commit
  • Path to a notebook file (optional)

on the main Binder page at: https://binder.tdm-pilot.org/

Press "launch" to start the lesson immediately or copy the link to share with others.
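
If you share many lessons, you may prefer to build the links yourself rather than fill in the form each time. The sketch below assumes that binder.tdm-pilot.org follows the standard BinderHub link layout for GitHub repositories (/v2/gh/owner/repo/ref, with an optional filepath parameter); this has not been confirmed for this deployment, so compare the output against a link copied from the Binder page. The function name binder_launch_url, the default branch "master", and the notebook path in the usage comment are all illustrative assumptions.

```python
from typing import Optional
from urllib.parse import quote


def binder_launch_url(repo: str, ref: str = "master",
                      notebook_path: Optional[str] = None,
                      host: str = "https://binder.tdm-pilot.org") -> str:
    """Build a launch link in the assumed form /v2/gh/<owner>/<repo>/<ref>."""
    # repo is "owner/name", e.g. "ithaka/tdm-notebooks".
    url = f"{host.rstrip('/')}/v2/gh/{repo.strip('/')}/{quote(ref, safe='')}"
    if notebook_path:
        # Percent-encode the path so slashes survive inside the query string.
        url += "?filepath=" + quote(notebook_path, safe="")
    return url


# Example usage (the notebook path is hypothetical):
# binder_launch_url("ithaka/tdm-notebooks", "master", "lessons/intro.ipynb")
```

Leaving out the notebook path opens the repository's file browser instead of a specific notebook, which matches the optional third field on the Binder page.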

Sharing Your Lesson

We are still working on the best process for sharing lessons in a community space. Please do reach out and share your repositories so we can include them in a community space for Open Educational Resources. Contact Nathan Kelber (nathan.kelber@ithaka.org) if you'd like to share educational materials. Anything submitted should have an open form of licensing suitable for Open Educational Resources such as a Creative Commons license.