Absolutely. You can upload a dataset and run any code on it you like. Keep in mind, however, that the notebooks JSTOR has written may need to be modified if your dataset is not in the right format. Read more about our format.
We also recognize that folks may want to work with data in other formats. Maybe you're interested in Twitter data or emails or something that doesn't quite fit in our format. That's great. Our environment is open for uploading, so you're free to upload notebooks and data to the environment that differ from our format.
How can I get my data into the JSTOR format so it works with existing notebooks?
We offer an Advanced Research notebook called Tokenize Text Files with NLTK that should get you most of the way. To use the notebook, you will need:
The notebook will output:
- A JSON-L dataset containing the unigrams, bigrams, trigrams, full-text, and metadata for your entire dataset
- A gzip compressed version of the JSON-L file for easier file transfer
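The output described above can be sketched in a few lines. This is a minimal illustration of the JSON-L structure, not the Tokenize Text Files with NLTK notebook itself: the field names (`id`, `title`, `fullText`, `unigramCount`, etc.) and the whitespace tokenizer (a stand-in for NLTK's `word_tokenize`) are assumptions for the example.

```python
import gzip
import json
from collections import Counter
from pathlib import Path

def ngram_counts(tokens, n):
    """Count n-grams (joined with spaces) in a token list."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def build_jsonl(input_dir, output_path):
    """Write one JSON line per .txt file with unigrams, bigrams, trigrams,
    the full text, and minimal metadata, then gzip-compress the result."""
    output_path = Path(output_path)
    with output_path.open("w", encoding="utf-8") as out:
        for txt_file in sorted(Path(input_dir).glob("*.txt")):
            text = txt_file.read_text(encoding="utf-8")
            # Stand-in for nltk.word_tokenize; the real notebook uses NLTK.
            tokens = text.lower().split()
            record = {
                "id": txt_file.stem,      # hypothetical metadata fields
                "title": txt_file.stem,
                "fullText": text,
                "unigramCount": ngram_counts(tokens, 1),
                "bigramCount": ngram_counts(tokens, 2),
                "trigramCount": ngram_counts(tokens, 3),
            }
            out.write(json.dumps(record) + "\n")
    # Compressed copy for easier file transfer
    gz_path = output_path.with_suffix(output_path.suffix + ".gz")
    with output_path.open("rb") as src, gzip.open(gz_path, "wb") as dst:
        dst.write(src.read())
    return gz_path
```

Each document becomes one line of JSON, which is what lets downstream notebooks stream a large dataset without loading it all into memory.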
This notebook may require some modifications to work with your data, so we offer it as a starting point for those interested. If you're not able to write Python at an intermediate or advanced level, you may need some help to implement it.
You can then download the compressed version of your dataset to your local machine for safekeeping. Keep in mind that any data created in your Jupyter session will disappear after you close the tab.
How do I get my dataset into JupyterLab?
If you have a dataset on your local machine, you can upload your dataset into JupyterLab by clicking the upload button in the file pane on the left.
Make sure to upload your dataset to the "data" folder.
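Once your dataset is in the "data" folder, your notebook code can read it from there. A minimal sketch, assuming a gzipped JSON-L file; the filename `my-dataset.jsonl.gz` and the helper name are hypothetical:

```python
import gzip
import json
from pathlib import Path

def read_jsonl(path):
    """Stream records from a JSON-L file, transparently handling .gz files."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Example usage inside a notebook (path is hypothetical):
# for document in read_jsonl("data/my-dataset.jsonl.gz"):
#     print(document["id"])
```

Streaming line by line this way avoids holding the whole dataset in memory, which matters in a hosted Jupyter session with limited RAM.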