This package provides infrastructure to make text datasets available within R, even when they are too large to store within an R package or are licensed in such a way that prevents them from being included in OSS-licensed packages.
Do you want to add a new dataset to the textdata package?
* is the name of the dataset. Supported prefixes include
The download_*() function should take 1 argument named folder_path. It has 2 tasks. First, it should check whether the file is already downloaded. If it is, it should return invisible(). If the file isn’t at the path, it should download the file to said path.
The process_*() function should take 2 arguments,
folder_path denotes the path to the file returned by
name_path is the path to where the polished data should live. The main point of process_*() is to turn the downloaded file into a .rds file containing a tidy tibble.
The dataset_*() function should wrap the
process_*() function to the named list process_functions in the file process_functions.R.
download_*() function to the named list download_functions in the file download_functions.R.
print_info list in the info.R file.
dataset_*.R to the @include tags in
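The download and process steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dataset name example, the file name example.csv, and the URL are hypothetical placeholders, and the two named lists are shown with a single entry although the real lists in the package hold entries for other datasets too.

```r
# Hypothetical dataset called "example"; file name and URL are placeholders.
download_example <- function(folder_path) {
  file_path <- file.path(folder_path, "example.csv")

  # Task 1: if the file is already downloaded, return invisibly.
  if (file.exists(file_path)) {
    return(invisible())
  }

  # Task 2: otherwise download the file to that path.
  utils::download.file(
    url = "https://example.com/example.csv", # placeholder URL (assumption)
    destfile = file_path
  )
}

process_example <- function(folder_path, name_path) {
  # Read the raw downloaded file from the download folder.
  raw <- utils::read.csv(
    file.path(folder_path, "example.csv"),
    stringsAsFactors = FALSE
  )

  # Turn it into a tibble and store it as an .rds file at name_path.
  data <- tibble::as_tibble(raw)
  saveRDS(data, name_path)
}

# Register both functions under the dataset's name (single-entry lists
# here; in the package these entries would sit alongside existing ones).
download_functions <- list(example = download_example)
process_functions <- list(example = process_example)
```

Keeping the "already downloaded" check inside download_example() means a repeated call does not hit the network again.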
What are the guidelines for adding datasets?
words for column names.
For datasets that come with both a testing and a training dataset, let the user pick which one to retrieve with a split argument, similar to how dataset_ag_news() does.
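One way a split argument could look is sketched below. The function dataset_example(), its dir argument, and the example_train.rds / example_test.rds file names are hypothetical, and the sketch reads the processed files directly rather than going through the package's own loading machinery. It is not a copy of how dataset_ag_news() is implemented.

```r
# Hypothetical dataset function with a split argument (illustration only).
dataset_example <- function(dir, split = c("train", "test")) {
  # match.arg() picks "train" when split is left at its default and
  # raises an error for any value outside the allowed choices.
  split <- match.arg(split)

  # Assumed naming scheme for the processed files: example_<split>.rds
  readRDS(file.path(dir, paste0("example_", split, ".rds")))
}
```

Listing the allowed values in the function signature documents the choices for the user and makes the first one the default.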