Getting Started with datagovindia

datagovindia is a wrapper around more than 80,000 APIs of the Government of India’s open data platform, data.gov.in. Here is a short guide to take you through the package. Its functionality is centered around three aspects: discovering APIs, getting information about a particular API, and getting data from an API.

Setup

library(datagovindia)
library(dplyr) # provides the %>% pipe and intersect() used below

API Discovery

The APIs from the portal are scraped every week to update a list of all APIs and the information attached to them, such as sector, source and field names. The data.gov.in website offers search through string matching and drop-down menus, but these are very limited. The functions in this package allow more robust string-based searches.
A user can search by API title, description, organization type, organization (ministry), sector and source. Broadly, there are two types of functions here: the first returns the list of all available, unique organization types, organizations (ministries), sectors and sources, and the second lets one “search” by these criteria and more.

Here is a demonstration of the former (showing only the first few values):

### List of organizations (or ministries)
get_list_of_organizations() %>%
  head()
#> [1] "Ministry of Environment and Forests"             
#> [2] "Central Pollution Control Board"                 
#> [3] "Ministry of Home Affairs"                        
#> [4] "Department of Home"                              
#> [5] "Registrar General and Census Commissioner, India"
#> [6] "Ministry of Agriculture and Farmers Welfare"
### List of sectors
get_list_of_sectors() %>%
  head()
#> [1] "Industrial Air Pollution" "Census and Surveys"      
#> [3] "Census"                   "Statistics"              
#> [5] "Agriculture"              "Agricultural Marketing"
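
Because these functions return plain character vectors, the lists can be narrowed with ordinary string matching before running a search. The following is a minimal sketch in base R; the `sectors` vector is a stand-in copied from the six values printed above, and in practice it would presumably come from `get_list_of_sectors()`.

```r
# Stand-in for the output of get_list_of_sectors(): the six values shown above.
sectors <- c(
  "Industrial Air Pollution", "Census and Surveys",
  "Census", "Statistics",
  "Agriculture", "Agricultural Marketing"
)

# Case-insensitive match; value = TRUE returns the matching strings
# instead of their positions.
census_sectors <- grep("census", sectors, ignore.case = TRUE, value = TRUE)
print(census_sectors)
```

With the six values above, this keeps "Census and Surveys" and "Census".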

Searching for the right API

Once you have an idea of what you want to look for, search queries can be constructed using titles and descriptions as well as the categories explored earlier. Each search returns a data.frame with information on the APIs matching the search keywords. Because the results are data.frames, several searches can be combined with each other.

## Single Criteria
search_api_by_title(title_contains = "pollution") %>% head(2)
#> index_name:   583f10fa-a19e-4a08-85f1-69dcf64438f4
#> title:        Details of Number of industries inspected and Directions issued under Section 5 of Environment (Protection) Act, 1986 by Central Pollution Control Board (CPCB) since 2016-17 till 14.06.2019 (From: Ministry of Environment, Forest and Climate Change)
#> description:  Details of Number of industries inspected and Directions issued under Section 5 of Environment (Protection) Act, 1986 by Central Pollution Control Board (CPCB) since 2016-17 till 14.06.2019 (From: Ministry of Environment, Forest and Climate Change)
#> org_type:     Central
#> org:          Rajya Sabha
#> sector:       All
#> source:       data.gov.in
#> created_date: 2021-03-04T06:52:31Z
#> updated_date: 2021-03-12T17:56:27Z
#>
#> index_name:   b8e4ff80-ec3c-439c-aebb-f27eabe410b3
#> title:        State/UT-wise Number of Complying and Non-Complying Locations w.r.t. Heavy Metals According Central Pollution Control Board (CPCB) during 2017 (From : Ministry of Environment, Forest and Climate Change)
#> description:  State/UT-wise Number of Complying and Non-Complying Locations w.r.t. Heavy Metals According Central Pollution Control Board (CPCB) during 2017 (From : Ministry of Environment, Forest and Climate Change)
#> org_type:     Central
#> org:          Rajya Sabha
#> sector:       All
#> source:       data.gov.in
#> created_date: 2021-03-04T06:37:26Z
#> updated_date: 2021-03-04T06:37:26Z
## Multiple Criteria
dplyr::intersect(search_api_by_title(title_contains = "pollution"),
                 search_api_by_organization(organization_name_contains = "pollution"))
#> index_name:   0579cf1f-7e3b-4b15-b29a-87cf7b7c7a08
#> title:        Details of Comprehensive Environmental Pollution Index (CEPI) Scores and Status of Moratorium in Critically Polluted Areas (CPAs) in India
#> description:  NA
#> org_type:     Central
#> org:          Ministry of Environment and Forests|Central Pollution Control Board
#> sector:       Industrial Air Pollution|Water Quality|Natural Resources|Environment and Forest
#> source:       data.gov.in
#> created_date: 2017-06-08T16:36:24Z
#> updated_date: 2018-11-30T02:35:16Z
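
Combining criteria amounts to keeping the rows that appear in both result sets. As a rough illustration that does not need dplyr or a network connection, the same idea can be sketched in base R by matching on index_name. The two data.frames below are trimmed stand-ins (only two columns, values taken from the output above), not real search results.

```r
# Stand-in for search_api_by_title(title_contains = "pollution"),
# reduced to two columns for brevity.
title_hits <- data.frame(
  index_name = c(
    "583f10fa-a19e-4a08-85f1-69dcf64438f4",
    "b8e4ff80-ec3c-439c-aebb-f27eabe410b3",
    "0579cf1f-7e3b-4b15-b29a-87cf7b7c7a08"
  ),
  org = c(
    "Rajya Sabha",
    "Rajya Sabha",
    "Ministry of Environment and Forests|Central Pollution Control Board"
  ),
  stringsAsFactors = FALSE
)

# Stand-in for search_api_by_organization(organization_name_contains = "pollution").
org_hits <- data.frame(
  index_name = "0579cf1f-7e3b-4b15-b29a-87cf7b7c7a08",
  org = "Ministry of Environment and Forests|Central Pollution Control Board",
  stringsAsFactors = FALSE
)

# Keep only the APIs whose index_name appears in both result sets.
combined <- title_hits[title_hits$index_name %in% org_hits$index_name, ]
print(combined$index_name)
```

Matching on index_name alone assumes that it uniquely identifies an API, whereas `dplyr::intersect()` compares entire rows.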

Once you have found the right API for your use, take note of its “index_name”. For example, “0579cf1f-7e3b-4b15-b29a-87cf7b7c7a08” corresponds to the API for “Details of Comprehensive Environmental Pollution Index (CEPI) Scores and Status of Moratorium in Critically Polluted Areas (CPAs) in India”. The index_name is essential both for learning more about the API and for getting data from it.
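
Since a search result is an ordinary data.frame, the index_name can be pulled out programmatically rather than copied by hand. The sketch below also splits the pipe-separated sector field seen in the output above. The `result` data.frame is a hand-built stand-in with three of the columns, not the output of an actual query.

```r
# Stand-in for a one-row search result, using values shown earlier.
result <- data.frame(
  index_name = "0579cf1f-7e3b-4b15-b29a-87cf7b7c7a08",
  org = "Ministry of Environment and Forests|Central Pollution Control Board",
  sector = "Industrial Air Pollution|Water Quality|Natural Resources|Environment and Forest",
  stringsAsFactors = FALSE
)

# Store the identifier of the first matching API for later use.
api_index <- result$index_name[1]

# Multi-valued fields are separated by "|"; fixed = TRUE treats the
# separator as a literal character rather than a regular expression.
api_sectors <- strsplit(result$sector[1], "|", fixed = TRUE)[[1]]

print(api_index)
print(api_sectors)
```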