datagovindia is a wrapper around the more than 100,000 APIs of the Government of India’s open data platform, data.gov.in. This short guide takes you through the package. Its functionality is centered on three aspects:
The package is now on CRAN; install it with:
install.packages("datagovindia")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("econabhishek/datagovindia")
Once you have an API key and have used the package's search functions to choose an API and find its index_name (see the vignette for more details), you are ready to extract data from it.
The function get_api_data is the powerhouse of this package. By exploiting the data.frame structure of the underlying data, it can do more than a manually constructed API query: it lets the user filter, sort, select variables, and decide how much of the data to extract. The website itself can filter on only one field with one value at a time, whereas a single command through the wrapper makes multiple requests and appends their results.
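The fan-out described above can be pictured as a cross product of the comma-separated filter values, with one request per combination. The following is a minimal sketch of that idea, not the package's actual implementation: `expand_filters` and `build_url` are hypothetical helpers, `YOUR_API_KEY` is a placeholder, and the URL layout is an assumption modelled on the request URLs that the package prints while it runs. Values are not URL-encoded here.

```r
# Hypothetical helper: split comma-separated filter values and build every
# combination of them, one row per request.
expand_filters <- function(filter_by) {
  values <- lapply(filter_by, function(x) {
    trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  })
  expand.grid(values, stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE)
}

# Hypothetical helper: turn one combination (a one-row data frame) into a
# query URL. The layout is assumed, not taken from the package source.
build_url <- function(api_index, api_key, combo, offset = 0, limit = 10) {
  filters <- paste0("filters[", names(combo), "]=", unlist(combo),
                    collapse = "&")
  paste0("https://api.data.gov.in/resource/", api_index,
         "?api-key=", api_key, "&format=json",
         "&offset=", offset, "&limit=", limit, "&", filters)
}

combos <- expand_filters(c(city = "Gurugram,Chandigarh",
                           pollutant_id = "PM10,NO2"))

# One URL per row of the grid: 2 cities x 2 pollutants = 4 requests.
urls <- vapply(seq_len(nrow(combos)), function(i) {
  build_url("3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69", "YOUR_API_KEY",
            combos[i, , drop = FALSE])
}, character(1))
```

Each element of `urls` corresponds to a single-field, single-value query of the kind the website supports; appending the results of all four would give the combined data frame.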
But before we dive into data extraction, we first need to validate the API key received from data.gov.in. To get a key, register on the site and then copy the key from your “My Account” page after logging in. More instructions can be found in this official guide. Once you have your API key, you can validate it as follows (you only need to do this once per session; the key below is a sample key from the website, used for demonstration):
##Using a sample key
register_api_key("579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b")
#> Connected to the internet
#> The server is online
#> The API key is valid and you won't have to set it again
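To keep a personal key out of scripts, one option is to read it from an environment variable before registering it. This is only a sketch: the variable name `DATAGOVINDIA_API_KEY` and the helper `read_api_key` are assumptions made for illustration and are not part of the package.

```r
# Hypothetical helper: fetch the key from an environment variable and fail
# early with a readable message if it is missing.
read_api_key <- function(var = "DATAGOVINDIA_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# Register the key only when it is set and the package is installed.
if (nzchar(Sys.getenv("DATAGOVINDIA_API_KEY")) &&
    requireNamespace("datagovindia", quietly = TRUE)) {
  datagovindia::register_api_key(read_api_key())
}
```

The variable can be set for the current session with `Sys.setenv()` or, more permanently, in a `.Renviron` file.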
Once you have your key registered, you are ready to extract data from a chosen API. Here is what each argument means:
In a nutshell: first find the API you want using the search functions, get its index_name from the results, optionally look at the fields present in the API's data, and then use the get_api_data function to extract the data. Suppose we choose the API “Real time Air Quality Index from various location”, with index_name 3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69. First we will check which fields are available so that we can construct the right query.
Suppose we want data for only two cities, Chandigarh and Gurugram, and two pollutants, PM10 and NO2. We will let all fields (dataset columns) be returned.
These are the fields available to work with:
get_api_fields("3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69")
id | name | type |
---|---|---|
document_id | document_id | double |
id | id | double |
country | country | keyword |
state | state | keyword |
city | city | keyword |
station | station | keyword |
pollutant_id | pollutant_id | keyword |
last_update | last_update | date |
pollutant_min | pollutant_min | double |
pollutant_max | pollutant_max | double |
pollutant_avg | pollutant_avg | double |
resource_uuid | resource_uuid | keyword |
We accordingly select the city and pollutant_id fields for constructing our query. Note that only the field id is used when querying the data.
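Because queries rely on field ids, a misspelled id in `filter_by` is easy to miss. A small check like the one below can flag unknown names before any request is sent. It is a sketch under assumptions: `check_filter_names` is a hypothetical helper, and `fields` is copied by hand from the table above; in practice it would presumably be the result of `get_api_fields`, assuming that is a data frame with an `id` column.

```r
# Field ids and types copied by hand from the table above.
fields <- data.frame(
  id = c("document_id", "id", "country", "state", "city", "station",
         "pollutant_id", "last_update", "pollutant_min", "pollutant_max",
         "pollutant_avg", "resource_uuid"),
  type = c("double", "double", "keyword", "keyword", "keyword", "keyword",
           "keyword", "date", "double", "double", "double", "keyword"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop if any filter name is not a known field id.
check_filter_names <- function(filter_by, fields) {
  unknown <- setdiff(names(filter_by), fields$id)
  if (length(unknown) > 0) {
    stop("Unknown field id(s): ", paste(unknown, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Passes silently because both names appear in fields$id.
check_filter_names(c(city = "Gurugram,Chandigarh",
                     pollutant_id = "PM10,NO2"), fields)
```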
# Filter on two cities and two pollutants; return all fields, sorted by state and city
get_api_data(api_index = "3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69",
             results_per_req = 10,
             filter_by = c(city = "Gurugram,Chandigarh",
                           pollutant_id = "PM10,NO2"),
             field_select = c(),
             sort_by = c('state', 'city'))
#> Connected to the internet
#> The server is online
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Gurugram&filters[pollutant_id]=PM10
#> gave the API a rest
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Chandigarh&filters[pollutant_id]=PM10
#> gave the API a rest
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Gurugram&filters[pollutant_id]=NO2
#> gave the API a rest
#> url-https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b&format=json&offset=0&limit=10&filters[city]=Chandigarh&filters[pollutant_id]=NO2
#> gave the API a rest
#> No results returned - check your api_index
id | country | state | city | station | pollutant_id | last_update | pollutant_min | pollutant_max | pollutant_avg |
---|---|---|---|---|---|---|---|---|---|
550 | India | Haryana | Gurugram | NISE Gwal Pahari, Gurugram - IMD | PM10 | 25-09-2021 05:00:00 | 22 | 102 | 50 |
555 | India | Haryana | Gurugram | Sector-51, Gurugram - HSPCB | PM10 | 25-09-2021 05:00:00 | 59 | 119 | 81 |
562 | India | Haryana | Gurugram | Teri Gram, Gurugram - HSPCB | PM10 | 25-09-2021 05:00:00 | 36 | 100 | 61 |
103 | India | Chandigarh | Chandigarh | Sector 22, Chandigarh - CPCC | PM10 | 25-09-2021 05:00:00 | 13 | 102 | 49 |
110 | India | Chandigarh | Chandigarh | Sector-25, Chandigarh - CPCC | PM10 | 25-09-2021 05:00:00 | 19 | 84 | 42 |
551 | India | Haryana | Gurugram | NISE Gwal Pahari, Gurugram - IMD | NO2 | 25-09-2021 05:00:00 | 13 | 25 | 17 |
556 | India | Haryana | Gurugram | Sector-51, Gurugram - HSPCB | NO2 | 25-09-2021 05:00:00 | 8 | 13 | 10 |
563 | India | Haryana | Gurugram | Teri Gram, Gurugram - HSPCB | NO2 | 25-09-2021 05:00:00 | 8 | 10 | 8 |
569 | India | Haryana | Gurugram | Vikas Sadan, Gurugram - HSPCB | NO2 | 25-09-2021 05:00:00 | 17 | 40 | 28 |
104 | India | Chandigarh | Chandigarh | Sector 22, Chandigarh - CPCC | NO2 | 25-09-2021 05:00:00 | 15 | 83 | 42 |
111 | India | Chandigarh | Chandigarh | Sector-25, Chandigarh - CPCC | NO2 | 25-09-2021 05:00:00 | 4 | 29 | 13 |
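Once the combined result is available as a data frame, ordinary base R is enough to post-process it. The sketch below rebuilds the five PM10 rows of the table above by hand and computes the mean of `pollutant_avg` per city and pollutant. The object name `aqi`, the character column types, and the `Asia/Kolkata` time zone are assumptions; the package may return different types, and the source does not state the time zone of `last_update`.

```r
# Five PM10 rows copied by hand from the result table above.
aqi <- data.frame(
  city = c("Gurugram", "Gurugram", "Gurugram", "Chandigarh", "Chandigarh"),
  pollutant_id = "PM10",
  last_update = "25-09-2021 05:00:00",
  pollutant_avg = c("50", "81", "61", "49", "42"),
  stringsAsFactors = FALSE
)

# Convert the text columns to numeric and date-time values.
aqi$pollutant_avg <- as.numeric(aqi$pollutant_avg)
aqi$last_update <- as.POSIXct(aqi$last_update,
                              format = "%d-%m-%Y %H:%M:%S",
                              tz = "Asia/Kolkata")  # assumed time zone

# Mean reading per city and pollutant across stations.
city_means <- aggregate(pollutant_avg ~ city + pollutant_id,
                        data = aqi, FUN = mean)
# Chandigarh: 45.5, Gurugram: 64
```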