microData

What?

I am developing the microData package to search, browse, and extract metadata from microdata provided by the World Bank (WB), the Food and Agriculture Organization (FAO), the International Household Survey Network (IHSN), the United Nations High Commissioner for Refugees (UNHCR), and the International Labour Organization (ILO) via the NADA API. Any researcher who has used microdata from these organizations knows how difficult and time-consuming it is to understand these data and import the variables into R. If you use microdata, or plan to, then this is the life-saving R package for you.

Abstract

The purpose of microData is to simplify extracting complex metadata from data published by various organizations, and so make data preparation more efficient. At the moment, it supports five international organizations: the World Bank, FAO, UNHCR, IHSN, and ILO. It can search, filter, and extract catalog information, and do most other things you can do on the web, but it cannot download the data files themselves. This is because, to my knowledge, the API has no documentation for downloading data. I think this is due to data licensing, since few datasets are accessible through the API. The package also helps you obtain the variable names of a specific survey together with their labels. You can select only the variables you are interested in and rename them, while assigning the variable descriptions as label attributes, and you can set custom names and labels for the dataset. Labels matter when exporting tables and graphs, because they save you from typing long names into manuscripts manually. This package is meant to alleviate these difficulties.
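
As a small illustration of why labels help, the sketch below uses only base R and toy data (not microData itself). A description stored in the "label" attribute can be reused as a table header, falling back to the column name when no label is set. The helper label_or_name() is a hypothetical name introduced for this example.

```r
# Toy data: short column names, readable descriptions stored as labels.
df <- data.frame(id = 1:3, v1 = c(44, 48, 43), v4 = c(6395, 7402, 5497))
attr(df$v1, "label") <- "Age of respondent"
attr(df$v4, "label") <- "Monthly salary ($)"

# Hypothetical helper: return the label if one is set, otherwise the name.
label_or_name <- function(data, var) {
  lab <- attr(data[[var]], "label", exact = TRUE)
  if (is.null(lab)) var else lab
}

# Headers for a table or plot axis, one per column.
headers <- vapply(names(df), function(v) label_or_name(df, v), character(1))
print(headers)
```

The id column has no label, so its header falls back to the column name.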

Warning: Since this package is still under development, I don’t recommend using it in reproducible code, as the interface may change in the future.

Installation

You can install the development version of microData from GitHub with:

# install.packages("devtools")
devtools::install_github("GutUrago/microData")

Collection

All organizations supported by this package use the NADA API to publish microdata, so they share similar terminology. A collection is simply a group of related studies or datasets. To see all available collections, use the collections() function.

Note: I use a customized gt table theme, my_gt_theme(), that I created in this blog.

library(microData)

collections(org = "wb") |> 
  head() |> 
  my_gt_theme()

id	repo_id	title
26	afrobarometer	Afrobarometer
2	datafirst	DataFirst , University of Cape Town, South Africa
22	dime	Development Impact Evaluation (DIME)
1	microdata_rg	Development Research Microdata
4	enterprise_surveys	Enterprise Surveys
30	fao	FAO - Food and Agriculture Microdata Catalog

Searching

This package gives you all the flexibility of searching on the web. For more, see the documentation for search_catalog().

search_catalog(
  keyword = "food",
  org = "unhcr",
  from = 2015,
  to = 2024,
  country = "Ethiopia",
  sort_by = "year",
  sort_order = "desc", 
  results = 10)
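
For readers curious what such a search might look like at the HTTP level, here is a minimal sketch that only builds a request URL and does not contact any server. The base URL, the /index.php/api/catalog/search path, and the query parameter names (sk, from, to, country, sort_by, sort_order, ps) are assumptions about a typical NADA catalog. build_search_url() is a hypothetical helper, not part of microData, and the package may build its requests differently.

```r
# Hypothetical helper: assemble a NADA-style search URL from named parameters.
# The endpoint path and parameter names are assumptions, not confirmed by microData.
build_search_url <- function(base_url, ...) {
  params <- list(...)
  # Percent-encode each value so spaces and symbols are safe in a query string.
  values <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(base_url, "/index.php/api/catalog/search?", query)
}

url <- build_search_url(
  "https://microdata.unhcr.org",  # assumed host of the UNHCR catalog
  sk = "food", from = 2015, to = 2024,
  country = "Ethiopia", sort_by = "year", sort_order = "desc", ps = 10
)
print(url)
```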

There is also a handy function to check the most recently published datasets.

latest_entries(org = "wb", limit = 15)

You can use data_files() to see the data files included in a study. Let’s look at one of the popular surveys on the WB. We can also use the id number of the study, which is 359, instead of the name (see the next code chunk).

data_files(id = "TZA_1991_KHDS_v01_M", org = "wb") |> 
  head() |>
  my_gt_theme()

id	sid	file_id	file_name	description	case_count	var_count
81328	359	F1	Wave1_HH_S_____HH	Miscellaneous	981	163
81329	359	F2	Wave1_HH_S00B_OTH	Section verification	18258	16
81330	359	F3	Wave1_HH_S1___IND	Household Roster	5373	25
81331	359	F4	Wave1_HH_S2___KID	Children Residing Elsewhere	3394	28
81332	359	F5	Wave1_HH_S3___IND	Parents	5298	27
81333	359	F6	Wave1_HH_S4___BUS	Overview of Household Businesses	334	7
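
Since the result prints as a table, it can presumably be filtered like any data frame. The sketch below retypes the rows shown above into a plain data frame (an assumption about the shape of the real output) and looks up the file_id of the household roster, which is the value passed to variables() in the next step.

```r
# Rows copied by hand from the data_files() output above.
files <- data.frame(
  file_id = c("F1", "F2", "F3", "F4", "F5", "F6"),
  description = c(
    "Miscellaneous", "Section verification", "Household Roster",
    "Children Residing Elsewhere", "Parents",
    "Overview of Household Businesses"
  ),
  var_count = c(163, 16, 25, 28, 27, 7),
  stringsAsFactors = FALSE
)

# Case-insensitive match on the description column.
roster <- files[grepl("roster", files$description, ignore.case = TRUE), ]
roster_id <- roster$file_id
print(roster_id)
```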

How about the variables included in a data file? Of course, you can check them as well.

variables(id = 359, file_id = "F3") |> 
  head() |> 
  my_gt_theme()

uid	sid	fid	vid	name	labl
265957	359	F3	V180	cluster	Cluster
265958	359	F3	V181	hh	Household Number
265959	359	F3	V182	id	Individual ID Code in HH
265960	359	F3	V183	wave	Wave
265961	359	F3	V184	passage	Passage
265962	359	F3	V185	sex	S1Q2: Sex
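
The name and labl columns are enough to build a small codebook. As a sketch, assuming the output behaves like a regular data frame with the columns shown above, the rows are retyped by hand here and turned into a named lookup vector. The regular expression for stripping question prefixes such as "S1Q2: " is only an illustrative guess at the label format.

```r
# Rows copied by hand from the variables() output above.
vars <- data.frame(
  name = c("cluster", "hh", "id", "wave", "passage", "sex"),
  labl = c(
    "Cluster", "Household Number", "Individual ID Code in HH",
    "Wave", "Passage", "S1Q2: Sex"
  ),
  stringsAsFactors = FALSE
)

# Named vector: names are variable names, values are their labels.
codebook <- setNames(vars$labl, vars$name)

# Same lookup with question prefixes like "S1Q2: " removed (assumed format).
clean <- setNames(sub("^S[0-9]+Q[0-9]+: *", "", vars$labl), vars$name)

print(codebook[["hh"]])
print(clean[["sex"]])
```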

Setting Attributes

Variables in microdata are often given names that say nothing about the variable except its question order, like this.

id	v1	v2	v3	v4
1	44	male	master	6395.007
2	48	female	phd	7402.144
3	43	female	master	5496.753
4	32	female	phd	4200.946
5	39	male	master	5391.046
6	47	female	phd	7186.892

You can then prepare another data frame that contains the metadata, like this. It will be explained in detail in vignettes later.

var_id	var_name	label
id	individual_id	Respondent ID
v1	age	Age of respondent
v2	sex	Sex of respondent
v3	education	Educational level
v4	salary	Monthly salary ($)

You can use the set_attributes() function to rename these variables and set their labels. Here mdt is the data and mtdt is the metadata shown above.

my_data <- set_attributes(
  mdt, 
  mtdt,
  old_name = var_id,
  new_name = var_name,
  label = label)

head(my_data) |> my_gt_theme()

individual_id	age	sex	education	salary
1	44	male	master	6395.007
2	48	female	phd	7402.144
3	43	female	master	5496.753
4	32	female	phd	4200.946
5	39	male	master	5391.046
6	47	female	phd	7186.892

Labels are also assigned.

str(my_data)
#> 'data.frame':    100 obs. of  5 variables:
#>  $ individual_id: int  1 2 3 4 5 6 7 8 9 10 ...
#>   ..- attr(*, "label")= chr "Respondent ID"
#>  $ age          : int  44 48 43 32 39 47 40 34 49 43 ...
#>   ..- attr(*, "label")= chr "Age of respondent"
#>  $ sex          : Factor w/ 2 levels "female","male": 2 1 1 1 2 1 2 2 1 2 ...
#>   ..- attr(*, "label")= chr "Sex of respondent"
#>  $ education    : Factor w/ 3 levels "bachelor","master",..: 2 3 2 3 2 3 1 1 1 1 ...
#>   ..- attr(*, "label")= chr "Educational level"
#>  $ salary       : num  6395 7402 5497 4201 5391 ...
#>   ..- attr(*, "label")= chr "Monthly salary ($)"
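
To make the mechanics concrete, here is a base-R sketch of the same idea on the first three rows of the example data. rename_and_label() is a hypothetical stand-in written for this illustration, not the implementation of set_attributes(), and it takes column names as strings rather than unquoted names.

```r
# Hypothetical stand-in for illustration only; not microData's code.
rename_and_label <- function(data, meta, old, new, label) {
  # Column position of each old name (NA if the column is absent).
  idx <- match(meta[[old]], names(data))
  for (i in which(!is.na(idx))) {
    attr(data[[idx[i]]], "label") <- meta[[label]][i]  # attach description
    names(data)[idx[i]] <- meta[[new]][i]              # rename column
  }
  data
}

mdt <- data.frame(
  id = 1:3,
  v1 = c(44L, 48L, 43L),
  v2 = c("male", "female", "female"),
  v3 = c("master", "phd", "master"),
  v4 = c(6395.007, 7402.144, 5496.753),
  stringsAsFactors = FALSE
)
mtdt <- data.frame(
  var_id = c("id", "v1", "v2", "v3", "v4"),
  var_name = c("individual_id", "age", "sex", "education", "salary"),
  label = c("Respondent ID", "Age of respondent", "Sex of respondent",
            "Educational level", "Monthly salary ($)"),
  stringsAsFactors = FALSE
)

out <- rename_and_label(mdt, mtdt, old = "var_id", new = "var_name", label = "label")
print(names(out))
print(attr(out$age, "label"))
```

Matching positions before renaming keeps the loop safe even if a new name happens to equal another column's old name.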

More coming soon!
