Introducing censusr

Here at Transport Foundry we regularly use data from the U.S. Census Bureau to validate our input data and our simulation engine outputs. To make this easier, we wrote an R package that downloads Census Bureau data directly into a user's R environment. We have published this package, censusr, on CRAN under an open-source license in the hope that others can use it to streamline their own analyses. Contributions to the source code are welcome on GitHub.

Use

The package works by sending a list of requested variables and a list of geographies to the Census Bureau API. The call below requests the number of households with 0, 1, 2, 3, or 4 or more vehicles available in Wake County, North Carolina (geoid = 37183), taken from the 2012 American Community Survey (ACS) 5-year estimates.

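library(censusr)
# Households by vehicles available (table B08201, lines 002-006, estimates)
# for Wake County, NC, from the 2012 ACS 5-year data
call_census_api(
  paste0("B08201_", sprintf("%03d", 2:6), "E"),
  names = paste0("est_", 0:4),
  geoids = "37183",
  data_source = "acs", year = 2012, period = 5)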
## Source: local data frame [1 x 6]
## 
##   geoid  est_0   est_1  est_2  est_3  est_4
##   (chr)  (dbl)   (dbl)  (dbl)  (dbl)  (dbl)
## 1 37183  15813  111992 149742  47222  16534
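The first two arguments are ordinary base R vectors. Evaluated on its own, the `paste0()`/`sprintf()` expression expands to the five variable codes for lines 002 through 006 of table B08201, and the `names` vector supplies one column name per code:

```r
# sprintf("%03d", ...) zero-pads each line number to three digits;
# paste0() then adds the table prefix and the "E" (estimate) suffix.
vars <- paste0("B08201_", sprintf("%03d", 2:6), "E")
print(vars)
# [1] "B08201_002E" "B08201_003E" "B08201_004E" "B08201_005E" "B08201_006E"

# One output column name per requested variable, in the same order
col_names <- paste0("est_", 0:4)
print(col_names)
# [1] "est_0" "est_1" "est_2" "est_3" "est_4"
```

Because the two vectors line up position by position, `est_0` holds line 002 (no vehicle) and `est_4` holds line 006 (4 or more vehicles).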

Setting the allgeos argument to "tr" returns these variables for every census tract within Wake County rather than for the county as a whole.

# allgeos = "tr" returns every census tract inside the given county geoid
call_census_api(
  paste0("B08201_", sprintf("%03d", 2:6), "E"),
  names = paste0("est_", 0:4),
  geoids = "37183", allgeos = "tr",
  data_source = "acs", year = 2012, period = 5)
## Source: local data frame [187 x 6]
## 
##          geoid est_0 est_1 est_2 est_3 est_4
##          (chr) (dbl) (dbl) (dbl) (dbl) (dbl)
## 1  37183050100   248   516   310    37     0
## 2  37183050300   293   826   489    51    19
## 3  37183050400    44   369   328    23     9
## 4  37183050500   181   885   436    87    30
## 5  37183050600   289   600   209    69    19
## 6  37183050700   503   584   218   118     0
## 7  37183050800   359   227   162    74     0
## 8  37183050900   442   249    80     3     0
## 9  37183051000   202   543   329    68    28
## 10 37183051101   149   201   208    54    64
## ..         ...   ...   ...   ...   ...   ...
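Each tract geoid is the standard 11-digit census tract GEOID: the 2-digit state FIPS code (37, North Carolina), the 3-digit county code (183, Wake), and a 6-digit tract code. If you need these pieces separately, for example to join against county-level data, base R's `substr()` is enough. The helper `split_tract_geoid()` below is a hypothetical illustration, not part of censusr, and the three geoids are copied from the output above:

```r
# Hypothetical helper (not part of censusr): split 11-digit tract GEOIDs
# into state (2 digits), county (3 digits), and tract (6 digits) codes.
split_tract_geoid <- function(geoid) {
  stopifnot(all(nchar(geoid) == 11))
  data.frame(
    geoid  = geoid,
    state  = substr(geoid, 1, 2),
    county = substr(geoid, 3, 5),
    tract  = substr(geoid, 6, 11),
    stringsAsFactors = FALSE
  )
}

# Three tract geoids taken from the tract-level output above
tract_geoids <- c("37183050100", "37183050300", "37183051101")
parts <- split_tract_geoid(tract_geoids)
print(parts)
```

The geoid column comes back as character, which matters here: storing it as a number would drop leading zeros in states such as Alabama (01).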

To get the margins of error instead of the estimates, change the suffix on each variable code from E (estimate) to M (margin of error).

# The "M" suffix requests margins of error instead of estimates
call_census_api(
  paste0("B08201_", sprintf("%03d", 2:6), "M"),
  names = paste0("moe_", 0:4),
  geoids = "37183", allgeos = "tr",
  data_source = "acs", year = 2012, period = 5)
## Source: local data frame [187 x 6]
## 
##          geoid moe_0 moe_1 moe_2 moe_3 moe_4
##          (chr) (dbl) (dbl) (dbl) (dbl) (dbl)
## 1  37183050100    87   169    81    52    13
## 2  37183050300   101   163   106    53    22
## 3  37183050400    25    80    62    21    13
## 4  37183050500    75   143    98    54    29
## 5  37183050600    95   112    76    47    14
## 6  37183050700   109    97    82    61    13
## 7  37183050800    94    85    73    58    13
## 8  37183050900    91    78    43     6    13
## 9  37183051000    96   117    99    61    44
## 10 37183051101    78    75    80    83    79
## ..         ...   ...   ...   ...   ...   ...
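Because both calls return the same geoid column, the estimates and margins of error can be joined and used together. A common follow-up is to turn each margin of error into a standard error and coefficient of variation to judge how reliable a tract estimate is. ACS margins of error are published at the 90% confidence level, so the standard error is the margin of error divided by 1.645. When categories are summed, the Census Bureau's guidance approximates the combined margin of error as the square root of the sum of squared margins. The sketch below uses base R and hard-codes the first three tracts from the outputs above instead of calling the API; in practice you would pass in the data frames returned by `call_census_api()`. The derived column names (`se_0`, `cv_0`, `est_3plus`, `moe_3plus`) are illustrative.

```r
# Estimates and margins of error for 0, 3, and 4+ vehicles,
# copied from the first three tracts in the outputs above.
est <- data.frame(
  geoid = c("37183050100", "37183050300", "37183050400"),
  est_0 = c(248, 293, 44),
  est_3 = c(37, 51, 23),
  est_4 = c(0, 19, 9),
  stringsAsFactors = FALSE
)
moe <- data.frame(
  geoid = c("37183050100", "37183050300", "37183050400"),
  moe_0 = c(87, 101, 25),
  moe_3 = c(52, 53, 21),
  moe_4 = c(13, 22, 13),
  stringsAsFactors = FALSE
)

# Both tables share the geoid column, so they join directly
acs <- merge(est, moe, by = "geoid")

# ACS margins of error are at the 90% level: SE = MOE / 1.645
acs$se_0 <- acs$moe_0 / 1.645
# Coefficient of variation: standard error relative to the estimate
acs$cv_0 <- acs$se_0 / acs$est_0

# Households with 3 or more vehicles: add the estimates, and combine the
# margins of error with the root-sum-of-squares approximation
acs$est_3plus <- acs$est_3 + acs$est_4
acs$moe_3plus <- sqrt(acs$moe_3^2 + acs$moe_4^2)

print(acs[, c("geoid", "est_0", "cv_0", "est_3plus", "moe_3plus")])
```

The root-sum-of-squares formula is an approximation that ignores covariance between the categories, so treat combined margins of error as rough guides rather than exact values.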

For a list of variable codes, see the U.S. Census Bureau API page. For a tutorial on how to set up the censusr package with an API key, see the package vignette.