A brief guide to using the nutriR package

Christopher Free

1. Installation

Let’s begin by loading the nutriR package and its dependencies (make sure the ‘tidyverse’ package is installed first).

# Packages
library(nutriR)
#> Loading required package: tidyverse
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
#> ✓ tibble  3.1.5     ✓ dplyr   1.0.7
#> ✓ tidyr   1.1.4     ✓ stringr 1.4.0
#> ✓ readr   2.0.2     ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
library(tidyverse)

2. Extracting habitual intake distributions

If you want to get the full dataset developed by Passarelli et al. (in prep), you can do either of the following:

nutriR::dists_full
nutriR::get_dists()

If you want to retrieve a subset of the data, you can filter the dataset yourself or you can use the ?get_dists() function to specify intake distributions for your countries, nutrients, and sex-age groups of interest. A few examples are provided below:

# All USA data
get_dists(isos="USA")

# All USA Vitamin B6/B12 data
get_dists(isos="USA", nutrients=c("Vitamin B6", "Vitamin B12"))

# All USA Vitamin B6/B12 data for women ages 20 to 40
get_dists(isos="USA", nutrients=c("Vitamin B6", "Vitamin B12"), sexes = "F", ages = 20:40)

In this vignette, we’re going to compare habitual iron intakes in the USA and Bangladesh and calculate the magnitude of diet shifts required to increase adequacy of iron intakes in both locations. To begin, we’ll use the ?get_dists() function to retrieve habitual intake distributions for both countries.

dists <- get_dists(isos=c("USA", "BGD"), nutrients="Iron")

3. Visualizing habitual intake distributions

To quickly plot and compare the habitual intake distributions in this dataset, you can use the ?plot_dists() function:

plot_dists(dists=dists)

From this quick visualization, we see a few things: (1) intakes are available for men and women in the US but are only available for women in Bangladesh; (2) habitual iron intakes strongly decrease with age in Bangladesh; and (3) men have higher but also more variable habitual intakes than women in the US.

The ?plot_dists() function provides a quick method for understanding the distributions, but you might be interested in tweaking the plots for your own work. To create custom plots of the habitual intake distributions, you can either extract the parameters for the best distribution and generate the curves yourself, or you can use the ?generate_dists() function to generate the curves and then build your own plot. An example is shown below:

# Generate data for plotting
dist_data <- generate_dists(dists=dists)

# Create your own plot using this data
ggplot(dist_data %>% filter(country=="United States"), mapping=aes(x=intake, y=density, color=age_group)) +
  facet_wrap(~sex) +
  geom_line() +
  labs(x="Habitual intake (mg/day)", y="Density", title="Habitual iron intake in the United States") +
  scale_color_ordinal(name="Age group") +
  theme_bw()
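
For the first option (generating the curves yourself from the fitted parameters), a minimal base R sketch might look like the following. It assumes a gamma distribution and backs the shape and rate out of the mean and CV reported for 15-19 yr U.S. women in Section 4 of this vignette; the object names (mean_ex, cv_ex, curve_ex) are hypothetical and not part of nutriR.

```r
# Illustrative inputs: mean and CV reported in Section 4 for 15-19 yr U.S. women
mean_ex <- 12.06376
cv_ex <- 0.326353

# For a gamma distribution: CV = 1 / sqrt(shape) and mean = shape / rate
shape_ex <- 1 / cv_ex^2
rate_ex <- shape_ex / mean_ex

# Evaluate the gamma density on an evenly spaced grid of intakes (mg/day)
curve_ex <- data.frame(intake = seq(0, 40, length.out = 401))
curve_ex$density <- dgamma(curve_ex$intake, shape = shape_ex, rate = rate_ex)

head(curve_ex)
```

The resulting data frame has the same intake and density columns used in the ggplot example above, so it can be passed to ggplot() and geom_line() in the same way.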

4. Measuring properties of habitual intake distributions

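The shape of the habitual nutrient intake distributions can have important implications for the health of a subnational population. Thus, we have included functions for calculating the following properties of habitual intake distributions: the mean (?mean_dist()), variance (?variance()), coefficient of variation (?cv()), skewness (?skewness()), and kurtosis (?kurtosis()).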

An example is shown below where we measure and compare values for 15-19 yr women in Bangladesh versus the United States. Both distributions are best described by a gamma distribution, so we use the gamma distribution shape and rate parameters to describe the properties of the intake distributions.

# Get parameters for describing the habitual iron intake by 15-19 yr women in Bangladesh
bdg <- dists %>%
  filter(country=="Bangladesh" & sex=="Females" & age_group=="15-19")
bdg$best_dist
#> [1] "gamma"
shape_bdg <- bdg$g_shape
rate_bdg <- bdg$g_rate

# Get parameters for describing the habitual iron intake by 15-19 yr women in United States
usa <- dists %>%
  filter(country=="United States" & sex=="Females" & age_group=="15-19")
usa$best_dist
#> [1] "gamma"
shape_usa <- usa$g_shape
rate_usa <- usa$g_rate

# Compare means (~4 mg/day higher in USA)
mean_dist(shape=shape_bdg, rate=rate_bdg)
#> [1] 8.244971
mean_dist(shape=shape_usa, rate=rate_usa)
#> [1] 12.06376

# Compare variance
variance(shape=shape_bdg, rate=rate_bdg)
#> [1] 6.571496
variance(shape=shape_usa, rate=rate_usa)
#> [1] 15.50031

# Compare CV
cv(shape=shape_bdg, rate=rate_bdg)
#> [1] 0.3109159
cv(shape=shape_usa, rate=rate_usa)
#> [1] 0.326353

# Compare skewness
skewness(shape=shape_bdg, rate=rate_bdg)
#> [1] 0.6218319
skewness(shape=shape_usa, rate=rate_usa)
#> [1] 0.652706

# Compare kurtosis
kurtosis(shape=shape_bdg, rate=rate_bdg)
#> [1] 0.5800124
kurtosis(shape=shape_usa, rate=rate_usa)
#> [1] 0.6390378
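
For a gamma distribution, these properties have standard closed forms, which makes the values above easy to sanity-check. The sketch below is not the package's implementation: gamma_props() is a hypothetical helper, and it assumes the reported kurtosis is excess kurtosis (6/shape), which is consistent with the printed values. The shape and rate are backed out from the Bangladesh mean and CV shown above.

```r
# Closed-form properties of a gamma(shape, rate) distribution
gamma_props <- function(shape, rate) {
  c(mean = shape / rate,
    variance = shape / rate^2,
    cv = 1 / sqrt(shape),
    skewness = 2 / sqrt(shape),
    kurtosis = 6 / shape)  # excess kurtosis (assumed convention)
}

# Back out shape and rate from the Bangladesh mean and CV printed above
shape_chk <- 1 / 0.3109159^2
rate_chk <- shape_chk / 8.244971

props_chk <- gamma_props(shape = shape_chk, rate = rate_chk)
round(props_chk, 4)
```

The variance, skewness, and kurtosis computed this way agree with the Bangladesh values printed above to about four decimal places.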

We can also measure the similarity between the two distributions as the percent overlap between the distributions. The ?overlap() function calculates percent overlap using the equations for the Bhattacharyya coefficient.

overlap(dist1=list(shape=shape_usa, rate=rate_usa),
        dist2=list(shape=shape_bdg, rate=rate_bdg),
        plot=T)

#> [1] 83.8021
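
The Bhattacharyya coefficient of two densities p and q is the integral of sqrt(p(x) * q(x)) over x, and it equals 1 when the two distributions are identical. As a rough illustration of the idea (not necessarily how ?overlap() computes it internally), the sketch below evaluates that integral numerically for two gamma distributions. bhattacharyya_gamma() is a hypothetical helper, and the parameters are backed out from the means and CVs printed above.

```r
# Bhattacharyya coefficient between two gamma distributions,
# computed by numerical integration of sqrt(p(x) * q(x))
bhattacharyya_gamma <- function(shape1, rate1, shape2, rate2) {
  integrand <- function(x) {
    sqrt(dgamma(x, shape = shape1, rate = rate1) *
         dgamma(x, shape = shape2, rate = rate2))
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Parameters backed out from the printed means and CVs (shape = 1/CV^2, rate = shape/mean)
shape_us <- 1 / 0.326353^2
rate_us <- shape_us / 12.06376
shape_bd <- 1 / 0.3109159^2
rate_bd <- shape_bd / 8.244971

# Percent overlap between the two distributions
pct_overlap <- 100 * bhattacharyya_gamma(shape_us, rate_us, shape_bd, rate_bd)
pct_overlap

# A distribution overlaps completely with itself
self_overlap <- bhattacharyya_gamma(shape_us, rate_us, shape_us, rate_us)
```

Under these assumptions the result lands close to the 83.8 reported by ?overlap() above, although small differences from rounding of the inputs are to be expected.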

5. Quantifying prevalence of inadequate nutrient intakes

We can use Estimated Average Requirements (EARs) and the probability method (NRC 1986) to calculate the prevalence of inadequate nutrient intakes associated with each of these distributions. This measure of inadequate intake prevalence is often known as the summary exposure value (SEV).

Looking up dietary reference values

The first step to estimating the prevalence of inadequate intakes is to determine the appropriate reference value. To make this step easy, we provide two databases of dietary reference values directly inside the R package:

  1. Dietary reference intakes (DRIs) from NAS (2020): ?dris
  2. Nutrient reference values (NRVs) from Allen et al. (2020): ?nrvs

We can use these databases to look up the EAR for iron for 15-19-yr-old women. Because the EAR for iron depends on the level of bioavailability, we will extract different EARs for the USA and Bangladesh.

We will extract the EAR for women in the U.S. using the NAS (2020) dataset which is specific to US women:

?dris
usa_women15_iron_ear <- dris %>%
  filter(dri_type=="Estimated Average Requirement (EAR)" & nutrient=="Iron" & sex_stage=="Women" & age_range=="14-18 yr") %>%
  pull(value)
usa_women15_iron_ear
#> [1] 7.9

We will extract the EAR for women in Bangladesh using the Allen et al. (2020) dataset, which provides iron EARs based on bioavailability. In this case, we select the EAR for iron in populations with low bioavailability:

?nrvs
bgd_women15_iron_ear <- nrvs %>%
  filter(nrv_type=="Average requirement" & nutrient=="Iron (low absorption)" & stage=="Females" & age_group=="15-17 y") %>%
  pull(nrv)
bgd_women15_iron_ear
#> [1] 22.4

Because of the low bioavailability of iron in Bangladesh, we see a much higher EAR (22.4 mg/day) as compared to the U.S. (7.9 mg/day).

We remind the user that not every person in a population has the same recommended intake and that calculating the prevalence of inadequate intakes requires accounting for this variability. We recommend using a coefficient of variation (CV) of 0.25 for the EAR for Vitamin B12 and a CV of 0.10 for the EARs of all other nutrients (Renwick et al. 2004). However, in the calculations below, you’ll see that you can specify whatever CV you deem appropriate.

Performing the prevalence of inadequate intake calculation

We can calculate and compare inadequate iron intakes in the USA and Bangladesh using the ?sev() function as follows:

# Calculate prevalence of inadequate intakes in Bangladesh
sev(ear=bgd_women15_iron_ear, cv=0.1, shape=shape_bdg, rate=rate_bdg, plot=T)

#> [1] 99.97978

# Calculate prevalence of inadequate intakes in the US
sev(ear=usa_women15_iron_ear, cv=0.1, shape=shape_usa, rate=rate_usa, plot=T)

#> [1] 14.21439
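
To make the probability method concrete: the prevalence of inadequate intakes is the intake density weighted by the probability that a person's requirement exceeds that intake, integrated over all intakes. The sketch below is one way to write this in base R and is not necessarily how ?sev() is implemented. It assumes requirements are normally distributed around the EAR with standard deviation EAR * CV; prob_method_gamma() is a hypothetical helper, and the gamma parameters are backed out from the means and CVs printed in Section 4.

```r
# Probability method for a gamma intake distribution, assuming
# requirements ~ Normal(mean = ear, sd = ear * cv)
prob_method_gamma <- function(ear, cv, shape, rate) {
  integrand <- function(x) {
    risk <- 1 - pnorm(x, mean = ear, sd = ear * cv)  # P(requirement > intake x)
    risk * dgamma(x, shape = shape, rate = rate)
  }
  # Beyond ear + 6 SD the risk is effectively zero, so a finite upper limit suffices
  upper <- ear * (1 + 6 * cv)
  100 * integrate(integrand, lower = 0, upper = upper)$value
}

# Parameters backed out from the printed means and CVs (shape = 1/CV^2, rate = shape/mean)
shape_bd <- 1 / 0.3109159^2
rate_bd <- shape_bd / 8.244971
shape_us <- 1 / 0.326353^2
rate_us <- shape_us / 12.06376

prev_bd <- prob_method_gamma(ear = 22.4, cv = 0.1, shape = shape_bd, rate = rate_bd)
prev_us <- prob_method_gamma(ear = 7.9, cv = 0.1, shape = shape_us, rate = rate_us)
c(Bangladesh = prev_bd, USA = prev_us)
```

Because the requirement distribution is an assumption of this sketch, the results may not match the ?sev() output to the last digit, but they should show the same contrast between the two countries.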

We see that even though 15-19-year-old women in the U.S. and Bangladesh have similar habitual iron intakes (84% overlap), the much larger EAR for women in Bangladesh results in a much higher prevalence of inadequate intakes (nearly 100% versus 14%).

6. Shifting distributions in response to an intervention

We can use the ?shift_dist() function to shift the mean of habitual intake distributions while maintaining their coefficient of variation (CV) to see how different interventions might change health outcomes.

For example, we can see how shifting the mean of the habitual intake for 15-19-yr-old women in Bangladesh BY 10 mg/day would affect intake adequacy:

# Shift distribution by 10 mg/day
dist_bgd_shifted <- shift_dist(shape=shape_bdg, rate=rate_bdg, by=10, plot=T)


# What are the health benefits? Decrease from 100% deficient to 77% deficient
sev(ear=bgd_women15_iron_ear, cv=0.1, shape=dist_bgd_shifted$shape, rate=dist_bgd_shifted$rate, plot=T)

#> [1] 76.78915

Similarly, we can also ask how shifting the mean of the habitual intake for 15-19-yr-old women in Bangladesh TO 25 mg/day would affect intake adequacy:

# Shift distribution to 25 mg/day
dist_bgd_shifted2 <- shift_dist(shape=shape_bdg, rate=rate_bdg, to=25, plot=T)


# What are the health benefits? Decrease from 100% deficient to 41% deficient
sev(ear=bgd_women15_iron_ear, cv=0.1, shape=dist_bgd_shifted2$shape, rate=dist_bgd_shifted2$rate, plot=T)

#> [1] 40.50349
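
For a gamma distribution, the CV depends only on the shape parameter (CV = 1/sqrt(shape)), so one way to shift the mean while holding the CV fixed is to keep the shape and rescale the rate to shape / new mean. The sketch below illustrates that idea with a hypothetical helper, shift_gamma(); it is not the package's ?shift_dist() function, and the inputs are backed out from the Bangladesh mean and CV printed in Section 4.

```r
# Shift the mean of a gamma distribution while preserving its CV:
# keep the shape, rescale the rate so that shape / rate equals the new mean
shift_gamma <- function(shape, rate, by = NULL, to = NULL) {
  stopifnot(xor(is.null(by), is.null(to)))  # supply exactly one of 'by' or 'to'
  old_mean <- shape / rate
  new_mean <- if (!is.null(by)) old_mean + by else to
  stopifnot(new_mean > 0)
  list(shape = shape, rate = shape / new_mean, mean = new_mean)
}

# Bangladesh parameters backed out from the printed mean and CV
shape_bd <- 1 / 0.3109159^2
rate_bd <- shape_bd / 8.244971

shifted_by <- shift_gamma(shape_bd, rate_bd, by = 10)  # mean moves up by 10 mg/day
shifted_to <- shift_gamma(shape_bd, rate_bd, to = 25)  # mean moves to 25 mg/day
str(shifted_by)
str(shifted_to)
```

In both cases the shape, and therefore the CV, skewness, and kurtosis, is unchanged; only the scale of the distribution moves.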

7. Calculating the shift required to obtain a target level of nutrient inadequacy

We can use the ?shift_req() function to calculate the magnitude of the shift required to achieve a user-specified level of inadequate intake within a population. The function also provides the parameters for describing this new distribution.

For example, we can use this function to ask, how much does the habitual iron intake distribution for 15-19-yr-old women in the U.S. need to shift to have only a 5% prevalence of inadequate iron intakes:

# Shift required to achieve 5% prevalence of inadequate intakes in 15-19 yr women in United States
shift_req(ear=usa_women15_iron_ear, cv=0.1, target=5, shape=shape_usa, rate=rate_usa, plot=T)

#> $shape
#> [1] 9.389116
#> 
#> $rate
#> [1] 0.6197687
#> 
#> $mean
#> [1] 15.14939

The function outputs a list of the parameters describing the shifted distribution and provides a plot illustrating the impact of the shift. The mean would have to shift to 15.1 mg/day of iron to result in a 5% prevalence of inadequate intakes.

As another example, we can ask how much the habitual iron intake distribution for 15-19-yr-old women in Bangladesh needs to shift to have a 25% prevalence of inadequate iron intakes:

# Shift required to achieve 25% prevalence of inadequate intakes in 15-19 yr women in Bangladesh
shift_req(ear=bgd_women15_iron_ear, cv=0.1, target=25, shape=shape_bdg, rate=rate_bdg, plot=T)

#> $shape
#> [1] 10.34461
#> 
#> $rate
#> [1] 0.3562877
#> 
#> $mean
#> [1] 29.03442

In this case, the mean would have to shift to 29 mg/day of iron to result in a 25% prevalence of inadequate intakes.
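
Conceptually, finding the required shift is a one-dimensional root-finding problem: hold the shape (and thus the CV) fixed, and search for the mean at which the prevalence of inadequate intakes equals the target. The sketch below shows one way to do this with uniroot(). It is not necessarily how ?shift_req() works internally; prevalence_at_mean() and required_mean() are hypothetical helpers, requirements are assumed to be normally distributed around the EAR, and the search interval is an arbitrary choice that must bracket the answer.

```r
# Prevalence of inadequate intakes (%) for a gamma intake distribution with a
# given mean and fixed shape, assuming requirements ~ Normal(ear, ear * cv_req)
prevalence_at_mean <- function(mean_intake, ear, cv_req, shape) {
  rate <- shape / mean_intake
  integrand <- function(x) {
    (1 - pnorm(x, mean = ear, sd = ear * cv_req)) *
      dgamma(x, shape = shape, rate = rate)
  }
  # Risk is effectively zero beyond ear + 6 SD, so integrate over a finite range
  100 * integrate(integrand, lower = 0, upper = ear * (1 + 6 * cv_req))$value
}

# Mean intake at which prevalence equals the target (%)
required_mean <- function(target, ear, cv_req, shape, interval) {
  f <- function(m) prevalence_at_mean(m, ear, cv_req, shape) - target
  uniroot(f, interval = interval, tol = 1e-6)$root
}

# Bangladesh: shape backed out from the printed CV; EAR of 22.4 mg/day
shape_bd <- 1 / 0.3109159^2
mean_needed <- required_mean(target = 25, ear = 22.4, cv_req = 0.1,
                             shape = shape_bd, interval = c(8.2, 60))
mean_needed
prevalence_at_mean(mean_needed, ear = 22.4, cv_req = 0.1, shape = shape_bd)
```

The second call confirms that the prevalence at the solved mean is approximately the 25% target; the corresponding rate parameter is shape_bd / mean_needed.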

8. Citing the nutriR package

Please cite the R package functions as:

Please cite the data served in the R package as:

9. References

Allen, L.H., Carriquiry, A.L., Murphy, S.P. (2020) Perspective: proposed harmonized nutrient reference values for populations. Advances in Nutrition 11(3): 469-483.

Food and Nutrition Board, National Academy of Sciences, Institute of Medicine (2020) Dietary Reference Intakes: Estimated Average Requirements and Recommended Intakes. Accessed at https://www.nal.usda.gov/sites/default/files/fnic_uploads//recommended_intakes_individuals.pdf.

National Research Council (1986) Nutrient Adequacy: Assessment Using Food Consumption Surveys. Washington, DC: The National Academies Press. https://doi.org/10.17226/618.

Renwick, A.G., Flynn, A., Fletcher, R.J., Müller, D.J., Tuijtelaars, S., Verhagen, H. (2004) Risk-benefit analysis of micronutrients. Food and Chemical Toxicology 42(12): 1903-1922. https://doi.org/10.1016/j.fct.2004.07.013.