---
title: "What is different from Stata's mdepriv?"
output:
  rmarkdown::html_vignette:
    keep_md: true
    self_contained: no
vignette: >
  %\VignetteIndexEntry{what_is_different_from_stata}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```

```{r setup, echo=FALSE}
# install.packages("mdepriv")
library(mdepriv)
```

The `mdepriv` function is an adaptation in **_R_** of a homonymous user-written **_Stata_** command [(Pi Alperin & Van Kerm, 2009)](http://medim.ceps.lu/stata/mdepriv_v3.pdf) for computing basic synthetic scores of multiple deprivation from unidimensional indicators and/or basic items of deprivation.

To facilitate orientation and usage of `mdepriv`, this **_R_** implementation follows the **_Stata_** features as closely as possible. There are only a small number of differences:

* The options for the second factor of the double weighting schemes differ formally, but with virtually no practical consequences:
    + In **_R_** as in **_Stata_**, if the second weighting factor is set to **_mixed_**, the correlation type for each pair of items is automatically determined by the following rules:
        + **_pearson_**: both items have > 10 distinct values.
        + **_polyserial_**: one item has $\le$ 10, the other > 10 distinct values.
        + **_polychoric_**: both items have $\le$ 10 distinct values.
    + In **_R_** **_tetrachoric_**, the appropriate correlation type for pairs of binary items, is not available as the second weighting factor. This is so because the **_R_** function [`weightedCorr`](https://CRAN.R-project.org/package=wCorr), on which the calculation of the second factor relies, treats **_tetrachoric_** correlations as **_polychoric_**. The different handling of **_tetrachoric_** correlations in **_R_** and **_Stata_** causes minuscule differences in the weights in models that include more than one binary item.
    + In **_Stata_**, if **_polychoric_** is forced on (partly) continuous pairs of items, it switches under the hood to **_mixed_**. Thus **_polychoric_** in **_Stata_** is pointless as an enforceable option. `mdepriv` in **_R_** does not reproduce this spurious option.
    + **_pearson_** is the only really enforceable correlation type in **_Stata_** and, therefore, it is maintained as such in **_R_**.
    + **_diagonal_**, in both **_Stata_** and **_R_**, sets all off-diagonal elements to zero, making **_wb_** independent of any item correlations.

```{r echo=FALSE, Fig_wa_wb_R, fig.height=7/2.54, fig.width=18/2.54}
wa_wb_combi <- function(wa = c("cz", "ds", "bv", "equal"),
                        wb = c("mixed", "pearson", "diagonal"),
                        xlim = c(-1.25, 11),
                        ylim = c(-0.5, 4.5),
                        col_double = "cornsilk",
                        col_single = "mistyrose",
                        col_method_wa = "lightcyan",
                        col_wb = "#C1FFC1A6", # rgb(t(col2rgb("darkseagreen1"))/255, alpha = 0.65)
                        col_bv_corr_type = "seagreen",
                        string_bv_corr_type = "bv_corr_type",
                        options = "argument options",
                        legend = TRUE) {
  wa <- factor(wa, wa)
  wb <- factor(wb, wb)
  combi <- merge(wa, wb)
  names(combi) <- c("wa", "wb")
  combi$method <- (combi$wa != "bv" & combi$wb == "diagonal") | (combi$wa == "bv" & combi$wb != "diagonal")

  plot(0, 0, type = "n", xlim = xlim, ylim = rev(ylim), asp = 1, axes = FALSE, ann = FALSE)

  x_mar <- 0.5
  xleft <- x_mar
  xright <- length(wa) + x_mar
  y_mar <- x_mar
  ybottom <- length(wb) + y_mar
  ybreak <- length(wb) - 1 + y_mar
  ytop <- y_mar

  rect(xleft, ybreak, xright, ytop, col = col_double, border = NA)
  rect(xleft, ybottom, xright, ybreak, col = col_single, border = NA)
  points(as.numeric(wb) ~ as.numeric(wa),
    data = combi,
    pch = ifelse(combi$method, 16, 1),
    cex = 3
  )
  rect(xleft, ytop - 0.5 * y_mar, xright, ytop - 1.5 * y_mar, col = col_method_wa, border = NA)
  rect(xleft - 4 * x_mar, ytop, xleft - 0.5 * x_mar, ybottom, col = col_wb, border = NA)
  rect(xleft - 3.75 * x_mar, ytop + 0.25 * y_mar, xleft - 0.75 * x_mar, ybreak, border = col_bv_corr_type, lwd = 2)
  text(as.numeric(wa), 0, wa, adj = c(0.5, 0.5))
  text(-1.2, as.numeric(wb), wb, adj = 0)

  if (legend) {
    legend(xright + x_mar, ytop - 1.5 * y_mar,
      c(
        "method option",
        "only wa & wb option",
        "double\nweighting schemes",
        "effective single\nweighting schemes"
      ),
      pch = c(16, 1, 15, 15),
      pt.cex = c(3, 3, 4.5, 4.5),
      col = c("black", "black", "cornsilk", "mistyrose"),
      bty = "n",
      y.intersp = 2,
      x.intersp = 2
    )
    text(xright + x_mar, ytop - 1.5 * y_mar, adj = c(0, 1), "wa-wb-combinations", font = 3)

    legend(8.5, ytop - 1.5 * y_mar,
      c(
        "method (default: cz)\nor wa",
        paste0(string_bv_corr_type, "\n(default: mixed)"),
        "wb"
      ),
      pch = c(15, 0, 15),
      pt.cex = 4.5,
      col = c("lightcyan", "seagreen", col_wb),
      bty = "n",
      y.intersp = 2,
      x.intersp = 2,
      xpd = TRUE
    )
    text(8.5, ytop - 1.5 * y_mar, adj = c(0, 1), options, font = 3)
  }
}

par(mar = c(0, 0, 1, 5.5), oma = c(0.5, 0.5, 0.5, 0), cex = 0.8)
wa_wb_combi()
title("mdepriv in R: possible weighting schemes", line = 0, adj = 0)
```

```{r echo=FALSE, Fig_wa_wb_Stata, fig.height=10/2.54, fig.width=18/2.54}
par(mar = c(1, 0, 0, 2.5), oma = c(0.5, 0.5, 0.5, 0), cex = 0.8)
wa_wb_combi(
  wb = c("mixed", "pearson", "tetrachoric", "polychoric", "diagonal"),
  string_bv_corr_type = " bv sub-options corr. type",
  options = "options"
)
title("mdepriv in Stata: possible weighting schemes", line = -2, adj = 0)
```

* `mdepriv` in **_R_** admits both non-integer and integer sampling weights for all **_method_**s. **_mdepriv_** in **_Stata_** admits integer frequency weights for all **_method_**s, as well as non-integer analytic weights for **_method_**s without double-weighting (which include **_method_** = **_cz_**, **_ds_** or **_equal_**).

* The **_Stata_** option **_force_**, which allows calculations even if items are not limited to the [0, 1] range, is not implemented in **_R_**; such item sets produce invalid aggregate deprivation statistics. In **_R_**, therefore, any item with values on [0, max], where max > 1, has to be transformed beforehand.
For more detailed information on suitable transformations, see the 'Details' section of `mdepriv`'s help page.

```{r eval=TRUE}
help("mdepriv")
```

* Unlike in **_Stata_**, in **_R_** observations with missing item values have to be removed beforehand. Rationale and code can be found under section [**_Handling Missing Values_**](./mdepriv_get_started.html#handling-missing-values) in the vignette **_Get Started with `mdepriv`_**.

* Models with double-weighting work with an internal parameter known as **_rhoH_**. **_rhoH_** is determined by the central point in the largest gap in the ordered sequence of distinct correlation coefficients between all item / indicator pairs. As such, by default, **_rhoH_** is a data-driven quantity. The user has the option to set a value for **_rhoH_**; this is rarely called for, except when a constant **_rhoH_** is desired for the comparison of several such models. The implementations in **_Stata_** and **_R_** differ:
    + By default, **_Stata_** uses -2 as starting value for the computation of **_rhoH_**. In **_R_**, the default value is NA, causing `mdepriv` to calculate the data-driven value in models with double-weighting, or else leave it as NA.
    + Optional values in **_Stata_** must fall in the interval [-$\infty$, +1]. In **_R_**, they are limited to [-1,+1], the range of correlation coefficients.

* The **_Stata_** option **_vec_** for passing user-defined weights to items is named **_user_def_weights_** in **_R_**, a more intuitive argument name.

## References

Pi Alperin, M. N. and Van Kerm, P. (2009), 'mdepriv - Synthetic indicators of multiple deprivation', v2.0 (revised March 2014), CEPS/INSTEAD, Esch/Alzette, Luxembourg. (2020-01-02).
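
## Illustration of the data-driven rhoH

The verbal definition of **_rhoH_** given in the list of differences (the central point in the largest gap in the ordered sequence of distinct correlation coefficients) can be sketched in a few lines of base **_R_**. This is only an illustration under stated assumptions, not `mdepriv`'s internal code: the helper name `rho_h_sketch`, the use of the upper triangle to pick each item pair once, the tie-breaking rule and the example correlation matrix are all made up for this sketch.

```r
# Hypothetical helper (not part of mdepriv): midpoint of the largest gap
# between the sorted distinct pairwise correlation coefficients.
rho_h_sketch <- function(corr_mat) {
  # Assumption: each item pair counts once (upper triangle, diagonal excluded).
  rho <- sort(unique(corr_mat[upper.tri(corr_mat)]))
  if (length(rho) < 2) {
    return(NA_real_) # fewer than two distinct values: no gap to split
  }
  gaps <- diff(rho)
  i <- which.max(gaps) # assumption: the first largest gap wins in case of ties
  (rho[i] + rho[i + 1]) / 2
}

# Made-up correlation matrix for three items
corr_mat <- matrix(
  c(
    1.0, 0.2, 0.3,
    0.2, 1.0, 0.8,
    0.3, 0.8, 1.0
  ),
  nrow = 3, byrow = TRUE
)

rho_h_sketch(corr_mat)
```

In this made-up example the distinct coefficients are 0.2, 0.3 and 0.8, so the largest gap lies between 0.3 and 0.8 and the sketch returns their midpoint. Whether `mdepriv` resolves ties or degenerate cases in the same way is not stated in this vignette.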