Aleš Vomáčka - What Makes a Democracy (According to Europeans)?

We will need several packages to do all our work. Good {tidyverse} for data wrangling and visualization, {brms} for bayesian model fitting, {tidybayes} for model post-processing and finally {gt} for fancy tables:

Code

library(tidyverse)
library(brms)
library(tidybayes)
library(gt)
library(avom)

theme_set(theme_avom(text = element_text(family = "Fira Sans")))

The Data

The data come from European Social Survey, an academically driven cross-national survey that has been conducted across Europe every two years for almost two decades, with the main purpose being comparative attitude and value research. We will use data from the tenth wave, gathered in 2020. As of the time of the writing, the dataset is not complete, as only ten countries are present - Bulgaria, Czechia, Estonia, Finland, France, Croatia, Hungary, Lithuania, Slovenia and Slovakia. Still, it it will work as a proof of concept:

Code

ess <- haven::read_spss("data/ESS10-2.sav")
ess <- as_factor(ess)

The respondent were asked how important they consider various aspects of democracy. They answered on a scale from 0 (“Not at all important for democracy in general”) to 10 (“Extremely important for democracy in general”). The full wording of each item is in the table below.

Code

democracy <- select(ess, fairelc:keydec)

democracy <- tibble(var   = names(democracy),    
                    label = map_chr(democracy, attr, "label"))

democracy %>% 
  gt() %>% 
  tab_header(title = "Please tell me how important you think it is for democracy in general...") %>% 
  tab_footnote(footnote = "Europeans Social Survey, Wave 10") %>% 
  tab_style(style = list(cell_fill("#282828"),
                         cell_text("#fbf1c7",
                                   style = "italic")),
            locations = list(cells_footnotes(),
                             cells_title())) %>% 
  tab_style(style = cell_text(align = "center"),
            locations = cells_column_labels()) %>% 
  cols_label(var = "Variable",
             label = "Question Wording")

Variable	Question Wording
Please tell me how important you think it is for democracy in general...
fairelc	National elections are free and fair
dfprtal	Different political parties offer clear alternatives to one another
medcrgv	The media are free to criticise the government
rghmgpr	The rights of minority groups are protected
votedir	Citizens have the final say on political issues by voting directly in referendums
cttresa	The courts treat everyone the same
gptpelc	Governing parties are punished in elections when they have done a bad job
gvctzpv	The government protects all citizens against poverty
grdfinc	The government takes measures to reduce differences in income levels
viepol	The views of ordinary people prevail over the views of the political elite
wpestop	The will of the people cannot be stopped
keydec	Key decisions are made by national governments rather than the European Union
Europeans Social Survey, Wave 10

For modeling purposes, we need to transform them into long format. What we need is one variable with peoples’ responses (response), one with item name (item) and one with country name (cntry). Last, we create a respondent ID variable (respondent):

Code

dem_long <- ess %>% 
  select(cntry, fairelc:keydec) %>% 
  mutate(across(.cols = fairelc:keydec,
                .fns  = ~fct_recode(.,
                                    "10" = "Extremely important for democracy in general",
                                    "0" = "Not at all important for democracy in general")),
         across(.cols = fairelc:keydec,
                .fns = as.numeric),
         respondent = 1:n()) %>% 
  pivot_longer(cols = -c(cntry, respondent),
               names_to = "item",
               values_to = "response")

slice_sample(dem_long, n = 5)

# A tibble: 5 × 4
  cntry    respondent item    response
  <fct>         <int> <chr>      <dbl>
1 Bulgaria       1270 wpestop       10
2 France         9757 grdfinc        9
3 France         9317 wpestop        9
4 Estonia        5227 medcrgv       11
5 Bulgaria       1495 medcrgv       11

In theory, we could run our model on the dataset above. However, the data are pretty big - the long version has 216 720 rows, bit too much for my laptop. Instead we collapse rows with the same combinations of country, item and response, bringing the number of rows to a much more manageable 1 311 rows (after dropping missing values). The number of times each combinations appeared will be used as weight, to ensure the results of the model computed on the collapsed data are the same as on the non collapsed ones. This speeds up computations a lot, although it’s not without drawbacks - we are assuming that responses coming from the same respondent are not correlated. Perhaps unrealistic, but let’s accept it for the sake of prototyping:

Code

dem_agg <- dem_long %>% 
  count(cntry, item, response, name = "weight") %>% 
  filter(!is.na(response))

slice_sample(dem_agg, n = 5)

# A tibble: 5 × 4
  cntry    item    response weight
  <fct>    <chr>      <dbl>  <int>
1 France   gvctzpv       11    964
2 Bulgaria gvctzpv        9    308
3 Slovakia viepol         7     87
4 Slovenia rghmgpr        3      1
5 Slovenia gptpelc        1      6

The Method

Since we are interested not only in how important is every item considered on average, but also to what degree respondents agree with each other, we will need some way to measure attitude agreement. Fortunately, attitude agreement maps pretty nicely to variance - the less respondents agree with each other, the higher will the variance of their responses will be. A variable will have zero variance if all respondents selected the same response. On the other hand, a variable with maximum possible variance is one where half of the people selected the lowest possible option and the other half selected the highest possible one.

Code

tibble(x = rep(0:10, 3),
       y = c(rep(0, 5), 1, rep(0,5), 1, rep(0, 10), 0.5, rep(0,9), 0.5),
       label = c(rep("Absolute Agreement", 11),
                 rep("Also Absolute Agreement", 11),
                 rep("Absolute Disagreement", 11))) %>% 
  mutate(label = fct_relevel(label,
                             "Absolute Agreement",
                             "Also Absolute Agreement",
                             "Absolute Disagreement")) %>% 
  ggplot(aes(x = x, y = y)) +
  geom_col() +
  facet_wrap(~label) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_x_continuous(breaks = 0:10) +
  labs(x = "Response category",
       y = element_blank()) +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank())

There are several ways we could model variance, but in this post I have opted for distributional models, a little known but pretty useful extension of classical (generalized) linear models.In a normal linear regression model, we predict the expected mean of dependent variable $Y$ , based on a set of predictors. The residual standard deviation is also estimated, but assumed to be constant across all values of predictors $X$ (this is the well known assumption of homoscedasticity):

$\begin{aligned} y_{i} & = N (μ, σ) \\ μ & = X β \end{aligned}$

The above is just a different way of expressing the perhaps better known formula $Y = X B + ϵ$ , with residual assumed to be normally distributed. In a distributional model, we conditionally estimate not only the mean, but also the standard deviation:

$\begin{aligned} y_{i} & = N (μ, σ) \\ μ & = X β \\ σ & = X β \end{aligned}$

And that’s pretty much it. Actually estimating distributional models can be a pretty challenging task using maximum likelihood approaches (though it can be done using {nlme} or {gamlss} packages), but it’s nothing special for Bayesian models using MCMC, which is what we are going to use.

The Model

For simplicity, we will assume the good old Gaussian likelihood, i.e. we will build upon classical linear regression. Since the democracy items are bounded between 0 and 10, a more appropriate model would be something something like ordered beta. However I felt that the problem is complicated enough as is. We will also utilize random effects to estimate how would people rate an “average” item or how would items be rated in an “average” country.

Item Polarization

Our model will predict response based on fixed effect of item and random effect of cntry. We will use the same specification for modeling both means and standard deviations, and the observations will be weighted by weight. The rest of the arguments are technical. We’ll use cmstanr for back-end calculations - it’s faster than plain Stan and allows for within-chain parallelization. Speaking of which, we’ll run 4 chains in parallel, with each chain having two cores working on it (that’s what the thread = threading(2) is for). We also save the model so that’s not recomputed every time this text is rendered and turn all the report messages off:

Code

m_item <- brm(bf(response | weights(weight) ~ item + (item|cntry),
              sigma ~ item + (item|cntry)),
           data = dem_agg,
           family = gaussian(),
           backend = "cmdstanr",
           cores = 4,
           chains = 4,
           iter = 2000,
           warmup = 1000,
           threads = threading(2),
           seed = 1234,
           file = "models/item-model",
           file_refit = "on_change",
           refresh = 0,
           silent  = 2)

About 7 minutes later, the model is ready. I won’t bore you with details, but the models turned out pretty nice. Chains are mixing well without any warnings about divergent transitions or exceeding maximum tree depth.

First, let’s look on the average standard deviation per item. In other words, we’ll look how people across countries on average agree with each other (Figure 1). Somewhat surprisingly, there aren’t that many differences. People agree with most statements more or less the same, with the only two items breaking the mold being “National elections are free and fair” and “The courts treat everyone the same”. These two enjoy a higher level of agreement than the rest. The results are hardly big though.

Code

epred_draws(m_item,
            newdata = tibble(item  = unique(dem_agg$item)),
            dpar = TRUE,
            re_formula = NA) %>% 
  ungroup() %>% 
  left_join(., democracy, by = c("item" = "var")) %>% 
  mutate(label = fct_reorder(label,
                             sigma,
                             .fun = mean),
         label = fct_relabel(label,
                             str_wrap, 50)) %>% 
  ggplot(aes(x = sigma, y = label)) +
  stat_halfeye() +
  labs(x = "Standard Deviation (Higher Means Less Agreement)",
       y = element_blank(),
       caption = "European Social Survey, Wave 10, n = 216 720.\nDensity plots show posterior distributions, lines show 95% Credible intervals.") +
  theme(panel.grid.minor = element_blank())

Figure 1: Average Agreement Across Countries

Another way to look at the results is to check predictions intervals - what would the item agreements be for a yet unobserved country, perhaps one that has yet to release its data (Figure 2). As we can see, the interval estimates are quite wide - individual countries differ a lot, which makes makes predicting results for unobserved countries difficult (the posteriors are also lumpy - this due to the small number of countries in the dataset).

Code

epred_draws(m_item,
            newdata = tibble(item  = unique(dem_agg$item),
                             cntry = NA),
            dpar = TRUE) %>% 
  ungroup() %>% 
  left_join(., democracy, by = c("item" = "var")) %>% 
  mutate(label = fct_reorder(label,
                             sigma,
                             .fun = mean),
         label = fct_relabel(label,
                             str_wrap, 50)) %>% 
  ggplot(aes(x = sigma, y = label)) +
  stat_halfeye() +
  labs(x = "Standard Deviation (Higher Means Less Agreement)",
       y = element_blank(),
       caption = "European Social Survey, Wave 10, n = 216 720.\nDensity plots show posterior distributions, lines show 95% Credible intervals.") +
  theme(panel.grid.minor = element_blank())

Figure 2: Predicted Agreement For Unobserved Countries

Wrapping Up

Surprisingly (to me at least), Europeans seem to agree the same on the importance of pretty much all items. Slightly bigger agreement has been expressed about statements “National elections are free and fair” and “The courts treat everyone the same”, making them less controversial than the rest.