Violin Plots

Why choose violin plots over density plots?

Last updated on Dec 10, 2019

Intro

Violin plots let us look at the distribution of our data. But I know what you’re thinking: “Can’t I just use a density plot to do the same thing?” While it’s true that a density plot shows the same information, violin plots are easier to read when you have multiple groups or conditions to plot in the same chart.

Let me show you why:

Density Plot

The example data I will use here come from a manuscript that I am currently preparing for publication. I’m plotting two conditions: (1) categorization accuracy for old items and (2) categorization accuracy for new items.
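Since the data set itself isn't shown here, below is a minimal sketch of a stand-in you could use to run the plotting code. Everything in it is an assumption for illustration: the simulated scores, the sample size, the subject column, the condition levels ("old" and "new"), and the text in cat_labels. Only the names cat_plot_data, cat_labels, condition, and accuracy are taken from the plotting code in this post.

```r
# Hypothetical stand-in for the manuscript data: one accuracy score per
# subject per condition, stored in long format.
set.seed(2019)                      # reproducible simulated values
n_subjects <- 30                    # assumed sample size, not from the manuscript

cat_plot_data <- data.frame(
  subject   = factor(rep(seq_len(n_subjects), times = 2)),
  condition = factor(rep(c("old", "new"), each = n_subjects),
                     levels = c("old", "new")),
  # rbeta() keeps the simulated accuracy inside the 0-1 range
  accuracy  = c(rbeta(n_subjects, 8, 2), rbeta(n_subjects, 5, 3))
)

# Assumed display labels, in the same order as the factor levels
cat_labels <- c("Old Items", "New Items")

str(cat_plot_data)
```

The long format (one row per subject per condition) is what lets ggplot map condition to fill or to the x-axis in the plots that follow.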

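library(ggplot2)     # ggplot(), geom_density(), geom_violin(), etc.
library(wesanderson) # provides wes_palette()

pal = wes_palette("Darjeeling1", 2, type = "discrete") # Wes Anderson palettes are fun! Check them out!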

## Build density plot
cat_density1 <- ggplot(cat_plot_data, aes(x = accuracy, fill = condition)) +
  geom_density() +
  scale_fill_manual(labels = cat_labels, values = pal) +
  labs(title = "Categorization",
       x = "Accuracy", 
       y = "Density") +
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        legend.justification = "center",
        plot.title = element_text(hjust = .5))
cat_density1

You can see that the density plot gives us a nice look at the distribution of the two groups. However, the distributions overlap. That may not be a big deal when we are comparing only two conditions or groups, but imagine how much harder the chart would be to read if we had three or more.

One thing we could do is use the facet_wrap function to split our distributions into separate panels, stacked one above the other (ncol = 1).

cat_density2 <- ggplot(cat_plot_data, aes(x = accuracy, fill = condition)) +
  geom_density() +
  scale_fill_manual(labels = cat_labels, values = pal) +
  labs(title = "Categorization",
       x = "Accuracy", 
       y = "Density") +
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        legend.justification = "center",
        plot.title = element_text(hjust = .5)) +
  facet_wrap(~condition, ncol = 1) +
  theme(strip.background = element_blank(), #Remove the condition labels since we have a legend
        strip.text.x = element_blank())
cat_density2

This looks pretty nice! But violin plots show the same information with all groups in a single chart. No duplicate y-axis!

Violin Plots

We can plot the same data on a single graph like so:

cat_v_basic <- ggplot(cat_plot_data, aes(x = condition, y = accuracy, fill=condition)) +
  geom_violin(trim = FALSE) +
  scale_fill_manual(values = pal) +
  ylab("Categorization Accuracy (% Correct)") +
  theme(legend.position = "none",
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 15),
        text = element_text(family = "Arial",
                            size = 25)) + 
  scale_y_continuous(breaks = c(0,.25, .5, .75, 1)) +
  scale_x_discrete(labels= cat_labels)
cat_v_basic
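For intuition about what the violin shape represents, here is a minimal base-R sketch that builds one by hand: it estimates a kernel density and mirrors it around a vertical axis. The simulated scores and the 0.4 half-width are assumptions for illustration, and this is not how ggplot2 draws its violins internally, just the same idea in a few lines.

```r
# Simulated accuracy scores (hypothetical values, not the manuscript data)
set.seed(1)
accuracy <- rbeta(40, 8, 2)

# Kernel density estimate; cut = 3 (the default) lets the curve run
# three bandwidths past the smallest and largest observations
dens <- density(accuracy, cut = 3)

# Scale the density so the widest point has half-width 0.4, then mirror it
half_width <- 0.4 * dens$y / max(dens$y)
violin <- data.frame(
  x = c(-half_width, rev(half_width)),  # left edge going up, right edge coming down
  y = c(dens$x, rev(dens$x))
)

# Draw the outline with base graphics
plot(violin$x, violin$y, type = "n", xlab = "", ylab = "Accuracy", xaxt = "n")
polygon(violin$x, violin$y, col = "grey80")

# Compare the observed range with the range covered by the density curve
range(accuracy)
range(dens$x)
```

Because the untrimmed density runs past the observed minimum and maximum, a violin drawn with trim = FALSE can extend below 0 or above 1 on an accuracy scale even though no individual score does. Setting trim = TRUE cuts the tails at the range of the data if that is a concern.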

Adding dots for individual differences

I can also superimpose individual dots for each subject to help visualize individual differences in the data.

cat_l_basic <- ggplot(cat_plot_data, aes(x = condition, y = accuracy, fill=condition)) +
  geom_violin(trim = FALSE) +
  geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = .75, fill = "black") + #added dots 
  scale_fill_manual(values = pal) +
  ylab("Categorization Accuracy (% Correct)") +
  theme(legend.position = "none",
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 15),
        text = element_text(family = "Arial",
                            size = 25)) + 
  scale_y_continuous(breaks = c(0,.25, .5, .75, 1)) +
  scale_x_discrete(labels= cat_labels)
cat_l_basic

Adding mean and reference line

Want to know the average accuracy? I can also add a marker to denote the mean for each group and a dashed reference line to show where chance performance lies (33% for three categories). I’ll also space the dots farther apart with stackratio so they’re no longer touching, and set scale = "count" so each violin’s area is proportional to the number of observations in its group.

pal = wes_palette("Darjeeling1", 2, type = "discrete")

## Build plot
cat_v_fancy <- ggplot(cat_plot_data, aes(x = condition, y = accuracy, fill=condition)) +
  geom_violin(trim = FALSE, scale = "count") +
  geom_dotplot(binaxis = 'y', stackdir = 'center', dotsize = .75, stackratio = 1.5, fill = "black") +
  stat_summary(fun = mean, geom = "point", size = 3, shape = 23, fill = "Gold") + #adding mean marker (ggplot2 >= 3.3.0; older versions use fun.y)
  scale_fill_manual(values = pal) +
  labs(title = "Categorization", 
       y = "Categorization Accuracy (% Correct)") +
  theme(legend.position = "none",
        plot.title = element_text(size = 20, hjust = .5),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 15),
        text = element_text(family = "Arial",
                            size = 25)) + 
  scale_y_continuous(breaks = c(0,.25, .5, .75, 1)) +
  scale_x_discrete(labels= cat_labels) +
  geom_hline(yintercept = .333, linetype = "dashed", color = "black") #added reference line
cat_v_fancy
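If you want the numbers behind the mean markers and the dashed line, they can be computed directly. This is a minimal sketch using simulated scores as a stand-in; the data frame scores, its values, and the group sizes are assumptions for illustration, not the manuscript data.

```r
# Hypothetical scores standing in for cat_plot_data (not the manuscript data)
set.seed(42)
scores <- data.frame(
  condition = rep(c("old", "new"), each = 25),
  accuracy  = c(rbeta(25, 8, 2), rbeta(25, 5, 3))
)

# Chance performance when guessing among three equally likely categories
n_categories <- 3
chance <- 1 / n_categories

# Mean accuracy per condition: the values the diamond markers sit on
group_means <- tapply(scores$accuracy, scores$condition, mean)

round(chance, 3)
round(group_means, 3)

# Which conditions have a mean above chance?
group_means > chance
```

Computing chance as 1 / n_categories rather than typing .333 keeps the reference line correct if the number of categories ever changes.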

Stefania Ashby

Doctoral Candidate (Cognitive Neuroscience)

Stefania is a doctoral candidate studying cognitive neuroscience in the Brain and Memory Lab at the University of Oregon. She loves coding in R and Matlab, data analysis and visualization, and is interested in pursuing a career in data science after graduation.