class: center, middle, inverse, title-slide

# Figure customization in R and Stata
### Gustavo Castillo, Dan Killian
### June 2021

---

### Session objective

- Show finished visualization in R
- Narrate construction of visualization step by step (coding out loud)
- Compare / contrast against figure construction in Stata

---

### Visualization using a grammar of graphics

The ggplot2 package is based on a specific grammar of graphics. Under this grammar of graphics, a visualization comprises six layers:

- data
- mapping (aesthetic)
- geometry
- facet
- statistic
- theme

---

### Template of a data visualization

All plots require the first three layers:

- data to visualize
- a mapping of variables from the data to the visualization grid
  - different types of mappings are referred to as 'aesthetics'
- a geometry that gives a specific implementation of the specified aesthetic

```r
ggplot(data = [DATASET],
       mapping = aes(x = [X VARIABLE],
                     y = [Y VARIABLE])) +
  geom_SOMETHING()
```

---

### Example: Governance trends before and after the Arab Spring

.panelset[

.panel[.panel-name[Raw data]

|Country_Name |Indicator                |Subindicator_Type |       1996|       1998|       2000|       2002|
|:------------|:------------------------|:-----------------|----------:|----------:|----------:|----------:|
|Egypt        |Voice and Accountability |Estimate          | -0.8441842| -0.9117861| -0.8945062| -1.1022900|
|Egypt        |Voice and Accountability |StdErr            |  0.2088098|  0.2144461|  0.1929377|  0.1569617|
|Egypt        |Voice and Accountability |Rank              | 22.0000000| 21.3930300| 22.3880600| 16.4179100|
|Libya        |Voice and Accountability |Estimate          | -1.4973060| -1.6791560| -1.6886630| -1.8240650|
|Libya        |Voice and Accountability |StdErr            |  0.2088098|  0.2144461|  0.1972829|  0.1591490|
|Libya        |Voice and Accountability |Rank              |  9.5000000|  4.4776120|  4.4776120|  2.4875620|

Let's plot the World Bank Voice and Accountability index for each country, across years
]

.panel[.panel-name[Prepared data]

.pull-left[
```r
va1 <- va %>%
  filter(Subindicator_Type == "Estimate") %>%
  pivot_longer(!Country_Name & !Indicator_Id & !Indicator & !Subindicator_Type,
               names_to = "Year",
               values_to = "Estimate") %>%
  select(1,5,6) %>%
  # convert Year to numeric before comparing it against a number
  mutate(Year = as.numeric(Year)) %>%
  filter(Year > 2007,
         Country_Name %in% c("Egypt", "Tunisia", "Libya")) %>%
  arrange(Year) %>%
  as.data.frame()

kable(va1)
```
]

.pull-right[

|Country_Name | Year|   Estimate|
|:------------|----:|----------:|
|Egypt        | 2008| -1.2122390|
|Libya        | 2008| -1.9414910|
|Tunisia      | 2008| -1.3505450|
|Egypt        | 2009| -1.1575010|
|Libya        | 2009| -1.9102790|
|Tunisia      | 2009| -1.3583390|
|Egypt        | 2010| -1.1880540|
|Libya        | 2010| -1.9404160|
|Tunisia      | 2010| -1.4161650|
|Egypt        | 2011| -1.1399830|
|Libya        | 2011| -1.5944120|
|Tunisia      | 2011| -0.3705043|
|Egypt        | 2012| -0.7650635|
|Libya        | 2012| -0.9007420|
|Tunisia      | 2012| -0.1716522|
|Egypt        | 2013| -1.0515590|
|Libya        | 2013| -0.9716001|
|Tunisia      | 2013| -0.0844274|
|Egypt        | 2014| -1.1806000|
|Libya        | 2014| -1.1137260|
|Tunisia      | 2014|  0.1927840|
|Egypt        | 2015| -1.1904170|
|Libya        | 2015| -1.3416290|
|Tunisia      | 2015|  0.2414132|
|Egypt        | 2016| -1.2045020|
|Libya        | 2016| -1.4300810|
|Tunisia      | 2016|  0.3045838|
|Egypt        | 2017| -1.2506860|
|Libya        | 2017| -1.4426350|
|Tunisia      | 2017|  0.1620905|
|Egypt        | 2018| -1.3115360|
|Libya        | 2018| -1.5224920|
|Tunisia      | 2018|  0.2107755|
|Egypt        | 2019| -1.4286730|
|Libya        | 2019| -1.4583350|
|Tunisia      | 2019|  0.2814057|

]
]

.panel[.panel-name[Finished plot]

```r
include_graphics("Voice and Accountability - R.png")
```

<img src="Voice and Accountability - R.png" width="60%" />
]
]

---
class: middle

# Coding out loud

- data
- mapping (aesthetic)
- geometry
- facet
- statistic
- theme

---

.midi[
> 1. **Start with the data**
]

.pull-left[
```r
*ggplot(data = va1)
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-6-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. **map year to the x-axis**
]

.pull-left[
```r
ggplot(data = va1,
*      mapping = aes(x = Year))
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-7-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. **map Voice and Accountability Index to y-axis**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
*                    y = Estimate))
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-8-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. **map Country to the color aesthetic**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
*                    color = Country_Name))
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-9-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. **assign a point geometry to display the data mapping**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color = Country_Name)) +
* geom_point(size=3)
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-10-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. **assign a line geometry to display the data mapping**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color = Country_Name)) +
  geom_point(size=3) +
* geom_line(size=1)
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-11-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. **assign a colorblind-friendly palette**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color = Country_Name)) +
  geom_point(size=3) +
  geom_line(size=1) +
* scale_color_viridis_d()
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-12-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. **highlight a break in the timeline with a vertical line**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color = Country_Name)) +
* geom_vline(xintercept=2010,
*            size=1.2,
*            color="darkgrey",
*            alpha=.8) + # add transparency
  geom_point(size=3) +
  geom_line(size=1) +
  scale_color_viridis_d()
```

Pop quiz: Why was the new code snippet inserted above the geometries?
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-13-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. **format the x-axis**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color = Country_Name)) +
  geom_vline(xintercept=2010, size=1.2, color="darkgrey", alpha=.8) +
  geom_point(size=3) +
  geom_line(size=1) +
  scale_color_viridis_d() +
* scale_x_continuous(limits=c(2008,2019),
*                    breaks=seq(2008,2018,2),
*                    labels=c("2008", "Arab\nSpring", "2012", "2014", "2016", "2018"))
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-14-1.png" width="100%" />
]

---

.midi[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. **format the y-axis**
]

.pull-left[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color=Country_Name)) +
  geom_vline(xintercept=2010, size=1.2, color="darkgrey", alpha=.8) +
  geom_point(size=3) +
  geom_line(size=1) +
  scale_color_viridis_d() +
  scale_x_continuous(limits=c(2008,2019),
                     breaks=seq(2008,2018,2),
                     labels=c("2008", "Arab\nSpring", "2012", "2014", "2016", "2018")) +
* scale_y_continuous(limits=c(-2.0,0.5),
*                    breaks=seq(-2.0,1,0.5))
```
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-15-1.png" width="100%" />
]

---

.midi[
.pull-left[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. format the y-axis
> 11. **add end-point labels**
]
]

.pull-right[
```r
va_name <- va1 %>%
  group_by(Country_Name) %>%
  summarise(value1 = last(Estimate),
            value2 = nth(Estimate,11)) %>%
  mutate(value3 = c(-1.25, -1.55, .281),
         color=viridis(3))

va_name
```

```
## # A tibble: 3 x 5
##   Country_Name value1 value2 value3 color
##   <chr>         <dbl>  <dbl>  <dbl> <chr>
## 1 Egypt        -1.43  -1.31  -1.25  #440154FF
## 2 Libya        -1.46  -1.52  -1.55  #21908CFF
## 3 Tunisia       0.281  0.211  0.281 #FDE725FF
```
]

---

.midi[
.pull-left[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. format the y-axis
> 11. **add end-point labels**

```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color=Country_Name,
                     group=Country_Name)) +
  geom_vline(xintercept=2010, size=1.2, color="darkgrey", alpha=.8) +
  geom_point(size=3) +
  geom_line(size=1) +
  scale_color_viridis_d() # +
```
]

.pull-right[
```r
# continued from left column
# scale_x_continuous(limits=c(2008,2019),
#                    breaks=seq(2008,2018,2),
#                    labels=c("2008", "Arab\nSpring", "2012", "2014", "2016", "2018")) +
# scale_y_continuous(limits=c(-2.0,0.5),
#                    breaks=seq(-2.0,1,0.5),
*#                   sec.axis=sec_axis(~.,
*#                                     breaks=va_name$value3,
*#                                     labels=va_name$Country_Name))
```

<img src="figure_customization_files/figure-html/unnamed-chunk-19-1.png" width="100%" />
]
]

---

.pull-left[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. format the y-axis
> 11. add end-point labels
> 12. **remove legend**
]

.pull-right[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color=Country_Name,
                     group=Country_Name)) +
  geom_vline(xintercept=2010, size=1.2, color="darkgrey", alpha=.8) +
  geom_point(size=3) +
  geom_line(size=1) +
  scale_color_viridis_d() +
  scale_x_continuous(limits=c(2008,2019),
                     breaks=seq(2008,2018,2),
                     labels=c("2008", "Arab\nSpring", "2012", "2014", "2016", "2018")) +
  scale_y_continuous(limits=c(-2.0,0.5),
                     breaks=seq(-2.0,1,0.5),
                     sec.axis=sec_axis(~.,
                                       breaks=va_name$value3,
                                       labels=va_name$Country_Name)) +
* theme(legend.position="none")
```

<img src="figure_customization_files/figure-html/unnamed-chunk-20-1.png" width="100%" />
]

---

.pull-left[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. format the y-axis
> 11. add end-point labels
> 12. remove legend
> 13. **apply a thematic style**
]

.pull-right[
```r
base <- theme_bw() +
  theme(panel.grid.minor.x=element_blank(),
        panel.grid.minor.y=element_blank(),
        plot.title=element_text(face="bold",size=18, hjust=.5, family = "Source Sans Pro"),
        plot.subtitle = element_text(size=16, family="Source Sans Pro"),
        plot.caption=element_text(size=12, family="Source Sans Pro"),
        axis.title=element_text(size=16, family="Source Sans Pro"),
        axis.text=element_text(size=14, family="Source Sans Pro"),
        legend.text=element_text(size=14, family="Source Sans Pro"),
        strip.text=element_text(size=14, family="Source Sans Pro"),
        panel.border=element_blank(),
        axis.ticks = element_blank())
```
]

---

.pull-left[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. format the y-axis
> 11. add end-point labels
> 12. remove legend
> 13. **apply a thematic style**

Pop quiz: Why is the theme function placed after calling the base object?
]

.pull-right[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color=Country_Name,
                     group=Country_Name)) +
  geom_vline(xintercept=2010, size=1.2, color="darkgrey", alpha=.8) +
  geom_point(size=3) +
  geom_line(size=1) +
  scale_color_viridis_d() +
  scale_x_continuous(limits=c(2008,2019),
                     breaks=seq(2008,2018,2),
                     labels=c("2008", "Arab\nSpring", "2012", "2014", "2016", "2018")) +
  scale_y_continuous(limits=c(-2.0,0.5),
                     breaks=seq(-2.0,1,0.5),
                     sec.axis=sec_axis(~.,
                                       breaks=va_name$value3,
                                       labels=va_name$Country_Name)) +
* base +
  theme(legend.position="none")
```
]

---

.pull-left[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. format the y-axis
> 11. add end-point labels
> 12. remove legend
> 13. **apply a thematic style**
]

.pull-right[
<img src="figure_customization_files/figure-html/unnamed-chunk-22-1.png" width="100%" />
]

---

.pull-left[
> 1. Start with the data,
> 2. map year to the x-axis
> 3. map Voice and Accountability Index to y-axis
> 4. map Country to the color aesthetic
> 5. assign a point geometry to display the data mapping
> 6. assign a line geometry to display the data mapping
> 7. assign a colorblind-friendly palette
> 8. highlight a break in the timeline with a vertical line
> 9. format the x-axis
> 10. format the y-axis
> 11. add end-point labels
> 12. remove legend
> 13. apply a thematic style
> 14. **remove axis labels, add title and caption**
]

.pull-right[
```r
ggplot(data = va1,
       mapping = aes(x = Year,
                     y = Estimate,
                     color=Country_Name,
                     group=Country_Name)) +
  geom_vline(xintercept=2010, size=1.2, color="darkgrey", alpha=.8) +
  geom_point(size=3) +
  geom_line(size=1) +
  scale_color_viridis_d() +
  scale_x_continuous(limits=c(2008,2019),
                     breaks=seq(2008,2018,2),
                     labels=c("2008", "Arab\nSpring", "2012", "2014", "2016", "2018")) +
  scale_y_continuous(limits=c(-2.0,0.5),
                     breaks=seq(-2.0,1,0.5),
                     sec.axis=sec_axis(~.,
                                       breaks=va_name$value3,
                                       labels=va_name$Country_Name)) +
  base +
  theme(legend.position="none") +
  labs(x="",
       y="",
       title="Voice and Accountability",
       caption="Voice and Accountability ranges from -2.5 (weak) to 2.5 (strong)")
```

<img src="figure_customization_files/figure-html/unnamed-chunk-23-1.png" width="100%" />
]

---

### Final products

.pull-left[
Stata (see separate documentation for tutorial content)

<img src="Voice and Accountability - Stata.png" width="80%" />
]

.pull-right[
R

<img src="Voice and Accountability - R.png" width="80%" />
]

---
class: center, middle

# Thank you!
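
---

### Appendix: theme order, a quick check

The pop quiz about placing `theme()` after `base` can be explored with a small sketch. It is not part of the original walkthrough: it assumes only that ggplot2 is installed, and it uses the built-in `mtcars` data as a stand-in for `va1`. The object names `legend_reset` and `legend_removed` are made up for illustration.

```r
library(ggplot2)

# mtcars is a built-in dataset, used here only as a stand-in for va1
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# theme_bw() is a complete theme: added last, it replaces the earlier
# theme() tweak, so the legend setting goes back to the default
legend_reset <- p + theme(legend.position = "none") + theme_bw()

# added after the complete theme, theme() modifies it instead
legend_removed <- p + theme_bw() + theme(legend.position = "none")

# inspect the legend position stored on each plot object
print(legend_reset$theme$legend.position)
print(legend_removed$theme$legend.position)
```

On the ggplot2 versions we are aware of, the first call should report `"right"` and the second `"none"`, which is why the deck adds `theme(legend.position = ...)` after `base` rather than before it.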