Data Science and Advanced Analytics

Melanie Murphy, Brian Calhoon, Dan Killian

What is Data Science? Advanced analytics?

  • Data science: The extraction of learning from data

  • Advanced analytics: The application of methods that learn from the data

Robust external demand for advanced analytics

  • USAID’s 2011 evaluation policy remains in force, offering opportunities for impact evaluations

  • USAID references analytics in its solicitations (e.g., simulation, social network analysis, modeling)

  • New data sources require deeper understanding, processing, and analysis

Robust internal demand for advanced analytics

  • We lost our analysts!

  • MSI junior and mid-level staff are interested in gaining skills and experience along a dedicated experience path

  • MSI needs to document and build its past performance and capabilities for new business

  • We need to justify and defend our analytical choices

Data science and advanced analytics

Three primary areas of action

  • Apply techniques according to best practice to extract learning
  • Build capacity of interested staff
  • Document past practice to help win new business

Application of techniques to extract learning

  • Multiple regression: Explain an outcome of interest after accounting for the influence of other factors
  • Ensemble regression: Predict an outcome of interest using a collection of regression models
  • Factor / principal component analysis: Reduce a set of correlated variables to a smaller number of "factors" or "components"
  • Item response theory: Validate that a set of items measures the hypothesized construct
  • Conditional inference tree: Search for statistically significant splits in the data across different pathways
  • Random forest: Identify the most salient predictors of an outcome of interest using a collection of conditional inference trees
  • Causal tree: Identify how a treatment effect varies across subgroups, in the form of a decision tree
  • Causal forest: Identify the most salient variation in treatment effects using a collection of causal trees
  • Bayesian network: Estimate the conditional probabilities linking a set of variables
  • Bayesian priors: Use stakeholder knowledge to co-create prior (baseline) values of an outcome of interest
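To make the first and third techniques above concrete, here is a minimal sketch using NumPy on synthetic data only. The variable names (`treated`, `income`, `indicators`) and the data-generating process are hypothetical, chosen to show (1) how multiple regression recovers an effect "after accounting for the influence of other factors," and (2) how principal component analysis reduces correlated variables to fewer components. It is an illustration, not a description of MSI's actual workflow; real analyses would typically use a dedicated statistics package.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000

# --- Multiple regression (synthetic, hypothetical data) ---
# A confounder `income` affects both who is treated and the outcome y.
income = rng.normal(size=n)
treated = (income + rng.normal(size=n) > 0).astype(float)
y = 2.0 * treated + 3.0 * income + rng.normal(size=n)  # true treatment effect = 2.0


def ols(X, y):
    """Ordinary least squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive = ols(treated, y)                                # ignores income: biased upward
adjusted = ols(np.column_stack([treated, income]), y)  # accounts for income

print(f"naive effect:    {naive[1]:.2f}")
print(f"adjusted effect: {adjusted[1]:.2f}")  # should land near 2.0

# --- Principal component analysis (synthetic, hypothetical data) ---
# Five correlated indicators driven by one latent factor plus noise.
latent = rng.normal(size=(n, 1))
indicators = latent @ np.ones((1, 5)) + 0.5 * rng.normal(size=(n, 5))
centered = indicators - indicators.mean(axis=0)

# Singular values of the centered data give each component's share of variance.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(np.round(explained, 2))  # the first component should dominate
```

The naive estimate absorbs part of the confounder's effect, while the adjusted estimate sits close to the true value. In the PCA step, one component carries most of the variance because all five indicators share a single underlying factor.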

Build capacity of interested staff

  • Structured trainings, ideally at least once per year
  • 4-12 brown bag sessions per year
  • Weekly office hours / on-demand consultation

Document past practice and identify new opportunities

  • Inventory of techniques, activities, and analysts
  • Review the existing task pipeline for opportunities to apply these techniques
  • Demonstrate new techniques to expand our past performance and capabilities

Tasks

  • Build and maintain trackers recording techniques, where they were used, and which staff implemented them
  • Identify interested and capable staff
  • Develop communications materials on current performance and capabilities
  • Develop how-to manual / cookbook on analytical workflow
  • Develop content for training and brown bag sessions
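One possible shape for the technique tracker in the first task is sketched below. The record fields (technique, activity, staff) follow the task description; the class and function names, and the placeholder records ("Activity A", "Analyst 1"), are hypothetical and for illustration only. A spreadsheet with the same three columns would serve the same purpose.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TechniqueUse:
    """One record in a hypothetical technique tracker."""
    technique: str                                  # e.g. "Random forest"
    activity: str                                   # activity where the technique was applied
    staff: list[str] = field(default_factory=list)  # staff who implemented it


def index_by_technique(records):
    """Group tracker records by technique (useful for past-performance write-ups)."""
    index = defaultdict(list)
    for record in records:
        index[record.technique].append(record)
    return dict(index)


def staff_with_experience(records, technique):
    """Return the sorted names of staff who have applied a given technique."""
    return sorted({name for r in records if r.technique == technique for name in r.staff})


# Placeholder records, not real activities or staff.
records = [
    TechniqueUse("Multiple regression", "Activity A", ["Analyst 1"]),
    TechniqueUse("Random forest", "Activity B", ["Analyst 1", "Analyst 2"]),
    TechniqueUse("Multiple regression", "Activity C", ["Analyst 3"]),
]

by_technique = index_by_technique(records)
print({t: [r.activity for r in rs] for t, rs in by_technique.items()})
# {'Multiple regression': ['Activity A', 'Activity C'], 'Random forest': ['Activity B']}
print(staff_with_experience(records, "Multiple regression"))
# ['Analyst 1', 'Analyst 3']
```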

Interested? Please follow the link in the chat.

Thank you!