May 09, 2019

8:00 AM - 9:00 AM

Registration, Breakfast & Opening Remarks

9:00 AM - 5:00 PM

Machine Learning with Caret
Max Kuhn, RStudio

Join Max Kuhn on a tour through Machine Learning in R. You'll learn about data preparation, model fitting, model assessment and predictions. Prior experience with lm is enough to get started and learn advanced modeling techniques.

9:00 AM - 5:00 PM

Geospatial Statistics and Mapping in R
Kaz Sakamoto, Lander Analytics

Geospatial expert and Columbia Professor Kaz Sakamoto is leading this class on all things GIS. You'll learn about map projections, spatial regression, plotting interactive heatmaps with leaflet and working with shapefiles.

9:00 AM - 5:00 PM

Introduction to Survival Analysis
Emily Zabor, Memorial Sloan Kettering

Time-to-event outcomes are common in a variety of statistical applications, but the statistical techniques needed to appropriately analyze data in the presence of censoring or when predictor variables are not observed at baseline are not always taught as part of a standard statistics curriculum. This workshop will introduce the statistical techniques needed to address common questions in the context of time-to-event outcomes. Topics covered will include types of censoring, the Kaplan-Meier estimator of the survival function, Cox proportional hazards regression, analysis of time-dependent covariates, and competing risks methods to handle situations where more than one type of event is possible. All common statistical analyses will be demonstrated in R, including use of the survival package and the ggsurvplot function from the survminer package.

9:00 AM - 5:00 PM

Git for Data Science
Dan Chen, Virginia Tech

Daniel Chen, author of Pandas for Everyone, has given multiple talks at the New York R Conference about the data science workflow. In this workshop he'll teach how to use Git and project management for better organization and faster iteration.

May 10, 2019

8:00 AM - 8:50 AM

Breakfast & Open Registration

8:50 AM - 9:00 AM

Opening Remarks

9:00 AM - 9:20 AM

Building the Tidyverse From Scratch: Teaching Data Cleaning and Visualization with R-inspired Custom Scratch Blocks
Ludmila Janda, Amplify

At Amplify, we are developing a series of middle school computer science lessons that will teach students to clean data and make graphs using a new visual programming interface based on Scratch by MIT. In the lessons, students will use manipulable blocks rather than written code. We have developed a new set of code blocks that are inspired by both the verbs from the dplyr package and the grammar of graphics approach used by ggplot. This talk will discuss how I have helped a diverse team of stakeholders draw from the principles of the tidyverse and develop this unique approach to teaching data science practices.

9:25 AM - 9:45 AM

To Be Announced

9:50 AM - 10:10 AM

Emily Dodwell, AT&T Labs Research

10:10 AM - 10:40 AM

Break & Networking

10:40 AM - 11:00 AM

Emily Robinson, DataCamp

11:05 AM - 11:25 AM

How My Wife's Research Led My Student to Write a Multilevel Lasso Package
Jared P. Lander, Lander Analytics

11:30 AM - 11:50 AM

Dan Chen, Virginia Tech

1:00 PM - 1:20 PM

Krista Watts, United States Military Academy

1:25 PM - 2:05 PM

Andrew Gelman, Columbia

2:05 PM - 2:35 PM

Break & Networking

2:35 PM - 2:55 PM

Wes McKinney, Ursa Labs

3:00 PM - 3:20 PM

Artificial Intelligence Driven Drug Discovery
Michelle Gill, BenevolentAI

At BenevolentAI, we use machine learning to facilitate drug discovery. This talk will introduce the drug discovery process and explain how machine learning maps to these stages. We will then cover challenges specifically related to using machine learning for scientific discovery, and conclude with a specific application of reinforcement learning to generate novel compounds in silico.

3:25 PM - 3:45 PM

David Madigan, Columbia

3:45 PM - 4:15 PM

Break & Networking

4:15 PM - 4:35 PM

Reproducible Research in Finance with R
Soumya Kalra, R-Ladies NYC

4:40 PM - 5:00 PM

Heather Nolis, Nolis, LLC

5:00 PM - 5:10 PM

Closing Remarks

May 11, 2019

9:00 AM - 9:50 AM

Breakfast & Open Registration

9:50 AM - 10:00 AM

Opening Remarks

10:00 AM - 10:20 AM

I’ll have what she’s having (and other models of consumer behavior)
Gabriela Hempfling, Chop't

This talk will focus on predictive models in R and more broadly their use in industry. As our experience online gets more personalized, organizations are increasingly making assumptions about who we are and how to engage us. Meanwhile, consumers have limited access to the context that drives their online experience. Gabriela sets out to use that context to build some personal analytics and ask if we can use our understanding of data science to become more self-aware.

10:25 AM - 10:45 AM

Jacqueline Nolis, Nolis, LLC

10:50 AM - 11:20 AM

Break & Networking

11:20 AM - 11:40 AM

Namita Nandakumar, Philadelphia Eagles

11:45 AM - 12:05 PM

Romain Francois

12:10 PM - 12:30 PM

An Introduction to Statistical Decision Theory, or: #ABIYLFOYPE - Always Be Integrating Your Loss Function Over Your Posterior Estimate
Jim Savage, Schmidt Futures

Making sound choices can be thought of as choosing between competing uncertain forecasts, each being the consequence of the choice. Statistical decision theory offers a principled method for making choices: we simply compare, in a formal setting, how we feel about each forecast. In this talk, Jim walks through the ingredients required for conducting formal decision analysis in R and Stan. He provides examples from his career in making frontier markets and social impact investments.

1:40 PM - 2:00 PM

Noam Ross, rOpenSci & EcoHealth Alliance

Many data scientists operate at the interface between two cultures and workflows: programmatic data science and WYSIWYG office applications. This noisy interface impedes reproducibility and is often maddening to practitioners in both camps. I will discuss failures and successes of crossing this uncanny valley and present a series of new packages for working collaboratively in mixed teams for reproducibility, joy and harmony.

2:05 PM - 2:25 PM

Emily Zabor, Memorial Sloan Kettering

2:30 PM - 2:50 PM

Max Kuhn, RStudio

2:50 PM - 3:20 PM

Break & Networking

3:20 PM - 3:40 PM

Brooke Watson, ACLU

3:45 PM - 4:05 PM

Nicole Phelan, WeWork

4:10 PM - 4:30 PM

To Be Announced

4:30 PM - 4:40 PM

Closing Remarks