Machine Learning with Caret
Max Kuhn

Join Max Kuhn on a tour through Machine Learning in R. You'll learn about data preparation, model fitting, model assessment and predictions. Prior experience fitting models with lm() is all you need to get started and learn advanced modeling techniques.

Geospatial Statistics and Mapping in R
Kaz Sakamoto

Geospatial expert and Columbia professor Kaz Sakamoto is leading this class on all things GIS. You'll learn about map projections, spatial regression, plotting interactive heatmaps with leaflet and working with shapefiles.

Spatial Packages in R

  • sp
  • sf

Spatial Data formats

  • shapefiles
  • geojson

Coordinate Systems

  • geographic coordinate systems
  • projected coordinate systems

Map Making

  • tmap
  • leaflet

Introduction to Geoprocessing

  • Buffer
  • Clip
  • Intersect
  • Near

Spatial Statistics

  • Spatial Regression
  • Spatial Autocorrelation

Git for Data Science
Dan Chen

Daniel Chen, author of Pandas for Everyone, has given multiple talks at the New York R Conference about the data science workflow. In this workshop he'll teach you how to use Git and project management for better organization and faster iteration.

Let's start over and learn Git from the beginning, with the goal of using it to track collaborative work. But Git can be used for more than tracking code and data science projects. For example, if you're a student, you can use it as a place to store your class notes and materials. Let's learn Git so you can be less afraid of it and see how it can integrate into your life.

While Git is mainly thought of as a collaboration tool, there's a lot you can do with it on your own without collaborating with other people. Many Git workflows (e.g., Git-flow) can be used on solo projects too. Thus, we'll focus on the skills of using Git on your own, with remotes (e.g., GitHub), and with branches. In essence, you will be "collaborating with yourself" before we go through the process of collaborating with other people.

Let's move beyond "memorizing .. shell command and type[ing] them to sync up... [and when errors occur], sav[ing] your work elsewhere, delet[ing] the project, and download[ing] a fresh copy".

Git on your own

  • Creating a git repository
  • Adding and committing files
  • Looking at differences between files
  • Looking at your history
  • Moving around your history
    • Reverting changes
    • Undeleting files

Working with remotes

  • Going from your computer to a remote (e.g., GitHub, BitBucket, GitLab)
  • Syncing your files by pushing and pulling
  • Conflicts

Git with branches

  • Creating branches
  • Moving around different branches
  • Making commits in branches
  • Merging branches
  • Using branches with remotes
    • Pull requests (a.k.a. merge requests)
    • Merging pull requests
  • Syncing up with your remote

Collaborating with Git

  • Adding collaborators
    • Repository contributor
    • Forking
  • Making changes directly to master
  • Making changes using branches

Introduction to Survival Analysis
Elizabeth Sweeney

Time-to-event outcomes are common in a variety of statistical applications, but the statistical techniques needed to appropriately analyze data in the presence of censoring, or when predictor variables are not observed at baseline, are not always taught as part of a standard statistics curriculum. This workshop will introduce the statistical techniques needed to address common questions in the context of time-to-event outcomes. Topics covered will include types of censoring, the Kaplan-Meier estimator of the survival function, Cox proportional hazards regression, analysis of time-dependent covariates, and competing risks methods to handle situations where more than one type of event is possible. All common statistical analyses will be demonstrated in R, including use of the survival and survminer packages (the latter provides the ggsurvplot function).

Part 1

Introduction to survival time data

  • Types of censoring
  • Components of survival data
  • Dealing with dates in R
  • Introduction to data example

Part 2

The survival function

  • Kaplan-Meier estimate
  • Estimating median survival times
  • Estimating survival times at specific timepoints
  • Testing for between group differences
  • Kaplan-Meier plots in R

Part 3

Cox proportional hazards regression

  • The proportional hazards assumption
  • Incorporating time-dependent covariates
  • Interpreting hazard ratios
  • Alternatives to Cox, time permitting

Part 4

Competing risks methods

  • Cause-specific hazards
  • Cumulative incidence
  • Cumulative incidence plots in R
  • Competing risks regression