
CANDEV Data Challenge

Ottawa 2020 - Saturday - January 18

14:00 – 15:00
C308, CRX

GitHub Crash Course

Get your GitHub repository up and running right out of the gate! Learn the basics of GitHub and how to submit your solution.

By: Jean-Philippe Tissot

Required: Git, any Git GUI Client

15:00 – 16:00
C407, CRX

FastText: Text Classification Speedrun

FastText efficiently learns word representations to classify text quickly. This presentation will walk you through a simple pipeline, from cleaning raw text to predicting labels.

By: Joanne Yoon

Repository

Required: any C++ compiler // Libraries: FastText (Python)

15:00 – 16:00
C308, CRX

Developing a Dashboard using R-Shiny

A step-by-step guide to creating an R Shiny dashboard. There will be a demo of an R Shiny product developed by the Agriculture division of StatCan that allows for cross-cutting analysis of the agriculture industry.

By: Omar Youssouf

Repository

Required: R/RStudio // Packages: reshape2, spdplyr, tidyverse, shinythemes, shiny, leaflet, rgdal, magrittr, rgeos, httr, stringi, readxl, plotly, ggplot2, lubridate

16:00 – 17:00
C407, CRX

Intro to RegEx: String searching for data extraction and cleaning

An introduction to string searching using regular expressions, with a focus on extracting and cleaning data from atypical datasets. Basic regex concepts will be introduced and put into practice through exercises.

By: Margarita Bozhinova

Repository

No Required Software
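
As a taste of the kind of extraction and cleaning the RegEx workshop description mentions, here is a minimal Python sketch. The sample records, the pattern names `PHONE` and `DATE`, and the helper `clean_record` are hypothetical illustrations and are not taken from the workshop materials.

```python
import re

# Hypothetical messy records; not taken from the workshop materials.
raw_records = [
    "Name:  Alice   | Phone: (613) 555-0142 | Joined: 2019-03-07",
    "name: Bob|phone: 613.555.0199|joined: 2020-01-18",
]

# Three groups of digits separated by optional spaces, dots, or hyphens.
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")
# ISO-style date: four-digit year, two-digit month, two-digit day.
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def clean_record(line):
    """Return a normalized phone number and join date, or None where missing."""
    phone = PHONE.search(line)
    date = DATE.search(line)
    return {
        "phone": "-".join(phone.groups()) if phone else None,
        "joined": date.group(0) if date else None,
    }


cleaned = [clean_record(record) for record in raw_records]
for row in cleaned:
    print(row)
```

Capture groups pull out only the digits, so differently punctuated phone numbers are normalized to a single format regardless of how they were typed.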

16:00 – 17:00
C308, CRX

PowerBI Building Blocks

Get up to speed with PowerBI and learn how to import and manipulate data to create dynamic data visuals and dashboards.

By: Raphael Duteau

Repository

No Required Software

17:00 – 18:00
C407, CRX

Topic Modelling: Latent Dirichlet Allocation in R

Latent Dirichlet Allocation is a hierarchical Bayesian model that infers the underlying topics in a collection of documents and assigns each document its inferred topic proportions.

By: Ken Chu

Repository

Required: R/RStudio // Packages: text2vec, dplyr, tidyr, ComplexHeatmap, ggplot2, gplots, circlize, xml2, stopwords (and dependencies)

17:00 – 18:00
C308, CRX

Getting Data from the Internet with Python: APIs, Requests, and HTML Parsing

In this workshop, you will learn to use Python to access data from APIs (namely, geocoding and travel directions from OpenRouteService), download data from internet links, and perform basic web scraping of tabular data with BeautifulSoup.

By: Joseph Kuchar

Repository

Required: Anaconda (for Python) // Libraries: BeautifulSoup

Sunday - January 19

EN 9:30-9:55 // FR 10:00-10:25
C308, CRX

How to Pitch Efficiently

For CANDEV’s final lap, you don’t want to waste time spinning your wheels, driving in circles, or idling on unimportant information. Learn how you can tailor your message to propel you to the finish line!

By: Midia Shikh and Anthony Daigle

No Required Software

How can you prepare?

The CANDEV Data Challenge is a great opportunity for students to learn about new technologies and statistical methods. We are offering short, efficient, and specialized workshops that will help students develop their solution and pitch it to the judges. To get the most out of the workshops, we ask students to download the required software ahead of time (please consult the workshop schedule for each workshop's specific requirements).

For more information:

Visit the Statistics Canada website.

Follow us using #CANDEV: LinkedIn | Twitter | Facebook | Instagram | YouTube

Need assistance? Contact us.