CANDEV Data Challenge
Ottawa 2020 - Saturday - January 18
GitHub Crash Course
Get your GitHub repository up and running right from the gate! Learn the basics of GitHub and how to submit your solution.
By: Jean-Philippe Tissot
Required: Git, any Git GUI Client
FastText: Text Classification Speedrun
FastText efficiently learns word representations to classify text quickly. This presentation will walk you through a simple pipeline, from cleaning raw text to predicting labels.
By: Joanne Yoon
Required: any C++ compiler // Libraries: FastText (Python)
Developing a Dashboard using R-Shiny
A step-by-step guide to creating an R Shiny dashboard. There will be a demo of an R Shiny product developed by the Agriculture division of StatCan that allows for crosscutting analysis of the agriculture industry.
By: Omar Youssouf
Required: R/RStudio // Packages: reshape2, spdplyr, tidyverse, shinythemes, shiny, leaflet, rgdal, magrittr, rgeos, httr, stringi, readxl, plotly, ggplot2, lubridate
Intro to RegEx: String searching for data extraction and cleaning
An introduction to string searching using regular expressions, with a focus on extracting and cleaning data from atypical datasets. Basic regex concepts will be introduced and put into practice through exercises.
By: Margarita Bozhinova
No Required Software
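As a rough illustration of the kind of extraction and cleaning described above, here is a minimal sketch using Python's standard `re` module. It is not taken from the workshop material: the sample records, the field layout, and the names `PHONE`, `DATE`, and `extract` are all hypothetical.

```python
import re

# Hypothetical messy records; the layout is an assumption for illustration only.
raw_records = [
    "Name: Alice Tremblay | Phone: (613) 555-0142 | Joined 2019-03-07",
    "name:Bob Singh|phone: 613.555.0199|joined 2020-01-18",
]

# Phone: three digit groups (3-3-4) with optional parentheses and
# optional spaces, dots, or hyphens between the groups.
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")

# Date in YYYY-MM-DD form, bounded so it does not match inside longer tokens.
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract(record):
    """Return a normalized phone number and the date found in one record."""
    phone = PHONE.search(record)
    date = DATE.search(record)
    return {
        # Rejoin the captured digit groups in one consistent format.
        "phone": "-".join(phone.groups()) if phone else None,
        "date": date.group(0) if date else None,
    }


cleaned = [extract(r) for r in raw_records]
for row in cleaned:
    print(row)
```

Capturing the digit groups separately, rather than matching the whole number as one string, is what lets differently formatted inputs be rewritten into a single consistent form.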
PowerBI Building Blocks
Get up to speed with PowerBI and learn how to import and manipulate data to create dynamic data visuals and dashboards.
By: Raphael Duteau
No Required Software
Topic Modelling: Latent Dirichlet Allocation in R
Latent Dirichlet Allocation is a hierarchical Bayesian model that infers the underlying topics in a collection of documents and assigns each document its inferred topic proportions.
By: Ken Chu
Required: R/RStudio // Packages: text2vec, dplyr, tidyr, ComplexHeatmap, ggplot2, gplots, circlize, xml2, stopwords (and dependencies)
Getting Data from the Internet with Python: APIs, Requests, and HTML Parsing
In this workshop, you will learn to use Python to access data from APIs (namely, geocoding and travel directions from OpenRouteService), download data from internet links, and perform basic web scraping of tabular data using BeautifulSoup.
By: Joseph Kuchar
Required: Anaconda (for Python) // Libraries: BeautifulSoup
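To give a feel for scraping tabular data, the sketch below pulls the rows out of an HTML table. It is only an illustration and not the workshop's code: to stay dependency-free it uses Python's standard `html.parser` module as a stand-in for BeautifulSoup, and the sample HTML and the `TableParser` class are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Province</th></tr>
  <tr><td>Ottawa</td><td>Ontario</td></tr>
  <tr><td>Gatineau</td><td>Quebec</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)

# Treat the first row as the header and pair it with each later row.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

With BeautifulSoup installed, the same idea is usually expressed more compactly by searching the parsed page for `tr` and `td` elements instead of handling parser events by hand.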
Sunday - January 19
How to Pitch Efficiently
For CANDEV’s final lap you don’t want to be wasting time spinning your wheels, driving in circles, or idling on unimportant information. Learn how you can tailor your message to propel you to the finish line!
By: Midia Shikh and Anthony Daigle
No Required Software
How can you prepare?
The CANDEV Data Challenge is a great opportunity for students to learn about new technologies and statistical methods. We are offering workshops that give students short, efficient, and specialized tutorials to help them develop their solution and pitch it to the judges. To get the most out of the workshops, we ask students to download the following software (please consult the workshop schedule for the specific requirements):
- R and RStudio
- Anaconda (for Python 3.7)
- Microsoft Power BI Desktop
- Notepad++
For more information:
Visit the Statistics Canada website.
Follow us using #CANDEV: LinkedIn | Twitter | Facebook | Instagram | YouTube
Need assistance? Contact us.