Training

Empower your research journey with expert-led courses

Welcome to our training page, dedicated to advancing your skills in Research Computing, Data Services, and Research Software Engineering. Offered university-wide through the Harvard Training Portal, our courses cover a wide range of essential topics designed to enhance your research capabilities. Our curriculum ranges from the fundamentals of Python and R programming to advanced optimization techniques, and also includes cloud computing, containerization, version control, high-performance computing (HPC), and workflow optimization. Whether you’re a novice researcher or an experienced academic, our tailored courses help you stay at the forefront of modern research methods and tools.

Learn

Courses

We have paused our training program due to resource constraints and plan to resume it at a later date. Please check back for updates, or email regulated_data_services@harvard.edu with inquiries. In the meantime, we will make course materials available in the listings below wherever possible.

Past Courses
Version Control for Researchers: Mastering Git in Scientific Workflow
Course description: This course is designed specifically for the research community. Dive deep into the world of version control to streamline your research processes, ensuring data integrity, collaboration efficiency, and reproducibility. This course will introduce the fundamentals of Git, showcase its relevance in research scenarios, and guide participants through hands-on exercises to integrate Git seamlessly into their research workflows.
Pre-Requisites: Basic familiarity with command-line interfaces and prior experience with any programming or scripting language.
Additional Information: This is a virtual event on Zoom, and attendees should have access to a laptop or desktop. Please contact mohammadi_shad@harvard.edu if you have any questions.
Instructor: Naeem Khoshnevis
Date/Time: 12/05/2023, 11am – 1pm
Containerization and Research Computing
Course description: Containers are a key tool in modern software deployment and dependency management. However, they are often underutilized in academic environments. This session will cover what containers are, common container technologies (such as Docker and Singularity/Apptainer), and how containerization can enable reproducible scientific computing on your computer, on HPC systems, and in the cloud.
Pre-Requisites: Comfort with basic Linux commands, familiarity with HPC systems (such as the Cannon cluster), and experience with a scientific programming language such as Python are encouraged for this course.
Additional Information: Students and researchers can use the FAS Research Computing Cannon cluster, another HPC system, or their own laptop/PC to run the code. Please contact mohammadi_shad@harvard.edu if you have any questions.
Instructor: Ben Sabath
Date/Time: 11/29/2023, 11am – 1pm
R Code Optimization
Course description: R is an excellent tool for a variety of computational and statistical tasks, including data processing, data analysis, and data wrangling. Like all computational tools, R can sometimes be slow, delaying computational jobs and analysis workflows. This workshop reviews techniques and methods for optimizing and speeding up analyses in R, shortening computation time so that results arrive faster. The workshop also covers optimization techniques for memory-intensive and CPU-intensive jobs on HPC clusters, as well as general best practices for optimizing R code.
Pre-Requisites: Students can run the R code on the FAS Research Computing Cannon cluster, another HPC cluster, or their own PC/laptop.
Additional Information: This is a virtual event on Zoom, and attendees should have access to a laptop or desktop. Due to time constraints, we will not be able to set up accounts in the workshop or troubleshoot login issues. Contact mohammadi_shad@harvard.edu with any questions.
Instructor: Mahmood Shad
Date/Time: 11/30/2023, 11am – 1pm
Data Handling in Python
Course description: Python is a versatile and powerful general-purpose programming language, and people with a broad spectrum of skills use it for research and work. Handling data in Python can be challenging when working with large datasets or parsing data files efficiently. This course takes a deep dive into Python best practices for working with datasets, with an emphasis on tools for handling large datasets. It covers intermediate to advanced topics, from data structures to working efficiently with large data files.
Pre-Requisites: Basic familiarity with Python is required. Participants should be able to use an integrated development environment (IDE) or a command-line text editor, such as Vim/Nano/Emacs, without assistance.
Additional Information: Students and researchers can use the FAS Research Computing Cannon cluster, another HPC system, or their own laptop/PC to run the Python code. Please contact mohammadi_shad@harvard.edu if you have any questions.
Instructor: Samah Karim
Date/Time: 12/01/2023, 11am – 1pm
Introduction to Data Processing Workflow Languages
Course description: Dive into the exciting world of data processing workflows! This course provides an easy-to-understand introduction to different workflow languages. Learn the differences between API-based and command-based pipelines, explore pipeline topologies, and get hands-on experience with popular descriptive languages such as CWL, WDL, and Nextflow. By the end, you’ll build your own simple pipelines in CWL and Nextflow.
Pre-Requisites: Basic understanding of data processing concepts. No prior knowledge of specific workflow languages required.
Additional Information: Please contact mohammadi_shad@harvard.edu if you have any questions.
Instructor: Michael Bouzinier
Date/Time: 12/06/2023, 11am – 1pm