DIRECT Courses

DIRECT begins with a foundational two-course sequence designed to prepare students for their interdisciplinary projects. These courses were newly designed for DIRECT trainees.

Scientists, engineers, and other technical professionals require skills in computing and data analysis to do their jobs. We refer to these as data science skills.

These two courses teach graduate students the software engineering skills needed to do research in data science fields and to succeed as technical professionals in the 21st century. In particular, the courses teach how to approach computational research with reproducibility in mind: how to create shareable and reusable research projects that incorporate both computation and data. The courses also give students a survey of machine learning methods, including supervised and unsupervised methods.


Software Engineering for Molecular Data Scientists (ChemE 546)
Students will learn the following skills:

  • Developing software so that others can use it, including documentation, installing packages, automating setup, and running computational studies.
  • Creating technical specifications for what a program should do (its use cases) and how this is accomplished (software design).
  • Creating, updating, and sharing a project using version control (specifically git and GitHub) for collaborative software development.
  • Programming using the Python scientific stack, including numpy, pandas, and matplotlib.
  • Developing unit tests that validate important aspects of the project implementation, and, more broadly, using test-driven development to build software.
  • Searching for, evaluating, and integrating externally developed Python packages into a project, as well as creating your own Python packages.

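As an illustration of the unit-testing skill listed above, here is a minimal sketch using Python's built-in unittest module. The function molar_mass and the ATOMIC_WEIGHTS table are hypothetical examples chosen for a molecular-data setting, not material taken from the course; each test checks one behavior the function is expected to have.

```python
import unittest

# Hypothetical lookup table of standard atomic weights (g/mol) for a few elements.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(counts):
    """Return the molar mass (g/mol) of a molecule given its element counts,
    e.g. {"H": 2, "O": 1} for water. Hypothetical helper for illustration."""
    total = 0.0
    for element, n in counts.items():
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element: {element}")
        if n < 0:
            raise ValueError("element counts must be non-negative")
        total += ATOMIC_WEIGHTS[element] * n
    return total


class TestMolarMass(unittest.TestCase):
    def test_water(self):
        # 2 * 1.008 + 15.999 = 18.015
        self.assertAlmostEqual(molar_mass({"H": 2, "O": 1}), 18.015, places=3)

    def test_empty_molecule_is_zero(self):
        self.assertEqual(molar_mass({}), 0.0)

    def test_unknown_element_raises(self):
        with self.assertRaises(ValueError):
            molar_mass({"Xx": 1})

    def test_negative_count_raises(self):
        with self.assertRaises(ValueError):
            molar_mass({"H": -1})
```

Saved in a file such as test_molar_mass.py, the tests can be run with `python -m unittest test_molar_mass`. In a test-driven workflow, the test methods would be written first and the function filled in until they pass.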
Data Science Methods for Clean Energy Research (ChemE 545)
Students will learn the following skills:

  • Basic overview of statistical reasoning and methods, including distributions, hypothesis testing, and error analysis for multiple data types.
  • Basic introduction to data visualization methods.
  • Introduction to a wide range of machine learning methods with direct applications to problems in the design, synthesis, and characterization of materials for clean energy.
  • Hands-on experience using the Python library scikit-learn to apply ML methods to real-world data sets related to the design of materials for energy storage and conversion.
  • Basic introduction to data management strategies.

The courses emphasize a hands-on learning approach in which class time is often used for problem solving in small groups. The first seven weeks teach the skills described above. The remaining three weeks are devoted to a group class project, in which student teams in DIRECT design and execute a computational research project.

  • Using the shell (command line): http://swcarpentry.github.io/shell-novice/
  • General Python overview: http://swcarpentry.github.io/python-novice-inflammation/


Data Science Option

Thanks to DIRECT, the Data Science Option is now available in select departments. The Data Science Option is the mechanism by which students get credit on their transcripts for their focus on data science within their majors.