Excerpt from course description

(Big) Data Curation, Pipelines, and Management


The complete course description will be ready in summer 2022.

To gain consistent benefits from machine learning models in business, it is essential to move data science projects from experimentation to production by building automated machine learning pipelines. A standard machine learning pipeline consists of data preparation, model training, model evaluation and validation. The tasks of the automated pipeline range from collecting real-time streaming data to model and output management.

In this course, you will learn the life cycle of a data science project and the responsibilities of different roles in a data science team. You will also learn how to build an efficient end-to-end data science project. Particular focus will be placed on what is typically the most time-consuming part, namely data curation, cleaning, and management, including different database infrastructures and SQL-style queries.

Course content

Data science project life cycle

  • Understanding the business problem
  • Data curation, collection, and preprocessing
    • Handling missing values, outliers, and other data anomalies
    • Database infrastructures, and exploring and curating data with SQL-style queries
    • Making analysis queries that answer business questions
    • Creating training data using queries
  • Feature engineering
  • Model building and deployment
  • Roles and responsibilities in a data science project
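
As a rough illustration of the data curation topics listed above, the following minimal sketch uses Python's built-in sqlite3 module to run an analysis query and to build a small training table with a simple treatment of missing values. The table, column names, and values are hypothetical and chosen only for illustration; they are not taken from the course material, and imputing missing amounts as 0 is just one of several possible choices.

```python
import sqlite3

# Hypothetical toy table; names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, churned INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("a", 10.0, 0), ("a", None, 0), ("b", 30.0, 1), ("b", 50.0, 1)],
)

# Analysis query answering a business question:
# total revenue per customer (SUM ignores NULL amounts).
revenue = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

# Training data created by a query: one row per customer,
# with missing amounts replaced by 0 via COALESCE (an assumed choice).
training_rows = conn.execute(
    """
    SELECT customer,
           COUNT(*)                 AS n_orders,
           AVG(COALESCE(amount, 0)) AS mean_amount,
           MAX(churned)             AS label
    FROM orders
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()

print(revenue)
print(training_rows)
```

The same pattern carries over to other SQL-style databases; only the connection step would differ.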


This is an excerpt from the complete course description. If you are an active student at BI, you can find the complete course descriptions, with information on e.g. learning goals, learning process, curriculum, and exam, at portal.bi.no. We reserve the right to make changes to this description.