
Exploratory Data Analysis and Classifiers in Educational Data Science

Learn how to conduct exploratory data analysis of educational data and how to create classifiers

Start Any Time

Work at your own pace, with instructors available to help answer any questions.


Approximately 6 weeks, 3-4 hours/week


$1500 Professional Rate
$500 Full-time Student Rate

*Proof of full-time student enrollment is required. Acceptable forms of ID include a letter from your university’s registrar’s office or an unofficial transcript. Email your documents to

Certificate Course Description:

In this course, you will learn how to conduct exploratory data analysis of educational data and how to create and apply classifiers. After a brief introduction to Learning Analytics and Educational Data Science, you will learn how to define measures of success and learning, create models that can capture learning, and apply these to existing datasets.
Building on this foundation, you will learn about different types of classifiers that can be applied to student data to identify processes and outcomes, how to evaluate their predictions, and how to implement them on existing datasets.

Module 1: Introduction to Learning Analytics and Educational Data Science
  • Compare and contrast Learning Analytics and Educational Data Science
  • Explain the significance and different types of data in enhancing student learning experiences and outcomes
  • Use LLMs such as Copilot and Codex to write code


Module 2: Exploratory Data Analysis
  • Define student success and evaluate different measures for success in educational settings
  • Differentiate between explanatory and predictive models in the context of learning analytics
  • Implement a predictive modeling lifecycle to predict student success in educational datasets
  • Apply data analysis techniques and tools to explore and preprocess educational data for predicting student success
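As a rough illustration of the Module 2 objectives (defining success, then exploring and preprocessing data), the following Python sketch uses only the standard library on a handful of hypothetical records. The field names (clicks, avg_score, final_result), the records themselves, and the choice to count “Pass” and “Distinction” as success are illustrative assumptions, not course material; a real analysis would more typically load a full dataset with a library such as pandas.

```python
from statistics import mean, median

# Hypothetical toy records; field names and values are illustrative assumptions.
RECORDS = [
    {"student": "s1", "clicks": 420, "avg_score": 78.0, "final_result": "Pass"},
    {"student": "s2", "clicks": 35, "avg_score": None, "final_result": "Withdrawn"},
    {"student": "s3", "clicks": 610, "avg_score": 91.0, "final_result": "Distinction"},
    {"student": "s4", "clicks": 120, "avg_score": 42.0, "final_result": "Fail"},
    {"student": "s5", "clicks": 300, "avg_score": 65.0, "final_result": "Pass"},
]

# One possible (assumed) definition of "success"; other definitions are equally valid.
SUCCESS_LABELS = {"Pass", "Distinction"}


def preprocess(records):
    """Impute missing scores with the median and add a binary success label."""
    observed = [r["avg_score"] for r in records if r["avg_score"] is not None]
    fill = median(observed)
    cleaned = []
    for r in records:
        row = dict(r)  # copy so the raw records stay untouched
        if row["avg_score"] is None:
            row["avg_score"] = fill
        row["success"] = int(row["final_result"] in SUCCESS_LABELS)
        cleaned.append(row)
    return cleaned


def summarize(rows, feature):
    """Mean of a feature for successful (1) versus unsuccessful (0) students."""
    groups = {1: [], 0: []}
    for r in rows:
        groups[r["success"]].append(r[feature])
    return {label: mean(vals) for label, vals in groups.items() if vals}


rows = preprocess(RECORDS)
success_rate = mean(r["success"] for r in rows)
print(f"success rate: {success_rate:.2f}")
print("mean clicks by success:", summarize(rows, "clicks"))
```

Treating “Withdrawn” as a non-success is itself a modeling decision; a different definition of success would change the label and every summary built on it.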


Module 3: Classifiers for Course Level Data
  • Explain how decision trees, random forests, Bayesian models, and logistic regression work, and analyze their benefits and drawbacks
  • Evaluate the performance of predictive models using appropriate metrics and strategies to mitigate overfitting and bias
  • Implement a predictive model using Python
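To make the Module 3 objectives more concrete, here is a minimal from-scratch sketch of one of the listed classifiers, logistic regression, trained by batch gradient descent with an L2 penalty (one common way to curb overfitting) and evaluated on a held-out split with accuracy, precision, and recall. The synthetic features (engagement, prior), the data generator, and all hyperparameters are hypothetical; in practice you would more likely use a library implementation such as scikit-learn’s LogisticRegression.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n=200, seed=0):
    """Hypothetical data: two standardized features and a noisy binary outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        engagement = rng.gauss(0, 1)
        prior = rng.gauss(0, 1)
        latent = 1.5 * engagement + 1.0 * prior + rng.gauss(0, 0.5)
        X.append([engagement, prior])
        y.append(int(latent > 0))  # 1 = "success" in this toy setup
    return X, y


def train_logistic(X, y, lr=0.1, epochs=1000, l2=0.01):
    """Batch gradient descent on the L2-regularized log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n  # the bias is conventionally left unpenalized
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]


def evaluate(y_true, y_pred):
    """Accuracy, precision, and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


X, y = make_synthetic()
split = int(0.8 * len(X))  # simple hold-out split; k-fold CV is a common alternative
w, b = train_logistic(X[:split], y[:split])
metrics = evaluate(y[split:], predict(w, b, X[split:]))
print(metrics)
```

Precision and recall are reported alongside accuracy because, when predicting at-risk students, the classes are often imbalanced and accuracy alone can look deceptively high.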


Module 4: Course Project or Final Exam

At the end of the course, you’ll have the opportunity to complete a short project in which you choose to analyze either the “Open University Learning Analytics dataset” to predict at-risk students or the “Student Performance in Portuguese Schools dataset” to predict student success. The project lets you apply the fundamentals from the modules to a larger, more authentic context. It is self-graded, and you will receive a sample solution.

Alternatively, you can take a final exam consisting of 20 questions. The exam can be taken multiple times, and each attempt draws a new random selection of questions from a question pool.

You are also free to complete both the course project and the final exam; whichever earns the higher score will count toward the certificate.

There are no prerequisites, but experience with a programming language (e.g., Python) will be helpful.

Researchers, educational data scientists, learning analysts, instructional designers, and students who want to learn about various techniques and considerations for handling educational datasets, as well as anyone interested in edtech.

What you'll learn

This course will help you:

  • Acquire hands-on skills in exploratory data analysis tailored to educational datasets
  • Analyze specific predictive classifiers such as decision trees, random forests, Bayesian models, and logistic regression, evaluating their suitability, strengths, and limitations in educational contexts
  • Apply knowledge gained throughout the course to real-world datasets
  • Evaluate the performance of predictive models, considering ethical dimensions and accuracy

Course Instructors

Dr. John Stamper

He is an Associate Professor of Human-Computer Interaction at Carnegie Mellon University. Dr. Stamper has a PhD in Computer Science from the University of North Carolina at Charlotte. His main area of research is focused on using “Big Data” from educational systems to improve learning. He is also the lead researcher behind DataShop, which is the largest open repository of log data from learning systems…



Upon successful completion of the program, participants will receive a verified digital certificate of completion from Carnegie Mellon University’s Open Learning Initiative.

In addition to the knowledge and immediately applicable frameworks you will gain by attending your selected courses, you will benefit from:

  • A digital, verified version of your Executive Certificate (Smart Certificate) you can add to your resume and LinkedIn
  • Networking with a global group of your peers and instructors for advancing your career

Register Now

Register and start taking the course in four steps:

1. Enter your email address 

2. Click this link to go to Carnegie Mellon University’s Open Learning Initiative, where you can register and try out the course for 48 hours before payment is due.