Exploratory Data Analysis, Regression, and Classification for Education


Start Any Time
Work at your own pace, with instructors available to help answer any questions.

Duration
Approximately 6 weeks, 3-4 hours/week

Fee*
$750 Professional Rate
$300 Full-time Student Rate**
*Have you taken one of our courses before? Refer a friend or colleague and get 20% off any future course – they’ll get 20% off a course of their choosing, too! Just have the person you refer email us at learnlab-help@lists.andrew.cmu.edu with your name and email address.
**Proof of full-time student enrollment required. Acceptable forms of ID include a letter from your university’s registrar’s office or an unofficial transcript. Email your documents to learnlab-help@lists.andrew.cmu.edu.
Certificate Course Description:
In this course, you will learn how to conduct exploratory data analysis of educational data, apply linear regression techniques, and create classification models. After a brief introduction to Learning Analytics and Educational Data Science, you will learn how to define measures of success and learning, create models that can capture these constructs, and apply them to real-world educational datasets.
Building on this foundation, you will explore how linear regression can be used to predict continuous learning outcomes and how various classifiers can be applied to categorize student data, identify learning patterns, and forecast educational performance. You will also learn how to evaluate model accuracy and fairness, and how to implement these models using authentic data from educational settings.
Module 1: Understanding the Nature of Educational Data
- Compare and contrast Learning Analytics and Educational Data Science
- Explain the significance and different types of data in enhancing student learning experiences and outcomes
- Use LLMs like Copilot and Codex for writing code
Module 2: Exploratory Data Analysis
- Define student success and evaluate different measures for success in educational settings
- Differentiate between explanatory and predictive models in the context of learning analytics
- Implement a predictive modeling lifecycle to predict student success in educational datasets
- Apply data analysis techniques and tools to explore and preprocess educational data for predicting student success
Module 3: Classifiers for Course Level Data
- Explain how decision trees, random forests, Bayesian models, and logistic regression work, and analyze their benefits and drawbacks
- Evaluate the performance of predictive models using appropriate metrics and strategies to mitigate overfitting and bias
- Implement a predictive model using Python
Module 4: Regression Models
- Apply linear regression in appropriate contexts
- Use simple linear regression to predict a continuous dependent variable
Module 5: Course Project or Final Exam
At the end of the course, you will have the opportunity to complete a small project. You can choose to analyze either the “Open University Learning Analytics dataset” to predict at-risk students or the “Student Performance in Portuguese Schools dataset” to predict student success. The project lets you apply the fundamentals from the modules in a larger, more authentic context. It is self-graded, and you will receive a sample solution.
Alternatively, you can take a final exam consisting of 20 questions. The exam can be taken multiple times, and each attempt draws new questions at random from a question pool.
You are also free to do both the course project and the final exam; the higher of the two scores will count toward the certificate.
There are no prerequisites, but experience with a programming language (e.g., Python) will be helpful.
This course is intended for researchers, educational data scientists, learning analysts, instructional designers, and students who want to learn about techniques and considerations for handling educational datasets, as well as anyone interested in edtech.
What you'll learn
This course will help you:
- Conduct exploratory data analysis to uncover trends and prepare educational datasets for modeling
- Apply linear regression and classification techniques to make data-driven predictions about student outcomes
- Evaluate model performance using appropriate metrics while addressing bias and ethical concerns
- Implement end-to-end data analysis workflows on real-world educational data using Python or R
Course Instructors

Dr. John Stamper
is an Associate Professor of Human-Computer Interaction at Carnegie Mellon University. Dr. Stamper has a PhD in Computer Science from the University of North Carolina at Charlotte. His main area of research is focused on using “Big Data” from educational systems to improve learning. He is also the lead researcher behind DataShop, which is the largest open repository of log data from learning systems…

Dr. Paulo Carvalho
is an assistant professor in the Human-Computer Interaction Institute. His research explores how AI can revolutionize learning through the creation of engaging, practice-first and practice-only environments. Using data analytics and computational modeling, he investigates patterns in student learning, motivation, and interest to develop precise models that enhance educational experiences. His current work examines how generative AI can transform practice-focused approaches, simultaneously boosting student engagement while enabling teachers to provide more personalized support…
Certificate
Upon successful completion of the program, participants will receive a verified digital certificate of completion from Carnegie Mellon University’s Open Learning Initiative.

In addition to the knowledge and immediately applicable frameworks you will gain by attending your selected courses, you will benefit from:
- A digital, verified version of your Executive Certificate (Smart Certificate) you can add to your resume and LinkedIn
- Networking with a global group of your peers and instructors for advancing your career
Register Now
Register and start taking the course in three steps:
1. Enter your name and email address.
2. Create your account here to access our learning platform.