Core Course

Learning Analytics Foundations: Predicting Student Success

Beginner level

No prior experience required

Flexible schedule

3 weeks, 6 to 8 hours per week

Instructor feedback

Get guidance on your work

Verified certificate

Share on LinkedIn

*Proof of full-time student enrollment required. Acceptable forms of ID include a letter from your university’s registrar office or an unofficial transcript. Email your documents to learnlab-help@lists.andrew.cmu.edu.

What you will learn

  • Conduct exploratory data analysis to uncover trends and prepare educational datasets for modeling.
  • Apply regression and classification methods to predict student outcomes.
  • Interpret model outputs, accuracy measures, and limitations in an educational context.
  • Use predictive findings to inform interventions, course redesign, and learner support decisions.

Course description

Educational data can reveal which learners are struggling, which patterns matter, and where instructional improvements are most needed. For many teams, the challenge is not access to data but knowing how to explore it, model it, and turn it into decisions that improve learning.

In this course, you will build a practical foundation in learning analytics by exploring educational datasets, applying regression and classification methods, and interpreting model results in context. The course is designed to help you move from descriptive reporting to evidence-based prediction that can inform interventions, product decisions, and course improvement.

Syllabus

Module 1: Understanding the Nature of Educational Data
  • Compare and contrast Learning Analytics and Educational Data Science.
  • Explain the different types of educational data and their significance for improving student learning experiences and outcomes.
  • Use LLM-based coding assistants such as Copilot and Codex to help write code.
  • <b>Hands-on assignment:</b> Load and explore a real educational dataset using Python/R. Examine its structure, handle missing values, and visualize student performance data.
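
As a rough illustration of the Module 1 workflow, the sketch below loads a tiny inline CSV with Python's standard library, fills missing values with the column mean, and prints summary statistics. The column names (`student_id`, `score`, `clicks`) and values are hypothetical placeholders, not the course dataset, and mean imputation is only one of several reasonable choices. The assignment itself may use pandas or R instead.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real course dataset will differ.
RAW_CSV = """student_id,score,clicks
s1,78,120
s2,,45
s3,91,210
s4,64,
s5,85,160
"""


def load_rows(text):
    """Parse CSV text into a list of dicts (values stay as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def impute_mean(rows, column):
    """Replace empty values in `column` with the mean of the observed values."""
    observed = [float(r[column]) for r in rows if r[column] != ""]
    mean = statistics.mean(observed)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else mean
    return mean


def summarize(rows, column):
    """Return basic descriptive statistics for a numeric column."""
    values = [r[column] for r in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


rows = load_rows(RAW_CSV)
for col in ("score", "clicks"):
    impute_mean(rows, col)
    print(col, summarize(rows, col))
```
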
Module 2: Exploratory Data Analysis
  • Define student success and evaluate different measures for success in educational settings.
  • Differentiate between explanatory and predictive models in the context of learning analytics.
  • Implement a predictive modeling lifecycle to predict student success in educational datasets.
  • Apply data analysis techniques and tools to explore and preprocess educational data for predicting student success.
  • <b>Hands-on assignment:</b> Merge multiple Open University datasets, encode and preprocess features, and build a Random Forest model to predict student success. Reflect on feature importance and what it suggests for interventions.
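
The Module 2 preprocessing steps (merging tables, defining a success label, and encoding categorical features) can be sketched in plain Python. The records and field names below are loosely modeled on the Open University Learning Analytics Dataset, but treat them as illustrative assumptions. Counting "Pass" and "Distinction" as success is one working definition, not the only one. The Random Forest step is not shown and would typically use a library such as scikit-learn.

```python
# Hypothetical records loosely modeled on Open University-style tables.
student_info = [
    {"id_student": 1, "gender": "F", "age_band": "0-35", "final_result": "Pass"},
    {"id_student": 2, "gender": "M", "age_band": "35-55", "final_result": "Fail"},
    {"id_student": 3, "gender": "F", "age_band": "35-55", "final_result": "Distinction"},
    {"id_student": 4, "gender": "M", "age_band": "0-35", "final_result": "Withdrawn"},
]
# Activity log rows; one student may have several rows.
vle_clicks = [
    {"id_student": 1, "sum_click": 30},
    {"id_student": 1, "sum_click": 12},
    {"id_student": 2, "sum_click": 5},
    {"id_student": 3, "sum_click": 50},
]

# One working definition of "success" (an assumption for this sketch).
SUCCESS_RESULTS = {"Pass", "Distinction"}


def total_clicks(logs):
    """Aggregate click counts per student before merging."""
    totals = {}
    for row in logs:
        totals[row["id_student"]] = totals.get(row["id_student"], 0) + row["sum_click"]
    return totals


def one_hot(rows, column):
    """Return sorted categories and a function mapping a value to a 0/1 vector."""
    categories = sorted({r[column] for r in rows})
    return categories, lambda value: [int(value == c) for c in categories]


def build_features(info, logs):
    """Left-join click totals onto student info, then encode features and label."""
    clicks = total_clicks(logs)
    gender_cats, encode_gender = one_hot(info, "gender")
    age_cats, encode_age = one_hot(info, "age_band")
    header = (["total_clicks"]
              + [f"gender_{c}" for c in gender_cats]
              + [f"age_{c}" for c in age_cats])
    X, y = [], []
    for r in info:
        # Students with no activity rows get 0 clicks (left join with a fill value).
        features = [clicks.get(r["id_student"], 0)]
        features += encode_gender(r["gender"]) + encode_age(r["age_band"])
        X.append(features)
        y.append(int(r["final_result"] in SUCCESS_RESULTS))
    return header, X, y


header, X, y = build_features(student_info, vle_clicks)
print(header)
for row, label in zip(X, y):
    print(row, label)
```
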
Module 3: Classifiers for Course Level Data
  • Explain how decision trees, random forests, Bayesian models, and logistic regression work, and analyze their benefits and drawbacks.
  • Evaluate the performance of predictive models using appropriate metrics, and apply strategies to mitigate overfitting and bias.
  • Implement a predictive model using Python.
  • <b>Hands-on assignment:</b> Identify at-risk students by engineering features, training Random Forest and Logistic Regression models, and comparing model performance using classification metrics.
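
Comparing classifiers in Module 3 relies on standard metrics that can be computed directly from true and predicted labels. This minimal sketch treats "at risk" as the positive class (label 1). The labels and the two "models" are made up for illustration. In practice you would likely use `sklearn.metrics` rather than hand-rolled functions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1; returns 0.0 when a ratio is undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = at risk, 0 = not at risk.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]  # predictions from one hypothetical classifier
model_b = [1, 1, 1, 1, 0, 1, 1, 0]  # predictions from another

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, classification_metrics(y_true, preds))
```

Both hypothetical models reach 0.75 accuracy, but `model_b` catches every at-risk student (recall 1.0) at the cost of more false alarms (lower precision). That trade-off is often what matters when deciding whom to flag for support.
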
Module 4: Regression Models
  • Given an example scenario, evaluate whether linear regression is an appropriate method.
  • Use simple linear regression to predict a continuous dependent variable with the least squares method.
  • Evaluate how well a regression line fits a dataset by examining visual patterns and interpreting common fit metrics.
  • <b>Hands-on assignment:</b> Build and interpret both simple and multiple regression models to predict student exam scores. Visualize relationships, check assumptions, and evaluate fit using RMSE and cross-validation.
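
The least squares fit, RMSE, and cross-validation from Module 4 can be sketched with the standard library alone. The study-hours and exam-score values are invented for illustration. The contiguous, unshuffled folds are a simplifying assumption. Because this data is sorted by hours, the first and last folds force some extrapolation, which is one reason library tools such as scikit-learn's `KFold` offer shuffling.

```python
import math


def fit_least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # intercept = y_bar - slope * x_bar
    return y_bar - slope * x_bar, slope


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_rmse(xs, ys, k):
    """Average held-out RMSE over k contiguous folds (no shuffling)."""
    n = len(xs)
    fold_size = n // k
    fold_errors = []
    for i in range(k):
        start = i * fold_size
        stop = n if i == k - 1 else start + fold_size  # last fold takes any remainder
        b0, b1 = fit_least_squares(xs[:start] + xs[stop:], ys[:start] + ys[stop:])
        preds = [b0 + b1 * x for x in xs[start:stop]]
        fold_errors.append(rmse(ys[start:stop], preds))
    return sum(fold_errors) / k


# Hypothetical data: weekly study hours vs. exam score.
hours = [2, 4, 5, 7, 8, 10, 11, 13]
scores = [55, 61, 64, 70, 72, 80, 82, 88]

b0, b1 = fit_least_squares(hours, scores)
fitted = [b0 + b1 * h for h in hours]
print(f"score = {b0:.2f} + {b1:.2f} * hours")
print("training RMSE:", round(rmse(scores, fitted), 3))
print("4-fold CV RMSE:", round(k_fold_rmse(hours, scores, 4), 3))
```

A cross-validated RMSE noticeably above the training RMSE is a hint that the model fits the training data better than it will fit new students.
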
Module 5: Course Project or Final Exam

At the end of the course, you'll have the opportunity to complete a short project in which you analyze a dataset to predict student success. The project lets you apply the fundamentals from the modules in a larger, more authentic context. It is self-graded, and you will receive a sample solution.

Alternatively, you may take a final exam with 20 randomized questions. The exam can be taken multiple times, and your highest score will count.

You may also complete both the course project and the final exam; whichever earns the higher score will count toward the certificate.

Meet the instructor

Dr. John Stamper

Associate Professor & MSLE Program Director
Carnegie Mellon University

John Stamper is a faculty member in the Human-Computer Interaction Institute at Carnegie Mellon University in Pittsburgh, Pennsylvania. He is the Director of the Master of Science in Learning Engineering (MSLE) degree program and the Technical Director of the DataShop.

Dr. Paulo Carvalho

Assistant Professor
Carnegie Mellon University

Paulo Carvalho is an Assistant Professor in the Human-Computer Interaction Institute at Carnegie Mellon University. His research explores how AI can revolutionize learning by creating engaging, practice-first environments. He uses data analytics and computational modeling to understand student learning, motivation, and interest and to develop precise models that inform better learning experiences. He is currently investigating how generative AI can empower these practice-focused approaches, boost engagement, and free teachers to provide personalized support. His research is funded by the National Science Foundation, IES, Schmidt Futures, the Walton Foundation, and Google.