Exploratory Data Analysis, Regression, and Classification for Education

Start Any Time
Work at your own pace, with instructors available to answer any questions you have.

Duration
Approximately 6 weeks, 3-4 hours/week

Fee
$750 - Professional Rate
$300 - Full-time Student Rate*
Referral & Group Discounts Available**
*Proof of full-time student enrollment required. Acceptable forms of ID include a letter from your university’s registrar's office or an unofficial transcript. Email your documents to learnlab-help@lists.andrew.cmu.edu.
**Have you taken one of our courses before? Refer a friend or colleague and get 20% off any future course – they’ll get 20% off a course of their choosing, too! Just have the person you refer email us at learnlab-help@lists.andrew.cmu.edu with your name and email address. Contact us about group discounts.
Certificate Course Description:
In this course, you’re going to learn how to make sense of educational data by exploring, modeling, and predicting student learning outcomes. You’ll start by examining what kinds of data we can collect from learning environments and how to explore it using techniques like visualizations and summary statistics. Then, you’ll learn how to use linear regression to model relationships between variables, and how to build classifiers that can predict important outcomes like student success or engagement.
Each module includes hands-on programming assignments that guide you through analyzing real educational datasets using Python or R. These assignments are designed to help you apply what you learn immediately using the same tools and methods that researchers and learning engineers use every day.
Module 1: Understanding the Nature of Educational Data
- Compare and contrast Learning Analytics and Educational Data Science
- Explain the significance and different types of data in enhancing student learning experiences and outcomes
- Use LLMs like Copilot and Codex for writing code
Hands-on assignment: Load and explore a real educational dataset using Python/R. Examine structure, clean missing values, and visualize student performance data.
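As a rough illustration of the kind of first pass this assignment describes, the sketch below counts missing values, summarizes the observed ones, and fills the gaps with the median. The records, the `quiz` column, and the helper names `summarize` and `impute_median` are made up for this example and are not taken from the course materials; the assignments themselves may use other tools, such as pandas or R.

```python
from statistics import mean, median

# Hypothetical records (not a course dataset): None marks a missing quiz score.
records = [
    {"student": "s1", "quiz": 80.0},
    {"student": "s2", "quiz": None},
    {"student": "s3", "quiz": 90.0},
    {"student": "s4", "quiz": 70.0},
]


def summarize(rows, column):
    """Count missing values in a column and summarize the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    return {
        "n_rows": len(rows),
        "n_missing": len(rows) - len(observed),
        "mean": mean(observed),
        "median": median(observed),
    }


def impute_median(rows, column):
    """Return new rows with missing values replaced by the column median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


summary = summarize(records, "quiz")
cleaned = impute_median(records, "quiz")
print(summary)
```

Median imputation is only one of several ways to handle missing values; whether it is appropriate depends on why the values are missing.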
Module 2: Exploratory Data Analysis
- Define student success and evaluate different measures for success in educational settings
- Differentiate between explanatory and predictive models in the context of learning analytics
- Implement a predictive modeling lifecycle to predict student success in educational datasets
- Apply data analysis techniques and tools to explore and preprocess educational data for predicting student success
Hands-on assignment: Merge multiple Open University datasets, encode and preprocess features, and build a Random Forest model to predict student success. Reflect on feature importance and interventions.
Module 3: Classifiers for Course Level Data
- Explain how decision trees, random forests, Bayesian models, and logistic regression work, and analyze their benefits and drawbacks
- Evaluate the performance of predictive models using appropriate metrics and strategies to mitigate overfitting and bias
- Implement a predictive model using Python
Hands-on assignment: Identify at-risk students by engineering features, training Random Forest and Logistic Regression models, and comparing model performance using classification metrics.
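To make "comparing model performance using classification metrics" concrete, here is a minimal sketch that computes accuracy, precision, recall, and F1 from true and predicted labels. The label lists and the names `model_a` and `model_b` are invented for illustration; they stand in for the predictions of any two classifiers and do not come from the course datasets.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a model never predicts the positive class.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = at risk, 0 = not at risk.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [1, 1, 0, 0, 0, 0, 1, 1]  # predictions from one hypothetical model
model_b = [1, 0, 0, 0, 0, 0, 0, 0]  # predictions from a more conservative one

metrics_a = classification_metrics(y_true, model_a)
metrics_b = classification_metrics(y_true, model_b)
print(metrics_a)
print(metrics_b)
```

In this toy example the second model has perfect precision but misses most at-risk students, which is why recall often matters more than accuracy when the goal is early intervention.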
Module 4: Regression Models
- Given an example scenario, evaluate whether it would be appropriate to utilize linear regression
- Use simple linear regression to predict a continuous dependent variable through the least squares method
- Evaluate how well a regression line fits a dataset by examining visual patterns and interpreting common fit metrics
Hands-on assignment: Build and interpret both simple and multiple regression models to predict student exam scores. Visualize relationships, check assumptions, and evaluate fit using RMSE and cross-validation.
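For readers new to the least squares method, the sketch below fits a simple linear regression in plain Python and reports the RMSE on the same data. The study-hours and exam-score values are invented for illustration, and the function names are hypothetical. The RMSE shown is an in-sample figure, not the cross-validated estimate the assignment asks for.

```python
from math import sqrt


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def rmse(x, y, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothetical data: weekly study hours and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

slope, intercept = fit_simple_ols(hours, scores)
error = rmse(hours, scores, slope, intercept)
print(f"score = {intercept:.1f} + {slope:.1f} * hours, RMSE = {error:.2f}")
```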
Module 5: Course Project or Final Exam
At the end of the course, you’ll have the opportunity to complete a small project in which you analyze a dataset to predict student success. The project lets you apply the fundamentals from the modules in a larger, more authentic context. It is self-graded, and you will receive a sample solution.
Alternatively, you may take a final exam with 20 randomized questions. The exam can be taken multiple times, and your highest score will count.
You are also free to do both the course project and the final exam; the higher of the two scores will count toward the certificate.
No prerequisites, but experience with a programming language (e.g., Python) will be helpful.
This course is intended for researchers, educational data scientists, learning analysts, instructional designers, and students who want to learn about techniques and considerations for handling educational datasets, as well as anyone interested in edtech.
What you'll learn
This course will help you:
- Conduct exploratory data analysis to uncover trends and prepare educational datasets for modeling
- Apply linear regression and classification techniques to make data-driven predictions about student outcomes
- Evaluate model performance using appropriate metrics while addressing bias and ethical concerns
- Complete hands-on coding assignments with real-world datasets using Python/R and LLM tools like Copilot/Codex
Course Instructors
Dr. John Stamper
is an Associate Professor of Human-Computer Interaction at Carnegie Mellon University. Dr. Stamper has a PhD in Computer Science from the University of North Carolina at Charlotte. His main area of research is focused on using “Big Data” from educational systems to improve learning. He is also the lead researcher behind DataShop, which is the largest open repository of log data from learning systems….
Dr. Paulo Carvalho
is an assistant professor in the Human-Computer Interaction Institute. His research explores how AI can revolutionize learning through the creation of engaging, practice-first and practice-only environments. Using data analytics and computational modeling, he investigates patterns in student learning, motivation, and interest to develop precise models that enhance educational experiences. His current work examines how generative AI can transform practice-focused approaches, simultaneously boosting student engagement while enabling teachers to provide more personalized support…..
Certificate
Upon successful completion of the program, participants will receive a verified digital certificate of completion from Carnegie Mellon University’s Open Learning Initiative.
In addition to the knowledge and immediately applicable frameworks you will gain by attending your selected courses, you will benefit from:
- A digital, verified version of your Executive Certificate (Smart Certificate) you can add to your resume and LinkedIn
- Networking with a global group of your peers and instructors for advancing your career
Register Now
Register and start taking the course in three steps:
1. Enter your name and email address.
2. Create your account here to access our learning platform.
Have questions? Our learning engineers are here to answer them at our monthly live AMA events! Join us at 4 PM EST on First Fridays, or 10 AM EST on Third Mondays. Registration required.