Computer Science and Software Engineering Capstone Presentations

Fall Quarter

December 18, 2020

Ryan Larson

"Virtual Academic Advisor Data Generation"

(UWB CSS Faculty Research)

Faculty Advisor: Dr. Erika Parsons

Abstract

The Virtual Academic Advisor is a software system that aims to automate the process of scheduling for academic institutions. At many community colleges in particular, advising resources can be limited, which may cause issues for both faculty and students. A strategy that either fully automates or assists in the process of scheduling students benefits both groups: faculty may spend their time on other responsibilities, and students may have an easier time evaluating whether a schedule works for them.

Both a deterministic approach and a machine learning approach are used, either to produce training samples or to apply a trained model that generates student schedules based on their preferences. The goal of this research project was to design and implement a synthetic data generator to be used by the recommender system. The data generator must model the distributions of student preferences as accurately as possible and must scale to new preferences. A Bayesian Network approach was used in this project because it models conditional probabilities, rather than the full joint probability distribution, for discrete-valued variables. This is beneficial because, for networks that are not highly connected (as is the case for the student preferences used in this project), the conditional probability tables have sizes in the hundreds of entries, whereas the joint distribution could potentially require billions.
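
The table-size argument above can be illustrated with a small sketch. The variable names, the number of values per variable, and the chain-shaped structure are all hypothetical choices made for illustration; they are not the project's actual preference model.

```python
from math import prod

# Hypothetical setup: 10 discrete preference variables, 8 values each.
cardinality = {f"pref_{i}": 8 for i in range(10)}

# Hypothetical sparse structure: each variable has at most one parent.
parents = {"pref_0": []}
parents.update({f"pref_{i}": [f"pref_{i - 1}"] for i in range(1, 10)})


def joint_table_size(cardinality):
    """Entries needed to store the full joint distribution."""
    return prod(cardinality.values())


def cpt_sizes(cardinality, parents):
    """Entries in each conditional probability table: the variable's own
    number of values times the number of parent-value combinations."""
    return {
        var: cardinality[var] * prod(cardinality[p] for p in parents[var])
        for var in cardinality
    }


joint = joint_table_size(cardinality)   # 8**10 entries
cpts = cpt_sizes(cardinality, parents)
total_cpt_entries = sum(cpts.values())  # 8 + 9 * 64 entries
largest_cpt = max(cpts.values())

print(f"joint table entries: {joint:,}")
print(f"total CPT entries:   {total_cpt_entries:,}")
print(f"largest single CPT:  {largest_cpt}")
```

Under these assumptions the joint table needs over a billion entries, while the factored tables together need a few hundred. The gap narrows as nodes gain more parents, which is why the benefit depends on the network not being highly connected.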

This Bayesian Network model of student preferences can be used for statistical analysis of both the synthetic and true samples, and it provides a means of generating arbitrarily sized sample sets for training the machine learning model. By using semi-synthetic data (a combination of the estimated distributions and the distributions sampled from real users), the data generator is able to converge over time to the true population proportions and represent the student population accurately enough to generate samples that are ideally indistinguishable from real samples.
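
One possible way to combine an estimated distribution with real observations is sketched below for a single preference variable. It treats the estimate as a fixed number of pseudo-counts and adds the observed counts to them. This blending rule, the `prior_strength` parameter, and the example proportions are assumptions made for illustration; the abstract does not say how the project performs the combination.

```python
import random


def blend(estimated, observed_counts, prior_strength=20.0):
    """Blend an estimated distribution with observed counts.

    The estimate acts like `prior_strength` pseudo-observations, so its
    influence shrinks as the number of real observations grows.
    """
    n = sum(observed_counts.values())
    total = prior_strength + n
    return {
        value: (prior_strength * p + observed_counts.get(value, 0)) / total
        for value, p in estimated.items()
    }


def sample(distribution, k, rng):
    """Draw k values from a discrete distribution."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=k)


# Hypothetical initial estimate and hypothetical true population proportions.
estimated = {"morning": 0.5, "afternoon": 0.3, "evening": 0.2}
true_proportions = {"morning": 0.2, "afternoon": 0.3, "evening": 0.5}

rng = random.Random(0)
for n in (0, 50, 5000):
    real = sample(true_proportions, n, rng)           # stand-in for real users
    counts = {v: real.count(v) for v in estimated}
    blended = blend(estimated, counts)
    synthetic = sample(blended, 5, rng)               # semi-synthetic samples
    print(n, {v: round(p, 3) for v, p in blended.items()}, synthetic)
```

With no real observations the blended distribution equals the estimate, and with many observations it approaches the observed proportions. In a full Bayesian Network the same update would apply to each row of each conditional probability table.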

Updated December 16, 2020