Computer Science and Software Engineering Capstone Presentations
Fall Quarter
December 18, 2020
Ryan Larson, "Virtual Academic Advisor Data Generation" (UWB CSS Faculty Research)
Faculty Advisor: Dr. Erika Parsons
Abstract
The Virtual Academic Advisor is a software system
that aims to automate the process of scheduling for academic institutions. In
many community colleges in particular, the availability of resources for
advising can be limited, and this may cause issues for both faculty and
students. A strategy that either fully automates or assists in the process
of scheduling students benefits both groups: faculty may spend their time on
other responsibilities, and students may have an easier time evaluating
whether a schedule works for them. Both a deterministic approach and a
machine learning approach are used, either to produce training samples or to
generate student schedules from a trained model based on students'
preferences. The goal of
this research project was to design and implement a synthetic data generator
to be used by the recommender system. The data generator must model the
distributions of student preferences as accurately as possible and scale to
new preferences. A Bayesian Network approach was used in this project because
it models conditional probabilities, rather than the full joint probability
distribution, for discrete-valued variables. This is beneficial because in
networks that are not highly connected, as is the case for the student
preferences used in this project, the conditional probability tables are
reduced to sizes in the hundreds, instead of the potentially billions of
entries needed for the joint
distribution. This Bayesian Network model of student preferences can be used
both for statistical analysis of the synthetic and true samples and for
generating arbitrarily sized sample sets to train the machine learning
model. By using
semi-synthetic data (a combination of the estimated distributions and the
distributions sampled from real users), the data generator can converge over
time to the true population proportions and represent the student population
accurately enough to generate samples that are ideally indistinguishable
from real ones.
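
The table-size argument and the sample generation described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the variable names, network structure, and probabilities below are hypothetical, chosen only to show how a sparsely connected Bayesian network stores far fewer numbers than a full joint table, and how synthetic students can be drawn by ancestral sampling (sampling each variable after its parents).

```python
import random
from math import prod

# Hypothetical discrete preference variables and their possible values.
DOMAINS = {
    "employment": ["none", "part_time", "full_time"],
    "credit_load": ["light", "full"],
    "time_of_day": ["morning", "afternoon", "evening"],
}

# Parents of each variable, listed in topological order (parents first).
PARENTS = {
    "employment": (),
    "credit_load": ("employment",),
    "time_of_day": ("employment",),
}

# Conditional probability tables (made-up numbers):
# tuple of parent values -> weights over the variable's values.
CPTS = {
    "employment": {(): [0.3, 0.4, 0.3]},
    "credit_load": {
        ("none",): [0.2, 0.8],
        ("part_time",): [0.5, 0.5],
        ("full_time",): [0.9, 0.1],
    },
    "time_of_day": {
        ("none",): [0.5, 0.4, 0.1],
        ("part_time",): [0.3, 0.4, 0.3],
        ("full_time",): [0.1, 0.2, 0.7],
    },
}


def cpt_entries(domains, parents):
    """Count the probabilities stored across all conditional tables."""
    return sum(
        len(domains[var]) * prod(len(domains[p]) for p in parents[var])
        for var in domains
    )


def joint_entries(domains):
    """Count the cells in the full joint table over all variables."""
    return prod(len(values) for values in domains.values())


def sample_student(rng):
    """Draw one synthetic student, sampling parents before children."""
    student = {}
    for var, parents in PARENTS.items():
        key = tuple(student[p] for p in parents)
        student[var] = rng.choices(DOMAINS[var], weights=CPTS[var][key])[0]
    return student


def sample_students(n, seed=0):
    """Generate n synthetic students with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_student(rng) for _ in range(n)]


# A hypothetical larger network: 16 four-valued preferences in a chain,
# each depending only on the previous one.
chain_domains = {f"pref_{i}": ["a", "b", "c", "d"] for i in range(16)}
chain_parents = {
    f"pref_{i}": ((f"pref_{i - 1}",) if i else ()) for i in range(16)
}

if __name__ == "__main__":
    print(cpt_entries(chain_domains, chain_parents))  # 4 + 15 * 16 = 244
    print(joint_entries(chain_domains))               # 4 ** 16 = 4294967296
    print(sample_students(3))
```

In the hypothetical chain, the conditional tables hold 244 entries while the full joint table would need about 4.3 billion, which is the kind of gap the abstract describes. Calling sample_students with a larger n yields a sample set of any desired size.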
Updated December 16, 2020