1. Introduction
In a classroom, students interact and discuss directly with teachers and with each other, allowing them to gain an understanding of their performance. However, in a digital environment, students may struggle to determine if their work is on par with that of their peers or whether it meets the teacher’s expectations. Establishing groups that allow students to engage in constructive conversations, reflect on their approaches to the subject of study, and move into deeper learning is an issue that teachers face in physical and digital classrooms [
1]. Since each student may participate in many activities over long periods of time, datasets that record such activities are usually high-dimensional, large, and complex.
Learning analytics (LA) has evolved to help in the interpretation of educational data [
2]. Siemens et al. [
3] (p. 1) define LA as “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs”. To help teachers interpret LA more effectively, the data can be visualized in dashboards by visual learning analytics (VLA) tools. Such dashboards are defined by Schwendimann et al. [
4] (p. 8) as “a single display that aggregates different indicators about learner(s), learning process(es), and/or learning context(s) into one or multiple visualizations”. Therefore, VLA can be used to obtain a graphical understanding of what a teacher or a student can do to improve motivation, self-direction, learning effectiveness, student performance, and teacher engagement [
5].
In this research, we improved the SBGTool, a similarity-based grouping VLA tool that we presented earlier [
6], by conducting a user study to learn more about teachers’ requirements for such a pedagogical tool. The improvements include adding sorting options to the dashboard table, adding a dropdown component to group students into classrooms, refining some visualizations, and considering several color palettes to support color blindness. We also evaluated the effectiveness of the proposed tool. The performance level in this tool is defined by the percentage of correct answers in the different subjects, which varies for each student. The tool is meant to support analysts (e.g., teachers) in grouping students in meaningful ways, that is, by performance level and by activity level, to possibly support collaborative learning. The activity level is defined by the total number of answers across all subjects per student.
The research questions that drive our work are formulated as follows: (
1) How can we categorize students into various groups based on their learning outcomes? (2) How can we find the most difficult and easiest subjects among all subjects? (3) How can we compare individual students’ learning activities? In our data, the activity of each student reflects the number of questions she/he answers within a given subject, and the total numbers of correct and incorrect answers for each student in the different subjects are reflected in the student’s learning outcome. In our previous publication [
6], we showed that, by using SBGTool, teachers can obtain a comprehensive summary of the students’ activities and the numbers of correct/incorrect responses for the entire year (on a weekly basis). They can filter the features and extract detailed information about a single student, a certain week, a subject, a student answer, and a result. Additionally, they can find the most difficult and easiest subjects, with the fewest and most correct responses, respectively. This analysis workflow may help the teacher to achieve meaningful decision-making in terms of pedagogical interventions.
The rest of the paper is organized as follows:
Section 2 describes the related work in this field.
Section 3 describes the dataset and preprocessing of the data.
Section 4 and
Section 5, respectively, present the design of the SBGTool v2.0 and a use case on how a teacher could quickly gain information about the students’ learning activities, group students in different performance levels, and compare two individual students’ learning achievements.
Section 6 provides a user study and a discussion on the result, and
Section 7 describes our conclusions.
2. Related Work
VLA is an active research area, with tools being proposed to aid teachers’ and students’ decision-making about learning processes by considering diverse approaches, datasets, and scenarios. Ez-zaouia et al. [
7], for instance, developed the EMODA dashboard based on a postprocessing (offline) approach that allows the teacher to track and comprehend the emotions of students throughout an online learning session. The purpose is to better understand how emotions change during the synchronous learning session. In [
8], Govaerts et al. established the Student Activity Meter (SAM), a visualization tool for awareness and self-reflection for teachers and students to assist in evaluating how much time and resources students spend on learning activities. On the other hand, He et al. [
9] proposed LearnerVis, based on their engagement calendar matrix model, to let users view the temporal characteristics of the learning process and examine how students schedule their multicourse activities in an online learning environment. Mohseni et al. [
10] developed the SAVis tool by interpreting the visualization of machine learning (ML) algorithms to enable teachers to explore students’ learning and activities by interacting with various visualizations of student data.
Publications such as [
11] have emphasized the development of tools to generate meaningful groups for students’ activities, i.e., groups that let students participate in productive discussions, reflect on their approaches to the problem, justify and evaluate their answers, and ultimately engage in deeper learning. However, relatively little attention has been paid to the grouping issue by the VLA community. Studies such as [
12,
13] have looked at the relevance of forming groups based on an ontological description of the learning goals and collaborative environment. In this study, instead, we group students based on their performance levels and activity levels.
In [
14], Ochoa used several models in the proposed VLA tool to cluster students and the planned semester with other similar students and/or semesters in a historical dataset. The author wanted to estimate the probability of students failing one course in an academic semester. The risk was calculated using the previous frequency of similar students failing at least one course in similar semesters. Gutiérrez et al. [
15] described the overall design and implementation of a learning analytics dashboard for advisers (LADA). Its aim is to assist academic advisers in the decision-making process that helps students define their career and life goals. In addition, advisers can aid the development of an educational plan to achieve these goals through comparative and predictive analyses. LADA used multilevel clustering, specified with adaptive specificity levels, to predict the “chance of success”, following the methodology given by Ochoa [
14] for the prediction of the academic risk of failing a course. The number of mildly/severely failed courses is used to determine students’ similarity. While [
14,
15] worked on a similarity-based grouping approach, their main objective was to predict students’ performance. In our study, instead, we are interested in providing the teacher with a multidimensional perspective of the students’ similar learning outcomes and activities. This allows for an exploratory approach that goes beyond simple performance prediction, a kind of task better suited to visual analytics tools than to automated, ML-based ones.
Gutierrez-Santos et al. [
16] described a computer-based grouping tool that assists secondary school math teachers in grouping their students for collaborative activities based on their diverse approaches to an exploratory learning task. The research was carried out in a microworld for the development of algebraic ways of thinking. The proposed tool enables teachers to quickly form groups for collaborative activities, increasing the likelihood of meaningful discussions by combining students with very different perspectives (i.e., low similarity). That work is based on information about the collaboration context, and it groups students based on their own interactions with an exploratory learning environment to support exploratory learning of algebraic generalization. In contrast to our approach, they focus on the strategies of the students, while our study focuses on students’ activity in terms of the numbers of correct and incorrect answers, which is a general concept in many educational datasets and can be extended to other educational scenarios. Moreover, since we use a visual analytics approach to support the teachers in the exploration of similarities (instead of giving them automatic recommendations), our approach is inherently interpretable, tackling not only how to form the groups, but also the question of what these groups have in common (or not).
Kazemitabar et al. [
17] presented an online, asynchronous learning environment called Helping Others with Argumentation and Reasoning Dashboard (HOWARD), with separate interfaces for students and teachers. The teacher interface included a VLA tool designed to make it easier to keep track of group activities. On the dashboard interface for teachers, each group is represented by four visualizations aligned in a row that give teachers access to information on group participation and progress, the latest activities of group members, and the interaction trends between group members in the discussion space. Although the visualizations in [
17] are based on group activities and are comparable to our division of students into different classrooms, that work did not consider any method for grouping students based on their learning outcomes.
3. Dataset and Preprocessing
Intelligent tutoring systems (ITSs) can improve students’ learning activities by providing a personalized curriculum that addresses the individual needs of every student [
18]. EdNet (
https://github.com/riiid/ednet, accessed on 13 July 2022) [
19] is a public large-scale hierarchical educational dataset containing 131,417,236 interactions from 784,309 students from South Korea, collected over a period of two years by Santa (Riiid TUTOR (
https://aitutorsanta.com, accessed on 13 July 2022)). Santa is a multiplatform, self-study solution with an artificial intelligence (AI) tutoring system that helps students prepare for the Test of English for International Communication (TOEIC) exam. Since the EdNet dataset is large, complex, and heterogeneous, it can be challenging for teachers and students to interpret the data. SBGTool was designed and developed precisely with such a scenario in mind: to further aid teachers in grouping students with similar activities, keeping track of the groups’ activities, and comparing students’ achievements.
The student actions contained in the EdNet dataset are divided into four levels of abstraction: KT1, KT2, KT3, and KT4. In this paper, we use the KT1 dataset and question information table. KT1 contains the students’ question-solving logs, as well as the time spent by the student in solving a given problem (elapsed time). In addition, the question information table contains information on the correct answer to each multiple-choice question with four answer choices, as well as the subject number (or question part). The subject number is the assigned part of the lecture, which is a single integer from 1 to 7. The data preprocessing phase included cleaning, instance selection, normalization, transformation, feature extraction, and selection [
20]. Because each student’s interactions are recorded in a separate CSV file, the first preprocessing step was to merge these files into a single data frame and add a column with the students’ IDs. The students’ answer durations for the multiple-choice questions in the dataset ranged from 0 to 851 min (around 14 h), which indicates that many students started answering a question but never finished it, generating some noisy data.
Figure 1 illustrates the distribution of answer durations for around 14 million random samples. As can be seen, almost all students take between 0 and 5 min to answer a question. To reflect this, we filtered the answer durations, keeping only those between 0 and 5 min. We also filtered out questions with fewer than 100 answers overall, keeping only questions with a significant amount of activity. The students’ answers to the questions were then analyzed to identify similarities and differences among them.
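A minimal sketch of these preprocessing steps using pandas is shown below; the per-student file layout and the KT1 column names (elapsed_time in milliseconds, question_id) are assumptions for illustration, not the authors’ exact code.

```python
import glob
import os

import pandas as pd

# Merge the per-student CSV files into one data frame, deriving a student ID
# column from each file name (file layout assumed for illustration).
frames = []
for path in glob.glob("KT1/*.csv"):
    df = pd.read_csv(path)
    df["student_id"] = os.path.splitext(os.path.basename(path))[0]
    frames.append(df)
data = pd.concat(frames, ignore_index=True)

# Keep only answers completed within 0-5 minutes.
data["answer_duration_min"] = data["elapsed_time"] / 60_000
data = data[data["answer_duration_min"].between(0, 5)]

# Keep only questions with at least 100 answers overall.
counts = data["question_id"].value_counts()
data = data[data["question_id"].isin(counts[counts >= 100].index)]
```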
For the analysis performed in this study, to improve performance and avoid overloading users of the dashboard, 10,000 random samples were selected from the dataset. We believe this should be a representative sample of the whole dataset. Because the EdNet dataset does not contain information on the students’ school or classroom, we separated the student IDs in the 10,000 random samples into eight classrooms, allowing teachers to choose a class name from a list and focus on the students inside that class. Each sample (or row) of the final dataset presented in
Table 1 contained the following features: student ID, date, date week, month, day, hour, question ID, subject number, user answer, correct answer, result, answer duration, and class. The proposed SBGTool v2.0 reads data in CSV format; therefore, other learning management systems that export the same data format can also be used with the tool, and translating other data formats to CSV would further widen the applicability of our solution.
4. Overall Design of SBGTool v2.0
In this section, we describe the design of the SBGTool v2.0. The proposed tool is
interactive, and the views are
coordinated and interconnected, which allows much deeper interactions than simply a set of graphs generated in Excel. Interactive, coordinated views allow for the testing of hypotheses and multilevel exploration of complex data [
21]. As can be seen in
Figure 2, we employed a strategy of increasing details, starting from
Key Metrics, followed by
Overview and finally
Detail. These three levels follow Shneiderman’s mantra [
22], “overview first, zoom and filter, then details on demand”, which drives visual information-seeking behavior and interface design. The most important global information about the dataset is displayed in the
Key Metrics section (
Figure 2A), which includes the total numbers of correct and incorrect answers, the numbers of students and questions, and the numbers of answers for each of the four answer choices (A, B, C, D).
To make it easier to recognize key metrics in different visualizations, we used the following colors:
● dark blue for correct answer,
● red for incorrect answer,
● light blue for option A,
● green for option B,
● pink for option C, and
● orange for option D.
Since color blindness affects about 1 in every 20 people, we chose colors that are appropriate for people who are colorblind (
Figure 3) [
23]. We used the color simulation available on the website [
23] to examine how our chosen color palette would appear to colorblind viewers. The colors in the leftmost column of
Figure 3 are the “true” colors; the remaining three columns show how they would appear to someone with protanopia, deuteranopia, or tritanopia, i.e., reduced sensitivity to red, green, or blue-yellow light, respectively [
23].
The
Overview section of SBGTool v2.0 (
Figure 2B) includes a view that displays the total numbers of correct and incorrect answers, the total number of students’ answers for the four answer choices, and the total number of correct answers for the four answer choices over time. This view is an overlay of grouped bar and line charts. By selecting a point (week) in the overview, all the key metrics and visualizations in the
Detail section (
Figure 2C) of SBGTool v2.0 are updated accordingly. In addition, a range slider on the overview enables the teacher to limit the
x-axis value within a given range (minimum and maximum values). By moving the mouse over the overview and looking at the numbers of students’ answers and correct answers (in the four answer choices for the multiple-choice questions in a given week), teachers can determine how close the students’ learning activities were to a correct response. To focus on the students’ activity inside a classroom, a class name should be selected from the dropdown component on the top right side of the tool. To reload the entire set of 10,000 selected samples, the “reset” button on the tool’s top right side can be pressed. Likewise, the “learn more” button next to the “reset” button may be used to obtain a general description of the tool.
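As a sketch of how such coordinated interactions can be wired up, assuming a Plotly Dash implementation (the paper does not state its framework; the component IDs and the date_week column name are illustrative), a classroom dropdown can drive the weekly overview like this:

```python
from dash import Dash, dcc, html, Input, Output
import plotly.express as px

app = Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(options=sorted(sample["class"].unique()),
                 value="class 1", id="class-dropdown"),
    dcc.Graph(id="overview"),
])

@app.callback(Output("overview", "figure"), Input("class-dropdown", "value"))
def update_overview(class_name):
    # Redraw the weekly overview for the selected classroom only.
    sub = sample[sample["class"] == class_name]
    weekly = sub.groupby(["date_week", "result"]).size().reset_index(name="n")
    return px.bar(weekly, x="date_week", y="n", color="result",
                  barmode="group")

if __name__ == "__main__":
    app.run(debug=True)
```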
SBGTool v2.0 allows the user to interact with each visualization individually and drill down to more detailed levels of information. Brushing, zooming, and filtering are all supported by the majority of the visualizations. The main purpose of brushing is to emphasize brushed data elements in the various tool views. The user can obtain more information by picking a portion of each graphic and zooming in. The user can also filter the view by clicking on the legend (square or circle) in the right half of each display.
The
Detail section of the proposed tool (
Figure 2C) includes a table and two bar charts on the left-hand side, as well as three tabs with specific visualizations that aid in discovering insights about the various subjects, groups of students with similar activities in terms of the number of correct answers, the number of student interactions, the numbers of interactions across different features, and the differences between students’ learning activities.
Teachers can obtain thorough information on students’ activities and the time they set aside to answer a question by glancing at the table shown in
Figure 2C. In addition, they can filter and sort the table according to the student ID, date week, date, subject number, user answer, correct answer, result, and answer duration, to have more focused information. The first bar chart shown in
Figure 2C on the left-hand side depicts the overall percentages of correct and incorrect responses. As can be seen in the bar chart presented in
Figure 2C, the percentages of correct and incorrect answers for students in class 1 are 70.47% and 29.53%, respectively. The second bar chart on the left-hand side of the tool with multicategory axis type presents the percentages of difficulty and ease in seven subjects. For example, as seen in this bar chart, the percentages of difficulty and ease in subject 1 are 23% and 77%, respectively. By comparing these percentages to those in other subjects, we can determine the most difficult and easiest subjects. In order to determine the percentage of difficulty in different subjects, we use Equation (1), where $N_\mathrm{c}$ and $N_\mathrm{i}$ are the numbers of correct and incorrect answers, respectively, in each subject:

$$\mathrm{Difficulty} = \frac{N_\mathrm{i}}{N_\mathrm{c} + N_\mathrm{i}} \times 100 \qquad (1)$$

Equation (1) is based on the difficulty index [
24], which allows us to calculate the proportion of students who correctly answered the questions belonging to a subject.
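As a sketch, the two percentages can be computed per subject as follows; the column names and the “correct”/“incorrect” encoding of the result column are assumptions, continuing the preprocessing sketch above.

```python
# Percentage of difficulty (Equation (1)) and the complementary percentage of
# ease for each subject; column names are illustrative.
per_subject = sample.groupby("subject_number")["result"].agg(
    n_correct=lambda r: (r == "correct").sum(),
    n_incorrect=lambda r: (r == "incorrect").sum(),
)
total = per_subject["n_correct"] + per_subject["n_incorrect"]
per_subject["difficulty_pct"] = per_subject["n_incorrect"] / total * 100
per_subject["ease_pct"] = per_subject["n_correct"] / total * 100
```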
Figure 4 displays the “Students’ performance” tab which contains a scatter plot and selections (radio buttons) for rendering a set of features in the scatter plot. By selecting the student ID from the checkbox list, the performance levels are shown (
Figure 4a). To define the percentage of performance, we apply Equation (2), where $N_\mathrm{c}$ and $N_\mathrm{i}$ are the numbers of correct and incorrect answers, respectively, for each student:

$$\mathrm{Performance} = \frac{N_\mathrm{c}}{N_\mathrm{c} + N_\mathrm{i}} \times 100 \qquad (2)$$
For the color scale, we considered five levels of performance, in increasing order: “Very low”, “Low”, “High”, “Very high”, and “Accurate”. We used the proportion of performance to group the students into these different levels of performance. For color blindness, we chose the color palette presented in
Figure 5, where the colors in the leftmost column are the “true” colors, shown in the remaining three columns as they would appear to someone with protanopia, deuteranopia, or tritanopia. Students with a performance of 100% are in the “Accurate” level; students with a performance of 86% to 99% are in the “Very high” level; students with a performance of 66% to 85% are in the “High” level; students with a performance of 51% to 65% are in the “Low” level; and students with a performance of 1% to 50% are in the “Very low” level. According to these performance levels, we considered the “Accurate” and “Very high” levels as high-performing, “High” as average-performing, and “Low” and “Very low” as low-performing.
Teachers can use the visualizations in
Figure 4b–d to determine the total number of interactions (the total numbers of correct and incorrect answers) and the numbers of correct and incorrect answers for each date, day, and hour.
The numbers of correct and incorrect answers and the total number of interactions for each subject are shown in
Figure 4e. Since the
x-axis of the view in the Overview section represents the weekly activity, the visualizations in this tab allow teachers to dig deeper into the numbers of correct and incorrect answers for each feature. The “Students’ engagement” tab shown in
Figure 6 displays the students’ activity levels. Because the heatmap is ordered from left to right by activity level, teachers can quickly identify students who have similar numbers of activities in each of the four answer choices, as well as those who have the most and the fewest interactions with the digital learning material. Students with the most interactions are represented by navy blue, while those with the fewest interactions are represented by light blue.
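One way to produce such an ordered heatmap, assuming a Plotly-based implementation (the paper does not name its charting library), is sketched below.

```python
import plotly.graph_objects as go

# Count answers per student and per answer choice, then order students by
# total interactions so the most active appear on the left.
pivot = sample.pivot_table(index="user_answer", columns="student_id",
                           values="question_id", aggfunc="count", fill_value=0)
order = pivot.sum(axis=0).sort_values(ascending=False).index
pivot = pivot[order]

# The "Blues" colorscale maps low counts to light blue and high counts to navy.
fig = go.Figure(go.Heatmap(z=pivot.values, x=list(pivot.columns),
                           y=list(pivot.index), colorscale="Blues"))
fig.show()
```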
Figure 7 illustrates the “Comparison” tab that includes two dropdown components and two Sankey diagrams. A Sankey diagram can be thought of as a flow diagram in which the width of arrows is proportional to the flow quantity. In these diagrams, we visualize the contributions to a flow by designating “Student ID” as a source; “Result” as a target; and “Date Week”, “Subject number”, and “Student answer” as flow volume. Using this tab enables a teacher to compare the students’ learning outcomes by choosing their IDs from the dropdown components.
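As an illustration, a minimal Plotly Sankey for a single student (again, the Plotly stack is an assumption) could link the student’s ID to the two result nodes, with link widths given by answer counts; the intermediate week/subject/answer nodes of the actual tool are omitted here, and the ID S119 is taken from the use case in Section 5.

```python
import plotly.graph_objects as go

# Flows from one student's ID to the answer results, weighted by the number
# of answers given by that student.
sub = sample[sample["student_id"] == "S119"]
flows = sub.groupby("result").size()

labels = ["S119"] + list(flows.index)
fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=[0] * len(flows),                # from the student node
        target=list(range(1, len(flows) + 1)),  # to each result node
        value=list(flows.values),               # link width = answer count
    ),
))
fig.show()
```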
5. Use Case
In this section, we address the research questions and describe how a user (in this hypothetical example, a teacher) may utilize SBGTool v2.0 to group students into different performance levels, compare two individual students’ outcomes, and gain insights into the students’ learning activities, explaining the various visualizations of SBGTool v2.0 along the way.
Figure 8 depicts an example use case of the proposed tool. Choosing a class name from the dropdown component on the top right side of the tool and selecting a week from the grouped bar and line charts allow teachers to focus on a specific week (e.g., the exam week) that is relevant for the analysis of a classroom. For this use case, we choose class 1 and select week 27 of 2019 in the Overview section.
By selecting the week, all the key metrics and visualizations in the Detail section are updated. The information for the selected week is displayed in the table on the tool’s left-hand side. As can be seen in
Figure 8a, 55 students out of 182 students shown in
Figure 3 participated in some learning activities in that time period, and they answered 90 questions out of the 1232 questions in week 27. During this week, the total numbers of correct and incorrect answers were 62 and 28, respectively, for a total of 90. The answers to the multiple-choice questions were A (20), B (25), C (20), and D (25). The percentages of ease of different subjects presented in the bar chart on the left-hand side of
Figure 8a were 66.67%, 41.67%, 93.75%, 64.29%, 70.59%, 66.67%, and 70% for subjects 1 to 7, respectively. As can be seen in
Figure 8a, the bar chart is filtered by the ease category, and the second subject had the lowest percentage of ease, indicating that the most difficult questions were in this subject (since the students had the most incorrect answers).
As previously stated, we divide the performance into five levels, ranging from “Accurate” to “Very low”. To showcase the investigation of specific students, here we choose student IDs S119 and S622. Since the percentage of performance for student ID S119 was 80%, he/she was placed in the “High” level category as an average-performing student. He/she responded to a total of five questions in week 27, 2019. We sorted the table based on student IDs, and as can be seen in
Figure 8a, he/she had four correct answers and one incorrect answer. By glancing at the scatter plot, teachers can easily categorize students into five groups based on their similar learning outcomes. In addition, teachers can obtain more particular information about the subject, the numbers of correct and incorrect answers, and the answer durations by filtering and sorting the table.
Figure 8b depicts a heatmap displaying the students’ activity levels. Since S119 and S622 had the most interactions with the digital learning material as well as similar activity levels in week 27, their activity levels were placed on the left side of the heatmap. As can be seen in
Figure 8b, two of the answers for both student IDs S119 and S622 belonged to option C. We filtered the table by student ID (in this case, S622) and sorted it by answer duration (in this case, ascending) to obtain more information about the minimum and maximum answer durations, the subjects with the highest and lowest numbers of wrong answers, the dates of activity, and the user answers to the questions belonging to each subject.
A comparison of student IDs S119 and S622 is shown in
Figure 8c. In week 27, both S119 and S622 answered five questions. The diagrams presented in
Figure 8c show that S119 and S622 had the same performance, since both had four correct answers and one incorrect answer. S119 had more difficulty in subject 6, where his/her single incorrect answer occurred, while S622 had more difficulty in subject 7. By looking at these Sankey diagrams, teachers can identify the subjects and the weeks in which students engaged in more activities and compare the students’ learning outcomes.