Educational Data Mining and Learning Analytics - EdTech

Educational Data Mining and Learning Analytics
Potentials and Possibilities for Online Education
Ryan S. Baker & Paul Salvador Inventado

Editor’s Note
The following was reprinted from Emergence and Innovation in Digital Learning [https://edtechbooks.org/-uG], an open textbook edited by George Veletsianos.

Baker, R. S., & Inventado, P. S. (2016). Educational data mining and learning analytics: Potentials and possibilities for online education. In G. Veletsianos (Ed.), Emergence and Innovation in Digital Learning (pp. 83–98). doi:10.15215/aupress/9781771991490.01

Over the last decades, online and distance education has become an increasingly prominent part of the higher educational landscape (Allen & Seaman, 2008; O’Neill et al., 2004; Patel & Patel, 2005). Many learners turn to distance education because it works better for their schedule, and makes them feel more comfortable than traditional face-to-face courses (O’Malley & McCraw, 1999). However, working with distance education presents challenges for both learners and instructors that are not present in contexts where teachers can work directly with their students. As learning is mediated through technology, learners have fewer opportunities to communicate to

Foundations of Learning and Instructional Design Technology

instructors about areas in which they are struggling. Though discussion forums provide an opportunity that many students use, and in fact some students are more comfortable seeking help online than in person (Kitsantas & Chow, 2007), discussion forums depend upon learners themselves realizing that they are facing a challenge, and recognizing the need to seek help. Further, many students do not participate in forums unless given explicit prompts or requirements (Dennen, 2005). Unfortunately, the challenges of help-seeking are general: many learners, regardless of setting, do not successfully recognize the need to seek help, and fail to seek help in situations where it could be extremely useful (Aleven et al., 2003). Without the opportunity to interact with learners in a face-to-face setting, it is therefore harder for instructors as well to recognize negative affect or disengagement among students.
Beyond a student not participating in discussion forums, ceasing to complete assignments is a clear sign of disengagement (Kizilcec, Piech, & Schneider, 2013), but information on these disengaged behaviors is not always available to instructors, and more subtle forms of negative affect (such as boredom) are difficult for an unaided distance instructor to identify and diagnose. As such, a distance educator has additional challenges compared to a local instructor in identifying which students are at-risk, in order to provide individual attention and support. This is not to say that face-to-face instructors always take action when a student is visibly disengaged, but they have additional opportunities to recognize problems.
In this chapter, we discuss educational data mining and learning analytics (Baker & Siemens, 2014) as a set of emerging practices that may assist distance education instructors in gaining a rich understanding of their students. The educational data mining (EDM) and learning analytics (LA) communities are concerned with exploring the increasing amounts of data now becoming available on learners, toward providing better information to instructors and better support to learners. Through the use of automated discovery methods, leavened with a workable understanding of educational theory, EDM/LA practitioners are able to generate models that identify at-risk students so as to help instructors to offer better learner support. In the interest of provoking thought and discussion, we focus on a few key examples of the potentials of analytics, rather than exhaustively reviewing the increasing literature on analytics and data mining for distance education.
Data Now Available in Distance Education
One key enabling trend for the use of analytics and data mining in distance education is that distance education increasingly provides high-quality data in large quantities (Goldstein & Katz, 2005). In fact, distance education has always involved interactions that could be traced, but increasingly data from online and distance education is being stored by distance education providers in formats designed to be usable. For example, The Open University (UK), an entirely online university with around 250,000 students, collects large amounts of electronic data including student activity data, course information, course feedback and aggregated completion rates, and demographic data (Clow, 2014). The university’s Data Wranglers project leverages this data by having a team of analytics experts analyze and create reports about student learning, which are used to improve course delivery. The University of Phoenix, a for-profit online university, collects data on marketing, student applications, student contact information, technology support issue tracking, course grades, assignment grades, discussion forums, and content usage (Sharkey, 2011). These disparate data sources are integrated to support analyses that can predict student persistence in academic programs (Ming & Ming, 2012), and can facilitate interventions that improve student outcomes.
Massive Open Online Courses (MOOCs), another emerging distance education practice, also generate large quantities of data that can be utilized for these purposes. There have been dozens of papers exploiting MOOC data to answer research questions in education in the brief time since large-scale MOOCs became internationally popular (see, for instance, Champaign et al., 2014; Kim et al., 2014; Kizilcec et al., 2013). The second-largest MOOC platform, edX, now makes large amounts of MOOC data available to any researcher in the world. In addition, formats have emerged for MOOC data that are designed to facilitate research (Veeramachaneni, Dernoncourt, Taylor, Pardos, & O’Reilly, 2013).
Increasingly, traditional universities are collecting the same types of data. For example, Purdue University collects and integrates educational data from various systems including content management systems (CMS), student information systems (SIS), audience response systems, library systems and streaming media service systems (Arnold, 2010). This institution uses this data in their Course Signals project, discussed below.
One of the key steps to making data useful for analysis is to preprocess it (Romero, Romero, & Ventura, 2013). Preprocessing can include data cleaning (such as removing data stemming from logging errors, or mapping meaningless identifiers to meaningful labels), integrating data sources (typically taking the form of mapping identifiers, which could be at the student level, the class level, the assignment level, or other levels, between data sets or tables), and feature engineering (distilling appropriate data to make a prediction). Typically, the process of engineering and distilling appropriate features that can be used to represent key aspects of the data is one of the most time-consuming and difficult steps in learning analytics. The process of going from the initial features logged by an online learning system (such as correctness and time, or the textual content of a post) to more semantic features (history of correctness on a specific skill; how fast an action is compared to typical time taken by other students on the same problem step; emotion expressed and context in a discussion of a specific discussion forum post) involves considerable theoretical understanding of the educational domain. This understanding is sometimes encoded in schemes for formatting and storing data, such as the MOOC data format proposed by Veeramachaneni et al. (2013) or the Pittsburgh Science of Learning Center DataShop format (Koedinger, Baker, Cunningham, Skogsholm, Leber, & Stamper, 2010).
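To make the step from logged features to semantic features more concrete, the following is a minimal sketch in Python. The log layout, the student and step identifiers, and the function name `engineer_features` are all hypothetical and are not taken from any of the systems or data formats cited above. The sketch derives two of the features mentioned in the text: a student's history of correctness on a skill, and the time taken on an action relative to the typical time on the same step (here the median over all students in the log, including the student in question, which is a simplifying assumption).

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw log rows: (student_id, skill, step_id, correct, seconds).
LOG = [
    ("s1", "fractions", "step1", 1, 12.0),
    ("s1", "fractions", "step2", 0, 30.0),
    ("s2", "fractions", "step1", 1, 8.0),
    ("s2", "fractions", "step2", 1, 20.0),
    ("s3", "fractions", "step1", 0, 10.0),
]

def engineer_features(log):
    """Add two semantic features to each raw log row.

    prior_correct_rate: share of the student's earlier attempts on the same
        skill that were correct (None when there is no history yet).
    relative_time: time on this action divided by the median time that all
        students in the log took on the same step.
    """
    # Median time per step across all students in the log.
    times_by_step = defaultdict(list)
    for _, _, step, _, seconds in log:
        times_by_step[step].append(seconds)
    median_time = {step: median(ts) for step, ts in times_by_step.items()}

    history = defaultdict(list)  # (student, skill) -> earlier correctness values
    rows = []
    for student, skill, step, correct, seconds in log:
        past = history[(student, skill)]
        prior = sum(past) / len(past) if past else None
        rows.append({
            "student": student,
            "step": step,
            "prior_correct_rate": prior,
            "relative_time": seconds / median_time[step],
        })
        past.append(correct)  # update history only after the row is built
    return rows

features = engineer_features(LOG)
```

The history is updated only after each row is built, so a feature never includes the outcome of the action it describes. This assumes the log is already sorted in time order for each student.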
Methods for Educational Data Mining and Learning Analytics
In tandem with the development of these increasingly large data sets, a wider selection of methods to distill meaning has emerged; these are referred to as educational data mining or learning analytics. As Baker and Siemens (2014) note, the educational data mining and learning analytics communities address many of the same research questions, using similar methods. The core differences between the communities are in terms of emphasis: whether human analysis or automated analysis is central, whether phenomena are considered as systems or in terms of specific constructs and their interrelationships, and whether automated interventions or empowering instructors is the goal. However, for the purposes of this chapter, educational data mining and learning analytics can be treated as interchangeable, as the methods relevant to distance education are seen in both communities. Some of the differences emerge in the section on uses to benefit learners, with the approaches around providing instructors with feedback being more closely linked to the learning analytics community, whereas approaches to providing feedback and interventions directly to students are more closely linked to practice in educational data mining.
In this section, we review the framework proposed by Baker and Siemens (2014); other frameworks for understanding the types of EDM/LA methods also exist (e.g., Baker & Yacef, 2009; Scheuer & McLaren, 2012; Romero & Ventura, 2007; Ferguson, 2012). The differences between these frameworks are a matter of emphasis and categorization. For example, parameter tuning is categorized as a method in Scheuer and McLaren (2012); it is typically seen as a step in the prediction modeling or knowledge engineering process in other frameworks. Still, mostly the same methods are present in all frameworks. Baker and Siemens (2014) divide the world of EDM/LA methods into prediction modeling, structure discovery, relationship mining, distillation of data for human judgment, and discovery with models. In this chapter, we will provide definitions and examples for prediction, structure discovery, and relationship mining, focusing on methods of particular usefulness for distance education.
Prediction
Prediction modeling occurs when a researcher or practitioner develops a model, which can infer (or predict) a single aspect of the data, from some combination of other variables within the data. This is typically done either to infer a construct that is latent (such as emotion), or to predict future outcomes. In these cases, good data on the predicted variable is collected for a smaller data set, and then a model is created with the goal of predicting that variable in a larger data set, or a future data set. The goal is to predict the construct in future situations when data on it is unavailable. For example, a prediction model may be developed to predict whether a student is likely to drop or fail a course (e.g., Arnold, 2010; Ming & Ming, 2012). The prediction model may be developed from 2013 data, and then utilized to make predictions early in the semester in 2014, 2015, and beyond. Similarly, the model may be developed using data from four introductory courses, and then rolled out to make predictions within a university’s full suite of introductory courses.
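The workflow described above (fit a model on an earlier cohort whose outcomes are known, then apply it early in a later semester) can be sketched as follows. This is only an illustration under stated assumptions: the two features, the tiny data set, and the names `train_logistic` and `predict_proba` are hypothetical, and logistic regression fitted by plain gradient descent is used here purely for simplicity. It is not the method of any study cited in this chapter.

```python
import math

def predict_proba(weights, bias, x):
    """Logistic model: estimated probability that the outcome is 1."""
    z = bias + sum(w * xj for w, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Hypothetical 2013 cohort. Features: (share of assignments submitted,
# share of weeks with a forum post). Label 1 means the student dropped.
COHORT_2013 = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6),
               (0.3, 0.2), (0.2, 0.4), (0.1, 0.1)]
DROPPED_2013 = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(COHORT_2013, DROPPED_2013)

# Early-semester data for two hypothetical 2014 students, whose outcomes
# are not yet known; the model estimates their risk of dropping.
COHORT_2014 = [(0.85, 0.7), (0.15, 0.2)]
risk_2014 = [predict_proba(weights, bias, x) for x in COHORT_2014]
```

In practice, the choice of algorithm would be made within a data mining tool, and the harder work lies in the features and in validation, as discussed in the following paragraphs.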
Prediction modeling has been utilized for an ever-increasing set of problems within the domain of education, from inferring students’ knowledge of a certain topic (Corbett & Anderson, 1995), to inferring a student’s emotional state (D’Mello, Craig, Witherspoon, McDaniel, & Graesser, 2008). It is also used to make longer-term predictions, for instance predicting whether a student will attend college from their learning and emotion in middle school (San Pedro, Baker, & Gobert, 2013).
One key consideration when using prediction models is distilling the appropriate data to make a prediction (sometimes referred to as feature engineering). Sao Pedro et al. (2012) have argued that integrating theoretical understanding into the data mining process leads to better models than a purely bottom-up data-driven approach. Paquette, de Carvalho, Baker, and Ocumpaugh (2014) correspondingly find that integrating theory into data mining performs better than either approach alone. While choosing an appropriate algorithm is also an important challenge (see discussion in Baker, 2014), switching algorithms often involves a minimal change within a data mining tool, whereas distilling the correct features can be a substantial challenge.
Another key consideration is making sure that data is validated appropriately for its eventual use. Validating models on a range of content (Baker, Corbett, Roll, & Koedinger, 2008) and on a representative sample of eventual students (Ocumpaugh, Baker, Gowda, Heffernan & Heffernan, 2014) is important to ensuring that models will be valid in the contexts where they are applied. In the context of distance education, these issues can merge: the population of students taking one course through a distance institution may be quite different than the population taking a different course, even at the same institution. Some prediction models have been validated to function accurately across higher education institutions, which is a powerful demonstration of generality (Jayaprakash, Moody, Lauría, Regan, & Baron, 2014).
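One simple way to look for the population differences described above is to report model performance separately for each course or student group, rather than as a single pooled figure. The sketch below does this for accuracy. The course names, the records, and the function `accuracy_by_group` are hypothetical, and a real validation would also hold out whole groups during training, which this sketch does not show.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy separately for each group.

    records: iterable of (group, predicted_label, actual_label) tuples.
    Returns a dict mapping each group to its share of correct predictions.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {group: hits[group] / totals[group] for group in totals}

# Hypothetical predictions from one model applied to two different courses.
RECORDS = [
    ("intro_stats", 1, 1), ("intro_stats", 0, 0),
    ("intro_stats", 0, 0), ("intro_stats", 1, 0),
    ("intro_writing", 1, 0), ("intro_writing", 0, 1),
    ("intro_writing", 0, 0), ("intro_writing", 1, 1),
]

per_course = accuracy_by_group(RECORDS)
```

A large gap between groups, as in this made-up data, would suggest that the model should not be rolled out to the weaker group without further validation.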
As with other areas of education, prediction modeling increasingly plays an important role in distance education. Arguably, it is the most prominent type of analytics within higher education in general, and distance education specifically. For example, Ming and Ming (2012) studied whether students’ final grades could be predicted from their interactions on the University of Phoenix class discussion forums. They found that discussion of more specialized topics was predictive of higher course grades. Another example is seen in Kovacic’s (2010) work studying student dropout in the Open Polytechnic of New Zealand. This work predicted student dropout from demographic factors, finding that students of specific demographic groups were at much higher risk of failure than other students.
Related work can also be seen within the Purdue Signals Project (Arnold, 2010), which mined content management system, student information system, and gradebook data to predict which students were likely to drop out of a course and provide instructors with near real-time updates regarding student performance and effort (Arnold & Pistilli, 2012; Campbell, DeBlois, & Oblinger, 2007). These predictions were used to suggest interventions to instructors. Instructors who used those interventions, reminding students of the steps needed for success, and recommending face-to-face meetings, found that their students engaged in more help-seeking, and had better course outcomes and significantly improved retention rates (Arnold, 2010).
Structure Discovery
A second core category of LA/EDM is structure discovery. Structure discovery algorithms attempt to find structure in the data without an a priori idea of what should be found: a very different goal than in prediction. In prediction, there is a specific variable that the researcher or practitioner attempts to infer or predict; by contrast, there are no specific variables of interest in structure discovery. Instead, the researcher attempts to determine what structure emerges naturally from the data. Common approaches to structure discovery in LA/EDM include clustering, factor analysis, network analysis, and domain structure discovery.
While domain structure discovery is quite prominent in research on intelligent tutoring systems, the type of structure discovery most often seen in online learning contexts is a specific type of network analysis called Social Network Analysis (SNA) (Knoke & Yang, 2008). In SNA, data is used to discover the relationships and interactions among individuals, as well as the patterns that emerge from those relationships and interactions. Frequently, in learning analytics, SNA is paired with additional analytics approaches to better understand the patterns observed through network analytics; for example, SNA might be coupled with discourse analysis (Buckingham Shum & Ferguson, 2012).
SNA has been used for a number of applications in education. For example, Kay, Maisonneuve, Yacef, and Reimann (2006) used SNA to understand the differences between effective and ineffective project groups, through visual analysis of the strength of group connections. Although this project took place in the context of a face-to-face university class, the data analyzed was from online collaboration tools that could have been used at a distance. SNA has also been used to study how students’ communication behaviors in discussion forums change over time (Haythornthwaite, 2001), and to study how students’ positions in a social network relate to their perception of being part of a learning community (Dawson, 2008), a key concern for distance education. Patterns of interaction and connectivity in learning communities are correlated to academic success as well as learner sense of engagement in a course (Macfadyen & Dawson, 2010; Suthers & Rosen, 2011).
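As a small illustration of how a student's position in a network can be quantified, the sketch below builds an undirected network from discussion forum replies and computes degree centrality, that is, the share of other participants each person is directly connected to. The reply data and the function name `degree_centrality` are hypothetical, and degree centrality is only one of many SNA measures. It is not claimed to be the measure used in the studies cited above.

```python
from collections import defaultdict

# Hypothetical forum reply pairs: (author_of_reply, author_replied_to).
REPLIES = [
    ("ana", "ben"), ("ben", "ana"), ("cy", "ana"),
    ("dee", "ana"), ("cy", "ben"),
]

def degree_centrality(replies):
    """Return each participant's degree centrality in the reply network.

    Two participants are connected if either has replied to the other.
    Centrality is the number of distinct neighbors divided by the number
    of other participants in the network.
    """
    neighbors = defaultdict(set)
    for replier, target in replies:
        if replier != target:  # ignore self-replies
            neighbors[replier].add(target)
            neighbors[target].add(replier)
    n = len(neighbors)
    if n < 2:
        return {person: 0.0 for person in neighbors}
    return {person: len(peers) / (n - 1) for person, peers in neighbors.items()}

centrality = degree_centrality(REPLIES)
```

In this made-up data, one participant is connected to everyone else, while another is connected to only one peer. That is the kind of peripheral position an instructor might want to notice.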
Relationship Mining
Relationship mining methods find unexpected relationships or patterns in a large set of variables. There are many forms of relationship mining, but Baker and Siemens (2014) identify four in particular as being common in EDM: correlation mining, association rule mining, sequential pattern mining, and causal data mining. In this section, we will mention potential applications of the first three.
Association rule mining finds if-then rules that predict that if one variable value is found, another variable is likely to have a characteristic value. Association rule mining has found a wide range of applications in educational data mining, as well as in data mining and e-commerce more broadly. For example, Ben-Naim, Bain, and Marcus (2009) used association rule mining to find what patterns of performance were characteristic of successful students, and used their findings as the basis of an engine that made recommendations to students. Garcia, Romero, Ventura, and De Castro (2009) used association rule mining on data from exercises, course forum participation, and grades in an online course, in order to gather data related to effectiveness to provide to course developers.
A method closely related to association rule mining is sequential pattern mining. The goal of sequential pattern mining is to find patterns that manifest over time. As in association rule mining, if-then rules are found, but the if-then rules involve associations between past events (if) and future events (then). For example, Perera, Kay, Koprinska, Yacef, and Zaiane (2009) used sequential pattern mining on data from learners’ behaviors in an online collaboration environment, toward understanding the behaviors that characterized successful and unsuccessful collaborative groups. One could also imagine conducting sequential pattern mining to find patterns in course-taking over time within a program that are associated with more successful and less successful student outcomes (Garcia et al., 2009). Sequential patterns can also be found through other methods, such as hidden Markov models; an example of that in distance education is seen in Coffrin, Corrin, de Barba, and Kennedy (2014), a study that looks at patterns of how students shift between activities in a MOOC.
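The if-then rules of association rule mining are usually filtered by two quantities: support (how often the "if" and "then" parts occur together) and confidence (how often the "then" part holds when the "if" part does). The sketch below computes both for rules with a single item on each side. The student records, the thresholds, and the function name `mine_rules` are hypothetical, and this brute-force enumeration is a simplification rather than a full algorithm such as Apriori.

```python
from itertools import permutations

# Hypothetical records: the set of things observed for each student.
STUDENTS = [
    {"posted_in_forum", "did_exercises", "passed"},
    {"posted_in_forum", "did_exercises", "passed"},
    {"did_exercises", "passed"},
    {"posted_in_forum"},
    {"did_exercises"},
]

def mine_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Find single-item rules "if antecedent then consequent".

    support: share of all records containing both items.
    confidence: share of records with the antecedent that also contain
        the consequent.
    Returns (antecedent, consequent, support, confidence) tuples for
    rules that meet both thresholds.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for antecedent, consequent in permutations(items, 2):
        both = sum(1 for t in transactions
                   if antecedent in t and consequent in t)
        has_antecedent = sum(1 for t in transactions if antecedent in t)
        support = both / n
        confidence = both / has_antecedent if has_antecedent else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return rules

rules = mine_rules(STUDENTS)
```

With this made-up data, only the rules linking exercise completion and passing survive both thresholds. A rule of this kind describes an association in the data and does not by itself show that one behavior causes the other.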
Finally, correlation mining is the area of data mining that attempts to find simple linear relationships between pairs of variables in a data set. Typically, in correlation mining, approaches such as post-hoc statistical corrections are used to set a threshold on which patterns