Introduction to Graph Models for Clickstream Data

An Educational Example from a Simulated Classroom

Matthias Mueckshoff

Published in

Towards Data Science

Mar 10, 2021

Introduction

Whenever we analyze how people interact with digital content, we have a tendency to view people’s behaviors as an expression of their thoughts and emotions. But that is just one side of the coin. The environment in which people act (online or offline) has an important influence on the way people interact with it.

Pictures and big flashy elements draw attention; small stuff at the bottom of a page often gets ignored. If we want to create meaningful digital journeys and make people’s lives easier, we need to understand what influence digital content has on its users and their behaviors.

But how can we disentangle those two sources of influence? At Bamberg University I was part of a research project addressing that issue in an educational context. Let me elaborate.

Research Context

Let’s think about school.

We wanted to know how exactly teachers form their judgements about student achievement. Since real classrooms are highly complex and interactive environments, we opted for a simulated version that participants could work on at a computer. This way, we could record every mouse click and ended up with a set of clickstream data. But what to do with the data?

That’s where Exponential Random Graph Models (ERGMs) enter the stage. But let’s look at the research setup first.

Experimental Setup

We chose 12 math tasks of different types covering a broad range of difficulties. Student ability was operationalized as the highest level of task difficulty reached by each student. The probability that a student solves a given task drops from 1 to 0 as soon as the task difficulty exceeds that student’s ability (cf. the deterministic latent-trait model [1]). Our version of the Simulated Classroom is shown in the screenshot below.
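To make the deterministic model concrete, here is a minimal Python sketch (not the study’s code). It assumes that task difficulties and student abilities are coded as integer levels, which is an illustrative simplification, and the function names are hypothetical. A student solves every task up to their ability level and none above it, so ability can be read off as the highest difficulty level solved.

```python
def p_solve(ability, difficulty):
    """Deterministic model: 1 if difficulty <= ability, otherwise 0."""
    return 1 if difficulty <= ability else 0


def response_pattern(ability, difficulties):
    """Responses (1 = solved, 0 = not solved) of one student to each task."""
    return [p_solve(ability, d) for d in difficulties]


def ability_from_responses(responses, difficulties):
    """Ability as the highest difficulty level solved (0 if nothing was solved)."""
    return max((d for r, d in zip(responses, difficulties) if r == 1), default=0)


difficulties = list(range(1, 13))   # hypothetical: 12 tasks at levels 1..12
pattern = response_pattern(7, difficulties)
print(pattern)                      # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(ability_from_responses(pattern, difficulties))  # 7
```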

Screenshot of the Simulated Classroom Software. Image by author.

Initially, participants had to estimate the difficulties of the 12 items. In the second step, six student descriptions were presented, indicating each student’s name, age, gender, number of siblings, and parents’ occupation, so that participants received both potentially relevant and less relevant information. Two example descriptions are presented in the figure below.

In the Simulated Classroom, participants could then use the 12 items they had initially rated to assess the six simulated students, with the goal of judging each student’s ability. Each assignment of an item to a student counted as one interaction. After finishing the assessment in the Simulated Classroom, participants rated each student’s mathematical ability on an 11-point Likert scale ranging from 0 to 100.

Sample

The sample consists of 37 German secondary school teachers, 59% of whom were female. On average, participants had 11.37 years (SD = 10.13) of teaching experience.

Let’s Talk about Networks

Clickstream Networks

Clickstream data can be understood as a sequence of actions taken while working in a simulation-based or digital environment. Each action in that sequence is recorded in a log file. At each step, participants can choose from a finite set of possible actions, and future actions are usually influenced by previous ones.

Such clickstream data can be represented as networks. Each node represents one action, and directed edges between nodes indicate transitions from one action to the next. Wasserman and Faust [2] describe this type of network as a weighted directed network or directed graph. It can be defined by a set of nodes $V = \{v_1, v_2, \dots, v_g\}$, a set of links $L = \{l_1, l_2, \dots, l_t\}$, and a set of weights for the links $W = \{w_1, w_2, \dots, w_t\}$. Such a network can be described by a $g \times g$ adjacency matrix $M$: the nodes are represented in the rows and columns, and each cell $m_{ij}$ indicates how often a transition from the node in row $i$ to the node in column $j$ was chosen.
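As a rough illustration of how a log file becomes such a matrix, the sketch below counts transitions between consecutive actions. The function name and node labels are hypothetical; the point is only that each consecutive pair of actions adds 1 to exactly one cell of M.

```python
def adjacency_matrix(sequence, nodes):
    """g x g matrix M where M[i][j] counts transitions from nodes[i] to nodes[j]."""
    index = {node: k for k, node in enumerate(nodes)}
    g = len(nodes)
    matrix = [[0] * g for _ in range(g)]
    # each consecutive pair of actions (a, b) is one directed transition a -> b
    for a, b in zip(sequence, sequence[1:]):
        matrix[index[a]][index[b]] += 1
    return matrix


nodes = ["A", "B", "C"]               # hypothetical action labels
clicks = ["A", "B", "A", "B", "C"]    # hypothetical log file, in order
for row in adjacency_matrix(clicks, nodes):
    print(row)
# [0, 2, 0]
# [1, 0, 1]
# [0, 0, 0]
```

The cells always sum to one less than the number of actions, which is why a limit of 50 actions yields at most 49 transitions per participant.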

Defining the nodes is crucial

We defined each node as one item being assigned to one student. With 12 items and 6 students, this results in a 72-node network with one node for each student-task combination. Each node thus represents two choices: one for the item and one for the student.
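One simple way to index these 72 nodes is sketched below. The 1-based numbering of items and students and the ordering of node indices are hypothetical conventions; what matters is that every node can be mapped back to both of its attributes, the item and the student.

```python
N_ITEMS, N_STUDENTS = 12, 6


def node_id(item, student):
    """Map an (item, student) assignment to a node index in 0..71."""
    if not (1 <= item <= N_ITEMS and 1 <= student <= N_STUDENTS):
        raise ValueError("item or student out of range")
    return (item - 1) * N_STUDENTS + (student - 1)


def node_attributes(node):
    """Recover the (item, student) pair behind a node index."""
    item0, student0 = divmod(node, N_STUDENTS)
    return item0 + 1, student0 + 1


all_nodes = [node_id(i, s) for i in range(1, N_ITEMS + 1) for s in range(1, N_STUDENTS + 1)]
print(len(all_nodes))                        # 72
print(node_id(3, 5), node_attributes(16))    # 16 (3, 5)
```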

Since participants were limited to a maximum of 50 actions, which results in at most 49 ties representing sequential transitions between nodes, the individual networks were very sparse. We therefore divided the sample into two groups based on the accuracy of their student ability judgements and pooled the networks within each group. Participants with a rank correlation of r = .70 or higher between their ability judgements and the empirical student abilities were considered to have high judgement accuracy; a rank correlation of r < .70 was considered low. This resulted in n(high) = 25 and n(low) = 12 participants. By adding up the individual adjacency matrices within each group, we obtained two group networks that were more densely connected.
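The grouping and pooling step can be sketched in plain Python. This is an illustration, not the original R workflow: it computes Spearman’s rank correlation (with average ranks for ties), applies the .70 cut-off from the text, and sums the individual adjacency matrices element-wise within each group. The function names and the participant data structure are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant ratings")
    return cov / (sx * sy)


def pool(matrices, g):
    """Element-wise sum of g x g adjacency matrices (all zeros if none given)."""
    return [[sum(m[i][j] for m in matrices) for j in range(g)] for i in range(g)]


def group_networks(participants, true_abilities, g, cutoff=0.70):
    """participants: list of (ability_judgements, adjacency_matrix) pairs."""
    high, low = [], []
    for judgements, matrix in participants:
        rho = spearman(judgements, true_abilities)
        (high if rho >= cutoff else low).append(matrix)
    return pool(high, g), pool(low, g)


true_abilities = [1, 2, 3, 4, 5, 6]   # hypothetical empirical abilities of six students
participants = [
    ([10, 20, 30, 40, 50, 60], [[0, 1], [1, 0]]),   # perfect rank agreement
    ([60, 50, 40, 30, 20, 10], [[0, 2], [0, 0]]),   # reversed ranking
]
high_net, low_net = group_networks(participants, true_abilities, g=2)
print(high_net, low_net)   # [[0, 1], [1, 0]] [[0, 2], [0, 0]]
```

The 2 x 2 matrices only keep the example short; in the study each matrix would be 72 x 72.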

Exponential Random Graph Models

Generally speaking, any system with potential dependencies between observations can be cast as a network. Applying standard regression approaches, and thus ignoring these dependencies, violates the regression assumption that observations are independent and identically distributed, and can lead to biased estimates.

Exponential Random Graph Models (ERGMs) allow us to explicitly model these dependencies. The focus of ERGMs is to describe which dyadic, monadic, and higher-order mechanisms or covariates jointly lead to the observed structure [3]. ERGMs build on network statistics (e.g., the number of edges, the number of mutual ties (reciprocity), or centrality measures) and assign a probability to graphs according to these statistics:

$$P_\theta(N) = \frac{\exp\left(\theta^\top h(N)\right)}{\sum_{N' \in \mathcal{N}} \exp\left(\theta^\top h(N')\right)}$$

θ is a vector of coefficients for the network statistics h(N). The numerator is the exponential of the weighted sum of the statistics of the observed network N; the denominator sums the same quantity over the set 𝒩 of all possible networks on the same nodes, so that the probabilities of all possible networks add up to 1.
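For a small enough network, the normalizing sum can be computed by brute force, which makes the formula tangible. The sketch below is a toy, not how statnet estimates models: for realistic networks the sum over all possible networks is intractable, and the ergm package typically relies on MCMC-based estimation instead. It assumes a 3-node directed network with two statistics (edges and mutual dyads) and hypothetical coefficient values.

```python
import itertools
import math

NODES = range(3)
# all ordered pairs (i, j), i != j: the 6 possible directed ties
DYADS = [(i, j) for i in NODES for j in NODES if i != j]


def stats(network):
    """h(N) = (number of edges, number of mutual dyads)."""
    n_edges = len(network)
    n_mutual = sum(1 for (i, j) in network if i < j and (j, i) in network)
    return (n_edges, n_mutual)


def all_networks():
    """Every possible directed network on NODES (2**6 = 64 of them)."""
    for mask in itertools.product([0, 1], repeat=len(DYADS)):
        yield frozenset(d for d, on in zip(DYADS, mask) if on)


def score(network, theta):
    """exp(theta^T h(N)), the unnormalized weight of a network."""
    return math.exp(sum(t * h for t, h in zip(theta, stats(network))))


def ergm_prob(network, theta):
    kappa = sum(score(n, theta) for n in all_networks())  # normalizing constant
    return score(network, theta) / kappa


theta = (-1.0, 2.0)  # hypothetical: ties are rare, but reciprocated ties are favored
reciprocated = frozenset({(0, 1), (1, 0)})
one_way = frozenset({(0, 1), (1, 2)})
print(ergm_prob(reciprocated, theta) > ergm_prob(one_way, theta))  # True
```

Both example networks have two edges, so the difference in probability comes entirely from the mutual term. With θ = (0, 0), every one of the 64 networks would get probability 1/64.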

The output of an ERGM, the estimated θ coefficients, can be interpreted analogously to the output of (logistic) regression models. A positive and significant coefficient for a structural feature like reciprocity means that this feature occurs more often than would be expected by chance, given the other terms in the model [4]. Coefficients for node attributes (covariates) are interpreted differently: the conditional log-odds of an edge connecting two nodes i and j increase by the product of the coefficient and the sum of the covariate values of the two nodes.
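The conditional log-odds interpretation can be written with change statistics: the log-odds of a tie from i to j, given the rest of the network, equal the sum over all terms of each coefficient times the change in that term’s statistic when the tie is added. The sketch assumes a model with just three terms (edges, mutual, and a numeric node covariate x); the coefficients and covariate values are made up.

```python
import math


def change_stats(i, j, network, x):
    """Change in each statistic when the tie i -> j is added to the network."""
    d_edges = 1                                # one more edge
    d_mutual = 1 if (j, i) in network else 0   # completes a mutual dyad if j -> i exists
    d_cov = x[i] + x[j]                        # numeric covariate: sum of both nodes' values
    return (d_edges, d_mutual, d_cov)


def conditional_log_odds(i, j, network, x, theta):
    """Log-odds of i -> j given the rest of the network: theta . delta."""
    return sum(t * d for t, d in zip(theta, change_stats(i, j, network, x)))


def conditional_prob(i, j, network, x, theta):
    return 1 / (1 + math.exp(-conditional_log_odds(i, j, network, x, theta)))


theta = (-2.0, 1.5, 0.5)   # hypothetical: edges, mutual, covariate
x = {0: 1.0, 1: 2.0}       # hypothetical covariate values
print(conditional_log_odds(0, 1, {(1, 0)}, x, theta))         # 1.0
print(round(conditional_prob(0, 1, {(1, 0)}, x, theta), 3))   # 0.731
print(conditional_log_odds(0, 1, set(), x, theta))            # -0.5
```

The covariate term contributes 0.5 × (1.0 + 2.0) = 1.5 in both cases, which is the "coefficient times the sum of covariate values" reading from the text.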

Building the models

We fitted the ERGMs separately for the two groups of participants with high and low judgement accuracy, using the R package statnet [5]. The variables and coefficient estimates are shown in the table below. The two models are specified identically, i.e., they contain the same variables. Nevertheless, coefficients can only be interpreted within each model: we did not test whether coefficients for the same variable differ significantly between the two models.

Results

Keep in mind: the standard ERGMs used here can only handle binary edges, i.e., a tie either exists or does not exist. Since the two group networks started out with weighted edges, we lose a certain amount of information. The results of the two models are presented in the table below.
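Dichotomizing a weighted group network is a one-liner. The sketch treats every transition observed at least once as a tie; this threshold is an assumption, and other cut-offs are possible. A transition chosen five times and one chosen once end up identical, which is exactly the information that gets lost.

```python
def dichotomize(matrix, threshold=0):
    """Binary adjacency matrix: 1 where the weight exceeds `threshold`, else 0."""
    return [[1 if w > threshold else 0 for w in row] for row in matrix]


weighted = [
    [0, 5, 1],
    [2, 0, 0],
    [0, 1, 0],
]
print(dichotomize(weighted))  # [[0, 1, 1], [1, 0, 0], [0, 1, 0]]
```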

Influence by Design

Because of the way the Simulated Classroom is designed, participants can only choose from a finite set of actions, and items and students have to be chosen alternately. Unless participants choose items and students completely at random, this design by itself produces a focus on particular students and/or items. The results show a clear tendency in both groups to assign one item to multiple students (Same Item) as well as to focus on one student and assign different items to this student (Same Student). However, these tendencies are likely influenced by the design of the Simulated Classroom, i.e., by the requirement to choose items and students alternately, and do not purely reflect meaningful cognitive selection processes.
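The Same Item and Same Student tendencies rest on counting ties whose two end nodes share an attribute. The sketch below shows such a count for nodes defined as (item, student) pairs. It resembles the nodematch term of the ergm package in spirit, but how these terms were specified in the original models is an assumption here, and the click sequence is invented.

```python
def transitions(sequence):
    """Binary directed ties between consecutive nodes of a clickstream."""
    return set(zip(sequence, sequence[1:]))


def same_attribute_ties(edges, attr):
    """Number of directed ties i -> j whose end nodes share attr(i) == attr(j)."""
    return sum(1 for i, j in edges if attr(i) == attr(j))


def item_of(node):
    return node[0]


def student_of(node):
    return node[1]


# hypothetical sequence of (item, student) assignments
clicks = [(1, 1), (1, 2), (1, 3), (2, 3), (5, 3)]
ties = transitions(clicks)
print(same_attribute_ties(ties, item_of))     # 2: item 1 given to students 1, 2, 3
print(same_attribute_ties(ties, student_of))  # 2: student 3 receives items 1, 2, 5
```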

Boys or Girls?

The result for student gender points in a similar direction. Participants in both groups appear to assign items to boys and girls alternately (Same Gender). This could reflect the seating arrangement in the Simulated Classroom, where the gender attribute alternates from seat to seat, combined with a tendency to choose, one after another, students who simply sit next to each other.

Rich Kid Poor Kid

Assigning items in a row to students with the same social status (Same Social Status) could mean that participants wanted to contrast those students. Alternatively, it can result from assigning items to students who sit next to each other: the students in the Simulated Classroom are arranged so that the three students with high social status and the three with low social status each sit next to each other. A significant tendency to assign items to students with the same social status can therefore be interpreted as a pseudo-effect.

Seating arrangement

Participants who move along the seating order in the Simulated Classroom automatically ask consecutive students who share the same social status. This reflects a behavioral pattern rather than cognitive processes that actively make use of the social status information. Compared to the high judgement accuracy group, participants in the low judgement accuracy group show a significant tendency to follow some kind of assessment routine that moves along the seating order of the students, and their assessment behavior is less differentiated.
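A small simulation shows why the seating arrangement alone can produce these similarity effects. The layout is a simplified, hypothetical reading of the description (genders alternate from seat to seat; three high-status students sit next to each other, followed by three low-status students), and the simulated participant simply asks the students in seating order without looking at any attribute.

```python
SEATS = [
    {"gender": "f", "status": "high"},
    {"gender": "m", "status": "high"},
    {"gender": "f", "status": "high"},
    {"gender": "m", "status": "low"},
    {"gender": "f", "status": "low"},
    {"gender": "m", "status": "low"},
]


def share_of_matches(path, key):
    """Share of consecutive student pairs on `path` that agree on `key`."""
    pairs = list(zip(path, path[1:]))
    return sum(SEATS[a][key] == SEATS[b][key] for a, b in pairs) / len(pairs)


walk = list(range(len(SEATS)))            # ask students seat by seat: 0, 1, ..., 5
print(share_of_matches(walk, "status"))   # 0.8 (4 of 5 transitions)
print(share_of_matches(walk, "gender"))   # 0.0 (genders always alternate)
```

If the next student were instead picked uniformly at random from the other five, both shares would be expected to be 2/5 = 0.4, since in this layout each student shares gender and status with exactly two of the other five. Walking along the seats therefore inflates same-status transitions and suppresses same-gender transitions without any reference to either attribute.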

What we learned so far

Participants in both groups show no tendency to focus more on boys vs. girls or on students with high vs. low social status. Furthermore, no simulated student is particularly popular, i.e., no student is asked more questions than the others. In general, student characteristics do not seem to have a relevant influence on the judgement processes of participants in either group in this Simulated Classroom setting. These findings support the interpretation that the similarity effects for gender and social status reflect behavioral tendencies rather than purposeful selection processes.

But what about the items?

So far, the results mainly describe overall tendencies to use student characteristics and general assessment strategies. By including item terms, we can explore whether participants use different items after accounting for these student characteristics and assessment strategies. The results for the high and low judgement accuracy groups show clear differences in how items are chosen to assess student abilities.

The model for the high judgement accuracy group shows a significant negative effect for ten out of eleven items compared to the baseline, item 2, which has a medium empirical difficulty. The other items cover both low and rather high empirical difficulties. This means that, after controlling for student characteristics and assessment strategies, participants in the high judgement accuracy group have a slight preference for items of medium empirical difficulty over items of either high or low empirical difficulty.
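The phrase "compared to the baseline of item 2" implies treatment (dummy) coding of the items. The sketch below assumes nodefactor-style item terms, where each endpoint of a tie adds the coefficient of its item to the tie’s conditional log-odds and the baseline item 2 contributes 0. This specification and the coefficient values are assumptions for illustration, not estimates from the study.

```python
ITEMS = range(1, 13)   # items numbered 1..12 (hypothetical numbering)
BASELINE = 2           # reference category, as in the text


def item_dummies(item):
    """Treatment coding: one 0/1 indicator per non-baseline item (11 in total)."""
    return [1 if item == k else 0 for k in ITEMS if k != BASELINE]


def item_log_odds(item_i, item_j, theta):
    """Contribution of the item terms to the conditional log-odds of a tie i -> j:
    each endpoint adds the coefficient of its item (0 for the baseline item)."""
    counts = [a + b for a, b in zip(item_dummies(item_i), item_dummies(item_j))]
    return sum(t * c for t, c in zip(theta, counts))


theta = [-0.5] * 11   # hypothetical coefficients, not results from the study
print(item_log_odds(2, 2, theta))  # 0.0: both endpoints use the baseline item
print(item_log_odds(5, 2, theta))  # -0.5
print(item_log_odds(5, 5, theta))  # -1.0
```

Under this coding, a negative coefficient for an item means that ties involving that item are less likely than ties involving item 2, with all other terms held constant.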

Participants in the low judgement accuracy group do not show differentiated item selection. In the model for this group, none of the item coefficients is significantly different from 0, i.e., there is no evidence that some items are chosen more often than others.

Wrapping it up

By using ERGMs, we can account simultaneously, in one model, for content variables like item or student attributes and for behavioral variables resulting from the participants’ interactions with the Simulated Classroom. This helps us better understand how much the design of the digital content (in this case the Simulated Classroom) influences the way people interact with it. The results from this study certainly need to be validated, since the binary ERGMs used here cannot incorporate all relevant information, most notably the edge weights.

In a sequel article I talk more about technical aspects and the entire workflow from data preparation to model evaluation with code in R.

References

[1] L. Guttman, The basis for scalogram analysis (1950), In S.A. Stouffer (Ed.), The American soldier. Studies in social psychology in World War II, Princeton: Princeton University Press

[2] S. Wasserman & K. Faust, Social network analysis: Methods and applications (2018), Cambridge University Press

[3] G. Robins, P. Pattison, Y. Kalish & D. Lusher, An introduction to exponential random graph (p*) models for social networks (2007), Social Networks

[4] M. Zhu, Z. Shu & A. A. von Davier, Using networks to visualize and analyze process data for educational assessment (2016), Journal of Educational Measurement

[5] M. S. Handcock, D. R. Hunter, C. T. Butts, S. M. Goodreau & M. Morris, statnet: Software tools for the representation, visualization, analysis and simulation of network data (2008), Journal of Statistical Software