As a precursor to looking at withdrawal and drop-out points, I wanted to visualise learner activity across an academic year. This could help determine the patterns of behaviour for different individuals, the idea being that you can see at a glance which days each learner is using the system. Extending last week’s exploration with activity heatmaps, I came across the lattice-based time series calendar heatmap, which provides a nice way of plotting this for exploration. It is quite a simple process that requires some date manipulation to create extra calendar classifications. I then changed the facet to show each row as a different user rather than a year.
In the calendar heatmap each row is a learner and the grid shows the spread of daily activity across each month in a familiar calendar format. The visualisation quickly reveals patterns such as activity tailing off in the final months for Student4, the extended Easter holiday during April for Student8, and the late starter or crammer that is Student10. A couple of students also broke the usual Christmas learning abstinence and logged in during the holidays. A few variants are possible by playing with the facet or applying it to different summaries of the log data, for example a facet on activity types within a course, or activity participation for a single learner. I may explore these in future.
How to guide
The following shares the code used to produce the above visualisations and should work with recent Moodle versions.
Step 1: Data Extraction
This uses the same log data extraction as last week, although it actually only needs a user identifier and a date. This makes the process easy to repeat for other time series data sets outside of Moodle logs, or to compare multiple systems.
Step 2: Data Wrangling
Load the libraries and files as before.
require(quantmod)
require(ggplot2)
require(reshape2)
require(scales)
require(dplyr)
require(tidyr)
require(magrittr)
require(RColorBrewer)

setwd("/home//james/infiniter/data/")
mdl_log = read.csv(file = "mdl_log.csv", header = TRUE, sep = ",")

cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
getPalette = colorRampPalette(brewer.pal(9, "Paired"))
This analysis required an extension to the date manipulations, and as I couldn’t get the week of month to work as in the original guide, I used an alternative method. First I created a function to calculate the week of the year from a date.
wk <- function(x) as.numeric(format(x, "%U"));
Then I ran my usual first set of date manipulations.
### Create a POSIX time from timestamp
mdl_log$time <- as.POSIXlt(mdl_log$timecreated, tz = "Australia/Sydney", origin="1970-01-01")
mdl_log$day <- mdl_log$time$mday
mdl_log$month <- mdl_log$time$mon+1 # month of year (zero-indexed)
mdl_log$year <- mdl_log$time$year+1900 # years since 1900
mdl_log$hour <- mdl_log$time$hour
mdl_log$date <- as.Date(mdl_log$DateTime)
mdl_log$weekyr <- format(mdl_log$date, '%Y-%U')
mdl_log$mon <- format(mdl_log$date, "%b")
mdl_log$dts <- as.POSIXct(mdl_log$date)
mdl_log$dts_str <- interaction(mdl_log$day,mdl_log$month,mdl_log$year,mdl_log$hour,sep='_')
mdl_log$dts_hour <- strptime(mdl_log$dts_str, "%d_%m_%Y_%H")
mdl_log$dts_hour <- as.POSIXct(mdl_log$dts_hour)
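As a quick sanity check of the timestamp conversion, here is a minimal sketch using a single made-up timestamp (1420070400, which is midnight UTC on 1 January 2015 and is not taken from the log data). It shows why the month and year fields need the +1 and +1900 adjustments, and that the hour is reported in Sydney local time.

```r
# A single illustrative Unix timestamp (hypothetical, not from the log data)
ts <- 1420070400  # 2015-01-01 00:00:00 UTC

lt <- as.POSIXlt(ts, tz = "Australia/Sydney", origin = "1970-01-01")

lt$mday         # day of the month
lt$mon + 1      # POSIXlt months run 0-11, so add 1
lt$year + 1900  # POSIXlt years count from 1900
lt$hour         # local hour in Sydney, not UTC
```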
I then extended the date fields to create the year, month, week and day elements for the calendar by applying the guide to the log data. The basic idea is to create some human-readable factors for days and months, and then to create a week of the month normalised so each month starts with week 1. I reordered the months to follow the academic calendar of the data to see the patterns more clearly in context.
mdl_log$monthf<-factor(mdl_log$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
mdl_log$weekday = as.POSIXlt(mdl_log$date)$wday
mdl_log$weekdayf<-factor(mdl_log$weekday,levels=rev(0:6),labels=rev(c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")),ordered=TRUE)
mdl_log$yearmonth<-as.yearmon(mdl_log$date)
mdl_log$yearmonthf<-factor(mdl_log$yearmonth)
# then find the "week of year" for each day
mdl_log$week <- as.numeric(format(mdl_log$date,"%W"))
# and now for each monthblock we normalize the week to start at 1
mdl_log$monthweek <- (wk(mdl_log$date) - wk(as.Date(cut(mdl_log$date, "month"))) + 1)
mdl_log$monthf<- factor(mdl_log$monthf, levels = c("Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun","Jul"))
mdl_log$yearmonth<-format(mdl_log$yearmonth, "%Y-%m")
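To make the week-of-month normalisation concrete, here is a small worked sketch on a handful of hypothetical dates (chosen for illustration, not from the log data). It repeats the same calculation: the week of the year for each date, minus the week of the year for the first day of its month, plus one.

```r
# Week of the year, with Sunday as the first day of the week
wk <- function(x) as.numeric(format(x, "%U"))

# A few illustrative dates (hypothetical, not from the log data)
dates <- as.Date(c("2015-03-01", "2015-03-17", "2015-04-01", "2015-04-05"))

# First day of the month each date falls in
month_start <- as.Date(cut(dates, "month"))

# Week of the month: offset from the week containing the 1st, starting at 1
monthweek <- wk(dates) - wk(month_start) + 1

data.frame(date = dates, weekday = format(dates, "%a"), monthweek)
```

Here 1 April 2015 is a Wednesday, so week 1 of April covers only Wednesday to Saturday and Sunday 5 April starts week 2. Short first weeks like this are expected with this method.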
Finally, use the group_by function to tidy the data into an appropriate format and sum the total activity across the different date windows: year, month, week and day.
d <- tbl_df(mdl_log)
d %<>%
  mutate(time = as.POSIXct(time)) %>%
  mutate(year = as.factor(year))
user_grid <- group_by(d, username, year, monthf, monthweek, weekdayf) %>%
  summarise(total = n())
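The aggregation is easier to see on a toy example. The sketch below uses a made-up four-row log (the usernames and dates are hypothetical) and counts events per user per day. It assumes dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical toy log: one row per logged event
toy_log <- data.frame(
  username = c("StudentA", "StudentA", "StudentA", "StudentB"),
  date     = as.Date(c("2015-03-02", "2015-03-02", "2015-03-03", "2015-03-02"))
)

# Count events per user per day
toy_grid <- toy_log %>%
  group_by(username, date) %>%
  summarise(total = n(), .groups = "drop")

toy_grid
```

Each row of the result is one tile in the heatmap, with `total` driving the fill colour.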
Step 3: Data Visualisation
The visualisation works best with up to 15 learners, so larger classes will need splitting across multiple visualisations. I filtered to a sample of the 10 most active users, which remained readable at a smaller resolution. You can continue slicing the user rows to create further sets.
users <- group_by(d, username) %>%
  summarise(total = n()) %>%
  arrange(desc(total))
u.10 <- users[1:10,]
sample <- user_grid %>% filter(username %in% u.10$username)
u.20 <- users[11:20,]
sample2 <- user_grid %>% filter(username %in% u.20$username)
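For larger classes, slicing by hand gets tedious. One possible approach, sketched here in base R with hypothetical usernames and an assumed set size of 10, is to assign each user a set number and split on it.

```r
# Hypothetical usernames, already ordered from most to least active
usernames <- paste0("Student", 1:25)

set_size <- 10

# Assign each user a set number: 1 for the first 10, 2 for the next 10, ...
set_id <- ceiling(seq_along(usernames) / set_size)

# A list with one character vector of usernames per set
user_sets <- split(usernames, set_id)

lengths(user_sets)
```

Each element of `user_sets` could then stand in for `u.10$username` in the filter, for example `user_grid %>% filter(username %in% user_sets[[3]])`.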
And finally plot each set using ggplot with a facet grid on user and month.
ggplot(sample, aes(monthweek, weekdayf, fill = total)) +
  geom_tile(colour = "white") +
  facet_grid(username~monthf) +
  scale_fill_gradient(low="lightsteelblue", high="steelblue") +
  xlab("Week of Month") +
  ylab("")
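Since the same plot is repeated for each set of users, it may be convenient to wrap it in a function. The following is a minimal sketch: `plot_calendar` is a hypothetical helper name, and the tiny `demo_grid` is made-up data that only exists to show the function runs.

```r
library(ggplot2)

# Hypothetical helper wrapping the calendar heatmap; expects the columns
# monthweek, weekdayf, total, username and monthf, as in user_grid
plot_calendar <- function(grid) {
  ggplot(grid, aes(monthweek, weekdayf, fill = total)) +
    geom_tile(colour = "white") +
    facet_grid(username ~ monthf) +
    scale_fill_gradient(low = "lightsteelblue", high = "steelblue") +
    xlab("Week of Month") +
    ylab("")
}

# Tiny made-up grid, only to show the helper runs
demo_grid <- data.frame(
  username  = c("StudentA", "StudentA", "StudentB"),
  monthf    = factor(c("Aug", "Sep", "Aug"), levels = c("Aug", "Sep")),
  monthweek = c(1, 2, 1),
  weekdayf  = factor(c("Mon", "Tue", "Mon"), levels = rev(c("Mon", "Tue"))),
  total     = c(3, 1, 5)
)

p <- plot_calendar(demo_grid)
```

With the real data, the second set of learners would then be plotted with `plot_calendar(sample2)`.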