Scratching the surface: Moodle analytics in Rstudio Part 1 #moodle #learninganalytics

At some point I always come back to the question of how do we understand use of the VLE/LMS, which I’ve theorised a lot. As part of an interest to learn about Data Science I’ve signed up to Sliderule (@MySlideRule) and am being mentored through a capstone project with some Moodle data. The main goal is for me to learn R, which I’d never touched until 2 weeks ago, but hopefully the data can tell me something about Moodle at the same time. Feedback or advise on techniques is welcomed.

Exploratory Data Analysis on mdl_logstore_standard

For this part I am going to focus on producing some simple two-dimensional analysis. This assumes you have MySQL access to your Moodle database and RStudio.

Daily logins

Hourly access

Module use

Day of week

Frequency distribution

Activity distribution

Step 1. Data Extraction

I started with a full data extraction of all events in the system to a CSV file (mdl_logstore_standard_log.csv).

SELECT c.fullname as courseName,
FROM_UNIXTIME(l.timecreated) as DateTime, l.*
FROM mdl_logstore_standard_log l
LEFT JOIN mdl_course c ON l.courseid = c.id
WHERE origin = 'web'

Step 2. Data Wrangling

In order to do time series analysis the data needs some reformatting.

Install necessary R packages

library(ggplot2)
require(scales)
library(dplyr)
library(tidyr)
library(magrittr)
library(ggthemes)

Set your working folder to the directory with your CSV

setwd("./Documents/")

Import the CSV file

mdl_log <- read.csv('mdl_logstore_standard_log.csv')

Create a POSIXlt time field and break down into day, month, year, and hour components

mdl_log$time <- as.POSIXlt(mdl_log$timecreated,
 tz = "Australia/Sydney", origin="1970-01-01")
mdl_log$day <- mdl_log$time$mday
mdl_log$month <- mdl_log$time$mon+1 # month of year (zero-indexed)
mdl_log$year <- mdl_log$time$year+1900 # years since 1900
mdl_log$hour <- mdl_log$time$hour

Create a date format field and break down into week component

mdl_log$date <- as.Date(mdl_log$DateTime)
mdl_log$week <- format(mdl_log$date, '%Y-%U')

Create a timestamp version for the day for daily time series

mdl_log$dts <- as.POSIXct(mdl_log$date)

Create a timestamp version of hour for hourly time series

mdl_log$dts_str <- interaction(mdl_log$day,mdl_log$month,mdl_log$year,mdl_log$hour,sep='_')
mdl_log$dts_hour <- strptime(mdl_log$dts_str, "%d_%m_%Y_%H")
mdl_log$dts_hour <- as.POSIXct(mdl_log$dts_hour)

Filter to participation education level events for 2015

mdl_2015 <- subset(mdl_log, year == 2015)
participation_2015 <- subset(mdl_2015, edulevel %in% c('2'))

Create the dplyr data table

d <- tbl_df(mdl_log)
d %>% mutate(time = as.POSIXct(time))

Step 3. Data Visualisation

Daily activity data

Create the daily aggregation

daily <- group_by(d, userid, dts) %>% summarise(Total = n())

Create a day of the week factor

daily$dow = as.factor(format(daily$dts, format="%a"))

Plot the day of the week breakdown

I’m using a really small sample size here, which creates similar results across all days, however most real populations have a degree of variance.

ggplot(daily, aes(dow, Total)) +
geom_boxplot(aes(fill=dow)) +
scale_x_discrete(limits=c('Mon','Tue','Wed','Thu','Fri','Sat','Sun')) +
theme_few() +
xlab('Day of week') + ylab('User activity frequency') +
guides(fill=FALSE)

Hourly data

Create the hourly data aggregation

hourly <- group_by(d, dts_hour) %>% summarise(Total = n())

Create the hour of day factors

hourly$dow = as.factor(format(hourly$dts, format="%a"))
hourly$hr = format(hourly$dts_hour, format="%H")

Separate weekends

hourly$weekend = 'weekday'
hourly[hourly$dow=='Sat'|hourly$dow=='Sun',]$weekend = 'weekend'

Plot the hour of the day breakdown

Gives an idea of the spread of activity throughout the day and indicates people sleep in on weekends.

Hourly access

ggplot(hourly, aes(hr,Total)) +
geom_boxplot(aes(fill=weekend)) +
geom_smooth(aes(group=weekend)) +
scale_fill_manual(values=cbPalette) +
xlab('Hour of day') + ylab('Daily activity frequency')

Distinct user data

Create the distinct user aggregation

udaily <- group_by(d, dts) %>% summarise(users = n_distinct(userid))

Plot the daily logins data

Gives an idea of user logins per day – can be analysed as a percentage of your total user base.

logins

ggplot(udaily, aes(dts, users)) +
geom_bar(stat="identity", fill="#60B3CE") +
scale_x_datetime(breaks = date_breaks("1 week"),
minor_breaks = date_breaks("1 day"),
labels = date_format("%d-%b-%y")) +
scale_color_manual(values=cbPalette) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x="Day", y="Distinct number of users")

Activity distribution

Create the user activity aggregation

utotal <- group_by(d, userid) %>% summarise(Total = n())

Plot the activity distribution

Provides a sorted list of activity to see the nature of activity distribution for users. A small number of users creating a large number of events.

Activity by user

ggplot(utotal, aes(reorder(userid, Total), Total)) +
geom_point(alpha=0.5, color = "#FF8300") +
scale_x_discrete(breaks=NULL) +
xlab('User') + ylab('Total activity')

Plot the frequency distribution of activity per user

Provides a histogram of the above plot confirming the skew of the data to the majority being low active users and a minority creating a large number of events. This may be concerning in an educational context and warrants further analysis (in a future post).

Frequency

ggplot(utotal, aes(Total)) +
geom_histogram(binwidth=10, fill="#60B3CE") +
xlab('Total activity') + ylab('Frequency of users')

Module use

Create the component aggregation

component <- group_by(d, userid, component) %>%
 summarise(Total = n())

Plot the module use 

This gives an idea of the spread of tools being used within course or learning design. Quality is more important than quantity here but this might be a useful springboard into further analysis.

Module use

ggplot(subset(component, component != 'core'),
aes(x = component, y = Total)) +
geom_bar(stat="identity", fill="#FF8300") +
coord_polar(theta = "x") +
labs(x="Component", y="Total number of events") +
scale_fill_manual(values=cbPalette)

I’ve used a polar version of the bar chart, but you can also get a bar chart view of this with the following:

ggplot(component, aes(x = component, Total)) +
geom_bar(stat="identity", fill="#60B3CE") +
coord_flip() +
labs(x="Component", y="Total number of events") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Advertisements

10 thoughts on “Scratching the surface: Moodle analytics in Rstudio Part 1 #moodle #learninganalytics

  1. This is great. Source available (on github)?

    Like

  2. Missing ‘,’ in `labs(x=”Day” y=”Distinct number of users”)`?

    Like

  3. I had to remove yearLevel as it doesn’t appear in my logs.

    Like

    • Yeah, this was a custom profile field from the specific project that I was working and I forgot to redact from the general approach here. Updated now.

      Like

  4. I also had to comment a line out for it to compile. Here’s my modified code

    #Error in strptime(mdl_log$date) : argument “format” is missing, with no default
    #mdl_log$dts <- strptime(mdl_log$date)

    Like

Share your feedback

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s