When are they learning? A #Moodle activity calendar heatmap

As a precursor to looking at withdrawal and drop-out points, I wanted to visualise learner activity across an academic year. This could help determine the patterns of behaviour for different individuals, the idea being that you can see at a glance which days each learner is using the system. Extending last week’s exploration of activity heatmaps, I came across the lattice-based time series calendar heatmap, which provides a nice way of plotting this for exploration. It is quite a simple process that requires some date manipulation to create extra calendar classifications. I then changed the facet to show each row as a different user rather than a year.

Calendar Heatmap


In the calendar heatmap each row is a learner and the grid shows the spread of daily activity across each month in a familiar calendar format. The visualisation quickly reveals patterns such as activity tailing off in the final months for Student4, the extended Easter holiday during April for Student8, and the late starter or crammer that is Student10. A couple of students also broke the usual Christmas learning abstinence and logged in during the holidays. A few variants are possible by playing with the facet or by applying it to different summaries of the log data, for example a facet on activity types within a course, or activity participation for a single learner. I may explore these in future.
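The "extra calendar classifications" mentioned above can be illustrated with a minimal sketch. It is written in Python for illustration rather than the lattice-based plotting used for the figure, and the function names `calendar_position` and `daily_counts` are hypothetical. It assumes weeks start on Monday and that the log has already been reduced to (user, date) pairs.

```python
from datetime import date


def calendar_position(d: date) -> dict:
    """Return the grid coordinates of one day in a month-by-month calendar layout."""
    first_weekday = d.replace(day=1).weekday()  # Monday = 0 ... Sunday = 6
    return {
        "year": d.year,
        "month": d.month,
        "weekday": d.weekday(),  # position within the week
        "week_of_month": (d.day - 1 + first_weekday) // 7,  # 0-based week row
    }


def daily_counts(events):
    """Count events per (user, day) from an iterable of (userid, date) pairs."""
    counts = {}
    for userid, day in events:
        counts[(userid, day)] = counts.get((userid, day), 0) + 1
    return counts
```

Each (user, day) count can then be drawn as one cell, with the user as the facet row, `month` as the panel, and `weekday` and `week_of_month` as the position inside the panel.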


Where is your learning activity? A #Moodle component heatmap.

Understanding which courses use which tools is a useful starting point for exploration and may be informative to staff development programs or used in conjunction with course observations. The Moodle Guide for Teachers, for example, could be used to help form an understanding of the tools in question. I’m interested in the exploration side, having started a new project in the last week with some former colleagues. We’re exploring what can be learned from learner data and so if I know where different types of activity are happening then I can drill-down into these areas.

I’m using an idea I picked up from Flowing Data to create a heat map of tool use in a Moodle LMS by category. The heat map visualisation sits nicely with the existing tool guide, so it seems a good approach. The Moodle site has recently been upgraded, so the dataset has both old-style logs (mdl_log) and new-style logs (mdl_logstore_standard_log), and the data extraction and wrangling has to account for both formats. Then it is a case of manipulating the data into the heat map format.
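As a rough illustration of accounting for both log formats, the sketch below maps rows from each table onto one common shape before counting events per heat map cell. It is written in Python as an assumption-laden example, not the extraction code used for the figure. The column names (`course`, `module`, `time` in mdl_log; `courseid`, `component`, `timecreated` in mdl_logstore_standard_log) follow the standard Moodle schema. The helper names and the `category_of` lookup from course id to category are hypothetical.

```python
from collections import Counter


def normalise_legacy(row):
    """Map an old-style mdl_log row onto a common (course, tool, time) shape."""
    return {"course": row["course"], "tool": row["module"], "time": row["time"]}


def normalise_standard(row):
    """Map an mdl_logstore_standard_log row onto the same shape.

    Activity components look like 'mod_forum'; the prefix is stripped so the
    tool name lines up with the legacy 'module' column.
    """
    component = row["component"]
    tool = component[4:] if component.startswith("mod_") else component
    return {"course": row["courseid"], "tool": tool, "time": row["timecreated"]}


def heatmap_counts(legacy_rows, standard_rows, category_of):
    """Count events per (category, tool) cell across both log formats."""
    events = [normalise_legacy(r) for r in legacy_rows]
    events += [normalise_standard(r) for r in standard_rows]
    cells = Counter()
    for event in events:
        cells[(category_of[event["course"]], event["tool"])] += 1
    return cells
```

The resulting counter has one entry per cell of the heat map. Scaling or log-transforming the counts before plotting is a separate choice and is not shown here.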

Heat Map


I’ve focused on learner activity within each type of tool rather than the number of tools in a course. The intention is to show the distribution of learner activity. It shows clearly the dominance of resource and assessment type tools, as well as some pockets of communication and collaboration. In this instance the values are skewed by the large number of resource based activities and the dominance of a single department in terms of activity numbers, which can be seen in the bar chart below. However, the technique can be applied to comparing courses within a department or comparing users within a course, which may share more similar scales.



Assignment engagement timeline – starting with basics @salvetore #mootau15 #moodle #learninganalytics

Having joined the assessment analytics working group for Moodle Moot AU this year, I thought I’d have a play around with the feedback event data and its relation to future assignments. The simplified assumption to explore is that learners who view their feedback are enabled to perform better in subsequent assignments, which may be a reduction of potentially more complex ‘distance travelled’ style analytics. To get started exploring the data I have produced a simple timeline that shows the frequency of assignment views within a course, based on the following identified status of the submission:

  1. Pre-submission includes activities when the learner is preparing a submission
  2. Submitted includes views after submission but before receiving feedback (possibly anxious about results)
  3. Graded includes feedback views once the assignment is graded
  4. Resubmission includes activities that involve the learner resubmitting work if allowed

The process I undertook was to sort the log data into user sequences and use a function to set the status based on preceding events. For example, once the grade is released then count subsequent views as ‘graded’. This gives an idea of the spread and frequency of assignment engagement.
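The status-setting step described above can be sketched as a small state machine over each user's event sequence. This is a minimal Python illustration under stated assumptions, not the original function: the action labels `view`, `submit` and `grade` are simplified placeholders for whatever assignment events the logs actually contain, and any submission made after grading is treated as a resubmission.

```python
def label_views(events):
    """Label each assignment view with the submission status at that moment.

    events: iterable of (userid, time, action), where action is one of the
    placeholder labels 'view', 'submit' or 'grade'.
    Returns a list of (userid, time, status) for the view events only.
    """
    status = {}  # current status per user
    labelled = []
    # Sort into per-user sequences ordered by time.
    for userid, time, action in sorted(events, key=lambda e: (e[0], e[1])):
        current = status.get(userid, "pre-submission")
        if action == "submit":
            already_graded = current in ("graded", "resubmission")
            status[userid] = "resubmission" if already_graded else "submitted"
        elif action == "grade":
            status[userid] = "graded"
        elif action == "view":
            labelled.append((userid, time, current))
    return labelled
```

Counting the labelled views per user and day then gives the point sizes for the timeline.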



The timeline uses days on the x-axis and users on the y-axis. Each point plotted represents when events were logged for each learner, coloured by the status and sized according to the frequency on that day. There are a few noticeable vertical blue lines which correspond to feedback release dates (i.e. many learners view feedback immediately on its release), and you start to get an idea that some learners view feedback much more than others. The pattern of yellow points reveals learners who begin preparing for their assignment early, contrasted with those who cram a lot of activity closer to deadlines. I have zoomed into a subset of the learners below to help show this.


Having put this together quickly, I am hoping I will have some time to refine the visualisation to better identify some of the relationships between assignments. Having limited myself to event data in the logs thus far, I could also bring in some data from the assignment tables to enrich this. Some vertical bars showing deadlines, for example, might be helpful, or timelines for individual users with assignments on the y-axis to see how often users return to previous feedback across assignments, as shown below. Here you can see the very distinct line of a feedback release; for formative assessment, it may have been better learning design to release feedback more regularly and closer to the submission.



Inside forum posts – politics, networks, sentiment and words! Inspired by @phillipdawson, @shaned07, and @indicoData #moodle #learninganalytics

Enhanced communication has long been championed as a benefit of online learning environments, and many educational technology strategies will include statements around increased communication and collaboration between peers. So, in thinking towards an engagement metric for my current project, and given the need to get inside activities for my in-progress PhD proposal, exploring forum use is one of the more interesting analytics spaces within the LMS. I’ve used three techniques for my initial analysis: (1) a look at post and reply counts inspired by @phillipdawson and his work on the Moodle engagement block; (2) social network analysis inspired by a paper by @shaned07 on teacher support networks; and (3) sentiment and political view analysis provided by @indicoData as an introduction to text mining.

I’ll start with sharing the visualisations and where these might be useful and then finish with details of how I coded these.

Forum posts

Total weekly forum posts by student

Following Phillip Dawson’s work on the engagement block for Moodle, I decided to look into two posting patterns: (1) posts over time; and (2) average post word count. The over time analysis (above) compares the weekly posting pattern of each student in a group. For most students, replies to peers and teachers are “in phase”, suggesting that when they are active they discuss with the entire group, and so learning design might focus on keeping them active. One can also notice that those who only reply to peers appear to have much lower overall post activity, which in the original engagement block would place them at-risk – learning design may consider teacher-led interventions to understand whether discussions with the teacher impact their overall activity. The average word count analysis (below) reinforces the latter case, demonstrating that those who only reply to peers post infrequently and write shorter replies. Conversely, those who post infrequent lengthy posts tend to target the teacher and do not follow up with many further replies. There is some suggestion of an optimal word count around 75-125 for forum posts that might warrant further investigation.
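The two posting patterns can be approximated with a short sketch. This is an illustrative Python version under stated assumptions rather than the code behind the charts. It assumes each post is a dict with the `userid`, `created` (Unix timestamp) and `message` (HTML) fields found in mdl_forum_posts, groups posts by ISO week in UTC, and counts words after a crude tag strip. The function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone


def strip_html(message):
    """Crudely remove HTML tags so that only the visible words are counted."""
    return re.sub(r"<[^>]+>", " ", message)


def weekly_posts(posts):
    """Count posts per (userid, ISO year, ISO week)."""
    counts = defaultdict(int)
    for post in posts:
        iso = datetime.fromtimestamp(post["created"], tz=timezone.utc).isocalendar()
        counts[(post["userid"], iso[0], iso[1])] += 1
    return dict(counts)


def average_word_count(posts):
    """Return the mean number of words per post for each user."""
    words = defaultdict(int)
    totals = defaultdict(int)
    for post in posts:
        words[post["userid"]] += len(strip_html(post["message"]).split())
        totals[post["userid"]] += 1
    return {user: words[user] / totals[user] for user in totals}
```

Splitting the counts further by whether the parent post belongs to a teacher or a peer would need a join back to the parent post and the role assignments, which is left out here.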

Forum Posts

Social Network Analysis


The network diagram (above) confirms what was emerging in the post analysis: that a smaller core of students (yellow circles) are responsible for a majority of the posts. It further reveals the absolute centrality of the teacher (blue circle), which highlights how important teacher-led interventions may be to this group. This is probably not surprising, although the teacher may use this to consider how they might respond more equally to the group – here the number of replies is represented by increasing thickness of the grey edges, and they appear to favour conversations in the lower left of the network. A similar theme is explored by Shane Dawson (2010) in “‘Seeing’ the learning community”. One can understand this further by plotting eigenvector centrality against betweenness centrality (below), where a student with high betweenness and low eigenvector centrality may be an important gatekeeper to a central actor, while a student with low betweenness and high eigenvector centrality may have unique access to central actors.
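To make the two centrality measures concrete, here is a minimal standard-library Python sketch for a small undirected, unweighted reply network. It is an illustration under stated assumptions, not the analysis behind the diagram: reply counts (edge thickness) are ignored, eigenvector centrality is approximated by power iteration, and betweenness uses Brandes' algorithm without normalisation. The function names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (poster, replied_to) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def eigenvector_centrality(graph, iterations=100):
    """Approximate eigenvector centrality by power iteration, scaled so max = 1."""
    score = {n: 1.0 for n in graph}
    for _ in range(iterations):
        # Adding the node's own score (iterating A + I) keeps the same leading
        # eigenvector but avoids oscillation on bipartite graphs such as stars.
        new = {n: score[n] + sum(score[m] for m in graph[n]) for n in graph}
        top = max(new.values())
        score = {n: v / top for n, v in new.items()}
    return score


def betweenness_centrality(graph):
    """Unnormalised betweenness: shortest paths between other pairs through each node."""
    between = {n: 0.0 for n in graph}
    for source in graph:
        stack = []
        preds = {n: [] for n in graph}
        paths = {n: 0 for n in graph}
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in graph}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                between[w] += delta[w]
    # In an undirected graph every pair is counted from both ends.
    return {n: b / 2 for n, b in between.items()}
```

For example, in a network where a teacher replies to three students and one of those students is the only contact of a fourth, that student scores some betweenness as the gatekeeper while the other two score none.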


Content Analysis

Sentiment analysis

Text analysis of forums provides a necessary complement to the above analysis, exploring the content within the context. I have used the Indico API to aid my learning of this part of the field rather than try to build this from scratch. The sentiment analysis API determines whether a piece of text was positive or negative in tone and rates this on a scale from 0 (negative) to 1 (positive). Plotting this over time (above) provides insights into how different topics might have been received, with this group showing generally positive participation, although with two noticeable troughs that might be worth some further exploration. The political opinion API scores political leaning within a text on a scale of 0 (neutral) to 1 (strong). Plotting this for each user (first plot below) shows that more politicised posts tend to be conservative (unsurprising), although there is a reasonable mix of views across the discussion. What might be interesting here is how different students respond to different points of view and whether a largely conservative discussion, for example, might discourage contribution from others. Plotting sentiment against libertarian leaning (second plot below) shows that participants are, at least, very positive when leaning towards libertarian ideology, though this is not the only source of positivity.

Exploring text analysis is fascinating, and if projects such as Cognitive Presence Coding and the Quantitative Discourse Analysis Package make this more accessible then there are some potentially powerful insights to be had here. I had also hoped to analyse the number of external links embedded in posts, following a talk by Gardner Campbell I heard some years ago about making external connections of knowledge. However, the dataset I had yielded zero links, which, while informative to learning design, is not well represented in a visual (code is included below).

Political leaning

Libertarian sentiment


Learning logs: how long are your users online? Analytics Part 2 #moodle #learninganalytics

How long users spend on Moodle (or, more generally, e-learning) is another common question worth some initial exploration as part of my broader goal of working towards the notion of an engagement metric. This article discusses an approach to defining and obtaining insights from the idea of a session length for learning. This is mostly a data wrangling exercise to approximate the duration from event logs, which will tell us that while all events are born equal, some are more equal than others. The algorithm should prove useful when I progress to course breakdowns, in identifying particularly dedicated or struggling students who are investing larger amounts of time online, or those at-risk who aren’t spending enough. These questions are something I will return to in a future post as part of the project.

Learning Duration

This works on the same data as last week’s look at some basic distribution analysis which contains extraction SQL.
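One common way to approximate session length from event logs is to split each user's events wherever the gap between consecutive events exceeds a timeout. The Python sketch below illustrates that general idea and is not necessarily the algorithm used in the full post. The 30-minute `timeout` and the 60-second `last_event_allowance` (the final event of a session has no following event to measure against) are assumed values, and the function name is hypothetical.

```python
def session_durations(timestamps, timeout=1800, last_event_allowance=60):
    """Split one user's event times (in seconds) into sessions.

    A new session starts whenever the gap since the previous event exceeds
    `timeout`. Each duration is last event minus first event, plus a fixed
    allowance for the final event, whose true length is unknown.
    """
    durations = []
    start = previous = None
    for t in sorted(timestamps):
        if start is None:
            start = previous = t
        elif t - previous > timeout:
            durations.append(previous - start + last_event_allowance)
            start = previous = t
        else:
            previous = t
    if start is not None:
        durations.append(previous - start + last_event_allowance)
    return durations
```

Running this per user over the `timecreated` values gives one list of session lengths per learner, which can then be summed or plotted as a distribution.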

Event-duration Correlation


Duration distribution


Session spread



Scratching the surface: Moodle analytics in RStudio Part 1 #moodle #learninganalytics

At some point I always come back to the question of how we understand use of the VLE/LMS, which I’ve theorised about a lot. As part of an interest in learning about Data Science I’ve signed up to Sliderule (@MySlideRule) and am being mentored through a capstone project with some Moodle data. The main goal is for me to learn R, which I’d never touched until 2 weeks ago, but hopefully the data can tell me something about Moodle at the same time. Feedback or advice on techniques is welcome.

Exploratory Data Analysis on mdl_logstore_standard

For this part I am going to focus on producing some simple two-dimensional analysis. This assumes you have MySQL access to your Moodle database and RStudio.

Daily logins

Hourly access

Module use

Day of week

Frequency distribution

Activity distribution
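The simple two-dimensional summaries listed above mostly come down to counting log rows by a derived time field. The post itself works in R; the sketch below shows the same idea in Python purely as an illustration. It assumes rows shaped like mdl_logstore_standard_log with `userid`, `action` and `timecreated` (Unix timestamp) fields, treats rows with action `loggedin` as logins, and uses UTC rather than the site timezone. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def _when(row):
    """Convert a row's Unix timestamp into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(row["timecreated"], tz=timezone.utc)


def events_by_hour(rows):
    """Count events for each hour of the day (0-23)."""
    return Counter(_when(r).hour for r in rows)


def events_by_weekday(rows):
    """Count events for each day of the week (Monday = 0)."""
    return Counter(_when(r).weekday() for r in rows)


def daily_logins(rows):
    """Count distinct users with a login event on each calendar date."""
    users = {}
    for r in rows:
        if r["action"] == "loggedin":
            users.setdefault(_when(r).date(), set()).add(r["userid"])
    return {day: len(ids) for day, ids in users.items()}
```

Each result is a small mapping that can be passed straight to a bar or line plot.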


Communication Plans and Agile

At a recent meeting I was reminded of the need to map the informal communications of everyday collaboration with the more formal expectations of project management – as the adage goes: ‘two monologues do not make a dialogue’. On the other hand, two equally weighted discourses directed at the same referential object (or theme) must intersect and enter into a semantic bond (Bakhtin, 1984). While ideologically I might hope to encourage innovative collaboration through renouncement of monologic habits and primitive definitiveness, in a practical sense I needed to integrate the formal plan with dynamic feature delivery. For me, AGILE approaches better capture everyday collaboration while PRINCE2 better handles the formal project staging (taking the best bits of both).

The project, as most of mine do, involves the implementation of an integrated Moodle / Mahara platform enhanced with a range of customisations. So some features are delivered out of the box by the software and some require bespoke development. Rather than sharing spreadsheets and (un)versioned documents, we have implemented Pivotal Tracker (@pivotallabs), which has proved effective in its simplicity. Transforming a 20+ page document of requirements into deliverable items requires a quick review and mapping of how we label and score stories. Using a macro to get the original document table into a workable spreadsheet, one can then apply a workflow mapping between the methodologies before importing into Pivotal Tracker.

Icebox  | New ideas to be scoped for future iterations                  | Requirements for later stages
Backlog | Scheduled items for next iteration (outside current velocity) | Requirements in the next work package
Current | To be delivered in current iteration                          | Requirements in the current work package
Done    | Features accepted as delivered                                | Signed-off requirements

The other area where I needed a mapping was between the notions of implementation (what the software does) and development (what we need to change). We also extended our point scale for development to map implementation items onto this (it remains to be seen if the value mapping is equivalent as velocity starts to be recorded).

Score | Development                                          | Implementation
0     | No action                                            | No action
1     | Language string change                               | Default feature / configuration
2     | Minor interface change                               | 3rd party plug-in
3     | Exact requirements understood                        | Module combinations / learning design
5     | Good idea of requirements – refine through iteration | Multiple options / possible training need
8     | “Epic” – further investigation required              | Further investigation required

With a sensible labelling system to cross-map the system aspects (e.g. core, 3rd-party, development) to the requirements sections (e.g. content, assessment, communication) one can filter on the key aspects in groups and check their status. The significance of tagging over categorisation for linked data approaches should not be underestimated here, as it allows the information to be presented within different hierarchies. A weekly review of the current and next iteration now simplifies the communication process.

I don’t claim to have done anything radical here, other than reinforce in my mind the importance of keeping the project focus on communication. Tools, methodologies and most importantly documents are only as good as the dialogues they mediate. While creative (or productive) ideas will originate in the informal everyday collaborations, if they cannot be scoped into the project then they may disappear – worse yet, this may then discourage new ideas and limit overall project innovation.

Once upon a time the most agile prince was a frog:

‘Its very funny to be a frog
You can dive into the water and cross the rivers
And the oceans
And you can jump all the time and everywhere’
M83, Raconte-Moi Histoire

I hope that the project can instil this light-hearted approach to its management.


Bakhtin, M. M. (1984). Problems of Dostoevsky’s Poetics. Minneapolis: University of Minnesota Press.