What we talk about when we talk about ‘rating systems’


Context

I’m MoodleNet Lead and, since the project’s inception, I’ve had lots of conversations with many different people. Once they’ve grasped that MoodleNet is a federated resource-centric social network for educators, some of them ask a variation of this question: Oh, I assume you’ll be using a star rating system to ensure quality content?

They are often surprised when I explain that no, that’s not the plan at all. I haven’t written down why I’m opposed to star rating systems for educational content, so what follows should hopefully serve as a reference I can point people towards next time the issue crops up!

However, this is not meant as my last word on the subject, but rather a conversation-starter. What do you think about the approach I outline below?

Introduction

Wikipedia defines a rating system as “any kind of rating applied to a certain application domain”. Examples include:

  • Motion Picture Association of America (MPAA) film rating system
  • Star rating
  • Rating system of the Royal Navy

A rating system therefore explains how relevant something is in a particular context.

Ratings in context

Let’s take the example of film ratings. Thanks to the MPAA film rating system, parents can decide whether to allow their child to watch a particular film. Standardised criteria (e.g. drugs / sex / violence) are applied to a film, which is then given a rating such as G (General Audiences), PG (Parental Guidance), or R (Restricted). The rating system is reviewed on a regular basis, sometimes leading to the introduction of new categories (e.g. PG-13).

Despite the MPAA film rating system, many parents seek additional guidance in this area – for example, websites such as Common Sense Media which further contextualise the film.

Common Sense Media screenshot
Screenshot showing the film ‘How to Train Your Dragon’ on the Common Sense Media website

In other words, the MPAA rating system isn’t enough. Parents also take into account what their child is like, what other parents do, and the recommendations of sites they trust such as Common Sense Media.

Three types of rating systems

As evident in the screenshot above, Common Sense Media includes many data points to help parents make a judgement as to whether they will allow their child to watch a film.

With MoodleNet, we want to help educators find high-quality, relevant resources for use in their particular context. Solving this problem is a subset of the perennial problem of conserving attention.

Educational resources triangle
Project management triangle, adapted for educational resources

In other words, we want to provide the shortest path to the best resources. Using an adapted project management triangle, educators usually have to make do with two of the three of time, cost, and quality. That is to say, they can minimise the time and cost of looking for resources, but the relevance of the resources they discover (which is a proxy for quality) is then likely to suffer.

Likewise, if educators want to minimise the time and maximise the quality of resources, that will cost them more. Finally, if they want to minimise the cost and maximise the quality, they will have to spend a lot more time finding resources.

The ‘holy grail’ would be a system that minimises time and cost while still delivering quality educational resources. With MoodleNet, we are attempting to do that in part by providing a system that is part searchable resource repository, and part discovery-based social network.

Proactive/Reactive
Diagram by Bryan Mathers showing MoodleNet as both a place where educators can search for and discover educational resources

Simply providing a place for educators to search and discover resources is not enough, however. We need something more granular than a mashup of a search engine and status updates.

What kinds of rating systems are used on the web?

There are many kinds of rating systems used on the web, from informal approaches using emoji, through to formal approaches using very strict rubrics. What we need with MoodleNet is something that allows for some flexibility, an approach that assumes some context.

With that in mind, let’s consider three different kinds of rating systems:

  1. Star rating systems
  2. Best answer systems
  3. Like-based systems

1. Star rating systems

One of the indicators in the previous example of the Common Sense Media website is a five-star rating system. This is a commonly-used approach, with perhaps the best-known example being Amazon product reviews. Here is an example:

Amazon page for Google Pixelbook
Amazon product page for a Google Pixelbook showing an average of 3.5 stars out of five from 12 customer reviews

Should I buy this laptop? I have the opinion of 12 customers, with a rating of three-and-a-half stars out of five, but I’m not sure. Let’s look at the reviews. Here’s the top one, marked as ‘helpful’ by nine people:

One-star review for Google Pixelbook
One-star review from a customer complaining about faulty Google Pixelbook

So this reviewer left a one-star review after being sent a faulty unit by a third-party seller. That, of course, is a statement about the seller, not the product.

Meanwhile:

Five-star review for Google Pixelbook
Five-star review from a customer pleased with their Google Pixelbook

Averaging the rating of these two reviews obviously does not make sense, as they are not rating the same thing. The first reviewer is using the star rating system to complain, and the second reviewer seems to like the product, but we have no context. Is this their first ever laptop? What are they using it for?
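To make the point concrete, here is a toy calculation (nothing to do with Amazon’s real system) showing how the arithmetic mean of two ratings that are actually about different things produces a number that describes neither:

```python
# Toy illustration: two "ratings" on the same product page that are really
# about different things (the third-party seller vs. the laptop itself).
reviews = [
    {"stars": 1, "really_about": "a third-party seller shipping a faulty unit"},
    {"stars": 5, "really_about": "the laptop itself"},
]

average = sum(r["stars"] for r in reviews) / len(reviews)
print(f"Average rating: {average} stars")  # 3.0 stars -- describes neither review
```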

Star rating systems are problematic as they are blunt instruments that attempt to boil down many different factors to a single, objective ‘rating’. They are also too easily gamed through methods such as ‘astroturfing’. This is when individuals or organisations with a vested interest organise for very positive or very negative reviews to be left about particular products, services, and resources.

From the Wikipedia article on the subject:

Data mining expert Bing Liu (University of Illinois) estimated that one-third of all consumer reviews on the Internet are fake. According to The New York Times, this has made it hard to tell the difference between “popular sentiment” and “manufactured public opinion.”

As a result, implementing a star rating system in MoodleNet, a global network for educators, would be fraught with difficulties. It assumes an objective, explicit context when no such context exists.

2. Best answer approach

This approach allows a community of people with similar interests to ask questions, receive answers, and vote on both. This format is common to Stack Overflow and Reddit.

Stackoverflow question and answer
Screenshot of a question with answers on Stack Overflow

Some of these question and answer pages on Stack Overflow become quite lengthy, with nested comments. In addition, some responders disagree with one another. As a result, and to save other people time, the original poster of the question can indicate that a particular answer solved their problem. This is then highlighted.
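As a rough sketch of the pattern (illustrative only, not Stack Overflow’s actual data model), the key feature is that exactly one answer can be flagged as accepted, and only by the person who asked:

```python
# Illustrative sketch of the 'best answer' pattern -- not Stack Overflow's code.
from dataclasses import dataclass, field

@dataclass
class Answer:
    author: str
    text: str
    votes: int = 0
    accepted: bool = False

@dataclass
class Question:
    author: str
    text: str
    answers: list[Answer] = field(default_factory=list)

    def accept(self, best: Answer) -> None:
        """The original poster marks the answer that solved their problem;
        it is then highlighted for anyone arriving later with the same issue."""
        for answer in self.answers:
            answer.accepted = answer is best
```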

The ‘best answer’ approach works very well for knotty problems that require clarification and/or some collaborative thinking-through. The result can then be easily searched and parsed by someone who comes along later with the same problem. I can imagine this working well within MoodleNet community discussion forums (as it already does on the moodle.org forums).

When dealing with educational resources, however, there is often no objective ‘best answer’. There are things that work in a particular context, and things that don’t. Given how different classrooms can be even within the same institution, this is not something that can be easily solved by a ‘best answer’ approach.

3. Like-based systems

Sometimes simple mechanisms can be very powerful. The ‘like’ button has conquered social networks, with the best-known example being Facebook’s implementation.

Facebook Like button
Example of a Facebook ‘like’ button

I don’t use Facebook products on principle, and haven’t done since 2011, so let’s look at other implementations.

YouTube

Social networks are full of user-generated content. Take YouTube, for example, where 400 hours of video is uploaded every single minute. How can anyone possibly find anything of value with such a deluge of information?

YouTube search
YouTube search results for ‘bolshevik revolution’ sorted by relevance

In the above screenshot, you can see a search for one of my favourite topics, The Bolshevik Revolution. YouTube does a good job of surfacing ‘relevant’ content and I can also choose to sort my results by ‘rating’.

Here is the top video from the search result:

Annotated YouTube video
YouTube video with upvote and downvote functionality highlighted

I don’t have time to watch every video that might be relevant, so I need a shortcut. YouTube gives me statistics about how many people have viewed this video and how many people subscribe to this user’s channel. I can also see when the video was published. All of this is useful information.

The metric I’m most interested in, however, and which seems to make the biggest impact in terms of YouTube’s algorithm, is the number of upvotes the video has received compared to the number of downvotes. In this example, the video has received 16,000 upvotes and 634 downvotes, meaning that over 95% of people who have expressed an opinion in this way have been positive.
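For reference, that percentage is simply the upvotes as a share of everyone who voted:

```python
# Approval ratio for the video in the screenshot above.
upvotes, downvotes = 16_000, 634

approval = upvotes / (upvotes + downvotes)
print(f"{approval:.1%} of voters were positive")  # ~96.2%

# Note: a plain ratio ignores how many people voted -- a video with 2 upvotes
# and 0 downvotes would also show as "100% positive".
```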

If I want more information, I can dive into the comments section, but I can already see that this video is likely to be of use to me. I would add it to a shortlist of three to five videos on the topic that I’d watch to discover the one that’s best for my context.

Twitter

Going one stage further, some social networks like Twitter simply offer the ability for users to ‘like’ something. A full explanation of the ‘retweet’ or ‘boost’ functionality of social networks is outside of the scope of this post, but that too serves as an indicator:

Tweet from UN Education Report
Tweet from UN Education Report showing retweets and likes

This tweet from the UN about their Global Education Monitoring report has been liked 72 times. We don’t know the context of the people who have ‘liked’ it, but we can see that it’s popular. So, if I were searching for something about migrant education, I’d be sure to check out this report.

Although neither YouTube nor Twitter makes this clear, their algorithms take into account ‘likes’ and ‘upvotes’ within the context of who you are connected to. So, for example, if a video has a lot of upvotes on YouTube and you’re subscribed to that channel, you’re likely to be recommended that video. Similarly, on Twitter, if a tweet has a lot of likes and many of those likes come from people you follow, then the tweet is likely to be recommended to you.
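Neither platform publishes its ranking logic, but the general idea can be sketched as follows. Everything here (names, weights, the scoring function) is made up for illustration, not how YouTube or Twitter actually work:

```python
# Hypothetical sketch of socially-weighted likes -- not a real platform's algorithm.

def social_score(likers: set[str], following: set[str], follow_weight: float = 3.0) -> float:
    """Score an item by its likes, counting likes from accounts you follow more heavily."""
    return sum(follow_weight if liker in following else 1.0 for liker in likers)

# Two items with the same raw number of likes rank differently for this user,
# because one was liked by people they follow.
my_following = {"alice", "bob"}
print(social_score({"alice", "bob", "carol"}, my_following))  # 7.0
print(social_score({"dave", "erin", "frank"}, my_following))  # 3.0
```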

Twitter user explaining likes are bookmarks, not endorsements
Twitter user account with bio that includes “Likes are usually bookmarks, not endorsements”

Interestingly, many Twitter users use the limited space in their bios to point out explicitly that their ‘likes’ are not endorsements, but are used to bookmark things to which they’d like to return. In the past year, Twitter has begun to roll out bookmarks functionality, but it is a two-step process and not widely used.

So likes act as both votes and a form of bookmarking system. It’s a neat, elegant, and widely-used indicator.

What does this mean for MoodleNet?

So far, we have discovered that:

  • The ‘quality’ of a resource depends upon its (perceived) relevance
  • Relevant resources depend upon a user’s context
  • We cannot know everything about a user’s context

MoodleNet will implement a system of both taxonomic and folksonomic tagging. Taxonomic tags will include controlled tags relating to (i) language, (ii) broad subject area, and (iii) grade level(s). Folksonomic tags will be open for anyone to enter, and will autocomplete to help prevent typos. We are considering adding suggested tags via machine learning, too.
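As a rough sketch of how the two kinds of tagging might fit together (the field names and vocabularies below are illustrative, not the actual MoodleNet schema):

```python
# Illustrative sketch only -- not the real MoodleNet data model.
from dataclasses import dataclass, field

# Taxonomic tags come from controlled vocabularies.
LANGUAGES = {"en", "es", "fr"}
SUBJECTS = {"history", "maths", "science"}

@dataclass
class Resource:
    title: str
    language: str                                        # controlled: one of LANGUAGES
    subject: str                                         # controlled: one of SUBJECTS
    grade_levels: list[str] = field(default_factory=list)
    folk_tags: list[str] = field(default_factory=list)   # folksonomic: free-form

def suggest_tags(prefix: str, existing_tags: set[str]) -> list[str]:
    """Autocomplete folksonomic tags: offering existing tags that match what the
    user has typed so far helps prevent near-duplicates caused by typos."""
    return sorted(t for t in existing_tags if t.startswith(prefix.lower()))

print(suggest_tags("bolsh", {"bolshevik-revolution", "bolsheviks", "botany"}))
# ['bolshevik-revolution', 'bolsheviks']
```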

In addition to this, and based on what we’ve learned from the three rating systems above, MoodleNet users will soon be able to ‘like’ resources within collections.

Potential future location of 'like' button in MoodleNet
Screenshot of a MoodleNet collection with arrow indicating potential future location of ‘like’ button

By adding a ‘like’ button to resources within MoodleNet collections, we potentially solve a number of problems. This is particularly true if we indicate the number of times that resource has been liked by community members.

  1. Context – every collection is within a community, increasing the amount of context we have for each ‘like’.
  2. Bookmarking – ‘liking’ a resource within a collection will add it to a list of resources a user has liked across collections and communities.
  3. Popularity contest – collections are limited to 10 resources so, if we also indicate when a resource was added, we can see whether or not it should be replaced.

As discussions can happen both at the community and collection level, users can discuss collections and use the number of likes as an indicator.
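To make those three points concrete, here is a minimal sketch (purely illustrative, not MoodleNet’s actual implementation) of how a single ‘like’ can carry context, act as a bookmark, and feed a per-collection count:

```python
# Purely illustrative -- not MoodleNet's actual implementation.
from collections import defaultdict

likes = defaultdict(set)       # (community, collection, resource) -> set of users
bookmarks = defaultdict(list)  # user -> list of liked resources, across communities

def like(user: str, community: str, collection: str, resource: str) -> None:
    key = (community, collection, resource)
    likes[key].add(user)        # 1. context: the like is tied to a collection within a community
    bookmarks[user].append(key) # 2. bookmarking: builds the user's own list of liked resources

like("doug", "History Teachers", "Russian Revolution", "Bolshevik Revolution video")
like("maria", "History Teachers", "Russian Revolution", "Bolshevik Revolution video")

# 3. popularity: the count shown next to the resource within its collection.
print(len(likes[("History Teachers", "Russian Revolution", "Bolshevik Revolution video")]))  # 2
```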

Conclusion

Sometimes the best solutions are the simplest ones, and the ones that people are used to using. In our context, that looks like a simple ‘like’ button next to resources in the context of a collection within a community.

We’re going to test out this approach, and see what kind of behaviours emerge as a result. The plan is to iterate based on the feedback we receive and, of course, continue to tweak the user interface of MoodleNet as it grows!


What are your thoughts on this? Have you seen something that works well that we could use as well / instead of the above?
