Difficulty rating, part II

Posted by SolveForX314, 2023-06-25, 21:36:59, Thread ID: 18
Power Reviewer
Reputation:6
Posts:54
Credits:0
OP 06/25/23, 04:36 PM
ID: 44

So recently, as I'm sure you are all aware, the difficulty ratings for levels have had their range increased to 100. Overall, this is a good change; levels come in too wide a variety of difficulty levels for this to be properly expressed using only one significant figure. However, this seems to have thrown the rankings off.

First of all, any and all reviews from before the change have had their difficulty ratings set to N/A. This way of handling it seems fine at first glance, but I think the average is still being calculated based on the total number of reviews, as most levels have a substantially lower difficulty rating than they're supposed to. For example, Bloodbath, whose two recent reviews have an average difficulty rating of 78, is displayed as a 19.5. (Interestingly, the displayed number seems to have been the sum of all recent reviews divided by 8 — one more than the total review count. This is something else that should probably be addressed, but that seems like more of an admin-side thing.)
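A minimal sketch of the arithmetic described above, in Python. The individual ratings (76 and 80) are hypothetical values chosen only so that they average to 78, and the divisor "total review count plus one" is a guess reconstructed from the numbers in the post, not the site's actual code.

```python
from typing import Optional, Sequence


def average_ignoring_missing(ratings: Sequence[Optional[float]]) -> Optional[float]:
    """Average only the reviews that actually carry a difficulty rating."""
    rated = [r for r in ratings if r is not None]
    if not rated:
        return None  # no usable ratings yet
    return sum(rated) / len(rated)


def average_as_displayed(ratings: Sequence[Optional[float]]) -> float:
    """Reproduce the displayed number: sum of rated reviews divided by
    (total review count + 1). This divisor is an assumption."""
    rated = [r for r in ratings if r is not None]
    return sum(rated) / (len(ratings) + 1)


# Five pre-change reviews with N/A difficulty plus two recent ones.
# 76 and 80 are hypothetical; the post only gives their average (78).
bloodbath = [None, None, None, None, None, 76, 80]

print(average_ignoring_missing(bloodbath))
print(average_as_displayed(bloodbath))
```

Skipping the N/A entries in both the sum and the count gives the expected 78, while dividing by 8 gives the 19.5 that was shown.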

Even if you recalculate the difficulty ratings, though, the rankings are still a little wonky. Windy Landscape (difficulty 30) would be positioned below Nine Circles (40.75), Bloodbath (78) would be ranked above Limbo (66.33), and X (22) and Speed Racer (23) would both be higher than White Space (7.33) and B (10.5). I suspect the reason behind this is a problem that's more on our end — we don't all seem to agree on how exactly the difficulty continuum should be arranged.

If every active user reviewed every popular level using their own difficulty mapping, this would be less of a problem; all the ratings would sort of combine into a single score built off of all of our own opinions about where things should go. However, not everyone will be able to review every level; just as everyone has different playing styles, everyone will have different criteria about which levels they do and don't review. (If you've taken AP Stats, you might notice how this might lead to some bias later down the line once all the larger issues get smoothed out, but maybe we can discuss this another time.) If you combine these two factors, things could get confusing. For example, let's say we have group A of users who rate easy demons from 40-49 and group B of users who rate easy demons from 20-29. If two easy demons are about the same difficulty, but one is reviewed by group A more frequently and the other is reviewed by group B more frequently, then they will appear to be vastly different difficulties. I'm sure you can see why this might be somewhat of a problem.
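To make the group A / group B example concrete, here is a small sketch in Python. The specific scores (45 and 25) and the reviewer counts are made up for illustration; the only thing taken from the post is that one group rates easy demons in the 40s and the other in the 20s.

```python
# Hypothetical raters: both groups think the two levels are equally hard,
# but group A scores easy demons in the 40s and group B in the 20s.
GROUP_A_SCORE = 45
GROUP_B_SCORE = 25


def displayed_average(n_group_a: int, n_group_b: int) -> float:
    """Plain average for a level reviewed by the given number of users
    from each group."""
    scores = [GROUP_A_SCORE] * n_group_a + [GROUP_B_SCORE] * n_group_b
    return sum(scores) / len(scores)


# Same true difficulty, different mix of reviewers.
level_one = displayed_average(n_group_a=3, n_group_b=1)
level_two = displayed_average(n_group_a=1, n_group_b=3)

print(level_one, level_two, level_one - level_two)
```

With these made-up numbers the two levels end up 10 points apart even though every reviewer considers them equally difficult; the gap comes entirely from who happened to review which level.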

There are probably a number of ways this could be handled. One possibility is that we could wait until we have enough people reviewing that everyone's different systems turn into a more homogeneous-looking system, like what I imagine happened when RobTop introduced demon sub-difficulties. However, this might take a while; there are currently fewer than 100 people using this site, and not all of them review a lot of levels. Another approach we could try would be to come up with a single standard for everyone to follow, but this would require getting everyone to agree on something, and despite what I said earlier, 87 isn't exactly a small number.

What do you guys think? Does either of these solutions sound acceptable, or should we try to come up with something else?

-SolveX3

Administrator
Reputation:13
Posts:45
Credits:-13
06/27/23, 04:09 PM
ID: 46

Thanks for pointing out that first bug; it is in fact an error in the averaging calculation. It has been fixed and should be pushed by the time you read this.

This is a really interesting topic for sure. I have thought about the disconnect caused by people not sharing a voting standard, and it's still something I am actively researching based on what I remember from statistics. There's a great Tom Scott video here on the topic about how averages can settle in on large datasets. Of note is the part about lottery numbers: you can't find an accurate answer when there is none. Level ratings fall in between fact and randomness because they are subjective. So whatever average the site falls on would be representative of the "GD Forums rating index", or how the average GD Forums user rates this level. This is something we don't really get to choose, and how useful this number is will be up to us. Of course, this needs a large quantity of active users to smooth out the bumps. I hope to gain some users once the site is in a stable state and I can feel confident sharing it with friends and submitting it to GD Today, for example. But I digress...

Making sure there is an even distribution of ratings on levels is my job: I need to encourage users to be active and thoughtful with their submissions. However, there is another trick we can pull...

The system used for rating tags and screenshots uses Bayesian averaging. What this lets us do is anticipate an average ahead of time. If you go right now to add a tag to a level, you might notice that the score you get (which is between 0 and 1, or 0% to 100%) is 0.3, or 30%. Why is this? You are the only person voting for that tag, so it should be 1/1, or 100%. Well, the ratings are weighted toward a standard value, and that weighting becomes weaker the more people vote on a tag.

There is a downside, which is that it requires someone to choose a "default" or "expected" value. So I cannot implement this for levels at this moment. This would be changing the rating totals so they weren't plain averages, and maybe I can add a switch to toggle which one you see. But basically I need to wait for the "GD Forums rating index" or whatever to develop itself. Once there is enough data, we can draw conclusions such as "A demon level with this star rating and this amount of reviews gets this average score", and we can pick good values that weigh level reviews toward their expected rating even when only a few people have rated those levels. This hopefully mitigates, even if it doesn't solve, the A/B group problem. It's not a perfect solution, and there are many others I have not yet explored, but I felt I owed it to you to share where my plans lie in regards to this whole thing.
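For anyone curious, a common form of Bayesian averaging can be sketched in Python as below. This is not the site's code: the function name, the prior mean, and the prior weight are all assumptions. The tag values (0.2 and 7) were picked only because they happen to reproduce the 0.3 score for a single vote, and the level values (50 and 5) are placeholders for the "expected" rating that still has to be worked out.

```python
from typing import Sequence


def bayesian_average(
    ratings: Sequence[float], prior_mean: float, prior_weight: float
) -> float:
    """Blend the observed ratings with a prior.

    The prior acts like `prior_weight` imaginary votes of `prior_mean`,
    so its influence shrinks as real votes accumulate.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


# One upvote on a tag (score 1.0) with a hypothetical prior of 0.2, weight 7.
single_vote = bayesian_average([1.0], prior_mean=0.2, prior_weight=7)

# The same prior after 100 upvotes: the score is pulled close to 1.
many_votes = bayesian_average([1.0] * 100, prior_mean=0.2, prior_weight=7)

# A level with two reviews (76 and 80, hypothetical) and a placeholder
# expected difficulty of 50 with weight 5.
level_score = bayesian_average([76, 80], prior_mean=50, prior_weight=5)

print(single_vote, many_votes, level_score)
```

With few reviews the result stays near the expected value, and with many reviews it approaches the plain average, which is the behaviour described above.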

Power Reviewer
Reputation:6
Posts:54
Credits:0
OP 06/28/23, 05:14 PM
ID: 47

That sounds like a good solution! Probably better than either of the ones I came up with, haha :)

Haven't read the whole article about Bayesian averaging yet, but from the part I have read, it sounds like it would work great for this sort of thing. I liked the comparison to star ratings on product review sites. Rating something as "four stars out of five" doesn't necessarily have any sort of intrinsic meaning; other than its numerical value in relation to the other options, it really only means what the reviewer decides it means. This seems pretty similar to the difficulty rating system here, where it is up to individual users to decide what exactly the number 47 means. One could even argue that the in-game demon sub-difficulties don't have any intrinsic meaning: the only reason we can't all arbitrarily decide Cataclysm is only an insane demon is that there are so many people who've already rated it as extreme (and more who will continue doing so) that it's pretty much become tradition to say Cataclysm is an extreme.

I suppose the best way to standardize a rating system will be to wait and see what emerges...

-SolveX3

Junior Member
Reputation:-8
Posts:109
Credits:0
04/27/25, 09:22 PM
ID: 1182

i think the difficulty rating system would be better if levels were rated relative to vam by flaaroni

0/100 if the level is easier than vam by flaaroni

100/100 if the level is harder than or equal to vam by flaaroni

OK THAT BOY A F**KING GOOFY HE A RICHARD

Hyperbolus is not affiliated with RobTopGames AB or Geometry Dash
Hyperbolus © 2025