Machine learning for climbing grades

Conventional assessment of route difficulty for rock climbing is a subjective process. A small number of people (often just one) assign a grade to a particular route, and there isn’t really a process for refining grades once they’ve been assigned (it’s just one opinion vs another). Most grading systems use an ordinal scale, which means you can rank the grades in order but the difference or ratio between grades isn’t meaningful. Intentional biases, such as deliberately under-grading a route (“sandbagging”), are even part of climbing culture.

To address these shortcomings, I developed a statistical model for grading rock climbing routes. The difficulty of a climbing route and the performance of a climber on a particular day are described by numerical ratings. The difference in ratings between a climber and a route determines the probability the climber will ascend the route “successfully”. For modern sport climbing, success loosely means getting to the top without weighting a rope or other mechanical devices. The climbing model is based on a dynamic Bradley-Terry model, which is a common model for game and sports rating systems such as Elo and Glicko-2.
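
As a rough illustration of how a rating difference turns into a probability, here is a minimal sketch of the basic (static) Bradley-Terry formula. It assumes ratings are expressed on a natural log-odds scale; the function name and the scale are illustrative assumptions, not taken from the Climbing Ratings software.

```python
import math


def ascent_probability(climber_rating: float, route_rating: float) -> float:
    """Probability of a successful ascent under a basic Bradley-Terry model.

    Assumption: both ratings are on a natural (log-odds) scale, so the
    probability is the logistic function of the rating difference.
    """
    return 1.0 / (1.0 + math.exp(route_rating - climber_rating))


# Equal ratings give an even chance of success; a stronger climber
# (or an easier route) pushes the probability towards 1.
even = ascent_probability(2.0, 2.0)
favourable = ascent_probability(3.0, 2.0)
unfavourable = ascent_probability(2.0, 3.0)
```

Only the difference between the two ratings matters here, so adding the same constant to every climber and every route leaves all of the predictions unchanged.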

While the statistical model provides a theory for predicting ascent outcomes based on ratings parameters, it’s not useful in practice without a process for estimating the parameters (individual ratings for climbers and routes) and hyperparameters (generalizations that are independent of individual climbers or routes, e.g. how hard the “average” route is, and how quickly climbers can improve). So I implemented an algorithm for estimating the parameters, based on the Whole-History Rating (WHR) algorithm. WHR is a fast algorithm that uses second-order (Newton-Raphson) optimization to find the ratings for climbers and routes that are most probable given the observed set of ascents and the prior assumptions about ratings (known as the maximum a posteriori estimates). I used machine learning methods to choose the hyperparameters. The implementation is available as a free, open-source software package at the Climbing Ratings project on GitHub.
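
To give a feel for the Newton-Raphson step, here is a heavily simplified sketch that estimates a single climber’s rating while treating the route ratings as known and fixed. It assumes a zero-mean Gaussian prior on the rating and a natural log-odds scale; the function names, the prior, and the data layout are all illustrative assumptions. The full WHR algorithm estimates every climber and route jointly and lets climber ratings vary over time, none of which is shown here.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def map_climber_rating(ascents, prior_variance=1.0, iterations=20):
    """Maximum a posteriori rating for one climber via Newton-Raphson.

    ascents: list of (route_rating, success) pairs, with route ratings
             assumed known (a simplification for this sketch).
    prior_variance: variance of an assumed zero-mean Gaussian prior.
    """
    rating = 0.0
    for _ in range(iterations):
        # First and second derivatives of the log-prior.
        gradient = -rating / prior_variance
        hessian = -1.0 / prior_variance
        # Add the derivatives of the Bradley-Terry log-likelihood.
        for route_rating, success in ascents:
            p = sigmoid(rating - route_rating)
            gradient += (1.0 if success else 0.0) - p
            hessian -= p * (1.0 - p)
        # Newton-Raphson update; the Hessian is always negative here.
        step = gradient / hessian
        rating -= step
        if abs(step) < 1e-12:
            break
    return rating


# One success on an easier route and one failure on a harder route.
estimate = map_climber_rating([(-1.0, True), (1.0, False)])
```

The prior keeps the estimate finite even for a climber who has only ever succeeded (or only ever failed), which a pure maximum-likelihood fit would not.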

So, how did it perform? With the help of, I fitted the model to hundreds of thousands of ascents from Australia. The output ratings were closely correlated with the conventional (subjectively-assigned) grades. This is an important result because it shows the principles of science (using observations to make testable predictions about the world) can be applied to grading climbs. I’ve published a more formal write-up of the results in the academic paper: Estimation of Climbing Route Difficulty using Whole-History Rating, and a layman’s explanation in the climbing magazine Vertical Life.

The real value for climbers came when integrated the estimated grades into their website. The Climbing Ratings software produces grades for the thousands of routes where they have sufficient ascent data. The estimated grades are called “grAId” on and are available on the route description pages, alongside the grades from guidebooks. Sometimes the estimated grade and the conventional grade differ, suggesting the conventional grade was inconsistent with climbers’ actual ascent experiences. This sparked some debate, but it also provided some valuable feedback and opportunities for improvements.

I’m now starting to plan out the next steps from a technical perspective, and thinking about applications of more sophisticated machine learning and artificial intelligence methods. Get in contact if you have ideas!
