
Odds of beating a player rated x

Lewis

Sometimes people criticize the PDGA ratings system for not publishing a standard deviation alongside each player's rating. If we had standard deviations as part of the rating system, the reasoning goes, we could do things like estimate the odds of a player rated y beating a player rated x. That can't be done from a rating alone, since a rating only represents the average score a player is expected to turn in, not the typical spread of scores he will turn in over time.

Curious about this, and knowing that the PDGA publishes round rating histories for players on their website, I decided to see how strong the association is between a player's rating and how closely his scores tend to cluster around that rating. You'd expect players with higher ratings to play more consistently, and therefore to have a lower standard deviation in their rating history. The numbers as I crunched them seem to agree. For your consideration, I'm sharing a spreadsheet that shows the approximate relationship between PDGA rating and standard deviation, which in turn helps estimate the likelihood that a player with a given rating will beat players at a variety of other ratings.

It's just an estimation tool for fun, and for specific players you could do better by looking up and comparing their individual rating histories, but I think it might be interesting for folks to see how the curve comes out. I'm sharing the file via Google Drive. Just download the Excel file and have fun. The "data" tab is where I pasted in round data for a number of players at a range of different ratings, to get a best-fit line for the relationship between player rating and standard deviation of round ratings. The "odds" tab is where you can plug in a rating (in cell A2) and see how it tends to compete against other ratings.

https://drive.google.com/file/d/0B25ABECFZrK5SGQ1dk1scEFyWGc/view?usp=sharing
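
For anyone who wants to tinker outside of Excel, here is a minimal sketch of the same idea in Python. The rating-vs-SD fit coefficients (slope, intercept) are placeholders rather than the values from the spreadsheet, and it assumes round ratings are roughly normally distributed and independent between players.

```python
# Minimal sketch of the spreadsheet's idea, assuming round ratings are roughly
# normally distributed.  The slope/intercept of the rating-vs-SD fit below are
# placeholders, NOT the values from the actual spreadsheet.
import math

def estimated_sd(rating, slope=-0.05, intercept=75.0):
    """Hypothetical linear fit: higher-rated players tend to have lower SDs."""
    return max(slope * rating + intercept, 5.0)  # floor keeps the SD positive

def win_probability(rating_a, rating_b):
    """P(player A's round rating beats player B's) for a single round.

    The difference of two independent normals is normal with mean
    (rating_a - rating_b) and SD sqrt(sd_a^2 + sd_b^2).
    """
    sd_a, sd_b = estimated_sd(rating_a), estimated_sd(rating_b)
    diff_sd = math.sqrt(sd_a**2 + sd_b**2)
    z = (rating_a - rating_b) / diff_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF

if __name__ == "__main__":
    print(f"P(950-rated beats 1000-rated): {win_probability(950, 1000):.2f}")
    print(f"P(1000-rated beats 1000-rated): {win_probability(1000, 1000):.2f}")
```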
 
Hi Lewis,

First, awesome data and calculations! :D Now on to the questions... Do you think it would be possible to use this kind of method to improve the accuracy of the PDGA rating system? For example, you're pretty clearly showing that, as rating increases, the standard deviation (of the rated rounds that make up a player's rating) decreases. By contrast, the PDGA rating system assumes that all propagators are equal, both in terms of the validity/reliability of their round scores (and prior rated rounds) and their standard deviation.

Could we engineer something that, for example, evaluated just how valuable/meaningful each propagator's result really is, based on their standard deviation as well as their rating? ;)
 
The idea of just using higher-rated propagators with lower standard deviations has already been tested and shown to be worse than the current system, which also includes the propagators with the highest standard deviations. For example, in a pool of 50 propagators with standard deviations ranging from 15 to 40 (the max allowed), if you use just the 20 props with the lowest SDs, you get less reliable values than if you use all 50 props. More total props turns out to be better than fewer elite props.
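
To make that point concrete, here is a rough toy simulation (not the actual PDGA calculation): 50 propagators whose rounds each give an independent noisy reading of the same underlying value, with SDs spread from 15 to 40. Under that simplified model, averaging all 50 readings comes out more reliable than averaging only the 20 lowest-SD ones; the specific numbers are illustrative only.

```python
# Toy simulation, NOT the real PDGA calculation: 50 propagators, SDs from 15
# to 40, each round an independent noisy reading of the same underlying value
# (think "how hard the course played today").  Compare the error of averaging
# all 50 readings vs. only the 20 lowest-SD ones.
import random
import statistics

random.seed(1)
TRUE_VALUE = 0.0
SDS = [15 + 25 * i / 49 for i in range(50)]   # propagator SDs, sorted low to high
TRIALS = 20_000

def rms(errors):
    return statistics.fmean(e * e for e in errors) ** 0.5

err_all, err_low20 = [], []
for _ in range(TRIALS):
    readings = [random.gauss(TRUE_VALUE, sd) for sd in SDS]
    err_all.append(statistics.fmean(readings))         # all 50 props
    err_low20.append(statistics.fmean(readings[:20]))  # 20 lowest-SD props only

print("RMS error, all 50 props:      ", round(rms(err_all), 2))
print("RMS error, 20 lowest-SD props:", round(rms(err_low20), 2))
```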
 
First, let me say that I love seeing a post where someone actually shows knowledge on a subject, in this case statistics, and provides real data.

Second, I don't claim to fully understand how the ratings system in disc golf works, but I would love to be informed. Basically, correct me if I'm wrong; I won't be offended.

I have minimal experience with some basic biometric methods, and off the top of my head I could see an F-test or a multiple ANOVA applying to this situation. Lewis's hypothesis of a correlation between a higher player rating and a shrinking SD is an interesting one. However, it also makes sense that the number of rounds recorded would have a similar effect. Obviously I agree that people who are better at disc golf are usually more consistent, but I also think that people who play more are usually more consistent as well. It could just be that most top-level pros play more frequently, and thus a similar effect is seen. It is likely some kind of interaction between these two factors (once again, this could be tested with an ANOVA).

Another thought… Since an increased sample size produces better data, wouldn't it make sense to "weigh" a round from a player with 30+ rounds more heavily than a round from a player who only has 6 rounds? I use the word "weigh" in quotes because what I'm really referring to is an F-test, and the degrees of freedom in the numerator and denominator of two different samples, but I know that not everyone will know what that means.

And one more… Theoretically, wouldn't it be possible to set up 95% confidence intervals (p < .05) so that when nightmare or amazing rounds happen, the PDGA can track them? (Maybe this is already done; once again, I'm not pretending to understand how it all works.) Outliers are interesting, and I know I'd be interested in seeing more of that data.
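
For the curious, here is a small sketch of both of those ideas with scipy, using fabricated round ratings: an F-test comparing the round-to-round variance of two players, and flagging rounds that fall outside a rough 95% band around a player's own mean. The players and numbers are made up.

```python
# Sketch of both ideas with fabricated round ratings: an F-test comparing two
# players' round-to-round variances, and flagging rounds outside a ~95% band
# around a player's own mean.  Assumes scipy is installed.
import statistics
from scipy import stats

player_a = [978, 1001, 990, 965, 1012, 984, 995, 970]   # hypothetical round ratings
player_b = [950, 952, 948, 955, 945, 951, 949, 850]     # mostly steady, one nightmare round

def f_test(x, y):
    """Two-sided F-test for equal variances of two independent samples."""
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    if var_x >= var_y:
        f, dfn, dfd = var_x / var_y, len(x) - 1, len(y) - 1
    else:
        f, dfn, dfd = var_y / var_x, len(y) - 1, len(x) - 1
    return f, min(2 * stats.f.sf(f, dfn, dfd), 1.0)     # two-sided p-value

def outlier_rounds(rounds, z=1.96):
    """Rounds more than ~2 SD away from the player's own mean."""
    mean, sd = statistics.fmean(rounds), statistics.stdev(rounds)
    return [r for r in rounds if abs(r - mean) > z * sd]

f_stat, p_value = f_test(player_a, player_b)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
print("Player B outlier rounds:", outlier_rounds(player_b))
```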

Thanks again for the interesting post/read. I'm going to go play with that excel sheet now.
 
Cgk, aka Chuck, is pretty much the man when it comes to the rating system. I am not entirely sure anyone but him fully understands it. I know he has done a lot of testing and the system is pretty well set. That being said, I love seeing these kinds of data sets and theories. I'm all geeking out.
 
Chuck (Cgkdisc) can fill you in on all the details of how the ratings are calculated. For my purposes, I took a not-so-random sample of players with a large number of rounds recorded in 2014, with a little effort to include players with a variety of different ratings. To do this, I sorted the player statistics page on the PDGA website by events played, and viewed the "ratings detail" tab for a bunch of players from the top couple of pages of results. From there I just copied and pasted in round ratings, so the "average" rating you see in my spreadsheet for a given player (I omitted names, but you could probably figure out who some of them are) will be close to, but a little different from, their official PDGA rating. It's meant to be a proof of concept, not to be scientific, so y'all who really know something about statistics have plenty of room, and are welcome, to improve it. Whether it could prove useful to the PDGA and Chuck's ratings system is beyond me.
 
The idea of just using higher-rated propagators with lower standard deviations has already been tested and shown to be worse than the current system, which also includes the propagators with the highest standard deviations. For example, in a pool of 50 propagators with standard deviations ranging from 15 to 40 (the max allowed), if you use just the 20 props with the lowest SDs, you get less reliable values than if you use all 50 props. More total props turns out to be better than fewer elite props.

Hi Chuck,

Yes, I remember you mentioning previously that simply reducing the overall number of propagators doesn't produce more accurate ratings. That's not really what I was suggesting, though. The rounds coming from high-SD players still have *some* value/validity, just potentially not as much as those coming from low-SD players. So we can't throw them out, but why can't we 'weight' them based on how valuable/accurate they are? E.g., if you have a player with a rating of 900 and a high SD, and a player with a rating of 1000 and a low SD, shouldn't the best-fit line run 'tighter' to the 1000-rated player's round score than to the 900-rated player's round score?

Most modern (win/loss) rating systems out there (e.g. TrueSkill) use an internal-only value for every player that determines how valuable/accurate their (external) rating is. The PDGA rating system, by comparison, does not.
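
Extending the toy simulation from earlier in the thread, this is roughly what the suggested weighting would look like: instead of a plain average, each propagator's reading is weighted by 1/SD². Again, this is a sketch of the concept under a simplified model, not the PDGA's actual math.

```python
# Toy illustration of the weighting idea (again NOT the real PDGA math): each
# propagator's round is an independent noisy reading of the same underlying
# value, and readings are combined with inverse-variance weights (1/SD^2)
# instead of a plain average.
import random
import statistics

random.seed(2)
TRUE_VALUE = 0.0
SDS = [15 + 25 * i / 49 for i in range(50)]       # propagator SDs from 15 to 40
WEIGHTS = [1.0 / sd**2 for sd in SDS]
TRIALS = 20_000

def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def rms(errors):
    return statistics.fmean(e * e for e in errors) ** 0.5

err_plain, err_weighted = [], []
for _ in range(TRIALS):
    readings = [random.gauss(TRUE_VALUE, sd) for sd in SDS]
    err_plain.append(statistics.fmean(readings))
    err_weighted.append(weighted_mean(readings, WEIGHTS))

print("RMS error, plain average:            ", round(rms(err_plain), 2))
print("RMS error, inverse-variance weighted:", round(rms(err_weighted), 2))
```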
 
Another thought… Since an increased sample size produces better data, wouldn't it make sense to "weigh" a round from a player with 30+ rounds more heavily than a round from a player who only has 6 rounds? I use the word "weigh" in quotes because what I'm really referring to is an F-test, and the degrees of freedom in the numerator and denominator of two different samples, but I know that not everyone will know what that means.

Yes, this is precisely what I've been wondering too (see the above post). ;)
 
Ok... so the lower your opponent's rating (relative to yours), the higher the chance you will win, and the higher your opponent's rating (relative to yours), the lower the chance you will win. Otherwise it's hard to inject "luck", "weather", and "in the zone" factors that probably contribute heavily to the end result.
 
Cgk, aka Chuck, is pretty much the man when it comes to the rating system. I am not entirely sure anyone but him fully understands it. I know he has done a lot of testing and the system is pretty well set. That being said, I love seeing these kinds of data sets and theories. I'm all geeking out.

Hi TalbotTrojan,

There are actually a variety of threads here where the workings of the PDGA rating system are detailed (I've contributed to some of them). From what we've been able to glean, however, there are several underlying assumptions of the PDGA system:

1. All round scores from all propagators are equally valuable (and equally accurate). This is the major assumption in question in this particular thread: the idea that the round score data coming from some propagators is more valuable/accurate than that from others. The PDGA system assumes that the round scores of a player with 8 prior rounds (and potentially a very high standard deviation) are just as valuable as the round scores of a player with hundreds of prior rounds (and a very low standard deviation).

2. The relationship between player rating and round score is linear. This is another really fundamental question, and data from *some* events (particularly events with a lot of players rated >1030) makes this assumption questionable. The issue is whether, as rating increases, round score changes proportionally and linearly. E.g., is the difference between a 900-rated player and a 950-rated player precisely equal to the difference between a 1000-rated player and a 1050-rated player? (A quick way to probe this with event data is sketched after this list.)

3. All courses/layouts of equal SSA have an identical (points-per-throw) slope. This is tied into #2 above. The issue here is whether or not the linear equation generated by plotting rating vs round score can really be accurately predicted by only inputting SSA. e.g. take any two possible courses/layouts with an equal SSA. Are these two courses/layouts really guaranteed to spread out the field precisely equally?
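
Here is the quick linearity probe mentioned under assumption #2: fit both a straight line and a quadratic to (player rating, round score) pairs from an event and see whether the curvature term improves the fit. The data below is fabricated; real event results would go in its place, and numpy is assumed to be available.

```python
# Fit a line and a quadratic to (player rating, round score) pairs and compare
# the residuals.  If the quadratic term is tiny and the fit barely improves,
# the linear assumption looks fine for that event.  Fabricated data; assumes
# numpy is installed.
import numpy as np

ratings = np.array([880, 900, 925, 950, 975, 1000, 1025, 1050])
scores = np.array([64, 63, 61, 60, 58, 56, 55, 53])   # throws for the round

linear = np.polyfit(ratings, scores, deg=1)
quadratic = np.polyfit(ratings, scores, deg=2)

def rss(coeffs):
    """Residual sum of squares for a polynomial fit."""
    return float(np.sum((np.polyval(coeffs, ratings) - scores) ** 2))

print("Linear fit   :", np.round(linear, 4), " RSS =", round(rss(linear), 2))
print("Quadratic fit:", np.round(quadratic, 6), " RSS =", round(rss(quadratic), 2))
```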
 
In theory, one would think more rounds would be better. For example, I would think the stats for a player who played the same course, at the same time each day, during similar weather would be more reliable after 30 rounds than 8 rounds.

However, our round ratings are gathered over random time intervals, on different courses, at different tiers of events, and at away versus home courses. If one believes people play some types of courses better than others, and that players in theory are getting better with more play, I'm not sure a rating based on 15 rounds from two A-tiers and a Worlds played 10 months ago would be as good as one based on 15 rounds from events on different courses in the past few months. That lack of data uniformity, I believe, undermines further precision.

The one thing I hope will help improve ratings and forecasting in the future is when we get more course characteristics linked to tournament scores. Sort of how race horses might be known as "mudders", we might find that Player Sam is better in the woods by two throws versus in the open but worse by two throws on hilly courses.
 
So basically disc golf is like ecology: the error terms/residual effects/individual variations are huge!
 
Whether it could prove useful to the PDGA and Chuck's ratings system is beyond me.
It would be useful for figuring out how to set your betting odds. It might also tell you who you should "work" during the round. Higher-SD players are probably more susceptible to "working" than low-SD players.

My guess is that this information is much more interesting than useful. Of course, I'd guess that most ratings are actually more interesting than useful as well. Beyond telling you which AM division you qualify for they aren't really super useful.
 
The one thing I hope will help improve ratings and forecasting in the future is when we get more course characteristics linked to tournament scores. Sort of how race horses might be known as "mudders", we might find that Player Sam is better in the woods by two throws versus in the open but worse by two throws on hilly courses.

It might also help assign a "slope" rating to courses, as they do in ball golf, if the PDGA is so inclined. This would depend on someone figuring out what course features tend to spread the field more, and vice versa. In ball golf, the common wisdom is that wide fairways and forgiving rough help the high handicapper more than the scratch golfer, while narrow fairways and punishing rough hurt the high handicapper more than the scratch golfer. The reason is that scratch golfers are going to hit the fairway most of the time anyway, whether it is wide or narrow, and aren't going to be in the bad parts of the rough much, while the high handicapper is going to be helped a great deal by wider fairways and forgiving rough giving him more chances to recover.

If the same is true of disc golf courses, then places like Flyboy, which have very clear, open fairways but horribly nasty shule just off the edge of most of them, would tend to spread the scores more than a course that is wide open. Both courses could be considered very "fair", but comparing the round ratings of 1000-rated players and 900-rated players, you would see a tendency for Flyboy to spread the rating results more than the open course, i.e., at Flyboy the 1000-rated players would tend to score above their rating and the low-rated players would tend to score below their rating, whereas at a wide-open course the opposite tendency would hold.
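
If someone wanted to put a number on that "spread" idea, one rough approach would be to regress the round ratings players actually earned on a layout against their player ratings: a slope above 1 means the layout stretched the field, below 1 means it compressed it. The data below is fabricated purely to illustrate the calculation, not evidence that slope exists in disc golf.

```python
# One rough way to quantify "spread": regress the round ratings players earned
# on a layout against their player ratings.  A slope above 1 means the layout
# stretched the field; below 1 means it compressed it.  Fabricated data purely
# to show the calculation; assumes numpy is installed.
import numpy as np

player_ratings = np.array([880, 910, 940, 970, 1000, 1030])

# Hypothetical round ratings earned on two layouts of similar SSA
tight_wooded = np.array([855, 895, 935, 975, 1015, 1055])   # spreads the field
wide_open = np.array([900, 922, 944, 966, 988, 1010])       # compresses it

for name, round_ratings in [("tight/wooded", tight_wooded), ("wide open", wide_open)]:
    slope, intercept = np.polyfit(player_ratings, round_ratings, deg=1)
    print(f"{name:12s} spread slope = {slope:.2f}")
```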
 
And just to be clear, all I meant to do with my spreadsheet was share a way to estimate the odds of beating another player given only their rating, which is probably most useful as a handicapping tool, as garublador has suggested.

With my disclaimer out there, y'all feel free to carry on the conversation as you see fit. It's definitely interesting stuff. :popcorn:
 
We have more proof at this point that slope does not exist in DG, and possibly it does not in ball golf either, but they don't have a way of proving it one way or the other. But certainly when we get more scoring data connected with course terrain and hazard elements, we'll be able to look deeper.
 
Hi Lewis,

I just realized that this project *might* be able to help answer two mathematical questions I've been trying to answer lately.

1) Are your statistics able to suggest a (constant) value for the odds of a tie against a precisely equally-rated player?

2) I realize the data sample may not be large enough (or of high enough quality) to determine this, but do you think the observed distribution fits a Gaussian or a logistic curve better?

Thanks!
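
A rough stab at both questions, under heavy assumptions: treat round scores as approximately normal, use a hypothetical conversion of ~10 rating points per throw to turn the rating SD into throws, and approximate a tie as a score difference that rounds to zero. For question 2, fitting both a normal and a logistic distribution to a player's rounds and comparing log-likelihoods is one simple way to ask which fits better; the rounds below are fabricated.

```python
# Rough stabs at both questions, under the assumption that round-to-round
# variation is approximately normal.  The SD and the ~10-rating-points-per-
# throw conversion are illustrative, not official.  Assumes scipy is installed.
import math
from scipy import stats

# 1) Odds of a tie against an exactly-equally-rated opponent: treat each
#    player's throw count as roughly normal, so the score difference is normal
#    with SD sqrt(2) * sd_throws, and a tie is a difference that rounds to zero.
def tie_probability(sd_rating_points, points_per_throw=10.0):
    sd_throws = sd_rating_points / points_per_throw
    diff_sd = math.sqrt(2) * sd_throws
    return stats.norm.cdf(0.5, scale=diff_sd) - stats.norm.cdf(-0.5, scale=diff_sd)

print(f"P(tie) for an SD of 25 rating points: {tie_probability(25):.3f}")

# 2) Gaussian vs. logistic: fit both distributions to a player's round ratings
#    and compare log-likelihoods (higher = better fit).  Fabricated data.
rounds = [978, 1001, 990, 965, 1012, 984, 995, 970, 958, 1005]
for name, dist in [("normal", stats.norm), ("logistic", stats.logistic)]:
    params = dist.fit(rounds)
    log_likelihood = sum(dist.logpdf(rounds, *params))
    print(f"{name:8s} log-likelihood: {log_likelihood:.2f}")
```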
 
I'm going to bump this back up because I have another statistical inquiry based on standard deviation.

Could a person use the standard deviation of their rated rounds to determine where they are in their progression, or whether they've maxed out their potential? I just did some quick calculations using a few professionals and amateurs that I know, and I feel like it may be a potential indicator.

Example: Based on the rounds used in his 2014 rating, McBeast has a standard deviation of 23.25 rating points (~2.3 strokes per round), while The Champ has a standard deviation of 21.05 rating points (~2.1 strokes per round). While the strokes may not matter much, the larger variance for McBeth indicates that he could still become a more consistent player compared to Ken. Another pro, James Proctor, has a standard deviation of 26.36 rating points (~2.6 strokes per round). With these deviations, we know that the ratings of Paul and Ken only varied by 5 and 7 points respectively, while James' rating varied by 17 points over the same time frame.

By including a data set from one of the best MA1 players in California and Nevada (standard deviation of 30.8, 34-point rating variation) and a few others, I thought this became a much more viable hypothesis. I came up with a purely subjective idea of how each range would be classified:

Assumptions
STDEV < 20 Extremely Consistent; Current rating close to maximum without major overhaul of game
20 < STDEV < 25 Very Consistent; Current rating may be able to increase by 5 -10 points through mental game
25 < STDEV < 30 Mostly Consistent; Current rating may be able to increase by 25 - 50 points with dedicated practice
30 < STDEV < 35 Less Consistent; Current rating may be able to increase by 50 - 100 points with practice and time
35 < STDEV Not Consistent; Current rating may be able to increase by 100+ points with practice and time

Since this was a pretty stat-heavy thread, I was wondering what people thought about something like this. While looking through data, I found some amateur players with standard deviations in the low 20s, which led me to believe they may have capped out in terms of their rating and potential. What does everyone think?
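
For anyone who wants to test their own rounds against these (admittedly subjective) buckets, a tiny helper like this would do it; the sample round ratings below are fabricated.

```python
# Tiny helper mapping the SD of a player's rated rounds to the (subjective)
# buckets proposed above.  The sample round ratings are fabricated.
import statistics

def consistency_tier(round_ratings):
    sd = statistics.stdev(round_ratings)
    if sd < 20:
        tier = "Extremely consistent; rating near its max without a major overhaul"
    elif sd < 25:
        tier = "Very consistent; maybe 5-10 points available through the mental game"
    elif sd < 30:
        tier = "Mostly consistent; maybe 25-50 points with dedicated practice"
    elif sd < 35:
        tier = "Less consistent; maybe 50-100 points with practice and time"
    else:
        tier = "Not consistent; 100+ points may be possible with practice and time"
    return sd, tier

sample_rounds = [930, 975, 902, 968, 940, 915, 988, 955]
sd, tier = consistency_tier(sample_rounds)
print(f"SD = {sd:.1f} rating points -> {tier}")
```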
 
