When is a 1020 round not a 1020 round?

jeverett · Mar 1, 2013

Preface: this post will contain statistics, and may be best-directed at cgkdisc.

No, my rhetorical question is not actually rhetorical. I believe that the PDGA "rating interval compression formulas" are incorrect, and as such PDGA player round ratings (and subsequently player ratings) are not as accurate as they could be. The result of these formulas, from what data I've been able to collect so far, is that almost no recorded rounds for any event, for rounds above or below 1000, are being correctly rated. The discrepancy is typically very small, but it may be much more pronounced on very short or very difficult courses.

Note: I am referencing the 2013 Gentlemen's Club Challenge for much of the following. Unfortunately, a ratings update has happened since the following calculations were run, making comparison with the original data more difficult. The results of that event can be found here:

http://www.pdga.com/tournament_results/99721

For any who are interested, here is a quick breakdown of how PDGA round ratings are calculated:

Step 1. Determine who really is a propagator. Unfortunately this part is actually annoying to do accurately with available data. For example, players without active PDGA memberships may still be propagators for an event, but their initial rating cannot be determined externally. Also, in order to be considered a propagator, a player must have played a sufficient number of previous rated rounds (i.e. they need more than to just have a rating), and this is very time-consuming (and sometimes impossible) to determine externally.

Step 2. Using all propagators, determine the best-fit linear equation that matches the round data. For example, here is a plot of round 1 of the 2013 Gentlemen's Club Challenge (GCC):

The linear equation that best fits the above data is:

Round_Score = -0.12196922 * Initial_Rating + 182.4463478

Step 3. Determine the SSA for the round, using the best-fit linear equation above. As an example, for round 1 of the GCC, the SSA is approximately 60.47712771.
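As a quick check of the Step 3 figure: evaluating the round 1 best-fit line at an initial rating of 1000 reproduces the SSA quoted above. This is only a minimal sketch. The function name is made up for illustration, and reading SSA as "the fitted score of a 1000-rated player" is an assumption that happens to match the numbers.

```python
# Coefficients of the round 1 best-fit line quoted above.
SLOPE = -0.12196922
INTERCEPT = 182.4463478


def predicted_score(initial_rating, slope=SLOPE, intercept=INTERCEPT):
    """Round score the fitted line predicts for a given initial rating."""
    return slope * initial_rating + intercept


# Assumption: SSA is the fitted score at an initial rating of 1000.
ssa = predicted_score(1000)
print(round(ssa, 6))
```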

Step 4. Determine round ratings, using the following 'compression formulas':

For SSAs above 50.3289725:
Rating Increment = -0.225067 * SSA + 21.3858

For SSAs below 50.3289725:
Rating Increment = -0.487095 * SSA + 34.5734

Note: these formulas were derived from PDGA event results, and are not the precise formulas used by the PDGA. Unfortunately, due to rounding (of round ratings to the nearest whole number, and possibly also rounding of SSA values), it is impossible to determine the actual linear formulas exactly. They are accurate, however, to within 0.01% of the actual PDGA linear formulas, and in the case of the GCC they matched the 'unofficial' round ratings for each player in the round (subject to a small amount of rounding error). The rating increment for round 1, for example, was 7.774394297; that is, each throw above or below the SSA decreased or increased a player's round rating by 7.774394297 points.
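For anyone who wants to play with the numbers, Step 4 can be sketched in a few lines of Python. The coefficients are the derived ones above (not the official PDGA values), the function names are hypothetical, and anchoring the round rating at 1000 for a score equal to the SSA is my reading of how the increment is applied.

```python
BREAKPOINT_SSA = 50.3289725  # SSA at which the two derived lines meet


def rating_increment(ssa):
    """Rating points per throw, using the derived (not official) formulas."""
    if ssa > BREAKPOINT_SSA:
        return -0.225067 * ssa + 21.3858
    return -0.487095 * ssa + 34.5734


def round_rating(score, ssa):
    """Assumed form: 1000 at the SSA, plus or minus the increment per throw."""
    return 1000 + (ssa - score) * rating_increment(ssa)


# GCC round 1, using the SSA from Step 3.
gcc_r1_ssa = 60.47712771
print(rating_increment(gcc_r1_ssa))
print(round_rating(58, gcc_r1_ssa))
```

Both branches give roughly the same increment at the breakpoint, so nothing special is needed for an SSA of exactly 50.3289725.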

Step 5. Compare the round rating of each propagator with their initial rating, throw out any propagator whose round rating was not within 50 points of their initial rating, and recalculate steps 2-4. Here is a plot of GCC round 1, with these propagators removed:

The linear equation that best fits the above data is:

Round_Score = -0.114947347 * Initial_Rating + 175.3579791

Producing a modified SSA of 60.41063194, and a new rating interval (using the same compression formula as above) of 7.789360301.
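Steps 2 through 5 can be strung together as in the rough sketch below. To be clear about what is assumed: the propagator list is made up (it is NOT the GCC data), the function names are hypothetical, SSA is taken as the fitted score at a rating of 1000, "within 50 points" is treated as inclusive, and the filter is applied once rather than repeated.

```python
def fit_line(ratings, scores):
    """Ordinary least-squares fit of score = slope * rating + intercept."""
    n = len(ratings)
    mean_r = sum(ratings) / n
    mean_s = sum(scores) / n
    sxx = sum((r - mean_r) ** 2 for r in ratings)
    sxy = sum((r - mean_r) * (s - mean_s) for r, s in zip(ratings, scores))
    slope = sxy / sxx
    return slope, mean_s - slope * mean_r


def rating_increment(ssa):
    """Derived (not official) compression formulas from Step 4."""
    if ssa > 50.3289725:
        return -0.225067 * ssa + 21.3858
    return -0.487095 * ssa + 34.5734


def ssa_and_increment(propagators):
    """Steps 2-4 for a list of (initial_rating, round_score) pairs."""
    ratings = [r for r, _ in propagators]
    scores = [s for _, s in propagators]
    slope, intercept = fit_line(ratings, scores)
    ssa = slope * 1000 + intercept  # assumption: SSA = fitted score at 1000
    return ssa, rating_increment(ssa)


def drop_outliers_and_refit(propagators, max_gap=50):
    """Step 5: drop propagators rated more than max_gap off, then refit once."""
    ssa, inc = ssa_and_increment(propagators)
    kept = [(r, s) for r, s in propagators
            if abs(1000 + (ssa - s) * inc - r) <= max_gap]
    return kept, ssa_and_increment(kept)


# Hypothetical propagators; the last one shot far better than their rating.
propagators = [(900, 74), (925, 71), (950, 68), (975, 65),
               (1000, 62), (1025, 59), (950, 58)]
kept, (ssa_after, inc_after) = drop_outliers_and_refit(propagators)
print(len(propagators) - len(kept), "propagator(s) dropped")
print(ssa_after, inc_after)
```

With this made-up field, only the 950-rated player who shot a 58 gets thrown out, and both the SSA and the increment shift once that round is removed, which is the same kind of movement seen in the GCC numbers above.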

So.. what's wrong with any of this? Nothing.. yet. But what happens when you compare the linear equation that was used to compute the SSA with the rating interval compression formula value for that SSA? *If* it is valid to use a linear equation to model player rating vs. round score (and from the two plots above, plus countless other plots I've made from PDGA round data it does appear to be valid), should it not also be valid to use this same linear equation to determine the (averaged) number of rating points per throw? But this is not what the PDGA system does. For example, when you compare the rating interval used for each round of the GCC, against the observed rating-interval-per-throw-vs-initial-player-rating of all propagators, you get the following plot:

The PDGA Interval line above is of course this linear formula:

Rating_Increment = -0.225067 * SSA + 21.3858

But as you can observe, this in no way matches the observed rating interval based on player initial rating vs. round score, and the round data I've collected to date further supports this trend: that the relationship between initial player rating, round score, and round rating increment cannot be accurately predicted using the PDGA linear 'compression formula'. The effect of this is that round ratings are not (as) accurately reflecting the effect of player rating on round scores as they could be. In other words, if a 1020-rated player shoots an "average" round, by the PDGA compression formulas that round is not being rated a 1020. In fact, the only round rating not subject to this kind of induced error is of course a round rated exactly 1000.

What am I suggesting? I am suggesting that the PDGA instead switch to using the same linear equation used to determine SSA for a round to determine the per-throw rating increment. For example, for round 1 of the GCC, this would mean a round that was rated as a 1020 (technically a round score of 57.8430269985) would instead be rated as a 1022.3372266 (or rounded to a 1022). Yes, this difference is very small..
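The comparison for that example round can be reproduced with a small Python sketch. It uses the GCC round 1 line after the Step 5 removal and the derived compression formula for SSAs above 50.3289725; the function names are hypothetical, and "invert the fitted line" is simply my shorthand for the proposal.

```python
# GCC round 1 best-fit line after Step 5 (from the post above).
SLOPE = -0.114947347
INTERCEPT = 175.3579791


def pdga_style_rating(score):
    """Rating via SSA plus the derived compression increment."""
    ssa = SLOPE * 1000 + INTERCEPT
    inc = -0.225067 * ssa + 21.3858  # branch for SSAs above 50.3289725
    return 1000 + (ssa - score) * inc


def regression_rating(score):
    """Proposed alternative: the initial rating the fitted line maps to this score."""
    return (score - INTERCEPT) / SLOPE


score = 57.8430269985
print(pdga_style_rating(score))   # about 1020
print(regression_rating(score))   # about 1022.34
print(-1 / SLOPE)                 # rating points per throw implied by the line
```

Put another way, the fitted line implies roughly 8.70 rating points per throw for this round, against 7.79 from the compression formula, and the two methods only agree at exactly 1000.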

So why does it really matter? The goal here is correlation.. or more specifically the correlation between (initial) player rating and round/event score. i.e. How well did the initial ratings of all players predict how they scored at an event. I don't know if anyone has seen the PDGA report on correlation coefficients for their events (it was published last year), but increasing the correlation coefficient for Majors appears to be a goal of theirs. This of course can also be addressed with course design, but a very simple way to improve the correlation coefficient could be to switch how round ratings themselves are calculated. I don't, however, have any method of proving that this change will work.. and that's where I'd *love* some help (maybe even from Chuck). In order to test this, the 'real' PDGA method of determining SSA and rating intervals per throw would need to be used, using the precise number and ratings of propagators for real events, and then we'd need to test the impact both round rating methods would have on first player ratings and then the correlation coefficient of how well future event rounds are predicted by said player ratings.
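For reference, the statistic in question can be computed as below. This is only a sketch of a plain Pearson correlation between initial rating and round score, on made-up numbers; I don't know exactly how the PDGA computes the coefficients in its report, so treat the function and the data as assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical field: initial ratings and the round scores they shot.
ratings = [900, 925, 950, 975, 1000, 1025, 950]
scores = [74, 71, 68, 65, 62, 59, 58]
r = pearson_r(ratings, scores)
print(r)
```

The sign comes out negative because higher-rated players shoot lower scores; a value closer to -1 means initial ratings predicted the scores better.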

Ok.. wall of text over.. thoughts?

BionicRib · Mar 1, 2013

Do you perhaps have a link to the correlation coefficients reported by the PDGA? Or are they only sent to the TD's?

Dave242 · Mar 1, 2013

jeverett said:
the SSA is approximately 60.47712771

Your use of "approximate" is less accurate than PDGA ratings.

I guess it is important to get this down to the nearest few billionths of a throw!

jeverett · Mar 1, 2013

BionicRib said:
Do you perhaps have a link to the correlation coefficients reported by the PDGA? Or are they only sent to the TD's?

What I have is the PDGA document (more of a white paper) on the topic. It is available online here:

http://www.pdga.com/files/documents/Correlation_for_Better_Courses.pdf

BionicRib · Mar 1, 2013

I asked Chuck a similar question a few days ago. I do see what you are getting at and after reading that file you linked me to.......I would say that there are two issues that will take time to help with these correlation coefficients. 1. Is as you mentioned.....Course design...."luck golf/tweener holes/bad design". Having clearer definitions that are practiced by "all" designers across the country is a work in progress. 2. IMO....and I'm sure Chuck can clear this up better than I, but is there really enough data to work with? I personally don't think so. If disc golf were as popular as baseball or golf for that matter, the numbers would be more "fine tuned" because you have more players. More players equals more data. You have a fifty fifty chance of getting heads or tails when flipping a coin. If you flip it 100 times I bet you never get 50/50 on the dot.......(more like 70/30 or 60/40). If you flip it a million times you will get closer and closer to that "50/50" on paper. Just my thoughts

jeverett · Mar 1, 2013

BionicRib said:
I asked Chuck a similar question a few days ago. I do see what you are getting at and after reading that file you linked me to.......I would say that there are two issues that will take time to help with these correlation coefficients. 1. Is as you mentioned.....Course design...."luck golf/tweener holes/bad design". Having clearer definitions that are practiced by "all" designers across the country is a work in progress. 2. IMO....and I'm sure Chuck can clear this up better than I, but is there really enough data to work with? I personally don't think so. If disc golf were as popular as baseball or golf for that matter, the numbers would be more "fine tuned" because you have more players. More players equals more data. You have a fifty fifty chance of getting heads or tails when flipping a coin. If you flip it 100 times I bet you never get 50/50 on the dot.......(more like 70/30 or 60/40). If you flip it a million times you will get closer and closer to that "50/50" on paper. Just my thoughts

Hi BionicRib,

Oh definitely, due to sample size problems, the inherent random nature of disc golf (somewhat mitigated by course and equipment design), and physical/muscular limitations on just how 'controllable' disc golf is, period, getting a 100% correlation coefficient between player rating and event score is going to be impossible.. not to mention not ideal anyway.

However my hope is, that with one slight adjustment to how player round ratings (and by extension player ratings) are calculated, we can very, very slightly increase the correlation coefficient between (initial) player rating and event score. I don't really know what could be expected in terms of improvement with this one change.. probably less than 1%.. but as I said, I don't have the ability to determine this, due to not having access to the *exact* same rating methodology that the PDGA uses.

jenb · Mar 1, 2013

Of course, one answer is that it doesn't really matter as long as it's applied consistently to everyone.

When someone gives that answer, I want you to look them square in the eye and say, "Momma said knock you out."

Steve West · Mar 2, 2013

Why assume "linear"?

Hampstead · Mar 2, 2013

I just want to say that I love playing, but the OP made my head swim.
Too smart for me.
Definitely not hating, I'm impressed with all the info.

DGstatistician · Mar 2, 2013

jeverett said:
Hi BionicRib,

Oh definitely, due to sample size problems, the inherent random nature of disc golf (somewhat mitigated by course and equipment design), and physical/muscular limitations on just how 'controllable' disc golf is, period, getting a 100% correlation coefficient between player rating and event score is going to be impossible.. not to mention not ideal anyway.

However my hope is, that with one slight adjustment to how player round ratings (and by extension player ratings) are calculated, we can very, very slightly increase the correlation coefficient between (initial) player rating and event score. I don't really know what could be expected in terms of improvement with this one change.. probably less than 1%.. but as I said, I don't have the ability to determine this, due to not having access to the *exact* same rating methodology that the PDGA uses.

I will preface this by saying that currently my teaching schedule has me too busy to look through data and figure out exact models. In May, it will be in between Spring and Summer semesters and I will have plenty of time to look into the data and actually test some of my theories.

You make a very valid point that the binary model may not be as accurate as a linear model. (This is a classic cost of simplicity debate). However, I believe that this is also not accurate. It is likely that we should be looking at a quadratic of some sort. Although I am not sure of what model I would specifically fit to the design, the difference between harder or easier courses is likely not linear.

I also plan on looking into the lag that is used for ratings. Specifically the use and value of rounds played over 6 months previously.

jeverett · Mar 2, 2013

DGstatistician said:
I will preface this by saying that currently my teaching schedule has me too busy to look through data and figure out exact models. In May, it will be in between Spring and Summer semesters and I will have plenty of time to look into the data and actually test some of my theories.

You make a very valid point that the binary model may not be as accurate as a linear model. (This is a classic cost of simplicity debate). However, I believe that this is also not accurate. It is likely that we should be looking at a quadratic of some sort. Although I am not sure of what model I would specifically fit to the design, the difference between harder or easier courses is likely not linear.

I also plan on looking into the lag that is used for ratings. Specifically the use and value of rounds played over 6 months previously.

Hi DGstatistician,

Yes, I agree with you that the assumption of a linear distribution of initial player rating vs. round score is not necessarily 100% valid, and some of the data I've collected does somewhat contradict that assumption, particularly with very long/difficult courses or lots of forced distance/carry holes.. i.e. course layouts where lower-skill players are having to actually throw different shots than players who can out-drive them. If you look at the plot of the Memorial, for example, and look at the distribution above 1020, the assumption of the same linear distribution as the rest of the skill range is questionable.

Round 1 (MPO and FPO):

Round 2 (MPO and FPO):

I agree, though, that finding an appropriate (possibly quadratic) model for all events/rounds that ultimately increases the correlation coefficient between initial player rating and round score is going to be some work.

This is actually what BionicRib was discussing with Chuck.. so he's definitely been thinking along these lines, too.

jeverett · Mar 3, 2013

Thinking a bit more on the topic of finding a more-accurate fit than linear to represent player rating vs. round score, I feel like it's coming down to two (maybe three) variables:

Putting - as player rating increases, their ability to make putts absolutely increases, but what is the relationship between these two? Is it linear? Does it have some kind of diminishing returns? Has anyone ever attempted to test this relationship? e.g. have a wide range of rated players each attempt a large number of (static distance) putts, and record their percentages? Anyone who's ever run putting contests happen to have any data on this topic?

I'd love to see some.

Drive distance - as player rating increases, their drive distance definitely increases, but again I'm uncertain as to what the relationship between player rating and drive distance is (not to mention we may also need to deal with drive accuracy in here too). This relationship is most definitely not just linear. First, depending on the hole length, there's going to be a point at which a higher-rating/skill/distance player has the possibility of shaving a whole throw off their score, that a lower-rating/skill/distance player simply cannot reach. Imagine a course of all 480ft. holes.. a 1020+ rated player could theoretically be in birdie range on every hole, yet a player rated in the lower 900's is best-case getting Par on every one of those. Plus, as overall drive distance increases, overall putting distance decreases: i.e. drive distance is affecting round score twice. If we were to combine drive distance with 'accurate' distance, we could theoretically measure the relationship between player rating and round score with this regard.. e.g. pick a circle radius, and a necessary accuracy percentage (e.g. 90%), and see at what maximum distance players can hit that circle 90% of the time vs. their rating.

Any thoughts so far? Anyone attempted to estimate/determine any of these relationships?

Melynda · Mar 3, 2013

jeverett said:
x

if it's easy, will you distinguish the FPO scatter from MPO?

the PDGA Ratings Team only assessed MPO in the article, "Correlation for Better Courses," so i'm missing out on all the fun !!!!!!!
