Methodology
The KPoz sports ranking system is a linear probability model based on the logit regression technique. The model assigns a power rating to each team based on the results of games played (i.e. wins and losses) and incorporates a home field advantage term to adjust for the psychological edge from playing at home. Teams are then ranked according to their power ratings.
The KPoz model satisfies the three properties required of a statistical ranking model: transparency, verifiability, and accuracy.
- Transparency: Computer ranking models used to determine which universities are selected for postseason play need to be transparent. The public needs to understand exactly what the model is trying to accomplish and be able to judge whether the approach is intuitive and reasonable. Black box models, where the approach is hidden from public scrutiny, should not be used in the process since they do not allow critique, evaluation, or comparison.
- Verifiability: Computer models need to be verifiable. The public needs to be able to readily duplicate the results to ensure proper and correct solutions. A proper ranking model should only have a single solution and that exact solution should be computed each time the model is solved with the same underlying dataset. This will also allow evaluation and critique of the methodology, process, and results. It will also build overall consumer confidence in the process.
- Accuracy: Computer ranking models need to be accurate. They need to assess quality and relative strength, provide an accurate ranking of teams based on the available data, and contribute value to the ranking process. A model whose results are frequently discarded from the formula, because they represent the highest or lowest rankings or are otherwise unreasonable, provides relatively little value and should not be included in the ranking scheme.
Model Description
The KPoz sports ranking system is a linear probability model based on the logit regression technique. The model determines a power rating for each team based on wins, losses, and venue, with a home field advantage term to account for the psychological edge gained from playing at home. It incorporates overall strength of schedule, the power ratings of the opponents, and home field advantage. Teams are then ranked according to their power ratings.
The KPoz regression equation is as follows:
$$Y = \beta_0 + \beta_H - \beta_A \qquad (1)$$
where
$Y$ = natural logarithm of the odds ratio
$\beta_0$ = home field rating
$\beta_H$ = power rating for the home team
$\beta_A$ = power rating for the visiting team
Mathematically, we have
$$Y = \ln\left(\frac{p_{HA}}{1 - p_{HA}}\right) \qquad (2)$$
where
$p_{HA}$ = probability that the home team will defeat the away team
$1 - p_{HA}$ = probability that the away team will defeat the home team
$\frac{p_{HA}}{1 - p_{HA}}$ = odds ratio (percentage of wins-to-losses)
$\ln\left(\frac{p_{HA}}{1 - p_{HA}}\right)$ = natural logarithm of the odds ratio
The odds ratio above is the ratio of wins to losses, i.e., the number of wins per loss. It is typically computed as the percentage of wins divided by the percentage of losses (e.g., winning percentage divided by losing percentage).
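As a quick illustration with assumed numbers (they are not taken from any KPoz rating), a team that wins 75% of its games and loses 25% has an odds ratio of 3 wins per loss, and the corresponding value of $Y$ is about 1.10:

$$\frac{0.75}{0.25} = 3, \qquad \ln(3) \approx 1.10$$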
It is important to note in equation (1) that the sign of the home team's rating is always positive, i.e., $+\beta_H$, and the sign of the away team's rating is always negative, i.e., $-\beta_A$. The home field advantage term is also always positive, i.e., $+\beta_0$.
Example
Suppose we have the following ratings: $\beta_0 = 0.25$, $\beta_H = 3.25$, and $\beta_A = 2.75$. Then, the probability that the home team will defeat the away (visiting) team is computed as follows:
First, compute the natural logarithm of the odds ratio.
$$Y = \beta_0 + \beta_H - \beta_A = 0.25 + 3.25 - 2.75 = 0.75$$
So we have
$$Y = \ln\left(\frac{p_{HA}}{1 - p_{HA}}\right) = 0.75$$
Next, compute $p_{HA}$ as follows:
$$\ln\left(\frac{p_{HA}}{1 - p_{HA}}\right) = 0.75$$
$$\frac{p_{HA}}{1 - p_{HA}} = e^{0.75}$$
$$p_{HA} = \frac{e^{0.75}}{1 + e^{0.75}} = 0.68$$
So the probability that the home team will win is $p_{HA} = 68\%$.
The probability that the visiting (away) team will win is simply 1 minus the probability that the home team will win. That is,
$$p_{AH} = 1 - p_{HA} = 1 - 68\% = 32\%$$
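The worked example above can be reproduced with a few lines of Python. This is a minimal sketch rather than the KPoz implementation; the function name `home_win_probability` and its parameter names are chosen here for illustration only.

```python
import math


def home_win_probability(beta_0: float, beta_home: float, beta_away: float) -> float:
    """Return the probability that the home team beats the away team."""
    # Equation (1): log of the odds ratio for a home win.
    y = beta_0 + beta_home - beta_away
    # Invert equation (2): p = e^Y / (1 + e^Y), written as 1 / (1 + e^-Y).
    return 1.0 / (1.0 + math.exp(-y))


# Ratings from the example: beta_0 = 0.25, beta_H = 3.25, beta_A = 2.75.
p_home = home_win_probability(0.25, 3.25, 2.75)
p_away = 1.0 - p_home
print(f"home: {p_home:.2f}, away: {p_away:.2f}")  # home: 0.68, away: 0.32
```

Two evenly rated teams with no home field term give a probability of exactly 0.5, which is a quick sanity check on the formula.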
Specifying the probabilities
One difficulty in estimating the parameters of the model, i.e., the power rating for each team, is that we do not know the exact probability that one team will defeat another. Therefore, we cannot compute the true odds ratio, or $Y$, exactly. This can be resolved by estimating the probability from each observation: assign $\hat{p} = 0.90$ if the home team won the game and $\hat{p} = 0.10$ if the visiting team won the game. Thus we have,
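$$Y = \begin{cases} \ln\left(\dfrac{0.90}{0.10}\right) = 2.20 & \text{if the home team wins} \\ \ln\left(\dfrac{0.10}{0.90}\right) = -2.20 & \text{if the away team wins} \end{cases}$$
So, if the home team wins we set $Y = 2.20$, and if the away team wins we set $Y = -2.20$.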
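The assignment rule in this section, an estimated probability of 0.90 for the observed winner and the log of the resulting odds ratio as $Y$, can be sketched as follows. The names `P_HAT_WIN` and `observed_log_odds` are illustrative and do not come from the original description.

```python
import math

# Estimated probability assigned to the side that actually won (from the text).
P_HAT_WIN = 0.90


def observed_log_odds(home_team_won: bool) -> float:
    """Return Y for one game, from the home team's point of view."""
    # p_hat is the estimated probability of a home win for this observation.
    p_hat = P_HAT_WIN if home_team_won else 1.0 - P_HAT_WIN
    # Natural logarithm of the odds ratio p_hat / (1 - p_hat).
    return math.log(p_hat / (1.0 - p_hat))


print(round(observed_log_odds(True), 2))   # 2.2
print(round(observed_log_odds(False), 2))  # -2.2
```

The two values are equal in magnitude and opposite in sign because 0.90 and 0.10 are symmetric around 0.50.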
Example - Estimating the parameters
Suppose we have the following six observations.
- J (home) defeats K (away), so $Y = 2.20$.
- L (home) loses to J (away), so $Y = -2.20$.
- K (home) defeats L (away), so $Y = 2.20$.
- M (home) loses to K (away), so $Y = -2.20$.
- M (home) defeats L (away), so $Y = 2.20$.
- J (home) defeats M (away), so $Y = 2.20$.
Then the model equations (1) are written as follows:
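$$\beta_0 + \beta_J - \beta_K = 2.20$$
$$\beta_0 + \beta_L - \beta_J = -2.20$$
$$\beta_0 + \beta_K - \beta_L = 2.20$$
$$\beta_0 + \beta_M - \beta_K = -2.20$$
$$\beta_0 + \beta_M - \beta_L = 2.20$$
$$\beta_0 + \beta_J - \beta_M = 2.20$$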
The ratings $\beta_0$, $\beta_J$, $\beta_K$, $\beta_L$, and $\beta_M$ are determined by regression analysis. This requires a correction term for the matrix rank and for heteroscedasticity.
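The text does not specify the correction terms, so the following is only a sketch of one way to fit the six equations, not the KPoz procedure. It uses ordinary least squares in plain Python. The rank problem arises because only rating differences appear in the equations, so adding a constant to every team rating leaves them unchanged; the sketch handles this with an assumed constraint that the team ratings sum to zero. No heteroscedasticity correction is applied. The helper names `solve_linear_system` and `fit_ratings` are hypothetical.

```python
# Games from the example: (home team, away team, Y).
GAMES = [
    ("J", "K", 2.20),
    ("L", "J", -2.20),
    ("K", "L", 2.20),
    ("M", "K", -2.20),
    ("M", "L", 2.20),
    ("J", "M", 2.20),
]
TEAMS = ["J", "K", "L", "M"]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ratings(games, teams):
    """Least-squares fit of beta_0 and the team ratings."""
    index = {team: i + 1 for i, team in enumerate(teams)}  # column 0 is beta_0
    rows, targets = [], []
    for home, away, y in games:
        row = [0.0] * (len(teams) + 1)
        row[0] = 1.0              # + beta_0
        row[index[home]] = 1.0    # + beta_H
        row[index[away]] = -1.0   # - beta_A
        rows.append(row)
        targets.append(y)
    # Assumed rank fix: one extra equation forcing team ratings to sum to zero.
    rows.append([0.0] + [1.0] * len(teams))
    targets.append(0.0)
    n = len(teams) + 1
    # Normal equations (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    solution = solve_linear_system(xtx, xty)
    return solution[0], dict(zip(teams, solution[1:]))


beta_0, ratings = fit_ratings(GAMES, TEAMS)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(f"beta_0 = {beta_0:.2f}")  # beta_0 = 0.44
print(ranking)                   # ['J', 'K', 'M', 'L']
```

Under these assumptions the ranking matches the win-loss records in the six observations: J is 3-0, K is 2-1, M is 1-2, and L is 0-3. A different rank correction, such as fixing one team's rating at zero, would shift all team ratings by a constant but leave the ranking and the rating differences unchanged.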