On VP Scales

1. Introduction

2. Victory Point (VP) scales - what are they?

3. How many imps is a VP?

4. How many matchpoints is a VP?

5. How do we create a Butler Imp VP scale?

6. Conclusion in respect of the anomalous results

7. What affects K?

8. Assigning matches

9. What is A+/A-?

10. What VP scale for aggregate scoring?

11. Fixing Washington VPs

12. Late round entry into a VP Swiss pairs

13. VPs for Patton scoring

 

1. Introduction

Mathematicians and statisticians are going to curl up and die here, but engineers understand the tools they use (they leave the clever stuff to the mathematicians and use the worthwhile bits) and this works.

This article was originally written as a result of some anomalous results from an on-line 15-board Butler imp Swiss competition, played over 6 rounds with "incomplete-Barometer" scoring. Since matches were played during the course of a week, early matches had less accurate barometers than matches played later; hence my term "incomplete-Barometer". The recommendation contained in the EBU White Book at the time of the event was to halve the number of boards and use the VP scale for that number. This advice proved to be flawed. Later work (see Sections 4 and 11) shows the White Book Swiss Pairs VP scale was seriously flawed too. Another problem was discovered with late round entry into a Swiss after being ko'd from the main event (Section 12).

2. Victory Point (VP) scales - what are they?

The English Bridge Union (EBU) uses VP scales for Swiss events, both Swiss Teams and Swiss Pairs and also for all-play-all teams events. With some fudging VP scales can also be used for Butler imps events. There are a number of different sorts of VP scales, but the ones used by the EBU are designed to give an equal probability of any result from 20-0 through 10-10 to 0-20. The statistical caveat to this is that the matches themselves are between teams (pairs) of equal strength, and each board is independent.

VP scales can be used for any sort of event where a scoring method yields a normal distribution of results. In this scale there are 21 different possible results, so each VP result represents 100/21 = 4.7619% of the complete range of results. Note that this means the score 10-10 will occur 4.7619% of the time, but every other split will appear twice as often once you count both directions (eg 11-9 appears as often as 9-11 and as often as 10-10, so an 11-9 one way or the other appears 9.5238% of the time). If we know the standard deviation of a set of results we can express the VP scale as a number of standard deviations from the mean for each score. Below is an extract from a table of the normal distribution expressed in standard deviations from the mean; each entry is the fraction of results lying more than z standard deviations above the mean. (I've nicked a table that's public domain and just show a few rows so you can see how it works.)

Table 1. Gaussian distribution table

z     .00      .01      .02      .03      .04      .05      .06      .07      .08      .09
1.0   0.15865  0.15625  0.15386  0.15150  0.14917  0.14686  0.14457  0.14231  0.14007  0.13786
0.9   0.18406  0.18141  0.17878  0.17618  0.17361  0.17105  0.16853  0.16602  0.16354  0.16109
0.8   0.21185  0.20897  0.20611  0.20327  0.20045  0.19766  0.19489  0.19215  0.18943  0.18673
0.7   0.24196  0.23885  0.23576  0.23269  0.22965  0.22663  0.22363  0.22065  0.21769  0.21476
0.6   0.27425  0.27093  0.26763  0.26434  0.26108  0.25784  0.25462  0.25143  0.24825  0.24509
0.5   0.30853  0.30502  0.30153  0.29805  0.29460  0.29116  0.28774  0.28434  0.28095  0.27759
0.4   0.34457  0.34090  0.33724  0.33359  0.32997  0.32635  0.32276  0.31917  0.31561  0.31206
0.3   0.38209  0.37828  0.37448  0.37070  0.36692  0.36317  0.35942  0.35569  0.35197  0.34826
0.2   0.42074  0.41683  0.41293  0.40904  0.40516  0.40129  0.39743  0.39358  0.38974  0.38590
0.1   0.46017  0.45620  0.45224  0.44828  0.44433  0.44038  0.43644  0.43250  0.42857  0.42465
0.0   0.50000  0.49601  0.49202  0.48803  0.48404  0.48006  0.47607  0.47209  0.46811  0.46414

At least it’s reassuring to see that 68.27% of all results fall within 1 std dev of zero. Don't believe everything you read on the net.

All we need to do is to interpolate the number of standard deviations for cumulative intervals of multiples of 4.7619% to get the VP scale expressed in standard deviations. I’ve used straight-line interpolation for the third digit as it’s not particularly significant in the grand scheme of things.
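The interpolation can be cross-checked numerically. The sketch below is not how the figures in this article were produced (those came from the printed table and straight-line interpolation); it assumes Python's standard-library `statistics.NormalDist`, and the function name `vp_boundaries` is made up for illustration.

```python
from statistics import NormalDist

def vp_boundaries(n_results=21):
    """Upper edge, in std devs, of each winning VP band below the maximum.

    The drawn band (10-10) straddles the mean, so only half of its
    1/n_results share lies above the mean; each further band adds a
    full 1/n_results share.
    """
    std_normal = NormalDist()
    bands = (n_results - 1) // 2          # 10 bands for a 20-0 scale
    cum_from_mean = [(2 * k + 1) / (2 * n_results) for k in range(bands)]
    return [std_normal.inv_cdf(0.5 + p) for p in cum_from_mean]

boundaries = vp_boundaries()
for vp, z in zip(range(10, 20), boundaries):
    print(f"{vp}-{20 - vp} ends at {z:.3f} std devs")
```

The last boundary is where 20-0 begins. The values agree with the hand-interpolated ones to within a unit or so in the third decimal place.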

There is a case for awarding negative VPs to a side which has been soundly trounced. In particular it makes it much less attractive to "shoot" for a good result when one is currently on a score of about 1 or 2 VPs. An assertion of Mike Pomfrey's is that for negative VPs half of the range of a whitewash should be given to 20-0, and the remaining half of the range should be equally distributed between 20-(-1) through 20-(-5). There is good justification for this unequal split: for any head-to-head match, moving from 19-1 to 20-0 is a net difference of 2 VPs whereas moving from 20-0 to 20-(-1) is a net difference of 1 VP, and so the first half of a score of 20 can appropriately be assigned to 20-0. This extends the table from 20-0 through to 20-(-5).

Table 2. The 20-0 VP scale expressed in standard deviations

Cum. %age          Std devs       VPs
from mean          from mean
00.0000-02.3810    0.000-0.059    10-10
02.3810-07.1429    0.059-0.180    11- 9
07.1429-11.9048    0.180-0.303    12- 8
11.9048-16.6667    0.303-0.431    13- 7
16.6667-21.4286    0.431-0.566    14- 6
21.4286-26.1905    0.566-0.712    15- 5
26.1905-30.9524    0.712-0.876    16- 4
30.9524-35.7143    0.876-1.068    17- 3
35.7143-40.4762    1.068-1.309    18- 2
40.4762-45.2381    1.309-1.668    19- 1
45.2381-50.0       1.668+         20- 0

Table 3. Negative VPs for a 20-0 scale expressed in standard deviations

Cum. %age          Std devs       VPs
from mean          from mean
45.2381-47.6190    1.668-1.981    20- 0
47.6190-48.0952    1.981-2.074    20-(-1)
48.0952-48.5714    2.074-2.189    20-(-2)
48.5714-49.0476    2.189-2.344    20-(-3)
49.0476-49.5238    2.344-2.592    20-(-4)
49.5238-50.0       2.592+         20-(-5)
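The negative-VP split can be checked the same way. This is a sketch under my reading of Pomfrey's rule (half the whitewash band to 20-0, the other half in five equal steps of 2.381/5 = 0.476% each); `negative_vp_boundaries` is a hypothetical name and `statistics.NormalDist` is assumed available.

```python
from statistics import NormalDist

def negative_vp_boundaries(n_results=21, steps=5):
    """Lower edge, in std devs, of 20-0, 20-(-1), ..., 20-(-steps)."""
    band = 1 / n_results                  # 4.7619% for the whole whitewash
    start = 0.5 - band                    # 45.2381% from the mean
    # Half the band belongs to 20-0; the rest is cut into equal steps.
    cum_from_mean = [start] + [
        start + band / 2 + i * band / (2 * steps) for i in range(steps)
    ]
    std_normal = NormalDist()
    return [std_normal.inv_cdf(0.5 + p) for p in cum_from_mean]

negative_edges = negative_vp_boundaries()
for label, z in zip(["20-0", "20-(-1)", "20-(-2)", "20-(-3)", "20-(-4)", "20-(-5)"],
                    negative_edges):
    print(f"{label} starts at {z:.3f} std devs")
```

The six edges come out at about 1.668, 1.981, 2.074, 2.189, 2.345 and 2.593, which is what the standard-deviation column of Table 3 shows.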

3. How many imps is a VP?

Some years ago John Manning published what at that time was original research on ideal VP scales for EBU events. To get to a usable scale for a Team-of-4 match, we need to know that the standard deviation expressed in imps is K x sqrt(n) where K=6.5 and n is the number of boards in the match. The figure of 6.5 is the standard deviation for a 1-board match, and was derived empirically by Manning et al after considerable study of large numbers of match results. Indeed there is some discussion as to the exact value of K but there is confidence it lies between 6.0 and 7.0, with higher values in this range preferred when contestants are not equally matched. Manning produced values of 6 (for a 13 round Swiss simulation) and 6.65 for round robins, whereas McKinnon, an Aussie, used 7, but that was a while ago when perhaps bidding was less accurate. Max Bavin for a long time used 20/3, which is simply vulgar.

As a result of some theory by Mike Pomfrey, we can convert Teams-of-4 scales to Teams-of-8 scales, as long as we cross-imp the Team-of-8 results (giving 4 comparisons per board). Pomfrey asserts that the relationship between two scales varies as the square root of the product of the number of comparisons and the number of results, sqrt(C x R). In fact with a bit of juggling we note that Teams-of-4 has C=1, R=2 and so the generalised standard deviation for any cross-imped match is K x sqrt(n x C x R / 2).

So let's construct our VP table for an 8-board Teams-of-4 match using K=6.5, C=1, R=2 and n=8 (Std dev=18.38). The EBU uses this scale for 7-9 boards, but what the heck, it's actually computed for 8 boards. We might as well construct the table for negative VP's too.

Table 4. VP table for 8 board matches

Std devs       VP score   Computed     White Book
from mean                 Imp range    Imp range
0.000-0.059    10-10       0.0- 1.1     0- 0
0.059-0.180    11- 9       1.1- 3.3     1- 2
0.180-0.303    12- 8       3.3- 5.6     3- 4
0.303-0.431    13- 7       5.6- 7.9     5- 6
0.431-0.566    14- 6       7.9-10.4     7- 9
0.566-0.712    15- 5      10.4-13.1    10-12
0.712-0.876    16- 4      13.1-16.1    13-15
0.876-1.068    17- 3      16.1-19.6    16-18
1.068-1.309    18- 2      19.6-24.1    19-23
1.309-1.668    19- 1      24.1-30.7    24-29
1.668+         20- 0      30.7+        30+

Table 5. Extension to the table for negative VPs

Std devs       VP score   Computed     Sensible
from mean                 Imp range    Imp range
1.668-1.981    20- 0      30.7-36.4    31-36
1.981-2.074    20-(-1)    36.4-38.1    37-38
2.074-2.189    20-(-2)    38.1-40.2    39-40
2.189-2.344    20-(-3)    40.2-43.1    41-43
2.344-2.592    20-(-4)    43.1-47.7    44-47
2.592+         20-(-5)    47.7+        48+
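The computed imp column is just the Table 2 boundaries multiplied by the match standard deviation. A minimal sketch, with the boundary list copied from Table 2 and the function name `imp_boundaries` invented for illustration:

```python
import math

# Upper edge of 10-10, 11-9, ..., 19-1 in std devs, copied from Table 2.
Z_UPPER = [0.059, 0.180, 0.303, 0.431, 0.566, 0.712, 0.876, 1.068, 1.309, 1.668]

def imp_boundaries(boards, k=6.5):
    """Imp value at the top of each VP band for a Teams-of-4 match."""
    std_dev = k * math.sqrt(boards)       # K x sqrt(n)
    return [round(z * std_dev, 1) for z in Z_UPPER]

imps_8 = imp_boundaries(8)
print(imps_8)
print(imp_boundaries(8, k=6.25)[-1])      # start of 20-0 with K=6.25
```

Changing `k` or `boards` gives the scale for any other assumption, which makes it easy to try different values of K against a published scale.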

It is worth noting that although 10-10 is computed as a 2.2 imp spread, -1.1 through +1.1, we score it as a single imp spread of zero to zero. This is done so that the other intervals can slowly increase up to the standard deviation (17-3), at which point the ranges increase much more quickly. We need to be looking at matches of about 14 boards before it is sensible to assign the 3 imp spread (0-1) to the score of 10-10. Indeed the VP scale is at best an approximation, as one can see. Also, here we have used a K of 6.5, whereas there's a much better fit to the White Book with K=6.25, which gives a 20-0 of 29.5; I suspect that is what was used for the White Book tables. I wonder whether hand dealing vs computer dealt hands has an effect on K? Manning certainly used hand dealt data.

Using the K x sqrt(n) formula we can easily see that for a match of 32 boards we "know" that the 20-0 score will be about 60 imps: twice the 8-board figure, since sqrt(32)/sqrt(8) = 2 (actually 61 imps, since the scale is quoted for a range of boards).

Following Pomfrey we have for Teams-of-4 C=1, R=2 and for cross-imp Teams-of-8 we have C=4, R=4, and the relationship is that of sqrt(2) to sqrt(16). So if you want to devise a VP scale for cross-imp Teams-of-8 playing 8 boards, you simply multiply the Teams-of-4 imps by 2 x sqrt(2) and you get 85 imps for a 20-0 win.
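Pomfrey's generalised formula is easy to put in code. This is only a sketch of K x sqrt(n x C x R / 2) as stated above; `match_std_dev` and its parameter names are mine.

```python
import math

def match_std_dev(boards, comparisons=1, results=2, k=6.5):
    """Std dev in imps of a cross-imped match: K x sqrt(n x C x R / 2)."""
    return k * math.sqrt(boards * comparisons * results / 2)

teams_of_4 = match_std_dev(8)                             # C=1, R=2
teams_of_8 = match_std_dev(8, comparisons=4, results=4)   # C=4, R=4
ratio = teams_of_8 / teams_of_4

print(f"Teams-of-4 std dev: {teams_of_4:.2f}")
print(f"Teams-of-8 std dev: {teams_of_8:.2f}")
print(f"ratio: {ratio:.3f}")
```

The ratio is 2 x sqrt(2), about 2.83, whatever K is. Applied to the White Book's 30 imps it gives the 85 quoted above; applied to the computed 30.7 it gives nearer 87.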

4. How many matchpoints is a VP?

We can also use Tables 2 and 3 to create Swiss Pairs VPs. This is based on the score for a 1-board match, where 100% awards a 20-0. It follows that a 4-board match would have a 20-0 of 75%, and the generalised formula for the 20-0 at Swiss Pairs is 50 + 50/sqrt(n). For what it's worth, this gives a standard deviation for the 8 board match of 50/(1.668 x sqrt(8)) = 10.598, and the mean is 50. Another way of looking at it is that 20-0 is 67.68%. Technically this is true only for fields of 21 results. What surprises me is that the overall frequency of the different VP scores really does behave much as the statisticians predict, but there you go; they must be clever people. The 2004 White Book has 65.99% and gives too many 20-0's. It doesn't work. What a surprise! See Section 11 for an analysis based on the results of the Brighton 2005 Swiss Pairs.
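The arithmetic in this section can be reproduced in a few lines. A sketch only: the function names are invented, and 1.668 is the 20-0 boundary from Table 2.

```python
import math

Z_20_0 = 1.668   # std devs at which 20-0 starts (Table 2)

def swiss_pairs_20_0(boards):
    """Percentage needed for 20-0 with normal matchpoints: 50 + 50/sqrt(n)."""
    return 50 + 50 / math.sqrt(boards)

def implied_std_dev(boards):
    """Match std dev (in % points) implied by putting 20-0 at 1.668 std devs."""
    return (swiss_pairs_20_0(boards) - 50) / Z_20_0

for n in (1, 4, 8):
    print(n, round(swiss_pairs_20_0(n), 2), round(implied_std_dev(n), 3))
```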

5. How do we create a Butler Imp VP scale?

We now move on to Butler imp VP scales, and to put it mildly it is yucky. Max Bavin suggests "To convert Butler IMPs to normal IMPs, I think that you should multiply by 5/6 of root (2R/C). Assuming that t = (c+1) [everyone plays all the boards], then the root factor (2R/C) is definitely correct. The 5/6 factor is a Bavin invention to do with the fact that IMP scales are non-linear". This is a discussion Max and I have been having for a while, but I think we're in agreement. Bavin also suggests, and I agree, that there is a "bigger variance in standard in On-Line bridge than in f2f national championships".

The problem is due to using a datum and due to the non-linearity of the imp scale. If you win a board at teams of 4 by +420, +50 for 470 you get 10 imps. At Butler, if the datum is near the midpoint (ie half the field made it, and half the field didn’t), say +180, then we score 6 imps against the datum and our opponents lose 6 imps against the datum and so we are net +12 imps, but if the datum is close to 420 or 50 we only score 10. This perplexed all of us - how do we find a VP scale? We concluded, again from inspection of large numbers of imp results where the board has been played a large number of times, that Butler imps overstates the number of imps compared with teams of 4 by a factor that is between 1.18 and 1.2; let’s call it 6/5. This conclusion has been reached relatively recently, and is empirical.

As to the higher variance in online bridge one can reasonably say that the Brighton field is much more uniform than that playing on-line and so we should adjust our K upwards. But there’s another problem with on-line games where we use barometer scoring. You know with 1 board to play what your score is and, say you’re trailing 19-1, then the very shape of the VP scale makes it worthwhile to "shoot" as your maximum loss is 1VP but your gain could be several VPs. This means that each board is not independent of all others, as your result on the last board is affected by your results on the others. And further, board 15 of such a match contains a larger number than usual of wild swings and the datum is all over the shop. Also because of the shooting effect there is a case for pushing the std dev up, but this is a 1 board effect and I've ascribed 1 imp to it (I have no justification but I think it's a reasonable idea.)

What we need to do is convert normal f2f imps to Butler online imps so we can establish a VP scale. By inversion the factor will be 6/5 x sqrt(C / (2 x R)), and we can plug that into our extant formula for "normal teams" (ie Teams-of-4, where we know R=2 and C=1), giving B x K x sqrt(n x C / (2 x R)) + 1 as the std dev of an online Butler game, where B is the Bavin or Butler factor, R is the number of results (ie tables), C is No.Tables - 1 and K is now 7.

So let us consider a 15 board match, "normal" Teams of 4: we know the std Dev is 6.5 x sqrt(15) = 25.17 and we can multiply by 1.668 to get the 20-0 score = 42.00. For the equivalent online Butler game with 15 tables in play we get a Std Dev of 1.2 x 7 x sqrt(15 x 14 / 30) + 1 = 23.22 and a 20-0 score of 38.73. The nearest published VP scale is for 10-13 boards with a 20-0 of 36 imps, but we can devise our own - I won't bore you with the math.
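The two calculations above, side by side. This is a sketch of the formulas as I read them: R is taken to be the number of tables and C one fewer, which is what the 15 x 14 / 30 in the text implies; the function names and the `shooting_imps` parameter name are made up.

```python
import math

Z_20_0 = 1.668   # std devs at which 20-0 starts (Table 2)

def teams_std_dev(boards, k=6.5):
    """Teams-of-4 std dev in imps: K x sqrt(n)."""
    return k * math.sqrt(boards)

def butler_std_dev(boards, tables, k=7.0, b=1.2, shooting_imps=1.0):
    """Online Butler std dev: B x K x sqrt(n x C / (2 x R)) + 1."""
    results = tables                 # assumption: one result per table
    comparisons = tables - 1
    return b * k * math.sqrt(boards * comparisons / (2 * results)) + shooting_imps

teams_20_0 = Z_20_0 * teams_std_dev(15)
butler_sd = butler_std_dev(15, 15)
butler_20_0 = Z_20_0 * butler_sd

print(f"Teams-of-4, 15 boards: 20-0 at {teams_20_0:.2f} imps")
print(f"Butler, 15 boards, 15 tables: std dev {butler_sd:.2f}, 20-0 at {butler_20_0:.2f} imps")
```

Setting `shooting_imps=0` shows the scale without my 1 imp adjustment for last-board shooting.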

There are two things we can do about the "shooting". Firstly we institute negative VPs to cut down the instances of shooting in which case we can remove my 1 imp adjustment, and secondly we can start each table on a different board number, to maximise the chances of an "honest" datum.

6. Conclusion in respect of the anomalous results

So there we have it. For the game in question a 15 table, 15 board online Butler:
1) 20-0 should have been 39 imps and not 30.
2) We should have used negative VPs.
3) Each table should have started with a different board.
... and inspection of the results with a 39 imp 20-0 shows good correlation with the requirement to equalise all likely scores.
4) The best advice is "Use the same scale if the number of boards is in the top half of the range, and the next scale down if the number of boards is in the bottom half, including the midpoint". The EBU White Book now recommends just going to the scale below. It's not perfect but it's good enough; certainly better than the original recommendation.

7. What affects K?

We know some of these things in my list have an effect, others are just surmise.

1) Variance in the strength of the field
2) Barometer scoring
3) Online play
4) Computer dealing?
5) Homogeneity of bidding system?

Max Bavin has promised me the Brighton Swiss Teams 2004 cards, so perhaps I can answer a few more questions later. For the record I missed them, but will pick up the 2005 cards.

Just for the record BCL! ran the event again in 2005 using the VP table I constructed. The conformity of the results with the theory was quite remarkable. Both Max Bavin and I were even more convinced that B = 6/5 really does work for Butler.

8. Assigning matches

We need also to consider what is the "best" way of assigning the matches. I think it's reasonable to use random draw for the first round, and indeed this is what is normally done by the EBU. Seeded first rounds seem to get bad press. I believe some ABF games are seeded with top half teams drawn against bottom half teams. This round is known as the 'bloodbath'. It certainly gainsays the requirement that teams should be of equal strength as mentioned in section 2.

We have quite a choice of methods: random draw, raw score difference, raw score quotient, capped raw score and Swiss count-back, to name but a few. Let's look at these methods in some detail, taking an 8-board Swiss teams as the basis of discussion.

8.1 Raw score difference (goal difference). This looks quite attractive until one considers the team who took 70 imps out of a team of bunnies on the first round for their 20-0. The side effect is that they will be saddled with playing the next strongest team on their score for the rest of the competition.

8.2 Raw score quotient (goal average). Let's consider the wild but strong team who are imp generators, who win their match 70-40 for a score of 20-0, and the tight and strong team who win their match 35-5. The first team has an average of 1.75 and the second 7.0 - yet they have the same score. Should the tight team be saddled with other tight teams and the Frenzied Four have to play yet another Oxbridge 1st team?

8.3 Swiss count-back (strength of previous opponents). So for the 2nd round we'd better have another random draw. On the third round we will find that the teams who started slowly will have played teams who are more likely to have gotten better scores generally and will therefore be computed to have played stronger teams. This seems to be ok in a sense, but you could just as well rank the teams in order of strength of previous opponents and get a totally different assignment list with vast numbers of mis-matches, as compared with their actual VP scores.

8.4 Capped raw score instead of VPs. We'll set the cap at the 20-0 score. This looks ok too, until you think of the team who takes 3 x 30 imp wins for +90. There won't be a team close to them, and the competition is over for the rest of the contestants. This is why we use Swiss, so that no one team can run away, and to make the competition more attractive for the bulk of the contestants.

8.5 Random draw. I feel strongly that this must be the correct method. A Swiss is designed to find a winner. If the 20-0 scale is fine enough to find a winner then it's done its job. If you don't like having random draw then use a 30-0 or 40-0 scale to decrease the size of the groups on the same score. But why even bother with this, as it makes no difference in doing its job of finding the winner. The important point of a Swiss is that one divides the field into equal parts, and if you achieve a given score then, for that event, for that field, that is your measure of merit and you have no merit more or less than any other team on that score.

In conclusion, if you're going to Swiss then use a fine enough scale by all means and pick any of the above methods, all of which are flawed, or recognise that a Swiss is designed to find a winner and random draw is equally unfair to everyone.

9. What is A+/A-?

Law 86A is clear that for teams of 4, head to head, A+ is 3 imps. We have to do our best with that. However we do have methods of translating this 3 imp value to forms of scoring other than Teams of 4 (eg cross-imped Teams of 8, or Butler) and any arguments about the value of A+ are resolvable by this means. I've no doubt everyone will continue to argue the toss, but it really is spitting to windward.

10. What VP scale for aggregate scoring?

I had an enquiry from the Sheffield League to produce a 12-0 VP scale for aggregate League matches of 24 boards. Here we are interested in 13-tiles rather than 21-tiles. So I constructed a 13-tile table as follows: standard deviations from the Gauss tables (Column 1); VP scores required (Column 2); score ranges obtained by multiplying Column 1 by the standard deviation (1790) computed from 418 match results (Column 3). The figures in Column 4 become the VP table. Trivial really. Just for fun I plugged the 418 results back into the table as a check (Column 5). Remember 6-6 only shows half the expected number as there are negative scores that get 6 as well. I think it looks ok. Note we can compute K as 1790/sqrt(24) = 365, and can use this value for matches of any length or complexity. Heaven forbid we ever do teams-of-8 cross aggregate :)

I was eventually asked to round off the scores a bit; we concluded "100 for a win" sounded appropriate and I suggested the scores in the final column.

Table 6. Aggregate scoring 12-0 VP scale

Std devs       VP score   Computed      Published     Observed    Practical
from mean                 Score range   Score range   frequency   scale
0.000-0.097     6-6          0- 173        0- 170     41             0-  90
0.097-0.293     7-5        173- 524      180- 520     66           100- 490
0.293-0.502     8-4        524- 899      530- 890     52           500- 890
0.502-0.736     9-3        899-1317      900-1310     62           900-1290
0.736-1.020    10-2       1317-1825     1320-1820     74          1300-1790
1.020-1.426    11-1       1825-2552     1830-2550     60          1800-2490
1.426+         12-0       2552+         2560+         63          2500+
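The 13-tile construction can be sketched as follows, assuming Python's `statistics.NormalDist` in place of the printed Gauss tables; `aggregate_scale` is a hypothetical helper, and 1790 is the standard deviation quoted above.

```python
import math
from statistics import NormalDist

def aggregate_scale(max_vp=12, std_dev=1790):
    """Aggregate score at the top of each winning VP band below max_vp-0."""
    n_results = max_vp + 1                # 13 equally likely results
    bands = max_vp // 2                   # 6-6, 7-5, ..., 11-1
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf(0.5 + (2 * k + 1) / (2 * n_results)) * std_dev
        for k in range(bands)
    ]

aggregate_edges = aggregate_scale()
k_per_board = 1790 / math.sqrt(24)

print([round(edge) for edge in aggregate_edges])
print(f"K for a 1-board match: {k_per_board:.0f}")
```

The edges land within a point or two of the computed column of Table 6; the small differences come from the three-figure rounding of the standard deviations in the table.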

11. Fixing Washington VPs

I propose to call Swiss Pairs VP scales Washington VPs. A very few of you will know why, and I'm not going to explain.

It has become apparent that the Swiss pairs VP scale in the White Book is flawed. The scale was published in 2004 and used inter alia at Brighton 2005. It generates too many 20-0’s. We would expect the 20-0 based on normal scoring for a field size of about 63 tables would use a score of about 67.39%. However the 20-0 is 65.99% in the White Book. Are we surprised the scale doesn’t work? No! Would we expect there to be too many 20-0’s? Yes! Can we estimate how many? Yes! Is this what we find? Hard question but very probably Yes. I’ve used the VP scores for Brighton 2005 for the observed data.

Table 7. Brighton Swiss pairs 2005 observed count of VPs by match

VP\Match   1   2   3   4   5   6   7   8   9  10  11  12  13  14  Total
10        13  10  11   8  13  16   6   6   8   5  13  12  11  12   144
11        17  26  18  29  23  16  17  27  23  28  24  20  24  19   311
12        24  15  23  24  21  16  22  26  22  16  17  20  14  15   275
13        17  26  26  24  18  22  12  16  21  24  16  23  26  26   297
14        21  17  31  26  28  26  26  30  17  22  26  28  28  20   346
15        27  21  20  18  16  16  30  26  30  19  20  26  19  26   314
16        24  26  15  27  17  20  17  30  21  31  22  17  24  17   308
17        21  21  18  20  19  22  19  22  20  25  29  22  27  32   317
18        21  19  24  18  30  35  32  28  25  20  27  29  29  25   362
19        32  26  35  24  30  22  32  26  36  33  34  28  23  31   412
20        40  50  36  39  42  46  44  20  34  34  29  32  32  34   512

Adjustment 1: Field size and Ascherman

To start with we're going to do some "arm-waving". Let us consider a single board in a pairs event scored 21 times. We will consider Ascherman’s variant on scoring where we compare with our own score as well as with all the other scores. This has the effect of normalising for field size, in the sense we score the same percentage for a given result whether it’s 3 or 3,000 tables. The reader will be aware that scores in small fields tend to be higher and lower than in large fields and this correction resolves the problem.

In our example a top is 41 (and a bottom is 1) against an average of 21. If we were to apply these matchpoint scores to a Swiss scale we would now want all scores from 20-0 through 0-20 to be represented. In fact we have defined the minimum possible value for a 20-0 here. Suppose we had 42 results, then a top of 83 against an average of 42 would be 20-0 and a tied top of 82 against an average of 42 would also be 20-0. Ascherman has resolved the field size problem.

Adjustment 2: No. of Boards and square root thingies

Now let us consider a 4 board pairs match. We know that the “top” is a function of the square root of the number of boards in the match, so we “know” that the 20-0 is half way between 50% and 41/42 expressed as a percentage. Indeed we can state the general case for an n board match. For the 20-0 in an n-board match we need to score 50 + 100 x (41/84) / sqrt(n) percent, and for n=8 we get 67.27%; this assumes we use Ascherman’s scoring.

If we use “normal” matchpoints the formula is 50 + 50/sqrt(n) and we’d get 67.68, but this would apply only to field sizes of 21 results where a top is 100%. We must look at 63 results as well, as this is very close to a typical EBU event where the field is divided into pools of about 65 tables. Here the 20-0 is 50 + 100 x (61/124)/sqrt(8): 122/124 represents 3 tied tops scoring 20-0, and half this is 61/124. The answer comes out at 67.39%. As expected this is less than 67.68 for the 21 table field using normal scoring but more than the Ascherman computed figure of 67.27. Indeed with larger fields we will tend towards the Ascherman figure, which is the limit for an infinite sized field.
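The three figures can be reproduced with one small function. This is a sketch of the rule as I have stated it (half the single-board score that should earn 20-0, expressed as a percentage and divided by sqrt(n)); `twenty_zero_percent` and `top_fraction` are names invented for the example.

```python
import math

def twenty_zero_percent(boards, top_fraction):
    """20-0 threshold in %: 50 + 100 x (top_fraction / 2) / sqrt(boards).

    top_fraction is the share of the matchpoints on one board that
    should just earn a 20-0, eg 41/42 for an Ascherman top in 21 results.
    """
    return 50 + 100 * (top_fraction / 2) / math.sqrt(boards)

ascherman_21 = twenty_zero_percent(8, 41 / 42)    # Ascherman, 21 results
normal_21 = twenty_zero_percent(8, 1.0)           # normal matchpoints, 21 results
normal_63 = twenty_zero_percent(8, 122 / 124)     # normal matchpoints, 63 results

print(f"{ascherman_21:.2f}  {normal_63:.2f}  {normal_21:.2f}")
```

The Ascherman figure prints as 67.26 rather than 67.27; the difference is rounding in my hand calculation.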

Patience dear reader. We’re getting there.

Gauss, STDs, distribution of scores and observed data

We know that 1.668 STDs gives a 20-0 in the standard EBU 21 point VP scale. Since I think a Brighton 20-0 should be 67.39+% which is 1.668 STDs, then 65.99% is 15.99/17.39 x 1.668, or 1.534 STDs, which is 93.75% in the Gauss tables. In other words we EXPECT a 20-0 of 65.99% to produce 6.25% of 20-0’s if the 67.39 is correct. So I counted 257 results for each of 14 matches at Brighton from a fan-fold listing John Pain gave me. I predict 449 20-0’s (2 x 6.25% of the 3598 results, since either side can take the 20) and what do we find? No great surprise! There are 512 20-0’s out of 3598.
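The prediction can be reproduced like this. A sketch only: it uses `statistics.NormalDist` instead of the printed tables, the function name is mine, and it assumes (as the model does) that each of the 3598 table results is an independent draw between equal pairs.

```python
from statistics import NormalDist

Z_20_0 = 1.668   # std devs at which 20-0 should start

def expected_whitewashes(scale_20_0, model_20_0, results):
    """Expected number of 20-0's if the scale sets 20-0 too low."""
    # Where the published threshold sits, in std devs, if the model is right.
    z = (scale_20_0 - 50) / (model_20_0 - 50) * Z_20_0
    tail = 1 - NormalDist().cdf(z)        # chance one named side reaches it
    return 2 * tail * results             # either side can take the 20

predicted = expected_whitewashes(65.99, 67.39, 257 * 14)
observed = 512

print(f"predicted {predicted:.0f}, observed {observed}")
```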

Oh so nearly there! As discussed above, the Gauss stuff only works if the opponents are equally matched, so we’d expect fewer 20-0’s in the 2nd half of the event. (John Manning found some, but not a great, correlation at imps.) But what happens here is remarkable. In the first half, where the pairs are less equally matched, we’d predict 225 20-0’s and get 297; in the 2nd half, where they are closely matched, we’d expect 225 and we get 215. The real mathematicians can bugger around with my table if they like but it looks pretty conclusive to me.

Conclusion:

We have shown from observed data that the White Book Washington VP scale is seriously flawed. We have built a model to attempt to predict what the current scale would produce and find the model to be “pretty conclusive to me”. We believe that with Ascherman scoring the score for 20-0 should be at least 67.27+% for an 8-board match, and that scoring programs should use Ascherman to get the correct results from this table. Non-use of Ascherman will produce more 20-0’s than it should, more so for small fields. The table itself is trivial from here on in. One day I'll get around to it.

What standard deviation should we use?

OK, I've been bullied into producing a workable scale, the EBU is going to republish the White Book and we might as well fix it up. .... We should look more deeply into Washington VPs to see whether we can suck any more juices from Table 7. We should also consider the original (2003) table, the Washington (2004) table and the 2006 model table. We'll start with table 2 and stick some more columns on it.

Table 8. Various analyses of Swiss pairs tables for 7-9 board matches

Cum. %age         Std devs      VPs     2003          W'ton 2004    A'man 2006    Random draw   EBU 2006
from mean         from mean             triangular?   STD=9.59%     STD=10.35%    STD=11.65%    STD=11%
00.0000-02.3810   0.000-0.059   10-10   50.0-50.5     50.00-50.57   50.00-50.61   50.00-50.69   50.00-50.65
02.3810-07.1429   0.059-0.180   11- 9   50.5-51.7     50.57-51.73   50.61-51.86   50.69-52.10   50.65-51.98
07.1429-11.9048   0.180-0.303   12- 8   51.7-53.1     51.73-52.90   51.86-53.14   52.10-53.53   51.98-53.33
11.9048-16.6667   0.303-0.431   13- 7   53.1-54.7     52.90-54.13   53.14-54.46   53.53-55.02   53.33-54.74
16.6667-21.4286   0.431-0.566   14- 6   54.7-56.5     54.13-55.43   54.46-55.86   55.02-56.59   54.74-56.23
21.4286-26.1905   0.566-0.712   15- 5   56.5-58.5     55.43-56.83   55.86-57.37   56.59-58.29   56.23-57.83
26.1905-30.9524   0.712-0.876   16- 4   58.5-60.7     56.83-58.40   57.37-59.07   58.29-60.21   57.83-59.64
30.9524-35.7143   0.876-1.068   17- 3   60.7-63.1     58.40-60.23   59.07-61.06   60.21-62.44   59.64-61.75
35.7143-40.4762   1.068-1.309   18- 2   63.1-65.7     60.23-62.55   61.06-63.55   62.44-65.25   61.75-64.40
40.4762-45.2381   1.309-1.668   19- 1   65.7-68.5     62.55-65.99   63.55-67.27   65.25-69.43   64.40-68.35
45.2381-50.0      1.668+        20- 0   68.5+         65.99+        67.27+        69.43+        68.35+

A quick check of the 2004 scale shows that it uses the same std deviation calcs from table 2 that I do, and that the STD of the 2004 table is 9.59.

With some help from Jonathan Cooke (who's a management consultant manqué, and whose stats are pretty good), I have attempted to assess the effect of pairs not being of the same strength on the expected results. This fact alone will generate a larger number of 20-0's than my model predicts and we have tried to measure it. Feedback from players who have expressed concern over the 2004 scale is that the 2003 scale was ok as far as 20-0 was concerned; and of course the fact that 15-5 was overpopulated by 30% in the original scale would be neither obvious nor noticed.

In a way we really want to get to a table that will behave "normally" on a random draw such as we have in rounds 1 & 2, and where 20-0's become progressively tougher as the event progresses. This would produce final counts where 20-0 would be underpopulated. The argument goes along the lines that in a random draw a really good pair should hope to be able to extract the maximum from a weak pair, but that too many 20-0's in the first round makes 20-0 meaningless. This approach makes most sense as the same scale is used for one day events too, where pair strengths aren't well matched even by the last round.

Cooke's stats show that the STD for a "random draw" 8 board match is 11.73%, with a 90% confidence level that it lies between 11.04 and 12.55. This is produced from the first two rounds of the Brighton 2005 results. This figure does not take account of Ascherman, which reduces the STD to about 11.65%. I can find little justification for using any other value. This figure needs to be normalised for the one board match and is: 11.65 x sqrt(8) = 32.95%. Programmers should produce the VP scale for the event they're running and publish it. Even the difference between 7 and 8 boards creates an enormous variance on 20-0's. If one takes this approach then normal matchpoint scoring can be adopted, as the Ascherman adjustment can be built in for the actual field size and will be reflected in the scale. It's not rocket science.
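For programmers, producing the scale for a given number of boards might look like the sketch below. It assumes the Table 2 boundaries and the one-board STD of 11.65 x sqrt(8) derived above; `pairs_vp_scale` is a made-up name, and no field-size (Ascherman) adjustment is built in here.

```python
import math

# Upper edge of 10-10, 11-9, ..., 19-1 in std devs, copied from Table 2.
Z_UPPER = [0.059, 0.180, 0.303, 0.431, 0.566, 0.712, 0.876, 1.068, 1.309, 1.668]

ONE_BOARD_STD = 11.65 * math.sqrt(8)      # about 32.95% for a 1-board match

def pairs_vp_scale(boards, one_board_std=ONE_BOARD_STD):
    """Percentage at the top of each VP band for a match of `boards` boards."""
    std_dev = one_board_std / math.sqrt(boards)
    return [round(50 + z * std_dev, 2) for z in Z_UPPER]

scale_8 = pairs_vp_scale(8)
scale_7 = pairs_vp_scale(7)

print("8 boards:", scale_8)
print("7 boards:", scale_7)
```

The 8-board output matches the random draw column of Table 8, and the 7-board 20-0 comes out more than a full percentage point higher, which is the 7 versus 8 board effect mentioned above.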

Final outcome

A number of people discussed this during July 2006 with Max Bavin. There was significant input from Frances Hinden, a spreadsheet from John Armstrong, and the original germ, from Alan Mould at Brighton 2005, that all was not well; the outcome was a conclusion that no number was perfect. Eventually Max settled on STD = 11%:
'Many thanks to everyone who has contributed to this. At the end of the day I'm afraid that I've just "picked a number", and the number is a STD of 11.00% in an 8-board match. Whilst appreciating that this wont necessarily meet with universal approval, I hope neither does it meet with universal horror. Certainly it returns 20-0 back towards where it once was in the days before the complaints started to come in.'
It's about as good as we can get, maybe a tad low. In addition it solves a problem with the 15-5's in the 2003 scale I noted above. It's the best scale we've had, and I shall monitor it.

12. Late round entry into a VP Swiss pairs

BCL! runs a head-to-head 18 board pairs ko where scores of 50%+ stay in the event for the next round over 6 rounds or so, with a subsidiary Swiss VP Pairs of 18 board head-to-head matches for those ko'd. We tried to set what the entry scores into the Swiss should be after being ko'd in later rounds. In 2005 we tried 12, 13, 14, 15, 16 additively and it led to people entering on too low a score, such that late ko's couldn't win the Swiss. A good model was a different 2006 event which was flighted by player strength (rating) into 7 flights of 8 pairs, all-play-all, meaning one genuinely played a pair of one's own strength over 18 boards head-to-head, scored ATF. The Spring 4's model, see above, was used in 2005 as it's the best we had and it didn't work. There's some 'orrible square root stuff if one gets rigorous - this stuff is 1st order approximation.

Let's suppose we assume anyone entering the Swiss on a late round should be close to being the "current leader" in the Swiss as they are, by definition, "superior". If we look at the top scores in the 2006 flighted event we see leading scores after 6 matches of: 86, 83, 91, 76, 74, 85, 87 for each of the 7 flights; average 83.1. It is probably fair to say that flighted conforms well with a Swiss, as the pairs in both events are of "equal" strength. Actually the flighted probably conforms better, but we could assume these were the top scores in a 7x8 = 56 pair Swiss. However, there was a secondary problem in that the Swiss pairs scale in use for the flighted event is known to be flawed and the EBU has modified the scales. Whilst the new scale was not yet in general use, we'd always done our best to stay abreast of current thinking and Max Bavin values our input, so we used the new scale where 20-0 was a bit tougher. The rough and ready effect is that one loses about 3/4 VP per round played if one has a perfect score, or 1/2 of 3/4 VP for each round if one is averaging 15/20 (you'd lose nothing if you were averaging 10/20). This means the leading players would be 6 rounds of 3/8 VP worse off (ie 2.2 VPs worse off) in a 6 round case. So an average top score in the 7 flights would be 80.9 rather than 83.1, and a leading score would be 88.8.

It seemed to me therefore that we should be aiming to get a score a bit less than 89 if one entered after a 6th round of the Swiss has been played. I'm talking here about the final ko's where only one more round of Swiss is to be played, and where we want the latest entrants to be on table 1 or 2. The method needed to be easy to administer.

A sense of fairness suggests that each succeeding round should be more heavily rewarded, and this suggested we use 12, 13, 14, 15, 16, 17 additively for a total of 87 after 6 rounds. How do we use these numbers if we only have 4 rounds before the final entry? Since the leading VP score will be closer to 100% with fewer rounds, I designed the scores such that one entered with a higher average VP score after 4 rounds of Swiss where this is the final entry to the last round of Swiss, and this suggested I should make the final round "add 17" and work backwards from there.

I checked the 4th round leaders of the flighted event and found: 66, 56, 58, 57, 54, 64, 59, which should be reduced by 1.5 if we'd used the new VP scales. Final entry after 4 rounds gives 14, 15, 16, 17; ie 62, which is close to the adjusted top score of 64.5. A similar check of the 5th round case gave me 75, 72, 77, 68, 58, 65, 73 less 2 VPs, and an entry at 75 (13, 14, 15, 16, 17), so it all hung together.

The way it works in practice is to consider the sessions; assume 6:
Session 1: ko's, no Swiss
Session 2: Swiss for losers, start on 0
Session 3: Swiss with new entrants starting on 14; play opponents from the Swiss. The VP scale suggests you're 1/3 of the way down the field
Session 4: Swiss, enter with 29, you're doing a bit better
Session 5: Swiss, enter with 45, better still
Session 6: Swiss, enter with 62. Observed data suggests you're at table 1 or 2.

So for 2006 we did the following:
1) used the new VP scale
2) "Add 17" for the last entry to the Swiss, ie with 6 total sessions start with 14 in Session 3; with 7 sessions start with 13; with 8 start with 12
3) New entrants play pairs already in the Swiss - a bit unfair in session 3, but better than being drawn together

The true math is more complex but this looked ok for what we're doing.
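
The "Add 17" rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name entry_scores and its parameters are hypothetical labels, and it assumes (as in the session list) that the first late entry happens in Session 3.

```python
from itertools import accumulate


def entry_scores(total_sessions, last_add=17, first_entry_session=3):
    """Map each entry session to the VPs a newly ko'd pair starts the Swiss with.

    The last entry adds `last_add`; earlier entries add one less each time,
    working backwards, and the additions accumulate.
    """
    n_entries = total_sessions - first_entry_session + 1
    first_add = last_add - (n_entries - 1)
    adds = range(first_add, last_add + 1)
    sessions = range(first_entry_session, total_sessions + 1)
    return dict(zip(sessions, accumulate(adds)))


for total in (6, 7, 8):
    print(total, entry_scores(total))
```

With 6 sessions this reproduces the 14, 29, 45, 62 of the session list; with 7 sessions the last entry is on 75, and with 8 sessions it is on 87.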

13. VPs for Patton scoring

Herman de Wael wondered about VP tables for use with Patton scoring over 2, 3 or 4 board rounds. What is needed here is for the number of VPs awarded on the point-a-board (2 per board) to be the same as for the imped match result. I won't go on about how stupid Patton scoring is. Neither fish nor fowl. So for 2 board matches we need a 4-0 scale; for 3 board matches a 6-0 scale; and for 4 board matches an 8-0 scale. I guess this method is better than any other of the perversions used for Patton. Back to the tables!

We need quintiles for the 4-0; heptiles for the 6-0 and 9-tiles for the 8-0. We will use K = 6.5, so the STD for each table is K times the square root of the number of boards.
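
As a rough check on the tables that follow, the band edges can be computed with Python's standard library. This is a sketch under stated assumptions: band_edges is a hypothetical helper, and it uses the exact inverse normal rather than a printed table, so the standard deviation figures may differ from the tables in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

K = 6.5  # value of K used in the text


def band_edges(max_vp, boards, k=K):
    """Return (STD, edges) for a max_vp-0 scale over `boards` boards.

    Each edge is (cumulative fraction from the mean, std devs from the mean,
    imps) and marks the top of a VP band; the top band is open-ended.
    """
    results = max_vp + 1                    # eg a 4-0 scale has 5 possible results
    std = k * sqrt(boards)
    edges = []
    for j in range(max_vp // 2):
        p = (2 * j + 1) / (2 * results)     # the tie takes half a share each side
        z = NormalDist().inv_cdf(0.5 + p)
        edges.append((p, z, z * std))
    return std, edges


for max_vp, boards in ((4, 2), (6, 3), (8, 4)):
    std, edges = band_edges(max_vp, boards)
    print(f"{max_vp}-0 scale, {boards} boards, STD={std:.4f}")
    for p, z, imps in edges:
        print(f"  up to {100 * p:7.4f}%  {z:.3f} std devs  {imps:4.1f} imps")
```

Turning the computed imp edges into whole-imp ranges still needs a rounding decision, which is left out here.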

Table 9. The 4-0 VP scale; STD=9.1924
Cum. %age    Std devs       Computed     Patton-4
from mean    from mean      Imp range    Imp range    VPs
00.0-10.0    0.000-0.252    0.0-2.3      0-2          2-2
10.0-30.0    0.252-0.841    2.3-7.7      3-7          3-1
30.0-50.0    0.841+         7.7+         8+           4-0

Table 10. The 6-0 VP scale; STD=11.2583
Cum. %age          Std devs       Computed     Patton-6
from mean          from mean      Imp range    Imp range    VPs
00.0000-07.1429    0.000-0.180    0.0-2.0      0-1          3-3
07.1429-21.4286    0.180-0.566    2.0-6.4      2-6          4-2
21.4286-35.7143    0.566-1.068    6.4-12.0     7-12         5-1
35.7143-50.0000    1.068+         12.0+        13+          6-0

Table 11. The 8-0 VP scale; STD=13.0
Cum. %age          Std devs       Computed     Patton-8
from mean          from mean      Imp range    Imp range    VPs
00.0000-05.5556    0.000-0.140    0.0-1.8      0-1          4-4
05.5556-16.6667    0.140-0.431    1.8-5.6      2-5          5-3
16.6667-27.7778    0.431-0.763    5.6-9.9      6-9          6-2
27.7778-38.8889    0.763-1.220    9.9-15.9     10-15        7-1
38.8889-50.0000    1.220+         15.9+        16+          8-0

I've got to do a bit more work on these tables; the EBU doesn't necessarily award the same VPs for the imps as for the point-a-board.

Table 12. VP table for hybrid imps scoring

Imp Hybrid HP Scales

This table can be used to construct the HP table for the imp part of hybrid teams scoring if you go that route. Award 2 HPs per board on a point-a-board basis (10 points is still a tie). Decide how many HPs you want to award for the match; these will be added to the point-a-board HPs. With short rounds you may want to award half HPs, and you do this by using the HP scale for twice the number and then halving the HPs. Eg: you’re playing 3 board rounds and want to award an extra 4 HPs for the match, in halves, for a grand total of 10 HPs. Use the HP scale “8” and the 3-board round column, then halve the HPs.
  NB: the K and STDs are shown so you can build further tables if needed.

B/Match   2       3       4       5       6       7       STDs from mean  HPs
K=6.5     9.1924  11.2583 13.0000 14.5344 15.9217 17.1974
HP Scale  Imp Difference

4         0-2     0-2     0-3                             0.000-0.252     2-2
          3-7     3-9     4-10                            0.252-0.841     3-1
          8+      10+     11+                             0.841+          4-0

6         0-1     0-1     0-2     0-2     0-2     0-2     0.000-0.180     3-3
          2-5     2-6     3-7     3-8     3-9     3-9     0.180-0.566     4-2
          6-9     7-11    8-13    9-15    10-17   10-18   0.566-1.068     5-1
          10+     12+     14+     16+     18+     19+     1.068+          6-0

8         0-1     0-1     0-1     0-1     0-1     0-2     0.000-0.140     4-4
          2-4     2-4     2-5     2-6     2-6     3-7     0.140-0.431     5-3
          5-7     5-8     6-9     7-11    7-12    8-13    0.431-0.763     6-2
          8-11    9-13    10-15   12-17   13-19   14-20   0.763-1.220     7-1
          12+     14+     16+     18+     20+     21+     1.220+          8-0

10                0-1     0-1     0-1     0-1     0-1     0.000-0.115     5-5
                  2-4     2-4     2-4     2-5     2-5     0.115-0.349     6-4
                  5-7     5-7     5-8     6-9     6-10    0.349-0.604     7-3
                  8-10    8-11    9-13    10-14   11-15   0.604-0.908     8-2
                  11-15   12-17   14-19   15-21   16-22   0.908-1.336     9-1
                  16+     18+     20+     22+     23+     1.336+          10-0

12                0       0       0-1     0-1     0-1     0.000-0.097     6-6
                  1-2     1-3     2-4     2-4     2-4     0.097-0.293     7-5
                  3-5     4-6     5-7     5-8     5-8     0.293-0.502     8-4
                  6-8     7-9     8-10    9-12    9-12    0.502-0.736     9-3
                  9-11    10-13   11-14   13-16   13-17   0.736-1.020     10-2
                  12-16   14-18   15-20   17-22   18-24   1.020-1.426     11-1
                  17+     19+     21+     23+     25+     1.426+          12-0


Table constructed by John Probst, January 2008
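
The worked example from the text (3 board rounds, HP scale "8", halved) might be coded as below. This is a sketch, not anyone's official scoring program: the names imp_hps and hybrid_total are hypothetical, the bands are the scale "8", 3-board column of Table 12, and the 2/1/0 split of the point-a-board HPs for a board won, tied or lost is an assumption.

```python
# Scale "8", 3-board column of Table 12: (lowest imp difference, winner's HPs).
SCALE_8_THREE_BOARDS = [(0, 4), (2, 5), (5, 6), (9, 7), (14, 8)]


def imp_hps(imp_diff, bands=SCALE_8_THREE_BOARDS, halve=True):
    """HPs for (our side, their side) from the signed imp difference."""
    top = bands[-1][1]
    # Highest band whose lower limit the imp difference reaches.
    winner = max(hp for low, hp in bands if abs(imp_diff) >= low)
    loser = top - winner
    ours, theirs = (winner, loser) if imp_diff >= 0 else (loser, winner)
    if halve:
        ours, theirs = ours / 2, theirs / 2
    return ours, theirs


def hybrid_total(boards_won, boards_tied, imp_diff):
    """Point-a-board HPs (assumed 2 per win, 1 per tie) plus the imp HPs."""
    point_a_board = 2 * boards_won + boards_tied
    return point_a_board + imp_hps(imp_diff)[0]


# Win 2 boards, lose 1, and finish 6 imps up over the 3 boards.
print(hybrid_total(boards_won=2, boards_tied=0, imp_diff=6))    # our side
print(hybrid_total(boards_won=1, boards_tied=0, imp_diff=-6))   # the opponents
```

The two totals add up to the grand total of 10 HPs mentioned in the example.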

This article is always under construction

John Probst, February 2004; revised April 2004;
August 2005, added section 10.
June 2006, added last column to Section 10; updated Section 4;
added new section 11 on Swiss pairs to fix problems noted in Section 4.
August 2006, added section 12.
February 2007, added section 13.
January 2008, added more tables for hybrid scoring in Section 13.
this article is in the public domain subject to acknowledgement