White Ball Analytics

This is the second post in a series, in which I outline my approach to assessing player value. The first explains the overall objective: to measure the expected contribution of each player in runs. This post then details four main adjustments that I make to historic performances to remove any obvious biases in the data

Brendon McCullum’s 158. A seminal moment in T20 history, announcing the BCCI’s inaugural boundary bonanza to the world. McCullum led the visiting Knight Riders to victory, simultaneously delighting the home crowd in M Chinnaswamy Stadium with a dominant batting display never seen before in cricket. It remained the highest individual score in the IPL, and in all T20, until Chris Gayle in 2013. M Chinnaswamy Stadium again. This time the home crowd were treated to one of their own players blasting boundaries against an over-matched Pune team. Gayle’s 175 remains the highest score in T20 cricket

It is no coincidence that the two highest individual scores in the IPL were achieved in the same stadium. The short boundaries and high altitude allowed the game’s pre-eminent hitters to surpass even their own high standards. Batsmen are at an advantage playing here, bowlers at a disadvantage. Venue is one of the four main factors that I try to adjust for when assessing player value

*Source: Wikipedia, which in turn used ESPNcricinfo (but the formatting on Wikipedia is nicer)*

Venue

To assess the impact of each venue on scoring, I compare the average first innings scores across grounds within the same competition. For example, looking only at away teams (to avoid the complication that some grounds may be home to especially talented batting line-ups), the average first innings score at M Chinnaswamy Stadium is 168.3, almost 12 runs above the IPL average (156.7)
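As a rough illustration, the comparison above could be sketched in Python as follows. The tuple layout, the function name and the toy scores are all invented for this example, and the competition average is assumed to be the mean of all away first innings totals; the real model may define it differently.

```python
from collections import defaultdict


def venue_effects(innings):
    """Away-team first innings average at each venue minus the competition average.

    `innings` is an iterable of (venue, batting_side, total) tuples, where
    batting_side is "home" or "away". This layout is hypothetical.
    """
    # Keep only away teams, so strong home batting line-ups do not skew a venue.
    away = [(venue, total) for venue, side, total in innings if side == "away"]
    if not away:
        raise ValueError("no away innings supplied")

    # Assumption: the competition average is taken over the same away-only sample.
    competition_avg = sum(total for _, total in away) / len(away)

    totals_by_venue = defaultdict(list)
    for venue, total in away:
        totals_by_venue[venue].append(total)

    return {
        venue: sum(totals) / len(totals) - competition_avg
        for venue, totals in totals_by_venue.items()
    }


# Toy data, not real scores.
toy_innings = [
    ("Ground A", "away", 180),
    ("Ground A", "away", 160),
    ("Ground A", "home", 200),  # ignored: batting side is the home team
    ("Ground B", "away", 150),
    ("Ground B", "away", 150),
]
print(venue_effects(toy_innings))
```

A positive value marks a ground where away sides score above the competition average, a negative value one where they score below it.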

In an ideal world, I would have enough data to analyse all grounds like this. Unfortunately, restricting the sample in this way leads to extremely small sample sizes for some grounds: Nehru Stadium has hosted just 4 IPL matches and it would not make sense to attach too much significance to the slightly below-par scores in those matches

Different samples used to assess typical venue scoring rates

To mitigate the small sample sizes, I also look at less reliable data sets: scores from either innings, scores posted by any international side, and scores from both home and away teams. However, the most weight is always given to the most reliable data. For example, away teams in T20Is have averaged just 135.3 at M Chinnaswamy Stadium, but this figure covers only 3 matches and is not considered to be as reliable as the high-scoring trend seen throughout every IPL season

Additionally, stats for all venues are regressed to the mean to avoid unrealistically large adjustments for venues which have hosted just one or two games and witnessed outlier performances. Stadiums like M Chinnaswamy, for which there is plenty of data, are relatively unaffected by this step
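One common way to regress an estimate to the mean is to shrink it towards the overall average by a weight that grows with sample size. The sketch below shows that idea only; the function name, the `prior_matches` constant and the toy numbers are assumptions made for illustration, not the values or the exact formula used in my model.

```python
def regress_to_mean(venue_avg, n_matches, league_avg, prior_matches=10):
    """Shrink a venue's observed average towards the league average.

    prior_matches is a hypothetical tuning constant: the number of matches at
    which the observed venue average and the league average get equal weight.
    """
    weight = n_matches / (n_matches + prior_matches)
    return league_avg + weight * (venue_avg - league_avg)


# Toy numbers: both venues average 170 against a league average of 160.
print(regress_to_mean(170, 2, 160))   # two matches: pulled most of the way back to 160
print(regress_to_mean(170, 90, 160))  # ninety matches: stays close to 170
```

With few matches the estimate sits near the league average; with many matches it barely moves, which is the behaviour described for well-sampled grounds.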

Almost every adjustment listed in this post is made on a per-ball basis. McCullum effectively received a 0.064 run advantage with every ball he faced on that night back in 2008. Facing 73 deliveries, he gained a total advantage of 4.7 runs. That may not seem like much when you are busy launching 10 fours and 13 sixes but, for mere mortals, that is a huge advantage
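The per-ball arithmetic is simple enough to sketch in a few lines of Python. The function name and the example innings (30 runs from 20 balls) are hypothetical, and the sign convention, subtracting the advantage from a batsman's runs, is my assumption about how such an adjustment would be applied.

```python
def venue_adjusted_runs(runs, balls_faced, per_ball_advantage):
    """Remove a venue's per-ball scoring advantage from a batsman's innings.

    A positive per_ball_advantage means the ground favours batsmen, so the
    adjustment lowers the run tally; a negative one raises it.
    """
    return runs - per_ball_advantage * balls_faced


# Hypothetical innings: 30 runs from 20 balls at a ground worth 0.064 runs per ball.
adjusted = venue_adjusted_runs(30, 20, 0.064)
print(round(adjusted, 2))
```

Because the adjustment scales with balls faced, a long innings at a batting-friendly ground gives up more runs in total than a short cameo.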

Per-ball adjustments made based on the typical first innings score at each venue / ground / stadium

There are disadvantages to the per-ball approach: it does not always reward players for sustained scoring at a fast rate and it can penalise players who arrive late to the crease and do not have time to complete a full innings. I am willing to accept these drawbacks as a small price to avoid additional complexity

League

Aside from the venues themselves, some leagues are easier to score in than others. Colin Ingram has not scored 52 sixes in two Blast seasons because every venue in England is tiny (they are not) or at a high altitude (definitely not). It is because the talent pool of bowlers is diluted across 16 teams. He faces the world’s best bowling less often than players in the star-laden IPL

I have explained in a previous post how I adjust for league quality in detail, so I won’t do the same here. Essentially, I take a cohort of players who all played in the same two domestic competitions and compare their performances across both competitions. Unlike the venues, this can result in a league being simultaneously low-scoring for batsmen (facing top bowlers) and high-scoring for bowlers (facing top batsmen)
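To give a flavour of the cohort comparison, here is a minimal Python sketch for the batting side only. It is not the method from the earlier post: the data layout, the function name, the weighting by the smaller sample and the toy figures are all assumptions made for illustration.

```python
def league_gap(league_a, league_b):
    """Estimate how many more runs per ball batsmen score in league A than in league B.

    Each argument maps a player name to (runs, balls_faced) in that league,
    with balls_faced assumed to be positive. Only players who appear in both
    leagues (the cohort) contribute to the estimate.
    """
    cohort = league_a.keys() & league_b.keys()
    if not cohort:
        raise ValueError("no players in common between the two leagues")

    weighted_diff = 0.0
    total_weight = 0.0
    for player in cohort:
        runs_a, balls_a = league_a[player]
        runs_b, balls_b = league_b[player]
        # Assumption: weight each player by the smaller of their two samples.
        weight = min(balls_a, balls_b)
        weighted_diff += weight * (runs_a / balls_a - runs_b / balls_b)
        total_weight += weight

    return weighted_diff / total_weight


# Toy data, not real players or figures.
league_a = {"P1": (150, 100), "P2": (260, 200), "P3": (50, 40)}
league_b = {"P1": (130, 100), "P2": (120, 100)}
print(round(league_gap(league_a, league_b), 3))  # P3 is ignored: only in league A
```

Running the same comparison on bowlers' figures would give a separate number, which is why a league can look hard for batsmen and hard for bowlers at the same time.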

The chart provides per-ball adjustments that can be applied to performances across different competitions. Unsurprisingly, the Blast features some of the weakest bowling attacks and a player like Ingram benefits significantly, to the tune of 0.09 runs per ball. The chart also includes T20Is, World Cups, and matches between associate members, not just domestic leagues. The quality across international matches is more variable but there was no other obvious way to break down those groups any further

Batting Position

Jaipur, 2012: the Chennai Super Kings had just lost their last two genuine batsmen within the space of three balls. Chasing a target of 127 in a low-scoring game, they still required 43 runs from 22 balls. Their Win Probability was 24%. Ultimately, the next two players at the crease, Albie Morkel and Srikkanth Anirudha, made the comeback look embarrassingly easy. Both finished with 18 runs from 6 balls and the match was over with almost 2 overs remaining

Albie Morkel staged a late comeback with Anirudha to help the Chennai Super Kings defeat the Rajasthan Royals in the IPL in 2012

It is not possible to maintain such a run rate throughout an entire innings. Late in the game, reckless shots become calculated risks and almost any player can channel their inner Afridi. It is relatively straightforward to assign extra credit to players according to match situation. My Win Probability and Expected Total models use ball-by-ball data to account for the score, target, wickets and balls remaining.

Occasionally, unfortunately, only the basic scorecard is available. In these cases, I apply an adjustment based on batting order as a rough proxy for match situation. I use a similar approach to the league adjustment, looking at players who have played multiple positions in their careers

Bowler Workload

Just as bowlers are often given easy opportunities to score quickly with the bat, batsmen are sometimes thrown easy opportunities with the ball. A part-time bowler is usually only given the ball when a match-up is especially in their favour. Joe Root bowls 0.75 overs per match and still has terrible figures; Dwayne Bravo takes on the death overs every time, regardless of who occupies the crease

The final adjustment that I make is to account for the greater workload shouldered by bona fide bowlers. Again, I already detailed in a previous post how to decide how much credit to give a bowler for taking on the full 4 overs. Adjusting for workload is less scientific than the others covered here - at worst, my approach amounts to a wild guess, and at best, an educated one

However, in comparison to the other factors listed here, adjusting for bowler workload is relatively unimportant. Not because the effect does not exist or because it isn’t significant, but because we rarely need to compare full-time and part-time bowlers. It is easy to include a measure for workload alongside a bowler’s per-ball efficiency when making evaluations

Adjusting for league and venue is vital. Whilst many star players play across various leagues and have a bank of international appearances to draw from, the data for lesser-known players is limited. Understanding the degree of difficulty across different leagues is essential if we want to make comparisons

Adjusting for batting order and bowling workload, on the other hand, is not so important. Neither provides much information which cannot be captured via a pair of contextual stats: average batting position and average workload. Nobody should be comparing an opening batsman with a late innings swinger with no appreciation for the differing contexts

Cocooned within an in-depth analytical investigation, I use all the numbers, splitting out different factors, displaying them side by side, ignoring some, upweighting others, before eventually reaching a tentative conclusion. But for making cool, engaging, and perhaps frivolous visuals and charts, incorporating all the context and extenuating factors into an all-encompassing measure makes sense

Part III