Analyzing the Distribution of Extra Deliveries in T20 Cricket with Respect to Time Using Statistical and Machine Learning Approaches
Abstract
This study combined statistical and machine learning techniques to analyze how extras in T20 cricket are distributed across overs and which factors influence them. Exploratory data analysis showed that extras are concentrated in the middle overs, particularly around overs 9–10. A Poisson model fit the data well (χ² = 14.792, p = 0.736; the non-significant result means the Poisson hypothesis was not rejected), consistent with extras occurring at a relatively constant rate over time, with the highest estimated rate (λ ≈ 0.1263) at over 9. Multinomial logistic regression showed that situational match variables (such as balls remaining, runs to get, ball number, and over number) are significant predictors of the type of extra conceded (p < 0.001), whereas cumulative performance indicators such as total runs or wickets in an innings had no significant effect. CHAID decision trees segmented overs by extra type, and correspondence analysis gave a visual representation of the association between types of extras and over ranges, notably showing that wides are frequent between overs 3 and 7. Taken together, the findings indicate that extras in T20 cricket follow time-dependent patterns closely tied to match phases. Coaches and analysts can use these results to reduce extras during critical overs and thereby improve team performance in the T20 format.
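As an illustration of the Poisson goodness-of-fit step summarized above, the following is a minimal sketch in Python, not the study's own code. The function name poisson_gof, the binning choice (max_bin), and the synthetic counts are all assumptions made for this example; the real analysis would use observed counts of extras, and its binning and degrees of freedom may differ.

```python
import numpy as np
from scipy import stats


def poisson_gof(counts, max_bin):
    """Chi-square goodness-of-fit test of count data against a Poisson model.

    counts  : 1-D array of non-negative integers (e.g. extras per unit of play)
    max_bin : counts >= max_bin are pooled into one tail bin (hypothetical choice)
    Returns (lambda_hat, chi2_statistic, degrees_of_freedom, p_value).
    """
    counts = np.asarray(counts)
    n = counts.size
    lam = counts.mean()  # maximum-likelihood estimate of the Poisson rate

    # Observed frequencies for 0, 1, ..., max_bin - 1, plus the pooled tail.
    observed = np.array(
        [np.sum(counts == k) for k in range(max_bin)] + [np.sum(counts >= max_bin)]
    )

    # Expected frequencies under Poisson(lam); sf(max_bin - 1) = P(X >= max_bin).
    probs = np.append(
        stats.poisson.pmf(np.arange(max_bin), lam),
        stats.poisson.sf(max_bin - 1, lam),
    )
    expected = n * probs

    chi2 = np.sum((observed - expected) ** 2 / expected)
    dof = observed.size - 1 - 1  # bins - 1, minus one estimated parameter
    p_value = stats.chi2.sf(chi2, dof)
    return lam, chi2, dof, p_value


# Synthetic data for illustration only (NOT the study's data).
rng = np.random.default_rng(0)
counts = rng.poisson(0.8, size=500)

lam, chi2, dof, p_value = poisson_gof(counts, max_bin=3)
print(f"lambda = {lam:.4f}, chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```

A large p-value in such a test means the Poisson hypothesis is not rejected, which is how the reported p = 0.736 is read. Pooling the sparse upper tail into a single bin keeps the expected frequencies large enough for the chi-square approximation to be reasonable.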