THE CHINESE UNIVERSITY OF HONG KONG

FINAL YEAR PROJECT REPORT

Predicting Horse Racing Result with Machine Learning

Author: Yide LIU
Supervisor: Prof. Michael LYU

A final report submitted in fulfillment of the requirements for the final year project in the
LYU 1703
Faculty of Engineering
Department of Computer Science and Engineering

May 21, 2018

Declaration of Authorship

I, Yide LIU, declare that this work titled, “Predicting Horse Racing Result with Machine Learning” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the project is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

“No wife can endure a gambling husband; unless he is a steady winner.”
Thomas Dewar

THE CHINESE UNIVERSITY OF HONG KONG

Abstract

Faculty of Engineering
Department of Computer Science and Engineering
BSc degree in Computer Science

Predicting Horse Racing Result with Machine Learning
by Yide LIU

Neural networks with a large number of parameters are very powerful machine learning systems. While neural networks have already been applied to many sophisticated real-world problems, their power in predicting horse racing results has not yet been fully explored. Horse racing prediction is the task of predicting the finishing time and ranking of horses in a race.
Horse racing has been one of the most intriguing sports and forms of entertainment in Hong Kong, owing to its betting nature and the uncertainty of racing.

Here we explore the possibilities of neural networks in predicting the final result and examine several deep learning techniques for the problem. We constructed an augmented racing record dataset, a horse dataset and a weather dataset, and designed four approaches, including sequence to sequence for sets, multi-layer perceptron with normalization and rank network. We show that most of the neural network approaches can predict finishing time accurately. Furthermore, our models outperform traditional betting systems such as win-odds betting and performance-based betting in terms of horse ranking and final results.

Acknowledgements

We would like to express our special thanks and gratitude to our supervisor Professor Michael R. Lyu as well as our advisor Mr. Edward Yau, who gave us the golden opportunity to do this wonderful project on the topic “Predicting Horse Racing Result with Machine Learning”. The project led us to do a lot of research and to learn many new things, and we are really thankful to them. Secondly, we would also like to thank our friends, who helped us a lot in finalizing this project within the limited time frame.

Contents

Declaration of Authorship
Abstract
Acknowledgements
1 Overview
  1.1 Introduction
  1.2 Background
    1.2.1 Pari-mutuel betting
    1.2.2 Types of bets
  1.3 Methodology
    1.3.1 Regression on finishing time
    1.3.2 Binary classification on winning
    1.3.3 Multi-class classification on ranking
  1.4 Objective
2 Data Preparation
  2.1 Data Collection
  2.2 Data Description
    2.2.1 Racing Record
    2.2.2 Horse
    2.2.3 Weather
  2.3 Data Analysis
    2.3.1 Racing Features
      Class
      Win odds
      Weight
      Weight difference
      Old place
    2.3.2 Horse
      Color
      Sex
      Age
    2.3.3 Weather
  2.4 Data Preprocessing
    2.4.1 Normalization
    2.4.2 Embedding Network
3 Sequence to sequence for sets
  3.1 Sequence to sequence (for sets) framework
  3.2 Implementation
    3.2.1 Read module
    3.2.2 Process module
    3.2.3 Output module
  3.3 Experiment and Result
4 Deep neural network with normalization
  4.1 Batch Normalization Layer
  4.2 Model design
  4.3 Experiment and Result
5 Rank network
  5.1 Rank network
  5.2 Experiment and Result
  5.3 Overall Result
6 Conclusion
A Development Environment
B Complement Information of Dataset
  B.1 Racing Record
  B.2 Horse
  B.3 Weather
Bibliography

List of Figures

2.1 Complete class system
2.2 1st place distribution over horse colors
2.3 1st place possibility over horse colors
2.4 1st place distribution over horse sexes
2.5 1st place possibility over horse sexes
2.6 1st place distribution over horse ages
2.7 1st place possibility over horse ages
2.8 Average finishing time against different weather conditions
2.9 Average finishing time correlation with weather
2.10 Embedding Network Architecture
2.11 Embedding Network Output
3.1 Model Architecture
3.2 Set2seq output snapshot
4.1 Co-variate shift
4.2 Batch Normalization
4.3 Deep neural network with normalization
4.4 Min-max finishing time by races of old model
4.5 Min-max finishing time by races of BN model
4.6 Relation between Min-max finishing time and accuracy and net gain
5.1 Logistic/Sigmoid function
5.2 Rank Model
5.3 Net Gain on WIN Across All Classes
5.4 Net Gain on PLACE Across All Classes
5.5 Final Results

List of Tables

1.1 Example: vagueness when doing classification on ranking
1.2 WIN and PLACE bet revisit
2.1 Useful racing features collected from HKJC website
2.2 Extracted racing features
2.3 Horse features
2.4 Weather features
2.5 Class and rating standard in handicap races
2.6 Class Correlation Matrix
2.7 Winodds Correlation Matrix
2.8 Weight Correlation Matrix
2.9 Weight Correlation Matrix of "A003"
2.10 Weight Correlation Matrix of "L169"
2.11 Weight difference Correlation Matrix
2.12 Old place Correlation Matrix
2.13 Model performance with/without odds or weather
3.1 Set2seq Ranking Performance
4.1 Model BN Configurations
4.2 Model BN Performance
5.1 Rank Model Performance
5.2 Rank Model Performance Across All Classes
5.3 Rank Model Performance on Class 1 and 2
B.1 Useful racing features collected from HKJC website
B.2 Horse features
B.3 Weather features

Chapter 1

Overview

This final year project focuses on predicting horse racing results with deep learning. Throughout this report we demonstrate the work done during the first semester. This chapter offers a brief overview of the project and an introduction to the topic. It also surveys related work and previous approaches to horse racing prediction, and then introduces the difficulties in predicting horse racing results. Lastly, it states the objective of this project, which is to accurately predict the final racing results and achieve a promising net gain.
1.1 Introduction

Neural networks with a number of non-linear hidden layers have been proved to be highly expressive, able to learn complicated relationships between their inputs and outputs (Srivastava et al., 1989). In practice, neural networks have demonstrated their learning power in machine learning and are becoming the dominant approach for many problems (Srivastava et al., 2014). Although they were first introduced in the late 1950s (Widrow and Hoff, 1959), it is the increasing capability of computer hardware and the use of graphics processing units (GPUs) that has recently made training them practical. In recent research in visual object recognition, and subsequently in other research fields, neural networks have been designed to go deeper in layers (He et al., 2015), and the term "Deep Learning", i.e. training a deep neural network with many layers, appears often in both academia and the public sphere.

However, when a new field of study is investigated, the traditional approach begins by studying the expressive power of networks with very few layers (Huang et al., 2016). Visual object recognition, for example, began with a primitive network called LeNet (LeCun et al., 1998) consisting of 5 layers, while the more recent Highway Networks (Srivastava, Greff, and Schmidhuber, 2015) and Residual Networks (He et al., 2015) surpass 100 layers. Recent research shows that very deep networks exceeding 1000 layers have been studied and employed (He et al., 2016). While it helps to go deeper in network structure, the study of neural networks requires researchers to start from a smaller version. This approach accords with the nature of neural networks: the very large number of neurons and parameters, the large cardinality of the hyper-parameter space, the need for an appropriate score function, and insidious structural issues.
Since training a deep neural network takes days or months to run (Vanhoucke, Senior, and Mao, 2010), it is reasonable to train networks with a simple structure in order to accelerate research progress.

Conventionally, neural networks are developed in steps to target sets of well-known academic problems. Neural networks have been fully explored in those classical problems: text classification (Kim, 2014; Zhang, Zhao, and LeCun, 2015) and sentiment analysis (Santos and Gatti, 2014; Ouyang et al., 2015) in natural language processing, pattern recognition and object detection (Ren et al., 2015; Szegedy, Toshev, and Erhan, 2016) in computer vision, and auto-encoding (Lange and Riedmiller, 2010) and noisy-encoding (Graves, Mohamed, and Hinton, 2013) in information theory. In spite of the promising performance on classical problems, the power of neural networks in other real-world problems is still under exploration. The complexity of these problems and the unclear relationships among their variables make them difficult to model.

One of these undetermined real-world problems is horse racing. The prediction of horse racing results has been a popular research topic in recent years. However, research in this field has made little progress over these years. Few papers have been published in the academic domain since the prediction problem was first introduced in 2008. Two similar studies reviewed the power of neural networks using different optimization techniques and performed finishing time predictions based on previous horse racing records (Snyder, 1978b; Williams and Li, 2008). Last year, LYU1603 (Tung and Hei, 2016) worked with two different approaches: binary classification on winning and logistic regression on horse finishing time. Their models realized positive net gains only with a threshold of over 95% on betting confidence. Those studies provide different approaches to interpreting horse racing problems, but they also reveal a lack of understanding in horse racing prediction.
Horse racing events, while commonly considered a special kind of game, share characteristics with stock market prediction, where future performance is related to previous and current performance to some extent. On the other hand, unlike games of perfect information such as Go¹ and Pentago² (Méhat and Cazenave, 2011), the optimal value function, which determines the outcome of a game, is not well-defined (Silver et al., 2016). Because the horse racing prediction problem is a mixture of imperfect information and stochastic randomness (Snyder, 1978a), previous naive approaches fail to capture the critical information and produce few promising results. To the best of our knowledge, current work on horse racing prediction is limited and its results are unsatisfactory.

In this final year project, we scrutinize features of horse racing events and predict horse racing results directly through finishing time. The rest of this report is organized as follows. Chapter 2 illustrates how first-hand data is collected and structured, and provides a careful statistical analysis of related features and data standardization. In Chapters 3-5, we review all approaches we tried during this term, including sequence to sequence for sets (Vinyals, Bengio, and Kudlur, 2015), deep neural network with normalization and rank network. In the last chapter, we summarize what was accomplished in this term and give a brief future outlook.

1.2 Background

Horse racing is a sport in which horses are run at speed. In Hong Kong, horse racing is not only a professional sport but also a beloved form of betting entertainment. Every season, hundreds of races are held at the Shatin and Happy Valley racecourses on different tracks and over different distances. In each race, 8-14 horses compete to finish fastest, and various bet types are offered on the result of the race.
Horse racing events are managed by the Hong Kong Jockey Club (HKJC). The HKJC is a non-profit organization that formulates and develops horse racing, sporting and betting entertainment in Hong Kong. Moreover, it is the largest taxpayer and community benefactor in Hong Kong. It holds a government-granted monopoly on providing pari-mutuel betting on horse racing. In the history of horse racing in Hong Kong, the HKJC has played an essential role in promotion and regulation and has combined betting entertainment with the sport. "With strict rule enforcement, betting fairness and transparency, the HKJC has taken the Hong Kong racing to a world-class standard and also earned itself an enviable global reputation as a leading horse racing organization."

¹ https://en.wikipedia.org/wiki/Go_(game)
² https://en.wikipedia.org/wiki/Pentago

1.2.1 Pari-mutuel betting

Betting is the most fascinating attraction of horse racing, owing to the nature of the pari-mutuel betting system. Pari-mutuel betting is a betting system in which the stakes of a particular bet type are placed together in a pool, and the returns are calculated by sharing the pool among all winning bets (Riess, 1991). The dividend is divided by the number of winning combinations of a particular pool. Winners share the pool payout in proportion to their betting stakes, and taxes are deducted from the dividend at a particular ratio.

1.2.2 Types of bets

There are multiple types of bets on a single race as well as across multiple races. The following figures from the HKJC website provide an explanation of each bet type.
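
Before turning to the individual bet types, the pool mechanics described in Section 1.2.1 can be illustrated with a minimal sketch. It assumes a single pool with one winning selection. The function name `parimutuel_dividend`, the stake figures and the 20% `takeout_rate` are hypothetical values chosen for illustration only; they are not the HKJC's actual deduction ratio or rounding rules.

```python
def parimutuel_dividend(pool_stakes, winner, takeout_rate):
    """Return the payout per unit staked on `winner` in a single pool.

    pool_stakes:  dict mapping each selection to the total stake placed on it.
    takeout_rate: fraction of the pool deducted (tax and commission) before
                  the remainder is shared among winning bets. Hypothetical value.
    """
    total_pool = sum(pool_stakes.values())
    winning_stake = pool_stakes[winner]
    if winning_stake <= 0:
        raise ValueError("no winning stakes in the pool")
    net_pool = total_pool * (1.0 - takeout_rate)
    # Winners share the net pool in proportion to their stakes.
    return net_pool / winning_stake


# Hypothetical WIN pool: total stakes (in dollars) on three horses.
stakes = {"horse_a": 600.0, "horse_b": 300.0, "horse_c": 100.0}
dividend = parimutuel_dividend(stakes, "horse_b", takeout_rate=0.2)

# A bettor who staked $10 on the winning horse receives 10 * dividend.
payout = 10.0 * dividend
print(f"dividend per $1: {dividend:.4f}, payout on $10: {payout:.2f}")
```

Because the dividend depends on how much of the pool was staked on the winner, a heavily backed horse pays less per dollar than a lightly backed one. This is why the odds in a pari-mutuel system are set by the bettors themselves rather than by a bookmaker.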