Preface
    a. Contents and Structure
    b. Who This Book Is For
    c. Conventions Used in This Book
    d. Using Code Examples
    e. O’Reilly Online Learning
    f. How to Contact Us
    g. Acknowledgments
1. Python and Algorithmic Trading
    a. Python for Finance
        i. Python Versus Pseudo-Code
        ii. NumPy and Vectorization
        iii. pandas and the DataFrame Class
    b. Algorithmic Trading
    c. Python for Algorithmic Trading
    d. Focus and Prerequisites
    e. Trading Strategies
        i. Simple Moving Averages
        ii. Momentum
        iii. Mean Reversion
        iv. Machine and Deep Learning
    f. Conclusions
    g. References and Further Resources
2. Python Infrastructure
    a. Conda as a Package Manager
        i. Installing Miniconda
        ii. Basic Operations with Conda
    b. Conda as a Virtual Environment Manager
    c. Using Docker Containers
        i. Docker Images and Containers
        ii. Building a Ubuntu and Python Docker Image
    d. Using Cloud Instances
        i. RSA Public and Private Keys
        ii. Jupyter Notebook Configuration File
        iii. Installation Script for Python and Jupyter Lab
        iv. Script to Orchestrate the Droplet Set Up
    e. Conclusions
    f. References and Further Resources
3. Working with Financial Data
    a. Reading Financial Data From Different Sources
        i. The Data Set
        ii. Reading from a CSV File with Python
        iii. Reading from a CSV File with pandas
        iv. Exporting to Excel and JSON
        v. Reading from Excel and JSON
    b. Working with Open Data Sources
    c. Eikon Data API
        i. Retrieving Historical Structured Data
        ii. Retrieving Historical Unstructured Data
    d. Storing Financial Data Efficiently
        i. Storing DataFrame Objects
        ii. Using TsTables
        iii. Storing Data with SQLite3
    e. Conclusions
    f. References and Further Resources
    g. Python Scripts
4. Mastering Vectorized Backtesting
    a. Making Use of Vectorization
        i. Vectorization with NumPy
        ii. Vectorization with pandas
    b. Strategies Based on Simple Moving Averages
        i. Getting into the Basics
        ii. Generalizing the Approach
    c. Strategies Based on Momentum
        i. Getting into the Basics
        ii. Generalizing the Approach
    d. Strategies Based on Mean Reversion
        i. Getting into the Basics
        ii. Generalizing the Approach
    e. Data Snooping and Overfitting
    f. Conclusions
    g. References and Further Resources
    h. Python Scripts
        i. SMA Backtesting Class
        ii. Momentum Backtesting Class
        iii. Mean Reversion Backtesting Class
5. Predicting Market Movements with Machine Learning
    a. Using Linear Regression for Market Movement Prediction
        i. A Quick Review of Linear Regression
        ii. The Basic Idea for Price Prediction
        iii. Predicting Index Levels
        iv. Predicting Future Returns
        v. Predicting Future Market Direction
        vi. Vectorized Backtesting of Regression-Based Strategy
        vii. Generalizing the Approach
    b. Using Machine Learning for Market Movement Prediction
        i. Linear Regression with scikit-learn
        ii. A Simple Classification Problem
        iii. Using Logistic Regression to Predict Market Direction
        iv. Generalizing the Approach
    c. Using Deep Learning for Market Movement Prediction
        i. The Simple Classification Problem Revisited
        ii. Using Deep Neural Networks to Predict Market Direction
        iii. Adding Different Types of Features
    d. Conclusions
    e. References and Further Resources
    f. Python Scripts
        i. Linear Regression Backtesting Class
        ii. Classification Algorithm Backtesting Class
6. Building Classes for Event-Based Backtesting
    a. Backtesting Base Class
    b. Long-Only Backtesting Class
    c. Long-Short Backtesting Class
    d. Conclusions
    e. References and Further Resources
    f. Python Scripts
        i. Backtesting Base Class
        ii. Long-Only Backtesting Class
        iii. Long-Short Backtesting Class
7. Working with Real-Time Data and Sockets
    a. Running a Simple Tick Data Server
    b. Connecting a Simple Tick Data Client
    c. Signal Generation in Real Time
    d. Visualizing Streaming Data with Plotly
        i. The Basics
        ii. Three Real-Time Streams
        iii. Three Sub-Plots for Three Streams
        iv. Streaming Data as Bars
    e. Conclusions
    f. References and Further Resources
    g. Python Scripts
        i. Sample Tick Data Server
        ii. Tick Data Client
        iii. Momentum Online Algorithm
        iv. Sample Data Server for Bar Plot
8. CFD Trading with Oanda
    a. Setting Up an Account
    b. The Oanda API
    c. Retrieving Historical Data
        i. Looking Up Instruments Available for Trading
        ii. Backtesting a Momentum Strategy on Minute Bars
        iii. Factoring In Leverage and Margin
    d. Working with Streaming Data
    e. Placing Market Orders
    f. Implementing Trading Strategies in Real Time
    g. Retrieving Account Information
    h. Conclusions
    i. References and Further Resources
    j. Python Script
9. FX Trading with FXCM
    a. Getting Started
    b. Retrieving Data
        i. Retrieving Tick Data
        ii. Retrieving Candles Data
    c. Working with the API
        i. Retrieving Historical Data
        ii. Retrieving Streaming Data
        iii. Placing Orders
        iv. Account Information
    d. Conclusions
    e. References and Further Resources
10. Automating Trading Operations
    a. Capital Management
        i. Kelly Criterion in Binomial Setting
        ii. Kelly Criterion for Stocks and Indices
    b. ML-Based Trading Strategy
        i. Vectorized Backtesting
        ii. Optimal Leverage
        iii. Risk Analysis
        iv. Persisting the Model Object
    c. Online Algorithm
    d. Infrastructure and Deployment
    e. Logging and Monitoring
    f. Visual Step-by-Step Overview
        i. Configuring Oanda Account
        ii. Setting Up the Hardware
        iii. Setting Up the Python Environment
        iv. Uploading the Code
        v. Running the Code
        vi. Real-Time Monitoring
    g. Conclusions
    h. References and Further Resources
    i. Python Script
        i. Automated Trading Strategy
        ii. Strategy Monitoring
Appendix. Python, NumPy, matplotlib, pandas
    a. Python Basics
        i. Data Types
        ii. Data Structures
        iii. Control Structures
        iv. Python Idioms
    b. NumPy
        i. Regular ndarray Object
        ii. Vectorized Operations
        iii. Boolean Operations
        iv. ndarray Methods and NumPy Functions
        v. ndarray Creation
        vi. Random Numbers
    c. matplotlib
    d. pandas
        i. DataFrame Class
        ii. Numerical Operations
        iii. Data Selection
        iv. Boolean Operations
        v. Plotting with pandas
        vi. Input-Output Operations
    e. Case Study
    f. Conclusions
    g. Further Resources
Index

Python for Algorithmic Trading
From Idea to Cloud Deployment
Yves Hilpisch

Python for Algorithmic Trading
by Yves Hilpisch

Copyright © 2021 Yves Hilpisch. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Michelle Smith
Development Editor: Michele Cronin
Production Editor: Daniel Elfanbaum
Copyeditor: Piper Editorial LLC
Proofreader: nSight, Inc.
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Jose Marzan
Illustrator: Kate Dullea

November 2020: First Edition

Revision History for the First Edition
2020-11-11: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492053354 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Python for Algorithmic Trading, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. This book is not intended as financial advice. Please consult a qualified professional if you require financial advice.

978-1-492-05335-4
[LSI]

Preface

Dataism says that the universe consists of data flows, and the value of any phenomenon or entity is determined by its contribution to data processing.... Dataism thereby collapses the barrier between animals [humans] and machines, and expects electronic algorithms to eventually decipher and outperform biochemical algorithms.
- Yuval Noah Harari

Finding the right algorithm to automatically and successfully trade in financial markets is the holy grail in finance. Not too long ago, algorithmic trading was only available and possible for institutional players with deep pockets and lots of assets under management. Recent developments in the areas of open source, open data, cloud compute, and cloud storage, as well as online trading platforms, have leveled the playing field for smaller institutions and individual traders, making it possible to get started in this fascinating discipline while equipped only with a typical notebook or desktop computer and a reliable internet connection.

Nowadays, Python, with its ecosystem of powerful packages, is the technology platform of choice for algorithmic trading. Among other things, Python allows you to do efficient data analytics (with pandas, for example), to apply machine learning to stock market prediction (with scikit-learn, for example), or even to make use of Google’s deep learning technology with TensorFlow.[1]

This is a book about Python for algorithmic trading, primarily in the context of alpha-generating strategies (see Chapter 1).
Such a book at the intersection of two vast and exciting fields can hardly cover all topics of relevance. However, it can cover a range of important meta topics in depth. These topics include:

Financial data
Financial data is at the core of every algorithmic trading project. Python and packages like NumPy and pandas do a great job of handling and working with structured financial data of any kind (end-of-day, intraday, high frequency).

Backtesting
There should be no automated algorithmic trading without rigorous testing of the trading strategy to be deployed. The book covers, among other things, trading strategies based on simple moving averages, momentum, mean reversion, and machine and deep learning-based prediction.

Real-time data
Algorithmic trading requires dealing with real-time data, online algorithms based on it, and visualization in real time. The book provides an introduction to socket programming with ZeroMQ and streaming visualization.

Online platforms
No trading can take place without a trading platform. The book covers two popular electronic trading platforms: Oanda and FXCM.

Automation
Both the beauty of algorithmic trading and some of its major challenges result from the automation of the trading operation. The book shows how to deploy Python in the cloud and how to set up an environment appropriate for automated algorithmic trading.

The book offers a unique learning experience with the following features and benefits:

Coverage of relevant topics
This is the only book covering the relevant topics in Python for algorithmic trading in such breadth and depth (see the following).

Self-contained code base
The book is accompanied by a Git repository with all code in a self-contained, executable form. The repository is available on the Quant Platform.

Real trading as the goal
The coverage of two different online trading platforms puts the reader in the position to start both paper and live trading efficiently.
To this end, the book equips the reader with relevant, practical, and valuable background knowledge.

Do-it-yourself and self-paced approach
Since the material and the code are self-contained and only rely on standard Python packages, the reader has full knowledge of and full control over what is going on, how to use the code examples, how to change them, and so on. There is no need to rely on third-party platforms, for instance, to do the backtesting or to connect to the trading platforms. With this book, the reader can do all this on their own at a convenient pace and has every single line of code to do so.

User forum
Although the reader should be able to follow along seamlessly, the author and The Python Quants are there to help. The reader can post questions and comments in the user forum on the Quant Platform at any time (accounts are free).

Online/video training (paid subscription)
The Python Quants offer comprehensive online training programs that build on the contents presented in the book and add further content, covering important topics such as financial data science, artificial intelligence in finance, Python for Excel and databases, and additional Python tools and skills.

Contents and Structure

Here’s a quick overview of the topics and contents presented in each chapter.

Chapter 1, Python and Algorithmic Trading
The first chapter is an introduction to the topic of algorithmic trading, that is, the automated trading of financial instruments based on computer algorithms. It discusses fundamental notions in this context and also addresses, among other things, the expected prerequisites for reading the book.

Chapter 2, Python Infrastructure
This chapter lays the technical foundations for all subsequent chapters by showing how to set up a proper Python environment. It mainly uses conda as a package and environment manager, and it illustrates Python deployment via Docker containers and in the cloud.

Chapter 3, Working with Financial Data
Financial time series data is central to every algorithmic trading project. This chapter shows you how to retrieve financial data from different public and proprietary data sources. It also demonstrates how to store financial time series data efficiently with Python.

Chapter 4, Mastering Vectorized Backtesting
Vectorization is a powerful approach in numerical computation in general and for financial analytics in particular. This chapter introduces vectorization with NumPy and pandas and applies that approach to the backtesting of SMA-based, momentum, and mean-reversion strategies.

Chapter 5, Predicting Market Movements with Machine Learning
This chapter is dedicated to generating market predictions with machine learning and deep learning approaches. Relying mainly on past return observations as features, it presents approaches for predicting tomorrow’s market direction with Python packages such as Keras (in combination with TensorFlow) and scikit-learn.

Chapter 6, Building Classes for Event-Based Backtesting
While vectorized backtesting has advantages when it comes to conciseness of code and performance, it’s limited with regard to the representation of certain market features of trading strategies. On the other hand, event-based backtesting, technically implemented by the use of object-oriented programming, allows for a rather granular and more realistic modeling of such features.