Reinforcement Learning in the Physical World

Why Reinforcement Learning Solutions Make a Difference for Industry

[Figure labels: Real System, Machine, Simulation, Digital Twin, Data, Records, Logs, Reinforcement Learning, Genetic Programming, Swarm Optimization, Model-based, Actions]

A brief history of RL at Siemens

[Timeline lanes: Business Impact, World, Technologies]

- 198x: Reinforcement Learning. Sutton & Barto invent the “RL wheel”
- 1994: Backgammon. RL with large number of states
- 2003: RL for laundry machines. Dynamic laundry distribution by RL is faster than best hand-engineered strategy
- 2006: Data-Efficient RL [1-3]. Architectures and IP to enable offline learning for turbines
- 2013: DQN Atari
- 2013: RL reduces NOx by 20%. NOx emissions of large gas turbines significantly reduced
- 2016: AlphaGo
- 2016: Variable Objectives. Goal-conditioned, multi-objective RL
- 2017: Proximal Policy Optimization
- 2017: Gas Turbine Auto Tuner. Installation on largest Siemens gas turbine (SGT-8000H)
- 2017: Interpretable RL [3]. Wind turbines controlled by interpretable RL policy
- 2019: OpenAI Five DotA
- 2019: Integration in Process Control [5]. Interpretable RL learns polymer reactor policy from offline data
- 2021: MOOSE Policy [2]. Model-based learning and safe operation
- 2021: Energy-efficient time tabling. Multi-agent RL reduces delay and energy consumption of metro trains
- 2022: Control of Tokamak fusion reactor plasmas. Physics-informed, model-based RL
- Uncertainty-aware RL [4] using Bayesian NN and Deep Gaussian Processes

The Challenge: Bridging the Gap between the Research and Industry Domain

Industry Domain
- Existing engineered solutions, domain experts
- Safe, interpretable and trustworthy solutions
- Difficult learning setup: offline, missing data, noise, ...
- Integration of domain know-how and constraints
- (Time + budget) limitations on compute resources

Research Domain
- Unsolved problems and uncharted territory
- Fast-moving and competitive research field
- Clean learning setup: online, fast feedback, infinite rollouts, unlimited sampling
- Virtually no limits on (compute) resources

Use Case Examples

Real-Time Combustion Optimization for Large Gas Turbines: Gas Turbine Auto Tuner
[Figure labels: Offline RL, Modular, Safety-embedded, Model-based, 300 MW]

Energy-efficient Time Tabling for Subway Systems
[Figure labels: Subway Simulator, Reinforcement Learning in the Cloud, Subway topology, Timetables, Selected KPIs, Test scenarios, interact, Policy, Timetable engineer]

High-dimensional RL Control for Parcel Logistics
[Figure labels: Simulation/Digital Twin or real machine (at customer site), Multi-Agent Deep Learning Framework, Image, Action, Reward, Multi-agent reinforcement learning]

Research Focus: Facing the Industry Challenge
- Impressive results for problems where exploration is cheap
- If exploration on real systems is prohibited, offline data and simulations come to the rescue
- Policy search, e.g. by using evolutionary methods, enables robust and interpretable RL solutions
- Generating uncertainty-aware surrogate models from (offline) data or simulations increases the robustness and speed of policy training
- Implicit black-box policies are acceptable
- Testing and evaluating generated policies is cheap and safe

Current Challenges and Opportunities
- Convert groundbreaking technology into profitable business cases for industrial applications
- Enable RL solutions to scale fast and generalize across a large number of applications and use cases
- Develop trustworthy and robust RL solutions that are ready to pass the safety regulations required for industrial products and services
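
Illustrative sketch (not from the poster): the Research Focus items on offline data, surrogate models and evolutionary policy search can be made concrete with a toy example. Everything below is an assumption chosen for illustration: the one-dimensional plant, its coefficients, the quadratic cost, the linear policy u = -k * x and all function names are hypothetical. The surrogate is a plain least-squares fit, not an uncertainty-aware model such as a Bayesian NN, and the search is a simple (1+lambda) evolution strategy rather than any specific Siemens method.

```python
import random


def collect_offline_data(n, rng):
    """Log transitions (x, u, x_next) from a hypothetical plant under random actions."""
    data, x = [], 1.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        # Assumed "true" dynamics; the learner never sees these coefficients.
        x_next = 0.9 * x + 0.5 * u + rng.gauss(0.0, 0.01)
        data.append((x, u, x_next))
        x = x_next
    return data


def fit_linear_surrogate(data):
    """Least-squares fit of x_next = a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def surrogate_cost(k, model, horizon=30, x0=1.0):
    """Roll out the interpretable policy u = -k*x on the surrogate only."""
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x * x + 0.1 * u * u  # assumed quadratic cost
        x = a * x + b * u
    return cost


def evolve_gain(model, rng, generations=40, offspring=10, sigma=0.3):
    """(1+lambda) evolution strategy over the single policy parameter k."""
    best_k = 0.0
    best_cost = surrogate_cost(best_k, model)
    for _ in range(generations):
        for _ in range(offspring):
            candidate = best_k + rng.gauss(0.0, sigma)
            cost = surrogate_cost(candidate, model)
            if cost < best_cost:
                best_k, best_cost = candidate, cost
    return best_k, best_cost


rng = random.Random(0)
data = collect_offline_data(500, rng)   # no further interaction with the plant
model = fit_linear_surrogate(data)      # surrogate learned from logs only
k, cost = evolve_gain(model, rng)       # policy search happens on the surrogate
print("surrogate (a, b):", model)
print("policy gain k:", k, "surrogate cost:", cost)
```

The resulting policy is a single gain that a domain expert can inspect before deployment, which is the appeal of interpretable policy search in this setting. A real application would additionally need uncertainty estimates for the surrogate and a safety check before any action reaches the plant.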