Offline RL with RLlib

Who are we?
• Project Bonsai is a low-code AI development platform that speeds up the creation of AI-powered automation.
• Without requiring data scientists, engineers can build specific AI components that provide operator guidance or directly make decisions.
• Our goal is to help industry move from automation to autonomy: building brains for autonomous systems.

Value Proposition: Machine Teaching
• Machine Teaching is a way to transfer knowledge from humans to the underlying machine learning algorithm, combining AI with traditional optimization and control.

Machine Teaching Tool Chain
1. Machine Teaching injects subject matter expertise into brain training.
2. Simulation tools enable accelerated integration and scale of training.
3. An AI Engine automates the generation and management of neural networks and DRL algorithms.
4. A flexible runtime deploys and scales models in the real world.

Reinforcement Learning is Key
• And it is hard for now, but it will not be forever.
• [Slide diagram: value proposition (Machine Teaching), open source (RL), today and tomorrow, autonomous systems.]

How do we use RLlib?
• Comprehensive
• Extensible:
  • Algorithms
  • Execution plans (formerly optimizers)
  • Metrics
  • Models
  • Pre-processing
  • Envs/Sims
• Flexible

RLlib is our bet for Reinforcement Learning
• New algorithms
• Distributed training
• Faster prototyping and experimentation
• Support for multiple hardware targets

Distributed training is a need
• Not only because of Reinforcement Learning.
• Machine Teaching requirements: simulation, dynamism, just-in-time.

How do we use Ray?
• Ray is our framework for distributed training, and not only because of RLlib.

Micro-Service Architecture
• [Architecture diagram: Machine Teaching services (SDK, API, Inkling compiler, Machine Teaching Engine, AI Engine with a custom RLlib) run as micro-services on Kubernetes; Ray sits inside the AI Engine.]
• AI focus
• Ray is used only for training.

We have a challenge!
• The same challenge as Reinforcement Learning in general: simulators.
• Simulators are hard to implement, sometimes very slow, and sometimes only data is available.

Why do we need Offline RL?
• Our users must be able to train directly from data, without simulators.
• Online training with simulators (RL): design a simulator → create a sim package in Bonsai → train a brain with simulators.
• Offline training without simulators (Offline RL): collect data → create a dataset in Bonsai → train a brain directly from data.

What is Offline RL?
• More than RL without simulators.
• Even more than training from data.
• [Diagram: Offline RL positioned within the broader Machine Learning landscape.]

Comparing Offline RL and Online RL
• Online RL (which can be on-policy or off-policy): an agent acting in the world.
• Offline RL: several recordings of agents acting.

Comparing Offline RL and Machine Learning
• Both benefit from large, diverse, previously collected datasets.
• Machine Learning (supervised and unsupervised learning): passive, behavioral recognition.
• Offline RL: active, decision making, learned from several recordings of agents acting.

How did we leverage RLlib to handle our Offline RL needs?
• Implement a new algorithm.
• Reuse existing RLlib components.
• Extend the RLlib CLI.

The algorithm selected was CQL
• A TensorFlow implementation built over SAC and DQN.
• Other offline approaches: simple, but limited to simple environments and low performance.
• CQL: combines decisions from sub-optimal episodes, improves over the behavioral policy, and learns lower-bounded Q-values.

Reuse existing RLlib components
• SAC and DQN were extended, with slight modifications to make them reusable.
• CQL losses were created on top of the SAC and DQN losses (see the sketch below).
• New agents: CQL-SAC and CQL-DQN.
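To make the loss construction concrete, here is a minimal sketch of a CQL(H)-style regularizer added on top of a DQN-style TD loss, in the spirit of the CQL-DQN agent described above. This is not the Bonsai or RLlib implementation; `q_model`, `target_q_model`, and the batch field names are illustrative assumptions.

```python
import tensorflow as tf

def cql_dqn_loss(q_model, target_q_model, batch, gamma=0.99, cql_alpha=1.0):
    """Sketch of a CQL(H)-style loss for discrete actions: DQN TD error + conservative term."""
    obs = batch["obs"]              # [B, obs_dim]
    actions = batch["actions"]      # [B] integer action indices logged in the dataset
    rewards = batch["rewards"]      # [B]
    next_obs = batch["next_obs"]    # [B, obs_dim]
    dones = batch["dones"]          # [B] float, 1.0 if terminal

    q_values = q_model(obs)         # [B, num_actions]
    num_actions = tf.shape(q_values)[-1]
    # Q(s, a) for the actions that were actually taken in the dataset.
    q_taken = tf.reduce_sum(q_values * tf.one_hot(actions, num_actions), axis=-1)

    # Standard DQN target, computed from the offline batch only.
    next_q = tf.reduce_max(target_q_model(next_obs), axis=-1)
    td_target = tf.stop_gradient(rewards + gamma * (1.0 - dones) * next_q)
    td_loss = tf.reduce_mean(tf.square(q_taken - td_target))

    # Conservative regularizer: push Q down on all actions (log-sum-exp) and up on
    # the actions seen in the data, so that unseen actions are not over-estimated.
    # This is what gives the lower-bounded Q-values mentioned above.
    conservative_gap = tf.reduce_mean(
        tf.reduce_logsumexp(q_values, axis=-1) - q_taken
    )
    return td_loss + cql_alpha * conservative_gap
```

The same conservative term can be added to the SAC critic loss for the continuous-action case; the `cql_alpha` coefficient trades off conservatism against fitting the Bellman targets.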
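To illustrate "training a brain directly from data", here is a minimal sketch of launching offline CQL training in RLlib from pre-recorded episodes. It assumes a Ray/RLlib version in which CQL is registered as an algorithm and the JSON offline-data reader is available; the environment name and dataset path are illustrative, not part of the talk.

```python
import ray
from ray import tune

ray.init()

tune.run(
    "CQL",  # CQL as registered in RLlib (assumption: available in this Ray version)
    config={
        "env": "Pendulum-v1",          # illustrative; only used to define the spaces,
                                       # the policy itself is trained purely from logged data
        "input": "/tmp/pendulum-out",  # directory of logged SampleBatch JSON files
        "input_evaluation": [],        # disable RLlib's built-in off-policy estimators here
        "framework": "tf",
    },
    stop={"training_iteration": 200},
)
```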
What is challenging about Offline RL?
• From the techniques themselves: distributional shift.
• From the lack of simulators: no exploration, and evaluation.
• CQL deals with action distributional shift, because that happens at training time, but it does not save us from state distributional shift, because that happens at evaluation time.
• Results will only be as good as the dataset and the behavioral policies present in it allow.
• For evaluation, FQE or model-based OPE is our recommendation.

Thank you!
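Appendix: as a rough illustration of FQE (Fitted Q Evaluation), here is a minimal sketch for a discrete-state, discrete-action logged dataset. The tabular setting and all names are illustrative assumptions, not the Bonsai OPE implementation.

```python
import numpy as np

def fitted_q_evaluation(dataset, target_policy, n_states, n_actions,
                        gamma=0.99, iters=100):
    """Estimate Q^pi of `target_policy` using only logged transitions.

    dataset: list of (s, a, r, s_next, done) tuples collected by any behavior policy.
    target_policy: [n_states, n_actions] array of action probabilities for the
                   policy we want to evaluate (e.g. the trained brain).
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        targets = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s_next, done in dataset:
            # Bootstrap with the *target* policy's expected value at s_next.
            v_next = 0.0 if done else np.dot(target_policy[s_next], q[s_next])
            targets[s, a] += r + gamma * v_next
            counts[s, a] += 1
        # The regression step degenerates to averaging the targets in the tabular case.
        mask = counts > 0
        q[mask] = targets[mask] / counts[mask]
    return q

# The estimated return of the target policy under a start-state distribution d0
# would then be: np.sum(d0[:, None] * target_policy * q)
```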