RL-Based Sampling Schemes for Adaptive Quantitative Group Testing
Camille Dunning, UCSD Halıcıoğlu Data Science Institute '23
Mentored by Prof. Tara Javidi
adunning@ucsd.edu
HDSI Undergraduate Research Scholarship, February 25, 2022

Problem Statement and Naïve Method

- Group testing: a combinatorial method for searching for and identifying $k$ "infected" individuals in a population of size $n$. The population is usually broken into groups; can RL do this more efficiently?
- Adaptive vs. non-adaptive: either the current test depends on the results of previous tests, or all tests are fixed beforehand (a pooling design).
- Notation: represent the population as a binary vector $\vec{x}$ with $k$ ones denoting the infected individuals; $\vec{x}$ is unknown to us.
- Idea: deduce the locations of the ones through repeated measurements.
  - Multiply $\vec{x}$ with a modified Walsh-Hadamard matrix $W$ (binary, fractal structure) to derive the observation $\vec{y}$.
  - Key: sample the correct rows from $W$ such that the linear system $W\vec{x} = \vec{y}$ has a unique solution, the ground-truth $\vec{x}$.
- Brute-force approach: adaptively build $\hat{W}$ over iterations and solve the linear system until the correct solution is reached.
  - Converges at the optimal $k\lg(n)/\lg(k+1)$ iterations.
  - VERY computationally expensive; runtime explodes already at $n = 16$, $k < 5$.

Clever Method + Reinforcement Learning

- Suppose we are given $\vec{y} = [y_1, \ldots, y_n]$. Let $s_1, \ldots, s_k$ be the indices of the ones in $\vec{x}$.
- Derive $\tilde{W}$, the $\lg(n)$ rows sampled from $W$ that correspond to the binary expansions of the $s_i$. Multiply $\vec{x}$ and $\tilde{W}$ as before, then derive $S = \sum_{i=0}^{\lg(n)-1} y_i \, 2^i = \sum_{i=1}^{k} s_i$. This directly encodes information about where the ones occur in our population!
- The algorithm should converge more quickly, since we precompute $\lg(n)$ of the $n$ rows of $W$ and so start with a much smaller set of valid solutions to check. Before, we had $\binom{n}{k}$ possible solutions to check, which explodes fast.
- We want a DQN to learn which of the remaining $n - \lg(n)$ rows to sample from $W$ such that convergence time is minimized.
- Maximize $Q^*(s, a) = \sum_{s'} p^{a}_{ss'} \left( r(s, a) + \gamma \max_{a'} Q^*(s', a') \right)$, where the state $s$ is $\hat{W}$ (including $\tilde{W}$ with the $\lg(n)$ initial measurements), the action $a$ is the index of the row chosen from $W$ at time $t$, and the action is picked depending on the triple $(Q^*, S, \varepsilon)$.
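As a concrete illustration of the row-sampling idea above, here is a minimal NumPy sketch (assuming the $\lg(n)$ sampled rows are the bit-indicator rows implied by the binary-expansion argument; the helper name `bit_rows` and the example values are illustrative, not taken from the project code):

```python
import numpy as np

def bit_rows(n):
    """lg(n) binary rows: row i has a 1 wherever bit i of the column index is set."""
    lg_n = int(np.log2(n))
    cols = np.arange(n)
    return np.array([(cols >> i) & 1 for i in range(lg_n)])  # shape (lg(n), n)

# Toy example: n = 16, infected indices s = {3, 9} (k = 2), unknown to the tester.
n = 16
x = np.zeros(n, dtype=int)
s = [3, 9]
x[s] = 1

W_tilde = bit_rows(n)   # the lg(n) precomputed rows sampled from W
y = W_tilde @ x         # quantitative measurements: y_i = #infected indices with bit i set

# S = sum_i y_i * 2^i recovers the sum of the infected indices s_1 + ... + s_k.
S = int(sum(int(y_i) * 2**i for i, y_i in enumerate(y)))
assert S == sum(s)      # 3 + 9 = 12
```

With only these $\lg(n)$ measurements, $S$ already constrains which index sets remain consistent, which is what shrinks the candidate pool below $\binom{n}{k}$.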
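For the DQN formulation, a hedged sketch of the one-step Bellman target that mirrors the $Q^*$ recursion above (the network `q_net`, the state encoding, and the discount value are placeholders, not the project's actual implementation):

```python
import torch

GAMMA = 0.9  # discount factor; illustrative value only

def dqn_target(q_net: torch.nn.Module, reward: torch.Tensor,
               next_state: torch.Tensor, done: torch.Tensor) -> torch.Tensor:
    """One-step target r + gamma * max_a' Q(s', a'), maximizing over remaining row indices."""
    with torch.no_grad():
        next_q = q_net(next_state).max(dim=-1).values
    return reward + GAMMA * (1.0 - done) * next_q

# Training minimizes (Q(s, a) - dqn_target(...))^2 over a replay batch of transitions.
```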
Reinforcement Learning, Results, and Conclusion

- $W$ and $\hat{W}$ are part of the environment. The agent "wins" if a unique solution is reached within the allowed number of measurements. Rewards are specified accordingly (e.g., $r_t = -0.75$ if the number of candidate solutions does not improve at time $t$).
- A "dynamic" epsilon-greedy strategy avoids long training times and "overfitting": decrease $\varepsilon$ by a learning rate if the win rate exceeds a threshold, while increasing the win threshold at the same time (by 0.01 by default) to avoid an overly quick epsilon decay. A minimal sketch of this schedule follows the conclusion below.
- Other technical challenges: choosing duplicate actions, and using padding to handle the changing state and action spaces.
- The current DQN has a difficult time converging, starting at $n = 8$, $k = 4$:
  - Epsilon levels out at 1,200 epochs (11 minutes of training on CPU).
  - Epsilon remains high, meaning performance is not much better than random sampling.
  - No convergence in a reasonable amount of time; there is much room for improvement in the RL part of the problem.
- However, the clever method involving sampled rows and binary expansions approaches the unique solution exponentially faster than previous research.
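A minimal sketch of the "dynamic" epsilon-greedy schedule referenced above (reading "decrease by a learning rate" as a subtractive step; the parameter names and floor value are assumptions, while the win-rate gate and the 0.01 threshold increment come from the slides):

```python
def update_epsilon(epsilon, win_rate, win_threshold,
                   lr=0.05, threshold_step=0.01, epsilon_min=0.05):
    """Decay epsilon only when the recent win rate clears the current threshold,
    then raise the threshold so epsilon does not collapse too quickly."""
    if win_rate > win_threshold:
        epsilon = max(epsilon_min, epsilon - lr)
        win_threshold = min(1.0, win_threshold + threshold_step)
    return epsilon, win_threshold
```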