Kernel Quantile Embeddings and Associated Probability Metrics

Thank you, we are happy to elaborate, and will add further discussion to the paper for the benefit of the reader. The relation between the methods is as follows.

First, we summarise covariance embeddings. Kernel covariance (operator) embeddings (KCE) [7] represent the distribution $P$ through the second-order moment of the function $k(X, \cdot)$, for $X \sim P$, as an alternative to the first-order moment (the kernel mean embedding, KME). Being moments of the same distribution, the two share key advantages and drawbacks: the KCE for a kernel $k$ exists iff the KME for $k^2$ exists, and the kernel $k$ is covariance-characteristic iff $k^2$ is mean-characteristic [9]. The divergence proposed in [7] is the distance between KCEs, and its estimator costs $O(n^3)$, due to the need to compute a full eigendecomposition of the empirical KCE in order to evaluate the norm. In contrast, KQE is an embedding of quantiles, so the relation to the KCE comes down to matching quantiles (which always exist, and come with an efficient estimator) versus matching the second moment in the infinite-dimensional RKHS (which may not exist, and requires an eigendecomposition).

The median embedding [8] of $P$ is the geometric median of $k(X, \cdot)$ in the RKHS, i.e. the RKHS element which, on average, is closest (in the $L^1$ sense) to the point $k(X, \cdot)$. In other words, it is the function $f \in \mathcal{H}$ that solves the $L^1$ problem
$$\min_{f \in \mathcal{H}} \int \left\| f - k(x, \cdot) \right\|_{\mathcal{H}} \, P(\mathrm{d}x).$$
The geometric median exists in any separable Hilbert space [10]. However, even for a finite set of points $P_n = \{x_1, \ldots, x_n\}$, this $L^1$ problem has no closed-form solution, and the median is typically approximated using iterative schemes such as Weiszfeld's algorithm; the estimator proposed in [8] has a computational complexity of $O(n^2)$. The property of being median-characteristic has, as far as the authors are aware, not been explored, and no theoretical guarantees are available. The connection to the 1D-projected quantiles used in KQE, and in particular to the 1D-projected median, is also unclear. Expanding the understanding of geometric median embeddings presents an exciting area for future research.

[7] Makigusa, N. (2024). Two-sample test based on maximum variance discrepancy. Communications in Statistics - Theory and Methods, 53(15), 5421-5438.
[8] Nienkötter, A., & Jiang, X. (2022). Kernel-based generalized median computation for consensus learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5), 5872-5888.
[9] Bach, F. (2022). Information theory with kernel methods. IEEE Transactions on Information Theory, 69(2), 752-775.
[10] Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli, 21(4), 2308-2335.
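For concreteness, below is a minimal sketch of a Weiszfeld-type iteration for the geometric median embedding, run entirely through the Gram matrix so that each iteration costs $O(n^2)$. This is only an illustration of the general idea discussed above, not the estimator of [8]; the function name, parameters, and stopping rule are our own assumptions.

```python
import numpy as np

def kernel_geometric_median(K, n_iter=100, tol=1e-8):
    """Illustrative Weiszfeld-type iteration (not the estimator of [8]).

    Approximates the geometric median of k(x_1, .), ..., k(x_n, .) in the
    RKHS, represented by coefficients w such that f = sum_i w_i k(x_i, .).
    Only the Gram matrix K (K[i, j] = k(x_i, x_j)) is required; each
    iteration costs O(n^2).
    """
    n = K.shape[0]
    w = np.full(n, 1.0 / n)          # initialise at the kernel mean embedding
    diag = np.diag(K)
    for _ in range(n_iter):
        Kw = K @ w
        # Squared RKHS distances ||f - k(x_i, .)||^2 for the current iterate f
        sq_dists = w @ Kw - 2.0 * Kw + diag
        dists = np.sqrt(np.maximum(sq_dists, 1e-12))
        # Weiszfeld update: reweight points inversely to their distance from f
        w_new = 1.0 / dists
        w_new /= w_new.sum()
        if np.linalg.norm(w_new - w, 1) < tol:
            w = w_new
            break
        w = w_new
    return w

# Toy usage with a Gaussian kernel
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
w_median = kernel_geometric_median(K)
```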