David Bermbach

Benchmarking Eventually Consistent Distributed Storage Systems

Dissertation approved by the Fakultät für Wirtschaftswissenschaften of the Karlsruher Institut für Technologie (KIT) for obtaining the academic degree of Doktor der Ingenieurwissenschaften (Dr.-Ing.)

by Dipl.-Wi.-Ing. David Bermbach

Date of the oral examination: 10 February 2014
Referent (advisor): Prof. Dr.-Ing. Stefan Tai
Korreferent (co-advisor): Prof. Dr. rer. pol. Hans-Arno Jacobsen

Karlsruhe, 2014

Print on Demand 2014
ISBN 978-3-7315-0186-2
DOI: 10.5445/KSP/1000039389

This document – excluding the cover – is licensed under the Creative Commons Attribution-Share Alike 3.0 DE License (CC BY-SA 3.0 DE): http://creativecommons.org/licenses/by-sa/3.0/de/
The cover page is licensed under the Creative Commons Attribution-No Derivatives 3.0 DE License (CC BY-ND 3.0 DE): http://creativecommons.org/licenses/by-nd/3.0/de/

Imprint (Impressum)
Karlsruher Institut für Technologie (KIT)
KIT Scientific Publishing
Straße am Forum 2
D-76131 Karlsruhe
KIT Scientific Publishing is a registered trademark of Karlsruhe Institute of Technology. Reprint using the book cover is not allowed.
www.ksp.kit.edu

Abstract

Cloud storage services and NoSQL systems, which have recently found widespread adoption, typically offer only "Eventual Consistency", a rather weak guarantee covering a broad range of potential data consistency behavior. The degree of actual (in-)consistency as a service quality, however, is always unknown. To avoid opportunity costs or actual costs, resulting data inconsistencies have to be resolved within the application layer. Without detailed knowledge of consistency behavior, though, inconsistency handling is inefficient and, for some kinds of inconsistency, outright impossible.

Furthermore, due to the way consistency behavior impacts applications, consistency as a system quality should also be considered when selecting cloud storage offerings and NoSQL systems and when optimizing their deployment. This, as well as studying the impact of system design decisions on consistency behavior, requires means to analyze the consistency behavior of eventually consistent storage systems.

In this work, we present four main contributions to address the problems outlined above:

First, we develop novel consistency metrics which describe consistency behavior for all kinds of consistency, in a precise way, without needless aggregation, and in a way that is meaningful to application and storage system developers as well as systems researchers.

Second, we identify key influence factors on consistency behavior and combine them into a model of a storage system. We then present two distinct approaches that predict consistency behavior based on simulations on top of this model.

Third, we also present a set of system benchmarking approaches to accurately determine the consistency behavior of eventually consistent distributed storage systems via experiments with actually deployed systems. Results of both simulation and system benchmarking are expressed using our novel set of consistency metrics.
Fourth, building on 15 extensive experiments with actual systems and a multitude of simulation runs, we demonstrate how inconsistencies can be handled more efficiently by leveraging these results. For this purpose, we describe, based on a use case, how inconsistencies can be resolved in application engineering. We also develop a new middleware-based approach which externally adds consistency guarantees to the eventually consistent storage system, thus alleviating complexity for application developers.

Acknowledgements

A dissertation is never a product of solitary work but builds on the work of others and is heavily influenced by fellow researchers. As such, this thesis is also a team effort, and I would like to use this space to thank everyone who helped and influenced me while creating this work.

My foremost thanks go to my PhD advisor, Professor Dr. Stefan Tai, who always supported my ideas and continuously offered valuable advice. He was also the person who inspired me to explore the area of distributed systems in the first place. Without him, I would probably never have started working in this field.

I would also like to extend my sincerest thanks to my co-advisor, Professor Dr. Hans-Arno Jacobsen, who offered a lot of much-appreciated feedback on my thesis. His advice certainly helped to improve the quality of this work.

The remaining members of my thesis committee, Professor Dr. Andreas Oberweis and Professor Dr. Frank Schultmann, also deserve my gratitude: Thank you for a professional and fair thesis defense. I actually enjoyed the challenge of defending and discussing my work with you and my two advisors.

Beyond these four, I would also like to thank my colleagues who offered valuable feedback on my work during discussions over coffee breaks or during our group retreats and who also participated as co-authors of my papers: Bugra Derre, Robin Fischer, Dr. Christian Janiesch, Dr. Gregory Katsaros, Markus Klems, Tilmann Kopp, Jörn Kuhlenkamp, Alexander Lenk, Michael Menzel, David Müller, Steffen Müller, Professor Dr. Frank Pallas, Dr. Nelly Schuster, Dr. Ulrich Scholten, Raphael Stein, Dr. Erik Wittern and Dr. Christian Zirpins. Thank you also for a good time over the last three years! Further thanks go to Rita Schmidt for professionally and patiently managing all the administrative hassle for our group.

I also want to express my gratitude to the co-authors of the papers this dissertation is based on: working with you has certainly helped me develop the ideas of this thesis. Beyond the people already named, these are Dr. Sherif Sakr and Dr. Liang Zhao.

Furthermore, I would like to thank my family and friends who have supported me and who have been patient when I did not have enough time for them. I would also like to express particular gratitude to my father, Professor Dr. Rainer Bermbach, for proofreading this thesis – of course, any mistakes left are still mine. Finally, I would like to thank anyone who in one way or another helped me with finishing this dissertation, be they conference participants, anonymous reviewers, or someone I have forgotten to mention.

Karlsruhe, 2014
David Bermbach

Contents

I Foundations
1 Introduction
  1.1 Problem Statement
  1.2 Contributions
    1.2.1 Meaningful Consistency Metrics
    1.2.2 Modeling and Simulation of Consistency Behavior
    1.2.3 System Benchmarking of Consistency Behavior
    1.2.4 Inconsistency Handling
  1.3 Organization of this Thesis
2 Background
  2.1 Consistency Definitions
    2.1.1 Database Systems and Transactions
    2.1.2 Distributed Systems
  2.2 Consistency Perspectives, Dimensions and Models
    2.2.1 Consistency Perspectives
    2.2.2 Consistency Dimensions
    2.2.3 Consistency Models and Implementations
  2.3 Consistency Trade-offs
    2.3.1 CAP Theorem
    2.3.2 PACELC Model
    2.3.3 Indirect Trade-offs
    2.3.4 BASE
  2.4 Exemplary Storage Systems
    2.4.1 Google File System
    2.4.2 Google Bigtable
    2.4.3 Amazon Dynamo
    2.4.4 Yahoo! PNUTS
    2.4.5 Google Megastore and Spanner
    2.4.6 Further NoSQL Systems
  2.5 Failures and Fault Tolerance
    2.5.1 Failure Types
    2.5.2 Failures and Consistency
    2.5.3 Fault Tolerance
  2.6 Conclusion
3 Related Work
  3.1 Modeling and Simulation of Software Quality
  3.2 System Benchmarking of Distributed Storage Systems
  3.3 Management of Consistency Guarantees
II Consistency Benchmarking
4 Consistency Metrics
  4.1 Requirements for Consistency Metrics
  4.2 Data-centric Consistency Metrics
    4.2.1 Consistency Anomalies
    4.2.2 Atomicity, Regularity, Safeness
    4.2.3 Data-centric t-Visibility, k-Staleness
    4.2.4 Ordering Violations
  4.3 Client-centric Consistency Metrics
    4.3.1 Client-centric t-Visibility, k-Staleness
    4.3.2 Ordering Violations
  4.4 Conclusion and Discussion
5 Modeling and Simulation of Consistency Behavior
  5.1 Assumptions
  5.2 Model
    5.2.1 Basic System Model
    5.2.2 Interaction Model
    5.2.3 Failure Model
  5.3 Simulation
    5.3.1 Calculating Convolutions
    5.3.2 Monte Carlo Simulation
    5.3.3 Simulation Input Data
  5.4 Conclusion
6 System Benchmarking for Consistency Behavior
  6.1 Challenges
  6.2 Data-centric Consistency
  6.3 Client-centric Consistency
    6.3.1 t-Visibility and k-Staleness
    6.3.2 Violations of Monotonic Read Consistency
    6.3.3 Violations of Read Your Writes Consistency
    6.3.4 Violations of Monotonic Write Consistency
    6.3.5 Violations of Write Follows Read Consistency
  6.4 Conclusion
III Application
7 Implementation
  7.1 Modeling and Simulation
    7.1.1 Data Gathering Tools
    7.1.2 Simulation Tools
  7.2 System Benchmarking
    7.2.1 RYWC Measurements
    7.2.2 MWC Measurements
    7.2.3 Staleness and MRC Measurements
    7.2.4 Comprehensive System Benchmarking
  7.3 Running Consistency Benchmarks
    7.3.1 Modeling and Simulation
    7.3.2 System Benchmarking
  7.4 Discussion and Conclusion
8 Evaluation
  8.1 Modeling and Simulation
    8.1.1 MiniStorage
    8.1.2 Test Setup
    8.1.3 Results
  8.2 System Benchmarking
    8.2.1 Data-centric and Client-centric Staleness
    8.2.2 Long-term Study with Amazon S3
    8.2.3 Geo-replication and Parallel Workloads
  8.3 Conclusion
9 Application Engineering
  9.1 Handling Inconsistencies in a Webshop Scenario
    9.1.1 Scenario Description
    9.1.2 Potential Conflicts and Resolution Mechanisms
  9.2 A Middleware Guaranteeing Client-centric Consistency
    9.2.1 Overhead and Intended Use Case
    9.2.2 Handling Sessions
    9.2.3 Consistency Guarantees
    9.2.4 Implementation
    9.2.5 Evaluation
  9.3 Efficient Inconsistency Handling
    9.3.1 Modifications to Increase Efficiency
    9.3.2 Extensions for Additional Guarantees
  9.4 Conclusion
IV Conclusions
10 Summary
11 Discussion and Outlook
References
List of Abbreviations
List of Figures
List of Tables
Index

Part I. Foundations