ABSTRACT
Cloud vendors provide ready-to-use distributed DBMS solutions as a service. While the provisioning of a DBMS is usually fully automated, customers typically still have to make important design decisions that were traditionally made by the database administrator, such as finding an optimal partitioning scheme for a given database schema and workload. In this paper, we introduce a new learned partitioning advisor based on Deep Reinforcement Learning (DRL) for OLAP-style workloads. The main idea is that a DRL agent learns the cost tradeoffs of different partitioning schemes and can thus automate the partitioning decision. In the evaluation, we show that our advisor finds non-trivial partitionings for a wide range of workloads and outperforms more classical approaches to automated partitioning design.
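The abstract does not spell out how the agent is built, so the following is only a rough illustration of the general idea of learning partitioning decisions from observed costs. It swaps in tabular Q-learning for a deep network and uses a made-up cost model (the summed frequency of joins that are not co-located). The schema, column names, join frequencies, and all function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy schema: candidate partitioning columns per table.
TABLES = {
    "orders": ["o_id", "o_custkey"],
    "customer": ["c_custkey", "c_nationkey"],
    "lineitem": ["l_orderkey", "l_partkey"],
}

# Hypothetical workload: (left table, left column, right table, right column, frequency).
JOINS = [
    ("orders", "o_custkey", "customer", "c_custkey", 1.0),
    ("lineitem", "l_orderkey", "orders", "o_id", 5.0),
]


def workload_cost(scheme):
    """Assumed cost model: total frequency of joins that are not co-located."""
    return sum(
        freq
        for lt, lc, rt, rc, freq in JOINS
        if not (scheme[lt] == lc and scheme[rt] == rc)
    )


def train(episodes=2000, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning: one episode assigns a partitioning column to every table."""
    rng = random.Random(seed)
    order = list(TABLES)
    q = {}  # (state, action) -> estimated return; unseen entries default to 0.0
    for _ in range(episodes):
        state = ()
        for i, table in enumerate(order):
            actions = TABLES[table]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))  # exploit
            next_state = state + (action,)
            if i == len(order) - 1:
                # Terminal step: the reward is the negative workload cost.
                target = -workload_cost(dict(zip(order, next_state)))
            else:
                # Intermediate step: no reward yet, bootstrap from the next table.
                target = max(
                    q.get((next_state, a), 0.0) for a in TABLES[order[i + 1]]
                )
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = next_state
    return q


def greedy_scheme(q):
    """Read the learned partitioning out of the Q-table by acting greedily."""
    state = ()
    for table in TABLES:
        action = max(TABLES[table], key=lambda a: q.get((state, a), 0.0))
        state += (action,)
    return dict(zip(TABLES, state))


if __name__ == "__main__":
    scheme = greedy_scheme(train())
    print(scheme, workload_cost(scheme))
```

In this toy workload the two joins compete for the partitioning column of `orders`, so the agent has to learn that co-locating the more frequent join is cheaper overall. A real advisor would replace the assumed cost model with measured or estimated query runtimes.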
Index Terms
- Learning a Partitioning Advisor for Cloud Databases