DOI: 10.1145/3318464.3389704
research-article

Learning a Partitioning Advisor for Cloud Databases

Published: 31 May 2020

ABSTRACT

Cloud vendors provide ready-to-use distributed DBMS solutions as a service. While the provisioning of a DBMS is usually fully automated, customers typically still have to make important design decisions that were traditionally made by the database administrator, such as finding an optimal partitioning scheme for a given database schema and workload. In this paper, we introduce a new learned partitioning advisor based on Deep Reinforcement Learning (DRL) for OLAP-style workloads. The main idea is that a DRL agent learns the cost tradeoffs of different partitioning schemes and can thus automate the partitioning decision. In the evaluation, we show that our advisor is able to find non-trivial partitionings for a wide range of workloads and outperforms more classical approaches for automated partitioning design.
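
To make the main idea concrete, the following is a minimal sketch of an agent that learns cost tradeoffs between partitioning schemes by trial and error. It is not the paper's method: it swaps the deep neural network for tabular Q-learning, and the schema, table sizes, candidate keys, join workload, and cost function are all hypothetical values chosen for illustration. The state is the current partitioning key of each table, an action repartitions one table, and the reward is the resulting reduction in a toy shuffle cost.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical toy schema: table sizes (arbitrary units) and candidate keys.
SIZES = {"lineorder": 100, "customer": 10, "part": 20}
KEYS = {
    "lineorder": ("orderkey", "custkey", "partkey"),
    "customer": ("nationkey", "custkey"),
    "part": ("brand", "partkey"),
}
# Hypothetical workload: equi-joins as (left table, right table, join column).
JOINS = [("lineorder", "customer", "custkey"), ("lineorder", "part", "partkey")]
TABLES = tuple(KEYS)
ACTIONS = [(t, k) for t in TABLES for k in KEYS[t]]


def cost(state):
    """Toy cost model: a join that is not co-partitioned shuffles its smaller table."""
    scheme = dict(zip(TABLES, state))
    total = 0
    for left, right, col in JOINS:
        if not (scheme[left] == col and scheme[right] == col):
            total += min(SIZES[left], SIZES[right])
    return total


def step(state, action):
    """Apply 'partition table by key'; the reward is the cost reduction achieved."""
    table, key = action
    nxt = tuple(key if t == table else k for t, k in zip(TABLES, state))
    return nxt, cost(state) - cost(nxt)


def train(episodes=5000, steps=6, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration from random start schemes."""
    rng = random.Random(seed)
    q = defaultdict(float)
    states = list(itertools.product(*(KEYS[t] for t in TABLES)))
    for _ in range(episodes):
        state = rng.choice(states)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def advise(q, start, steps=6):
    """Follow the greedy policy and return the cheapest scheme it visits."""
    state = best = start
    for _ in range(steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state = step(state, action)[0]
        if cost(state) < cost(best):
            best = state
    return best


start = ("orderkey", "nationkey", "brand")
q_table = train()
recommended = advise(q_table, start)
print(dict(zip(TABLES, recommended)), "cost:", cost(recommended), "vs initial:", cost(start))
```

Under this toy cost model the fact table can be co-partitioned with only one of its join partners, so the agent has to learn that aligning `lineorder` with the larger `part` table saves more shuffle cost than aligning it with `customer`. A real advisor would replace `cost` with estimates or measurements from the target DBMS and the lookup table with a learned function approximator.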

Supplemental Material

3318464.3389704.mp4 (mp4, 143.9 MB)


Published in

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
