Learning a Partitioning Advisor for Cloud Databases

Authors:
Benjamin Hilprecht, TU Darmstadt, Darmstadt, Germany
Carsten Binnig, TU Darmstadt, Darmstadt, Germany
Uwe Röhm, The University of Sydney, Sydney, Australia

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, June 2020, Pages 143–157. https://doi.org/10.1145/3318464.3389704

Published: 31 May 2020

ABSTRACT

Cloud vendors provide ready-to-use distributed DBMS solutions as a service. While the provisioning of a DBMS is usually fully automated, customers typically still have to make important design decisions that were traditionally made by the database administrator, such as finding an optimal partitioning scheme for a given database schema and workload. In this paper, we introduce a new learned partitioning advisor based on Deep Reinforcement Learning (DRL) for OLAP-style workloads. The main idea is that a DRL agent learns the cost tradeoffs of different partitioning schemes and can thus automate the partitioning decision. In the evaluation, we show that our advisor is able to find non-trivial partitionings for a wide range of workloads and outperforms more classical approaches for automated partitioning design.
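
To make the main idea concrete, the following is a minimal sketch of an agent that learns partitioning decisions from observed costs. It is not the authors' implementation: it swaps the deep RL agent for plain tabular Q-learning, and the three-table schema, the table sizes, the join list, and the cost formula are all made-up assumptions chosen only so the example is small enough to run.

```python
import itertools
import random

# Hypothetical toy schema (an assumption, not from the paper): for each table,
# the allowed designs are "replicate" or hash-partitioning on a named key.
OPTIONS = {
    "customer": ["replicate", "custkey"],
    "orders": ["replicate", "custkey", "orderkey"],
    "lineitem": ["replicate", "orderkey"],
}
SIZES = {"customer": 1.0, "orders": 10.0, "lineitem": 40.0}  # made-up relative sizes
JOINS = [("customer", "orders", "custkey"), ("orders", "lineitem", "orderkey")]
TABLES = list(OPTIONS)
ACTIONS = [(t, o) for t in TABLES for o in OPTIONS[t]]


def cost(state):
    """Made-up workload cost; state is a tuple with one design option per table."""
    design = dict(zip(TABLES, state))
    # Replicated tables pay a storage/maintenance penalty.
    total = sum(0.5 * SIZES[t] for t in TABLES if design[t] == "replicate")
    for left, right, key in JOINS:
        # A join is local if one side is replicated or both are co-partitioned.
        local = (
            "replicate" in (design[left], design[right])
            or design[left] == design[right] == key
        )
        if not local:
            total += min(SIZES[left], SIZES[right])  # shuffle the smaller side
    return total


def step(state, action):
    """Apply (table, option) to the design; the reward is the negative cost."""
    table, option = action
    design = dict(zip(TABLES, state))
    design[table] = option
    nxt = tuple(design[t] for t in TABLES)
    return nxt, -cost(nxt)


def train(episodes=20000, steps=6, gamma=0.5, epsilon=0.5, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random start designs."""
    rng = random.Random(seed)
    states = list(itertools.product(*(OPTIONS[t] for t in TABLES)))
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(states)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            # The toy environment is deterministic, so a learning rate of 1.0 suffices.
            q[(state, action)] = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            state = nxt
    return q


def advise(q, start, steps=6):
    """Follow the greedy policy from a start design and return the final design."""
    state = start
    for _ in range(steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
    return dict(zip(TABLES, state))


if __name__ == "__main__":
    q_table = train()
    print(advise(q_table, ("replicate", "replicate", "replicate")))
```

Under this toy cost model the cheapest design replicates the small table and co-partitions the two large ones on their join key, which is the kind of tradeoff between replication and co-partitioning that the agent has to discover from rewards alone. A realistic advisor would replace the lookup table with a neural network and the cost function with estimated or measured query runtimes.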

Supplemental Material

3318464.3389704.mp4 (MP4, 143.9 MB)

Index Terms

Learning a Partitioning Advisor for Cloud Databases
1. Information systems
  1. Data management systems
    1. Database administration
      1. Autonomous database administration
    2. Database management system engines
      1. Parallel and distributed DBMSs

Published in
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)
Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)
Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
database management systems
database tuning
machine learning
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 50
- Total Downloads: 1,367
- Downloads (Last 12 months): 156
- Downloads (Last 6 weeks): 17