(Go: >> BACK << -|- >> HOME <<)

skip to main content
10.1145/564691.564757acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Automating physical database design in a parallel database

Published:03 June 2002Publication History

ABSTRACT

Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent nodes. We seek to automate the process of data partitioning. Given a workload of SQL statements, we seek to determine automatically how to partition the base data across multiple nodes to achieve overall optimal (or close to optimal) performance for that workload. Previous attempts use heuristic rules to make those decisions. These approaches fail to consider all of the interdependent aspects of query performance typically modeled by today's sophisticated query optimizers.We present a comprehensive solution to the problem that has been tightly integrated with the optimizer of a commercial shared-nothing parallel database system. Our approach uses the query optimizer itself both to recommend candidate partitions for each table that will benefit each query in the workload, and to evaluate various combinations of these candidates. We compare a rank-based enumeration method with a random-based one. Our experimental results show that the former is more effective.

References

  1. {ACN00} Sanjay Agrawal, et al. Automated selection of materialized views and indexes in SQL databases. In Proceedings of VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. {BFG+95} C. Baru, et al. DB2 parallel edition database systems: The future of high performance database systems. IBM Systems Journal, 34(2), 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. {CABK88} G. Copeland, et al. Data placement in Bubba. In Proceedings of the ACM SIGMOD Conference, pages 99-108, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. {Car75} A. Cardenas. Analysis and performance of inverted data base structures. Communications of ACM, 18(5):253-263, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. {CN98} Surajit Chaudhuri and Vivek R. Narasayya. Microsoft index tuning wizard for SQL server 7.0. In Proceedings SIGMOD, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. {CNW83} Stefano Ceri, et al. Distribution design of logical database schemas. TSE, 9(4), 1983.Google ScholarGoogle Scholar
  7. {Cor00a} IBM Corporation. DB2 Universal Database enterprise extended edition Version 7.0. 2000.Google ScholarGoogle Scholar
  8. {Cor00b} Informix Corp. http://www.informix.com/informix/solutions/dw/redbrick/ vista. 2000.Google ScholarGoogle Scholar
  9. {Cor00c} Oracle Corporation. Oracle 9i database. 2000.Google ScholarGoogle Scholar
  10. {DG92} David DeWitt and Jim Gray. Parallel database systems: The future of high performance database systems. Communications of ACM, 35(6), 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. {FST88} S. Finkelstein, et al. Physical database design for relational databases. ACM Transactions of Database Systems, 13(1), 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. {Gha90} S. Ghandeharizadeh. Physical Database Design in Multi-processor Systems. PhD thesis, University of Wisconsin-Madison, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. {GLSW93} Peter Gassner, et al. Query Optimization in the DB2 Family. Bulletin of the IEEE Technical Committee on Data Engineering, 16(4), 1993.Google ScholarGoogle Scholar
  14. {Gol89} David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company, INC, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. {HLL94} Kien A. Hua, et al. A decomposition-based simulated annealing technique for data clustering. In Proceedings of PODS, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. {IK91} Yannis E. Ioannidis and Younkyung Cha Kang. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization. In Proceedings of SIGMOD, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. {KGV83} S. Kirkpatrick, et al. Optimization by simulated annealing. Science, 220(4598), 1983.Google ScholarGoogle Scholar
  18. {OL90} Kiyoshi Ono and Guy M. Lohman. Measuring the complexity of join enumeration in query optimization. In Proceedings of VLDB, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. {RM93} Erhand Rahm and Rober Marek. Analysis of dynamic load balancing strategies for parallel shared nothing database systems. In VLDB, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. {SAC+79} Patricia G. Selinger, et al. Access path selection in a relational database management system. In Proceedings of SIGMOD, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. {SMR00} Thomas Stöhr, et al. Multi-Dimensional Database Allocation for Parallel Data Warehouses. In Proceedings of VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. {SW85} Domenico Sacca and Gio Wiederhold. Database partitioning in a cluster of processors. ACM Transactions of Database Systems, 10(1), 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. {TPC} TPC benchmark H (decision support) revision 1.1.0. http://www.tpc.org/.Google ScholarGoogle Scholar
  24. {VZZ+00} Gary Valentin, et al. DB2 Advisor: An optimizer smart enough to recommend its own indexes. In Proceedings of ICDE, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. {WHMZ94} Gerhard Weikum, et al. The COMFORT automatic tuning project, invited project review. Information Systems, 19(5), 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. {Zil98} Daniel C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, Dept. of Computer Science, University of Toronto, 1998.Google ScholarGoogle Scholar

Index Terms

  1. Automating physical database design in a parallel database

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data
                June 2002
                654 pages
                ISBN:1581134975
                DOI:10.1145/564691

                Copyright © 2002 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 3 June 2002

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                SIGMOD '02 Paper Acceptance Rate42of240submissions,18%Overall Acceptance Rate785of4,003submissions,20%

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader