ABSTRACT
Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent nodes. We seek to automate the process of data partitioning. Given a workload of SQL statements, we seek to determine automatically how to partition the base data across multiple nodes to achieve overall optimal (or close to optimal) performance for that workload. Previous attempts use heuristic rules to make those decisions. These approaches fail to consider all of the interdependent aspects of query performance typically modeled by today's sophisticated query optimizers.We present a comprehensive solution to the problem that has been tightly integrated with the optimizer of a commercial shared-nothing parallel database system. Our approach uses the query optimizer itself both to recommend candidate partitions for each table that will benefit each query in the workload, and to evaluate various combinations of these candidates. We compare a rank-based enumeration method with a random-based one. Our experimental results show that the former is more effective.
- {ACN00} Sanjay Agrawal, et al. Automated selection of materialized views and indexes in SQL databases. In Proceedings of VLDB, 2000. Google ScholarDigital Library
- {BFG+95} C. Baru, et al. DB2 parallel edition database systems: The future of high performance database systems. IBM Systems Journal, 34(2), 1995. Google ScholarDigital Library
- {CABK88} G. Copeland, et al. Data placement in Bubba. In Proceedings of the ACM SIGMOD Conference, pages 99-108, 1988. Google ScholarDigital Library
- {Car75} A. Cardenas. Analysis and performance of inverted data base structures. Communications of ACM, 18(5):253-263, 1975. Google ScholarDigital Library
- {CN98} Surajit Chaudhuri and Vivek R. Narasayya. Microsoft index tuning wizard for SQL server 7.0. In Proceedings SIGMOD, 1998. Google ScholarDigital Library
- {CNW83} Stefano Ceri, et al. Distribution design of logical database schemas. TSE, 9(4), 1983.Google Scholar
- {Cor00a} IBM Corporation. DB2 Universal Database enterprise extended edition Version 7.0. 2000.Google Scholar
- {Cor00b} Informix Corp. http://www.informix.com/informix/solutions/dw/redbrick/ vista. 2000.Google Scholar
- {Cor00c} Oracle Corporation. Oracle 9i database. 2000.Google Scholar
- {DG92} David DeWitt and Jim Gray. Parallel database systems: The future of high performance database systems. Communications of ACM, 35(6), 1992. Google ScholarDigital Library
- {FST88} S. Finkelstein, et al. Physical database design for relational databases. ACM Transactions of Database Systems, 13(1), 1988. Google ScholarDigital Library
- {Gha90} S. Ghandeharizadeh. Physical Database Design in Multi-processor Systems. PhD thesis, University of Wisconsin-Madison, 1990. Google ScholarDigital Library
- {GLSW93} Peter Gassner, et al. Query Optimization in the DB2 Family. Bulletin of the IEEE Technical Committee on Data Engineering, 16(4), 1993.Google Scholar
- {Gol89} David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company, INC, 1989. Google ScholarDigital Library
- {HLL94} Kien A. Hua, et al. A decomposition-based simulated annealing technique for data clustering. In Proceedings of PODS, 1994. Google ScholarDigital Library
- {IK91} Yannis E. Ioannidis and Younkyung Cha Kang. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization. In Proceedings of SIGMOD, 1991. Google ScholarDigital Library
- {KGV83} S. Kirkpatrick, et al. Optimization by simulated annealing. Science, 220(4598), 1983.Google Scholar
- {OL90} Kiyoshi Ono and Guy M. Lohman. Measuring the complexity of join enumeration in query optimization. In Proceedings of VLDB, 1990. Google ScholarDigital Library
- {RM93} Erhand Rahm and Rober Marek. Analysis of dynamic load balancing strategies for parallel shared nothing database systems. In VLDB, 1993. Google ScholarDigital Library
- {SAC+79} Patricia G. Selinger, et al. Access path selection in a relational database management system. In Proceedings of SIGMOD, 1979. Google ScholarDigital Library
- {SMR00} Thomas Stöhr, et al. Multi-Dimensional Database Allocation for Parallel Data Warehouses. In Proceedings of VLDB, 2000. Google ScholarDigital Library
- {SW85} Domenico Sacca and Gio Wiederhold. Database partitioning in a cluster of processors. ACM Transactions of Database Systems, 10(1), 1985. Google ScholarDigital Library
- {TPC} TPC benchmark H (decision support) revision 1.1.0. http://www.tpc.org/.Google Scholar
- {VZZ+00} Gary Valentin, et al. DB2 Advisor: An optimizer smart enough to recommend its own indexes. In Proceedings of ICDE, 2000. Google ScholarDigital Library
- {WHMZ94} Gerhard Weikum, et al. The COMFORT automatic tuning project, invited project review. Information Systems, 19(5), 1994. Google ScholarDigital Library
- {Zil98} Daniel C. Zilio. Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems. PhD thesis, Dept. of Computer Science, University of Toronto, 1998.Google Scholar
Index Terms
- Automating physical database design in a parallel database
Recommendations
Options in physical database design
A cornerstone of modern database systems is physical data independence, i.e., the separation of a type and its associated operations from its physical representation in memory and on storage media. Users manipulate and query data at the logical level; ...
Comments