The document proposes an extension to the M-tree family of index structures called M*-tree. M*-tree improves upon M-tree by maintaining a nearest-neighbor graph within each node. The nearest-neighbor graph stores, for each entry in a node, a reference and distance to its nearest neighbor among the other entries in that node. This additional structure allows for more efficient filtering of non-relevant subtrees during search queries through the use of "sacrifice pivots". The experiments showed that M*-tree can perform searches significantly faster than M-tree while keeping construction costs low.
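The per-node nearest-neighbor graph described above can be sketched in a few lines. This is a hedged illustration, not the paper's code: the list-of-tuples representation and the function name `node_nn_graph` are assumptions, and Euclidean distance stands in for whatever metric the tree indexes.

```python
from math import dist  # Euclidean distance, Python 3.8+

def node_nn_graph(entries):
    """For each entry in a node, record its nearest neighbor among the
    other entries in that node and the distance to it, as in the
    M*-tree's per-node NN graph (brute force, fine for small nodes)."""
    graph = {}
    for i, p in enumerate(entries):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(entries):
            if i != j:
                d = dist(p, q)
                if d < best_d:
                    best_j, best_d = j, d
        graph[i] = (best_j, best_d)
    return graph
```

For a node holding the points (0,0), (1,0), (5,0), the graph maps entry 0 to (1, 1.0), entry 1 to (0, 1.0), and entry 2 to (1, 4.0); these cached distances are what the search phase can reuse for filtering.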
Scalable and Adaptive Graph Querying with MapReduce (Kyong-Ha Lee)
This document summarizes a research paper that proposes a distributed graph querying algorithm called MR-Graph that employs MapReduce. MR-Graph uses a filter-and-verify scheme to first filter graphs based on contained features before verifying subgraph isomorphism. It also adaptively tunes the feature size at runtime by sampling data graphs to determine the most appropriate size. The experiments showed MR-Graph outperforms conventional algorithms in scalability and efficiency for processing multiple graph queries over massive datasets.
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar... (Kyong-Ha Lee)
This document proposes a new approach called SASUM for approximate subgraph matching in large graphs. Approximate subgraph matching allows missing edges in query matches, which is important for real-world graphs that may be incomplete. SASUM improves upon the basic approach of generating all possible query subgraphs and doing exact matching for each: it exploits the overlapping nature of query subgraphs to reduce the number that require costly exact matching. SASUM uses a lattice framework to identify sharing opportunities between query subgraphs. It generates small "base graphs" that are shared between queries and chooses a minimum set of these to match, from which it can derive matches for all queries. The approach outperforms the state-of-the-art by orders of magnitude.
Hex-Cell is an interconnection network with attractive features, such as the capability to embed topological structures like bus, ring, tree, and mesh topologies. In this paper, we present two algorithms for embedding bus and ring topologies onto the Hex-Cell interconnection network. We use three metrics to evaluate our proposed algorithms: dilation, congestion, and expansion. Our evaluation results show that the congestion of both proposed algorithms is equal to one, while the dilation is 2d-1 for the first algorithm and 1 for the second.
This document describes three approaches for indexing moving point data over time: the 3D R-tree, the HR-tree, and the 2+3 R-tree. An experiment was conducted to evaluate the storage space requirements and query performance of each approach. The results showed that while the HR-tree required the most storage space, its query processing costs were over 50% lower than the 3D R-tree and 2+3 R-tree. Compared to maintaining separate R-trees for each time state, the HR-tree also offered similar query performance while using around one third less storage space.
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval (IJECEIAES)
Data mining is an essential process for identifying patterns in large datasets through machine learning techniques and database systems. Clustering high dimensional data is becoming a very challenging process due to the curse of dimensionality, and existing methods have not improved space complexity or data retrieval performance. To overcome these limitations, a Spectral Clustering Based VP Tree Indexing Technique is introduced. The technique clusters and indexes densely populated high dimensional data points for effective data retrieval based on user queries. A Normalized Spectral Clustering Algorithm is used to group similar high dimensional data points. After that, a Vantage Point Tree is constructed to index the clustered data points with minimum space complexity. Finally, indexed data is retrieved in response to user queries using a Vantage Point Tree based Data Retrieval Algorithm, which helps improve the true positive rate with minimum retrieval time. Performance is measured in terms of space complexity, true positive rate, and data retrieval time on the El Nino weather datasets from the UCI Machine Learning Repository. Experimental results show that the proposed technique reduces space complexity by 33% and data retrieval time by 24% compared to state-of-the-art works.
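To illustrate the indexing half of the pipeline above, here is a minimal vantage-point tree: pick a vantage point, split the remaining points by the median distance to it, and prune subtrees at query time with the triangle inequality. This is a hedged sketch using Euclidean distance and a dict-based node layout, both assumptions of this example rather than the paper's implementation.

```python
from math import dist

def build_vp_tree(points):
    """Recursively build a VP-tree: the first point is the vantage point,
    the rest are split by the median distance mu to it."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return {"vp": vp, "mu": 0.0, "inside": None, "outside": None}
    dists = sorted(dist(vp, p) for p in rest)
    mu = dists[len(dists) // 2]  # median radius
    return {"vp": vp, "mu": mu,
            "inside": build_vp_tree([p for p in rest if dist(vp, p) < mu]),
            "outside": build_vp_tree([p for p in rest if dist(vp, p) >= mu])}

def range_query(node, q, r, out):
    """Collect all points within distance r of q; the triangle inequality
    lets us skip a branch when no point in it can be within r."""
    if node is None:
        return
    d = dist(node["vp"], q)
    if d <= r:
        out.append(node["vp"])
    if d - r < node["mu"]:   # inside branch may still contain matches
        range_query(node["inside"], q, r, out)
    if d + r >= node["mu"]:  # outside branch may still contain matches
        range_query(node["outside"], q, r, out)
```

A quick sanity check is to compare a range query against a brute-force scan over the same points; both should return the same set.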
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS (ijcses)
Nodes in a Mobile Ad-hoc Network are connected wirelessly, and the network is auto-configuring [1]. This paper introduces the usefulness of a data warehouse as an alternative for managing data collected by WSNs. Wireless Sensor Networks produce huge quantities of data that need to be processed and homogenized to help researchers and others interested in the information. Collected data is managed and compared with data coming from other sources and systems, and can contribute to technical reporting and decision making. This paper proposes a model to design, extract, transform, and normalize data collected by Wireless Sensor Networks by implementing a multidimensional warehouse for comparing many aspects of WSNs (such as routing protocol [4], sensors, sensor mobility, and clusters). Hence, a data warehouse defined and applied in this context is presented as a useful approach that gives specialists raw data and information for decision processes, allowing them to navigate from one aspect to another.
The document discusses representing relational spatiotemporal data using information granules. It proposes:
1) Describing the relational data using a vocabulary of granular descriptors formed from Cartesian products of spatial, temporal, and signal information granules. This granular representation provides an interpretable perspective on the data.
2) Analyzing the capabilities of different vocabularies to capture the essence of the data through the processes of granulation and degranulation, where the original data is reconstructed from its granular representation. The quality of reconstruction is used to optimize the vocabulary.
3) Extending the approach to analyze evolvability of the granular description as the relational data changes across consecutive
The document summarizes research on performing spatio-textual similarity joins. It discusses:
1) Developing a filter-and-refine framework to efficiently find similar object pairs from two datasets using signatures.
2) Generating spatial and textual signatures for objects and building inverted indexes on the signatures to find candidate pairs.
3) Refining the candidate pairs to obtain the final result pairs that satisfy spatial and textual similarity thresholds.
A fuzzy clustering algorithm for high dimensional streaming data (Alexander Decker)
This document summarizes a research paper that proposes a new dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD) for high-dimensional streaming data. The algorithm can cluster datasets that have both high dimensionality and a streaming (continuously arriving) nature. It combines previous work on clustering algorithms for streaming data and high-dimensional data. The paper introduces the algorithm and compares it experimentally to show improvements in memory usage and runtime over other approaches for these types of datasets.
Parallel KNN for Big Data using Adaptive Indexing (IRJET Journal)
This document presents an evaluation of different algorithms for performing parallel k-nearest neighbor (kNN) queries on big data using the MapReduce framework. It first discusses how kNN algorithms do not scale well for large datasets. It then reviews existing MapReduce-based kNN algorithms like H-BNLJ, H-zkNNJ, and RankReduce that improve performance by partitioning data and distributing computation. The document also proposes using an adaptive indexing technique with the RankReduce algorithm. An implementation of this approach on an airline on-time statistics dataset shows it achieves better precision and speed than the other algorithms.
AN EFFECT OF USING A STORAGE MEDIUM IN DIJKSTRA ALGORITHM PERFORMANCE FOR IDE... (ijcsit)
The graph model is widely used for representing connected objects within a specific area. These objects are defined as nodes, while connections are represented as arcs called edges. Finding the shortest path between two nodes is one of the problems that attracts the most attention from researchers. Many algorithms with different structural approaches have been developed to reduce the cost of computing the shortest path; the most widely used is Dijkstra's algorithm, which has been given various structural refinements for this purpose. This paper highlights the idea of using a storage medium to store the solution paths produced by Dijkstra's algorithm and then using them to find implicit paths at an ideal time cost. The performance of Dijkstra's algorithm is improved by using an appropriate data structure, and our results emphasize that the search time through the given data structure is reduced across different graph models.
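The idea of a storage medium for solved paths can be sketched as a cache in front of a standard Dijkstra search: the first query for a pair runs the algorithm, and repeated queries return the stored path immediately. The in-memory dict, the adjacency-dict graph format, and the class name `CachedShortestPaths` are assumptions of this sketch, not the paper's design.

```python
import heapq

def dijkstra_path(graph, src, dst):
    """Standard Dijkstra on an adjacency dict {node: {neighbor: weight}};
    returns (cost, path) from src to dst."""
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

class CachedShortestPaths:
    """Store solved (src, dst) paths so repeated queries skip the search,
    a minimal stand-in for the paper's storage medium."""
    def __init__(self, graph):
        self.graph = graph
        self.cache = {}
    def query(self, src, dst):
        if (src, dst) not in self.cache:
            self.cache[(src, dst)] = dijkstra_path(self.graph, src, dst)
        return self.cache[(src, dst)]
```

On the graph A-1-B, B-2-C, A-4-C, the first `query("A", "C")` computes (3, ["A", "B", "C"]); the second returns the same answer from the cache without touching the heap.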
Dynamic selection of cluster head in in networks for energy management (eSAT Journals)
Abstract: In this project, we present the Multipath Region Routing (MRR) protocol for energy conservation in Wireless Sensor Networks (WSNs). Large scale dense WSNs are used in different types of applications for accurate monitoring, and energy conservation is an important issue in such networks. To save energy, the Multipath Region Routing protocol balances energy consumption and sustains the network lifespan. With this method, energy dissipation is reduced because the cluster head collects data directly from the other nodes; hence, energy is preserved and the network lifetime is extended to a reasonable duration. Keywords: Clustering; Wireless Sensor Networks; Security; Multipath Region Routing.
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES... (ijdpsjournal)
This document summarizes a research paper that presents a task-decomposition based anomaly detection system for analyzing massive and highly volatile session data from the Science Information Network (SINET), Japan's academic backbone network. The system uses a master-worker design with dynamic task scheduling to process over 1 billion sessions per day. It discriminates incoming and outgoing traffic using GPU parallelization and generates histograms of traffic volumes over time. Long short-term memory (LSTM) neural networks detect anomalies like spikes in incoming traffic volumes. The experiment analyzed SINET data from February 27 to March 8, 2021, detecting some anomalies while processing 500-650 gigabytes of daily session data.
This document summarizes a research paper on developing an improved LEACH (Low-Energy Adaptive Clustering Hierarchy) communication protocol for energy efficient data mining in multi-feature sensor networks. It begins with background on wireless sensor networks and issues like energy efficiency. It then discusses the existing LEACH protocol and its drawbacks. The proposed improved LEACH protocol includes cluster heads, sub-cluster heads, and cluster nodes to address LEACH's limitations. This new version aims to minimize energy consumption during cluster formation and data aggregation in multi-feature sensor networks.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne... (Scientific Review)
The Radial Basis Probabilistic Neural Network (RBPNN) has a broad generalization capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in the RBPNN is extended by calculating its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as they are mapped into a high dimensional space. Comparing the four constructed classification models (Kernel RBPNN, Radial Basis Function networks, RBPNN, and Back-Propagation networks), results showed that classification of the Iris data with Kernel RBPNN displays outstanding performance.
The document discusses trees and tree data structures. It provides details on heap data structures and algorithms for inserting and deleting elements from a heap. It also covers B-trees, including their definition, properties, and algorithms for constructing, inserting, and removing elements from a B-tree. B-trees are designed to improve search efficiency for large datasets stored on disk compared to binary search trees.
The document describes the process of inserting a new entry into an M-tree, a data structure for indexing data in metric spaces. It involves the following steps:
1) Check if the target leaf node is full, if so split the node and redistribute entries.
2) If the target leaf is not full, insert the new entry directly into the leaf.
3) If the target node is not a leaf, recursively descend the tree to find the appropriate leaf node for insertion while splitting nodes as needed to maintain the tree structure.
The document describes m-way search trees, B-trees, heaps, and their related operations. An m-way search tree is a tree where each node has at most m child nodes and keys are arranged in ascending order. B-trees are similar but ensure the number of child nodes falls in a range and all leaf nodes are at the same depth. Common operations like searching, insertion, and deletion are explained for each with examples. Heaps store data in a complete binary tree structure where a node's value is greater than its children's values.
The document describes external sorting techniques used when data is too large to fit in main memory. It discusses two-way merge sorting, which uses two pairs of tape drives to alternately write and merge sorted runs. It also covers multi-way merging, which merges multiple runs simultaneously using a heap. These techniques can improve performance over standard internal sorting when data resides on external storage.
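The heap-based multi-way merge mentioned above can be sketched compactly: seed a min-heap with the first element of each sorted run, then repeatedly pop the smallest and push that run's next element. In a real external sort the runs would be files on disk; here they are in-memory lists for illustration (Python's standard library also offers `heapq.merge` for exactly this).

```python
import heapq

def multiway_merge(runs):
    """Merge any number of sorted runs using a min-heap of
    (value, run_index, position) entries, as in the merge phase
    of external merge sort."""
    merged = []
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    while heap:
        value, i, pos = heapq.heappop(heap)
        merged.append(value)
        if pos + 1 < len(runs[i]):  # advance within the run we popped from
            heapq.heappush(heap, (runs[i][pos + 1], i, pos + 1))
    return merged
```

Each element is pushed and popped once, so merging k runs of n total elements costs O(n log k) comparisons, which is why multi-way merging beats repeated two-way passes.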
1. A multi-way search tree allows nodes to have up to m children, where keys in each node are ordered and divide the search space.
2. B-trees are a generalization of binary search trees in which all leaves are at the same depth and internal nodes have at least ⌈m/2⌉ children.
3. Searching and inserting keys in a B-tree starts at the root and proceeds by comparing keys to guide traversal to the appropriate child node. Insertion may require splitting full nodes to balance the tree.
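The search procedure in point 3 can be sketched in a few lines. The dict-based node layout (a sorted `keys` list plus a `children` list with one more entry than keys) is an assumption of this example, not a prescribed B-tree representation.

```python
from bisect import bisect_left

def btree_search(node, key):
    """Search a B-tree: binary-search the node's sorted keys, then
    descend into the child positioned between the neighboring keys."""
    if node is None:
        return False
    i = bisect_left(node["keys"], key)
    if i < len(node["keys"]) and node["keys"][i] == key:
        return True
    children = node.get("children")
    if not children:  # reached a leaf without finding the key
        return False
    return btree_search(children[i], key)
```

For a root holding keys [10, 20] with three leaf children, searching for 17 compares against the root keys, descends into the middle child, and finds 17 there; searching for 11 ends at the same leaf without a match.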
1. The document discusses B-trees, which are tree data structures used to store data in external storage like disks.
2. B-trees allow for efficient retrieval of data by keeping the tree balanced and all leaves at the same depth. They support insertion and deletion of keys in logarithmic time.
3. The process of deleting a key from a B-tree involves searching for the key, deleting the entry, and reflowing nodes or combining nodes if underflows occur to maintain the B-tree properties.
This document discusses B-trees, which are self-balancing search trees used to store data in databases. It provides an example of inserting and deleting keys from a B-tree of order 5. The main points are:
- B-trees allow for efficient retrieval and insertion/deletion of data in databases by keeping the tree balanced.
- In a B-tree of order 5, each node can have up to 5 children (and thus up to 4 keys), and every non-root node must have at least 3 children (at least 2 keys).
- The example shows inserting and deleting keys from an empty B-tree, with some insertions requiring splits and some deletions leaving nodes with fewer than the minimum keys.
This document discusses B-trees and B+-trees, which are data structures used to store and retrieve data in databases. It provides details on the characteristics and functioning of B-trees and B+-trees, including how they are implemented, how nodes are structured, and how inserts and deletes are handled. Examples are given of building B-trees and B+-trees of order 5 through inserting values in order and deleting values.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model (Waqas Tariq)
A novel clustering algorithm CSHARP is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high dimensional feature spaces. It can be considered as a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered as dense regions/ blocks. These blocks are the seeds from which clusters may grow up. Therefore, CSHARP is not a point-to-point clustering algorithm. Rather, it is a block-to-block clustering technique. Much of its advantages come from these facts: Noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. This technique is not prone to merge clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low and high dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results as well as its time complexity, rank it at the front of these techniques.
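The block-forming idea behind CSHARP can be illustrated with a simplified version: compute each point's k nearest neighbors, link pairs that appear in each other's neighbor lists (mutual neighbors), and take the connected groups as blocks. This is a hedged sketch of the mutual-neighbor notion only, with brute-force k-NN and a small union-find, not the published algorithm.

```python
from math import dist

def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (brute force)."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: dist(points[i], points[j]))
    return set(others[:k])

def mutual_nn_blocks(points, k):
    """Group points connected by mutual k-NN links, a simplified stand-in
    for the dense blocks CSHARP grows clusters from."""
    neighbors = [knn(points, i, k) for i in range(len(points))]
    parent = list(range(len(points)))
    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(len(points)):
        for j in neighbors[i]:
            if i in neighbors[j]:     # mutual link: merge their blocks
                parent[find(i)] = find(j)
    blocks = {}
    for i in range(len(points)):
        blocks.setdefault(find(i), []).append(i)
    return list(blocks.values())
```

With two well-separated pairs of points and k = 1, each pair forms its own block, matching the intuition that noise points and outliers end up in small blocks.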
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N... (Scientific Review SR)
This document summarizes a study that evaluated the performance of a kernel radial basis probabilistic neural network (Kernel RBPNN) model for classifying iris data, compared to backpropagation, radial basis function, and radial basis probabilistic neural network models. The Kernel RBPNN model achieved the highest classification accuracy of 89.12% on test data from the iris dataset, performing better than the other models. It also had the fastest training time, being over 80 times faster than the radial basis function model. Analysis of the receiver operating characteristic curves showed that the Kernel RBPNN model had the largest area under the curve, indicating it had the best classification prediction capability out of the four models evaluated.
SUBGRAPH MATCHING WITH SET SIMILARITY IN A LARGE GRAPH DATABASE - IEEE PROJE... (Nexgen Technology)
This document discusses various algorithms used for clustering data streams. It begins by introducing the problem of clustering streaming data and the common approach of using micro-clusters to summarize streaming data. It then reviews several prominent clustering algorithms like DBSCAN, DENCLUE, SNN, and CHAMELEON. The document focuses on the DBSTREAM algorithm, which explicitly captures density between micro-clusters using a shared density graph to improve reclustering. Experimental results show DBSTREAM's reclustering using shared density outperforms other reclustering strategies while using fewer micro-clusters.
The document discusses employing a rough set based approach for clustering categorical time-evolving data. It proposes using node importance and a rough membership function to label unlabeled data points and detect concept drift between data clusters over time. Specifically, it defines key terms like nodes, introduces the problem of clustering categorical time-series data and concept drift detection. It then describes using a rough membership function to calculate similarity between unlabeled data and existing clusters in order to label the data and detect changes in cluster characteristics over time.
This document discusses using clustering algorithms to construct ontologies from text documents. It begins with an introduction to semantic search, ontologies in the semantic web, and clustering. It then describes the ROCK clustering algorithm in detail. The main tasks to perform are preprocessing text documents, normalizing term weights, applying latent semantic indexing via singular value decomposition, and using the ROCK clustering algorithm. The goal is to group similar documents into clusters to help construct an ontology from the unstructured text data.
International Journal of Engineering Research and Development (IJERD Editor)
This document summarizes two algorithms - MFA and ATRA - for processing top-k spatial preference queries. MFA is a threshold-based algorithm that partitions queries into three features - spatial, preference, and text - and retrieves objects with the highest aggregate scores. ATRA uses a hybrid indexing structure called AIR-tree to more efficiently retrieve only relevant objects without revisiting the same data. The paper then proposes using an R-tree index structure combined with an enhanced branch-and-bound search algorithm to answer preference-based top-k spatial keyword queries by ranking objects based on feature quality in their neighborhoods.
A Novel Algorithm for Design Tree Classification with PCA (Editor Jacotech)
This document summarizes a research paper titled "A Novel Algorithm for Design Tree Classification with PCA". It discusses dimensionality reduction techniques like principal component analysis (PCA) that can improve the efficiency of classification algorithms on high-dimensional data. PCA transforms data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate, called the first principal component. The paper proposes applying PCA and linear transformation on an original dataset before using a decision tree classification algorithm, in order to get better classification results.
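The PCA transformation described above (projecting data onto the directions of greatest variance before classification) can be sketched with a plain eigendecomposition of the covariance matrix. This is a generic PCA sketch, not the paper's exact pipeline, and the function name is an assumption.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project rows of X onto the n_components directions of greatest
    variance: center the data, take the covariance matrix's eigenvectors,
    and keep those with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # largest variance first
    components = eigvecs[:, order[:n_components]]
    return Xc @ components
```

By construction the first output column carries at least as much variance as the second, which is exactly the "greatest variance on the first coordinate" property the summary describes; a decision tree (or any classifier) would then be trained on the reduced matrix.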
This document presents an approach for clustering a mixed dataset containing both numeric and categorical attributes using an ART-2 neural network model. The dataset contains daily stock price data with 19 attributes describing comparisons between consecutive days. Clustering mixed datasets is challenging due to different attribute types. The ART-2 model is used to classify the dataset without requiring a distance function. Then an autoencoder model reduces the dimensionality to allow visual validation of the clusters. The results demonstrate the ART-2 model's ability to cluster complex, mixed datasets.
An Efficient Top-k Query Processing in Distributed Wireless Sensor Networks (IJMER)
Wireless Sensor Networks (WSNs) are usually defined as large-scale, ad-hoc, multi-hop, wireless, unpartitioned networks of homogeneous, small, static nodes deployed in an area of interest. Applications of sensor networks include monitoring volcano activity, building structures, or natural habitats. In this paper, we present the problem of processing probabilistic top-k queries in a distributed wireless sensor network. The basic difficulty is that no single method solves top-k query processing in general, because there are many types of top-k queries; the method has to be chosen based on the situation, the classification and type of database, and the query model. Here we develop three algorithms, namely sufficient set-based (SSB), necessary set-based (NSB), and boundary-based (BB), for inter-cluster query processing with bounded rounds of communication. Moreover, in response to dynamic changes of data distribution in the overall network, we develop an adaptive algorithm that dynamically switches among the three proposed algorithms to minimize the transmission cost.
The document proposes a strategy for clustering distributed databases using self-organizing maps (SOM) and K-means algorithms. The strategy applies SOM locally to each distributed data set to obtain representative subsets, then combines the results and applies SOM and K-means globally. Specifically, it performs local SOM clustering, sends representative data to a central site, applies SOM again on the combined data, then uses K-means on the unified map to produce the final clustering result.
This document outlines a course on programming and data structures in C. It discusses key concepts like abstract data types, asymptotic analysis, various data structures like arrays, stacks, queues, linked lists, trees, and graphs. It covers different algorithms for searching, sorting and indexing of data. The objectives are to learn a program-independent view of data structures and their usage in algorithms. Various data structures, their representations and associated operations are explained. Methods for analyzing algorithms to determine their time and space complexity are also presented.
Optimizing the Data Collection in Wireless Sensor Network (IRJET Journal)
This document discusses optimizing data collection in wireless sensor networks. It begins by introducing the concepts of wireless sensor networks and data collection trees. It then discusses using Breadth-First Search (BFS) for data collection and proposes a Parallel Data Collection in BFS (PDCBFS) approach. PDCBFS allows nodes to aggregate data from themselves and child nodes into a single packet to send to the parent node, reducing transfer time compared to individual packets in BFS. The document analyzes and compares the performance of BFS and PDCBFS in terms of data collected and delay required for collection.
Efficient Query Evaluation of Probabilistic Top-k Queries in Wireless Sensor ... (ijceronline)
The document summarizes research on efficient query evaluation methods for probabilistic top-k queries in wireless sensor networks. It proposes three algorithms (SSB, NSB, BB) that use the concepts of sufficient and necessary sets to prune data and reduce transmissions between clusters and the base station. It also develops an adaptive algorithm that dynamically switches between the three based on estimated costs. Experimental results show the algorithms outperform baselines and the adaptive approach achieves near-optimal performance under different conditions.
Machine Learning Algorithms for Image Classification of Hand Digits and Face ... (IRJET Journal)
This document discusses machine learning algorithms for image classification using five different classification schemes. It summarizes the mathematical models behind each classification algorithm, including Nearest Class Centroid classifier, Nearest Sub-Class Centroid classifier, k-Nearest Neighbor classifier, Perceptron trained using Backpropagation, and Perceptron trained using Mean Squared Error. It also describes two datasets used in the experiments - the MNIST dataset of handwritten digits and the ORL face recognition dataset. The performance of the five classification schemes are compared on these datasets.
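The simplest of the five schemes, the Nearest Class Centroid classifier, fits in a few lines: training computes one mean vector per class, and prediction assigns each sample to the class whose centroid is closest. A hedged sketch with assumed function names, using Euclidean distance as in the usual formulation.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest Class Centroid training: one mean vector per class label."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Assign each row of X to the class with the closest centroid."""
    classes = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1)
                      for c in classes])
    return [classes[i] for i in dists.argmin(axis=0)]
```

The Nearest Sub-Class Centroid variant mentioned in the summary generalizes this by keeping several centroids per class (e.g. from k-means within each class) and taking the minimum distance over all of them.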
IRJET- Review of Existing Methods in K-Means Clustering Algorithm (IRJET Journal)
The document reviews existing methods for the k-means clustering algorithm. It discusses how k-means clustering works and some of its limitations when dealing with large datasets, such as being dependent on the initial choice of centroids. It then proposes using Hadoop to overcome big data challenges and calculate preliminary centroids for k-means clustering in a distributed manner. Finally, it reviews different techniques that have been proposed in other research to improve k-means clustering, such as methods for selecting better initial centroids or determining the optimal number of clusters.
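The dependence on initial centroids that the review discusses is visible directly in Lloyd's algorithm: the loop only ever refines whatever starting centroids it is handed. A minimal sketch (the function name and the fixed iteration count are assumptions of this example):

```python
import numpy as np

def kmeans(X, centroids, iters=50):
    """Lloyd's k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points. The result
    depends entirely on the initial centroids passed in."""
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centroids)):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids
```

Initialization strategies such as those the review surveys (better seed selection, or Hadoop-computed preliminary centroids) all amount to choosing a better `centroids` argument before this loop runs.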
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo... (IOSR Journals)
This document discusses using k-means clustering to partition datasets that have been generated through horizontal aggregation of data from multiple database tables. It provides background on horizontal aggregation techniques like pivot tables and describes the k-means clustering algorithm. The algorithm is applied as an example to cluster a sample dataset into two groups. The document concludes that k-means clustering can effectively partition large datasets produced by horizontal aggregations to facilitate further data mining analysis.
This document provides a tutorial on pointers and arrays in C. It begins by explaining that a pointer is a variable that holds the address of another variable. This allows a pointer variable to indirectly "point to" another variable in memory. The document covers various uses of pointers, including with arrays, strings, structures, dynamic memory allocation, and functions. It provides many code examples to demonstrate how pointers work in practice.
The document summarizes the M-tree, a new access method for organizing and searching large datasets in metric spaces. The M-tree is a balanced tree that partitions objects based on their relative distances as measured by a distance function, with objects stored in fixed-size nodes. It can index objects using arbitrary distance functions as long as they satisfy the metric properties. The M-tree aims to reduce both the number of accessed nodes and distance computations needed for similarity queries, improving performance for CPU-intensive distance functions. Algorithms for range and k-nearest neighbor queries are described that leverage distance information stored in the M-tree to prune search spaces.
c language faq's contains the frequently asked questions in c language and useful for all c programmers. This is an pdf document of Sterve_summit who is one of the best writer for c language
The document defines the User Datagram Protocol (UDP) which provides a minimal datagram mode of communication using the Internet Protocol and allows applications to send messages with minimal overhead, though delivery is not guaranteed like with TCP. UDP uses port numbers and IP addresses in packet headers and optional checksums to provide some protection against misrouted packets. Major applications that use UDP include the Internet Name Server and Trivial File Transfer.
The document provides an overview of the Stream Control Transmission Protocol (SCTP). SCTP is a connection-oriented transport layer protocol that offers reliable data transfer over IP networks. It supports features like multihoming for network fault tolerance, multi-streaming to minimize delay, and congestion control. The document discusses SCTP's architecture, features, security mechanisms, and error handling. It is intended to help application developers write programs using SCTP socket APIs.
This ppt is html for beginners and html made easy for them to get the basic idea of html.
Html for beginners. A basic information of html for beginners. A more depth coverage of html and css will be covered in the future presentations. visit my sites http://technoexplore.blogspot.com and http://hotjobstuff.blogspot.com for some other important presentations.
Here are some interview tips for cracking the interview. During this recession period it is very important.
visit my sites http://technoexplore.blogspot.com and http://hotjobstuff.blogspot.com for some other important presentations.
queue datastructure made easy and datastructure explained in java. visit http://technoexplore.blogspot.com and http://hotjobstuff.blogspot.com for some other important presentations.
Html for beginners. A basic information of html for beginners. A more depth coverage of html and css will be covered in the future presentations. visit my sites http://technoexplore.blogspot.com and http://hotjobstuff.blogspot.com for some other important presentations.
This document introduces binary trees and provides a series of practice problems of increasing difficulty related to binary trees. It begins with an introduction to binary tree structure and terminology. It then provides 14 binary tree problems with descriptions and hints for solving each problem. The problems involve tasks like counting nodes, finding minimum/maximum values, printing trees in different orders, checking for paths with a given sum, and printing all root-to-leaf paths. Sample solutions are provided in subsequent sections in C/C++ and Java languages.
Prim's algorithm is a greedy algorithm used to find minimum spanning trees for weighted undirected graphs. It operates by building the spanning tree one vertex at a time, from an arbitrary starting vertex, at each step adding the minimum weight edge that connects the growing spanning tree to a vertex not yet included in the tree. The algorithm repeats until all vertices are added.
This document provides an introduction and 18 problems related to linked lists of increasing difficulty. It begins with a review of basic linked list code techniques, such as iterating through a list and adding/removing nodes. The problems cover a wide range of skills with pointers and complex algorithms. Though linked lists are not commonly used today, they are excellent for developing skills with complex pointer-based data structures and algorithms. The document provides solutions to all problems to help readers practice and learn.
The server containing programming assignments is located at 10.203.161.7 under the ~/CPP/ directory. The document outlines various C++ programming assignments divided across multiple sessions, including writing classes, operator overloading, inheritance, polymorphism, and more. Assignments involve concepts such as memory management, pass by reference, constructors/destructors, friend functions, and virtual functions.
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsMydbops
This presentation, delivered at the Postgres Bangalore (PGBLR) Meetup-2 on June 29th, 2024, dives deep into connection pooling for PostgreSQL databases. Aakash M, a PostgreSQL Tech Lead at Mydbops, explores the challenges of managing numerous connections and explains how connection pooling optimizes performance and resource utilization.
Key Takeaways:
* Understand why connection pooling is essential for high-traffic applications
* Explore various connection poolers available for PostgreSQL, including pgbouncer
* Learn the configuration options and functionalities of pgbouncer
* Discover best practices for monitoring and troubleshooting connection pooling setups
* Gain insights into real-world use cases and considerations for production environments
This presentation is ideal for:
* Database administrators (DBAs)
* Developers working with PostgreSQL
* DevOps engineers
* Anyone interested in optimizing PostgreSQL performance
Contact info@mydbops.com for PostgreSQL Managed, Consulting and Remote DBA Services
Performance Budgets for the Real World by Tammy EvertsScyllaDB
Performance budgets have been around for more than ten years. Over those years, we’ve learned a lot about what works, what doesn’t, and what we need to improve. In this session, Tammy revisits old assumptions about performance budgets and offers some new best practices. Topics include:
• Understanding performance budgets vs. performance goals
• Aligning budgets with user experience
• Pros and cons of Core Web Vitals
• How to stay on top of your budgets to fight regressions
Video traffic on the Internet is constantly growing; networked multimedia applications consume a predominant share of the available Internet bandwidth. A major technical breakthrough and enabler in multimedia systems research and of industrial networked multimedia services certainly was the HTTP Adaptive Streaming (HAS) technique. This resulted in the standardization of MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) which, together with HTTP Live Streaming (HLS), is widely used for multimedia delivery in today’s networks. Existing challenges in multimedia systems research deal with the trade-off between (i) the ever-increasing content complexity, (ii) various requirements with respect to time (most importantly, latency), and (iii) quality of experience (QoE). Optimizing towards one aspect usually negatively impacts at least one of the other two aspects if not both. This situation sets the stage for our research work in the ATHENA Christian Doppler (CD) Laboratory (Adaptive Streaming over HTTP and Emerging Networked Multimedia Services; https://athena.itec.aau.at/), jointly funded by public sources and industry. In this talk, we will present selected novel approaches and research results of the first year of the ATHENA CD Lab’s operation. We will highlight HAS-related research on (i) multimedia content provisioning (machine learning for video encoding); (ii) multimedia content delivery (support of edge processing and virtualized network functions for video networking); (iii) multimedia content consumption and end-to-end aspects (player-triggered segment retransmissions to improve video playout quality); and (iv) novel QoE investigations (adaptive point cloud streaming). We will also put the work into the context of international multimedia systems research.
Transcript: Details of description part II: Describing images in practice - T...BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and slides: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
Hire a private investigator to get cell phone recordsHackersList
Learn what private investigators can legally do to obtain cell phone records and track phones, plus ethical considerations and alternatives for addressing privacy concerns.
Details of description part II: Describing images in practice - Tech Forum 2024BookNet Canada
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and transcript: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
AC Atlassian Coimbatore Session Slides( 22/06/2024)apoorva2579
This is the combined Sessions of ACE Atlassian Coimbatore event happened on 22nd June 2024
The session order is as follows:
1.AI and future of help desk by Rajesh Shanmugam
2. Harnessing the power of GenAI for your business by Siddharth
3. Fallacies of GenAI by Raju Kandaswamy
this resume for sadika shaikh bca studentSadikaShaikh7
I am a dedicated BCA student with a strong foundation in web technologies, including PHP and MySQL. I have hands-on experience in Java and Python, and a solid understanding of data structures. My technical skills are complemented by my ability to learn quickly and adapt to new challenges in the ever-evolving field of computer science.
Navigating Post-Quantum Blockchain: Resilient Cryptography in Quantum Threatsanupriti
In the rapidly evolving landscape of blockchain technology, the advent of quantum computing poses unprecedented challenges to traditional cryptographic methods. As quantum computing capabilities advance, the vulnerabilities of current cryptographic standards become increasingly apparent.
This presentation, "Navigating Post-Quantum Blockchain: Resilient Cryptography in Quantum Threats," explores the intersection of blockchain technology and quantum computing. It delves into the urgent need for resilient cryptographic solutions that can withstand the computational power of quantum adversaries.
Key topics covered include:
An overview of quantum computing and its implications for blockchain security.
Current cryptographic standards and their vulnerabilities in the face of quantum threats.
Emerging post-quantum cryptographic algorithms and their applicability to blockchain systems.
Case studies and real-world implications of quantum-resistant blockchain implementations.
Strategies for integrating post-quantum cryptography into existing blockchain frameworks.
Join us as we navigate the complexities of securing blockchain networks in a quantum-enabled future. Gain insights into the latest advancements and best practices for safeguarding data integrity and privacy in the era of quantum threats.
Coordinate Systems in FME 101 - Webinar SlidesSafe Software
If you’ve ever had to analyze a map or GPS data, chances are you’ve encountered and even worked with coordinate systems. As historical data continually updates through GPS, understanding coordinate systems is increasingly crucial. However, not everyone knows why they exist or how to effectively use them for data-driven insights.
During this webinar, you’ll learn exactly what coordinate systems are and how you can use FME to maintain and transform your data’s coordinate systems in an easy-to-digest way, accurately representing the geographical space that it exists within. During this webinar, you will have the chance to:
- Enhance Your Understanding: Gain a clear overview of what coordinate systems are and their value
- Learn Practical Applications: Why we need datams and projections, plus units between coordinate systems
- Maximize with FME: Understand how FME handles coordinate systems, including a brief summary of the 3 main reprojectors
- Custom Coordinate Systems: Learn how to work with FME and coordinate systems beyond what is natively supported
- Look Ahead: Gain insights into where FME is headed with coordinate systems in the future
Don’t miss the opportunity to improve the value you receive from your coordinate system data, ultimately allowing you to streamline your data analysis and maximize your time. See you there!
What Not to Document and Why_ (North Bay Python 2024)Margaret Fero
We’re hopefully all on board with writing documentation for our projects. However, especially with the rise of supply-chain attacks, there are some aspects of our projects that we really shouldn’t document, and should instead remediate as vulnerabilities. If we do document these aspects of a project, it may help someone compromise the project itself or our users. In this talk, you will learn why some aspects of documentation may help attackers more than users, how to recognize those aspects in your own projects, and what to do when you encounter such an issue.
These are slides as presented at North Bay Python 2024, with one minor modification to add the URL of a tweet screenshotted in the presentation.
How Netflix Builds High Performance Applications at Global ScaleScyllaDB
We all want to build applications that are blazingly fast. We also want to scale them to users all over the world. Can the two happen together? Can users in the slowest of environments also get a fast experience? Learn how we do this at Netflix: how we understand every user's needs and preferences and build high performance applications that work for every user, every time.
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecJames Anderson
The lecture titled "Automating AppSec" delves into the critical challenges associated with manual application security (AppSec) processes and outlines strategic approaches for incorporating automation to enhance efficiency, accuracy, and scalability. The lecture is structured to highlight the inherent difficulties in traditional AppSec practices, emphasizing the labor-intensive triage of issues, the complexity of identifying responsible owners for security flaws, and the challenges of implementing security checks within CI/CD pipelines. Furthermore, it provides actionable insights on automating these processes to not only mitigate these pains but also to enable a more proactive and scalable security posture within development cycles.
The Pains of Manual AppSec:
This section will explore the time-consuming and error-prone nature of manually triaging security issues, including the difficulty of prioritizing vulnerabilities based on their actual risk to the organization. It will also discuss the challenges in determining ownership for remediation tasks, a process often complicated by cross-functional teams and microservices architectures. Additionally, the inefficiencies of manual checks within CI/CD gates will be examined, highlighting how they can delay deployments and introduce security risks.
Automating CI/CD Gates:
Here, the focus shifts to the automation of security within the CI/CD pipelines. The lecture will cover methods to seamlessly integrate security tools that automatically scan for vulnerabilities as part of the build process, thereby ensuring that security is a core component of the development lifecycle. Strategies for configuring automated gates that can block or flag builds based on the severity of detected issues will be discussed, ensuring that only secure code progresses through the pipeline.
Triaging Issues with Automation:
This segment addresses how automation can be leveraged to intelligently triage and prioritize security issues. It will cover technologies and methodologies for automatically assessing the context and potential impact of vulnerabilities, facilitating quicker and more accurate decision-making. The use of automated alerting and reporting mechanisms to ensure the right stakeholders are informed in a timely manner will also be discussed.
Identifying Ownership Automatically:
Automating the process of identifying who owns the responsibility for fixing specific security issues is critical for efficient remediation. This part of the lecture will explore tools and practices for mapping vulnerabilities to code owners, leveraging version control and project management tools.
Three Tips to Scale the Shift Left Program:
Finally, the lecture will offer three practical tips for organizations looking to scale their Shift Left security programs. These will include recommendations on fostering a security culture within development teams, employing DevSecOps principles to integrate security throughout the development
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Erasmo Purificato
Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)
AI_dev Europe 2024 - From OpenAI to Opensource AIRaphaël Semeteys
Navigating Between Commercial Ownership and Collaborative Openness
This presentation explores the evolution of generative AI, highlighting the trajectories of various models such as GPT-4, and examining the dynamics between commercial interests and the ethics of open collaboration. We offer an in-depth analysis of the levels of openness of different language models, assessing various components and aspects, and exploring how the (de)centralization of computing power and technology could shape the future of AI research and development. Additionally, we explore concrete examples like LLaMA and its descendants, as well as other open and collaborative projects, which illustrate the diversity and creativity in the field, while navigating the complex waters of intellectual property and licensing.
Improving the Performance of M-tree Family by Nearest-Neighbor Graphs

Tomáš Skopal, David Hoksza
Charles University in Prague, FMP, Department of Software Engineering
Malostranské nám. 25, 118 00 Prague, Czech Republic
{tomas.skopal, david.hoksza}@mff.cuni.cz
Abstract. The M-tree and its variants have been proved to provide efficient similarity search in database environments. In order to further improve their performance, in this paper we propose an extension of the M-tree family which makes use of nearest-neighbor (NN) graphs. Each tree node maintains its own NN-graph, a structure that stores for each node entry a reference (and distance) to its nearest neighbor, considering just the entries of the node. The NN-graph can be used to improve the filtering of non-relevant subtrees when searching (or inserting new data). The filtering is based on "sacrifices" – selected entries in the node serving as pivots to all entries being their reverse nearest neighbors (RNNs). We propose several heuristics for sacrifice selection, as well as modified insertion, range, and kNN query algorithms. The experiments have shown that the M-tree (and its variants) enhanced by NN-graphs can perform significantly faster, while keeping construction cheap.
1 Introduction

In multimedia retrieval, we usually need to retrieve objects based on similarity to a query object. In order to retrieve the relevant objects in an efficient (quick) way, similarity search applications often use metric access methods (MAMs) [15], where the similarity measure is modeled by a metric distance δ. The principle of all MAMs is to employ the triangle inequality, satisfied by any metric, in order to partition the dataset S among classes (organized in a metric index). When a query is processed, the metric index is used to quickly filter out the non-relevant objects of the dataset, so that the number of distance computations needed to answer the query is minimized.¹ In the last two decades, many MAMs have been developed, e.g., gh-tree, GNAT, (m)vp-tree, SAT, (L)AESA, D-index, and M-tree, to name a few (we refer to the monograph [15]). Among them, the M-tree (and its variants) has proved the most universal and suitable for practical database applications.

In this paper we propose an extension to the family of M-tree variants (implemented for M-tree and PM-tree), which is based on maintaining an additional structure in each tree node – the nearest-neighbor (NN) graph. We show that utilization of NN-graphs can speed up the search significantly.

¹ The metric is often supposed to be expensive, so the number of distance computations is regarded as the most expensive component of the overall runtime costs.
2 M-tree

The M-tree [7] is a dynamic (easily updatable) index structure that provides good performance in secondary memory (i.e., in database environments). The M-tree index is a hierarchical structure, where some of the data objects are selected as centers (references or local pivots) of ball-shaped regions, and the remaining objects are partitioned among the regions in order to build up a balanced and compact hierarchy of data regions, see Figure 1a. Each region (subtree) is indexed recursively in a B-tree-like (bottom-up) way of construction.

The inner nodes of the M-tree store routing entries

    rout(Oi) = [Oi, rOi, δ(Oi, Par(Oi)), ptr(T(Oi))]

where Oi ∈ S is a data object representing the center of the respective ball region, rOi is the covering radius of the ball, δ(Oi, Par(Oi)) is the so-called to-parent distance (the distance from Oi to the object of the parent routing entry), and finally ptr(T(Oi)) is a pointer to the entry's subtree. The data is stored in the leaves of the M-tree. Each leaf contains ground entries

    grnd(Oi) = [Oi, δ(Oi, Par(Oi))]

where Oi ∈ S is the indexed data object itself, and δ(Oi, Par(Oi)) is, again, the to-parent distance. See an example of routing and ground entries in Figure 1a.

Fig. 1. (a) Example of an M-tree (b) Basic filtering (c) Parent filtering
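As a rough illustration of the two entry types, they can be modeled as plain records; note that the field names (`obj`, `radius`, `to_parent`, `subtree`) are our own, not taken from the paper:

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class GroundEntry:
    obj: object        # the indexed data object O_i itself
    to_parent: float   # precomputed to-parent distance delta(O_i, Par(O_i))

@dataclass
class RoutingEntry:
    obj: object        # center O_i of the ball region
    radius: float      # covering radius r_Oi
    to_parent: float   # to-parent distance delta(O_i, Par(O_i))
    subtree: "Node"    # pointer ptr(T(O_i)) to the child node

@dataclass
class Node:
    is_leaf: bool
    entries: List[Union[RoutingEntry, GroundEntry]]
```

Inner nodes hold routing entries; leaves hold ground entries, which carry no radius or subtree pointer.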
2.1 Query processing

The range and k-nearest-neighbor (kNN) queries are implemented by traversing the tree, starting from the root.² A node is accessed if its parent region (R, rR) (described by the routing entry) is overlapped by the query ball (Q, rQ). In the case of a kNN query (we search for the k closest objects to Q), the radius rQ is not known in advance, so we additionally have to employ a heuristic that dynamically decreases the radius during the search (initially set to ∞).

² We just outline the main principles; the algorithms are described in Section 4, including the proposed extensions. The original M-tree algorithms can be found in [7].
Basic filtering. The check for region-and-query overlap requires an explicit distance computation δ(R, Q), see Figure 1b. In particular, if δ(R, Q) ≤ rQ + rR, the data ball R overlaps the query ball, and thus the child node has to be accessed. If not, the respective subtree is filtered out from further processing.

Parent filtering. Since each node in the tree stores the distances from its routing/ground entries to the center of its parent node, some of the non-relevant M-tree branches can be filtered out without the need for a distance computation, thus avoiding the "more expensive" basic overlap check (see Figure 1c). In particular, if |δ(P, Q) − δ(P, R)| > rQ + rR, the data ball R cannot overlap the query ball, and thus the child node need not be checked by basic filtering. Note that δ(P, Q) was already computed during the parent's (unsuccessful) basic filtering.
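To make the two checks concrete, here is a minimal sketch of our own (with Euclidean distance standing in for an arbitrary metric δ). Parent filtering is tried first because it reuses the already-known distances δ(P, Q) and δ(P, R); only if it fails do we pay for the distance computation that basic filtering requires.

```python
import math

def dist(a, b):
    # Euclidean distance as a stand-in for an arbitrary metric delta
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def parent_filter(d_PQ, d_PR, r_R, r_Q):
    # Prune "for free": |delta(P,Q) - delta(P,R)| lower-bounds delta(R,Q)
    # by the triangle inequality, and both distances are already known.
    return abs(d_PQ - d_PR) > r_Q + r_R

def basic_filter(R, Q, r_R, r_Q):
    # Explicit overlap check: costs one distance computation delta(R, Q).
    return dist(R, Q) > r_Q + r_R
```

A subtree is pruned as soon as either check succeeds; e.g., with P = (0, 0), Q = (10, 0), R = (1, 0), rR = 1, rQ = 2, parent filtering already prunes the subtree, since |10 − 1| = 9 > 3.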
2.2 M-tree Construction

Starting at the root, a new object Oi is recursively inserted into the best subtree T(Oj), which is defined as the one whose covering radius rOj must increase the least in order to cover the new object. In case of ties, the subtree whose center is closest to Oi is selected. The insertion algorithm proceeds recursively until a leaf is reached, and Oi is inserted into that leaf. A node overflow is managed in a similar way as in the B-tree: two objects from the overflowed node are selected as new centers, the node is split, and the two new centers (forming two routing entries) are promoted to the parent node. If the parent overflows, the procedure is repeated. If the root overflows, it is split and a new root is created.
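The best-subtree choice described above can be sketched as follows (our own brute-force illustration; routing entries are represented as hypothetical (center, radius) pairs):

```python
def choose_subtree(routing_entries, o_new, dist):
    """Pick the subtree for a new object: the routing entry whose covering
    radius must grow the least; ties are broken by the closest center."""
    best_entry, best_key = None, None
    for center, radius in routing_entries:
        d = dist(center, o_new)
        growth = max(0.0, d - radius)   # required increase of the covering radius
        key = (growth, d)               # lexicographic: growth first, then distance
        if best_key is None or key < best_key:
            best_entry, best_key = (center, radius), key
    return best_entry
```

The lexicographic key encodes both rules at once: zero growth (the object already fits inside the ball) always wins, and among equal growths the closest center is preferred.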
3 Related work

In the last decade, several modifications/successors of the M-tree have been introduced: some improve the performance of the M-tree, some others improve the performance but restrict the general metric case to vector spaces, and, finally, some adjust the M-tree for an extended querying model.

In the first case, the Slim-tree [14] introduced two features. First, a new policy of node splitting using the minimum spanning tree was proposed to reduce internal CPU costs during insertion (but not the number of distance computations). The second feature was the slim-down algorithm, a post-processing technique for redistribution of ground entries in order to reduce the volume of bottom-level data regions. Another paper [12] extended the slim-down algorithm to the general case, where the routing entries are redistributed as well. The authors of [12] also introduced the multi-way insertion of a new object, where an optimal leaf node for the object is found. A major improvement in search efficiency has been achieved by the proposal of the PM-tree [13, 10] (see Section 5.1), where the M-tree was combined with pivot-based techniques (like LAESA).

The second case is represented by the M+-tree [16], where the nodes are further partitioned by a hyper-plane (where a key dimension is used to isometrically partition the space). Because of the hyper-plane partitioning, the usage of the M+-tree is limited to Euclidean vector spaces. The BM+-tree [17] is a generalization of the M+-tree where the hyper-plane can be rotated, i.e., it need not be parallel to the coordinate axes (the restriction to Euclidean spaces remains).

The last category is represented by the M2-tree [5], where the M-tree is generalized to be used with an arbitrary aggregation of multiple metrics. Another structure is the M3-tree [3], a similar generalization of the M-tree as the M2-tree, but restricted just to linear combinations of partial metrics, which allows the lower/upper bounds to be computed effectively when using query-weighted distance functions. The last M-tree modification of this kind is the QIC-M-tree [6], where a user-defined query distance (even a non-metric one) is supported, provided a lower-bounding metric to the query distance (needed for indexing) is given by the user.

In this paper we propose an extension belonging to the first category (i.e., improving query performance in the general case), but one which could be utilized in all the M-tree modifications (the M-tree family) overviewed in this section.
4 M*-tree

We propose the M*-tree, an extension of the M-tree in which each node is additionally equipped with a nearest-neighbor graph (NN-graph). The motivation for employing NN-graphs is related to the advantages of methods using global pivots (objects to which all the objects in the dataset are referenced). Usually, the global pivots are used to filter out some non-relevant objects from the dataset (e.g., in LAESA [9]). In general, global pivots are good if they are far from each other and provide different "viewpoints" with respect to the dataset [2]. The M-tree, on the other hand, is an example of a method using local pivots, where the local pivots are the parent routing entries of nodes. As a hybrid structure, the PM-tree combines the "local-pivoting strategies" of the M-tree with the "global-pivoting strategies" of LAESA.

We can also state a "closeness criterion" for good pivots, as follows. A pivot is good in case it is close to a data object or, even better, close to the query object. In both cases, a close pivot provides a tight distance approximation to the data/query object, so that filtering of a data object by use of precomputed pivot-to-query or pivot-to-data distances is more effective. Unfortunately, this criterion cannot be effectively utilized for global pivots, because there is only a limited number of global pivots, while there are many objects referenced to them (so once we move a global pivot to become close to one object, many other objects become handicapped). Nevertheless, the closeness criterion can be utilized in local-pivoting strategies, because only a limited number of dataset objects are referenced to a local pivot and, conversely, a dataset object is referenced to a small number of pivots. Hence, a replacement of a local pivot would impact only a fraction of the objects within the entire dataset.
4.1 Nearest-neighbor graphs

In an M-tree node, there exists just one local pivot, the parent (which moreover plays the role of the center of the region). The parent filtering described in Section 2.1 is, actually, pivot-based filtering where the distance to the pivot is precomputed, so we can avoid computation of the query-to-object distance (see Figure 1c). However, from the closeness-criterion point of view, the parent is not guaranteed to be close to any particular object in the node; it is rather a compromise which is "close to all of them".

Following the closeness criterion, in this paper we propose a technique where all the objects in an M-tree node mutually play the roles of local pivots. In order to reduce space and computation costs, each object in a node is explicitly referenced just to its nearest neighbor. This fits the closeness criterion; we obtain a close pivot for each of the node's objects. In this way, for each node N we get a list of triplets ⟨Oi, NN(Oi, N), δ(Oi, NN(Oi, N))⟩, which we call the nearest-neighbor graph (NN-graph) of node N, see Figure 2a.

The result is the M*-tree, an extension of the M-tree where the nodes are additionally equipped with NN-graphs, i.e., a routing entry is defined as

    rout(Oi) = [Oi, rOi, δ(Oi, Par(Oi)), ⟨NN(Oi), δ(Oi, NN(Oi))⟩, ptr(T(Oi))]

while a ground entry is defined as

    grnd(Oi) = [Oi, δ(Oi, Par(Oi)), ⟨NN(Oi), δ(Oi, NN(Oi))⟩]

Fig. 2. (a) NN-graph in M*-tree node (b) Filtering using sacrifice pivot
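Building a node's NN-graph amounts to an all-pairs scan over the node's m entries (O(m²) distance computations when the node is built or changed; the function below is our own sketch, representing the graph as index-based triplets):

```python
def build_nn_graph(objects, dist):
    """For each entry O_i of a node, find its nearest neighbor among the
    other entries of the same node; returns triplets (i, nn_index, nn_distance),
    the index-based form of <O_i, NN(O_i, N), delta(O_i, NN(O_i, N))>."""
    graph = []
    for i, oi in enumerate(objects):
        nn_j, nn_d = None, float("inf")
        for j, oj in enumerate(objects):
            if i == j:
                continue
            d = dist(oi, oj)
            if d < nn_d:
                nn_j, nn_d = j, d
        graph.append((i, nn_j, nn_d))
    return graph
```

Since node capacities are fixed and small, this quadratic scan is cheap relative to the distance computations saved at query time.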
4.2 Query processing

In addition to the M-tree's basic and parent filtering, in the M*-tree we can utilize the NN-graph to filter non-relevant routing/ground entries from the search.

NN-graph filtering. The filtering using the NN-graph is similar to the parent filtering; however, instead of the parent, we use an object S from the node. First, we have to select such an object S; then its distance to the query object Q is explicitly computed. We call the object S a sacrifice pivot (or simply a sacrifice), since to "rescue" other objects from basic filtering, this one must be "sacrificed" (i.e., its distance to the query has to be computed).
6. Lemma 1 (NN-graph filtering)
Let entry(R) be a routing or ground entry in node N to be checked against a
range query (Q, rQ ). Let entry(S) be a sacrifice entry in node N , which is R’s
nearest neighbor, or R is nearest neighbor to S. Then if
|δ(S, Q) − δ(S, R)| > rR + rQ
the entry(R) does not overlap the query and we can safely exclude entry(R) from
further processing (for a ground entry rR = 0).
Proof: Follows immediately from the triangle inequality. (a) Since δ(S, R) + rR
is the upper-bound distance of any object in (R, rR ) to S, we can extend or reduce
the radius of rout(S) to δ(S, R)+rR and then perform the standard overlap check
between (Q, rQ ) and (S, δ(S, R) + rR ), i.e. δ(S, Q) − δ(S, R) > rR + rQ . (b) Since
δ(S, R) − rR is the lower-bound distance of any object in (R, rR ) to S, we can
check whether the query (Q, rQ ) lies entirely inside the hole (S, δ(S, R) − rR ),
i.e. δ(S, R) − δ(S, Q) > rR + rQ .
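The lemma can also be checked empirically. The following Python sketch (our own illustration, assuming a Euclidean metric on random 2D points) verifies that whenever the filtering condition of Lemma 1 holds, the ball (R, rR) really cannot overlap the query ball (Q, rQ):

```python
import math
import random

def dist(a, b):
    # Euclidean distance; any metric obeying the triangle inequality works
    return math.hypot(a[0] - b[0], a[1] - b[1])

random.seed(42)
for _ in range(10_000):
    Q, S, R = [(random.random(), random.random()) for _ in range(3)]
    rQ, rR = random.random() * 0.3, random.random() * 0.3
    # NN-graph filtering condition from Lemma 1
    if abs(dist(S, Q) - dist(S, R)) > rR + rQ:
        # Then the ball (R, rR) cannot intersect the query ball (Q, rQ):
        # by the triangle inequality, d(R, Q) >= |d(S, Q) - d(S, R)|.
        assert dist(R, Q) > rR + rQ
print("Lemma 1 never filtered an overlapping region")
```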
When using a sacrifice S, all objects that are reverse nearest neighbors
(RNNs) of S can be passed to the NN-graph filtering (in addition to the single
nearest neighbor of S). The reverse nearest neighbors are objects in N having
S as their nearest neighbor. Note that for different sacrifices Si within the same
node N, their sets of RNNs are disjoint. The operator RNN can be defined as
RNN : S × Nodes → 2^(S×R+), where for a given sacrifice S and a node N, the
operator RNN returns a set of pairs ⟨Oi, di⟩ of reverse nearest neighbors.
The objects returned by the RNN operator can be retrieved from the NN-graph
without the need for a distance computation (i.e. "for free"). See the NN-graph
filtering pseudocode in Listing 1.
Listing 1 (NN-graph filtering)
set FilterByNNGraph(Node N, sacrifice ⟨S, δ(S, Q)⟩, RQuery (Q, rQ)) {
  set notFiltered = ∅
  for each entry(Oj) in RNN(S, N) do
    if |δ(S, Q) − δ(S, Oj)| > rQ + rOj then
      filtered[entry(Oj)] = true
    else
      add entry(Oj) to notFiltered
  return notFiltered
}
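For illustration, the pseudocode above can be transcribed into runnable Python; this is a sketch under the assumption that the NN-graph is stored as a mapping from each entry to its ⟨NN, distance⟩ pair, and the names (`nn_graph`, `radius`, etc.) are ours, not the paper's:

```python
def rnn(S, nn_graph):
    # Reverse nearest neighbors of S: entries whose stored NN is S,
    # returned together with the precomputed NN distance ("for free").
    return [(O, d) for O, (nn, d) in nn_graph.items() if nn == S]

def filter_by_nn_graph(nn_graph, radius, S, d_SQ, r_Q):
    """NN-graph filtering (Listing 1): returns the entries that survive,
    plus the filtered ones. radius[O] is r_O (0 for ground entries)."""
    not_filtered, filtered = [], set()
    for O, d_SO in rnn(S, nn_graph):
        if abs(d_SQ - d_SO) > r_Q + radius[O]:
            filtered.add(O)          # excluded without computing d(O, Q)
        else:
            not_filtered.append(O)
    return not_filtered, filtered

# toy node: 'b' and 'c' both have 'a' as their nearest neighbor
nn_graph = {'b': ('a', 0.1), 'c': ('a', 2.0), 'a': ('b', 0.1)}
radius = {'a': 0.0, 'b': 0.0, 'c': 0.0}
nf, f = filter_by_nn_graph(nn_graph, radius, 'a', d_SQ=0.2, r_Q=0.3)
# 'c' is filtered: |0.2 - 2.0| = 1.8 > 0.3; 'b' survives: |0.2 - 0.1| <= 0.3
```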
Range query. When processing a query, the tree structure is traversed such
that non-overlapping nodes (those whose parent regions do not overlap the query
ball) are excluded from further processing (filtered out). In addition to the basic
and parent filtering, in M*-tree we can use the NN-graph filtering step (inserted
after parent filtering and before basic filtering), in the hope of saving some of the
distance computations that basic filtering would otherwise require after an
unsuccessful parent filtering.
When processing a node N, the question arises of how to choose the ordering
of N's entries passed to NN-graph filtering as sacrifices. We discuss various
orderings later in this section; for now, suppose we already have a heuristic
function H(N) which returns an ordering on the entries in N. In this order the
potential sacrifices are initially inserted into a queue SQ.
In each step of node processing, the first object Si in SQ (a sacrifice
candidate) is fetched. The candidate is then checked against parent filtering. If
it cannot be filtered, the candidate becomes a regular sacrifice: the distance from
Q to Si is computed, and the algorithm tries to filter out all of Si's RNNs (the
objects in RNN(Si, N)) by NN-graph filtering. Note that the non-filtered RNNs
(which remain in SQ) are sure to become sacrifices themselves, because RNN
sets are disjoint, i.e., an object is passed at most once to the NN-graph filtering.
Hence, we can immediately move the non-filtered RNNs to the beginning of SQ;
this prevents some "NN-filterable" objects in SQ from becoming sacrifices. See
the range query algorithm in Listing 2.
range query algorithm in Listing 2.
Listing 2 (range query algorithm)
QueryResult RangeQuery(Node N, RQuery (Q, rQ), ordering heuristic H) {
  let P be the parent routing object of N  /* if N is root then δ(Oi, P) = δ(P, Q) = 0 */
  let filtered be an array of boolean flags, size of filtered is |N|
  set filtered[entry(Oi)] = false, ∀entry(Oi) ∈ N
  let SQ be a queue filled with all entries of N, ordered by H(N)
  if N is not a leaf then {
    /* parent filtering */
    for each rout(Oi) in N do
      if |δ(P, Q) − δ(Oi, P)| > rQ + rOi then
        filtered[rout(Oi)] = true
    /* NN-graph filtering */
    while SQ not empty
      fetch rout(Si) from the beginning of SQ
      if not filtered[rout(Si)] then
        compute δ(Si, Q)
        NF = FilterByNNGraph(N, Si, δ(Si, Q), (Q, rQ))
        move all entries in SQ ∩ NF to the beginning of SQ
        /* basic filtering */
        if δ(Si, Q) ≤ rQ + rSi then
          RangeQuery(ptr(T(Si)), (Q, rQ), H)
  } else {
    /* parent filtering */
    for each grnd(Oi) in N do
      if |δ(P, Q) − δ(Oi, P)| > rQ then
        filtered[grnd(Oi)] = true
    /* NN-graph filtering */
    while SQ not empty
      fetch grnd(Si) from the beginning of SQ
      if not filtered[grnd(Si)] then
        compute δ(Si, Q)
        NF = FilterByNNGraph(N, Si, δ(Si, Q), (Q, rQ))
        move all entries in SQ ∩ NF to the beginning of SQ
        /* basic filtering */
        if δ(Si, Q) ≤ rQ then
          add Si to the query result
  }
}
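As a minimal executable illustration of the leaf branch of Listing 2, the following Python sketch uses a toy 1D metric |x − y|; parent filtering and the move-to-front reordering are omitted for brevity, and all structures and names are our own assumptions, not the paper's code:

```python
from collections import deque

def range_query_leaf(entries, nn_graph, Q, r_Q, order):
    """entries: {name: object}; nn_graph: {name: (nn_name, nn_dist)};
    order: heuristic ordering H(N) of entry names. Returns (result, #dists)."""
    dist = lambda x, y: abs(x - y)               # toy metric
    filtered = {e: False for e in entries}
    SQ = deque(sorted(entries, key=order.index))
    result, computations = [], 0
    while SQ:
        S = SQ.popleft()
        if filtered[S]:
            continue
        d_SQ = dist(entries[S], Q)               # S becomes a sacrifice
        computations += 1
        # NN-graph filtering of S's reverse nearest neighbors
        for O, (nn, d_SO) in nn_graph.items():
            if nn == S and abs(d_SQ - d_SO) > r_Q:
                filtered[O] = True               # pruned "for free"
        if d_SQ <= r_Q:                          # basic filtering
            result.append(S)
    return result, computations

entries = {'a': 0.0, 'b': 0.1, 'c': 5.0}
nn_graph = {'a': ('b', 0.1), 'b': ('a', 0.1), 'c': ('a', 5.0)}
res, cost = range_query_leaf(entries, nn_graph, Q=0.05, r_Q=0.2,
                             order=['a', 'b', 'c'])
# 'a' is a sacrifice; 'c' (an RNN of 'a') is filtered: |0.05 - 5.0| > 0.2
```

Without the NN-graph, all three entries would require a distance computation; here 'c' is excluded after only two.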
kNN query. The kNN algorithm is a bit more involved, since the query radius
rQ is not known at the beginning of the kNN search. In [7] the authors applied a
modification of the well-known priority-queue-based heuristic to the M-tree.
Due to lack of space we omit the kNN pseudocode; however, its form can easily
be derived from the original M-tree kNN algorithm and the M*-tree range query
implementation presented above (for the code see [11]).
Choosing the Sacrifices. The order in which individual entries are treated
as sacrifices is crucial for the algorithms' efficiency. Virtually all entries of a
node can serve as sacrifices; however, when a good sacrifice is chosen at the
beginning, many of the others can be filtered out, so the node processing requires
fewer sacrifices (that is, fewer distance computations). We propose several heuristic
functions H which order all the entries in a node, thus setting the priority in
which individual entries should serve as sacrifices when processing a query.
– hMaxRNNCount
An ordering based on the number of RNNs belonging to an entry ei,
i.e. |RNN(ei, N)|.
Hypothesis: The more RNNs, the greater the probability that the sacrifice will
filter more entries.
– hMinRNNDistance
An ordering based on the entry's closest NN or RNN,
i.e. min{δ(ei, Oi)}, ∀Oi ∈ RNN(ei, N) ∪ {NN(ei)}.
Hypothesis: An entry close to its (R)NN stands for a close pivot, so there is a
greater probability of effective filtering (following the closeness criterion).
– hMinToParentDistance
An ordering based on the entry's to-parent distance.
Hypothesis: The lower the to-parent distance, the greater the probability that
the entry is close to the query; for such an entry the basic filtering is unavoidable,
so we may as well use it as a sacrifice sooner.
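Under the same toy representation as above (an NN-graph dictionary plus to-parent distances; all names are illustrative, not the paper's), the three heuristics can be sketched as sort keys:

```python
def rnn_count(e, nn_graph):
    # |RNN(e, N)|: how many entries have e as their nearest neighbor
    return sum(1 for nn, _ in nn_graph.values() if nn == e)

def h_max_rnn_count(entries, nn_graph):
    # more RNNs first: a sacrifice with many RNNs may filter more entries
    return sorted(entries, key=lambda e: -rnn_count(e, nn_graph))

def h_min_rnn_distance(entries, nn_graph):
    # closest (R)NN first: a close pivot promises effective filtering
    def closest(e):
        ds = [d for nn, d in nn_graph.values() if nn == e]  # RNN distances
        ds.append(nn_graph[e][1])                           # own NN distance
        return min(ds)
    return sorted(entries, key=closest)

def h_min_to_parent_distance(entries, to_parent):
    # entries close to the parent are likely close to the query as well
    return sorted(entries, key=lambda e: to_parent[e])

nn_graph = {'a': ('b', 0.1), 'b': ('a', 0.1), 'c': ('a', 5.0)}
to_parent = {'a': 0.2, 'b': 0.3, 'c': 4.0}
# 'a' has two RNNs ('b' and 'c'), so hMaxRNNCount puts it first
```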
4.3 M*-tree Construction
The M*-tree construction consists of inserting new data objects into leaves and
of splitting overflowed nodes (see Listing 3). When splitting, the NN-graphs
of the two new nodes must be created from scratch. However, this does not imply
additional computation costs, since we use M-tree's MinMaxRad splitting policy
(originally denoted mM_UB_RAD), which computes all the pairwise distances among
entries of the node being split (MinMaxRad has been shown to perform best [7]).
Thus, these distances are reused when building the NN-graphs.
The search for the target leaf can use NN-graph filtering as well (see Listing 4).
In the first phase, we try to find a candidate node that spatially covers the
inserted object. If several such candidates are found, the one with the shortest
distance to the inserted object is chosen. If no candidate is found in the first
phase, then in the second phase the closest routing entry is determined. The
search for candidate nodes recursively continues until a leaf is reached.
Listing 3 (dynamic object insertion)
Insert(Object Oi) {
  let N be the root node
  node = FindLeaf(N, Oi)
  store ground entry grnd(Oi) in the target leaf node
  update radii in the insertion path
  while node is overflowed do {
    /* split the node, create NN-graphs, produce two routing entries, and return the parent node */
    node = Split(node)
    insert (update) the two new routing entries into node
  }
}
When inserting a new ground entry into a leaf (or a routing entry into an
inner node after splitting), the existing NN-graph has to be updated. Although
the search for the nearest neighbor of the new entry could utilize NN-graph
filtering (in a similar way as in the kNN query), checking whether the new entry
has become the new nearest neighbor of some of the old entries would require
computing all the distances needed for an NN-graph update anyway. Thus, we
use a simple update of the NN-graph, in which all the distances between the new
entry and the old entries are computed.
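The simple update described above can be sketched as follows (Python over our own toy representation): inserting a new entry computes its distance to every old entry, sets the newcomer's NN, and re-links any old entry for which the newcomer is now closer.

```python
def nn_graph_insert(objects, nn_graph, new_name, new_obj, dist):
    """Simple NN-graph update: m distance computations for a node of m entries."""
    best = (None, float('inf'))
    for name, obj in objects.items():
        d = dist(new_obj, obj)                 # one computation per old entry
        if d < best[1]:
            best = (name, d)                   # candidate NN of the new entry
        if d < nn_graph[name][1]:
            nn_graph[name] = (new_name, d)     # newcomer became name's NN
    objects[new_name] = new_obj
    nn_graph[new_name] = best

objects = {'a': 0.0, 'b': 1.0}
nn_graph = {'a': ('b', 1.0), 'b': ('a', 1.0)}
nn_graph_insert(objects, nn_graph, 'c', 0.25, lambda x, y: abs(x - y))
# 'c' is closest to 'a' (0.25), so 'a' and 'c' now point at each other
```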
Listing 4 (navigating to the leaf for insertion)
Node FindLeaf(Node N, Object New) {
  if N is a leaf node then
    return N
  let P be the parent routing object of N  /* if N is root then δ(Oi, P) = δ(P, New) = 0 */
  let filtered be an array of boolean flags, size of filtered is |N|
  set filtered[entry(Oi)] = false, ∀entry(Oi) ∈ N
  let usedSacrifices = ∅ be an empty set
  let SQ be a queue filled with all entries of N, ordered by H(N)
  set candidateEntry = null
  minDist = ∞
  while SQ not empty {
    fetch rout(Si) from the beginning of SQ
    /* parent filtering */
    if |δ(P, New) − δ(Si, P)| > minDist + rSi then
      filtered[rout(Si)] = true
    if not filtered[rout(Si)] then {
      compute δ(Si, New)
      insert ⟨Si, δ(Si, New)⟩ into usedSacrifices
      /* NN-graph filtering */
      NF = ∅
      for each Sj in usedSacrifices do
        NF = NF ∪ FilterByNNGraph(N, Sj, δ(Sj, New), (New, minDist))
      move all entries in SQ ∩ NF to the beginning of SQ
      if δ(Si, New) ≤ rSi and δ(Si, New) ≤ minDist then {
        candidateEntry = Si
        minDist = δ(Si, New)
      }
    }
  }
  if candidateEntry is null then {
    do the same as in the previous while cycle; the difference is just in the condition
    for updating the candidate entry, which is relaxed to:
      if δ(Si, New) ≤ minDist then {
        candidateEntry = Si
        minDist = δ(Si, New)
      }
  }
  return FindLeaf(ptr(T(candidateEntry)), New)
}
4.4 Analysis of Computation Costs
The worst case time complexities of single object insertion into M*-tree as well
as of querying are the same as in the M-tree, i.e. O(log n) in case of insertion
and O(n) in case of querying, where n is the dataset size. Hence, we are rather
interested in the typical reduction/increase of absolute computation costs³ exhibited
by M*-tree with respect to the original M-tree.
M*-tree Construction Costs. When inserting a new object into M*-tree,
the navigation to the target leaf makes use of NN-graph filtering, so we achieve
faster navigation. However, the insertion into the leaf itself requires an update of
the leaf's NN-graph, which takes m distance computations in M*-tree instead
of none in M-tree (where m is the maximum number of entries in a leaf). On
the other hand, the expensive splitting of a node does not require any additional
distance computations, since all pairwise distances have to be computed to
partition the node regardless of whether M-tree or M*-tree is used.
M*-tree Querying Costs. The range search in M*-tree is never less efficient
than in M-tree, because only entries that cannot be filtered by the parent are
chosen as sacrifices, so for them the distance computation is unavoidable anyway.
Moreover, due to NN-graph filtering some of the entries can be filtered before
they become sacrifices, so the number of distance computations is reduced in
this case.
5 PM*-tree
To show the benefits of NN-graph filtering also on other members of the M-tree
family, we have implemented PM*-tree, an NN-graph-enhanced extension of the
PM-tree (see the next subsection for a brief overview). The only structural extension
over the PM-tree is the NN-graphs in nodes, similarly to the way M*-tree extends
the M-tree.
5.1 PM-tree
The idea of PM-tree [10, 13] is to combine the hierarchy of M-tree with a set of
p global pivots. In a PM-tree’s routing entry the original M-tree-inherited ball
region is further cut off by a set of rings (centered in the global pivots), so the
region volume becomes smaller (see Figure 3a). Similarly, the PM-tree ground
entries are extended by distances to the pivots (which are also interpreted as rings
due to approximations). Each ring stored in a routing/ground entry represents
a distance range (bounding the underlying data) with respect to a particular
pivot. The combination of all the p entry’s ranges produces a p-dimensional
minimum bounding rectangle (MBR), hence, the global pivots actually map the
3
The NN-graph filtering is used just to reduce the computation costs, the I/O costs
are the same as in the case of M-tree.
11. metric regions/data into a ”pivot space” of dimensionality p (see Figure 3b).
The number of pivots can be defined separately for routing and ground entries
– we typically choose less pivots for ground entries to reduce storage costs.
Prior to the standard M-tree "ball filtering" (either basic or parent filtering),
the query ball mapped to a hyper-cube in the pivot space is checked for overlap
with the routing/ground entries' MBRs; if they do not overlap, the entry is
filtered out without a distance computation (which would otherwise be needed
in M-tree's basic filtering). Actually, the overlap check can also be understood
as L∞ filtering (i.e. if the L∞ distance from a region to the query object Q is
greater than rQ, the region is not overlapped by the query).
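The L∞ check can be sketched as follows (Python; we assume each entry's MBR is stored as per-pivot [lo, hi] ranges, as the text describes, and the function names are ours): if the L∞ distance from the mapped query to the MBR exceeds rQ, the entry is discarded without a single metric distance computation.

```python
def linf_dist_to_mbr(q_point, mbr):
    """L-infinity distance from a point in pivot space to an MBR.
    q_point[j] = d(Q, P_j); mbr[j] = (lo_j, hi_j) range w.r.t. pivot P_j."""
    return max(
        max(lo - q, q - hi, 0.0)          # per-coordinate gap, 0 if inside
        for q, (lo, hi) in zip(q_point, mbr)
    )

def linf_filtered(q_point, r_Q, mbr):
    # entry can be excluded when the query hyper-cube misses its MBR
    return linf_dist_to_mbr(q_point, mbr) > r_Q

q = [0.5, 2.0]                  # distances from Q to global pivots P1, P2
mbr = [(0.0, 0.25), (1.5, 2.5)] # entry's ring ranges w.r.t. P1, P2
# gap to the P1 range is 0.25, to the P2 range 0.0, so the L-inf distance is 0.25
```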
Fig. 3. Hierarchy of PM-tree regions (using global pivots P1 , P2 ): (a) metric space view
(b) pivot space view. (c) L∞ distance from Q to MBRs of routing/ground entries.
In PM*-tree we suppose the following ordering of steps when trying to filter
a non-relevant node:
parent filtering → L∞ filtering → NN-filtering → basic filtering
5.2 Choosing the Sacrifices
The idea of L∞ filtering can also be used to propose PM*-tree-specific heuristics
for choosing the sacrifice ordering (in addition to the ones proposed for M*-tree).
– hMinLmaxDistance
An ordering based on the minimum L∞ distance from Q to the entry's MBR
(see Figure 3c).
Hypothesis: The smaller the L∞ distance, the greater the probability that the
δ distance is also small, so that the entry has to be filtered by basic filtering
(requiring a distance computation).
– hMaxLmaxDistance
An ordering based on the maximum L∞ distance from Q to the entry's MBR.
Hypothesis: The greater the L∞ distance, the greater the probability that the δ
distance is also large, so the entry's RNNs are far from the query and could be
filtered by the NN-graph.
6 Experimental results
To verify the impact of NN-graph filtering, we have performed extensive experiments
with M*-tree and PM*-tree on three datasets. The query objects were selected
as 500 random objects from each dataset, and these query sets were excluded
from indexing. We have monitored the computation costs (number of distance
computations) required to index the datasets, as well as the costs needed to
answer range and kNN queries. Each query test unit consisted of 500 query
objects and the results were averaged. The computation costs for querying on
PM(*)-tree do not include the external distance computations needed to map
the query into the pivot space (this overhead is equal to the number of pivots
used and cannot be affected by filtering techniques).
The datasets were indexed for varying dimensionality, capacity of entries per
inner node, and number of pivots (in case of PM(*)-tree). Unless otherwise
stated, the PM-tree and PM*-tree indexes were built using 64 pivots in inner
nodes and 32 in leaf nodes. Moreover, although the M-tree and PM-tree indexes
are designed for database environments, in this paper we are interested just in
computation costs (because the NN-graph filtering cannot affect I/O costs).
Therefore, rather than fixing a disk page size used for storage of a node, we
specify the inner node capacity (the maximum number of entries stored within
an inner node, set to 50 in most experiments).
6.1 Corel dataset
The first set of experiments was performed on the Corel dataset [8], consisting of
65,615 feature vectors of images (we have used the color moments). The L1
distance was used as the indexing metric. Unless otherwise stated, we have
indexed the first 8 out of 32 dimensions. Table 1 shows the statistics obtained
when indexing the Corel dataset: the numbers of distance computations needed
to index all the vectors, and the index file sizes. The computation costs and index
sizes of M*-tree/PM*-tree are represented as percentage growth with respect to
the M-tree/PM-tree values.
index type construction costs index file size
M-tree 3,708,968 4.7MB
M*-tree +22% +25.5%
PM-tree(64,32) 18,522,252 8.8MB
PM*-tree(64,32) +25.6% +0%
Table 1. Corel indexing statistics.
Figure 4a shows the M-tree and M*-tree querying performance with respect
to increasing node capacity. We can see the M-tree performance improves up
to a capacity of 30 entries, while M*-tree steadily improves up to a capacity
of 100 entries; here M*-tree is up to 45% more efficient than M-tree. The most
effective heuristic is hMaxRNNCount.
The impact of increasing PM*-tree node capacities is presented in Figure 4b.
The PM*-tree with the hMinLmaxDistance heuristic performs best, while the
other heuristics are even outperformed by the PM-tree. The gain in efficiency is
not as significant as in the case of M*-tree (5%).
Fig. 4. 5NN queries depending on varying node capacity (a) M*-tree (b) PM*-tree. (c)
Range queries depending on increasing dataset dimensionality.
Figure 4c shows a comparison of M-tree and M*-tree when querying datasets
of increasing dimensionality. Again, the hMaxRNNCount heuristic performed
best; the efficiency gain of M*-tree was significant, about 40% on average.
6.2 Polygons dataset
The second set of experiments was carried out on the Polygons dataset, a
synthetic dataset consisting of 1,000,000 randomly generated 2D polygons, each
having 5-10 vertices. The Hausdorff set distance was employed as the indexing
metric (with the L2 distance used as the partial distance on vertices). Table 2
shows the statistics obtained for the Polygons indexes.
index type construction costs index file size
M-tree 70,534,350 148.4MB
M*-tree +12.1% +5%
PM-tree(64,32) 291,128,463 202.7MB
PM*-tree(64,32) +17% +0%
Table 2. Polygons indexing statistics.
Figure 5a presents the M*-tree 1NN performance with respect to increasing
dataset size. To provide a better comparison, the computation costs are
represented as a proportion of the distance computations needed to perform a
full sequential search on the dataset of the given size. The efficiency gain of
M*-tree is about 30% on average.
The costs for kNN queries are shown in Figure 5b; the efficiency gain ranges
from 30% for the 1NN query to 23% for the 100NN query. As usual, the M*-tree
equipped with the hMaxRNNCount heuristic performed best.
The performance of 1NN queries on PM*-tree is presented in Figure 5c,
considering an increasing number of pivots. The PM*-tree performance
improvement with respect to PM-tree is quite low, ranging from 15% (2 and 4
pivots, heuristic hMaxLmaxDistance) to 6% (≥ 8 pivots, heuristic hMinLmaxDistance).
Fig. 5. (a) 1NN queries depending on increasing dataset size. (b) kNN queries (c) 1NN
queries depending on increasing number of pivots used in PM-tree.
6.3 GenBank dataset
The last dataset in our experiments was created by sampling 250,000 strings of
protein sequences (of lengths 50-100) from the GenBank file rel147 [1]. The edit
distance was used to index the GenBank dataset. Table 3 shows the indexing
statistics obtained for the GenBank dataset.
index type construction costs index file size
M-tree 17,726,084 54.5MB
M*-tree +38.9% +18.2%
PM-tree(64,32) 77,316,482 66.8MB
PM*-tree(64,32) +20.6% +0%
Table 3. GenBank indexing statistics.
Figure 6 shows the results for kNN queries on M*-tree and PM*-tree,
respectively. Note that the GenBank dataset is generally hard to index; the best
results achieved for 1NN are 136,619 distance computations for M*-tree (111,086
for PM*-tree), i.e. an equivalent of about half of the sequential search.
Nevertheless, under these hard conditions the M*-tree and PM*-tree have
outperformed the M-tree and PM-tree quite significantly, by up to 20%. As in
the previous experiments, the hMaxRNNCount heuristic on M*-tree and the
hMinLmaxDistance heuristic on PM*-tree performed best.
6.4 Summary
The construction costs and index file sizes of all M*-tree and PM*-tree indexes
exhibited an increase, ranging from 5% to 25%. For PM*-tree the increase in
index file size was negligible in all cases.
The results on the small Corel dataset have shown that M*-tree can
significantly outperform the M-tree. However, the PM*-tree performed only
slightly better than the PM-tree. We suppose the superior performance of PM-tree
simply leaves only a little room for improvement. Note that the dataset of size
≈ 65,000 can be searched by the PM-tree for fewer than 300 distance computations;
this corresponds, for example, to six paths of length 3 in the PM-tree where 15
entries per node must be filtered by basic filtering.
Fig. 6. kNN queries (a) M-tree (b) PM-tree.
We suppose the extent of the M*-tree/PM*-tree performance gain is related
to the intrinsic dimensionality [4] of the respective dataset. The high-dimensional
GenBank dataset is hard to index, so any "help" from additional filtering
techniques (like the NN-graph filtering) results in better pruning of index
subtrees. On the other hand, for the low-dimensional Corel and Polygons
datasets, the PM-tree alone is extremely successful (up to 10x faster than
M-tree), so we cannot achieve a significant further gain in performance.
7 Conclusions
We have proposed an extension of the M-tree family, based on the utilization of
nearest-neighbor graphs in tree nodes. We have shown that NN-graphs can be
successfully incorporated into index structures designed for local-pivot filtering.
The improvement is based on using so-called "sacrifices": selected entries in a
tree node which serve as local pivots for all entries that are reverse nearest
neighbors (RNNs) of the sacrifice. Since the distances from a sacrifice to its
RNNs are precomputed in the NN-graph, several subtrees can be pruned at the
cost of just one distance computation. We have proposed several heuristics for
choosing the sacrifices, as well as the modified insertion, range and kNN query
algorithms.
7.1 Future Work
The properties of NN-graph filtering open possibilities for other applications.
In general, metric access methods based on compact partitioning using local
pivots could be extended by NN-graphs. In the future we would like to integrate
NN-graph filtering into other metric access methods as a supplement to their
own filtering techniques.
Acknowledgments
This research has been supported in part by grant GAČR 201/05/P036 provided
by the Czech Science Foundation, and by "Information Society program" grant
number 1ET100300419.
References
1. D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, B. A. Rapp, and D. L.
Wheeler. GenBank. Nucleic Acids Res, 28(1):15–18, January 2000.
2. B. Bustos, G. Navarro, and E. Chávez. Pivot selection techniques for proximity
searching in metric spaces. Pattern Recognition Letters, 24(14):2357–2366, 2003.
3. B. Bustos and T. Skopal. Dynamic Similarity Search in Multi-Metric Spaces. In
Proceedings of ACM Multimedia, MIR workshop, pages 137–146. ACM Press, 2006.
4. E. Chávez and G. Navarro. A Probabilistic Spell for the Curse of Dimensionality.
In ALENEX'01, LNCS 2153, pages 147–160. Springer, 2001.
5. P. Ciaccia and M. Patella. The M²-tree: Processing Complex Multi-Feature Queries
with Just One Index. In DELOS Workshop: Information Seeking, Searching and
Querying in Digital Libraries, Zurich, Switzerland, June 2000.
6. P. Ciaccia and M. Patella. Searching in metric spaces with user-defined and ap-
proximate distances. ACM Database Systems, 27(4):398–437, 2002.
7. P. Ciaccia, M. Patella, and P. Zezula. M-tree: An Efficient Access Method for
Similarity Search in Metric Spaces. In VLDB’97, pages 426–435, 1997.
8. S. Hettich and S. Bay. The UCI KDD archive [http://kdd.ics.uci.edu], 1999.
9. M. L. Micó, J. Oncina, and E. Vidal. An algorithm for finding nearest neighbour
in constant average time with a linear space complexity. In Int. Conf. on Pattern
Recognition, 1992.
10. T. Skopal. Pivoting M-tree: A Metric Access Method for Efficient Similarity
Search. In Proceedings of the 4th annual workshop DATESO, Desná, Czech Republic,
ISBN 80-248-0457-3, also available at CEUR, Volume 98, ISSN 1613-0073,
http://www.ceur-ws.org/Vol-98, pages 21–31, 2004.
11. T. Skopal and D. Hoksza. Electronic supplement for this paper,
http://urtax.ms.mff.cuni.cz/skopal/pub/suppADBIS07.pdf, 2007.
12. T. Skopal, J. Pokorný, M. Krátký, and V. Snášel. Revisiting M-tree Building
Principles. In ADBIS, Dresden, pages 148–162. LNCS 2798, Springer, 2003.
13. T. Skopal, J. Pokorný, and V. Snášel. Nearest Neighbours Search using the PM-tree.
In DASFAA '05, Beijing, China, pages 803–815. LNCS 3453, Springer, 2005.
14. C. Traina Jr., A. Traina, B. Seeger, and C. Faloutsos. Slim-Trees: High perfor-
mance metric trees minimizing overlap between nodes. Lecture Notes in Computer
Science, 1777, 2000.
15. P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric
Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc.,
Secaucus, NJ, USA, 2005.
16. X. Zhou, G. Wang, J. Y. Xu, and G. Yu. M+ -tree: A New Dynamical Multidi-
mensional Index for Metric Spaces. In Proceedings of the Fourteenth Australasian
Database Conference - ADC’03, Adelaide, Australia, 2003.
17. X. Zhou, G. Wang, X. Zhou, and G. Yu. BM+-Tree: A Hyperplane-Based Index
Method for High-Dimensional Metric Spaces. In DASFAA, pages 398–409, 2005.