Conclusions
• We present a new indexing scheme for the general purposes of similarity search on Earth Mover's Distance
• Our index method relies on the primal-dual theory to construct mapping functions from the origi
...
Conclusions
• We present a new indexing scheme for the general purposes of similarity search on Earth Mover's Distance
• Our index method relies on the primal-dual theory to construct mapping functions from the original probabilistic space to one-dimensional domain
• Our B+ tree-based index framework has
–High scalability
–High efficiency
–can handle High dimensional data
Outline
• Sources of HDD
• Challenges of HDD
• Searching and Mining Mixed Typed Data
–Similarity Function on k-n-match
–ItCompress
• Bregman Divergence: Towards Similarity Search on Non-metric Distance
• Earth Mover Distance: Similarity Search on Probabilistic Data
• Finding Patterns in High Dimensional Data
Sample/Row Enumeration Algorihtms
• To avoid searching the large column/item enumeration space, our mining algorithm search for patterms/rules in the sample/row enumeration space
• Our algorithms does not fitted into the column/item enumeration algorithms
• They are not YAARMA (Yet Another Association Rules Mining Algorithm)
• Column/item enumeration algorithms simply does not scale for microarray datasets
Tree Search based Algorithms
• Ullmann’s Algorithm (DFS)
• A refinement procedure based on matrix of possible future matched vertex pairs to prune unfruitful matches
• The simple enumeration algorithm for the isomorphisms between a graph G1 and a subgraph of another graph G2 with the adjacency matrices A1and A2
• An M’ matrix with |V1| rows and |V2 | columns can be used to permute the rows and columns of A2 to produce a further matrix P. If , then M’ specifies an isomorphism between G1 and the subgraph of G2.
(a1 i , j 1) ( pi, j 1)
P M '(M ' A )T
2
[Show More]