Query-Tree Matching Algorithm

This is a description of the algorithm searching for a query in the trees. For the sake of simplicity, a single-tree query is presumed.

The Algorithm

  1. Query Preprocessing (once for each query)

  2. File Preprocessing (once for each file)

  3. Matching heads (once for each file)

  4. Creating matching lists of tree-nodes for each query-node (once for each tree)

  5. Assembling a tree (once for each tree for which matching lists have been successfully created)

  6. Checking references (once for each assembled tree - a matching candidate)

Query Preprocessing

The global head is scanned first and positions of some attributes and all meta-attributes are stored (function scanGlobalHead)

The query tree(s) is preprocessed (function preprocessTrees) – some information about nodes is collected (depth_from_root, max_depth_to_leaf, nodes_in_subtree, depth_first_order, usage of meta-attribute _name, regular expressions are compiled)

File Preprocessing

The head is scanned and positions of some attributes (hide etc) are stored (function scanHead)

Matching Heads

The global head and the file head are matched (function matchHeads)

Creating matching lists of tree-nodes for each query-node

A deep-first list of query-nodes is created (function treeToTVS)

The tree is preprocessed (function preprocessTree) – some information about nodes is collected (depth_from_root, max_depth_to_leaf, nodes_in_subtree, depth_first_order)

For each query-node, a list of tree-nodes matching the query-node is created. References are ignored at this stage.

Assembling a tree

The algoritm starts from the nodes matching the query-root (function findSubtreesTVSList). It takes them one by one and tries to assemble the rest of the tree.

If all query-nodes have been processed, the assembled tree is a candidate for matching the query. All that remains is checking the references.

If not, the next node from the list of query-nodes is taken. They are ordered depth-first, therefore the query-father is always processed before all its query-sons. It tries to assemble a part of the tree from all tree-nodes matching this query-node one by one.

Checking references

In the assembled tree, all references are checked for validity (function isOKReferences). Number of occurrences at nodes that contain references is checked as well.