Iacono's working set structure

Iacono's working set data structure
Invented 2001
Invented by John Iacono
Asymptotic complexity
in big O notation
Space O(n)
Search O(log w(x))
Insert O(log n)
Delete O(log n)

In computer science, Iacono's working set structure[1] is a comparison based dictionary. It supports insertion, deletion and access operation to maintain a dynamic set of n {\displaystyle n} elements. The working set of an item x {\displaystyle x} is the set of elements that have been accessed in the structure since the last time that x {\displaystyle x} was accessed (or inserted if it was never accessed). Inserting and deleting in the working set structure takes O ( log n ) {\displaystyle O(\log n)} time while accessing an element x {\displaystyle x} takes O ( log w ( x ) ) {\displaystyle O(\log w(x))} . Here, w ( x ) {\displaystyle w(x)} represents the size of the working set of x {\displaystyle x} .

Structure

An example of a search for x {\displaystyle x} in the working set structure. After finding x {\displaystyle x} , it is removed from T 4 {\displaystyle T_{4}} and inserted into T 1 {\displaystyle T_{1}} . Finally, a shift from 1 to 4 is performed in which an element is removed from T i {\displaystyle T_{i}} and inserted into T i + 1 {\displaystyle T_{i+1}} for 1 i < 4 {\displaystyle 1\leq i<4} .

To store a dynamic set of n {\displaystyle n} elements, this structure consists of a series of Red–black trees, or other Self-balancing binary search trees T 1 , T 2 , , T k {\displaystyle T_{1},T_{2},\ldots ,T_{k}} , and a series of deques (Double-ended queues) Q 1 , Q 2 , Q k {\displaystyle Q_{1},Q_{2},\ldots Q_{k}} , where k = log log n {\displaystyle k=\lceil \log \log n\rceil } . For every 1 i k {\displaystyle 1\leq i\leq k} , tree T i {\displaystyle T_{i}} and deque Q i {\displaystyle Q_{i}} share the same contents and pointers are maintained between their corresponding elements. For every i < k {\displaystyle i<k} , the size of T i {\displaystyle T_{i}} and Q i {\displaystyle Q_{i}} is 2 2 i {\displaystyle 2^{2^{i}}} . Tree T k {\displaystyle T_{k}} and deque Q k {\displaystyle Q_{k}} consist of the remaining elements, i.e., their size is n i = 1 k 1 2 2 i {\displaystyle n-\sum _{i=1}^{k-1}2^{2^{i}}} . Therefore, the number of items in all trees and the number of elements in all deques both add up to n {\displaystyle n} . Every element that has been inserted in the data structure is stored in exactly one of the trees and its corresponding deque.

Working set Invariant

In the deques of this structure, elements are kept in sorted order according to their working set size. Formally, element x {\displaystyle x} lies after y {\displaystyle y} in deque Q i {\displaystyle Q_{i}} if and only if w ( x ) < w ( y ) {\displaystyle w(x)<w(y)} . Moreover, for every 1 i < k {\displaystyle 1\leq i<k} , the elements in deque Q i {\displaystyle Q_{i}} have a smaller working sets than the elements in deque Q i + 1 {\displaystyle Q_{i+1}} . This property is referred to as the Working set invariant. Every operation in the data structure maintains the Working set invariant.

Operations

The basic operation in this structure is called shift from h {\displaystyle h} to j {\displaystyle j} , where h {\displaystyle h} and j {\displaystyle j} are indices of some trees in the structure. Two cases are considered in a shift from h {\displaystyle h} to j {\displaystyle j} : If h < j {\displaystyle h<j} , then for every h i < j {\displaystyle h\leq i<j} , taken in increasing order, an item is dequeued from Q i {\displaystyle Q_{i}} and enqueued into Q i + 1 {\displaystyle Q_{i+1}} . The corresponding item is deleted from T i {\displaystyle T_{i}} and inserted into T i + 1 {\displaystyle T_{i+1}} . The running time of this operation is O ( i = h j log | T i | ) = O ( i = h j log 2 2 i ) = O ( 2 j ) {\displaystyle O(\sum _{i=h}^{j}\log |T_{i}|)=O(\sum _{i=h}^{j}\log 2^{2^{i}})=O(2^{j})} . Analogously, if j < h {\displaystyle j<h} , then for every j < i h {\displaystyle j<i\leq h} , taken in decreasing order, an item is dequeued from Q i {\displaystyle Q_{i}} and enqueued into Q i 1 {\displaystyle Q_{i-1}} . The corresponding item is deleted from T i {\displaystyle T_{i}} and inserted into T i 1 {\displaystyle T_{i-1}} . The running time of this operation is O ( i = j h log | T i | ) = O ( i = j h log 2 2 i ) = O ( 2 h ) {\displaystyle O(\sum _{i=j}^{h}\log |T_{i}|)=O(\sum _{i=j}^{h}\log 2^{2^{i}})=O(2^{h})} . Regardless of the case, after a shift operation, the size of T h {\displaystyle T_{h}} decreases by one whereas the size of T j {\displaystyle T_{j}} increases by one. Since that elements in the deques are sorted with respect to their working sets sizes, a shift operation maintains the Working set invariant.

To search for an element x {\displaystyle x} , search for x {\displaystyle x} in T 1 , T 2 , T k {\displaystyle T_{1},T_{2},\ldots T_{k}} , in increasing order, until finding a tree T j {\displaystyle T_{j}} containing x {\displaystyle x} . If no tree is found, the search is unsuccessful. If x {\displaystyle x} is found, it is deleted from T j {\displaystyle T_{j}} and then inserted into T 1 {\displaystyle T_{1}} , i.e., it is moved to the front of the structure. The search finishes by performing a shift from 1 {\displaystyle 1} to j {\displaystyle j} which restores the size of every tree and every deque to their size prior to the search operation. The running time of this search is O ( i = 1 j log 2 2 i ) = O ( 2 j ) {\displaystyle O(\sum _{i=1}^{j}\log 2^{2^{i}})=O(2^{j})} if the search was successful, or O ( i = j k log 2 2 i ) = O ( 2 k ) = O ( log n ) {\displaystyle O(\sum _{i=j}^{k}\log 2^{2^{i}})=O(2^{k})=O(\log n)} otherwise. By the Working set property, every element in trees T 1 , T 2 , , T j 1 {\displaystyle T_{1},T_{2},\ldots ,T_{j-1}} belongs to the working set of x {\displaystyle x} . In particular, every element in T j 1 {\displaystyle T_{j-1}} belongs to the working set of x {\displaystyle x} and hence, w ( x ) > | T j 1 | = 2 2 j 1 {\displaystyle w(x)>|T_{j-1}|=2^{2^{j-1}}} . Thus, the running time of a successful search is O ( 2 j ) = O ( log 2 2 j 1 ) = O ( log w ( x ) ) {\displaystyle O(2^{j})=O(\log 2^{2^{j-1}})=O(\log w(x))} .

Insert

Inserting an element x {\displaystyle x} into the structure is performed by inserting x {\displaystyle x} into T 1 {\displaystyle T_{1}} and enqueuing it into Q 1 {\displaystyle Q_{1}} . Insertion is completed by performing a shift from 1 {\displaystyle 1} to k {\displaystyle k} . To avoid overflow, if | T k | = 2 2 k {\displaystyle |T_{k}|=2^{2^{k}}} before the shift, i.e., if the last tree is full, then k {\displaystyle k} is incremented and a new empty T k {\displaystyle T_{k}} and Q k {\displaystyle Q_{k}} is created. The running time of this operation is dominated by the shift from 1 {\displaystyle 1} to k {\displaystyle k} whose running time is O ( 2 k ) = O ( 2 log log n ) = O ( log n ) {\displaystyle O(2^{k})=O(2^{\log \log n})=O(\log n)} . Since element x {\displaystyle x} , whose working set is the smallest, is enqueued in Q 1 {\displaystyle Q_{1}} , the Working set invariant is preserved after the shift.

Delete

Deleting an element x {\displaystyle x} is done by searching for x {\displaystyle x} on each tree in the structure, in increasing order, until finding a tree T j {\displaystyle T_{j}} that contains it (if non is found the deletion is unsuccessful). Item x {\displaystyle x} is deleted from T j {\displaystyle T_{j}} and Q j {\displaystyle Q_{j}} . Finally, a shift from k {\displaystyle k} to j {\displaystyle j} maintains the size of T j {\displaystyle T_{j}} equal to 2 2 j {\displaystyle 2^{2^{j}}} . The running time of this operation is O ( 2 k ) = O ( log n ) {\displaystyle O(2^{k})=O(\log n)} . The working set invariant is preserved as deleting an element does not change the order of the working set of the elements.

Discussion

Splay trees are self adjusting search trees introduced by Sleator and Tarjan[2] in 1985. Using restructuring heuristic, splay trees are able to achieve insert and delete operations in O ( log n ) {\displaystyle O(\log n)} amortized time, without storing any balance information at the nodes. Moreover, the Working Set Theorem for splay trees states that the cost to access an element in a splay tree is O ( log w ( x ) ) {\displaystyle O(\log w(x))} amortized. Iacono's workings set structure obtains the same running time for search, insert and delete in the worst-case. Therefore, offering an alternative to splay trees.

References

  1. ^ Iacono, John (2001). "Alternatives to splay trees with O(log n) worst-case access times" (PDF). Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms: 516–522.
  2. ^ Sleator, Daniel D.; Tarjan, Robert E. (1985), "Self-Adjusting Binary Search Trees" (PDF), Journal of the ACM, 32 (3): 652–686, doi:10.1145/3828.3835