Keeping vectors efficient

If you are a programmer, you probably know such data structures as java.util.ArrayList of Java and/or std::vector of C++. Both are simple arrays under the hood, and whenever there is no room for appending a new element, the implementation creates a larger array, copies the contents of the current array to it, deletes the current array, sets the newly created array as current, and, finally, appends the element normally. Basically, as long as there is room for appending, the running time of such an operation is constant. Obviously, if there is no room, the running time is \Theta(n). This is by no means informative, and we would like to understand the efficiency of such a data structure better. Now, let amortized analysis come to rescue. The idea behind amortized analysis is that you sum the costs of operations appearing in a sequence, and divide the sum by the number of operations in the sequence. Formally, if the cost sequence of an operation sequence is \langle c_1, c_2, \dots, c_n \rangle, the amortized running time is simply the average of them:

\displaystyle \frac{1}{n} \sum_{i = 1}^n c_i.

Now, we have two schemes for expanding the underlying array:

  1. arithmetic: choose a positive integer d and every time you have to expand the array, make it d array components longer;
  2. geometric: choose a real number q > 1, and every time you have to expand the array of length N, make it \lfloor qN \rfloor array components long. (If N_0 is the initial array length, we require \lfloor qN_0 \rfloor > N_0.)

Arithmetic scheme

Suppose the initial array capacity is m and d > 0 is given. Also we are given a sequence of appending operations of length n > m > 0. Now the total work is

\displaystyle \begin{aligned} C(n, m, d) &= \overbrace{\sum_{k = 0}^{\lceil \frac{n - m}{d} \rceil}(kd + m)}^A - \overbrace{\Bigg( m + \Bigg\lceil \frac{n - m}{d} \Bigg\rceil d - n \Bigg)}^R \\            &= \sum_{k=0}^{\lceil \frac{n-m}{d} \rceil}(kd + m) + n - m - \Bigg\lceil \frac{n - m}{d} \Bigg\rceil d \\            &= d \sum_{k = 1}^{\lceil \frac{n - m}{d} \rceil}  k + m \Bigg( \Bigg\lceil \frac{n - m}{d} \Bigg\rceil + 1 \Bigg) + n - m - d \Bigg\lceil \frac{n - m}{d} \Bigg\rceil \\            &= \frac{d}{2} \Bigg[ \Bigg\lceil \frac{n - m}{d} \Bigg\rceil + 1\Bigg] \Bigg\lceil \frac{n - m}{d} \Bigg\rceil + m \Bigg\lceil \frac{n - m}{d} \Bigg\rceil + n - d \Bigg\lceil \frac{n - m}{d} \Bigg\rceil \\            &= \frac{d}{2} \Bigg\lceil \frac{n - m}{d} \Bigg\rceil^2 + \Bigg(m - \frac{d}{2}\Bigg) \Bigg\lceil \frac{n - m}{d} \Bigg\rceil + n. \end{aligned}

Above, A is the total work of expanding and filling the array sufficiently many times in order to accommodate n elements, and R denotes the number of elements we could have put before expanding once again.

Now, let us show that the amortized cost is linear in n. We start from the lower bound:

\displaystyle \begin{aligned} \frac{1}{n} C(n, m, d) &= \frac{1}{n} \Bigg[ \frac{d}{2} \Bigg\lceil \frac{n - m}{d} \Bigg\rceil^2 + \Bigg( m - \frac{d}{2} \Bigg) \Bigg\lceil \frac{n - m}{d} \Bigg\rceil + n \Bigg] \\    & \geq \frac{d}{2n} \Bigg( \frac{n - m}{d} \Bigg)^2 + \frac{1}{n} \Bigg( m - \frac{d}{2} \Bigg) \Bigg( \frac{n - m}{d} \Bigg) + 1 \\    &= \frac{d}{2n} \Bigg( \frac{n^2 - 2nm + m^2}{d^2} \Bigg) + \frac{1}{nd} \Bigg( m - \frac{d}{2} \Bigg) \Bigg(n - m\Bigg) + 1 \\    &=  \frac{1}{2nd} \Bigg(n^2 - 2nm + m^2\Bigg) + \frac{1}{nd}\Bigg( nm - m^2 - \frac{nd}{2} + \frac{md}{2} \Bigg) + 1 \\    &= \frac{1}{2nd} \Bigg( n^2 - 2nm + m^2 \Bigg) + \frac{1}{2nd} \Bigg( 2nm - 2m^2 - nd + md \Bigg) + 1 \\    &= \frac{1}{2nd} \Bigg(n^2 - m^2 - nd + md + 2nd\Bigg) \\    &= \frac{1}{2nd} \Bigg( n^2 - m^2 + nd + md \Bigg) \\    &\geq \frac{n^2 - m^2}{2nd} \\    &= \frac{n}{2d} - \frac{m^2}{2nd} \\    &\overset{n > m}{>} \frac{n}{2d} - \frac{mn}{2nd}\\    &= \frac{n}{2d} - \frac{m}{2d} > 0. \end{aligned}

Next, we need to show that there exist c_1 > 0 and n_0^L \in \mathbb{N} \setminus \{0\} such that c_1 n \leq \frac{n}{2d} - \frac{m}{2d} for all n \geq n_0^L. Let us choose c_1 = \frac{1}{3d}. Now we have

\displaystyle \begin{aligned} \frac{1}{3d}n &\leq \frac{1}{2d}n - \frac{1}{2d}m \\ \frac{1}{3}n &\leq \frac{1}{2}n - \frac{1}{2}m \\ \frac{1}{6}n &\geq \frac{1}{2} m \\ n &\geq 3m = n_0^L. \end{aligned}

Since m is a constant, so is n_0^L. Next, let us derive an upper bound:

\displaystyle \begin{aligned} \frac{1}{n}C(n, m, d) &= \frac{1}{n} \Bigg[ \frac{d}{2} \Bigg\lceil \frac{n - m}{d} \Bigg\rceil^2 + \Bigg(m - \frac{d}{2}\Bigg) \Bigg\lceil \frac{n - m}{d} \Bigg\rceil + n\Bigg] \\     &\leq \frac{1}{n} \Bigg[ \frac{d}{2} \Bigg\lceil \frac{n}{d} \Bigg\rceil^2 + m \Bigg\lceil \frac{n}{d} \Bigg\rceil + n \Bigg] \\     &\leq \frac{1}{n} \Bigg[ \frac{d}{2} \Bigg( \frac{n}{d} + 1 \Bigg)^2 + m \Bigg( \frac{n}{d} + 1 \Bigg) + n \Bigg] \\     &= \frac{1}{n} \Bigg[ \frac{d}{2} \Bigg( \frac{n^2}{d^2} + \frac{2n}{d} + 1 \Bigg) + \frac{nm}{d} + m + n \Bigg] \\     &= \frac{1}{n} \Bigg[ \frac{n^2}{2d} + n + \frac{d}{2} + \frac{2nm}{2d} + m + n \Bigg] \\     &= \frac{n}{2d} + 1 + \frac{d}{2n} + \frac{m}{d} + \frac{m}{n} + 1 \\     &= \frac{n}{2d} + 2 + \overbrace{\frac{d}{2n}}^{< 1} + \frac{m}{d} + \overbrace{\frac{m}{n}}^{<1} \\     &< \frac{n}{2d} + 4 + \frac{m}{d} \\     &< \frac{n}{2d} + 4 + \frac{n}{d} \\     &= \frac{n}{2d} + \frac{2n}{2d} + 4 \\     &= \frac{3n}{2d} + 4. \end{aligned}

At this point we choose c_2 = 2, which leads to

\displaystyle \begin{aligned} \frac{3n}{2d} + 4 &\leq 2n \\ 3n + 8d &\leq 4dn \\ 4dn - 3n &\geq 8d \\ (4d-3)n &\geq 8d \\ n &\geq \frac{8d}{4d - 3} = n_0^R. \end{aligned}

If we choose n_0 = \max(n_0^L, n_0^R), we have that c_1 n \leq \frac{1}{n}C(n,m,d) \leq c_2 n for all n \geq n_0, which is the very definition of \Theta, and so, we have that \frac{1}{n}C(n,m,d) = \Theta(n).

Next, we will show that appending with geometric strategy runs in \Theta(1) amortized time. If we manage to show that the upper bound is constant, we can omit proving the lower bound. Suppose we start with the empty array of capacity m > 0. Also, we are given the expansion factor q > 1. As a slight technicality, we must choose q such that \lfloor qm \rfloor > m; otherwise, the very first expansion will have no effect, and, thus, by induction, none of them. Assuming the same unit costs for copying and appending elements like in the arithmetic strategy, the total work of appending n > m elements is

\displaystyle W(n, m, q) = m + mq + mq^2 + \cdots + mq^k.

We require k to be the smallest integer such that mq^k \geq n, which leads us to the following inequalities:

\displaystyle \begin{aligned} mq^k &\geq n \\ q^k &\geq \frac{n}{m} \\ \log_q q^k &\geq \log_q \Bigg( \frac{n}{m} \Bigg) \\ k &\geq \log_q n - \log_q m. \end{aligned}

Since k is required to be the smallest integer satisfying the above inequality, we can set k = \lceil \log_q n - \log_q m \rceil. Also,

\displaystyle \begin{aligned} qW(n, m, q) = mq + mq^2 + \cdots + mq^{k + 1} &\Longrightarrow W(n, m, q) - q W(n, m, q) = m(1 - q^{k + 1}) \\     &\Longrightarrow W(n, m, q) = m \frac{1 - q^{k + 1}}{1 - q}. \end{aligned}

Since k = \lceil \log_q n - \log_q m \rceil, we have

\displaystyle  \begin{aligned} W(n, m, q) &= m \frac{1 - q^{\lceil \log_q n - \log_q m \rceil + 1}}{1 - q} \\            &\leq m \frac{1 - q \cdot q^{\lceil \log_q n \rceil}}{1 - q} \\            &\leq m  \frac{1 - q \cdot q^{\log_q n + 1}}{1 - q} \\            &= m  \frac{1 - q^2n}{1 - q}. \end{aligned}

Now we have that

\displaystyle \begin{aligned} \frac{1}{n} W(n,m,q) &\leq \frac{1}{n} \Bigg[ m \frac{1 - q^2 n}{1 - q} \Bigg] \\                      &= \frac{1}{n} \Bigg[ \frac{m}{1 - q} - \frac{nmq^2}{1 - q} \Bigg] \\    &= \frac{m}{(1 - q)n} - \frac{mq^2}{1 - q} \\    &\leq  \frac{m}{1 - q} -  \frac{mq^2}{1 - q} \\    &=  \frac{m(1 - q^2)}{1 - q}, \end{aligned}

which is constant since m and q are constants. Hence, appending an element to an array-based data structure using geometric array expansion strategy runs in amortized constant time.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s