Prefix sum

Written by Benjamin Schmid. Translated by Benjamin Schmid.

In this article we will have a look at prefix sums. We will start with the one dimensional case and expand later to two dimensions. This trick is easy to use but can be used as a powerful building block for many problems.

Problem

What kind of problems can be solved with a prefix sum? We are given a table of numbers and we have to calculate the sum in a part of the table very often. Or in other words: Given an $M\times N$ rectangle with numbers. Calculate the sum of all numbers for $Q$ different subrectangles.

Let’s first have a look at the one dimensional case where we have a list with $N$ numbers. The first approach is to sum up all numbers for the queried range. Something similar to this:

for segment in segments: # Q Queries
  sum = 0
  for i in range(segment.start, segment.end+1):
    sum += list_of_numbers[i]
  print(sum)

This is too slow if we have many queries because each of the $Q$ queries takes up to $N$ operations. The algorithm runs in $\mathcal{O}(N\cdot Q)$

Solution

The trick is do more work at the beginning and precalculate some values. With those additional informations we can answer the queries faster. Instead of summing up all numbers in $\mathcal{O}(N)$ for each query we can then calculate it in $\mathcal{O}(1)$ . Let’s find out how this works.

The list $A$ with the elements $a_{0}$ , $a_1$ , $a_2$ , $a_3$ , …, $a_{N-1}$ contains the numbers given in the input. To calculate the sum in the range $x$ to $y$ we can sum up all numbers from the beginning up to $y$ and then subtract the numbers from the beginning to $x-1$ . If we have several ranges all ending at $y$ the first sum (from the beginning to $y$ ) will not change. Only the ones we subtract (beginning to $x-1$ ) will change. Something similar holds if $x$ is not changed but only $y$ . This looks very good to precalculate something!

We will precalculate a list $B$ such that $b_i$ corresponds to the sum of the first $i$ numbers in the list $A$ (thus the sum from the beginning up to and exluding $i$ ). Now we can use the proposed algorithm: the sum in the range $x$ to $y$ is exactly $b_{y+1} - b_{x}$ . Why does this work?

Let’s look for example at the range $3 - 6$ . The sum we want to calculate is $a_3 + a_4 + a_5 + a_6$ . Then we have $b_7 = a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + a_6$ and $b_3 = a_0 + a_1 + a_2$ . If we compute $b_7 - b_3$ we get $b_7 - b_3 =$ $(a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + a_6) - (a_0 + a_1 + a_2) =$ $a_3 + a_4 + a_5 + a_6$ . This is exactly the sum we are looking for. We can remove the elements we added “too much” by subtracting $b_3$ .

This can be implemented like this:

# precalculation
b = [0] * (len(a)+1)
# b[0] = 0
for i in range(1, len(a)+1):
  b[i] = b[i-1] + a[i-1]

# queries
for segment in segments:
    print(b[segment.end+1] - b[segment.start])

The precalculation can be done in $\mathcal{O}(N)$ and all queries in $\mathcal{O}(Q)$ (as we can do one query in constant time). We improved the original solution from $\mathcal{O}(N\cdot Q)$ to $\mathcal{O}(N+Q)$

Solution for two dimensions

We can extend this trick to two dimensions.

Let $A$ be the given rectangle and $B$ the precalculated rectangle. Which values do we need in $B$ to be able to calculate the sum of subrectangles efficiently? As we can see in the image above the yellow rectangle can be calculated with the four other rectangles (they represent the sum of the covered fields). Those four rectangles have their upper left corner in the upper left corner of $A$ . This is exactly the same as in the one dimensioal case where all ranges touched the left end.

In $B$ we calculate in each field the sum of the subrectangle from the upper left corner to the current field. This can be done efficiently with several approaches. One possibility is to walk through the rectangle row by row from left to right and set $b_{y,x} = b_{y-1,x} + b_{y,x-1} - b_{y-1,x-1} + a_{y-1,x-1}$ . The subtraction is necessary as we would count this subrectangle twice otherwise.

We can now calculate the sum of a subrectangle from $(i_x|i_y)$ to $(j_x|j_y)$ with the formula $b_{j_y+1,j_x+1} - b_{i_y,j_x+1} - b_{j_y+1,i_x} + b_{i_y,i_x}$ . As in the one dimensional case we take a range too big (going to $(j_x|j_y)$ ) and then subtract the things we added too much (the left and upper rectangle). Unfortunately we subtracted the range to $(i_x-1|i_y-1)$ twice so we have to add it again. Note that we used different formulas for precalculation and queries.

The algorithm can be implemented as follows:

# precalculation
b = [[0] * (width + 1)] * (height + 1)
for y in range(1, height+1):
  for x in range(1, width+1):
    b[y][x] = b[y-1][x] + b[y][x-1] - b[y-1][x-1] + a[y-1][x-1]

# queries
for segment in segments:
  print(b[segment.y2+1][segment.x2+1]
        - b[segment.y1][segment.x2+1]
        - b[segment.y2+1][segment.x1]
        + b[segment.y1][segment.x1])

Sample task

To wrap things up let’s look at a task we can solve with prefix sums.

To come up with a better weather forecast temperatures are measured on a regular basis. A grid will be placed on top of the map and in each field the temperature at this location is noted. To produce the forecast average temperatures of many different, rectangular subregions are needed.

We could for each query sum up all values in the region and divide by the number of fields. But we just learned a better approach: we calculate the prefix sums and can answer each query in constant time (i.e. it does not matter how large the region or the whole map is).

Often prefix sums are hidden a bit and they are only a building block for the complete solution. Therefore it is very helpful to keep them in mind while solving tasks.