Prefix sum

Written by Benjamin Schmid. Translated by Benjamin Schmid.

In this article we will have a look at prefix sums. We will start with the one dimensional case and expand later to two dimensions. This trick is easy to use but can be used as a powerful building block for many problems.

Problem

What kind of problems can be solved with a prefix sum? We are given a table of numbers and we have to calculate the sum in a part of the table very often. Or in other words: Given an M×NM\times N rectangle with numbers. Calculate the sum of all numbers for QQ different subrectangles.

Let’s first have a look at the one dimensional case where we have a list with NN numbers. The first approach is to sum up all numbers for the queried range. Something similar to this:

for segment in segments: # Q Queries
  sum = 0
  for i in range(segment.start, segment.end+1):
    sum += list_of_numbers[i]
  print(sum)

This is too slow if we have many queries because each of the QQ queries takes up to NN operations. The algorithm runs in O(NQ)\mathcal{O}(N\cdot Q)

Solution

The trick is do more work at the beginning and precalculate some values. With those additional informations we can answer the queries faster. Instead of summing up all numbers in O(N)\mathcal{O}(N) for each query we can then calculate it in O(1)\mathcal{O}(1). Let’s find out how this works.

The list AA with the elements a0a_{0}, a1a_1, a2a_2, a3a_3, …, aN1a_{N-1} contains the numbers given in the input. To calculate the sum in the range xx to yy we can sum up all numbers from the beginning up to yy and then subtract the numbers from the beginning to x1x-1. If we have several ranges all ending at yy the first sum (from the beginning to yy) will not change. Only the ones we subtract (beginning to x1x-1) will change. Something similar holds if xx is not changed but only yy. This looks very good to precalculate something!

We will precalculate a list BB such that bib_i corresponds to the sum of the first ii numbers in the list AA (thus the sum from the beginning up to and exluding ii). Now we can use the proposed algorithm: the sum in the range xx to yy is exactly by+1bxb_{y+1} - b_{x}. Why does this work?

Let’s look for example at the range 363 - 6. The sum we want to calculate is a3+a4+a5+a6a_3 + a_4 + a_5 + a_6. Then we have b7=a0+a1+a2+a3+a4+a5+a6b_7 = a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + a_6 and b3=a0+a1+a2b_3 = a_0 + a_1 + a_2. If we compute b7b3b_7 - b_3 we get b7b3=b_7 - b_3 = (a0+a1+a2+a3+a4+a5+a6)(a0+a1+a2)=(a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + a_6) - (a_0 + a_1 + a_2) = a3+a4+a5+a6a_3 + a_4 + a_5 + a_6. This is exactly the sum we are looking for. We can remove the elements we added “too much” by subtracting b3b_3.

This can be implemented like this:

# precalculation
b = [0] * (len(a)+1)
# b[0] = 0
for i in range(1, len(a)+1):
  b[i] = b[i-1] + a[i-1]

# queries
for segment in segments:
    print(b[segment.end+1] - b[segment.start])

The precalculation can be done in O(N)\mathcal{O}(N) and all queries in O(Q)\mathcal{O}(Q) (as we can do one query in constant time). We improved the original solution from O(NQ)\mathcal{O}(N\cdot Q) to O(N+Q)\mathcal{O}(N+Q)

Solution for two dimensions

We can extend this trick to two dimensions.

Let AA be the given rectangle and BB the precalculated rectangle. Which values do we need in BB to be able to calculate the sum of subrectangles efficiently? As we can see in the image above the yellow rectangle can be calculated with the four other rectangles (they represent the sum of the covered fields). Those four rectangles have their upper left corner in the upper left corner of AA. This is exactly the same as in the one dimensioal case where all ranges touched the left end.

In BB we calculate in each field the sum of the subrectangle from the upper left corner to the current field. This can be done efficiently with several approaches. One possibility is to walk through the rectangle row by row from left to right and set by,x=by1,x+by,x1by1,x1+ay1,x1b_{y,x} = b_{y-1,x} + b_{y,x-1} - b_{y-1,x-1} + a_{y-1,x-1}. The subtraction is necessary as we would count this subrectangle twice otherwise.

We can now calculate the sum of a subrectangle from (ixiy)(i_x|i_y) to (jxjy)(j_x|j_y) with the formula bjy+1,jx+1biy,jx+1bjy+1,ix+biy,ixb_{j_y+1,j_x+1} - b_{i_y,j_x+1} - b_{j_y+1,i_x} + b_{i_y,i_x}. As in the one dimensional case we take a range too big (going to (jxjy)(j_x|j_y)) and then subtract the things we added too much (the left and upper rectangle). Unfortunately we subtracted the range to (ix1iy1)(i_x-1|i_y-1) twice so we have to add it again. Note that we used different formulas for precalculation and queries.

The algorithm can be implemented as follows:

# precalculation
b = [[0] * (width + 1)] * (height + 1)
for y in range(1, height+1):
  for x in range(1, width+1):
    b[y][x] = b[y-1][x] + b[y][x-1] - b[y-1][x-1] + a[y-1][x-1]

# queries
for segment in segments:
  print(b[segment.y2+1][segment.x2+1]
        - b[segment.y1][segment.x2+1]
        - b[segment.y2+1][segment.x1]
        + b[segment.y1][segment.x1])

Sample task

To wrap things up let’s look at a task we can solve with prefix sums.

To come up with a better weather forecast temperatures are measured on a regular basis. A grid will be placed on top of the map and in each field the temperature at this location is noted. To produce the forecast average temperatures of many different, rectangular subregions are needed.

We could for each query sum up all values in the region and divide by the number of fields. But we just learned a better approach: we calculate the prefix sums and can answer each query in constant time (i.e. it does not matter how large the region or the whole map is).

Often prefix sums are hidden a bit and they are only a building block for the complete solution. Therefore it is very helpful to keep them in mind while solving tasks.