스파스 테이블 ( Sparse table )

티스토리 뷰

범위 쿼리 처리 ( Range Query )

스파스 테이블 ( Sparse table )

RiKang 2017. 11. 2. 20:26

Table of Contents

1. 개요

스파스 테이블은 배열 Array[1...N], 함수 Function, 범위 [L, R] 이 주어질 때 Function(Array[L], Array[L+1], ..., Array[R]) 계산을 빠르게 하기 위한 자료구조입니다. 간단하게 예를 들면, 'A[3] ~ A[7] 사이의 최소값을 구하라.'같은 식입니다. 여기에서 함수는 최소값 찾기, L = 3, R = 7이 되겠지요.

스파스 테이블은 아래의 두 조건이 성립할 때 사용할 수 있습니다.

조건 1) Array에 저장된 값이 변하지 않야야 한다.

조건 2) Function은 결합 법칙이 성립해야 한다. (즉, F(a,b,c) = F(F(a,b),C) = F(a,F(b,c))가 성립해야 한다.)

이 조건들이 만족 되면 Sparse table 구성을 O(N log N), 쿼리 1개 처리를 O(log N)만에 할 수 있습니다. Function 에 따라서는, 쿼리 처리를 O(1) 만에 할 수도 있습니다. (쿼리 처리를 O(1)까지 줄일 수 있는 함수들은 대표적으로 F(a,b,c) = F(F(a,b),F(b,c)) 가 성립하는 함수들, 즉, min 함수와 max 함수가 있습니다.)

이 O(1) 이라는 시간 복잡도 때문에 Segment Tree 대신 Sparse table 을 이용해야만 하는 상황이 발생하기도 합니다. Segment tree 와 비교해 보면 '조건 1' 때문에 범용성이 떨어지는 게 단점, 코딩의 간편함(?)과 조금 더 빠른 쿼리 처리(특히, O(1)인 경우)가 장점입니다.

2. 1D sparse table

대부분의 Array 와 Function에 대해, Sparse table 은 아래와 같이 정의할 수 있습니다.

이 Table 은 간단한 동적 계획법으로 구할 수 있습니다.

위 두 식은 Table 의 정의에 따른 것이므로 자명하고, Function은 결합 법칙이 성립하는 함수이기 때문에

이와 같은 식이 성립하게 되지요. 그러므로 아래와 같이 O(N log N) 만에 Table 구성을 할 수 있습니다.

1
2
3
4
5

for i = 0 ... N-1
    table[i][0] = arr[i]
for j = 1 ... log_N
    for i = 0 ... N-2^j
        table[i][j] = function(table[i][j-1], table[i+2^(j-1)][j-1]);

cs

이제 Table 을 이용해 범위 [L,R] 이 주어졌을 때 Function(Array[L], Array[L+1], ..., Array[R]) 을 구하는 방법을 알아보겠습니다. 이를 위해 2^j <= R-L+1 (범위의 길이) 을 만족하는 최댓값 j 를 생각해 봐야 합니다. 그러면 table 의 정의에 의해

Function(Array[L], Array[L+1], ..., Array[R]) = Function( table[L][j], Array[L+2^j], Array(L+2^j+1), ..., Array[R]) 가 됩니다.

L부터 L+2^j-1 구간, 즉, 앞의 2^j 길이 구간을 table[L][j] 로 한 번에 계산할 수 있는 것입니다.

table[L][j]는 미리 구했기 때문에, 이제 Function( Array[L+2^j], Array(L+2^j+1), ..., Array[R]) 만 구하면 되는데, 이는 재귀적으로 위의 과정을 반복해서 구하면 됩니다. 2^j' <= R-(L+2^j)+1 (남은 범위의 길이) 을 만족하는 j'를 다시 찾는 것이지요.

여기에서 남은 범위의 길이가 2^j 미만이라는 것은 쉽게 알 수 있는데, ( 증명 법 : j의 정의에 의해 2^(j+1) > R - L + 1 -> 2^j > R - (L+2^j) + 1 ) 이런 점 때문에 새로운 범위에 대한 j, 즉 j'를 구할 때 ( 2^j' <= R - (L+2^j) + 1 를 만족하는 최댓값 j), log N 부터 따져 볼 필요없이 이전 범위에서의 j 부터 따져 보면 됩니다. 이를 이용해 아래와 같이 코딩하면 O(log N) 만에 쿼리를 처리할 수 있습니다.

1
2
3
4

for j = log_N ... 0
    if 2^j <= R-L+1
                answer = function(answer, table[L][j]);
                L = L + 2^j;

cs

위의 두 가지를 종합하면 아래와 같이 구현할 수 있습니다.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

// code by RiKang, weeklyps.com
typedef int node_type;
 
namespace MIN_ver{
    node_type INIT = 1000000000;
    node_type node_calc(node_type n1, node_type n2){return min(n1,n2);}
}
 
namespace MAX_ver{
    node_type INIT = 0;
    node_type node_calc(node_type n1, node_type n2){return max(n1,n2);}
}
 
using namespace MAX_ver;
 
const int N = 1005, log_N = 11;
 
struct sparse_table{
    int tn;
    node_type table[N][log_N];
    node_type arr[N];
    void init(int leng){
        tn = leng;
        for(int i=0; i<tn; i++)
            table[i][0] = arr[i];
        for(int j=1; j<log_N; j++)
            for(int i=0; i<tn-(1<<j); i++)
                table[i][j] = node_calc(table[i][j-1], table[i+(1<<(j-1))][j-1]);
    }
    node_type query(int L, int R){
        node_type ans=INIT;
        for(int j = log_N-1; j>=0; j--)
            if((1<<j)<=R-L+1){
                ans = node_calc(ans, table[L][j]);
                L += (1<<j);
            }
        return ans;
    }
}T;
Colored by Color Scripter

cs

이 Sparse table 의 응용 버전(?)으로는 트리에서 공통 조상 찾기 (LCA) 가 대표적입니다.

3. 쿼리를 O(1)으로 처리하기

min 이나 max 함수같이 F(a,b,c) = F(F(a,b), F(b,c)) 가 성립한다고 해보겠습니다. 그러면 2^j <= R-L+1 (범위의 길이) 을 만족하는 최댓값 j 를 찾았을 때 아래와 같은 식이 성립합니다.

table[L][j] 이 L ~ L+2^j-1, table[R-2^j+1][j] 가 R-2^j+1 ~ R 을 커버하고 있는데, 이 두 구간이 L~R 전체 구간을 커버하고 있음은 쉽게 증명할 수 있습니다. ( * (2^j)+(2^j) > R-L+1 )

또한 F(a,b,c) = F(F(a,b), F(b,c)) 에 의해 가운데에 중복된 구간도 문제되지 않습니다. 전처리로 모든 길이에 대해 j 를 미리 구해 놓으면 j를 바로 알 수 있으므로 j를 구하는 것도 O(1)에 가능합니다.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41

// code by RiKang, weeklyps.com
typedef int node_type;
 
namespace MIN_ver{
    node_type INIT = 1000000000;
    node_type node_calc(node_type n1, node_type n2){return min(n1,n2);}
}
 
namespace MAX_ver{
    node_type INIT = 0;
    node_type node_calc(node_type n1, node_type n2){return max(n1,n2);}
}
 
using namespace MAX_ver;
 
const int N = 1005, log_N = 11;
 
struct sparse_table{
    int tn;
    node_type table[N][log_N];
    node_type arr[N];
    int c[N];
    void init(int leng){
        tn = leng;
        for(int i=0; i<tn; i++)
            table[i][0] = arr[i];
        for(int j=1; j<log_N; j++)
            for(int i=0; i<tn-(1<<j); i++)
                table[i][j] = node_calc(table[i][j-1], table[i+(1<<(j-1))][j-1]);
        c[0] = 0;
        for(int i=1; i<=tn; i++){
            c[i] = c[i-1];
            while((1<<c[i])<=i) c[i]++;
            c[i]--;
        }
    }
    node_type query(int L, int R){
        int c1 = c[R-L+1];
        return node_calc(table[L][c1],table[R-(1<<c1)+1][c1]);
    }
}T;
Colored by Color Scripter

cs

4. 2D sparse table

Array[N][M] 에 대해 직사각형 형태의 쿼리가 들어오면, 그 안의 값들을 모두 Function 한 값을 찾는 것이 2D sparse table의 과제입니다. 이를 위해선 1차원과 같은 원리로

Table[i][j][di][dj] = Function( (i, j) ~ ( i + 2^di - 1, j + 2^dj - 1) ) 을 구하면 됩니다.

di = 0 일 때 : N개의 1D sparse table 을 만드는 것과 동일한 과정입니다.

di > 0 일 때 : j 와 dj를 상수로 보면 이 것도 1D sparse table 을 만드는 것과 동일한 과정입니다.

테이블 구성 시간 복잡도는 O( N M log N log M), 쿼리는 O(1) 입니다. 쿼리가 O(1)이 안되는 경우엔 2D sparse table 을 쓸 이유가 딱히 없기에 생략하겠습니다.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61

// code by RiKang, weeklyps.com
typedef short node_type;
 
namespace MIN_ver{
    node_type INIT = 10000;
    node_type node_calc(node_type n1, node_type n2){return min(n1,n2);}
}
 
namespace MAX_ver{
    node_type INIT = 0;
    node_type node_calc(node_type n1, node_type n2){return max(n1,n2);}
}
 
using namespace MAX_ver;
 
const int N = 1005, log_N = 11, M = 1005, log_M = 11;
 
struct sparse_table{ // 0-based
    int n1,m1;
    node_type T[N][M][log_N][log_M];
    node_type res[N][M]; // res : 기존 배열
    int c[N+M];
    void init(int n2,int m2){
        n1=n2,m1=m2;
    c[0]=0;
        for(int i=1; i<N+M; i++){
            c[i] = c[i-1];
            while((1<<c[i])<=i) c[i]++;
            c[i]--;
        }
        for(int i=0; i<n1; i++)
            for(int j=0; j<m1; j++)
                T[i][j][0][0] = res[i][j];
        for(int di=0; di<log_N; di++){
            if(di==0){
                for(int dj=1; dj<log_M; dj++)
                    for(int i=0; i<n1; i++)
                        for(int j=0; j+(1<<dj)<=m1; j++)
                            T[i][j][di][dj] = node_calc(T[i][j][di][dj-1], T[i][j+(1<<(dj-1))][di][dj-1]);
            }
            else{
                for(int dj=0; dj<log_M; dj++)
                    for(int i=0; i+(1<<di)<=n1; i++)
                        for(int j=0; j+(1<<dj)<=m1; j++)
                            T[i][j][di][dj] = node_calc(T[i][j][di-1][dj], T[i+(1<<(di-1))][j][di-1][dj]);
            }
        }
    }
    node_type query(int x1, int y1,int x2, int y2){
        if(x1>x2 || y1>y2) return INIT;
        int d1 = c[x2-x1+1];
        int d2 = c[y2-y1+1];
        int x3 = x2-(1<<d1)+1;
        int y3 = y2-(1<<d2)+1;
        node_type ret = T[x1][y1][d1][d2];
        ret = node_calc(ret, T[x1][y3][d1][d2]);
        ret = node_calc(ret, T[x3][y1][d1][d2]);
        ret = node_calc(ret, T[x3][y3][d1][d2]);
        return ret;
    }
}table;
Colored by Color Scripter

cs