트라이 (Trie)

Updated: March 27, 2021

트라이 이해

트라이(Trie)는 문자열 집합을 표현하는 아래 그림과 같은 자료구조이다.

출처 : https://aerocode.net/377

기본적으로 K진 트리구조를 이루고 있으며 문자열의 prefix를 빠르게 찾을 수 있다.

그러나 많은 공간이 필요하다는 단점이 존재한다. 예를들어 알파벳의 경우 총 26자리의 배열을 각 노드마다 확보해야 한다.

따라서 공간 복잡도는 O(포인터 크기 * 포인터 배열 개수 * 트라이 노드 개수)가 된다.

트라이 구현

#include <iostream>
#include <vector>
#include <cstring>
using namespace std;

const int ALPHABETS = 26;
int toNumber(char ch) { return ch-'A'; }

// 트라이의 한 노드를 나타내는 객체
struct TrieNode {
    TrieNode* children[ALPHABETS];

    // 이 노드가 종료 노드인가?
    bool terminal;
    TrieNode() : terminal(false) {
        memset(children,0,sizeof(children));
    }
    ~TrieNode() {
        for(int i=0;i<ALPHABETS;i++) 
            if(children[i]) delete children[i];		
    }
	// 이 노드를 루트로 하는 트라이에 문자열 key를 추가한다.

    void insert(const char* key) {
        // 문자열이 끝나면 terminal만 참으로 바꾸고 종료
        if(*key==0) terminal=true;
        else {
            int next=toNumber(*key);
            // 해당 자식 노드가 없다면 생성한다.
            if(children[next]==NULL) children[next]=new TrieNode();
            // 해당 자식 노드로 재귀호출
            children[next]->insert(key+1);
        }
    }

    // 이 노드를 루트로 하는 트라이에 문자열 key와 대응되는 노드를 찾는다.
    // 없으면 NULL을 반환
    TrieNode* find(const char* key) {
        if(*key==0) return this;
        
        int next=toNumber(*key);
        
        if(children[next]==NULL) return NULL;

        return children[next]->find(key+1);
    }
};

트라이 성능

$N$개의 원소가 들어있는 문자열 집합에서 길이 $M$인 접두사(prefix)를 찾기위한 시간 복잡도는 $O(M)$을 가진다. 그러나 메모리가 너무 많이 필요하기 때문에 대회에 출제된다면 문자열의 개수가 적은 경우가 나온다.

goooose

트라이 (Trie)

트라이 이해

트라이 구현

트라이 성능

Comments

You May Also Enjoy

Spark on Kubernetes

Python으로 Diffusion 바닥부터 구현하기[2] (im2col, UNet, DDPM)

Python으로 Diffusion 바닥부터 구현하기[1] (ResidualBlock, AttentionBlock, UpsampleBlock)

Model Context Protocol(MCP)