트라이 (Trie)

Updated: March 27, 2021

트라이 이해

트라이(Trie)는 문자열 집합을 표현하는 아래 그림과 같은 자료구조이다.

출처 : https://aerocode.net/377

기본적으로 K진 트리구조를 이루고 있으며 문자열의 prefix를 빠르게 찾을 수 있다.

그러나 많은 공간이 필요하다는 단점이 존재한다. 예를들어 알파벳의 경우 총 26자리의 배열을 각 노드마다 확보해야 한다.

따라서 공간 복잡도는 O(포인터 크기 * 포인터 배열 개수 * 트라이 노드 개수)가 된다.

트라이 구현

#include <iostream>
#include <vector>
#include <cstring>
using namespace std;

const int ALPHABETS = 26;
int toNumber(char ch) { return ch-'A'; }

// 트라이의 한 노드를 나타내는 객체
struct TrieNode {
    TrieNode* children[ALPHABETS];

    // 이 노드가 종료 노드인가?
    bool terminal;
    TrieNode() : terminal(false) {
        memset(children,0,sizeof(children));
    }
    ~TrieNode() {
        for(int i=0;i<ALPHABETS;i++) 
            if(children[i]) delete children[i];		
    }
	// 이 노드를 루트로 하는 트라이에 문자열 key를 추가한다.

    void insert(const char* key) {
        // 문자열이 끝나면 terminal만 참으로 바꾸고 종료
        if(*key==0) terminal=true;
        else {
            int next=toNumber(*key);
            // 해당 자식 노드가 없다면 생성한다.
            if(children[next]==NULL) children[next]=new TrieNode();
            // 해당 자식 노드로 재귀호출
            children[next]->insert(key+1);
        }
    }

    // 이 노드를 루트로 하는 트라이에 문자열 key와 대응되는 노드를 찾는다.
    // 없으면 NULL을 반환
    TrieNode* find(const char* key) {
        if(*key==0) return this;
        
        int next=toNumber(*key);
        
        if(children[next]==NULL) return NULL;

        return children[next]->find(key+1);
    }
};

트라이 성능

$N$개의 원소가 들어있는 문자열 집합에서 길이 $M$인 접두사(prefix)를 찾기위한 시간 복잡도는 $O(M)$을 가진다. 그러나 메모리가 너무 많이 필요하기 때문에 대회에 출제된다면 문자열의 개수가 적은 경우가 나온다.

goooose

트라이 (Trie)

트라이 이해

트라이 구현

트라이 성능

Leave a comment

You may also enjoy

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Airflow task 디자인

Finetuning Large Language Models 정리

AI Village Capture the Flag @ DEFCON31 후기