PyTorch layer initialization functions

Updated: December 31, 2021

When building models I rarely pay attention to initialization, and I didn't really know how it is done. So in this post I want to look at how the weight and bias are initialized when a Linear layer is declared.

Class Linear

To examine the Linear layer module, here is the corresponding PyTorch source code.

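import math

import torch
import torch.nn.functional as F
from torch import Tensor
from torch.nn import Module, Parameter, init

class Linear(Module):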
    __constants__ = ['in_features', 'out_features']
    in_features: int
    out_features: int
    weight: Tensor

    def __init__(self, in_features: int, out_features: int, 
                bias: bool = True, device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(
            torch.empty((out_features, in_features), **factory_kwargs))
        if bias:
            self.bias = Parameter(
                torch.empty(out_features, **factory_kwargs))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self) -> str:
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )

When a Linear layer is declared, its __init__() function creates self.weight and self.bias as uninitialized tensors (torch.empty).

These parameters are then initialized by self.reset_parameters().
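A quick way to see this from the outside is to inspect a layer after construction. The sketch below uses only the public nn.Linear API; the layer sizes (3 and 5) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=3, out_features=5)

# weight is stored as (out_features, in_features); bias as (out_features,)
print(layer.weight.shape)  # torch.Size([5, 3])
print(layer.bias.shape)    # torch.Size([5])

# reset_parameters() draws fresh values in place, so the weights change
before = layer.weight.detach().clone()
layer.reset_parameters()
changed = not torch.equal(before, layer.weight)
print(changed)

# with bias=False, "bias" is registered as None instead of a Parameter
no_bias = nn.Linear(3, 5, bias=False)
print(no_bias.bias)  # None
```

Because reset_parameters() is an ordinary method, calling it again is also a convenient way to re-initialize an existing layer.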

reset_parameters()

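def reset_parameters(self) -> None:
    # weight: Kaiming uniform with negative slope a = sqrt(5)
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        # fan_in = number of input features (weight.size(1) for a 2-D weight)
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        # bias: U(-1/sqrt(fan_in), 1/sqrt(fan_in)); guard against fan_in == 0
        bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
        init.uniform_(self.bias, -bound, bound)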
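To check that these few lines really are the whole story, the sketch below repeats the steps of reset_parameters() by hand with the same random seed and compares the result with a freshly built nn.Linear. It assumes the installed PyTorch version's Linear.reset_parameters matches the code quoted above (other releases may differ), and it calls the private helper init._calculate_fan_in_and_fan_out exactly as the quoted code does.

```python
import math
import torch
import torch.nn as nn
from torch.nn import init

in_features, out_features = 4, 3

# Path 1: let nn.Linear initialize itself
torch.manual_seed(42)
layer = nn.Linear(in_features, out_features)

# Path 2: repeat the steps of reset_parameters() by hand with the same seed
torch.manual_seed(42)
weight = torch.empty(out_features, in_features)
init.kaiming_uniform_(weight, a=math.sqrt(5))
fan_in, _ = init._calculate_fan_in_and_fan_out(weight)
bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
bias = torch.empty(out_features)
init.uniform_(bias, -bound, bound)

weight_match = torch.equal(layer.weight.detach(), weight)
bias_match = torch.equal(layer.bias.detach(), bias)
print(weight_match, bias_match)
```

Since torch.empty consumes no random numbers, both paths draw from the generator in the same order, which is why the values can be compared exactly.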

First, self.weight is initialized by kaiming_uniform_. Details on Kaiming initialization are easy to find, so I'll skip them here.
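One detail is worth working out, though. Based on the torch.nn.init documentation for kaiming_uniform_ with its defaults (mode='fan_in', nonlinearity='leaky_relu'), the uniform bound is gain × sqrt(3 / fan_in) with gain = sqrt(2 / (1 + a²)). Plugging in the a = sqrt(5) used above gives:

```latex
\begin{aligned}
\text{gain} &= \sqrt{\frac{2}{1 + a^2}}, \qquad
\text{bound} = \text{gain} \cdot \sqrt{\frac{3}{\text{fan\_in}}} \\
a = \sqrt{5} \;\Rightarrow\; \text{gain} &= \sqrt{\frac{2}{6}} = \frac{1}{\sqrt{3}}, \qquad
\text{bound} = \frac{1}{\sqrt{3}} \cdot \sqrt{\frac{3}{\text{fan\_in}}} = \frac{1}{\sqrt{\text{fan\_in}}}
\end{aligned}
```

So with this particular choice of a, the weight is drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), the same range the bias uses.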

Next, self.bias is initialized with uniform(-bound, bound), where bound is computed from the fan_in of the weight.
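The bounds can be checked empirically. This sketch builds a layer with an arbitrarily chosen fan_in of 64 and confirms that both the weight and the bias stay inside [-1/sqrt(fan_in), 1/sqrt(fan_in)]; a tiny tolerance is added only to absorb float rounding.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
fan_in = 64
layer = nn.Linear(fan_in, 32)

# expected bound for both weight (Kaiming, a = sqrt(5)) and bias: 1 / sqrt(64) = 0.125
bound = 1 / math.sqrt(fan_in)
w_max = layer.weight.detach().abs().max().item()
b_max = layer.bias.detach().abs().max().item()
print(bound, w_max, b_max)
```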

Initializing the bias

Because the bias is initialized with uniform(-bound, bound), I looked into it further. According to the answers to "In PyTorch how are layer weights and biases initialized by default?", the bias uses LeCun initialization.

The idea behind LeCun initialization is to scale the distribution of the initial values according to fan_in.
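Why scale by fan_in? A small experiment makes the reasoning concrete. Assuming independent, unit-variance inputs, the variance of y = Wx grows with fan_in when W has unit variance, but stays near 1 when W has standard deviation 1/sqrt(fan_in). This sketch uses normal weights for simplicity rather than PyTorch's uniform draw, and the sizes (256 outputs, batch of 512) are arbitrary.

```python
import torch

torch.manual_seed(0)
batch = 512
results = {}

for fan_in in (16, 256, 4096):
    x = torch.randn(fan_in, batch)       # unit-variance inputs
    w_raw = torch.randn(256, fan_in)     # std 1, ignores fan_in
    w_scaled = w_raw / fan_in ** 0.5     # std 1/sqrt(fan_in), LeCun-style scaling
    raw_var = (w_raw @ x).var().item()
    scaled_var = (w_scaled @ x).var().item()
    results[fan_in] = (raw_var, scaled_var)
    print(fan_in, raw_var, scaled_var)
```

The unscaled variance tracks fan_in, while the scaled one stays close to 1 for every layer width.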

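Before computing bound, the function _calculate_fan_in_and_fan_out() computes fan_in. fan_in is the number of neurons in the input layer and fan_out is the number of neurons in the output layer; for a Linear weight of shape (out_features, in_features), that means fan_in = in_features and fan_out = out_features.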
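The helper can be called directly to see what it returns. Note that _calculate_fan_in_and_fan_out is a private function of torch.nn.init (the leading underscore means it may change between releases). The Conv2d case is included only to show that for higher-dimensional weights the receptive field size (kernel height × width) is folded into both values.

```python
import torch.nn as nn
from torch.nn import init

# Linear weight has shape (out_features, in_features) = (5, 3)
linear = nn.Linear(in_features=3, out_features=5)
lin_fan_in, lin_fan_out = init._calculate_fan_in_and_fan_out(linear.weight)
print(lin_fan_in, lin_fan_out)  # 3 5

# Conv2d weight has shape (out_channels, in_channels, kh, kw) = (8, 3, 3, 3)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
conv_fan_in, conv_fan_out = init._calculate_fan_in_and_fan_out(conv.weight)
print(conv_fan_in, conv_fan_out)  # 27 72  (3*3*3, 8*3*3)
```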

Section 4.6 of Efficient BackProp, the paper behind LeCun initialization, draws the initial weights from a uniform distribution with mean 0 and standard deviation sqrt(1/fan_in).
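One nuance: the paper fixes the standard deviation at 1/sqrt(fan_in), whereas PyTorch uses 1/sqrt(fan_in) as the bound of the uniform range. Since the standard deviation of U(-b, b) is b/sqrt(3), PyTorch's bias ends up narrower by a factor of sqrt(3); what the two share is the 1/sqrt(fan_in) scaling. The sketch below checks this numerically with an arbitrary fan_in of 100.

```python
import math
import torch
from torch.nn import init

torch.manual_seed(0)
fan_in = 100
bound = 1 / math.sqrt(fan_in)

# draw many values the same way Linear initializes its bias
samples = torch.empty(1_000_000)
init.uniform_(samples, -bound, bound)

empirical_std = samples.std().item()
uniform_std = bound / math.sqrt(3)   # std of U(-b, b) is b / sqrt(3), about 0.0577 here
paper_std = 1 / math.sqrt(fan_in)    # sigma = fan_in ** -0.5 as described above, 0.1 here
print(empirical_std, uniform_std, paper_std)
```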

So in nn.Linear, the weight is initialized with Kaiming initialization and the bias with LeCun initialization, and that completes reset_parameters().

Other initialization methods

There are many initialization methods, and the default one differs from module to module in torch.

For example, the Embedding layer (nn.Embedding) initializes its weight from a standard normal distribution, N(0, 1).
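This can be observed directly. The sketch below builds an embedding table (the sizes and padding_idx=0 are arbitrary choices) and checks the statistics of its rows. When padding_idx is given, nn.Embedding also zeroes that row after the normal draw.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(num_embeddings=1000, embedding_dim=64, padding_idx=0)
w = emb.weight.detach()

# rows other than padding_idx come from N(0, 1)
row_mean = w[1:].mean().item()
row_std = w[1:].std().item()
print(row_mean, row_std)

# the padding_idx row is filled with zeros
pad_sum = w[0].abs().sum().item()
print(pad_sum)  # 0.0
```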

Other methods such as Xavier and He (Kaiming) initialization are also available in torch.nn.init.
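A common pattern for overriding the defaults is to write a small function and pass it to Module.apply(), which visits every submodule recursively. The helper name init_weights and the Xavier-plus-zero-bias scheme below are hypothetical choices for illustration, not anything PyTorch applies by default; the last line shows He (Kaiming) normal initialization applied to a single weight tensor.

```python
import math
import torch
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # hypothetical helper: Xavier-uniform weights and zero biases for every Linear layer
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
model.apply(init_weights)  # calls init_weights on each submodule

first = model[0]
xavier_bound = math.sqrt(6 / (20 + 50))  # gain 1: sqrt(6 / (fan_in + fan_out))

# He (Kaiming) normal for a layer followed by ReLU: std = sqrt(2 / fan_in)
nn.init.kaiming_normal_(model[2].weight, nonlinearity="relu")
he_std = math.sqrt(2 / 50)
```

Because apply() dispatches on the module type, the same function can treat Linear, Conv, and Embedding layers differently if needed.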

I don't usually think about initialization when using these layers, but I hope this walk through the code gives you a sense of how it is done.

goooose
