KyungHyun Lim

ML/AI/SW Developer

Week7 - Day2

Sep 14, 2021

ustage

1. Individual study

GPT-1, BERT

2. Optional assignment

2.1 Natural Language Processing

Goal: reach a BLEU score of 25 or higher

epoch: 10

Loss: label_smoothed_cross_entropy_with_alignment

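A penalty for targets that are supplied as the answer too often?
Dataset class definition
Given a tensor of shape [:, 2] holding source-target index pairs, the weight vector is computed as the reciprocal of how often each target index occurs.
- E.g. alignments = [[5, 7], [2, 3], [1, 3], [4, 2]]
- weight vector = [1., 0.5, 0.5, 1.] (target indices 7, 3, 3, 2 occur 1, 2, 2, 1 times)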
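The weighting rule above can be sketched in plain Python. This is only an illustration of the inverse-frequency idea, not the fairseq implementation; the helper name `inverse_frequency_weights` is made up for this note.

```python
from collections import Counter


def inverse_frequency_weights(alignments):
    """Weight each (src, tgt) pair by 1 / (number of pairs that share its tgt index).

    Hypothetical helper for illustration; fairseq computes this on tensors.
    """
    # Count how many alignment pairs point at each target index.
    tgt_counts = Counter(tgt for _, tgt in alignments)
    # Each pair gets the reciprocal of its target index's count.
    return [1.0 / tgt_counts[tgt] for _, tgt in alignments]


alignments = [[5, 7], [2, 3], [1, 3], [4, 2]]
weights = inverse_frequency_weights(alignments)
print(weights)  # [1.0, 0.5, 0.5, 1.0]
```

Target index 3 appears in two pairs, so each of those pairs contributes half as much, which keeps a target token with many aligned source tokens from dominating the loss.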

  # Part of the Dataset that defines alignments: builds the (src, tgt) index pairs
  alignments = [
      alignment + offset
      for align_idx, offset, src_len, tgt_len in zip(sort_order, offsets, src_lengths, tgt_lengths)
      for alignment in [samples[align_idx]['alignment'].view(-1, 2)]
      if check_alignment(alignment, src_len, tgt_len)
  ]

  # label_smoothed_cross_entropy_with_alignment => forward function
  def forward(self, model, sample, reduce=True):
      """Compute the loss for the given sample.
      Returns a tuple with three elements:
      1) the loss
      2) the sample size, which is used as the denominator for the gradient
      3) logging outputs to display while training
      """
      net_output = model(**sample['net_input'])
      loss, nll_loss = self.compute_loss(model, net_output, sample, reduce=reduce)
      sample_size = sample['target'].size(0) if self.args.sentence_avg else sample['ntokens']
      logging_output = {
          'loss': utils.item(loss.data) if reduce else loss.data,
          'nll_loss': utils.item(nll_loss.data) if reduce else nll_loss.data,
          'ntokens': sample['ntokens'],
          'nsentences': sample['target'].size(0),
          'sample_size': sample_size,
      }

      alignment_loss = None

      # Compute alignment loss only for training set and non dummy batches.
      if 'alignments' in sample and sample['alignments'] is not None:
          alignment_loss = self.compute_alignment_loss(sample, net_output)

      if alignment_loss is not None:
          logging_output['alignment_loss'] = utils.item(alignment_loss.data)
          loss += self.alignment_lambda * alignment_loss

      return loss, sample_size, logging_output

  def compute_alignment_loss(self, sample, net_output):
      attn_prob = net_output[1]['attn']
      # batch size, target size, source size
      bsz, tgt_sz, src_sz = attn_prob.shape
      attn = attn_prob.view(bsz * tgt_sz, src_sz)

      align = sample['alignments']
      align_weights = sample['align_weights'].float()

      if len(align) > 0:
          # Alignment loss computation. align (shape [:, 2]) contains the src-tgt index pairs corresponding to
          # the alignments. align_weights (shape [:]) contains the 1 / frequency of a tgt index for normalizing.
          loss = -((attn[align[:, 1][:, None], align[:, 0][:, None]]).log() * align_weights[:, None]).sum()
      else:
          return None

      return loss
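To check my reading of compute_alignment_loss, here is a small pure-Python sketch of the same sum: for every (src, tgt) pair, take the attention probability at row tgt and column src, take its log, scale it by the pair's weight, and negate the total. The function name `alignment_loss` and the toy numbers are assumptions for illustration; the real code works on flattened tensors of shape (bsz * tgt_sz, src_sz).

```python
import math


def alignment_loss(attn, align, align_weights):
    """Negative weighted log-likelihood of the aligned attention entries.

    attn:          list of rows, one per (flattened) target position;
                   each row is a distribution over source positions.
    align:         list of (src_idx, tgt_idx) pairs.
    align_weights: one weight per pair (1 / frequency of its tgt index).
    """
    return -sum(
        math.log(attn[tgt][src]) * w
        for (src, tgt), w in zip(align, align_weights)
    )


# Toy example: 2 target positions, 3 source positions.
attn = [
    [0.7, 0.2, 0.1],  # attention of target position 0 over the source
    [0.1, 0.8, 0.1],  # attention of target position 1 over the source
]
align = [(0, 0), (1, 1), (2, 1)]   # target 1 is aligned to two source tokens
align_weights = [1.0, 0.5, 0.5]    # so each of its pairs is weighted 0.5
loss = alignment_loss(attn, align, align_weights)
print(loss)
```

The loss shrinks as the attention mass on the aligned entries approaches 1, so this term nudges the attention toward the supplied alignments.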

Optimizer: adamax
Clip_Norm: 0.2
- Related to training stability: limits the size of the gradient (gradient clipping)
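A minimal sketch of what clipping by norm does, assuming Clip_Norm means clipping by the global L2 norm of all gradients. The function `clip_grad_norm` below is a hypothetical pure-Python stand-in, not the framework call.

```python
import math


def clip_grad_norm(grads, max_norm):
    """Rescale grads so their global L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        # Shrink every gradient by the same factor; the direction is unchanged.
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm


clipped, norm_before = clip_grad_norm([0.3, 0.4], max_norm=0.2)
print(norm_before, clipped)
```

Gradients whose norm is already below the threshold pass through unchanged, so only unusually large updates are damped.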

Reference links
- command line tools
- github-fairseq

3. Things to look up

Masked Language Model?? => How does it become bidirectional?
BERT
- Packed sentence embedding ?
- [CLS] - Classification embedding ?