ML/AI/SW Developer

Week 7 - Day 2

1. Personal Study

2. Elective Assignment

2.1 Natural Language Processing

  • Achieve a BLEU score of 25 or higher
    • epoch: 10
    • Loss: label_smoothed_cross_entropy_with_alignment
      • A penalty on target indices that show up too often in the gold alignments?
      • Dataset class definition
      • Given the alignment tensor of shape [:, 2] holding source-target index pairs, the weight vector is the reciprocal of each target index's frequency
        • E.g. alignments = [[5, 7], [2, 3], [1, 3], [4, 2]]
        • weight vector = [1., 0.5, 0.5, 1.] (target indices 7, 3, 3, 2 occur 1, 2, 2, 1 times; see the sanity check after the code below)
        # Forming the alignment index pairs when the Dataset collates a batch
        alignments = [
            alignment + offset
            for align_idx, offset, src_len, tgt_len in zip(sort_order, offsets, src_lengths, tgt_lengths)
            for alignment in [samples[align_idx]['alignment'].view(-1, 2)]
            if check_alignment(alignment, src_len, tgt_len)
        ]
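
      • Sanity check: a toy sketch of my own (not fairseq's actual helper) that recomputes the inverse-frequency weights for the example above
        import torch

        # Target indices sit in column 1 of the (src, tgt) pairs.
        alignments = torch.tensor([[5, 7], [2, 3], [1, 3], [4, 2]])
        tgt_idx = alignments[:, 1]                            # [7, 3, 3, 2]
        _, inv, counts = torch.unique(tgt_idx, return_inverse=True, return_counts=True)
        align_weights = 1.0 / counts[inv].float()             # [1.0, 0.5, 0.5, 1.0]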
      
        # label_smoothed_cross_entropy_with_alignment => forward function
        def forward(self, model, sample, reduce=True):
            """Compute the loss for the given sample.
            Returns a tuple with three elements:
            1) the loss
            2) the sample size, which is used as the denominator for the gradient
            3) logging outputs to display while training
            """
            net_output = model(**sample['net_input'])
            loss, nll_loss = self.compute_loss(model, net_output, sample, reduce=reduce)
            sample_size = sample['target'].size(0) if self.args.sentence_avg else sample['ntokens']
            logging_output = {
                'loss': utils.item(loss.data) if reduce else loss.data,
                'nll_loss': utils.item(nll_loss.data) if reduce else nll_loss.data,
                'ntokens': sample['ntokens'],
                'nsentences': sample['target'].size(0),
                'sample_size': sample_size,
            }
      
            alignment_loss = None
      
            # Compute alignment loss only for training set and non dummy batches.
            if 'alignments' in sample and sample['alignments'] is not None:
                alignment_loss = self.compute_alignment_loss(sample, net_output)
      
            if alignment_loss is not None:
                logging_output['alignment_loss'] = utils.item(alignment_loss.data)
                loss += self.alignment_lambda * alignment_loss
      
            return loss, sample_size, logging_output
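
      • self.compute_loss above does the label smoothing; a simplified sketch of my own, based on fairseq's label_smoothed_nll_loss (padding and reduction options omitted)
        import torch

        def label_smoothed_nll_loss(lprobs, target, epsilon):
            # lprobs: (N, V) log-probabilities, target: (N,) gold indices
            nll_loss = -lprobs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
            smooth_loss = -lprobs.sum(dim=-1)   # epsilon mass spread over the vocab
            eps_i = epsilon / lprobs.size(-1)
            # Blend the one-hot NLL with the uniform smoothing term.
            return ((1.0 - epsilon) * nll_loss + eps_i * smooth_loss).sum()

        lprobs = torch.log_softmax(torch.randn(5, 10), dim=-1)
        target = torch.randint(0, 10, (5,))
        loss = label_smoothed_nll_loss(lprobs, target, epsilon=0.1)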
      
        def compute_alignment_loss(self, sample, net_output):
            attn_prob = net_output[1]['attn']
            # batch size, target size, source size
            bsz, tgt_sz, src_sz = attn_prob.shape
            attn = attn_prob.view(bsz * tgt_sz, src_sz)
      
            align = sample['alignments']
            align_weights = sample['align_weights'].float()
      
            if len(align) > 0:
                # Alignment loss computation. align (shape [:, 2]) contains the src-tgt index pairs corresponding to
                # the alignments. align_weights (shape [:]) contains the 1 / frequency of a tgt index for normalizing.
                loss = -((attn[align[:, 1][:, None], align[:, 0][:, None]]).log() * align_weights[:, None]).sum()
            else:
                return None
      
            return loss
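
      • Toy check of the indexing above, with made-up numbers (bsz=1, tgt_sz=2, src_sz=3)
        import torch

        attn = torch.tensor([[0.7, 0.2, 0.1],   # attention row of tgt position 0
                             [0.1, 0.1, 0.8]])  # attention row of tgt position 1
        align = torch.tensor([[0, 0],           # src 0 <-> tgt 0
                              [2, 1]])          # src 2 <-> tgt 1
        align_weights = torch.tensor([1.0, 1.0])

        # Rows are picked by the target index, columns by the source index,
        # so this gathers attn[0, 0] = 0.7 and attn[1, 2] = 0.8.
        loss = -((attn[align[:, 1][:, None], align[:, 0][:, None]]).log()
                 * align_weights[:, None]).sum()  # -(log 0.7 + log 0.8) ≈ 0.58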
      
    • Optimizer: adamax
    • clip_norm: 0.2
      • Clips the gradient norm to stabilize training (see the sketch at the end of this list)
  • Reference links
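
  • A minimal sketch of what clip_norm does, on a toy model of my own (not the assignment code): Adamax plus gradient-norm clipping at 0.2
    import torch

    model = torch.nn.Linear(16, 4)
    optimizer = torch.optim.Adamax(model.parameters(), lr=2e-3)

    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    loss = torch.nn.functional.cross_entropy(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    # Rescale gradients so their global L2 norm is at most 0.2,
    # preventing exploding updates and stabilizing training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.2)
    optimizer.step()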

3. Things to Look Up

  • Masked Language Model?? => How does it give bidirectional context? (rough sketch at the end of this list)
  • BERT
    • Packed sentence embedding?
    • [CLS] - Classification embedding?
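
  • Rough sketch of the Masked LM idea (toy tensors, assumed token ids): because the model predicts a masked token from both its left and right context, the objective is inherently bidirectional
    import torch

    MASK_ID = 103                            # assumed [MASK] token id
    tokens = torch.tensor([101, 42, 17, 96, 55, 102])

    mask = torch.rand(tokens.shape) < 0.15   # mask ~15% of positions
    inputs = tokens.clone()
    inputs[mask] = MASK_ID                   # corrupt the input
    labels = tokens.clone()
    labels[~mask] = -100                     # loss only at masked positions (ignore_index)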