Multivariable linear/logistic regression

KyungHyun Lim

ML/AI/SW Developer

Jul 26, 2021

ustage

1. Multivariable linear regression

Data
Q1    Q2    Q3    Final(y)
70    80    75    75
82    71    98    83.6
76    92    91    86.3
90    95    99    94.6

H(x) = w1 * Q1 + w2 * Q2 + w3 * Q3 + b
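The hypothesis is one dot product per row, so the whole table can be evaluated with a single matrix multiplication. A minimal sketch, assuming the four table rows as `x_train`; the values of `W` and `b` are made up for illustration and are not from the original notes:

```python
import torch

# Quiz scores from the table: one row per student, one column per quiz.
x_train = torch.FloatTensor([[70, 80, 75],
                             [82, 71, 98],
                             [76, 92, 91],
                             [90, 95, 99]])

# Illustrative parameters (assumed values, chosen only for this demonstration).
W = torch.FloatTensor([[0.3], [0.3], [0.4]])  # shape (3, 1): w1, w2, w3
b = torch.FloatTensor([1.0])

# Matrix form: (4, 3) @ (3, 1) + b -> (4, 1)
hyp_matrix = x_train.matmul(W) + b

# Term-by-term form: w1 * Q1 + w2 * Q2 + w3 * Q3 + b
hyp_manual = (W[0] * x_train[:, 0]
              + W[1] * x_train[:, 1]
              + W[2] * x_train[:, 2]
              + b).unsqueeze(1)

print(hyp_matrix.shape)                        # torch.Size([4, 1])
print(torch.allclose(hyp_matrix, hyp_manual))  # both forms give the same predictions
```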

for e in range(1, epoch + 1):
   hyp = x_train.matmul(W) + b  # H(x)

   cost = torch.mean((hyp - y_train) ** 2)

   optimizer.zero_grad()
   # Gradients accumulate across backward() calls, so they must be reset every iteration.
   cost.backward()
   optimizer.step()
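The loop assumes that `x_train`, `y_train`, `W`, `b`, `optimizer`, and `epoch` already exist. A self-contained sketch of one way to set them up, using the table data; the choice of plain SGD, the learning rate `1e-5`, and the zero initialization are assumptions, not settings stated in the notes:

```python
import torch
import torch.optim as optim

x_train = torch.FloatTensor([[70, 80, 75],
                             [82, 71, 98],
                             [76, 92, 91],
                             [90, 95, 99]])
y_train = torch.FloatTensor([[75], [83.6], [86.3], [94.6]])

# Parameters start at zero; requires_grad=True lets autograd track them.
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Assumed settings: plain SGD with a small learning rate because the inputs are unscaled.
optimizer = optim.SGD([W, b], lr=1e-5)
epoch = 20

costs = []
for e in range(1, epoch + 1):
    hyp = x_train.matmul(W) + b               # H(x)
    cost = torch.mean((hyp - y_train) ** 2)   # mean squared error

    optimizer.zero_grad()                     # clear gradients from the previous step
    cost.backward()                           # compute d(cost)/dW and d(cost)/db
    optimizer.step()                          # update W and b

    costs.append(cost.item())

print(f"first cost: {costs[0]:.2f}, last cost: {costs[-1]:.2f}")
```

The learning rate is kept small here because the raw scores are in the 70 to 100 range; a larger step would likely need the inputs to be normalized first.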

# Use nn library
import torch.nn as nn

class MultivariableLinearRegressionModel(nn.Module):
   def __init__(self):
      super().__init__()
      self.linear = nn.Linear(3, 1)

   def forward(self, x):
      return self.linear(x)

model = MultivariableLinearRegressionModel()
hyp = model(x_train)  # calling the model runs forward()
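With the `nn.Module` version, the weights and bias live inside the model, so the optimizer is given `model.parameters()` instead of `W` and `b`. A sketch under the same assumptions as before (table data, SGD, learning rate `1e-5`); `F.mse_loss` is used here as a stand-in for the hand-written mean squared error:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class MultivariableLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)  # 3 inputs (Q1..Q3) -> 1 output

    def forward(self, x):
        return self.linear(x)

x_train = torch.FloatTensor([[70, 80, 75],
                             [82, 71, 98],
                             [76, 92, 91],
                             [90, 95, 99]])
y_train = torch.FloatTensor([[75], [83.6], [86.3], [94.6]])

torch.manual_seed(0)  # nn.Linear initializes its parameters randomly
model = MultivariableLinearRegressionModel()
optimizer = optim.SGD(model.parameters(), lr=1e-5)  # assumed settings

losses = []
for e in range(1, 21):
    hyp = model(x_train)             # same as H(x) in the manual version
    cost = F.mse_loss(hyp, y_train)  # mean of squared errors

    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    losses.append(cost.item())

# nn.Linear stores the weight as (out_features, in_features).
print(model.linear.weight.shape, model.linear.bias.shape)
```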

2. Data loading

Handling large datasets
- Time constraints
- Hardware constraints

Minibatch Gradient Descent

Train on small subsets (minibatches) of the data at a time instead of the full dataset.
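The batching idea can be sketched without any framework: shuffle the sample indices, then walk through them in fixed-size slices. `minibatch_indices` is a hypothetical helper written for this illustration, not a library function:

```python
import random

def minibatch_indices(n_samples, batch_size, shuffle=True, seed=None):
    """Yield lists of sample indices, batch_size at a time (hypothetical helper)."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # The last slice may be shorter when n_samples is not a multiple of batch_size.
        yield indices[start:start + batch_size]

batches = list(minibatch_indices(n_samples=5, batch_size=2, seed=0))
print(batches)  # three batches: two of size 2 and a final one of size 1
```

One pass over all batches is one epoch; the parameters are updated after each batch rather than once per pass.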

import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
   def __init__(self):
      # Omit (sets self.x_data and self.y_data)

   # Required method 1: number of samples in the dataset
   def __len__(self):
      return len(self.x_data)

   # Required method 2: return the sample at index idx
   def __getitem__(self, idx):
      x = torch.FloatTensor(self.x_data[idx])
      y = torch.FloatTensor(self.y_data[idx])
      return x, y

dataset = CustomDataset()

dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

epoch = 20
for e in range(1, epoch + 1):
   for idx, data in enumerate(dataloader):
      x_train, y_train = data
      pred = model(x_train)

      #omit
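The omitted parts can be filled in many ways. The following is one possible end-to-end sketch, assuming the dataset holds the quiz table from section 1, the model is a plain `nn.Linear(3, 1)`, and training uses SGD with learning rate `1e-5`; none of these choices are stated in the original notes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self):
        # Assumed contents: the quiz table from section 1.
        self.x_data = [[70, 80, 75], [82, 71, 98], [76, 92, 91], [90, 95, 99]]
        self.y_data = [[75], [83.6], [86.3], [94.6]]

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y

dataset = CustomDataset()
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

model = nn.Linear(3, 1)                             # assumed model
optimizer = optim.SGD(model.parameters(), lr=1e-5)  # assumed settings

batch_shapes = []
epoch = 20
for e in range(1, epoch + 1):
    for idx, (x_batch, y_batch) in enumerate(dataloader):
        pred = model(x_batch)
        cost = F.mse_loss(pred, y_batch)

        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        batch_shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))

# 4 samples with batch_size=2 -> 2 batches per epoch
print(len(dataloader), batch_shapes[0])
```

`DataLoader` stacks the individual `(x, y)` pairs returned by `__getitem__` into batch tensors, which is why each `x_batch` has a leading batch dimension.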

3. Logistic regression

Logistic regression: binary classification
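- $ H(X) = {1 \over 1 + e^{-W^TX}} $
  - This is the sigmoid function applied to $W^TX$
  - $ P(y = 1 \mid X; W) $: the output is read as the probability that $X$ belongs to class 1, given $W$
- $ cost(W) = -{1 \over m} \sum \left[ y \log(H(x)) + (1-y) \log(1-H(x)) \right] $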
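A small numeric check of the two formulas, written with the standard library only. The scores and labels below are made-up values for illustration; `sigmoid` and `bce_cost` are hypothetical helper names:

```python
import math

def sigmoid(z):
    """H(X) for a raw score z = W^T X."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_cost(scores, labels):
    """Average binary cross-entropy for raw scores and 0/1 labels."""
    total = 0.0
    for z, y in zip(scores, labels):
        h = sigmoid(z)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / len(scores)

print(sigmoid(0.0))  # 0.5: a score of zero means "undecided"

# Confident and correct predictions give a small cost ...
confident_right = bce_cost([4.0, -4.0], [1, 0])
# ... while confident and wrong predictions are penalized heavily.
confident_wrong = bce_cost([4.0, -4.0], [0, 1])
print(confident_right, confident_wrong)
```

The logarithm is what makes a confidently wrong prediction so costly: as $H(x)$ approaches 0 for a sample with $y = 1$, $-\log(H(x))$ grows without bound.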
Code

 import torch
 import torch.nn as nn
 import torch.nn.functional as F
 import torch.optim as optim

 # omit # 

 hyp = 1 / (1 + torch.exp(-(x_train.matmul(W) + b)))
 
 cost = (-(y_train * torch.log(hyp)
       + (1 - y_train) * torch.log(1 - hyp))).mean()
 
 # Equivalent built-in
 # cost = F.binary_cross_entropy(hyp, y_train)
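The claim that the hand-written cost matches the built-in one can be checked directly. A sketch with assumed toy data (random inputs, six samples, two features) and zero-initialized parameters, in which case every prediction is 0.5 and the cost should come out as $\ln 2 \approx 0.6931$:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x_train = torch.rand(6, 2)  # assumed toy inputs: 6 samples, 2 features
y_train = torch.FloatTensor([[0], [0], [0], [1], [1], [1]])

W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Hand-written sigmoid vs. torch.sigmoid on the same linear score
hyp_manual = 1 / (1 + torch.exp(-(x_train.matmul(W) + b)))
hyp_builtin = torch.sigmoid(x_train.matmul(W) + b)

# Hand-written binary cross-entropy vs. F.binary_cross_entropy
cost_manual = (-(y_train * torch.log(hyp_manual)
                 + (1 - y_train) * torch.log(1 - hyp_manual))).mean()
cost_builtin = F.binary_cross_entropy(hyp_builtin, y_train)

print(cost_manual.item(), cost_builtin.item())
```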

 class BinaryClassifier(nn.Module):
    def __init__(self):
       super().__init__()
       self.layer = nn.Sequential(
         nn.Linear(8, 1),
         nn.Sigmoid()
       )
   
    def forward(self, x):
       return self.layer(x)

 model = BinaryClassifier()
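A sketch of how such a classifier might be trained. The data here is synthetic (32 random samples with 8 features, labeled 1 when the feature sum is positive) and the optimizer settings are assumptions; the original notes do not say which dataset or hyperparameters were used:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Sequential(
            nn.Linear(8, 1),  # 8 input features -> 1 score
            nn.Sigmoid()      # score -> probability in (0, 1)
        )

    def forward(self, x):
        return self.layer(x)

torch.manual_seed(0)
# Synthetic stand-in data (assumption, not the dataset from the notes).
x_train = torch.randn(32, 8)
y_train = (x_train.sum(dim=1, keepdim=True) > 0).float()

model = BinaryClassifier()
optimizer = optim.SGD(model.parameters(), lr=1.0)  # assumed settings

losses = []
for e in range(1, 101):
    hyp = model(x_train)
    cost = F.binary_cross_entropy(hyp, y_train)

    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    losses.append(cost.item())

# Threshold the probabilities at 0.5 to get class predictions.
pred = (model(x_train) >= 0.5).float()
accuracy = (pred == y_train).float().mean().item()
print(f"final cost: {losses[-1]:.4f}, train accuracy: {accuracy:.2f}")
```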

4. Softmax Classification

Cross Entropy
- A measure of how similar two probability distributions are
- $ H(P,Q)=-\mathbb E _{x \sim P(X)} [\log Q(x)] $
- Minimizing it makes $Q$ approximate $P$ ($Q \rightarrow P$)
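For discrete distributions the expectation becomes a sum, $H(P,Q) = -\sum_x P(x) \log Q(x)$. A stdlib-only sketch with made-up distributions, showing that the value shrinks as $Q$ gets closer to $P$; `cross_entropy` is a hypothetical helper name:

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) * log Q(x) for discrete distributions given as lists."""
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)

p = [0.7, 0.2, 0.1]        # assumed "true" distribution
q_close = [0.6, 0.3, 0.1]  # a model that is nearly right
q_far = [0.1, 0.2, 0.7]    # a model that is badly wrong

print(cross_entropy(p, p))        # entropy of P, the lowest achievable value
print(cross_entropy(p, q_close))  # slightly larger
print(cross_entropy(p, q_far))    # much larger
```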
```
# Express each element as a probability
z = torch.FloatTensor([1,2,3])
hyp = F.softmax(z, dim=0)
hyp.sum() # 1
```
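What `F.softmax` computes can be reproduced by hand: exponentiate each score and divide by the sum. A stdlib-only sketch; subtracting the maximum first is a common numerical-stability step and does not change the result:

```python
import math

def softmax(z):
    """Softmax of a list of scores; subtracting max(z) keeps exp() from overflowing."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])  # [0.09, 0.2447, 0.6652]
print(sum(probs))                    # sums to 1 up to floating-point rounding
```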
Creating a one-hot vector

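 # From here on, hyp is assumed to be 2-D (samples, classes), e.g. F.softmax(z, dim=1),
 # and y a LongTensor holding one class index per sample.
 # Create a zero-filled tensor with the same shape as hyp
 one_hot = torch.zeros_like(hyp)
 # Set the position given by each label in y to 1
   # arguments: (dim, index tensor, value to write)
 one_hot.scatter_(1, y.unsqueeze(1), 1)

 # Low level
 cost = (one_hot * -torch.log(hyp)).sum(dim=1).mean()
 # High level
   # = F.nll_loss(F.log_softmax(z, dim=1), y)
   # = F.cross_entropy(z, y) -> the one-hot step can be skipped as well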
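The three forms can be compared on the same inputs. A sketch assuming random logits for 3 samples and 5 classes and random integer labels; the shapes and the seed are arbitrary choices for the demonstration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
z = torch.rand(3, 5, requires_grad=True)  # assumed logits: 3 samples, 5 classes
y = torch.randint(5, (3,))                # assumed labels: one class index per sample

# Low level: softmax -> one-hot -> cross-entropy by hand
hyp = F.softmax(z, dim=1)
one_hot = torch.zeros_like(hyp)
one_hot.scatter_(1, y.unsqueeze(1), 1)    # write a 1 at each sample's label column
cost_low = (one_hot * -torch.log(hyp)).sum(dim=1).mean()

# High level: log-softmax + negative log-likelihood, or both fused in one call
cost_nll = F.nll_loss(F.log_softmax(z, dim=1), y)
cost_ce = F.cross_entropy(z, y)

print(cost_low.item(), cost_nll.item(), cost_ce.item())
```

Note that `F.cross_entropy` takes the raw scores `z`, not the softmax output; passing `hyp` would apply softmax twice.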