1. Shape convention
# 2d
|t| = (batch size, dim)
# 3d
|t| = (batch size, width, height) # IMAGE
|t| = (batch size, length, dim) # NLP
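For example, these shapes can be checked directly (a minimal sketch; the sizes 32, 20 and 128 are arbitrary):
import torch
t2d = torch.zeros(32, 128)      # (batch size, dim)
t3d = torch.zeros(32, 20, 128)  # (batch size, length, dim) for NLP
print(t2d.shape)  # torch.Size([32, 128])
print(t3d.shape)  # torch.Size([32, 20, 128])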
2. Tensors can be manipulated much like NumPy arrays
import torch
# slicing
t = torch.FloatTensor([1, 2, 3, 4])
t[0:2] # [1, 2]
t[2:]  # [3, 4]
# Broadcasting -> applied automatically, so be careful!
m1 = torch.FloatTensor([[1,1]])
m2 = torch.FloatTensor([[2,2]])
m1 + m2 # [[3, 3]]
m2 = torch.FloatTensor([[2]])
m1 + m2 # [[3, 3]]
m2 = torch.FloatTensor([[3], [4]])
m1 + m2 # [[4, 4], [5, 5]]
# View (Reshape)
import numpy as np
t = np.array([[[0, 1, 2],
               [3, 4, 5]],
              [[6, 7, 8],
               [9, 10, 11]]])
ft = torch.FloatTensor(t) # (2, 2, 3)
ft.view([-1, 3]) # (4, 3)
ft.view([-1, 1, 3]) # (4, 1, 3)
# squeeze: removes size-1 dimensions / unsqueeze: inserts a size-1 dimension at the given position
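A minimal sketch of both, using a new (3, 1) tensor:
s = torch.FloatTensor([[0], [1], [2]])  # shape (3, 1)
s.squeeze()               # [0, 1, 2], shape (3,)
s.squeeze().unsqueeze(0)  # [[0, 1, 2]], shape (1, 3)
s.squeeze().unsqueeze(1)  # [[0], [1], [2]], shape (3, 1)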
# concat
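For example, with torch.cat (a sketch using two 2x2 tensors):
a = torch.FloatTensor([[1, 2], [3, 4]])
b = torch.FloatTensor([[5, 6], [7, 8]])
torch.cat([a, b], dim=0)  # [[1, 2], [3, 4], [5, 6], [7, 8]], shape (4, 2)
torch.cat([a, b], dim=1)  # [[1, 2, 5, 6], [3, 4, 7, 8]], shape (2, 4)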
# stacking
x = torch.FloatTensor([1, 2])
y = torch.FloatTensor([3, 4])
z = torch.FloatTensor([5, 6])
torch.stack([x,y,z]) # [[1,2], [3,4], [5,6]]
torch.stack([x,y,z], dim=1) # [[1, 3, 5], [2, 4, 6]]
# ones_like(x), zeros_like(x)
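Applied to the x defined above, these return tensors of the same shape and dtype, filled with 1s / 0s:
torch.ones_like(x)   # [1, 1]
torch.zeros_like(x)  # [0, 0]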
3. Linear regression
import torch
import torch.optim as optim
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])
W = torch.zeros(1, requires_grad=True) # parameter to learn
b = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD([W, b], lr=0.001)
epochs = 100
for e in range(1, 1 + epochs):
    hyp = x_train * W + b                    # hypothesis H(x) = Wx + b
    cost = torch.mean((hyp - y_train) ** 2)  # MSE cost
    optimizer.zero_grad()                    # reset gradients
    cost.backward()                          # compute gradients
    optimizer.step()                         # update W, b
    if e % 5 == 0:
        print(cost.item())
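Since y_train = 2 * x_train, W should move toward 2 and b toward 0 as training continues (with lr=0.001 and 100 epochs it will only be partway there):
print(W.item(), b.item())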
4. Understanding it more deeply
- cost function
- e.g. MSE (mean squared error), MAE (mean absolute error)
- $ {\partial\,\text{cost} \over \partial W} = \nabla W $
- To reduce the cost, subtract the gradient of W multiplied by a constant (the learning rate) from W
- $ W := W - \alpha \nabla W $
- $ \alpha : $ Learning rate
- $ \nabla W :$ Gradient of the cost with respect to W
- W -= lr * gradient (the manual update in torch; see the sketch below)
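A minimal sketch of this manual update, assuming the W, b, x_train, y_train from section 3 are still defined (lr stands in for $\alpha$ here):
lr = 0.001
hyp = x_train * W + b
cost = torch.mean((hyp - y_train) ** 2)  # MSE cost
cost.backward()                          # fills W.grad and b.grad
with torch.no_grad():                    # W := W - lr * grad, without tracking
    W -= lr * W.grad
    b -= lr * b.grad
W.grad.zero_()                           # reset gradients before the next step
b.grad.zero_()
This is exactly what optimizer.step() (plus optimizer.zero_grad()) does for you in section 3.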