__________________model___________________________
- no fully connected layers
- max pooling -> avg pooling
- VGG-19 layers
each layer l
    - produces F^l with shape (N, M)
    - N = number of filters in the layer
    - M = number of spatial positions in the feature map (height x width)
    - F^l_ij = response of the i-th filter at position j
    - produces G^l with shape (N, N), the Gram matrix of correlations between the layer's filter responses (see the sketch below)
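A minimal sketch (illustrative names, not the repo's code) of how F^l and G^l can be computed from one VGG-19 conv layer's activations, assuming a (1, N, H, W) activation tensor:

```python
import torch

def flatten_features(activations: torch.Tensor) -> torch.Tensor:
    # activations: (1, N, H, W) -> F^l with shape (N, M), where M = H * W
    _, n_filters, h, w = activations.shape
    return activations.view(n_filters, h * w)

def gram_matrix(activations: torch.Tensor) -> torch.Tensor:
    # G^l = F^l @ F^l.T, an (N, N) matrix of correlations between filter responses
    F = flatten_features(activations)
    return F @ F.t()
```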
- CONTENT RECREATION:
- **what has worked**
        - NORMALIZING THE LOSS (see the content-loss sketch below)
        - RECONSTRUCTING FROM A SHALLOWER LAYER
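A minimal sketch of the normalized content loss at a single layer, assuming F and P are the (N, M) feature maps of the generated and content images (illustrative, not the repo's code):

```python
import torch

def content_loss(F: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    # Squared error averaged over all filter responses; using the mean rather
    # than the sum keeps the loss (and its gradients) at a manageable scale.
    return torch.mean((F - P) ** 2)
```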
- trial 1
- yea
- trial 2
- changed layer from 3 -> 1?
- normalized loss
- trial 3
- it was the layer number not the normalization that made the difference :(
- trial 4
- weight decay fucked it up?
- changed weight decay from 0.04 -> 0.0001
- trial 5
- best one yet!
- the weight decay change helped a lot
- using layer 1 helps a lot too.
- let me see what increasing the learning rate does
- trial 7
        - wow, increasing the learning rate helped a lot; let me see how much further I can take this
- lr 0.1 -> 1.4
- converging even faster than trial 5
- loss at 11260: 7239
- trial 7_2
- changes:
- lr 1.4 -> 1.6
            - scheduler: 0.8 -> 0.9
- observations:
            - stopped early; noisier results; forgot to change the trial number
- trial 8
- changes:
- optimizer: Adam -> LBFGS
- lr .13 -> .12
- scheduler -> None
- observations:
- converged very fast
- loss went lower than trial 7 but results were worse
- trial 9
- changes:
- lr 0.12 -> 0.08
- observations:
- not as good as adam so far
- trial 10
- changes:
- max_iter: 30 -> 15
- observations:
- not really helping
- trial 11
- changes:
- lr 0.08 -> 0.15
- observations:
            - fuck LBFGS, going back to Adam (optimizer-loop sketch below)
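For reference, a minimal sketch of the two optimizer setups compared in trials 8-11; generated_img, extract_features, and target_features are placeholders, and content_loss is the sketch above. PyTorch's LBFGS needs a closure that re-evaluates the loss, while Adam takes a plain step:

```python
import torch

generated_img = torch.rand(3, 224, 224, requires_grad=True)  # image being optimized

# Adam: one gradient step per outer iteration
adam = torch.optim.Adam([generated_img], lr=0.2)
adam.zero_grad()
loss = content_loss(extract_features(generated_img), target_features)
loss.backward()
adam.step()

# LBFGS: step() takes a closure that re-evaluates the loss and runs up to
# max_iter inner iterations per call
lbfgs = torch.optim.LBFGS([generated_img], lr=0.12, max_iter=30)

def closure():
    lbfgs.zero_grad()
    loss = content_loss(extract_features(generated_img), target_features)
    loss.backward()
    return loss

lbfgs.step(closure)
```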
- trial 12
- changes:
- **changed sum of loss to mean**
- observations:
            - **very interesting** -> after defining the loss as the mean instead of the sum of squared errors, the generated images appear more saturated and more purple
- trial 12
- changes:
- **added mean loss to denominator for loss**
- observations:
            - convergence happened much faster, but the quality of the converged result stayed the same
- later trials:
- observations/changes:
            - **added input normalization and mean subtraction** <-- FIXED THE KEY ERRORS
            - **the missing preprocessing was what had been causing all the errors** (see the preprocessing sketch after the insights below)
- **used non-white noise initialization**
- unsure of effectiveness
- **higher learning rate (0.2) allows for quick and accurate convergence**
- insights:
- high loss -> BAD, harder to converge
        - white-noise initialization with randint[0, 255] converges about twice as slowly as rand[0, 1]
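A minimal sketch of the preprocessing and initialization points above, assuming the standard torchvision ImageNet channel statistics and a (3, H, W) image in [0, 1] (illustrative, not the repo's code):

```python
import torch

# Standard ImageNet channel statistics used for VGG preprocessing
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(img: torch.Tensor) -> torch.Tensor:
    # img: (3, H, W) in [0, 1]; subtract the channel means and divide by the stds
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# White-noise initialization in [0, 1]; per the insight above, randint[0, 255]
# noise converged about twice as slowly
generated_img = torch.rand(3, 224, 224, requires_grad=True)
```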
STYLE RECREATION:
trial 1
- the paper's alpha/beta ratio does not work for me
observations:
- my beta (style) term was dominating the loss, so I set alpha = 1 and beta = 1e-5
- the paper has the ratio the other way around; maybe the style representation layers should have been normalized (see the weighted-loss sketch below)
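A minimal sketch of how the alpha/beta weighting enters the total loss, reusing content_loss and gram_matrix from the sketches above; the feature variables (gen_content_act, content_act, gen_style_feats, style_feats) are placeholders, not the repo's code:

```python
import torch

def style_loss(gen_feats, style_feats):
    # Mean squared difference between Gram matrices, averaged over the style layers
    layer_losses = []
    for gen_act, style_act in zip(gen_feats, style_feats):
        diff = gram_matrix(gen_act) - gram_matrix(style_act)
        layer_losses.append(torch.mean(diff ** 2))
    return sum(layer_losses) / len(layer_losses)

# alpha weights the content term, beta the style term; these notes ended up
# with alpha = 1 and beta = 1e-5 rather than the paper's ratio
alpha, beta = 1.0, 1e-5
total_loss = (alpha * content_loss(gen_content_act, content_act)
              + beta * style_loss(gen_style_feats, style_feats))
```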