gpt-medium-full.txt
[2023-03-21 10:48:59,017][__main__][INFO] -
alg: mend
lr: 1.0e-06
edit_lr: 0.0001
seed: 0
debug: false
model_save_pt: 5000
edit_bs: 1
silent: false
max_iters: 1000000
log_interval: 100
val_interval: 5000
lr_lr: 0.0001
batch_size: 2
val_batch_size: 5
accumulate_bs: 10
cedit: 0.1
cloc: 1.0
cbase: 1.0
val_steps: 500
device: cuda
base_loss: distill
oracle: false
train: true
train_base: false
opt: Adam
single_batch: false
archive: null
grad_clip: 100.0
ref: null
early_stop_patience: 20000
early_stop_key: loss/total_edit_val
dropout: 0.0
tokenizer: null
results_dir: null
no_grad_layers: null
eval_only: false
half: false
save: true
model:
  pt: ${hydra:runtime.cwd}/data/fever/gpt2-medium.bin
  name: gpt2-medium
  class_name: GPT2ForSequenceClassification
  tokenizer_class: GPT2TokenizerFast
  tokenizer_name: gpt2-medium
  inner_params:
  - transformer.h.9.mlp.c_proj.weight
  - transformer.h.9.mlp.c_fc.weight
  - transformer.h.10.mlp.c_proj.weight
  - transformer.h.10.mlp.c_fc.weight
  - transformer.h.11.mlp.c_proj.weight
  - transformer.h.11.mlp.c_fc.weight
data:
  path: null
  rephrase: true
  zsre_nq: true
  nq_path: ${hydra:runtime.cwd}/data/nq
  wiki_webtext: true
  n_edits: 1
eval:
  verbose: true
  log_interval: 100
  final_eval: true
mend:
  one_sided: false
  n_hidden: 1
  hidden_dim: null
  init: id
  norm: true
  combine: true
  x_only: false
  delta_only: false
  act: relu
  rank: 1920
  mlp_class: IDMLP
  shared: true
task: fc
dataset: fever
tests: false
[2023-03-21 10:48:59,017][__main__][INFO] - Project base directory: /home/anonymous-xme/mend/mend
[2023-03-21 10:48:59,096][models][INFO] - Loading model class <class 'transformers.models.gpt2.modeling_gpt2.GPT2ForSequenceClassification'> with name gpt2-medium from cache dir /home/anonymous-xme/mend/mend/cache/
[2023-03-21 10:49:03,462][models][INFO] - Loading model initialization from /home/anonymous-xme/mend/mend/data/fever/gpt2-medium.bin
[2023-03-21 10:49:04,343][models][INFO] - Loaded model initialization
[2023-03-21 10:49:04,346][models][INFO] - Set 73 dropout modules to p=0.0
[2023-03-21 10:49:08,417][__main__][INFO] - Loading class MEND from module <module 'algs.mend' from '/home/anonymous-xme/mend/mend/algs/mend.py'>
[2023-03-21 10:49:08,417][algs.mend][INFO] - Hooked 6 modules
========== 4096 1024
========== 3
[2023-03-21 10:49:08,419][algs.mend][INFO] - Building Gradient Transform with MLP class <class 'nn.IDMLP'>
[2023-03-21 10:49:08,419][nn][INFO] - Building IDMLP (id) [5120, 5120, 5120]
========== 1024 4096
========== 3
[2023-03-21 10:49:08,560][algs.mend][INFO] - Building Gradient Transform with MLP class <class 'nn.IDMLP'>
[2023-03-21 10:49:08,560][nn][INFO] - Building IDMLP (id) [5120, 5120, 5120]
[2023-03-21 10:49:19,140][trainer][INFO] - Building optimizer <class 'torch.optim.adam.Adam'> with lr 1e-06
[2023-03-21 10:49:19,142][trainer][INFO] - Writing wandb run "fever - mend - gpt2-medium - 2023-03-21_10-48-58_9857706641" to /tmp/tmpw_kqsbue
[2023-03-21 10:49:23,152][trainer][INFO] - Step 0:
[2023-03-21 10:49:23,153][trainer][INFO] - loss/edit_train: 13.89681; loss/loc_train: 0.00000; edit/acc_train: 0.00000; edit/log_prob_train: -13.89681; edit/prob_train: 0.00000; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.00001; perplexity/post_train: 1.00001; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.41937; loss/total_train: 1.38968; loss/total_edit_train: 1.38968; memory/alloc_max_train: 5400301568.00000; memory/res_max_train: 5706350592.00000
[2023-03-21 10:53:43,369][trainer][INFO] - Step 0:
[2023-03-21 10:53:43,369][trainer][INFO] - loss/edit_val: 5.93248; loss/loc_val: 0.15673; edit/acc_val: 0.44000; edit/log_prob_val: -5.93248; edit/prob_val: 0.44009; acc/pre_val: 0.78050; acc/post_val: 0.76800; nll/pre_val: 2.01246; perplexity/pre_val: 7.48168; nll/post_val: 2.15907; perplexity/post_val: 8.66304; n_tokens/pre_val: 4.00000; n_tokens/post_val: 4.00000; time/edit_val: 0.29059; loss/total_val: 0.74998; loss/total_edit_val: 0.74998; memory/alloc_max_val: 5400301568.00000; memory/res_max_val: 5913968640.00000; eval_time/elapsed: 260.18832; eval_time/average: 0.52038
[2023-03-21 10:53:43,379][trainer][INFO] - Saving model to /home/anonymous-xme/mend/mend/outputs/2023-03-21_10-48-58_9857706641/models/gpt2-medium.2023-03-21_10-48-58_9857706641
[2023-03-21 10:53:44,085][trainer][INFO] - Write complete.
[2023-03-21 10:54:51,173][trainer][INFO] - Step 100:
[2023-03-21 10:54:51,173][trainer][INFO] - loss/edit_train: 6.17644; loss/loc_train: 0.00003; edit/acc_train: 0.50000; edit/log_prob_train: -6.17644; edit/prob_train: 0.49957; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00003; perplexity/pre_train: 1.00003; nll/post_train: 0.00003; perplexity/post_train: 1.00003; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30240; loss/total_train: 0.61767; loss/total_edit_train: 0.61767; memory/alloc_max_train: 6067623848.96000; memory/res_max_train: 6533425397.76000; grad_train: 146.18368; lr/lr0_train: 0.00010; lr/lr1_train: 0.00010; lr/lr2_train: 0.00010; lr/lr3_train: 0.00010; lr/lr4_train: 0.00010; lr/lr5_train: 0.00010
[2023-03-21 10:55:58,116][trainer][INFO] - Step 200:
[2023-03-21 10:55:58,117][trainer][INFO] - loss/edit_train: 6.52258; loss/loc_train: 0.00001; edit/acc_train: 0.44000; edit/log_prob_train: -6.52258; edit/prob_train: 0.43902; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00000; perplexity/pre_train: 1.00000; nll/post_train: 0.00000; perplexity/post_train: 1.00000; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30360; loss/total_train: 0.65227; loss/total_edit_train: 0.65227; memory/alloc_max_train: 6139809950.72000; memory/res_max_train: 6595543040.00000; grad_train: 81.74108; lr/lr0_train: 0.00010; lr/lr1_train: 0.00010; lr/lr2_train: 0.00011; lr/lr3_train: 0.00010; lr/lr4_train: 0.00011; lr/lr5_train: 0.00011
[2023-03-21 10:57:03,616][trainer][INFO] - Step 300:
[2023-03-21 10:57:03,616][trainer][INFO] - loss/edit_train: 5.49369; loss/loc_train: 0.00000; edit/acc_train: 0.55000; edit/log_prob_train: -5.49369; edit/prob_train: 0.55009; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00000; perplexity/pre_train: 1.00000; nll/post_train: 0.00000; perplexity/post_train: 1.00000; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.29188; loss/total_train: 0.54937; loss/total_edit_train: 0.54937; memory/alloc_max_train: 6142564352.00000; memory/res_max_train: 6595543040.00000; grad_train: 57.83246; lr/lr0_train: 0.00011; lr/lr1_train: 0.00009; lr/lr2_train: 0.00011; lr/lr3_train: 0.00010; lr/lr4_train: 0.00012; lr/lr5_train: 0.00010
[2023-03-21 10:58:10,783][trainer][INFO] - Step 400:
[2023-03-21 10:58:10,784][trainer][INFO] - loss/edit_train: 6.71661; loss/loc_train: 0.00000; edit/acc_train: 0.48000; edit/log_prob_train: -6.71661; edit/prob_train: 0.47867; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.00001; perplexity/post_train: 1.00001; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30390; loss/total_train: 0.67166; loss/total_edit_train: 0.67166; memory/alloc_max_train: 6150416389.12000; memory/res_max_train: 6604309135.36000; grad_train: 132.88409; lr/lr0_train: 0.00011; lr/lr1_train: 0.00009; lr/lr2_train: 0.00011; lr/lr3_train: 0.00010; lr/lr4_train: 0.00012; lr/lr5_train: 0.00011
[2023-03-21 10:59:17,205][trainer][INFO] - Step 500:
[2023-03-21 10:59:17,206][trainer][INFO] - loss/edit_train: 6.13525; loss/loc_train: 0.00000; edit/acc_train: 0.51000; edit/log_prob_train: -6.13525; edit/prob_train: 0.50851; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00012; perplexity/pre_train: 1.00012; nll/post_train: 0.00012; perplexity/post_train: 1.00012; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30063; loss/total_train: 0.61353; loss/total_edit_train: 0.61353; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 230.00507; lr/lr0_train: 0.00011; lr/lr1_train: 0.00009; lr/lr2_train: 0.00012; lr/lr3_train: 0.00010; lr/lr4_train: 0.00013; lr/lr5_train: 0.00011
[2023-03-21 11:00:23,948][trainer][INFO] - Step 600:
[2023-03-21 11:00:23,948][trainer][INFO] - loss/edit_train: 6.01449; loss/loc_train: 0.00000; edit/acc_train: 0.51000; edit/log_prob_train: -6.01449; edit/prob_train: 0.51007; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.00001; perplexity/post_train: 1.00001; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30145; loss/total_train: 0.60145; loss/total_edit_train: 0.60145; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 190.70796; lr/lr0_train: 0.00011; lr/lr1_train: 0.00009; lr/lr2_train: 0.00013; lr/lr3_train: 0.00010; lr/lr4_train: 0.00014; lr/lr5_train: 0.00011
[2023-03-21 11:01:37,900][trainer][INFO] - Step 700:
[2023-03-21 11:01:37,901][trainer][INFO] - loss/edit_train: 5.99544; loss/loc_train: 0.00001; edit/acc_train: 0.52000; edit/log_prob_train: -5.99544; edit/prob_train: 0.52036; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00002; perplexity/pre_train: 1.00002; nll/post_train: 0.00002; perplexity/post_train: 1.00002; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.33328; loss/total_train: 0.59955; loss/total_edit_train: 0.59955; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 131.16586; lr/lr0_train: 0.00012; lr/lr1_train: 0.00009; lr/lr2_train: 0.00014; lr/lr3_train: 0.00010; lr/lr4_train: 0.00015; lr/lr5_train: 0.00012
[2023-03-21 11:02:44,860][trainer][INFO] - Step 800:
[2023-03-21 11:02:44,860][trainer][INFO] - loss/edit_train: 7.31217; loss/loc_train: 0.00002; edit/acc_train: 0.43000; edit/log_prob_train: -7.31217; edit/prob_train: 0.42917; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00000; perplexity/pre_train: 1.00000; nll/post_train: 0.00000; perplexity/post_train: 1.00000; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30142; loss/total_train: 0.73123; loss/total_edit_train: 0.73123; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 197.91730; lr/lr0_train: 0.00013; lr/lr1_train: 0.00009; lr/lr2_train: 0.00015; lr/lr3_train: 0.00010; lr/lr4_train: 0.00016; lr/lr5_train: 0.00013
[2023-03-21 11:03:51,887][trainer][INFO] - Step 900:
[2023-03-21 11:03:51,888][trainer][INFO] - loss/edit_train: 6.31975; loss/loc_train: 0.00004; edit/acc_train: 0.45000; edit/log_prob_train: -6.31975; edit/prob_train: 0.45189; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00000; perplexity/pre_train: 1.00000; nll/post_train: 0.00000; perplexity/post_train: 1.00000; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30261; loss/total_train: 0.63202; loss/total_edit_train: 0.63202; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 202.29385; lr/lr0_train: 0.00014; lr/lr1_train: 0.00009; lr/lr2_train: 0.00016; lr/lr3_train: 0.00011; lr/lr4_train: 0.00017; lr/lr5_train: 0.00013
[2023-03-21 11:04:59,015][trainer][INFO] - Step 1000:
[2023-03-21 11:04:59,016][trainer][INFO] - loss/edit_train: 5.91491; loss/loc_train: 0.00140; edit/acc_train: 0.48000; edit/log_prob_train: -5.91491; edit/prob_train: 0.47458; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.00011; perplexity/post_train: 1.00011; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30265; loss/total_train: 0.59289; loss/total_edit_train: 0.59289; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 310.69439; lr/lr0_train: 0.00014; lr/lr1_train: 0.00009; lr/lr2_train: 0.00017; lr/lr3_train: 0.00011; lr/lr4_train: 0.00019; lr/lr5_train: 0.00013
[2023-03-21 11:06:06,754][trainer][INFO] - Step 1100:
[2023-03-21 11:06:06,755][trainer][INFO] - loss/edit_train: 6.16502; loss/loc_train: 0.00751; edit/acc_train: 0.41000; edit/log_prob_train: -6.16502; edit/prob_train: 0.41002; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00002; perplexity/pre_train: 1.00002; nll/post_train: 0.00168; perplexity/post_train: 1.00169; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30686; loss/total_train: 0.62401; loss/total_edit_train: 0.62401; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 592.31112; lr/lr0_train: 0.00015; lr/lr1_train: 0.00009; lr/lr2_train: 0.00018; lr/lr3_train: 0.00011; lr/lr4_train: 0.00020; lr/lr5_train: 0.00014
[2023-03-21 11:07:13,574][trainer][INFO] - Step 1200:
[2023-03-21 11:07:13,574][trainer][INFO] - loss/edit_train: 4.35095; loss/loc_train: 0.00267; edit/acc_train: 0.53000; edit/log_prob_train: -4.35095; edit/prob_train: 0.53533; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00031; perplexity/pre_train: 1.00031; nll/post_train: 0.00053; perplexity/post_train: 1.00053; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30056; loss/total_train: 0.43776; loss/total_edit_train: 0.43776; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 516.72688; lr/lr0_train: 0.00015; lr/lr1_train: 0.00010; lr/lr2_train: 0.00018; lr/lr3_train: 0.00011; lr/lr4_train: 0.00021; lr/lr5_train: 0.00014
[2023-03-21 11:08:20,119][trainer][INFO] - Step 1300:
[2023-03-21 11:08:20,120][trainer][INFO] - loss/edit_train: 3.39781; loss/loc_train: 0.01649; edit/acc_train: 0.60000; edit/log_prob_train: -3.39781; edit/prob_train: 0.60310; acc/pre_train: 0.99000; acc/post_train: 0.98000; nll/pre_train: 0.00851; perplexity/pre_train: 1.00854; nll/post_train: 0.02096; perplexity/post_train: 1.02118; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.29999; loss/total_train: 0.35627; loss/total_edit_train: 0.35627; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 564.34177; lr/lr0_train: 0.00016; lr/lr1_train: 0.00010; lr/lr2_train: 0.00019; lr/lr3_train: 0.00012; lr/lr4_train: 0.00022; lr/lr5_train: 0.00015
[2023-03-21 11:09:26,966][trainer][INFO] - Step 1400:
[2023-03-21 11:09:26,966][trainer][INFO] - loss/edit_train: 3.57366; loss/loc_train: 0.04989; edit/acc_train: 0.62000; edit/log_prob_train: -3.57366; edit/prob_train: 0.59456; acc/pre_train: 1.00000; acc/post_train: 0.96000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.06348; perplexity/post_train: 1.06553; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30192; loss/total_train: 0.40726; loss/total_edit_train: 0.40726; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 388.89512; lr/lr0_train: 0.00017; lr/lr1_train: 0.00010; lr/lr2_train: 0.00020; lr/lr3_train: 0.00012; lr/lr4_train: 0.00023; lr/lr5_train: 0.00016
[2023-03-21 11:10:32,952][trainer][INFO] - Step 1500:
[2023-03-21 11:10:32,952][trainer][INFO] - loss/edit_train: 3.25201; loss/loc_train: 0.06233; edit/acc_train: 0.54000; edit/log_prob_train: -3.25201; edit/prob_train: 0.52700; acc/pre_train: 1.00000; acc/post_train: 0.96000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.05207; perplexity/post_train: 1.05345; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.29820; loss/total_train: 0.38753; loss/total_edit_train: 0.38753; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 541.64818; lr/lr0_train: 0.00017; lr/lr1_train: 0.00011; lr/lr2_train: 0.00020; lr/lr3_train: 0.00013; lr/lr4_train: 0.00024; lr/lr5_train: 0.00016
[2023-03-21 11:11:40,413][trainer][INFO] - Step 1600:
[2023-03-21 11:11:40,413][trainer][INFO] - loss/edit_train: 2.04692; loss/loc_train: 0.05634; edit/acc_train: 0.58000; edit/log_prob_train: -2.04692; edit/prob_train: 0.61456; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00664; perplexity/pre_train: 1.00666; nll/post_train: 0.03190; perplexity/post_train: 1.03242; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30227; loss/total_train: 0.26104; loss/total_edit_train: 0.26104; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 536.42504; lr/lr0_train: 0.00018; lr/lr1_train: 0.00011; lr/lr2_train: 0.00021; lr/lr3_train: 0.00013; lr/lr4_train: 0.00025; lr/lr5_train: 0.00017
[2023-03-21 11:12:47,253][trainer][INFO] - Step 1700:
[2023-03-21 11:12:47,254][trainer][INFO] - loss/edit_train: 2.75208; loss/loc_train: 0.04720; edit/acc_train: 0.48000; edit/log_prob_train: -2.75208; edit/prob_train: 0.50672; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00002; perplexity/pre_train: 1.00002; nll/post_train: 0.01675; perplexity/post_train: 1.01690; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30132; loss/total_train: 0.32240; loss/total_edit_train: 0.32240; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 830.87480; lr/lr0_train: 0.00018; lr/lr1_train: 0.00011; lr/lr2_train: 0.00021; lr/lr3_train: 0.00013; lr/lr4_train: 0.00025; lr/lr5_train: 0.00017
[2023-03-21 11:13:54,520][trainer][INFO] - Step 1800:
[2023-03-21 11:13:54,520][trainer][INFO] - loss/edit_train: 2.15729; loss/loc_train: 0.05368; edit/acc_train: 0.48000; edit/log_prob_train: -2.15729; edit/prob_train: 0.50109; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00011; perplexity/pre_train: 1.00011; nll/post_train: 0.03066; perplexity/post_train: 1.03114; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30428; loss/total_train: 0.26941; loss/total_edit_train: 0.26941; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6641680384.00000; grad_train: 521.59366; lr/lr0_train: 0.00019; lr/lr1_train: 0.00011; lr/lr2_train: 0.00021; lr/lr3_train: 0.00013; lr/lr4_train: 0.00025; lr/lr5_train: 0.00018
[2023-03-21 11:15:01,799][trainer][INFO] - Step 1900:
[2023-03-21 11:15:01,799][trainer][INFO] - loss/edit_train: 1.84317; loss/loc_train: 0.06298; edit/acc_train: 0.56000; edit/log_prob_train: -1.84317; edit/prob_train: 0.54990; acc/pre_train: 1.00000; acc/post_train: 0.98000; nll/pre_train: 0.00002; perplexity/pre_train: 1.00002; nll/post_train: 0.04281; perplexity/post_train: 1.04374; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30211; loss/total_train: 0.24729; loss/total_edit_train: 0.24729; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6642120785.92000; grad_train: 406.69386; lr/lr0_train: 0.00019; lr/lr1_train: 0.00012; lr/lr2_train: 0.00021; lr/lr3_train: 0.00013; lr/lr4_train: 0.00026; lr/lr5_train: 0.00019
[2023-03-21 11:16:08,643][trainer][INFO] - Step 2000:
[2023-03-21 11:16:08,644][trainer][INFO] - loss/edit_train: 1.72695; loss/loc_train: 0.03567; edit/acc_train: 0.59000; edit/log_prob_train: -1.72695; edit/prob_train: 0.58190; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.01636; perplexity/post_train: 1.01649; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30151; loss/total_train: 0.20837; loss/total_edit_train: 0.20837; memory/alloc_max_train: 6173925888.00000; memory/res_max_train: 6643777536.00000; grad_train: 517.29416; lr/lr0_train: 0.00019; lr/lr1_train: 0.00012; lr/lr2_train: 0.00022; lr/lr3_train: 0.00014; lr/lr4_train: 0.00026; lr/lr5_train: 0.00019
[2023-03-21 11:17:15,883][trainer][INFO] - Step 2100:
[2023-03-21 11:17:15,884][trainer][INFO] - loss/edit_train: 1.32720; loss/loc_train: 0.08007; edit/acc_train: 0.66000; edit/log_prob_train: -1.32720; edit/prob_train: 0.64206; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.05797; perplexity/post_train: 1.05968; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30199; loss/total_train: 0.21279; loss/total_edit_train: 0.21279; memory/alloc_max_train: 6173926492.16000; memory/res_max_train: 6643777536.00000; grad_train: 616.18992; lr/lr0_train: 0.00020; lr/lr1_train: 0.00012; lr/lr2_train: 0.00022; lr/lr3_train: 0.00014; lr/lr4_train: 0.00026; lr/lr5_train: 0.00020
[2023-03-21 11:18:22,672][trainer][INFO] - Step 2200:
[2023-03-21 11:18:22,673][trainer][INFO] - loss/edit_train: 2.09809; loss/loc_train: 0.04329; edit/acc_train: 0.56000; edit/log_prob_train: -2.09809; edit/prob_train: 0.53280; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.02807; perplexity/post_train: 1.02847; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.29997; loss/total_train: 0.25310; loss/total_edit_train: 0.25310; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 444.90930; lr/lr0_train: 0.00020; lr/lr1_train: 0.00012; lr/lr2_train: 0.00023; lr/lr3_train: 0.00014; lr/lr4_train: 0.00026; lr/lr5_train: 0.00021
[2023-03-21 11:19:29,761][trainer][INFO] - Step 2300:
[2023-03-21 11:19:29,762][trainer][INFO] - loss/edit_train: 1.69359; loss/loc_train: 0.05635; edit/acc_train: 0.63000; edit/log_prob_train: -1.69359; edit/prob_train: 0.58243; acc/pre_train: 1.00000; acc/post_train: 0.98000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.03495; perplexity/post_train: 1.03557; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30286; loss/total_train: 0.22571; loss/total_edit_train: 0.22571; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 415.02202; lr/lr0_train: 0.00020; lr/lr1_train: 0.00012; lr/lr2_train: 0.00023; lr/lr3_train: 0.00014; lr/lr4_train: 0.00027; lr/lr5_train: 0.00022
[2023-03-21 11:20:37,136][trainer][INFO] - Step 2400:
[2023-03-21 11:20:37,136][trainer][INFO] - loss/edit_train: 1.59063; loss/loc_train: 0.03702; edit/acc_train: 0.63000; edit/log_prob_train: -1.59063; edit/prob_train: 0.61029; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.01835; perplexity/post_train: 1.01852; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30304; loss/total_train: 0.19608; loss/total_edit_train: 0.19608; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 560.41795; lr/lr0_train: 0.00021; lr/lr1_train: 0.00013; lr/lr2_train: 0.00023; lr/lr3_train: 0.00014; lr/lr4_train: 0.00027; lr/lr5_train: 0.00023
[2023-03-21 11:21:39,964][trainer][INFO] - Step 2500:
[2023-03-21 11:21:39,964][trainer][INFO] - loss/edit_train: 1.22301; loss/loc_train: 0.03286; edit/acc_train: 0.63000; edit/log_prob_train: -1.22301; edit/prob_train: 0.62699; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.01082; perplexity/post_train: 1.01088; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.28524; loss/total_train: 0.15516; loss/total_edit_train: 0.15516; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 643.97137; lr/lr0_train: 0.00020; lr/lr1_train: 0.00013; lr/lr2_train: 0.00023; lr/lr3_train: 0.00015; lr/lr4_train: 0.00027; lr/lr5_train: 0.00024
[2023-03-21 11:22:41,946][trainer][INFO] - Step 2600:
[2023-03-21 11:22:41,946][trainer][INFO] - loss/edit_train: 1.30232; loss/loc_train: 0.04453; edit/acc_train: 0.58000; edit/log_prob_train: -1.30232; edit/prob_train: 0.57781; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.01979; perplexity/post_train: 1.01999; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.28052; loss/total_train: 0.17476; loss/total_edit_train: 0.17476; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 291.84660; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00023; lr/lr3_train: 0.00015; lr/lr4_train: 0.00028; lr/lr5_train: 0.00024
[2023-03-21 11:23:48,737][trainer][INFO] - Step 2700:
[2023-03-21 11:23:48,737][trainer][INFO] - loss/edit_train: 1.06474; loss/loc_train: 0.04708; edit/acc_train: 0.65000; edit/log_prob_train: -1.06474; edit/prob_train: 0.61908; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00002; perplexity/pre_train: 1.00002; nll/post_train: 0.01837; perplexity/post_train: 1.01854; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30267; loss/total_train: 0.15356; loss/total_edit_train: 0.15356; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 266.42852; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00023; lr/lr3_train: 0.00015; lr/lr4_train: 0.00028; lr/lr5_train: 0.00025
[2023-03-21 11:24:55,962][trainer][INFO] - Step 2800:
[2023-03-21 11:24:55,963][trainer][INFO] - loss/edit_train: 0.94740; loss/loc_train: 0.07843; edit/acc_train: 0.70000; edit/log_prob_train: -0.94740; edit/prob_train: 0.68253; acc/pre_train: 1.00000; acc/post_train: 0.98000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.06939; perplexity/post_train: 1.07186; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30155; loss/total_train: 0.17317; loss/total_edit_train: 0.17317; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 662.67806; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00023; lr/lr3_train: 0.00015; lr/lr4_train: 0.00027; lr/lr5_train: 0.00025
[2023-03-21 11:26:02,776][trainer][INFO] - Step 2900:
[2023-03-21 11:26:02,777][trainer][INFO] - loss/edit_train: 1.10955; loss/loc_train: 0.08253; edit/acc_train: 0.73000; edit/log_prob_train: -1.10955; edit/prob_train: 0.66938; acc/pre_train: 1.00000; acc/post_train: 0.95000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.07650; perplexity/post_train: 1.07951; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30239; loss/total_train: 0.19349; loss/total_edit_train: 0.19349; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 495.27822; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00023; lr/lr3_train: 0.00015; lr/lr4_train: 0.00027; lr/lr5_train: 0.00026
[2023-03-21 11:27:09,770][trainer][INFO] - Step 3000:
[2023-03-21 11:27:09,771][trainer][INFO] - loss/edit_train: 1.24500; loss/loc_train: 0.03550; edit/acc_train: 0.64000; edit/log_prob_train: -1.24500; edit/prob_train: 0.60913; acc/pre_train: 1.00000; acc/post_train: 0.98000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.02641; perplexity/post_train: 1.02676; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30268; loss/total_train: 0.16000; loss/total_edit_train: 0.16000; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 337.79123; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00023; lr/lr3_train: 0.00015; lr/lr4_train: 0.00027; lr/lr5_train: 0.00026
[2023-03-21 11:28:16,719][trainer][INFO] - Step 3100:
[2023-03-21 11:28:16,720][trainer][INFO] - loss/edit_train: 1.19944; loss/loc_train: 0.05692; edit/acc_train: 0.66000; edit/log_prob_train: -1.19944; edit/prob_train: 0.65570; acc/pre_train: 1.00000; acc/post_train: 0.97000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.04661; perplexity/post_train: 1.04771; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30104; loss/total_train: 0.17686; loss/total_edit_train: 0.17686; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 404.29949; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00023; lr/lr3_train: 0.00016; lr/lr4_train: 0.00027; lr/lr5_train: 0.00027
[2023-03-21 11:29:24,773][trainer][INFO] - Step 3200:
[2023-03-21 11:29:24,774][trainer][INFO] - loss/edit_train: 1.25447; loss/loc_train: 0.05528; edit/acc_train: 0.68000; edit/log_prob_train: -1.25447; edit/prob_train: 0.66151; acc/pre_train: 1.00000; acc/post_train: 0.96000; nll/pre_train: 0.00476; perplexity/pre_train: 1.00477; nll/post_train: 0.06834; perplexity/post_train: 1.07073; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.31020; loss/total_train: 0.18072; loss/total_edit_train: 0.18072; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 576.39880; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00016; lr/lr4_train: 0.00027; lr/lr5_train: 0.00027
[2023-03-21 11:30:32,271][trainer][INFO] - Step 3300:
[2023-03-21 11:30:32,271][trainer][INFO] - loss/edit_train: 0.65786; loss/loc_train: 0.05747; edit/acc_train: 0.80000; edit/log_prob_train: -0.65786; edit/prob_train: 0.74430; acc/pre_train: 0.98000; acc/post_train: 0.94000; nll/pre_train: 0.07251; perplexity/pre_train: 1.07520; nll/post_train: 0.12956; perplexity/post_train: 1.13833; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30511; loss/total_train: 0.12325; loss/total_edit_train: 0.12325; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 353.70439; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00015; lr/lr4_train: 0.00027; lr/lr5_train: 0.00028
[2023-03-21 11:31:39,163][trainer][INFO] - Step 3400:
[2023-03-21 11:31:39,163][trainer][INFO] - loss/edit_train: 1.20841; loss/loc_train: 0.05953; edit/acc_train: 0.71000; edit/log_prob_train: -1.20841; edit/prob_train: 0.66853; acc/pre_train: 1.00000; acc/post_train: 0.97000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.05610; perplexity/post_train: 1.05771; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30263; loss/total_train: 0.18037; loss/total_edit_train: 0.18037; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 523.38836; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00016; lr/lr4_train: 0.00027; lr/lr5_train: 0.00028
[2023-03-21 11:32:47,578][trainer][INFO] - Step 3500:
[2023-03-21 11:32:47,578][trainer][INFO] - loss/edit_train: 0.74341; loss/loc_train: 0.04502; edit/acc_train: 0.75000; edit/log_prob_train: -0.74341; edit/prob_train: 0.72990; acc/pre_train: 1.00000; acc/post_train: 0.97000; nll/pre_train: 0.00007; perplexity/pre_train: 1.00007; nll/post_train: 0.04093; perplexity/post_train: 1.04178; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30961; loss/total_train: 0.11936; loss/total_edit_train: 0.11936; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 501.76300; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00015; lr/lr4_train: 0.00027; lr/lr5_train: 0.00028
[2023-03-21 11:33:55,253][trainer][INFO] - Step 3600:
[2023-03-21 11:33:55,254][trainer][INFO] - loss/edit_train: 0.91230; loss/loc_train: 0.03448; edit/acc_train: 0.70000; edit/log_prob_train: -0.91230; edit/prob_train: 0.69152; acc/pre_train: 1.00000; acc/post_train: 0.98000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.03549; perplexity/post_train: 1.03612; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30496; loss/total_train: 0.12571; loss/total_edit_train: 0.12571; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 426.83295; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00015; lr/lr4_train: 0.00027; lr/lr5_train: 0.00029
[2023-03-21 11:35:02,910][trainer][INFO] - Step 3700:
[2023-03-21 11:35:02,911][trainer][INFO] - loss/edit_train: 0.85877; loss/loc_train: 0.03357; edit/acc_train: 0.68000; edit/log_prob_train: -0.85877; edit/prob_train: 0.65513; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00000; perplexity/pre_train: 1.00000; nll/post_train: 0.01711; perplexity/post_train: 1.01726; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30573; loss/total_train: 0.11945; loss/total_edit_train: 0.11945; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 319.28976; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00016; lr/lr4_train: 0.00027; lr/lr5_train: 0.00030
[2023-03-21 11:36:10,059][trainer][INFO] - Step 3800:
[2023-03-21 11:36:10,059][trainer][INFO] - loss/edit_train: 0.72800; loss/loc_train: 0.08730; edit/acc_train: 0.73000; edit/log_prob_train: -0.72800; edit/prob_train: 0.67978; acc/pre_train: 1.00000; acc/post_train: 0.96000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.09454; perplexity/post_train: 1.09916; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30261; loss/total_train: 0.16010; loss/total_edit_train: 0.16010; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 370.92647; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00025; lr/lr3_train: 0.00016; lr/lr4_train: 0.00028; lr/lr5_train: 0.00030
[2023-03-21 11:37:16,499][trainer][INFO] - Step 3900:
[2023-03-21 11:37:16,500][trainer][INFO] - loss/edit_train: 0.69181; loss/loc_train: 0.01780; edit/acc_train: 0.70000; edit/log_prob_train: -0.69181; edit/prob_train: 0.68750; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.00698; perplexity/post_train: 1.00700; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30268; loss/total_train: 0.08698; loss/total_edit_train: 0.08698; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 235.87843; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00025; lr/lr3_train: 0.00016; lr/lr4_train: 0.00027; lr/lr5_train: 0.00030
[2023-03-21 11:38:23,981][trainer][INFO] - Step 4000:
[2023-03-21 11:38:23,982][trainer][INFO] - loss/edit_train: 0.96372; loss/loc_train: 0.04235; edit/acc_train: 0.68000; edit/log_prob_train: -0.96372; edit/prob_train: 0.68085; acc/pre_train: 0.99000; acc/post_train: 0.98000; nll/pre_train: 0.00792; perplexity/pre_train: 1.00795; nll/post_train: 0.03705; perplexity/post_train: 1.03774; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30521; loss/total_train: 0.13872; loss/total_edit_train: 0.13872; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 660.56594; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00025; lr/lr3_train: 0.00016; lr/lr4_train: 0.00027; lr/lr5_train: 0.00030
[2023-03-21 11:39:31,226][trainer][INFO] - Step 4100:
[2023-03-21 11:39:31,226][trainer][INFO] - loss/edit_train: 0.99848; loss/loc_train: 0.05177; edit/acc_train: 0.73000; edit/log_prob_train: -0.99848; edit/prob_train: 0.71656; acc/pre_train: 1.00000; acc/post_train: 0.97000; nll/pre_train: 0.00000; perplexity/pre_train: 1.00000; nll/post_train: 0.07467; perplexity/post_train: 1.07753; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30360; loss/total_train: 0.15162; loss/total_edit_train: 0.15162; memory/alloc_max_train: 6173932544.00000; memory/res_max_train: 6643777536.00000; grad_train: 2050.13070; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00025; lr/lr3_train: 0.00016; lr/lr4_train: 0.00028; lr/lr5_train: 0.00030
[2023-03-21 11:40:38,089][trainer][INFO] - Step 4200:
[2023-03-21 11:40:38,089][trainer][INFO] - loss/edit_train: 0.48124; loss/loc_train: 0.02703; edit/acc_train: 0.79000; edit/log_prob_train: -0.48124; edit/prob_train: 0.75274; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.02025; perplexity/post_train: 1.02046; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30267; loss/total_train: 0.07516; loss/total_edit_train: 0.07516; memory/alloc_max_train: 6173932569.60000; memory/res_max_train: 6643777536.00000; grad_train: 332.63607; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00025; lr/lr3_train: 0.00016; lr/lr4_train: 0.00028; lr/lr5_train: 0.00031
[2023-03-21 11:41:45,298][trainer][INFO] - Step 4300:
[2023-03-21 11:41:45,298][trainer][INFO] - loss/edit_train: 0.80806; loss/loc_train: 0.04996; edit/acc_train: 0.78000; edit/log_prob_train: -0.80806; edit/prob_train: 0.74812; acc/pre_train: 0.98000; acc/post_train: 0.95000; nll/pre_train: 0.07927; perplexity/pre_train: 1.08249; nll/post_train: 0.12343; perplexity/post_train: 1.13137; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30503; loss/total_train: 0.13076; loss/total_edit_train: 0.13076; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 671.43915; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00016; lr/lr4_train: 0.00028; lr/lr5_train: 0.00031
[2023-03-21 11:42:52,164][trainer][INFO] - Step 4400:
[2023-03-21 11:42:52,164][trainer][INFO] - loss/edit_train: 0.68724; loss/loc_train: 0.01237; edit/acc_train: 0.77000; edit/log_prob_train: -0.68724; edit/prob_train: 0.72554; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00002; perplexity/pre_train: 1.00002; nll/post_train: 0.00345; perplexity/post_train: 1.00346; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30239; loss/total_train: 0.08109; loss/total_edit_train: 0.08109; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 307.95980; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00016; lr/lr4_train: 0.00028; lr/lr5_train: 0.00032
[2023-03-21 11:43:59,272][trainer][INFO] - Step 4500:
[2023-03-21 11:43:59,273][trainer][INFO] - loss/edit_train: 0.75092; loss/loc_train: 0.02451; edit/acc_train: 0.81000; edit/log_prob_train: -0.75092; edit/prob_train: 0.76508; acc/pre_train: 1.00000; acc/post_train: 0.98000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.02671; perplexity/post_train: 1.02707; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30309; loss/total_train: 0.09960; loss/total_edit_train: 0.09960; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 247.97887; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00016; lr/lr4_train: 0.00028; lr/lr5_train: 0.00032
[2023-03-21 11:45:07,411][trainer][INFO] - Step 4600:
[2023-03-21 11:45:07,412][trainer][INFO] - loss/edit_train: 1.07446; loss/loc_train: 0.02656; edit/acc_train: 0.76000; edit/log_prob_train: -1.07446; edit/prob_train: 0.73379; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.01066; perplexity/post_train: 1.01072; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30796; loss/total_train: 0.13401; loss/total_edit_train: 0.13401; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 649.84667; lr/lr0_train: 0.00020; lr/lr1_train: 0.00013; lr/lr2_train: 0.00024; lr/lr3_train: 0.00016; lr/lr4_train: 0.00028; lr/lr5_train: 0.00032
[2023-03-21 11:46:14,676][trainer][INFO] - Step 4700:
[2023-03-21 11:46:14,676][trainer][INFO] - loss/edit_train: 0.66639; loss/loc_train: 0.01012; edit/acc_train: 0.78000; edit/log_prob_train: -0.66639; edit/prob_train: 0.77308; acc/pre_train: 1.00000; acc/post_train: 1.00000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.00554; perplexity/post_train: 1.00556; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30319; loss/total_train: 0.07676; loss/total_edit_train: 0.07676; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 256.34840; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00017; lr/lr4_train: 0.00027; lr/lr5_train: 0.00032
[2023-03-21 11:47:21,799][trainer][INFO] - Step 4800:
[2023-03-21 11:47:21,800][trainer][INFO] - loss/edit_train: 0.69778; loss/loc_train: 0.05010; edit/acc_train: 0.84000; edit/log_prob_train: -0.69778; edit/prob_train: 0.78220; acc/pre_train: 1.00000; acc/post_train: 0.98000; nll/pre_train: 0.00001; perplexity/pre_train: 1.00001; nll/post_train: 0.05934; perplexity/post_train: 1.06114; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30435; loss/total_train: 0.11988; loss/total_edit_train: 0.11988; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 543.64795; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00017; lr/lr4_train: 0.00028; lr/lr5_train: 0.00032
[2023-03-21 11:48:28,959][trainer][INFO] - Step 4900:
[2023-03-21 11:48:28,960][trainer][INFO] - loss/edit_train: 0.81081; loss/loc_train: 0.03619; edit/acc_train: 0.80000; edit/log_prob_train: -0.81081; edit/prob_train: 0.74824; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00003; perplexity/pre_train: 1.00003; nll/post_train: 0.03456; perplexity/post_train: 1.03516; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30402; loss/total_train: 0.11727; loss/total_edit_train: 0.11727; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 195.74557; lr/lr0_train: 0.00020; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00017; lr/lr4_train: 0.00028; lr/lr5_train: 0.00033
[2023-03-21 11:49:36,138][trainer][INFO] - Step 5000:
[2023-03-21 11:49:36,138][trainer][INFO] - loss/edit_train: 0.48372; loss/loc_train: 0.02466; edit/acc_train: 0.82000; edit/log_prob_train: -0.48372; edit/prob_train: 0.80609; acc/pre_train: 1.00000; acc/post_train: 0.99000; nll/pre_train: 0.00000; perplexity/pre_train: 1.00000; nll/post_train: 0.02163; perplexity/post_train: 1.02186; n_tokens/pre_train: 1.00000; n_tokens/post_train: 1.00000; time/edit_train: 0.30448; loss/total_train: 0.07303; loss/total_edit_train: 0.07303; memory/alloc_max_train: 6173935104.00000; memory/res_max_train: 6643777536.00000; grad_train: 239.01756; lr/lr0_train: 0.00021; lr/lr1_train: 0.00014; lr/lr2_train: 0.00024; lr/lr3_train: 0.00017; lr/lr4_train: 0.00028; lr/lr5_train: 0.00033