
How do I set learning rate decay? Annealing the learning rate #1167

Closed
OleNet opened this issue Jan 17, 2017 · 8 comments

@OleNet
Contributor

OleNet commented Jan 17, 2017

In optimizers.py I see parameters for configuring learning rate decay, but there are two of them: learning_rate_decay_a=0. and
learning_rate_decay_b=0.
What is the difference between these two parameters, what does each one mean, and which one should I use?
There doesn't seem to be any wiki or documentation describing them.

@reyoung reyoung self-assigned this Jan 17, 2017
@reyoung
Collaborator

reyoung commented Jan 17, 2017

Good point, comments should be added here. I'll add some this afternoon.

@OleNet
Contributor Author

OleNet commented Jan 17, 2017

Thanks a lot!

@reyoung
Collaborator

reyoung commented Jan 17, 2017

See #1170

Note that it adds some strongly-typed wrappers; if you want to set these parameters manually, take a look at how the XXXLRS classes are implemented there.

@lcy-seso
Contributor

lcy-seso commented Jan 18, 2017

Here is a quick paste of the earlier documentation, for reference:

  • "learning_rate": the base learning rate
  • "learning_rate_decay_a" and "learning_rate_decay_b": learning rate decay parameters; the exact decay formula is determined by learning_rate_schedule
  • "learning_rate_schedule": selects the learning rate decay mode, one of:
  • 1) "constant": lr = learning_rate
  • 2) "poly": lr = learning_rate * pow(1 + learning_rate_decay_a * num_samples_processed, -learning_rate_decay_b)
  • 3) "exp": lr = learning_rate * pow(learning_rate_decay_a, num_samples_processed / learning_rate_decay_b)
  • 4) "discexp": lr = learning_rate * pow(learning_rate_decay_a, floor(num_samples_processed / learning_rate_decay_b))
  • 5) "linear": lr = max(learning_rate - learning_rate_decay_a * num_samples_processed, learning_rate_decay_b)

In particular, the adaptive learning rate algorithms adagrad, adadelta, and rmsprop cannot have their learning rates decayed via learning_rate_schedule, while momentum can. (A sketch of these formulas follows below.)
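
For illustration, here is a minimal standalone Python sketch of the five schedules as documented above. This is not PaddlePaddle's own implementation; the helper name annealed_lr is made up for this example, and the parameter names simply mirror the config fields:

```python
import math

def annealed_lr(schedule, learning_rate, decay_a, decay_b, num_samples_processed):
    """Effective learning rate after num_samples_processed training samples."""
    if schedule == "constant":
        return learning_rate
    if schedule == "poly":
        return learning_rate * (1 + decay_a * num_samples_processed) ** -decay_b
    if schedule == "exp":
        return learning_rate * decay_a ** (num_samples_processed / decay_b)
    if schedule == "discexp":
        return learning_rate * decay_a ** math.floor(num_samples_processed / decay_b)
    if schedule == "linear":
        return max(learning_rate - decay_a * num_samples_processed, decay_b)
    raise ValueError("unknown learning_rate_schedule: %s" % schedule)

# Example: "discexp" multiplies the rate by decay_a once per decay_b samples.
print(annealed_lr("discexp", 0.1, 0.5, 100000, 250000))  # -> 0.025
```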

@lcy-seso
Contributor

lcy-seso commented Jan 18, 2017

@qingqing01 points out that, with the current logic in the code, learning_rate_schedule also modifies the global learning rate of adagrad, adadelta, and rmsprop. This corrects the statement in my previous comment.

@THUHJ

THUHJ commented Jul 11, 2017

Hi, does num_samples_processed count the number of training samples processed, or the number of batches?

Also, in the fifth mode, "linear": lr = max(learning_rate - learning_rate_decay_a, learning_rate_decay_b), lr does not depend on num_samples_processed, so does it stay fixed throughout training?

Finally, is there a way to obtain the current learning rate from the optimizer? I looked through the code but could not find an interface for it. Using optimizer.__dict__["opt_conf_proto"].learning_rate only returns the initial lr.

@lcy-seso
Contributor

  1. num_samples_processed is the total number of training samples processed so far.
  2. For linear decay, I checked the code and you are right; the formula in the reply above was wrong (it has since been corrected in the list above). The actual computation is, as checked below:
    max(learning_rate - learning_rate_decay_a * num_samples_processed, learning_rate_decay_b);
  3. There is currently no way to inspect the current learning rate. The learning rate is a matrix, and v2 does not yet support printing learning rate statistics. In 0.9.0, setting the show_parameter_stats_period parameter makes single-machine runs print learning rate statistics such as the mean, maximum, and minimum.
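
To make the corrected linear schedule concrete, here is a quick numeric check (the values are made up for illustration):

```python
# Linear decay: lr drops by decay_a per processed sample, floored at decay_b.
# Assumed example values: learning_rate=0.1, decay_a=1e-7, decay_b=0.01.
for n in (0, 500_000, 1_000_000, 2_000_000):
    lr = max(0.1 - 1e-7 * n, 0.01)
    print(n, lr)  # 0.1, then 0.05, then clamped at the 0.01 floor
```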

@THUHJ

THUHJ commented Jul 19, 2017

Got it, thanks!

@reyoung reyoung closed this as completed Aug 1, 2017
zhhsplendid pushed a commit to zhhsplendid/Paddle that referenced this issue Sep 25, 2019
* refine flags doc,test=develop

* follow comments, test=develop
lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024
* fix_demo

* fix_demo