Simple gradient descent optimiser #1104
Conversation
Codecov Report
@@            Coverage Diff            @@
##            master     #1104   +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           69        70     +1
  Lines         7343      7413    +70
=========================================
+ Hits          7343      7413    +70
@ben18785 & @alisterde could you have a quick look at this PR for me?
Thanks @MichaelClerx! It looks good to me, and the structure will be really useful for the BFGS method.
Thanks @alisterde!
Thanks @MichaelClerx -- a really nice and helpful addition.
I have made some changes to the example notebook: the reason the logistic problem wasn't converging is that the parameter scales differ so much in this case. I rescaled them and now gradient descent runs fine (a rough sketch of the scaling issue follows below).
Otherwise, have a look at my line comment about the name and see what you think.
Finally, shouldn't we have some tests for the parabolic error sensitivities?
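(For readers of this thread, here is a rough illustration of the rescaling point above. It is a standalone numpy sketch, not code from this PR or the notebook, and the numbers are made up to mimic a logistic-style rate/capacity pair.)

```python
import numpy as np

# Standalone illustration (not the PR's code): a two-parameter error whose
# parameters live on very different scales, roughly a growth rate near 0.01
# and a carrying capacity near 500.
target = np.array([0.015, 500.0])
scales = np.array([0.01, 500.0])   # typical magnitude of each parameter

def grad_original(p):
    # Gradient of sum(((p - target) / scales) ** 2): very steep in the 'rate'
    # direction and almost flat in the 'capacity' direction in original units.
    return 2.0 * (p - target) / scales ** 2

def grad_rescaled(z):
    # Same error, but over z = p / scales, which evens out the curvature.
    return 2.0 * (z - target / scales)

def gradient_descent(grad, x0, eta, iters=500):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - eta * grad(x)
    return x

# In original units no single fixed learning rate serves both directions: eta
# must be tiny to stay stable in the 'rate' direction, so the 'capacity'
# direction barely moves. In rescaled units a modest rate converges quickly.
z = gradient_descent(grad_rescaled, np.array([1.0, 1.0]), eta=0.4)
print(z * scales)   # close to 'target'
```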
Thanks @ben18785! I quite like how the notebook suggests improvements to this method :D
Have added some tests and changed the name now, @ben18785!
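(For context, a sensitivity test of this kind typically checks an analytic gradient against finite differences. The tiny ParabolicError stand-in below is written here purely for illustration; it is not the class or the tests added in this PR.)

```python
import numpy as np

class ParabolicError:
    """Illustrative stand-in: f(x) = sum((x - c)**2), gradient 2 * (x - c)."""

    def __init__(self, c=(0.0, 0.0)):
        self._c = np.asarray(c, dtype=float)

    def __call__(self, x):
        return float(np.sum((np.asarray(x, dtype=float) - self._c) ** 2))

    def evaluateS1(self, x):
        # Return the error and its sensitivities (partial derivatives)
        x = np.asarray(x, dtype=float)
        return self(x), 2.0 * (x - self._c)


def test_parabolic_sensitivities():
    error = ParabolicError()
    x = np.array([0.1, -0.3])
    f, df = error.evaluateS1(x)

    # Value returned alongside sensitivities matches the plain evaluation
    assert np.isclose(f, error(x))

    # Analytic gradient matches a central finite-difference estimate
    h = 1e-6
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        fd = (error(x + e) - error(x - e)) / (2 * h)
        assert np.isclose(df[i], fd, rtol=1e-5)


test_parabolic_sensitivities()
```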
This all looks good to me code-wise.
Is this the canonical "gradient descent" optimiser or, for instance, are there examples with non-constant learning rate? I.e. is the name going to need to become more specific at some point in the future when someone implements a similar algorithm?
(Maybe for @ben18785)
It's a good question; we'll probably want to name it something more specific in the future. (Gradient descent is one of those things that became a class of methods, I suppose.) But to be honest I'm not super worried about that, because this won't really be used in practice.
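(For anyone reading along, the distinction under discussion, sketched in plain numpy rather than in terms of this PR's classes: vanilla gradient descent keeps one fixed learning rate, whereas the more specifically named variants adapt it during the run.)

```python
import numpy as np

def fixed_rate_descent(grad, x0, eta=0.1, iters=100):
    # 'Plain' gradient descent: one constant learning rate throughout
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= eta * grad(x)
    return x

def decaying_rate_descent(grad, x0, eta0=0.4, iters=100):
    # One of the many variants: the step size shrinks as the run proceeds
    x = np.array(x0, dtype=float)
    for k in range(iters):
        x -= eta0 / (1.0 + k) * grad(x)
    return x

# Example on a simple quadratic with its minimum at [1, 2]
grad = lambda x: 2.0 * (x - np.array([1.0, 2.0]))
print(fixed_rate_descent(grad, [0.0, 0.0]))
print(decaying_rate_descent(grad, [0.0, 0.0]))
```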
Then again, @fcooper8472, the more specific ones do seem to have specific names, so it could be fine to leave this as-is even when we add more? See #1105. Someone will probably do these for fun at some point :D
Sounds good to me.
Looks good -- thanks. I'll go ahead and merge
Closes #930