Hager_Zhang first commit #1344
base: main
Conversation
Thanks @alisterde
This looks BIG! I've added some initial comments on coding and style, but I'll need some more explanation before I can review what the code actually does.
@@ -0,0 +1,899 @@
#
# Fixed learning-rate gradient descent.
Suggested change:
- # Fixed learning-rate gradient descent.
+ # Hager-Zhang line search.
from numpy.linalg import norm


class HagerZhang(pints.Optimiser):
Best to have this consistent with the file name, so either HagerZhangLineSearch or _hager_zhang.py
class HagerZhang(pints.Optimiser):
    """
    Gradient-descent method with a fixed learning rate.
Needs a docstring with references etc.
Pseudocode too would, I think, be really useful. For pseudocode in the other methods, we tend not to incorporate ask/tell but rather lay out the algorithm as given in the paper (but made tidy and understandable for us).
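As a rough illustration of what that pseudocode could look like, here is a sketch of the overall bracket/secant/bisection structure only (this is not the paper's exact steps or step labels, and the helper names are placeholders):

# Rough sketch (placeholder names, not the paper's labels):
#
#   alpha = initial_step_guess(alpha_previous)
#   [a, b] = initial_bracket(alpha)     # bracket satisfying the opposite
#                                       # slope condition
#   while neither the Wolfe nor the approximate Wolfe conditions hold:
#       [a, b] = secant2(a, b)          # two secant steps shrink the bracket
#       if the bracket shrank by less than a factor gamma:
#           [a, b] = update(a, b, midpoint(a, b))   # bisection fall-back
#       alpha = new trial point taken from [a, b]
#   return alpha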
from numpy.linalg import norm


class HagerZhang(pints.Optimiser):
Do we need an interface for line search methods? E.g. class HagerZhang(pints.LineSearch) with class LineSearch(pints.Optimiser)?
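Something like this minimal sketch, perhaps (the LineSearch base class is hypothetical, not an existing PINTS class):

import pints


class LineSearch(pints.Optimiser):
    """ Hypothetical shared base class for line search optimisers. """

    def __init__(self, x0, sigma0=None, boundaries=None):
        super(LineSearch, self).__init__(x0, sigma0, boundaries)
        # Shared line-search set-up (e.g. dimension checks) could live here.


class HagerZhang(LineSearch):
    """ Hager-Zhang line search, implementing the shared interface. """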
    def __init__(self, x0, sigma0=None, boundaries=None):
        super(HagerZhang, self).__init__(x0, sigma0, boundaries)

        # Set optimiser state
I take it this method only works for 1d problems. Should there be a check in the constructor for this? Or in a LineSearch class that this one extends?
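If a check is wanted, a minimal sketch of what it could look like in the constructor (assuming, as suggested above, that the method really is restricted to 1-d problems; the condition and message are assumptions):

    def __init__(self, x0, sigma0=None, boundaries=None):
        super(HagerZhang, self).__init__(x0, sigma0, boundaries)

        # Hypothetical guard against misuse on multi-dimensional problems
        if self._n_parameters != 1:
            raise ValueError(
                'HagerZhang line search only supports 1-d problems.')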
        # the opposite slope condition (see function definition)
        self.theta = 0.5

        self.__gamma = 0.66
We're not using __ anywhere in PINTS, just _ to mark this as private
    def __initialising(self, k, alpha_k0):
        '''
        This function is part of the Hager-Zhang line search method [1].
This line's not really needed!
            self._current_f = np.asarray(proposed_f)
            self._current_dfdx = np.asarray(proposed_dfdx)

        elif self.wolfe_check is True:
Suggested change:
- elif self.wolfe_check is True:
+ elif self.wolfe_check:
""" | ||
Gradient-descent method with a fixed learning rate. | ||
""" | ||
|
I think we should add a big long # comment here explaining the structure of this class and how the yield stuff is supposed to work.
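For instance, something along these lines (a sketch of what such a comment could cover, assuming the class drives the algorithm from a generator whose yields become ask() outputs and whose send() inputs come from tell()):

# Structure of this class (sketch):
#
# The line search logic is written as one or more generator functions.
# Whenever the algorithm needs the objective and gradient at a trial
# step size, the generator yields that point; ask() passes the yielded
# point to the controller, and tell() feeds the evaluated (f, dfdx)
# back into the generator via send(). This keeps the algorithm readable
# as sequential code while all evaluations happen outside the class.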
        self.__gamma = 0.66

        # range (0, 1) small factor used in initial guess of step size
There are quite a lot of properties declared down here. Seems this method has a lot of "state"!
- Are they all strictly necessary?
- And if so, would there be any sense in bundling them per subtask, e.g. some object for wolfe-related properties, something like that?
I agree re: bundling. Could we perhaps bundle according to first and second Wolfe and approximate? Or would that not work?
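One possible shape for that bundling, as a sketch only (the class and field names here are assumptions, not attributes already in the PR):

class _WolfeParameters(object):
    """ Hypothetical container bundling the Wolfe-check parameters. """

    def __init__(self, c1=1e-4, c2=0.9, epsilon=1e-6):
        self.c1 = c1            # sufficient-decrease parameter
        self.c2 = c2            # curvature parameter
        self.epsilon = epsilon  # tolerance for the approximate Wolfe check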
        r, x, s, b, px = self.problem()
        opt = pints.OptimisationController(r, x, method=method)
        m = opt.optimiser()
        _B = np.identity(m._n_parameters)
Why the underscore?
        b = 1.0

        def temp(a=a, b=b, m=m):
            generator = m._HagerZhang__bisect_or_secant(a=a, b=b)
Not sure what the best way to test these is. As a rule of thumb, you don't test the private API, so we need to figure out whether this is an exception, or whether there's something we're not seeing that would break this down into smaller testable blocks with public methods.
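One generic option might be a thin public wrapper around the private generator, so the tests never touch the name-mangled attribute. A self-contained illustration of the idea (not the PR's actual code):

class _Example(object):
    """ Stand-in for HagerZhang, illustrating the wrapper idea only. """

    def __bracket(self, a, b):
        # Private generator, standing in for __bisect_or_secant here.
        yield (a + b) / 2.0

    def bracket(self, a, b):
        # Thin public entry point that unit tests can call directly.
        return self.__bracket(a, b)

# A test can then avoid m._HagerZhang__bisect_or_secant(...):
gen = _Example().bracket(0.0, 1.0)
assert next(gen) == 0.5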
Maybe we can link to this somewhere in the file. I can never find it :D
Looks really good @alisterde -- it's a beast of a method! My points are mainly surface level things as I'm finding following the logic tricky. As Michael says, having some description in the docstrings that describes the ask/telling would, I think, really help.
        # As c1 approaches 0 and c2 approaches 1, the line search
        # terminates more quickly.
        self._c1 = 1E-4  # Parameter for Armijo condition rule, 0 < c1 < 0.5
I couldn't find the Armijo rule in the paper? Could you point to where this is from? I'm guessing this is "delta" on page 184 of Hager and Zhang?
Also, where was the default value taken from?
        # As c1 approaches 0 and c2 approaches 1, the line search
        # terminates more quickly.
        self._c1 = 1E-4  # Parameter for Armijo condition rule, 0 < c1 < 0.5
        self._c2 = 0.9  # Parameter for curvature condition rule, c1 < c2 < 1.0
I'm guessing this is sigma on page 184 of the paper? Of course, we don't need to use the same parameter names as them but it might help in places to cross-check.
Similarly to the above: where does this default value come from?
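For cross-checking against the paper, the conditions these two parameters enter are roughly the following (a sketch in this PR's notation, with c1 presumably the paper's delta and c2 its sigma, as guessed above; grad0 and grad1 are the directional derivatives at the current and proposed points, and how alpha is folded into px here is not shown):

# grad0 = dfdx(x).T @ px              directional derivative at current point
# grad1 = dfdx(x + alpha*px).T @ px   directional derivative at proposed point
#
# Sufficient decrease (Armijo), uses c1:
#     f(x + alpha*px) <= f(x) + c1 * alpha * grad0
# Curvature condition, uses c2:
#     grad1 >= c2 * grad0
# Approximate Wolfe (Hager & Zhang), used when f(x + alpha*px) is within a
# small tolerance of f(x):
#     (2*c1 - 1) * grad0 >= grad1 >= c2 * grad0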
        # range (0, 1), used in the ``self.__update()`` and
        # ``self.__initial_bracket()`` when the potential intervals violate
        # the opposite slope condition (see function definition)
        self.theta = 0.5
Again, would be good to know where the default value comes from here.
        proposed_grad = np.matmul(np.transpose(proposed_dfdx), self.__px)

        wolfe_curvature = (proposed_grad >= self._c2 *
                           np.matmul(np.transpose(self._current_dfdx),
                                     self.__px))

        exact_wolfe_suff_dec = (self._c1 *
                                np.matmul(np.transpose(self._current_dfdx),
                                          self.__px)
                                >= proposed_f - self._current_f)
np.matmul(np.transpose(self._current_dfdx), self.__px) seems to be repeated three times here? I'd precalculate it and then reuse it.
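A sketch of that refactor, keeping the same checks (the tail of the approximate Wolfe comparison is cut off in the diff below, so its final line is an assumption):

        current_grad = np.matmul(np.transpose(self._current_dfdx), self.__px)
        proposed_grad = np.matmul(np.transpose(proposed_dfdx), self.__px)

        wolfe_curvature = proposed_grad >= self._c2 * current_grad
        exact_wolfe_suff_dec = (
            self._c1 * current_grad >= proposed_f - self._current_f)
        apprx_wolfe_suff_dec = (
            (2.0 * self._c1 - 1.0) * current_grad >= proposed_grad)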
        # Checking if approximate wolfe conditions are meet.
        apprx_wolfe_suff_dec = ((2.0 * self._c1 - 1.0) *
                                np.matmul(np.transpose(self._current_dfdx),
Another time.
        approximate_wolfe = (apprx_wolfe_suff_dec and wolfe_curvature
                             and apprx_wolfe_applies)

        # If wolfe conditions are meet the line search is stopped.
met
        # is only decreased in subsequent boundary manipulation.
        return ps_2 * alpha_k0

    def __very_close(self, x, y):
is_very_close? That makes it clear that it returns a Boolean.
        suitable interval is found that contains the minimum.
        '''

        # (steps B0 from [1])
Perhaps I'm missing it, but I can't see "B0" in the paper?
@MichaelClerx @ben18785
I've got an initial draft of the Hager-Zhang line search algorithm (#1160). It is passing all of the tests I've written and appears to be functional.
However, I want to write more tests to check specific conditions in the logic before I'm confident in the algorithm. I also think I need to add a few more convergence checks, but I'll address that tomorrow.
I would appreciate any initial thoughts you have about it.