Hager_Zhang first commit #1344
base: main
Conversation
Thanks @alisterde
This looks BIG! I've added some initial comments on coding and style, but I'll need some more explanation before I can review what the code actually does.
@@ -0,0 +1,899 @@
#
# Fixed learning-rate gradient descent.
Suggested change:
- # Fixed learning-rate gradient descent.
+ # Hager-Zhang line search.
from numpy.linalg import norm


class HagerZhang(pints.Optimiser):
Best to have this consistent with the file name, so either HagerZhangLineSearch or _hager_zhang.py
class HagerZhang(pints.Optimiser):
    """
    Gradient-descent method with a fixed learning rate.
Needs a docstring with references etc.
Pseudocode too would, I think, be really useful. For pseudocode in the other methods, we tend not to incorporate ask/tell but rather lay out the algorithm as given in the paper (but made tidy and understandable for us).
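As a rough illustration of what that pseudocode could look like, here is a sketch of the overall bracket/secant/bisection structure only (this is not the paper's exact steps or step labels, and the helper names are placeholders):

# Rough sketch (placeholder names, not the paper's labels):
#
#   alpha = initial_step_guess(alpha_previous)
#   [a, b] = initial_bracket(alpha)     # bracket satisfying the opposite
#                                       # slope condition
#   while neither the Wolfe nor the approximate Wolfe conditions hold:
#       [a, b] = secant2(a, b)          # two secant steps shrink the bracket
#       if the bracket shrank by less than a factor gamma:
#           [a, b] = update(a, b, midpoint(a, b))   # bisection fall-back
#       alpha = new trial point taken from [a, b]
#   return alpha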
from numpy.linalg import norm


class HagerZhang(pints.Optimiser):
Do we need an interface for line search methods? E.g. class HagerZhang(pints.LineSearch) with class LineSearch(pints.Optimiser)?
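Something like this minimal sketch, perhaps (the LineSearch base class is hypothetical, not an existing PINTS class):

import pints


class LineSearch(pints.Optimiser):
    """ Hypothetical shared base class for line search optimisers. """

    def __init__(self, x0, sigma0=None, boundaries=None):
        super(LineSearch, self).__init__(x0, sigma0, boundaries)
        # Shared line-search set-up (e.g. dimension checks) could live here.


class HagerZhang(LineSearch):
    """ Hager-Zhang line search, implementing the shared interface. """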
    def __init__(self, x0, sigma0=None, boundaries=None):
        super(HagerZhang, self).__init__(x0, sigma0, boundaries)

        # Set optimiser state
I take it this method only works for 1d problems. Should there be a check in the constructor for this? Or in a LineSearch class that this one extends?
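If a check is wanted, a minimal sketch of what it could look like in the constructor (assuming, as suggested above, that the method really is restricted to 1-d problems; the condition and message are assumptions):

    def __init__(self, x0, sigma0=None, boundaries=None):
        super(HagerZhang, self).__init__(x0, sigma0, boundaries)

        # Hypothetical guard against misuse on multi-dimensional problems
        if self._n_parameters != 1:
            raise ValueError(
                'HagerZhang line search only supports 1-d problems.')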
        # the opposite slope condition (see function definition)
        self.theta = 0.5

        self.__gamma = 0.66
We're not using __ anywhere in PINTS, just _ to mark this as private
    def __initialising(self, k, alpha_k0):
        '''
        This function is part of the Hager-Zhang line search method [1].
This line's not really needed!
            self._current_f = np.asarray(proposed_f)
            self._current_dfdx = np.asarray(proposed_dfdx)

        elif self.wolfe_check is True:
Suggested change:
- elif self.wolfe_check is True:
+ elif self.wolfe_check:
""" | ||
Gradient-descent method with a fixed learning rate. | ||
""" | ||
|
I think we should add a big long # comment here explaining the structure of this class and how the yield stuff is supposed to work.
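For instance, something along these lines (a sketch of what such a comment could cover, assuming the class drives the algorithm from a generator whose yields become ask() outputs and whose send() inputs come from tell()):

# Structure of this class (sketch):
#
# The line search logic is written as one or more generator functions.
# Whenever the algorithm needs the objective and gradient at a trial
# step size, the generator yields that point; ask() passes the yielded
# point to the controller, and tell() feeds the evaluated (f, dfdx)
# back into the generator via send(). This keeps the algorithm readable
# as sequential code while all evaluations happen outside the class.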
        self.__gamma = 0.66

        # range (0, 1) small factor used in initial guess of step size
There are quite a lot of properties declared down here. Seems this method has a lot of "state"!
- Are they all strictly necessary?
- And if so, would there be any sense in bundling them per subtask, e.g. some object for wolfe-related properties, something like that?
I agree re: bundling. Could we perhaps bundle according to first and second Wolfe and approximate? Or would that not work?
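One possible shape for that bundling, as a sketch only (the class and field names here are assumptions, not attributes already in the PR):

class _WolfeParameters(object):
    """ Hypothetical container bundling the Wolfe-check parameters. """

    def __init__(self, c1=1e-4, c2=0.9, epsilon=1e-6):
        self.c1 = c1            # sufficient-decrease parameter
        self.c2 = c2            # curvature parameter
        self.epsilon = epsilon  # tolerance for the approximate Wolfe check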
        r, x, s, b, px = self.problem()
        opt = pints.OptimisationController(r, x, method=method)
        m = opt.optimiser()
        _B = np.identity(m._n_parameters)
Why the underscore?
        b = 1.0

        def temp(a=a, b=b, m=m):
            generator = m._HagerZhang__bisect_or_secant(a=a, b=b)
Not sure what the best way to test these is. As a rule of thumb, you don't test the private API, so we need to figure out whether this is an exception, or whether there's something we're not seeing that would break this down into smaller testable blocks with public methods.
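One generic option might be a thin public wrapper around the private generator, so the tests never touch the name-mangled attribute. A self-contained illustration of the idea (not the PR's actual code):

class _Example(object):
    """ Stand-in for HagerZhang, illustrating the wrapper idea only. """

    def __bracket(self, a, b):
        # Private generator, standing in for __bisect_or_secant here.
        yield (a + b) / 2.0

    def bracket(self, a, b):
        # Thin public entry point that unit tests can call directly.
        return self.__bracket(a, b)

# A test can then avoid m._HagerZhang__bisect_or_secant(...):
gen = _Example().bracket(0.0, 1.0)
assert next(gen) == 0.5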
Maybe we can link to this somewhere in the file. I can never find it :D
Looks really good @alisterde -- it's a beast of a method! My points are mainly surface level things as I'm finding following the logic tricky. As Michael says, having some description in the docstrings that describes the ask/telling would, I think, really help.
        # As c1 approaches 0 and c2 approaches 1, the line search
        # terminates more quickly.
        self._c1 = 1E-4  # Parameter for Armijo condition rule, 0 < c1 < 0.5
I couldn't find the Armijo rule in the paper? Could you point to where this is from? I'm guessing this is "delta" on page 184 of Hager and Zhang?
Also, where was the default value taken from?
        # As c1 approaches 0 and c2 approaches 1, the line search
        # terminates more quickly.
        self._c1 = 1E-4  # Parameter for Armijo condition rule, 0 < c1 < 0.5
        self._c2 = 0.9  # Parameter for curvature condition rule, c1 < c2 < 1.0
I'm guessing this is sigma on page 184 of the paper? Of course, we don't need to use the same parameter names as them but it might help in places to cross-check.
Similarly to the above: where does this default value come from?
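For cross-checking against the paper, the conditions these two parameters enter are roughly the following (a sketch in this PR's notation, with c1 presumably the paper's delta and c2 its sigma, as guessed above; grad0 and grad1 are the directional derivatives at the current and proposed points, and how alpha is folded into px here is not shown):

# grad0 = dfdx(x).T @ px              directional derivative at current point
# grad1 = dfdx(x + alpha*px).T @ px   directional derivative at proposed point
#
# Sufficient decrease (Armijo), uses c1:
#     f(x + alpha*px) <= f(x) + c1 * alpha * grad0
# Curvature condition, uses c2:
#     grad1 >= c2 * grad0
# Approximate Wolfe (Hager & Zhang), used when f(x + alpha*px) is within a
# small tolerance of f(x):
#     (2*c1 - 1) * grad0 >= grad1 >= c2 * grad0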
        # range (0, 1), used in the ``self.__update()`` and
        # ``self.__initial_bracket()`` when the potential intervals violate
        # the opposite slope condition (see function definition)
        self.theta = 0.5
Again, would be good to know where the default value comes from here.
        proposed_grad = np.matmul(np.transpose(proposed_dfdx), self.__px)

        wolfe_curvature = (proposed_grad >= self._c2 *
                           np.matmul(np.transpose(self._current_dfdx),
                                     self.__px))

        exact_wolfe_suff_dec = (self._c1 *
                                np.matmul(np.transpose(self._current_dfdx),
                                          self.__px)
                                >= proposed_f - self._current_f)
np.matmul(np.transpose(self._current_dfdx), self.__px) seems to be repeated three times here? I'd precalculate it and then reuse it.
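A sketch of that refactor, keeping the same checks (the tail of the approximate Wolfe comparison is cut off in the diff below, so its final line is an assumption):

        current_grad = np.matmul(np.transpose(self._current_dfdx), self.__px)
        proposed_grad = np.matmul(np.transpose(proposed_dfdx), self.__px)

        wolfe_curvature = proposed_grad >= self._c2 * current_grad
        exact_wolfe_suff_dec = (
            self._c1 * current_grad >= proposed_f - self._current_f)
        apprx_wolfe_suff_dec = (
            (2.0 * self._c1 - 1.0) * current_grad >= proposed_grad)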
        # Checking if approximate wolfe conditions are meet.
        apprx_wolfe_suff_dec = ((2.0 * self._c1 - 1.0) *
                                np.matmul(np.transpose(self._current_dfdx),
Another time.
        approximate_wolfe = (apprx_wolfe_suff_dec and wolfe_curvature
                             and apprx_wolfe_applies)

        # If wolfe conditions are meet the line search is stopped.
met
        # is only decreased in subsequent boundary manipulation.
        return ps_2 * alpha_k0

    def __very_close(self, x, y):
is_very_close? That makes it clear that it returns a Boolean.
        suitable interval is found that contains the minimum.
        '''

        # (steps B0 from [1])
Perhaps I'm missing it, but I can't see "B0" in the paper?
@MichaelClerx @ben18785
I've got an initial draft of the Hager-Zhang line search algorithm (#1160). It is passing all of the tests I've written and appears to be functional.
However, I want to write more tests to check specific conditions in the logic before I'm confident in the algorithm. I also think I need to add a few more convergence checks, but I'll address that tomorrow.
I would appreciate any initial thoughts you have about it.