
Combining Two Optimizers for a Powerful Method to Train Your Model

When Levenberg-Marquardt Meets Quasi-Newton. And Yes, We Build It from Scratch with Python!

Hands-on Tutorials

Photo by Joshua Aragon on Unsplash
Table of Contents (read till the end to see how you can get the complete python code of this story)
Β· Seeing the Big Picture
Β· Defining The Problem
Β· A Hybrid Method
  ∘ Objective Function
  ∘ General Algorithm
  ∘ Levenberg-Marquardt Step
  ∘ Quasi-Newton Step
Β· Implementation
Β· Conclusion

Seeing the Big Picture

In previous articles, we've seen Gradient Descent and the Conjugate Gradient algorithm in action, two of the simplest optimization methods there are. Both use line search: they fix a search direction first, then find a step length along that direction which decreases the objective function.

There is another way to generate steps called trust-region. Trust-region methods define a region around the current iterate within which they trust the model to be an adequate representation of the objective function, and then choose the step to be the approximate minimizer of the model in this region. In effect, unlike line search, they choose the direction and length of the step simultaneously.

There are also many other optimization methods, such as Newton's method and Quasi-Newton methods. In this article, we will use a hybrid of the Levenberg-Marquardt and Quasi-Newton methods, utilizing a trust region for the step choice. Furthermore, this article brings a different kind of problem. Instead of being given an objective function to optimize directly, we will be given a dataset and a non-linear model of the data, and our task is to find the best parameters of the model, defined as those that minimize the sum of squared residuals. Some people might call this a curve-fitting problem.

Let's import some libraries.
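A minimal set is enough for this story: NumPy for the linear algebra and Matplotlib for the plots. The print options only mimic the four-decimal formatting of the outputs shown below.

import numpy as np
import matplotlib.pyplot as plt

# cosmetic: print arrays with 4 decimals, no scientific notation
np.set_printoptions(precision=4, suppress=True)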

Defining The Problem

Create a fictional dataset as follows. It has 11 observations and 2 features, with the first feature t acting as an index for the second, y.
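One way to construct it with NumPy (the numbers are exactly those printed below):

t = np.arange(0.0, 21.0, 2.0)   # 0, 2, ..., 20
y = np.array([0.00, 3.55, 3.82, 2.98, 2.32, 1.48,
              1.02, 0.81, 0.41, 0.42, 0.15])
data = np.column_stack((t, y))  # shape (11, 2)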

t     y
------------------
[[ 0.0000 0.0000]
 [ 2.0000 3.5500]
 [ 4.0000 3.8200]
 [ 6.0000 2.9800]
 [ 8.0000 2.3200]
 [10.0000 1.4800]
 [12.0000 1.0200]
 [14.0000 0.8100]
 [16.0000 0.4100]
 [18.0000 0.4200]
 [20.0000 0.1500]]
Data shape: (11, 2)

Visualize the data as points (t, y) = (tα΅’, yα΅’) with i = 1, 2, …, 11 in the Cartesian plane.
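With the imports above, this is a one-liner scatter plot:

plt.scatter(data[:, 0], data[:, 1])
plt.xlabel("t")
plt.ylabel("y")
plt.show()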

Image by author

We immediately see that the data has no linear trend (the point (0, 0) here is not an outlier), so the model we use must be non-linear as well. We will use a linear combination of two exponential functions, defined as

M(x, t) = x₃ exp(x₁t) + xβ‚„ exp(xβ‚‚t).

As mentioned before, the parameters x = [x₁, xβ‚‚, x₃, xβ‚„] will be estimated using a hybrid method of Levenberg-Marquardt (LM) and Quasi-Newton (QN).
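In code, the model is a one-liner. A minimal sketch, assuming the parameterization above (the function name M is ours):

def M(x, t):
    # linear combination of two exponentials: x₃·exp(x₁t) + xβ‚„Β·exp(xβ‚‚t)
    return x[2] * np.exp(x[0] * t) + x[3] * np.exp(x[1] * t)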

A Hybrid Method

Objective Function

Image by vectors-market on flaticon

In general, we want to build a model M(x, t) for a collection of m data points (tα΅’, yα΅’). To do that, we need a concrete definition of the objective function. For a least-squares problem, this objective function is the sum of squared residuals over all data points, where a residual measures how far the model prediction is from the actual data: the observed value minus the model prediction at that point. Mathematically speaking, the residual for the i-th data point is

rα΅’(x) = yα΅’ βˆ’ M(x, tα΅’),   i = 1, 2, …, m

or, in vector form,

r(x) = [r₁(x), rβ‚‚(x), …, rβ‚˜(x)]α΅€.

Since the objective function is the sum of squares of the residuals, it can be written as

F(x) = Β½ Ξ£α΅’ rα΅’(x)Β².

The coefficient Β½ is attached for convenience when differentiating F.

Let ǁ⋅ǁ be a symbol for the maximum norm of a vector. Our mission is to find x which minimizes F. To do that, we will find x at which ǁg(x)ǁ β‰ˆ 0, where g = F′ is the first derivative of F. Differentiating

F(x) = Β½ Ξ£α΅’ rα΅’(x)Β²

w.r.t. each xβ±Ό using the chain rule, we obtain

gβ±Ό(x) = βˆ‚F/βˆ‚xβ±Ό(x) = Ξ£α΅’ rα΅’(x) Β· βˆ‚rα΅’/βˆ‚xβ±Ό(x).

Define the Jacobian matrix J(x) by

(J(x))α΅’β±Ό = βˆ‚rα΅’/βˆ‚xβ±Ό(x);

the value of g can then be calculated as

g(x) = J(x)α΅€ r(x).
For the python function below, besides returning g, it also returns J, which will come in handy later.
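Here is a sketch of that function, together with the residual r and objective F it builds on. The derivatives assume the model M(x, t) defined earlier; the function names are ours:

def r(x, data):
    # residual vector: observed minus predicted
    return data[:, 1] - M(x, data[:, 0])

def F(x, data):
    # objective: half the sum of squared residuals
    return 0.5 * np.sum(r(x, data) ** 2)

def gradient(x, data):
    t = data[:, 0]
    # Jacobian of r w.r.t. x: (J)α΅’β±Ό = βˆ‚rα΅’/βˆ‚xβ±Ό = -βˆ‚M/βˆ‚xβ±Ό
    J = np.column_stack((
        -x[2] * t * np.exp(x[0] * t),
        -x[3] * t * np.exp(x[1] * t),
        -np.exp(x[0] * t),
        -np.exp(x[1] * t),
    ))
    g = J.T @ r(x, data)  # g = Jα΅€r
    return g, J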

General Algorithm

Image by becris on flaticon

Let x* be the minimizer of the sum of squared residuals. The least-squares problem can be solved by general optimization methods, but we shall present special methods that are more efficient. In 1988, Madsen [1] presented a hybrid method that combines the LM method (quadratic convergence if F(x*) = 0, linear convergence otherwise) with a QN method, which gives superlinear convergence, even if F(x*) β‰  0.

In the trust-region method, the algorithm generates a model L of the behavior of F in the neighborhood of the current iterate x,

L(h) = F(x) + hα΅€ g(x) + Β½ hα΅€ B h β‰ˆ F(x + h),

where h is the step to be taken and B approximates the Hessian of F. We assume that we know a positive number Ξ” such that the model is sufficiently accurate inside a ball with radius Ξ”, centered at x, and the algorithm determines the step as

h = argmin_{ǁhǁ ≀ Ξ”} L(h).

In a damped version, the step is determined as

h = argmin_h { L(h) + Β½ ΞΌ hα΅€h },

where the damping parameter ΞΌ β‰₯ 0. The term

Β½ ΞΌ hα΅€h

is introduced to penalize large steps.

From now on, we will begin iterating. Hence, we will introduce an index k which means the variable in question is for the k-th iteration. For example, xβ‚– is the value of x at k-th iteration, and so Fβ‚– = F(xβ‚–), gβ‚– = g(xβ‚–), Jβ‚– = J(xβ‚–), and so on.

Before the iteration begins, it's important to initialize the damping parameter ΞΌ = ΞΌβ‚€ and the Hessian approximation Bβ‚€ (the Hessian at iterate x is defined by F″(x), and its approximation B is used by the QN method).

A reasonable choice for Bβ‚€ is a symmetric, positive definite matrix such as I, the identity matrix. The choice of ΞΌβ‚€ should be related to the size of the elements in

Aβ‚€ = J(xβ‚€)α΅€ J(xβ‚€),

e.g. by letting

ΞΌβ‚€ = Ο„ Β· maxα΅’ aα΅’α΅’β°,

where aα΅’α΅’β° is the i-th diagonal component of Aβ‚€, and Ο„ is chosen by the user. In this article, we will set Ο„ = 1 Γ— 10⁻³.

For the python function below, besides returning ΞΌβ‚€, it also returns Aβ‚€, which will come in handy later.
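A sketch of that initialization, reusing gradient() from above:

def init_mu(x0, data, tau=1e-3):
    _, J = gradient(x0, data)
    A = J.T @ J
    mu0 = tau * np.diag(A).max()  # ΞΌβ‚€ = Ο„ Β· max diagonal element of Aβ‚€
    return mu0, A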

After initializing ΞΌβ‚€ and Bβ‚€, the iteration starts with a series of steps using the LM method. If the performance indicates that F(x*) is significantly nonzero, we switch to the QN method for better performance. We may also get an indication that it is better to switch back to the LM method, so there is a mechanism for that too, which will be covered later. The iteration keeps going until one of the following stopping criteria is satisfied:

  • The norm of the gradient of the objective function is close enough to zero, that is, ǁgₖǁ ≀ Ρ₁
  • The norm of the step satisfies ǁhₖǁ ≀ Ρ₂ (ǁxₖǁ + Ρ₂)
  • The number of steps taken reaches 1000

In this article, Ρ₁ and Ρ₂ will both be set to 1 × 10⁻⁡.

After each step, whether it's through the LM or QN method, Bβ‚– is updated by first defining

yβ‚– = g(xβ‚– + hβ‚–) βˆ’ g(xβ‚–),

with hβ‚– the step just computed. If the curvature condition

hβ‚–α΅€ yβ‚– > 0

is satisfied, then update Bβ‚– with the BFGS formula

Bβ‚–β‚Šβ‚ = Bβ‚– + (yβ‚– yβ‚–α΅€) / (hβ‚–α΅€ yβ‚–) βˆ’ (vβ‚– vβ‚–α΅€) / (hβ‚–α΅€ vβ‚–),

where vβ‚– = Bβ‚– hβ‚–. Otherwise, Bβ‚– stays the same, that is, Bβ‚–β‚Šβ‚ = Bβ‚–.

The new xβ‚–β‚Šβ‚ = xβ‚– + hβ‚– found by the LM or QN method is then passed to the next iteration. This whole process could be written as the code below.
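Below is a sketch of that driver loop. The exact bookkeeping (what LMstep() and QNstep() receive and return) is our assumption, and the print statement only roughly mirrors the logs shown later; the two step functions are sketched in the next sections.

def hybrid(x0, data, eps1=1e-5, eps2=1e-5, kmax=1000, tau=1e-3):
    x = np.asarray(x0, dtype=float)
    mu, _ = init_mu(x, data, tau)
    B = np.eye(len(x))                  # Bβ‚€ = I
    nu, delta, count, method = 2.0, 1.0, 0, "LM"
    for k in range(1, kmax + 1):
        g, J = gradient(x, data)
        if np.linalg.norm(g) <= eps1:   # stopping criterion 1
            break
        used = method
        if method == "LM":
            x_new, h, mu, nu, delta, count, method = LMstep(x, g, J, mu, nu, delta, count, data)
        else:
            x_new, h, delta, method = QNstep(x, g, B, delta, mu, data)
        # BFGS-style update of the Hessian approximation B
        y = gradient(x + h, data)[0] - g
        if h @ y > 0:
            v = B @ h
            B = B + np.outer(y, y) / (h @ y) - np.outer(v, v) / (h @ v)
        print(f"Iteration:{k}\tx = {x_new}, gradient = {np.linalg.norm(g):.4f}, step = {used}")
        if np.linalg.norm(h) <= eps2 * (np.linalg.norm(x) + eps2):  # stopping criterion 2
            x = x_new
            break
        x = x_new
    print("Final estimated parameters:", x)
    return x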

Now, let's go into the details of each step. What's going on inside the LMstep() and QNstep() functions above?

Levenberg-Marquardt Step

At the beginning of the hybrid iterations, the LM method is used first. At the k-th iteration, the step hβ‚– is found by solving

(Jβ‚–α΅€ Jβ‚– + ΞΌβ‚– I) hβ‚– = βˆ’gβ‚–,

and then we calculate the gain ratio

Ο± = (Fβ‚– βˆ’ Fβ‚–β‚Šβ‚) / (Β½ hβ‚–α΅€ (ΞΌβ‚– hβ‚– βˆ’ gβ‚–)),

where Fβ‚–β‚Šβ‚ is just F(xβ‚– + hβ‚–). The denominator is the decrease in F predicted by the damped model, so Ο± compares the actual reduction with the predicted one.

We then consider two cases:

  1. If Ο± > 0, the iteration is a success and we can update xβ‚–β‚Šβ‚ = xβ‚– + hβ‚–. At this point, we need to check whether F(x*) is far from zero, that is: if ǁgβ‚–ǁ < 0.02 β‹… Fβ‚– for three consecutive successful iterations, then we change to the QN method for the next iteration. Also, update

     ΞΌβ‚–β‚Šβ‚ = ΞΌβ‚– Β· max{β…“, 1 βˆ’ (2Ο± βˆ’ 1)Β³}  and  Ξ½ := 2.

  2. If Ο± ≀ 0, the iteration does not succeed and we won't update xβ‚–, that is, xβ‚–β‚Šβ‚ = xβ‚–. For this case, update ΞΌβ‚–β‚Šβ‚ = ΞΌβ‚–Ξ½ and Ξ½ := 2Ξ½.

The update method of ΞΌ above was developed by Nielsen. There is also an older variation by Marquardt: if Ο± < 0.25, then ΞΌ := 2ΞΌ; if Ο± > 0.75, then ΞΌ := ΞΌ/3.

The overall LM step can be sketched as follows. Note that in the code below, we are passing Ξ”β‚–, the trust-region radius that will be used by the QN step in case we switch the step calculation to QN in a future iteration. We don't use Ξ”β‚– here, other than to recompute it for that eventual switch.
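A minimal sketch of LMstep() under the conventions of the driver loop above:

def LMstep(x, g, J, mu, nu, delta, count, data):
    # solve (Jα΅€J + ΞΌI) h = -g for the step h
    h = np.linalg.solve(J.T @ J + mu * np.eye(len(x)), -g)
    x_new = x + h
    Fx, Fnew = F(x, data), F(x_new, data)
    # gain ratio: actual reduction vs. reduction predicted by the model
    rho = (Fx - Fnew) / (0.5 * h @ (mu * h - g))
    delta = np.linalg.norm(h)           # trust-region radius for a possible QN switch
    method = "LM"
    if rho > 0:                         # success: accept the step
        mu *= max(1 / 3, 1 - (2 * rho - 1) ** 3)   # Nielsen's update
        nu = 2.0
        g_new = gradient(x_new, data)[0]
        count = count + 1 if np.linalg.norm(g_new) < 0.02 * Fnew else 0
        if count == 3:                  # three consecutive indications: switch to QN
            method = "QN"
    else:                               # failure: reject the step
        x_new = x
        mu *= nu
        nu *= 2
        count = 0
    return x_new, h, mu, nu, delta, count, method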

Quasi-Newton Step

Since the QN method uses a trust region for its step calculation, when we first move to the QN method it's important to initialize the trust-region radius as

Ξ”β‚– = ǁhβ‚—β‚˜ǁ,

where hβ‚—β‚˜ is the step acquired by the previous LM step. But don't worry, no separate initialization is needed since we always calculate Ξ”β‚– in both the LM and QN steps. As you can see, the formula above is already embedded in the LMstep() sketch: the line that sets delta to the norm of h.

At the k-th iteration, the step hβ‚– is found by solving

Bβ‚– hβ‚– = βˆ’gβ‚–.

If ǁhβ‚–ǁ > Ξ”β‚–, then we reduce the length of hβ‚– to Ξ”β‚–, that is, we set

hβ‚– := (Ξ”β‚– / ǁhβ‚–ǁ) hβ‚–.

Ξ”β‚– is then updated according to the value of Ο±, which is calculated using the same formula as in the LM step:

  • If Ο± < 0.25, then set Ξ”β‚–β‚Šβ‚ = Ξ”β‚– / 2
  • If Ο± > 0.75, then set Ξ”β‚–β‚Šβ‚ = max{Ξ”β‚–, 3ǁhβ‚–ǁ}
  • Otherwise, Ξ”β‚–β‚Šβ‚ = Ξ”β‚–

Next, we consider two cases:

  1. If Fβ‚–β‚Šβ‚ ≀ (1 + Ξ΄) Fβ‚– and ǁgβ‚–β‚Šβ‚ǁ < ǁgβ‚–ǁ, or Fβ‚–β‚Šβ‚ < Fβ‚–, then the iteration succeeds and we update xβ‚–β‚Šβ‚ = xβ‚– + hβ‚–. Here we focus on getting g closer to zero, so we accept a slight increase in the value of F, e.g. as little as Ξ΄ = √Ρ times the previous F, where Ξ΅ is the machine epsilon.
  2. Otherwise, the iteration is considered a failure, and xβ‚– won't be updated, that is, xβ‚–β‚Šβ‚ = xβ‚–.

If the gradient does not decrease fast enough, that is, ǁgβ‚–β‚Šβ‚ǁ β‰₯ ǁgβ‚–ǁ, then we move back to the LM step for the next iteration.

The overall QN step can be seen as follows. Note that in the code below, we calculate Ο± as we do in the LM step.
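A minimal sketch of QNstep() under the same conventions:

def QNstep(x, g, B, delta, mu, data):
    # solve B h = -g for the step h
    h = np.linalg.solve(B, -g)
    if np.linalg.norm(h) > delta:       # keep the step inside the trust region
        h *= delta / np.linalg.norm(h)
    x_new = x + h
    Fx, Fnew = F(x, data), F(x_new, data)
    rho = (Fx - Fnew) / (0.5 * h @ (mu * h - g))   # same gain ratio as in LMstep
    if rho < 0.25:
        delta /= 2
    elif rho > 0.75:
        delta = max(delta, 3 * np.linalg.norm(h))
    g_new = gradient(x_new, data)[0]
    d = np.sqrt(np.finfo(float).eps)    # Ξ΄ = √Ρ with Ξ΅ the machine epsilon
    success = (Fnew <= (1 + d) * Fx and np.linalg.norm(g_new) < np.linalg.norm(g)) or Fnew < Fx
    if not success:                     # failure: reject the step
        x_new = x
    # if the gradient does not decrease fast enough, go back to LM
    method = "LM" if np.linalg.norm(g_new) >= np.linalg.norm(g) else "QN"
    return x_new, h, delta, method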

Implementation

We are now ready to apply our knowledge to the problem defined at the beginning of this article. We will vary the initial point in three different scenarios:

  • xβ‚€ = [-1, 1, -10, 10]
  • xβ‚€ = [-4, 1, 2, -3]
  • xβ‚€ = [0, 0, 0, 0]

But first, let's introduce a python function MakeAnimation() to animate the learning process of the algorithm.
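The animation is not essential to the optimizer itself, so here is only a minimal sketch, assuming hybrid() is extended to record and return the list of iterates xs:

from matplotlib.animation import FuncAnimation

def MakeAnimation(xs, data, interval=200):
    fig, ax = plt.subplots()
    ax.scatter(data[:, 0], data[:, 1], label="data")
    ts = np.linspace(data[:, 0].min(), data[:, 0].max(), 200)
    (curve,) = ax.plot([], [], label="model")
    ax.legend()

    def update(i):
        # redraw the fitted curve for the i-th iterate
        curve.set_data(ts, M(xs[i], ts))
        ax.set_title(f"Iteration {i + 1}")
        return (curve,)

    return FuncAnimation(fig, update, frames=len(xs), interval=interval)

The three scenarios below then boil down to calls like hybrid([-1, 1, -10, 10], data), hybrid([-4, 1, 2, -3], data), and hybrid([0, 0, 0, 0], data).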

Scenario 1

Iteration:1 x = [-1.0000 0.9500 -10.0000 9.9997], 
        gradient = 64932772695986634752.0000, step = LM
Iteration:2 x = [-1.0000 0.9001 -10.0000 9.9994], 
        gradient = 8871075343566318592.0000, step = LM
Iteration:3 x = [-1.0000 0.8510 -10.0000 9.9992], 
        gradient = 1250145055141384192.0000, step = LM
Iteration:4 x = [-1.0000 0.8050 -10.0000 9.9990], 
        gradient = 199964648437467488.0000, step = LM
Iteration:5 x = [-1.0000 0.7686 -10.0000 9.9988], 
        gradient = 46950703393293552.0000, step = LM
Iteration:6 x = [-1.0000 0.7454 -10.0000 9.9987], 
        gradient = 18613190948807260.0000, step = LM
Iteration:7 x = [-1.0000 0.7283 -10.0000 9.9986], 
        gradient = 9421138350386658.0000, step = LM
Iteration:8 x = [-1.0000 0.7127 -10.0000 9.9985], 
        gradient = 5067642599353893.0000, step = LM
Iteration:9 x = [-1.0000 0.6974 -10.0000 9.9984], 
        gradient = 2761173975265880.5000, step = LM
Iteration:10    x = [-1.0000 0.6822 -10.0000 9.9983], 
        gradient = 1508437660486741.0000, step = LM
Iteration:11    x = [-1.0000 0.6670 -10.0000 9.9983], 
        gradient = 824513384402736.2500, step = LM
Iteration:12    x = [-1.0000 0.6518 -10.0000 9.9982], 
        gradient = 450734053994430.5000, step = LM
Iteration:13    x = [-1.0000 0.6366 -10.0000 9.9981], 
        gradient = 246409971021028.8438, step = LM
Iteration:14    x = [-1.0000 0.6214 -10.0000 9.9980], 
        gradient = 134711382454059.4531, step = LM
Iteration:15    x = [-1.0000 0.6062 -10.0000 9.9980], 
        gradient = 73647395414745.6094, step = LM
Iteration:16    x = [-1.0000 0.5909 -10.0000 9.9979], 
        gradient = 40264084923738.5469, step = LM
Iteration:17    x = [-1.0000 0.5757 -10.0000 9.9978], 
        gradient = 22013352246245.9766, step = LM
Iteration:18    x = [-1.0000 0.5605 -10.0000 9.9977], 
        gradient = 12035471978657.0332, step = LM
Iteration:19    x = [-1.0000 0.5452 -10.0000 9.9976], 
        gradient = 6580356674017.7891, step = LM
Iteration:20    x = [-1.0000 0.5299 -10.0000 9.9976], 
        gradient = 3597874150520.1475, step = LM
Iteration:21    x = [-1.0000 0.5146 -10.0000 9.9975], 
        gradient = 1967223429070.2634, step = LM
Iteration:22    x = [-1.0000 0.4993 -10.0000 9.9974], 
        gradient = 1075656653534.5968, step = LM
Iteration:23    x = [-1.0000 0.4840 -10.0000 9.9973], 
        gradient = 588175745878.2034, step = LM
Iteration:24    x = [-1.0000 0.4687 -10.0000 9.9973], 
        gradient = 321629128815.2425, step = LM
Iteration:25    x = [-1.0000 0.4534 -10.0000 9.9972], 
        gradient = 175881428072.9430, step = LM
Iteration:26    x = [-1.0000 0.4380 -10.0000 9.9971], 
        gradient = 96183963098.1861, step = LM
Iteration:27    x = [-1.0000 0.4226 -10.0000 9.9970], 
        gradient = 52602379508.8333, step = LM
Iteration:28    x = [-1.0000 0.4072 -10.0000 9.9969], 
        gradient = 28769372395.6593, step = LM
Iteration:29    x = [-1.0000 0.3918 -10.0000 9.9969], 
        gradient = 15735488020.7650, step = LM
Iteration:30    x = [-1.0000 0.3763 -10.0000 9.9968], 
        gradient = 8607119032.2862, step = LM
Iteration:31    x = [-1.0000 0.3609 -10.0000 9.9967], 
        gradient = 4708326165.4597, step = LM
Iteration:32    x = [-1.0000 0.3454 -10.0000 9.9966], 
        gradient = 2575789155.9012, step = LM
Iteration:33    x = [-1.0000 0.3298 -10.0000 9.9965], 
        gradient = 1409268155.2911, step = LM
Iteration:34    x = [-1.0000 0.3142 -10.0000 9.9965], 
        gradient = 771119669.0625, step = LM
Iteration:35    x = [-1.0000 0.2986 -10.0000 9.9964], 
        gradient = 421988746.9021, step = LM
Iteration:36    x = [-1.0000 0.2829 -10.0000 9.9963], 
        gradient = 230960695.7163, step = LM
Iteration:37    x = [-1.0000 0.2672 -10.0000 9.9962], 
        gradient = 126427578.6262, step = LM
Iteration:38    x = [-1.0000 0.2514 -10.0000 9.9961], 
        gradient = 69218466.4778, step = LM
Iteration:39    x = [-1.0000 0.2356 -10.0000 9.9960], 
        gradient = 37904445.7661, step = LM
Iteration:40    x = [-1.0000 0.2197 -10.0000 9.9960], 
        gradient = 20761560.5190, step = LM
Iteration:41    x = [-1.0000 0.2036 -10.0000 9.9959], 
        gradient = 11374897.3379, step = LM
Iteration:42    x = [-1.0000 0.1875 -10.0000 9.9958], 
        gradient = 6234064.0376, step = LM
Iteration:43    x = [-1.0000 0.1713 -10.0000 9.9957], 
        gradient = 3417848.7040, step = LM
Iteration:44    x = [-1.0000 0.1550 -10.0000 9.9956], 
        gradient = 1874634.8523, step = LM
Iteration:45    x = [-1.0000 0.1385 -10.0000 9.9955], 
        gradient = 1028702.8782, step = LM
Iteration:46    x = [-1.0000 0.1218 -10.0000 9.9954], 
        gradient = 564808.7854, step = LM
Iteration:47    x = [-1.0000 0.1049 -10.0000 9.9953], 
        gradient = 310298.2705, step = LM
Iteration:48    x = [-1.0000 0.0877 -10.0000 9.9952], 
        gradient = 170587.3249, step = LM
Iteration:49    x = [-1.0000 0.0703 -10.0000 9.9951], 
        gradient = 93845.4326, step = LM
Iteration:50    x = [-1.0000 0.0524 -10.0000 9.9950], 
        gradient = 51660.5782, step = LM
Iteration:51    x = [-1.0000 0.0341 -10.0000 9.9949], 
        gradient = 28451.8639, step = LM
Iteration:52    x = [-1.0000 0.0153 -10.0000 9.9947], 
        gradient = 15670.8490, step = LM
Iteration:53    x = [-0.9999 -0.0043 -10.0000 9.9946], 
        gradient = 8624.8546, step = LM
Iteration:54    x = [-0.9999 -0.0246 -10.0000 9.9944], 
        gradient = 4736.1605, step = LM
Iteration:55    x = [-0.9998 -0.0461 -10.0000 9.9941], 
        gradient = 2587.7373, step = LM
Iteration:56    x = [-0.9996 -0.0687 -10.0000 9.9939], 
        gradient = 1399.9325, step = LM
Iteration:57    x = [-0.9993 -0.0928 -10.0000 9.9936], 
        gradient = 743.3551, step = LM
Iteration:58    x = [-0.9987 -0.1184 -10.0001 9.9931], 
        gradient = 381.2687, step = LM
Iteration:59    x = [-0.9976 -0.1456 -10.0001 9.9925], 
        gradient = 183.0379, step = LM
Iteration:60    x = [-0.9952 -0.1736 -10.0002 9.9917], 
        gradient = 76.6425, step = LM
Iteration:61    x = [-0.9894 -0.1972 -10.0005 9.9901], 
        gradient = 26.4382, step = LM
Iteration:62    x = [-0.9744 -0.2105 -10.0012 9.9871], 
        gradient = 6.8472, step = LM
Iteration:63    x = [-0.9336 -0.2136 -10.0030 9.9797], 
        gradient = 4.4702, step = LM
Iteration:64    x = [-0.8269 -0.2088 -10.0070 9.9631], 
        gradient = 4.2573, step = LM
Iteration:65    x = [-0.6379 -0.1953 -10.0103 9.9430], 
        gradient = 6.5882, step = LM
Iteration:66    x = [-0.5576 -0.1832 -10.0023 9.9499], 
        gradient = 1.3240, step = LM
Iteration:67    x = [-0.5559 -0.1827 -9.9933 9.9681], 
        gradient = 0.0227, step = LM
Iteration:68    x = [-0.5544 -0.1829 -10.0000 9.9888], 
        gradient = 0.0087, step = LM
Iteration:69    x = [-0.5527 -0.1832 -10.0389 10.0305], 
        gradient = 0.0117, step = LM
Iteration:70    x = [-0.5482 -0.1843 -10.1523 10.1442], 
        gradient = 0.0548, step = LM
Iteration:71    x = [-0.5372 -0.1868 -10.4402 10.4325], 
        gradient = 0.2795, step = LM
Iteration:72    x = [-0.5173 -0.1916 -11.0244 11.0175], 
        gradient = 1.0133, step = LM
Iteration:73    x = [-0.5025 -0.1958 -11.5877 11.5812], 
        gradient = 0.7806, step = LM
Iteration:74    x = [-0.4883 -0.2000 -12.1891 12.1835], 
        gradient = 0.8407, step = LM
Iteration:75    x = [-0.4787 -0.2030 -12.6707 12.6659], 
        gradient = 0.4784, step = LM
Iteration:76    x = [-0.4704 -0.2057 -13.1195 13.1156], 
        gradient = 0.4040, step = LM
Iteration:77    x = [-0.4656 -0.2073 -13.4055 13.4021], 
        gradient = 0.1502, step = LM
Iteration:78    x = [-0.4629 -0.2083 -13.5758 13.5728], 
        gradient = 0.0525, step = LM
Iteration:79    x = [-0.4622 -0.2085 -13.6183 13.6154], 
        gradient = 0.0030, step = LM
Iteration:80    x = [-0.4622 -0.2085 -13.6218 13.6188], 
        gradient = 0.0000, step = LM
Iteration:81    x = [-0.4622 -0.2085 -13.6218 13.6188], 
        gradient = 0.0000, step = LM
Final estimated parameters: [-0.4622 -0.2085 -13.6218 13.6188]

The algorithm runs for 81 iterations and produces the parameter estimate x* = [-0.46, -0.21, -13.62, 13.62] with ǁg(x*)ǁ β‰ˆ 0. In other words, the algorithm successfully converges, with the stopping criterion ǁgβ‚–ǁ ≀ Ρ₁ being met. It can also be seen that the algorithm always uses the LM method while fitting the given data. This means that within every 3 successive successful iterations, there is at least one iterate xβ‚– satisfying ǁgβ‚–ǁ β‰₯ 0.02 β‹… Fβ‚–, so the switch to QN is never triggered.

Scenario 2

Iteration:1 x = [-4.0000 0.9500 2.0000 -2.9990], 
        gradient = 5843944048763837440.0000, step = LM
Iteration:2 x = [-4.0000 0.9001 2.0000 -2.9981], 
        gradient = 798393836723220864.0000, step = LM
Iteration:3 x = [-4.0000 0.8510 2.0000 -2.9973], 
        gradient = 112511464378970688.0000, step = LM
Iteration:4 x = [-4.0000 0.8050 2.0000 -2.9965], 
        gradient = 17995869734728704.0000, step = LM
Iteration:5 x = [-4.0000 0.7687 2.0000 -2.9959], 
        gradient = 4224958744295105.0000, step = LM
Iteration:6 x = [-4.0000 0.7454 2.0000 -2.9955], 
        gradient = 1674832246409442.5000, step = LM
Iteration:7 x = [-4.0000 0.7283 2.0000 -2.9952], 
        gradient = 847705835813273.6250, step = LM
Iteration:8 x = [-4.0000 0.7128 2.0000 -2.9950], 
        gradient = 455980000478269.1875, step = LM
Iteration:9 x = [-4.0000 0.6975 2.0000 -2.9947], 
        gradient = 248446618239291.2812, step = LM
Iteration:10    x = [-4.0000 0.6823 2.0000 -2.9945], 
        gradient = 135727074063636.5469, step = LM
Iteration:11    x = [-4.0000 0.6671 2.0000 -2.9942], 
        gradient = 74188516827416.2656, step = LM
Iteration:12    x = [-4.0000 0.6519 2.0000 -2.9940], 
        gradient = 40556383042130.0234, step = LM
Iteration:13    x = [-4.0000 0.6367 2.0000 -2.9937], 
        gradient = 22171597557325.9102, step = LM
Iteration:14    x = [-4.0000 0.6215 2.0000 -2.9934], 
        gradient = 12121123175141.8398, step = LM
Iteration:15    x = [-4.0000 0.6063 2.0000 -2.9932], 
        gradient = 6626677353862.0400, step = LM
Iteration:16    x = [-4.0000 0.5910 2.0000 -2.9929], 
        gradient = 3622898207455.6821, step = LM
Iteration:17    x = [-4.0000 0.5758 2.0000 -2.9927], 
        gradient = 1980725783949.0017, step = LM
Iteration:18    x = [-4.0000 0.5606 2.0000 -2.9924], 
        gradient = 1082932002365.9087, step = LM
Iteration:19    x = [-4.0000 0.5453 2.0000 -2.9922], 
        gradient = 592089574010.1783, step = LM
Iteration:20    x = [-4.0000 0.5300 2.0000 -2.9919], 
        gradient = 323730717048.8542, step = LM
Iteration:21    x = [-4.0000 0.5148 2.0000 -2.9916], 
        gradient = 177007495788.6531, step = LM
Iteration:22    x = [-4.0000 0.4995 2.0000 -2.9914], 
        gradient = 96785827387.7305, step = LM
Iteration:23    x = [-4.0000 0.4842 2.0000 -2.9911], 
        gradient = 52923125898.6776, step = LM
Iteration:24    x = [-4.0000 0.4688 2.0000 -2.9909], 
        gradient = 28939714442.5817, step = LM
Iteration:25    x = [-4.0000 0.4535 2.0000 -2.9906], 
        gradient = 15825580540.4678, step = LM
Iteration:26    x = [-4.0000 0.4381 2.0000 -2.9903], 
        gradient = 8654531799.0783, step = LM
Iteration:27    x = [-4.0000 0.4228 2.0000 -2.9901], 
        gradient = 4733127210.2534, step = LM
Iteration:28    x = [-4.0000 0.4074 2.0000 -2.9898], 
        gradient = 2588665603.7885, step = LM
Iteration:29    x = [-4.0000 0.3919 2.0000 -2.9895], 
        gradient = 1415891078.6356, step = LM
Iteration:30    x = [-4.0000 0.3765 2.0000 -2.9893], 
        gradient = 774485568.1282, step = LM
Iteration:31    x = [-4.0000 0.3610 2.0000 -2.9890], 
        gradient = 423672773.1613, step = LM
Iteration:32    x = [-4.0000 0.3455 2.0000 -2.9887], 
        gradient = 231785640.8201, step = LM
Iteration:33    x = [-4.0000 0.3300 2.0000 -2.9885], 
        gradient = 126819882.7775, step = LM
Iteration:34    x = [-4.0000 0.3144 2.0000 -2.9882], 
        gradient = 69396974.7767, step = LM
Iteration:35    x = [-4.0000 0.2988 2.0000 -2.9879], 
        gradient = 37980046.8186, step = LM
Iteration:36    x = [-4.0000 0.2831 2.0000 -2.9876], 
        gradient = 20789502.5936, step = LM
Iteration:37    x = [-4.0000 0.2674 2.0000 -2.9874], 
        gradient = 11382078.3240, step = LM
Iteration:38    x = [-4.0000 0.2516 2.0000 -2.9871], 
        gradient = 6233151.4407, step = LM
Iteration:39    x = [-4.0000 0.2358 2.0000 -2.9868], 
        gradient = 3414510.4438, step = LM
Iteration:40    x = [-4.0000 0.2198 2.0000 -2.9865], 
        gradient = 1871192.9516, step = LM
Iteration:41    x = [-4.0000 0.2038 2.0000 -2.9862], 
        gradient = 1025947.0256, step = LM
Iteration:42    x = [-4.0000 0.1877 2.0000 -2.9859], 
        gradient = 562874.5612, step = LM
Iteration:43    x = [-4.0000 0.1714 2.0000 -2.9856], 
        gradient = 309077.3735, step = LM
Iteration:44    x = [-4.0000 0.1550 2.0000 -2.9853], 
        gradient = 169908.6045, step = LM
Iteration:45    x = [-4.0000 0.1384 2.0000 -2.9850], 
        gradient = 93547.0603, step = LM
Iteration:46    x = [-4.0000 0.1215 2.0000 -2.9847], 
        gradient = 51612.8671, step = LM
Iteration:47    x = [-4.0000 0.1044 2.0000 -2.9843], 
        gradient = 28559.4009, step = LM
Iteration:48    x = [-4.0000 0.0869 2.0000 -2.9839], 
        gradient = 15867.2252, step = LM
Iteration:49    x = [-4.0000 0.0688 2.0000 -2.9835], 
        gradient = 8865.7762, step = LM
Iteration:50    x = [-4.0000 0.0500 2.0000 -2.9830], 
        gradient = 4993.1844, step = LM
Iteration:51    x = [-4.0000 0.0303 2.0000 -2.9825], 
        gradient = 2843.2836, step = LM
Iteration:52    x = [-4.0000 0.0092 2.0000 -2.9818], 
        gradient = 1643.5749, step = LM
Iteration:53    x = [-4.0000 -0.0138 2.0000 -2.9809], 
        gradient = 969.1664, step = LM
Iteration:54    x = [-4.0000 -0.0397 2.0001 -2.9798], 
        gradient = 585.9656, step = LM
Iteration:55    x = [-4.0000 -0.0695 2.0001 -2.9782], 
        gradient = 364.7187, step = LM
Iteration:56    x = [-4.0000 -0.1047 2.0003 -2.9760], 
        gradient = 233.9465, step = LM
Iteration:57    x = [-4.0000 -0.1465 2.0005 -2.9730], 
        gradient = 154.2224, step = LM
Iteration:58    x = [-4.0000 -0.1954 2.0008 -2.9690], 
        gradient = 103.9455, step = LM
Iteration:59    x = [-4.0000 -0.2516 2.0014 -2.9638], 
        gradient = 71.2219, step = LM
Iteration:60    x = [-4.0000 -0.3152 2.0023 -2.9571], 
        gradient = 49.3390, step = LM
Iteration:61    x = [-4.0000 -0.3865 2.0037 -2.9486], 
        gradient = 34.3994, step = LM
Iteration:62    x = [-4.0000 -0.4659 2.0060 -2.9378], 
        gradient = 24.0583, step = LM
Iteration:63    x = [-3.9999 -0.5540 2.0095 -2.9242], 
        gradient = 16.8404, step = LM
Iteration:64    x = [-3.9999 -0.6510 2.0148 -2.9069], 
        gradient = 11.7791, step = LM
Iteration:65    x = [-3.9998 -0.7574 2.0228 -2.8848], 
        gradient = 8.2216, step = LM
Iteration:66    x = [-3.9998 -0.8733 2.0348 -2.8567], 
        gradient = 5.7186, step = LM
Iteration:67    x = [-3.9996 -0.9990 2.0522 -2.8209], 
        gradient = 3.9571, step = LM
Iteration:68    x = [-3.9995 -1.1349 2.0768 -2.7754], 
        gradient = 2.7180, step = LM
Iteration:69    x = [-3.9992 -1.2813 2.1103 -2.7183], 
        gradient = 1.8476, step = LM
Iteration:70    x = [-3.9987 -1.4390 2.1533 -2.6485], 
        gradient = 1.2392, step = LM
Iteration:71    x = [-3.9980 -1.6079 2.2038 -2.5681], 
        gradient = 0.8190, step = LM
Iteration:72    x = [-3.9968 -1.7865 2.2547 -2.4842], 
        gradient = 0.5357, step = LM
Iteration:73    x = [-3.9949 -1.9699 2.2945 -2.4089], 
        gradient = 0.3513, step = LM
Iteration:74    x = [-3.9920 -2.1517 2.3148 -2.3524], 
        gradient = 0.2345, step = LM
Iteration:75    x = [-3.9878 -2.3283 2.3156 -2.3155], 
        gradient = 0.1603, step = LM
Iteration:76    x = [-3.9863 -2.3635 2.3124 -2.3081], 
        gradient = 0.1487, step = QN
Iteration:77    x = [-3.9803 -2.4682 2.2972 -2.2863], 
        gradient = 0.1189, step = QN
Iteration:78    x = [-3.9577 -2.7803 2.2395 -2.2211], 
        gradient = 0.0613, step = QN
Iteration:79    x = [-3.9319 -3.0541 2.1785 -2.1639], 
        gradient = 0.0344, step = QN
Iteration:80    x = [-3.9000 -3.3271 2.1135 -2.1068], 
        gradient = 0.0193, step = QN
Iteration:81    x = [-3.8784 -3.4807 2.0776 -2.0748], 
        gradient = 0.0140, step = QN
Iteration:82    x = [-3.8657 -3.5574 2.0604 -2.0588], 
        gradient = 0.0119, step = QN
Iteration:83    x = [-3.8510 -3.6338 2.0437 -2.0428], 
        gradient = 0.0101, step = QN
Iteration:84    x = [-3.8335 -3.7098 2.0276 -2.0269], 
        gradient = 0.0086, step = QN
Iteration:85    x = [-3.8227 -3.7473 2.0198 -2.0191], 
        gradient = 0.0080, step = QN
Iteration:86    x = [-3.8090 -3.7840 2.0123 -2.0114], 
        gradient = 0.0074, step = QN
Iteration:87    x = [-3.7911 -3.8190 2.0053 -2.0041], 
        gradient = 0.0073, step = QN
Iteration:88    x = [-3.7664 -3.8489 1.9996 -1.9979], 
        gradient = 0.0076, step = QN
Iteration:89    x = [-3.7581 -3.8559 1.9998 -1.9981], 
        gradient = 0.0077, step = LM
Iteration:90    x = [-3.7330 -3.8766 2.0004 -1.9986], 
        gradient = 0.0081, step = LM
Iteration:91    x = [-3.6535 -3.9361 2.0028 -2.0011], 
        gradient = 0.0096, step = LM
Iteration:92    x = [-3.3739 -4.0948 2.0180 -2.0161], 
        gradient = 0.0168, step = LM
Iteration:93    x = [-1.8970 -4.4431 2.1579 -2.1553], 
        gradient = 0.3573, step = LM
Iteration:94    x = [-1.8970 -4.4431 2.1579 -2.1553], 
        gradient = 0.3573, step = LM
Iteration:95    x = [-1.8970 -4.4431 2.1579 -2.1553], 
        gradient = 0.3573, step = LM
Iteration:96    x = [-1.8970 -4.4431 2.1579 -2.1553], 
        gradient = 0.3573, step = LM
Iteration:97    x = [-0.4821 -4.4514 2.3330 -2.2979], 
        gradient = 12.9667, step = LM
Iteration:98    x = [-0.4821 -4.4514 2.3330 -2.2979], 
        gradient = 12.9667, step = LM
Iteration:99    x = [-0.4821 -4.4514 2.3330 -2.2979], 
        gradient = 12.9667, step = LM
Iteration:100   x = [-0.4821 -4.4514 2.3330 -2.2979], 
        gradient = 12.9667, step = LM
Iteration:101   x = [-0.4821 -4.4514 2.3330 -2.2979], 
        gradient = 12.9667, step = LM
Iteration:102   x = [-0.3361 -4.4514 2.3517 -2.2985], 
        gradient = 23.2634, step = LM
Iteration:103   x = [-0.3361 -4.4514 2.3517 -2.2985], 
        gradient = 23.2634, step = LM
Iteration:104   x = [-0.0195 -4.4514 2.3844 -2.3000], 
        gradient = 141.8487, step = LM
Iteration:105   x = [-0.0542 -4.4514 2.5491 -2.3129], 
        gradient = 2.7825, step = LM
Iteration:106   x = [-0.0622 -4.4515 2.7408 -2.3468], 
        gradient = 3.8567, step = LM
Iteration:107   x = [-0.0733 -4.4516 3.1240 -2.5063], 
        gradient = 5.6536, step = LM
Iteration:108   x = [-0.0887 -4.4519 3.7063 -3.0302], 
        gradient = 7.1668, step = LM
Iteration:109   x = [-0.1056 -4.4518 4.4259 -4.0061], 
        gradient = 6.3086, step = LM
Iteration:110   x = [-0.1190 -4.4484 5.0608 -4.9286], 
        gradient = 3.4276, step = LM
Iteration:111   x = [-0.1236 -4.4311 5.3256 -5.3076], 
        gradient = 0.7579, step = LM
Iteration:112   x = [-0.1239 -4.3717 5.3572 -5.3565], 
        gradient = 0.0396, step = LM
Iteration:113   x = [-0.1239 -4.1687 5.3583 -5.3584], 
        gradient = 0.0016, step = LM
Iteration:114   x = [-0.1240 -3.2559 5.3610 -5.3611], 
        gradient = 0.0427, step = LM
Iteration:115   x = [-0.1240 -3.2559 5.3610 -5.3611], 
        gradient = 0.0427, step = LM
Iteration:116   x = [-0.1240 -3.2559 5.3610 -5.3611], 
        gradient = 0.0427, step = LM
Iteration:117   x = [-0.1248 -1.1768 5.4018 -5.4026], 
        gradient = 4.7877, step = LM
Iteration:118   x = [-0.1248 -1.1768 5.4018 -5.4026], 
        gradient = 4.7877, step = LM
Iteration:119   x = [-0.1248 -1.1768 5.4018 -5.4026], 
        gradient = 4.7877, step = LM
Iteration:120   x = [-0.1248 -1.1768 5.4018 -5.4026], 
        gradient = 4.7877, step = LM
Iteration:121   x = [-0.1248 -1.1768 5.4018 -5.4026], 
        gradient = 4.7877, step = LM
Iteration:122   x = [-0.1244 -1.1290 5.5183 -5.4517], 
        gradient = 0.1904, step = LM
Iteration:123   x = [-0.1292 -1.0057 5.7614 -5.6610], 
        gradient = 1.1816, step = LM
Iteration:124   x = [-0.1382 -0.8383 6.2791 -6.1944], 
        gradient = 3.1482, step = LM
Iteration:125   x = [-0.1464 -0.7806 6.8138 -6.7630], 
        gradient = 1.7846, step = LM
Iteration:126   x = [-0.1599 -0.6524 7.7451 -7.7208], 
        gradient = 5.7805, step = LM
Iteration:127   x = [-0.1658 -0.6428 8.2978 -8.2944], 
        gradient = 0.8161, step = LM
Iteration:128   x = [-0.1770 -0.5710 9.2787 -9.2788], 
        gradient = 4.5341, step = LM
Iteration:129   x = [-0.1818 -0.5588 9.8477 -9.8546], 
        gradient = 0.7032, step = LM
Iteration:130   x = [-0.1902 -0.5204 10.7897 -10.7955], 
        gradient = 2.9184, step = LM
Iteration:131   x = [-0.1941 -0.5090 11.3439 -11.3510], 
        gradient = 0.6201, step = LM
Iteration:132   x = [-0.1999 -0.4878 12.1563 -12.1617], 
        gradient = 1.6519, step = LM
Iteration:133   x = [-0.2027 -0.4799 12.6128 -12.6178], 
        gradient = 0.3816, step = LM
Iteration:134   x = [-0.2059 -0.4697 13.1434 -13.1472], 
        gradient = 0.5800, step = LM
Iteration:135   x = [-0.2073 -0.4656 13.4035 -13.4068], 
        gradient = 0.1173, step = LM
Iteration:136   x = [-0.2082 -0.4629 13.5699 -13.5729], 
        gradient = 0.0504, step = LM
Iteration:137   x = [-0.2085 -0.4622 13.6148 -13.6177], 
        gradient = 0.0034, step = LM
Iteration:138   x = [-0.2085 -0.4622 13.6188 -13.6218], 
        gradient = 0.0000, step = LM
Iteration:139   x = [-0.2085 -0.4622 13.6188 -13.6218], 
        gradient = 0.0000, step = LM
Final estimated parameters: [-0.2085 -0.4622 13.6188 -13.6218]

The algorithm runs for as many as 139 iterations and produces the parameter estimate x* = [-0.21, -0.46, 13.62, -13.62] with ǁg(x*)ǁ β‰ˆ 0. In other words, the algorithm successfully converges, with the stopping criterion ǁgβ‚–ǁ ≀ Ρ₁ being met. It can also be seen that during the curve fitting, the algorithm changes the step method to QN at iterations 76–88 before changing back to LM. This means that in the 3 successful iterations before iteration 76, every iterate xβ‚– satisfies ǁgβ‚–ǁ < 0.02 β‹… Fβ‚–, so the QN method is expected to converge more quickly than the LM method from iteration 76 onward. Then, at iteration 88, we get ǁgβ‚–β‚Šβ‚ǁ β‰₯ ǁgβ‚–ǁ, which means the gradient does not decrease fast enough under the QN method, so the algorithm switches back to LM for the next iteration.

Because the initial value xβ‚€ is quite far from the solution, more iterations are needed than in Scenario 1. Also, note that so far we have found two solutions that satisfy the convergence criteria, namely x* = [-0.46, -0.21, -13.62, 13.62] and x* = [-0.21, -0.46, 13.62, -13.62].

Scenario 3

Iteration:1 x = [0.0000 0.0000 0.7705 0.7705], 
        gradient = 42.5292, step = LM
Iteration:2 x = [-0.0815 -0.0815 1.3987 1.3987], 
        gradient = 16.7516, step = LM
Iteration:3 x = [-0.0400 -0.0400 1.2150 1.2150], 
        gradient = 18.3468, step = LM
Iteration:4 x = [-0.0607 -0.0695 1.3851 1.3851], 
        gradient = 4.8184, step = LM
Iteration:5 x = [-0.0607 -0.0695 1.3851 1.3851], 
        gradient = 4.8184, step = LM
Iteration:6 x = [-0.0607 -0.0695 1.3851 1.3851], 
        gradient = 4.8184, step = LM
Iteration:7 x = [-0.0607 -0.0695 1.3851 1.3851], 
        gradient = 4.8184, step = LM
Iteration:8 x = [-0.0607 -0.0695 1.3851 1.3851], 
        gradient = 4.8184, step = LM
Iteration:9 x = [-0.0607 -0.0695 1.3851 1.3851], 
        gradient = 4.8184, step = LM
Iteration:10    x = [-0.0631 -0.0584 1.3830 1.3830], 
        gradient = 1.3008, step = LM
Iteration:11    x = [-0.0596 -0.0627 1.3811 1.3811], 
        gradient = 0.4178, step = LM
Iteration:12    x = [-0.0620 -0.0601 1.3793 1.3792], 
        gradient = 0.2770, step = LM
Iteration:13    x = [-0.0604 -0.0616 1.3775 1.3775], 
        gradient = 0.2522, step = LM
Iteration:14    x = [-0.0613 -0.0605 1.3758 1.3757], 
        gradient = 0.2432, step = LM
Iteration:15    x = [-0.0604 -0.0612 1.3737 1.3737], 
        gradient = 0.2332, step = LM
Iteration:16    x = [-0.0615 -0.0597 1.3708 1.3708], 
        gradient = 0.2541, step = LM
Iteration:17    x = [-0.0582 -0.0627 1.3678 1.3677], 
        gradient = 0.6396, step = LM
Iteration:18    x = [-0.0582 -0.0627 1.3678 1.3677], 
        gradient = 0.6396, step = LM
Iteration:19    x = [-0.0607 -0.0600 1.3668 1.3668], 
        gradient = 0.1995, step = LM
Iteration:20    x = [-0.0603 -0.0604 1.3658 1.3658], 
        gradient = 0.1930, step = LM
Iteration:21    x = [-0.0603 -0.0601 1.3639 1.3638], 
        gradient = 0.1840, step = LM
Iteration:22    x = [-0.0593 -0.0605 1.3587 1.3586], 
        gradient = 0.2076, step = LM
Iteration:23    x = [-0.0593 -0.0605 1.3587 1.3586], 
        gradient = 0.2076, step = LM
Iteration:24    x = [-0.0593 -0.0605 1.3587 1.3586], 
        gradient = 0.2076, step = LM
Iteration:25    x = [-0.0608 -0.0590 1.3571 1.3570], 
        gradient = 0.2311, step = LM
Iteration:26    x = [-0.0584 -0.0611 1.3556 1.3555], 
        gradient = 0.3672, step = LM
Iteration:27    x = [-0.0609 -0.0586 1.3545 1.3544], 
        gradient = 0.3171, step = LM
Iteration:28    x = [-0.0590 -0.0603 1.3536 1.3536], 
        gradient = 0.1788, step = LM
Iteration:29    x = [-0.0600 -0.0593 1.3528 1.3527], 
        gradient = 0.1280, step = LM
Iteration:30    x = [-0.0594 -0.0598 1.3519 1.3519], 
        gradient = 0.1237, step = LM
Iteration:31    x = [-0.0597 -0.0594 1.3510 1.3510], 
        gradient = 0.1192, step = LM
Iteration:32    x = [-0.0597 -0.0594 1.3508 1.3510], 
        gradient = 0.1183, step = QN
Iteration:33    x = [-0.0597 -0.0594 1.3500 1.3510], 
        gradient = 0.1155, step = QN
Iteration:34    x = [-0.0596 -0.0594 1.3475 1.3510], 
        gradient = 0.1084, step = QN
Iteration:35    x = [-0.0592 -0.0594 1.3401 1.3510], 
        gradient = 0.1055, step = QN
Iteration:36    x = [-0.0589 -0.0594 1.3328 1.3510], 
        gradient = 0.2028, step = QN
Iteration:37    x = [-0.0595 -0.0585 1.3319 1.3501], 
        gradient = 0.1538, step = LM
Iteration:38    x = [-0.0595 -0.0585 1.3319 1.3501], 
        gradient = 0.1538, step = LM
Iteration:39    x = [-0.0587 -0.0592 1.3314 1.3496], 
        gradient = 0.0704, step = LM
Iteration:40    x = [-0.0591 -0.0588 1.3310 1.3492], 
        gradient = 0.0640, step = LM
Iteration:41    x = [-0.0588 -0.0590 1.3306 1.3488], 
        gradient = 0.0618, step = LM
Iteration:42    x = [-0.0588 -0.0590 1.3304 1.3488], 
        gradient = 0.0615, step = QN
Iteration:43    x = [-0.0588 -0.0590 1.3300 1.3488], 
        gradient = 0.0610, step = QN
Iteration:44    x = [-0.0587 -0.0590 1.3288 1.3488], 
        gradient = 0.0582, step = QN
Iteration:45    x = [-0.0586 -0.0590 1.3275 1.3488], 
        gradient = 0.0571, step = QN
Iteration:46    x = [-0.0586 -0.0590 1.3263 1.3488], 
        gradient = 0.0554, step = QN
Iteration:47    x = [-0.0585 -0.0590 1.3251 1.3488], 
        gradient = 0.0646, step = QN
Iteration:48    x = [-0.0590 -0.0585 1.3247 1.3484], 
        gradient = 0.0576, step = LM
Iteration:49    x = [-0.0585 -0.0589 1.3243 1.3481], 
        gradient = 0.0494, step = LM
Iteration:50    x = [-0.0589 -0.0585 1.3240 1.3477], 
        gradient = 0.0455, step = LM
Iteration:51    x = [-0.0585 -0.0588 1.3237 1.3474], 
        gradient = 0.0409, step = LM
Iteration:52    x = [-0.0588 -0.0585 1.3233 1.3471], 
        gradient = 0.0393, step = LM
Iteration:53    x = [-0.0585 -0.0588 1.3230 1.3468], 
        gradient = 0.0377, step = LM
Iteration:54    x = [-0.0588 -0.0585 1.3228 1.3465], 
        gradient = 0.0363, step = LM
Iteration:55    x = [-0.0585 -0.0587 1.3225 1.3462], 
        gradient = 0.0348, step = LM
Iteration:56    x = [-0.0587 -0.0585 1.3222 1.3459], 
        gradient = 0.0335, step = LM
Iteration:57    x = [-0.0585 -0.0587 1.3219 1.3457], 
        gradient = 0.0321, step = LM
Iteration:58    x = [-0.0587 -0.0585 1.3217 1.3454], 
        gradient = 0.0307, step = LM
Iteration:59    x = [-0.0584 -0.0587 1.3214 1.3452], 
        gradient = 0.0294, step = LM
Iteration:60    x = [-0.0587 -0.0584 1.3212 1.3449], 
        gradient = 0.0319, step = LM
Iteration:61    x = [-0.0584 -0.0587 1.3209 1.3447], 
        gradient = 0.0330, step = LM
Iteration:62    x = [-0.0587 -0.0584 1.3207 1.3445], 
        gradient = 0.0357, step = LM
Iteration:63    x = [-0.0584 -0.0587 1.3205 1.3442], 
        gradient = 0.0362, step = LM
Iteration:64    x = [-0.0586 -0.0584 1.3203 1.3440], 
        gradient = 0.0363, step = LM
Iteration:65    x = [-0.0584 -0.0586 1.3201 1.3439], 
        gradient = 0.0321, step = LM
Iteration:66    x = [-0.0586 -0.0584 1.3199 1.3437], 
        gradient = 0.0281, step = LM
Iteration:67    x = [-0.0584 -0.0586 1.3198 1.3435], 
        gradient = 0.0228, step = LM
Iteration:68    x = [-0.0585 -0.0584 1.3196 1.3434], 
        gradient = 0.0203, step = LM
Iteration:69    x = [-0.0584 -0.0585 1.3195 1.3432], 
        gradient = 0.0195, step = LM
Iteration:70    x = [-0.0585 -0.0584 1.3193 1.3431], 
        gradient = 0.0188, step = LM
Iteration:71    x = [-0.0584 -0.0585 1.3192 1.3429], 
        gradient = 0.0181, step = LM
Iteration:72    x = [-0.0585 -0.0584 1.3190 1.3428], 
        gradient = 0.0174, step = LM
Iteration:73    x = [-0.0584 -0.0585 1.3189 1.3426], 
        gradient = 0.0166, step = LM
Iteration:74    x = [-0.0585 -0.0583 1.3187 1.3425], 
        gradient = 0.0175, step = LM
Iteration:75    x = [-0.0583 -0.0585 1.3186 1.3423], 
        gradient = 0.0212, step = LM
Iteration:76    x = [-0.0585 -0.0583 1.3185 1.3422], 
        gradient = 0.0264, step = LM
Iteration:77    x = [-0.0583 -0.0585 1.3183 1.3421], 
        gradient = 0.0264, step = LM
Iteration:78    x = [-0.0585 -0.0583 1.3182 1.3420], 
        gradient = 0.0201, step = LM
Iteration:79    x = [-0.0583 -0.0584 1.3182 1.3419], 
        gradient = 0.0130, step = LM
Iteration:80    x = [-0.0584 -0.0583 1.3181 1.3418], 
        gradient = 0.0124, step = LM
Iteration:81    x = [-0.0584 -0.0584 1.3180 1.3417], 
        gradient = 0.0120, step = LM
Iteration:82    x = [-0.0584 -0.0583 1.3179 1.3416], 
        gradient = 0.0115, step = LM
Iteration:83    x = [-0.0583 -0.0584 1.3178 1.3415], 
        gradient = 0.0110, step = LM
Iteration:84    x = [-0.0584 -0.0583 1.3176 1.3414], 
        gradient = 0.0186, step = LM
Iteration:85    x = [-0.0584 -0.0583 1.3176 1.3414], 
        gradient = 0.0186, step = LM
Iteration:86    x = [-0.0583 -0.0584 1.3176 1.3413], 
        gradient = 0.0112, step = LM
Iteration:87    x = [-0.0584 -0.0583 1.3175 1.3412], 
        gradient = 0.0095, step = LM
Iteration:88    x = [-0.0583 -0.0584 1.3174 1.3412], 
        gradient = 0.0092, step = LM
Iteration:89    x = [-0.0583 -0.0584 1.3174 1.3412], 
        gradient = 0.0090, step = QN
Iteration:90    x = [-0.0583 -0.0584 1.3173 1.3412], 
        gradient = 0.0260, step = QN
Iteration:91    x = [-0.0584 -0.0583 1.3173 1.3411], 
        gradient = 0.0085, step = LM
Iteration:92    x = [-0.0583 -0.0584 1.3172 1.3411], 
        gradient = 0.0083, step = LM
Iteration:93    x = [-0.0583 -0.0583 1.3171 1.3410], 
        gradient = 0.0080, step = LM
Iteration:94    x = [-0.0583 -0.0583 1.3171 1.3409], 
        gradient = 0.0077, step = LM
Iteration:95    x = [-0.0584 -0.0583 1.3170 1.3409], 
        gradient = 0.0075, step = LM
Iteration:96    x = [-0.0583 -0.0584 1.3169 1.3408], 
        gradient = 0.0110, step = LM
Iteration:97    x = [-0.0584 -0.0582 1.3169 1.3407], 
        gradient = 0.0165, step = LM
Iteration:98    x = [-0.0583 -0.0584 1.3168 1.3407], 
        gradient = 0.0109, step = LM
Iteration:99    x = [-0.0583 -0.0583 1.3168 1.3406], 
        gradient = 0.0062, step = LM
Iteration:100   x = [-0.0583 -0.0583 1.3167 1.3406], 
        gradient = 0.0059, step = LM
Iteration:101   x = [-0.0583 -0.0583 1.3167 1.3406], 
        gradient = 0.0057, step = LM
Iteration:102   x = [-0.0583 -0.0583 1.3167 1.3405], 
        gradient = 0.0055, step = LM
Iteration:103   x = [-0.0583 -0.0583 1.3166 1.3404], 
        gradient = 0.0075, step = LM
Iteration:104   x = [-0.0583 -0.0583 1.3166 1.3404], 
        gradient = 0.0075, step = LM
Iteration:105   x = [-0.0583 -0.0583 1.3165 1.3404], 
        gradient = 0.0090, step = LM
Iteration:106   x = [-0.0583 -0.0583 1.3165 1.3403], 
        gradient = 0.0099, step = LM
Iteration:107   x = [-0.0583 -0.0583 1.3164 1.3403], 
        gradient = 0.0064, step = LM
Iteration:108   x = [-0.0583 -0.0583 1.3164 1.3403], 
        gradient = 0.0064, step = QN
Iteration:109   x = [-0.0583 -0.0583 1.3164 1.3403], 
        gradient = 0.0043, step = LM
Iteration:110   x = [-0.0583 -0.0583 1.3164 1.3402], 
        gradient = 0.0041, step = LM
Iteration:111   x = [-0.0583 -0.0583 1.3164 1.3402], 
        gradient = 0.0040, step = LM
Iteration:112   x = [-0.0583 -0.0583 1.3163 1.3402], 
        gradient = 0.0038, step = LM
Iteration:113   x = [-0.0583 -0.0583 1.3163 1.3401], 
        gradient = 0.0050, step = LM
Iteration:114   x = [-0.0583 -0.0583 1.3163 1.3401], 
        gradient = 0.0050, step = LM
Iteration:115   x = [-0.0583 -0.0583 1.3162 1.3401], 
        gradient = 0.0045, step = LM
Iteration:116   x = [-0.0583 -0.0583 1.3162 1.3401], 
        gradient = 0.0044, step = LM
Iteration:117   x = [-0.0583 -0.0583 1.3162 1.3400], 
        gradient = 0.0040, step = LM
Iteration:118   x = [-0.0583 -0.0583 1.3162 1.3400], 
        gradient = 0.0040, step = QN
Iteration:119   x = [-0.0583 -0.0583 1.3162 1.3400], 
        gradient = 0.0037, step = LM
Iteration:120   x = [-0.0583 -0.0583 1.3161 1.3400], 
        gradient = 0.0032, step = LM
Iteration:121   x = [-0.0583 -0.0583 1.3161 1.3400], 
        gradient = 0.0030, step = LM
Iteration:122   x = [-0.0583 -0.0583 1.3161 1.3400], 
        gradient = 0.0027, step = LM
Iteration:123   x = [-0.0583 -0.0583 1.3161 1.3399], 
        gradient = 0.0026, step = LM
Iteration:124   x = [-0.0583 -0.0583 1.3161 1.3399], 
        gradient = 0.0025, step = LM
Iteration:125   x = [-0.0583 -0.0583 1.3160 1.3399], 
        gradient = 0.0024, step = LM
Iteration:126   x = [-0.0583 -0.0583 1.3160 1.3399], 
        gradient = 0.0023, step = LM
Iteration:127   x = [-0.0583 -0.0583 1.3160 1.3399], 
        gradient = 0.0022, step = LM
Iteration:128   x = [-0.0583 -0.0583 1.3160 1.3398], 
        gradient = 0.0021, step = LM
Iteration:129   x = [-0.0583 -0.0583 1.3160 1.3398], 
        gradient = 0.0020, step = LM
Iteration:130   x = [-0.0583 -0.0583 1.3160 1.3398], 
        gradient = 0.0020, step = LM
Iteration:131   x = [-0.0583 -0.0583 1.3159 1.3398], 
        gradient = 0.0022, step = LM
Iteration:132   x = [-0.0583 -0.0583 1.3159 1.3398], 
        gradient = 0.0024, step = LM
Iteration:133   x = [-0.0583 -0.0583 1.3159 1.3398], 
        gradient = 0.0026, step = LM
Iteration:134   x = [-0.0583 -0.0583 1.3159 1.3397], 
        gradient = 0.0025, step = LM
Iteration:135   x = [-0.0583 -0.0583 1.3159 1.3397], 
        gradient = 0.0023, step = LM
Iteration:136   x = [-0.0583 -0.0583 1.3159 1.3397], 
        gradient = 0.0018, step = LM
Iteration:137   x = [-0.0583 -0.0583 1.3159 1.3397], 
        gradient = 0.0015, step = LM
Iteration:138   x = [-0.0583 -0.0583 1.3158 1.3397], 
        gradient = 0.0014, step = LM
Iteration:139   x = [-0.0583 -0.0583 1.3158 1.3397], 
        gradient = 0.0014, step = LM
Final estimated parameters: [-0.0583 -0.0583 1.3158 1.3397]

The algorithm runs for 139 iterations and produces the parameter estimate x* = [-0.06, -0.06, 1.32, 1.34] with ǁgβ‚–ǁ β‰ˆ 0.0014 > Ρ₁. In other words, the algorithm has not successfully converged; the iteration was terminated because the stopping criterion ǁhβ‚–ǁ ≀ Ξ΅β‚‚ (ǁxβ‚–ǁ + Ξ΅β‚‚) was met. It can also be seen that during the curve fitting, the algorithm repeatedly switched from LM to QN and back, for the reasons described in Scenario 2. From the animation above, it is obvious that the solution x* found here is not good enough.

By looking at the values of x during the iterations, we observe that x₁ β‰ˆ xβ‚‚ and x₃ β‰ˆ xβ‚„. Hence, we can conclude that the initial value xβ‚€ = [0, 0, 0, 0] is a bad choice: it is symmetric in the parameter pairs, and the algorithm is not able to break that symmetry. You can try and see the same thing happen with other symmetric initial values such as xβ‚€ = [-1, -1, -1, -1], xβ‚€ = [1, 1, 1, 1], or xβ‚€ = [2, 2, 2, 2]. Or maybe this is just a coincidence. What do you think?

What else do you find interesting from the experiment?

Conclusion

We've developed an efficient algorithm for solving a non-linear least squares problem. The algorithm combines the Levenberg-Marquardt and Quasi-Newton methods, utilizing a trust region for the step choice, and can accurately find the best parameters of the model at a superlinear convergence rate. However, poor parameter initialization may lead to bad results. In our case study, the algorithm finds the following two models that fit the data:

M(x, t) = βˆ’13.62 exp(βˆ’0.46 t) + 13.62 exp(βˆ’0.21 t)
M(x, t) = 13.62 exp(βˆ’0.21 t) βˆ’ 13.62 exp(βˆ’0.46 t)

Of course, these two models are equivalent; one is obtained from the other by rearranging the components of the parameter x.


πŸ”₯ Hi there! If you enjoy this story and want to support me as a writer, consider becoming a member. For only $5 a month, you'll get unlimited access to all stories on Medium. If you sign up using my link, I'll earn a small commission.

πŸ”– Want to know more about how classical Machine Learning models work and how they optimize their parameters? Or an example of MLOps megaprojects? What about cherry-picked top-notch articles of all time? Continue reading:

Machine Learning from Scratch

Advanced Optimization Methods

MLOps Megaproject

My Best Stories

Data Science in R


[1] K. Madsen (1988). A Combined Gauss-Newton and Quasi-Newton Method for Non-Linear Least Squares. Report NI-88-10, Institute for Numerical Analysis (now part of IMM), DTU.

