Hands-on Tutorials

Table of Contents (read till the end to see how you can get the complete Python code of this story)
· Seeing the Big Picture
· Defining The Problem
· A Hybrid Method
∘ Objective Function
∘ General Algorithm
∘ Levenberg-Marquardt Step
∘ Quasi-Newton Step
· Implementation
· Conclusion
Seeing the Big Picture
In previous articles, we’ve seen the Gradient Descent and Conjugate Gradient algorithms in action, two of the simplest optimization methods there are. We implemented a line search to determine how far to move along the chosen descent direction of the objective function.
There is another way to generate steps called trust-region. Trust-region methods define a region around the current iterate within which they trust the model to be an adequate representation of the objective function, and then choose the step to be the approximate minimizer of the model in this region. In effect, unlike line search, they choose the direction and length of the step simultaneously.
There are also plenty of other optimization methods, such as Newton’s method and Quasi-Newton methods. In this article, we will use a hybrid of the Levenberg-Marquardt and Quasi-Newton methods, utilizing a trust region for the step choice. Furthermore, this article brings a different problem: instead of finding the minimizer of a given objective function, we will be given a dataset and a non-linear model of the data, and then find the best parameters of the model. The optimal parameters are those that minimize the sum of squared residuals. Some people might call this a curve-fitting problem.
Let’s import some libraries.
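The import cell isn’t reproduced in this extract, so here is a minimal set of imports that the sketches in the rest of this story rely on:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
```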
Defining The Problem
Create a fictional dataset as follows. It has 11 observations and 2 features, with the first feature acting as an index of the second.
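The construction code isn’t shown in this extract; a minimal sketch consistent with the printed output below could be:

```python
# Each row is one observation (tᵢ, yᵢ); t indexes y.
data = np.array([
    [ 0.0, 0.00], [ 2.0, 3.55], [ 4.0, 3.82], [ 6.0, 2.98],
    [ 8.0, 2.32], [10.0, 1.48], [12.0, 1.02], [14.0, 0.81],
    [16.0, 0.41], [18.0, 0.42], [20.0, 0.15],
])
t, y = data[:, 0], data[:, 1]
print(data)
print("Data shape:", data.shape)
```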
t y
------------------
[[ 0.0000 0.0000]
[ 2.0000 3.5500]
[ 4.0000 3.8200]
[ 6.0000 2.9800]
[ 8.0000 2.3200]
[10.0000 1.4800]
[12.0000 1.0200]
[14.0000 0.8100]
[16.0000 0.4100]
[18.0000 0.4200]
[20.0000 0.1500]]
Data shape: (11, 2)
Visualize the data as points (t, y) = (tᵢ, yᵢ), with i = 1, 2, …, 11, in the Cartesian plane.
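A scatter plot along these lines does the job (a sketch, not the author’s exact plotting code):

```python
plt.scatter(t, y)
plt.xlabel("t")
plt.ylabel("y")
plt.title("Observed data")
plt.show()
```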

We immediately realize that the data has no linear trend (the point (0, 0) here is not an outlier), so the model we use must also be non-linear. We will use a linear combination of two exponential functions, defined as

M(x, t) = x₃ exp(x₁t) + x₄ exp(x₂t)
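In code, the model can be written as follows (a minimal sketch; the function name M is ours):

```python
def M(x, t):
    # M(x, t) = x3·exp(x1·t) + x4·exp(x2·t); t may be a scalar or an array
    return x[2] * np.exp(x[0] * t) + x[3] * np.exp(x[1] * t)
```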
As mentioned before, the parameters x = [x₁, x₂, x₃, x₄] will be estimated using a hybrid method of Levenberg-Marquardt (LM) and Quasi-Newton (QN).
A Hybrid Method
Objective Function

In general, we want to build a model M(x, t) for a collection of m data points (tᵢ, yᵢ). To do that, a concrete definition of the objective function is necessary. For a least-squares problem, this objective function is the sum of squares of the residuals of the data points. A data point’s residual measures how far the actual data is from the model prediction, that is, the model prediction of that point subtracted from the observed value. Mathematically speaking, the residual for the i-th data point is

rᵢ(x) = yᵢ − M(x, tᵢ)

or in the vector form

r(x) = [r₁(x), r₂(x), …, rₘ(x)]ᵀ

Since the objective function is the sum of squares of the residual of each data point, it can be written as

F(x) = ½ Σᵢ rᵢ(x)² = ½ r(x)ᵀ r(x)
The coefficient ½ above is attached for convenience when differentiating F.
Let ‖·‖ be a symbol for the maximum norm of a vector. Our mission is to find the x which minimizes F. To do that, we will find x at which ‖g(x)‖ → 0, where g = F′ is the first derivative of F. Differentiating

F(x) = ½ Σᵢ rᵢ(x)²

w.r.t. each xⱼ using the chain rule, we obtain

gⱼ(x) = ∂F/∂xⱼ = Σᵢ rᵢ(x) ∂rᵢ/∂xⱼ(x)
Define the Jacobian matrix J(x) with entries

(J(x))ᵢⱼ = ∂rᵢ/∂xⱼ(x)

The value of g can then be calculated as

g(x) = J(x)ᵀ r(x)
For the Python function below, besides returning g, it also returns J, which will come in handy later.
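The function itself isn’t reproduced in this extract, so here is a sketch of what it could look like; the names r and gradient are ours, and the partial derivatives follow directly from the model defined above:

```python
def r(x, t, y):
    # Residual vector r(x) = y - M(x, t)
    return y - M(x, t)

def gradient(x, t, y):
    # Jacobian of the residual vector: one row per data point,
    # one column per parameter x1..x4
    e1, e2 = np.exp(x[0] * t), np.exp(x[1] * t)
    J = np.column_stack([-x[2] * t * e1, -x[3] * t * e2, -e1, -e2])
    g = J.T @ r(x, t, y)   # g = J^T r, the gradient of F
    return g, J
```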
General Algorithm

Let x* be the minimizer of the sum of squared residuals. The least-squares problem can be solved by general optimization methods, but we shall present special methods that are more efficient. In 1988, Madsen [1] presented a hybrid method that combines the LM method (quadratic convergence if F(x*) = 0, linear convergence otherwise) with a QN method, which gives superlinear convergence even if F(x*) ≠ 0.
In the trust-region method, the algorithm generates a model L of the behavior of F in the neighborhood of the current iterate x,

L(h) = F(x) + hᵀ g(x) + ½ hᵀ B h ≈ F(x + h)

where h is the step to be taken. We assume that we know a positive number Δ such that the model is sufficiently accurate inside a ball with radius Δ, centered at x, and the algorithm determines the step as

h = argmin{ L(h) : ‖h‖ ≤ Δ }

In a damped version, the step is determined as

h = argminₕ { L(h) + ½ μ hᵀh }

where the damping parameter μ ≥ 0. The term

½ μ hᵀh

is introduced to penalize large steps.
From now on, we will begin iterating. Hence, we will introduce an index k, which means the variable in question belongs to the k-th iteration. For example, xₖ is the value of x at the k-th iteration, and so Fₖ = F(xₖ), gₖ = g(xₖ), Jₖ = J(xₖ), and so on.
Before the iteration begins, it’s important to initialize the damping parameter μ = μ₀ and the Hessian approximation B₀ (the Hessian at iterate x is defined by F″(x), and its approximation B is used by the QN method).
A reasonable choice for B₀ is a symmetric, positive definite matrix I, that is, the identity matrix. The choice of μ₀ should be related to the size of the elements in

A₀ = J(x₀)ᵀ J(x₀)

e.g. by letting

μ₀ = τ · maxᵢ aᵢᵢ⁽⁰⁾

where aᵢᵢ⁽⁰⁾ is the i-th diagonal component of A₀, and τ is chosen by the user. In this article, we will set τ = 1 × 10⁻³.
For the Python function below, besides returning μ₀, it also returns A₀, which will come in handy later.
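Again a sketch (the name mu_init is ours), following the initialization formula above:

```python
def mu_init(x0, t, y, tau=1e-3):
    # mu0 = tau * max_i a_ii of A0, where A0 = J0^T J0
    _, J0 = gradient(x0, t, y)
    A0 = J0.T @ J0
    mu0 = tau * np.max(np.diag(A0))
    return mu0, A0
```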
After initializing μ₀ and B₀, the iteration starts with a series of steps with the LM method. If the performance indicates that F(x*) is significantly nonzero, then we switch to the QN method for better performance. We may get an indication that it is better to switch back to the LM method, so there is also a mechanism for that, which will be covered later. The iteration keeps going until one of the following stopping criteria is satisfied:
- The norm of the gradient of the objective function is close enough to zero, that is,

‖gₖ‖ ≤ ε₁

- The norm of the step satisfies

‖hₖ‖ ≤ ε₂ (‖xₖ‖ + ε₂)

- The number of steps taken is 1000
In this article, ε₁ and ε₂ will be set to 1 × 10⁻⁵.
After each step, whether it’s through the LM or QN method, Bₖ is updated by first defining

y = Jₖ₊₁ᵀ Jₖ₊₁ hₖ + (Jₖ₊₁ − Jₖ)ᵀ r(xₖ₊₁)

If the curvature condition

hₖᵀ y > 0

holds, then update Bₖ by the BFGS-type formula

Bₖ₊₁ = Bₖ + (1 / hₖᵀy) y yᵀ − (1 / hₖᵀvₖ) vₖ vₖᵀ

where vₖ = Bₖhₖ. Otherwise, Bₖ stays the same, that is, Bₖ₊₁ = Bₖ.
The new xₖ₊₁ = xₖ + hₖ found by the LM or QN method is then passed to the next iteration. This whole process could be written as the code below.
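The original driver isn’t reproduced in this extract; below is a sketch of how it could look, built on the helper names introduced earlier (r, gradient, mu_init, and the objective F, all ours) and on the LMstep() and QNstep() functions defined in the next sections. As an assumption of this sketch, it also records the iterate history, which the animation later will use.

```python
def F(x, t, y):
    # Objective: half the sum of squared residuals
    return 0.5 * np.sum(r(x, t, y) ** 2)

def hybrid(x0, t, y, eps1=1e-5, eps2=1e-5, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    mu, _ = mu_init(x, t, y)
    B = np.eye(len(x))                 # B0 = I
    nu, delta, count, method = 2.0, 1.0, 0, "LM"
    history = [x.copy()]               # iterates, for the animation later
    for k in range(1, max_iter + 1):
        g, J = gradient(x, t, y)
        if np.linalg.norm(g, np.inf) <= eps1:
            break                      # gradient close enough to zero
        used = method
        if method == "LM":
            x_new, h, mu, nu, delta, count, method = LMstep(
                x, t, y, mu, nu, delta, count)
        else:
            x_new, h, delta, method = QNstep(x, t, y, B, delta, g)
        # BFGS-type update of B, kept up to date for the QN steps
        g_new, J_new = gradient(x_new, t, y)
        yk = J_new.T @ J_new @ h + (J_new - J).T @ r(x_new, t, y)
        if h @ yk > 0:                 # keep B positive definite
            v = B @ h
            B = B + np.outer(yk, yk) / (h @ yk) - np.outer(v, v) / (h @ v)
        x = x_new
        history.append(x.copy())
        print(f"Iteration:{k} x = {x}, "
              f"gradient = {np.linalg.norm(g_new, np.inf):.4f}, step = {used}")
        if np.linalg.norm(h) <= eps2 * (np.linalg.norm(x) + eps2):
            break                      # step too small to continue
    return x, history
```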
Now, let’s go into the details of each step. What’s going on in the LMstep() and QNstep() functions above?
Levenberg-Marquardt Step
At the beginning of the hybrid iterations, the LM method is used first. At the k-th iteration, the step hₖ is found by solving

(Jₖᵀ Jₖ + μₖ I) hₖ = −gₖ

and then we calculate the gain ratio

ϱ = (Fₖ − Fₖ₊₁) / (L(0) − L(hₖ))

where Fₖ₊₁ is just F(xₖ + hₖ).
We then consider two cases:
- If ϱ > 0, this means the iteration is a success and we can update xₖ₊₁ = xₖ + hₖ. At this point, we need to check whether F(x*) is far from zero, that is, by checking: if ‖gₖ‖ < 0.02 · Fₖ for three consecutive successful iterations, then we will change to the QN method for the next iteration. Also, update

μₖ₊₁ = μₖ · max{1/3, 1 − (2ϱ − 1)³} and ν := 2

- If ϱ ≤ 0, then the iteration does not succeed and we won’t update xₖ, that is, xₖ₊₁ = xₖ. For this case, update μₖ₊₁ = μₖν and ν := 2ν.
The update method of μ mentioned above was developed by Nielsen. There is also another variation by Marquardt, noted as a comment in the sketch below.
The overall LM step can be seen as follows. Note that in the code below, we are passing Δₖ, which is the trust-region radius that will be used by the QN step in case we change the step calculation method to QN in future iterations. We don’t use Δₖ here.
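The original LMstep() listing didn’t survive extraction, so here is a sketch consistent with the formulas above; the exact signature and the nu/count bookkeeping are assumptions of this sketch:

```python
def LMstep(x, t, y, mu, nu, delta, count):
    g, J = gradient(x, t, y)
    A = J.T @ J
    # Solve the damped normal equations (A + mu*I) h = -g
    h = np.linalg.solve(A + mu * np.eye(len(x)), -g)
    delta = np.linalg.norm(h)          # trust radius handed over to QN steps
    F_old, F_new = F(x, t, y), F(x + h, t, y)
    # Gain ratio: actual decrease over the decrease predicted by the model,
    # which for the damped step simplifies to L(0) - L(h) = 0.5*h^T(mu*h - g)
    rho = (F_old - F_new) / (0.5 * h @ (mu * h - g))
    method = "LM"
    if rho > 0:                        # success: accept the step
        x = x + h
        # Nielsen's update; Marquardt's variant would instead use
        # mu /= 3 if rho > 0.75, and mu *= 2 if rho < 0.25
        mu *= max(1.0 / 3.0, 1.0 - (2.0 * rho - 1.0) ** 3)
        nu = 2.0
        g_new, _ = gradient(x, t, y)
        # Count consecutive successes where the gradient looks "too small"
        count = count + 1 if np.linalg.norm(g_new, np.inf) < 0.02 * F_new else 0
        if count == 3:                 # F(x*) looks significantly nonzero
            count, method = 0, "QN"
    else:                              # failure: keep x, increase the damping
        mu *= nu
        nu *= 2.0
        count = 0
    return x, h, mu, nu, delta, count, method
```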
Quasi-Newton Step
Since the QN method uses a trust region for the step calculation, at the beginning, when we move to the QN method, it’s important to initialize the trust-region radius

Δ := ‖h_LM‖

where h_LM is the step acquired by the previous LM step. But don’t worry, this initialization part is not needed, since we always calculate Δₖ in both the LM and QN steps. As you can see, the formula above is already embedded in the LMstep() function (the line that sets delta).
At the k-th iteration, the step hₖ is found by solving

Bₖ hₖ = −gₖ

If ‖hₖ‖ > Δₖ, then reduce the length of hₖ to Δₖ, that is, by setting

hₖ := (Δₖ / ‖hₖ‖) hₖ
Δₖ is then updated under several conditions of ϱ, which is calculated using the same formula as the one we use in the LM step:
- If ϱ < 0.25, then set Δₖ₊₁ = Δₖ / 2
- If ϱ > 0.75, then set Δₖ₊₁ = max{Δₖ, 3‖hₖ‖}
- Otherwise, Δₖ₊₁ = Δₖ
Next, we consider two cases:
- If Fₖ₊₁ ≤ (1 + δ)Fₖ and ‖gₖ₊₁‖ < ‖gₖ‖, or Fₖ₊₁ < Fₖ, then the iteration succeeds and we can update xₖ₊₁ = xₖ + hₖ. At this point, we focus on getting g closer to zero, so we accept a slight increase in the value of F, e.g. as little as δ = √ε times the previous F, where ε is the machine epsilon.
- Otherwise, the iteration is considered a failure, and xₖ won’t be updated, that is, xₖ₊₁ = xₖ.
If the gradient does not decrease fast enough, that is, ‖gₖ₊₁‖ ≥ ‖gₖ‖, then we move to the LM step for the next iteration.
The overall QN step can be seen as follows. Note that in the code below, we calculate ϱ as we do in the LM step.
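As with LMstep(), the original listing is missing here, so this is a sketch under the same signature assumptions:

```python
def QNstep(x, t, y, B, delta, g):
    # Quasi-Newton direction: solve B h = -g
    h = np.linalg.solve(B, -g)
    if np.linalg.norm(h) > delta:      # clip the step to the trust region
        h *= delta / np.linalg.norm(h)
    F_old, F_new = F(x, t, y), F(x + h, t, y)
    # Gain ratio with the model L(h) = F + h^T g + 0.5 h^T B h
    rho = (F_old - F_new) / -(h @ g + 0.5 * h @ B @ h)
    if rho < 0.25:
        delta /= 2.0
    elif rho > 0.75:
        delta = max(delta, 3.0 * np.linalg.norm(h))
    g_new, _ = gradient(x + h, t, y)
    d = np.sqrt(np.finfo(float).eps)   # the small tolerance delta = sqrt(eps)
    if ((F_new <= (1.0 + d) * F_old
         and np.linalg.norm(g_new, np.inf) < np.linalg.norm(g, np.inf))
            or F_new < F_old):
        x = x + h                      # success: accept the step
    method = "QN"
    if np.linalg.norm(g_new, np.inf) >= np.linalg.norm(g, np.inf):
        method = "LM"                  # gradient not decreasing fast enough
    return x, h, delta, method
```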
Implementation
We are now ready to apply our knowledge to the problem defined at the beginning of this article. We will vary the initial point in three different scenarios:
- x₀ = [-1, 1, -10, 10]
- x₀ = [-4, 1, 2, -3]
- x₀ = [0, 0, 0, 0]
But first, let’s introduce a Python function MakeAnimation() to animate the learning process of the algorithm.
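The original listing isn’t reproduced here; the following is a minimal sketch of what MakeAnimation() could look like, assuming the solver records its iterates in a list history, as the hybrid() sketch above does:

```python
def MakeAnimation(history, t, y):
    # Redraw the fitted curve for each recorded iterate.
    fig, ax = plt.subplots()
    ax.scatter(t, y, color="k", label="data")
    ts = np.linspace(t.min(), t.max(), 200)
    line, = ax.plot(ts, M(history[0], ts), label="model")
    ax.legend()

    def update(k):
        line.set_ydata(M(history[k], ts))
        ax.set_title(f"Iteration {k}")
        return (line,)

    return FuncAnimation(fig, update, frames=len(history), interval=100)
```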
Scenario 1
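A run for this scenario could then look like the sketch below (Scenarios 2 and 3 only change x0); the log that follows is from the original run:

```python
x0 = np.array([-1.0, 1.0, -10.0, 10.0])
x_star, history = hybrid(x0, t, y)
print("Final estimated parameters:", x_star)
anim = MakeAnimation(history, t, y)   # visual check of the fit
```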
Iteration:1 x = [-1.0000 0.9500 -10.0000 9.9997],
gradient = 64932772695986634752.0000, step = LM
Iteration:2 x = [-1.0000 0.9001 -10.0000 9.9994],
gradient = 8871075343566318592.0000, step = LM
Iteration:3 x = [-1.0000 0.8510 -10.0000 9.9992],
gradient = 1250145055141384192.0000, step = LM
Iteration:4 x = [-1.0000 0.8050 -10.0000 9.9990],
gradient = 199964648437467488.0000, step = LM
Iteration:5 x = [-1.0000 0.7686 -10.0000 9.9988],
gradient = 46950703393293552.0000, step = LM
Iteration:6 x = [-1.0000 0.7454 -10.0000 9.9987],
gradient = 18613190948807260.0000, step = LM
Iteration:7 x = [-1.0000 0.7283 -10.0000 9.9986],
gradient = 9421138350386658.0000, step = LM
Iteration:8 x = [-1.0000 0.7127 -10.0000 9.9985],
gradient = 5067642599353893.0000, step = LM
Iteration:9 x = [-1.0000 0.6974 -10.0000 9.9984],
gradient = 2761173975265880.5000, step = LM
Iteration:10 x = [-1.0000 0.6822 -10.0000 9.9983],
gradient = 1508437660486741.0000, step = LM
Iteration:11 x = [-1.0000 0.6670 -10.0000 9.9983],
gradient = 824513384402736.2500, step = LM
Iteration:12 x = [-1.0000 0.6518 -10.0000 9.9982],
gradient = 450734053994430.5000, step = LM
Iteration:13 x = [-1.0000 0.6366 -10.0000 9.9981],
gradient = 246409971021028.8438, step = LM
Iteration:14 x = [-1.0000 0.6214 -10.0000 9.9980],
gradient = 134711382454059.4531, step = LM
Iteration:15 x = [-1.0000 0.6062 -10.0000 9.9980],
gradient = 73647395414745.6094, step = LM
Iteration:16 x = [-1.0000 0.5909 -10.0000 9.9979],
gradient = 40264084923738.5469, step = LM
Iteration:17 x = [-1.0000 0.5757 -10.0000 9.9978],
gradient = 22013352246245.9766, step = LM
Iteration:18 x = [-1.0000 0.5605 -10.0000 9.9977],
gradient = 12035471978657.0332, step = LM
Iteration:19 x = [-1.0000 0.5452 -10.0000 9.9976],
gradient = 6580356674017.7891, step = LM
Iteration:20 x = [-1.0000 0.5299 -10.0000 9.9976],
gradient = 3597874150520.1475, step = LM
Iteration:21 x = [-1.0000 0.5146 -10.0000 9.9975],
gradient = 1967223429070.2634, step = LM
Iteration:22 x = [-1.0000 0.4993 -10.0000 9.9974],
gradient = 1075656653534.5968, step = LM
Iteration:23 x = [-1.0000 0.4840 -10.0000 9.9973],
gradient = 588175745878.2034, step = LM
Iteration:24 x = [-1.0000 0.4687 -10.0000 9.9973],
gradient = 321629128815.2425, step = LM
Iteration:25 x = [-1.0000 0.4534 -10.0000 9.9972],
gradient = 175881428072.9430, step = LM
Iteration:26 x = [-1.0000 0.4380 -10.0000 9.9971],
gradient = 96183963098.1861, step = LM
Iteration:27 x = [-1.0000 0.4226 -10.0000 9.9970],
gradient = 52602379508.8333, step = LM
Iteration:28 x = [-1.0000 0.4072 -10.0000 9.9969],
gradient = 28769372395.6593, step = LM
Iteration:29 x = [-1.0000 0.3918 -10.0000 9.9969],
gradient = 15735488020.7650, step = LM
Iteration:30 x = [-1.0000 0.3763 -10.0000 9.9968],
gradient = 8607119032.2862, step = LM
Iteration:31 x = [-1.0000 0.3609 -10.0000 9.9967],
gradient = 4708326165.4597, step = LM
Iteration:32 x = [-1.0000 0.3454 -10.0000 9.9966],
gradient = 2575789155.9012, step = LM
Iteration:33 x = [-1.0000 0.3298 -10.0000 9.9965],
gradient = 1409268155.2911, step = LM
Iteration:34 x = [-1.0000 0.3142 -10.0000 9.9965],
gradient = 771119669.0625, step = LM
Iteration:35 x = [-1.0000 0.2986 -10.0000 9.9964],
gradient = 421988746.9021, step = LM
Iteration:36 x = [-1.0000 0.2829 -10.0000 9.9963],
gradient = 230960695.7163, step = LM
Iteration:37 x = [-1.0000 0.2672 -10.0000 9.9962],
gradient = 126427578.6262, step = LM
Iteration:38 x = [-1.0000 0.2514 -10.0000 9.9961],
gradient = 69218466.4778, step = LM
Iteration:39 x = [-1.0000 0.2356 -10.0000 9.9960],
gradient = 37904445.7661, step = LM
Iteration:40 x = [-1.0000 0.2197 -10.0000 9.9960],
gradient = 20761560.5190, step = LM
Iteration:41 x = [-1.0000 0.2036 -10.0000 9.9959],
gradient = 11374897.3379, step = LM
Iteration:42 x = [-1.0000 0.1875 -10.0000 9.9958],
gradient = 6234064.0376, step = LM
Iteration:43 x = [-1.0000 0.1713 -10.0000 9.9957],
gradient = 3417848.7040, step = LM
Iteration:44 x = [-1.0000 0.1550 -10.0000 9.9956],
gradient = 1874634.8523, step = LM
Iteration:45 x = [-1.0000 0.1385 -10.0000 9.9955],
gradient = 1028702.8782, step = LM
Iteration:46 x = [-1.0000 0.1218 -10.0000 9.9954],
gradient = 564808.7854, step = LM
Iteration:47 x = [-1.0000 0.1049 -10.0000 9.9953],
gradient = 310298.2705, step = LM
Iteration:48 x = [-1.0000 0.0877 -10.0000 9.9952],
gradient = 170587.3249, step = LM
Iteration:49 x = [-1.0000 0.0703 -10.0000 9.9951],
gradient = 93845.4326, step = LM
Iteration:50 x = [-1.0000 0.0524 -10.0000 9.9950],
gradient = 51660.5782, step = LM
Iteration:51 x = [-1.0000 0.0341 -10.0000 9.9949],
gradient = 28451.8639, step = LM
Iteration:52 x = [-1.0000 0.0153 -10.0000 9.9947],
gradient = 15670.8490, step = LM
Iteration:53 x = [-0.9999 -0.0043 -10.0000 9.9946],
gradient = 8624.8546, step = LM
Iteration:54 x = [-0.9999 -0.0246 -10.0000 9.9944],
gradient = 4736.1605, step = LM
Iteration:55 x = [-0.9998 -0.0461 -10.0000 9.9941],
gradient = 2587.7373, step = LM
Iteration:56 x = [-0.9996 -0.0687 -10.0000 9.9939],
gradient = 1399.9325, step = LM
Iteration:57 x = [-0.9993 -0.0928 -10.0000 9.9936],
gradient = 743.3551, step = LM
Iteration:58 x = [-0.9987 -0.1184 -10.0001 9.9931],
gradient = 381.2687, step = LM
Iteration:59 x = [-0.9976 -0.1456 -10.0001 9.9925],
gradient = 183.0379, step = LM
Iteration:60 x = [-0.9952 -0.1736 -10.0002 9.9917],
gradient = 76.6425, step = LM
Iteration:61 x = [-0.9894 -0.1972 -10.0005 9.9901],
gradient = 26.4382, step = LM
Iteration:62 x = [-0.9744 -0.2105 -10.0012 9.9871],
gradient = 6.8472, step = LM
Iteration:63 x = [-0.9336 -0.2136 -10.0030 9.9797],
gradient = 4.4702, step = LM
Iteration:64 x = [-0.8269 -0.2088 -10.0070 9.9631],
gradient = 4.2573, step = LM
Iteration:65 x = [-0.6379 -0.1953 -10.0103 9.9430],
gradient = 6.5882, step = LM
Iteration:66 x = [-0.5576 -0.1832 -10.0023 9.9499],
gradient = 1.3240, step = LM
Iteration:67 x = [-0.5559 -0.1827 -9.9933 9.9681],
gradient = 0.0227, step = LM
Iteration:68 x = [-0.5544 -0.1829 -10.0000 9.9888],
gradient = 0.0087, step = LM
Iteration:69 x = [-0.5527 -0.1832 -10.0389 10.0305],
gradient = 0.0117, step = LM
Iteration:70 x = [-0.5482 -0.1843 -10.1523 10.1442],
gradient = 0.0548, step = LM
Iteration:71 x = [-0.5372 -0.1868 -10.4402 10.4325],
gradient = 0.2795, step = LM
Iteration:72 x = [-0.5173 -0.1916 -11.0244 11.0175],
gradient = 1.0133, step = LM
Iteration:73 x = [-0.5025 -0.1958 -11.5877 11.5812],
gradient = 0.7806, step = LM
Iteration:74 x = [-0.4883 -0.2000 -12.1891 12.1835],
gradient = 0.8407, step = LM
Iteration:75 x = [-0.4787 -0.2030 -12.6707 12.6659],
gradient = 0.4784, step = LM
Iteration:76 x = [-0.4704 -0.2057 -13.1195 13.1156],
gradient = 0.4040, step = LM
Iteration:77 x = [-0.4656 -0.2073 -13.4055 13.4021],
gradient = 0.1502, step = LM
Iteration:78 x = [-0.4629 -0.2083 -13.5758 13.5728],
gradient = 0.0525, step = LM
Iteration:79 x = [-0.4622 -0.2085 -13.6183 13.6154],
gradient = 0.0030, step = LM
Iteration:80 x = [-0.4622 -0.2085 -13.6218 13.6188],
gradient = 0.0000, step = LM
Iteration:81 x = [-0.4622 -0.2085 -13.6218 13.6188],
gradient = 0.0000, step = LM
Final estimated parameters: [-0.4622 -0.2085 -13.6218 13.6188]
The algorithm runs for 81 iterations and produces the parameter estimate x* = [-0.46, -0.21, -13.62, 13.62] with ‖g(x*)‖ ≈ 0. In other words, the algorithm successfully converges with the stopping criterion ‖gₖ‖ ≤ ε₁ being met. It can also be seen that the algorithm always uses the LM method in curve-fitting the given data. This means that within every 3 successive iterations, there is at least one iterate xₖ that satisfies ‖gₖ‖ ≥ 0.02 · Fₖ.
Scenario 2
Iteration:1 x = [-4.0000 0.9500 2.0000 -2.9990],
gradient = 5843944048763837440.0000, step = LM
Iteration:2 x = [-4.0000 0.9001 2.0000 -2.9981],
gradient = 798393836723220864.0000, step = LM
Iteration:3 x = [-4.0000 0.8510 2.0000 -2.9973],
gradient = 112511464378970688.0000, step = LM
Iteration:4 x = [-4.0000 0.8050 2.0000 -2.9965],
gradient = 17995869734728704.0000, step = LM
Iteration:5 x = [-4.0000 0.7687 2.0000 -2.9959],
gradient = 4224958744295105.0000, step = LM
Iteration:6 x = [-4.0000 0.7454 2.0000 -2.9955],
gradient = 1674832246409442.5000, step = LM
Iteration:7 x = [-4.0000 0.7283 2.0000 -2.9952],
gradient = 847705835813273.6250, step = LM
Iteration:8 x = [-4.0000 0.7128 2.0000 -2.9950],
gradient = 455980000478269.1875, step = LM
Iteration:9 x = [-4.0000 0.6975 2.0000 -2.9947],
gradient = 248446618239291.2812, step = LM
Iteration:10 x = [-4.0000 0.6823 2.0000 -2.9945],
gradient = 135727074063636.5469, step = LM
Iteration:11 x = [-4.0000 0.6671 2.0000 -2.9942],
gradient = 74188516827416.2656, step = LM
Iteration:12 x = [-4.0000 0.6519 2.0000 -2.9940],
gradient = 40556383042130.0234, step = LM
Iteration:13 x = [-4.0000 0.6367 2.0000 -2.9937],
gradient = 22171597557325.9102, step = LM
Iteration:14 x = [-4.0000 0.6215 2.0000 -2.9934],
gradient = 12121123175141.8398, step = LM
Iteration:15 x = [-4.0000 0.6063 2.0000 -2.9932],
gradient = 6626677353862.0400, step = LM
Iteration:16 x = [-4.0000 0.5910 2.0000 -2.9929],
gradient = 3622898207455.6821, step = LM
Iteration:17 x = [-4.0000 0.5758 2.0000 -2.9927],
gradient = 1980725783949.0017, step = LM
Iteration:18 x = [-4.0000 0.5606 2.0000 -2.9924],
gradient = 1082932002365.9087, step = LM
Iteration:19 x = [-4.0000 0.5453 2.0000 -2.9922],
gradient = 592089574010.1783, step = LM
Iteration:20 x = [-4.0000 0.5300 2.0000 -2.9919],
gradient = 323730717048.8542, step = LM
Iteration:21 x = [-4.0000 0.5148 2.0000 -2.9916],
gradient = 177007495788.6531, step = LM
Iteration:22 x = [-4.0000 0.4995 2.0000 -2.9914],
gradient = 96785827387.7305, step = LM
Iteration:23 x = [-4.0000 0.4842 2.0000 -2.9911],
gradient = 52923125898.6776, step = LM
Iteration:24 x = [-4.0000 0.4688 2.0000 -2.9909],
gradient = 28939714442.5817, step = LM
Iteration:25 x = [-4.0000 0.4535 2.0000 -2.9906],
gradient = 15825580540.4678, step = LM
Iteration:26 x = [-4.0000 0.4381 2.0000 -2.9903],
gradient = 8654531799.0783, step = LM
Iteration:27 x = [-4.0000 0.4228 2.0000 -2.9901],
gradient = 4733127210.2534, step = LM
Iteration:28 x = [-4.0000 0.4074 2.0000 -2.9898],
gradient = 2588665603.7885, step = LM
Iteration:29 x = [-4.0000 0.3919 2.0000 -2.9895],
gradient = 1415891078.6356, step = LM
Iteration:30 x = [-4.0000 0.3765 2.0000 -2.9893],
gradient = 774485568.1282, step = LM
Iteration:31 x = [-4.0000 0.3610 2.0000 -2.9890],
gradient = 423672773.1613, step = LM
Iteration:32 x = [-4.0000 0.3455 2.0000 -2.9887],
gradient = 231785640.8201, step = LM
Iteration:33 x = [-4.0000 0.3300 2.0000 -2.9885],
gradient = 126819882.7775, step = LM
Iteration:34 x = [-4.0000 0.3144 2.0000 -2.9882],
gradient = 69396974.7767, step = LM
Iteration:35 x = [-4.0000 0.2988 2.0000 -2.9879],
gradient = 37980046.8186, step = LM
Iteration:36 x = [-4.0000 0.2831 2.0000 -2.9876],
gradient = 20789502.5936, step = LM
Iteration:37 x = [-4.0000 0.2674 2.0000 -2.9874],
gradient = 11382078.3240, step = LM
Iteration:38 x = [-4.0000 0.2516 2.0000 -2.9871],
gradient = 6233151.4407, step = LM
Iteration:39 x = [-4.0000 0.2358 2.0000 -2.9868],
gradient = 3414510.4438, step = LM
Iteration:40 x = [-4.0000 0.2198 2.0000 -2.9865],
gradient = 1871192.9516, step = LM
Iteration:41 x = [-4.0000 0.2038 2.0000 -2.9862],
gradient = 1025947.0256, step = LM
Iteration:42 x = [-4.0000 0.1877 2.0000 -2.9859],
gradient = 562874.5612, step = LM
Iteration:43 x = [-4.0000 0.1714 2.0000 -2.9856],
gradient = 309077.3735, step = LM
Iteration:44 x = [-4.0000 0.1550 2.0000 -2.9853],
gradient = 169908.6045, step = LM
Iteration:45 x = [-4.0000 0.1384 2.0000 -2.9850],
gradient = 93547.0603, step = LM
Iteration:46 x = [-4.0000 0.1215 2.0000 -2.9847],
gradient = 51612.8671, step = LM
Iteration:47 x = [-4.0000 0.1044 2.0000 -2.9843],
gradient = 28559.4009, step = LM
Iteration:48 x = [-4.0000 0.0869 2.0000 -2.9839],
gradient = 15867.2252, step = LM
Iteration:49 x = [-4.0000 0.0688 2.0000 -2.9835],
gradient = 8865.7762, step = LM
Iteration:50 x = [-4.0000 0.0500 2.0000 -2.9830],
gradient = 4993.1844, step = LM
Iteration:51 x = [-4.0000 0.0303 2.0000 -2.9825],
gradient = 2843.2836, step = LM
Iteration:52 x = [-4.0000 0.0092 2.0000 -2.9818],
gradient = 1643.5749, step = LM
Iteration:53 x = [-4.0000 -0.0138 2.0000 -2.9809],
gradient = 969.1664, step = LM
Iteration:54 x = [-4.0000 -0.0397 2.0001 -2.9798],
gradient = 585.9656, step = LM
Iteration:55 x = [-4.0000 -0.0695 2.0001 -2.9782],
gradient = 364.7187, step = LM
Iteration:56 x = [-4.0000 -0.1047 2.0003 -2.9760],
gradient = 233.9465, step = LM
Iteration:57 x = [-4.0000 -0.1465 2.0005 -2.9730],
gradient = 154.2224, step = LM
Iteration:58 x = [-4.0000 -0.1954 2.0008 -2.9690],
gradient = 103.9455, step = LM
Iteration:59 x = [-4.0000 -0.2516 2.0014 -2.9638],
gradient = 71.2219, step = LM
Iteration:60 x = [-4.0000 -0.3152 2.0023 -2.9571],
gradient = 49.3390, step = LM
Iteration:61 x = [-4.0000 -0.3865 2.0037 -2.9486],
gradient = 34.3994, step = LM
Iteration:62 x = [-4.0000 -0.4659 2.0060 -2.9378],
gradient = 24.0583, step = LM
Iteration:63 x = [-3.9999 -0.5540 2.0095 -2.9242],
gradient = 16.8404, step = LM
Iteration:64 x = [-3.9999 -0.6510 2.0148 -2.9069],
gradient = 11.7791, step = LM
Iteration:65 x = [-3.9998 -0.7574 2.0228 -2.8848],
gradient = 8.2216, step = LM
Iteration:66 x = [-3.9998 -0.8733 2.0348 -2.8567],
gradient = 5.7186, step = LM
Iteration:67 x = [-3.9996 -0.9990 2.0522 -2.8209],
gradient = 3.9571, step = LM
Iteration:68 x = [-3.9995 -1.1349 2.0768 -2.7754],
gradient = 2.7180, step = LM
Iteration:69 x = [-3.9992 -1.2813 2.1103 -2.7183],
gradient = 1.8476, step = LM
Iteration:70 x = [-3.9987 -1.4390 2.1533 -2.6485],
gradient = 1.2392, step = LM
Iteration:71 x = [-3.9980 -1.6079 2.2038 -2.5681],
gradient = 0.8190, step = LM
Iteration:72 x = [-3.9968 -1.7865 2.2547 -2.4842],
gradient = 0.5357, step = LM
Iteration:73 x = [-3.9949 -1.9699 2.2945 -2.4089],
gradient = 0.3513, step = LM
Iteration:74 x = [-3.9920 -2.1517 2.3148 -2.3524],
gradient = 0.2345, step = LM
Iteration:75 x = [-3.9878 -2.3283 2.3156 -2.3155],
gradient = 0.1603, step = LM
Iteration:76 x = [-3.9863 -2.3635 2.3124 -2.3081],
gradient = 0.1487, step = QN
Iteration:77 x = [-3.9803 -2.4682 2.2972 -2.2863],
gradient = 0.1189, step = QN
Iteration:78 x = [-3.9577 -2.7803 2.2395 -2.2211],
gradient = 0.0613, step = QN
Iteration:79 x = [-3.9319 -3.0541 2.1785 -2.1639],
gradient = 0.0344, step = QN
Iteration:80 x = [-3.9000 -3.3271 2.1135 -2.1068],
gradient = 0.0193, step = QN
Iteration:81 x = [-3.8784 -3.4807 2.0776 -2.0748],
gradient = 0.0140, step = QN
Iteration:82 x = [-3.8657 -3.5574 2.0604 -2.0588],
gradient = 0.0119, step = QN
Iteration:83 x = [-3.8510 -3.6338 2.0437 -2.0428],
gradient = 0.0101, step = QN
Iteration:84 x = [-3.8335 -3.7098 2.0276 -2.0269],
gradient = 0.0086, step = QN
Iteration:85 x = [-3.8227 -3.7473 2.0198 -2.0191],
gradient = 0.0080, step = QN
Iteration:86 x = [-3.8090 -3.7840 2.0123 -2.0114],
gradient = 0.0074, step = QN
Iteration:87 x = [-3.7911 -3.8190 2.0053 -2.0041],
gradient = 0.0073, step = QN
Iteration:88 x = [-3.7664 -3.8489 1.9996 -1.9979],
gradient = 0.0076, step = QN
Iteration:89 x = [-3.7581 -3.8559 1.9998 -1.9981],
gradient = 0.0077, step = LM
Iteration:90 x = [-3.7330 -3.8766 2.0004 -1.9986],
gradient = 0.0081, step = LM
Iteration:91 x = [-3.6535 -3.9361 2.0028 -2.0011],
gradient = 0.0096, step = LM
Iteration:92 x = [-3.3739 -4.0948 2.0180 -2.0161],
gradient = 0.0168, step = LM
Iteration:93 x = [-1.8970 -4.4431 2.1579 -2.1553],
gradient = 0.3573, step = LM
Iteration:94 x = [-1.8970 -4.4431 2.1579 -2.1553],
gradient = 0.3573, step = LM
Iteration:95 x = [-1.8970 -4.4431 2.1579 -2.1553],
gradient = 0.3573, step = LM
Iteration:96 x = [-1.8970 -4.4431 2.1579 -2.1553],
gradient = 0.3573, step = LM
Iteration:97 x = [-0.4821 -4.4514 2.3330 -2.2979],
gradient = 12.9667, step = LM
Iteration:98 x = [-0.4821 -4.4514 2.3330 -2.2979],
gradient = 12.9667, step = LM
Iteration:99 x = [-0.4821 -4.4514 2.3330 -2.2979],
gradient = 12.9667, step = LM
Iteration:100 x = [-0.4821 -4.4514 2.3330 -2.2979],
gradient = 12.9667, step = LM
Iteration:101 x = [-0.4821 -4.4514 2.3330 -2.2979],
gradient = 12.9667, step = LM
Iteration:102 x = [-0.3361 -4.4514 2.3517 -2.2985],
gradient = 23.2634, step = LM
Iteration:103 x = [-0.3361 -4.4514 2.3517 -2.2985],
gradient = 23.2634, step = LM
Iteration:104 x = [-0.0195 -4.4514 2.3844 -2.3000],
gradient = 141.8487, step = LM
Iteration:105 x = [-0.0542 -4.4514 2.5491 -2.3129],
gradient = 2.7825, step = LM
Iteration:106 x = [-0.0622 -4.4515 2.7408 -2.3468],
gradient = 3.8567, step = LM
Iteration:107 x = [-0.0733 -4.4516 3.1240 -2.5063],
gradient = 5.6536, step = LM
Iteration:108 x = [-0.0887 -4.4519 3.7063 -3.0302],
gradient = 7.1668, step = LM
Iteration:109 x = [-0.1056 -4.4518 4.4259 -4.0061],
gradient = 6.3086, step = LM
Iteration:110 x = [-0.1190 -4.4484 5.0608 -4.9286],
gradient = 3.4276, step = LM
Iteration:111 x = [-0.1236 -4.4311 5.3256 -5.3076],
gradient = 0.7579, step = LM
Iteration:112 x = [-0.1239 -4.3717 5.3572 -5.3565],
gradient = 0.0396, step = LM
Iteration:113 x = [-0.1239 -4.1687 5.3583 -5.3584],
gradient = 0.0016, step = LM
Iteration:114 x = [-0.1240 -3.2559 5.3610 -5.3611],
gradient = 0.0427, step = LM
Iteration:115 x = [-0.1240 -3.2559 5.3610 -5.3611],
gradient = 0.0427, step = LM
Iteration:116 x = [-0.1240 -3.2559 5.3610 -5.3611],
gradient = 0.0427, step = LM
Iteration:117 x = [-0.1248 -1.1768 5.4018 -5.4026],
gradient = 4.7877, step = LM
Iteration:118 x = [-0.1248 -1.1768 5.4018 -5.4026],
gradient = 4.7877, step = LM
Iteration:119 x = [-0.1248 -1.1768 5.4018 -5.4026],
gradient = 4.7877, step = LM
Iteration:120 x = [-0.1248 -1.1768 5.4018 -5.4026],
gradient = 4.7877, step = LM
Iteration:121 x = [-0.1248 -1.1768 5.4018 -5.4026],
gradient = 4.7877, step = LM
Iteration:122 x = [-0.1244 -1.1290 5.5183 -5.4517],
gradient = 0.1904, step = LM
Iteration:123 x = [-0.1292 -1.0057 5.7614 -5.6610],
gradient = 1.1816, step = LM
Iteration:124 x = [-0.1382 -0.8383 6.2791 -6.1944],
gradient = 3.1482, step = LM
Iteration:125 x = [-0.1464 -0.7806 6.8138 -6.7630],
gradient = 1.7846, step = LM
Iteration:126 x = [-0.1599 -0.6524 7.7451 -7.7208],
gradient = 5.7805, step = LM
Iteration:127 x = [-0.1658 -0.6428 8.2978 -8.2944],
gradient = 0.8161, step = LM
Iteration:128 x = [-0.1770 -0.5710 9.2787 -9.2788],
gradient = 4.5341, step = LM
Iteration:129 x = [-0.1818 -0.5588 9.8477 -9.8546],
gradient = 0.7032, step = LM
Iteration:130 x = [-0.1902 -0.5204 10.7897 -10.7955],
gradient = 2.9184, step = LM
Iteration:131 x = [-0.1941 -0.5090 11.3439 -11.3510],
gradient = 0.6201, step = LM
Iteration:132 x = [-0.1999 -0.4878 12.1563 -12.1617],
gradient = 1.6519, step = LM
Iteration:133 x = [-0.2027 -0.4799 12.6128 -12.6178],
gradient = 0.3816, step = LM
Iteration:134 x = [-0.2059 -0.4697 13.1434 -13.1472],
gradient = 0.5800, step = LM
Iteration:135 x = [-0.2073 -0.4656 13.4035 -13.4068],
gradient = 0.1173, step = LM
Iteration:136 x = [-0.2082 -0.4629 13.5699 -13.5729],
gradient = 0.0504, step = LM
Iteration:137 x = [-0.2085 -0.4622 13.6148 -13.6177],
gradient = 0.0034, step = LM
Iteration:138 x = [-0.2085 -0.4622 13.6188 -13.6218],
gradient = 0.0000, step = LM
Iteration:139 x = [-0.2085 -0.4622 13.6188 -13.6218],
gradient = 0.0000, step = LM
Final estimated parameters: [-0.2085 -0.4622 13.6188 -13.6218]
The algorithm runs for as many as 139 iterations and produces the parameter estimate x* = [-0.21, -0.46, 13.62, -13.62] with ‖g(x*)‖ ≈ 0. In other words, the algorithm successfully converges with the stopping criterion ‖gₖ‖ ≤ ε₁ being met. It can also be seen that during the curve fitting of the given data, the algorithm changes the step method to QN at iterations 76–88 before changing back to LM. This means that in the 3 iterations before iteration 76, every iterate xₖ fails to satisfy ‖gₖ‖ ≥ 0.02 · Fₖ, so the QN method is considered able to converge more quickly than the LM method after iteration 76. Then, at iteration 88, we get ‖gₖ₊₁‖ ≥ ‖gₖ‖, which means the gradient does not decrease fast enough using the QN method, so the method switches back to LM for the next iteration.
Because the initial value x₀ is quite far from the solution, more iterations are needed than in Scenario 1. Also, note that so far there are two solutions that satisfy the convergence criteria, namely x* = [-0.46, -0.21, -13.62, 13.62] and x* = [-0.21, -0.46, 13.62, -13.62].
Scenario 3
Iteration:1 x = [0.0000 0.0000 0.7705 0.7705],
gradient = 42.5292, step = LM
Iteration:2 x = [-0.0815 -0.0815 1.3987 1.3987],
gradient = 16.7516, step = LM
Iteration:3 x = [-0.0400 -0.0400 1.2150 1.2150],
gradient = 18.3468, step = LM
Iteration:4 x = [-0.0607 -0.0695 1.3851 1.3851],
gradient = 4.8184, step = LM
Iteration:5 x = [-0.0607 -0.0695 1.3851 1.3851],
gradient = 4.8184, step = LM
Iteration:6 x = [-0.0607 -0.0695 1.3851 1.3851],
gradient = 4.8184, step = LM
Iteration:7 x = [-0.0607 -0.0695 1.3851 1.3851],
gradient = 4.8184, step = LM
Iteration:8 x = [-0.0607 -0.0695 1.3851 1.3851],
gradient = 4.8184, step = LM
Iteration:9 x = [-0.0607 -0.0695 1.3851 1.3851],
gradient = 4.8184, step = LM
Iteration:10 x = [-0.0631 -0.0584 1.3830 1.3830],
gradient = 1.3008, step = LM
Iteration:11 x = [-0.0596 -0.0627 1.3811 1.3811],
gradient = 0.4178, step = LM
Iteration:12 x = [-0.0620 -0.0601 1.3793 1.3792],
gradient = 0.2770, step = LM
Iteration:13 x = [-0.0604 -0.0616 1.3775 1.3775],
gradient = 0.2522, step = LM
Iteration:14 x = [-0.0613 -0.0605 1.3758 1.3757],
gradient = 0.2432, step = LM
Iteration:15 x = [-0.0604 -0.0612 1.3737 1.3737],
gradient = 0.2332, step = LM
Iteration:16 x = [-0.0615 -0.0597 1.3708 1.3708],
gradient = 0.2541, step = LM
Iteration:17 x = [-0.0582 -0.0627 1.3678 1.3677],
gradient = 0.6396, step = LM
Iteration:18 x = [-0.0582 -0.0627 1.3678 1.3677],
gradient = 0.6396, step = LM
Iteration:19 x = [-0.0607 -0.0600 1.3668 1.3668],
gradient = 0.1995, step = LM
Iteration:20 x = [-0.0603 -0.0604 1.3658 1.3658],
gradient = 0.1930, step = LM
Iteration:21 x = [-0.0603 -0.0601 1.3639 1.3638],
gradient = 0.1840, step = LM
Iteration:22 x = [-0.0593 -0.0605 1.3587 1.3586],
gradient = 0.2076, step = LM
Iteration:23 x = [-0.0593 -0.0605 1.3587 1.3586],
gradient = 0.2076, step = LM
Iteration:24 x = [-0.0593 -0.0605 1.3587 1.3586],
gradient = 0.2076, step = LM
Iteration:25 x = [-0.0608 -0.0590 1.3571 1.3570],
gradient = 0.2311, step = LM
Iteration:26 x = [-0.0584 -0.0611 1.3556 1.3555],
gradient = 0.3672, step = LM
Iteration:27 x = [-0.0609 -0.0586 1.3545 1.3544],
gradient = 0.3171, step = LM
Iteration:28 x = [-0.0590 -0.0603 1.3536 1.3536],
gradient = 0.1788, step = LM
Iteration:29 x = [-0.0600 -0.0593 1.3528 1.3527],
gradient = 0.1280, step = LM
Iteration:30 x = [-0.0594 -0.0598 1.3519 1.3519],
gradient = 0.1237, step = LM
Iteration:31 x = [-0.0597 -0.0594 1.3510 1.3510],
gradient = 0.1192, step = LM
Iteration:32 x = [-0.0597 -0.0594 1.3508 1.3510],
gradient = 0.1183, step = QN
Iteration:33 x = [-0.0597 -0.0594 1.3500 1.3510],
gradient = 0.1155, step = QN
Iteration:34 x = [-0.0596 -0.0594 1.3475 1.3510],
gradient = 0.1084, step = QN
Iteration:35 x = [-0.0592 -0.0594 1.3401 1.3510],
gradient = 0.1055, step = QN
Iteration:36 x = [-0.0589 -0.0594 1.3328 1.3510],
gradient = 0.2028, step = QN
Iteration:37 x = [-0.0595 -0.0585 1.3319 1.3501],
gradient = 0.1538, step = LM
Iteration:38 x = [-0.0595 -0.0585 1.3319 1.3501],
gradient = 0.1538, step = LM
Iteration:39 x = [-0.0587 -0.0592 1.3314 1.3496],
gradient = 0.0704, step = LM
Iteration:40 x = [-0.0591 -0.0588 1.3310 1.3492],
gradient = 0.0640, step = LM
Iteration:41 x = [-0.0588 -0.0590 1.3306 1.3488],
gradient = 0.0618, step = LM
Iteration:42 x = [-0.0588 -0.0590 1.3304 1.3488],
gradient = 0.0615, step = QN
Iteration:43 x = [-0.0588 -0.0590 1.3300 1.3488],
gradient = 0.0610, step = QN
Iteration:44 x = [-0.0587 -0.0590 1.3288 1.3488],
gradient = 0.0582, step = QN
Iteration:45 x = [-0.0586 -0.0590 1.3275 1.3488],
gradient = 0.0571, step = QN
Iteration:46 x = [-0.0586 -0.0590 1.3263 1.3488],
gradient = 0.0554, step = QN
Iteration:47 x = [-0.0585 -0.0590 1.3251 1.3488],
gradient = 0.0646, step = QN
Iteration:48 x = [-0.0590 -0.0585 1.3247 1.3484],
gradient = 0.0576, step = LM
Iteration:49 x = [-0.0585 -0.0589 1.3243 1.3481],
gradient = 0.0494, step = LM
Iteration:50 x = [-0.0589 -0.0585 1.3240 1.3477],
gradient = 0.0455, step = LM
Iteration:51 x = [-0.0585 -0.0588 1.3237 1.3474],
gradient = 0.0409, step = LM
Iteration:52 x = [-0.0588 -0.0585 1.3233 1.3471],
gradient = 0.0393, step = LM
Iteration:53 x = [-0.0585 -0.0588 1.3230 1.3468],
gradient = 0.0377, step = LM
Iteration:54 x = [-0.0588 -0.0585 1.3228 1.3465],
gradient = 0.0363, step = LM
Iteration:55 x = [-0.0585 -0.0587 1.3225 1.3462],
gradient = 0.0348, step = LM
Iteration:56 x = [-0.0587 -0.0585 1.3222 1.3459],
gradient = 0.0335, step = LM
Iteration:57 x = [-0.0585 -0.0587 1.3219 1.3457],
gradient = 0.0321, step = LM
Iteration:58 x = [-0.0587 -0.0585 1.3217 1.3454],
gradient = 0.0307, step = LM
Iteration:59 x = [-0.0584 -0.0587 1.3214 1.3452],
gradient = 0.0294, step = LM
Iteration:60 x = [-0.0587 -0.0584 1.3212 1.3449],
gradient = 0.0319, step = LM
Iteration:61 x = [-0.0584 -0.0587 1.3209 1.3447],
gradient = 0.0330, step = LM
Iteration:62 x = [-0.0587 -0.0584 1.3207 1.3445],
gradient = 0.0357, step = LM
Iteration:63 x = [-0.0584 -0.0587 1.3205 1.3442],
gradient = 0.0362, step = LM
Iteration:64 x = [-0.0586 -0.0584 1.3203 1.3440],
gradient = 0.0363, step = LM
Iteration:65 x = [-0.0584 -0.0586 1.3201 1.3439],
gradient = 0.0321, step = LM
Iteration:66 x = [-0.0586 -0.0584 1.3199 1.3437],
gradient = 0.0281, step = LM
Iteration:67 x = [-0.0584 -0.0586 1.3198 1.3435],
gradient = 0.0228, step = LM
Iteration:68 x = [-0.0585 -0.0584 1.3196 1.3434],
gradient = 0.0203, step = LM
Iteration:69 x = [-0.0584 -0.0585 1.3195 1.3432],
gradient = 0.0195, step = LM
Iteration:70 x = [-0.0585 -0.0584 1.3193 1.3431],
gradient = 0.0188, step = LM
Iteration:71 x = [-0.0584 -0.0585 1.3192 1.3429],
gradient = 0.0181, step = LM
Iteration:72 x = [-0.0585 -0.0584 1.3190 1.3428],
gradient = 0.0174, step = LM
Iteration:73 x = [-0.0584 -0.0585 1.3189 1.3426],
gradient = 0.0166, step = LM
Iteration:74 x = [-0.0585 -0.0583 1.3187 1.3425],
gradient = 0.0175, step = LM
Iteration:75 x = [-0.0583 -0.0585 1.3186 1.3423],
gradient = 0.0212, step = LM
Iteration:76 x = [-0.0585 -0.0583 1.3185 1.3422],
gradient = 0.0264, step = LM
Iteration:77 x = [-0.0583 -0.0585 1.3183 1.3421],
gradient = 0.0264, step = LM
Iteration:78 x = [-0.0585 -0.0583 1.3182 1.3420],
gradient = 0.0201, step = LM
Iteration:79 x = [-0.0583 -0.0584 1.3182 1.3419],
gradient = 0.0130, step = LM
Iteration:80 x = [-0.0584 -0.0583 1.3181 1.3418],
gradient = 0.0124, step = LM
Iteration:81 x = [-0.0584 -0.0584 1.3180 1.3417],
gradient = 0.0120, step = LM
Iteration:82 x = [-0.0584 -0.0583 1.3179 1.3416],
gradient = 0.0115, step = LM
Iteration:83 x = [-0.0583 -0.0584 1.3178 1.3415],
gradient = 0.0110, step = LM
Iteration:84 x = [-0.0584 -0.0583 1.3176 1.3414],
gradient = 0.0186, step = LM
Iteration:85 x = [-0.0584 -0.0583 1.3176 1.3414],
gradient = 0.0186, step = LM
Iteration:86 x = [-0.0583 -0.0584 1.3176 1.3413],
gradient = 0.0112, step = LM
Iteration:87 x = [-0.0584 -0.0583 1.3175 1.3412],
gradient = 0.0095, step = LM
Iteration:88 x = [-0.0583 -0.0584 1.3174 1.3412],
gradient = 0.0092, step = LM
Iteration:89 x = [-0.0583 -0.0584 1.3174 1.3412],
gradient = 0.0090, step = QN
Iteration:90 x = [-0.0583 -0.0584 1.3173 1.3412],
gradient = 0.0260, step = QN
Iteration:91 x = [-0.0584 -0.0583 1.3173 1.3411],
gradient = 0.0085, step = LM
Iteration:92 x = [-0.0583 -0.0584 1.3172 1.3411],
gradient = 0.0083, step = LM
Iteration:93 x = [-0.0583 -0.0583 1.3171 1.3410],
gradient = 0.0080, step = LM
Iteration:94 x = [-0.0583 -0.0583 1.3171 1.3409],
gradient = 0.0077, step = LM
Iteration:95 x = [-0.0584 -0.0583 1.3170 1.3409],
gradient = 0.0075, step = LM
Iteration:96 x = [-0.0583 -0.0584 1.3169 1.3408],
gradient = 0.0110, step = LM
Iteration:97 x = [-0.0584 -0.0582 1.3169 1.3407],
gradient = 0.0165, step = LM
Iteration:98 x = [-0.0583 -0.0584 1.3168 1.3407],
gradient = 0.0109, step = LM
Iteration:99 x = [-0.0583 -0.0583 1.3168 1.3406],
gradient = 0.0062, step = LM
Iteration:100 x = [-0.0583 -0.0583 1.3167 1.3406],
gradient = 0.0059, step = LM
Iteration:101 x = [-0.0583 -0.0583 1.3167 1.3406],
gradient = 0.0057, step = LM
Iteration:102 x = [-0.0583 -0.0583 1.3167 1.3405],
gradient = 0.0055, step = LM
Iteration:103 x = [-0.0583 -0.0583 1.3166 1.3404],
gradient = 0.0075, step = LM
Iteration:104 x = [-0.0583 -0.0583 1.3166 1.3404],
gradient = 0.0075, step = LM
Iteration:105 x = [-0.0583 -0.0583 1.3165 1.3404],
gradient = 0.0090, step = LM
Iteration:106 x = [-0.0583 -0.0583 1.3165 1.3403],
gradient = 0.0099, step = LM
Iteration:107 x = [-0.0583 -0.0583 1.3164 1.3403],
gradient = 0.0064, step = LM
Iteration:108 x = [-0.0583 -0.0583 1.3164 1.3403],
gradient = 0.0064, step = QN
Iteration:109 x = [-0.0583 -0.0583 1.3164 1.3403],
gradient = 0.0043, step = LM
Iteration:110 x = [-0.0583 -0.0583 1.3164 1.3402],
gradient = 0.0041, step = LM
Iteration:111 x = [-0.0583 -0.0583 1.3164 1.3402],
gradient = 0.0040, step = LM
Iteration:112 x = [-0.0583 -0.0583 1.3163 1.3402],
gradient = 0.0038, step = LM
Iteration:113 x = [-0.0583 -0.0583 1.3163 1.3401],
gradient = 0.0050, step = LM
Iteration:114 x = [-0.0583 -0.0583 1.3163 1.3401],
gradient = 0.0050, step = LM
Iteration:115 x = [-0.0583 -0.0583 1.3162 1.3401],
gradient = 0.0045, step = LM
Iteration:116 x = [-0.0583 -0.0583 1.3162 1.3401],
gradient = 0.0044, step = LM
Iteration:117 x = [-0.0583 -0.0583 1.3162 1.3400],
gradient = 0.0040, step = LM
Iteration:118 x = [-0.0583 -0.0583 1.3162 1.3400],
gradient = 0.0040, step = QN
Iteration:119 x = [-0.0583 -0.0583 1.3162 1.3400],
gradient = 0.0037, step = LM
Iteration:120 x = [-0.0583 -0.0583 1.3161 1.3400],
gradient = 0.0032, step = LM
Iteration:121 x = [-0.0583 -0.0583 1.3161 1.3400],
gradient = 0.0030, step = LM
Iteration:122 x = [-0.0583 -0.0583 1.3161 1.3400],
gradient = 0.0027, step = LM
Iteration:123 x = [-0.0583 -0.0583 1.3161 1.3399],
gradient = 0.0026, step = LM
Iteration:124 x = [-0.0583 -0.0583 1.3161 1.3399],
gradient = 0.0025, step = LM
Iteration:125 x = [-0.0583 -0.0583 1.3160 1.3399],
gradient = 0.0024, step = LM
Iteration:126 x = [-0.0583 -0.0583 1.3160 1.3399],
gradient = 0.0023, step = LM
Iteration:127 x = [-0.0583 -0.0583 1.3160 1.3399],
gradient = 0.0022, step = LM
Iteration:128 x = [-0.0583 -0.0583 1.3160 1.3398],
gradient = 0.0021, step = LM
Iteration:129 x = [-0.0583 -0.0583 1.3160 1.3398],
gradient = 0.0020, step = LM
Iteration:130 x = [-0.0583 -0.0583 1.3160 1.3398],
gradient = 0.0020, step = LM
Iteration:131 x = [-0.0583 -0.0583 1.3159 1.3398],
gradient = 0.0022, step = LM
Iteration:132 x = [-0.0583 -0.0583 1.3159 1.3398],
gradient = 0.0024, step = LM
Iteration:133 x = [-0.0583 -0.0583 1.3159 1.3398],
gradient = 0.0026, step = LM
Iteration:134 x = [-0.0583 -0.0583 1.3159 1.3397],
gradient = 0.0025, step = LM
Iteration:135 x = [-0.0583 -0.0583 1.3159 1.3397],
gradient = 0.0023, step = LM
Iteration:136 x = [-0.0583 -0.0583 1.3159 1.3397],
gradient = 0.0018, step = LM
Iteration:137 x = [-0.0583 -0.0583 1.3159 1.3397],
gradient = 0.0015, step = LM
Iteration:138 x = [-0.0583 -0.0583 1.3158 1.3397],
gradient = 0.0014, step = LM
Iteration:139 x = [-0.0583 -0.0583 1.3158 1.3397],
gradient = 0.0014, step = LM
Final estimated parameters: [-0.0583 -0.0583 1.3158 1.3397]
The algorithm runs for 139 iterations and produces the parameter estimate x* = [-0.06, -0.06, 1.32, 1.34] with ‖gₖ‖ ≈ 0.0014 > ε₁. In other words, the algorithm has not successfully converged; the iteration has been terminated because the stopping criterion ‖hₖ‖ ≤ ε₂ (‖xₖ‖ + ε₂) has been met. It can also be seen that during the curve fitting of the given data, the algorithm repeatedly changed the step method to QN and back to LM, for the reasons described in Scenario 2. From the graph above, obviously, the solution x* found is not good enough.
By looking at the value of x during the iterations, we observe that x₁ ≈ x₂ and x₃ ≈ x₄. Hence, we can conclude that the initial value x₀ = [0, 0, 0, 0] is a bad choice: the value is symmetrical and the algorithm is not able to break the symmetry. You can try and see the same thing happen with other symmetrical initial values such as x₀ = [-1, -1, -1, -1], x₀ = [1, 1, 1, 1], or x₀ = [2, 2, 2, 2]. Or maybe this is just a coincidence. What do you think?
What else do you find interesting from the experiment?
Conclusion
We’ve developed an efficient algorithm to solve a non-linear least-squares problem. This algorithm combines the Levenberg-Marquardt and Quasi-Newton methods, utilizing a trust region for the step choice. The algorithm can accurately find the best parameters of the model at a superlinear convergence rate. However, poor parameter initialization may lead to bad results. In our case study, this algorithm finds the following two models that fit the data:
y = −13.62 exp(−0.46t) + 13.62 exp(−0.21t)

y = 13.62 exp(−0.21t) − 13.62 exp(−0.46t)
Of course, these two models are equivalent by rearranging the components of parameter x.

[1] K. Madsen (1988). A Combined Gauss-Newton and Quasi-Newton Method for Non-Linear Least Squares. Institute for Numerical Analysis (now part of IMM), Technical University of Denmark (DTU). Report NI-88-10.