Method

Suppose that you have N data points, yk, for k = 1, 2, ..., N, and the function to be fitted is f(x,p), where p represents the M parameters <p1,p2,...,pM>. Define the likelihood of the parameters, given the data, as the probability of the data, given the parameters. We fit for the parameters, p, by finding the values, pmin, that maximize this likelihood. This form of parameter estimation is known as maximum likelihood estimation.

Consider the likelihood function ℒ(p) ≡ P(x1,p)*P(x2,p)*...*P(xN,p), where P is the probability density, which depends on the random variable x and the parameters p. ℒ is a measure of the probability of observing just the particular sample we have, and is called an a-posteriori probability since it is computed after the sampling is done. The best estimates for p are the values which maximize ℒ. But maximizing the logarithm of ℒ also maximizes ℒ, and maximizing ln(ℒ) is equivalent to minimizing -ln(ℒ). So the goal becomes minimizing the negative log-likelihood function:

    L(p) \equiv -\ln \mathcal{L}(p) = -\sum_{k=1}^{N} \ln P(x_k, p)
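
As a concrete illustration, here is a minimal sketch (Python/NumPy) of evaluating L(p). The exponential model f, the fixed Gaussian error width sigma, and the data arrays x and y are assumptions made purely for illustration and are not part of the program:

```python
import numpy as np

def f(x, p):
    # Hypothetical model with M = 2 parameters: an exponential decay.
    return p[0] * np.exp(-p[1] * x)

def neg_log_likelihood(p, x, y, sigma=1.0):
    # L(p) = -sum_k ln P(y_k, p); for a Gaussian density of width sigma this is
    # half the chi-square plus a constant that does not depend on p.
    r = (y - f(x, p)) / sigma
    return 0.5 * np.sum(r**2) + len(y) * np.log(sigma * np.sqrt(2.0 * np.pi))
```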

Let p0 be the initial values given for p. The goal is to find a step Δp so that p1 = p0 + Δp is a better approximation to the data. We use the iterative Gauss-Newton method, and the sequence p1, p2, p3, ... will hopefully converge to the minimum, pmin.

Generally, the Gauss-Newton method is locally convergent when χ2 is zero at the minimum. Serious difficulties arise when f is sufficiently nonlinear and χ2 is large at the minimum. The Gauss-Newton method has the advantage that linear least squares problems are solved in one iteration.
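
The outer iteration itself can be sketched as below. The step routine, stopping rule, and iteration limit are illustrative choices (a possible sketch of the step routine, called fit_step here, follows the matrix equation derived below), not a description of the program's actual controls:

```python
import numpy as np

def fit(p0, step, max_iter=50, tol=1e-8):
    # Iterate p_{k+1} = p_k + dp, where dp = step(p) solves the linearized
    # problem at the current p, until the step becomes negligible.
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp = step(p)
        p = p + dp
        if np.linalg.norm(dp) <= tol * (1.0 + np.linalg.norm(p)):
            break
    return p
```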

Consider the Taylor expansion of L(p) about the point p0:

    L(p_0 + \Delta p) \approx L(p_0) + \sum_{j=1}^{M} \left.\frac{\partial L}{\partial p_j}\right|_{p_0} \Delta p_j + \frac{1}{2} \sum_{j,l=1}^{M} \left.\frac{\partial^2 L}{\partial p_j \, \partial p_l}\right|_{p_0} \Delta p_j \, \Delta p_l

Define the arrays Δp, β and α:

    (\Delta p)_j \equiv p_j - (p_0)_j, \qquad \beta_j \equiv -\left.\frac{\partial L}{\partial p_j}\right|_{p_0}, \qquad \alpha_{jl} \equiv \left.\frac{\partial^2 L}{\partial p_j \, \partial p_l}\right|_{p_0}

If we linearize, i.e., assume that

    \frac{\partial L}{\partial p_j}(p_0 + \Delta p) \approx -\beta_j + \sum_{l=1}^{M} \alpha_{jl} \, \Delta p_l ,

then, since all of the partial derivatives must vanish at the minimum,

    -\beta_j + \sum_{l=1}^{M} \alpha_{jl} \, \Delta p_l = 0, \qquad j = 1, \ldots, M,

and so

    \sum_{l=1}^{M} \alpha_{jl} \, \Delta p_l = \beta_j, \qquad j = 1, \ldots, M.

The problem has reduced to solving the matrix equation

    \alpha \, \Delta p = \beta
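
Under the same illustrative assumptions as above, a step routine might estimate β and α from L by central differences (see the note below) and then solve the matrix equation for Δp. The name fit_step, the step size h, and the use of a full finite-difference matrix for α are assumptions of this sketch, not a description of the program's internals:

```python
import numpy as np

def fit_step(L, p, h=1e-4):
    # Estimate beta_j = -dL/dp_j and alpha_jl = d2L/(dp_j dp_l) at p by central
    # differences, then solve the matrix equation  alpha @ dp = beta  for dp.
    p = np.asarray(p, dtype=float)
    M = len(p)
    E = h * np.eye(M)                      # E[j] is a step of size h along p_j
    beta = np.array([-(L(p + E[j]) - L(p - E[j])) / (2.0 * h) for j in range(M)])
    alpha = np.empty((M, M))
    for j in range(M):
        for l in range(M):
            alpha[j, l] = (L(p + E[j] + E[l]) - L(p + E[j] - E[l])
                           - L(p - E[j] + E[l]) + L(p - E[j] - E[l])) / (4.0 * h * h)
    return np.linalg.solve(alpha, beta)
```

Combined with the sketches above, an illustrative fit would then read pmin = fit(p0, lambda q: fit_step(lambda r: neg_log_likelihood(r, x, y), q)).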

Note: The partial derivatives are approximated numerically using a central difference approximation:

    \frac{\partial L}{\partial p_j} \approx \frac{L(p_1, \ldots, p_j + h_j, \ldots, p_M) - L(p_1, \ldots, p_j - h_j, \ldots, p_M)}{2 \, h_j},

where h_j is a small step in the parameter p_j.
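
As a quick check of the central difference formula, the sketch below compares it with an analytic derivative for an arbitrary test function (the function, parameter values, and step size are chosen only for illustration):

```python
import numpy as np

def central_diff(L, p, j, h=1e-5):
    # dL/dp_j  ~  [L(p + h*e_j) - L(p - h*e_j)] / (2h)
    ej = np.zeros_like(p, dtype=float)
    ej[j] = h
    return (L(p + ej) - L(p - ej)) / (2.0 * h)

# Test function L(p) = p0 * exp(-p1); its derivative with respect to p1 is -p0 * exp(-p1).
p = np.array([2.0, 0.5])
L_test = lambda q: q[0] * np.exp(-q[1])
print(central_diff(L_test, p, 1))   # ~ -1.21306
print(-p[0] * np.exp(-p[1]))        # exact value
```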
