;Generalized linear curve fitter. Fit a data set with a linear ;combination of (possibly nonlinear) basis functions. ; ;Theory ; Suppose you have a set of measurements y_actual taken at ; values of some independent variable x where x is an array. ; You want to model the data with a curve of some definite ; mathematical form, but with several undetermined parameters ; which can be tuned to fit the data. The simplest example is ; a line. ; ; y_model=a0+a1*x ; ; This has a definite shape (a straight line) but the y-intercept ; a0 and slope a1 can be chosen such that the line "fits" the ; data the "best". ; ; Note: If you actually want to fit a line, go use IDL's LINFIT. ; ; LINFIT2 can solve this problem, but it's meant for more complicated ; ones, such as the following ; ; y_model=a0*sin(x)+a1*cos(x) ; ; or ; ; y_model=a0*sin(x)+a1*cos(x)+a2*sin(2*x)+a3*cos(2*x) ; ; or in general ; ; y_model=a0*f0(x)+a1*f1(x)+a2*f2(x)...+a_i*f_i(x) ; ; where n is the number of basis functions, i runs from 0 to n-1, ; f_i(x) are the basis functions and a_i are the parameters ; ; Note that the basis functions can be nonlinear in x, but they ; must be combined in a linear manner. The following curve cannot ; be used directly ; ; y_disallowed_model=a0*sin(a1*x) ; ; because one of the parameters is not a coefficient of the basis ; function. If your model is such, go use MPFIT. ; ; The definition of "best" is such that some combination of all the ; residuals ; ; residual=y_actual-y_model ; ; is the smallest. One specific mathematical definition of "best" ; which makes the calculus the easiest is "least squares", meaning ; the set of parameters which makes the sum of the squares of the ; residuals ; ; total_variance=total(residual^2) ; ; the smallest. In some problems, the curve needs to fit some points ; closer than others. This is most common when the measurements have ; a known uncertainty on each point and not all uncertainties are equal. ; In this case, we use a related measure of goodness of fit chi-square ; ; chi^2=total(weight*residual^2) ; ; Most commonly, the weights are calculated from the 1-standard-deviation ; uncertainty sigma: ; ; weight=1/(sigma^2) ; ; in which case the chi square statistic is ; ; chi^2=total((residual/sigma)^2) ; ; If all weights are one, this reduces to the total_variance form above. ; If all weights are equal but not one, the chi-square parameter will be ; scaled, but still have a minimum for the same set of parameters. ; ; The fitter also calculates the formal 1-standard-deviation uncertainties ; of the parameters in the form of a covariance matrix, as follows: ; ; covar[i,j]=unc[i]*unc[j]*correlation[i,j] ; ; Since any variable is perfectly correlated with itself, the on-diagonal ; terms of this matrix are unc[i]^2. ; ; For further mathematical background, particularly HOW it works, see ; Numerical Recipes in C, section 15.4 ; ; The basis functions are expressed in the form of a single IDL function ; which given independent variable x (will always be an array, could ; be any shape and size) and order basis_index (will always be a scalar, ; running from 0 to n_basis-1) calculates the value of the i-th basis ; function for all values of the independent variable. ; ; function f_i,x,basis_index=basis_index[,extra named parameters] ; y= ; return,y ; end ; ; You may name this basis function calculator anything you want, and may ; put arbitrary IDL code in it as long as it returns an answer in the ; proper form. ; ;input ; x_ - Array of independent variable values, of any shape and size ; y_ - Array of dependent variable actual values, same size as x_ ; basis - Name of an IDL function which is used to calculate the ; values of the basis functions ; weight= - (Optional) Array of weight to apply to each point, should be ; same size as x_. Defaults to all 11/(sigma^2) ; sigma= - (optional) Array of 1-standard-deviation uncertainty to apply ; to each point. ; n_basis - Number of basis functions to use ; /nan - if set, filter out points where either x_ or y_ are non-finite ; _extra - extra parameters are passed to the basis function ;output ; sigma= - if sigma is undefined upon entry, it will be set to ; the normalized 1-standard-deviation uncertainty calculated ; from the weighting or all ones if no weighting was applied. ; covar= - On exit, is set to the covariance matrix. ; y_model=- On exit, is set to the values of the model at x ; chisq= - On exit, set to the (scalar) chi-square statistic for this model and ; set of parameters ;return ; An array of parameters a_i which complete the fit described above function linfit2,x_,y_,basis,n_basis,weight=weight,sigma=sigma_,covar=covar,chisq=chisq,y_model=y_model,nan=nan,_extra=extra if n_elements(sigma_) eq 0 then begin if n_elements(weight) eq 0 then begin sigma_=x_*0d +1d end else begin sigma_=1.0d/sqrt(weight) end end if keyword_set(nan) then begin w=where(logical_and(finite(x_),finite(y_))) x=x_[w] y=y_[w] sigma=sigma_[w] end else begin x=x_[*] y=y_[*] sigma=sigma_[*] end ;Calculate the design matrix A=make_array(type=size(x,/type),n_elements(x),n_basis) for i=0,n_basis-1 do begin A[*,i]=call_function(basis,x,basis_index=i,_extra=extra)/sigma end AT=transpose(A) ;Calculate the scaled measurement vector b=y/sigma ;Solve the normal equations where aa is the desired parameter vector, ;b is the scaled measurement vector, and A is the design matrix ; ; (AT*A)*aa=AT*b ; alpha*aa=beta ; aa=(alpha^(-1))*beta ; ;All this setup, this whole file, supports three lines of linear algebra alpha=(AT#A) beta=(AT#B) ;Covariance matrix is one of the intermediate values used in solving ;normal equations, so no wasted time calculating it. covar=invert(alpha) aa=covar#beta ;Calculate the model value and chi-square parameter y_model=aa[0]*call_function(basis,x,basis_index=i,_extra=extra) for i=1,n_basis-1 do y_model+=aa[i]*call_function(basis,x,basis_index=i,_extra=extra) chisq=total(((y-y_model)/sigma)^2) ;Return the answer return,aa end