Fitting
This module contains the functions for fitting (chi2, likelihood, MCMC, parameter-space exploration).
- fitting.basin_hoppin_values(init_guess, std_guess, bounds_guess)
Create several initial guesses.
Create as many initial guesses as there are basin-hopping iterations to perform.
The guesses are drawn from a Normal distribution with location init_guess and scale std_guess.
Parameters
- init_guesslist-like
List of free parameters to fit. The shape must be (N,) or equivalent, N being the number of parameters.
- std_guesslist-like
List of scale factors to generate guesses from Normal distributions. The shape must be (N,) or equivalent, N being the number of parameters.
- bounds_guesslist of tuple
Minimum and maximum values the guesses can be. The shape must be (N,2) or equivalent, N being the number of parameters
Returns
- new_init_guessarray-like
New set of guesses. Array of shape (N,), with N the number of parameters.
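A minimal sketch of how such guesses could be drawn, assuming NumPy. The helper name `draw_guess` is hypothetical, and clipping out-of-bounds draws to the bounds is an assumption: the description above does not say how bounds_guess is enforced.

```python
import numpy as np

def draw_guess(init_guess, std_guess, bounds_guess, rng=None):
    """Draw one guess per parameter from Normal(init_guess, std_guess)."""
    rng = np.random.default_rng() if rng is None else rng
    init_guess = np.asarray(init_guess, dtype=float)   # shape (N,)
    std_guess = np.asarray(std_guess, dtype=float)     # shape (N,)
    bounds = np.asarray(bounds_guess, dtype=float)     # shape (N, 2)
    guess = rng.normal(loc=init_guess, scale=std_guess)
    # Assumption: draws falling outside the bounds are clipped to them.
    return np.clip(guess, bounds[:, 0], bounds[:, 1])

new_guess = draw_guess([0.5, 100.0], [0.1, 20.0],
                       [(0.0, 1.0), (50.0, 150.0)],
                       rng=np.random.default_rng(1))
```

Calling the helper once per basin-hopping iteration would give one fresh starting point per iteration.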
- fitting.calculate_chi2(params, data, func_model, *args, **kwargs)
DEPRECATED. Calculate a chi-squared. It can be used by an optimizer. The chi-squared is calculated from the model function func_model or from a pre-calculated model (see Keywords).
Parameters
- paramsarray
Guess of the parameters.
- datand-array
Data to fit.
- func_modelcallable function
Model used to fit the data (e.g. model of the histogram).
- *argslist-like
Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, and the arguments of func_model.
- **kwargskeywords
Accepted keywords are: use_this_model, to use a predefined model of the data; keywords to pass to func_model.
Returns
- chi2float
chi squared.
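For orientation, a chi-squared of this kind is usually the sum of squared, uncertainty-weighted residuals. The sketch below assumes NumPy and a model called as `func_model(x, *params)`; both the helper name `chi2_sketch` and that call convention are assumptions, not the module's actual signature.

```python
import numpy as np

def chi2_sketch(params, data, func_model, data_err, x):
    """Sum of squared residuals weighted by the uncertainties."""
    model = func_model(x, *params)  # assumed call convention
    return float(np.sum(((data - model) / data_err) ** 2))

def line(x, a, b):
    return a * x + b

x = np.array([0.0, 1.0, 2.0])
data = np.array([1.0, 3.0, 5.5])
err = np.array([0.5, 0.5, 0.5])
value = chi2_sketch((2.0, 1.0), data, line, err, x)
```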
- fitting.check_init_guess(guess, bound)
Check that the initial guesses in the config file lie between the bounds of the parameters to fit.
Parameters
- guesslist-like or array
values of the initial guess.
- boundlist-like or array
(…,2) list of boundaries of shape (…, (lower bound, upper bound)).
Returns
- checkbool
True if the initial guess is not between the bounds.
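A minimal sketch of such a check, assuming NumPy. The helper name `guess_out_of_bounds` is hypothetical, and treating a guess that sits exactly on a bound as valid is an assumption.

```python
import numpy as np

def guess_out_of_bounds(guess, bound):
    """Return True if any guess lies outside its (lower, upper) bounds."""
    guess = np.asarray(guess, dtype=float)
    bound = np.asarray(bound, dtype=float)      # shape (..., 2)
    lower, upper = bound[..., 0], bound[..., 1]
    # Assumption: a guess exactly on a bound counts as valid.
    return bool(np.any((guess < lower) | (guess > upper)))

ok = guess_out_of_bounds([0.5, 2.0], [(0.0, 1.0), (1.0, 3.0)])
bad = guess_out_of_bounds([1.5, 2.0], [(0.0, 1.0), (1.0, 3.0)])
```

Note the inverted convention described above: the returned flag is True when the check fails.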
- fitting.chi2_pearson(params, data, func_model, *args, **kwargs)
Pearson’s chi-squared test. This estimator is relevant for fitting histograms assuming the number of elements in the bins follows a multinomial distribution.
The number of degrees of freedom is defined as: \(N_{bins} - (N_{params} + 1)\)
More info: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
Parameters
- paramsarray
Guess of the parameters.
- datand-array
Data to fit.
- func_modelcallable function
Model used to fit the data (e.g. model of the histogram).
- *argslist-like
Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, and the arguments of func_model.
- **kwargskeywords
Accepted keywords are: use_this_model, to use a predefined model of the data; keywords to pass to func_model.
Returns
- chi2float
Pearson’s chi-squared.
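As an illustration, Pearson's statistic compares observed and expected bin counts, and the number of degrees of freedom follows the formula given above. The sketch assumes NumPy; `pearson_chi2_sketch` and the toy counts are hypothetical.

```python
import numpy as np

def pearson_chi2_sketch(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))

observed = [18, 22, 30, 30]   # toy histogram counts
expected = [25, 25, 25, 25]   # toy model counts
stat = pearson_chi2_sketch(observed, expected)

n_params = 1                               # toy number of fitted parameters
dof = len(observed) - (n_params + 1)       # N_bins - (N_params + 1)
```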
- fitting.explore_parameter_space(cost_fun, histo_data, param_bounds, param_sz, xbins, parameters_labels, wl_scale0, instrument_model, instrument_args, rvu_forfit, cdfs, rvus, histo_err=None, **kwargs)
Explore the parameter space with a chosen cost function (chi2, likelihood…).
Parameters
- cost_funfunction
Cost function to use for model fitting.
- histo_datand-array
Histograms of the data.
- param_boundsnested tuple-like
Nested tuple of the parameter bounds on the form ((min1, max1), (min2, max2)…).
- param_szlist
Number of points to sample each parameter axis.
- xbinsnd-array
Bin axes of the histogram.
- wl_scale01d-array
Wavelength scale.
- instrument_modelfunction
Function simulating the instrument and noises.
- instrument_argstuple
Arguments for the instrument_model function.
- rvu_forfitlist of two arrays
List of uniform random values used to generate normally distributed values with the fitting parameters mu and sig.
- cdfslist of arrays
List of the CDFs which are used to reproduce the statistics of the noises. There are as many sequences as noise sources to simulate.
- rvuslist of arrays
List of uniform random values used to generate random values to reproduce the statistics of the noises. There are as many sequences as noise sources to simulate.
- histo_errTYPE, optional
DESCRIPTION. The default is None.
- **kwargskeywords
Keywords to pass to the create_histogram_model function.
Returns
- chi2mapnd-array
Datacube containing the value of the cost function and the tested parameters.
- param_axesTYPE
DESCRIPTION.
- stepsarray
Steps used to sample the parameter axes.
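The general idea of a brute-force exploration can be sketched as follows, assuming NumPy. This is not the module's implementation: `explore_grid` and `toy_cost` are hypothetical, the cost function here takes only the parameter vector, and the returned map holds only the cost values.

```python
import itertools
import numpy as np

def explore_grid(cost_fun, param_bounds, param_sz):
    """Evaluate cost_fun on a regular grid spanning the parameter bounds."""
    axes = [np.linspace(lo, hi, n) for (lo, hi), n in zip(param_bounds, param_sz)]
    steps = np.array([ax[1] - ax[0] if ax.size > 1 else 0.0 for ax in axes])
    cost_map = np.empty([ax.size for ax in axes])
    # Visit every combination of grid indices.
    for idx in itertools.product(*(range(ax.size) for ax in axes)):
        point = [ax[i] for ax, i in zip(axes, idx)]
        cost_map[idx] = cost_fun(point)
    return cost_map, axes, steps

def toy_cost(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

cost_map, axes, steps = explore_grid(toy_cost, ((0.0, 2.0), (-3.0, -1.0)), (5, 5))
best = np.unravel_index(np.argmin(cost_map), cost_map.shape)
best_params = [float(ax[i]) for ax, i in zip(axes, best)]
```

The grid minimum is a natural starting point for a finer optimizer, at a cost that grows with the product of the entries of param_sz.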
- fitting.log_chi2(params, data, func_model, *args, **kwargs)
Log-likelihood of normally distributed data. In a model-fitting context, this function is maximised by the optimal parameters. It is not the reduced \(\chi^2\) itself: to obtain the reduced \(\chi^2\), multiply by two, divide by the number of degrees of freedom, and take the opposite sign.
Parameters
- paramsarray
Guess of the parameters.
- datand-array
Data to fit.
- func_modelcallable function
Model used to fit the data (e.g. model of the histogram).
- *argslist-like
Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, and the arguments of func_model.
- **kwargskeywords
Accepted keywords are: use_this_model, to use a predefined model of the data; keywords to pass to func_model.
Returns
- chi2float
half chi squared.
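The conversion to a reduced \(\chi^2\) described above can be checked on a toy case. The sketch assumes NumPy and drops the constant normalisation term of the Gaussian log-likelihood (an assumption); `log_likelihood_normal` is a hypothetical name.

```python
import numpy as np

def log_likelihood_normal(data, model, data_err):
    """Gaussian log-likelihood up to an additive constant: -chi2 / 2."""
    # Assumption: the constant normalisation term is dropped.
    return -0.5 * float(np.sum(((data - model) / data_err) ** 2))

data = np.array([1.0, 3.0, 5.5, 7.0])
model = np.array([1.0, 3.0, 5.0, 7.5])
err = np.full(4, 0.5)

loglike = log_likelihood_normal(data, model, err)
n_params = 2                          # toy number of fitted parameters
ddof = data.size - n_params
# Multiply by two, divide by the degrees of freedom, flip the sign.
reduced_chi2 = -2.0 * loglike / ddof
```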
- fitting.log_multinomial(params, data, func_model, *args, **kwargs)
Log-likelihood of a dataset following a multinomial distribution (e.g. number of occurrences in the bins of a histogram).
Parameters
- paramsarray
parameters to fit.
- dataarray
data to fit.
- func_modelfunction
Model of the data.
- *argsfunction arguments
Extra arguments to pass to this function and func_model. The first argument must be the values of the x-axis of the dataset. If func_model takes any keywords, they must be passed in a dictionary in the last position in *args.
- **kwargskeywords arguments
Keywords accepted: use_this_model (array), which uses the values from a model generated outside of this function instead of calling func_model; keywords to pass to func_model.
Returns
- float
Log of the likelihood. The negative value is returned so that minimisation algorithms can be used.
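A sketch of a negative multinomial log-likelihood, assuming NumPy and strictly positive bin probabilities. It uses `math.lgamma` for the log-factorials, which may differ from what the module does; `neg_log_multinomial` is a hypothetical name.

```python
import math
import numpy as np

def neg_log_multinomial(counts, probabilities):
    """Negative log of n! / prod(k_i!) * prod(p_i ** k_i)."""
    counts = np.asarray(counts, dtype=float)
    probabilities = np.asarray(probabilities, dtype=float)  # assumed > 0
    n_total = counts.sum()
    # lgamma(k + 1) equals log(k!).
    log_coeff = math.lgamma(n_total + 1) - sum(math.lgamma(k + 1) for k in counts)
    log_like = log_coeff + float(np.sum(counts * np.log(probabilities)))
    return -log_like

value = neg_log_multinomial([1, 1], [0.5, 0.5])
```

For two draws over two equiprobable bins, the probability of one count in each bin is 0.5, so the function returns log(2).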
- fitting.log_posterior(params, lklh_func, bounds, func_model, data, func_args=(), func_kwargs={})
Log-posterior of the parameters given the data.
Parameters
- paramslist-like
List of parameters.
- lklh_funcfunction
Likelihood function to use.
- boundsarray-like
Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…).
- func_modelfunction
Function of the model which reproduces the data to fit (e.g. histogram).
- dataarray of size (N,) or (nb wl, N)
Data to fit.
- func_argslist-like, optional
Arguments to pass to func_model. The default is ().
- func_kwargsdic-like, optional
Keywords to pass to func_model. The default is {}.
- neg_lklhbool, optional
Change the sign of the value of the likelihood. If True, the likelihood returned by lklh_func is negative, so its sign must be reversed. The default is True.
Returns
- log_posteriorfloat
value of the posterior.
- fitting.log_prior_uniform(params, bounds)
Uniform prior on a set of parameters to fit
Parameters
- paramsarray of size (N,)
Parameters to fit.
- boundsarray-like
Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…).
Returns
- float
value of the prior.
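A uniform prior and the resulting posterior are commonly combined in log space as sketched below, assuming NumPy. The function names are hypothetical, the likelihood here takes only the parameter vector, and returning 0 inside the bounds (rather than the log of the normalised density) is an assumption.

```python
import numpy as np

def log_prior_uniform_sketch(params, bounds):
    """0 inside the bounds, -inf outside (unnormalised uniform prior)."""
    params = np.asarray(params, dtype=float)
    bounds = np.asarray(bounds, dtype=float)   # shape (N, 2)
    inside = np.all((params >= bounds[:, 0]) & (params <= bounds[:, 1]))
    return 0.0 if inside else -np.inf

def log_posterior_sketch(params, log_likelihood, bounds):
    """Log-posterior = log-prior + log-likelihood."""
    log_prior = log_prior_uniform_sketch(params, bounds)
    if not np.isfinite(log_prior):
        return -np.inf          # skip the likelihood outside the bounds
    return log_prior + log_likelihood(params)

def toy_log_likelihood(p):
    return -0.5 * (p[0] - 1.0) ** 2

bounds = ((0.0, 2.0),)
inside = log_posterior_sketch([1.0], toy_log_likelihood, bounds)
outside = log_posterior_sketch([3.0], toy_log_likelihood, bounds)
```

Returning -inf early avoids evaluating the model for parameters the prior already excludes.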
- fitting.lstsqrs_fit(func_model, p0, xdata, ydata, yerr=None, bounds=None, diff_step=None, x_scale=1, func_args=(), func_kwargs={})
Fit the data with a least-squares algorithm, taking the boundaries into account.
The function uses scipy.optimize.least_squares (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares).
The cost function uses the “huber” loss, hence the cost value is not a chi-squared. The boundaries are handled by the Trust Region Reflective algorithm.
Parameters
- func_modelfunction
Function to fit the data.
- p0tuple-like
Initial guess.
- xdata1d-array
Flattened array of the x-axis of the data.
- ydata1d-array
Flattened array of the data.
- yerr1d-array, optional
Flattened array of the data error. The default is None.
- boundstuple-like, optional
Tuple-like of shape ((min1, max1), …, (minN, maxN)). The default is None.
- diff_steplist, optional
Determines the relative step size for the finite difference approximation of the Jacobian. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares. The default is None.
- x_scaleTYPE, optional
Characteristic scale of each variable. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares. The default is 1.
- func_argstuple, optional
Tuple of arguments to pass to the cost function. The default is ().
- func_kwargsdict, optional
Dictionary of arguments to pass to the cost function. The default is {}.
Returns
- poptlist
List of optimised parameters.
- pcov2d-array
Covariance matrix of the optimised parameters.
- resOptimizeResult
Full output of the fitting algorithm.
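A hedged sketch of a bounded fit with scipy.optimize.least_squares, using the Huber loss and the Trust Region Reflective method. The toy model `line`, the `residuals` helper, and the covariance estimate from the Jacobian are assumptions for illustration, not the module's code. Note that SciPy expects bounds as (lower array, upper array), so bounds given as ((min1, max1), …) must be transposed.

```python
import numpy as np
from scipy.optimize import least_squares

def line(x, a, b):
    return a * x + b

def residuals(p, x, y, yerr):
    """Residuals weighted by the uncertainties."""
    return (y - line(x, *p)) / yerr

x = np.linspace(0.0, 10.0, 50)
y = line(x, 2.0, 1.0)            # noise-free toy data
yerr = np.ones_like(x)

bounds = ((0.0, 5.0), (-3.0, 3.0))
lower, upper = np.transpose(bounds)   # convert to SciPy's (lower, upper) form

res = least_squares(residuals, x0=(1.0, 0.0), bounds=(lower, upper),
                    method="trf", loss="huber", args=(x, y, yerr))
popt = res.x
# Assumption: covariance approximated by the inverse of J^T J at the solution.
pcov = np.linalg.inv(res.jac.T @ res.jac)
```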
- fitting.mcmc(params, lklh_func, bounds, func_model, data, func_args=(), func_kwargs={}, neg_lklh=True, nwalkers=6, nstep=2000, progress_bar=True)
Perform an MCMC with the emcee library (Foreman-Mackey et al. (2013)).
Parameters
- paramslist-like
Initial guess.
- lklh_funccallable function
Function returning the likelihood.
- boundstuple-like
Bounds of the parameters. The shape is ((min1, max1), …, (minN, maxN)) .
- func_modelcallable function
Function of the model to fit.
- datand-array
Data to fit.
- func_argstuple, optional
Tuple of arguments to pass to func_model. The default is ().
- func_kwargsdict, optional
Dictionary of keywords to pass to func_model. The default is {}.
- neg_lklhbool, optional
Indicates whether the likelihood function returns a negative value. The default is True.
- nwalkersint, optional
Number of walkers to use in the MCMC. The default is 6.
- nstepint, optional
Number of steps for the walkers. The default is 2000.
- progress_barbool, optional
Display the progress bar. The default is True.
Returns
- samplesnd-array
Samples from the MCMC algorithm. The shape is (nwalkers, nstep).
- flat_samples1d-array
Flattened chains with the burn-in already discarded. The burn-in length is defined as min(nstep//10, 600).
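The burn-in rule can be illustrated on a synthetic chain, assuming NumPy. The array below is a random stand-in for a sampler's output, and its (nstep, nwalkers, ndim) layout is an assumption made for this sketch only.

```python
import numpy as np

nstep, nwalkers, ndim = 2000, 6, 2
rng = np.random.default_rng(0)
# Synthetic stand-in for a chain; layout (nstep, nwalkers, ndim) is assumed.
chain = rng.normal(size=(nstep, nwalkers, ndim))

burn_in = min(nstep // 10, 600)       # rule quoted above
# Drop the burn-in steps, then merge all walkers into one list of samples.
flat_samples = chain[burn_in:].reshape(-1, ndim)
```

With the default nstep of 2000, the rule discards the first 200 steps of every walker.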
- fitting.minimize_fit(cost_func, func_model, p0, xdata, ydata, yerr=None, bounds=None, hessian_method='backward', func_args=(), func_kwargs={})
Wrapper using scipy.optimize.minimize with the Powell algorithm.
Parameters
- cost_funcfunction
cost function.
- func_modelfunction
model of the instrument.
- p0array-like
Initial guess on the parameters to fit.
- xdataarray-like
abscissa of the data to fit.
- ydataarray-like
Dataset to fit (could be a histogram).
- yerrarray-like, optional
uncertainties on the data. The default is None.
- boundsarray-like, optional
Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…). The default is None.
- hessian_method: string, optional
Can accept ‘central’, ‘forward’, ‘backward’. Sometimes numdifftools returns a Hessian matrix containing NaN. The reason is unknown. Changing the method can solve it. The default is ‘backward’. More info on https://numdifftools.readthedocs.io/en/v0.9.41/reference/generated/numdifftools.core.Hessian.html
- func_argslist-like, optional
Arguments to pass to func_model. The default is ().
- func_kwargsdic-like, optional
Keywords to pass to func_model. The default is {}.
Returns
- poptarray
Best fitted values.
- pcov2D-array
Covariance matrix.
- resdic
Complete return of the scipy.optimize.minimize function.
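A minimal sketch of a bounded Powell minimisation with scipy.optimize.minimize, assuming a SciPy version in which Powell accepts bounds (1.5 or later). The toy model `line` and the cost `chi2_cost` are hypothetical; the Hessian-based covariance estimate mentioned above is not shown.

```python
import numpy as np
from scipy.optimize import minimize

def line(x, a, b):
    return a * x + b

def chi2_cost(p, x, y, yerr):
    """Chi-squared cost of the toy model."""
    return np.sum(((y - line(x, *p)) / yerr) ** 2)

x = np.linspace(0.0, 10.0, 50)
y = line(x, 2.0, 1.0)            # noise-free toy data
yerr = np.ones_like(x)

res = minimize(chi2_cost, x0=(1.0, 0.0), args=(x, y, yerr),
               method="Powell", bounds=((0.0, 5.0), (-3.0, 3.0)))
popt = res.x
```

Powell is derivative-free, which is convenient when the cost function is noisy or not differentiable, but it provides no covariance estimate by itself.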
- fitting.ramanujan(n)
Ramanujan approximation to calculate the factorial of an integer. Works very well for any integer >= 2. https://en.wikipedia.org/wiki/Stirling%27s_approximation
Parameters
- nint or array
Value to calculate its factorial.
Returns
- ramafloat or array
Factorial of n.
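Ramanujan's formula, as given on the linked page, reads \(n! \approx \sqrt{\pi}\,(n/e)^n\,(8n^3 + 4n^2 + n + 1/30)^{1/6}\). A sketch for scalar input follows; `ramanujan_factorial` is a hypothetical name, and whether the module returns the factorial or its logarithm is not checked here.

```python
import math

def ramanujan_factorial(n):
    """Ramanujan's approximation of n! for a scalar n."""
    poly = 8 * n**3 + 4 * n**2 + n + 1 / 30
    return math.sqrt(math.pi) * (n / math.e) ** n * poly ** (1 / 6)

approx_5 = ramanujan_factorial(5)     # close to 5! = 120
approx_10 = ramanujan_factorial(10)   # close to 10! = 3628800
```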
- fitting.rescaling(func, rescale_factor)
Rescale a function by a constant. It performs tempering in MCMC, i.e. it smooths or sharpens the log-likelihood function. Indeed, if the log-likelihood decreases by 1 unit, the event is about 2.7 times less likely to happen. Some log-likelihood functions need to be tempered before being explored by an MCMC algorithm.
Note: the width of the posterior scales as \(1 / \sqrt{\text{tempering factor}}\).
Parameters
- funccallable
Function to rescale.
- rescale_factorfloat
Scale factor.
Returns
- callable
Rescaled function.
>>> rescaling(log_chi2, -2 / ddof) # Returns a reduced chi2 cost function
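The wrapping itself can be sketched with a closure. The names `rescale` and `toy_log_likelihood` are hypothetical and the module's implementation may differ.

```python
def rescale(func, factor):
    """Return a callable computing factor * func(*args, **kwargs)."""
    def rescaled(*args, **kwargs):
        return factor * func(*args, **kwargs)
    return rescaled

def toy_log_likelihood(x):
    # Gaussian log-likelihood of unit width, up to a constant.
    return -0.5 * x**2

tempered = rescale(toy_log_likelihood, 0.25)
value = tempered(2.0)
```

With a factor of 0.25 the tempered function equals -0.5 * (x / 2)**2, a Gaussian twice as wide, consistent with a width scaling of one over the square root of the factor.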
- fitting.return_neg_func(func)
Return a callable which is the negative of a function: f(x) -> -f(x).
It can be used to create a callable cost function one wants to minimize (e.g. \(\chi^2\) estimator).
Parameters
- funccallable
function to return the negative version.
Returns
- callable
negative version of the function.
- fitting.wrap_residuals(func, ydata, transform)
Calculate the residuals between data points and the model.
Parameters
- funccallable
function of the model.
- ydataarray-like
Data to calculate the residuals with func.
- transformNone-type or array-like
Transform the residuals (e.g. weight by the uncertainties on ydata).
Returns
- array-like
Residuals.