Fitting

This module contains the fitting functions (chi-squared and likelihood estimators, MCMC, parameter-space exploration).

fitting.basin_hoppin_values(init_guess, std_guess, bounds_guess)

Create several initial guesses.

Create as many initial guesses as there are basin-hopping iterations to run. The guesses are drawn from a Normal distribution with location init_guess and scale std_guess.

Parameters

init_guess (list-like): List of free parameters to fit. The shape must be (N,) or equivalent, N being the number of parameters.
std_guess (list-like): List of scale factors used to generate guesses from Normal distributions. The shape must be (N,) or equivalent, N being the number of parameters.
bounds_guess (list of tuple): Minimum and maximum values the guesses can take. The shape must be (N,2) or equivalent, N being the number of parameters.

Returns

new_init_guess (array-like): New set of guesses. Array of shape (N,), with N the number of parameters.
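The drawing step described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name draw_initial_guess is hypothetical, and clipping out-of-range draws to the nearest bound is an assumption, since the documentation does not say how the bounds are enforced.

```python
import numpy as np


def draw_initial_guess(init_guess, std_guess, bounds_guess, rng=None):
    """Draw one new guess per parameter from N(init_guess, std_guess)."""
    rng = np.random.default_rng() if rng is None else rng
    init_guess = np.asarray(init_guess, dtype=float)
    std_guess = np.asarray(std_guess, dtype=float)
    bounds = np.asarray(bounds_guess, dtype=float)  # shape (N, 2)
    new_guess = rng.normal(loc=init_guess, scale=std_guess)
    # Assumption: draws falling outside the bounds are clipped to the nearest bound.
    return np.clip(new_guess, bounds[:, 0], bounds[:, 1])


# One guess per basin-hopping iteration (5 iterations here).
rng = np.random.default_rng(seed=1)
guesses = [
    draw_initial_guess([0.5, 100.0], [0.1, 20.0], [(0.0, 1.0), (50.0, 150.0)], rng)
    for _ in range(5)
]
```

Redrawing until the value falls inside the bounds would be an alternative to clipping; it avoids piling guesses up on the boundaries.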

fitting.calculate_chi2(params, data, func_model, *args, **kwargs)

DEPRECATED. Calculate a chi-squared. It can be used by an optimizer. The chi-squared is calculated from the model function func_model or from a pre-calculated model (see the keywords below).

Parameters

params (array): Guess of the parameters.
data (nd-array): Data to fit.
func_model (callable function): Model used to fit the data (e.g. model of the histogram).
*args (list-like): Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, then the arguments of func_model.
**kwargs (keywords): Accepted keywords are use_this_model, to use a predefined model of the data, and any keyword to pass to func_model.

Returns

chi2 (float): Chi-squared.

fitting.check_init_guess(guess, bound)

Check that the initial guesses in the config file lie between the bounds of the parameters to fit.

Parameters

guess (list-like or array): Values of the initial guess.
bound (list-like or array): Boundaries, of shape (…, 2), given as (lower bound, upper bound) pairs.

Returns

check (bool): True if the initial guess is not between the bounds.
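A minimal sketch of such a check is given below. The name is_out_of_bounds is hypothetical, and treating the bounds as inclusive is an assumption. Note the return convention taken from the description above: True signals a problem.

```python
import numpy as np


def is_out_of_bounds(guess, bound):
    """Return True if at least one guess lies outside its (lower, upper) bounds."""
    guess = np.asarray(guess, dtype=float)
    bound = np.asarray(bound, dtype=float)
    lower, upper = bound[..., 0], bound[..., 1]
    # Assumption: the bounds are inclusive.
    return bool(np.any((guess < lower) | (guess > upper)))


ok_guess = is_out_of_bounds([0.5, 2.0], [(0.0, 1.0), (1.0, 3.0)])
bad_guess = is_out_of_bounds([1.5, 2.0], [(0.0, 1.0), (1.0, 3.0)])
```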

fitting.chi2_pearson(params, data, func_model, *args, **kwargs)

Pearson’s chi-squared test. This estimator is relevant for fitting histograms assuming the number of elements in the bins follows a multinomial distribution.

The number of degrees of freedom is defined as: \(N_{bins} - (N_{params} + 1)\)

More info: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test

Parameters

params (array): Guess of the parameters.
data (nd-array): Data to fit.
func_model (callable function): Model used to fit the data (e.g. model of the histogram).
*args (list-like): Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, then the arguments of func_model.
**kwargs (keywords): Accepted keywords are use_this_model, to use a predefined model of the data, and any keyword to pass to func_model.

Returns

chi2 (float): Pearson’s chi-squared.
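As a worked illustration of the statistic and of the degrees of freedom defined above, the sketch below computes \(\sum (O_i - E_i)^2 / E_i\) for a four-bin histogram. The helper name pearson_chi2 and the numbers are illustrative only. Whether the module's function normalises by the degrees of freedom is not stated here, so the division is done explicitly.

```python
import numpy as np


def pearson_chi2(observed, expected):
    """Pearson's statistic: sum over bins of (observed - expected)^2 / expected."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))


observed = np.array([18, 32, 27, 23])          # counts per bin
expected = np.array([25.0, 25.0, 25.0, 25.0])  # model counts per bin
n_params = 1                                   # number of fitted parameters

chi2 = pearson_chi2(observed, expected)        # (49 + 49 + 4 + 4) / 25
ddof = observed.size - (n_params + 1)          # N_bins - (N_params + 1)
reduced_chi2 = chi2 / ddof
```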

fitting.explore_parameter_space(cost_fun, histo_data, param_bounds, param_sz, xbins, parameters_labels, wl_scale0, instrument_model, instrument_args, rvu_forfit, cdfs, rvus, histo_err=None, **kwargs)

Explore the parameter space with a chosen cost function (chi2, likelihood…).

Parameters

cost_fun (function): Cost function to use for model fitting.
histo_data (nd-array): Histograms of the data.
param_bounds (nested tuple-like): Nested tuple of the parameter bounds of the form ((min1, max1), (min2, max2)…).
param_sz (list): Number of points used to sample each parameter axis.
xbins (nd-array): Bin axes of the histogram.
wl_scale0 (1d-array): Wavelength scale.
instrument_model (function): Function simulating the instrument and noises.
instrument_args (tuple): Arguments for the instrument_model function.
rvu_forfit (list of two arrays): List of uniform random values used to generate normally distributed values with the fitting parameters mu and sig.
cdfs (list of arrays): List of the CDFs used to reproduce the statistics of the noises. There are as many sequences as noise sources to simulate.
rvus (list of arrays): List of uniform random values used to generate random values reproducing the statistics of the noises. There are as many sequences as noise sources to simulate.
histo_err (TYPE, optional): DESCRIPTION. The default is None.
**kwargs (keywords): Keywords to pass to the create_histogram_model function.

Returns

chi2map (nd-array): Datacube containing the values of the cost function and the tested parameters.
param_axes (TYPE): DESCRIPTION.
steps (array): Steps used to sample the parameter axes.
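The sampling idea behind param_bounds, param_sz and steps can be sketched as a plain grid scan. Everything below is an assumption made for illustration: the helper names build_grid and scan are hypothetical, the axes are taken as evenly spaced with both endpoints included, and the toy cost function stands in for a real histogram-based cost.

```python
import itertools

import numpy as np


def build_grid(param_bounds, param_sz):
    """Build one evenly spaced axis per parameter and return the axes and steps."""
    axes, steps = [], []
    for (low, high), size in zip(param_bounds, param_sz):
        axis, step = np.linspace(low, high, size, retstep=True)
        axes.append(axis)
        steps.append(step)
    return axes, np.array(steps)


def scan(cost, axes):
    """Evaluate the cost function on every node of the grid."""
    shape = tuple(len(axis) for axis in axes)
    cost_map = np.empty(shape)
    for idx in itertools.product(*(range(n) for n in shape)):
        params = [axis[i] for axis, i in zip(axes, idx)]
        cost_map[idx] = cost(params)
    return cost_map


def toy_cost(params):
    """Hypothetical cost with its minimum at mu=1.0, sig=0.5."""
    mu, sig = params
    return (mu - 1.0) ** 2 + (sig - 0.5) ** 2


axes, steps = build_grid(((0.0, 2.0), (0.0, 1.0)), (5, 3))
cost_map = scan(toy_cost, axes)
best_index = np.unravel_index(np.argmin(cost_map), cost_map.shape)
best_params = [axis[i] for axis, i in zip(axes, best_index)]
```

The location of the minimum of such a map is a natural initial guess for a subsequent local fit.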

fitting.log_chi2(params, data, func_model, *args, **kwargs)

Log-likelihood of Normally distributed data. In a model-fitting context, this function is maximised by the optimal parameters. It does not directly give the reduced \(\chi^2\): one must first multiply it by two, divide by the number of degrees of freedom, and flip the sign.

Parameters

params (array): Guess of the parameters.
data (nd-array): Data to fit.
func_model (callable function): Model used to fit the data (e.g. model of the histogram).
*args (list-like): Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, then the arguments of func_model.
**kwargs (keywords): Accepted keywords are use_this_model, to use a predefined model of the data, and any keyword to pass to func_model.

Returns

chi2 (float): Half chi-squared.
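The conversion from this log-likelihood to a reduced \(\chi^2\) can be checked on a small example. The sketch assumes the constant term of the Gaussian log-likelihood is dropped, so that \(\ln L = -\chi^2 / 2\), and that the number of degrees of freedom is the number of points minus the number of parameters; both are assumptions, as is the helper name.

```python
import numpy as np


def gaussian_log_likelihood(data, model, sigma):
    """Gaussian log-likelihood without its parameter-independent constant: -chi2 / 2."""
    return float(-0.5 * np.sum(((data - model) / sigma) ** 2))


data = np.array([1.0, 2.0, 3.0, 4.0])
model = np.array([1.1, 1.9, 3.2, 3.9])
sigma = np.full(4, 0.1)
n_params = 2

log_lklh = gaussian_log_likelihood(data, model, sigma)  # about -3.5
ddof = data.size - n_params                             # assumed definition
# Multiply by two, divide by the degrees of freedom, flip the sign.
reduced_chi2 = -2.0 * log_lklh / ddof                   # about 3.5
```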

fitting.log_multinomial(params, data, func_model, *args, **kwargs)

Likelihood of a dataset following a multinomial distribution (e.g. number of occurrences in the bins of a histogram).

Parameters

params (array): Parameters to fit.
data (array): Data to fit.
func_model (function): Model of the data.
*args (function arguments): Extra arguments to pass to this function and to func_model. The first argument must be the values of the x-axis of the dataset. If func_model takes any keyword, they must be passed in a dictionary in the last position in *args.
**kwargs (keyword arguments): Accepted keywords are use_this_model (array), which uses the values from a model generated outside this function instead of calling func_model, and any keyword to pass to func_model.

Returns

float: Log of the likelihood. The negative is returned so that minimization algorithms work.
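For reference, the multinomial log-likelihood of counts \(k_i\) with bin probabilities \(p_i\) is \(\ln n! - \sum_i \ln k_i! + \sum_i k_i \ln p_i\), with \(n = \sum_i k_i\). The sketch below evaluates it with math.lgamma for the factorials, which is a choice made for this illustration and may differ from how the module computes them. The helper name is hypothetical, and all probabilities are assumed strictly positive.

```python
import math


def multinomial_log_likelihood(counts, probabilities):
    """ln n! - sum(ln k_i!) + sum(k_i ln p_i), using lgamma(k + 1) = ln k!."""
    n_total = sum(counts)
    log_lklh = math.lgamma(n_total + 1)
    for k, p in zip(counts, probabilities):
        log_lklh += k * math.log(p) - math.lgamma(k + 1)
    return log_lklh


counts = [2, 1, 1]                 # occurrences per histogram bin
probabilities = [0.5, 0.25, 0.25]  # model probability of each bin

log_lklh = multinomial_log_likelihood(counts, probabilities)
cost = -log_lklh                   # sign flipped for use with a minimizer
```

Here the likelihood is 4!/(2! 1! 1!) × 0.5² × 0.25 × 0.25 = 0.1875, so the cost is the negative logarithm of that value.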

fitting.log_posterior(params, lklh_func, bounds, func_model, data, func_args=(), func_kwargs={})

Log of the posterior probability of the parameters given the data.

Parameters

params (list-like): List of parameters.
lklh_func (function): Likelihood function to use.
bounds (array-like): Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…).
func_model (function): Function of the model which reproduces the data to fit (e.g. histogram).
data (array of size (N,) or (nb wl, N)): Data to fit.
func_args (list-like, optional): Arguments to pass to func_model. The default is ().
func_kwargs (dict-like, optional): Keywords to pass to func_model. The default is {}.
neg_lklh (bool, optional): Change the sign of the value of the likelihood. If True, the likelihood returned by lklh_func is negative, so its sign must be reversed. The default is True.

Returns

log_posterior (float): Value of the posterior.
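A common way to build such a posterior is to add the log-prior to the log-likelihood and to short-circuit when the parameters fall outside the bounds. The sketch below follows that pattern; it is not the module's code. The names uniform_log_prior, toy_log_posterior and toy_log_likelihood are hypothetical, and the prior convention (0 inside the bounds, minus infinity outside) is an assumption.

```python
import math


def uniform_log_prior(params, bounds):
    """Assumed convention: 0 inside the box defined by bounds, -inf outside."""
    for value, (low, high) in zip(params, bounds):
        if not low <= value <= high:
            return -math.inf
    return 0.0


def toy_log_posterior(params, log_lklh_func, bounds):
    """Log-posterior = log-prior + log-likelihood, up to a constant."""
    log_prior = uniform_log_prior(params, bounds)
    if not math.isfinite(log_prior):
        # Skip the (possibly costly) likelihood outside the allowed region.
        return -math.inf
    return log_prior + log_lklh_func(params)


def toy_log_likelihood(params):
    """Hypothetical Gaussian log-likelihood peaking at mu = 1."""
    (mu,) = params
    return -0.5 * (mu - 1.0) ** 2


inside = toy_log_posterior([1.5], toy_log_likelihood, [(0.0, 2.0)])
outside = toy_log_posterior([3.0], toy_log_likelihood, [(0.0, 2.0)])
```

If the likelihood function returns the negative log-likelihood, as a cost function for a minimizer would, its sign has to be flipped before the sum.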

fitting.log_prior_uniform(params, bounds)

Uniform prior on a set of parameters to fit.

Parameters

params (array of size (N,)): Parameters to fit.
bounds (array-like): Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…).

Returns

float: Value of the prior.

fitting.lstsqrs_fit(func_model, p0, xdata, ydata, yerr=None, bounds=None, diff_step=None, x_scale=1, func_args=(), func_kwargs={})

Fit the data with the least-squares algorithm, taking the boundaries into account.

The function uses the scipy.optimize.least_squares function (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares).

The cost function uses the “huber” loss, hence the cost value is not a chi-squared. The boundaries are handled by the Trust Region Reflective algorithm.

Parameters

func_model (function): Function to fit the data.
p0 (tuple-like): Initial guess.
xdata (1d-array): Flattened array of the x-axis of the data.
ydata (1d-array): Flattened array of the data.
yerr (1d-array, optional): Flattened array of the data error. The default is None.
bounds (tuple-like, optional): Tuple-like of shape ((min1, max1), …, (minN, maxN)). The default is None.
diff_step (list, optional): Determines the relative step size for the finite difference approximation of the Jacobian. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares. The default is None.
x_scale (TYPE, optional): Characteristic scale of each variable. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares. The default is 1.
func_args (tuple, optional): Tuple of arguments to pass to the cost function. The default is ().
func_kwargs (dict, optional): Dictionary of arguments to pass to the cost function. The default is {}.

Returns

popt (list): List of optimised parameters.
pcov (2d-array): Covariance matrix of the optimised parameters.
res (OptimizeResult): Full output of the fitting algorithm.
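The underlying SciPy call can be sketched directly, assuming SciPy is installed. The example fits a straight line with scipy.optimize.least_squares using the Huber loss and the Trust Region Reflective method. The model, the data and the covariance estimate (inverse of \(J^T J\) at the solution) are assumptions made for this illustration; how the wrapper builds pcov is not stated here.

```python
import numpy as np
from scipy.optimize import least_squares


def line(x, slope, intercept):
    """Hypothetical model: a straight line."""
    return slope * x + intercept


def residuals(params, x, y, yerr):
    """Residuals weighted by the uncertainties on the data."""
    return (y - line(x, *params)) / yerr


x = np.linspace(0.0, 10.0, 21)
y = line(x, 2.0, 1.0)          # noise-free data for a reproducible result
yerr = np.full_like(x, 0.5)

# SciPy expects (lower_bounds, upper_bounds), so the ((min1, max1), ...) layout
# used in this documentation is transposed first.
param_bounds = ((0.0, 5.0), (-5.0, 5.0))
lower, upper = np.transpose(param_bounds)

res = least_squares(residuals, x0=(1.0, 0.0), bounds=(lower, upper),
                    method="trf", loss="huber", args=(x, y, yerr))
popt = res.x
# Assumption: Gauss-Newton estimate of the covariance from the Jacobian.
pcov = np.linalg.inv(res.jac.T @ res.jac)
```

Because of the Huber loss, res.cost is not half a chi-squared when some weighted residuals exceed the loss threshold.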

fitting.mcmc(params, lklh_func, bounds, func_model, data, func_args=(), func_kwargs={}, neg_lklh=True, nwalkers=6, nstep=2000, progress_bar=True)

Run an MCMC with the emcee library (Foreman-Mackey et al. (2013)).

Parameters

params (list-like): Initial guess.
lklh_func (callable function): Function returning the likelihood.
bounds (tuple-like): Bounds of the parameters. The shape is ((min1, max1), …, (minN, maxN)).
func_model (callable function): Function of the model to fit.
data (nd-array): Data to fit.
func_args (tuple, optional): Tuple of arguments to pass to func_model. The default is ().
func_kwargs (dict, optional): Dictionary of keywords to pass to func_model. The default is {}.
neg_lklh (bool, optional): Indicates if the likelihood function returns a negative value. The default is True.
nwalkers (int, optional): Number of walkers to use in the MCMC. The default is 6.
nstep (int, optional): Number of steps for the walkers. The default is 2000.
progress_bar (bool, optional): Display the progress bar. The default is True.

Returns

samples (nd-array): Samples from the MCMC algorithm. The shape is (nwalkers, nstep).
flat_samples (1d-array): Flattened chains with the burn-in already discarded. The burn-in length is defined as min(nstep//10, 600).
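The burn-in rule above can be illustrated on a fake chain, without running emcee. The array layout (nwalkers, nstep, ndim) extends the documented (nwalkers, nstep) shape with a trailing parameter axis, and keeping one column per parameter after flattening is also a choice of this sketch; both are assumptions, and the random numbers only stand in for real samples.

```python
import numpy as np

nwalkers, nstep, ndim = 6, 2000, 2

# Fake chain standing in for the sampler output (assumed layout, see above).
rng = np.random.default_rng(seed=0)
chain = rng.normal(size=(nwalkers, nstep, ndim))

# Burn-in: a tenth of the steps, capped at 600.
burn_in = min(nstep // 10, 600)

# Drop the first burn_in steps of every walker, then merge the walkers.
kept = chain[:, burn_in:, :]
flat_samples = kept.reshape(-1, ndim)
```

With the default nstep of 2000 the burn-in is 200 steps; the cap of 600 only applies for runs longer than 6000 steps.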

fitting.minimize_fit(cost_func, func_model, p0, xdata, ydata, yerr=None, bounds=None, hessian_method='backward', func_args=(), func_kwargs={})

Wrapper around scipy.optimize.minimize using the Powell algorithm.

Parameters

cost_func (function): Cost function.
func_model (function): Model of the instrument.
p0 (array-like): Initial guess on the parameters to fit.
xdata (array-like): Abscissa of the data to fit.
ydata (array-like): Dataset to fit (could be a histogram).
yerr (array-like, optional): Uncertainties on the data. The default is None.
bounds (array-like, optional): Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…). The default is None.
hessian_method (string, optional): Accepts ‘central’, ‘forward’ or ‘backward’. Sometimes numdifftools returns a Hessian matrix containing NaN, for an unknown reason; changing the method can solve it. The default is ‘backward’. More info on https://numdifftools.readthedocs.io/en/v0.9.41/reference/generated/numdifftools.core.Hessian.html
func_args (list-like, optional): Arguments to pass to func_model. The default is ().
func_kwargs (dict-like, optional): Keywords to pass to func_model. The default is {}.

Returns

popt (array): Best fitted values.
pcov (2D-array): Covariance matrix.
res (dict): Complete return of the scipy.optimize.minimize function.
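For orientation, a direct call to scipy.optimize.minimize with the Powell method looks like the sketch below, assuming a SciPy version in which Powell accepts bounds. The chi-squared cost and the straight-line model are hypothetical stand-ins for cost_func and func_model. The Hessian-based covariance computed with numdifftools is deliberately left out of this illustration.

```python
import numpy as np
from scipy.optimize import minimize


def line(x, slope, intercept):
    """Hypothetical model: a straight line."""
    return slope * x + intercept


def chi2_cost(params, x, y, yerr):
    """Hypothetical cost function: chi-squared of the model against the data."""
    return float(np.sum(((y - line(x, *params)) / yerr) ** 2))


x = np.linspace(0.0, 10.0, 21)
y = line(x, 2.0, 1.0)          # noise-free data for a reproducible result
yerr = np.full_like(x, 0.5)

# Powell is derivative-free; bounds use the ((min1, max1), ...) layout.
res = minimize(chi2_cost, x0=(1.0, 0.0), args=(x, y, yerr),
               method="Powell", bounds=((0.0, 5.0), (-5.0, 5.0)))
popt = res.x
```

Since Powell does not use derivatives, it returns no Hessian; the covariance therefore has to come from a separate numerical Hessian evaluated at popt.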

fitting.ramanujan(n)

Ramanujan’s approximation of the factorial of an integer. It works very well for any integer >= 2. https://en.wikipedia.org/wiki/Stirling%27s_approximation

Parameters

n (int or array): Value whose factorial is calculated.

Returns

rama (float or array): Factorial of n.
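Ramanujan's formula reads \(n! \approx \sqrt{\pi}\,(n/e)^n\,(8n^3 + 4n^2 + n + 1/30)^{1/6}\). The sketch below evaluates it for scalars and compares it with the exact factorial. The function name is hypothetical, and whether the module works on the factorial itself or on its logarithm is not covered here.

```python
import math


def ramanujan_factorial(n):
    """Ramanujan's approximation of n! for a non-negative number n."""
    correction = (8 * n**3 + 4 * n**2 + n + 1 / 30) ** (1 / 6)
    return math.sqrt(math.pi) * (n / math.e) ** n * correction


# Relative error against the exact factorial for a few integers.
relative_errors = {
    n: abs(ramanujan_factorial(n) - math.factorial(n)) / math.factorial(n)
    for n in (2, 5, 10, 20)
}
```

For large n the power (n/e)^n overflows a float, so summing logarithms is the safer route when the result feeds a log-likelihood.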

fitting.rescaling(func, rescale_factor)

Rescale a function by a constant. It performs tempering in MCMC, i.e. it smooths or sharpens the log-likelihood function. Indeed, a decrease of the log-likelihood by 1 unit means the event is about 2.7 times less likely. Some log-likelihood functions need to be tempered before being explored by an MCMC algorithm.

Note: the width of the posterior scales with the inverse square root of the tempering factor, \(1 / \sqrt{tempering~factor}\).

Parameters

func (callable): Function to rescale.
rescale_factor (float): Scale (tempering) factor.

Returns

callable: Rescaled function.

>>> rescaling(log_chi2, -2 / ddof) # Returns a reduced chi2 cost function
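Rescaling a function by a constant amounts to wrapping it in a closure. The sketch below is one way to write it, with hypothetical names (rescale, toy_log_likelihood); it reproduces the example above, where a factor of -2 / ddof turns a log-likelihood equal to \(-\chi^2/2\) into a reduced \(\chi^2\).

```python
def rescale(func, factor):
    """Return a callable computing factor * func(*args, **kwargs)."""
    def rescaled(*args, **kwargs):
        return factor * func(*args, **kwargs)
    return rescaled


def toy_log_likelihood(mu):
    """Hypothetical log-likelihood, -chi2 / 2 for one datum at 1.0 with unit error."""
    return -0.5 * (mu - 1.0) ** 2


ddof = 4
reduced_chi2 = rescale(toy_log_likelihood, -2 / ddof)
value = reduced_chi2(3.0)   # chi2 = 4, divided by ddof = 4
```

For a Gaussian log-likelihood \(-x^2 / (2\sigma^2)\), multiplying by a positive factor T gives \(-x^2 / (2\sigma^2 / T)\), which is why the width scales as \(1 / \sqrt{T}\).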

fitting.return_neg_func(func)

Return a callable which is the negative of a function: f(x) -> -f(x).

It can be used to create a callable cost function one wants to minimize (e.g. \(\chi^2\) estimator).

Parameters

func (callable): Function to negate.

Returns

callable: Negative version of the function.

fitting.wrap_residuals(func, ydata, transform)

Calculate the residuals between data points and the model.

Parameters

func (callable): Function of the model.
ydata (array-like): Data to calculate the residuals with func.
transform (None-type or array-like): Transform the residuals (e.g. weight by the uncertainties on ydata).

Returns

array-like: Residuals.