Fitting

This module contains functions for model fitting (chi-squared, likelihood, MCMC, parameter-space exploration).

fitting.basin_hoppin_values(init_guess, std_guess, bounds_guess)

Create several initial guesses.

Create as many initial guesses as there are basin-hopping iterations to perform. The guesses are drawn from a normal distribution with location init_guess and scale std_guess.

Parameters

init_guess : list-like

Initial values of the free parameters to fit. The shape must be (N,) or equivalent, N being the number of parameters.

std_guess : list-like

List of scale factors of the normal distributions the guesses are drawn from. The shape must be (N,) or equivalent, N being the number of parameters.

bounds_guess : list of tuple

Minimum and maximum values the guesses can take. The shape must be (N,2) or equivalent, N being the number of parameters.

Returns

new_init_guess : array-like

New set of guesses. Array of shape (N,), with N the number of parameters.
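
A minimal NumPy sketch of how one such guess could be drawn. The name basin_hopping_guess is hypothetical, and so is the handling of the bounds (out-of-bound components are redrawn, then clipped as a last resort): the documentation above does not say how values outside bounds_guess are treated.

```python
import numpy as np

def basin_hopping_guess(init_guess, std_guess, bounds_guess, rng=None, max_tries=1000):
    """Draw one new guess from N(init_guess, std_guess) inside bounds_guess.

    Hypothetical sketch: out-of-bound components are redrawn (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    loc = np.asarray(init_guess, dtype=float)
    scale = np.asarray(std_guess, dtype=float)
    bounds = np.asarray(bounds_guess, dtype=float)  # shape (N, 2)
    lower, upper = bounds[:, 0], bounds[:, 1]

    guess = rng.normal(loc, scale)
    for _ in range(max_tries):
        outside = (guess < lower) | (guess > upper)
        if not outside.any():
            return guess
        # Redraw only the components that fell outside their bounds
        guess[outside] = rng.normal(loc[outside], scale[outside])
    # Fall back to clipping if redrawing keeps failing
    return np.clip(guess, lower, upper)
```

Calling it once per basin-hopping iteration, e.g. `[basin_hopping_guess([1.0, 0.5], [0.2, 0.1], [(0, 2), (0, 1)], rng) for _ in range(n_iter)]`, yields one guess per iteration.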

fitting.calculate_chi2(params, data, func_model, *args, **kwargs)

DEPRECATED. Calculate a chi-squared, e.g. as the cost function of an optimizer. The chi-squared is calculated from the model function func_model or from a pre-calculated model (see the use_this_model keyword).

Parameters

params : array

Guess of the parameters.

data : nd-array

Data to fit.

func_model : callable function

Model used to fit the data (e.g. model of the histogram).

*args : list-like

Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, the arguments of func_model.

**kwargs : keywords

Accepted keywords are: use_this_model, to use a predefined model of the data; keywords to pass to func_model.

Returns

chi2 : float

Chi-squared.
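
The ordering convention of *args can be illustrated with a hypothetical re-implementation. It assumes func_model is called as func_model(x, *params, *model_args, **kwargs) and that the result is the plain (non-reduced) sum of squared normalised residuals; neither detail is stated above.

```python
import numpy as np

def chi2_sketch(params, data, func_model, *args, **kwargs):
    """Hypothetical chi-squared following the documented *args order:
    args[0] = uncertainties, args[1] = x-axis, args[2:] = model arguments.
    """
    sigma, x, model_args = args[0], args[1], args[2:]
    model = kwargs.pop("use_this_model", None)
    if model is None:
        # Assumed calling convention; remaining keywords go to the model
        model = func_model(x, *params, *model_args, **kwargs)
    return np.sum(((data - model) / sigma) ** 2)

def line(x, a, b):
    return a * x + b
```

Passing use_this_model skips the call to func_model entirely, which is useful when the model is expensive and already computed.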

fitting.check_init_guess(guess, bound)

Check whether the initial guesses from the configuration file lie within the bounds of the parameters to fit.

Parameters

guess : list-like or array

Values of the initial guess.

bound : list-like or array

List of boundaries of shape (…, 2), i.e. (…, (lower bound, upper bound)).

Returns

check : bool

True if the initial guess is not between the bounds.
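
Note the inverted logic: True flags a problem. A hypothetical NumPy sketch of such a check, assuming inclusive bounds (the documentation does not say whether a guess equal to a bound is accepted):

```python
import numpy as np

def check_init_guess_sketch(guess, bound):
    """Return True if any guess lies outside its (lower, upper) bounds.

    Hypothetical sketch; bounds are treated as inclusive (assumption).
    """
    guess = np.asarray(guess, dtype=float)
    bound = np.asarray(bound, dtype=float)
    lower, upper = bound[..., 0], bound[..., 1]
    return bool(np.any((guess < lower) | (guess > upper)))
```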

fitting.chi2_pearson(params, data, func_model, *args, **kwargs)

Pearson’s chi-squared test. This estimator is relevant for fitting histograms assuming the number of elements in the bins follows a multinomial distribution.

The number of degrees of freedom is defined as: \(N_{bins} - (N_{params} + 1)\)

More info: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test

Parameters

params : array

Guess of the parameters.

data : nd-array

Data to fit.

func_model : callable function

Model used to fit the data (e.g. model of the histogram).

*args : list-like

Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, the arguments of func_model.

**kwargs : keywords

Accepted keywords are: use_this_model, to use a predefined model of the data; keywords to pass to func_model.

Returns

chi2 : float

Pearson’s chi-squared.
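
Pearson's statistic sums \((O_i - E_i)^2 / E_i\) over the bins, where \(O_i\) are the observed counts and \(E_i\) the expected counts. The sketch below assumes the model returns expected counts per bin; the function names are hypothetical. The extra 1 in the degrees of freedom corresponds to the constraint that the bin counts sum to the total number of elements.

```python
import numpy as np

def pearson_chi2(observed, expected):
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

def pearson_ddof(n_bins, n_params):
    """Degrees of freedom as documented: N_bins - (N_params + 1)."""
    return n_bins - (n_params + 1)
```

For observed counts [10, 20, 30] and expected counts [15, 20, 25], the statistic is 25/15 + 0 + 25/25, about 2.67.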

fitting.explore_parameter_space(cost_fun, histo_data, param_bounds, param_sz, xbins, parameters_labels, wl_scale0, instrument_model, instrument_args, rvu_forfit, cdfs, rvus, histo_err=None, **kwargs)

Explore the parameter space on a grid with a chosen cost function (chi2, likelihood…).

Parameters

cost_fun : function

Cost function to use for model fitting.

histo_data : nd-array

Histograms of the data.

param_bounds : nested tuple-like

Nested tuple of the parameter bounds of the form ((min1, max1), (min2, max2)…).

param_sz : list

Number of points to sample each parameter axis.

xbins : nd-array

Bin axes of the histogram.

wl_scale0 : 1d-array

Wavelength scale.

instrument_model : function

Function simulating the instrument and noises.

instrument_args : tuple

Arguments for the instrument_model function.

rvu_forfit : list of two arrays

List of uniform random values used to generate normally distributed values with the fitting parameters mu and sig.

cdfs : list of arrays

List of the CDFs used to reproduce the statistics of the noises. There are as many sequences as noise sources to simulate.

rvus : list of arrays

List of uniform random values used to generate random values that reproduce the statistics of the noises. There are as many sequences as noise sources to simulate.

histo_err : TYPE, optional

DESCRIPTION. The default is None.

**kwargs : keywords

Keywords to pass to the create_histogram_model function.

Returns

chi2map : nd-array

Datacube containing the value of the cost function and the tested parameters.

param_axes : TYPE

DESCRIPTION.

steps : array

Steps used to sample the parameter axes.
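
The grid exploration can be sketched as follows. This hypothetical, simplified version (explore_grid) evaluates a cost directly on each grid point. The real function also simulates the instrument and builds histograms before computing the cost, and its chi2map additionally stores the tested parameters.

```python
import itertools
import numpy as np

def explore_grid(cost_fun, param_bounds, param_sz):
    """Evaluate cost_fun on a regular grid spanning param_bounds.

    Hypothetical, simplified sketch of a grid exploration.
    """
    param_axes = [np.linspace(lo, hi, n) for (lo, hi), n in zip(param_bounds, param_sz)]
    steps = np.array([ax[1] - ax[0] if ax.size > 1 else 0.0 for ax in param_axes])
    cost_map = np.empty(tuple(param_sz))
    # Visit every combination of grid indices
    for idx in itertools.product(*(range(n) for n in param_sz)):
        point = [ax[i] for ax, i in zip(param_axes, idx)]
        cost_map[idx] = cost_fun(point)
    return cost_map, param_axes, steps
```

The best grid point is then `np.unravel_index(np.argmin(cost_map), cost_map.shape)`. The number of cost evaluations is the product of param_sz, so the grid size grows quickly with the number of parameters.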

fitting.log_chi2(params, data, func_model, *args, **kwargs)

Log-likelihood of normally distributed data. In a model-fitting context, this function is maximised by the optimal parameters. It is not directly the reduced \(\chi^2\): to obtain the latter, multiply it by two, divide by the number of degrees of freedom and take the opposite sign.

Parameters

params : array

Guess of the parameters.

data : nd-array

Data to fit.

func_model : callable function

Model used to fit the data (e.g. model of the histogram).

*args : list-like

Extra arguments, in this order: the uncertainties (same shape as data), the x-axis, the arguments of func_model.

**kwargs : keywords

Accepted keywords are: use_this_model, to use a predefined model of the data; keywords to pass to func_model.

Returns

chi2 : float

Minus half the chi-squared, \(-\chi^2/2\).
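
The relation between this log-likelihood and the reduced \(\chi^2\) can be checked numerically. The two functions below are hypothetical and take the model values directly instead of calling func_model.

```python
import numpy as np

def log_chi2_sketch(model, data, sigma):
    """Gaussian log-likelihood up to an additive constant: -chi2 / 2."""
    chi2 = np.sum(((np.asarray(data) - np.asarray(model)) / np.asarray(sigma)) ** 2)
    return -0.5 * chi2

def reduced_chi2_from_loglike(log_like, ddof):
    """Multiply by two, divide by the degrees of freedom, flip the sign."""
    return -2.0 * log_like / ddof
```

With data [1, 2, 3], model [1.5, 2, 2] and uncertainties [0.5, 1, 1], the normalised residuals are [-1, 0, 1], so \(\chi^2 = 2\), the log-likelihood is -1 and the reduced \(\chi^2\) for one degree of freedom is 2.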

fitting.log_multinomial(params, data, func_model, *args, **kwargs)

Log-likelihood of a dataset following a multinomial distribution (e.g. the number of occurrences in the bins of a histogram).

Parameters

params : array

Parameters to fit.

data : array

Data to fit.

func_model : function

Model of the data.

*args : function arguments

Extra arguments to pass to this function and func_model. The first argument must be the values of the x-axis of the dataset. If func_model takes any keyword, they must be passed in a dictionary in the last position of *args.

**kwargs : keyword arguments
Keywords accepted:
  • use_this_model (array): uses the values from a model generated outside this function instead of calling func_model

  • Keywords to pass to func_model

Returns

float

Negative log-likelihood. The negative is returned so that minimisation algorithms can be used.
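
The multinomial log-likelihood is \(\ln(n!) - \sum_i \ln(k_i!) + \sum_i k_i \ln p_i\), with \(k_i\) the counts, \(n\) their sum and \(p_i\) the bin probabilities. The sketch below is hypothetical: it assumes the model gives expected counts or probabilities that can be normalised, and it uses math.lgamma for the log-factorials where the module may instead rely on its ramanujan approximation.

```python
import math
import numpy as np

def neg_log_multinomial(counts, model):
    """Negative multinomial log-likelihood of histogram counts (sketch).

    counts : observed number of elements per bin (integers)
    model  : expected counts or probabilities per bin (normalised here)
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(model, dtype=float)
    probs = probs / probs.sum()
    n_total = counts.sum()
    # log(n! / prod(k_i!)) with log-gamma: log(k!) = lgamma(k + 1)
    log_coeff = math.lgamma(n_total + 1) - sum(math.lgamma(k + 1) for k in counts)
    # Empty bins contribute 0 (avoids 0 * log(0) = nan)
    mask = counts > 0
    log_like = log_coeff + np.sum(counts[mask] * np.log(probs[mask]))
    return -log_like
```

For counts [1, 2, 1] and probabilities [0.25, 0.5, 0.25], the likelihood is 12 × 0.25 × 0.5² × 0.25 = 0.1875.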

fitting.log_posterior(params, lklh_func, bounds, func_model, data, func_args=(), func_kwargs={})

Log-posterior of the parameters given the data.

Parameters

params : list-like

List of parameters.

lklh_func : function

Likelihood function to use.

bounds : array-like

Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…).

func_model : function

Function of the model which reproduces the data to fit (e.g. histogram).

data : array of size (N,) or (nb wl, N)

Data to fit.

func_args : list-like, optional

Arguments to pass to func_model. The default is ().

func_kwargs : dict-like, optional

Keywords to pass to func_model. The default is {}.

neg_lklh : bool, optional

Change the sign of the value of the likelihood. If True, lklh_func returns the negative of the likelihood, so its sign must be reversed. The default is True.

Returns

log_posterior : float

Value of the posterior.
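
The log-posterior is the sum of the log-prior and the log-likelihood. In this hypothetical sketch, func_model, data and their arguments are assumed to be folded into lklh_func, and a flat prior (0 inside the bounds, -inf outside) is assumed.

```python
import numpy as np

def log_posterior_sketch(params, lklh_func, bounds, neg_lklh=True):
    """log posterior = log prior + log likelihood (hypothetical sketch)."""
    params = np.asarray(params, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    if np.any(params < bounds[:, 0]) or np.any(params > bounds[:, 1]):
        # Zero prior probability: skip the (possibly costly) likelihood
        return -np.inf
    log_like = lklh_func(params)
    if neg_lklh:
        log_like = -log_like  # lklh_func returned -log L, flip it back
    return log_like
```

Returning -inf outside the bounds is the usual way to make an emcee walker reject such a proposal.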

fitting.log_prior_uniform(params, bounds)

Uniform prior on a set of parameters to fit.

Parameters

params : array of size (N,)

Parameters to fit.

bounds : array-like

Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…).

Returns

float

Value of the prior.
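
A hypothetical sketch of a uniform log-prior. Whether the module normalises the prior is not documented. This version returns the normalised value \(-\sum_i \ln(\max_i - \min_i)\) inside the bounds and -inf outside. The constant does not change MCMC results, so returning 0 inside works as well.

```python
import numpy as np

def log_prior_uniform_sketch(params, bounds):
    """Log of a normalised uniform prior (hypothetical sketch)."""
    params = np.asarray(params, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    lower, upper = bounds[:, 0], bounds[:, 1]
    if np.all((params >= lower) & (params <= upper)):
        return -np.sum(np.log(upper - lower))
    return -np.inf
```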

fitting.lstsqrs_fit(func_model, p0, xdata, ydata, yerr=None, bounds=None, diff_step=None, x_scale=1, func_args=(), func_kwargs={})

Fit the data with a least-squares algorithm, taking the bounds into account.

The function uses the scipy.optimize.least_squares function (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares).

The cost function uses the “huber” loss, hence the cost value is not a chi-squared. The bounds are handled by the Trust Region Reflective algorithm (method 'trf' of scipy.optimize.least_squares).

Parameters

func_model : function

Function to fit the data.

p0 : tuple-like

Initial guess.

xdata : 1d-array

Flattened array of the x-axis of the data.

ydata : 1d-array

Flattened array of the data.

yerr : 1d-array, optional

Flattened array of the data errors. The default is None.

bounds : tuple-like, optional

Tuple-like of shape ((min1, max1), …, (minN, maxN)). The default is None.

diff_step : list, optional

Determines the relative step size for the finite difference approximation of the Jacobian. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares. The default is None.

x_scale : float or array-like, optional

Characteristic scale of each variable. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html#scipy.optimize.least_squares. The default is 1.

func_args : tuple, optional

Tuple of arguments to pass to the cost function. The default is ().

func_kwargs : dict, optional

Dictionary of arguments to pass to the cost function. The default is {}.

Returns

popt : list

List of optimised parameters.

pcov : 2d-array

Covariance matrix of the optimised parameters.

res : OptimizeResult

Full output of the fitting algorithm.
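
A sketch of a bounded, Huber-loss fit with scipy.optimize.least_squares. The wrapper name and the covariance formula (the unscaled inverse of \(J^T J\), as curve_fit computes with absolute_sigma=True) are assumptions; diff_step and x_scale are left at scipy's defaults here. The main step is converting the documented ((min1, max1), …) bounds into the (lower, upper) pair that scipy expects.

```python
import numpy as np
from scipy.optimize import least_squares

def lstsq_fit_sketch(func_model, p0, xdata, ydata, yerr=None, bounds=None):
    """Bounded least-squares fit with a Huber loss (hypothetical sketch)."""
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    sigma = np.ones_like(ydata) if yerr is None else np.asarray(yerr, dtype=float)

    # Convert ((min1, max1), ..., (minN, maxN)) into scipy's (lower, upper) pair
    if bounds is None:
        scipy_bounds = (-np.inf, np.inf)
    else:
        b = np.asarray(bounds, dtype=float)
        scipy_bounds = (b[:, 0], b[:, 1])

    def residuals(p):
        # Residuals weighted by the data uncertainties
        return (ydata - func_model(xdata, *p)) / sigma

    res = least_squares(residuals, p0, bounds=scipy_bounds,
                        method="trf", loss="huber")
    # Unscaled covariance from the Jacobian (assumption)
    pcov = np.linalg.pinv(res.jac.T @ res.jac)
    return res.x, pcov, res
```

The initial guess must lie inside the bounds, otherwise least_squares raises an error.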

fitting.mcmc(params, lklh_func, bounds, func_model, data, func_args=(), func_kwargs={}, neg_lklh=True, nwalkers=6, nstep=2000, progress_bar=True)

Perform an MCMC with the emcee library (Foreman-Mackey et al. 2013).

Parameters

params : list-like

Initial guess.

lklh_func : callable function

Function returning the likelihood.

bounds : tuple-like

Bounds of the parameters. The shape is ((min1, max1), …, (minN, maxN)).

func_model : callable function

Function of the model to fit.

data : nd-array

Data to fit.

func_args : tuple, optional

Tuple of arguments to pass to func_model. The default is ().

func_kwargs : dict, optional

Dictionary of keywords to pass to func_model. The default is {}.

neg_lklh : bool, optional

Indicates if the likelihood function returns a negative value. The default is True.

nwalkers : int, optional

Number of walkers to use in the MCMC. The default is 6.

nstep : int, optional

Number of steps for the walkers. The default is 2000.

progress_bar : bool, optional

Display the progress bar. The default is True.

Returns

samples : nd-array

Samples from the MCMC algorithm. The shape is (nwalkers, nstep).

flat_samples : 1d-array

Flattened chains with the burn-in already discarded. The burn-in length is defined as min(nstep//10, 600).
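
The burn-in rule can be illustrated with NumPy alone. This hypothetical helper assumes a chain laid out as (nstep, nwalkers, ndim), which is the layout of emcee's EnsembleSampler.get_chain(). With a sampler at hand, get_chain(discard=burn_in, flat=True) should give the same flattened array.

```python
import numpy as np

def discard_burn_in(chain):
    """Drop the burn-in and flatten the walkers (hypothetical sketch).

    chain : array of shape (nstep, nwalkers, ndim)
    """
    nstep, _, ndim = chain.shape
    burn_in = min(nstep // 10, 600)  # documented burn-in rule
    return chain[burn_in:].reshape(-1, ndim)
```

With the default nstep = 2000, the first 200 steps are discarded; from nstep = 6000 onwards, the burn-in is capped at 600 steps.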

fitting.minimize_fit(cost_func, func_model, p0, xdata, ydata, yerr=None, bounds=None, hessian_method='backward', func_args=(), func_kwargs={})

Wrapper around scipy.optimize.minimize using the Powell algorithm.

Parameters

cost_func : function

Cost function.

func_model : function

Model of the instrument.

p0 : array-like

Initial guess on the parameters to fit.

xdata : array-like

Abscissa of the data to fit.

ydata : array-like

Dataset to fit (could be a histogram).

yerr : array-like, optional

Uncertainties on the data. The default is None.

bounds : array-like, optional

Boundaries of the parameters to fit. The shape must be like ((min_param1, max_param1), (min_param2, max_param2),…). The default is None.

hessian_method : string, optional

Can accept ‘central’, ‘forward’, ‘backward’. Sometimes numdifftools returns a Hessian matrix containing NaN; the reason is unknown, and changing the method can solve it. The default is ‘backward’. More info on https://numdifftools.readthedocs.io/en/v0.9.41/reference/generated/numdifftools.core.Hessian.html

func_args : list-like, optional

Arguments to pass to func_model. The default is ().

func_kwargs : dict-like, optional

Keywords to pass to func_model. The default is {}.

Returns

popt : array

Best fitted values.

pcov : 2D-array

Covariance matrix.

res : dict

Complete output of the scipy.optimize.minimize function.
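
A sketch of the fit-then-Hessian approach. The module uses numdifftools for the Hessian. To stay self-contained, the example swaps in a hand-rolled central finite-difference Hessian. It also assumes the cost is a negative log-likelihood, so the covariance is \(H^{-1}\); for a \(\chi^2\) cost it would be \(2H^{-1}\). Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian (stand-in for numdifftools.Hessian)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = h
            ej = np.zeros(n)
            ej[j] = h
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return hess

def powell_fit_sketch(neg_log_like, p0):
    """Powell minimisation followed by a Hessian-based covariance estimate."""
    res = minimize(neg_log_like, p0, method="Powell")
    hess = numerical_hessian(neg_log_like, res.x)
    # Negative log-likelihood assumed: cov = H^-1
    pcov = np.linalg.inv(hess)
    return res.x, pcov, res
```

For a Gaussian negative log-likelihood with standard deviations 0.5 and 2, the recovered covariance is close to diag(0.25, 4).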

fitting.ramanujan(n)

Ramanujan approximation of the factorial of an integer. It works very well for any integer >= 2. https://en.wikipedia.org/wiki/Stirling%27s_approximation

Parameters

n : int or array

Value whose factorial is calculated.

Returns

rama : float or array

Factorial of n.
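
Ramanujan's formula reads \(n! \approx \sqrt{\pi}\,(n/e)^n\,(8n^3 + 4n^2 + n + 1/30)^{1/6}\). A NumPy sketch follows (the function name is hypothetical). Converting n to floats lets the same expression handle scalars and arrays. For scalar input it returns a 0-d array.

```python
import numpy as np

def ramanujan_factorial(n):
    """Ramanujan's approximation of n!, elementwise for arrays."""
    n = np.asarray(n, dtype=float)
    return (np.sqrt(np.pi) * (n / np.e) ** n
            * (8 * n**3 + 4 * n**2 + n + 1 / 30) ** (1 / 6))
```

For n = 2 the relative error is already below 1e-4, and it shrinks as n grows.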

fitting.rescaling(func, rescale_factor)

Rescale a function by a constant. It can be used for tempering in MCMC, i.e. to smooth or sharpen the log-likelihood function. Indeed, a decrease of the log-likelihood by 1 unit means that the event is e ≈ 2.7 times less likely to happen. Some log-likelihood functions need to be tempered before being explored by an MCMC algorithm.

Note: the width of the posterior is scaled by \(1 / \sqrt{tempering~factor}\).

Parameters

funccallable

Function to rescale.

rescale_factor : float

Scale factor.

Returns

callable

Rescaled function.

>>> rescaling(log_chi2, -2 / ddof)  # Returns a reduced chi2 cost function

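
A minimal sketch of such a rescaling (hypothetical name rescaling_sketch). It also checks the note on the posterior width: multiplying a Gaussian log-likelihood of width 1 by 4 gives a Gaussian log-likelihood of width \(1/\sqrt{4} = 0.5\).

```python
import numpy as np

def rescaling_sketch(func, rescale_factor):
    """Return a callable computing rescale_factor * func(...)."""
    def rescaled(*args, **kwargs):
        return rescale_factor * func(*args, **kwargs)
    return rescaled

def gauss_loglike(x, mu=0.0, sigma=1.0):
    """Log-likelihood of a Gaussian, up to a constant."""
    return -0.5 * ((x - mu) / sigma) ** 2
```

A factor of -1 turns a log-likelihood into a cost function to minimise.
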
fitting.return_neg_func(func)

Return a callable which is the negative of a function: f(x) -> -f(x).

It can be used to create a callable cost function to minimise (e.g. a \(\chi^2\) estimator).

Parameters

func : callable

Function whose negative version is returned.

Returns

callable

Negative version of the function.

fitting.wrap_residuals(func, ydata, transform)

Calculate the residuals between data points and the model.

Parameters

func : callable

Function of the model.

ydata : array-like

Data to calculate the residuals with func.

transform : None-type or array-like

Transform the residuals (e.g. weight by the uncertainties on ydata).

Returns

array-like

Residuals.