Histogram tools

This module contains functions to build histograms, compute cumulative distribution functions and generate random values from custom distributions, on either GPU or CPU.

histogram_tools.binarySearchCpu(x, y)

Count the values in one array that are less than or equal to values from another array. The algorithm used is binary search.

https://www.enjoyalgorithms.com/blog/count-values-less-than-equal-to-in-another-array

Parameters

x (1d-array): “reference” array.
y (1d-array): Array in which we want to count the number of elements less than or equal to the values in x. It MUST be sorted.

Returns

high (int): Number of elements of y that are less than or equal to the value in x.
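The counting step can be sketched in pure Python as follows. This is only an illustration of the binary-search idea described above, not the module's implementation; the name count_less_equal and the sample values are hypothetical.

```python
from bisect import bisect_right


def count_less_equal(value, sorted_values):
    """Return how many entries of sorted_values are <= value.

    sorted_values must be sorted in ascending order.
    """
    low, high = 0, len(sorted_values)
    while low < high:
        mid = (low + high) // 2
        if sorted_values[mid] <= value:
            # Everything up to and including mid is <= value.
            low = mid + 1
        else:
            # mid and everything after it is > value.
            high = mid
    return high


data = [0.1, 0.4, 0.4, 0.7, 1.2]  # already sorted
counts = [count_less_equal(v, data) for v in (0.0, 0.4, 2.0)]

# The standard library offers the same operation.
reference = [bisect_right(data, v) for v in (0.0, 0.4, 2.0)]
```

Each lookup costs O(log n) comparisons, which is why the searched array has to be sorted beforehand.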

histogram_tools.computeCdf(x_axis, data, mode, normed)

Compute the empirical cumulative distribution function (CDF). This wrapper calls the GPU or CPU version depending on whether cupy and a GPU are available.

Parameters

x_axis (cupy array): x-axis of the CDF.
data (cupy array): Data used to create the CDF.
mode (string): If ccdf, the survival function (complement of the CDF) is calculated instead.
normed (bool): If True, the CDF is normed so that the maximum is equal to 1.

Returns

cdf (cupy array): CDF of data.
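A minimal CPU-only sketch of an empirical CDF with the same four arguments is given below. It assumes NumPy, assumes that any mode other than "ccdf" means the plain CDF, and uses the hypothetical name empirical_cdf; the module's own code may differ.

```python
import numpy as np


def empirical_cdf(x_axis, data, mode="cdf", normed=True):
    """Count, for each abscissa, the samples of data that are <= it."""
    sorted_data = np.sort(np.asarray(data))
    # side="right" makes the count include samples equal to the abscissa.
    cdf = np.searchsorted(sorted_data, x_axis, side="right").astype(float)
    if mode == "ccdf":
        # Survival function: number of samples strictly above the abscissa.
        cdf = sorted_data.size - cdf
    peak = cdf.max()
    if normed and peak > 0:
        cdf = cdf / peak  # maximum becomes 1
    return cdf


x_axis = np.array([0.0, 1.0, 2.0, 3.0])
data = np.array([3.0, 1.0, 2.0, 2.0])
cdf = empirical_cdf(x_axis, data)
ccdf = empirical_cdf(x_axis, data, mode="ccdf")
```

With these inputs the raw counts are 0, 1, 3 and 4, so the normed CDF reaches 1 at the last abscissa.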

histogram_tools.computeCdfCpu(x_axis, data, mode, normed)

Compute the empirical cumulative distribution function (CDF) on CPU.

Parameters

x_axis (cupy array): x-axis of the CDF.
data (array): Data used to create the CDF.
mode (string): If ccdf, the survival function (complement of the CDF) is calculated instead.
normed (bool): If True, the CDF is normed so that the maximum is equal to 1.

Returns

cdf (cupy array): CDF of data.

histogram_tools.computeCdfGPU(x_axis, data, mode, normed)

Compute the empirical cumulative distribution function (CDF) on GPU with CUDA.

Parameters

x_axis (cupy array): x-axis of the CDF.
data (cupy array): Data used to create the CDF.
mode (string): If ccdf, the survival function (complement of the CDF) is calculated instead.
normed (bool): If True, the CDF is normed so that the maximum is equal to 1.

Returns

cdf (cupy array): CDF of data.

histogram_tools.compute_data_histogram(data_null, bin_bounds, wl_scale, **kwargs)

Calculate the histogram of the null depth for each spectral channel. By default, the histogram is normalised by its integral, unless specified in **kwargs.

Parameters

data_null (2d-array (wl, number of points)): Sequence of null depths. The first axis corresponds to the spectral dispersion.
bin_bounds (tuple-like (scalar, scalar)): Boundaries of the null depth range. Values outside this range are pruned from data_null when making the histogram.
wl_scale (1d-array): Wavelength scale.
**kwargs (extra-keywords): Use normed=False to not normalise the histogram by its sum.

Returns

null_pdf (2d-array (wavelength size, nb of bins)): Histogram of the null depth per spectral channel.
null_pdf_err: Error on the histogram frequency per spectral channel, assuming the number of elements per bin follows a binomial distribution.
sz (int): Number of bins.
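The per-channel histogram and its binomial error can be sketched with NumPy as below. Several points are assumptions made for the illustration: the number of bins is passed explicitly as n_bins (the text above does not say how the module chooses it), the normalisation divides by the sum of the counts, every channel is assumed to keep at least one sample in range, and the function name channel_histograms is hypothetical.

```python
import numpy as np


def channel_histograms(data_null, bin_bounds, n_bins, normed=True):
    """Histogram each spectral channel and attach a binomial error."""
    edges = np.linspace(bin_bounds[0], bin_bounds[1], n_bins + 1)
    pdfs, errs = [], []
    for channel in data_null:
        # Prune the values outside the requested range.
        kept = channel[(channel >= bin_bounds[0]) & (channel <= bin_bounds[1])]
        counts, _ = np.histogram(kept, bins=edges)
        n = kept.size
        p = counts / n  # frequency of each bin
        if normed:
            pdfs.append(p)
            # Standard deviation of a binomial frequency.
            errs.append(np.sqrt(p * (1 - p) / n))
        else:
            pdfs.append(counts.astype(float))
            # Standard deviation of a binomial count.
            errs.append(np.sqrt(n * p * (1 - p)))
    return np.array(pdfs), np.array(errs), n_bins


data_null = np.array([[0.0, 0.1, 0.1, 0.9, 5.0],
                      [0.2, 0.2, 0.6, 0.6, -1.0]])
null_pdf, null_pdf_err, sz = channel_histograms(data_null, (0.0, 1.0), n_bins=2)
```

In this example one value per channel lies outside (0, 1) and is dropped, so each channel is histogrammed from four samples.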

histogram_tools.create_histogram_model(params_to_fit, xbins, params_type, wl_scale0, instrument_model, instrument_args, rvus_forfit, cdfs, rvus, **kwargs)

Monte-Carlo simulator of the instrument model to give a histogram.

To avoid memory overflow, the total number of samples can be split into smaller chunks. The resulting histogram is the same as if the simulation were run with the total number of samples in one go.
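The chunking idea can be illustrated with the sketch below, which assumes NumPy. For simplicity it splits an existing array of samples; in a real simulation each chunk would presumably be generated inside the loop so that only n_samp_per_loop values are held in memory at once. The function name histogram_in_chunks is hypothetical.

```python
import numpy as np


def histogram_in_chunks(samples, edges, n_samp_per_loop):
    """Accumulate bin counts chunk by chunk, then normalise once at the end."""
    accum = np.zeros(len(edges) - 1)
    for start in range(0, samples.size, n_samp_per_loop):
        chunk = samples[start:start + n_samp_per_loop]
        accum += np.histogram(chunk, bins=edges)[0]
    return accum / accum.sum()


rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
edges = np.linspace(-4.0, 4.0, 21)

chunked = histogram_in_chunks(samples, edges, n_samp_per_loop=100)
counts = np.histogram(samples, bins=edges)[0]
one_go = counts / counts.sum()
```

Because bin counts are additive, summing them over chunks and normalising afterwards gives the same result as histogramming everything at once.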

Parameters

params_to_fit (tuple-like): List of the parameters to fit.
xbins (2D array): 1st axis = wavelength.
params_type (list): Labels of the parameters to fit, see the notes for more information.
wl_scale0 (1D array): Wavelength scale.
instrument_model (function): Function simulating the instrument.
instrument_args (tuple): List of arguments to pass to instrument_model which are not fitted. It must contain the same type of data (all float or all array of the same shape).
rvus_forfit (dict): Contains the uniformly distributed values used to generate random values following distributions whose parameters are to be fitted.
cdfs (tuple): Put the CDFs of quantities which do not depend on the wavelength first. For wavelength-dependent quantities, 1st axis = wavelength.
rvus (tuple): Put the CDFs of quantities which do not depend on the wavelength first. For wavelength-dependent quantities, 1st axis = wavelength.
**kwargs (keywords):
n_samp_per_loop (int): number of samples for the MC simulation per loop.
nloop (int): number of loops.

Returns

accum (1d-array): Model of the histogram.
diag (list): Diagnostic data from the instrument model.
diag_rv_1d (list): Diagnostic data that are random values generated from the noise sources that are spectrally independent.
diag_rv_2d (list): Diagnostic data that are random values generated from the noise sources that are spectrally dependent.

Notes

This function can handle models with an arbitrary number of parameters, which can be parameters of a distribution or something else.

The label system carried by params_type identifies which parameters are not related to a distribution and which ones belong to the same distribution (e.g. a Normal distribution needs 2 parameters, a Poisson distribution needs one).

In addition, when generating values from a distribution whose parameters are to be found, one may want some reproducibility. The parameter rvus_forfit embeds sequences of uniformly distributed values in a dictionary. The keys of the dictionary must match the labels in params_type. If a key is not found, a new sequence is generated. For a given key, the value can be None if no reproducibility is expected from this distribution.

For example, let’s assume a model that takes the following parameters: null depth, correcting factor, \(\mu\) and \(\sigma\) of 2 normal distributions and the \(\lambda\) of a Poisson distribution.

We have:

params_to_fit = [null depth, correcting factor, \(\mu_1\), \(\sigma_1\), \(\mu_2\), \(\sigma_2\), \(\lambda\)]
params_type = ['deterministic', 'deterministic', 'normal1', 'normal1', 'normal2', 'normal2', 'poisson']
rvus_forfit = {'normal1':None, 'normal2':array([0.27259743, 0.89770258, 0.72093494]), 'poisson':None}

The function will identify that there are 2 constant parameters and 3 distributions to model, with respectively 2, 2 and 1 parameters. Among these 3 distributions, only one is reproducible if given the same set of values between two calls of the function create_histogram_model.

Implemented distributions and their associated keywords are:

Normal distribution with keyword ‘normal[…]’ pattern, compatible with frozen uniform sequence
Poisson distribution with keyword ‘poisson[…]’ pattern, not compatible with frozen uniform sequence
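The label system described in these notes can be illustrated by grouping the parameter values by label. This is a sketch only: the helper name group_params and the numerical values are hypothetical, and it assumes that non-distribution parameters are marked with the exact string 'deterministic', as in the example above.

```python
def group_params(params_to_fit, params_type):
    """Split the parameters into constants and per-distribution groups."""
    deterministic = []
    distributions = {}  # label -> parameters, in order of first appearance
    for value, label in zip(params_to_fit, params_type):
        if label == "deterministic":
            deterministic.append(value)
        else:
            distributions.setdefault(label, []).append(value)
    return deterministic, distributions


# Hypothetical values: null depth, correcting factor, mu_1, sigma_1,
# mu_2, sigma_2, lambda.
params_to_fit = [0.01, 1.2, 0.0, 1.0, 5.0, 2.0, 3.0]
params_type = ["deterministic", "deterministic", "normal1", "normal1",
               "normal2", "normal2", "poisson"]

deterministic, distributions = group_params(params_to_fit, params_type)
```

Here the grouping yields 2 constant parameters and 3 distributions holding 2, 2 and 1 parameters, matching the description above.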

histogram_tools.generate_rv_params(params_to_fit, params_type, rvus_forfit, n_samp, dtypesize)

Generate random values from distributions whose parameters are to be fitted. An arbitrary number of distributions can be used.

This function returns random values from Normal and Poisson distributions.

Parameters

params_to_fit (list): List of all the parameters to fit.
params_type (list): List of the types of distribution from which to generate random values. It must have the same length as params_to_fit.
rvus_forfit (dict): Dictionary of sequences of uniformly distributed values that can be used to generate random values in a reproducible way. The keywords must be the same as the ones in params_type.
n_samp (int): Number of samples to generate per distribution.
dtypesize (numpy or cupy digital size object): Digital size of the array containing the random values (e.g. float, int, cp.float32…).

Raises

NameError: The distribution in params_type is not recognised.

Returns

rvs_to_fit (array): Array containing all the random values from the distributions whose parameters are to be determined. A row contains the sequence from one distribution.
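One way to obtain reproducible normal deviates from a frozen uniform sequence is to push the uniform values through the inverse CDF of the normal distribution. The sketch below does that with the standard library's statistics.NormalDist and NumPy. It is an assumption about how such a generator could work, not the module's code; the name draw_from_labels and the rng argument are hypothetical, and a frozen sequence is assumed to hold n_samp values strictly between 0 and 1.

```python
from statistics import NormalDist

import numpy as np


def draw_from_labels(params_to_fit, params_type, rvus_forfit, n_samp,
                     dtypesize=float, rng=None):
    """Return one row of random values per labelled distribution."""
    rng = np.random.default_rng() if rng is None else rng
    grouped = {}
    for value, label in zip(params_to_fit, params_type):
        if label != "deterministic":
            grouped.setdefault(label, []).append(value)

    rows = []
    for label, values in grouped.items():
        if label.startswith("normal"):
            mu, sigma = values
            rvu = rvus_forfit.get(label)
            if rvu is None:
                # No frozen sequence: draw fresh uniforms in (0, 1).
                rvu = rng.uniform(np.finfo(float).eps, 1.0, size=n_samp)
            dist = NormalDist(mu, sigma)
            rows.append([dist.inv_cdf(float(u)) for u in rvu])
        elif label.startswith("poisson"):
            (lam,) = values
            rows.append(rng.poisson(lam, size=n_samp))
        else:
            raise NameError(f"Distribution not recognised: {label}")
    return np.array(rows, dtype=dtypesize)


params_to_fit = [0.01, 1.2, 0.0, 1.0, 5.0, 2.0, 3.0]
params_type = ["deterministic", "deterministic", "normal1", "normal1",
               "normal2", "normal2", "poisson"]
rvus_forfit = {"normal1": None,
               "normal2": np.array([0.27259743, 0.89770258, 0.72093494]),
               "poisson": None}

first = draw_from_labels(params_to_fit, params_type, rvus_forfit, n_samp=3)
second = draw_from_labels(params_to_fit, params_type, rvus_forfit, n_samp=3)
```

Between the two calls only the 'normal2' row is guaranteed to be identical, since it is the only distribution given a frozen uniform sequence.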

Parameters

pdf (array): Normalized PDF for which the error is calculated.
data_size (int): Number of elements used to calculate the PDF.
normed (bool): Set to True if pdf is normalised, False otherwise.

Returns

pdf_err (array): Error of the PDF.

histogram_tools.getErrorCDF(data_null, data_null_err, null_axis)

Calculate the error of the CDF. It uses the cupy library.

Parameters

data_null (array): Null depth measurements used to create the CDF.
data_null_err (array): Error on the null depth measurements.
null_axis (array): Abscissa of the CDF.

Returns

array: Error of the CDF.

histogram_tools.getErrorNull(data_dic, dark_dic)

Compute the error of the null depth.

Parameters

data_dic (dict): Dictionary of the data from load_data.
dark_dic (dict): Dictionary of the dark from load_data.

Returns

std_null (array): Array of the error on the null depths.

histogram_tools.getErrorPDF(data_null, data_null_err, null_axis)

Calculate the error of the PDF. It uses the cupy library.

Parameters

data_null (array): Null depth measurements used to create the PDF.
data_null_err (array): Error on the null depth measurements.
null_axis (array): Abscissa of the CDF.

Returns

array: Error of the PDF.

histogram_tools.get_cdf(data)

Get the CDF of measured quantities. This function works on CPU and GPU.

Parameters

data (array): Data from which the CDF is wanted.
wl_scale0 (array): Wavelength axis.

Returns

axes (2d-array): Axis of the CDF, first axis is the wavelength.
cdfs (2d-array): CDF, first axis is the wavelength.

histogram_tools.get_dark_cdf(dk, wl_scale0)

Get the CDF for generating RV from measured dark distributions.

Parameters

dk (array-like): Dark data.
wl_scale0 (array): Wavelength axis.

Returns

dark_axis (nd-array): Axis of the CDF.
dark_cdf (nd-array): CDF.

histogram_tools.rv_generator(absc, cdf, nsamp, rvu=None)

Random values generator based on the CDF.

Parameters

absc (cupy array): Abscissa of the CDF.
cdf (cupy array): Normalized arbitrary CDF to use to generate random values.
nsamp (int): Number of values to generate.
rvu (optional): Sequence of uniformly distributed random values to reuse, so that the same output is obtained between calls. The default is None.

Returns

output_samples (cupy array): Sequence of random values following the CDF.
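Generating random values from a tabulated CDF is commonly done by inverse transform sampling: draw uniform values in [0, 1) and read off the abscissa at which the CDF reaches them. The CPU sketch below uses NumPy's linear interpolation for that lookup. Whether the module interpolates in the same way is an assumption, and the name sample_from_cdf and the rng argument are hypothetical.

```python
import numpy as np


def sample_from_cdf(absc, cdf, nsamp, rvu=None, rng=None):
    """Draw nsamp values whose distribution follows the tabulated CDF."""
    if rvu is None:
        rng = np.random.default_rng() if rng is None else rng
        rvu = rng.uniform(size=nsamp)
    # Invert the CDF: cdf plays the role of x and absc the role of y.
    # np.interp expects cdf to be increasing.
    return np.interp(rvu, cdf, absc)


# CDF of a uniform distribution on [0, 1]: the samples equal the uniforms.
absc = np.linspace(0.0, 1.0, 11)
cdf = absc.copy()
frozen = np.array([0.0, 0.25, 1.0])
output_samples = sample_from_cdf(absc, cdf, nsamp=3, rvu=frozen)

# Without a frozen sequence, new uniforms are drawn at each call.
fresh = sample_from_cdf(absc, cdf, nsamp=1000, rng=np.random.default_rng(1))
```

Passing the same rvu sequence twice returns the same samples, which is what makes a frozen uniform sequence useful for reproducibility.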
