Histogram tools

This module contains functions to build histograms, compute cumulative distribution functions (CDF) and generate random values from custom distributions, on either GPU or CPU.

histogram_tools.binarySearchCpu(x, y)

Count the elements of one array that are less than or equal to the values of another array. The algorithm used is binary search.

https://www.enjoyalgorithms.com/blog/count-values-less-than-equal-to-in-another-array

Parameters

x : 1d-array

“Reference” array.

y : 1d-array

Array in which to count the number of elements lower than or equal to the values in x. It MUST be sorted.

Returns

high : int

Number of elements of y that are less than or equal to the values in x.
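
The counting step can be sketched with the standard-library bisect module. This is only an illustration of the technique, not the module's implementation: the helper name count_less_equal is hypothetical, and y is assumed to be sorted in ascending order.

```python
from bisect import bisect_right


def count_less_equal(x, y):
    """For each value in x, count the elements of sorted y that are <= it."""
    # bisect_right returns the insertion index to the right of any equal
    # elements, which is exactly the number of elements <= the value.
    return [bisect_right(y, value) for value in x]


y = [1, 2, 2, 5, 8]   # must be sorted
x = [0, 2, 6, 10]     # "reference" values
print(count_less_equal(x, y))  # [0, 3, 4, 5]
```

Each lookup costs O(log n) comparisons, which is why the sorted order of y matters.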

histogram_tools.computeCdf(x_axis, data, mode, normed)

Compute the empirical cumulative distribution function (CDF). It is a wrapper that calls the GPU or CPU version depending on whether cupy and a GPU are available.

Parameters

x_axis : cupy array

x-axis of the CDF.

data : cupy array

Data used to create the CDF.

mode : string

If ccdf, the survival function (the complement of the CDF) is computed instead.

normed : bool

If True, the CDF is normed so that the maximum is equal to 1.

Returns

cdf : cupy array

CDF of data.
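
A wrapper of this kind typically needs to detect whether a GPU is usable. The sketch below shows one possible way to do so; the helper names gpu_available and get_array_module are hypothetical and the actual detection logic of the module may differ.

```python
import numpy as np


def gpu_available():
    """Return True if cupy imports and reports at least one CUDA device."""
    try:
        import cupy as cp
        return cp.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # cupy is not installed, or no usable CUDA driver/device was found
        return False


def get_array_module():
    """Return cupy when a GPU is usable, numpy otherwise."""
    if gpu_available():
        import cupy as cp
        return cp
    return np


xp = get_array_module()
print(xp.__name__)  # "cupy" or "numpy", depending on the machine
```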

histogram_tools.computeCdfCpu(x_axis, data, mode, normed)

Compute the empirical cumulative distribution function (CDF) on CPU.

Parameters

x_axis : cupy array

x-axis of the CDF.

data : array

Data used to create the CDF.

mode : string

If ccdf, the survival function (the complement of the CDF) is computed instead.

normed : bool

If True, the CDF is normed so that the maximum is equal to 1.

Returns

cdf : cupy array

CDF of data.
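
As an illustration of what an empirical CDF on CPU involves, here is a minimal numpy sketch. The function name empirical_cdf is hypothetical, and two behaviours are assumptions rather than facts about computeCdfCpu: the mode string is compared to "ccdf", and normalisation divides by the number of samples.

```python
import numpy as np


def empirical_cdf(x_axis, data, mode="cdf", normed=True):
    """Empirical CDF of `data` evaluated at the points of `x_axis`."""
    data = np.sort(np.asarray(data))
    # Number of samples <= each abscissa value (binary search per point)
    counts = np.searchsorted(data, x_axis, side="right").astype(float)
    if mode == "ccdf":
        # Survival function: number of samples strictly greater
        counts = data.size - counts
    if normed:
        # Assumption: normalise by the sample size
        counts /= data.size
    return counts


data = np.array([0.1, 0.4, 0.4, 0.9])
x_axis = np.array([0.0, 0.4, 1.0])
print(empirical_cdf(x_axis, data))               # values 0, 0.75, 1
print(empirical_cdf(x_axis, data, mode="ccdf"))  # values 1, 0.25, 0
```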

histogram_tools.computeCdfGPU(x_axis, data, mode, normed)

Compute the empirical cumulative distribution function (CDF) on GPU with CUDA.

Parameters

x_axis : cupy array

x-axis of the CDF.

data : cupy array

Data used to create the CDF.

mode : string

If ccdf, the survival function (the complement of the CDF) is computed instead.

normed : bool

If True, the CDF is normed so that the maximum is equal to 1.

Returns

cdf : cupy array

CDF of data.

histogram_tools.compute_data_histogram(data_null, bin_bounds, wl_scale, **kwargs)

Calculate the histogram of the null depth for each spectral channel. By default, the histogram is normalised by its integral, unless specified otherwise in **kwargs.

Parameters

data_null : 2d-array (wl, number of points)

Sequence of null depths. The first axis corresponds to the spectral dispersion.

bin_bounds : tuple-like (scalar, scalar)

Boundaries of the null depth range. Values outside this range are pruned from data_null when making the histogram.

wl_scale : 1d-array

Wavelength scale.

**kwargs : extra-keywords

Use normed=False to not normalise the histogram by its sum.

Returns

null_pdf : 2d-array (wavelength size, nb of bins)

Histogram of the null depth per spectral channel.

null_pdf_err : TYPE

Error on the histogram frequency per spectral channel, assuming the number of elements per bin follows a binomial distribution.

sz : int

Number of bins.
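
The per-channel histogram can be sketched with numpy as follows. This is not the function's implementation: the helper name histogram_per_channel and the explicit n_bins argument are assumptions made for the example, since the way the number of bins is chosen is not described here.

```python
import numpy as np


def histogram_per_channel(data_null, bin_bounds, n_bins, normed=True):
    """Histogram each row (spectral channel) of data_null within bin_bounds."""
    edges = np.linspace(bin_bounds[0], bin_bounds[1], n_bins + 1)
    hists = []
    for channel in data_null:
        # Values outside [edges[0], edges[-1]] are ignored by np.histogram.
        # density=True normalises the histogram by its integral.
        hist, _ = np.histogram(channel, bins=edges, density=normed)
        hists.append(hist)
    return np.array(hists), edges


rng = np.random.default_rng(0)
data_null = rng.normal(0.0, 0.1, size=(3, 1000))  # 3 spectral channels
null_pdf, edges = histogram_per_channel(data_null, (-0.3, 0.3), n_bins=20)
print(null_pdf.shape)  # (3, 20)
```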

histogram_tools.create_histogram_model(params_to_fit, xbins, params_type, wl_scale0, instrument_model, instrument_args, rvus_forfit, cdfs, rvus, **kwargs)

Monte Carlo simulation of the instrument model, returning a model histogram.

To avoid memory overflow, the total number of samples can be split into smaller chunks. The resulting histogram is the same as if the simulation were run with all the samples in one go.
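
The equivalence between chunked and single-pass histograms can be checked with a small numpy sketch. The helper name accumulate_histogram is hypothetical and a plain Normal sample stands in for the instrument model: bin counts are summed over the chunks and normalised once at the end.

```python
import numpy as np


def accumulate_histogram(chunks, edges):
    """Sum the bin counts of each chunk, then normalise once at the end."""
    accum = np.zeros(len(edges) - 1)
    for chunk in chunks:
        counts, _ = np.histogram(chunk, bins=edges)
        accum += counts
    return accum / accum.sum()


rng = np.random.default_rng(1)
samples = rng.normal(size=50_000)
edges = np.linspace(-4, 4, 41)

chunked = accumulate_histogram(np.split(samples, 5), edges)  # 5 loops
one_go = accumulate_histogram([samples], edges)              # 1 loop
print(np.array_equal(chunked, one_go))  # True
```

Only one chunk of samples has to fit in memory at a time, while the accumulator keeps the size of the histogram.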

Parameters

params_to_fit : tuple-like

List of the parameters to fit.

xbins : 2D array

1st axis = wavelength.

params_type : list

Labels of the parameters to fit, see the notes for more information.

wl_scale0 : 1D array

Wavelength scale.

instrument_model : function

Function simulating the instrument.

instrument_args : tuple, must contain the same type of data (all floats or all arrays of the same shape)

List of arguments to pass to instrument_model which are not fitted.

rvus_forfit : dict

Uniformly distributed values used to generate random values from the distributions whose parameters are being fitted.

cdfs : tuple

Put first the CDF of the quantities which do not depend on the wavelength. For wavelength-dependent quantities, 1st axis = wavelength.

rvus : tuple

Put first the CDF of the quantities which do not depend on the wavelength. For wavelength-dependent quantities, 1st axis = wavelength.

**kwargs : keywords

n_samp_per_loop (int): number of samples for the MC simulation per loop.

nloop (int): number of loops

Returns

accum : 1d-array

Model of the histogram.

diag : list

Diagnostic data from the instrument model.

diag_rv_1d : list

Diagnostic data: random values generated from the noise sources that are spectrally independent.

diag_rv_2d : list

Diagnostic data: random values generated from the noise sources that are spectrally dependent.

Notes

This function can handle models with an arbitrary number of parameters, which can be parameters of a distribution or something else.

The labels carried by params_type identify which parameters are not related to a distribution and which ones belong to the same distribution (e.g. a Normal distribution needs 2 parameters, a Poisson distribution needs one).

In addition, when generating values from a distribution whose parameters are being fitted, one may want some reproducibility. The parameter rvus_forfit holds sequences of uniformly distributed values in a dictionary. The keys of the dictionary must match the labels in params_type. If a key is not found, a new sequence is generated. The value for a key can be None if no reproducibility is expected from that distribution.

For example, let’s assume a model that takes the following parameters: null depth, correcting factor, \(\mu\) and \(\sigma\) of 2 normal distributions and the \(\lambda\) of a Poisson distribution.

We have:
  • params_to_fit = [null depth, correcting factor, \(\mu_1\), \(\sigma_1\), \(\mu_2\), \(\sigma_2\), \(\lambda\)]

  • params_type = ['deterministic', 'deterministic', 'normal1', 'normal1', 'normal2', 'normal2', 'poisson']

  • rvus_forfit = {'normal1':None, 'normal2':array([0.27259743, 0.89770258, 0.72093494]), 'poisson':None}

The function will identify that there are 2 constant parameters and 3 distributions to model, with 2, 2 and 1 parameters respectively. Among these 3 distributions, only one is reproducible, provided the same set of values is given between two calls of the function create_histogram_model.

Implemented distributions and their associated keywords are:
  • Normal distribution with keyword ‘normal[…]’ pattern, compatible with frozen uniform sequence

  • Poisson distribution with keyword ‘poisson[…]’ pattern, not compatible with frozen uniform sequence
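
The label system described in these notes can be illustrated by grouping the parameters according to their labels. This is a sketch only: the helper name group_params and the numerical values are made up for the example, and the actual parsing inside create_histogram_model may differ.

```python
def group_params(params_to_fit, params_type):
    """Split fitted parameters into constants and per-distribution groups."""
    constants = []
    distributions = {}  # insertion order follows params_type
    for value, label in zip(params_to_fit, params_type):
        if label == "deterministic":
            constants.append(value)
        else:
            # Parameters sharing a label belong to the same distribution
            distributions.setdefault(label, []).append(value)
    return constants, distributions


# Illustrative values: null depth, correcting factor, (mu, sigma) x 2, lambda
params_to_fit = [0.01, 1.2, 0.0, 1.0, 5.0, 2.0, 3.5]
params_type = ['deterministic', 'deterministic', 'normal1', 'normal1',
               'normal2', 'normal2', 'poisson']

constants, distributions = group_params(params_to_fit, params_type)
print(constants)      # [0.01, 1.2]
print(distributions)  # {'normal1': [0.0, 1.0], 'normal2': [5.0, 2.0], 'poisson': [3.5]}
```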

histogram_tools.generate_rv_params(params_to_fit, params_type, rvus_forfit, n_samp, dtypesize)

Generate random values from distributions whose parameters are to be fitted. An arbitrary number of distributions can be used.

This function returns random values from Normal and Poisson distributions.

Parameters

params_to_fit : list

List of all the parameters to fit.

params_type : list

List of the types of the distributions from which to generate random values. It must have the same length as params_to_fit.

rvus_forfit : dict

Dictionary of sequences of uniformly distributed values that can be used to generate random values in a reproducible way. The keys must be the same as the labels in params_type.

n_samp : int

Number of samples to generate per distribution.

dtypesize : numpy or cupy data type

Data type of the array containing the random values (e.g. float, int, cp.float32…).

Raises

NameError

The distribution in params_type is not recognised.

Returns

rvs_to_fit : array

Array containing all the random values from the distributions whose parameters are to be determined. Each row contains the sequence from one distribution.

See also

create_histogram_model: the Notes detail the use of params_to_fit, params_type and rvus_forfit.
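
One common way to make Normal draws reproducible from a frozen uniform sequence is to push the uniform values through the inverse CDF of the distribution. The sketch below uses only the Python standard library; the helper name normal_from_uniform is hypothetical and this is not necessarily how generate_rv_params proceeds.

```python
import random
from statistics import NormalDist


def normal_from_uniform(mu, sigma, n_samp, rvu=None):
    """Draw Normal values by mapping uniform values through the inverse CDF."""
    if rvu is None:
        # No frozen sequence: draw fresh uniform values in (0, 1)
        rvu = []
        while len(rvu) < n_samp:
            u = random.random()
            if u > 0.0:  # inv_cdf requires 0 < u < 1
                rvu.append(u)
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(u) for u in rvu]


frozen = [0.27259743, 0.89770258, 0.72093494]
first = normal_from_uniform(0.0, 1.0, 3, rvu=frozen)
second = normal_from_uniform(0.0, 1.0, 3, rvu=frozen)
print(first == second)  # True: same uniform sequence, same Normal values
```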

histogram_tools.getErrorBinomNorm(pdf, data_size, normed)

Calculate the error of the PDF, given that the number of elements in a bin is a random value following a binomial distribution.

Parameters

pdf : array

Normalized PDF for which the error is calculated.

data_size : int

Number of elements used to calculate the PDF.

normed : bool

Set to True if pdf is normalised, False otherwise.

Returns

pdf_err : array

Error of the PDF.
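
Under a binomial model, a bin with probability p filled from N elements has a count of variance N p (1 - p). The sketch below applies this formula; the helper name binomial_error is hypothetical and it assumes that a normalised histogram holds frequencies summing to 1, which may differ from the normalisation used by getErrorBinomNorm.

```python
import numpy as np


def binomial_error(hist, data_size, normed=True):
    """Standard deviation of histogram bins under a binomial model."""
    hist = np.asarray(hist, dtype=float)
    # Probability of an element falling into each bin
    p = hist if normed else hist / data_size
    err_counts = np.sqrt(data_size * p * (1.0 - p))
    # Return the error in the same units as the input histogram
    return err_counts / data_size if normed else err_counts


counts = np.array([10, 40, 50])
err_counts = binomial_error(counts, 100, normed=False)
err_freq = binomial_error(counts / 100, 100, normed=True)
print(err_counts)  # error on the raw counts
print(err_freq)    # error on the frequencies
```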

histogram_tools.getErrorCDF(data_null, data_null_err, null_axis)

Calculate the error of the CDF. It uses the cupy library.

Parameters

data_null : array

Null depth measurements used to create the CDF.

data_null_err : array

Error on the null depth measurements.

null_axis : array

Abscissa of the CDF.

Returns

array

Error of the CDF.

histogram_tools.getErrorNull(data_dic, dark_dic)

Compute the error of the null depth.

Parameters

data_dic : dict

Dictionary of the data from load_data.

dark_dic : dict

Dictionary of the dark from load_data.

Returns

std_null : array

Array of the error on the null depths.

histogram_tools.getErrorPDF(data_null, data_null_err, null_axis)

Calculate the error of the PDF. It uses the cupy library.

Parameters

data_null : array

Null depth measurements used to create the PDF.

data_null_err : array

Error on the null depth measurements.

null_axis : array

Abscissa of the CDF.

Returns

array

Error of the PDF.

histogram_tools.get_cdf(data)

Get the CDF of measured quantities. This function works on CPU and GPU.

Parameters

data : array

Data from which the CDF is wanted.

wl_scale0 : array

Wavelength axis.

Returns

axes : 2d-array

Axis of the CDF, first axis is the wavelength.

cdfs : 2d-array

CDF, first axis is the wavelength.

histogram_tools.get_dark_cdf(dk, wl_scale0)

Get the CDF used to generate random values (RV) from measured dark distributions.

Parameters

dk : array-like

Dark data.

wl_scale0 : array

Wavelength axis.

Returns

dark_axis : nd-array

Axis of the CDF.

dark_cdf : nd-array

CDF.

histogram_tools.rv_generator(absc, cdf, nsamp, rvu=None)

Random values generator based on the CDF.

Parameters

absc : cupy array

Abscissa of the CDF.

cdf : cupy array

Arbitrary normalized CDF used to generate the random values.

nsamp : int

Number of values to generate.

rvu : TYPE, optional

Sequence of uniformly distributed random values to reuse. The default is None.

Returns

output_samples : cupy array

Sequence of random values following the CDF.
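
Generating random values from a tabulated CDF is usually done by inverse transform sampling. The numpy sketch below illustrates the idea on CPU; the helper name sample_from_cdf and its rng argument are assumptions for the example, and the interpolation scheme may differ from the one used by rv_generator.

```python
import numpy as np


def sample_from_cdf(absc, cdf, nsamp, rvu=None, rng=None):
    """Inverse transform sampling from a tabulated, normalised CDF."""
    if rvu is None:
        rng = np.random.default_rng() if rng is None else rng
        rvu = rng.uniform(0.0, 1.0, nsamp)
    # Interpolate the inverse CDF: uniform value -> abscissa.
    # np.interp expects `cdf` to be increasing.
    return np.interp(rvu, cdf, absc)


absc = np.linspace(0.0, 1.0, 11)
cdf = absc**2  # CDF of the density 2x on [0, 1]

frozen = np.array([0.0, 0.25, 1.0])
print(sample_from_cdf(absc, cdf, 3, rvu=frozen))  # values 0, 0.5, 1

samples = sample_from_cdf(absc, cdf, 1000, rng=np.random.default_rng(0))
```

Passing the same rvu sequence twice yields the same samples, which is what makes a frozen uniform sequence reproducible.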