Histogram tools
This module contains functions to build histograms, compute empirical cumulative distribution functions (CDF) and generate random values from custom distributions, on either GPU or CPU.
- histogram_tools.binarySearchCpu(x, y)
Count the number of elements of y that are less than or equal to the values of x. The algorithm used is binary search.
https://www.enjoyalgorithms.com/blog/count-values-less-than-equal-to-in-another-array
Parameters
- x : 1d-array
“Reference” array.
- y : 1d-array
Array in which to count the number of elements lower than or equal to the values in x. MUST be sorted.
Returns
- high : int
Number of elements of y less than or equal to the values in x.
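To illustrate the binary search, here is a minimal NumPy sketch, not the module's implementation. count_leq is a hypothetical helper that returns the number of elements of a sorted array that are less than or equal to one value; np.searchsorted with side="right" gives the same counts for every value of x at once.

```python
import numpy as np


def count_leq(sorted_y, value):
    """Return how many elements of sorted_y are <= value (hypothetical helper)."""
    low, high = 0, len(sorted_y)
    # Invariant: sorted_y[:low] <= value and sorted_y[high:] > value.
    while low < high:
        mid = (low + high) // 2
        if sorted_y[mid] <= value:
            low = mid + 1
        else:
            high = mid
    # high is now the index of the first element > value, i.e. the count.
    return high


y = np.array([1, 2, 2, 3, 5])  # must be sorted
x = np.array([0, 2, 4, 10])
counts = [count_leq(y, v) for v in x]
# Vectorised equivalent for all values of x at once.
counts_np = np.searchsorted(y, x, side="right")
print(counts, counts_np.tolist())  # prints [0, 3, 4, 5] [0, 3, 4, 5]
```

Each query costs O(log len(y)), which is why y must be sorted beforehand.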
- histogram_tools.computeCdf(x_axis, data, mode, normed)
Compute the empirical cumulative distribution function (CDF). This wrapper calls the GPU version if cupy and a GPU are available, and the CPU version otherwise.
Parameters
- x_axis : cupy array
x-axis of the CDF.
- data : cupy array
Data used to create the CDF.
- mode : string
If 'ccdf', the survival function (complement of the CDF) is calculated instead.
- normed : bool
If True, the CDF is normalised so that its maximum equals 1.
Returns
- cdf : cupy array
CDF of data.
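The GPU/CPU dispatch can be sketched as follows. This is an assumption about how such a check might look, not the module's code: select_array_module is a hypothetical helper that returns cupy only when it imports and the CUDA runtime reports at least one device, and falls back to numpy otherwise.

```python
import numpy as np


def select_array_module():
    """Return cupy if it is installed and a GPU is visible, numpy otherwise (hypothetical helper)."""
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        # ImportError when cupy is missing, or a CUDA runtime error when no driver/GPU is found.
        pass
    return np


xp = select_array_module()
print("Using", xp.__name__)
```

A wrapper such as computeCdf would then forward to computeCdfGPU when the GPU backend is selected and to computeCdfCpu otherwise.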
- histogram_tools.computeCdfCpu(x_axis, data, mode, normed)
Compute the empirical cumulative distribution function (CDF) on CPU.
Parameters
- x_axis : cupy array
x-axis of the CDF.
- data : array
Data used to create the CDF.
- mode : string
If 'ccdf', the survival function (complement of the CDF) is calculated instead.
- normed : bool
If True, the CDF is normalised so that its maximum equals 1.
Returns
- cdf : cupy array
CDF of data.
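On CPU, an empirical CDF can be evaluated on x_axis by sorting data once and counting, with a binary search, the samples at or below each abscissa. The sketch below is hedged: compute_cdf_sketch is a hypothetical name, the default mode='cdf' is assumed, and the normalisation divides by the maximum, which is one reading of "the maximum is equal to 1".

```python
import numpy as np


def compute_cdf_sketch(x_axis, data, mode="cdf", normed=True):
    """Empirical CDF (or survival function) of data evaluated on x_axis (hypothetical helper)."""
    x_axis = np.asarray(x_axis, dtype=float)
    sorted_data = np.sort(np.ravel(data))
    # Number of samples <= each abscissa, found by binary search in the sorted data.
    counts = np.searchsorted(sorted_data, x_axis, side="right")
    if mode == "ccdf":
        # Survival function: number of samples strictly greater than each abscissa.
        counts = sorted_data.size - counts
    cdf = counts.astype(float)
    if normed and cdf.max() > 0:
        cdf /= cdf.max()
    return cdf


x = [0, 1, 2, 3, 4]
data = [1, 2, 2, 3]
cdf = compute_cdf_sketch(x, data)
ccdf = compute_cdf_sketch(x, data, mode="ccdf")
raw = compute_cdf_sketch(x, data, normed=False)
```

Sorting once and using searchsorted keeps the cost at O((N + M) log N) for N samples and M abscissae.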
- histogram_tools.computeCdfGPU(x_axis, data, mode, normed)
Compute the empirical cumulative distribution function (CDF) on GPU with CUDA.
Parameters
- x_axis : cupy array
x-axis of the CDF.
- data : cupy array
Data used to create the CDF.
- mode : string
If 'ccdf', the survival function (complement of the CDF) is calculated instead.
- normed : bool
If True, the CDF is normalised so that its maximum equals 1.
Returns
- cdf : cupy array
CDF of data.
- histogram_tools.compute_data_histogram(data_null, bin_bounds, wl_scale, **kwargs)
Calculate the histogram of the null depth for each spectral channel. By default, the histogram is normalised by its integral, unless specified otherwise in **kwargs.
Parameters
- data_null : 2d-array (wl, number of points)
Sequence of null depths. The first axis corresponds to the spectral dispersion.
- bin_bounds : tuple-like (scalar, scalar)
Boundaries of the null depth range. Values outside this range are pruned from data_null when making the histogram.
- wl_scale : 1d-array
Wavelength scale.
- **kwargs : extra keywords
Use normed=False to not normalise the histogram by its sum.
Returns
- null_pdf : 2d-array (wavelength size, nb of bins)
Histogram of the null depth per spectral channel.
- null_pdf_err : TYPE
Error on the histogram frequency per spectral channel, assuming the number of elements per bin follows a binomial distribution.
- sz : int
Number of bins.
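A minimal NumPy sketch of a per-channel histogram, under stated assumptions: n_bins is a hypothetical parameter (the number of bins is not among the documented arguments), values outside bin_bounds are discarded through the range argument of np.histogram, and normalisation divides each channel by its sum (dividing further by the bin width would normalise by the integral instead).

```python
import numpy as np


def compute_data_histogram_sketch(data_null, bin_bounds, n_bins, normed=True):
    """Per-channel histogram of null depths (hypothetical helper)."""
    data_null = np.atleast_2d(np.asarray(data_null, dtype=float))  # (wl, n_points)
    counts = []
    for channel in data_null:
        # range=bin_bounds ignores values outside the bounds when binning.
        c, edges = np.histogram(channel, bins=n_bins, range=bin_bounds)
        counts.append(c)
    null_pdf = np.array(counts, dtype=float)
    if normed:
        sums = null_pdf.sum(axis=1, keepdims=True)
        # Normalise each channel by its sum; empty channels stay at zero.
        null_pdf = np.divide(null_pdf, sums, out=np.zeros_like(null_pdf), where=sums > 0)
    return null_pdf, edges


data = np.array([[0.1, 0.2, 0.2, 0.9, 1.5],
                 [0.6, 0.7, -0.3, 0.8, 0.1]])
null_pdf, edges = compute_data_histogram_sketch(data, (0.0, 1.0), 2)
```

Using the same range and bin count for every channel keeps the bin edges identical across wavelengths, so the rows of null_pdf are directly comparable.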
- histogram_tools.create_histogram_model(params_to_fit, xbins, params_type, wl_scale0, instrument_model, instrument_args, rvus_forfit, cdfs, rvus, **kwargs)
Monte-Carlo simulator of the instrument model to give a histogram.
To avoid memory overflow, the total number of samples can be split into smaller chunks. The resulting histogram is the same as if the simulation were made with the total number of samples in one go.
Parameters
- params_to_fit : tuple-like
List of the parameters to fit.
- xbins : 2D array
1st axis = wavelength.
- params_type : list
Labels of the parameters to fit, see the Notes for more information.
- wl_scale0 : 1D array
Wavelength scale.
- instrument_model : function
Function simulating the instrument.
- instrument_args : tuple, must contain the same type of data (all float or all arrays of the same shape)
List of arguments to pass to instrument_model which are not fitted.
- rvus_forfit : dict
Contains the uniformly distributed values used to generate random values following the distributions whose parameters are to be fitted.
- cdfs : tuple
Put first the CDFs of quantities that do not depend on the wavelength. For wavelength-dependent quantities, 1st axis = wavelength.
- rvus : tuple
Put first the CDFs of quantities that do not depend on the wavelength. For wavelength-dependent quantities, 1st axis = wavelength.
- **kwargs : keywords
n_samp_per_loop (int): number of samples for the MC simulation per loop.
nloop (int): number of loops.
Returns
- accum : 1d-array
Model of the histogram.
- diag : list
Diagnostic data from the instrument model.
- diag_rv_1d : list
Diagnostic data that are random values generated from the noise sources that are spectrally independent.
- diag_rv_2d : list
Diagnostic data that are random values generated from the noise sources that are spectrally dependent.
Notes
This function can handle models with an arbitrary number of parameters, which can be parameters of a distribution or something else.
The label system carried by params_type identifies the parameters that are not related to a distribution and groups the parameters that belong to the same distribution (e.g. a Normal distribution needs 2 parameters, a Poisson distribution needs one).
In addition, when generating values from a distribution whose parameters are to be found, one may want some reproducibility. The parameter rvus_forfit embeds sequences of uniformly distributed values in a dictionary. The keys of the dictionary must match the labels in params_type. If a key is not found, a new sequence is generated. The value of a key can be None if no reproducibility is expected from this distribution.
For example, let’s assume a model that takes the parameters: null depth, correcting factor, \(\mu\) and \(\sigma\) of 2 normal distributions, and \(\lambda\) of a Poisson distribution.
- We have:
params_to_fit = [null depth, correcting factor, \(\mu_1\), \(\sigma_1\), \(\mu_2\), \(\sigma_2\), \(\lambda\)]
params_type = ['deterministic', 'deterministic', 'normal1', 'normal1', 'normal2', 'normal2', 'poisson']
rvus_forfit = {'normal1':None, 'normal2':array([0.27259743, 0.89770258, 0.72093494]), 'poisson':None}
The function identifies 2 constant parameters and 3 distributions to model, with 2, 2 and 1 parameters respectively. Among these 3 distributions, only the one labelled 'normal2' is reproducible, i.e. it yields the same values between two calls of create_histogram_model given the same set of uniform values.
- Implemented distributions and their associated keywords are:
Normal distribution, keyword pattern ‘normal[…]’, compatible with a frozen uniform sequence.
Poisson distribution, keyword pattern ‘poisson[…]’, not compatible with a frozen uniform sequence.
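The chunking works because histogram counts are additive: summing the counts of nloop chunks of n_samp_per_loop samples gives the same histogram as binning all samples at once. A minimal sketch, assuming a hypothetical simulate(n, rng) callable that stands in for instrument_model and a single 1D set of bin edges rather than one per wavelength:

```python
import numpy as np


def chunked_mc_histogram(simulate, bin_edges, n_samp_per_loop, nloop, rng):
    """Accumulate a Monte-Carlo histogram chunk by chunk (hypothetical helper)."""
    accum = np.zeros(len(bin_edges) - 1)
    for _ in range(nloop):
        # Only n_samp_per_loop samples are held in memory at a time.
        samples = simulate(n_samp_per_loop, rng)
        counts, _ = np.histogram(samples, bins=bin_edges)
        accum += counts
    # Normalise once at the end, after all chunks have been counted.
    return accum / accum.sum()


# Toy instrument model that records every chunk so the result can be checked.
drawn = []


def simulate(n, rng):
    s = rng.normal(loc=0.0, scale=1.0, size=n)
    drawn.append(s)
    return s


bin_edges = np.linspace(-5, 5, 41)
hist = chunked_mc_histogram(simulate, bin_edges, 250, 4, np.random.default_rng(0))
one_go, _ = np.histogram(np.concatenate(drawn), bins=bin_edges)
print(np.allclose(hist, one_go / one_go.sum()))  # True
```

Normalising only after the last chunk is what makes the chunked and one-go results identical; normalising each chunk separately and averaging would also work here only because all chunks have the same size.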
- histogram_tools.generate_rv_params(params_to_fit, params_type, rvus_forfit, n_samp, dtypesize)
Generate random values from distributions whose parameters are to be fitted. An arbitrary number of distributions can be used.
This function returns random values from Normal and Poisson distributions.
Parameters
- params_to_fit : list
List of all the parameters to fit.
- params_type : list
List of the natures of the distributions from which to generate random values. It must have the same length as params_to_fit.
- rvus_forfit : dict
Dictionary of sequences of uniformly distributed values that can be used to generate random values in a reproducible way. The keys must be the same as the labels in params_type.
- n_samp : int
Number of samples to generate per distribution.
- dtypesize : numpy or cupy digital size object
Digital size of the array containing the random values (e.g. float, int, cp.float32…).
Raises
- NameError
The distribution in params_type is not recognised.
Returns
- rvs_to_fit : array
Array containing all the random values from the distributions whose parameters are to be determined. Each row contains the sequence from one distribution.
See also
create_histogram_model: the Notes detail the use of params_to_fit, params_type and rvus_forfit.
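To illustrate the label system, the sketch below groups parameters by label and draws one row per distribution. It is a hedged approximation, not the module's code: the 'deterministic' label is skipped as in the Notes example of create_histogram_model, Normal values come from inverse-transform sampling of the (possibly frozen) uniform sequence with the standard-library statistics.NormalDist, Poisson values are drawn directly with NumPy, and the function name and rng argument are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def generate_rv_params_sketch(params_to_fit, params_type, rvus_forfit, n_samp,
                              dtypesize=float, rng=None):
    """Draw one row of random values per labelled distribution (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng

    # Group parameter values by label, keeping the order of first appearance.
    groups = {}
    for value, label in zip(params_to_fit, params_type):
        if label == "deterministic":
            continue  # not a distribution parameter
        groups.setdefault(label, []).append(value)

    rows = []
    for label, values in groups.items():
        if label.startswith("normal"):
            mu, sigma = values
            rvu = rvus_forfit.get(label)  # missing key or None -> new sequence
            if rvu is None:
                rvu = rng.uniform(size=n_samp)
            rvu = np.asarray(rvu, dtype=float)
            if rvu.size != n_samp:
                raise ValueError(f"rvus_forfit['{label}'] must contain n_samp values")
            # inv_cdf needs 0 < u < 1, so guard against an exact 0 from rng.uniform.
            rvu = np.clip(rvu, 1e-12, 1 - 1e-12)
            dist = NormalDist(mu, sigma)
            rows.append([dist.inv_cdf(u) for u in rvu])
        elif label.startswith("poisson"):
            (lam,) = values
            # No frozen sequence: Poisson values are drawn afresh at each call.
            rows.append(rng.poisson(lam, size=n_samp))
        else:
            raise NameError(f"Distribution '{label}' is not recognised.")
    return np.array(rows, dtype=dtypesize)


params_to_fit = [0.5, 1.0, 0.0, 1.0, 2.0, 0.5, 3.0]
params_type = ['deterministic', 'deterministic', 'normal1', 'normal1',
               'normal2', 'normal2', 'poisson']
rvus_forfit = {'normal1': None, 'normal2': np.array([0.5, 0.5, 0.5]), 'poisson': None}
rvs = generate_rv_params_sketch(params_to_fit, params_type, rvus_forfit, 3,
                                rng=np.random.default_rng(0))
```

Because 'normal2' receives a frozen sequence, its row is identical at every call (here all values equal the mean 2.0, since u = 0.5), while the 'normal1' and 'poisson' rows change with the random generator.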
- histogram_tools.getErrorBinomNorm(pdf, data_size, normed)
Calculate the error of the PDF, knowing that the number of elements in a bin is a random value following a binomial distribution.
Parameters
- pdf : array
Normalized PDF whose error is calculated.
- data_size : int
Number of elements used to calculate the PDF.
- normed : bool
Set to True if pdf is normalised, False otherwise.
Returns
- pdf_err : array
Error of the PDF.
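For a bin holding n of N elements, the count follows a binomial distribution with p = n/N, so its standard deviation is sqrt(N p (1 - p)); dividing by N gives the error on a PDF normalised by its sum. A minimal sketch under that normalisation assumption (binomial_pdf_error is a hypothetical name):

```python
import numpy as np


def binomial_pdf_error(pdf, data_size, normed=True):
    """Binomial error on each bin of a histogram (hypothetical helper)."""
    pdf = np.asarray(pdf, dtype=float)
    # Probability for one element to fall in each bin.
    p = pdf if normed else pdf / data_size
    # Standard deviation of a binomial count: sqrt(N p (1 - p)).
    count_err = np.sqrt(data_size * p * (1.0 - p))
    # Express the error in the same units as pdf.
    return count_err / data_size if normed else count_err


err_normed = binomial_pdf_error([0.5, 0.3, 0.2], 100)
err_counts = binomial_pdf_error([50, 30, 20], 100, normed=False)
```

If the PDF is instead normalised by its integral, the same error must also be divided by the bin width.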
- histogram_tools.getErrorCDF(data_null, data_null_err, null_axis)
Calculate the error of the CDF. It uses the cupy library.
Parameters
- data_null : array
Null depth measurements used to create the CDF.
- data_null_err : array
Error on the null depth measurements.
- null_axis : array
Abscissa of the CDF.
Returns
- array
Error of the CDF.
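The method used by getErrorCDF is not described here. As one possible approach, and not necessarily what the module does, the CDF error can be estimated by Monte Carlo: perturb each null depth by a Gaussian draw with standard deviation data_null_err, recompute the empirical CDF on null_axis, and take the standard deviation across draws. mc_cdf_error, n_draws and seed are hypothetical, and this sketch uses NumPy rather than cupy.

```python
import numpy as np


def mc_cdf_error(data_null, data_null_err, null_axis, n_draws=200, seed=None):
    """Monte-Carlo estimate of the error on an empirical CDF (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data_null = np.asarray(data_null, dtype=float)
    data_null_err = np.asarray(data_null_err, dtype=float)
    null_axis = np.asarray(null_axis, dtype=float)
    cdfs = np.empty((n_draws, null_axis.size))
    for k in range(n_draws):
        # Perturb every measurement within its own Gaussian error bar.
        perturbed = np.sort(data_null + rng.normal(0.0, data_null_err))
        # Normalised empirical CDF evaluated on null_axis.
        cdfs[k] = np.searchsorted(perturbed, null_axis, side="right") / perturbed.size
    # Spread of the CDF values across draws.
    return cdfs.std(axis=0)


data = np.array([0.01, 0.02, 0.03, 0.05])
err = mc_cdf_error(data, np.full(data.shape, 0.005), np.linspace(0.0, 0.06, 7), seed=1)
```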
- histogram_tools.getErrorNull(data_dic, dark_dic)
Compute the error of the null depth.
Parameters
- data_dic : dict
Dictionary of the data from load_data.
- dark_dic : dict
Dictionary of the dark from load_data.
Returns
- std_null : array
Array of the error on the null depths.
- histogram_tools.getErrorPDF(data_null, data_null_err, null_axis)
Calculate the error of the PDF. It uses the cupy library.
Parameters
- data_null : array
Null depth measurements used to create the PDF.
- data_null_err : array
Error on the null depth measurements.
- null_axis : array
Abscissa of the PDF.
Returns
- array
Error of the PDF.
- histogram_tools.get_cdf(data)
Get the CDF of measured quantities. This function works on both CPU and GPU.
Parameters
- data : array
Data from which the CDF is wanted.
- wl_scale0 : array
Wavelength axis.
Returns
- axes : 2d-array
Axis of the CDF; the first axis is the wavelength.
- cdfs : 2d-array
CDF; the first axis is the wavelength.
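One simple way to build per-wavelength CDFs that a CDF-based generator can consume is to sort the samples of each channel and pair them with the cumulative fractions i/N. This is a sketch under assumptions (2D input with the wavelength on the first axis; get_cdf_sketch is hypothetical), not necessarily how get_cdf builds its axes.

```python
import numpy as np


def get_cdf_sketch(data):
    """Empirical CDF per wavelength channel (hypothetical helper)."""
    data = np.atleast_2d(np.asarray(data, dtype=float))  # shape (wl, n)
    # Sorted samples serve as the abscissa of each channel's CDF.
    axes = np.sort(data, axis=1)
    n = axes.shape[1]
    # The i-th smallest of n samples has cumulative probability i/n.
    cdfs = np.tile(np.arange(1, n + 1) / n, (axes.shape[0], 1))
    return axes, cdfs


axes, cdfs = get_cdf_sketch([[3.0, 1.0, 2.0], [0.5, 0.2, 0.9]])
```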
- histogram_tools.get_dark_cdf(dk, wl_scale0)
Get the CDF used to generate random values from measured dark distributions.
Parameters
- dk : array-like
Dark data.
- wl_scale0 : array
Wavelength axis.
Returns
- dark_axis : nd-array
Axis of the CDF.
- dark_cdf : nd-array
CDF.
- histogram_tools.rv_generator(absc, cdf, nsamp, rvu=None)
Random value generator based on the CDF.
Parameters
- absc : cupy array
Abscissa of the CDF.
- cdf : cupy array
Normalized arbitrary CDF used to generate random values.
- nsamp : int
Number of values to generate.
- rvu : TYPE, optional
If given, this sequence of uniformly distributed values is reused instead of drawing a new one. The default is None.
Returns
- output_samples : cupy array
Sequence of random values following the CDF.
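A CDF-based generator typically relies on inverse-transform sampling: uniform values in [0, 1) are mapped through the inverse of the CDF. A minimal NumPy sketch, assuming the CDF is tabulated on absc, increasing and normalised to [0, 1], and inverted by linear interpolation with np.interp (rv_generator_sketch and its rng argument are hypothetical):

```python
import numpy as np


def rv_generator_sketch(absc, cdf, nsamp, rvu=None, rng=None):
    """Draw values following a tabulated CDF (hypothetical helper)."""
    absc = np.asarray(absc, dtype=float)
    cdf = np.asarray(cdf, dtype=float)
    if rvu is None:
        # nsamp is only used when no frozen sequence is given.
        rng = np.random.default_rng() if rng is None else rng
        rvu = rng.uniform(size=nsamp)
    # Passing the same rvu twice returns the same samples (reproducibility).
    rvu = np.asarray(rvu, dtype=float)
    # Invert the CDF: find the abscissa where cdf == u, interpolating linearly.
    return np.interp(rvu, cdf, absc)


samples = rv_generator_sketch([0.0, 1.0, 2.0], [0.0, 0.5, 1.0], 2, rvu=[0.25, 0.75])
print(samples)  # [0.5 1.5]
```

np.interp expects increasing sample points, so flat plateaus in the tabulated CDF (repeated values) should be removed or slightly offset before inverting it.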