API Reference

Quick Summary

These are the user facing classes and methods:

SWeight(data[, pdfs, yields, discvarranges, ...])

Produce sweights for a dataset given component pdfs.

Cow(mrange, gs, gb[, Im, obs, renorm, verbose])

Produce weights using COWs.

convert_rf_pdf(pdf, obs[, npoints, forcenorm])

Convert RooAbsPdf into python callable.

kendall_tau(x, y)

Return kendall tau correlation coefficient.

cov_correct(hs, gxs, hdata, gdata, weights, ...)

Perform a second order covariance correction for a fit to weighted data.

approx_cov_correct(pdf, data, wts, fvals, fcov)

Perform a first order covariance correction for a fit to weighted data.

SWeighter

class sweights.SWeight(data, pdfs=[], yields=[], discvarranges=None, method='summation', compnames=None, alphas=None, rfobs=[], verbose=True, checks=True)

Produce sweights for a dataset given component pdfs.

Initialize SWeight object.

This will compute the W and A (or alpha) matrices which are used to produce the weight functions. Evaluation of these functions on a dataset is done in a different function get_weight().

Parameters
  • data (ndarray) – The dataset in the discriminating variable, should have shape (nevents,ndiscvars) so for one discriminating variable will just be (N,) or (N,1).

  • pdfs (list of callable) – A list of the component pdfs. These should be simple python callable functions which are passed the same number of parameters as there are discriminating variables. If running with the ‘roofit’ method (see method) then this can be a list of ROOT.RooAbsPdf objects. If you only have your pdfs defined as RooFit objects then you can use the wrapper function convert_rf_pdf() to convert them into the appropriate object type for this call and then use other methods.

  • yields (list of float) – A list of the component yields.

  • discvarranges (list of tuple or tuple of tuple, optional) – The ranges for each discriminating variable, as a tuple or list of two element tuples specifying the lower and upper bound for the range (the default is None which will use minus to plus infinity)

  • method (str, optional) – The sweights method to use. Must be one of ‘summation’, ‘integration’, ‘subhess’, ‘tsplot’, ‘rootfit’. The recommended default is summation corresponding to Variant B of arXiv:2112.04574

  • compnames (list of str, optional) – A list of the component names. Only used for the legend entries when making a plot of the weight functions with make_weight_plot()

  • alphas (ndarray, optional) – If using the ‘subhess’ method then the covariance matrix of a fit to the disciminanting variable(s) in which only the yields float must also be passed. This matrix is inverted to produce the W-matrix.

  • rfobs (list, optional) – If using the ‘roofit’ method then the discriminating variables (of type RooRealVar) on which the pdfs depend must also be passed

  • verbose (bool, optional) – Print output

  • checks (bool, optional) – Perform some checks that the derived weights are self consistent, in particular check that the sum of all component weights for a given value of the discriminating variable(s) is unity, and check that the sum of all weights for a given component reproduces the yields. This will print additional output to the screen and issue warnings if checks are not passed.

Notes

Many of the arguments passed and methods implemented are not strictly necessary. In particular the ‘tsplot’ and ‘roofit’ methods are just wrappers for implementations elsewhere that were originally used as cross-checks. In future versions these two methods should be removed to simplify this call.

get_weight(icomp=0, *args)

Return the weights.

Get the weights for a given component and set of discriminating variable values.

Parameters
  • icomp (int, optional) – Get the weight function for the ith component

  • *args (ndarray) – The values of the discriminating variables (one argument for each) as numpy arrays

Returns

An array of the weights

Return type

ndarray

make_weight_plot(axis=None, dopts=['r', 'b', 'g', 'm', 'c', 'y'], labels=None)

Make a plot of the weight functions.

Parameters
  • axis (optional) – matplotlib axis to draw will default to use plt.gca()

  • dopts (list of str) – List of colors for the different components

  • labels (list of str) – List of legend labels. Default will use those passed in compnames with __init__

print_checks()

Print checks.

COWs

class sweights.Cow(mrange, gs, gb, Im=1, obs=None, renorm=True, verbose=True)

Produce weights using COWs.

Initialize Cow object.

This will compute the W and A (or alpha) matrices which are used to produce the weight functions. Evaluation of these functions on a dataset is done in a different function get_weight().

Parameters
  • mrange (tuple) – A two element tuple for the integration range in the discriminant variable

  • gs (callable) – The function for the signal pdf (numerator) must accept a single argument in this case the discriminant variable

  • gb (callable or list of callable) – A function or list of functions for the backgrond pdfs (numerator) which must each accept a single argument in this case the discriminant variable

  • Im (int or callable, optional) – The function for the “variance function” or I(m) (denominator) which must accept a single argument in this case the discriminant variable. Can also pass 1 for a uniform variance function (the default)

  • obs (tuple of ndarray, optional) – You can instead pass the observed distribution to evaluate Im instead. This expects the entries and bin edges in a two element tuple like the return value of np.histogram

  • renorm (bool, optional) – Renormalise passed functions to unity (you can override this if you already know it’s true)

Notes

For more details see arXiv:2112.04574

See also

get_weight

get_weight(k, m)

Return the weights.

Wrapper for wk

See also

wk

wk(k, m)

Return the weights.

Parameters
  • k (int) – Index of the component

  • m (ndarray) – Values of the discriminating variable to compute weights for

Returns

Values of the weights

Return type

ndarray

Covariance Correction

sweights.cov_correct(hs, gxs, hdata, gdata, weights, Nxs, fvals, fcov, dw_dW_fs, Wvals, verbose=False)

Perform a second order covariance correction for a fit to weighted data.

Parameters
  • hs (callable) – The control variable pdf which must take arguments (x, p0,…,pn) which has been fitted to a weighted dataset, where x is the observable and p0 … pn are the shape parameters

  • gxs (list of callable) – A list of the disciminant variable pdfs which must take a single argument (x), where x is the observable (shape parameters of gxs must be known in this case)

  • hdata (ndarray) – The data values of the control variable observable at which the hs pdf is evaluated

  • gdata (ndarray) – The data values of the disciminant variable observable at which the gxs pdfs are evaluated

  • weights (ndarray) – The values of the weights

  • Nxs (list of float or tuple of float) – A list of the fitted yield values for the components gxs

  • fvals (array_like) – A list of the fitted values of the shape parameters p0,….,pn

  • fcov (ndarray) – A covariance matrix of the weighted likelihood fit before the correction (this is normally available from the minmiser e.g. iminuit)

  • dw_dW_fs (list of callable) – A list of the functions describing the partial derivate of the weight function with respect to the W matrix elements (see the tutorial to see this passed for sweights and COWs)

  • Wvals (list of float or tuple of float) – A list of the W matrix elements

  • verbose (bool, optional) – Print some output

Returns

The corrected covariance matrix

Return type

ndarray

Notes

This function corresponds to the weighted score function with sweights (both terms of Eq.51 in arXiv:2112.04574). If the shape parameters of the gxs are not known then the full sandwich estimate must be used which is not yet implemented in this package.

sweights.approx_cov_correct(pdf, data, wts, fvals, fcov, verbose=False)

Perform a first order covariance correction for a fit to weighted data.

Parameters
  • pdf (callable) – The control variable pdf which must take arguments (x, p0,…,pn) which has been fitted to a weighted dataset, where x is the observable and p0 … pn are the shape parameters

  • data (ndarray) – The data values of the observable at which the pdf is evaluated

  • wts (ndarray) – The values of the weights

  • fvals (array_like) – A list of the fitted values of the shape parameters p0,….,pn

  • fcov (ndarray) – A covariance matrix of the weighted likelihood fit before the correction (this is normally available from the minmiser e.g. iminuit)

  • verbose (bool, optional) – Print some output

Returns

The corrected covariance matrix

Return type

ndarray

Notes

This function corresponds to the weighted score function with independent weights (first term of Eq.51 in arXiv:2112.04574). This is often a good approximation for sweights although will be an overestimate. For a better correction for sweights use cov_correct (which is only in general accurate when the shape parameters in the discriminating variable are known)

See also

cov_correct

Utilities

sweights.kendall_tau(x, y)

Return kendall tau correlation coefficient.

Useful for ascertainting the extent to which two samples are independent. In particular for sweights and COWs one wants to know the extent to which the discriminant and control variables factorise.

Parameters
  • x (ndarray) – Values in the first dimension - must have the same shape as y

  • y (ndarray) – Values in the second dimension - must have the same shape as x

Returns

Two element tuple with the coefficient and its uncertainty

Return type

tuple

Notes

x and y must have the same dimension. This function now uses scipy.stats.kendalltau for the coefficent calcualtion (the uncertainty calculation is trivial) which makes a few optimisations. See the scipy documentation for more information.

sweights.convert_rf_pdf(pdf, obs, npoints=400, forcenorm=False)

Convert RooAbsPdf into python callable.

Helper function to convert a RooFit::RooAbsPdf object into a python callable that can be used by either the SWeight or Cow classes

Parameters
  • pdf (RooAbsPdf) – The pdf, must inherit from RooAbsPdf (e.g. RooGaussian, RooExponential, RooAddPdf etc.)

  • obs (RooRealVar) – The observable, must inherit from RooRealVarLValue but will usually be a RooRealVar

  • npoints (int, optional) – The number of points to use for the interpolation

  • forcenorm (bool, optional) – Force the return function to be normalised by performing a numerical integration of it (the function should in most cases be normalised properly anyway so this shouldn’t be needed)

Returns

A callable function representing a normalised pdf which can then be passed to the SWeight or Cow classes

Return type

callable