hic3defdr.util.balancing module

hic3defdr.util.balancing.kr_balance(array, tol=1e-06, x0=None, delta=0.1, ddelta=3, fl=1, max_iter=3000)[source]

Balances a matrix via Knight-Ruiz matrix balancing, using sparse matrix operations.

Parameters:
  • array (np.ndarray or scipy.sparse.spmatrix) – The matrix to balance. Must be 2 dimensional and square. Will be symmetrized using its upper triangular entries.
  • tol (float) – The error tolerance.
  • x0 (np.ndarray, optional) – The initial guess for the bias vector. Pass None to use a vector of all 1’s.
  • ddelta (delta,) – How close/far the balancing vectors can get to/from the positive cone, in terms of a relative measure on the size of the elements.
  • fl (int) – Determines whether or not internal convergence statistics will be printed to standard output.
  • max_iter (int) – The maximum number of iterations to perform.
Returns:

The CSR matrix is the balanced matrix. It will be upper triangular if array was upper triangular, otherwise it will by symmetric. The first np.ndarray is the bias vector. The second np.ndarray is a list of residuals after each iteration.

Return type:

sparse.csr_matrix, np.ndarray, np.ndarray

Notes

The core logic of this implementation is transcribed (by Dan Gillis) from the original Knight and Ruiz IMA J. Numer. Anal. 2013 paper and differs only slightly from the implementation in Juicer/Juicebox (see comments).

It then uses the “sum factor” approach from the Juicer/Juicebox code to scale the bias vectors up to match the scale of the original matrix (see comments).

Overall, this function is not nan-safe, but you may pass matrices that contain empty rows (the matrix will be shrunken before balancing, but all outputs will match the shape of the original matrix).

This function does not perform any of the logic used in Juicer/Juicebox to filter out sparse rows if the balancing fails to converge.

If the max_iter limit is reached, this function will return the current best balanced matrix and bias vector - no exception will be thrown. Callers can compare the length of the list of residuals to max_iter - if they are equal, the algorithm has not actually converged, and the caller may choose to perform further filtering on the matrix and try again. The caller is also free to decide if the matrix is “balanced enough” using any other criteria (e.g., variance of nonzero row sums).