hic3defdr.util.simulation module

hic3defdr.util.simulation.perturb_cluster(matrix, cluster, effect, respect_zeros=True)

Perturbs a specific cluster in a contact matrix with a given effect.

Operates in-place.

Based on a notebook linked here: https://colab.research.google.com/drive/1dk9kX57ZtlxQ3jubrKL_q2r8LZnSlVwY

Parameters:
  • matrix (scipy.sparse.spmatrix) – The contact matrix. Must support slicing.
  • cluster (list of tuple of int) – A list of (i, j) tuples marking the positions of the pixels that belong to the cluster to perturb.
  • effect (float) – The effect to apply to the cluster. Values in matrix under the cluster footprint will be shifted by this proportion of their original value.
  • respect_zeros (bool) – Pass True to preserve the sparsity structure of matrix if it is sparse. Has no effect if matrix is dense.
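
As a rough illustration of the effect parameter, the sketch below applies a proportional in-place shift to a dense NumPy array. It is a simplified stand-in, not the library's implementation: the helper name shift_cluster is hypothetical, and it assumes that every pixel under the cluster footprint is scaled by exactly 1 + effect. It does not model respect_zeros.

```python
import numpy as np


def shift_cluster(matrix, cluster, effect):
    """Hypothetical helper: scale each (i, j) pixel in cluster by 1 + effect, in place."""
    # Split the list of (i, j) tuples into parallel row and column index lists.
    rows, cols = map(list, zip(*cluster))
    # Fancy-indexed in-place update; pixels outside the cluster are untouched.
    matrix[rows, cols] *= 1.0 + effect


matrix = np.full((4, 4), 10.0)
cluster = [(0, 1), (0, 2), (1, 2)]
shift_cluster(matrix, cluster, effect=0.5)
```

With a positive effect the cluster pixels are strengthened; a negative effect would weaken them by the same proportion.
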
hic3defdr.util.simulation.simulate(row, col, mean, disp_fn, bias, size_factors, clusters, beta=0.5, p_diff=0.4, trend='mean', verbose=True)

Simulates raw contact matrices from mean and disp_fn, applying bias and size_factors to each simulated replicate. A p_diff fraction of the loops specified in clusters is perturbed with an effect size of beta, in a direction chosen at random.

Parameters:
  • row, col – Row and column indices identifying the location of pixels in mean.
  • mean (np.ndarray) – Vector of mean values for each pixel to use as a base to simulate from.
  • disp_fn (function) – Function that returns a dispersion given a mean or distance (as specified by trend). Will be used to determine the dispersion values to use during simulation.
  • bias (np.ndarray) – Rows are bins of the full contact matrix, columns are to-be-simulated replicates. Each column represents the bias vector to use for simulating that replicate.
  • size_factors (np.ndarray) – Vector of size factors to use for simulating for each to-be-simulated replicate. To use a different size factor at different distance scales, pass a matrix whose rows correspond to distance scales and whose columns correspond to replicates.
  • clusters (list of list of tuple) – The outer list is a list of clusters which represent the locations of loops. Each cluster is a list of (i, j) tuples marking the position of pixels which belong to that cluster.
  • beta (float) – The effect size of the loop perturbations to use when simulating. Perturbed loops will be strengthened or weakened by this fraction of their original strength.
  • p_diff (float or list of float) – Pass a single float to specify the probability that a loop will be perturbed across the simulated conditions. Pass four floats to specify the probabilities of all four specific perturbations: up in A, down in A, up in B, down in B. The remaining loops will be constitutive.
  • trend ('mean' or 'dist') – Whether disp_fn returns the smoothed dispersion as a function of mean or of interaction distance.
  • verbose (bool) – Pass False to silence reporting of progress to stderr.
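
The two accepted forms of p_diff can be pictured as a draw of one class label per cluster. The sketch below is one possible reading, not the library's code: the function name assign_classes and the label strings are hypothetical, and splitting a single float evenly across the four perturbations is an assumption.

```python
import numpy as np

# Hypothetical label names; the library's actual labels may differ.
LABELS = ["constit", "up A", "down A", "up B", "down B"]


def assign_classes(n_clusters, p_diff, seed=0):
    """Draw one class label per cluster from p_diff (a float or four floats)."""
    if np.isscalar(p_diff):
        # Assumption: a single probability is split evenly over the four perturbations.
        p_diff = [p_diff / 4.0] * 4
    # Whatever probability mass is left over goes to the constitutive class.
    probs = [1.0 - sum(p_diff)] + list(p_diff)
    rng = np.random.default_rng(seed)
    return rng.choice(LABELS, size=n_clusters, p=probs)


classes = assign_classes(1000, p_diff=0.4)
```

Passing four floats, for example p_diff=[0.2, 0.0, 0.1, 0.1], would skew the draw toward specific perturbations while the remaining 0.6 stays constitutive.
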
Returns:

  • classes (np.ndarray) – Vector of ground-truth class labels used for simulation with ‘U7’ dtype.
  • gen (generator of scipy.sparse.csr_matrix) – Generates the simulated raw contact matrices for each simulated replicate, in order.
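
To show how mean, dispersion, bias, and size_factors could combine into per-replicate counts, here is a minimal sketch. It rests on several assumptions that the text above does not confirm: counts are drawn from a negative binomial distribution, the expected value of a pixel is mean scaled by the bias of its row bin, the bias of its column bin, and the replicate's size factor, and disp is a vector of per-pixel dispersions already evaluated from disp_fn. The name simulate_counts is hypothetical, and the sketch yields plain count vectors rather than scipy.sparse.csr_matrix objects.

```python
import numpy as np


def simulate_counts(row, col, mean, disp, bias, size_factors, seed=0):
    """Hypothetical generator: yield one vector of simulated counts per replicate."""
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the negative binomial by (n, p); with n = 1 / dispersion
    # and p = n / (n + mu) the distribution has mean mu.
    n = 1.0 / disp
    for rep in range(bias.shape[1]):
        # Assumed scaling of the base mean by bias and size factor.
        expected = mean * bias[row, rep] * bias[col, rep] * size_factors[rep]
        yield rng.negative_binomial(n, n / (n + expected))


row = np.array([0, 0, 1])
col = np.array([1, 2, 2])
mean = np.array([50.0, 20.0, 80.0])
disp = np.full(3, 0.01)
bias = np.ones((3, 2))            # 3 bins, 2 replicates
size_factors = np.array([1.0, 2.0])

reps = list(simulate_counts(row, col, mean, disp, bias, size_factors))
```

Because the result is a generator, replicates are produced lazily and in order, which mirrors how gen is described above.
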