hic3defdr.util.simulation module

hic3defdr.util.simulation.perturb_cluster(matrix, cluster, effect, respect_zeros=True)
Perturbs a specific cluster in a contact matrix with a given effect.
Operates in-place.
Based on a notebook linked here: https://colab.research.google.com/drive/1dk9kX57ZtlxQ3jubrKL_q2r8LZnSlVwY
Parameters:
- matrix (scipy.sparse.spmatrix) – The contact matrix. Must support slicing.
- cluster (list of tuple of int) – A list of (i, j) tuples marking the position of points which belong to the cluster which we want to perturb.
- effect (float) – The effect to apply to the cluster. Values in matrix under the cluster footprint will be shifted by this proportion of their original value.
- respect_zeros (bool) – Pass True to preserve the sparsity structure of matrix if it is sparse. Has no effect if matrix is dense.
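The proportional shift described for effect can be illustrated with a small standalone sketch. It does not call hic3defdr: shift_cluster_sketch is a hypothetical helper, and it assumes that "shifted by this proportion of their original value" means each pixel value v becomes v + effect * v. The handling of respect_zeros is not modeled here.

```python
import numpy as np
import scipy.sparse as sparse


def shift_cluster_sketch(matrix, cluster, effect):
    """Hypothetical helper: shift every cluster pixel by effect * its value, in place."""
    for i, j in cluster:
        # Assumed reading of "shifted by this proportion of their original value".
        matrix[i, j] = matrix[i, j] * (1 + effect)


# A small upper-triangular contact matrix in LIL format, which supports item assignment.
matrix = sparse.lil_matrix(np.array([[0.0, 2.0, 4.0],
                                     [0.0, 0.0, 6.0],
                                     [0.0, 0.0, 0.0]]))
cluster = [(0, 1), (0, 2), (1, 2)]

shift_cluster_sketch(matrix, cluster, effect=0.5)
perturbed = matrix.toarray()
```

Under the same assumption, a negative effect would weaken the cluster instead of strengthening it.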
hic3defdr.util.simulation.simulate(row, col, mean, disp_fn, bias, size_factors, clusters, beta=0.5, p_diff=0.4, trend='mean', verbose=True)
Simulates raw contact matrices from mean and disp_fn, applying bias and size_factors per simulated replicate. The loops specified in clusters are perturbed with an effect size of beta, in a direction chosen at random, for a p_diff fraction of the clusters.
Parameters:
- row, col – Row and column indices identifying the location of pixels in mean.
- mean (np.ndarray) – Vector of mean values for each pixel to use as a base to simulate from.
- disp_fn (function) – Function that returns a dispersion given a mean or distance (as specified by trend). Will be used to determine the dispersion values to use during simulation.
- bias (np.ndarray) – Rows are bins of the full contact matrix, columns are to-be-simulated replicates. Each column represents the bias vector to use for simulating that replicate.
- size_factors (np.ndarray) – Vector of size factors to use for simulating for each to-be-simulated replicate. To use a different size factor at different distance scales, pass a matrix whose rows correspond to distance scales and whose columns correspond to replicates.
- clusters (list of list of tuple) – The outer list is a list of clusters which represent the locations of loops. Each cluster is a list of (i, j) tuples marking the position of pixels which belong to that cluster.
- beta (float) – The effect size of the loop perturbations to use when simulating. Perturbed loops will be strengthened or weakened by this fraction of their original strength.
- p_diff (float or list of float) – Pass a single float to specify the probability that a loop will be perturbed across the simulated conditions. Pass four floats to specify the probabilities of all four specific perturbations: up in A, down in A, up in B, down in B. The remaining loops will be constitutive.
- trend ('mean' or 'dist') – Whether disp_fn returns the smoothed dispersion as a function of mean or of interaction distance.
- verbose (bool) – Pass False to silence reporting of progress to stderr.
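The two accepted forms of p_diff can be illustrated with a standalone sketch that draws one class label per cluster. It does not call hic3defdr. The helper draw_classes_sketch and the label names are hypothetical, and splitting a single float evenly across the four perturbations is an assumption, since the text above only says that the single float is the overall probability of perturbation.

```python
import numpy as np

# Hypothetical label names, chosen only for this illustration.
LABELS = ['up A', 'down A', 'up B', 'down B', 'constit']


def draw_classes_sketch(n_clusters, p_diff=0.4, seed=0):
    """Hypothetical helper: draw one class label per cluster."""
    if np.isscalar(p_diff):
        # Assumption: a single probability is split evenly over the four perturbations.
        probs = [p_diff / 4.0] * 4
    else:
        probs = list(p_diff)
    # Whatever probability is left over goes to the constitutive class.
    probs.append(1.0 - sum(probs))
    rng = np.random.default_rng(seed)
    return rng.choice(LABELS, size=n_clusters, p=probs)


# Single float: 40% of loops perturbed, the rest constitutive.
classes = draw_classes_sketch(1000, p_diff=0.4)
# Four floats: only "up in A" and "down in B" perturbations are allowed.
classes_custom = draw_classes_sketch(1000, p_diff=[0.2, 0.0, 0.0, 0.2])
```

In both calls the four probabilities sum to 0.4, so roughly 60% of the drawn labels are expected to be constitutive.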
Returns:
- classes (np.ndarray) – Vector of ground-truth class labels used for simulation with 'U7' dtype.
- gen (generator of scipy.sparse.csr_matrix) – Generates the simulated raw contact matrices for each simulated replicate, in order.
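How the inputs might combine into one raw matrix per replicate can be sketched as follows. This is a standalone illustration and not the library's implementation: simulate_replicates_sketch is a hypothetical generator, and it assumes that counts are negative binomial, that bias and size factor scale the mean multiplicatively, and that trend='mean'. None of these details is stated in the text above, and loop perturbation is left out.

```python
import numpy as np
import scipy.sparse as sparse


def simulate_replicates_sketch(row, col, mean, disp_fn, bias, size_factors, seed=0):
    """Hypothetical generator: yields one simulated CSR count matrix per replicate."""
    rng = np.random.default_rng(seed)
    n_bins = bias.shape[0]
    for rep in range(bias.shape[1]):
        # Assumption: bias and size factor scale the mean multiplicatively.
        mu = mean * bias[row, rep] * bias[col, rep] * size_factors[rep]
        # Assumption: trend='mean', so the dispersion is a function of the mean.
        disp = disp_fn(mean)
        # Assumption: negative binomial counts with variance mu + disp * mu**2.
        n = 1.0 / disp
        counts = rng.negative_binomial(n, n / (n + mu))
        yield sparse.coo_matrix((counts, (row, col)), shape=(n_bins, n_bins)).tocsr()


row = np.array([0, 0, 1])
col = np.array([1, 2, 2])
mean = np.array([10.0, 5.0, 8.0])
bias = np.ones((3, 2))               # 3 bins, 2 replicates, no bias
size_factors = np.array([1.0, 2.0])  # second replicate scaled to twice the depth

gen = simulate_replicates_sketch(row, col, mean, lambda m: np.full_like(m, 0.1),
                                 bias, size_factors)
matrices = list(gen)  # one CSR matrix per replicate, in order
```

Like the gen return value described above, the sketch is a generator, so matrices are produced lazily and only one replicate needs to be held in memory at a time.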
- col (row,) – Row and column indices identifying the location of pixels in