hic3defdr.analysis.simulation module
class hic3defdr.analysis.simulation.SimulatingHiC3DeFDR
Bases: object
Mixin class containing simulation and evaluation functions for HiC3DeFDR.
evaluate(cluster_pattern, label_pattern, min_dist=None, max_dist=None, rerun_bh=False, outfile=None)
Evaluates the results of this analysis by comparing them to true labels.
Parameters:
- cluster_pattern (str) – File path pattern to sparse JSON formatted cluster files representing loop cluster locations. Should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes. Pass a condition name to use self.loop_patterns[cluster_pattern] instead.
- label_pattern (str) – File path pattern to true label files for each chromosome. Should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes. Files should be loadable with np.loadtxt(..., dtype='U7') to yield a vector of true labels parallel to the clusters pointed to by cluster_pattern.
- min_dist, max_dist (optional) – Specify the minimum and maximum distances, respectively, within which to evaluate performance. Pass None to leave one or both ends unbounded.
- rerun_bh (bool) – If min_dist and/or max_dist are used to constrain the distances, pass True to re-run BH-FDR on the subset of p-values at the selected distances. Pass False to use the original dataset-wide q-values. Does nothing if min_dist and max_dist are both None.
- outfile (str, optional) – Name of a file to save the evaluation results to inside this object’s outdir. Default is ‘eval.npz’ if min_dist and max_dist are both None, otherwise it is ‘eval_<min_dist>_<max_dist>.npz’.
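The bookkeeping that evaluate() describes can be sketched as follows: expanding a ‘<chrom>’ path pattern, loading true labels with np.loadtxt(..., dtype='U7'), restricting to a distance window, optionally re-running Benjamini-Hochberg (BH) on that subset, and deriving the default outfile name. This is a minimal, hypothetical sketch, not hic3defdr's implementation. All helper names are illustrative, and it assumes inclusive distance bounds, that any label other than 'constit' marks a truly differential loop, and a single q-value threshold alpha (the documented method saves its results to a file instead of taking such a threshold).

```python
import numpy as np


def expand_pattern(pattern, chrom):
    """Replace every '<chrom>' placeholder with a chromosome name."""
    return pattern.replace('<chrom>', chrom)


def load_labels(label_pattern, chrom):
    """Load the true-label vector for one chromosome (one label per line)."""
    # ndmin=1 keeps a single-label file from collapsing to a 0-d array.
    return np.loadtxt(expand_pattern(label_pattern, chrom), dtype='U7', ndmin=1)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each q-value is the minimum over larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def default_outfile(min_dist=None, max_dist=None):
    """Default results file name; unbounded ends are formatted as 'None' here."""
    if min_dist is None and max_dist is None:
        return 'eval.npz'
    return 'eval_%s_%s.npz' % (min_dist, max_dist)


def evaluate_window(pvalues, qvalues, dists, labels, min_dist=None,
                    max_dist=None, rerun_bh=False, alpha=0.05):
    """Hypothetical helper: (FDR, TPR) of calls with q <= alpha in a window."""
    dists = np.asarray(dists)
    keep = np.ones(dists.shape, dtype=bool)  # assumed inclusive bounds
    if min_dist is not None:
        keep &= dists >= min_dist
    if max_dist is not None:
        keep &= dists <= max_dist
    if rerun_bh and (min_dist is not None or max_dist is not None):
        # Re-run BH on only the p-values inside the distance window.
        q = bh_qvalues(np.asarray(pvalues, dtype=float)[keep])
    else:
        # Reuse the original dataset-wide q-values.
        q = np.asarray(qvalues, dtype=float)[keep]
    # Assumption: any label other than 'constit' is a truly differential loop.
    truth = np.asarray(labels)[keep] != 'constit'
    called = q <= alpha
    n_called, n_true = called.sum(), truth.sum()
    fdr = (called & ~truth).sum() / n_called if n_called else 0.0
    tpr = (called & truth).sum() / n_true if n_true else 0.0
    return float(fdr), float(tpr)
```

Re-running BH on the windowed subset changes the multiple-testing burden: with fewer p-values, each one is multiplied by a smaller n, which is why rerun_bh is a separate choice from simply reusing the dataset-wide q-values.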
simulate(cond, chrom=None, beta=0.5, p_diff=0.4, skip_bias=False, loop_pattern=None, outdir='sim', n_threads=-1, verbose=True)
Simulates raw contact matrices based on previously fitted scaled means and dispersions in a specific condition. Can only be run after estimate_dispersions() has been run.
Parameters:
- cond (str) – Name of the condition to base the simulation on.
- chrom (str, optional) – Name of the chromosome to simulate. Pass None to simulate all chromosomes in series.
- beta (float) – The effect size of the loop perturbations to use when simulating. Perturbed loops will be strengthened or weakened by this fraction of their original strength.
- p_diff (float or list of float) – Pass a single float to specify the probability that a loop will be perturbed across the simulated conditions. Pass a list of four floats to specify the probabilities of each of the four specific perturbations: up in A, down in A, up in B, and down in B. The remaining loops will be constitutive.
- skip_bias (bool) – Pass True to set all bias factors and size factors to 1, effectively simulating “unbiased” raw data.
- loop_pattern (str, optional) – File path pattern to sparse JSON formatted cluster files representing loop cluster locations for the simulation. Should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes. Pass None to use self.loop_patterns[cond].
- outdir (str) – Path to the directory in which to store the simulated data.
- n_threads (int) – The number of threads (technically GIL-avoiding child processes) to use to process multiple chromosomes in parallel. Pass -1 to use as many threads as there are CPUs. Pass 0 to process the chromosomes serially.
- verbose (bool) – Pass False to silence reporting of progress to stderr.
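To illustrate how beta and p_diff could interact, the sketch below assigns each loop one of the four perturbations (or leaves it constitutive), scales that loop's mean up or down by the fraction beta in the affected condition, and then draws counts. It rests on assumptions not stated above: a single-float p_diff is split evenly across the four perturbation types, counts follow a negative binomial with variance mean + dispersion * mean**2, and bias and size factors are omitted (as with skip_bias=True). The function names and label strings are hypothetical; this is not the library's code.

```python
import numpy as np

# Hypothetical label names for the four perturbations plus "no change".
CATEGORIES = ['upA', 'downA', 'upB', 'downB', 'constit']


def perturbation_probs(p_diff):
    """Expand p_diff into probabilities for the four perturbations + constit.

    Assumption: a single float is split evenly over the four perturbations.
    """
    if np.isscalar(p_diff):
        probs = [p_diff / 4.0] * 4
    else:
        probs = list(p_diff)
        if len(probs) != 4:
            raise ValueError('p_diff must be a float or a list of four floats')
    rest = 1.0 - sum(probs)
    if rest < 0:
        raise ValueError('perturbation probabilities sum to more than 1')
    return np.array(probs + [rest])


def perturb_means(loop_means, p_diff=0.4, beta=0.5, rng=None):
    """Assign a label to each loop and build per-condition loop means."""
    rng = np.random.default_rng() if rng is None else rng
    base = np.asarray(loop_means, dtype=float)
    labels = rng.choice(CATEGORIES, size=base.size, p=perturbation_probs(p_diff))
    mean_a, mean_b = base.copy(), base.copy()
    # Strengthen or weaken by the fraction beta of the original strength.
    mean_a[labels == 'upA'] *= 1 + beta
    mean_a[labels == 'downA'] *= 1 - beta
    mean_b[labels == 'upB'] *= 1 + beta
    mean_b[labels == 'downB'] *= 1 - beta
    return labels, mean_a, mean_b


def sample_nb(mean, dispersion, rng=None):
    """Draw counts with E[X] = mean and Var[X] = mean + dispersion * mean**2."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    n = 1.0 / np.asarray(dispersion, dtype=float)
    # NumPy's (n, p) parameterization: p = n / (n + mean) gives the mean above.
    return rng.negative_binomial(n, n / (n + mean))
```

For example, labels, mean_a, mean_b = perturb_means(fitted_means, p_diff=0.4, beta=0.5) followed by sample_nb(mean_a, dispersions) would yield one simulated replicate of condition A, where fitted_means and dispersions are placeholders for the previously fitted values.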
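The n_threads convention (-1 for one worker per CPU, 0 for serial processing, otherwise a fixed number of child processes) can be sketched with the standard library's concurrent.futures. This is a hypothetical helper, not hic3defdr's own dispatch code. Because the workers are separate processes, the per-chromosome function must be picklable, for example defined at module level.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def resolve_n_workers(n_threads):
    """Map the n_threads convention onto a worker count (0 means serial)."""
    if n_threads == -1:
        return os.cpu_count() or 1  # os.cpu_count() may return None
    if n_threads < 0:
        raise ValueError('n_threads must be -1, 0, or a positive integer')
    return n_threads


def map_over_chroms(fn, chroms, n_threads=-1):
    """Apply fn to each chromosome name, serially or in child processes."""
    n_workers = resolve_n_workers(n_threads)
    if n_workers == 0:
        # Serial: run everything in the current process.
        return [fn(chrom) for chrom in chroms]
    # Child processes sidestep the GIL; results come back in input order.
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        return list(executor.map(fn, chroms))
```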