hic3defdr.analysis.simulation module

class hic3defdr.analysis.simulation.SimulatingHiC3DeFDR

Bases: object

Mixin class containing simulation functions for HiC3DeFDR.

evaluate(cluster_pattern, label_pattern, min_dist=None, max_dist=None, rerun_bh=False, outfile=None)

Evaluates the results of this analysis by comparing them to true labels.

Parameters:
  • cluster_pattern (str) – File path pattern to sparse JSON formatted cluster files representing loop cluster locations. Should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes. Pass a condition name to use self.loop_patterns[cluster_pattern] instead.
  • label_pattern (str) – File path pattern to true label files for each chromosome. Should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes. Files should be loadable with np.loadtxt(..., dtype='U7') to yield a vector of true labels parallel to the clusters pointed to by cluster_pattern.
  • min_dist, max_dist (optional) – Specify minimum and maximum distances to evaluate performance within, respectively. Pass None to leave one or both ends unbounded.
  • rerun_bh (bool) – If min_dist and/or max_dist are used to constrain the distances, pass True to re-run BH-FDR on the subset of p-values at the selected distances. Pass False to use the original dataset-wide q-values. Does nothing if min_dist and max_dist are both None.
  • outfile (str, optional) – Name of a file to save the evaluation results to inside this object’s outdir. Default is ‘eval.npz’ if min_dist and max_dist are both None, otherwise it is ‘eval_<min_dist>_<max_dist>.npz’.
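
The effect of rerun_bh can be illustrated with a small standalone sketch. This is not the library's implementation: bh_adjust and qvalues_in_range are hypothetical helpers written for this example, the distance bounds are assumed to be inclusive, and the inputs are plain arrays of p-values and interaction distances rather than HiC3DeFDR's on-disk data.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1D array."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0, 1)
    return q


def qvalues_in_range(pvalues, dists, min_dist=None, max_dist=None,
                     rerun_bh=False):
    """Hypothetical helper: q-values restricted to a distance window.

    Assumption: both bounds are inclusive; None leaves an end unbounded.
    Returns the q-values for the kept entries and the boolean keep mask.
    """
    p = np.asarray(pvalues, dtype=float)
    d = np.asarray(dists)
    keep = np.ones(p.size, dtype=bool)
    if min_dist is not None:
        keep &= d >= min_dist
    if max_dist is not None:
        keep &= d <= max_dist
    if rerun_bh:
        # Correct only for the tests that fall inside the window.
        return bh_adjust(p[keep]), keep
    # Keep the dataset-wide correction and simply subset it.
    return bh_adjust(p)[keep], keep


pvals = [0.01, 0.04, 0.03, 0.5]
dists = [1, 2, 3, 4]
q_global, mask = qvalues_in_range(pvals, dists, min_dist=2, max_dist=3)
q_local, _ = qvalues_in_range(pvals, dists, min_dist=2, max_dist=3,
                              rerun_bh=True)
```

Because the re-run corrects for fewer tests, q_local is never larger than q_global for the same entries in this sketch. With both bounds left as None the two modes coincide, which matches the note that rerun_bh does nothing in that case.
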

simulate(cond, chrom=None, beta=0.5, p_diff=0.4, skip_bias=False, loop_pattern=None, outdir='sim', n_threads=-1, verbose=True)

Simulates raw contact matrices based on previously fitted scaled means and dispersions in a specific condition.

Can only be run after estimate_dispersions() has been run.
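
As a rough illustration of what simulating raw counts from fitted means and dispersions can look like, the sketch below draws counts from a negative binomial distribution. The choice of distribution, the way bias and size factors multiply the scaled mean, and the function simulate_counts itself are assumptions made for this example; they are not taken from the library's source. The skip_bias flag mirrors the option listed under Parameters.

```python
import numpy as np


def simulate_counts(scaled_mean, disp, row_bias, col_bias, size_factor=1.0,
                    skip_bias=False, seed=None):
    """Hypothetical sketch: draw raw counts for a set of pixels.

    Assumption: the raw mean is scaled_mean * row_bias * col_bias *
    size_factor and counts follow a negative binomial distribution with
    variance mean + disp * mean**2.
    """
    rng = np.random.default_rng(seed)
    scaled_mean = np.asarray(scaled_mean, dtype=float)
    disp = np.asarray(disp, dtype=float)
    if skip_bias:
        # "Unbiased" data: every bias factor and size factor is 1.
        row_bias = np.ones_like(scaled_mean)
        col_bias = np.ones_like(scaled_mean)
        size_factor = 1.0
    mean = scaled_mean * row_bias * col_bias * size_factor
    # Convert (mean, dispersion) to NumPy's (n, p) parameterization.
    n = 1.0 / disp
    p = n / (n + mean)
    return rng.negative_binomial(n, p)


means = np.full(5, 20.0)
disps = np.full(5, 0.1)
biased = simulate_counts(means, disps, row_bias=2.0, col_bias=0.5,
                         size_factor=3.0, seed=1)
unbiased = simulate_counts(means, disps, row_bias=2.0, col_bias=0.5,
                           size_factor=3.0, skip_bias=True, seed=1)
```

With skip_bias=True the supplied bias and size factors are ignored, so the expected count for each pixel equals its scaled mean.
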

Parameters:
  • cond (str) – Name of the condition to base the simulation on.
  • chrom (str, optional) – Name of the chromosome to simulate. Pass None to simulate all chromosomes in series.
  • beta (float) – The effect size of the loop perturbations to use when simulating. Perturbed loops will be strengthened or weakened by this fraction of their original strength.
  • p_diff (float or list of float) – Pass a single float to specify the probability that a loop will be perturbed across the simulated conditions. Pass four floats to specify the probabilities of all four specific perturbations: up in A, down in A, up in B, down in B. The remaining loops will be constitutive.
  • skip_bias (bool) – Pass True to set all bias factors and size factors to 1, effectively simulating “unbiased” raw data.
  • loop_pattern (str, optional) – File path pattern to sparse JSON formatted cluster files representing loop cluster locations for the simulation. Should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes. Pass None to use self.loop_patterns[cond].
  • outdir (str) – Path to a directory to store the simulated data to.
  • n_threads (int) – The number of threads (technically GIL-avoiding child processes) to use to process multiple chromosomes in parallel. Pass -1 to use as many threads as there are CPUs. Pass 0 to process the chromosomes serially.
  • verbose (bool) – Pass False to silence reporting of progress to stderr.
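
The interplay of beta and p_diff can be sketched in isolation. Several things here are assumptions rather than documented behavior: that a single p_diff is split evenly across the four perturbation classes, that "up in A" means the loop is strengthened in condition A while B is left unchanged, and the label strings themselves, which are placeholders chosen to fit within seven characters. expand_p_diff and perturb are hypothetical helpers, not part of HiC3DeFDR.

```python
import numpy as np

# Placeholder label strings (assumption), each at most 7 characters long.
LABELS = np.array(['constit', 'up A', 'down A', 'up B', 'down B'])


def expand_p_diff(p_diff):
    """Return the four perturbation probabilities as an array."""
    probs = np.atleast_1d(np.asarray(p_diff, dtype=float))
    if probs.size == 1:
        # Assumption: one probability is shared equally by the four classes.
        probs = np.full(4, probs[0] / 4)
    if probs.size != 4 or probs.min() < 0 or probs.sum() > 1 + 1e-12:
        raise ValueError('p_diff must be one float or four floats summing '
                         'to at most 1')
    return probs


def perturb(means, beta=0.5, p_diff=0.4, seed=None):
    """Assign a label to each loop and perturb its mean in A or B."""
    probs = expand_p_diff(p_diff)
    rng = np.random.default_rng(seed)
    # Whatever probability is left over goes to constitutive loops.
    full = np.concatenate([[max(0.0, 1 - probs.sum())], probs])
    labels = rng.choice(LABELS, size=len(means), p=full / full.sum())
    mean_a = np.asarray(means, dtype=float).copy()
    mean_b = mean_a.copy()
    # Strengthen or weaken by the fraction beta of the original strength.
    mean_a[labels == 'up A'] *= 1 + beta
    mean_a[labels == 'down A'] *= 1 - beta
    mean_b[labels == 'up B'] *= 1 + beta
    mean_b[labels == 'down B'] *= 1 - beta
    return mean_a, mean_b, labels


mean_a, mean_b, labels = perturb(np.full(1000, 10.0), beta=0.5, p_diff=0.4,
                                 seed=0)
```

Under these assumptions, beta=0.5 turns a loop of strength 10 into 15 when strengthened and 5 when weakened, and roughly 60% of loops stay constitutive when p_diff=0.4. A label vector produced this way could be written with np.savetxt(..., fmt='%s') and read back with np.loadtxt(..., dtype='U7', delimiter='\n' is not needed when each line holds one label without tabs), though the exact file layout the library expects is not specified here.
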