hic3defdr.analysis.constructor module
-
class hic3defdr.analysis.constructor.HiC3DeFDR(raw_npz_patterns, bias_patterns, chroms, design, outdir, dist_thresh_min=4, dist_thresh_max=200, bias_thresh=0.1, mean_thresh=1.0, loop_patterns=None, res=None)
Bases: hic3defdr.analysis.core.CoreHiC3DeFDR, hic3defdr.analysis.analysis.AnalyzingHiC3DeFDR, hic3defdr.analysis.simulation.SimulatingHiC3DeFDR, hic3defdr.analysis.plotting.PlottingHiC3DeFDR
Main object for hic3defdr analysis.
-
raw_npz_patterns
File path patterns to scipy.sparse formatted NPZ files containing raw contact matrices for each replicate, in order. Each file path pattern should contain at least one ‘<chrom>’, which will be replaced with the chromosome name when loading data for specific chromosomes.
Type: list of str
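The placeholder substitution described above can be sketched in a few lines. This is only an illustration of the documented behavior, not the library's own loading code; the directory layout and the `expand_pattern` helper are hypothetical.

```python
# Hypothetical file layout; only the '<chrom>' placeholder comes from the docs.
raw_npz_patterns = [
    "data/rep1/<chrom>_raw.npz",
    "data/rep2/<chrom>_raw.npz",
]


def expand_pattern(pattern, chrom):
    """Substitute a chromosome name for every '<chrom>' in a path pattern."""
    if "<chrom>" not in pattern:
        raise ValueError("pattern must contain at least one '<chrom>'")
    return pattern.replace("<chrom>", chrom)


# One concrete path per replicate, in the same order as the patterns.
paths = [expand_pattern(p, "chr18") for p in raw_npz_patterns]
print(paths)
```

Keeping the patterns in replicate order matters because the documentation says they are read "for each replicate, in order".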
-
bias_patterns
File path patterns to np.savetxt() formatted files containing bias vector information for each replicate, in order. Each file path pattern should contain at least one ‘<chrom>’, which will be replaced with the chromosome name when loading data for specific chromosomes.
Type: list of str
-
chroms
List of chromosome names as strings. These names will be substituted for ‘<chrom>’ in the raw_npz_patterns and bias_patterns.
Type: list of str
-
design
Pass a DataFrame with boolean dtype whose rows correspond to replicates and whose columns correspond to conditions. Replicate and condition names will be inferred from the row and column labels, respectively. If you pass a string, the DataFrame will be loaded via pd.read_csv(design, index_col=0).
Type: pd.DataFrame or str
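As a rough illustration, a design table of the shape described above might be built as follows. The replicate and condition names are made up for the example, and the CSV round trip uses an in-memory buffer in place of a real file path.

```python
import io

import pandas as pd

# Hypothetical replicate (row) and condition (column) names.
design = pd.DataFrame(
    {
        "ES": [True, True, False, False],
        "NPC": [False, False, True, True],
    },
    index=["ES_1", "ES_3", "NPC_1", "NPC_2"],
    dtype=bool,
)

# Round-trip through CSV text using the documented read_csv call,
# with a StringIO buffer standing in for a file on disk.
csv_text = design.to_csv()
loaded = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(loaded)
```

Each row marks which condition a replicate belongs to, so the same table works whether it is passed directly or saved to CSV and passed as a path.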
-
outdir
Specify a directory to store the results of the analysis. Two different HiC3DeFDR analyses cannot co-exist in the same directory. The directory will be created if it does not exist.
Type: str
-
dist_thresh_min, dist_thresh_max
The minimum and maximum interaction distance (in bin units) to include in the analysis.
Type: int
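Because these thresholds are in bin units, the genomic distance they correspond to depends on the bin resolution. A small worked conversion, assuming a hypothetical 10 kb resolution and the default thresholds from the class signature:

```python
res = 10000  # hypothetical bin resolution in base pairs
dist_thresh_min, dist_thresh_max = 4, 200  # defaults from the class signature

# Convert the thresholds from bin units to base pairs.
min_bp = dist_thresh_min * res
max_bp = dist_thresh_max * res
print(f"{min_bp:,} bp to {max_bp:,} bp")
```

At this resolution the defaults span 40 kb to 2 Mb; at a different resolution the same bin thresholds cover a different genomic range.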
-
bias_thresh
Bins with a bias factor below this threshold or above its reciprocal in any replicate will be filtered out of the analysis.
Type: float
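The filtering rule above can be sketched with NumPy. This is a minimal illustration of the stated rule rather than the library's internal code; the bias values and the bins-by-replicates array layout are assumptions made for the example.

```python
import numpy as np

bias_thresh = 0.1  # default from the class signature

# Hypothetical bias factors: rows are bins, columns are replicates.
bias = np.array([
    [1.00, 0.95],
    [0.05, 1.10],  # below bias_thresh in the first replicate
    [0.80, 12.0],  # above 1 / bias_thresh in the second replicate
    [1.20, 0.90],
])

# A bin is dropped if any replicate falls outside [bias_thresh, 1 / bias_thresh].
bad = ((bias < bias_thresh) | (bias > 1.0 / bias_thresh)).any(axis=1)
keep = ~bad
print(keep)
```

With the default of 0.1, the accepted range of bias factors is 0.1 to 10.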
-
mean_thresh
Pixels with mean value below this threshold will be filtered out at the dispersion fitting stage.
Type: float
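A minimal sketch of this kind of filter is shown below. It assumes the mean is taken across replicates for each pixel, which the description above does not state explicitly; the values are made up.

```python
import numpy as np

mean_thresh = 1.0  # default from the class signature

# Hypothetical pixel values: rows are pixels, columns are replicates.
pixels = np.array([
    [0.0, 1.0, 0.5, 0.0],
    [3.0, 2.0, 4.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
])

# Assumption: "mean value" is the per-pixel mean across replicates.
pixel_means = pixels.mean(axis=1)

# Pixels strictly below the threshold are excluded from dispersion fitting.
fit_mask = pixel_means >= mean_thresh
print(fit_mask)
```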
-
loop_patterns
Keys should be condition names as strings; values should be file path patterns to sparse JSON formatted cluster files representing called loops in that condition. Each file path pattern should contain at least one ‘<chrom>’, which will be replaced with the chromosome name when loading data for specific chromosomes.
Type: dict of str, optional
-
res
The bin resolution, in base pair units, of the input contact matrix data. Used only when printing TSV output. Pass None to skip printing TSV output during the threshold() and classify() steps.
Type: int, optional
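Putting the parameters together, the constructor arguments might be assembled as in the sketch below. Every path, replicate name, and condition name here is a hypothetical placeholder. The constructor call itself is left as a comment because it requires the package to be installed and real input files.

```python
# All names and paths below are hypothetical placeholders.
repl_names = ["ES_1", "ES_3", "NPC_1", "NPC_2"]
cond_names = ["ES", "NPC"]

kwargs = dict(
    raw_npz_patterns=[f"data/{r}/<chrom>_raw.npz" for r in repl_names],
    bias_patterns=[f"data/{r}/<chrom>_bias.txt" for r in repl_names],
    chroms=["chr18", "chr19"],
    design="data/design.csv",  # loaded via pd.read_csv(design, index_col=0)
    outdir="output",
    loop_patterns={
        c: f"data/clusters/{c}_<chrom>_clusters.json" for c in cond_names
    },
    res=10000,  # set so that TSV output is printed by threshold() and classify()
)

# With the package installed, the object would be created along these lines:
# from hic3defdr.analysis.constructor import HiC3DeFDR
# h = HiC3DeFDR(**kwargs)
print(sorted(kwargs))
```

The remaining parameters (dist_thresh_min, dist_thresh_max, bias_thresh, mean_thresh) are omitted here so that they keep the defaults shown in the class signature.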
-