hic3defdr.analysis.constructor module

class hic3defdr.analysis.constructor.HiC3DeFDR(raw_npz_patterns, bias_patterns, chroms, design, outdir, dist_thresh_min=4, dist_thresh_max=200, bias_thresh=0.1, mean_thresh=1.0, loop_patterns=None, res=None)

Bases: hic3defdr.analysis.core.CoreHiC3DeFDR, hic3defdr.analysis.analysis.AnalyzingHiC3DeFDR, hic3defdr.analysis.simulation.SimulatingHiC3DeFDR, hic3defdr.analysis.plotting.PlottingHiC3DeFDR

Main object for hic3defdr analysis.

raw_npz_patterns

File path patterns to scipy.sparse formatted NPZ files containing raw contact matrices for each replicate, in order. Each file path pattern should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes.

Type: list of str
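The '<chrom>' substitution described above can be pictured as plain string replacement. The following is a minimal sketch, not the library's own loading code; the helper `expand_pattern` and the file paths are hypothetical and exist only for illustration.

```python
# Hypothetical file path patterns, one per replicate, in order.
# These paths are illustrative and do not refer to real files.
raw_npz_patterns = [
    'data/rep1/<chrom>_raw.npz',
    'data/rep2/<chrom>_raw.npz',
]
chroms = ['chr1', 'chr2']


def expand_pattern(pattern, chrom):
    """Replace every '<chrom>' placeholder with the chromosome name.

    Hypothetical helper; the library may perform this step differently.
    """
    return pattern.replace('<chrom>', chrom)


# One concrete path per replicate for each chromosome.
paths = {
    chrom: [expand_pattern(p, chrom) for p in raw_npz_patterns]
    for chrom in chroms
}
print(paths['chr1'])
```

Because the replicate order of the patterns is preserved, the expanded paths for each chromosome line up with the rows of the design table.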
bias_patterns

File path patterns to np.savetxt() formatted files containing bias vector information for each replicate, in order. Each file path pattern should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes.

Type: list of str
chroms

List of chromosome names as strings. These names will be substituted for ‘<chrom>’ in the raw_npz_patterns and bias_patterns.

Type: list of str
design

Pass a DataFrame with boolean dtype whose rows correspond to replicates and whose columns correspond to conditions. Replicate and condition names will be inferred from the row and column labels, respectively. If you pass a string, the DataFrame will be loaded via pd.read_csv(design, index_col=0).

Type: pd.DataFrame or str
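To make the expected layout of the design table concrete, here is a minimal sketch of a design CSV with replicates as rows and conditions as columns. The replicate and condition names are hypothetical. The sketch parses the file with the standard-library csv module only to stay dependency-free; per the description above, the library itself would read such a file with pd.read_csv(design, index_col=0).

```python
import csv
import io

# Hypothetical design table: the first column holds replicate names (row
# labels), the header row holds condition names (column labels), and each
# cell says whether that replicate belongs to that condition.
design_csv = """,ES,NPC
ES_1,True,False
ES_3,True,False
NPC_1,False,True
NPC_2,False,True
"""

reader = csv.reader(io.StringIO(design_csv))
header = next(reader)
conditions = header[1:]  # column labels give the condition names

design = {}
for row in reader:
    rep, flags = row[0], row[1:]
    design[rep] = {c: f == 'True' for c, f in zip(conditions, flags)}

replicates = list(design)  # row labels give the replicate names
reps_per_condition = {
    c: [r for r in replicates if design[r][c]] for c in conditions
}
print(reps_per_condition)
```

The row order matters: it should match the replicate order used in raw_npz_patterns and bias_patterns.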
outdir

Specify a directory to store the results of the analysis. Two different HiC3DeFDR analyses cannot co-exist in the same directory. The directory will be created if it does not exist.

Type: str
dist_thresh_min, dist_thresh_max

The minimum and maximum interaction distance (in bin units) to include in the analysis.

Type: int
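Since the distance thresholds are given in bin units, their meaning in base pairs depends on the bin resolution. The sketch below uses the default thresholds from the signature with a hypothetical 10 kb resolution. Treating both bounds as inclusive is an assumption, and `in_distance_range` is an illustrative helper rather than part of the library.

```python
# Defaults from the class signature.
dist_thresh_min = 4
dist_thresh_max = 200

# Hypothetical bin resolution in base pairs (assumed for illustration).
res = 10000


def in_distance_range(i, j):
    """Return True if the pixel at bin indices (i, j) is within range.

    Assumes both thresholds are inclusive; the library may differ.
    """
    return dist_thresh_min <= abs(i - j) <= dist_thresh_max


# The same thresholds expressed in base pairs at this resolution.
min_bp = dist_thresh_min * res
max_bp = dist_thresh_max * res
print(min_bp, max_bp)
```

At this assumed resolution, the defaults correspond to interactions between 40 kb and 2 Mb.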
bias_thresh

Bins with a bias factor below this threshold or above its reciprocal in any replicate will be filtered out of the analysis.

Type: float
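The bias filter described above keeps a bin only if its bias factor lies between bias_thresh and 1 / bias_thresh in every replicate. The following is a minimal sketch of that rule in plain Python; the bias values are made up and `bin_passes` is a hypothetical helper, not the library's implementation.

```python
bias_thresh = 0.1  # default from the class signature

# Hypothetical bias vectors: one list per replicate, one value per bin.
bias = [
    [1.0, 0.05, 0.9, 12.0, 1.1],  # replicate 1
    [0.8, 0.95, 1.2, 1.0, 1.0],   # replicate 2
]


def bin_passes(values, thresh):
    """Return True if no replicate's bias is below thresh or above 1/thresh."""
    return all(thresh <= v <= 1.0 / thresh for v in values)


n_bins = len(bias[0])
kept_bins = [
    b for b in range(n_bins)
    if bin_passes([rep[b] for rep in bias], bias_thresh)
]
print(kept_bins)  # prints [0, 2, 4]
```

Bin 1 is dropped because its bias is below 0.1 in the first replicate, and bin 3 is dropped because its bias exceeds 10 in the first replicate, even though both look fine in the second.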
mean_thresh

Pixels with mean value below this threshold will be filtered out at the dispersion fitting stage.

Type: float
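As a rough illustration of the mean filter, the sketch below drops pixels whose mean falls below mean_thresh. The description above does not say which values are averaged or across which replicates, so averaging across all replicates is an assumption here, and the pixel values are made up.

```python
mean_thresh = 1.0  # default from the class signature

# Hypothetical pixel values: each key is a (row, col) bin pair and each
# list holds that pixel's value in every replicate.
pixels = {
    (0, 5): [0.0, 1.0, 0.5],
    (0, 6): [2.0, 3.0, 1.0],
    (1, 7): [1.0, 1.0, 1.0],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Keep pixels whose mean is not below the threshold (assumed to be the
# mean across all replicates).
kept_pixels = [p for p, vals in pixels.items() if mean(vals) >= mean_thresh]
print(kept_pixels)  # prints [(0, 6), (1, 7)]
```

A pixel whose mean equals the threshold exactly is kept in this sketch, since only pixels below the threshold are filtered.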
loop_patterns

Keys should be condition names as strings; values should be file path patterns to sparse JSON formatted cluster files representing called loops in that condition. Each file path pattern should contain at least one ‘<chrom>’ which will be replaced with the chromosome name when loading data for specific chromosomes.

Type: dict of str, optional
res

The bin resolution, in base pair units, of the input contact matrix data. Used only when printing TSV output. Pass None to skip printing TSV output during the threshold() and classify() steps.

Type: int, optional