hic3defdr.util.clusters module
-
class hic3defdr.util.clusters.DirectedDisjointSet
Bases: object
Based on https://stackoverflow.com/a/3067672 but supporting directed edges.
The overall effect is like a directed sparse graph: DDS.add(a, b) is like adding an edge from a to b. a gets marked as a source, b does not (anything not in the set DDS.sources is assumed to be a destination). If b is in an existing group but isn’t also the source of any other edge, then the groups won’t be merged. Finally, the groups returned by DDS.get_groups() will be filtered to include only source nodes.
This is an “improved” or “streamlined” version where destination nodes are not stored anywhere if they haven’t previously been seen as a source.
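The behavior described above can be sketched in a few lines. This is a hypothetical re-implementation written only from the description; the class name `SketchDirectedDisjointSet` and its internal attributes `leader` and `group` are assumptions, and the library's actual code may differ.

```python
class SketchDirectedDisjointSet:
    """Illustrative only; NOT the hic3defdr implementation."""

    def __init__(self):
        self.sources = set()  # nodes seen as the origin of an edge
        self.leader = {}      # node -> representative of its group (assumed name)
        self.group = {}       # representative -> set of members (assumed name)

    def add(self, a, b):
        # a is always recorded as a source and gets a group of its own.
        self.sources.add(a)
        if a not in self.leader:
            self.leader[a] = a
            self.group[a] = {a}
        # A destination that was never seen as a source is not stored,
        # so it cannot cause two groups to merge.
        if b not in self.sources:
            return
        la, lb = self.leader[a], self.leader[b]
        if la == lb:
            return
        # Merge the smaller group into the larger one.
        if len(self.group[la]) < len(self.group[lb]):
            la, lb = lb, la
        self.group[la] |= self.group[lb]
        for node in self.group[lb]:
            self.leader[node] = la
        del self.group[lb]

    def get_groups(self):
        # Only source nodes are reported.
        return [g & self.sources for g in self.group.values()]


dds = SketchDirectedDisjointSet()
dds.add(1, 2)  # 2 is only a destination so far
dds.add(3, 2)  # 1 and 3 share a destination but are not merged
before = sorted(sorted(g) for g in dds.get_groups())
dds.add(2, 4)  # 2 becomes a source
dds.add(1, 2)  # now the edge 1 -> 2 merges the groups of 1 and 2
after = sorted(sorted(g) for g in dds.get_groups())
```

In this sketch, two sources that point at the same pure destination stay in separate groups, which is the main difference from an undirected disjoint set.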
-
class hic3defdr.util.clusters.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: json.encoder.JSONEncoder
Pass this as the cls argument to json.dump() to correctly serialize numpy values.
Credit: https://stackoverflow.com/a/27050186
-
default(obj)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
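To illustrate how a custom encoder is passed to the json module, here is a minimal sketch. `ToListEncoder` is a hypothetical stand-in, not the library's NumpyEncoder: it relies on the `tolist()` method that numpy arrays and numpy scalars expose, and uses the standard library's `array.array` (which also has `tolist()`) so the example runs without numpy installed.

```python
import array
import json


class ToListEncoder(json.JSONEncoder):
    """Hypothetical stand-in for illustration; not hic3defdr's NumpyEncoder."""

    def default(self, o):
        # Objects exposing tolist() are converted to plain Python values.
        if hasattr(o, "tolist"):
            return o.tolist()
        # Anything else: defer to the base class, which raises TypeError.
        return super().default(o)


values = array.array("d", [1.0, 2.5])
encoded = json.dumps({"values": values}, cls=ToListEncoder)
print(encoded)  # {"values": [1.0, 2.5]}
```

Without the `cls` argument, `json.dumps` raises a TypeError for the same input, because the default encoder does not know how to serialize the array.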
-
-
hic3defdr.util.clusters.cluster_from_string(cluster_string)
If a cluster gets converted to a string (e.g., when the cluster is written to a text file), this function allows you to recover the cluster as a normal Python object (a list of pairs of integers).
Parameters: cluster_string (str) – The string representation of the cluster.
Returns: The inner lists are pairs of integers specifying the row and column indices of the pixels in the cluster.
Return type: list of list of int
Examples
>>> from hic3defdr.util.clusters import cluster_from_string
>>> cluster = [(4, 5), (3, 4), (3, 5), (3, 6)]
>>> cluster_string = str(cluster)
>>> cluster_string
'[(4, 5), (3, 4), (3, 5), (3, 6)]'
>>> cluster_from_string(cluster_string)
[[4, 5], [3, 4], [3, 5], [3, 6]]
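One way to perform this kind of recovery with only the standard library is `ast.literal_eval`. The sketch below is an assumption about how such a parser could work, not the library's implementation; the name `cluster_from_string_sketch` is hypothetical.

```python
import ast


def cluster_from_string_sketch(cluster_string):
    """Parse the str() form of a cluster into a list of [row, col] lists."""
    # literal_eval only evaluates Python literals, so it is safe on text input.
    pairs = ast.literal_eval(cluster_string)
    return [[int(i), int(j)] for i, j in pairs]


recovered = cluster_from_string_sketch(str([(4, 5), (3, 4), (3, 5), (3, 6)]))
print(recovered)  # [[4, 5], [3, 4], [3, 5], [3, 6]]
```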
-
hic3defdr.util.clusters.cluster_to_loop_id(cluster, chrom, resolution)
Makes a cluster into a loop id of the form “chr:start-end_chr:start-end”.
This is a copy of hiclite.util.clusters.make_loop_id_for_cluster().
Parameters:
- cluster (set of tuple of int) – The tuples should be (row_index, col_index) tuples specifying which entries of the chromosomal contact matrix belong to this cluster.
- chrom (str) – The chromosome name, e.g. ‘chr21’.
- resolution (int) – The resolution of the contact matrix referred to by cluster.
Returns: The loop id, a string of the form “chr:start-end_chr:start-end”.
Return type: str
Examples
>>> from hic3defdr.util.clusters import cluster_to_loop_id
>>> cluster = [(4, 5), (3, 4), (3, 5), (3, 6)]
>>> cluster_to_loop_id(cluster, 'chrX', 10000)
'chrX:30000-50000_chrX:40000-70000'
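The documented example is consistent with taking the bounding box of the cluster in bin units and converting it to base pairs, with the end coordinate placed at the far edge of the last bin. The sketch below assumes exactly that rule; `loop_id_sketch` is a hypothetical name and the library's code may handle edge cases differently.

```python
def loop_id_sketch(cluster, chrom, resolution):
    """Build a 'chr:start-end_chr:start-end' id from a cluster's bounding box."""
    rows = [i for i, _ in cluster]
    cols = [j for _, j in cluster]
    # Start is the left edge of the first bin, end the right edge of the last.
    row_start = min(rows) * resolution
    row_end = (max(rows) + 1) * resolution
    col_start = min(cols) * resolution
    col_end = (max(cols) + 1) * resolution
    return f"{chrom}:{row_start}-{row_end}_{chrom}:{col_start}-{col_end}"


loop_id = loop_id_sketch([(4, 5), (3, 4), (3, 5), (3, 6)], "chrX", 10000)
print(loop_id)  # chrX:30000-50000_chrX:40000-70000
```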
-
hic3defdr.util.clusters.cluster_to_slices(cluster, width=21)
Computes a square row and column slice of a specified width centered on a given cluster.
Parameters:
- cluster (list of tuple) – A list of (i, j) tuples marking the position of significant points which belong to the cluster.
- width (int) – Should be odd. Specifies the side length of the square slice.
Returns: The row and column slice, respectively.
Return type: slice, slice
Examples
>>> from hic3defdr.util.clusters import cluster_to_slices
>>> cluster = [(4, 5), (3, 4), (3, 5), (3, 6)]
>>> width = 5
>>> slices = cluster_to_slices(cluster, width=width)
>>> slices
(slice(1, 6, None), slice(3, 8, None))
>>> slices[0].stop - slices[0].start == width
True
>>> slices[1].stop - slices[1].start == width
True
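A centered square window can be sketched as follows. The centering rule used here (round the mean row and mean column of the cluster) is an assumption that happens to reproduce the documented example; the library may define the center differently, and this sketch does not clip windows that extend past the matrix edge.

```python
def slices_sketch(cluster, width=21):
    """Return (row_slice, col_slice) of side length `width` around a cluster."""
    half = width // 2  # width is expected to be odd
    # Assumed centering rule: rounded centroid of the cluster's pixels.
    row_center = int(round(sum(i for i, _ in cluster) / len(cluster)))
    col_center = int(round(sum(j for _, j in cluster) / len(cluster)))
    return (
        slice(row_center - half, row_center + half + 1),
        slice(col_center - half, col_center + half + 1),
    )


row_slice, col_slice = slices_sketch([(4, 5), (3, 4), (3, 5), (3, 6)], width=5)
print(row_slice, col_slice)  # slice(1, 6, None) slice(3, 8, None)
```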
-
hic3defdr.util.clusters.clusters_to_coo(clusters, shape)
Converts clusters (list of list of tuple) to a COO sparse matrix.
Parameters:
- clusters (list of list of tuple) – The outer list is a list of clusters. Each cluster is a list of (i, j) tuples marking the position of significant points which belong to that cluster.
- shape (tuple) – The shape with which to construct the COO matrix.
Returns: The sparse matrix of significant points.
Return type: scipy.sparse.coo_matrix
Examples
>>> from hic3defdr.util.clusters import clusters_to_coo
>>> coo = clusters_to_coo([[(1, 2), (1, 1)], [(4, 4), (3, 4)]], (5, 5))
>>> coo.toarray()
array([[False, False, False, False, False],
       [False,  True,  True, False, False],
       [False, False, False, False, False],
       [False, False, False, False,  True],
       [False, False, False, False,  True]])
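The COO format stores a matrix as three parallel sequences: row indices, column indices, and values. The dependency-free sketch below builds those triplets from clusters and expands them to a dense grid to show what the sparse matrix represents. Both helper names are hypothetical; with SciPy available, the same triplets could be passed as `scipy.sparse.coo_matrix((data, (rows, cols)), shape=shape)`.

```python
def clusters_to_triplets(clusters):
    """Flatten clusters into COO-style (rows, cols, data) lists."""
    rows, cols, data = [], [], []
    for cluster in clusters:
        for i, j in cluster:
            rows.append(i)
            cols.append(j)
            data.append(True)  # every listed pixel is a significant point
    return rows, cols, data


def triplets_to_dense(rows, cols, data, shape):
    """Expand COO-style triplets into a dense list-of-lists grid."""
    dense = [[False] * shape[1] for _ in range(shape[0])]
    for i, j, value in zip(rows, cols, data):
        dense[i][j] = value
    return dense


rows, cols, data = clusters_to_triplets([[(1, 2), (1, 1)], [(4, 4), (3, 4)]])
dense = triplets_to_dense(rows, cols, data, (5, 5))
```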
-
hic3defdr.util.clusters.clusters_to_pixel_set(clusters)
Converts a list of clusters to a set of pixels.
This function has no callers and is usually used as a one-liner.
Parameters: clusters (list of list of tuple) – The outer list is a list of clusters. Each cluster is a list of (i, j) tuples marking the position of significant points which belong to that cluster.
Returns: Each tuple is of the form (i, j) and marks the position of a significant point in the clustering.
Return type: set of tuple
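As a sketch of the kind of one-liner mentioned above (not necessarily the expression the library uses), a set comprehension flattens the clusters into a single set of pixels:

```python
clusters = [[(1, 2), (1, 1)], [(4, 4), (3, 4)]]

# Flatten every cluster into one set; tuple() makes list-style pixels hashable.
pixel_set = {tuple(pixel) for cluster in clusters for pixel in cluster}
```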
-
hic3defdr.util.clusters.convert_cluster_array_to_sparse(cluster_array)
Converts an array of cluster information to a sparse, JSON-friendly format.
Parameters: cluster_array (np.ndarray or scipy.sparse.spmatrix) – Square, triangular, int dtype. Entries should be the cluster id for points which belong to that cluster, zero everywhere else.
Returns: The sets are clusters, tuples are the matrix indices of the pixels in that cluster.
Return type: list of sets of tuples of int
Notes
Since the introduction of hiclite.util.clusters.find_clusters(), this function is no longer used.
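The conversion from a labeled array to a list of pixel sets can be sketched without numpy by using a nested list as a stand-in for the array. The function name is hypothetical, and returning the clusters in order of ascending cluster id is an assumption.

```python
from collections import defaultdict


def cluster_array_to_sparse_sketch(cluster_array):
    """Group the indices of nonzero entries by their cluster id."""
    members = defaultdict(set)
    for i, row in enumerate(cluster_array):
        for j, cluster_id in enumerate(row):
            if cluster_id != 0:  # zero means "not in any cluster"
                members[cluster_id].add((i, j))
    # Assumed ordering: ascending cluster id.
    return [members[cluster_id] for cluster_id in sorted(members)]


labels = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
]
sparse = cluster_array_to_sparse_sketch(labels)
```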
-
hic3defdr.util.clusters.filter_clusters_by_distance(clusters, min_dist, max_dist)
Filters a list of clusters by distance.
Parameters:
- clusters (list of list of tuple) – The outer list is a list of clusters. Each cluster is a list of (i, j) tuples marking the position of significant points which belong to that cluster.
- min_dist, max_dist – Specify a range of distances in bin units to filter by (inclusive). If either min_dist or max_dist is None, the distance bin will be considered unbounded on that end.
Returns: The clusters that are within the distance range requested.
Return type: list of list of tuple
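The inclusive, optionally unbounded range check can be sketched as below. How the library measures the distance of a cluster is not stated here, so `cluster_distance` is a hypothetical definition (the offset of the cluster's centroid from the diagonal, in bins) chosen only to make the example concrete.

```python
def cluster_distance(cluster):
    """ASSUMED definition: |mean column - mean row| of the cluster, in bins."""
    mean_row = sum(i for i, _ in cluster) / len(cluster)
    mean_col = sum(j for _, j in cluster) / len(cluster)
    return abs(mean_col - mean_row)


def filter_by_distance_sketch(clusters, min_dist, max_dist):
    """Keep clusters whose distance lies in [min_dist, max_dist]."""
    kept = []
    for cluster in clusters:
        dist = cluster_distance(cluster)
        # None on either side means that end of the range is unbounded.
        if min_dist is not None and dist < min_dist:
            continue
        if max_dist is not None and dist > max_dist:
            continue
        kept.append(cluster)
    return kept


clusters = [[(1, 2), (1, 3)], [(0, 10)], [(5, 30), (6, 30)]]
middle = filter_by_distance_sketch(clusters, 2, 20)
short = filter_by_distance_sketch(clusters, None, 10)
long_range = filter_by_distance_sketch(clusters, 10, None)
```

Under the assumed definition the three clusters have distances 1.5, 10, and 24.5 bins, so the cluster at exactly 10 bins is kept by both the `max_dist=10` and `min_dist=10` filters.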
-
hic3defdr.util.clusters.find_clusters(sig_points, connectivity=1)
Finds clusters of adjacent True points in a boolean matrix.
Parameters:
- sig_points (scipy.sparse.spmatrix or np.ndarray) – A boolean matrix indicating which points are significant.
- connectivity (int) – The connectivity to use when clustering.
Returns: The clusters.
Return type: list of set of tuple of int
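Clustering adjacent True points is a connected-component search. The sketch below uses a breadth-first flood fill over a nested list. It assumes that connectivity 1 means the four orthogonal neighbors and connectivity 2 adds the diagonals (the convention used by scipy.ndimage); whether the library follows that convention is an assumption, and `find_clusters_sketch` is a hypothetical name.

```python
from collections import deque


def find_clusters_sketch(sig_points, connectivity=1):
    """Return connected components of True entries as sets of (row, col)."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity >= 2:  # assumed: 2 also links diagonal neighbors
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    n_rows = len(sig_points)
    n_cols = len(sig_points[0]) if n_rows else 0
    seen = set()
    clusters = []
    for i in range(n_rows):
        for j in range(n_cols):
            if not sig_points[i][j] or (i, j) in seen:
                continue
            # Flood fill from this unvisited significant point.
            cluster = set()
            queue = deque([(i, j)])
            seen.add((i, j))
            while queue:
                r, c = queue.popleft()
                cluster.add((r, c))
                for dr, dc in offsets:
                    nr, nc = r + dr, c + dc
                    in_bounds = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if in_bounds and sig_points[nr][nc] and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            clusters.append(cluster)
    return clusters


grid = [
    [False, True, True, False],
    [False, False, False, False],
    [False, False, False, True],
    [False, False, True, False],
]
orthogonal = find_clusters_sketch(grid, connectivity=1)
with_diagonals = find_clusters_sketch(grid, connectivity=2)
```

In this example the points (2, 3) and (3, 2) touch only diagonally, so they form two clusters under connectivity 1 and one cluster under connectivity 2.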
-
hic3defdr.util.clusters.hiccups_to_clusters(hiccups_txt, resolution)
Loads HiCCUPS-format loop calls as clusters, approximating each loop as a cluster with just one pixel.
Parameters:
- hiccups_txt (str) – The HiCCUPS-format loop call file to load.
- resolution (int) – The resolution to use for the clusters.
Returns: The keys of the dict are chromosome names as strings. The values are lists of clusters on that chromosome. Each cluster is a list of [x, y] pairs representing the row and column indices of the pixels in that cluster.
Return type: dict of list of clusters
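The one-pixel approximation can be sketched as follows. Several things here are assumptions: that the file is tab-separated with the first six columns being chr1, x1, x2, chr2, y1, y2; that the pixel is derived from the start coordinates x1 and y1; and that loops are keyed by the first chromosome column. The sketch also takes an open file handle rather than a filename so that it can run on an in-memory example.

```python
import io
from collections import defaultdict


def hiccups_to_clusters_sketch(handle, resolution):
    """Read loop calls and return {chrom: [one-pixel clusters]}."""
    clusters = defaultdict(list)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip header, comment, and malformed lines (non-numeric x1).
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        chrom = fields[0]
        x1, y1 = int(fields[1]), int(fields[4])  # assumed column positions
        # Each loop becomes a cluster holding a single [row, col] pixel.
        clusters[chrom].append([[x1 // resolution, y1 // resolution]])
    return dict(clusters)


example = io.StringIO(
    "chr1\tx1\tx2\tchr2\ty1\ty2\n"
    "chr1\t100000\t110000\tchr1\t500000\t510000\n"
    "chr2\t30000\t40000\tchr2\t90000\t100000\n"
)
loops = hiccups_to_clusters_sketch(example, 10000)
```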
-
hic3defdr.util.clusters.load_clusters(infile)
Loads clusters in a sparse format from a JSON file.
Parameters: infile (str) – The JSON file containing sparse cluster information.
Returns: The sets are clusters, the tuples are the indices of entries in that cluster.
Return type: list of set of tuple of int
-
hic3defdr.util.clusters.save_clusters(clusters, outfile)
Saves cluster information to disk in sparse JSON format.
Parameters:
- clusters (np.ndarray or list of set of tuple of int) – If an np.ndarray is passed, it should be square and triangular and have int dtype. Entries should be the cluster id for points which belong to that cluster, zero everywhere else. If a list of sets is passed, the sets are clusters, the tuples are the indices of entries in that cluster.
- outfile (str) – File to write JSON output to.
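JSON has no set or tuple types, so a save and load round trip has to convert sets of tuples to nested lists and back. The sketch below shows one way to do that for the list-of-sets input only; the function names are hypothetical, and the exact on-disk layout used by the library is an assumption.

```python
import json
import os
import tempfile


def save_clusters_sketch(clusters, outfile):
    """Write a list of sets of (i, j) tuples as nested JSON lists."""
    # Sorting makes the output deterministic; sets have no stable order.
    serializable = [sorted(list(p) for p in cluster) for cluster in clusters]
    with open(outfile, "w") as handle:
        json.dump(serializable, handle)


def load_clusters_sketch(infile):
    """Read nested JSON lists back into a list of sets of tuples."""
    with open(infile) as handle:
        return [{tuple(p) for p in cluster} for cluster in json.load(handle)]


clusters = [{(1, 2), (1, 1)}, {(4, 4), (3, 4)}]
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "clusters.json")
    save_clusters_sketch(clusters, path)
    restored = load_clusters_sketch(path)
```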