hic3defdr.util.clusters module

class hic3defdr.util.clusters.DirectedDisjointSet[source]

Bases: object

Based on https://stackoverflow.com/a/3067672 but supporting directed edges.

The overall effect is like a directed sparse graph: DDS.add(a, b) is like adding an edge from a to b. a is marked as a source; b is not (anything not in the set DDS.sources is assumed to be a destination). If b is in an existing group but is not itself the source of any other edge, the groups are not merged. Finally, the groups returned by DDS.get_groups() are filtered to include only source nodes.

This is an “improved” or “streamlined” version where destination nodes are not stored anywhere if they haven’t previously been seen as a source.

add(a, b)[source]
get_groups()[source]
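
The behavior described above can be sketched as a union-find that only ever stores source nodes. This is a minimal illustration, not the library's implementation: the class name DirectedDisjointSetSketch, its attributes, and the rule that b is linked only if it has already been seen as a source at the time of the call are all assumptions.

```python
class DirectedDisjointSetSketch:
    """Illustrative union-find over source nodes only (hypothetical class)."""

    def __init__(self):
        self.parent = {}      # source node -> parent in its union-find tree
        self.sources = set()  # every node seen as the first argument of add()

    def _find(self, x):
        # Walk up to the root, then compress the path behind us.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def add(self, a, b):
        # Check b first so that add(a, a) on a new node creates a singleton.
        b_is_source = b in self.sources
        if a not in self.sources:
            self.sources.add(a)
            self.parent[a] = a
        # Assumption: destinations never seen as a source are not stored,
        # so they cannot cause two groups to merge.
        if b_is_source:
            self.parent[self._find(a)] = self._find(b)

    def get_groups(self):
        groups = {}
        for node in self.sources:
            groups.setdefault(self._find(node), set()).add(node)
        return list(groups.values())


dds = DirectedDisjointSetSketch()
dds.add(1, 2)  # 2 has not been a source yet, so it is not stored
dds.add(2, 3)  # 2 becomes a source; 3 is not stored
dds.add(3, 2)  # 3 becomes a source and 2 already is one, so they merge
dds.add(4, 9)  # 9 is never a source, so 4 stays alone
groups = sorted(sorted(g) for g in dds.get_groups())
```

Note that in this sketch the result depends on call order: an edge to a node that only later becomes a source does not link the two.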
class hic3defdr.util.clusters.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: json.encoder.JSONEncoder

Pass this to json.dump() to correctly serialize numpy values.

Credit: https://stackoverflow.com/a/27050186

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
hic3defdr.util.clusters.cluster_from_string(cluster_string)[source]

If a cluster gets converted to a string (e.g., when the cluster is written to a text file), this function allows you to recover the cluster as a normal Python object (a list of pairs of integers).

Parameters:cluster_string (str) – The string representation of the cluster.
Returns:The inner lists are pairs of integers specifying the row and column indices of the pixels in the cluster.
Return type:list of list of int

Examples

>>> from hic3defdr.util.clusters import cluster_from_string
>>> cluster = [(4, 5), (3, 4), (3, 5), (3, 6)]
>>> cluster_string = str(cluster)
>>> cluster_string
'[(4, 5), (3, 4), (3, 5), (3, 6)]'
>>> cluster_from_string(cluster_string)
[[4, 5], [3, 4], [3, 5], [3, 6]]
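
One way such a string could be parsed back is with ast.literal_eval from the standard library. The helper name parse_cluster_string is hypothetical and this sketch is not necessarily how cluster_from_string() is implemented; it only reproduces the documented input and output.

```python
import ast


def parse_cluster_string(cluster_string):
    """Hypothetical helper: parse str(cluster) back into a list of [row, col] pairs."""
    # literal_eval only evaluates Python literals, so it is safe on untrusted text.
    pixels = ast.literal_eval(cluster_string)
    return [list(pixel) for pixel in pixels]


parsed = parse_cluster_string('[(4, 5), (3, 4), (3, 5), (3, 6)]')
```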
hic3defdr.util.clusters.cluster_to_loop_id(cluster, chrom, resolution)[source]

Makes a cluster into a loop id of the form “chr:start-end_chr:start-end”.

This is a copy of hiclite.util.clusters.make_loop_id_for_cluster().

Parameters:
  • cluster (set of tuple of int) – The tuples should be (row_index, col_index) tuples specifying which entries of the chromosomal contact matrix belong to this cluster.
  • chrom (str) – The chromosome name, e.g. ‘chr21’.
  • resolution (int) – The resolution of the contact matrix referred to by cluster.
Returns:

The loop id, a string of the form “chr:start-end_chr:start-end”.

Return type:

str

Examples

>>> from hic3defdr.util.clusters import cluster_to_loop_id
>>> cluster = [(4, 5), (3, 4), (3, 5), (3, 6)]
>>> cluster_to_loop_id(cluster, 'chrX', 10000)
'chrX:30000-50000_chrX:40000-70000'
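
The output in the example above is consistent with taking the bounding box of the cluster and scaling it by the resolution: rows 3 to 4 become 30000-50000 and columns 4 to 6 become 40000-70000, with the end coordinate being the end of the last bin. The sketch below reproduces that arithmetic. The function name loop_id_sketch is hypothetical and the bounding-box rule is inferred from the example, not taken from the library source.

```python
def loop_id_sketch(cluster, chrom, resolution):
    """Hypothetical helper: bounding box of the cluster, in base pairs."""
    rows, cols = zip(*cluster)
    # Start is the left edge of the first bin; end is the right edge of the last bin.
    row_start, row_end = min(rows) * resolution, (max(rows) + 1) * resolution
    col_start, col_end = min(cols) * resolution, (max(cols) + 1) * resolution
    return f"{chrom}:{row_start}-{row_end}_{chrom}:{col_start}-{col_end}"


loop_id = loop_id_sketch([(4, 5), (3, 4), (3, 5), (3, 6)], "chrX", 10000)
```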
hic3defdr.util.clusters.cluster_to_slices(cluster, width=21)[source]

Computes a square row and column slice of a specified width centered on a given cluster.

Parameters:
  • cluster (list of tuple) – A list of (i, j) tuples marking the position of significant points which belong to the cluster.
  • width (int) – Should be odd. Specifies the side length of the square slice.
Returns:

The row and column slice, respectively.

Return type:

slice, slice

Examples

>>> from hic3defdr.util.clusters import cluster_to_slices
>>> cluster = [(4, 5), (3, 4), (3, 5), (3, 6)]
>>> width = 5
>>> slices = cluster_to_slices(cluster, width=width)
>>> slices
(slice(1, 6, None), slice(3, 8, None))
>>> slices[0].stop - slices[0].start == width
True
>>> slices[1].stop - slices[1].start == width
True
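
The slices in the example above can be reproduced by centering a window of the requested width on the cluster. The sketch below assumes the center is the floored mean of the row and column indices; that choice, and the name cluster_to_slices_sketch, are assumptions that happen to match the documented output. It does not clip slices that would extend past the matrix edge.

```python
def cluster_to_slices_sketch(cluster, width=21):
    """Hypothetical helper: square window of odd `width` centered on the cluster."""
    rows, cols = zip(*cluster)
    half = width // 2
    # Assumption: the center is the floored mean of the pixel indices.
    center_row = sum(rows) // len(rows)
    center_col = sum(cols) // len(cols)
    return (slice(center_row - half, center_row + half + 1),
            slice(center_col - half, center_col + half + 1))


row_slice, col_slice = cluster_to_slices_sketch(
    [(4, 5), (3, 4), (3, 5), (3, 6)], width=5)
```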
hic3defdr.util.clusters.clusters_to_coo(clusters, shape)[source]

Converts clusters (list of list of tuple) to a COO sparse matrix.

Parameters:
  • clusters (list of list of tuple) – The outer list is a list of clusters. Each cluster is a list of (i, j) tuples marking the position of significant points which belong to that cluster.
  • shape (tuple) – The shape with which to construct the COO matrix.
Returns:

The sparse matrix of significant points.

Return type:

scipy.sparse.coo_matrix

Examples

>>> from hic3defdr.util.clusters import clusters_to_coo
>>> coo = clusters_to_coo([[(1, 2), (1, 1)], [(4, 4), (3, 4)]], (5, 5))
>>> coo.toarray()
array([[False, False, False, False, False],
       [False,  True,  True, False, False],
       [False, False, False, False, False],
       [False, False, False, False,  True],
       [False, False, False, False,  True]])
hic3defdr.util.clusters.clusters_to_pixel_set(clusters)[source]

Converts a list of clusters to a set of pixels.

Nothing else in this package calls this function; the same conversion is usually written inline as a one-liner.

Parameters:clusters (list of list of tuple) – The outer list is a list of clusters. Each cluster is a list of (i, j) tuples marking the position of significant points which belong to that cluster.
Returns:Each tuple is of the form (i, j) and marks the position of a significant point in the clustering.
Return type:set of tuple
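
A flattening of this kind can be written as a single set comprehension. The sketch below is an illustration with a hypothetical name (pixel_set_sketch), not the library's code.

```python
def pixel_set_sketch(clusters):
    """Hypothetical helper: flatten a list of clusters into one set of (i, j) pixels."""
    # tuple() makes each pixel hashable even if it was stored as a list.
    return {tuple(pixel) for cluster in clusters for pixel in cluster}


pixels = pixel_set_sketch([[(1, 2), (1, 1)], [(4, 4), (3, 4)]])
```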
hic3defdr.util.clusters.convert_cluster_array_to_sparse(cluster_array)[source]

Converts an array of cluster information to a sparse, JSON-friendly format.

Parameters:cluster_array (np.ndarray or scipy.sparse.spmatrix) – Square, triangular, int dtype. Entries should be the cluster id for points which belong to that cluster, zero everywhere else.
Returns:Each set is a cluster; each tuple holds the matrix indices of one pixel in that cluster.
Return type:list of sets of tuples of int

Notes

Since the introduction of hiclite.util.clusters.find_clusters(), this function is no longer used.
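
To illustrate the two formats, the sketch below converts a small labeled array into a list of sets of index tuples. It works on a plain nested list rather than np.ndarray or a sparse matrix, and the name label_array_to_clusters and the ordering of the output by cluster id are assumptions made for the example.

```python
def label_array_to_clusters(cluster_array):
    """Hypothetical helper: group nonzero entries by their cluster id."""
    by_label = {}
    for i, row in enumerate(cluster_array):
        for j, label in enumerate(row):
            if label:  # zero means "not in any cluster"
                by_label.setdefault(label, set()).add((i, j))
    # Assumption: clusters are returned in order of increasing cluster id.
    return [by_label[label] for label in sorted(by_label)]


labeled = [
    [0, 1, 1],
    [0, 0, 2],
    [0, 0, 2],
]
sparse_clusters = label_array_to_clusters(labeled)
```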

hic3defdr.util.clusters.filter_clusters_by_distance(clusters, min_dist, max_dist)[source]

Filters a list of clusters by distance.

Parameters:
  • clusters (list of list of tuple) – The outer list is a list of clusters. Each cluster is a list of (i, j) tuples marking the position of significant points which belong to that cluster.
  • min_dist, max_dist (int or None) – Specify a range of distances in bin units to filter by (inclusive). If either min_dist or max_dist is None, the distance bin will be considered unbounded on that end.
Returns:

The clusters that are within the distance range requested.

Return type:

list of list of tuple
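
The inclusive, optionally unbounded range check can be sketched as follows. The documentation does not say how the distance of a multi-pixel cluster is measured, so this sketch assumes the absolute column-minus-row offset at the cluster centroid; that definition and the name filter_by_distance_sketch are assumptions.

```python
def filter_by_distance_sketch(clusters, min_dist=None, max_dist=None):
    """Hypothetical helper: keep clusters whose distance lies in [min_dist, max_dist]."""
    kept = []
    for cluster in clusters:
        rows, cols = zip(*cluster)
        # Assumption: a cluster's distance is |col - row| at its centroid, in bins.
        dist = abs(sum(cols) / len(cols) - sum(rows) / len(rows))
        if min_dist is not None and dist < min_dist:
            continue
        if max_dist is not None and dist > max_dist:
            continue
        kept.append(cluster)
    return kept


clusters = [[(0, 1)], [(0, 5), (1, 6)], [(0, 10)]]  # distances 1, 5 and 10
mid_range = filter_by_distance_sketch(clusters, 2, 8)
short_range = filter_by_distance_sketch(clusters, None, 5)
long_range = filter_by_distance_sketch(clusters, 5, None)
```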

hic3defdr.util.clusters.find_clusters(sig_points, connectivity=1)[source]

Finds clusters of adjacent True points in a boolean matrix.

Parameters:
  • sig_points (scipy.sparse.spmatrix or np.ndarray) – A boolean matrix indicating which points are significant.
  • connectivity (int) – The connectivity to use when clustering.
Returns:

The clusters.

Return type:

list of set of tuple of int
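
Clustering adjacent True points is a connected-components problem. The sketch below uses a breadth-first flood fill on a nested list of booleans. It is not the library's algorithm, and it assumes that connectivity=1 means the 4 orthogonal neighbors and connectivity=2 adds the diagonals (the convention used by scipy.ndimage.generate_binary_structure); the name find_clusters_sketch is hypothetical.

```python
from collections import deque


def find_clusters_sketch(sig_points, connectivity=1):
    """Hypothetical helper: flood-fill connected components of True entries."""
    n_rows = len(sig_points)
    n_cols = len(sig_points[0]) if n_rows else 0
    # Neighbors reachable in at most `connectivity` orthogonal steps.
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if 0 < abs(di) + abs(dj) <= connectivity]
    seen = set()
    clusters = []
    for i in range(n_rows):
        for j in range(n_cols):
            if not sig_points[i][j] or (i, j) in seen:
                continue
            # Start a new cluster and grow it outwards from (i, j).
            cluster = set()
            queue = deque([(i, j)])
            seen.add((i, j))
            while queue:
                r, c = queue.popleft()
                cluster.add((r, c))
                for dr, dc in offsets:
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and sig_points[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            clusters.append(cluster)
    return clusters


grid = [
    [True, True, False],
    [False, False, True],
    [False, False, True],
]
four_connected = find_clusters_sketch(grid, connectivity=1)
eight_connected = find_clusters_sketch(grid, connectivity=2)
```

With connectivity=1 the grid splits into two clusters; with connectivity=2 the diagonal contact between (0, 1) and (1, 2) joins them into one.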

hic3defdr.util.clusters.hiccups_to_clusters(hiccups_txt, resolution)[source]

Loads HiCCUPS-format loop calls as clusters, approximating each loop as a cluster with just one pixel.

Parameters:
  • hiccups_txt (str) – The HiCCUPS-format loop call file to load.
  • resolution (int) – The resolution to use for the clusters.
Returns:

The keys of the dict are chromosome names as strings. The values are lists of clusters on that chromosome. Each cluster is a list of [x, y] pairs representing the row and column indices of the pixels in that cluster.

Return type:

dict of list of clusters
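
The one-pixel approximation can be sketched as below. The sketch assumes a tab-separated file whose first six columns are chr1, x1, x2, chr2, y1, y2, and that each loop maps to the pixel (x1 // resolution, y1 // resolution). The column layout, the pixel rule, and the name hiccups_lines_to_clusters are assumptions, and the sketch reads from an iterable of lines rather than a file path.

```python
from collections import defaultdict


def hiccups_lines_to_clusters(lines, resolution):
    """Hypothetical helper: one single-pixel cluster per loop, grouped by chromosome."""
    clusters = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short lines and the header row, whose x1 column is not numeric.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        chrom, x1, y1 = fields[0], int(fields[1]), int(fields[4])
        # Assumption: the loop is represented by the bin containing (x1, y1).
        clusters[chrom].append([[x1 // resolution, y1 // resolution]])
    return dict(clusters)


example_lines = [
    "chr1\tx1\tx2\tchr2\ty1\ty2\n",
    "chr1\t30000\t40000\tchr1\t70000\t80000\n",
    "chr2\t100000\t110000\tchr2\t250000\t260000\n",
]
loops = hiccups_lines_to_clusters(example_lines, 10000)
```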

hic3defdr.util.clusters.load_clusters(infile)[source]

Loads clusters in a sparse format from a JSON file.

Parameters:infile (str) – The JSON file containing sparse cluster information.
Returns:The sets are clusters, the tuples are the indices of entries in that cluster.
Return type:list of set of tuple of int
hic3defdr.util.clusters.save_clusters(clusters, outfile)[source]

Saves cluster information to disk in sparse JSON format.

Parameters:
  • clusters (np.ndarray or list of set of tuple of int) – If an np.ndarray is passed, it should be square and triangular and have int dtype. Entries should be the cluster id for points which belong to that cluster, zero everywhere else. If a list of sets is passed, the sets are clusters, the tuples are the indices of entries in that cluster.
  • outfile (str) – File to write JSON output to.
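
JSON has no set or tuple types, so a sparse cluster format has to go through nested lists on the way out and be rebuilt on the way in. The round trip below is a minimal sketch of that idea using strings instead of files. The helper names and the exact JSON layout are assumptions, not the format written by save_clusters().

```python
import json


def clusters_to_json(clusters):
    """Hypothetical helper: list of sets of (i, j) tuples -> JSON text."""
    # Sort the pixels so the output is deterministic.
    return json.dumps([sorted(list(pixel) for pixel in cluster)
                       for cluster in clusters])


def clusters_from_json(text):
    """Hypothetical helper: JSON text -> list of sets of (i, j) tuples."""
    return [{tuple(pixel) for pixel in cluster} for cluster in json.loads(text)]


original = [{(1, 2), (1, 1)}, {(4, 4), (3, 4)}]
text = clusters_to_json(original)
restored = clusters_from_json(text)
```

The sketch assumes plain Python ints. Indices that are numpy integers would additionally need an encoder such as NumpyEncoder passed to the json call.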