Utility Functions
API

cgc.utils.mem_estimate_coclustering_numpy(n_rows, n_cols, nclusters_row, nclusters_col, out_unit=None)
Estimate the maximum memory usage of cgc.coclustering_numpy, given the matrix size (n_rows, n_cols) and the number of row/column clusters (nclusters_row, nclusters_col).
The estimated memory usage is the sum of the size of all major arrays simultaneously allocated in cgc.coclustering_numpy.coclustering.
Depending on the shape of the data matrix, there are two possible memory peaks, corresponding to either the first or the second call to cgc.coclustering_numpy._distance().
Parameters:  n_rows (int) – Number of rows in the data matrix.
 n_cols (int) – Number of columns in the data matrix.
 nclusters_row (int) – Number of row clusters.
 nclusters_col (int) – Number of column clusters.
 out_unit (str) – Output unit; one of “B”, “KB”, “MB”, “GB”.
Returns: Estimated memory usage, unit, peak.
Type: tuple
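To give a feel for the magnitudes involved, the sketch below computes the size of a single dense array and converts it to the units listed above. It is not the formula used by mem_estimate_coclustering_numpy: the helper names (array_nbytes, to_unit), the float64 data type, and the binary unit factors (1 KB = 1024 B) are all assumptions made for illustration. The data matrix is only one of the arrays that contribute to the estimate.

```python
import numpy as np

# Assumption: binary prefixes (1 KB = 1024 B). The convention used by the
# library is not stated in this section.
UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def array_nbytes(shape, dtype=np.float64):
    """Bytes needed to store a dense array of the given shape and dtype."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize


def to_unit(nbytes, out_unit="MB"):
    """Convert a size in bytes to the requested unit."""
    return nbytes / UNIT_FACTORS[out_unit]


# Size of a hypothetical 10000 x 5000 float64 data matrix on its own.
data_matrix_bytes = array_nbytes((10_000, 5_000))
data_matrix_mb = to_unit(data_matrix_bytes, "MB")
```

The data matrix alone takes roughly 381 MB under these assumptions, so the full estimate for the same problem size would be expected to be larger.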

cgc.utils.calculate_cocluster_averages(Z, row_clusters, col_clusters, nclusters_row=None, nclusters_col=None)
Calculate the cocluster averages from the data array and the row and column cluster assignments.
Parameters:  Z (numpy.ndarray or dask.array.Array) – Data matrix.
 row_clusters (numpy.ndarray or array_like) – Row cluster assignment.
 col_clusters (numpy.ndarray or array_like) – Column cluster assignment.
 nclusters_row (int, optional) – Number of row clusters. If not provided, it is set as the number of unique elements in row_clusters.
 nclusters_col (int, optional) – Number of column clusters. If not provided, it is set as the number of unique elements in col_clusters.
Returns: Array with cocluster averages, shape (nclusters_row, nclusters_col). Elements corresponding to empty coclusters are set as NaN.
Type: numpy.ndarray
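The computation described above can be illustrated with a plain NumPy sketch. This is not the library implementation: cocluster_averages_sketch is a hypothetical name, and the code assumes that cluster labels are integers in the range 0 to nclusters - 1.

```python
import numpy as np


def cocluster_averages_sketch(Z, row_clusters, col_clusters,
                              nclusters_row=None, nclusters_col=None):
    """Mean of Z over each (row cluster, column cluster) block.

    Assumes labels are integers 0..nclusters-1 (an illustration-only choice).
    """
    Z = np.asarray(Z, dtype=float)
    row_clusters = np.asarray(row_clusters)
    col_clusters = np.asarray(col_clusters)
    if nclusters_row is None:
        nclusters_row = len(np.unique(row_clusters))
    if nclusters_col is None:
        nclusters_col = len(np.unique(col_clusters))

    # Start from NaN so that empty coclusters keep that value.
    averages = np.full((nclusters_row, nclusters_col), np.nan)
    for r in range(nclusters_row):
        for c in range(nclusters_col):
            block = Z[np.ix_(row_clusters == r, col_clusters == c)]
            if block.size > 0:
                averages[r, c] = block.mean()
    return averages


Z = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
# Row cluster 2 is requested but has no members, so its averages are NaN.
averages = cocluster_averages_sketch(Z, [0, 0, 1], [0, 1, 1], nclusters_row=3)
```

Passing nclusters_row explicitly matters when some clusters are empty: counting unique labels would otherwise give a smaller output array.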

cgc.utils.calculate_tricluster_averages(Z, row_clusters, col_clusters, bnd_clusters, nclusters_row=None, nclusters_col=None, nclusters_bnd=None)
Calculate the tricluster averages from the data array and the band, row and column cluster assignments.
Parameters:  Z (numpy.ndarray or dask.array.Array) – Data array, with shape (bands, rows, columns).
 row_clusters (numpy.ndarray or array_like) – Row cluster assignment.
 col_clusters (numpy.ndarray or array_like) – Column cluster assignment.
 bnd_clusters (numpy.ndarray or array_like) – Band cluster assignment.
 nclusters_row (int, optional) – Number of row clusters. If not provided, it is set as the number of unique elements in row_clusters.
 nclusters_col (int, optional) – Number of column clusters. If not provided, it is set as the number of unique elements in col_clusters.
 nclusters_bnd (int, optional) – Number of band clusters. If not provided, it is set as the number of unique elements in bnd_clusters.
Returns: Array with tricluster averages, shape (nclusters_bnd, nclusters_row, nclusters_col). Elements corresponding to empty triclusters are set as NaN.
Type: numpy.ndarray
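The axis ordering is the main thing to keep in mind here: the input is (bands, rows, columns) and the output is (nclusters_bnd, nclusters_row, nclusters_col). The sketch below reproduces that layout with one-hot membership matrices and numpy.einsum. It is an illustration only: the names one_hot and tricluster_averages_sketch are hypothetical, labels are assumed to be integers starting at 0, and the cluster counts are required arguments for brevity.

```python
import numpy as np


def one_hot(labels, nclusters):
    """Membership matrix of shape (n_elements, nclusters)."""
    labels = np.asarray(labels)
    return (labels[:, None] == np.arange(nclusters)).astype(float)


def tricluster_averages_sketch(Z, row_clusters, col_clusters, bnd_clusters,
                               nclusters_row, nclusters_col, nclusters_bnd):
    """Mean of Z over each (band, row, column) cluster block."""
    Z = np.asarray(Z, dtype=float)  # shape (bands, rows, columns)
    R = one_hot(row_clusters, nclusters_row)
    C = one_hot(col_clusters, nclusters_col)
    B = one_hot(bnd_clusters, nclusters_bnd)

    # Sum of the elements and number of elements in each tricluster.
    sums = np.einsum("brc,bk,ri,cj->kij", Z, B, R, C)
    counts = np.einsum("k,i,j->kij", B.sum(axis=0), R.sum(axis=0), C.sum(axis=0))

    # Divide only where the tricluster is non-empty; the rest stays NaN.
    averages = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=averages, where=counts > 0)
    return averages


Z = np.arange(8).reshape(2, 2, 2)  # (bands, rows, columns)
averages = tricluster_averages_sketch(
    Z, row_clusters=[0, 1], col_clusters=[0, 0], bnd_clusters=[0, 0],
    nclusters_row=2, nclusters_col=1, nclusters_bnd=2,
)
```

In this example both bands fall in band cluster 0, so every entry for band cluster 1 is NaN.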

cgc.utils.calculate_cluster_feature(Z, function, clusters, nclusters=None, **kwargs)
Calculate features for clusters. This function works in N dimensions (N=2, 3, …), making it suitable to calculate features for both coclusters and triclusters.
Parameters:  Z (numpy.ndarray or dask.array.Array) – Data array (N dimensions).
 function (Callable) – Function to run over the cluster elements to calculate the desired feature. It should take an N-dimensional array as input and return a scalar.
 clusters (tuple, list, or numpy.ndarray) – Iterable of length N containing the cluster labels for each dimension, in the same order as the dimensions of Z.
 nclusters (tuple, list, or numpy.ndarray, optional) – Iterable of length N containing the number of clusters in each dimension, in the same order as the dimensions of Z. If not provided, it is set as the number of unique elements in each dimension.
 kwargs (dict, optional) – Keyword arguments passed to the input function together with the data array for each cluster.
Returns: Array with the desired feature computed for each cluster. It has N dimensions and shape equal to nclusters.
Type: numpy.ndarray
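The way a scalar-valued function and its keyword arguments are applied per cluster can be sketched in plain NumPy as follows. This is a rough illustration, not the library code: cluster_feature_sketch is a hypothetical name, labels are assumed to be integers starting at 0, and leaving empty clusters as NaN is an assumption that this section does not confirm for calculate_cluster_feature.

```python
import itertools

import numpy as np


def cluster_feature_sketch(Z, function, clusters, nclusters=None, **kwargs):
    """Apply `function` to the elements of each N-dimensional cluster."""
    Z = np.asarray(Z)
    clusters = [np.asarray(labels) for labels in clusters]
    if nclusters is None:
        nclusters = [len(np.unique(labels)) for labels in clusters]

    # Assumption: empty clusters are left as NaN.
    features = np.full(tuple(nclusters), np.nan)
    for idx in itertools.product(*(range(n) for n in nclusters)):
        # One boolean mask per dimension selects the block of this cluster.
        masks = [labels == k for labels, k in zip(clusters, idx)]
        block = Z[np.ix_(*masks)]
        if block.size > 0:
            features[idx] = function(block, **kwargs)
    return features


Z = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
clusters = ([0, 0, 1], [0, 1, 1])

# Maximum of each cocluster.
maxima = cluster_feature_sketch(Z, np.max, clusters)
# Keyword arguments are forwarded to the function, here q=50 (the median).
medians = cluster_feature_sketch(Z, np.percentile, clusters, q=50)
```

The same sketch works for triclusters by passing a three-dimensional array and three label sequences.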