Title: | Slanted Matrices and Ordered Clustering |
---|---|
Description: | Slanted matrices and ordered clustering for better visualization of similarity data. |
Authors: | Oren Ben-Kiki [aut, cre], Weizmann Institute of Science [cph] |
Maintainer: | Oren Ben-Kiki <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2-0 |
Built: | 2025-01-30 06:23:32 UTC |
Source: | https://github.com/tanaylab/slanter |
This is a simple matrix where each entry is the similarity (correlation) between a pair of batches. Negative correlations were changed to zero to simplify the analysis.
data(meristems)
A simple square matrix.
data(meristems)
similarity <- meristems
similarity[similarity < 0] <- 0
slanter::sheatmap(meristems, order_data=similarity, show_rownames=FALSE, show_colnames=FALSE)
Given a distance matrix for sorted objects, compute a hierarchical clustering preserving this order. That is, this is similar to hclust with the constraint that the result's order is always 1:N.
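To make the ordering constraint concrete, the sketch below agglomerates already-sorted objects while only ever merging neighboring clusters, so every cluster stays a contiguous run of 1:N. It is a base R illustration, not the oclust implementation: the helper name ordered_merges is hypothetical, and it uses plain average linkage as a stand-in for the Ward criteria.

```r
# Hypothetical helper: agglomerate sorted objects, merging only adjacent clusters.
# Uses average linkage between neighbors (an assumption made for brevity).
ordered_merges <- function(distances) {
  d <- as.matrix(distances)
  clusters <- as.list(seq_len(nrow(d)))  # each cluster is a contiguous run of indices
  merges <- list()
  while (length(clusters) > 1) {
    # Average distance between each cluster and its right-hand neighbor only.
    gaps <- vapply(seq_len(length(clusters) - 1), function(i) {
      mean(d[clusters[[i]], clusters[[i + 1]]])
    }, numeric(1))
    best <- which.min(gaps)
    merged <- c(clusters[[best]], clusters[[best + 1]])
    merges[[length(merges) + 1]] <- merged
    # Replace the two neighbors with their union, at the same position.
    clusters <- append(clusters[-c(best, best + 1)], list(merged), after = best - 1)
  }
  merges
}

# Four points on a line, already sorted: two tight pairs far apart.
merges <- ordered_merges(dist(c(0, 1, 10, 11)))
```

Because only neighbors are ever merged, reading the leaves of the resulting tree from left to right gives back the original order, which is the property described above.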
oclust(distances, method = "ward.D2", order = NULL, members = NULL)
distances | A distances object (as created by dist). |
method | The clustering method to use (only ward.D and ward.D2 are supported). |
order | If specified, assume the data will be re-ordered by this order. |
members | Optionally, the number of members for each row/column of the distances (by default, one each). |
If an order is specified, assumes that the data will be re-ordered by this order. That is, the indices in the returned hclust object will refer to the post-reorder data locations, **not** to the current data locations.
This can be applied to the results of slanted_reorder, to give a "plausible" clustering for the data.
A clustering object (as created by hclust).
clusters <- slanter::oclust(dist(mtcars), order=1:dim(mtcars)[1])
clusters$order
You'd expect data[order,] to "just work". It doesn't for data frames with a single column, where the result collapses into a plain vector. This happens for annotation data, hence the need for this function. Sigh.
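The pitfall is easy to reproduce in base R. The sketch below shows the collapse and the standard drop = FALSE remedy. Whether reorder_frame is implemented this way internally is an assumption; the variable names are illustrative only.

```r
# A single-column annotation frame with named rows.
df <- data.frame(foo = c(1, 2, 3), row.names = c("a", "b", "c"))
order <- c(1, 3, 2)

# Plain indexing silently drops the data frame down to a bare vector.
dropped <- df[order, ]

# drop = FALSE keeps the data frame (and its row names) intact.
kept <- df[order, , drop = FALSE]
```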
reorder_frame(frame, order)
frame | A data frame to reorder the rows of. |
order | An array containing the permutation of indices to apply to the rows. |
The data frame with its rows in the new order.
df <- data.frame(foo=c(1, 2, 3))
df[c(1,3,2),]
slanter::reorder_frame(df, c(1,3,2))
Given a clustering of some data, and some ideal order we'd like to use to visualize it, reorder (but do not modify) the clustering to be as consistent as possible with this ideal order.
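Base R has a loosely similar operation that may help build intuition: stats::reorder on a dendrogram flips sub-trees according to per-leaf weights without changing the tree itself. The sketch below uses it as a stand-in, with the ideal position of each item as its weight. It is not the algorithm used by reorder_hclust, and the variable names are illustrative.

```r
clusters <- hclust(dist(mtcars))

# Ideal position of each item (here: keep the original data order).
ideal <- seq_along(clusters$order)

# Flip sub-trees so that branches with a lower mean ideal position come first.
flipped <- reorder(as.dendrogram(clusters), wts = ideal, agglo.FUN = mean)
new_order <- order.dendrogram(flipped)

# The leaf order may change, but the tree (who merges with whom, and at what
# height) does not: the cophenetic distances are the same.
before <- as.matrix(cophenetic(clusters))
after <- as.matrix(cophenetic(flipped))
same_tree <- isTRUE(all.equal(before, after[rownames(before), colnames(before)]))
```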
reorder_hclust(clusters, order)
clusters | The existing clustering of the data. |
order | The ideal order we'd like to see the data in. |
A reordered clustering which is consistent, wherever possible, with the ideal order.
clusters <- hclust(dist(mtcars))
clusters$order
clusters <- slanter::reorder_hclust(clusters, 1:length(clusters$order))
clusters$order
Given a matrix expressing the cross-similarity between two (possibly different) sets of entities, this will reorder it to move the high values close to the diagonal, for a better visualization.
sheatmap(
  data,
  ...,
  order_data = NULL,
  annotation_col = NULL,
  annotation_row = NULL,
  order_rows = TRUE,
  order_cols = TRUE,
  squared_order = TRUE,
  same_order = FALSE,
  patch_cols_order = NULL,
  patch_rows_order = NULL,
  discount_outliers = TRUE,
  cluster_rows = TRUE,
  cluster_cols = TRUE,
  oclust_rows = TRUE,
  oclust_cols = TRUE,
  clustering_distance_rows = "euclidian",
  clustering_distance_cols = "euclidian",
  clustering_method = "ward.D2",
  clustering_callback = NA
)
data | A rectangular matrix to plot, of non-negative values (unless order_data is specified). |
... | Additional flags to pass to pheatmap. |
order_data | An optional matrix of non-negative values of the same size to use for computing the orders. |
annotation_col | Optional data frame describing each column. |
annotation_row | Optional data frame describing each row. |
order_rows | Whether to reorder the rows. Otherwise, use the current order. |
order_cols | Whether to reorder the columns. Otherwise, use the current order. |
squared_order | Whether to reorder to minimize the l2 norm (otherwise minimizes the l1 norm). |
same_order | Whether to apply the same order to both rows and columns (if reordering both). For a square matrix, may also contain 'row' or 'column' to force the order of one axis to apply to both. |
patch_cols_order | Optional function that may be applied to the columns order, returning a better order. |
patch_rows_order | Optional function that may be applied to the rows order, returning a better order. |
discount_outliers | Whether to do a final order phase discounting outlier values far from the diagonal. |
cluster_rows | Whether to cluster the rows, or the clustering to use. |
cluster_cols | Whether to cluster the columns, or the clustering to use. |
oclust_rows | Whether to use oclust (rather than hclust) to cluster the rows. |
oclust_cols | Whether to use oclust (rather than hclust) to cluster the columns. |
clustering_distance_rows | The default method for computing row distances (by default, euclidian). |
clustering_distance_cols | The default method for computing column distances (by default, euclidian). |
clustering_method | The default method to use for hierarchical clustering (by default, ward.D2). |
clustering_callback | Is not supported. |
If you have an a-priori order for the rows and/or columns, you can prevent reordering either or both by specifying order_rows=FALSE and/or order_cols=FALSE. Otherwise, slanted_orders is used to compute the "ideal" slanted order for the data.
By default, the rows and columns are ordered independently from each other. If the matrix is asymmetric but square (e.g., a matrix of weights of a directed graph such as a K-nearest-neighbors graph), then you can specify same_order=TRUE to force both rows and columns to the same order. You can also specify same_order='row' to force the columns to use the same order as the rows, or same_order='column' to force the rows to use the same order as the columns.
You can also specify a patch_cols_order and/or a patch_rows_order function that takes the computed "ideal" order and returns a patched order. For example, this can be used to force special values (such as "outliers") to the side of the heatmap.
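As an illustration of such a patch function, the sketch below builds one that pushes a chosen set of items to the end of whatever order it receives. The helper move_to_end and the single-argument signature (order in, order out) are assumptions based only on the description above, so check the package source before passing such a function as patch_cols_order or patch_rows_order.

```r
# Hypothetical helper: build a patch function that moves the given item indices
# to the end of whatever order it is handed, keeping everything else in place.
move_to_end <- function(special) {
  function(order) {
    is_special <- order %in% special
    c(order[!is_special], order[is_special])
  }
}

# Treat items 2 and 5 as the "outliers" to push to the side.
patch <- move_to_end(special = c(2, 5))

# Pretend this is the computed "ideal" order.
patched <- patch(c(3, 2, 1, 5, 4))
```

The patched order is still a permutation of the same items; only the positions of the special ones change.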
There are four options for controlling clustering:
* By default, sheatmap will generate a clustering tree using oclust, to generate the "best" clustering that is also compatible with the slanted order.
* Request that sheatmap use the same hclust as pheatmap (e.g., oclust_rows=FALSE). In this case, the tree is reordered to be the "most compatible" with the target slanted order. That is, sheatmap will invoke reorder_hclust so that, for each node of the tree, the order of the two sub-trees will be chosen to best match the target slanted order. The end result need not be identical to the slanted order, but is as close as possible given the hclust clustering tree.
* Specify an explicit clustering (e.g., cluster_rows=hclust(...)). In this case, sheatmap will again merely reorder the tree but will not modify it.
* Disable clustering of an axis altogether (e.g., cluster_rows=FALSE).
In addition, you can give this function any of the pheatmap flags, and it will just pass them on. This allows full control over the diagram's features.
Note that clustering_callback is not supported. In addition, the default clustering_method here is ward.D2 instead of complete, since the only methods supported by oclust are ward.D and ward.D2.
Whatever pheatmap returns.
slanter::sheatmap(cor(t(mtcars)))
slanter::sheatmap(cor(t(mtcars)), oclust_rows=FALSE, oclust_cols=FALSE)
pheatmap::pheatmap(cor(t(mtcars)))
For a matrix expressing the cross-similarity between two (possibly different) sets of entities, this produces better results than clustering (e.g. as done by pheatmap). This is because clustering does not care about the order of each two sub-partitions. That is, clustering is as happy with ((2, 1), (4, 3)) as it is with the more sensible ((1, 2), (3, 4)). As a result, visualizations of similarities using naive clustering can be misleading.
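To give a feel for what an order that moves high values toward the diagonal looks like, the sketch below applies one simple heuristic: sort the rows by the weighted mean column index of their values. This is an illustration only and is not claimed to be what slanted_orders does internally; all variable names are hypothetical.

```r
# A toy similarity matrix whose values are highest on the diagonal.
ideal <- outer(1:4, 1:4, function(i, j) 1 / (1 + abs(i - j)))

# Scramble the rows, as if the data arrived in an arbitrary order.
scrambled <- ideal[c(3, 1, 4, 2), ]

# For each row, the weighted mean column index says where its mass sits.
columns <- seq_len(ncol(scrambled))
centers <- as.vector(scrambled %*% columns) / rowSums(scrambled)

# Sorting rows by that position pulls the high values back to the diagonal.
rows_order <- order(centers)
restored <- scrambled[rows_order, ]
```

Unlike a clustering tree, this kind of ordering has a definite direction, so it cannot end up with ((2, 1), (4, 3)) when ((1, 2), (3, 4)) fits the data better.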
slanted_orders(
  data,
  order_rows = TRUE,
  order_cols = TRUE,
  squared_order = TRUE,
  same_order = FALSE,
  discount_outliers = TRUE,
  max_spin_count = 10
)
data | A rectangular matrix containing non-negative values. |
order_rows | Whether to reorder the rows. |
order_cols | Whether to reorder the columns. |
squared_order | Whether to reorder to minimize the l2 norm (otherwise minimizes the l1 norm). |
same_order | Whether to apply the same order to both rows and columns. |
discount_outliers | Whether to do a final order phase discounting outlier values far from the diagonal. |
max_spin_count | How many times to retry improving the solution before giving up. |
A list with two keys, rows and cols, which contain the order.
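A list of this shape can be applied with ordinary matrix indexing. The sketch below uses a hand-written list in place of a real slanted_orders result, so the specific permutations are made up for illustration.

```r
# A small labeled matrix to reorder.
data <- matrix(1:6, nrow = 2,
               dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# Hand-written stand-in for the list returned by slanted_orders.
orders <- list(rows = c(2, 1), cols = c(3, 1, 2))

# Apply both permutations at once; drop = FALSE guards against a
# single-row or single-column matrix collapsing into a vector.
reordered <- data[orders$rows, orders$cols, drop = FALSE]
```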
slanter::slanted_orders(cor(t(mtcars)))
Given a matrix expressing the cross-similarity between two (possibly different) sets of entities, this uses slanted_orders to compute the "best" order for visualizing the matrix, then returns the reordered data. Commonly used in pheatmap(slanted_reorder(data), ...), and of course sheatmap does this internally for you.
slanted_reorder(
  data,
  order_data = NULL,
  order_rows = TRUE,
  order_cols = TRUE,
  squared_order = TRUE,
  same_order = FALSE,
  discount_outliers = TRUE
)
data | A rectangular matrix to reorder, of non-negative values (unless order_data is specified). |
order_data | An optional matrix of non-negative values of the same size to use for computing the orders. |
order_rows | Whether to reorder the rows. |
order_cols | Whether to reorder the columns. |
squared_order | Whether to reorder to minimize the l2 norm (otherwise minimizes the l1 norm). |
same_order | Whether to apply the same order to both rows and columns. |
discount_outliers | Whether to do a final order phase discounting outlier values far from the diagonal. |
A matrix of the same shape whose rows and columns are a permutation of the input.
slanter::slanted_reorder(cor(t(mtcars)))