NEWS
tglkmeans 0.6.4
- Fix: match_clusters() mapped only one cluster (leaving the rest NA) unless all per-cluster overlap counts tied; mapping is now per cluster.
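The Python sketch below illustrates per-cluster matching and is for illustration only. Each cluster in one labeling is mapped to the cluster in the other labeling that it overlaps most, so every cluster gets a match instead of NA. The helper name match_clusters_sketch and the first-seen tie-breaking rule are assumptions. This is not the package's match_clusters() code.

```python
from collections import Counter

def match_clusters_sketch(labels_a, labels_b):
    """Map each cluster in labels_a to the labels_b cluster it overlaps most.

    Hypothetical helper illustrating per-cluster matching; not tglkmeans code.
    Ties are broken by first occurrence (an arbitrary choice in this sketch).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    # Count co-occurrences of (cluster in A, cluster in B).
    overlap = Counter(zip(labels_a, labels_b))
    best = {}
    for (a, b), count in overlap.items():
        # Keep, for each A cluster independently, the B cluster with the largest overlap.
        if a not in best or count > best[a][1]:
            best[a] = (b, count)
    return {a: b for a, (b, _) in best.items()}

print(match_clusters_sketch([1, 1, 1, 2, 2, 3], ["x", "x", "y", "y", "y", "z"]))
# {1: 'x', 2: 'y', 3: 'z'}
```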
- Fix: hclust_intra_clusters = TRUE errored when the id column was not named "id" (e.g. a named or auto-detected id column).
- Fix: predict_tgl_kmeans() crashed (pearson/spearman) or silently used the first center (euclid) for observations with no overlap; these now return NA.
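The Python sketch below shows the NA-on-no-overlap behavior for a correlation metric, using None in place of NA. It assumes a Pearson correlation over pairwise-complete values, negated into a distance so the most correlated center is nearest. The helper names, and the rule that fewer than two shared values (or a constant vector) give no usable correlation, are assumptions rather than tglkmeans internals.

```python
import math

def pearson_pairwise(x, y):
    """Pearson correlation over positions where both x and y are not None.

    Returns None when fewer than two positions overlap or either side is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return None
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def predict_sketch(observations, centers):
    """Assign each observation to the center minimizing the distance -r.

    An observation comparable to no center gets None (NA); it is not
    assigned to center 0 by default.
    """
    assignments = []
    for obs in observations:
        best, best_dist = None, math.inf
        for k, center in enumerate(centers):
            r = pearson_pairwise(obs, center)
            if r is None:
                continue  # no usable overlap with this center
            dist = -r  # negated correlation: most correlated = smallest distance
            if dist < best_dist:
                best, best_dist = k, dist
        assignments.append(best)
    return assignments

centers = [[1, 2, 3, 4], [4, 3, 2, 1]]
print(predict_sketch([[1, 2, 3, 5], [None] * 4, [8, 6, 4, 2]], centers))
# [0, None, 1]
```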
- Removed the unused future/doFuture dependency; the package no longer overrides the user's future plan on load.
- No longer prints internal debugging output; verbose = TRUE shows concise per-iteration progress. The default number of threads is capped at 2 under R CMD check.
- Performance: k-means++ seeding uses nth_element instead of a full sort per seed. Cluster assignments are unchanged.
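The seeding change replaces a full sort with a selection step. The Python quickselect below illustrates that general technique; it is not the package's seeding code. It returns the k-th smallest value without sorting everything, and always yields the value a full sort would place at position k. That is consistent with the note that assignments are unchanged. The name nth_element mirrors C++ std::nth_element only by analogy.

```python
import random

def nth_element(values, k, rng=random):
    """Return the k-th smallest value (0-based) via quickselect, average O(n).

    Same idea as C++ std::nth_element; result equals sorted(values)[k].
    """
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items = list(values)
    lo, hi = 0, len(items) - 1  # invariant: lo <= k <= hi
    while lo < hi:
        pivot = items[rng.randint(lo, hi)]
        segment = items[lo:hi + 1]
        # Three-way partition of the active segment around the pivot.
        less = [v for v in segment if v < pivot]
        equal = [v for v in segment if v == pivot]
        greater = [v for v in segment if v > pivot]
        items[lo:hi + 1] = less + equal + greater
        if k < lo + len(less):
            hi = lo + len(less) - 1
        elif k < lo + len(less) + len(equal):
            return pivot
        else:
            lo = lo + len(less) + len(equal)
    return items[lo]

print(nth_element([5, 1, 4, 1, 3, 9, 2, 6], 3))  # 3
```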
tglkmeans 0.6.3
- Fix: metric = "spearman" mishandled missing values. Ranking tested the wrong missing-value sentinel, so NAs were ranked as the largest value and included in the rank correlation instead of being dropped. Spearman now excludes missing values pairwise, matching euclid/pearson. Clustering results for Spearman on data with NAs change (and are now correct); results on complete data are unchanged.
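The Python sketch below shows Spearman correlation with pairwise exclusion of missing values, using None as NA. Positions missing in either vector are dropped first. The remaining values are then ranked, with ties given average ranks, and Pearson correlation is taken on those ranks. The helper names are hypothetical; this illustrates the pairwise rule described in this entry, not the package's C++ code.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman_pairwise(x, y):
    """Spearman correlation using only positions where x and y are both present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return None
    rx = average_ranks([a for a, _ in pairs])  # rank after dropping NAs
    ry = average_ranks([b for _, b in pairs])
    n = len(pairs)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

# The NA in x drops position 2 from both vectors, leaving a perfect monotone relation.
print(spearman_pairwise([1, 2, None, 4, 5], [10, 20, 0, 40, 50]))  # 1.0
```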
- Fix: predict_tgl_kmeans() with metric = "euclid" used a plain Euclidean distance, which disagreed with the training metric sqrt(sum_sq) / n when a cluster center had a missing dimension. Prediction now reproduces the training distance exactly. Predictions on data whose centers have no missing dimensions are unchanged.
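The Python sketch below shows why the two distances disagree only when centers have missing dimensions. It makes two assumptions. First, n in sqrt(sum_sq) / n is the number of dimensions present in both the observation and the center. Second, the old "plain" distance was sqrt(sum_sq) over those same dimensions. When no center has missing dimensions, n is the same for every center, so dividing by it cannot change which center is nearest. When n varies between centers, it can. All names are hypothetical.

```python
import math

def euclid_training_distance(x, center):
    """sqrt(sum_sq) / n over dimensions present (not None) in both vectors.

    Assumption: n is the number of shared non-missing dimensions.
    Returns None when the vectors share no dimension.
    """
    diffs = [a - b for a, b in zip(x, center) if a is not None and b is not None]
    if not diffs:
        return None
    return math.sqrt(sum(d * d for d in diffs)) / len(diffs)

def plain_euclid(x, center):
    """Assumed old behaviour: sqrt(sum_sq) over shared dimensions, no division by n."""
    diffs = [a - b for a, b in zip(x, center) if a is not None and b is not None]
    if not diffs:
        return None
    return math.sqrt(sum(d * d for d in diffs))

def nearest(x, centers, dist):
    """Index of the nearest center under `dist`, or None if no center is comparable."""
    scored = [(dist(x, c), k) for k, c in enumerate(centers)]
    scored = [(d, k) for d, k in scored if d is not None]
    return min(scored)[1] if scored else None

x = [0.0, 0.0, 0.0, 0.0]
centers = [[1.0, 1.0, 1.0, 1.0],    # n = 4: training 2.0 / 4 = 0.5, plain 2.0
           [1.5, None, None, None]]  # n = 1: training 1.5,           plain 1.5
print(nearest(x, centers, euclid_training_distance))  # 0
print(nearest(x, centers, plain_euclid))              # 1
```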
- Performance: removed the dense k x n per-thread vote matrix in the reassignment step. Memory and per-iteration work no longer scale with the number of clusters; cluster assignments are unchanged.
- Performance: metric = "spearman" no longer computes a discarded p-value on every point-to-center comparison, and uses a contiguous sort buffer instead of a linked list. Results are unchanged.
- Performance: predict_tgl_kmeans() uses smaller internal chunks, cutting redundant distance/correlation computation on large inputs. Results are unchanged.
- Fix: hclust_intra_clusters = TRUE returned a scrambled within-cluster ordering. The order/intra_clust_order columns now follow the hclust dendrogram leaf order as documented.
- Removed unused internal code (reduce_coclust/reduce_num_trials and dead rank-sum / incomplete-beta helpers).
tglkmeans 0.6.2
- Fix: predict_tgl_kmeans() crashed with "'from' contains NAs" / "NAs introduced by coercion to integer range" errors on inputs of ~46K rows or more. The one-shot as.matrix(tgs_dist(.)) call overflowed integer indexing inside stats:::as.matrix.dist. Prediction now processes observations in chunks (#21).
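The ~46K threshold is consistent with an n x n matrix outgrowing R's 32-bit integer range (.Machine$integer.max = 2^31 - 1). The largest n with n^2 <= 2^31 - 1 is 46340. That the overflow comes from n^2 cells is an assumption. The Python sketch below checks that arithmetic and shows chunked prediction in general form. The chunk size and the assign_chunk callback are hypothetical.

```python
import math

INT_MAX = 2**31 - 1  # same value as R's .Machine$integer.max

def max_dense_square_rows(limit=INT_MAX):
    """Largest n for which an n x n matrix has at most `limit` cells."""
    return math.isqrt(limit)

def row_chunks(n_rows, chunk_size):
    """Yield half-open (start, end) row ranges covering 0..n_rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)

def predict_in_chunks(observations, centers, assign_chunk, chunk_size=1000):
    """Run a per-chunk predictor over bounded slices and concatenate the results.

    Each call only sees `chunk_size` observations, so any per-call distance
    matrix stays small regardless of the total number of rows.
    """
    result = []
    for start, end in row_chunks(len(observations), chunk_size):
        result.extend(assign_chunk(observations[start:end], centers))
    return result

print(max_dense_square_rows())  # 46340
```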
tglkmeans 0.6.1 (2026-03-04)
- Added predict_tgl_kmeans() function to assign new observations to existing k-means cluster centers (#5).
- Exported match_clusters() and test_clustering() functions.
- Auto-detect character/factor first column as ID column.
- Parallelized k-means initialization.
- Fix: Pearson distance was not negated, causing incorrect cluster assignments when using metric = "pearson".
- Fix: memory leak in k-means core.
- Fix: k-means seeding crash when k is large relative to data size.
- Fix: race condition in parallel workers.
- Fix: package failed to load on machines where detectCores() returns NA.
- Fix: downsample_matrix used identical random seed for all columns.
- Removed plyr dependency. Moved ggplot2 from Imports to Suggests.
tglkmeans 0.5.8 (2026-01-14)
- Fix: Registered "id" as a global variable to maintain compatibility with future versions of dplyr (addressing the removal of dplyr::id()).
tglkmeans 0.5.7
- Bug fix: crashed on some machines when id_column = TRUE and data had a single column.
tglkmeans 0.5.6
- Removed parallelization for hclust_intra_clusters because it caused hangs on some systems. The parallel parameter was removed from TGL_kmeans and TGL_kmeans_tidy.
tglkmeans 0.5.5 (2024-05-15)
- Fix: clustering crashed when hclust_intra_clusters was TRUE and input was a matrix.
tglkmeans 0.5.4 (2024-01-09)
- Fix: the package used more than 2 cores when tested on CRAN.
tglkmeans 0.5.3
- Fix: colnames and rownames were dropped by the downsample_matrix function.
tglkmeans 0.5.2
tglkmeans 0.5.1
- Fix: cluster slot ids were corrupted when data was a tibble and id_column was TRUE.
- Fix: ids were not used when id_column was FALSE and data had rownames.
tglkmeans 0.5.0
- Added downsample_matrix function to downsample the columns of a count matrix to a target number.
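The Python sketch below downsamples count-matrix columns to a target total. Each column's counts are expanded into individual units, target units are drawn without replacement, and the draws are re-tallied per row. One random stream is consumed across all columns, so each column gets its own draws (the 0.6.1 fix addressed an identical seed per column). Two further choices are assumptions of this sketch: columns whose total is already at or below the target are kept unchanged, and all names are hypothetical.

```python
import random

def downsample_column(counts, target, rng):
    """Downsample one column of counts to `target` total, sampling without replacement."""
    total = sum(counts)
    if total <= target:
        return list(counts)  # assumption: columns at or below target are left as-is
    # One pool entry per counted unit: row index repeated `count` times.
    pool = [row for row, c in enumerate(counts) for _ in range(c)]
    out = [0] * len(counts)
    for row in rng.sample(pool, target):
        out[row] += 1
    return out

def downsample_matrix_sketch(columns, target, seed=0):
    """Downsample every column; a single RNG stream gives each column different draws."""
    rng = random.Random(seed)
    return [downsample_column(col, target, rng) for col in columns]

print(downsample_matrix_sketch([[5, 3, 2], [1, 1, 8], [1, 0, 0]], target=4, seed=1))
```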
tglkmeans 0.4.0
- The default of the id_column parameter was changed to FALSE. This is a breaking change: to use an id column, you now need to set id_column = TRUE explicitly.
- Use R random number generator instead of C++11 random number generator. For backwards compatibility, the old random number generator can be used by setting use_cpp_random to TRUE.
- Added parallelization using RcppParallel.
tglkmeans 0.3.12
- Added validity checks for k and the number of observations.
tglkmeans 0.3.11 (2023-08-21)
- Changed pkgdoc, see: https://github.com/r-lib/roxygen2/issues/1491.
tglkmeans 0.3.10 (2023-06-26)
- Removed broken link to one of the references in the description.
tglkmeans 0.3.9
- Remove empty clusters. These can occur when the number of clusters is larger than the number of observations, and previously caused an error in the reordering step.
tglkmeans 0.3.8 (2023-03-21)
- Removed the C++11 specification and now require R >= 4.0.0.
tglkmeans 0.3.6
- Fixed error on Debian systems.
tglkmeans 0.3.5 (2022-08-28)
- Error messages now use 1-based indexing instead of C++ (0-based) indexing.
- Fix: loading the package failed on machines with a single core.
tglkmeans 0.3.4 (2022-04-20)
tglkmeans 0.3.3
- Set NA values to zero in the correlation matrix when reordering clusters (avoids crashing on some datasets with NAs in the dist object).
tglkmeans 0.3.1
- Use rownames when they exist.
- Do not fail when the "id" column doesn't exist (warn instead).
tglkmeans 0.3.0
- Removed bootstrapping (it was causing a lot of problems in Travis testing and was rarely used).
- Added a NEWS.md file to track changes to the package.