muon.pp.neighbors

muon.pp.neighbors(mdata: MuData, n_neighbors: int | None = None, n_bandwidth_neighbors: int = 20, n_multineighbors: int = 200, neighbor_keys: Dict[str, str | None] | None = None, metric: Literal['euclidean', 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'wminkowski', 'yule'] = 'euclidean', low_memory: bool | None = None, key_added: str | None = None, weight_key: str | None = 'mod_weight', add_weights_to_modalities: bool = False, eps: float = 0.0001, copy: bool = False, random_state: int | RandomState | None = 42) → MuData | None

Multimodal nearest neighbor search.

This implements the multimodal nearest neighbor method of Hao et al. and Swanson et al. The efficiency of the neighbor search relies heavily on UMAP's nearest neighbor search. In particular, you may want to decrease n_multineighbors for large data sets to avoid excessive peak memory use. Note that to achieve results as close as possible to the Seurat implementation, observations must be normalized to unit L2 norm (see l2norm()) before running the per-modality nearest neighbor search.
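As a minimal illustration of what "normalized to unit L2 norm" means, the sketch below divides each observation (row) of a per-modality matrix by its Euclidean norm. This is not l2norm() itself; the helper name l2_normalize_rows is hypothetical, and in practice l2norm() should be applied to each modality before the per-modality neighbor search.

```python
import numpy as np


def l2_normalize_rows(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: scale each observation (row) of `x` to unit L2 norm.

    Rows whose norm is (near) zero are left as zeros instead of
    producing NaNs from a division by zero.
    """
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Replace tiny norms by 1 so that all-zero rows stay zero.
    safe_norms = np.where(norms > eps, norms, 1.0)
    return x / safe_norms


# Toy per-modality embedding: 3 cells x 2 dimensions.
emb = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]])
normed = l2_normalize_rows(emb)
print(np.linalg.norm(normed, axis=1))
```

After normalization, Euclidean distances between observations are a monotone function of their cosine similarity, which is why the choice of normalization affects which neighbors are found.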

References

Hao et al., 2020 (doi:10.1101/2020.10.12.335331)
Swanson et al., 2020 (doi:10.1101/2020.09.04.283887)

Parameters:
  • mdata – MuData object. Per-modality nearest neighbor search must have already been performed for all modalities that are to be used for multimodal nearest neighbor search.

  • n_neighbors – Number of nearest neighbors to find. If None, it is set to the arithmetic mean of the per-modality numbers of neighbors.

  • n_bandwidth_neighbors – Number of nearest neighbors to use for bandwidth selection.

  • n_multineighbors – Number of nearest neighbors in each modality to consider as candidates for multimodal nearest neighbors. Only points in the union of per-modality nearest neighbors are candidates for multimodal nearest neighbors. This will use the same metric that was used for the nearest neighbor search in the respective modality.

  • neighbor_keys – Mapping from modality names to the keys in each modality's .uns where per-modality neighborhood information is stored. Keys default to "neighbors". If set, only the modalities present in neighbor_keys are used for the multimodal nearest neighbor search.

  • metric – Distance measure to use. This will only be used in the final step to search for nearest neighbors in the set of candidates.

  • low_memory – Whether to use the low-memory implementation of nearest neighbor descent. If None, defaults to True if the data set has more than 50 000 samples and to False otherwise.

  • key_added – If not specified, the multimodal neighbors data is stored in .uns["neighbors"], distances and connectivities are stored in .obsp["distances"] and .obsp["connectivities"], respectively. If specified, the neighbors data is added to .uns[key_added], distances are stored in .obsp[key_added + "_distances"] and connectivities in .obsp[key_added + "_connectivities"].

  • weight_key – Name under which the cell-modality weights are stored, either in each modality's .obs or in mdata.obs (see add_weights_to_modalities). Defaults to "mod_weight".

  • add_weights_to_modalities – Whether to add the weights to the .obs of the individual modalities. Defaults to False, in which case the weights are added to mdata.obs.

  • eps – Small number to avoid numerical errors.

  • copy – Return a copy instead of writing to mdata.

  • random_state – Random seed.
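The defaults described for n_neighbors and low_memory, and the candidate pooling described for n_multineighbors, can be sketched in plain NumPy. This is an illustration under stated assumptions, not muon's internal code: all helper names are hypothetical, rounding the mean to the nearest integer is an assumption, and the toy index arrays stand in for the per-modality neighbor indices produced by the per-modality search.

```python
import numpy as np


def default_n_neighbors(per_modality_k: dict) -> int:
    """Hypothetical: arithmetic mean of the per-modality neighbor counts.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(round(np.mean(list(per_modality_k.values()))))


def default_low_memory(n_obs: int) -> bool:
    """Hypothetical: low-memory mode for data sets with more than 50 000 samples."""
    return n_obs > 50_000


def candidate_union(knn_indices: dict) -> list:
    """Hypothetical: each cell's candidates are the union of its per-modality neighbors.

    knn_indices maps modality name -> (n_obs, n_multineighbors) integer array,
    where row i holds the indices of cell i's nearest neighbors in that modality.
    """
    arrays = list(knn_indices.values())
    n_obs = arrays[0].shape[0]
    # Concatenate cell i's neighbor lists across modalities and deduplicate.
    return [np.unique(np.concatenate([a[i] for a in arrays])) for i in range(n_obs)]


# Toy example: 4 cells, 2 modalities, 2 candidate neighbors per modality.
knn = {
    "rna": np.array([[0, 1], [1, 0], [2, 3], [3, 2]]),
    "prot": np.array([[0, 2], [1, 3], [2, 0], [3, 1]]),
}
print(default_n_neighbors({"rna": 15, "prot": 25}))
print(default_low_memory(60_000))
print(candidate_union(knn)[0])
```

Because every cell keeps up to n_multineighbors candidates from each modality, memory grows with both the number of cells and n_multineighbors, which is why lowering n_multineighbors helps on large data sets.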

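Returns: If copy is True, returns a modified copy of mdata; otherwise, updates mdata in place and returns None. Cell-modality weights are stored separately for each modality under weight_key: in each modality's .obs if add_weights_to_modalities is True, and in mdata.obs otherwise.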
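To make the storage rules for key_added, weight_key and add_weights_to_modalities concrete, the hypothetical helper below only builds strings describing where each result is written; it does not call muon. The "<modality>:<weight_key>" column naming in mdata.obs is an assumption (it follows MuData's usual modality-prefixed column convention), since the exact column name is not stated above.

```python
def output_locations(modalities, key_added=None, weight_key="mod_weight",
                     add_weights_to_modalities=False):
    """Hypothetical helper: describe where results are written."""
    if key_added is None:
        uns_key, dist_key, conn_key = "neighbors", "distances", "connectivities"
    else:
        uns_key = key_added
        dist_key = key_added + "_distances"
        conn_key = key_added + "_connectivities"
    if add_weights_to_modalities:
        # One weight column per modality, in that modality's own .obs.
        weights = {m: f'mdata.mod["{m}"].obs["{weight_key}"]' for m in modalities}
    else:
        # Assumed naming: "<modality>:<weight_key>" columns in mdata.obs.
        weights = {m: f'mdata.obs["{m}:{weight_key}"]' for m in modalities}
    return {
        "uns": f'mdata.uns["{uns_key}"]',
        "distances": f'mdata.obsp["{dist_key}"]',
        "connectivities": f'mdata.obsp["{conn_key}"]',
        "weights": weights,
    }


locs = output_locations(["rna", "prot"], key_added="wnn")
for name, where in locs.items():
    print(name, where)
```

Passing a distinct key_added keeps the multimodal graph from overwriting an existing .obsp["distances"] and .obsp["connectivities"] pair.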