Apply consistent rules about allowed sample names? #53

mikemc · 2020-09-11T18:04:18Z

Various phyloseq methods seem to handle sample names inconsistently. For example, row names "1", "2", "3" are replaced with dummy names, while other numeric-looking names are kept:

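library(phyloseq)
library(magrittr)  # provides %>%

# Three identical data frames that differ only in their row (sample) names
x <- y <- z <- data.frame(var1 = letters[1:3], var2 = 7:9)
rownames(x) <- c("1", "2", "3")   # exactly as.character(1:3)
rownames(y) <- c("s1", "2", "3")  # mix of non-numeric and numeric names
rownames(z) <- c("3", "2", "1")   # numeric names, but not in 1:n order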
sample_data(x)
#>     var1 var2
#> sa1    a    7
#> sa2    b    8
#> sa3    c    9
sample_data(y)
#>    var1 var2
#> s1    a    7
#> 2     b    8
#> 3     c    9
sample_data(y) %>% prune_samples(c("2", "3"), .)
#>   var1 var2
#> 2    b    8
#> 3    c    9
sample_data(z)
#>   var1 var2
#> 3    a    7
#> 2    b    8
#> 1    c    9

In addition, some phyloseq functions prefix numeric sample names with an "X", as make.names() would do. This happens in the results of diversity().
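
For reference, the "X" prefix can be reproduced in base R without phyloseq. The sketch below only shows what make.names() and the default check.names = TRUE of data.frame() do to numeric-looking names. Whether this is the exact code path behind the behavior described above is an assumption.

```r
# Base-R illustration only; phyloseq is not involved here.
ids <- c("1", "2", "s3")

# make.names() prefixes names that are not syntactically valid with "X"
fixed <- make.names(ids)
print(fixed)

# data.frame() applies the same fix to column names unless check.names = FALSE
m <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("1", "2")))
checked <- names(data.frame(m))
unchecked <- names(data.frame(m, check.names = FALSE))
print(checked)
print(unchecked)
```

If samples end up as columns of an intermediate data frame, passing check.names = FALSE is one way to keep numeric names intact.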


mikemc · 2020-09-12T18:24:58Z

It looks like only the first case produces dummy sample names because phyloseq checks whether the row names equal as.character(1:n). If they do, it decides that sample names are missing and sets them to "sa1", "sa2", etc.

See https://github.com/joey711/phyloseq/blob/dc35470498c79284231d41d1add1a74940f51fb7/R/sampleData-class.R#L60
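
A minimal base-R sketch of the check described in this comment, applied to the three data frames from the first comment. The helpers `has_default_rownames()` and `fill_dummy_names()` are hypothetical names made up for illustration. They are not phyloseq functions, and the linked source may differ in detail.

```r
# Hypothetical re-implementation of the described check; not phyloseq's source.
has_default_rownames <- function(df) {
  identical(rownames(df), as.character(seq_len(nrow(df))))
}

# Hypothetical: replace row names with dummy names only when the check fires
fill_dummy_names <- function(df, prefix = "sa") {
  if (has_default_rownames(df)) {
    rownames(df) <- paste0(prefix, seq_len(nrow(df)))
  }
  df
}

x <- y <- z <- data.frame(var1 = letters[1:3], var2 = 7:9)
rownames(x) <- c("1", "2", "3")
rownames(y) <- c("s1", "2", "3")
rownames(z) <- c("3", "2", "1")

# Collect the resulting row names for each input
results <- lapply(list(x = x, y = y, z = z),
                  function(d) rownames(fill_dummy_names(d)))
print(results)
```

Only `x` is renamed, because its row names are exactly "1", "2", "3" in order. `y` and `z` fail the identity test and keep their names, which matches the output in the first comment.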

mikemc · 2020-09-28T22:17:41Z

This also pops up when subsetting, e.g.

sam <- tibble::tibble(
  sample_id = c(letters[1:3], 1:3),
  var = c(rep("a", 3), rep("b", 3))
) %>% sample_data()
sam
#>   var
#> a   a
#> b   a
#> c   a
#> 1   b
#> 2   b
#> 3   b
sam %>% subset_samples(var == "b")
#>     var
#> sa1   b
#> sa2   b
#> sa3   b
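
One plausible reading of this result, stated as an assumption rather than something confirmed in this thread, is that the subset is passed back through a check for row names equal to as.character(1:n). The base-R sketch below uses a plain data frame in place of the sample data and shows that the rows left after subsetting would trip such a check.

```r
# Plain data.frame stand-in for the sample data above; phyloseq is not used.
sam_df <- data.frame(
  var = c(rep("a", 3), rep("b", 3)),
  row.names = c(letters[1:3], 1:3)
)

# Keep only the rows where var == "b"; their names are "1", "2", "3"
sub_df <- sam_df[sam_df$var == "b", , drop = FALSE]
print(rownames(sub_df))

# The full table does not look like it has default row names, the subset does
full_matches <- identical(rownames(sam_df), as.character(seq_len(nrow(sam_df))))
sub_matches <- identical(rownames(sub_df), as.character(seq_len(nrow(sub_df))))
print(c(full = full_matches, subset = sub_matches))
```

Under that assumption, whether numeric sample names survive depends on which other samples happen to be present, which is the inconsistency this issue is about.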

mikemc added bug question labels Sep 11, 2020

mikemc mentioned this issue Sep 12, 2020

merge_samples2() fails when group values are numbers #52

Closed

mikemc mentioned this issue Oct 16, 2020

Revamp constructors #57

Open
