Apply consistent rules about allowed sample names? #53

Open
mikemc opened this issue Sep 11, 2020 · 2 comments
Labels
bug Something isn't working question Further information is requested


mikemc commented Sep 11, 2020

Various phyloseq methods seem to handle sample names inconsistently. For example, row names that are all numeric are sometimes replaced with dummy names and sometimes kept:

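library(phyloseq)
library(magrittr)  # provides %>%

# Three copies of the same data, differing only in their row names
x <- y <- z <- data.frame(var1 = letters[1:3], var2 = 7:9)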
rownames(x) <- c("1", "2", "3")
rownames(y) <- c("s1", "2", "3")
rownames(z) <- c("3", "2", "1")
sample_data(x)
#>     var1 var2
#> sa1    a    7
#> sa2    b    8
#> sa3    c    9
sample_data(y)
#>    var1 var2
#> s1    a    7
#> 2     b    8
#> 3     c    9
sample_data(y) %>% prune_samples(c("2", "3"), .)
#>   var1 var2
#> 2    b    8
#> 3    c    9
sample_data(z)
#>   var1 var2
#> 3    a    7
#> 2    b    8
#> 1    c    9

In addition, some phyloseq functions prefix numeric sample names with an "X", as make.names() would. This happens in the results of diversity().
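
The "X" prefix can be reproduced in base R without phyloseq. This is a minimal sketch, assuming the renaming comes from make.names() or from the check.names = TRUE default of data.frame() somewhere in the call path; the issue does not identify the exact call.

```r
# Sample names that look like numbers are not syntactically valid R names.
ids <- c("1", "2", "s1")

# make.names() prefixes invalid names with "X" and leaves valid ones alone.
fixed <- make.names(ids)
print(fixed)

# data.frame() applies the same rule to column names unless check.names = FALSE.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, ids))
checked <- colnames(data.frame(m))
unchecked <- colnames(data.frame(m, check.names = FALSE))
print(checked)
print(unchecked)
```

If that assumption holds, any step that passes samples through data.frame() as columns would need check.names = FALSE to keep numeric names intact.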

@mikemc mikemc added bug Something isn't working question Further information is requested labels Sep 11, 2020

mikemc commented Sep 12, 2020

It looks like only the first case results in dummy sample names because phyloseq checks whether the row names equal as.character(1:n). If they do, it decides that sample names are missing and sets them to "sa1", "sa2", etc. The comparison is order-sensitive, which is why z (row names "3", "2", "1") keeps its names.

See https://github.com/joey711/phyloseq/blob/dc35470498c79284231d41d1add1a74940f51fb7/R/sampleData-class.R#L60
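
The check can be paraphrased in base R to show why x, y, and z are treated differently. This is a sketch only: assign_dummy_names() is a hypothetical stand-in written for illustration, not phyloseq's actual code, and it assumes the comparison is the element-wise one described above.

```r
# Hypothetical stand-in for the check in phyloseq's sample_data() constructor;
# the function name and structure are illustrative, not phyloseq's code.
assign_dummy_names <- function(df) {
  n <- nrow(df)
  # Row names exactly "1", "2", ..., "n" in that order are treated as missing.
  if (all(rownames(df) == as.character(seq_len(n)))) {
    rownames(df) <- paste0("sa", seq_len(n))
  }
  df
}

x <- y <- z <- data.frame(var1 = letters[1:3], var2 = 7:9)
rownames(x) <- c("1", "2", "3")   # matches as.character(1:3): renamed
rownames(y) <- c("s1", "2", "3")  # first name differs: kept
rownames(z) <- c("3", "2", "1")   # same names, different order: kept

rn_x <- rownames(assign_dummy_names(x))
rn_y <- rownames(assign_dummy_names(y))
rn_z <- rownames(assign_dummy_names(z))
print(list(x = rn_x, y = rn_y, z = rn_z))
```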


mikemc commented Sep 28, 2020

This also pops up when subsetting, e.g.

sam <- tibble::tibble(
  sample_id = c(letters[1:3], 1:3),
  var = c(rep("a", 3), rep("b", 3))
) %>% sample_data
sam
#>   var
#> a   a
#> b   a
#> c   a
#> 1   b
#> 2   b
#> 3   b
sam %>% subset_samples(var == "b")
#>     var
#> sa1   b
#> sa2   b
#> sa3   b
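
The subsetting case fits the same explanation. Below is a base R sketch, assuming subset_samples() subsets a plain data frame and then rebuilds the result through the same sample_data() constructor: the surviving row names happen to be "1", "2", "3", so they look like default row names to the check.

```r
# Same sample names as in the example above, stored as data frame row names.
sam_df <- data.frame(
  var = c(rep("a", 3), rep("b", 3)),
  row.names = c(letters[1:3], 1:3)
)

# Keep only the samples with var == "b"; their names are "1", "2", "3".
sub_df <- subset(sam_df, var == "b")
kept <- rownames(sub_df)

# The remaining names are indistinguishable from default row names.
looks_default <- all(kept == as.character(seq_len(nrow(sub_df))))
print(kept)
print(looks_default)
```

Under that assumption, whether names get replaced depends on which samples survive the filter, not on how the names were originally supplied.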
