results of leiden.community not reproducible across OS #11

Closed
MarieOestreich opened this issue Jul 29, 2022 · 10 comments
Labels
help wanted Extra attention is needed

Comments

@MarieOestreich

Hi!
I have run leidenAlg::leiden.community() on the exact same graph g and with identical seeds on Windows and on Linux and the results differ. Is there a known reason (and maybe even fix) for this?

Cheers,
Marie

Code:
set.seed(168575)
partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)

Expected behaviour: partition is the same when running code on Windows and Linux.
Observed behaviour: partition is different.

@evanbiederstedt
Contributor

evanbiederstedt commented Jul 29, 2022 via email

@evanbiederstedt
Contributor

Could you try installing the package from this branch? https://github.com/kharchenkolab/leidenAlg/tree/no_cpp

Please check if the problem remains

@evanbiederstedt
Contributor

Are you able to see reproducible clusters across OS using igraph::cluster_leiden()? If so, we could try using this.

@evanbiederstedt added the "help wanted" label on Aug 9, 2022
@MarieOestreich
Author

Hi, sorry for the delayed response.

I will attach a file that holds the edges and weights for a graph where I observed this problem.
And here is the code I used to read the file, build an igraph and cluster the graph, yielding different clusters on Windows vs. Linux (also for the igraph::cluster_leiden()):

# load edgelist with weights:
df <- read.csv('graph.csv')
# create igraph from data frame:
g <- igraph::graph_from_data_frame(df)

# seed for reproducibility
set.seed(168575)

# Version 1: leidenAlg
partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
# Version 2: igraph. But also this yields different solutions.
# partition <- igraph::cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')

# get cluster frequencies for easier comparisons between results on Linux and Windows.
clusters_df <- base::data.frame(cluster = base::as.numeric(partition$membership), gene = partition$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))

graph.csv
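
When comparing results between machines, cluster frequencies alone can hide differences, and the same grouping may simply be numbered differently. The following is a minimal base-R sketch for checking whether two membership vectors describe the same partition. The helper name same_partition and the example vectors are hypothetical and are not taken from this thread.

```r
# Hypothetical helper: TRUE when two membership vectors describe the same
# grouping of nodes, regardless of how the clusters are numbered.
same_partition <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)  # contingency table of cluster labels
  # Identical partitions map each cluster in `a` to exactly one cluster in `b`
  # and vice versa.
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Made-up membership vectors standing in for results from two machines.
linux_membership   <- c(1, 1, 2, 3, 3)
windows_membership <- c(2, 2, 3, 1, 1)  # same grouping, different labels
other_membership   <- c(1, 2, 2, 3, 3)  # genuinely different grouping

print(same_partition(linux_membership, windows_membership))
print(same_partition(linux_membership, other_membership))
```

In practice the two vectors could come from partition$membership saved on each platform (for example with saveRDS) and loaded side by side.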

@evanbiederstedt
Contributor

Hi @MarieOestreich

Yes, this is related to #10

If this is an issue for igraph::cluster_leiden(), it's probably best to create an issue here: https://github.com/igraph/rigraph

CC @vtraag @ntamas etc.

@ntamas

ntamas commented Aug 11, 2022

@vtraag is on holiday now and he is the one who knows the Leiden algorithm inside and out, but I've taken a cursory glance at the source code in the meantime and made some tests. I can indeed see some nondeterminism for certain graphs and I'll try to find the cause in the next few days. Scratch that, I just forgot to reset the random seed after invoking the algorithm. When I reset the seed, the results seem to be consistent (deterministic) when I am on the same platform. I'll try to test it across different platforms now.
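
The point about resetting the seed can be illustrated with base R alone. This is only a sketch of the general mechanism, using runif as a stand-in for any routine that consumes random numbers. It does not call the Leiden code itself.

```r
# Any call that draws random numbers advances R's RNG state, so a second
# call without resetting the seed starts from a different state.
set.seed(168575)
first  <- runif(3)
second <- runif(3)   # drawn from the advanced state

# Resetting the seed restores the original state.
set.seed(168575)
third <- runif(3)

print(identical(first, second))  # draws differ without a reset
print(identical(first, third))   # draws match after a reset
```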

@evanbiederstedt
Contributor

Thanks for the help, @ntamas
(Apologies for being so difficult in recent months as well)

@ntamas

ntamas commented Aug 12, 2022

@MarieOestreich So, unfortunately I cannot reproduce this in my environment. I tried your graph.csv file with the following code, which is only slightly modified from what you posted, mostly to test whether the results are consistent on a single platform when re-initializing the seed (and note that I am calling igraph's cluster_leiden):

library(igraph)

# load edgelist with weights:
df <- read.csv('graph.csv')
# create igraph from data frame:
g <- graph_from_data_frame(df)

# seed for reproducibility
set.seed(168575)

# Version 1: leidenAlg
# partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
# Version 2: igraph. But also this yields different solutions.
p1 <- cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')

set.seed(168575)
p2 <- cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')

compare(p1, p2)

clusters_df <- base::data.frame(cluster = base::as.numeric(p1$membership), gene = p1$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
print(cluster_freqs)

clusters_df <- base::data.frame(cluster = base::as.numeric(p2$membership), gene = p2$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
print(cluster_freqs)

I observe the same result in the following environments:

  • R 4.2.0 with igraph 1.3.4 on Windows (64-bit Intel CPU)
  • R 4.2.0 with igraph 1.3.4 on macOS Monterey (Apple M1)
  • R 4.1.2 with igraph 1.3.4 on Ubuntu Linux 20.04 (64-bit Intel CPU)

The summary of the partition is this:

   Var1 Freq
1     1  764
2     2  291
3     3  730
4     4 1290
5     5  287
6     6 1360
7     7  214
8     8  439
9     9  791
10   10  339
11   11  279
12   12  977
13   13   83
14   14   86
15   15    4
16   16  201
17   17   43
18   18   85
19   19   74
20   20   10
21   21    3
22   22    4

Can you let me know the exact R and igraph version that you are using on both platforms and whether it is the official CRAN R or some other R distribution (Anaconda R, Microsoft R Open etc)?
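
The requested details can be gathered with a few base-R calls. The sketch below assumes the two packages of interest are igraph and leidenAlg. The helper name env_report is hypothetical, and a package that is not installed is reported as NA.

```r
# Hypothetical helper that collects the environment details relevant to
# reproducibility: R version, platform, RNG settings, and package versions.
env_report <- function(pkgs = c("igraph", "leidenAlg")) {
  versions <- vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # package not installed on this machine
    }
  }, character(1))

  list(
    r_version = R.version.string,
    platform  = R.version$platform,
    rng_kind  = RNGkind(),  # generator settings used by set.seed()
    packages  = versions
  )
}

str(env_report())
```

Running this on both machines and comparing the output side by side shows whether the R version, RNG settings, or package versions differ. The full sessionInfo() output gives more detail if needed.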

@vtraag

vtraag commented Oct 5, 2022

Thanks for covering in my absence @ntamas! In addition to the tests for igraph that @ntamas performed, I can confirm that the leidenalg implementation in Python also yields identical results on both Linux and Windows (both for this graph and for random graphs).

Given that a seed is also set here in this R interface

Optimiser o( (int) (R::runif(0,1)*(double)RAND_MAX) );

I would assume that the R interface would also yield identical results for both Linux and Windows, whenever a seed is set in R.

Without being able to reproduce the issue it seems difficult to track it down.
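
As a sketch of how the seed expression quoted above evaluates, the R snippet below mimics the (int) (runif * RAND_MAX) computation for two values of RAND_MAX. The values 32767 and 2147483647 are assumptions about typical C toolchains (common Windows C runtimes and glibc respectively) and are not stated in this thread. The helper name seed_from_runif is hypothetical. Whether this affects leidenAlg's output has not been verified here.

```r
# Hypothetical helper mirroring (int) (R::runif(0, 1) * (double) RAND_MAX):
# draw one uniform number after seeding R, scale it, and truncate.
seed_from_runif <- function(r_seed, rand_max) {
  set.seed(r_seed)
  as.integer(stats::runif(1) * rand_max)  # as.integer truncates toward zero
}

# Assumed RAND_MAX values; check the actual compiler and C runtime in use.
rand_max_small <- 32767        # assumption: typical Windows C runtime
rand_max_large <- 2147483647   # assumption: glibc on Linux

seed_small <- seed_from_runif(168575, rand_max_small)
seed_large <- seed_from_runif(168575, rand_max_large)

print(c(small = seed_small, large = seed_large))
```

If the two builds really do compile with different RAND_MAX values, the same R seed would hand different integer seeds to the optimiser. Printing RAND_MAX from the compiled package on each platform would be one way to check this.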

@evanbiederstedt
Contributor

Thanks everyone. I really appreciate the time invested here @ntamas @vtraag
