results of leiden.community not reproducible across OS #11

MarieOestreich · 2022-07-29T08:41:30Z

Hi!
I have run leidenAlg::leiden.community() on the exact same graph g and with identical seeds on Windows and on Linux and the results differ. Is there a known reason (and maybe even fix) for this?

Cheers,
Marie

Code:
```r
set.seed(168575)
partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
```

Expected behaviour: partition is the same when running code on Windows and Linux.
Observed behaviour: partition is different.

evanbiederstedt · 2022-07-29T11:09:45Z

Do you have a graph I could use to reproduce this behavior? Otherwise, I can’t fix it.

evanbiederstedt · 2022-08-05T20:05:11Z

Could you try installing the package from this branch? https://github.com/kharchenkolab/leidenAlg/tree/no_cpp

Please check whether the problem remains.
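Installing from a branch is usually done with the remotes package. The helper below is a minimal sketch that assumes remotes is installed and GitHub is reachable. The function name is hypothetical, and the branch may since have been merged or deleted.

```r
# Hypothetical helper: install leidenAlg from a given branch on GitHub.
# Assumes the 'remotes' package is installed and there is network access.
install_leidenalg_branch <- function(ref = "no_cpp") {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("Please install the 'remotes' package first.")
  }
  remotes::install_github("kharchenkolab/leidenAlg", ref = ref)
}

# install_leidenalg_branch()        # install the no_cpp branch
# utils::packageVersion("leidenAlg") # check which version is now installed
```

If leidenAlg was already loaded, restart R after installing so the new build is the one that gets used.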

evanbiederstedt · 2022-08-05T20:34:00Z

Do you get reproducible clusters across operating systems with igraph::cluster_leiden()? If so, we could try using that instead.

MarieOestreich · 2022-08-10T13:40:50Z

Hi, sorry for the delayed response.

I've attached a file that holds the edges and weights of a graph where I observed this problem.
Here is the code I used to read the file, build an igraph graph, and cluster it. It yields different clusters on Windows vs. Linux (also with igraph::cluster_leiden()):

```r
# load edgelist with weights:
df <- read.csv('graph.csv')
# create igraph from data frame:
g <- igraph::graph_from_data_frame(df)

# seed for reproducibility
set.seed(168575)

# Version 1: leidenAlg
partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
# Version 2: igraph. But also this yields different solutions.
# partition <- igraph::cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')

# get cluster frequencies for easier comparisons between results on Linux and Windows.
clusters_df <- base::data.frame(cluster = base::as.numeric(partition$membership), gene = partition$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
```

graph.csv
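Cluster-size tables can hide differences: two partitions can have the same sizes but different members. A stricter check is to save the membership vector on each machine and compare the two vectors with igraph::compare(). Variation of information ("vi") is 0, and the adjusted Rand index is 1, exactly when the partitions agree up to a relabeling of the clusters. This is a minimal sketch under a few assumptions: Zachary's karate club graph stands in for graph.csv, the helper and file names are hypothetical, and both vectors list the vertices in the same order.

```r
library(igraph)

# Hypothetical helper: compare two membership vectors saved on different
# machines (e.g. one .rds file written on Windows, one on Linux).
compare_saved_memberships <- function(file_a, file_b) {
  a <- readRDS(file_a)
  b <- readRDS(file_b)
  stopifnot(length(a) == length(b))  # same vertices, same order (assumed)
  list(
    vi  = compare(a, b, method = "vi"),            # 0 means identical partitions
    ari = compare(a, b, method = "adjusted.rand")  # 1 means identical partitions
  )
}

# Stand-in graph (assumption): Zachary's karate club instead of graph.csv.
g <- make_graph("Zachary")
set.seed(168575)
p <- cluster_leiden(g, objective_function = "modularity", n_iterations = 50)

# On each platform, save the plain integer membership vector.
file_linux   <- file.path(tempdir(), "membership_linux.rds")
file_windows <- file.path(tempdir(), "membership_windows.rds")
saveRDS(as.integer(membership(p)), file_linux)
saveRDS(as.integer(membership(p)), file_windows)  # placeholder for the other machine's file

res <- compare_saved_memberships(file_linux, file_windows)
print(res)
```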

evanbiederstedt · 2022-08-10T13:59:49Z

Hi @MarieOestreich

Yes, this is related to #10.

If this is an issue for igraph::cluster_leiden(), it's probably best to open an issue here: https://github.com/igraph/rigraph

CC @vtraag @ntamas etc.

ntamas · 2022-08-11T21:34:53Z

@vtraag is on holiday now and he is the one who knows the Leiden algorithm inside and out, but I've taken a cursory glance at the source code in the meantime and run some tests. ~~I can indeed see some nondeterminism for certain graphs and I'll try to find the cause in the next few days.~~ Scratch that, I just forgot to reset the random seed after invoking the algorithm. When I reset the seed, the results seem to be consistent (deterministic) on the same platform. I'll try to test it across different platforms now.

evanbiederstedt · 2022-08-12T01:18:48Z

Thanks for the help, @ntamas
(Apologies for being so difficult in recent months as well)

ntamas · 2022-08-12T09:46:11Z

@MarieOestreich So, unfortunately I cannot reproduce this in my environment. I tried your graph.csv file with the following code, which is only slightly modified from what you posted, mostly to test whether the results are consistent on a single platform when re-initializing the seed (and note that I am calling igraph's cluster_leiden):

```r
library(igraph)

# load edgelist with weights:
df <- read.csv('graph.csv')
# create igraph from data frame:
g <- graph_from_data_frame(df)

# seed for reproducibility
set.seed(168575)

# Version 1: leidenAlg
# partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
# Version 2: igraph. But also this yields different solutions.
p1 <- cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')

set.seed(168575)
p2 <- cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')

compare(p1, p2)

clusters_df <- base::data.frame(cluster = base::as.numeric(p1$membership), gene = p1$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
print(cluster_freqs)

clusters_df <- base::data.frame(cluster = base::as.numeric(p2$membership), gene = p2$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
print(cluster_freqs)
```
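The same-platform check above resets the seed before each run and then calls compare(p1, p2). compare() defaults to method = "vi", so a result of 0 means the two runs gave the same partition. The check can be wrapped in a small helper. This is a minimal sketch: the helper name is hypothetical, Zachary's karate club graph stands in for graph.csv, and identical() is stricter than compare() because it also requires the same cluster labels.

```r
library(igraph)

# Hypothetical helper: run Leiden twice from the same seed and report whether
# the two membership vectors are exactly identical (same labels, same order).
leiden_is_deterministic <- function(g, seed, n_iterations = 50) {
  run <- function() {
    set.seed(seed)  # reset R's RNG before every run
    cluster_leiden(g, objective_function = "modularity",
                   n_iterations = n_iterations)
  }
  p1 <- run()
  p2 <- run()
  identical(membership(p1), membership(p2))
}

g <- make_graph("Zachary")  # stand-in for the graph from graph.csv
print(leiden_is_deterministic(g, seed = 168575))
```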

I observe the same result in the following environments:

- R 4.2.0 with igraph 1.3.4 on Windows (64-bit Intel CPU)
- R 4.2.0 with igraph 1.3.4 on macOS Monterey (Apple M1)
- R 4.1.2 with igraph 1.3.4 on Ubuntu Linux 20.04 (64-bit Intel CPU)

The summary of the partition is this:

```
   Var1 Freq
1     1  764
2     2  291
3     3  730
4     4 1290
5     5  287
6     6 1360
7     7  214
8     8  439
9     9  791
10   10  339
11   11  279
12   12  977
13   13   83
14   14   86
15   15    4
16   16  201
17   17   43
18   18   85
19   19   74
20   20   10
21   21    3
22   22    4
```

Can you let me know the exact R and igraph versions you are using on both platforms, and whether it is the official CRAN R or another R distribution (Anaconda R, Microsoft R Open, etc.)?
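The requested details can be collected with base R. This is a minimal sketch with a hypothetical helper name; sessionInfo() gives a fuller report. RNGkind() is included because set.seed() reproduces the same random stream only when the generator kinds also match across machines.

```r
# Hypothetical helper: gather version and platform details for a bug report.
env_report <- function(pkg = "igraph") {
  pkg_version <- if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_  # package not installed
  }
  list(
    r_version = R.version.string,      # e.g. "R version 4.2.0 (...)"
    platform  = R.version$platform,    # build target, e.g. x86_64-w64-mingw32
    os        = Sys.info()[["sysname"]],
    rng_kind  = RNGkind(),             # generator kinds used by set.seed()
    package   = c(name = pkg, version = pkg_version)
  )
}

str(env_report())
```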

vtraag · 2022-10-05T18:05:32Z

Thanks for covering in my absence, @ntamas! In addition to the tests for igraph that @ntamas performed, I can also confirm that the leidenalg implementation in Python yields identical results on both Linux and Windows (both for this graph and for random graphs).

Given that a seed is also set here in this R interface (leidenAlg/src/leiden.cpp, line 70 in e0eeef6):

```cpp
Optimiser o( (int) (R::runif(0,1)*(double)RAND_MAX) );
```

I would assume that the R interface would also yield identical results on both Linux and Windows whenever a seed is set in R.
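Line 70 turns one draw from R's RNG into the integer seed for the C++ Optimiser. The R sketch below mirrors that mapping, assuming no other random draws happen between set.seed() and that line. RAND_MAX is implementation-defined, and the C standard only guarantees that it is at least 32767. The two values used here are assumptions about the toolchains involved: 32767 for the C runtime used by MinGW on Windows, and 2147483647 for glibc on Linux. Under those assumptions, the same set.seed() value produces different Optimiser seeds. The thread did not establish whether this explains the reported difference.

```r
# Mirror of the C++ expression (int)(R::runif(0,1) * (double)RAND_MAX),
# for an assumed value of RAND_MAX (hypothetical helper).
derive_optimiser_seed <- function(r_seed, rand_max) {
  set.seed(r_seed)
  u <- runif(1, min = 0, max = 1)  # same generator as R::runif(0, 1)
  as.integer(u * rand_max)         # as.integer() truncates toward zero, like (int)
}

# Assumed RAND_MAX values (see lead-in): Windows/MinGW vs. glibc on Linux.
seed_windows <- derive_optimiser_seed(168575, 32767)
seed_linux   <- derive_optimiser_seed(168575, 2147483647)
print(c(windows = seed_windows, linux = seed_linux))
```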

Without being able to reproduce the issue, it seems difficult to track it down.

evanbiederstedt · 2022-10-05T19:34:02Z

Thanks everyone. I really appreciate the time invested here @ntamas @vtraag

evanbiederstedt added the "help wanted" label (Extra attention is needed) Aug 9, 2022

evanbiederstedt closed this as completed Oct 5, 2022