Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully deterministic compilation (stable compiler output under reordering and separate compilation) #7661

Open
smarter opened this issue Dec 2, 2019 · 3 comments

Comments

@smarter
Copy link
Member

smarter commented Dec 2, 2019

Corresponding scalac issue: scala/scala-dev#405

A while ago @retronym did some amazing work making scalac reproducible, we're still lagging behind that. I've recently opened #7620 which slightly improves things but there's more to do, Jason sent me a mail a while back with more details:

I flushed out the problems by first crafting some test cases
https://github.com/retronym/scalac-stability/blob/master/README.md that I
figured would be unstable, and then by running a script
https://github.com/retronym/scalac-stability/blob/master/scalac-stability-check.sh
to try compile, reverse compile, file-by-file compile and diffing classes
after each step. We recently wrote a the jardiff
https://github.com/scala/jardiff tool to make diffing JARs pleasant.

Here's my work in progress
https://github.com/scala/scala/compare/2.13.x...retronym:topic/stable-output?expand=1.
Following are the areas that needed attention.

Fresh names generated during typechecking should not use compilation-unit
scoped counters, otherwise the assignment can change with different orders
of type completion. Instead, I'm use a separate fresh name scope from the
closest enclosing, user-written (ie, not macro generated) method . I was
mindful to keep names distinct within blocks of macro expanded code.

Parameter alias computation across multiple levels of inheritance fail to
detect an alias due to a data race in the calls to setAlias and alias. I've
tried to move some of the computation post typer, once we know that all of
the base classes have been through typer and had the super constructor
calls analyzed.

Mixed in outer accessors were incorrectly taking the position of the mixed
in member, rather than using that of the subclass. This showed up an
instability because the position of the mixed in member was only available
in joint compilation.

Default getters could be added into scope based on the order that the
related methods were type completed. I've changed namers to eagerly create
and enter symbols for the default getters when the DefDef is entered. When
the DefDef is completed, Namers finishes the job of assigning types to the
default getter and handing the their bodies to Typer to be spliced into the
tree. Conceptually, this is like having a phase between parser and namer
that adds stub DefDefs for the default getters.

Lambdalift was using Symbol.id as part of the a sort key
https://github.com/scala/scala/blob/8eb67ed9db83879a536a558d688e3c1009656de5/src/compiler/scala/tools/nsc/transform/LambdaLift.scala#L68
for symbol sets, which controlled the order that lambda lifted defs were
created, leading to instability in the fresh names. I changed this to use a
linked set instead.

The order of decls was not explicitly pickled, which resulted in different
order of mixin decls in subclasses depending if they were joint or
separarely compiled with the super trait. I was able to change the pickler
to enter member symbols entries in the decl order, so the pickle format
need not change.

Finally (I hope!), I noticed that we have some tricky code to typecheck
refinement types, that defers some of its work
https://github.com/scala/scala/blob/8eb67ed9db83879a536a558d688e3c1009656de5/src/compiler/scala/tools/nsc/typechecker/Typers.scala#L3057-L3068
until the rest of the compilation unit has passed through typer. Part of
this sets the OVERRIDE flag. Inferred types, based that refinement, which
copy the refined type, can be unstable depending on whether or not the flag
has yet been mutated when they observe it. My workaround is to
unconditionally zero that flag on the decls of all copied refinement types. Do
you remember
why we needed to set that flag in the first place?

The first actionable point would be to do something similar to scala/scala@f50ec3c, we can then see what problems we hit in practice. We have some infrastructure in place in https://github.com/lampepfl/dotty/blob/master/compiler/test/dotty/tools/dotc/IdempotencyTests.scala which should be expanded (e.g., check idempotency of the whole compiler instead of a few files in tests/order-idempotency).

@smarter
Copy link
Member Author

smarter commented Dec 2, 2019

One relevant difference between scalac and dotty here is that scalac desugars context bounds in its parser, and fresh name counters are per-compilation-unit there:

class Test {
  def foo[T: Ordering]: Unit = {
    def bar[S: Equiv]: Unit = {}
  }
  def foo2[T: Ordering]: Unit = {}
}
% scalac -Xprint:parser foo.scala
package <empty> {
  class Test extends scala.AnyRef {
    def <init>() = {
      super.<init>();
      ()
    };
    def foo[T](implicit evidence$1: Ordering[T]): Unit = {
      def bar[S](implicit evidence$2: Equiv[S]): Unit = ();
      ()
    };
    def foo2[T](implicit evidence$3: Ordering[T]): Unit = ()
  }
}

In Dotty however, desugarings are handled by https://github.com/lampepfl/dotty/blob/master/compiler/src/dotty/tools/dotc/ast/Desugar.scala which gets called from the typer, where the order in which definition is typed is not stable due to lazy completers. Quoting from above:

Fresh names generated during typechecking should not use compilation-unit
scoped counters, otherwise the assignment can change with different orders
of type completion. Instead, I'm use a separate fresh name scope from the
closest enclosing, user-written (ie, not macro generated) method . I was
mindful to keep names distinct within blocks of macro expanded code.

But if we apply this technique to context bounds we would end up with:

    def foo[T](implicit evidence$1: Ordering[T]): Unit = {
      def bar[S](implicit evidence$1: Equiv[S]): Unit = ();
      ()
    };

which means the parameter of the inner method would shadow the parameter of the outer method, this isn't what we want so a different scheme is needed (maybe use evidence$<nesting level>$ instead of evidence$ as a prefix?)

@smarter
Copy link
Member Author

smarter commented Dec 2, 2019

which means the parameter of the inner method would shadow the parameter of the outer method

@odersky tells me that shadowing is not a problem with given parameters (they're referred to by their symbol), but context bounds are currently desugared into implicit parameters (this needs to be changed at some point anyway, but the migration strategy hasn't been completely thought out yet).

EDIT: We completely dropped implicit shadowing support so this is no longer an issue.

@smarter smarter changed the title Fully determinstic compilation (stable compiler output under reordering and separate compilation) Fully deterministic compilation (stable compiler output under reordering and separate compilation) Dec 2, 2019
@retronym
Copy link
Member

retronym commented Dec 2, 2019

which gets called from the typer, where the order in which definition is typed is not stable due to lazy completers

@smarter this sounds similar to the problem I faced with the naming of default getters, which used to be created lazily in type completers. I changed things to create the fresh names eagerly when creating the type completer, which runs things in the order of source code.

raboof added a commit to raboof/dotty that referenced this issue Apr 28, 2022
This change makes sure non-repeated annotations are kept in the order
they were found in the source code.

The motivation is not necessarily to have them in the original order,
but to have them in an order that is deterministic across rebuilds
(potentially even across different machines), for reasons discussed
further in scala#7661 and the corresponding scala/scala-dev#405

I tried adding an 'integration test' in `tests/pos` to be picked up by
`IdempotencyCheck.scala`, but unfortunately couldn't reproduce the
nondeterminism that way, so didn't end up including it in this commit.

I didn't see an obvious place for a 'unit test' of this code, I'd be
happy to add one when someone can recommend a good place to put it.

This is basically the dotty equivalent of
scala/scala@954c5d3

Fixes scala#14743
raboof added a commit to raboof/dotty that referenced this issue Apr 28, 2022
This change makes sure non-repeated annotations are kept in the order
they were found in the source code.

The motivation is not necessarily to have them in the original order,
but to have them in an order that is deterministic across rebuilds
(potentially even across different machines), for reasons discussed
further in scala#7661 and the corresponding scala/scala-dev#405

I tried adding an 'integration test' in `tests/pos` to be picked up by
`IdempotencyCheck.scala`, but unfortunately couldn't reproduce the
nondeterminism that way, so didn't end up including it in this commit.

I didn't see an obvious place for a 'unit test' of this code, I'd be
happy to add one when someone can recommend a good place to put it.

This is basically the dotty equivalent of
scala/scala@954c5d3

Fixes scala#14743
raboof added a commit to raboof/dotty that referenced this issue Apr 28, 2022
This change makes sure non-repeated annotations are kept in the order
they were found in the source code.

The motivation is not necessarily to have them in the original order,
but to have them in an order that is deterministic across rebuilds
(potentially even across different machines), for reasons discussed
further in scala#7661 and the corresponding scala/scala-dev#405

An 'integration test' `Annotations.scala` was added in `tests/pos`
to be picked up by `IdempotencyCheck.scala`, but unfortunately I
haven't successfully reproduced the nondeterminism that way.

I didn't see an obvious place for a 'unit test' of this code, I'd be
happy to add one when someone can recommend a good place to put it.

This is basically the dotty equivalent of
scala/scala@954c5d3

Fixes scala#14743
raboof added a commit to raboof/dotty that referenced this issue Apr 28, 2022
This change makes sure non-repeated annotations are kept in the order
they were found in the source code.

The motivation is not necessarily to have them in the original order,
but to have them in an order that is deterministic across rebuilds
(potentially even across different machines), for reasons discussed
further in scala#7661 and the corresponding scala/scala-dev#405

Some integration tests were added in `tests/pos` to be picked up by
`IdempotencyCheck.scala`, but unfortunately I haven't successfully
reproduced the nondeterminism that way.

I didn't see an obvious place for a 'unit test' of this code, I'd be
happy to add one when someone can recommend a good place to put it.

This is basically the dotty equivalent of
scala/scala@954c5d3

Fixes scala#14743
raboof added a commit to raboof/dotty that referenced this issue Apr 28, 2022
This change makes sure non-repeated annotations are kept in the order
they were found in the source code.

The motivation is not necessarily to have them in the original order,
but to have them in an order that is deterministic across rebuilds
(potentially even across different machines), for reasons discussed
further in scala#7661 and the corresponding scala/scala-dev#405

Some integration tests were added in `tests/pos` to be picked up by
`IdempotencyCheck.scala`, but unfortunately I haven't successfully
reproduced the nondeterminism that way.

I didn't see an obvious place for a 'unit test' of this code, I'd be
happy to add one when someone can recommend a good place to put it.

This is basically the dotty equivalent of
scala/scala@954c5d3

Fixes scala#14743
bishabosha pushed a commit to dotty-staging/dotty that referenced this issue Oct 18, 2022
This change makes sure non-repeated annotations are kept in the order
they were found in the source code.

The motivation is not necessarily to have them in the original order,
but to have them in an order that is deterministic across rebuilds
(potentially even across different machines), for reasons discussed
further in scala#7661 and the corresponding scala/scala-dev#405

Some integration tests were added in `tests/pos` to be picked up by
`IdempotencyCheck.scala`, but unfortunately I haven't successfully
reproduced the nondeterminism that way.

I didn't see an obvious place for a 'unit test' of this code, I'd be
happy to add one when someone can recommend a good place to put it.

This is basically the dotty equivalent of
scala/scala@954c5d3

Fixes scala#14743
smarter added a commit to dotty-staging/dotty that referenced this issue Jul 24, 2023
Context bounds are desugared into term parameters `evidence$N` and before this
commit, the `N` was chosen to be unique in the current compilation unit. This
isn't great because it means that adding a new definition with a context bound
in the middle of a file would change the desugaring of subsequent definitions
in the same file.

Even worse, when using incremental compilation we could end up with the same
context bound desugared with a different value of `N` on different compilation
runs because the order in which a compilation unit is traversed during Typer is
not fixed but depends on the how the units that are jointly compiled depend on
each other (as demonstrated by the `stable-ctx-bounds` test). This issue
affects all fresh names generated during Typer, but it is especially
problematic for context bounds because they're part of the API and renaming
a method parameter forces the recompilation of all files calling that method.

To fix this, we now only require context bounds parameters to have unique names
among all the parameters of the method. This matches how we already desugar
`def foo(using A, B)` into `def foo(x$1: A, x$2: B)` regardless of the context.

Note that fresh names used in other situations are still problematic for
deterministic compilation, most of the time they're not part of the API checked
by Zinc, but they they can still lead to overcompilation if they appear in an
inline def since the entire body of the inline def constitutes its API. In the
future, we should follow Scala 2 lead and only require names to be fresh at the
method level: scala/scala#6300 (The Scala 2 logic is
slightly more complex to handle macros, but I don't think that applies to Scala
3 macros), see scala#7661.

Fixes scala#18080.
smarter added a commit to dotty-staging/dotty that referenced this issue Jul 24, 2023
Context bounds are desugared into term parameters `evidence$N` and before this
commit, the `N` was chosen to be unique in the current compilation unit. This
isn't great because it means that adding a new definition with a context bound
in the middle of a file would change the desugaring of subsequent definitions
in the same file.

Even worse, when using incremental compilation we could end up with the same
context bound desugared with a different value of `N` on different compilation
runs because the order in which a compilation unit is traversed during Typer is
not fixed but depends on the how the units that are jointly compiled depend on
each other (as demonstrated by the `stable-ctx-bounds` test). This issue
affects all fresh names generated during Typer, but it is especially
problematic for context bounds because they're part of the API and renaming
a method parameter forces the recompilation of all files calling that method.

To fix this, we now only require context bounds parameters to have unique names
among all the parameters of the method. This matches how we already desugar
`def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of the
context.

Note that fresh names used in other situations are still problematic for
deterministic compilation. Most of the time they're not part of the API checked
by Zinc, but they can still lead to overcompilation if they appear in an
`inline def` since the entire body of the `inline def` constitutes its API. In
the future, we should follow Scala 2's lead and only require names to be fresh
at the method level: scala/scala#6300 (The Scala 2
logic is slightly more complex to handle macros, but I don't think that applies
to Scala 3 macros), see scala#7661.

Fixes scala#18080.
smarter added a commit to dotty-staging/dotty that referenced this issue Jul 24, 2023
Context bounds are desugared into term parameters `evidence$N` and before this
commit, the `N` was chosen to be unique in the current compilation unit. This
isn't great because it means that adding a new definition with a context bound
in the middle of a file would change the desugaring of subsequent definitions
in the same file.

Even worse, when using incremental compilation we could end up with the same
context bound desugared with a different value of `N` on different compilation
runs because the order in which a compilation unit is traversed during Typer is
not fixed but depends on the how the units that are jointly compiled depend on
each other (as demonstrated by the `stable-ctx-bounds` test). This issue
affects all fresh names generated during Typer, but it is especially
problematic for context bounds because they're part of the API and renaming
a method parameter forces the recompilation of all files calling that method.

To fix this, we now only require context bounds parameters to have unique names
among all the parameters of the method. This matches how we already desugar
`def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of the
context.

Note that fresh names used in other situations are still problematic for
deterministic compilation. Most of the time they're not part of the API checked
by Zinc, but they can still lead to overcompilation if they appear in an
`inline def` since the entire body of the `inline def` constitutes its API. In
the future, we should follow Scala 2's lead and only require names to be fresh
at the method level: scala/scala#6300 (The Scala 2
logic is slightly more complex to handle macros, but I don't think that applies
to Scala 3 macros), see scala#7661.

Fixes scala#18080.
smarter added a commit to dotty-staging/dotty that referenced this issue Jul 25, 2023
Context bounds are desugared into term parameters `evidence$N` and before this
commit, the `N` was chosen to be unique in the current compilation unit. This
isn't great because it means that adding a new definition with a context bound
in the middle of a file would change the desugaring of subsequent definitions
in the same file.

Even worse, when using incremental compilation we could end up with the same
context bound desugared with a different value of `N` on different compilation
runs because the order in which a compilation unit is traversed during Typer is
not fixed but depends on the how the units that are jointly compiled depend on
each other (as demonstrated by the `stable-ctx-bounds` test). This issue
affects all fresh names generated during Typer, but it is especially
problematic for context bounds because they're part of the API and renaming
a method parameter forces the recompilation of all files calling that method.

To fix this, we now only require context bounds parameters to have unique names
among all the parameters of the method. This matches how we already desugar
`def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of the
context.

Note that fresh names used in other situations are still problematic for
deterministic compilation. Most of the time they're not part of the API checked
by Zinc, but they can still lead to overcompilation if they appear in an
`inline def` since the entire body of the `inline def` constitutes its API. In
the future, we should follow Scala 2's lead and only require names to be fresh
at the method level: scala/scala#6300 (The Scala 2
logic is slightly more complex to handle macros, but I don't think that applies
to Scala 3 macros), see scala#7661.

Fixes scala#18080.
smarter added a commit that referenced this issue Jul 25, 2023
Context bounds are desugared into term parameters `evidence$N` and
before this
commit, the `N` was chosen to be unique in the current compilation unit.
This
isn't great because it means that adding a new definition with a context
bound
in the middle of a file would change the desugaring of subsequent
definitions
in the same file.

Even worse, when using incremental compilation we could end up with the
same
context bound desugared with a different value of `N` on different
compilation
runs because the order in which a compilation unit is traversed during
Typer is
not fixed but depends on the how the units that are jointly compiled
depend on
each other (as demonstrated by the `stable-ctx-bounds` test). This issue
affects all fresh names generated during Typer, but it is especially
problematic for context bounds because they're part of the API and
renaming
a method parameter forces the recompilation of all files calling that
method.

To fix this, we now only require context bounds parameters to have
unique names
among all the parameters of the method. This matches how we already
desugar
`def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of
the
context.

Note that fresh names used in other situations are still problematic for
deterministic compilation. Most of the time they're not part of the API
checked
by Zinc, but they can still lead to overcompilation if they appear in an
`inline def` since the entire body of the `inline def` constitutes its
API. In
the future, we should follow Scala 2's lead and only require names to be
fresh
at the method level: scala/scala#6300 (The Scala
2
logic is slightly more complex to handle macros, but I don't think that
applies
to Scala 3 macros), see #7661.

Fixes #18080.
Kordyjan pushed a commit that referenced this issue Dec 1, 2023
Context bounds are desugared into term parameters `evidence$N` and before this
commit, the `N` was chosen to be unique in the current compilation unit. This
isn't great because it means that adding a new definition with a context bound
in the middle of a file would change the desugaring of subsequent definitions
in the same file.

Even worse, when using incremental compilation we could end up with the same
context bound desugared with a different value of `N` on different compilation
runs because the order in which a compilation unit is traversed during Typer is
not fixed but depends on the how the units that are jointly compiled depend on
each other (as demonstrated by the `stable-ctx-bounds` test). This issue
affects all fresh names generated during Typer, but it is especially
problematic for context bounds because they're part of the API and renaming
a method parameter forces the recompilation of all files calling that method.

To fix this, we now only require context bounds parameters to have unique names
among all the parameters of the method. This matches how we already desugar
`def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of the
context.

Note that fresh names used in other situations are still problematic for
deterministic compilation. Most of the time they're not part of the API checked
by Zinc, but they can still lead to overcompilation if they appear in an
`inline def` since the entire body of the `inline def` constitutes its API. In
the future, we should follow Scala 2's lead and only require names to be fresh
at the method level: scala/scala#6300 (The Scala 2
logic is slightly more complex to handle macros, but I don't think that applies
to Scala 3 macros), see #7661.

Fixes #18080.

[Cherry-picked f322b7b]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants