Fully deterministic compilation (stable compiler output under reordering and separate compilation) #7661

smarter · 2019-12-02T12:49:48Z

Corresponding scalac issue: scala/scala-dev#405

A while ago @retronym did some amazing work making scalac reproducible, we're still lagging behind that. I've recently opened #7620 which slightly improves things but there's more to do, Jason sent me a mail a while back with more details:

I flushed out the problems by first crafting some test cases
https://github.com/retronym/scalac-stability/blob/master/README.md that I
figured would be unstable, and then by running a script
https://github.com/retronym/scalac-stability/blob/master/scalac-stability-check.sh
to try compile, reverse compile, file-by-file compile and diffing classes
after each step. We recently wrote a the jardiff
https://github.com/scala/jardiff tool to make diffing JARs pleasant.

Here's my work in progress
https://github.com/scala/scala/compare/2.13.x...retronym:topic/stable-output?expand=1.
Following are the areas that needed attention.

Fresh names generated during typechecking should not use compilation-unit
scoped counters, otherwise the assignment can change with different orders
of type completion. Instead, I'm use a separate fresh name scope from the
closest enclosing, user-written (ie, not macro generated) method . I was
mindful to keep names distinct within blocks of macro expanded code.

Parameter alias computation across multiple levels of inheritance fail to
detect an alias due to a data race in the calls to setAlias and alias. I've
tried to move some of the computation post typer, once we know that all of
the base classes have been through typer and had the super constructor
calls analyzed.

Mixed in outer accessors were incorrectly taking the position of the mixed
in member, rather than using that of the subclass. This showed up an
instability because the position of the mixed in member was only available
in joint compilation.

Default getters could be added into scope based on the order that the
related methods were type completed. I've changed namers to eagerly create
and enter symbols for the default getters when the DefDef is entered. When
the DefDef is completed, Namers finishes the job of assigning types to the
default getter and handing the their bodies to Typer to be spliced into the
tree. Conceptually, this is like having a phase between parser and namer
that adds stub DefDefs for the default getters.

Lambdalift was using Symbol.id as part of the a sort key
https://github.com/scala/scala/blob/8eb67ed9db83879a536a558d688e3c1009656de5/src/compiler/scala/tools/nsc/transform/LambdaLift.scala#L68
for symbol sets, which controlled the order that lambda lifted defs were
created, leading to instability in the fresh names. I changed this to use a
linked set instead.

The order of decls was not explicitly pickled, which resulted in different
order of mixin decls in subclasses depending if they were joint or
separarely compiled with the super trait. I was able to change the pickler
to enter member symbols entries in the decl order, so the pickle format
need not change.

Finally (I hope!), I noticed that we have some tricky code to typecheck
refinement types, that defers some of its work
https://github.com/scala/scala/blob/8eb67ed9db83879a536a558d688e3c1009656de5/src/compiler/scala/tools/nsc/typechecker/Typers.scala#L3057-L3068
until the rest of the compilation unit has passed through typer. Part of
this sets the OVERRIDE flag. Inferred types, based that refinement, which
copy the refined type, can be unstable depending on whether or not the flag
has yet been mutated when they observe it. My workaround is to
unconditionally zero that flag on the decls of all copied refinement types. Do
you remember why we needed to set that flag in the first place?

The first actionable point would be to do something similar to scala/scala@f50ec3c, we can then see what problems we hit in practice. We have some infrastructure in place in https://github.com/lampepfl/dotty/blob/master/compiler/test/dotty/tools/dotc/IdempotencyTests.scala which should be expanded (e.g., check idempotency of the whole compiler instead of a few files in tests/order-idempotency).

The text was updated successfully, but these errors were encountered:

smarter · 2019-12-02T13:19:35Z

One relevant difference between scalac and dotty here is that scalac desugars context bounds in its parser, and fresh name counters are per-compilation-unit there:

class Test {
  def foo[T: Ordering]: Unit = {
    def bar[S: Equiv]: Unit = {}
  }
  def foo2[T: Ordering]: Unit = {}
}

% scalac -Xprint:parser foo.scala
package <empty> {
  class Test extends scala.AnyRef {
    def <init>() = {
      super.<init>();
      ()
    };
    def foo[T](implicit evidence$1: Ordering[T]): Unit = {
      def bar[S](implicit evidence$2: Equiv[S]): Unit = ();
      ()
    };
    def foo2[T](implicit evidence$3: Ordering[T]): Unit = ()
  }
}

In Dotty however, desugarings are handled by https://github.com/lampepfl/dotty/blob/master/compiler/src/dotty/tools/dotc/ast/Desugar.scala which gets called from the typer, where the order in which definition is typed is not stable due to lazy completers. Quoting from above:

Fresh names generated during typechecking should not use compilation-unit
scoped counters, otherwise the assignment can change with different orders
of type completion. Instead, I'm use a separate fresh name scope from the
closest enclosing, user-written (ie, not macro generated) method . I was
mindful to keep names distinct within blocks of macro expanded code.

But if we apply this technique to context bounds we would end up with:

    def foo[T](implicit evidence$1: Ordering[T]): Unit = {
      def bar[S](implicit evidence$1: Equiv[S]): Unit = ();
      ()
    };

which means the parameter of the inner method would shadow the parameter of the outer method, this isn't what we want so a different scheme is needed (maybe use evidence$<nesting level>$ instead of evidence$ as a prefix?)

smarter · 2019-12-02T14:41:00Z

which means the parameter of the inner method would shadow the parameter of the outer method

@odersky tells me that shadowing is not a problem with given parameters (they're referred to by their symbol), but context bounds are currently desugared into implicit parameters (this needs to be changed at some point anyway, but the migration strategy hasn't been completely thought out yet).

EDIT: We completely dropped implicit shadowing support so this is no longer an issue.

retronym · 2019-12-02T23:45:13Z

which gets called from the typer, where the order in which definition is typed is not stable due to lazy completers

@smarter this sounds similar to the problem I faced with the naming of default getters, which used to be created lazily in type completers. I changed things to create the fresh names eagerly when creating the type completer, which runs things in the order of source code.

This change makes sure non-repeated annotations are kept in the order they were found in the source code. The motivation is not necessarily to have them in the original order, but to have them in an order that is deterministic across rebuilds (potentially even across different machines), for reasons discussed further in scala#7661 and the corresponding scala/scala-dev#405 I tried adding an 'integration test' in `tests/pos` to be picked up by `IdempotencyCheck.scala`, but unfortunately couldn't reproduce the nondeterminism that way, so didn't end up including it in this commit. I didn't see an obvious place for a 'unit test' of this code, I'd be happy to add one when someone can recommend a good place to put it. This is basically the dotty equivalent of scala/scala@954c5d3 Fixes scala#14743

This change makes sure non-repeated annotations are kept in the order they were found in the source code. The motivation is not necessarily to have them in the original order, but to have them in an order that is deterministic across rebuilds (potentially even across different machines), for reasons discussed further in scala#7661 and the corresponding scala/scala-dev#405 An 'integration test' `Annotations.scala` was added in `tests/pos` to be picked up by `IdempotencyCheck.scala`, but unfortunately I haven't successfully reproduced the nondeterminism that way. I didn't see an obvious place for a 'unit test' of this code, I'd be happy to add one when someone can recommend a good place to put it. This is basically the dotty equivalent of scala/scala@954c5d3 Fixes scala#14743

This change makes sure non-repeated annotations are kept in the order they were found in the source code. The motivation is not necessarily to have them in the original order, but to have them in an order that is deterministic across rebuilds (potentially even across different machines), for reasons discussed further in scala#7661 and the corresponding scala/scala-dev#405 Some integration tests were added in `tests/pos` to be picked up by `IdempotencyCheck.scala`, but unfortunately I haven't successfully reproduced the nondeterminism that way. I didn't see an obvious place for a 'unit test' of this code, I'd be happy to add one when someone can recommend a good place to put it. This is basically the dotty equivalent of scala/scala@954c5d3 Fixes scala#14743

Context bounds are desugared into term parameters `evidence$N` and before this commit, the `N` was chosen to be unique in the current compilation unit. This isn't great because it means that adding a new definition with a context bound in the middle of a file would change the desugaring of subsequent definitions in the same file. Even worse, when using incremental compilation we could end up with the same context bound desugared with a different value of `N` on different compilation runs because the order in which a compilation unit is traversed during Typer is not fixed but depends on the how the units that are jointly compiled depend on each other (as demonstrated by the `stable-ctx-bounds` test). This issue affects all fresh names generated during Typer, but it is especially problematic for context bounds because they're part of the API and renaming a method parameter forces the recompilation of all files calling that method. To fix this, we now only require context bounds parameters to have unique names among all the parameters of the method. This matches how we already desugar `def foo(using A, B)` into `def foo(x$1: A, x$2: B)` regardless of the context. Note that fresh names used in other situations are still problematic for deterministic compilation, most of the time they're not part of the API checked by Zinc, but they they can still lead to overcompilation if they appear in an inline def since the entire body of the inline def constitutes its API. In the future, we should follow Scala 2 lead and only require names to be fresh at the method level: scala/scala#6300 (The Scala 2 logic is slightly more complex to handle macros, but I don't think that applies to Scala 3 macros), see scala#7661. Fixes scala#18080.

Context bounds are desugared into term parameters `evidence$N` and before this commit, the `N` was chosen to be unique in the current compilation unit. This isn't great because it means that adding a new definition with a context bound in the middle of a file would change the desugaring of subsequent definitions in the same file. Even worse, when using incremental compilation we could end up with the same context bound desugared with a different value of `N` on different compilation runs because the order in which a compilation unit is traversed during Typer is not fixed but depends on the how the units that are jointly compiled depend on each other (as demonstrated by the `stable-ctx-bounds` test). This issue affects all fresh names generated during Typer, but it is especially problematic for context bounds because they're part of the API and renaming a method parameter forces the recompilation of all files calling that method. To fix this, we now only require context bounds parameters to have unique names among all the parameters of the method. This matches how we already desugar `def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of the context. Note that fresh names used in other situations are still problematic for deterministic compilation. Most of the time they're not part of the API checked by Zinc, but they can still lead to overcompilation if they appear in an `inline def` since the entire body of the `inline def` constitutes its API. In the future, we should follow Scala 2's lead and only require names to be fresh at the method level: scala/scala#6300 (The Scala 2 logic is slightly more complex to handle macros, but I don't think that applies to Scala 3 macros), see scala#7661. Fixes scala#18080.

Context bounds are desugared into term parameters `evidence$N` and before this commit, the `N` was chosen to be unique in the current compilation unit. This isn't great because it means that adding a new definition with a context bound in the middle of a file would change the desugaring of subsequent definitions in the same file. Even worse, when using incremental compilation we could end up with the same context bound desugared with a different value of `N` on different compilation runs because the order in which a compilation unit is traversed during Typer is not fixed but depends on the how the units that are jointly compiled depend on each other (as demonstrated by the `stable-ctx-bounds` test). This issue affects all fresh names generated during Typer, but it is especially problematic for context bounds because they're part of the API and renaming a method parameter forces the recompilation of all files calling that method. To fix this, we now only require context bounds parameters to have unique names among all the parameters of the method. This matches how we already desugar `def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of the context. Note that fresh names used in other situations are still problematic for deterministic compilation. Most of the time they're not part of the API checked by Zinc, but they can still lead to overcompilation if they appear in an `inline def` since the entire body of the `inline def` constitutes its API. In the future, we should follow Scala 2's lead and only require names to be fresh at the method level: scala/scala#6300 (The Scala 2 logic is slightly more complex to handle macros, but I don't think that applies to Scala 3 macros), see #7661. Fixes #18080.

Context bounds are desugared into term parameters `evidence$N` and before this commit, the `N` was chosen to be unique in the current compilation unit. This isn't great because it means that adding a new definition with a context bound in the middle of a file would change the desugaring of subsequent definitions in the same file. Even worse, when using incremental compilation we could end up with the same context bound desugared with a different value of `N` on different compilation runs because the order in which a compilation unit is traversed during Typer is not fixed but depends on the how the units that are jointly compiled depend on each other (as demonstrated by the `stable-ctx-bounds` test). This issue affects all fresh names generated during Typer, but it is especially problematic for context bounds because they're part of the API and renaming a method parameter forces the recompilation of all files calling that method. To fix this, we now only require context bounds parameters to have unique names among all the parameters of the method. This matches how we already desugar `def foo(using A, B)` into `def foo(using x$1: A, x$2: B)` regardless of the context. Note that fresh names used in other situations are still problematic for deterministic compilation. Most of the time they're not part of the API checked by Zinc, but they can still lead to overcompilation if they appear in an `inline def` since the entire body of the `inline def` constitutes its API. In the future, we should follow Scala 2's lead and only require names to be fresh at the method level: scala/scala#6300 (The Scala 2 logic is slightly more complex to handle macros, but I don't think that applies to Scala 3 macros), see #7661. Fixes #18080. [Cherry-picked f322b7b]

smarter added itype:bug area:transform area:typer compat:scala2 itype:enhancement and removed itype:bug labels Dec 2, 2019

smarter mentioned this issue Dec 2, 2019

Make fresh name generation per file instead of global #7620

Merged

smarter changed the title ~~Fully determinstic compilation (stable compiler output under reordering and separate compilation)~~ Fully deterministic compilation (stable compiler output under reordering and separate compilation) Dec 2, 2019

smarter mentioned this issue Mar 22, 2022

order of annotations in produced bytecode is not deterministic #14743

Closed

raboof mentioned this issue Apr 28, 2022

Keep annotation order #15063

Merged

raboof mentioned this issue Apr 24, 2023

Order of 'mc' synthetic methods is not deterministic #17330

Closed

smarter mentioned this issue Jul 24, 2023

Fix overcompilation due to unstable context bound desugaring #18280

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fully deterministic compilation (stable compiler output under reordering and separate compilation) #7661

Fully deterministic compilation (stable compiler output under reordering and separate compilation) #7661

smarter commented Dec 2, 2019 •

edited

Loading

smarter commented Dec 2, 2019 •

edited

Loading

smarter commented Dec 2, 2019 •

edited

Loading

retronym commented Dec 2, 2019 •

edited

Loading

Fully deterministic compilation (stable compiler output under reordering and separate compilation) #7661

Fully deterministic compilation (stable compiler output under reordering and separate compilation) #7661

Comments

smarter commented Dec 2, 2019 • edited Loading

smarter commented Dec 2, 2019 • edited Loading

smarter commented Dec 2, 2019 • edited Loading

retronym commented Dec 2, 2019 • edited Loading

smarter commented Dec 2, 2019 •

edited

Loading

smarter commented Dec 2, 2019 •

edited

Loading

smarter commented Dec 2, 2019 •

edited

Loading

retronym commented Dec 2, 2019 •

edited

Loading