Dataflow: Simplify revFlowThrough #18355

Closed
29 changes: 9 additions & 20 deletions shared/dataflow/codeql/dataflow/internal/DataFlowImpl.qll
@@ -2261,10 +2261,7 @@ module MakeImpl<LocationSig Location, InputSig<Location> Lang> {
returnAp = apNone()
or
// flow through a callable
- exists(DataFlowCall call, ParamNodeEx p, Ap innerReturnAp |
- revFlowThrough(call, returnCtx, p, state, _, returnAp, ap, innerReturnAp) and
- flowThroughIntoCall(call, node, p, ap, innerReturnAp)
- )
+ revFlowThrough(_, returnCtx, state, returnAp, ap, node)
or
// flow out of a callable
exists(ReturnPosition pos |
@@ -2413,11 +2410,14 @@ module MakeImpl<LocationSig Location, InputSig<Location> Lang> {

pragma[nomagic]
private predicate revFlowThrough(
- DataFlowCall call, ReturnCtx returnCtx, ParamNodeEx p, FlowState state,
- ReturnPosition pos, ApOption returnAp, Ap ap, Ap innerReturnAp
+ DataFlowCall call, ReturnCtx returnCtx, FlowState state, ApOption returnAp, Ap ap,
+ ArgNodeEx arg
) {
- revFlowParamToReturn(p, state, pos, innerReturnAp, ap) and
- revFlowIsReturned(call, returnCtx, returnAp, pos, innerReturnAp)
+ exists(ParamNodeEx p, ReturnPosition pos, Ap innerReturnAp |
+ flowThroughIntoCall(call, arg, p, ap, innerReturnAp) and
+ revFlowParamToReturn(p, state, pos, innerReturnAp, ap) and
+ revFlowIsReturned(call, returnCtx, returnAp, pos, innerReturnAp)
Comment on lines +2417 to +2419
Contributor

This disrupts a non-linear join, so it is very likely beneficial to push flowThroughIntoCall further into one of the recursive conjuncts in some way; as-is, we likely have some inefficiency in at least one of the two delta+prev combinations. It would also be nice to understand a bit more deeply which columns contribute to the blowup, and how.
We can safely push the projection flowThroughIntoCall(_, _, p, ap, innerReturnAp) into revFlowParamToReturn, which may be beneficial, as that is a pure filter on a conjunct that precedes the non-linear join. We cannot push in the other columns, though, as that would amount to joining with the call graph a bit too soon (revFlowIsReturned is meant to constrain exactly that part as much as possible).
On the other hand, it may well be good to push flowThroughIntoCall in its entirety into revFlowIsReturned, as that already contains the call-graph edge. If a projection of flowThroughIntoCall to a pure filter in that case turns out to yield a beneficial tuple reduction, then the join of revFlowOut and returnFlowsThrough (which occurs in a few places) ought to be revised, as flowThroughIntoCall already contains a projected version of returnFlowsThrough.
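
As background on the "two delta+prev combinations" mentioned above, here is a minimal Python sketch of semi-naive evaluation for a rule with two recursive conjuncts. It is a toy model, not CodeQL's evaluator: the rule (transitive closure) and the names `join` and `transitive_closure` are illustrative assumptions and do not appear in DataFlowImpl.qll.

```python
def join(left, right):
    """Join two sets of (x, y) pairs on left.y == right.x, yielding (x, z)."""
    by_src = {}
    for y, z in right:
        by_src.setdefault(y, set()).add(z)
    return {(x, z) for x, y in left for z in by_src.get(y, ())}


def transitive_closure(edges):
    """Semi-naive evaluation of the non-linear rule
    path(x, z) :- path(x, y), path(y, z)."""
    prev = set()        # tuples known before the previous iteration
    delta = set(edges)  # tuples first derived in the previous iteration
    while delta:
        full = prev | delta
        # Either recursive conjunct may supply the new tuples, so two
        # delta+prev style joins are evaluated in every iteration.
        derived = join(delta, full) | join(prev, delta)
        prev = full
        delta = derived - prev
    return prev


print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

In this model, a filter applied only after both joins does not reduce the work done inside either of them, which is one way to read the suggestion to push the filter into a recursive conjunct.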

Contributor Author

I've tried the two scenarios you're referring to here:

https://github.com/github/codeql/commits/smowton/perf/simplify-rev-flow-through-cand-1 goes for scenario 1: add a filter in revFlowParamToReturn. Dead simple; I'll start a DCA shortly.

https://github.com/github/codeql/commits/smowton/perf/simplify-rev-flow-through-cand-2 goes for scenario 2, "push flowThroughIntoCall in its entirety". However, I imagine I've misinterpreted what you want there: pushing it in and adding parameters means we once again have a predicate, revFlowIsReturned, that materialises the product of the three Aps, which seems to be the source of the trouble.

I'm also not sure what you want regarding the join of revFlowOut and returnFlowsThrough. I replaced the _s on the one invocation of returnFlowsThrough that now coincides with its sibling flowThroughIntoCall, but found only one other use-site that combines revFlowOut and returnFlowsThrough, and it seemed difficult to adapt that one to share anything with the first.

I also found that I can no longer reproduce my original performance problem now that, during the release, we discovered the performance problem introduced by #18235. That suggests that the extra false-positive flow introduced via vararg array hairpin routes (e.g. x -> varargs(x, y) -> y) was crucial to the large blowup I was seeing, and that this may not in fact be a relevant problem. I still think it likely that revFlowThrough, with its three Ap arguments, is not ideal to materialise, and that the materialisations resulting from this PR's simplifications are better in principle. However, the specific motivation to chase this down has gone, and may not return if this is too much of a pain to get merged.
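
To make the "product of the three Aps" concern concrete, here is a toy worst-case count in Python. The access-path names, the count of ten, and the helpers `wide_tuples` and `narrow_tuples` are made up for illustration; the numbers say nothing about real databases.

```python
from itertools import product


def wide_tuples(aps):
    """Worst case for a predicate with three access-path columns
    (returnAp, ap, innerReturnAp): every combination is materialised."""
    return set(product(aps, aps, aps))


def narrow_tuples(wide):
    """The same tuples with innerReturnAp projected away."""
    return {(return_ap, ap) for return_ap, ap, _inner in wide}


aps = [f"ap{i}" for i in range(10)]  # hypothetical access paths
wide = wide_tuples(aps)
narrow = narrow_tuples(wide)
print(len(wide), len(narrow))  # 1000 100
```

In this worst case, dropping one Ap column shrinks the relation by a factor equal to the number of access paths, which is why a predicate that exposes only returnAp and ap can be cheaper to materialise.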

+ )
}

/**
@@ -2543,22 +2543,11 @@ module MakeImpl<LocationSig Location, InputSig<Location> Lang> {
)
}

- pragma[nomagic]
- private predicate revFlowThroughArg(
- DataFlowCall call, ArgNodeEx arg, FlowState state, ReturnCtx returnCtx, ApOption returnAp,
- Ap ap
- ) {
- exists(ParamNodeEx p, Ap innerReturnAp |
- revFlowThrough(call, returnCtx, p, state, _, returnAp, ap, innerReturnAp) and
- flowThroughIntoCall(call, arg, p, ap, innerReturnAp)
- )
- }
-
pragma[nomagic]
predicate callMayFlowThroughRev(DataFlowCall call) {
exists(ArgNodeEx arg, FlowState state, ReturnCtx returnCtx, ApOption returnAp, Ap ap |
revFlow(arg, state, returnCtx, returnAp, ap) and
- revFlowThroughArg(call, arg, state, returnCtx, returnAp, ap)
+ revFlowThrough(call, returnCtx, state, returnAp, ap, arg)
)
}
