
[BUG]: Reactant.jl in precompiling.jl will cause Segfault #13

Open
x66ccff opened this issue Jan 7, 2025 · 5 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

x66ccff (Owner) commented Jan 7, 2025

What happened?

It seems that enabling PSRN (which uses the Reactant.jl backend) in precompiling.jl causes a segmentation fault when the main program starts, so I disabled PSRN in precompiling.jl with the following workaround:

if options.populations > 3  # TODO: I don't know how to add an option controlling whether PSRN is used, since Options is too complex for me...
    println("Use PSRN")
    # N_PSRN_INPUT = 10
    start_psrn_task(
        psrn_manager, dominating_trees, dataset, options, N_PSRN_INPUT
    )
    process_psrn_results!(
        psrn_manager, state.halls_of_fame[j], dataset, options
    )
end
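For what it's worth, one way to avoid repurposing options.populations as the switch is to gate PSRN behind an environment variable. This is only a sketch, not the repo's code: SRGPU_USE_PSRN is a made-up variable name, and psrn_manager, dominating_trees, state, j, and N_PSRN_INPUT are assumed to be in scope from the surrounding loop.

# Minimal sketch: gate PSRN behind a hypothetical SRGPU_USE_PSRN env var
# instead of overloading options.populations as the toggle.
const USE_PSRN = get(ENV, "SRGPU_USE_PSRN", "false") == "true"

if USE_PSRN
    println("Use PSRN")
    start_psrn_task(
        psrn_manager, dominating_trees, dataset, options, N_PSRN_INPUT
    )
    process_psrn_results!(
        psrn_manager, state.halls_of_fame[j], dataset, options
    )
end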

x66ccff added the bug label Jan 7, 2025
x66ccff (Owner) commented Jan 7, 2025

@MilesCranmer I feel like this is similar to what you mentioned earlier about some I/O being opened but not closed during precompilation, but I'm not sure how to resolve it. Stranger still, I was originally able to use Reactant.jl during precompilation; after I deleted the folder, re-cloned my repository, and re-instantiated it, Reactant.jl could no longer be used during precompilation. I find this quite puzzling. Do you have any thoughts?

x66ccff (Owner) commented Jan 7, 2025

To reproduce this error, change the > 3 to > 0 in the code above.


Edit: here is the output

[ Info: Started!
Options(binops=(+, *, /, -), unaops=(cos, exp, SymbolicRegression.CoreModule.OperatorsModule.safe_log, sin), bin_constraints=[(-1, -1), (-1, -1), (-1, -1), (-1, -1)], una_constraints=[-1, -1, -1, -1], complexity_mapping=SymbolicRegression.CoreModule.OptionsStructModule.ComplexityMapping{Int64, Int64}(false, Int64[], Int64[], 0, 0), tournament_selection_n=15, tournament_selection_p=0.982, parsimony=0.0, dimensional_constraint_penalty=nothing, dimensionless_constants_only=false, alpha=3.17, maxsize=30, maxdepth=30, turbo=Val{false}(), bumper=Val{false}(), migration=true, hof_migration=true, should_simplify=true, should_optimize_constants=true, output_directory=nothing, populations=31, perturbation_factor=0.129, annealing=true, batching=false, batch_size=50, mutation_weights=..., crossover_probability=0.0259, warmup_maxsize_by=0.0, use_frequency=true, use_frequency_in_tournament=true, adaptive_parsimony_scaling=1040.0, population_size=27, ncycles_per_iteration=380, fraction_replaced=0.00036, fraction_replaced_hof=0.0614, topn=12, verbosity=nothing, v_print_precision=Val{5}(), save_to_file=true, probability_negate_constant=0.00743, seed=nothing, elementwise_loss=L2DistLoss, loss_function=nothing, node_type=Node, expression_type=Expression, expression_options=NamedTuple(), progress=nothing, terminal_width=nothing, optimizer_algorithm=Optim.BFGS{LineSearches.InitialStatic{Float64}, LineSearches.BackTracking{Float64, Int64}, Nothing, Nothing, Optim.Flat}(LineSearches.InitialStatic{Float64}
  alpha: Float64 1.0
  scaled: Bool false
, LineSearches.BackTracking{Float64, Int64}
  c_1: Float64 0.0001
  ρ_hi: Float64 0.5
  ρ_lo: Float64 0.1
  iterations: Int64 1000
  order: Int64 3
  maxstep: Float64 Inf
  cache: Nothing nothing
, nothing, nothing, Optim.Flat()), optimizer_probability=0.14, optimizer_nrestarts=2, optimizer_options=..., autodiff_backend=nothing, recorder_file=pysr_recorder.json, prob_pick_first=0.982, early_stop_condition=nothing, return_state=Val{nothing}(), timeout_in_seconds=nothing, max_evals=nothing, input_stream=Base.TTY(RawFD(14) active, 0 bytes waiting), skip_mutation_failures=true, nested_constraints=nothing, deterministic=false, define_helper_functions=true, use_recorder=false)
Use PSRN
[ Info: compiling layer = 1 / total 2 ...
compiling binary triu operator Add ... 15
[ Info: (1, 15)
[ Info: Reactant.ConcreteRArray{Float32, 2}
[ Info: 15
[ Info: (2, 120)
[ Info: Reactant.ConcreteRArray{Int64, 2}

[3815112] signal 11 (1): segmentation fault
in expression starting at /home/kent/_Project/PTSjl/SymbolicRegressionGPU.jl/example.jl:39
unknown function (ip: 0x7a8e25782110)
add_kernel! at /home/kent/_Project/PTSjl/SymbolicRegressionGPU.jl/src/PSRNfunctions.jl:101 [inlined]
call_with_reactant at /home/kent/.julia/packages/Reactant/QnaAD/src/utils.jl:0
Allocations: 19498775 (Pool: 19498213; Big: 562); GC: 32
Segmentation fault (core dumped)
(base) kent@kent-Super-Server:~/_Project/PTSjl/SymbolicRegressionGPU.jl$ 

It seems that the program hit the segfault while compiling kernels at the beginning of the main program, right after precompiling.jl finished. The relevant functions are:

function add_kernel!(x::AbstractMatrix, n::Int, indices::AbstractMatrix)
    # Gather the left/right operand of every pair via linear indexing,
    # then transpose so the result is a 1×m row. (n is unused here.)
    l_idx = indices[1, :]
    r_idx = indices[2, :]
    l_value = x[l_idx]'
    r_value = x[r_idx]'
    res = l_value .+ r_value
    return res
end
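As a sanity check, here is a tiny hypothetical call (made-up values, not from the repo) showing what add_kernel! computes, assuming x is a 1×n row matrix and indices is a 2×m matrix of operand positions:

x = Float32[1.0 2.0 3.0]       # 1×3 input row
indices = [1 1 2; 2 3 3]       # columns are the (left, right) pairs (1,2), (1,3), (2,3)
add_kernel!(x, 3, indices)     # returns the 1×3 row [3.0 4.0 5.0]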

function compile_kernels!(layer::SymbolLayer)
    for op in layer.operators
        if op isa UnaryOperator
            println("compiling unary operator $(op.name) ... $(layer.in_dim)")
            op.compiled_kernel = compile_unary_kernel(layer.in_dim, op.kernel)
        elseif op isa BinaryTriuOperator
            println("compiling binary triu operator $(op.name) ... $(layer.in_dim)")
            op.compiled_kernel = compile_binary_triu_kernel(layer.in_dim, op.kernel)
        elseif op isa BinarySquaredOperator
            println("compiling binary squared operator $(op.name) ... $(layer.in_dim)")
            op.compiled_kernel = compile_binary_squared_kernel(layer.in_dim, op.kernel)
        else
            error("Unsupported operator type: $(typeof(op))")
        end
    end
end
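For reference, the (2, 120) index shape in the log is consistent with enumerating the 15·16/2 = 120 upper-triangular pairs (i ≤ j) for in_dim = 15. Below is a hedged sketch of what compile_binary_triu_kernel might do, assuming Reactant's @compile macro and ConcreteRArray; the function body is a guess, not the repo's code.

using Reactant

function compile_binary_triu_kernel_sketch(in_dim::Int, kernel!)
    # Enumerate the (i, j) pairs with i <= j once on the host; for
    # in_dim = 15 this yields 120 pairs, matching the (2, 120) in the log.
    pairs = [(i, j) for i in 1:in_dim for j in i:in_dim]
    indices = Matrix{Int64}(undef, 2, length(pairs))
    for (k, (i, j)) in enumerate(pairs)
        indices[1, k] = i
        indices[2, k] = j
    end
    # Wrap example inputs as Reactant arrays and trace/compile the kernel once.
    x_r = Reactant.ConcreteRArray(zeros(Float32, 1, in_dim))
    idx_r = Reactant.ConcreteRArray(indices)
    return Reactant.@compile kernel!(x_r, in_dim, idx_r)
end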

x66ccff added the help wanted label Jan 7, 2025
MilesCranmer commented
Hm. The Reactant.jl developers are pretty responsive; maybe they would know?

wsmoses commented Jan 8, 2025

Not certain, but it may be JuliaLang/julia#56947, which has a backport fix for 1.10 in JuliaLang/julia#56973.

And regardless, you also need to clear the opaque-closure compile cache, like here: https://github.com/EnzymeAD/Reactant.jl/blob/47f363bbd73c91594913fb532db525ccea33b12b/src/Precompile.jl#L58

We'll hopefully clean things up soon and give some guidance on precompilation.
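For context, a hedged sketch of how a PrecompileTools workload could follow that advice; the workload body is a placeholder, and the exact cache-clearing call should be copied from the Reactant Precompile.jl line linked above rather than taken from this sketch.

using PrecompileTools

@setup_workload begin
    @compile_workload begin
        # ... warm up / trace the PSRN kernels here ...
    end
    # After the workload, clear Reactant's opaque-closure compile cache so
    # stale compiled closures are not serialized into the pkgimage (see the
    # src/Precompile.jl#L58 link above for the exact call).
end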

x66ccff (Owner) commented Jan 8, 2025

Thanks! I'll take a look
