back2source: Run pipeline for Ruby source and gem packages #1476

Open
Tracked by #1437
pombredanne opened this issue Dec 12, 2024 · 10 comments

@pombredanne
Member

No description provided.

@chinyeungli
Contributor

For Ruby, the .gem file is the deployed binary. Our focus is on the files within the data.tar.gz archive inside the .gem file. These files are a subset of the development codebase, so comparing their checksums against the development files confirms the mapping. Since the "map_deploy_to_develop" pipeline already handles extract_archives and map_checksum, I believe the current pipeline handles Ruby's back-to-source process well.
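
To illustrate the checksum idea outside of the pipeline, here is a minimal Python sketch. It is not the ScanCode.io implementation of map_checksum: the function names (`gem_data_checksums`, `develop_checksums`, `map_by_checksum`) are hypothetical, and it only assumes that a .gem is a plain tar archive containing a data.tar.gz member, as described above.

```python
import hashlib
import io
import tarfile
from pathlib import Path


def sha1_of(data: bytes) -> str:
    """Return the hex SHA1 digest of some bytes."""
    return hashlib.sha1(data).hexdigest()


def gem_data_checksums(gem_path):
    """Return {path inside data.tar.gz: sha1} for the regular files of a .gem."""
    checksums = {}
    # Assumption: the .gem is an uncompressed tar holding a "data.tar.gz" member.
    with tarfile.open(gem_path, "r:") as gem:
        payload = gem.extractfile("data.tar.gz").read()
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as data:
        for member in data:
            if member.isfile():
                checksums[member.name] = sha1_of(data.extractfile(member).read())
    return checksums


def develop_checksums(root):
    """Return {sha1: [relative paths]} for every file under a source checkout."""
    root = Path(root)
    index = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            index.setdefault(sha1_of(path.read_bytes()), []).append(relative)
    return index


def map_by_checksum(gem_path, develop_root):
    """Map each deployed file to the development files with the same SHA1."""
    develop_index = develop_checksums(develop_root)
    deployed = gem_data_checksums(gem_path)
    return {path: develop_index.get(sha1, []) for path, sha1 in deployed.items()}
```

A deployed file that maps to an empty list has no exact counterpart in the development codebase and is a candidate for closer review.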

I tried it with https://rubygems.org/gems/devise and it works fine.

@pombredanne any comment?

@pombredanne
Member Author

@chinyeungli I would still want a specific option for Ruby. Some Ruby gems contain native code, and in those cases d2d will not work from checksums alone. See for instance https://tristanpenman.com/blog/posts/2018/08/29/writing-a-gem-with-native-extensions/
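
As a rough illustration of how such gems could be spotted, the sketch below classifies the file paths of an extracted gem. It is only a heuristic under stated assumptions: the suffix sets and the function names are hypothetical, and it relies on the common convention that native extensions ship an `extconf.rb` build script, with precompiled gems shipping binaries such as `.so` files.

```python
from pathlib import PurePosixPath

# Assumed, non-exhaustive suffix sets chosen for illustration only.
NATIVE_SOURCE_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".hpp"}
COMPILED_SUFFIXES = {".so", ".bundle", ".dll", ".dylib"}


def classify_native_files(paths):
    """Group gem file paths that hint at a native extension."""
    found = {"build_scripts": [], "native_sources": [], "compiled_binaries": []}
    for path in paths:
        pure = PurePosixPath(path)
        suffix = pure.suffix.lower()
        if pure.name == "extconf.rb":
            found["build_scripts"].append(path)
        elif suffix in NATIVE_SOURCE_SUFFIXES:
            found["native_sources"].append(path)
        elif suffix in COMPILED_SUFFIXES:
            found["compiled_binaries"].append(path)
    return found


def has_native_code(paths):
    """Return True when any path looks like native source, build script, or binary."""
    return any(classify_native_files(paths).values())
```

Compiled binaries are the interesting case here: they are built from the sources, so they cannot have a checksum match in the development codebase and need a different mapping strategy.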

@chinyeungli
Contributor

I created a package list that can be used for testing:
ruby_projects_list.xlsx

@chinyeungli
Contributor

I tried 10+ projects, and most of them have an exact SHA1 match between deploy and devel.
The one exception I have found so far is nokogiri-1.17.2.gem.

This gem file contains the following two library archives, which do not exist in the development codebase:

gems/nokogiri-1.17.2.gem-extract/data.tar.gz-extract/ports/archives/libxml2-2.13.5.tar.xz
gems/nokogiri-1.17.2.gem-extract/data.tar.gz-extract/ports/archives/libxslt-1.1.42.tar.xz

In addition, there are no references to these two libraries anywhere in the output (the "DEPENDENCIES" worksheet is empty).
I guess we need to do something about this?
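
One way to surface this kind of finding is to filter the deploy-to-develop mapping for unmatched archives. The sketch below is a hypothetical helper, not part of the pipeline: it assumes a mapping of deployed path to matched development paths is already available, and the archive suffix list and the name-version pattern are assumptions that will not fit every archive name.

```python
import re
from pathlib import PurePosixPath

# Assumed list of archive suffixes, for illustration only.
ARCHIVE_SUFFIXES = (".tar.xz", ".tar.gz", ".tar.bz2", ".tgz", ".zip")

# Heuristic pattern for "name-version.ext" archive file names.
NAME_VERSION = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d[\w.]*?)\.(?:tar\.\w+|tgz|zip)$"
)


def unmatched_embedded_archives(mapping):
    """Return (path, name, version) for archives with no development match.

    `mapping` is {deployed path: [matching development paths]}.
    Name and version are None when the file name does not fit the pattern.
    """
    results = []
    for path in sorted(mapping):
        if mapping[path] or not path.lower().endswith(ARCHIVE_SUFFIXES):
            continue
        match = NAME_VERSION.match(PurePosixPath(path).name)
        name = match.group("name") if match else None
        version = match.group("version") if match else None
        results.append((path, name, version))
    return results
```

Applied to the two paths above, this would report libxml2 2.13.5 and libxslt 1.1.42 as embedded archives without a counterpart in the development codebase.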

The nokogiri-1.17.2.gem also has some .c and .h files, and those exist in the development codebase as well (with a SHA1 match).
Is this the native code that you mentioned?

@pombredanne
Member Author

> I guess we need to do something about this?
> ...
> Is this the native code that you mentioned?

This is indeed an issue, and something to handle as a problem for upstream!
@chinyeungli Excellent find!

@pombredanne
Member Author

These are the kind of ghost embedded third-party packages that matter a lot!

@chinyeungli
Contributor

> The license may be different. Can you check the high-level license differences between the gems and nokogiri (though this may be documented at https://github.com/sparklemotion/nokogiri/blob/main/LICENSE-DEPENDENCIES.md and well known then?)

Both are under MIT, the same as nokogiri.

@tdruez
Contributor

tdruez commented Jan 13, 2025

Run on the list of 100 projects provided at #1476 (comment):

  1. Convert the ruby_projects_list.xlsx into back2source_ruby_projects_list.csv

  2. Batch create the projects

    docker compose -f /opt/scancodeio/docker-compose.yml run --rm \
        --volume $PWD:/input-data:ro \
        web scanpipe batch-create \
        --input-list /input-data/back2source_ruby_projects_list.csv \
        --pipeline map_deploy_to_develop \
        --label back2source-Ruby \
        --execute --async
    
  3. Filter the project list by the "back2source-Ruby" label, then use the "Report" action on the filtered list.
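
Step 1 can be sketched in Python as follows. This is a hedged illustration only: the layout of ruby_projects_list.xlsx is not shown in this thread, so the sketch starts from rows that have already been read from the spreadsheet. The `project_name` and `input_urls` column names, the comma separator between URLs, and the `#from` / `#to` tags are assumptions to verify against the ScanCode.io documentation for `batch-create`; the URLs in the usage note are placeholders.

```python
import csv


def write_input_list(rows, out, url_separator=","):
    """Write a batch-create style CSV to the open text file `out`.

    `rows` yields (project name, development URL, deployed URL) tuples.
    Assumptions: header names, URL separator, and the #from/#to tags.
    """
    writer = csv.writer(out)
    writer.writerow(["project_name", "input_urls"])
    for name, develop_url, deploy_url in rows:
        # Tag the source archive as "from" and the .gem as "to".
        urls = url_separator.join([f"{develop_url}#from", f"{deploy_url}#to"])
        writer.writerow([name, urls])
```

For example, `write_input_list([("devise", "https://example.com/devise-src.tar.gz", "https://example.com/devise.gem")], open("back2source_ruby_projects_list.csv", "w", newline=""))` would produce a two-line CSV with one project row.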

Results: back2source-report-Ruby.xlsx

AyanSinhaMahapatra added a commit that referenced this issue Jan 20, 2025
Also ignore specific files paths containing metadata in ruby
gems.

Reference: #1438
Reference: #1476
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
@pombredanne
Member Author

@AyanSinhaMahapatra @tdruez This looks good. Now what are the true issues we found? What would be the report that highlights these?

@tdruez
Contributor

tdruez commented Jan 23, 2025

> What would be the report that highlights these?

The report is available in the previous comment: #1476 (comment)

Results: back2source-report-Ruby.xlsx
