
Apache Spark support in pipelinedp4j for end-to-end Differential Privacy #278

Merged (38 commits) on Dec 5, 2024


sakkumar (Contributor)

Apache Spark support in pipelinedp4j for end-to-end Differential Privacy

Testing
% cd pipelinedp4j
% bazelisk build ...
% bazelisk test ...

% cd ../examples/pipelinedp4j
% bazelisk build ...

SparkExample:
% bazel-bin/SparkExample --local-input-file-path="./netflix_data.csv" --local-output-file-path="./output/"

Starting calculations...
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
24/11/27 15:55:43 INFO SparkContext: Running Spark version 3.3.2
....
....
24/11/27 15:55:48 INFO SparkContext: Successfully stopped SparkContext
Finished calculations.
pipelinedp4j % cat ./output/part-00000-867b9339-e539-47b5-a9fd-d7140d441a1f-c000.txt 

movieId=4506, numberOfViewers=6854, numberOfViews=6841, averageOfRatings=3.9186866446697803
movieId=4505, numberOfViewers=234, numberOfViews=235, averageOfRatings=2.509885102993678
movieId=4503, numberOfViewers=1770, numberOfViews=1802, averageOfRatings=3.20902639675554
movieId=4500, numberOfViewers=257, numberOfViews=242, averageOfRatings=3.0546646050432593
movieId=4501, numberOfViewers=578, numberOfViews=594, averageOfRatings=3.107717355258994
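Each output line is a comma-separated list of `key=value` fields. If you want to post-process these files, the following is a minimal sketch of a parser for that line format. `OutputLineParser` and `parseLine` are hypothetical names used only for illustration; they are not part of pipelinedp4j, and the sketch assumes the format stays exactly as in the sample above. Note that differentially private aggregates are noisy, so the released metrics need not be consistent with each other: for movieId=4500, numberOfViewers (257) exceeds numberOfViews (242), although a movie cannot really have more viewers than views.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of pipelinedp4j) that parses one line of SparkExample's text
 * output, e.g. "movieId=4506, numberOfViewers=6854, ...", into an ordered field map.
 */
public class OutputLineParser {

  public static Map<String, String> parseLine(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    // Fields are separated by a comma followed by optional whitespace.
    for (String part : line.trim().split(",\\s*")) {
      int eq = part.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Malformed field: '" + part + "'");
      }
      // Everything before the first '=' is the key; the rest is the value.
      fields.put(part.substring(0, eq), part.substring(eq + 1));
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, String> fields =
        parseLine(
            "movieId=4500, numberOfViewers=257, numberOfViews=242,"
                + " averageOfRatings=3.0546646050432593");
    long viewers = Long.parseLong(fields.get("numberOfViewers"));
    long views = Long.parseLong(fields.get("numberOfViews"));
    double avgRating = Double.parseDouble(fields.get("averageOfRatings"));
    // The metrics are noisy DP releases, so viewers > views is possible here.
    System.out.printf(
        "movie %s: viewers=%d, views=%d, avg=%.2f%n",
        fields.get("movieId"), viewers, views, avgRating);
  }
}
```

A `LinkedHashMap` keeps the fields in the order they appear in the file, which makes it easy to write them back out unchanged.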

RamSaw assigned and then unassigned themselves on Dec 2, 2024
RamSaw self-requested a review on Dec 2, 2024, 12:04
RamSaw (Collaborator) left a comment:

Thank you very much, Saket, for your contribution!

RamSaw (Collaborator) left a comment:

I realized that we didn't run the Google formatter. Could you apply it? I created two patches; apply them sequentially (Format_files.patch first, then Format_files_2.patch).

Format_files.patch
Format_files_2.patch

RamSaw (Collaborator) commented on Dec 2, 2024:

I used https://github.com/google/google-java-format for Java and https://github.com/facebook/ktfmt for Kotlin. I will add them to GitHub Actions, and may also add an easy way to install them and apply them to the code right before committing changes.

sakkumar (Contributor, Author) commented on Dec 2, 2024:


Done

RamSaw merged commit 1dfe8f9 into google:main on Dec 5, 2024
8 checks passed