Spark 2.0.0 support #108
With only the above change we get:

By removing the call to render we can now build and run all of SparkPerf with Spark 2.0.0 (there's probably a better fix; I played around with the json4s import versions but without success). A sketch of the render removal follows below. The files to change are:

Pull request to follow.
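To illustrate the kind of render removal described above, here is a minimal, hypothetical sketch; the object and method names are assumptions, not the actual spark-perf source. In json4s-jackson, pretty accepts a JValue directly, so dropping the render call sidesteps the signature change between the json4s version spark-perf compiles against and the one Spark 2.0.0 ships:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object ResultsJson {
  // Build a JSON object from benchmark results and pretty-print it.
  def toJsonString(results: Map[String, Double]): String = {
    val json: JValue = JObject(results.toList.map { case (k, v) => k -> JDouble(v) })
    // Before: pretty(render(json))  -- render's signature changed across json4s versions
    pretty(json) // json4s-jackson's pretty takes a JValue directly
  }
}
```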
All modules* built OK; code changes are currently at https://github.com/a-roberts/spark-perf/commit/5f090fc2f1c272b839cee8965c77293d018c18d1. I'll sanity-check this first by running all of the tests before contributing. I noticed a few API changes we need to handle, and I've also changed the configuration file to look for $SPARK_HOME instead of /root by default. Still working on MLlib, actually: in my commit nothing for this module is built (duration 0s!).
I've updated my commit to use the new APIs available in the latest Spark 2 code. I think we should either create a new branch for 2.0 or simply provide different defaults if we detect the user specifies Spark 2 (e.g. Scala 2.11.8 rather than Scala 2.10.x); a sketch of that detection follows below. I've verified that all ML tests now function as expected. This currently relies on having the jars from a recently built Spark 2 in the lib folder of all spark-perf projects, because the APIs have changed since the spark-2.0.0-preview artifact which is in Maven Central; the requirement will be removed once spark-2.0.0 artifacts are available. I would appreciate having this reviewed; you can easily view my changes at master...a-roberts:master.
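As a rough illustration of the "different defaults" idea, here is a hypothetical sbt settings fragment; the spark.version property name and the fallback versions are assumptions for illustration, not spark-perf's actual build:

```scala
// Derive the Scala version from the Spark version the user asked for.
val sparkVersion = sys.props.getOrElse("spark.version", "1.5.2")

val scalaVersionForSpark =
  if (sparkVersion.startsWith("2.")) "2.11.8" // Spark 2.x defaults to Scala 2.11
  else "2.10.6"                               // Spark 1.x defaults to Scala 2.10

scalaVersion := scalaVersionForSpark
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
```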
We've noticed a 30% geomean regression for Spark 2 with this SparkPerf versus Spark 1.5.2 with "normal" SparkPerf (i.e. before this changeset), running with a low scale factor and the configuration below. Either my changes are a real disaster or we've hit a significant performance regression. We can gather a 1.6.2 comparison, but I would like my changes to the benchmark itself to be checked first so we can rule out problems there. @pwendell, as a top contributor to this project, can you or anybody else familiar with the new Spark 2 APIs please review this changeset? Configuration used where we see the big regression:
Main changes I made:
In Spark 2.0, the top five methods where we spend our time are as follows (the percentage is how much of the overall processing time was spent in that particular method):
and in 1.5.2 the top five methods are:
I see the following scores; on the left I have the test name, followed by the 1.5.2 time and then the 2.0.0 time. This is only running the Spark core tests (scheduling-throughput through scala-count-w-filtr, including all in between). I will mention this on the mailing list as part of a general performance regression thread, so this particular item remains focused on the Spark 2.0.0 changes I have made for SparkPerf; the goal is to have something stable to compare Spark releases with.
I'm updating this to work with Spark 2 now that it's available, so we don't need to use a snapshot or to build against an included version.
So now we need to clone and build the new spark-perf to work with Spark 2.0.
All modules; my PR is at #115.
But when I have gone to https://github.com/databricks/spark-perf.git and tried to clone master, I haven't found any commit for 2.0.
That's because my change is a pull request that hasn't been merged. I'm working on a small issue regarding the Spark version with the mllib project now, as I see the Travis CI integration build failed; it would be much appreciated if you clone my changes and see if you find any problems.
Hi, I have cloned your changes, integrated them with Spark 2.0, and run the Spark tests, and I got proper results with no errors. The only change I needed to make was in the config.py file, where in place of MLLIB_SPARK_VERSION = 2.0.0 I needed to set MLLIB_SPARK_VERSION = 2.0.
Where can I clone your changes?
Any update on the issues in this project?
Maybe spark-perf 2.0 just replaces some of the packages and doesn't take advantage of the Dataset API; see the sketch below.
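A minimal sketch (hypothetical, not taken from spark-perf) of the contrast the comment alludes to: a straight port keeps the RDD-style operations, while Spark 2.x can run the same aggregation through the Dataset API and its Catalyst/Tungsten optimizations:

```scala
import org.apache.spark.sql.SparkSession

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataset-sketch").getOrCreate()
    import spark.implicits._

    // RDD style: what a straight port of the 1.x benchmarks keeps using.
    val rddCount = spark.sparkContext.parallelize(1 to 1000).filter(_ % 2 == 0).count()

    // Dataset style: what the comment suggests the port does not exploit.
    val dsCount = spark.range(1, 1001).filter($"id" % 2 === 0).count()

    println(s"rdd=$rddCount ds=$dsCount")
    spark.stop()
  }
}
```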
I'm working on this and will submit a pull request once done. We face NoSuchMethodError problems once we try to run anything but scheduling-throughput.

The fix for that is to modify spark-tests/project/SparkTestsBuild.scala: use 2.0.0-preview for the org.apache.spark dependency version and Scala 2.11.8. Specifically this resolves:

which is triggered by:
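The stack trace and the triggering code were not captured in this thread. Below is a minimal sketch of the build change described above; the project definition's structure is assumed, not the verbatim SparkTestsBuild.scala, and only the two settings called out above are the point:

```scala
import sbt._
import Keys._

// Sketch of spark-tests/project/SparkTestsBuild.scala with the two changes
// described above: Spark 2.0.0-preview and Scala 2.11.8.
object SparkTestsBuild extends Build {
  lazy val root = Project(
    id = "spark-perf",
    base = file("."),
    settings = Defaults.coreDefaultSettings ++ Seq(
      scalaVersion := "2.11.8", // was a Scala 2.10.x release
      libraryDependencies ++= Seq(
        // 2.0.0-preview is on Maven Central, per the comments above
        "org.apache.spark" %% "spark-core" % "2.0.0-preview" % "provided"
      )
    )
  )
}
```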