Check for performance regressions in testing suite #174

Closed
gsarma opened this issue Jul 2, 2015 · 32 comments

@gsarma
Member

gsarma commented Jul 2, 2015

With @travs:
Yes, this is super meta, but we want to make sure the testing suite itself isn't taking longer than it should. Informally, it looks like there is about 20% variation in the time these tests take to run on my machine; I don't really know why.

Open questions:

  1. For individual tests, we can hard-code timing information to catch performance regressions (a rough sketch is below). What about the suite as a whole? Presumably we don't want a build to fail just because the tests took too long, but that information should be stored somewhere.
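
For the individual-test case, this is roughly what a hard-coded check could look like; the test name, the workload, and the 2-second budget are all made up for illustration:

# Hypothetical example of a hard-coded timing budget on a single test.
import time
import unittest

def run_operation_under_test():
    # placeholder for the real call being timed
    sum(range(100000))

class TimingRegressionTest(unittest.TestCase):
    MAX_SECONDS = 2.0  # hypothetical budget for this operation

    def test_runs_within_budget(self):
        start = time.time()
        run_operation_under_test()
        elapsed = time.time() - start
        self.assertLess(elapsed, self.MAX_SECONDS)

if __name__ == '__main__':
    unittest.main()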
@cheelee
Contributor

cheelee commented Jul 3, 2015

@gsarma @travs Hi, you don't have to hard-code timing information. Running the test with "python -m cProfile -o testPerf.out test.py" will give you a (possibly overly) detailed performance profile to help you get an idea of where the time is spent. I can try to find ways to reduce the profiling overhead, but it currently stands at about 10%.
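
For reference, a minimal way to inspect the dump that command produces, using the standard-library pstats module:

# Inspect the binary profile written by "python -m cProfile -o testPerf.out test.py".
import pstats

stats = pstats.Stats('testPerf.out')
stats.strip_dirs().sort_stats('cumulative').print_stats(10)  # top 10 entries by cumulative time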

@gsarma
Member Author

gsarma commented Jul 3, 2015

Thanks Chee Wai, that's useful information! Maybe the bigger question, then, is how to store this information and keep track of it over time. In the future, I think the right thing is to have some kind of anomaly detection framework so that we are notified if anything unusual happens. And as you say, this would let us avoid hard-coding timing information.
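
As a rough illustration of the anomaly-detection idea (not a proposal for the framework itself; the numbers and the three-sigma threshold are arbitrary, and the history would come from wherever the timings end up being stored):

def is_regression(history, latest, n_sigma=3.0):
    # flag the latest run if it is more than n_sigma standard deviations above the historical mean
    mean = sum(history) / float(len(history))
    variance = sum((t - mean) ** 2 for t in history) / len(history)
    return latest > mean + n_sigma * variance ** 0.5

print(is_regression([98.0, 101.0, 99.5, 102.0], 140.0))  # True: 140s is well outside the usual range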

@cheelee
Contributor

cheelee commented Jul 4, 2015

@gsarma It ought to be possible. I can do a quick check of what people are using to store and process performance information gathered this way for performance regression testing of their codes. The output from "-m cProfile" is a binary data file, and Python's pstats module provides an API for accessing and displaying the data:

https://docs.python.org/2/library/profile.html (26.4.3)
http://pymotw.com/2/profile/

I'm assuming there ought to be a way to store the profile as entries in some MySQL performance database for performance regression purposes. If such tools do not yet exist, this might be a very good impetus for me to design and create something of this nature. I had been thinking for some time that such a tool would be a very useful addition to any software engineering workflow.
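
As a sketch of that idea (using sqlite3 as a stand-in for MySQL; the file, table, and column names are hypothetical), the per-function entries in a cProfile dump can be pulled out of pstats and written as rows:

import pstats
import sqlite3
import time

conn = sqlite3.connect('perf_history.db')
conn.execute('CREATE TABLE IF NOT EXISTS profile_entries '
             '(run_ts REAL, function TEXT, ncalls INTEGER, cumtime REAL)')

# pstats keeps its data in a dict keyed by (filename, lineno, funcname)
stats = pstats.Stats('testPerf.out')
run_ts = time.time()
for func, (cc, nc, tt, ct, callers) in stats.stats.items():
    name = '%s:%d(%s)' % func
    conn.execute('INSERT INTO profile_entries VALUES (?, ?, ?, ?)',
                 (run_ts, name, nc, ct))
conn.commit()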

@gsarma
Member Author

gsarma commented Jul 4, 2015

@cheelee that sounds great! I think this should be part of a larger effort to have a web-based information dashboard for tracking tests across all repos.

@travs @slarson

@cheelee
Contributor

cheelee commented Jul 4, 2015

@gsarma Commercial systems do exist for performance regression via web dashboards, but the one I'm familiar with (NewRelic - http://newrelic.com/) is somewhat expensive, and holds your data. Might be nice to design an extensible open source skeleton for supporting something similar for groups like ours.

@gsarma
Member Author

gsarma commented Jul 4, 2015

@cheelee We could conceivably make a bare bones one to serve our needs with Jupyter notebooks.

@cheelee
Contributor

cheelee commented Jul 4, 2015

@gsarma hmmm this looks interesting. Thanks, I'll check it out! https://jupyter.org/

@gsarma
Member Author

gsarma commented Jul 4, 2015

@cheelee I think the most basic thing would be to simply have the .out files dumped somewhere that can be accessed from the web and to write some simple tools to summarize the results. We'll need to figure out where to store these and then set up an IPython server for OpenWorm.

@slarson @travs
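
A sketch of the kind of "simple tool" meant here, assuming the dumps land in a profiles/ directory (the directory name is made up):

import glob
import pstats

for path in sorted(glob.glob('profiles/*.out')):
    total = pstats.Stats(path).total_tt  # total time recorded across all functions
    print('%-40s %8.2fs' % (path, total))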

@gsarma
Member Author

gsarma commented Jul 7, 2015

Suggestion from @travs:
https://www.pythonanywhere.com

@cheelee
Contributor

cheelee commented Jul 8, 2015

@gsarma That's a python development/execution cloud service. What we're gonna need is a way to store/process/present performance regression data. Some of my thoughts on this are:

  1. Storage - depending on the frequency, nature and size of the data (I'll elaborate below), we'll need some large-ish free hosting ... my guess is in the 50 GB to 100 GB range over the long term. The 500 MB offered by PythonAnywhere will not be anything close to sufficient.
  2. Nature of data - I do not expect us to have to go past collecting profile information. This should limit our per-experiment footprint (i.e., each test, not each battery of tests) to 50-100 KB of data. Heaven help us if we find ourselves requiring detailed performance log traces.
  3. Frequency of data collection - in the long term, I'm expecting this to be a per-week thing, unless we extend this to collecting performance regression data for some extensive test of the production scientific simulation of the C. elegans worm itself - that may require a daily performance regression test.

So given those rough estimates, we're talking about 30-50 tests a week, which amounts to roughly 5 MB per week. That is fairly optimistic, so while I guess 500 MB kind of works (less than 2 years of data under the above regime), I'd be far more comfortable with more. Someone's local machine, or some kind of larger generic free hosting, might work. Thoughts?
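
Back-of-envelope version of that estimate (all figures are the rough guesses above, not measurements):

tests_per_week = 50
kb_per_test = 100
mb_per_week = tests_per_week * kb_per_test / 1000.0   # ~5 MB per week
weeks_to_fill_500mb = 500 / mb_per_week                # ~100 weeks, i.e. just under 2 years
print('%.1f MB/week -> %.0f weeks to reach 500 MB' % (mb_per_week, weeks_to_fill_500mb))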

@gsarma
Member Author

gsarma commented Jul 8, 2015

@cheelee thanks for the great analysis! I think @travs suggested PythonAnywhere in the context of figuring out how to host our own server for Jupyter notebooks. What would be the right way to do that?

In terms of hosting, this doesn't sound like it will be very expensive. Once we absolutely nail down what we need, we can talk to Stephen.

travs added this to the Generic testing milestone Jul 9, 2015
@travs

travs commented Jul 9, 2015

@cheelee @gsarma
At a glance, I think codespeed might be exactly what we're looking for...
Here's one implementation running for PyPy.

It's a django app, so that's 40+MB of overhead, but I think the framework it provides may outweigh the small amount of lost storage.

In any case, we can start on PythonAnywhere and scale accordingly.

@cheelee
Contributor

cheelee commented Jul 26, 2015

Argh! Dropped the ball on this one. I'll put this on my TODOs, try out codespeed, and see if it can be configured for our purposes.

@cheelee
Contributor

cheelee commented Jul 26, 2015

Alright, tested codespeed using its instructions and it sort of works. The necessary caveats:

  1. The release version has not been updated in a while, and will not work with modern environments.
  2. The trick is to use the current repository version (still maintained) and virtualenv to create a sandbox pip environment. The steps can be found in this gist: https://gist.github.com/cheelee/3423e1c580e079bf09e5

The next step is to get our data into the necessary JSON format, making changes to suit our needs along the way, and we should have a basic framework for regression analysis that we can improve upon.

cheelee self-assigned this Jul 26, 2015
@travs

travs commented Jul 27, 2015

@cheelee
Awesome stuff here! Have you and @kevcmk discussed where we could potentially deploy this? Is PythonAnywhere suitable, or should we go somewhere else?

@cheelee
Contributor

cheelee commented Jul 27, 2015

@travs Not yet. PythonAnywhere looks like a good place to start. Codespeed's workflow expects the performance data generation to make use of codespeed too, so any initial deployment of this will be a prototype which we can use to craft something more suited to our needs. Right now, I think @kevcmk's and my plan is to quickly shoehorn the performance data we get from Python's cProfile output into the JSON format codespeed's visualization/analysis unit expects. With this early prototype, we should be able to see enough to start coming up with ideas on crafting something that works better for us.

@cheelee
Contributor

cheelee commented Jul 27, 2015

Realized I should post some screenshots of their sample data to give people a feel for what kind of visualizations to expect from codespeed, and drive some discussion on what more we could/would like to see. This is running on my Mac, but I think it can be served from PythonAnywhere.

[Screenshots from 2015-07-28: three views of the codespeed sample-data dashboard]

@travs

travs commented Jul 27, 2015

Ok, that plan definitely makes sense to me too. As for the PythonAnywhere side of things, I recently deployed a Flask app to that service with minimal trouble. I used the Flask counterpart to this Django tutorial to get the WSGI server up and running.

@cheelee Would you be interested in giving the django setup a shot?

@kevcmk
Contributor

kevcmk commented Jul 28, 2015

@travs are you using Flask for the webserver? Or are you planning on running Apache/nginx?

@cheelee
Contributor

cheelee commented Jul 28, 2015

@travs I can take a look. I'll first have to familiarize myself with PythonAnywhere and Django. Shouldn't be more than a few days! Do we have an OpenWorm account set up for PythonAnywhere?

@travs

travs commented Jul 28, 2015

@kevcmk To be clear here, the Flask app is a separate project, but I used Gunicorn as the webserver in that one locally, so when I pushed it to PythonAnywhere I just kept it (for now). PythonAnywhere can also use uWSGI+nginx to serve up the Flask app directly, so my gunicorn setup is kind of needless.

@cheelee I think that account I showed you a few weeks ago should be the one we use for now. If you need the creds again let me know!

@cheelee
Contributor

cheelee commented Jul 28, 2015

@travs Ah ok. Time Machine to the rescue! If I can't rescue it, I'll let you know! Thanks! Still chugging through that long Django tutorial, but I think I got the gist of what PythonAnywhere will support. I'll soon put codespeed on PythonAnywhere and try it out.

@cheelee
Contributor

cheelee commented Jul 29, 2015

@gsarma @travs @kevcmk Codespeed with sample data is up and running at my own PythonAnywhere account. You ought to be able to try it at http://cheewai1972.pythonanywhere.com/

Next step is to design a workflow for automatically getting performance data incrementally added to the tool for analysis and an updated display.

@kevcmk
Contributor

kevcmk commented Jul 29, 2015

@cheelee Should we just follow their use case, using codespeed's REST routes to submit the data from Jenkins? Or do you see any advantages to writing our own inserts?

@cheelee
Contributor

cheelee commented Jul 29, 2015

@kevcmk I'm not entirely familiar with REST, and not at all with Jenkins. Would you mind giving me a summary of their features and capabilities? We can do this over our private gitter chat, and I can also spend a few hours tonight familiarizing myself with codespeed's relationship to the REST model, and with Jenkins. My current thought is to naively use codespeed as a template for creating our own tool, which could involve quite a bit of effort. Anything that lets us reach for low-hanging fruit without unnecessary effort would be a great boon to getting a performance regression framework for OpenWorm off the ground :)

@travs

travs commented Jul 29, 2015

@cheelee Awesome stuff! 😄

@kevcmk If I am understanding this correctly, using the built-in REST routes seems like a good idea. I wonder if there is a way to submit data from Travis CI rather than Jenkins? I've only ever used the former myself.

@slarson
Member

slarson commented Jul 29, 2015

@gsarma @cheelee @kevcmk Good stuff -- I've moved this issue under the milestone: "Make all functions return in 1 second or less".

@cheelee @kevcmk I wonder if we can use the functions named in the specific issues #42, #90, #21 as our first functions in PyOpenWorm to put under performance regression? This would help move the whole milestone forward as well as giving a specific focus in a lean way to this effort.

@kevcmk
Contributor

kevcmk commented Jul 31, 2015

@travs Correction -- we'll run the submits from Travis, not Jenkins. And regarding the REST routes (also @cheelee): Codespeed has a JSON API (/result/add/json/); I'll be sitting down to figure that out in the next couple of days.

@slarson I think that's a good idea. Chee and I will use those as a benchmark.
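
For what it's worth, here is a minimal sketch of what a submission to that JSON endpoint might look like. The field names follow codespeed's sample client, the URL points at the prototype instance mentioned earlier in this thread, and the project, executable, benchmark, environment, commit id, and result value below are all placeholders; the codespeed README should be checked for the exact payload it expects.

import json
import requests

result = {
    'commitid': 'abc123',                              # placeholder commit hash
    'branch': 'default',
    'project': 'PyOpenWorm',
    'executable': 'python2.7',
    'benchmark': 'NeuronTest.test_same_name_same_id',  # hypothetical benchmark name
    'environment': 'travis-ci',                        # must match an environment defined in codespeed
    'result_value': 4.2,                               # e.g. seconds pulled from the cProfile dump
}
response = requests.post('http://cheewai1972.pythonanywhere.com/result/add/json/',
                         data={'json': json.dumps([result])})
print(response.status_code, response.text)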

@cheelee
Contributor

cheelee commented Jul 31, 2015

@kevcmk Awesome! Meanwhile, I'm trying to get a handle on those specific tests (e.g. which tests, and how to run just those tests).

We don't really have to limit ourselves to running just those tests, since your changes to the testing framework pretty much allow us to get performance profiles on every single test, but keeping our performance data compact and relevant might be the right way to go at first.

If anyone knows how I can run each test individually, please share (I'm looking at the testing framework code right now to figure this out; as far as I know it isn't documented).

@travs

travs commented Jul 31, 2015

@cheelee to run an individual test you can do (from the project root):

# example: python -m unittest tests.test.TestCase.test_name
python -m unittest tests.test.NeuronTest.test_same_name_same_id
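
And, tying this back to the profiling discussion above, one possible way (untested here) to capture a cProfile dump for just that one test programmatically:

# Profile a single named test and write the result to neuron_test.prof.
import cProfile
import unittest

suite = unittest.defaultTestLoader.loadTestsFromName(
    'tests.test.NeuronTest.test_same_name_same_id')
runner = unittest.TextTestRunner()
cProfile.runctx('runner.run(suite)', globals(), locals(), 'neuron_test.prof')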

@cheelee
Contributor

cheelee commented Aug 1, 2015

@travs Thanks! I'll update the README.md documentation if that's not already done.

@mwatts15
Contributor

Some good ideas here (particularly the use of Python's cProfile for tracking performance in a cross-platform way), but there are no real acceptance criteria. For now, the build time on Travis-CI is good enough for identifying regressions, so closing this issue.
