Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

warcio and perf #72

Closed
wants to merge 1 commit into from
Closed

Conversation

N0taN3rd
Copy link
Contributor

@N0taN3rd N0taN3rd commented Feb 19, 2019

Refactored warcio's loop string/byte concatenation heavy sections to gain performance increases by using in place concatenation via list/bytearray (https://docs.python.org/3/faq/programming.html#id37)

Changed the caught exception in try_brotli_init from ImportError to Exception due to finding causing pywb PR #444 (webrecorder/pywb#444)

…be gain performance increases (https://docs.python.org/3/faq/programming.html#id37)

Changed the caught exception in try_brotli_init from ImportError to Exception due to finding causing pywb PR #444 (webrecorder/pywb#444)
@N0taN3rd N0taN3rd marked this pull request as ready for review February 20, 2019 17:09
@codecov
Copy link

codecov bot commented Feb 20, 2019

Codecov Report

Merging #72 into develop will increase coverage by <.01%.
The diff coverage is 97.29%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #72      +/-   ##
===========================================
+ Coverage    98.69%   98.69%   +<.01%     
===========================================
  Files           15       15              
  Lines         1527     1536       +9     
  Branches       250      250              
===========================================
+ Hits          1507     1516       +9     
  Misses           1        1              
  Partials        19       19
Impacted Files Coverage Δ
warcio/capture_http.py 97.74% <100%> (+0.01%) ⬆️
warcio/bufferedreaders.py 99.48% <100%> (ø) ⬆️
warcio/statusandheaders.py 98.93% <95.45%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f533c8...8fb6994. Read the comment docs.

@wumpus
Copy link
Collaborator

wumpus commented Feb 24, 2019

This patch is a bit too much "I learned a new thing and I want to apply it" :-)

This is definitely an optimization that is good for things that are large -- payloads. But it's not something worth making url-building code more complicated -- urls are never large. Headers are never large. And it's hard to believe that hoisting constants in statusandheaders.py actually changes anything.

For the payload handling in bufferedreaders.py:read(), you switched from the "make a list of immutable things and then join the list" idiom to a bytearray. Is that actually faster? self.buff.read() already built the byte object. The join can precompute the size of the final thing, so that should be faster than += to a bytearray, which will have to extend the bytearray repeatedly because it doesn't know the final size.

The only change that's a clear perf win is bufferedreaders.py:_try_decode(). In the current code it is doing the thing that is quadratic. Personally, I would implement that as a list of byte objects and end with a join, similar to what bufferedreaders.py:read()'s existing code does

@N0taN3rd N0taN3rd closed this Aug 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants