warcio and perf #72

N0taN3rd · 2019-02-19T22:52:05Z

Refactored warcio's loop string/byte concatenation heavy sections to gain performance increases by using in place concatenation via list/bytearray (https://docs.python.org/3/faq/programming.html#id37)

Changed the caught exception in try_brotli_init from ImportError to Exception due to finding causing pywb PR #444 (webrecorder/pywb#444)

…be gain performance increases (https://docs.python.org/3/faq/programming.html#id37) Changed the caught exception in try_brotli_init from ImportError to Exception due to finding causing pywb PR #444 (webrecorder/pywb#444)

codecov · 2019-02-20T17:24:04Z

Codecov Report

Merging #72 into develop will increase coverage by <.01%.
The diff coverage is 97.29%.

@@             Coverage Diff             @@
##           develop      #72      +/-   ##
===========================================
+ Coverage    98.69%   98.69%   +<.01%     
===========================================
  Files           15       15              
  Lines         1527     1536       +9     
  Branches       250      250              
===========================================
+ Hits          1507     1516       +9     
  Misses           1        1              
  Partials        19       19

Impacted Files	Coverage Δ
warcio/capture_http.py	`97.74% <100%> (+0.01%)`	⬆️
warcio/bufferedreaders.py	`99.48% <100%> (ø)`	⬆️
warcio/statusandheaders.py	`98.93% <95.45%> (+0.04%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f533c8...8fb6994. Read the comment docs.

wumpus · 2019-02-24T20:43:10Z

This patch is a bit too much "I learned a new thing and I want to apply it" :-)

This is definitely an optimization that is good for things that are large -- payloads. But it's not something worth making url-building code more complicated -- urls are never large. Headers are never large. And it's hard to believe that hoisting constants in statusandheaders.py actually changes anything.

For the payload handling in bufferedreaders.py:read(), you switched from the "make a list of immutable things and then join the list" idiom to a bytearray. Is that actually faster? self.buff.read() already built the byte object. The join can precompute the size of the final thing, so that should be faster than += to a bytearray, which will have to extend the bytearray repeatedly because it doesn't know the final size.

The only change that's a clear perf win is bufferedreaders.py:_try_decode(). In the current code it is doing the thing that is quadratic. Personally, I would implement that as a list of byte objects and end with a join, similar to what bufferedreaders.py:read()'s existing code does

N0taN3rd force-pushed the perf-improvements branch from 1d3ab94 to 8fb6994 Compare February 20, 2019 17:07

N0taN3rd marked this pull request as ready for review February 20, 2019 17:09

N0taN3rd closed this Aug 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

warcio and perf #72

warcio and perf #72

N0taN3rd commented Feb 19, 2019 •

edited

Loading

codecov bot commented Feb 20, 2019 •

edited

Loading

wumpus commented Feb 24, 2019

warcio and perf #72

warcio and perf #72

Conversation

N0taN3rd commented Feb 19, 2019 • edited Loading

codecov bot commented Feb 20, 2019 • edited Loading

Codecov Report

wumpus commented Feb 24, 2019

N0taN3rd commented Feb 19, 2019 •

edited

Loading

codecov bot commented Feb 20, 2019 •

edited

Loading