nexedi / pygolang / Merge requests / !19

golang_str: Speedup utf-8 decoding a bit on py2

Merged. Kirill Smelkov requested to merge kirr/pygolang:y/bstr into master, Oct 04, 2022

We recently moved our custom UTF-8 encoding/decoding routines to Cython. We can now start taking advantage of C-level speedups to make our own UTF-8 decoder a bit less horribly slow on py2:

name       old time/op  new time/op  delta
stddecode   752ns ± 0%   743ns ± 0%   -1.19%  (p=0.000 n=9+10)
udecode     216µs ± 0%    75µs ± 0%  -65.19%  (p=0.000 n=9+10)
stdencode   328ns ± 2%   327ns ± 1%     ~     (p=0.252 n=10+9)
bencode    34.1µs ± 1%  32.1µs ± 1%   -5.92%  (p=0.000 n=10+10)

So u() gets a ~3x speedup, but it is still significantly slower (about 100x in the numbers above) than std unicode.decode('utf-8').

This only picks the low-hanging fruit: it makes _utf_decode_rune a bit faster, since it sits in the innermost loop. In the future, _utf8_decode_surrogateescape might be reworked as well, to avoid constructing the resulting unicode via a py-level list of py-unicode character objects. The same applies to _utf8_encode_surrogateescape.
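
To illustrate the structure being discussed, here is a minimal pure-Python 3 sketch of a rune-at-a-time UTF-8 decoder with surrogateescape handling. It is not the pygolang implementation: the names `decode_rune`, `decode_surrogateescape` and `RUNE_ERROR` are hypothetical, and the real routines live in Cython. It does show why the per-rune function sits in the innermost loop, and it builds the result through a Python-level list of one-character strings, which is the pattern mentioned above as a candidate for rework.

```python
RUNE_ERROR = -1  # hypothetical sentinel meaning "invalid byte at this position"


def decode_rune(data, i):
    """Decode one UTF-8 rune from data (a bytearray) starting at index i.

    Returns (rune, size). On invalid input returns (RUNE_ERROR, 1) so that
    the caller can escape exactly one byte and retry at the next one.
    """
    b0 = data[i]
    if b0 < 0x80:                      # ASCII fast path
        return b0, 1
    if 0xC2 <= b0 <= 0xDF:             # 2-byte sequence
        need, rune, lo = 1, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:           # 3-byte sequence
        need, rune, lo = 2, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:           # 4-byte sequence
        need, rune, lo = 3, b0 & 0x07, 0x10000
    else:                              # stray continuation or invalid lead byte
        return RUNE_ERROR, 1
    if i + need >= len(data):          # truncated sequence
        return RUNE_ERROR, 1
    for k in range(1, need + 1):
        b = data[i + k]
        if b & 0xC0 != 0x80:           # not a continuation byte
            return RUNE_ERROR, 1
        rune = (rune << 6) | (b & 0x3F)
    # reject overlong forms, encoded surrogates and out-of-range values
    if rune < lo or 0xD800 <= rune <= 0xDFFF or rune > 0x10FFFF:
        return RUNE_ERROR, 1
    return rune, need + 1


def decode_surrogateescape(data):
    """Decode bytes as UTF-8, mapping each invalid byte b to U+DC00+b."""
    data = bytearray(data)
    out = []                           # py-level list of 1-character strings
    i = 0
    while i < len(data):               # decode_rune is called once per rune
        rune, size = decode_rune(data, i)
        if rune == RUNE_ERROR:
            out.append(chr(0xDC00 + data[i]))
        else:
            out.append(chr(rune))
        i += size
    return "".join(out)
```

On Python 3 the result can be cross-checked against `bytes.decode('utf-8', 'surrogateescape')`, which is expected to produce the same string for both valid and invalid input.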

On py3 the performance of std and u/b decode/encode is approximately the same.

/cc @jerome

Source branch: y/bstr