From 54243fecfc8867c79aa4adbfd9948ecd8dcbd8fc Mon Sep 17 00:00:00 2001 From: schneems Date: Wed, 20 Apr 2016 15:39:45 -0500 Subject: Speed up String#blank? Regex MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Follow up on https://github.com/rails/rails/commit/697384df36a939e565b7c08725017d49dc83fe40#commitcomment-17184696. The regex to detect a blank string `/\A[[:space:]]*\z/` will loop through every character in the string to ensure that all of them are a `:space:` type. We can invert this logic and instead look for any non-`:space:` characters. When that happens, we would return on the first character found and the regex engine does not need to keep looking. Thanks @nellshamrell for the regex talk at LSRC. By defining a "blank" string as any string that does not have a non-whitespace character (yes, double negative) we can get a substantial speed bump. Also an inline regex is (barely) faster than a regex in a constant, since it skips the constant lookup. A regex literal is frozen by default. ```ruby require 'benchmark/ips' def string_generate str = " abcdefghijklmnopqrstuvwxyz\t".freeze str[rand(0..(str.length - 1))] * rand(0..23) end strings = 100.times.map { string_generate } ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/ Benchmark.ips do |x| x.report('current regex ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } } x.report('+ instead of * ') { strings.each {|str| str.empty? || /\A[[:space:]]+\z/ === str } } x.report('not a non-whitespace char') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } } x.compare! end # Warming up -------------------------------------- # current regex # 1.744k i/100ms # not a non-whitespace char # 2.264k i/100ms # Calculating ------------------------------------- # current regex # 18.078k (± 8.9%) i/s - 90.688k # not a non-whitespace char # 23.580k (± 7.1%) i/s - 117.728k # Comparison: # not a non-whitespace char: 23580.3 i/s # current regex : 18078.2 i/s - 1.30x slower ``` This makes the method roughly 30% faster `(23.580 - 18.078)/18.078 * 100`. cc/ @fxn --- activesupport/lib/active_support/core_ext/object/blank.rb | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'activesupport') diff --git a/activesupport/lib/active_support/core_ext/object/blank.rb b/activesupport/lib/active_support/core_ext/object/blank.rb index 71d411b6d6..f7efa1e01a 100644 --- a/activesupport/lib/active_support/core_ext/object/blank.rb +++ b/activesupport/lib/active_support/core_ext/object/blank.rb @@ -112,12 +112,9 @@ class String # # @return [true, false] def blank? - # In practice, the majority of blank strings are empty. As of this writing - # checking for empty? is about 3.5x faster than matching against the regexp - # in MRI, so we call the predicate first, and then fallback. - # - # The penalty for blank strings with whitespace or present ones is marginal. - empty? || BLANK_RE === self + # Regex check is slow, only check non-empty strings. + # A string not blank if it contains a single non-space string. + empty? || !(/[[:^space:]]/ === self) end end -- cgit v1.2.3