Support extended grapheme clusters and UAX 29
http://www.unicode.org/reports/tr29/tr29-21.html is the version of UAX
29 that corresponds to Unicode 6.2.0. Unicode.unpack_graphemes now
implements all the rules listed there, including the ones for extended
grapheme clusters.
I added a new optional test,
test/multibyte_grapheme_break_conformance.rb, that is heavily based on
test/multibyte_normalization_conformance.rb, which runs the Unicode test
suite.
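These rules show up most clearly in strings whose code point count differs from their grapheme count. The sketch below is minimal and uses Ruby's built-in `String#grapheme_clusters` (Ruby 2.5 and later), not `Unicode.unpack_graphemes` itself. The sample strings and rule labels are illustrative picks from UAX 29 and are not taken from this commit:

```ruby
# Code points vs. extended grapheme clusters, using Ruby's core
# String#grapheme_clusters (Ruby 2.5+), not ActiveSupport.
SAMPLES = {
  "combining mark" => "e\u0301",           # GB9: no break before Extend
  "CR LF"          => "\r\n",              # GB3: no break between CR and LF
  "Hangul L+V"     => "\u1100\u1161",      # GB6: no break between L and V jamo
  "flag"           => "\u{1F1EF}\u{1F1F5}" # regional indicator pair (GB8a in 6.2)
}.freeze

SAMPLES.each do |name, str|
  # Each sample is two code points that UAX 29 keeps together as one grapheme.
  puts format("%-15s code points: %d, graphemes: %d",
              name, str.codepoints.size, str.grapheme_clusters.size)
end
```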
|
| |
| |
| |
| | |
This will make it easier to add the rest of the rules listed in UAX 29.
Using each_key is faster and more intention revealing.
```
Calculating -------------------------------------
                each    31.378k i/100ms
            each_key    33.790k i/100ms
-------------------------------------------------
                each    450.225k (± 7.0%) i/s -      2.259M
            each_key    494.459k (± 6.3%) i/s -      2.467M

Comparison:
            each_key:   494459.4 i/s
                each:   450225.1 i/s - 1.10x slower
```
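Here is a minimal sketch of the change. The `hash` is hypothetical and stands in for whatever the original code iterated. Timing uses the standard library's `Benchmark.realtime` rather than the benchmark-ips gem quoted above, so the numbers are rough and will not match the ones in this message:

```ruby
require "benchmark"

# Hypothetical data; any Hash behaves the same way.
hash = (1..1_000).to_h { |i| ["key#{i}", i] }

# Before: iterate key/value pairs and ignore the value.
keys_via_each = []
hash.each { |key, _value| keys_via_each << key }

# After: iterate keys only, which states the intent directly.
keys_via_each_key = []
hash.each_key { |key| keys_via_each_key << key }

runs = 1_000
each_time     = Benchmark.realtime { runs.times { hash.each { |key, _value| key } } }
each_key_time = Benchmark.realtime { runs.times { hash.each_key { |key| key } } }

puts format("each:     %.4fs", each_time)
puts format("each_key: %.4fs", each_key_time)
```

Both loops visit the same keys in the same (insertion) order. Only the block signature changes.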
I wrote a utility called [let_it_go](https://github.com/schneems/let_it_go) that helps find places where you could optimize your program by using a frozen string instead of a string literal. After going through its output and adding `.freeze`, I was able to eliminate the creation of 1,114 string objects on EVERY request to [codetriage](https://codetriage.com). How does this affect execution?
To look at memory:
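```ruby
require 'get_process_mem'

mem = GetProcessMem.new
GC.start   # start from a freshly collected heap
GC.disable # keep GC from freeing the strings during the measurement

before = mem.mb
1_114.times { " " } # allocate one request's worth of string literals
after = mem.mb

GC.enable
puts "Diff: #{after - before} mb"
```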
Creating 1,114 string objects allocates `Diff: 0.03125 mb` of RAM on every request, or 1 MB every 32 requests.
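The memory difference comes from the number of objects each version allocates. The following rough sketch counts allocations with MRI's `GC.stat(:total_allocated_objects)` rather than measuring process memory. The helper `allocations` is hypothetical. The sketch also assumes the file does not use the `# frozen_string_literal: true` magic comment, because that comment would stop the plain literal from allocating too:

```ruby
# Hypothetical helper: count objects allocated while the block runs (MRI only).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

number_of_objects_reduced = 1_114

# A plain literal builds a new String on every evaluation...
plain_allocations  = allocations { number_of_objects_reduced.times { " " } }
# ...while `" ".freeze` is compiled to reuse a single frozen String.
frozen_allocations = allocations { number_of_objects_reduced.times { " ".freeze } }

puts "string literal: #{plain_allocations} objects"
puts "frozen literal: #{frozen_allocations} objects"
```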
To look at raw speed:
```ruby
require 'benchmark/ips'
number_of_objects_reduced = 1_114
Benchmark.ips do |x|
x.report("freeze") { number_of_objects_reduced.times { " ".freeze } }
x.report("no-freeze") { number_of_objects_reduced.times { " " } }
end
```
We get these results:
```
Calculating -------------------------------------
              freeze     1.428k i/100ms
           no-freeze   609.000  i/100ms
-------------------------------------------------
              freeze     14.363k (± 8.5%) i/s -     71.400k
           no-freeze      6.084k (± 8.1%) i/s -     30.450k
```
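Now we can do some maths. Each benchmark iteration allocates 1,114 strings, which is one request's worth, so the time saved per request is the difference between the two per-iteration times:
```ruby
ips_before = 6_084.0  # no-freeze: iterations per second
ips_after  = 14_363.0 # freeze: iterations per second
call_time_before = 1.0 / ips_before # seconds per iteration (one request's strings)
call_time_after  = 1.0 / ips_after  # seconds per iteration
diff = call_time_before - call_time_after
diff * 1_000
# => roughly 0.095 milliseconds saved per request
```
So we're shaving off about 1 second of execution time for every 10,500 requests.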
Is this going to be an insane speed boost for any Rails app? Nope. Should we merge it? Yep.
P.S. If you know of a method call that doesn't modify its string argument, such as [String#gsub](https://github.com/schneems/let_it_go/blob/b0e2da69f0cca87ab581022baa43291cdf48638c/lib/let_it_go/core_ext/string.rb#L37), please [send me a pull request to the appropriate file](https://github.com/schneems/let_it_go/blob/b0e2da69f0cca87ab581022baa43291cdf48638c/lib/let_it_go/core_ext/string.rb#L37) or open an issue on LetItGo so we can track and freeze more strings.
Keep those strings frozen!
![](https://www.dropbox.com/s/z4dj9fdsv213r4v/let-it-go.gif?dl=1)
7.0.0 was released on June 16, 2014
http://unicode-inc.blogspot.com.ar/2014/10/unicode-version-70-complete-text-of.html
ruby bin/generate_tables
|
| |
| |
| |
| |
| | |
Ruby 2.2 knows this, and no longer matches it with [[:space:]], so it's
not a good candidate for testing String#squish.
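Whether `[[:space:]]` matches a given character depends on the Unicode data bundled with Ruby. In a squish-style helper, that data determines which characters are collapsed. The sketch below is minimal, and `squish_sketch` is a hypothetical stand-in rather than ActiveSupport's `String#squish`. It uses U+180E MONGOLIAN VOWEL SEPARATOR as one example of a reclassified character: it was a space separator before Unicode 6.3.0 and became a format character afterwards. This message does not name the character it refers to.

```ruby
# Hypothetical squish-style helper: collapse every run of [[:space:]]
# (Unicode White_Space) into one ASCII space, then strip the ends.
def squish_sketch(str)
  str.gsub(/[[:space:]]+/, " ").strip
end

# Ideographic space, no-break space, newline, and em space are all White_Space.
input = "\u3000 foo\u00A0\n  bar \u2003"
puts squish_sketch(input).inspect

# With Unicode 6.3.0+ data, U+180E is no longer White_Space.
puts "\u180E".match?(/[[:space:]]/)
```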
|
| |
| |
| |
| |
| |
| | |
This fixes random multibyte_chars_test failures under Ruby 1.9.3.
I don't know why the tests fail, and I really don't know why this fixes them.
Maybe we need some more investigation...
Rubinius has built-in support for String#scrub, but it doesn't yet
support ASCII-incompatible chars, so for now we should rely on the
old implementation of #tidy_bytes.
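The following sketch shows a scrub-based approach, the kind an implementation uses when `String#scrub` is fully supported. `tidy_bytes_sketch` is a hypothetical name. The sketch assumes that invalid UTF-8 bytes should be reinterpreted as Windows-1252 (CP1252), which is the premise of #tidy_bytes:

```ruby
# Hypothetical sketch: reinterpret each invalid UTF-8 byte sequence as
# Windows-1252 and convert it to UTF-8; valid text passes through untouched.
def tidy_bytes_sketch(str)
  return str if str.ascii_only?

  str.scrub do |bad_bytes|
    # encode(dst, src) reads bad_bytes as CP1252; bytes CP1252 leaves
    # undefined (e.g. 0x81) become U+FFFD.
    bad_bytes.encode(Encoding::UTF_8, Encoding::Windows_1252,
                     invalid: :replace, undef: :replace)
  end
end

puts tidy_bytes_sketch("caf\xE9").inspect
```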
|
| |
| |
| |
| |
| | |
The previous implementation was broken because JRuby (1.7.10) doesn't
have a code converter for UTF-8 to UTF8-MAC.
Decouple the code from the particular Ruby version.
6.3.0 was released on September 30, 2013.
http://unicode-inc.blogspot.com.ar/2013/09/announcing-unicode-standard-version-63.html
The previous implementation was quite slow. This leverages some of the
transcoding abilities built into Ruby 1.9 instead. It is roughly 96%
faster.
The roundtrip through UTF_8_MAC is needed because Ruby won't let you
transcode from UTF_8 to UTF_8. I chose the closest encoding I could
find as the intermediate.
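Here is a sketch of the round-trip idea. It assumes the goal is to drop invalid byte sequences, and the helper name `drop_invalid_bytes` is hypothetical. It goes through UTF-16LE purely for illustration, whereas the commit used UTF_8_MAC. On Ruby 2.1 and later, `String#scrub` covers the same need directly:

```ruby
# Transcode out of UTF-8 and back so the :invalid/:undef options are applied;
# on the Ruby versions this commit targeted, a same-encoding encode was a no-op.
def drop_invalid_bytes(str)
  str.encode(Encoding::UTF_16LE, invalid: :replace, undef: :replace, replace: "")
     .encode(Encoding::UTF_8)
end

puts drop_invalid_bytes("a\xFFb").inspect
```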
Release notes at: http://www.unicode.org/versions/Unicode6.2.0/
This reverts commit abf8de85519141496a6773310964ec03f6106f3f.
We should take a deeper look at those cases, since flat_map doesn't do
deep flattening.
irb(main):002:0> [[[1,3], [1,2]]].map{|i| i}.flatten
=> [1, 3, 1, 2]
irb(main):003:0> [[[1,3], [1,2]]].flat_map{|i| i}
=> [[1, 3], [1, 2]]
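`flat_map` behaves like `map` followed by `flatten(1)`, not a bare `flatten`. Replacing `map { ... }.flatten` with `flat_map` is therefore only safe when the block's results are at most one level deep. This small sketch extends the irb session above:

```ruby
nested = [[[1, 3], [1, 2]]]

deep     = nested.map { |i| i }.flatten    # flattens every level
one_deep = nested.map { |i| i }.flatten(1) # flattens a single level
via_flat = nested.flat_map { |i| i }       # also a single level

p deep     # => [1, 3, 1, 2]
p one_deep # => [[1, 3], [1, 2]]
p via_flat # => [[1, 3], [1, 2]]
```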
http://www.geek.com/articles/geek-pick/unicode-6-1-released-complete-with-emoji-characters-and-a-pile-of-poo-2012022/
Signed-off-by: José Valim <jose.valim@gmail.com>
Signed-off-by: José Valim <jose.valim@gmail.com>
Makes String#mb_chars on Ruby 1.9 return an instance of ActiveSupport::Multibyte::Chars to work around 1.9's lack of Unicode case folding.
Refactors class methods from ActiveSupport::Multibyte::Chars into a new Unicode module, adding other related functionality for consistency.
[#4594 state:resolved]
Signed-off-by: Jeremy Kemper <jeremy@bitsweat.net>
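On Ruby 2.4 and later, the core String methods cover what this workaround was for. The plain-Ruby sketch below (no ActiveSupport) contrasts full Unicode case mapping with ASCII-only conversion. In this sketch, the `:ascii` option stands in for Ruby 1.9's behaviour, where case conversion only affected ASCII letters:

```ruby
# Ruby 2.4+: core case methods apply Unicode case mapping by default.
puts "école".upcase            # full Unicode mapping
puts "ÉCOLE".downcase(:ascii)  # ASCII letters only, like Ruby 1.9
puts "äöü".casecmp?("ÄÖÜ")     # comparison via Unicode case folding
```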