path: root/activesupport/lib/active_support/multibyte/unicode.rb
Commit message (Author, Age, Files, Lines)
* Merge pull request #12877 from aroben/extended-graphemes (Rafael França, 2015-12-31, 1 file, -13/+38)
|\
| |   Support extended grapheme clusters and UAX 29
| * Support extended grapheme clusters and UAX 29 (Adam Roben, 2013-11-13, 1 file, -0/+15)
| |   http://www.unicode.org/reports/tr29/tr29-21.html is the version of UAX 29 that corresponds to Unicode 6.2.0. Unicode.unpack_graphemes now implements all the rules listed there, including the ones for extended grapheme clusters.
| |
| |   I added a new optional test, test/multibyte_grapheme_break_conformance.rb, that is heavily based on test/multibyte_normalization_conformance.rb, which runs the Unicode test suite.
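To illustrate what an extended grapheme cluster is, here is a minimal sketch that uses core Ruby's `String#grapheme_clusters` (assumed available, i.e. Ruby 2.5 or later) as a stand-in for `Unicode.unpack_graphemes`. The sample strings are made up for illustration and are not taken from the conformance test.

```ruby
# Assumes Ruby >= 2.5, where String#grapheme_clusters is built in.
# This is an illustration only, not ActiveSupport's implementation.

combining = "e\u0301"       # "e" followed by COMBINING ACUTE ACCENT
crlf      = "\r\n"          # CR LF must not be split (UAX 29 rule GB3)
hangul    = "\u1100\u1161"  # Hangul L + V jamo belong together (rule GB6)

# Two code points, but a single user-perceived character.
p combining.codepoints.length
p combining.grapheme_clusters.length

# Each of these is also reported as a single cluster.
p crlf.grapheme_clusters.length
p hangul.grapheme_clusters.length
```

Splitting on code points would break each of these strings in two, which is the behavior the UAX 29 rules are meant to prevent.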
| * Refactor Unicode.unpack_graphemes slightly (Adam Roben, 2013-11-13, 1 file, -13/+23)
| |   This will make it easier to add the rest of the rules listed in UAX 29.
* | [ci skip] default_normalization_form accessing from Unicode (Gaurav Sharma, 2015-09-29, 1 file, -1/+1)
| |
* | File encoding is defaulted to utf-8 in Ruby >= 2.1 (Akira Matsuda, 2015-09-18, 1 file, -1/+0)
| |
* | Update Unicode Version to 8.0.0 (Anshul Sharma, 2015-09-04, 1 file, -1/+1)
| |
* | replace each with each_key when only the key is needed (Aaron Lasseigne, 2015-08-08, 1 file, -1/+1)
| |   Using each_key is faster and more intention revealing.
| |
| |   Calculating -------------------------------------
| |       each    31.378k i/100ms
| |   each_key    33.790k i/100ms
| |   -------------------------------------------------
| |       each   450.225k (± 7.0%) i/s -      2.259M
| |   each_key   494.459k (± 6.3%) i/s -      2.467M
| |
| |   Comparison:
| |   each_key:   494459.4 i/s
| |       each:   450225.1 i/s - 1.10x slower
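The difference between the two iterators can be sketched as follows. The hash contents and variable names are hypothetical and serve only to show that both forms visit the same keys.

```ruby
# Hypothetical data, used only for illustration.
table = { a: 1, b: 2, c: 3 }

# Hash#each yields key/value pairs, so the value is bound and then ignored.
keys_via_each = []
table.each { |key, _value| keys_via_each << key }

# Hash#each_key yields only the key, which states the intent directly.
keys_via_each_key = []
table.each_key { |key| keys_via_each_key << key }

p keys_via_each == keys_via_each_key
```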
* | String#freeze optimizations (schneems, 2015-07-30, 1 file, -1/+1)
| |
* | Freeze string literals when not mutated. (schneems, 2015-07-19, 1 file, -1/+1)
| |   I wrote a utility that helps find areas where you could optimize your program using a frozen string instead of a string literal; it's called [let_it_go](https://github.com/schneems/let_it_go). After going through the output and adding `.freeze` I was able to eliminate the creation of 1,114 string objects on EVERY request to [codetriage](codetriage.com). How does this impact execution?
| |
| |   To look at memory:
| |
| |   ```ruby
| |   require 'get_process_mem'
| |
| |   mem = GetProcessMem.new
| |   GC.start
| |   GC.disable
| |   before = mem.mb # measure before allocating the strings
| |   1_114.times { " " }
| |   after = mem.mb
| |   GC.enable
| |   puts "Diff: #{after - before} mb"
| |   ```
| |
| |   Creating 1,114 string objects results in `Diff: 0.03125 mb` of RAM allocated on every request. Or 1mb every 32 requests.
| |
| |   To look at raw speed:
| |
| |   ```ruby
| |   require 'benchmark/ips'
| |
| |   number_of_objects_reduced = 1_114
| |
| |   Benchmark.ips do |x|
| |     x.report("freeze")    { number_of_objects_reduced.times { " ".freeze } }
| |     x.report("no-freeze") { number_of_objects_reduced.times { " " } }
| |   end
| |   ```
| |
| |   We get the results
| |
| |   ```
| |   Calculating -------------------------------------
| |      freeze     1.428k i/100ms
| |   no-freeze   609.000  i/100ms
| |   -------------------------------------------------
| |      freeze    14.363k (± 8.5%) i/s -     71.400k
| |   no-freeze     6.084k (± 8.1%) i/s -     30.450k
| |   ```
| |
| |   Now we can do some maths:
| |
| |   ```ruby
| |   ips = 6_226 # iterations / 1 second
| |   call_time_before = 1.0 / ips # seconds per iteration
| |
| |   ips = 15_254 # iterations / 1 second
| |   call_time_after = 1.0 / ips # seconds per iteration
| |
| |   diff = call_time_before - call_time_after
| |
| |   number_of_objects_reduced * diff * 100
| |   # => 0.4530373333993266 milliseconds saved per request
| |   ```
| |
| |   So we're shaving off 1 second of execution time for every 220 requests.
| |
| |   Is this going to be an insane speed boost to any Rails app: nope. Should we merge it: yep.
| |
| |   p.s. If you know of a method call that doesn't modify a string input such as [String#gsub](https://github.com/schneems/let_it_go/blob/b0e2da69f0cca87ab581022baa43291cdf48638c/lib/let_it_go/core_ext/string.rb#L37) please [give me a pull request to the appropriate file](https://github.com/schneems/let_it_go/blob/b0e2da69f0cca87ab581022baa43291cdf48638c/lib/let_it_go/core_ext/string.rb#L37), or open an issue in LetItGo so we can track and freeze more strings.
| |
| |   Keep those strings Frozen
| |
| |   ![](https://www.dropbox.com/s/z4dj9fdsv213r4v/let-it-go.gif?dl=1)
* | String already respond_to scrub at Ruby 2.2 (Rafael Mendonça França, 2015-01-04, 1 file, -2/+1)
| |
* | Update to Unicode 7.0.0 (Benjamin Fleischer, 2014-11-15, 1 file, -1/+1)
| |   7.0.0 was released on June 16, 2014
| |   http://unicode-inc.blogspot.com.ar/2014/10/unicode-version-70-complete-text-of.html
| |
| |   ruby bin/generate_tables
* | As of Unicode 6.3, Mongolian Vowel Separator is not whitespace (Matthew Draper, 2014-09-15, 1 file, -1/+0)
| |   Ruby 2.2 knows this, and no longer matches it with [[:space:]], so it's not a good candidate for testing String#squish.
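The behavior described above can be checked with a short sketch, assuming Ruby 2.2 or later. The no-break space is included only as a contrasting character that is still classified as whitespace; it does not appear in the commit.

```ruby
# Assumes Ruby >= 2.2, whose Unicode tables postdate Unicode 6.3.
mvs  = "\u180E"  # MONGOLIAN VOWEL SEPARATOR, no longer whitespace
nbsp = "\u00A0"  # NO-BREAK SPACE, still whitespace (contrast case)

# String#=~ returns the match offset, or nil when nothing matches.
mvs_match  = (mvs =~ /[[:space:]]/)
nbsp_match = (nbsp =~ /[[:space:]]/)

p mvs_match
p nbsp_match
```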
* | Preload UnicodeDatabase outside the loop (Akira Matsuda, 2014-08-18, 1 file, -0/+1)
| |   This fixes a random multibyte_chars_test failure under Ruby 1.9.3.
| |
| |   I don't know why the tests fail, and I really don't know why this fixes them. Maybe we need some more investigation...
* | format (Akira Matsuda, 2014-08-18, 1 file, -2/+1)
| |
* | Prevent using String#scrub on Rubinius (Robin Dupret, 2014-07-30, 1 file, -1/+2)
| |   Rubinius has built-in support for String#scrub, but it doesn't yet support ASCII-incompatible chars, so for now we should rely on the old implementation of #tidy_bytes.
* | Fix tidy_bytes for JRuby (Justin Coyne, 2014-02-10, 1 file, -3/+3)
| |   The previous implementation was broken because JRuby (1.7.10) doesn't have a code converter for UTF-8 to UTF8-MAC.
* | use feature detection to decide which implementation to use (Aaron Patterson, 2014-02-08, 1 file, -1/+1)
| |   Decouple the code from the particular Ruby version.
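Feature detection of this kind can be sketched as below. The helper name `replace_invalid_bytes` and the fallback branch are hypothetical; the point is only that the choice depends on whether the method exists, not on `RUBY_VERSION`.

```ruby
# Hypothetical helper: picks an implementation by probing for the method
# rather than by comparing RUBY_VERSION.
def replace_invalid_bytes(string)
  if string.respond_to?(:scrub)
    # Native path: String#scrub replaces invalid byte sequences.
    string.scrub("?")
  else
    # Fallback for interpreters without String#scrub: keep the valid
    # characters and substitute everything else.
    string.each_char.map { |c| c.valid_encoding? ? c : "?" }.join
  end
end

cleaned = replace_invalid_bytes("abc\xFFdef")
p cleaned
```

Probing the object keeps the code working on alternative interpreters that may gain or lack a method independently of the version number they report.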
* | Update to Unicode 6.3.0 (Norman Clarke, 2013-12-27, 1 file, -1/+1)
| |   6.3.0 was released on September 30, 2013.
| |
| |   http://unicode-inc.blogspot.com.ar/2013/09/announcing-unicode-standard-version-63.html
* | Use String#scrub when available to tidy bytes (Norman Clarke, 2013-12-26, 1 file, -35/+35)
|/
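A minimal sketch of tidying bytes with `String#scrub` follows. The helper name `tidy_bytes_sketch` is hypothetical, and treating the invalid bytes as Windows-1252 is an assumption made here for illustration; this is not the ActiveSupport source.

```ruby
# Hypothetical helper, not the ActiveSupport implementation.
# Assumes Ruby >= 2.1, where String#scrub is built in.
def tidy_bytes_sketch(string)
  string.scrub do |bad|
    # Reinterpret each invalid byte run as Windows-1252 and convert it
    # to UTF-8; bytes that have no mapping are replaced.
    bad.encode(Encoding::UTF_8, Encoding::Windows_1252,
               invalid: :replace, undef: :replace)
  end
end

tidied = tidy_bytes_sketch("caf\xE9")  # \xE9 is "é" in Windows-1252
p tidied
```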
* Initializing Codepoint object with default values (Hitendra Singh, 2013-09-20, 1 file, -0/+7)
|
* compatability => compatibility (Vipul A M, 2013-05-26, 1 file, -3/+3)
|
* Use ruby's Encoding support for tidy_bytes (Burke Libbey, 2013-05-08, 1 file, -39/+19)
|   The previous implementation was quite slow. This leverages some of the transcoding abilities built into Ruby 1.9 instead. It is roughly 96% faster.
|
|   The roundtrip through UTF_8_MAC here is because ruby won't let you transcode from UTF_8 to UTF_8. I chose the closest encoding I could find as an intermediate.
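The roundtrip idea can be sketched as follows. Note the swap: this example uses UTF-16LE as the intermediate encoding, an assumption chosen here because it is widely available, whereas the commit used UTF_8_MAC. The helper name is hypothetical.

```ruby
# Sketch only: UTF-16LE stands in for the UTF_8_MAC intermediate that the
# commit describes. The helper name is hypothetical.
def replace_invalid_via_transcode(string)
  string
    # Transcoding to a different encoding forces every byte to be
    # validated; invalid sequences are replaced along the way.
    .encode(Encoding::UTF_16LE, Encoding::UTF_8, invalid: :replace, replace: "?")
    # Transcode back so callers still receive UTF-8.
    .encode(Encoding::UTF_8)
end

roundtripped = replace_invalid_via_transcode("abc\xFFdef")
p roundtripped
```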
* Update to latest Unicode data. (Norman Clarke, 2013-02-10, 1 file, -1/+1)
|   Release notes at: http://www.unicode.org/versions/Unicode6.2.0/
* Revert "Use flat_map { } instead of map {}.flatten" (Santiago Pastorino, 2012-10-05, 1 file, -2/+2)
|   This reverts commit abf8de85519141496a6773310964ec03f6106f3f.
|
|   We should take a deeper look at those cases; flat_map doesn't do deep flattening.
|
|   irb(main):002:0> [[[1,3], [1,2]]].map{|i| i}.flatten
|   => [1, 3, 1, 2]
|   irb(main):003:0> [[[1,3], [1,2]]].flat_map{|i| i}
|   => [[1, 3], [1, 2]]
* Use flat_map { } instead of map {}.flatten (Santiago Pastorino, 2012-10-05, 1 file, -2/+2)
|
* update AS/log_subscriber and AS/multibyte docs [ci skip] (Francesco Rodriguez, 2012-09-14, 1 file, -21/+31)
|
* Avoid unnecessary catching of Exception instead of StandardError. (Dylan Smith, 2012-06-17, 1 file, -1/+1)
|
* removing unnecessary 'examples' noise from activesupport (Francesco Rodriguez, 2012-05-13, 1 file, -3/+0)
|
* Update Unicode database to recently-released 6.1. (Norman Clarke, 2012-02-03, 1 file, -1/+1)
|   http://www.geek.com/articles/geek-pick/unicode-6-1-released-complete-with-emoji-characters-and-a-pile-of-poo-2012022/
* Implement Chars#swapcase. (Norman Clarke, 2012-01-06, 1 file, -0/+8)
|
* Use friendlier method names for upcasing/downcasing (Norman Clarke, 2012-01-05, 1 file, -9/+17)
|
* Use more descriptive method names (Norman Clarke, 2012-01-05, 1 file, -6/+6)
|
* Replace Unicode.u_unpack with String#codepoints (Norman Clarke, 2012-01-05, 1 file, -16/+3)
|
* Remove "_codepoints" from compose/decompose (Norman Clarke, 2012-01-05, 1 file, -7/+7)
|
* Update to Unicode 6.0 (Norman Clarke, 2012-01-05, 1 file, -1/+1)
|
* Remove useless parens (Norman Clarke, 2012-01-05, 1 file, -1/+1)
|
* adds a couple of missing magic comments [fixes #1374] (Xavier Noria, 2011-07-23, 1 file, -0/+1)
|
* Active Support typos. (R.T. Lechow, 2011-03-05, 1 file, -1/+1)
|
* edit pass to apply API guideline wrt the use of "# =>" in example code (Xavier Noria, 2010-07-30, 1 file, -4/+4)
|
* Removes unused vars (Santiago Pastorino, 2010-07-24, 1 file, -6/+5)
|   Signed-off-by: José Valim <jose.valim@gmail.com>
* Update Unicode database to 5.2.0. [#5011 state:resolved] (Norman Clarke, 2010-06-30, 1 file, -1/+1)
|   Signed-off-by: José Valim <jose.valim@gmail.com>
* Use multibyte proxy class on 1.9, refactor Unicode. (Norman Clarke, 2010-05-21, 1 file, -0/+393)
    Makes String#mb_chars on Ruby 1.9 return an instance of ActiveSupport::Multibyte::Chars to work around 1.9's lack of Unicode case folding.

    Refactors class methods from ActiveSupport::Multibyte::Chars into new Unicode module, adding other related functionality for consistency.

    [#4594 state:resolved]

    Signed-off-by: Jeremy Kemper <jeremy@bitsweat.net>