author    Ashe Connor <ashe@kivikakk.ee>  2018-03-07 11:41:46 +1100
committer Ashe Connor <ashe@kivikakk.ee>  2018-03-07 12:58:02 +1100
commit    e52ab312069a9af0c37c1666141752f3bc805054
tree      4ba840a506256a5c7c3ebcabda9e2031d1c5af6e
parent    e126078a0e013acfe0a397a8dad33b2c9de78732
URI.unescape handles mixed Unicode/escaped input

Previously, URI.unescape could handle Unicode input (without any actual
escaped characters), or input with escaped characters (but no actual
Unicode characters) - not both.

    URI.unescape("\xe3\x83\x90")           # => "バ"
    URI.unescape("%E3%83%90")              # => "バ"
    URI.unescape("\xe3\x83\x90%E3%83%90")  # => Encoding::CompatibilityError

We need to let `gsub` handle this for us, and then force back to the
original encoding of the input. The result String will be mangled if the
percent-encoded characters don't conform to the encoding of the String
itself, but that goes without saying.

Signed-off-by: Ashe Connor <ashe@kivikakk.ee>
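
The approach described in the message can be sketched outside Rails roughly as follows. This is a minimal illustration, not the Rails API: `unescape_mixed` and the `ESCAPED` pattern are hypothetical names chosen here, and the pattern is an assumption standing in for whatever `escaped` matches in the patched method.

```ruby
# Hypothetical standalone helper; mirrors the patched line but is not the Rails API.
ESCAPED = /%[a-fA-F\d]{2}/ # assumed pattern for one percent-encoded byte

def unescape_mixed(str)
  enc = str.encoding
  enc = Encoding::UTF_8 if enc == Encoding::US_ASCII
  # Reinterpret a copy as raw bytes so gsub can splice single decoded bytes
  # next to existing multibyte characters, then restore the original encoding.
  str.dup.force_encoding(Encoding::ASCII_8BIT)
     .gsub(ESCAPED) { |match| [match[1, 2].hex].pack("C") }
     .force_encoding(enc)
end

# One literal character followed by the same character percent-encoded.
mixed = "\u30D0%E3%83%90"
result = unescape_mixed(mixed)
puts result.encoding
puts result.valid_encoding?
```

Working on a duplicate leaves the caller's string and its encoding untouched, which matters because `force_encoding` mutates its receiver.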
-rw-r--r--  activesupport/CHANGELOG.md                        10
-rw-r--r--  activesupport/lib/active_support/core_ext/uri.rb   2
-rw-r--r--  activesupport/test/core_ext/uri_ext_test.rb        2
3 files changed, 12 insertions, 2 deletions
diff --git a/activesupport/CHANGELOG.md b/activesupport/CHANGELOG.md
index a7af51f83e..9351a75dfa 100644
--- a/activesupport/CHANGELOG.md
+++ b/activesupport/CHANGELOG.md
@@ -1,5 +1,15 @@
## Rails 6.0.0.alpha (Unreleased) ##
+* Fix bug where `URI.unescape` would fail with mixed Unicode/escaped character input:
+
+ URI.unescape("\xe3\x83\x90") # => "バ"
+ URI.unescape("%E3%83%90") # => "バ"
+ URI.unescape("\xe3\x83\x90%E3%83%90") # => Encoding::CompatibilityError
+
+ GH#32183
+
+ *Ashe Connor*, *Aaron Patterson*
+
* Add `:private` option to ActiveSupport's `Module#delegate`
in order to delegate methods as private:
diff --git a/activesupport/lib/active_support/core_ext/uri.rb b/activesupport/lib/active_support/core_ext/uri.rb
index c93c0b5c2d..c4ac0baa32 100644
--- a/activesupport/lib/active_support/core_ext/uri.rb
+++ b/activesupport/lib/active_support/core_ext/uri.rb
@@ -13,7 +13,7 @@ unless str == parser.unescape(parser.escape(str))
# YK: My initial experiments say yes, but let's be sure please
enc = str.encoding
enc = Encoding::UTF_8 if enc == Encoding::US_ASCII
- str.gsub(escaped) { |match| [match[1, 2].hex].pack("C") }.force_encoding(enc)
+ str.dup.force_encoding(Encoding::ASCII_8BIT).gsub(escaped) { |match| [match[1, 2].hex].pack("C") }.force_encoding(enc)
end
end
end
diff --git a/activesupport/test/core_ext/uri_ext_test.rb b/activesupport/test/core_ext/uri_ext_test.rb
index 8816b0d392..c0686bc720 100644
--- a/activesupport/test/core_ext/uri_ext_test.rb
+++ b/activesupport/test/core_ext/uri_ext_test.rb
@@ -9,6 +9,6 @@ class URIExtTest < ActiveSupport::TestCase
str = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E" # Ni-ho-nn-go in UTF-8, means Japanese.
parser = URI.parser
- assert_equal str, parser.unescape(parser.escape(str))
+ assert_equal str + str, parser.unescape(str + parser.escape(str).encode(Encoding::UTF_8))
end
end
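
The commit message notes that the result is mangled when the percent-encoded bytes do not conform to the string's own encoding, a case the test above does not exercise. A rough sketch of what that looks like, using a hypothetical `unescape_bytes` helper and an assumed `ESCAPED_BYTE` pattern rather than the Rails method itself:

```ruby
# Hypothetical helper for illustration only; the pattern is an assumption.
ESCAPED_BYTE = /%[a-fA-F\d]{2}/

def unescape_bytes(str)
  str.dup.force_encoding(Encoding::ASCII_8BIT)
     .gsub(ESCAPED_BYTE) { |m| [m[1, 2].hex].pack("C") }
     .force_encoding(str.encoding)
end

# "%E9" is a lone byte that does not form a valid UTF-8 sequence on its own.
mangled = unescape_bytes("\u65E5%E9")
puts mangled.valid_encoding?

# Callers that need a usable string can detect this and scrub the stray byte.
cleaned = mangled.scrub("?")
puts cleaned.valid_encoding?
```

No exception is raised in this case; the invalid byte is only visible through `valid_encoding?`, so callers that care have to check for it explicitly.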