Here is some Spyne/SOAP code where I'm returning one or two strings. Is there a way to avoid having to set string2 to None to indicate that it's not present?
class TestResult(ComplexModel):
    """
    """
    string1 = Unicode
    string2 = Unicode(min_occurs=0, max_occurs=1)

    def __init__(self):
        # Say hello
        self.string1 = "Hello World"
        # Either of the following works
        #
        # self.string2 = "...from Paul"
        # self.string2 = None
        #
        # But omitting them doesn't.

class Test(ServiceBase):
    """
    """
    @srpc(Unicode, _returns=TestResult)
    def Test(Query):
        """
        """
        response = TestResult()
        return response
I don't understand the question really, please clarify.
string2 will be returned explicitly as xsi:nil when it has min_occurs=1, if that's what you're asking.
string2 is optional (min_occurs=0), but I have to explicitly set it to None to have it omitted completely from the XML response. I am asking whether there is something I'm missing that would allow me to keep the code as written (i.e. string2 never explicitly set) and still generate a valid response.
The background is that I have to support some XML where responses have large numbers of optional fields, and I currently have to set a long list of them to None. So I'm hoping I can be lazy and find a way to avoid having to do this ;-).
I have a controller with PUT method:
class UtilsController < ActionController::API
  def update_user_password
    email = params[:email]
    password = params[:password]
    new_password = params[:new_password]
    puts("'#{password}'")
  end
end
and use openapi/javascript and Postman to send the password nRP63P#$
but the console logs nRP63P\#$
Are the controller's params escaped, and how do I get the real value? URI.unescape?
Reference is welcome.
Thank you.
Please compare output with p and puts
password = 'nRP63P#$'
p password # will print "nRP63P\#$"
puts password # will print nRP63P#$
You can unescape value with tools of Rack
I think you were looking exactly for this Rack::Utils#unescape method.
So in your case it will be
unescaped_password = Rack::Utils.unescape(params[:password])
Keep in mind that String#inspect (which p and the console use) escapes some special characters. # is one of them when it is followed by $, @ or {, because inside a double-quoted literal that sequence would start interpolation. The backslash is only part of the printed representation, not of the string itself. So to verify this yourself, write the value to STDOUT with puts, or to a file:
puts unescaped_password
File.open('test.txt', 'w') { |f| f.write(unescaped_password) }
and then inspect your result.
Do not use Kernel#pp, Kernel#p or String#inspect, because all of them print values with special characters escaped by Ruby itself, which can mislead you.
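To make the difference concrete, here is a small sketch (the variable names are illustrative, not taken from the question) showing that the backslash exists only in the inspect representation and never in the string itself:

```ruby
password = 'nRP63P#$'          # single quotes: no interpolation, 8 characters

inspected = password.inspect   # the representation that p and the console show
puts inspected                 # the escaped form, including surrounding quotes
puts password                  # the raw value

# The string itself contains no backslash; only its inspect form does.
puts password.length
puts password.include?("\\")
```

If the length is 8 and no backslash is found, the value received by the controller is already the real password.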
I have a string as given below,
./component/unit
and need to split it to get component/unit, which I will use as a key when inserting into a hash.
I tried .split(/.\//).last, but it gives only unit instead of component/unit.
I think, this should help you:
string = './component/unit'
string.split('./')
#=> ["", "component/unit"]
string.split('./').last
#=> "component/unit"
Your regex was almost fine:
split(/\.\//)
You need to escape both . (which otherwise matches any character) and / (the regex delimiter).
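A short sketch of why the unescaped dot returns only unit (the variable names here are illustrative):

```ruby
path = './component/unit'

# Unescaped dot: /.\// matches ANY character followed by a slash,
# so it also splits on the "t/" at the end of "component/".
loose = path.split(/.\//)
p loose

# Escaped dot: only the literal "./" prefix acts as a separator.
strict = path.split(/\.\//)
p strict
```

With the unescaped pattern the string is cut in two places, which is why .last yields only the final segment.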
As an alternative, you could just remove the first './' substring :
'./component/unit'.sub('./','')
#=> "component/unit"
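If you are on Ruby 2.5 or later, String#delete_prefix expresses the same intent without a regex. A minimal sketch, assuming a hypothetical hash named registry standing in for the one from the question:

```ruby
registry = {}   # hypothetical hash, not from the question

# Remove the leading "./" only if it is actually present.
key = './component/unit'.delete_prefix('./')
registry[key] = :some_value

# A string without the prefix is returned unchanged.
untouched = 'component/unit'.delete_prefix('./')

p key, untouched, registry
```

Unlike sub, delete_prefix only ever touches the start of the string, so a "./" appearing later in the path is left alone.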
All the other answers are fine, but I think you are not really dealing with a String here but with a URI or Pathname, so I would advise you to use these classes if you can. If so, please adjust the title, as it is not about do-it-yourself-regexes, but about proper use of the available libraries.
Link to the ruby doc:
https://docs.ruby-lang.org/en/2.1.0/URI.html
and
https://ruby-doc.org/stdlib-2.1.0/libdoc/pathname/rdoc/Pathname.html
An example with Pathname is:
require 'pathname'
pathname = Pathname.new('./component/unit')
puts pathname.cleanpath # => "component/unit"
# pathname.to_s # => "component/unit"
Whether this is a good idea (and/or whether using URI would be better) also depends on what your real problem is, i.e. what you want to do with the extracted String. As stated, I doubt a bit that you are really interested in Strings.
Using a positive lookbehind, you could use a regex:
reg = /(?<=\.\/)[\w+\/]+\w+\z/
Demo
str = './component'
str2 = './component/unit'
str3 = './component/unit/ruby'
str4 = './component/unit/ruby/regex'
[str, str2, str3, str4].each { |s| puts s[reg] }
#component
#component/unit
#component/unit/ruby
#component/unit/ruby/regex
Say that we have the following text
example abc http://www.example.com
I know how to replace example by some text for instance. But, when I do that, how can I tell the program NOT to substitute the example in the URL?
UPDATE
@kiddorails reminded me of a known trick to work around a missing variable-width look-behind that can be implemented in Ruby as well. However, the regex used by @kiddorails will not replace example before the URL. Also, it is not dynamic.
Here is a function that will replace specific words that are not in a URL, even if they contain symbols that must be escaped in a regex. Whole-word mode is enforced by the \bs, but they can be removed in case you need to match strings with non-word leading and trailing characters:
def removeOutsideOfURL(word, input)
  rx = Regexp.new("(?i)\\b" + Regexp.escape(word.reverse) + "(?!\\S+ptth\\b)")
  return input.reverse.gsub(rx,"").reverse
end
puts removeOutsideOfURL("example", "example def http://www.example.com with a new example")
Output of a sample program:
def http://www.example.com with a new
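A different technique from the reversed-string trick above, sketched here as an alternative rather than as part of the original answer: match either a whole URL or the word, and only replace when the match is not a URL. The helper name replace_outside_urls and the deliberately simple URL pattern are assumptions made for illustration.

```ruby
# Hypothetical helper; the URL pattern is a simplifying assumption
# (anything starting with http:// or https:// up to the next whitespace).
URL_OR_WORD = %r{https?://\S+|\bexample\b}i

def replace_outside_urls(input, replacement)
  input.gsub(URL_OR_WORD) do |match|
    # URLs are consumed as a whole and put back untouched.
    match.match?(%r{\Ahttps?://}i) ? match : replacement
  end
end

result = replace_outside_urls('example def http://www.example.com with a new example', 'sample')
puts result
```

Because the URL alternative comes first and consumes the whole URL, the word inside it is never seen on its own, so no look-behind is needed.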
ORIGINAL ANSWER
For this concrete example and context, you can use /(?<!http:\/\/www\.)example/:
puts "example def http://www.example.com".gsub(/(?<!http:\/\/www\.)example/, '')
>> def http://www.example.com
You can add more look-behinds to set more conditions, e.g. /(?<!http:\/\/www\.)(?<!http:\/\/)example/ to also keep example straight after http://.
Or, you can also check for periods on both ends:
(?<!\.)example(?!\.)
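A quick sketch of how the period check behaves, including one limitation; the sample strings are made up for illustration:

```ruby
pattern = /(?<!\.)example(?!\.)/

# The word is replaced; both URL forms are protected, one by the
# look-behind ("www." before) and one by the look-ahead (".com" after).
text = 'example abc http://www.example.com and http://example.com'
replaced = text.gsub(pattern, 'sample')
puts replaced

# Limitation: a word at the end of a sentence is followed by a period too,
# so it is skipped as well.
sentence = 'This is an example.'
unchanged = sentence.gsub(pattern, 'sample')
puts unchanged
```

So this variant is only safe when the word to replace is never directly adjacent to a period in ordinary text.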
You can use sub:
"example def http://www.example.com".sub("example","")
Result:
" def http://www.example.com"
Update: Pointers by @stribizhev :)
For this particular use case, I would go with the negative lookbehind regex that @stribizhev used above.
But there is one gotcha with negative lookbehind: it only accepts fixed-length patterns.
So, if the URLs come in both forms, http://example.com and http://www.example.com, a single lookbehind can cover either the first form or the second, but not both.
I suggest this approach: reverse the string, use a negative lookahead regex, and substitute the reverse of "example". Here is a demo:
regex = /elpmaxe(?!\S+ptth)/
str1 = "example http://example.com"
str2 = "example http://www.example.com"
str3 = "foo example http://wwww.someexampleurl.com"
str4 = "example def http://www.example.com with a new example"
[str1, str2, str3, str4].map do |str|
  str.reverse.gsub(regex, '').reverse
end
#=> [" http://example.com",
#    " http://www.example.com",
#    "foo  http://wwww.someexampleurl.com",
#    " def http://www.example.com with a new "]
I'm working on a Ruby-based lexer. To improve performance, I joined all the tokens' regexps into one big regexp with named match groups. The resulting regexp looks like:
/\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[\A\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[\A"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[\A"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/
I'm matching it to my string producing a MatchData where exactly one token is parsed:
bigregex =~ "\n ... garbage"
puts $~.inspect
Which outputs
#<MatchData
"\n"
__anonymous_-1038694222803470993:"\n"
__anonymous_-1394418499721420065:nil
__anonymous_3077187815313752157:nil
LET:nil
IN:nil
CLASS:nil
DEF:nil
DEFM:nil
MULTICLASS:nil
FUNCNAME:nil
ID:nil
STRING:nil
NUMBER:nil>
So, the regex actually matched the "\n" part. Now, I need to figure out the match group it belongs to (it's clearly visible from the #inspect output that it's __anonymous_-1038694222803470993, but I need to get it programmatically).
I could not find any option other than iterating over #names:
m.names.each do |n|
  if m[n]
    type = n.to_sym
    resolved_type = (n.start_with?('__anonymous_') ? nil : type)
    val = m[n]
    break
  end
end
which verifies that the match group did have a match.
The problem here is that it's slow: I spend about 10% of the time in the loop, and another 8% grabbing @input[@pos..-1] to make sure that \A works as expected to match the start of the string (I do not discard input, I just shift @pos in it).
You can check the full code at GH repo.
Any ideas on how to make it at least a bit faster? Is there any option to figure the "successful" match group easier?
You can do this using the MatchData methods #captures and #names:
matching_string = "\n ...garbage" # or whatever this really is in your code
@input = matching_string.match bigregex # bigregex = your regex
arr = @input.captures
the_name_you_want = nil # declared outside the block so it survives the loop
arr.each_with_index do |value, index|
  if not value.nil?
    the_name_you_want = @input.names[index]
    break
  end
end
Or if you expect multiple successful values, you could do:
success_names_arr = []
success_names_arr.push(@input.names[index]) # within the above loop, instead of the break
Pretty similar to your original idea, but if you're looking for efficiency, the #captures method should help with that.
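The same idea can be written more compactly with Array#index, which stops at the first non-nil capture. This is a sketch on a much smaller, made-up regex (TOKENS and first_matching_group are illustrative names); it relies on the fact that when a regexp uses named groups, only the named groups are captured, so #captures and #names line up by position. Whether it is actually faster than the loop in the question would need to be measured.

```ruby
# Reduced stand-in for the big lexer regexp from the question.
TOKENS = /\A(?<NEWLINE>\n+)|\A(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)|\A(?<NUMBER>[0-9]+)/

# Returns [group_name, matched_text] for the first group that took part
# in the match, or nil if there is none.
def first_matching_group(match)
  captures = match.captures
  idx = captures.index { |c| !c.nil? }
  idx && [match.names[idx], captures[idx]]
end

m = TOKENS.match("foo 42")
found = first_matching_group(m)
p found
```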
I may have misunderstood this completely, but I'm assuming that all but one token is nil and that the non-nil one is the one you're after?
If so then, depending on the flavour of regex you're using, you could use a negative lookahead to check for a non-nil value
([^\n:]+:(?!nil)[^\n\>]+)
This will match the whole token ie NAME:value.
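On the cost of slicing the input that the question mentions: the standard library's StringScanner keeps its own position and anchors every scan at that position, so no substring has to be created. The sketch below is an assumption about how this could look, using a small made-up token set (TOKEN_RE, TOKEN_NAMES and tokenize are illustrative names), not the lexer from the repository.

```ruby
require 'strscan'

# Reduced, made-up token set; no \A needed because scan is anchored
# at the scanner's current position.
TOKEN_RE = /(?<NUMBER>[0-9]+)|(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)|(?<SPACE>\s+)/
TOKEN_NAMES = TOKEN_RE.names.freeze   # computed once, not per token

def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    text = scanner.scan(TOKEN_RE)
    raise "unexpected input at position #{scanner.pos}" if text.nil?
    # scanner[name] is nil for groups that did not take part in the match.
    name = TOKEN_NAMES.find { |n| scanner[n] }
    tokens << [name.to_sym, text] unless name == "SPACE"
  end
  tokens
end

tokens = tokenize("let x 42")
p tokens
```

The group lookup is still a linear search over the names, so this mainly addresses the substring cost, not the loop cost.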
I'm working with mails, and names and subjects sometimes come q-encoded, like this:
=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?=
Is there a way to decode them in Ruby? It seems TMail should take care of it, but it's not doing it.
I use this to parse email subjects:
You could try the following:
require 'base64'

str = "=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?="
if m = /=\?([A-Za-z0-9\-]+)\?(B|Q)\?([!->@-~]+)\?=/i.match(str)
  case m[2].upcase
  when "B" # Base64 encoded
    decoded = Base64.decode64(m[3])
  when "Q" # Q encoded
    decoded = m[3].unpack("M").first.gsub('_',' ')
  else
    p "Could not find keyword!!!"
  end
  # Iconv was removed from the standard library in Ruby 2.0;
  # String#encode performs the same conversion to utf-8
  decoded.force_encoding(m[1]).encode('utf-8') if decoded
end
Ruby includes a method of decoding Quoted-Printable strings:
puts "Pablo_Fern=C3=A1ndez".unpack "M"
# => Pablo_Fernández
But this doesn't seem to work on your entire string (including the =?UTF-8?Q? part at the beginning). Maybe you can work it out from there, though.
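One way to work it out from there, as a minimal sketch that assumes the whole string is a single Q-encoded word: capture the charset and the encoded text with a regex, turn each _ into =20, and unpack only the text part. The regex and variable names are illustrative, and String#unpack1 requires Ruby 2.4 or later.

```ruby
str = "=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?="

# Capture the charset and the encoded text of a single Q-encoded word.
charset, text = str.match(/\A=\?([^?]+)\?Q\?([^?]+)\?=\z/i).captures

# "_" stands for a space in Q encoding; unpack1("M") decodes the =XX bytes.
decoded = text.gsub("_", "=20").unpack1("M").force_encoding(charset)

puts decoded
```

The result of unpack1 is a binary string, so force_encoding is what tells Ruby to interpret the bytes in the declared charset.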
This is a pretty old question but TMail::Unquoter (or its new incarnation Mail::Encodings) does the job as well.
TMail::Unquoter.unquote_and_convert_to(str, 'utf-8' )
or
Mail::Encodings.unquote_and_convert_to( str, 'utf-8' )
Decoding on a line-per-line basis:
line.unpack("M")
Convert STDIN or file provided input of encoded strings into a decoded output:
if ARGV[0]
  lines = File.read(ARGV[0]).lines
else
  lines = STDIN.each_line.to_a
end
puts lines.map { |c| c.unpack("M") }.join
This might help anyone wanting to test an email. delivery.html_part is normally encoded, but can be decoded to a straight HTML body using .decoded.
test "email test" do
  UserMailer.confirm_email(user).deliver_now
  assert_equal 1, ActionMailer::Base.deliveries.size
  delivery = ActionMailer::Base.deliveries.last
  assert_equal "Please confirm your email", delivery.subject
  assert delivery.html_part.decoded =~ /Click the link below to confirm your email/ # DECODING HERE
end
The most efficient and up-to-date solution seems to be the value_decode method of the Mail gem.
> Mail::Encodings.value_decode("=?UTF-8?Q?Greg_of_Google?=")
=> "Greg of Google"
https://www.rubydoc.info/github/mikel/mail/Mail/Encodings#value_decode-class_method
Below is Ruby code you can cut-and-paste, if inclined. It will run tests if executed directly with ruby, ruby ./copy-pasted.rb. As done in the code, I use this module as a refinement to the String core class.
A few remarks on the solution:
Other solutions perform .gsub('_', ' ') on the unpacked string. However, I do not believe this is correct, and it can result in an incorrect decoding depending on the charset. RFC2047 Section 4.2 (2) indicates "_ always represents hexadecimal 20", so it seems correct to first replace each _ with =20 and then rely on the unpack result. (This also makes the implementation more elegant.) This is also discussed in an answer to a related question.
To be more instructive, I have written the regular expression in free-spacing mode to allow comments (I find this generally helpful for complex regular expressions). If you adjust the regular expression, take note that free-spacing mode changes the matching of white-space, which must then be done escaped or as a character class (as in the code). I've also added the regular expression on regex101, so you can read an explanation of the named capture groups, lazy quantifiers, etc. and experiment yourself.
The regular expression will absorb spaces (but not TABs or newlines) between multiple Q-encoded phrases in a single string, as shown in the test with multiple adjacent phrases. This is because RFC2047 Section 5 (1) indicates that multiple Q-encoded phrases must be separated from each other by linear white-space. Depending on your use-case, absorbing the white-space may not be desired.
The named capture for code permits unexpected encoding codes (other than [bBqQ]) so that a match will occur and the code can raise an error. This helps me to detect unexpected values when processing text. Change the named capture for code to [bBqQ] if you do not want this behaviour. (There will be no match and the original string will be returned.)
It makes use of the global Regexp.last_match as a convenience in the gsub block. You may need to take care if using this in multi-threaded code; I have not given this any consideration.
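The first remark, about the order of the underscore substitution, can be seen on a small made-up input in which =5F encodes a literal underscore (String#unpack1 requires Ruby 2.4 or later):

```ruby
encoded_text = 'a=5Fb_c'   # "=5F" is an encoded literal underscore, "_" is an encoded space

# Unpack first, then replace underscores: the literal underscore is lost.
unpack_then_gsub = encoded_text.unpack1('M').gsub('_', ' ')

# Replace "_" with "=20" first, then unpack: the literal underscore survives.
gsub_then_unpack = encoded_text.gsub('_', '=20').unpack1('M')

p unpack_then_gsub
p gsub_then_unpack
```

Only the second order can tell an encoded underscore apart from an encoded space.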
Additional references and reading:
https://en.wikipedia.org/wiki/Quoted-printable
https://en.wikipedia.org/wiki/MIME#Encoded-Word
require "minitest/autorun"

module QuotedPrintableDecode
  class UnhandledCodeError < StandardError
    def initialize(code)
      super("Unhandled quoted printable code: '#{code}'.")
    end
  end

  @@qp_text_regex = %r{
    =\?                 # Opening literal: `=?`
    (?<charset>[^\?]+)  # Character set, e.g. "Windows-1252" in `=?Windows-1252?`
    \?                  # Literal: `?`
    (?<code>[a-zA-Z])   # Encoding, e.g. "Q" in `?Q?` (`B`ase64); [BbQq] expected, others raise
    \?                  # Literal: `?`
    (?<text>[^\?]+?)    # Encoded text, lazy (non-greedy) matched, e.g. "Foo_bar" in `?Foo_bar?`
    \?=                 # Closing literal: `?=`
    (?:[ ]+(?==\?))?    # Optional separating linear whitespace if another Q-encode follows
  }x                    # Free-spacing mode to allow above comments, also changes whitespace match

  refine String do
    def decode_q_p(to: "UTF-8")
      self.gsub(@@qp_text_regex) do
        code, from, text = Regexp.last_match.values_at(:code, :charset, :text)
        q_p_charset_to_charset(code, text, from, to)
      end
    end

    private

    def q_p_charset_to_charset(code, text, from, to)
      case code
      when "q", "Q"
        text.gsub("_", "=20").unpack("M")
      when "b", "B"
        text.unpack("m")
      else
        raise UnhandledCodeError.new(code)
      end.first.encode(to, from)
    end
  end
end

class TestQPDecode < Minitest::Test
  using QuotedPrintableDecode

  def test_decode_single_utf_8_phrase
    encoded = "=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?="
    assert_equal encoded.decode_q_p, "J. Pablo Fernández"
  end

  def test_decoding_preserves_space_between_unencoded_phrase
    encoded = "=?utf-8?Q?Alfred_Sanford?= <me@example.com>"
    assert_equal encoded.decode_q_p, "Alfred Sanford <me@example.com>"
  end

  def test_decoding_multiple_adjacent_phrases_absorbs_separating_whitespace
    encoded = "=?Windows-1252?Q?Foo_-_D?= =?Windows-1252?Q?ocument_World=9617=96520;_Recor?= =?Windows-1252?Q?d_People_to_C?= =?Windows-1252?Q?anada's_History?="
    assert_equal encoded.decode_q_p, "Foo - Document World–17–520; Record People to Canada's History"
  end

  def test_decoding_string_without_encoded_phrases_preserves_original
    encoded = "Contains no QP phrases"
    assert_equal encoded.decode_q_p, encoded
  end

  def test_unhandled_code_raises
    klass = QuotedPrintableDecode::UnhandledCodeError
    message = "Unhandled quoted printable code: 'Z'."
    encoded = "=?utf-8?Z?Unhandled code Z?="
    raised_error = assert_raises(klass) { encoded.decode_q_p }
    assert_equal message, raised_error.message
  end
end