Multibyte character issue with .match? - ruby

The following code is something I am beginning to test for use within a "Texas Hold 'Em" style game I am working on.
My question is: why, when running the following code, does the puts involving a "♥" print a "\u" in its place? I feel certain it is this multibyte character that is causing the issue, because for the second puts I replaced the ♦ with a d in the array of strings, and it returned what I was expecting. See below:
My Code:
#! /usr/bin/env ruby
# encoding: utf-8
table_cards = ["|2♥|", "|8♥|", "|6d|", "|6♣|", "|Q♠|"]
# Array of cards
player_1_face_1 = "8"
player_1_suit_1 = "♦"
# Player 1's face and suit of first card he has
player_1_face_2 = "6"
player_1_suit_2 = "♥"
# Player 1's face and suit of second card he has
test_str_1 = /(\D8\D{2})/.match(table_cards.to_s)
# EX: Searching for match between face values on (player 1's |8♦|) and the |8♥| on the table
test_str_2 = /(\D6\D{2})/.match(table_cards.to_s)
# EX: Searching for match between face values on (player 1's |6♥|) and the |6d| on the table
puts "#{test_str_1}"
puts "#{test_str_2}"
Puts to Screen:
|8\u
|6d|
-- My goal would be to get the first puts to return: |8♥|
I am not so much looking for a solution to this (there may not even be one) but more for an "as simple as possible" explanation of what is causing this issue and why. Thanks in advance for any information on what is happening here and how I can tackle the goal.

The "\u" you're seeing is the start of a Unicode escape sequence.
For example, Unicode character 'HEAVY BLACK HEART' (U+2764) can be printed as "\u2764".
A friendly Unicode character listing site is http://unicode-table.com/en/sets/
Are you able to launch interactive Ruby in your shell and print a heart like this?
irb
irb> puts "\u2764"
❤
When I run your code in my Ruby, I get the answer you expect:
test_str_1 = /(\D8\D{2})/.match(table_cards.to_s)
=> #<MatchData "|8♥|" 1:"|8♥|">
What happens if you try a regex that is more specific to your cards?
test_str_1 = /(\|8[♥♦♣♠]\|)/.match(table_cards.to_s)
In your example output, you're not seeing the Unicode heart symbol you want. Instead, your output prints the "\u" that starts the Unicode escape, but not the rest of the escape, which would be "2665" (the ♥ in your code is U+2665 BLACK HEART SUIT, not the U+2764 used in the example above).
See the comment by the Tin Man that describes encoding for your console. If he's correct, then I expect the more-specific regex will succeed, but still print the wrong output.
See the comment by David Knipe that says it looks like it gets truncated because the regex only matches 4 characters. If he's correct, then I expect the more-specific regex will succeed and also print the right output.
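One mechanism that reproduces the reported |8\u exactly, assuming the console locale is not UTF-8 (as the Tin Man's comment suggests): since Ruby 1.9, Array#to_s is an alias for Array#inspect, and inspect writes the suits as escapes such as \u2665 when Ruby's default encoding is not UTF-8. The 4-character regex then stops at the backslash and the "u", which fits David Knipe's observation too. The sketch below simulates such a locale by temporarily setting Encoding.default_external, then matches against the card strings directly instead of the inspected array (a change from the original approach, not something the answer proposed).

```ruby
# encoding: utf-8
table_cards = ["|2♥|", "|8♥|", "|6d|", "|6♣|", "|Q♠|"]

# Simulate a console whose locale is not UTF-8 (an assumption about the
# asker's setup), build the string, then restore the original settings.
original_external = Encoding.default_external
original_internal = Encoding.default_internal
Encoding.default_internal = nil
Encoding.default_external = Encoding::US_ASCII
inspected = table_cards.to_s # Array#to_s is an alias for Array#inspect
Encoding.default_external = original_external
Encoding.default_internal = original_internal

# The suits are now escape sequences such as \u2665, so the \D{2} after
# the 8 only reaches the backslash and the "u".
truncated = /(\D8\D{2})/.match(inspected)[1]
puts truncated # prints |8\u

# Matching against the card strings themselves never calls inspect.
whole_card = table_cards.find { |card| card.match?(/\A\|8[♥♦♣♠]\|\z/) }
puts whole_card # prints |8♥|
```

String#match? needs Ruby 2.4 or later; on older versions =~ works the same way here.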
(The rest of this answer is typical for Unix; if you're on Windows, ignore the rest here...)
To show your system language settings, try this in your shell:
echo $LC_ALL
echo $LC_CTYPE
If they are not "UTF-8" or something like that, try this in your shell:
export LC_ALL=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
Then re-run your code -- be sure to use the same shell.
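To see what Ruby itself derived from those settings, a quick check like the following can be run in the same shell, before and after the export. It only prints values; the labels are arbitrary.

```ruby
# Shows the encodings Ruby picked up from the shell's locale settings.
puts "locale charmap:    #{Encoding.locale_charmap}"
puts "default_external:  #{Encoding.default_external}"
puts "default_internal:  #{Encoding.default_internal.inspect}"

# With default_internal nil and default_external not UTF-8, String#inspect
# (and therefore Array#to_s) escapes the heart as \u2665 instead of showing it.
puts "heart via inspect: #{"\u2665".inspect}"
```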
If this works, and you want to make this permanent, one way is to add these here:
# /etc/environment
LC_ALL=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
Then source that file from your .bashrc or .zshrc or whatever shell startup file you use.

Related

Way to create multiline comments in Bash?

I have recently started studying shell scripting, and I'd like to be able to comment out a set of lines in a shell script, the way I can in C/Java:
/* comment1
comment2
comment3
*/
How could I do that?
Use : ' to open and ' to close.
For example:
: '
This is a
very neat comment
in bash
'
Multiline comment in bash
: <<'END_COMMENT'
This is a heredoc (<<) redirected to a NOP command (:).
The single quotes around END_COMMENT are important,
because it disables variable resolving and command resolving
within these lines. Without the single-quotes around END_COMMENT,
the following two $() `` commands would get executed:
$(gibberish command)
`rm -fr mydir`
comment1
comment2
comment3
END_COMMENT
Note: I updated this answer based on comments and other answers, so comments prior to May 22nd 2020 may no longer apply. Also, I noticed today that some IDEs like VS Code and PyCharm do not recognize a HEREDOC marker that contains spaces, whereas bash has no problem with it, so I'm updating this answer again.
Bash does not provide a builtin syntax for multi-line comment but there are hacks using existing bash syntax that "happen to work now".
Personally I think the simplest approach (i.e. least noisy, least weird, easiest to type, most explicit) is to use a quoted HEREDOC, but make it obvious what you are doing, and use the same HEREDOC marker everywhere:
<<'###BLOCK-COMMENT'
line 1
line 2
line 3
line 4
###BLOCK-COMMENT
Single-quoting the HEREDOC marker avoids some shell parsing side effects, such as weird substitutions that would cause a crash or output, and even parsing of the marker itself. So the single quotes give you more freedom in choosing the open/close comment marker.
For example, the following uses a triple hash, which kind of suggests a multi-line comment in bash. This would crash the script if the single quotes were absent. Even if you remove ###, the ${FOO{}} would crash the script (or print a "bad substitution" error if set -e is not in effect) if it weren't for the single quotes:
set -e
<<'###BLOCK-COMMENT'
something something ${FOO{}} something
more comment
###BLOCK-COMMENT
ls
You could of course just use
set -e
<<'###'
something something ${FOO{}} something
more comment
###
ls
but the intent of this is definitely less clear to a reader unfamiliar with this trickery.
Note my original answer used '### BLOCK COMMENT', which is fine if you use vanilla vi/vim but today I noticed that PyCharm and VS Code don't recognize the closing marker if it has spaces.
Nowadays any good editor allows you to press ctrl-/ or similar, to un/comment the selection. Everyone definitely understands this:
# something something ${FOO{}} something
# more comment
# yet another line of comment
although admittedly, this is not nearly as convenient as the block comment above if you want to re-fill your paragraphs.
There are surely other techniques, but there doesn't seem to be a "conventional" way to do it. It would be nice if ###> and ###< could be added to bash to indicate start and end of comment block, seems like it could be pretty straightforward.
After reading the other answers here I came up with the below, which IMHO makes it really clear it's a comment. Especially suitable for in-script usage info:
<< ////
Usage:
This script launches a spaceship to the moon. It's doing so by
leveraging the power of the Fifth Element, AKA Leeloo.
Will only work if you're Bruce Willis or a relative of Milla Jovovich.
////
As a programmer, the sequence of slashes immediately registers in my brain as a comment (even though slashes are normally used for line comments).
Of course, "////" is just a string; the number of slashes in the prefix and the suffix must be equal.
I tried the chosen answer, but found that when I ran a shell script containing it, the whole thing was printed to the screen (similar to how Jupyter notebooks print out everything in '''xx''' quotes) and there was an error message at the end. It wasn't doing anything, but: scary. Then I realised while editing it that single quotes can span multiple lines. So... let's just assign the block to a variable.
x='
echo "these lines will all become comments."
echo "just make sure you don_t use single-quotes!"
ls -l
date
'
what's your opinion on this one?
function giveitauniquename()
{
so this is a comment
echo "there's no need to further escape apostrophes/etc if you are commenting your code this way"
the drawback is it will be stored in memory as a function as long as your script runs unless you explicitly unset it
only valid-ish bash allowed inside for instance these would not work without the "pound" signs:
1, for #((
2, this #wouldn't work either
function giveitadifferentuniquename()
{
echo nestable
}
}
Here's how I do multiline comments in bash.
This mechanism has two advantages that I appreciate. One is that comments can be nested. The other is that blocks can be enabled by simply commenting out the initiating line.
#!/bin/bash
# : <<'####.block.A'
echo "foo {" 1>&2
fn data1
echo "foo }" 1>&2
: <<'####.block.B'
fn data2 || exit
exit 1
####.block.B
echo "can't happen" 1>&2
####.block.A
In the example above the "B" block is commented out, but the parts of the "A" block that are not the "B" block are not commented out.
Running that example will produce this output:
foo {
./example: line 5: fn: command not found
foo }
can't happen
Simple solution, not very smart:
Temporarily block a part of a script:
if false; then
while you respect syntax a bit, please
do write here (almost) whatever you want.
but when you are
done # write
fi
A bit sophisticated version:
time_of_debug=false # Let's set this variable at the beginning of a script
if $time_of_debug; then # in the middle of the script
echo I keep this code aside until there is the time of debug!
fi
In plain bash, to comment out a block of code, I do:
:||{
block
of code
}

Weird behavior when changing line separator and then changing it back

I was following the advice from this question when trying to read in multi-line input from the command line:
# change line separator
$/ = 'END'
answer = gets
pp answer
However, I get weird behavior from STDIN#gets when I try to change $/ back:
# put it back to normal
$/ = "\n"
answer = gets
pp answer
pp 'magic'
This produces output like this when executed with Ruby:
$ ruby multiline_input_test.rb
this is
a multiline
awesome input string
FTW!!
END
"this is\n\ta multiline\n awesome input string\n \t\tFTW!!\t\nEND"
"\n"
"magic"
(I input up to the END and the rest is output by the program, then the program exits.)
It does not pause to get input from the user after I change $/ back to "\n". So my question is simple: why?
As part of a larger (but still small) application, I'm trying to devise a way of recording notes; as it is, this weird behavior is potentially devastating, as the rest of my program won't be able to function properly if I can't reset the line separator. I've tried all manner of using double- and single-quotes, but that doesn't seem to be the issue. Any ideas?
The problem you're having is that your input ends with END\n. Ruby sees the END, and there's still a \n left in the buffer. You do successfully set the input record separator back to \n, so that character is immediately consumed by the second gets.
You therefore have two easy options:
Set the input record separator to END\n (use double quotes in order to have the newline character work):
$/ = "END\n"
Clear the buffer with an extra call to gets:
$/ = 'END'
answer = gets
gets # Consume extra `\n`
I consider option 1 clearer.
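The leftover newline can be reproduced without a terminal. This sketch feeds a StringIO to gets and passes the separator as an argument to IO#gets instead of assigning $/ (a swap made here only to avoid changing global state; the separator logic is the same). The sample text is made up.

```ruby
require "stringio"

# Made-up input: a note terminated by END + Enter, then a later answer.
typed = "this is\na multiline\nnote\nEND\ntest\n"

# Separator "END": the newline typed after END stays in the buffer,
# so the next line-based read returns it immediately.
io = StringIO.new(typed)
note_a = io.gets("END")
leftover = io.gets("\n")
p note_a   # "this is\na multiline\nnote\nEND"
p leftover # "\n"

# Option 1 from the answer: include the newline in the separator.
io = StringIO.new(typed)
note_b = io.gets("END\n")
next_line = io.gets("\n")
p note_b    # "this is\na multiline\nnote\nEND\n"
p next_line # "test\n"
```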
This shows it working on my system using option 1:
$ ruby multiline_input_test.rb
this is
a multiline
awesome input string
FTW!!
END
"this is\n a multiline\n awesome input string\n FTW!!\nEND\n"
test
"test\n"
"magic"

Convert Hex STDIN / ARGV / gets to ASCII in ruby

My question is how I can convert input from the command line (ARGV) or from gets from hex to ASCII.
I know that if I assign a hex string to a variable, it is converted once I print it, for example:
hex_var = "\x41\x41\x41\x41"
puts hex_var
The result will be
AAAA
But I need to get the value from the command line (via ARGV or gets).
Say I have these lines:
s = ARGV
puts s
# another idea
puts s[0].gsub('x' , '\x')
Then I ran:
ruby gett.rb \x41\x41\x41\x41
and I got:
\x41\x41\x41\x41
Is there a way to get it to work?
There are a couple of problems you're dealing with here. The first you've already tried to address, but I don't think your solution is really ideal. The backslashes you're passing in with the command-line argument are being evaluated by the shell and never make it to the Ruby script. If you're going to simply do a gsub in the script, there's no reason to even pass them in. And doing it your way means any 'x' in the arguments will get swapped out, even those that aren't being used to indicate a hex. It would be better to double-escape the \ in the argument if possible. Without context of where the values are coming from, it's hard to say which way would actually be better.
ruby gett.rb \\x41\\x41
That way ARGV will actually get '\x41\x41', which is closer to what you want.
It's still not exactly what you want, though, because ARGV arguments are created without expression substitution (as though they are in single quotes). So Ruby is escaping that \ even though you don't want it to. Essentially you need to take that and re-evaluate it as though it were in double quotes.
eval('"%s"' % s)
where s is the string.
So to put it all together, you could end up with either of these:
# ruby gett.rb \x41\x41
ARGV.each do |s|
s = s.gsub('x' , '\x')
p eval('"%s"' % s)
end
# => "AA"
# ruby gett.rb \\x41\\x41
ARGV.each do |s|
p eval('"%s"' % s)
end
# => "AA"
Backslashes entered in the console will be interpreted by the shell and will not make it into your Ruby script, unless you enter two backslashes in a row, in which case your script will get a literal backslash and no automatic conversion of hexadecimal character codes following those backslashes.
You can convert these escaped codes to characters manually if you replace the last line of your script with this:
puts s[0].gsub(/\\x([[:xdigit:]]{1,2})/) { $1.hex.chr }
Then run it with double-backslashed input:
$ ruby gett.rb \\x41\\x42\\x43
ABC
When fetching user input through gets or similar, the user only needs to enter a single backslash for each character escape, since it will indeed be passed to your script as a literal backslash and thus handled correctly by the above gsub call.
An alternative way when parsing command-line arguments would be to let the shell interpret the character escapes for you. How to do this will depend on what shell you are using. If using bash, it can be done like this:
$ echo $'\x41\x42\x43'
ABC
$ ruby -e 'puts ARGV' $'\x41\x42\x43'
ABC
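The gsub call from the second answer can be wrapped in a helper so ARGV and gets input share one code path. decode_hex_escapes is a hypothetical name, and the fixed string stands in for what gets would return. Unlike the eval version, it only converts \xNN escapes, so any other Ruby syntax in the input is never executed.

```ruby
# Hypothetical helper: converts literal \xNN escapes (1-2 hex digits)
# into the characters they name, using the gsub from the answer.
def decode_hex_escapes(text)
  text.gsub(/\\x([[:xdigit:]]{1,2})/) { $1.hex.chr }
end

# Arguments passed as `ruby gett.rb \\x41\\x42\\x43` in bash arrive
# with single literal backslashes.
ARGV.each { |arg| puts decode_hex_escapes(arg) }

# Input from gets already holds literal backslashes; a fixed string
# stands in for it here.
typed = '\x41\x42\x43'
puts decode_hex_escapes(typed) # prints ABC
```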

Rubular/Ruby discrepancy in captured text

I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
only gets me the first line, i.e.
<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
I don't think it's my test data, but that's possible. What am I missing?
(Ruby 1.9 on Ubuntu 10.10)
Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
Another check you can do is to change your parentheses and print your capturing groups, so that you can see which part of your regex matches what.
/^<DD>(.*)\n?(.*)\n/
Another idea to get this to work is to change the .*: instead of matching any character, match anything but \n.
^<DD>([^\n]*\n?[^\n]*)\n
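The two checks suggested above (look at the line-break characters, then print each capturing group) can be combined in one small script. The sample text is the test data from the second answer below; running the same lines against the real input is the point, since inspect makes a "\r" or an empty second line visible.

```ruby
# encoding: utf-8
# Sample data from the second answer; "\n" marks each line break.
desc = "<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />\n" \
       "– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377\n" \
       "<DT>la la this should not be matched oh good"

# Count line-ending characters: any "\r" points to Windows-style breaks.
puts "LF count: #{desc.count("\n")}"
puts "CR count: #{desc.count("\r")}"

# Split the capture into two groups and inspect them, so hidden
# characters such as "\r" or an empty second line become visible.
match = /^<DD>(.*)\n?(.*)\n/.match(desc)
if match
  puts "group 1: #{match[1].inspect}"
  puts "group 2: #{match[2].inspect}"
end
```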
I believe you need the multiline modifier in your code:
/m Multiline mode: dot matches newlines. (In Ruby, ^ and $ always match at line starts and endings, with or without /m.)
The following:
#!/usr/bin/env ruby
desc= '<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
prints
#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
on my system (Linux, Ruby 1.8.7).
Perhaps your line breaks are really \r\n (Windows style)? What if you try:
desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/

Ruby: How to ignore newlines in cut-and-pasted user input?

I've written a little Ruby script that requires some user input. I anticipate that users might be a little lazy at some point during the data entry where long entries are required and that they might cut and paste from another document containing newlines.
I've been playing with the Highline gem and quite like it. I suspect I am just missing something in the docs but is there a way to get variable length multiline input?
Edit: The problem is that the newline terminates that input and the characters after the newline end up as the input for the next question.
Here's what the author uses in his example: (from highline-1.5.0/examples)
#!/usr/local/bin/ruby -w
# asking_for_arrays.rb
#
# Created by James Edward Gray II on 2005-07-05.
# Copyright 2005 Gray Productions. All rights reserved.
require "rubygems"
require "highline/import"
require "pp"
grades = ask( "Enter test scores (or a blank line to quit):",
lambda { |ans| ans =~ /^-?\d+$/ ? Integer(ans) : ans} ) do |q|
q.gather = ""
end
say("Grades:")
pp grades
General documentation on HighLine::Question#gather (from highline-1.5.0/lib/highline/question.rb)
# When set, the user will be prompted for multiple answers which will
# be collected into an Array or Hash and returned as the final answer.
#
# You can set _gather_ to an Integer to have an Array of exactly that
# many answers collected, or a String/Regexp to match an end input which
# will not be returned in the Array.
#
# Optionally _gather_ can be set to a Hash. In this case, the question
# will be asked once for each key and the answers will be returned in a
# Hash, mapped by key. The <tt>#key</tt> variable is set before each
# question is evaluated, so you can use it in your question.
#
attr_accessor :gather
These seem to be your main options within the library. Anything else, you'd have to do yourself.
Wouldn't it be something like:
input.gsub!(/\r?\n/, '')
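Without HighLine, the gsub idea above can be combined with a blank-line terminator (mirroring gather = "" in the HighLine example) so that pasted newlines no longer end the answer early. This is a sketch, not HighLine's behaviour: read_until_blank is a hypothetical name, and the StringIO stands in for $stdin.

```ruby
require "stringio"

# Hypothetical helper: read lines until an empty line (or EOF), then
# join them with spaces so pasted newlines don't split the answer.
def read_until_blank(io = $stdin)
  lines = []
  while (line = io.gets)
    line = line.chomp # strips a trailing "\n" or "\r\n"
    break if line.empty?
    lines << line
  end
  lines.join(" ")
end

pasted = StringIO.new("first pasted line\r\nsecond pasted line\n\nnext question\n")
answer = read_until_blank(pasted)
next_answer = pasted.gets
p answer      # "first pasted line second pasted line"
p next_answer # "next question\n"
```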
