Difference of answers while using split function in Ruby - ruby

Given the following inputs:
line1 = "Hey | Hello | Good | Morning"
line2 = "Hey , Hello , Good , Morning"
file1=length1=name1=title1=nil
Using ',' to split the string as follows:
file1, length1, name1, title1 = line2.split(/,\s*/)
I get the following output:
puts file1,length1,name1,title1
>Hey
>Hello
>Good
>Morning
However, using '|' to split the string I receive a different output:
file1, length1, name1, title1 = line1.split(/|\s*/)
puts file1,length1,name1,title1
>H
>e
>y
Both strings are identical except for the separator (a comma in the first case and a pipe in the second), and the split call is the same apart from the delimiter. What causes this difference?

The problem is that | means OR (alternation) in a regex. To match a literal pipe, you need to escape it as \|, so the correct regex is /\|\s*/
As written, the regex /|\s*/ means "the empty string, or a run of whitespace characters". Because the empty string comes first in the alternation, the regex engine matches it at every position and splits the string into individual characters (you can imagine an empty string sitting between each pair of characters). If you swap the alternatives to /\s*|/, whitespace is preferred over the empty string wherever possible, so the string is still split into single characters but no whitespace appears among the resulting tokens.
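A minimal Ruby sketch of the difference, assuming the line1 string from the question. The helper name split_on_pipe is hypothetical, and its pattern /\s*\|\s*/ is a small variation on the regex above that also drops the whitespace before each pipe:

```ruby
# Hypothetical helper: split on a literal pipe, ignoring whitespace around it.
def split_on_pipe(line)
  line.split(/\s*\|\s*/)
end

line1 = "Hey | Hello | Good | Morning"

# Escaped pipe: splits on the delimiter as intended.
p split_on_pipe(line1)            # ["Hey", "Hello", "Good", "Morning"]

# Unescaped pipe: the empty alternative matches at every position,
# so the string falls apart into single characters (spaces included).
p line1.split(/|\s*/).first(4)    # ["H", "e", "y", " "]

# Swapped alternatives: still single characters, but whitespace is consumed.
p line1.split(/\s*|/).first(4)    # ["H", "e", "y", "|"]
```

Note that /\|\s*/ on its own leaves a trailing space on each token ("Hey "), which puts prints invisibly.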

Related

Swap two characters in bash with tr or similar

I'm writing a bash script and have run into a problem. I would like to swap two characters in a string.
My input is the following:
"aaaaa_eeeee"
The desired output is:
"eeeee_aaaaa"
I don't want to reverse the string or anything like that; what I need is to replace every "a" with "e" and every "e" with "a". I have tried echo "aaaaa_eeeee" | tr "a" "e". The first replacement is simple, but I don't know how to do the second one.
You can give multiple original and replacement characters to tr. Each character in the original string is replaced with the corresponding replacement character.
echo "aaaaa_eeeee" | tr "ae" "ea"
Pass Translation Sets as Arguments
To make the substitutions work in a single logical pass, you need to pass multiple characters to the tr utility. The man page for the BSD version of tr describes the use of translation sets as follows:
[T]he characters in string1 are translated into the characters in string2 where the first character in string1 is translated into the first character in string2 and so on. If string1 is longer than string2, the last character found in string2 is duplicated until string1 is exhausted.
For example:
$ tr "ae" "ea" <<< "aaaaa_eeeee"
eeeee_aaaaa
This maps a => e and e => a in a single logical pass, avoiding the issues that would result in trying to map the replacements sequentially.
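For comparison, Ruby's String#tr follows the same set-to-set mapping, which makes the difference between a single pass and sequential replacement easy to see. This is only an illustrative sketch; the helper name swap_a_and_e is hypothetical:

```ruby
# Hypothetical helper: swap every "a" with "e" and vice versa in one pass.
def swap_a_and_e(text)
  text.tr("ae", "ea")
end

puts swap_a_and_e("aaaaa_eeeee")                    # eeeee_aaaaa

# Sequential replacement loses information: after the first step the
# original "e"s can no longer be told apart from the newly created ones.
puts "aaaaa_eeeee".gsub("a", "e").gsub("e", "a")    # aaaaa_aaaaa
```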
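For this particular input, rev also produces the expected output, but only because the string is symmetric. rev reverses each line rather than swapping characters, which the question explicitly rules out: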
echo "aaaaa_eeeee"|rev
eeeee_aaaaa

Word after a particular word in a string

I have a string, e.g. ab_abc_bbb_ccc_ssss_pppp, and I want the word after ccc, i.e. ssss. How can I achieve this using a Unix command?
Do you mean something like this?
echo "ab_abc_bbb_ccc_ssss_pppp" | sed 's/.*ccc_\([^_]*\).*/\1/'
Explanation:
s/ # substitute
.*ccc_ # match everything up to and including "ccc_"
\([^_]*\) # capture the following run of non-"_" characters as group 1 (\1)
.*/ # match (and so discard) the rest of the line
\1/ # replace the whole match with group 1
Output:
ssss
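The same capture-group idea can be sketched in Ruby, where String#[] accepts a regex and a group number. The helper name word_after is hypothetical, and the sketch assumes words are separated by "_" as in the question:

```ruby
# Hypothetical helper: return the "_"-delimited word that follows `marker`,
# or nil when the marker is not present.
def word_after(text, marker)
  text[/#{Regexp.escape(marker)}_([^_]*)/, 1]
end

puts word_after("ab_abc_bbb_ccc_ssss_pppp", "ccc")   # ssss
```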

adding a colon after every two letters in an alphanumeric string in shell

I have an alphanumeric string, 10006cc2190ab011, and I am trying to add a colon after every two characters.
The string is: 10006cc2190ab011
I want it to be: 10:00:6c:c2:19:0a:b0:11
Thanks in advance.
A sed solution:
$ echo 10006cc2190ab011 | sed 's/../&:/g; s/:$//'
10:00:6c:c2:19:0a:b0:11
This replaces each non-overlapping pair of characters with the same pair followed by a colon, then removes the trailing colon (present when the input has even length).
str=10006cc2190ab011; str="${str//??/${.sh.match}:}"; echo ${str%:}
This does the same replacement as the sed answer without an external command, using only ksh built-ins.
In $str, every (//) pair of characters (??) is replaced (/) with the matched text followed by a colon; ksh keeps each match in the variable ${.sh.match}. The result is then printed with the trailing ':' stripped (%).
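As a cross-check of the pairing logic, here is a small Ruby sketch of the same transformation. The helper name colonize is hypothetical; it groups the characters in pairs and joins them, so no trailing colon has to be removed afterwards:

```ruby
# Hypothetical helper: insert ":" between consecutive pairs of characters.
# /.{1,2}/ keeps a final single character when the length is odd.
def colonize(text)
  text.scan(/.{1,2}/).join(":")
end

puts colonize("10006cc2190ab011")   # 10:00:6c:c2:19:0a:b0:11
```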

How to strip out \r\n in between a quoted string in between tabs when rows are also delimited by \r\n?

In Ruby 2.1.3, I have a string representing rows of a tab-delimited CSV file, one field of which is a title:
string = "helloworld\r\n14522\tAB-12-00420\t\"PROTOCOL \r\nRisk Effectiveness \r\nand Device Effectiveness In \r\nEbola Candidates \"\tData Collection only\t\t20\t"
I want to strip out the "\r\n" only in the tab-delimited field that starts with PROTOCOL, so I can read the complete title as "PROTOCOL Risk Effectiveness and Device Effectiveness In Ebola Candidates". I want the end result to be:
"helloworld\r\n14522\tAB-12-00420\t\"PROTOCOL Risk Effectiveness and Device Effectiveness In Ebola Candidates \"\tData Collection only\t\t20\t"
If I don't do this, trying to read it in via CSV truncates the title so I only end up reading "PROTOCOL" and not the rest of the title.
Keep in mind there may be an indeterminate number of \r\n characters I want to remove within a title (I'll be parsing through different titles). How do I accomplish this? I was thinking a regular expression might be the way...
Since a newline outside of quotes is treated as a row delimiter,
you could use this regex to isolate quoted fields and then replace any \r?\n just
within those fields.
You would then pass the string to the CSV module.
There are 3 groups that together constitute the entire match:
1. Delimiter
2. Double-quoted field
3. Non-quoted field
This needs a replace-with-callback implementation.
Within the callback, if group 2 is not empty, do a separate replace of all CRLFs in it.
Concatenate group 1 + replaced(group 2) + group 3, then return the result.
# ((?:^|\t|\r?\n)[^\S\r\n]*)(?:("[^"\\]*(?:\\[\S\s][^"\\]*)*"(?:[^\S\r\n]*(?=$|\t|\r?\n)))|([^\t\r\n]*(?:[^\S\r\n]*(?=$|\t|\r?\n))))
( # (1 start), Delimiter tab or newline
(?: ^ | \t | \r? \n )
[^\S\r\n]* # leading optional whitespaces
) # (1 end)
(?:
( # (2 start), Quoted string field
"
[^"\\]*
(?: \\ [\S\s] [^"\\]* )*
"
(?:
[^\S\r\n]* # trailing optional whitespaces
(?= $ | \t | \r? \n ) # Delimiter ahead, tab or newline
)
) # (2 end)
| # OR
( # (3 start), Non quoted field
[^\t\r\n]*
(?:
[^\S\r\n]* # trailing optional whitespaces
(?= $ | \t | \r? \n ) # Delimiter ahead, tab or newline
)
) # (3 end)
)
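In Ruby, the replace-with-callback step corresponds to gsub with a block. The sketch below is a deliberately simplified version of the idea, not the three-group regex above: it matches only double-quoted fields and assumes quotes are balanced and never escaped. The helper name and the sample string are hypothetical:

```ruby
# Hypothetical helper: remove CR/LF only inside double-quoted fields.
# Assumes balanced quotes and no escaped quotes within a field.
def strip_newlines_in_quotes(text)
  text.gsub(/"[^"]*"/) { |field| field.gsub(/\r?\n/, "") }
end

sample = "id\r\n14522\t\"PROTOCOL \r\nRisk \r\nEffectiveness \"\tData\t20\t"
p strip_newlines_in_quotes(sample)
# "id\r\n14522\t\"PROTOCOL Risk Effectiveness \"\tData\t20\t"
```

The row delimiter after "id" is left alone because it sits outside any quoted field, so the cleaned string can then be handed to a CSV parser configured for tab separators.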
Unfortunately I don't know Ruby, and the solution I'm going to offer is not very nice, but here goes:
Since Ruby's regex implementation doesn't support variable-width lookbehinds, I couldn't come up with a pattern that matches only the \r\n you want to remove. But you can replace all matches of this regex pattern
(\t"?PROTOCOL[^\t]*)[\r\n]+
with \1 (the text matched by group 1), repeating until the pattern no longer matches. A single substitution won't remove all occurrences of \r\n, because each pass removes line-break characters from only one place in the field.
I hope you'll find a nicer solution.
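A rough Ruby sketch of that repeat-until-stable loop, using the pattern from this answer unchanged. The helper name and the sample string are hypothetical, and match? requires Ruby 2.4 or later:

```ruby
# Hypothetical helper: repeatedly drop line breaks from the PROTOCOL field
# until the pattern no longer matches.
def strip_title_breaks(text)
  pattern = /(\t"?PROTOCOL[^\t]*)[\r\n]+/
  result = text.dup
  # Each pass removes line-break characters from one place in the field.
  result.sub!(pattern, '\1') while result.match?(pattern)
  result
end

sample = "id\r\n14522\t\"PROTOCOL \r\nRisk \r\nEffectiveness \"\tData\t20\t"
p strip_title_breaks(sample)
# "id\r\n14522\t\"PROTOCOL Risk Effectiveness \"\tData\t20\t"
```

The loop relies on the field being followed by a tab, since [^\t]* stops there; a PROTOCOL field at the very end of a row would also lose its row delimiter.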

pattern matching in ruby

Could anybody tell me how this expression works?
output = "#{output.gsub(/grep .*$/,'')}"
Before the operation, the value of output is
"df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
but after the operation it becomes
"df -h | \n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
Please help.
Your expression is effectively equivalent to:
output.gsub!(/grep .*$/,'')
which modifies the string in place and is much easier to read.
The . in the regular expression matches any character except a newline (\n) by default, so it does match \r. In the string provided, the pattern therefore matches "grep /mnt/nand\r" and substitutes an empty string for it. The result is the provided string without the matched substring, which is why only \n remains after "df -h | ".
Here is a simpler example:
"hello\n\n\nworld".gsub(/hello.*$/,'') => "\n\n\nworld"
In both your provided regex, and the example above, the $ is not necessary. It is used as an anchor to match the end of a line, but since the pattern immediately before it (.*) matches everything up to a newline, it is redundant (but does not cause harm).
Since gsub returns a string, your first line is exactly the same as
output = output.gsub(/grep .*$/, '')
which takes the string and removes every occurrence of the regexp pattern
/grep .*$/
i.e. all parts of the string that start with 'grep ' until the end of the string or a line break.
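A short Ruby sketch that reproduces the behaviour with the string from the question. The helper name strip_grep is hypothetical:

```ruby
# Hypothetical helper: remove everything from "grep " to the end of its line.
def strip_grep(output)
  output.gsub(/grep .*$/, "")
end

before = "df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
p strip_grep(before)
# "df -h | \n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
```

The \r before the first \n disappears along with the grep text because . matches a carriage return, while the second line is untouched because it contains no "grep ".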
This pattern matches the word "grep", then a space, then any number of characters up to the next newline (\n). "." by itself means any character except a newline (so it also matches \r), and ".*" means any number of them, as many as possible. "$" means the end of a line.
For the '$', see here http://www.regular-expressions.info/reference.html
".*$" means "take every character up to the end of the line": the regex engine treats "\n" as the end of a line, so the match stops there.
