In bash, why does tr output a right bracket as the replacement value?

So I was looking into tr and I was playing around with the following command:
echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n*]' vs echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n]'. These are the different outputs:
> echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n*]'
test
new
LINE
> echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n]'
test]]]]]new]LINE]
Without the asterisk, it appears to replace each non-alphabetic character with a right bracket. The man page (https://linuxcommand.org/lc3_man_pages/tr1.html) says that the argument [CHAR*] means "in SET2, copies of CHAR until length of SET1". So clearly it still replaces characters in SET1 based on SET2, but where is it getting the right bracket from?
Nothing to do with my job so no need to tell me about sed or awk, just came across this and was curious.

In your second command, the replacement set is not in one of the special formats
[CHAR*]
[CHAR*REPEAT]
[:<keyword>:]
[=CHAR=]
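So it doesn't get any special treatment and the square brackets are treated literally: SET2 is the three characters [, \n and ]. With -c, SET1 is the complement of A-Za-z in ascending order, so its first two characters (NUL and \001) are replaced with [ and \n, respectively, and every other character, including the spaces, digits and newline in your input, is replaced with ] (because the replacement set is extended by repeating its last character).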
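To see the padding rule in isolation, here is a minimal sketch. It assumes GNU tr, which extends SET2 by repeating its last character; other implementations may behave differently. The helper names pad_demo and literal_brackets are made up for illustration.

```bash
# SET1 is a-e (5 characters) and SET2 is "xy]" (3 characters). GNU tr pads
# SET2 with its last character, so c, d and e all become ].
pad_demo() { tr 'a-e' 'xy]'; }

# The command from the question: [\n] is not one of the special forms, so
# SET2 is the three literal characters [, newline and ].
literal_brackets() { tr -c 'A-Za-z' '[\n]'; }

printf 'abcde\n' | pad_demo                              # prints: xy]]]
printf 'test 123 new LINE\n' | literal_brackets; echo    # prints: test]]]]]new]LINE]
```

The extra echo is only there because the trailing newline of the input is itself turned into ].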

grep between some characters (quotes, etc) or after (eg. hashtag) any content (text, numbers, emojis)

Based on this question: Bash sed - find hashtags in string; with no solutions for this case (when you have special characters).
This question is well-researched and is not a duplicate of this unrelated question, as the referenced question doesn't cover all the topics asked here (support for special characters and numbers; grep both between and after/before).
echo "Text and #hashtag" | grep -o '#[[:alpha:]]\+*' | tr -d '"' works successfully, returning #hashtag; that's still related to the mentioned question...
...As for this new question, based on my own needs (it may be useful to you too), this is my version, which parses text between double quotes instead of after a hashtag:
echo '#first = "Yes"' | grep -o '"[[:alpha:]]\+*"' | tr -d '"' and it works, returning Yes.
However, when the text contains an emoji or other characters such as > and / (example: echo '#first = "✅ Yes"' | grep -o '"[[:alpha:]]\+*"' | tr -d '"'), it returns empty output.
It has to support any kind of character (emojis, HTML tags, numbers).
This should be useful not only for parsing between characters, but also after a character (such as parsing any #hashtag text) or before one.
The way to extract text between double quotes is to match any character except double quote, as many as possible, between double quotes.
grep -o '"[^"]*"' | tr -d '"'
Some test cases:
grep -o '"[^"]*"' <<\___here | tr -d '"'
there is "text" between "double quotes"
just one "?" here, "test me!"
any unpaired double quote " will not match
___here
The second one of these will fail with the current code in your own answer.
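As a quick check that the negated bracket expression copes with emojis and symbols, a small sketch follows. between_quotes and after_hash are hypothetical helper names, and treating a hashtag as ending at the next whitespace or # is an assumption, not something stated in the question.

```bash
# Text between double quotes: a quote, any run of non-quote characters, a quote.
between_quotes() { grep -o '"[^"]*"' | tr -d '"'; }

# Text after a hashtag: '#' followed by one or more characters that are
# neither whitespace nor another '#' (assumed definition of a hashtag).
after_hash() { grep -o '#[^[:space:]#][^[:space:]#]*'; }

echo '#first = "✅ Yes"' | between_quotes    # prints: ✅ Yes
echo 'just one "?" here' | between_quotes    # prints: ?
echo 'Text and #hashtag' | after_hash        # prints: #hashtag
```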
Thanks to @Aserre's pointers, I could come up with an answer.
For "get all text that appears AFTER a character" and "get all text that appears BETWEEN quotes" (grep) to work with any character, we have to replace [[:alpha:]] in the pattern with ... (three dots, each matching any single character).
So, it is:
echo '#first = "✅ Yes"' | grep -o '"...\+"' | tr -d '"' (get anything which is between double quotes)
and:
echo "Text and #hashtag" | grep -o '#...\+' | tr -d '"' (get anything which is after a hashtag)
Update:
If you want to support content that is only 1 character long (such as a number from 0 to 9), replace ... with . (a single dot).
As asked in the question, it works for emojis, letters, numbers and other special characters.

How to convert a line into camel case?

This picks all the text on a single line after a pattern match and converts it to camel case, using non-alphanumeric characters as separators and removing the spaces at the beginning and end of the resulting string. Two questions: (1) it doesn't replace when there are 2 consecutive non-alphanumeric chars, e.g. "2, " in the example below; (2) is there a way to do everything with sed alone instead of using grep, cut, sed and tr?
$ echo " hello
world
title: this is-the_test string with number 2, to-test CAMEL String
end! " | grep -o 'title:.*' | cut -f2 -d: | sed -r 's/([^[:alnum:]])([0-9a-zA-Z])/\U\2/g' | tr -d ' '
ThisIsTheTestStringWithNumber2,ToTestCAMELString
To answer your first question, change [^[:alnum:]] to [^[:alnum:]]+ to match one or more non-alnum chars.
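The effect of the + quantifier can be seen on a short fragment. This is only a sketch, assuming GNU sed (for \U); the function names camel_one and camel_many are illustrative.

```bash
# Original pattern: exactly one separator character before the letter.
camel_one()  { sed -E 's/([^[:alnum:]])([0-9a-zA-Z])/\U\2/g'; }

# Suggested fix: one or more separator characters before the letter.
camel_many() { sed -E 's/([^[:alnum:]]+)([0-9a-zA-Z])/\U\2/g'; }

echo 'number 2, to-test' | camel_one     # prints: number2,ToTest
echo 'number 2, to-test' | camel_many    # prints: number2ToTest
```

With a single separator, the comma is followed by a space rather than a letter, so it is never consumed; with +, the comma and the space are matched together.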
You may combine all the commands into a GNU sed solution like
sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
Details
-En - POSIX ERE syntax is on (E) and default line output is suppressed with n
/.*title: *(.*[[:alnum:]]).*/ - matches a line having title: capturing all after it up to the last alnum char into Group 1 and matching the rest of the line
{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp} - if the line is matched,
s//\1/ - remove all but Group 1 pattern (received above)
s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/ - match and capture start of string or 1+ non-alnum chars into Group 1 (with ([^[:alnum:]]+|^)) and then capture an alnum char into Group 2 (with ([0-9a-zA-Z])) and replace with uppercased Group 2 contents (with \U\2).
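For reference, here is the combined command run against the sample input from the question. It is a sketch that assumes GNU sed; the wrapper name title_to_camel is made up for illustration.

```bash
# Print the camel-cased text that follows "title:" on any matching line.
title_to_camel() {
  sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
}

printf '%s\n' ' hello' 'world' \
  'title: this is-the_test string with number 2, to-test CAMEL String' \
  'end! ' | title_to_camel
# prints: ThisIsTheTestStringWithNumber2ToTestCAMELString
```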

Reverse four length of letters with sed in unix

How can I reverse every four-character word with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed 's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
I'm not sure it is possible to do this with GNU sed for all cases. If _ doesn't occur immediately before or after the four-character words, you can use
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is a word boundary, where a word character is any letter, digit or underscore. So \b ensures that only whole words are matched, not parts of words.
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
A tool with lookaround support would work for all cases:
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?![a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?![a-z0-9]) are negative lookbehind and negative lookahead respectively.
This can be shortened to
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?![a-z0-9])/reverse $&/gie'
which uses the e modifier to place Perl code in substitution section. This form is suitable to easily change length of words to be reversed
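Possibly the shortest sed solution (GNU sed), which also works when a four-character word contains underscores. \w is used rather than . so that a match cannot include spaces or punctuation:
sed -r 's/\<(\w)(\w)(\w)(\w)\>/\4\3\2\1/g'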
The following awk may help as well. It was tested in GNU awk and only with the provided sample input; note that it hardcodes which fields to reverse (the second and the last).
echo "the year was 1815." |
awk '
function reverse(val){
num=split(val, array,"");
i=array[num]=="."?num-1:num;
for(;i>q;i--){
var=var?var array[i]:array[i]
};
printf (array[num]=="."?var".":var);
var=""
}
{
for(j=1;j<=NF;j++){
printf("%s%s",j==NF||j==2?reverse($j):$j,j==NF?RS:FS)
}}'
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method may be used to reverse strings of varying length, i.e. by changing the first regexp, strings of any length can be reversed. Strings between two lengths may also be reversed, e.g. /\<\w{2,4}\>/ will change all words between 2 and 4 characters long.
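It's a recurring problem, so there is a standard utility for it called rev (part of util-linux, not a bash builtin). Note that both commands below reverse every word, not only the four-character ones:
echo "$(echo the | rev) $(echo year | rev) $(echo was | rev) $(echo 1815 | rev)"
OR
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '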

print upto second last character in unix

If the length of a string is 5, how can I print up to its 4th character using shell scripting? I have stored the string in one variable and its length in another, but how can I print up to length - 1?
If you are using BASH then it is fairly straight forward to remove last character:
s="string1,string2,"
echo "${s%?}"
? matches any single character, and %? removes the shortest matching suffix, i.e. the last character.
That will output:
string1,string2
Otherwise you can use this sed to remove last character:
echo "$s" | sed 's/.$//'
string1,string2
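A minimal sketch of the suffix-removal expansion, wrapped in two hypothetical helpers (drop_last and drop_trailing_comma are names invented here). It uses only POSIX parameter expansion, so it should not be specific to bash.

```bash
# ${var%pattern} removes the shortest suffix matching pattern;
# ? matches exactly one character, so this drops the last character.
drop_last() { printf '%s\n' "${1%?}"; }

# Remove a trailing comma only when there is one.
drop_trailing_comma() { printf '%s\n' "${1%,}"; }

drop_last 'string1,string2,'             # prints: string1,string2
drop_last '12345'                        # prints: 1234
drop_trailing_comma 'string1,string2'    # prints: string1,string2
```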
You can do it with bash "parameter substitution":
string=12345
new=${string:0:$((${#string}-1))}
echo $new
1234
where I am saying:
new=${string:a:b}
where:
a=0 (meaning starting from the first character)
and:
b=$((${#string}-1)), i.e. the length of the string minus 1, computed in an arithmetic context, i.e. inside `$((...))`
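The same substring expansion can be written a little more compactly, because bash already evaluates the offset and length as arithmetic expressions. This is a sketch that assumes bash and a non-empty string; up_to_second_last is an illustrative name.

```bash
# ${var:offset:length}: start at offset 0 and take (length of string - 1)
# characters. The length field is evaluated arithmetically by bash.
up_to_second_last() { printf '%s\n' "${1:0:${#1}-1}"; }

up_to_second_last 12345        # prints: 1234
up_to_second_last something    # prints: somethin
```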
str="something"
echo $str | cut -c1-$((${#str}-1))
will give result as
somethin
If you have two different variables, then you can try this also.
str="something"
strlen=9
echo $str | cut -c1-$((strlen-1))
cut -c1-8 will print from first character to eighth.
Just for fun:
When you have the string and length in vars already,
s="example"
slen=${#s}
you can use
printf "%.$((slen-1))s\n" "$s"
As @anubhava showed, there is also a clean solution.
So do not try
rev <<< "${s}" | cut -c2- | rev

Print word between two characters by going backward in the line

I am having problems extracting a word from a line. What I want is to pick the first word before the symbol # but after the /, which is the only delimiter that stands out.
A line looks like this:
,["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]
I want the word Programming.
To get that line i am using this which narrows it down.
sed -n '/.*picasa.*.jpg/p' 5743548866439293105
So I want it to find # and then go backward until it hits the first /, then print what lies in between. In this case the word should be Programming, but it could be anything.
I want it to be as short as possible and have experimented with
sed -n '/.*picasa.*.jpg/p' 5743548866439293105 | awk '$0=$2' FS="/" RS="[$#]"
You can do that with sed (slightly shortened for formatting but works on your original string as well):
pax> echo ',["https://p.g.com/111/Prog#574' | sed 's/^[^#]*\/\([^#]*\)#.*$/\1/'
Prog
pax>
Explaining in more detail:
   /------+------------------> greedy capture up to '/'.
   |      |
   |      |/-------+---------> capture the stuff between '/' and '#'.
   |      ||       |
   |      ||       |/--+-----> everything from '#' to end of line.
   |      ||       ||  |
's/^[^#]*\/\([^#]*\)#.*$/\1/'
                         ||
                         \+---> replace with captured group.
It basically searches for an entire line that has the pattern you want (first # following a /), whilst capturing (with the \( and \) brackets) just the stuff between / and #.
The substitution then replaces the entire line with just that captured text you're interested in (via \1).
Using grep with some Perl regex extensions:
echo $string | grep -P -o "(?<=/)[^/]+(?=#)"
-P tells grep to use Perl extensions. -o tells grep to display only the matched text. To understand what gets matched, break the regex into three parts: (?<=/), [^/]+, and (?=#). The first part says that the matched text must follow a '/', without including the '/' in the match. The second part matches a string of non-'/' characters. The last part says that the matched text must be immediately followed by a '#', without including the '#' in the match.
Another grep, using the "\K" feature to "throw away" the match up to the last '/' before the '#':
# Match as much as possible up to a '/', but throw it away, then match as much as you can
# up to the last '#' (.+ is greedy; the sample line contains only one '#')
echo $string | grep -oP ".*/\K.+(?=#)"
Using cut and awk to get the first field (splitting on #) followed by the last field (splitting on /):
echo $string | cut -d# -f1 | awk -F/ '{print $NF}'
Using some temporary variables and bash's parameter expansion facilities:
$ FOO=["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]
$ BAR=${FOO%#*} # Strip the last # and everything after
$ echo $BAR
[https://picasaweb.google.com/111560558537332305125/Programming
$ BAZ=${BAR##*/} # Strip everything up to and including the last /
$ echo $BAZ
Programming
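The two expansions can be combined into one small function. This is only a sketch: word_before_hash is a hypothetical name, and the body runs in a subshell so the temporary variable does not leak into the caller.

```bash
# Print the text between the last '/' and the last '#' that follows it.
word_before_hash() (
  s=${1%#*}                   # strip the last '#' and everything after it
  printf '%s\n' "${s##*/}"    # strip everything up to and including the last '/'
)

word_before_hash ',["https://p.g.com/111/Prog#574'    # prints: Prog
```

If the argument contains no '#', the first expansion leaves it unchanged and the function simply prints whatever follows the last '/'.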
This might work for you:
sed '/.*\/\([^#]*\)#.*/{s//\1/;q};d' file
