Bash what is the return value of grep and cut - bash

when I write something like this:
x = `grep "#include $1 | cut -f2"`
or any use with grep, cut like:
x = `grep string file.c`
I don't understand if x is an array or one long string? because when I write
echo ${#x[*]}
it prints 1, but I can write:
for d in `grep....`
as it was an array, please explain.

It is giving you a single long string. This is not called the "return value" of grep or cut, but rather the "standard output" (the text they print which you capture with backticks or perhaps clearer with $(...)).
What's happening here is that you get one single string, perhaps even with newlines inside, and then you iterate over it with your for d in .... In Bash, iterating over a string splits on spaces, so you get one d value for each word. Try this to see it in action, plus a way to avoid it:
x="foo bar baz"
for d in $x; do echo $d; done
for d in "$x"; do echo $d; done
If you quote in the loop, splitting on spaces will not occur.

x is a string.
In your example for loops through words.

Related

Looping through variable with spaces

This piece of code works as expected:
for var in a 'b c' d;
do
echo $var;
done
The bash script loops through 3 arguments printing
a
b c
d
However, if this string is read in via jq , and then looped over like so:
JSON_FILE=path/to/jsonfile.json
ARGUMENTS=$(jq -r '.arguments' "${JSON_FILE}")
for var in ${ARGUMENTS};
do
echo $var;
done
The result is 4 arguments as follows:
a
'b
c'
d
Example json file for reference:
{
"arguments" : "a 'b c' d"
}
What is the reason for this? I tried putting quotes around the variable like suggested in other SO answers but that caused everything to just be handled as 1 argument.
What can I do to get the behavior of the first case (3 arguments)?
What is the reason for this?
The word splitting expansion is run over unquoted results of other expansions. Because ${ARGUMENTS} expansion in for var in ${ARGUMENTS}; is unquoted, word splitting is performed. No, word splitting ignores quotes resulted from variable expansion - it only cares about whitespaces.
What can I do to get the behavior of the first case (3 arguments)?
The good way™ would be to write your own parser, to parse the quotes inside the strings and split the argument depending on the quotes.
I advise to use xargs, it (by default, usually a confusing behavior) parses quotes in the input strings:
$ arguments="a 'b c' d"
$ echo "${arguments}" | xargs -n1 echo
a
b c
d
# convert to array
$ readarray -d '' arr < <(<<<"${arguments}" xargs printf "%s\0")
As presented in the other answer, you may use eval, but please do not, eval is evil and will run expansions over the input string.
Change IFS to a new line to make it work:
...
IFS='\n'; for var in $ARGUMENTS;
do
echo $var;
done

In bash how can I get the last part of a string after the last hyphen [duplicate]

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
echo "${A##* }"
abc.123
Longest match on " " space
echo "${A% *}"
Some variable has value
Longest match on . dot
echo "${A%.*}"
Some variable has value abc
Shortest match on " " space
echo "${A%% *}"
some
Read more Shell-Parameter-Expansion
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
How do you know where the value begins? If it's always the 5th and 6th words, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here. A very clean method that works on all Unix-based systems. Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means: "match 0 or more instances of the preceding match pattern (which is [^ ]), and the $ means "match the end of the line." So, this matches the last word after the last space through to the end of the line; ie: abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements being separated by the default IFS (Internal Field Separator) char, which is space:
Option 1 (will "break in mysterious ways", as #tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -d '' -a A_array <<< "$A"
Then, print only the last elment in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What makes this pattern so useful too is that it allows you to easily do the opposite too!: obtain all words except the last one, like this:
array_len="${#A_array[#]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[#]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[#]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info. on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current local (Usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'

I want to extract the strings from file name

one_two_three_four_five.rtf
I need five in A variable
I need four in B variable
And remaining in C variable
Should read from the last character
Note after 2 underscore from the last. There could be many underscores but should take has C variable.
Is it possible?
For example using parameter expansion
#!/bin/ksh
string="one_two_three_four_five.rtf"
base=${string%.rtf}
a=${base##*_}; base=${base%_$a}
b=${base##*_}; base=${base%_$b}
c=$base
echo "$a - $b - $c"
s="one_two_three_four_five.rtf"
source <(sed -r 's/(.*)_([^_]*)_([^_]*)[.].*/C="\1"; B="\2";A="\3"/' <<< "${s}")
# Result:
echo "A=$A, B=$B, C=$C"
A=five, B=four, C=one_two_three
Explanation:
sed -r No need for escaping backslashes
(.*)_ Matches largest string until underscore with the condition that there are underscores left for matching the remaining string
([^_]*) String without underscore
[.] A dot without special meaning
"\1" First remembered string
<<< "${s}" Input for sed is like echo "${s}" | sed ...
<(..) Simulates a file, so sourcing these will execute the commands.

How do I recursively replace part of a string with another given string in bash?

I need to write bash script that converts a string of only integers "intString" to :id. intString always exists after /, may never contain any other types (create_step2 is not a valid intString), and may end at either a second / or end of line. intString may be any 1-8 characters. Script needs to be repeated for every line in a given file.
For example:
/sample/123456/url should be converted to /sample/:id/url
and /sample_url/9 should be converted to /sampleurl/:id however /sample_url_2/ should remain the same.
Any help would be appreciated!
It seems like the long way around the problem to go recursive but then I don't know what problem you are solving. It seems like a good sed command like
sed -E 's/\/[0-9]{1,}/\/:id/g'
could do it in one shot, but if you insist on being recursive then it might go something like this ...
#!/bin/bash
function restring()
{
s="$1"
s="$(echo $s | sed -E 's/\/[0-9]{1,}/\/:id/')"
if ( echo $s | grep -E '\/[0-9]{1,}' > /dev/null ) ; then
restring $s
else
echo $s
exit
fi
echo $s
}
restring "$1"
now run it
$ ./restring.sh "/foo/123/bar/456/baz/45435/andstuff"
/foo/:id/bar/:id/baz/:id/andstuff

Bash - extracting a string between two points

For example:
((
extract everything here, ignore the rest
))
I know how to ignore everything within, but I don't know how to do the opposite. Basically, it'll be a file and it needs to extract the data between the two points and then output it to another file. I've tried countless approaches, and all seem to tell me the indentation I'm stating doesn't exist in the file, when it does.
If somebody could point me in the right direction, I'd be grateful.
If your data are "line oriented", so the marker is alone (as in the example), you can try some of the following:
function getdata() {
cat - <<EOF
before
((
extract everything here, ignore the rest
someother text
))
after
EOF
}
echo "sed - with two seds"
getdata | sed -n '/((/,/))/p' | sed '1d;$d'
echo "Another sed solution"
getdata | sed -n '1,/((/d; /))/,$d;p'
echo "With GNU sed"
getdata | gsed -n '/((/{:a;n;/))/b;p;ba}'
echo "With perl"
getdata | perl -0777 -pe "s/.*\(\(\s*\\n(.*)?\)\).*/\$1/s"
Ps: yes, its looks like a dance of crazy toothpicks
Assuming you want to extract the string inside (( and )):
VAR="abc((def))ghi"
echo "$VAR"
VAR=${VAR##*((}
VAR=${VAR%%))*}
echo "$VAR"
## cuts away the longest string from the beginning; # cuts away the shortest string from the beginning; %% cuts away the longest string at the end; % cuts away the shortes string at the end
The file :
$ cat /tmp/l
((
extract everything here, ignore the rest
someother text
))
The script
$ awk '$1=="((" {p=1;next} $1=="))" {p=o;next} p' /tmp/l
extract everything here, ignore the rest
someother text
sed -n '/^((/,/^))/ { /^((/b; /^))/b; p }'
Brief explanation:
/^((/,/^))/: range addressing (inclusive)
{ /^((/b; /^))/b; p }: sequence of 3 commands
1. skip line with ^((
2. skip line with ^))
3. print
The line skipping is required to make the range selection exclusive.

Resources