I cannot understand this grep pattern - bash

file="value"
here is the content of value:
first col1 col2 col3
second col1 col2 col3
three col1 col2 col3
wanted output:
first col1 col2 col3
three col1 col2 col3
condition: we work with variables not with files directly. the search pattern should be stored in a variable as well.
In order to have this result, I do:
var=second
grep -v $var $file > $tmp;mv $tmp $file
tmp must be initialized with some value for this to work.
But I came across this and I just do not understand my bash anymore:
grep -v "^${var}," $file > $tmp;mv $temp $file
Apart from the fact that it does not work for me, I would like to understand the meaning or at least what it is intended to do; maybe I can arrange it after.
What does "^${var}," mean?
what I know so far:
^ -> start of line
$ -> end of line
^$ -> ?
{} -> match exactly
, -> ?
all these symbols together -> ?
Any ideas, folks? Thank you so much.

Your command is using double quotes, which means that bash gets to play with the argument first.
In the following:
grep -v "^${var}," $file > $tmp;mv $temp $file
...with:
var=foo
...bash will expand ${var} to foo, resulting in:
grep -v "^foo," $file > $tmp;mv $temp $file
(it'll also expand $file, $tmp and $temp (?), but that's not interesting)
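To see what the expanded pattern does with the sample data from the question, here is a minimal sketch. It assumes a scratch file created with mktemp (not part of the original question) and hypothetical result variables with_comma and with_space. Because the sample lines are separated by spaces, not commas, "^second," matches nothing and grep -v keeps every line; anchoring on a space instead removes the intended line.

```shell
#!/usr/bin/env bash
# Scratch file holding the sample data from the question (path from mktemp is an assumption).
file=$(mktemp)
printf '%s\n' 'first col1 col2 col3' 'second col1 col2 col3' 'three col1 col2 col3' > "$file"

var=second

# The pattern expands to "^second,": no line has a comma right after "second",
# so nothing matches and -v keeps all three lines.
with_comma=$(grep -v "^${var}," "$file")

# The pattern expands to "^second ": it matches the second line, which -v drops.
with_space=$(grep -v "^${var} " "$file")

printf 'comma pattern:\n%s\n\nspace pattern:\n%s\n' "$with_comma" "$with_space"
rm -f "$file"
```

The "^${var}," form would be the right choice for comma-separated data, where the first field is followed by a comma.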

To add to Roger's answer,
writing the variable name between {} within a string tells bash exactly which part is the variable name and which part is literal text.
For instance, if you write it like this:
"^$var,"
bash still expands it correctly here, because a comma cannot be part of a variable name, so the name ends at var. The braces become necessary when the next character could belong to a name: "^$var_x" would look up a variable named var_x, which is probably unset, and the result would be just "^".
If you use curly braces like this:
"^${var},"
bash looks up the variable var unambiguously and keeps the ^ in front of it and the , after it, giving "^<value of var>," after expansion.
so with
var=foo
you would get "^foo," as your grep pattern
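A short sketch of where the braces do and do not matter. The variable names a, b, c, d and the look-alike name var_x are only for illustration.

```shell
#!/usr/bin/env bash
var=foo
unset var_x       # make sure the look-alike name is not set

a="^$var,"        # a comma cannot be part of a name, so this is $var followed by ","
b="^${var},"      # same result, with the name delimited explicitly
c="^$var_x"       # bash reads the name as var_x, which is unset, leaving just "^"
d="^${var}_x"     # the braces end the name at var, giving "^foo_x"

printf '%s\n' "$a" "$b" "$c" "$d"
```

The first two lines print identically; only the third and fourth differ, which is the case the braces exist for.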

Related

Are conditional group substitutions possible in `jq` ? if so how?

Consider the situation:
echo '"field:bla"\n"field:"' \
| jq ' . | gsub( "^field:(?<val>.+?)?$" ; "(?(val)value=\(.val)|NULL)" )'
We're taking in a list of strings of the form: field:<VALUE>, where <VALUE> can be either '' (empty), or one or more characters.
The objective is to return: NULL if <VALUE> is '' (empty), or value=<VALUE> if non-empty.
The question is: can jq do this using conditional substitution, based on whether or not the group <val> is set? If so, what is the syntax? Or is this not supported?
PS: this is not about whether the result can be obtained some other way; I'm just wondering whether jq's gsub supports conditional group substitution and, if so, what the right syntax is.
The closest you can get to a "conditional substitution" within sub would be along these lines:
(echo "field:bla"; echo "field:") |
jq -rR 'sub( "^field:(?<val>.*)$" ; "value=\(.val | if length==0 then "NULL" else . end )" ) '
This produces:
value=bla
value=NULL
You might also like to consider this alternative, which produces the same result:
(echo "field:bla"; echo "field:") |
jq -rR 'sub( "^(field:(?<val>.+)|field:)$" ; "value=\(.val // "NULL")" )'
In both these cases, replacing sub by gsub has no effect on the results.

Test if a value is in a csv file in bash

I have a 3-4M lines csv file (my_csv.csv) with two columns as :
col1,col2
val11,val12
val21,val22
val31,val32
...
The csv contains only two columns, with one comma per line. Col1 and Col2 values are only strings (nothing else). The lines shown above are the output of the command head my_csv.csv.
I would like to check whether a string test_str is among the col2 values. What I mean is: if test_str = val12, I would like the test to return True, because val12 is located in column 2 (as shown in the example).
But if test_str = val1244, I want the code to return False.
In Python it would be something like:
import pandas as pd
df = pd.read_csv('my_csv.csv')
test_str = 'val42'
if test_str in df['col2'].to_list():
    # Expected to return true
    # Do the job
    pass
But I have no clue how to do it in bash.
(I know that df['col2'].to_list() is not a good idea, but I didn't want to use built-in pandas function for the code to be easier to understand)
awk is the best suited of the usual shell utilities for handling csv data:
awk -F, -v val='val22' '$2 == val {print "found a match:", $0}' file
found a match: val21,val22
An equivalent bash loop would be like this:
while IFS=',' read -ra arr; do
    if [[ ${arr[1]} == 'val22' ]]; then
        echo "found a match: ${arr[@]}"
    fi
done < file
But do keep in mind that a bash while read loop is extremely slow on a file of this size; see "Bash while read loop extremely slow compared to cat, why?"
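If the goal is a true/false test rather than printed matches, the awk one-liner can report through its exit status instead. This is a minimal sketch, assuming a hypothetical helper named in_col2 and a scratch CSV created with mktemp; neither appears in the original answer.

```shell
#!/usr/bin/env bash
# Scratch CSV mirroring the layout in the question (path from mktemp is an assumption).
csv=$(mktemp)
printf '%s\n' 'col1,col2' 'val11,val12' 'val21,val22' 'val31,val32' > "$csv"

# Hypothetical helper: succeeds (exit 0) if $1 equals some value in column 2 of file $2.
in_col2() {
    # "exit" in the main rule jumps to END, which turns the flag into an exit status.
    awk -F, -v val="$1" '$2 == val { found = 1; exit } END { exit !found }' "$2"
}

if in_col2 val12 "$csv"; then r1=True; else r1=False; fi
if in_col2 val1244 "$csv"; then r2=True; else r2=False; fi

echo "val12: $r1, val1244: $r2"
rm -f "$csv"
```

Stopping at the first match means awk does not have to read the rest of a multi-million-line file once the value is found.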
Parsing CSV is difficult, unless your fields contain no commas, newlines and so on. And you should not do this in pure bash: on a large file it will be extremely slow. Use utilities like awk or grep instead, which are also available from dash, zsh or any other shell. So, if you have a very simple CSV format, you can use, e.g., grep:
if grep -q ',val42$' my_csv.csv; then
    <do that>
fi
We can also put the string to search for in a variable, but remember that some characters have a special meaning in regular expressions and must be escaped. Example, assuming there are no special characters in the string to search for:
test_str="val42"
if grep -q ",$test_str$" my_csv.csv; then
<do that>
fi
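The escaping concern can be sidestepped by not using a regular expression at all. The following is a sketch under that assumption: cut extracts the second column and grep -F -x compares each value as a fixed string against the whole line. The scratch file and the variables regex_hit and fixed_hit are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Scratch CSV (path from mktemp is an assumption, not from the original post).
csv=$(mktemp)
printf '%s\n' 'col1,col2' 'val11,val12' 'val21,val22' > "$csv"

# A search string containing a regex metacharacter: "." matches any character.
test_str='val.2'

# Regex search: "val.2" also matches "val12", so this reports a false positive.
if grep -q ",$test_str$" "$csv"; then regex_hit=yes; else regex_hit=no; fi

# Fixed-string search on column 2 only: -F disables regex syntax, -x requires
# the whole value to match, -q suppresses output, -- ends option parsing.
if cut -d, -f2 "$csv" | grep -Fxq -- "$test_str"; then fixed_hit=yes; else fixed_hit=no; fi

echo "regex: $regex_hit, fixed string: $fixed_hit"
rm -f "$csv"
```

Note that the header line is still part of the input here, so searching for the literal text col2 would succeed.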
3-4M rows is a small file for awk. You might as well just do
{m,g}awk 'END { exit !index($_,","(__)"\n") }' RS='^$' FS='^$' __="${test_str}"

How to assign the output of this execution to a variable

I am a newbie in shell scripting, so this may be a stupid question for the experts. I am using the following code to remove leading and trailing spaces from a value. How do I assign the output of the echo pipeline back to the StringVar variable, or to another variable? I am using the ksh shell.
StringVar= ' abc '
echo StringVar | awk '{$1=$1};1'
x=$(command)
where x is the variable to which you want to assign the output of the command.
In your case, do not put a space before or after the assignment operator =, and remember the $ when reading StringVar:
StringVar=' abc '
x=$(echo "$StringVar" | awk '{$1=$1};1')
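Assigning back to the same variable works the same way. A minimal sketch, using a sample value with extra inner spaces (an assumption, chosen to show a side effect): because awk rebuilds the line from its fields, runs of whitespace inside the value are collapsed to single spaces as well.

```shell
#!/usr/bin/env bash
StringVar='   abc   def   '

# Command substitution captures awk's output; $1=$1 makes awk rebuild the
# record from its fields, joined by single spaces.
StringVar=$(printf '%s\n' "$StringVar" | awk '{$1=$1};1')

printf '[%s]\n' "$StringVar"
```

The $(...) form is also understood by ksh, so the same assignment should carry over to the asker's shell.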

Change delimiter in string type of variable in bash

I have a small task.
I have this variable:
data="duke,rock,hulk,donovan,john"
And in the next variable, I should change the delimiter of the first variable:
data2="duke|rock|hulk|donovan|john"
What is the correct way to do this in bash?
This is a small part of a script I have to write.
For example, I use a while-getopts-case construct to take usernames as parameters and exclude them:
ls /home/ | egrep -v $data2
You can easily replace a single character with an expansion:
data="duke,rock,hulk,donovan,john"
data2=${data//,/|}
echo "$data2"
Breaking down the syntax:
${data means "expand based on the value found in the variable data";
// means "search for all occurrences of" the pattern that follows (here, the comma);
the lone / means "replace each one with what follows" (here, the vertical bar).
Note that some characters may need to be escaped, but not the comma and vertical bar.
Then you may filter the results like this:
ls /home/ | egrep -v "$data2"
Another very similar way would be to use tr (translate or delete characters):
data="duke,rock,hulk,donovan,john"
data2=$(echo "$data" | tr ',' '|')
echo "$data2"
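One thing to watch when the converted string is used as a filter: an unanchored alternation matches substrings, so a user called johnny would be excluded along with john. A hedged sketch of the difference, using a made-up list of names in place of the real ls /home/ output:

```shell
#!/usr/bin/env bash
data="duke,rock,hulk,donovan,john"
data2=${data//,/|}

# Hypothetical directory listing standing in for `ls /home/`.
names=$(printf '%s\n' duke johnny rock alice)

# Unanchored: any line containing one of the names is dropped, including johnny.
loose=$(printf '%s\n' "$names" | grep -Ev "$data2")

# -x: a line is dropped only if the whole line equals one of the alternatives.
strict=$(printf '%s\n' "$names" | grep -Evx "$data2")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Whether substring matching is wanted depends on the task; -x is simply the stricter reading of "exclude these usernames".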

Bash script: regexp reading numerical parameters from text file

Greetings!
I have a text file with parameter set as follows:
NameOfParameter Value1 Value2 Value3 ...
...
I want to find the needed parameter by its NameOfParameter using a regexp pattern and return a selected Value to my Bash script.
I tried to do this with grep, but it returns the whole line instead of the Value.
Could you help me find an approach, please?
It is not clear whether you want all the values together or only one specific value. In either case, use the cut command to select the columns you want from the file: -f 2- selects columns 2 onward (everything except the parameter name), and -d " " makes cut treat the columns as space-separated instead of the default tab-separated.
egrep '^NameOfParameter ' your_file | cut -f 2- -d " "
Bash:
values=($(grep '^NameOfParameter ' your_file))
echo "${values[0]}" # NameOfParameter
echo "${values[1]}" # Value1
echo "${values[2]}" # Value2
# etc.
for value in "${values[@]:1}" # iterate over values, skipping NameOfParameter
do
    echo "$value"
done
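If only one specific value is needed, awk can do the lookup and the column selection in one step. This is a sketch, not taken from the answers above: the helper name get_value, the sample parameter names and the scratch file are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Scratch parameter file in the "Name Value1 Value2 ..." layout (path from mktemp is an assumption).
params=$(mktemp)
printf '%s\n' 'Alpha 1 2 3' 'Beta 10 20 30' > "$params"

# Hypothetical helper: print the n-th value (1-based) of the named parameter.
# $1 = parameter name, $2 = value index, $3 = file
get_value() {
    # Field 1 is the name, so value n lives in field n + 1.
    awk -v name="$1" -v n="$2" '$1 == name { print $(n + 1); exit }' "$3"
}

second=$(get_value Beta 2 "$params")
echo "$second"
rm -f "$params"
```

Comparing the first field with == instead of a regular expression avoids matching parameters whose names merely start with the same text.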

Resources