Bash script: regexp reading numerical parameters from text file - bash

Greetings!
I have a text file with parameter set as follows:
NameOfParameter Value1 Value2 Value3 ...
...
I want to find the needed parameter by its NameOfParameter using a regexp pattern and return a selected Value to my Bash script.
I tried to do this with grep, but it returns the whole line instead of the Value.
Could you help me find an approach, please?

It was not clear whether you want all the values together or only one specific one. In either case, use the power of the cut command to cut the columns you want from the file: -f 2- will cut columns 2 and on (so everything except the parameter name), and -d " " will ensure that the columns are treated as space-separated as opposed to the default tab-separated.
egrep '^NameOfParameter ' your_file | cut -f 2- -d " "
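If only one specific value is needed, narrow the cut to that single column; a minimal sketch, assuming you want Value2 (the third space-separated column):
# hedged sketch: print only the third column (Value2) of the matching line
egrep '^NameOfParameter ' your_file | cut -f 3 -d " "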

Bash:
values=($(grep '^NameOfParameter ' your_file))
echo "${values[0]}" # NameOfParameter
echo "${values[1]}" # Value1
echo "${values[2]}" # Value2
# etc.
for value in "${values[@]:1}" # iterate over values, skipping NameOfParameter
do
echo "$value"
done
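An awk one-liner can also return just the selected value in a single call; a minimal sketch, assuming the file is named your_file and you want Value2:
# match the parameter name exactly in column 1 and print the second value
awk '$1 == "NameOfParameter" { print $3 }' your_file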

Related

How to prepend to a string that comes out of a pipe

I have two strings saved in a bash variable delimited by :. I want to extract the second string, prepend it with THIS_VAR=, and append it to a file named saved.txt.
For example if myVar="abc:pqr", THIS_VAR=pqr should be appended to saved.txt.
This is what I have so far,
myVar="abc:pqr"
echo $myVar | cut -d ':' -f 2 >> saved.txt
How do I prepend THIS_VAR=?
printf 'THIS_VAR=%q\n' "${myVar#*:}"
See Shell Parameter Expansion and run help printf.
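Put together for the question's example, a minimal sketch (the file name saved.txt is taken from the question):
myVar="abc:pqr"
# ${myVar#*:} strips everything up to and including the first ':'
printf 'THIS_VAR=%q\n' "${myVar#*:}" >> saved.txt    # appends THIS_VAR=pqr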
The more general solution, in addition to @konsolebox's answer, is piping into a compound statement, where you can perform arbitrary operations:
echo This is in the middle | {
    echo This is first
    cat
    echo This is last
}
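Applied to the question's pipeline, the compound statement can prepend the prefix before whatever arrives on the pipe; a minimal sketch:
myVar="abc:pqr"
# printf writes the prefix without a newline, then cat copies the piped value after it
echo "$myVar" | cut -d ':' -f 2 | { printf 'THIS_VAR='; cat; } >> saved.txt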

Display First and Last string entries stored in a variable

I have a variable MyVar with values stored in it. For example:
MyVar="123, 234, 345, 456"
Each entry in the variable is separated by a comma as in the example above.
I want to be able to pick the first and last entry from this variable, i.e. 123 and 456 respectively.
Any idea how I can achieve this from the command prompt terminal?
Thanks!
Using bash substring removal:
$ echo ${MyVar##*,}
456
$ echo ${MyVar%%,*}
123
Also:
$ echo ${MyVar/,*,/,}
123, 456
More for example here:
https://tldp.org/LDP/abs/html/parameter-substitution.html
Edit: The above rather expects the substrings to be separated by commas only. See the comments, where @costaparas gloriously demonstrates a case with , .
Try using sed:
MyVar="123, 234, 345, 456"
first=$(echo "$MyVar" | sed 's/,.*//')
last=$(echo "$MyVar" | sed 's/.*, //')
echo $first $last
Explanation:
To obtain the first string, we replace everything after & including
the first comma with nothing (empty string).
To obtain the last string, we replace everything before & including the last comma with nothing (empty string).
Using bash array:
IFS=', ' arr=($MyVar)
echo ${arr[0]} ${arr[-1]}
Where ${arr[0]} and ${arr[-1]} are your first and last respective values. Negative index requires bash 4.2 or later.
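For bash releases older than 4.2, the last element can still be reached by computing the index from the array length; a minimal sketch:
# last element without negative indexing (works on older bash)
echo "${arr[${#arr[@]}-1]}"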
You could also try the following with a recent BASH version: read the variable's values into an array and then retrieve the first and last elements, keeping all the other values saved in the array in case you need them later in the program.
IFS=', ' read -r -a array <<< "$MyVar"
echo "${array[0]}"
123
echo "${array[-1]}"
456
Awk alternative:
awk -F "(, )" '{ print $1" - "$NF }' <<< $MyVar
Set the field separator to command and a space. Print the first field and the last field (NF) with " - " in between.
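For the sample variable this prints:
$ awk -F "(, )" '{ print $1" - "$NF }' <<< "$MyVar"
123 - 456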

Output the value/word after one pattern has been found in string in variable (grep, awk, sed, pearl etc)

I have a program that prints data into the console like so (separated by space):
variable1 value1
variable2 value2
variable3 value3
varialbe4 value4
EDIT: Actually the output can look like this:
data[variable1]: value1
pre[variable2] value2
variable3: value3
flag[variable4] value4
In the end I want to search for a part of the name e.g. for variable2 or variable3 but only get value2 or value3 as output.
EDIT: This single value should then be stored in a variable for further processing within the bash script.
I first tried to put all the console output into a file and process it from there with e.g.
# value3_var="$(grep "variable3" file.log | cut -d " " -f2)"
This works fine but is too slow. I need to process ~20 of these variables per run, and this takes ~1-2 seconds on my system. Also, I need to do this for ~500 runs. EDIT: I actually do not need to process all of the ~20 'searches' automatically with one call of e.g. awk. If there is a way to do it automatically, that's fine, but ~20 calls in the bash script are fine here too.
Therefore I thought about putting the console output directly into a variable to remove the slow file access. But this will then eliminate the newline characters which then again makes it more complicated to process:
# console_output=$(./programm_call)
# echo $console_output
variable1 value1 variable2 value2 variable3 value3 varialbe4 value4
EDIT: IT actually looks like this:
# console_output=$(./programm_call)
# echo $console_output
data[variable1]: value1 pre[variable2] value2 variable3: value3 flag[variable4] value4
I found a solution for this kind of string arrangement, but it seems to only work with a text file. At least I was not able to use the string stored in $console_output with these examples:
How to print the next word after a found pattern with grep,sed and awk?
So, how can I output the next word after a found pattern, when providing a (long) string as variable?
PS: grep on my system does not know the parameter -P...
I'd suggest to use awk:
$ cat ip.txt
data[variable1]: value1
pre[variable2] value2
variable3: value3
flag[variable4] value4
$ cat var_list
variable1
variable3
$ awk 'NR==FNR{a[$1]; next}
{for(k in a) if(index($1, k)) print $2}' var_list ip.txt
value1
value3
To use output of another command as input file, use ./programm_call | awk '...' var_list - where - will indicate stdin as input.
This single value should then be stored in a variable for further processing within the bash script.
If you are doing further text processing, you could do it within awk and thus avoid a possible slower bash loop. See Why is using a shell loop to process text considered bad practice? for details.
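If you only need one value at a time in the shell, the same index() match can be done in a single awk call per variable; a minimal sketch (the shell variable name value3_var and the key variable3 are just placeholders):
# hypothetical: grab one value straight from the program output, stopping at the first match
value3_var=$(./programm_call | awk -v key='variable3' 'index($1, key) { print $2; exit }')
echo "$value3_var"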
Speed up suggestions:
Use LC_ALL=C awk '..' if input is ASCII (Note that as pointed out in comments, this doesn't apply for all cases, so you'll have to test it for your use case)
Use mawk if available, that is usually faster. GNU awk may still be faster for some cases, so again, you'll have to test it for your use case
Use ripgrep, which is usually faster than other grep programs.
$ ./programm_call | rg -No -m1 'variable1\S*\s+(\S+)' -r '$1'
value1
$ ./programm_call | rg -No -m1 'variable3\S*\s+(\S+)' -r '$1'
value3
Here, -o option is used to get only the matched portion. -r is used to get only the required text by replacing the matched portion with the value from the capture group. -m1 option is used to stop searching input once the first match is found. -N is used to disable line number prefix.
Exit after the first grep match, like so:
value3_var="$(grep -m1 "variable3" file.log | cut -d " " -f2)"
Or use Perl, also exiting after the first match. This eliminates the need for a pipe to another process:
value3_var="$(perl -le 'print $1, last if /^variable3\s+(.*)/' file.log)"
If I'm understanding your requirements correctly, how about feeding the output of programm_call directly to the awk script instead of assigning a shell variable?
./programm_call | awk '
# the following block is invoked line by line of the input
{
    a[$1] = $2
}
# the following block is executed after all lines are read
END {
    # please modify the print statement depending on your required output format
    print "variable1 = " a["variable1"]
    print "variable3 = " a["variable3"]
}'
Output:
variable1 = value1
variable3 = value3
As you see, the script can process all (~20) variables at once.
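If those printed values need to land in shell variables for further processing, one hedged option is to read the awk output back in a loop; a minimal sketch (the shell variable names var1 and var3 are just placeholders):
# hypothetical: parse the "name = value" lines printed by the awk script above
while read -r name _ value; do
    case $name in
        variable1) var1=$value ;;
        variable3) var3=$value ;;
    esac
done < <(./programm_call | awk '
    { a[$1] = $2 }
    END {
        print "variable1 = " a["variable1"]
        print "variable3 = " a["variable3"]
    }')
echo "var1=$var1 var3=$var3"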
[UPDATE]
Assumptions based on the provided information:
The ./programm_call prints approx. 50 pairs of "variable value"
variable and value are delimited by blank character(s)
variable may be enclosed with [ and ]
variable may be followed by :
We are interested in up to 20 variables out of the ~50 pairs
We use just one of the 20 variables at a time
We don't want to invoke ./programm_call every time we access just one variable
We want to access the variable values from within the bash script
We may use an associative array to fetch a value via its variable name
Then it will be convenient to read the variable-value pairs directly within the bash script:
#!/bin/bash
declare -A hash                # declare an associative array
while read -r key val; do      # read key (variable name) and value
    key=${key#*[}              # remove the leading "[" and the characters before it
    key=${key%:}               # remove trailing ":"
    key=${key%]}               # remove trailing "]"
    hash["$key"]="$val"        # store the key and value pair
done < <(./programm_call)      # feed the output of "./programm_call" to the loop
# then you can access the values via the variable name here
foo="${hash["variable2"]}"     # the variable "foo" is assigned "value2"
# do something here
bar="${hash["variable3"]}"     # the variable "bar" is assigned "value3"
# do something here
Some people criticize that bash is too slow to process text lines,
but we process just about 50 lines in this case. I tested a simulation by
generating 50 lines, processing the output with the script above,
repeating the whole process 1,000 times. It completed within a few seconds. (Meaning one batch ends within a few milliseconds.)
This is how to do the job efficiently AND robustly (your approach and all other current answers will produce false matches for some inputs and some values of the variables you want to search for):
$ cat tst.sh
#!/usr/bin/env bash
vars='variable2 variable3'
awk -v vars="$vars" '
    BEGIN {
        split(vars,tmp)
        for (i in tmp) {
            tags[tmp[i]":"]
            tags["["tmp[i]"]"]
            tags["["tmp[i]"]:"]
        }
    }
    $1 in tags || ( (s=index($1,"[")) && (substr($1,s) in tags) ) {
        print $2
    }
' "${@:--}"
$ ./tst.sh file
value2
value3
$ cat file | ./tst.sh
value2
value3
Note that the only loop is in the BEGIN section, where it populates a hash table (tags[]) with the strings from the input that could match your variable list. That way, while processing the input it doesn't have to loop; it just does a hash lookup of the current $1, which will be very efficient as well as robust (e.g. it will not fail on partial matches or even regexp metachars).
As shown, it'll work whether the input is coming from a file or a pipe. If that's not all you need then edit your question to clarify your requirements and improve your example to show a case where this does not do what you want.

Unix bash - using cut to regex lines in a file, match regex result with another similar line

I have a text file, file.txt, with several thousand lines. It contains a lot of junk lines which I am not interested in, so I use the cut command to regex for the lines I am interested in first. Each entry I am interested in will be listed twice in the text file: once in a "definition" section, and again in a "value" section. I want to retrieve the first value from the "definition" section, and then for each entry found there find its corresponding "value" section entry.
The first entry starts with ' gl_ ', while the 2nd entry would look like ' "gl_ ', starting with a '"'.
This is the code I have so far for looping through the text document, which then retrieves the values I am interested in and appends them to a .csv file:
while read -r line
do
if [[ $line == gl_* ]] ; then (param=$(cut -d'\' -f 1 $line) | def=$(cut -d'\' -f 2 $line) | type=$(cut -d'\' -f 4 $line) | prompt=$(cut -d'\' -f 8 $line))
while read -r glline
do
if [[ $glline == '"'$param* ]] ; then val=$(cut -d'\' -f 3 $glline) |
"$project";"$param";"$val";"$def";"$type";"$prompt" >> /filepath/file.csv
done < file.txt
done < file.txt
This seems to throw some syntax errors related to unexpected tokens near the first 'done' statement.
Example of text that needs to be parsed, and paired:
gl_one\User Defined\1\String\1\\1\Some Text
gl_two\User Defined\1\String\1\\1\Some Text also
gl_three\User Defined\1\Time\1\\1\Datetime now
some\junk
"gl_one\1\Value1
some\junk
"gl_two\1\Value2
"gl_three\1\Value3
So effectively, the while loop reads each line until it hits the first line that starts with 'gl_', which then stores that value (ie. gl_one) as a variable 'param'.
It then starts the nested while loop that looks for the line that starts with a ' " ' in front of the gl_, and is equivalent to the 'param' value. In other words, the
script should couple the lines gl_one and "gl_one, gl_two and "gl_two, gl_three and "gl_three.
The text file is large, and these are settings that have been defined this way. I need to collect the values for each gl_ parameter, to save them together in a .csv file with their corresponding "gl_ values.
Wanted regex output stored in variables would be something like this:
first while loop:
$param = gl_one, $def = User Defined, $type = String, $prompt = Some Text
second while loop:
$val = Value1
Then it stores these variables to the file.csv, with semi-colon separators.
Currently, I have an error for the first 'done' statement, which seems to indicate an issue with the quotation marks. Apart from this, I am looking for general ideas and comments on the script. I.e., I am not entirely sure I am matching the quotation-mark parameters "gl_ correctly, or whether the semi-colons are added correctly as .csv separators.
Edit: Overall, the script runs now, but extremely slow due to the inner while loop. Is there any faster way to match the two lines together and add them to the .csv file?
Any ideas and comments?
This will generate a file containing the data you want:
cat file.txt | grep gl_ | sed -E "s/\"//" | sort | sed '$!N;s/\n/\\/' | awk -F'\' '{print $1"; "$5"; "$7"; "$NF}' > /filepath/file.csv
It uses grep to extract all lines containing 'gl_'
then sed to remove the leading '"' from the lines that contain one [I have assumed there are no further '"' in the line]
The lines are sorted
sed removes the return from each pair of lines
awk then prints the required columns according to your requirements
Output is routed to the file.
LANG=C sort -t\\ -sd -k1,1 <file.txt |\
sed '
/^gl_/{ # if definition
N; # append next line to buffer
s/\n"gl_[^\\]*//; # if value, strip first column
t; # and start next loop
}
D; # otherwise, delete the line
' |\
awk -F\\ -v p="$project" -v OFS=\; '{print p,$1,$10,$2,$4,$8 }' \
>>/filepath/file.csv
sort lines so gl_... appears immediately before "gl_... (LANG fixes LC_TYPE) - assumes definition appears before value
sed to help ensure matching definition and value (may still fail if duplicate/missing value), and tidy for awk
awk to pull out relevant fields
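A further hedged alternative is to let awk itself pair definitions with values by parameter name, so the result does not depend on sort order; a minimal sketch (field numbers follow the question's example, $project is assumed to be set in the shell, and the output order of the for loop is not guaranteed):
awk -F'\\' -v p="$project" -v OFS=';' '
    /^"gl_/ { val[substr($1, 2)] = $3; next }    # value lines: key is $1 without the leading quote
    /^gl_/  { def[$1] = $2 OFS $4 OFS $8 }       # definition lines: keep def, type and prompt
    END {
        for (k in def)
            print p, k, val[k], def[k]           # project;param;val;def;type;prompt
    }
' file.txt >> /filepath/file.csv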

String capturing and print the next characters.

I have tried a few options, but they are not working in my case. My requirement is:
Suppose I have a parameter in a file and I want to capture its details as below and run a shell script (ksh).
PARAMETR=aname1:7,aname2:5
The parameter contains 2 values delimited by a comma and each value separated by a colon.
So I want to process it such that if the matched string is aname1, both parts are printed into different variables: $v1=aname1 and $v2=7. The same applies to the other value: if the searched string is aname2, then $v1=aname2 and $v2=5.
Thank you in advance.
This will do what you're asking for:
#!/bin/ksh
typeset -A valueArray
PARAMETR=aname1:7,aname2:5
paramArray=(${PARAMETR//,/ })
for ((i=0; i<${#paramArray[@]}; i++)); do
    valueArray[${paramArray[$i]%:*}]=${paramArray[$i]#*:}
done
for j in "${!valueArray[@]}"; do
    print "$j = ${valueArray[$j]}"
done
Hope it can help
First split the line into two sets and then process each set.
echo "${PARAMETR}" | tr "," "\n" | while IFS=: read -r v1 v2; do
echo "v1=$v1 and v2=$v2"
done
Result:
v1=aname1 and v2=7
v1=aname2 and v2=5
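If, as asked, only the pair whose name matches a given string should be reported, the same split can feed a small lookup; a minimal sketch that works in ksh and bash (the helper name lookup_param is hypothetical):
# hypothetical helper: print v1/v2 only for the requested name in PARAMETR
lookup_param() {
    name=$1
    echo "${PARAMETR}" | tr "," "\n" | while IFS=: read -r v1 v2; do
        if [ "$v1" = "$name" ]; then
            echo "v1=$v1 and v2=$v2"
        fi
    done
}

PARAMETR=aname1:7,aname2:5
lookup_param aname1    # prints: v1=aname1 and v2=7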
