Shell Script - Extract number at X column in current line in file

I am reading a file (test.log.csv) line by line until the end of the file, and I want to extract the value in the 4th column of each line read, then write that value to a text file (output.txt).
For example, when I have read as far as the 2nd line (INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1), I want to extract the number in column 4 of that line and append it to output.txt.
test.log.csv
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
The desired output is
output.txt
1127192896
1127192896
1127192896
Right now my script is as below
#!/bin/bash
clear
rm /home/mobaxterm/Script/output.txt
while IFS= read -r line
do
    if [[ $line == *"INSERT"* ]] && [[ $line == *"$1"* ]]
    then
        echo $line >> /home/mobaxterm/Script/output.txt
        lastID=$(awk -F "," '{if (NR==curLine) { print $4 }}' curLine="${lineCount}")
        echo $lastID
    else
        if [ lastID == "$1" ]
        then
            echo $line >> /home/mobaxterm/Script/output.txt
        fi
    fi
    lineCount=$(($lineCount+1))
done < "/home/mobaxterm/Script/test.log.csv"
The parameter ($1) will be 1127192896
I tried declaring a counter in the loop and comparing NR with the counter, but the script just stopped after it found the first match.

Find all the lines where the 4th field is 1127192896 and output the 4th field:
awk -F, -v SEARCH="1127192896" '$4 ~ SEARCH {print $4}' test.log.csv
1127192896
1127192896
1127192896
Find all the lines containing the word "INSERT" and where the 4th field is 1127192896
awk -F, -v SEARCH="1127192896" '$4 ~ SEARCH && /INSERT/ {print $4}' test.log.csv
If you have the number you want to look for in a variable called $1, put that in place of the 1127192896, like this:
awk -F, -v SEARCH="$1" '$4 ~ SEARCH && /INSERT/ {print $4}' test.log.csv
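Note that ~ is a regex match; if the field must equal the number exactly, use $4 == SEARCH instead. As a minimal sketch of how this could replace the whole read loop (reusing the hard-coded paths from the question; untested against your real log):
#!/bin/bash
# Rebuild output.txt from scratch, letting awk do the per-line work.
out=/home/mobaxterm/Script/output.txt
log=/home/mobaxterm/Script/test.log.csv

rm -f "$out"
awk -F, -v id="$1" '/INSERT/ && $4 == id {print $4}' "$log" > "$out"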

You can combine parameter expansion and array assignment:
array_variable=( ${line//,/ } )    # replace each comma with a space; word splitting builds the array
sth_you_need=${array_variable[3]}  # bash arrays are 0-indexed, so [3] is the 4th field
Or you can just use awk/cut:
sth_you_need=$(echo $line | awk -F, '{print $4}')
# or
sth_you_need=$(echo $line | cut -d, -f4)
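For example, applying the array approach to one of the sample lines (assuming the fields themselves contain no spaces, so the unquoted word splitting is safe):
line='INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1'
array_variable=( ${line//,/ } )
echo "${array_variable[3]}"    # prints 1127192896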

Related

How to replace words that have a number at the back with the nth letter of the matching word, in bash

I have a file for example with the name file.csv and content
adult,REZ
man,BRB
women,SYO
animal,HIJ
and a command line whose arguments (apart from the file) are neither directories nor files:
file.csv BRB1 REZ3 SYO2
What I want to do is look up each capitalized word (minus its trailing number) in the second column of the file, and then get the nth letter of the matching first-column word, where n is the number at the end of that capitalized word,
and the output should then be
umo
I know that I can loop over the arguments with
for i in "${@:2}"
do
    words+=$(echo "$i ")
done
and then the output is
REZ3 BRB1 SYO2
Using awk:
Pass the string of values as an awk variable and then split it into an array a. For each record in file.csv, iterate over this array; if the second field of the current record matches the first three characters of the current array value, take the character at the indicated position from the first field of the current record and append it to a variable. Print the aggregated variable at the end.
awk -v arr="BRB1 REZ3 SYO2" -F, 'BEGIN{split(arr,a," ")} {for (v in a) { if ($2 == substr(a[v],1,3)) {n=substr(a[v],length(a[v]),1); w=w""substr($1,n,1) }}} END{print w}' file.csv
umo
You can also put this into a script:
#!/bin/bash
words="${2}"
src_file="${1}"
awk -v arr="$words" -F, '
BEGIN { split(arr, a, " ") }
{
    for (v in a) {
        # awk strings are 1-indexed: the first 3 characters are the lookup code,
        # the final character is the 1-based position of the letter to extract
        if ($2 == substr(a[v], 1, 3)) {
            n = substr(a[v], length(a[v]), 1)
            w = w "" substr($1, n, 1)
        }
    }
}
END { print w }' "$src_file"
Script execution:
./script file.csv "BRB1 REZ3 SYO2"
umo
This is a way using sed.
Build a sed program from the command-line arguments, then convert the lines with it.
#!/bin/bash
file="$1"
pat='s/^/ /;Te;'
for i in "${@:2}"; do
    pat+=$(echo "$i" | sed 's#^\([^0-9]*\)\([0-9]*\)$#s/.\\{\2\\}\\(.\\).*,\1$/\\1/;#')
done
pat+='Te;H;:e;${x;s/\n//g;p}'
eval "sed -n '$pat' $file"
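To make the generated program less cryptic, this is roughly what pat contains for the sample arguments "BRB1 REZ3 SYO2" (annotated; the # comments are mine, not part of the program):
s/^/ /;Te;                    # prepend a space so the Nth letter sits at position N+1
s/.\{1\}\(.\).*,BRB$/\1/;     # BRB1: keep the 1st letter of the word before ,BRB
s/.\{3\}\(.\).*,REZ$/\1/;     # REZ3: keep the 3rd letter
s/.\{2\}\(.\).*,SYO$/\1/;     # SYO2: keep the 2nd letter
Te;H;:e;${x;s/\n//g;p}        # collect each result in the hold space; at EOF join and print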
Try this code:
#!/bin/bash
declare -A idx_dic
filename="$1"
pattern_string=""
for i in "${@:2}"
do
    pattern_words=$(echo "$i" | grep -oE '[A-Z]+')
    index=$(echo "$i" | grep -oE '[0-9]+')
    pattern_string+="$pattern_words|"
    idx_dic["$pattern_words"]="$index"
done
pattern_string=${pattern_string%|*}    # drop the trailing |
while IFS= read -r line
do
    line_pattern=$(echo "$line" | grep -oE "$pattern_string")
    [[ -n $line_pattern ]] && line_index="${idx_dic[$line_pattern]}" &&
        echo "$line" | awk -v i="$line_index" '{split($0, chars, ""); printf("%s", chars[i])}'
done < "$filename"
First find the capital-letter word in each argument and record the index that goes with it.
Then construct the whole pattern string, joining the words with |.
At last, iterate over every line, match it against the pattern string, and pick out the letter at the recorded index.
Execute the script like:
bash script.sh file.csv BRB1 REZ3 SYO2

Filter lines based on certain string and then print only some attributes greater

I have a big text file with millions of log lines.
I would like to filter all the lines which satisfy the following criteria:
the url should be url=/v2/testB
the totaltime value should be greater than 500
INFO|id=1|totaltime=5000|httpmethod=POST|url=/v1/testA
INFO|id=2|totaltime=200|httpmethod=POST|url=/v2/testB
INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB
INFO|id=4|totaltime=501|httpmethod=POST|url=/v2/testB
Result:
id=3,totaltime=1000
id=4,totaltime=501
I have tried using multiple awks with an if block; I wonder if it can be done more simply and quickly? Thanks!
while IFS= read -r line; do
    value=`echo $line | grep "url=/v2/testB" | awk -F"totaltime=" '{ print $2 }' | awk -F"|" '{ print $1 }'`
    if (( $value > 500 )); then
        echo $line
    fi
done < file.log
You may use this awk:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" {v=$3; sub(/^totaltime=/, "", v); if (v+0 > 500) print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
To make it more readable:
awk -F '|' -v OFS=, '
$NF == "url=/v2/testB" {
v = $3
sub(/^totaltime=/, "", v)
if (v+0 > 500)
print $2, $3
}' file
If you have gnu-awk then it can be reduced to:
awk -F '|' -v OFS=, '$NF == "url=/v2/testB" &&
gensub(/^totaltime=/, "", "1", $3)+0 > 500 {print $2, $3}' file
v+0 is shorthand in awk to convert a string value to a number.
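A quick way to see that coercion in action (a throwaway demonstration, not part of the answer above):
awk 'BEGIN {
    s = "501"         # a string, e.g. the result of the sub() above
    print (s > 60)    # 0: string comparison ("5" sorts before "6")
    print (s+0 > 60)  # 1: numeric comparison after coercion
}'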
$ awk -F'|' -v OFS=',' '{split($3,t,/=/)} $5=="url=/v2/testB" && t[2]>500{print $2, $3}' file
id=3,totaltime=1000
id=4,totaltime=501
You seem to be in luck:
awk 'BEGIN{FS="|"; OFS=","}
     { url = substr($NF, index($NF,"=")+1)
       totaltime = substr($3, index($3,"=")+1)
     }
     (url == "/v2/testB") && (totaltime+0 > 500) { print $2, $3 }
' file
With your shown samples, please try the following awk program.
awk -F'\\||totaltime=' '$NF=="url=/v2/testB" && $4>500{print $2",totaltime="$4}' Input_file
Explanation: following is the detailed explanation of the above code.
Set the field separator by using the -F option of awk.
Fields are separated by | or by the literal string totaltime= on every line of Input_file.
In the main program, check the conditions:
a- $NF (the last field) is equal to url=/v2/testB, AND
b- the 4th field is greater than 500; if both are true, then:
print the 2nd field of the current line, followed by the string ,totaltime=, followed by the 4th field, as per the output required by the OP.
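To see how that combined separator carves up a record (an illustrative check, not part of the original answer):
echo 'INFO|id=3|totaltime=1000|httpmethod=POST|url=/v2/testB' |
awk -F'\\||totaltime=' '{for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i}'
$1 = INFO
$2 = id=3
$3 =
$4 = 1000
$5 = httpmethod=POST
$6 = url=/v2/testB
($3 is empty because | and totaltime= are adjacent separators; that is why the bare number lands in $4.)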
All the awk solutions are great; if awk is an option, use one of them.
If you wanted to fix your Bash effort, you can do:
while IFS='|' read -r id ti; do
[[ "${ti#*=}" -gt 500 ]] && printf "%s,%s\n" "$id" "$ti"
done < <(grep 'url=/v2/testB$' file | cut -d '|' -f 2,3)
Alternatively, you can eliminate cut and keep all five fields:
while IFS='|' read -r c1 c2 c3 c4 c5; do
[[ "${c3#*=}" -gt 500 ]] && printf "%s,%s\n" "$c2" "$c5"
done < <(grep 'url=/v2/testB$' file)
Either prints:
id=3,totaltime=1000
id=4,totaltime=501

How do I check for blank fields on a delimited line with sed or awk

I'm parsing source input files using a bash script. I'm generating delimited output in a file. I need a way to check that each field of the delimited output is populated. For example AA,BB,3,4,5,6,7,8 would be good and AA,,3,4,5,6,,8 would be bad. How do I check if there are blank fields on a line using sed/awk or some other tool I can put in a bash script? Thanks in advance!
With bash:
string='AA,,3,4,5,6,,8'
if [[ $string =~ ^,|,,|,$ ]]; then
    echo "error"
else
    echo "okay"
fi
Output:
error
You can print the lines with at least one empty field using:
awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}'
-F, sets the field delimiter as ,
for (i=1;i<=NF;i++) iterates over the fields
if ($i=="") {print; next} prints the record if the field being tested is empty and goes to the next record
Example:
% cat file.txt
AA,BB,3,4,5,6,7,8
AA,,3,4,5,6,,8
% awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}' file.txt
AA,,3,4,5,6,,8
You can test with a regular expression with a repeating group that fits your requirement:
grep -E '^([^,]+,)*[^,]+$' <<< "AA,,3,4,5,6,,8"
Testcode:
for str in "AA,BB,3,4,5,6,7,8" "AA,,3,4,5,6,,8" ; do
    echo "==========="
    echo "Testing >>>${str}<<<"
    grep -Eq '^([^,]+,)*[^,]+$' <<< "${str}" || echo "String incorrect"
done
You can grep the incorrect lines from a file using
grep -vE '^([^,]+,)*[^,]+$' inputfile
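If you only need a pass/fail check inside a bash script, the same regex works via grep's exit status (a small sketch; output.csv is a placeholder name):
if grep -qvE '^([^,]+,)*[^,]+$' output.csv; then
    echo "error: found lines with empty fields" >&2
    exit 1
fi
With -v -q combined, grep exits 0 as soon as any line fails the all-fields-populated test.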

remove delimiter if condition not satisfied and substitute a string on condition

Consider the below file.
HEAD~XXXX
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~SCRIPT~~~
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~TPSCRI~~~
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~RSCPIT~~~
TAIL~20
I wish the output to be like below for the above:
HEAD~XXXX
XXX~XXX~XXX~XXX~XXX~XXX~~WIN~SCRIPT~~~
XXX~XXX~XXX~XXX~XXX~XXX~~~~~~
XXX~XXX~XXX~XXX~XXX~XXX~~~~~~
TAIL~20
If the 9th field is not SCRIPT, I want both the 8th & 9th fields to be emptied, like the 10th, and lines containing the words HEAD/TAIL have to be exempted from this condition (those are the lines where NF!=13) - I need the header & footer in the output exactly as they are in the input.
I have tried the below, but there should be a smarter way.
awk -F'~' -v OFS='~' '($9 != "Working line takeover with change of CP" {$9 = ""}) && ($9 != "Working line takeover with change of CP" {$8 = ""}) {NF=13; print}' file
The above doesn't work.
head -1 file > head
tail -1 file > tail
sed -i '/HDR/d' file
sed -i '/TLR/d' file
sed -i '/^\s*$/d' file
awk -F'~' -v OFS='~' '$9 != "Working line takeover with change of CP" {$9,$8 = ""} {NF=13; print}' file >> file.tmp //syntax error
cat file.tmp >> head
cat tail >> head
echo "" >> head
mv head file1
I'm trying a UNIX shell script with the below requirements.
Consider a file like this:
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~SCRIPT~~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~OTHERS~~~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~OTHERS~~~
Each line should have 12 fields (~ as delimiter); if a line has more, the extra ~ has to be removed.
If anything other than the string SCRIPT is present in the 10th field, the field has to be emptied.
I tried the below in /bin/bash - I know I'm not doing it well. I'm feeding each line to sed & awk commands.
while read readline
echo "entered while"
do
    fieldcount=`echo $readline | awk -F '~' '{print NF}'`
    echo "Field count printed"
    if [ $fieldcount -eq 13 ] && [ $fieldcount -ne 12 ]
    then
        echo "entering IF & before deletion"
        #remove delimiter at the end of line
        #echo "$readline~" >> $S_DIR/$1.tmp
        #sed -i '/^\s*$/d' $readline
        sed -i s'/.$//' $readline
        echo "after deletion"
        if [ awk '/SCRIPT/' $readline -ne "SCRIPT" ]
        then
            #sed -i 's/SCRIPT//' $readline
            replace_what="OTHERS"
            #awk -F '~' -v OFS=~ '{$'$replace_what'=''; print }'
            sed -i 's/[^,]*//' $replace_what
            echo "$readline" >> $S_DIR/$1.tmp
        fi
    else
        echo "$readline" >> $S_DIR/$1.tmp
    fi
done < $S_DIR/$1
awk -F'~' -v OFS='~' '$10 != "SCRIPT" {$10 = ""} {NF=12; print}' file
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~SCRIPT~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~~~
XXX~XXX~XXX~XXX~XXX~XXX~~XXX~~~~
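The NF=12 assignment is doing the real work there: assigning to NF makes awk rebuild $0 with OFS, truncating extra fields or padding with empty ones. A tiny demonstration (GNU awk; illustrative only):
echo 'a~b~c~d' | awk -F'~' -v OFS='~' '{NF=3; print}'    # a~b~c   (truncated)
echo 'a~b'     | awk -F'~' -v OFS='~' '{NF=3; print}'    # a~b~    (padded to 3 fields)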
In bash, I would write:
(
    # execute in a subshell, so the IFS setting is localized
    IFS='~'
    while read -ra fields; do
        # fields[9] is the 10th field: bash arrays are 0-indexed
        [[ ${fields[9]} != "SCRIPT" ]] && fields[9]=''
        # "${fields[*]:0:12}" joins the first 12 fields with the first
        # character of IFS, i.e. "~"
        echo "${fields[*]:0:12}"
    done < file
)
Your followup question:
awk -F'~' -v OFS='~' '
$1 == "HEAD" || $1 == "TAIL" {print; next}
$9 != "SCRIPT" {$8 = $9 = ""}
{NF=13; print}
' file
If you have further questions, please create a new question instead of editing this one.

Save variable from txt using awk

I have a txt in my folder named parameters.txt which contains
PP1 20 30 40 60
PP2 0 0 0 0
I'd like to use awk to read the different parameters depending on the value of the first text field in each line. At the moment, if I run
src_dir='/PP1/'
awk "$src_dir"' { print $2 }' parameters.txt
I correctly get
20
I would simply like to store that 20 in a variable and export the variable itself.
Thanks in advance!
If you want to save the output, do var=$(awk expression):
result=$(awk -v value="$src_dir" '($1==value) { print $2 }' parameters.txt)
Note that == is an exact string comparison, so value must then hold the bare key (PP1), not the /PP1/ regex form.
You can make your command more general by giving awk the variable with the -v syntax:
$ var="PP1"
$ awk -v v=$var '($1==v) { print $2 }' a
20
$ var="PP2"
$ awk -v v=$var '($1==v) { print $2 }' a
0
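If you also need the variable in child processes, export it after the assignment (a small addition, not shown in the answers above):
var=$(awk -v v="PP1" '$1 == v { print $2 }' parameters.txt)
export var      # now visible to any process started from this shell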
You don't really need awk for that. You can do it in bash.
$ src_dir="PP1"
$ while read -r pattern columns ; do
      set -- $columns                  # the columns become the positional parameters
      if [[ $pattern =~ $src_dir ]]; then
          variable=$1                  # the first value after the pattern, i.e. 20
      fi
  done < parameters.txt
shell_pattern=PP1
output_var=$(awk -v patt="$shell_pattern" '$0 ~ patt {print $2}' file)
Note that $output_var may contain more than one value if the pattern matches more than one line. If you're only interested in the first value, have the awk program exit after printing.
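A sketch of that early-exit variant (the exit is the only change):
output_var=$(awk -v patt="$shell_pattern" '$0 ~ patt {print $2; exit}' file)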
