Removing columns from a csv file with different numbers of columns per line - bash

I have this bash script to remove columns from lines of a given csv file, but it runs very slowly. I need to use this script for files larger than 1GB, so I'm looking for a faster solution.
#!/bin/bash
while read line; do
    columns=`echo $line | awk '{print NF}' FS=,`
    if [ "$columns" == "9" ]; then
        echo `echo $line | cut -d \, -f 1,5,6,8,9`
    elif [ "$columns" == "24" ]; then
        echo `echo $line | cut -d \, -f 1,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24`
    elif [ "$columns" == "8" ]; then
        echo `echo $line | cut -d \, -f 1,4,5,6,7,8`
    else
        echo $line
    fi
done <$1
If anyone has advice on how to speed this up, or if there's a better way to do it, that'd be awesome. Thanks a lot!

Your entire script can be handled by a single awk.
Try this:
awk 'BEGIN{FS=OFS=","}
NF==9 {print $1, $5, $6, $8, $9; next}
NF==8 {print $1, $4, $5, $6, $7, $8; next}
NF==24{print $1,$5,$6,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24; next}
{print}' "$1"
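A quick way to sanity-check the speedup and the output (a rough sketch with hypothetical file names; old.sh is the original loop, new.sh wraps the awk one-liner above, and bigfile.csv stands in for your real file):
time ./old.sh bigfile.csv > out_old.csv
time ./new.sh bigfile.csv > out_new.csv
diff out_old.csv out_new.csv    # no output means both versions select the same fields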

Related

How to grab fields in inverted commas

I have a text file which contains the following lines:
"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"
How can I grab the two fields jeffrey and 90 days from between the inverted commas and save them in variables?
If awk is an option, you could save an array and then save the elements as individual variables.
$ IFS="\"" read -ra var <<< $(awk -F, '/jeffrey/{ print $1, $NF }' input_file)
$ var2="${var[3]}"
$ echo "$var2"
90 days
$ var1="${var[1]}"
$ echo "$var1"
jeffrey
while read -r line; do                                              # read in line by line
    name=$(echo "$line" | awk -F, '{ print $1 }' | sed 's/"//g')    # grab first col and strip "
    expire=$(echo "$line" | awk -F, '{ print $3 }' | sed 's/"//g')  # grab third col and strip "
    echo "$name" "$expire"                                          # do your business
done < yourfile.txt
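If spawning awk and sed for every line is a concern, a minimal pure-bash sketch (same yourfile.txt as above) that lets read do the splitting:
while IFS=',' read -r name _ expire; do
    name=${name//\"/}        # strip the surrounding double quotes
    expire=${expire//\"/}
    echo "$name" "$expire"
done < yourfile.txt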
IFS=","
arr=( $(cat txt | head -2 | tail -1 | cut -d, -f 1,3 | tr -d '"') )
echo "${arr[0]}"
echo "${arr[1]}"
The result is stored in an array; you can access the elements by index.
Maybe the method below, using the sed and awk commands, will help you:
#!/bin/sh
username=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $1}')
echo "$username"
expires_in=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $3}')
echo "$expires_in"
Output :
jeffrey
90 days
Note:
The above method will work as long as the username is unique.
As far as I know, usernames are not duplicated.
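If you also want the values without the surrounding quotes and with a single awk call per variable, a minimal sketch (same demo.txt as above, assuming jeffrey appears on only one line):
username=$(awk -F'"' '/jeffrey/{print $2; exit}' demo.txt)
echo "$username"
expires_in=$(awk -F'"' '/jeffrey/{print $6; exit}' demo.txt)
echo "$expires_in"
With -F'"' the odd-numbered fields are the commas between the quoted values, so $2 is the username and $6 is the expiry.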

Testing grep output

The cmd:
STATUS=`grep word a.log | tail -1 | awk '{print $1,$2,$7,$8,$9}'`
echo "$STATUS"
The output:
2020-05-18 09:27:01 1 of 122
I need to display this $STATUS and do the test comparison below as well.
How do I compare the number 122? How do I represent 122 as $X?
The number 122 can be any number resulting from the above cmd.
if [ "$X" -gt "300" ]
then
echo "$STATUS. This in HIGH queue ($X)"
else
echo "$STATUS. This is NORMAL ($X)"
fi
You could do it with one awk script:
awk '
/word/{ status=$1" "$2" "$7" "$8" "$9; x=$9 }
END{ printf status". This %s (%s)\n", (x>300 ? "in HIGH queue" : "is NORMAL"), x }
' a.log
I would suggest using lowercase for variables to reduce possible confusion for someone other than the original author reading the script in the future. Also, using $() is typically preferable to back-ticks; it makes quoting easier to get right.
status="$(grep word a.log | tail -1 | awk '{print $1,$2,$7,$8,$9}')"
x="$(printf '%s' "$status" | awk '{ print $NF }')"
if [ "$x" -gt 300 ]
then
echo "$status. This in HIGH queue ($x)"
else
echo "$status. This is NORMAL ($x)"
fi
Note -- we could refactor the status line a bit:
status="$(awk '/word/ { x = $1 OFS $2 OFS $7 OFS $8 OFS $9 } END { print x }' a.log)"
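If you prefer, x could come from a similar single awk call instead of a second pipeline (a sketch; $9 is the count field, as in the answer above):
x="$(awk '/word/ { n = $9 } END { print n }' a.log)"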

shell script: comma at the beginning instead of at the end

This is a part of my shell script.
for line in `cat $1`
do
    startNum=`echo $line | awk -F "," '{print $1}'`
    endNum=`echo $line | awk -F "," '{print $2}'`
    operator=`echo $line | awk -F "," '{print $3}'`
    termPrefix=`echo $line | awk -F "," '{print $4}'`
    if [[ "$endNum" == 81* ]] || [[ "$endNum" == 33* ]] || [[ "$endNum" == 55* ]]
    then
        areaCode="${endNum:0:2}"
        series="${endNum:2:4}"
        startCLI="${startNum:6:4}"
        endCLI="${endNum:6:4}"
    else
        areaCode="${endNum:0:3}"
        series="${endNum:3:3}"
        startCLI="${startNum:6:4}"
        endCLI="${endNum:6:4}"
    fi
    echo "Add,${areaCode},${series},${startCLI},${endCLI},${termPrefix},"
    #>> ${File}
done
The input is a csv containing many rows like these:
5557017101,5557017101,102,1694
5515585614,5515585614,102,084
Output of shell script:
,dd,55,5701,7101,7101,1694
,dd,55,1558,5614,5614,0848
Not sure why the comma is coming at the start of the output; as per the shell script it should come at the end.
Please help.
Here is a suggested awk command that should replace all of your shell+awk code. It also takes care of the trailing \r on each line:
awk 'BEGIN{FS=OFS=","} {sub(/\r$/,"")} NF>3{
    startNum=$1; endNum=$2; termPrefix=$4;
    if (endNum ~ /^(81|33|55)/) {
        areaCode=substr(endNum,1,2); series=substr(endNum,3,4)
    }
    else {
        areaCode=substr(endNum,1,3); series=substr(endNum,4,3)
    }
    startCLI=substr(startNum,7,4); endCLI=substr(endNum,7,4);
    print "Add", areaCode, series, startCLI, endCLI, termPrefix
}' file
Add,55,5701,7101,7101,1694
Add,55,1558,5614,5614,084
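The leading comma in your original output is the classic symptom of DOS line endings: the \r carried at the end of termPrefix returns the cursor to column 1, so the final comma prints over the start of the line. A quick way to check for carriage returns and strip them up front, if you would rather keep your shell loop (a sketch; input.csv stands in for your file):
file input.csv              # typically reports "with CRLF line terminators" if \r is present
cat -v input.csv | head     # shows ^M at the end of each line when carriage returns exist
tr -d '\r' < input.csv > input_unix.csv    # strip them once; the original script then prints the comma at the end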

Argument not recognised/accessed by egrep - Shell

Egrep and Awk to output columns of a line, with a specific value for the first column.
I am tasked with writing a shell program which, when run as
./tool.sh -f file -id id OR ./tool.sh -id id -f file
must output the name, surname and birthdate (3 columns of the file) for that specific id.
So far my code is structured as such:
elif [ "$#" -eq 4 ];
then
    while [ "$1" != "" ];
    do
        case $1 in
            -f)
                cat < "$2" | egrep '"$4"' | awk ' {print $3 "\t" $2 "\t" $5}'
                shift 4
                ;;
            -id)
                cat < "$4" | egrep '"$2"' | awk ' {print $3 "\t" $2 "\t" $5}'
                shift 4
        esac
    done
(Ignoring the opening elif because there are more subtasks for later.)
My output is nothing. The program just runs.
I've tested cat < people.dat | egrep '125' | awk ' {print $3 "\t" $2 "\t" $5}'
and it runs just fine.
I also had an instance where I got output from the program when it was run like so:
cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'
but it wasn't only that specific ID.
`egrep "$4"` is correct instead of `egrep '["$4"]'` in
`cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'`
Double quotes allow variable expansion, single quotes don't. No command needs a
certain type of quotes; quotes are purely a shell feature and are not
passed to the command. (As mentioned by @that other guy.)
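Applied to the case statement above, the two branches might look like this (a sketch; the column layout of people.dat is assumed to be as in the question):
case $1 in
    -f)  egrep "$4" "$2" | awk '{print $3 "\t" $2 "\t" $5}'; shift 4 ;;
    -id) egrep "$2" "$4" | awk '{print $3 "\t" $2 "\t" $5}'; shift 4 ;;
esac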

Length of a specific field, and showing the record in a much easier way

My goal is to find the length of the second field and, if the length is more than five characters, show the entire record, using shell scripts/commands.
echo "From the csv file"
cat latency.csv |
while read line
do
    latency=`echo $line | cut -d"," -f2 | tr -d " "`
    length=$(echo ${#latency})
    if [ $length -gt 5 ]
    then
        echo $line
    fi
done
There is nothing wrong with my code, but this being UNIX/Linux, I thought there should be a simpler way of doing such things.
Is there a simpler method?
awk -F, 'length($2)>5' file
This should work.
Updated, to ignore spaces inside the second field (as the original script does):
awk -F, '{a=$0;gsub(/ /,"",$2);if(length($2)>5)print a}' file
awk -F, '{
t = $2
gsub(/ /, x, t)
if (length(t) > 5)
print
}' latency.csv
Or:
perl -F, -ane'
print if
$F[1] =~ tr/ //dc > 5
' latency.csv
