How do I pipe commands inside a for loop in bash?

I am writing a bash script to iterate through the lines of a file that have a given value.
The command I am using to list the possible values is:
cat file.csv | cut -d';' -f2 | sort | uniq | head
When I use it in a for loop like this, it stops working:
for i in $( cat file.csv | cut -d';' -f2 | sort | uniq | head )
do
# do something else with these lines
done
How can I use piped commands in a for loop?

You can use this awk command to get the sum of the 3rd column for each unique value of the 2nd column:
awk -F ';' '{sums[$2]+=$3} END{for (i in sums) print i ":", sums[i]}' file.csv
Input data:
asd;foo;0
asd;foo;2
asd;bar;1
asd;foo;4
Output:
foo: 6
bar: 1
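As for the loop itself: the pipeline is fine, but $( ... ) in a for loop splits its output on all whitespace, not just newlines, which is the usual reason this "stops working". If you do need to process the output line by line, a minimal sketch using a while read loop instead (same pipeline, minus the cat):
cut -d';' -f2 file.csv | sort | uniq | head | while IFS= read -r line
do
  # do something with "$line"; IFS= read -r preserves whitespace and backslashes
  echo "$line"
done
Note that variables set inside the loop won't survive past done, because the pipeline runs the loop body in a subshell.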

Related

How can I sort and list the word frequency in the second column?

My data input looks like this:
1RDD4_00022_02842 o220
1RDD4_00024_03137 o132
1RDD4_00035_05208 o216
1RDD4_00045_05573 o132
1RDD4_00046_02134 o132
1RDD4_00051_04040 o154
I want to sort the words in the right column in numerical order and list their frequencies, so the output looks like this:
o132 3
o154 1
o216 1
o220 1
I've tried the following pipeline, but it only works on the input's left column, and I don't know how to modify it for the right column:
sed 's/\.//g;s/\(.*\)/\L\1/;s/\ /\n/g' inputfile | sort | uniq -c
Use
cut -f2 inputfile
or (heavier)
awk '{print $2}' inputfile
to select the second column only, then count the occurrences:
cut -f2 inputfile | sort | uniq -c
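To also get the word-first format from the expected output, a sketch that swaps count and word with a second awk (assuming whitespace-separated columns as in the sample, which awk handles regardless of the exact delimiter):
awk '{print $2}' inputfile | sort | uniq -c | awk '{print $2, $1}'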

Error calling system() within awk

I'm trying to execute a system command to find out how many unique references a csv file has in its first seven characters as part of a larger awk script that processes the same csv file. There are duplicate entries and I don't want awk to parse the whole file twice so I'm avoiding NR. The gist of this part of the script is:
#!/bin/bash
awk '
{
#do some stuff, then when finished, count the number of unique references
productFile="BusinessObjects.csv";
systemCall = sprintf( "cat %s | cut -c 1-7 | sort | uniq | wc -l", $productFile );
productCount=`system( systemCall )`-1; #subtract 1 to remove column label row
}' < BusinessObjects.csv
And the interpreter doesn't like it:
awk: cmd. line:19: ^ syntax error
./awkscript.sh: line 38: syntax error near unexpected token '('
./awkscript.sh: line 38: systemCall = sprintf( "cat %s | cut -c 1-7 | sort | uniq | wc -l", $productFile );
If I hard-code the system command
productCount=`system( "cat BusinessObjects.csv | cut -c 1-7 | sort | uniq | wc -l" )`-1;
I get:
./awkscript.sh: command substitution: line 39: syntax error near unexpected token '"cat BusinessObjects.csv | cut -c 1-7 | sort | uniq | wc -l"'
./awkscript.sh: command substitution: line 39: 'system( "cat BusinessObjects.csv | cut -c 1-7 | sort | uniq | wc -l" )'
Technically, I could do this outside of awk at the start of the shell script, store the result in a system variable, and then pass it to awk using -v, but it's not great for the readability of the awk script (it's a few hundred lines long). Do I have a space or quotes in the wrong place? I've tried fiddling, but I can't seem to present the call to system() in a way that the interpreter will accept. Finally, is there a more sensible way to do this?
Edit: the csv file is indeed semicolon-delimited, so it's best to cut using the delimiter rather than the number of chars (thanks!).
ProductRef;Data1;Data2;etc
1234567;etc;etc;etc
Edit 2:
I'm trying to parse a csv file whose first column is full of N unique product references, and create a series of associated HTML pages that include a "Page n of N" information field. It's (painfully obviously) the first time I've used awk, but it seemed like an appropriate tool for parsing csv files. Hence I'm trying to count and return the number of unique references. At the shell
cut -d\; -f1 BusinessObjects.csv | sort | uniq | wc -l
works fine, but I can't get it working inside awk by doing
#!/bin/bash
if [ -n "$1" ]
then
productFile=$1
else
echo "Missing product file argument."
exit
fi
awk -v productFile=$productFile '
BEGIN {
FS=";";
productCount = 0;
("cut -d\"\;\" -f1 " productFile " | sort | uniq | wc -l") | getline productCount;
productCount -=1; #remove the column label row
}
{
print productCount;
}'
I get a syntax error on the cut code if I don't wrap the semicolon in \"\;\" and the script just hangs without printing anything when I do.
I don't think you can use backticks in awk:
productCount=`system( systemCall )`-1; #subtract 1 to remove column label row
Instead of using system(), run the command directly and read its output with getline:
systemCall | getline productCount
productCount -= 1
Or more completely
productFile = "BusinessObjects.csv"
systemCall = "cut -c 1-7 " productFile " | sort | uniq | wc -l"
systemCall | getline productCount
productCount -= 1
There's no need to use sprintf or to include cat.
Assigning the command string to a variable is also optional; you can just write "xyz" | getline ....
sort | uniq can just be sort -u if supported.
Quoting may be necessary if the filename has spaces or characters that could confuse the command.
getline may alter global variables differently from expected. See https://www.gnu.org/software/gawk/manual/html_node/Getline.html.
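The alternative mentioned in the question, computing the count in the shell and passing it to awk with -v, also works and keeps the awk body free of subprocess calls; a minimal sketch, assuming the semicolon-delimited layout from the edit:
#!/bin/bash
productFile="BusinessObjects.csv"
# count unique first-column references, subtracting 1 for the header row
productCount=$(( $(cut -d';' -f1 "$productFile" | sort -u | wc -l) - 1 ))

awk -F';' -v productCount="$productCount" '
{
    # productCount is now available throughout the awk script
    print productCount
}' "$productFile"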
Could something like this be an option?
$ cat productCount.sh
#!/bin/bash
if [ -n "$1" ]
then
productCount=$(cut -c 1-7 "$1" | sort | uniq | wc -l)
echo "$productCount"
else
echo "please supply a filename as parameter"
fi
$ ./productCount.sh BusinessObjects.csv
9

How to sort in bash?

I have several lines in file, just like below:
/adbc/eee/ddd/baa/
/adbc/fff/ddd/ccc/avfff/
/adbc/ccc/ddd/b/
/adbc/fff/ddd/c/
/adbc/ccc/ddd/bf/
/adbc/ccc/ddd/bc/
The sort algorithm must first get the string before the last /, that is:
baa
avfff
b
c
bf
bc
and then sort it by the first character, and then the length of the string, and then alphabetically.
The expected result is
/adbc/fff/ddd/ccc/avfff/
/adbc/ccc/ddd/b/
/adbc/ccc/ddd/bc/
/adbc/ccc/ddd/bf/
/adbc/eee/ddd/baa/
/adbc/fff/ddd/c/
You could use awk in a pre-processing step to add 3 columns, all based on the field of interest, feed the result to sort, then use cut to discard the extra fields:
awk -F'/' -v OFS="/" '{x=substr($(NF-1), 1, 1);
print(x, length($(NF-1)), $(NF-1), $0)}' file.txt |
sort -k1,1 -k2,2n -k3,3 -t'/' |
cut -f4- -d'/'
/adbc/fff/ddd/ccc/avfff/
/adbc/ccc/ddd/b/
/adbc/ccc/ddd/bc/
/adbc/ccc/ddd/bf/
/adbc/eee/ddd/baa/
/adbc/fff/ddd/c/
while IFS= read -r line
do
field=$( echo "$line" | sed -e 's:/$::' -e 's:.*/::' )
firstchar=${field:0:1}
fieldlen=${#field}
echo "${firstchar},${fieldlen},${field},${line}"
done < sortthisfile | sort -t, -k1,1 -k2,2n -k3,3 | cut -d, -f4-
Obviously, sortthisfile is the name of your file.

Unix uniq command to CSV file

I have a text file (list.txt) containing single and multi-word English phrases. My goal is to do a word count for each word and write the results to a CSV file.
I have figured out the command to write the number of occurrences of each word, sorted from largest to smallest. That command is:
$ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r > output.txt
The problem is the way the new file (output.txt) is formatted. There are 3 leading spaces, followed by the number of occurrences, followed by a space, followed by the word. Then on to the next line. Example:
9784 the
6368 and
4211 for
2929 to
What would I need to do in order to get the results in a more desired format, such as CSV? For example, I'd like it to be:
9784,the
6368,and
4211,for
2929,to
Even better would be:
the,9784
and,6368
for,4211
to,2929
Is there a way to do this with a Unix command, or do I need to do some post-processing within a text editor or Excel?
Use awk as follows:
> cat input
9784 the
6368 and
4211 for
2929 to
> cat input | awk '{ print $2 "," $1}'
the,9784
and,6368
for,4211
to,2929
Your full pipeline will be:
$ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | awk '{ print $2 "," $1}' > output.txt
Use sed to strip the leading spaces and replace the separator space with a comma:
sort -i extra_set.txt | uniq -c | sort -nr | sed 's/^ *//; s/ /,/'
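Slotted into the question's full pipeline, the same sed step produces the count-first CSV directly; a sketch:
tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -nr | sed 's/^ *//; s/ /,/' > output.txt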

Storing the grep output in loop

This grep command prints the numbers (the count of groups merged):
grep "merged" sombe_conversion_PSTN.sh.sql.log | awk '{print $1}' | sed 's/ //g'
The Output is as follows:
1000000
41474
41543
83410
83153
83085
82861
82904
82715
41498
41319
I need to add the numbers from the second row through the last row of the output and store the sum in one variable, and store the first element in a different variable.
for example :
var_num=1000000
sum_others=663962
How do I loop and add the variables?
Do it twice. If your list of numbers is in the file output, do
$ var_num=$(head -1 output)
$ sum_others=$(sed '1d' output | awk '{s += $1} END {print s}')
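If you'd rather loop explicitly, as the question asks, a pure-bash sketch that reads the file once (assuming bash 4+ for mapfile, and the numbers in a file named output):
# read all numbers into an array, then peel off the first element
mapfile -t nums < output
var_num=${nums[0]}
sum_others=0
for n in "${nums[@]:1}"
do
  sum_others=$(( sum_others + n ))
done
echo "var_num=$var_num sum_others=$sum_others"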
