awk command fails with command substitution - bash

Running this command fails:
$(printf "awk '{%sprint}'" $(tail -n +2 file.txt | cut -f2 | sort | uniq | awk 'BEGIN{a=1}{printf "gsub(\"%s\",%i);", $1,a++}')) file.txt
It gives the following error:
awk: '
awk: ^ invalid char ''' in expression
However, if I run the substituted command, I get this:
awk '{gsub("ACB",1);gsub("ASW",2);gsub("BEB",3);gsub("CDX",4);gsub("CEU",5);gsub("CHB",6);gsub("CHS",7);gsub("CLM",8);gsub("ESN",9);gsub("FIN",10);gsub("GBR",11);gsub("GIH",12);gsub("GWD",13);gsub("IBS",14);gsub("ITU",15);gsub("JPT",16);gsub("KHV",17);gsub("LWK",18);gsub("MSL",19);gsub("MXL",20);gsub("PEL",21);gsub("PJL",22);gsub("PUR",23);gsub("STU",24);gsub("TSI",25);gsub("YRI",26);print}'
which I can run like so:
awk '{gsub("ACB",1);gsub("ASW",2);gsub("BEB",3);gsub("CDX",4);gsub("CEU",5);gsub("CHB",6);gsub("CHS",7);gsub("CLM",8);gsub("ESN",9);gsub("FIN",10);gsub("GBR",11);gsub("GIH",12);gsub("GWD",13);gsub("IBS",14);gsub("ITU",15);gsub("JPT",16);gsub("KHV",17);gsub("LWK",18);gsub("MSL",19);gsub("MXL",20);gsub("PEL",21);gsub("PJL",22);gsub("PUR",23);gsub("STU",24);gsub("TSI",25);gsub("YRI",26);print}' file.txt
And it works perfectly. What am I doing wrong?
@ChrisLear gave me a working solution, but I still don't quite understand what it is doing. Here's the working code:
$(printf "awk {%sprint}" $(tail -n +2 file.txt | cut -f2 | sort | uniq | awk 'BEGIN{a=1}{printf "gsub(\"%s\",%i);", $1,a++}')) file.txt
The single quotes around {%sprint} are removed. Why do those single quotes break the command substitution?
edit: changed backtick to $(...) notation. Also added solution I don't understand.

Try removing the quotes from the command being generated.
`printf "awk {%sprint}" $(tail -n +2 file.txt | cut -f2 | sort | uniq | awk 'BEGIN{a=1}{printf "gsub(\"%s\",%i);", $1,a++}')` file.txt
For an explanation, see the accepted answer at Why does command substitution change how quoted arguments work?
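To illustrate the point, here is a minimal sketch (show_args is a hypothetical helper, not part of the original commands). Quotes that come out of a command substitution are ordinary characters: the shell word-splits the result but does not perform quote removal on it, so awk receives a program that literally begins with a single quote.

```shell
#!/bin/sh
# show_args (hypothetical helper) prints each argument it receives inside <...>.
show_args() { printf '<%s>\n' "$@"; }

# Quotes typed on the command line are shell syntax and get removed.
typed=$(show_args awk '{print}')

# Quotes produced by a command substitution are data and are passed through.
substituted=$(show_args $(printf "awk '{print}'"))

printf 'typed:\n%s\nsubstituted:\n%s\n' "$typed" "$substituted"
```

In the second case the argument still contains the quotes, which matches the "invalid char" error awk reports. Dropping the quotes works in this particular command only because the generated program contains no whitespace, so word splitting leaves it as a single argument.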

It looks like you're trying to take a bunch of unique 2nd fields from a file starting at line 2 and map those to numbers based on their alphabetic ordering, then apply the change to the same file. If so then with GNU awk for sorted_in and inplace editing that'd be:
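awk -i inplace '
NR==FNR {                       # first pass: collect the distinct 2nd fields
    if (NR>1) {
        map[$2]
    }
    print                       # write the line back unchanged so inplace keeps the file intact
    next
}
FNR==1 {                        # start of second pass: number the keys in sorted order
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (str in map) {
        map[str] = ++i
    }
}
{
    $2 = map[$2]
    print
}
' file.txt file.txt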
If that's not what you need then edit your question to show concise, testable sample input and expected output.

Related

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get as bash variable list of users which are in my csv file. Problem is that number of users is random and can be from 1-5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users="cat file.csv | grep "record2_data2" | <something> "
echo $list_of_users
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How to remove all "," from the end of my result? Sometimes it is just one but sometimes can be user1,,,,
Can I do it in better way? Users always starts after 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question subject of Getting last X records from CSV file using bash, so I don't know whether it's what you really want.
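On the literal question of removing the trailing commas: one option, sketched here against the record2 line from the question (the variable names line and users are made up for the example), is to add a second sed expression that strips any run of commas at the end of the line.

```shell
#!/bin/sh
# Sample record copied from the question.
line='"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"'

# Same awk as in the question; sed then drops the quotes and any trailing commas.
users=$(printf '%s\n' "$line" |
  awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8}' |
  sed 's/"//g; s/,*$//')

printf '%s\n' "$users"
```

This keeps the fixed list of five fields from the question, so it only handles up to five users.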
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit - it first only prints out the fourth and following fields of the CSV input file, removes quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is then read into an array with the readarray builtin, and you can manipulate it and the individual elements however you need.
GNU sed solution, let file.csv content be
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing. The first expression deletes every " by globally substituting it with the empty string. The second expression applies only to lines containing record2_data: it substitutes (s) everything up to and including the third , with the empty string, i.e. deletes it, and then prints (p) the changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/ {
    for (i=4; i<=NF; i++) o = sprintf("%s%s,", o, $i)
    gsub(/"|,$/, "", o)
    print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
Delete all records except for that containing record2_data.
Remove double quotes from the fourth field onward.
Remove any double quoted fields.

Awk is overwriting letters when printing reversed order, why?

I'm currently using awk to replicate the function uniq -c with commas as delimiters.
This gives correct output:
$ cut --delimiter=, -s -f2 wordlist.csv | awk '{ cnts[$0] += 1 } END { for (v in cnts) print cnts[v], v}' OFS="," | head
2,laecherlichen
111,doctrine
1,cremonas
1,embedding
1,conincks
2,similiter
1,mitgesellen
1,hysnelement
1,geringem
1,aquarian
However, if I reverse the awk command print cnts[v], v into print v, cnts[v], I get a messed up output:
$ cut --delimiter=, -s -f2 wordlist.csv | awk '{ cnts[$0] += 1 } END { for (v in cnts) print v, cnts[v]}' OFS="," | head
,2echerlichen
,111rine
,1emonas
,1bedding
,1nincks
,2militer
,1tgesellen
,1snelement
,1ringem
,1uarian
I'm confused by this output, because I'm expecting something like word,1 as output. What is the problem?
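Most likely your file has DOS line endings, i.e. a carriage return \r before the newline \n at the end of each line. The \r stays attached to each word, so the terminal moves the cursor back to the start of the line and the count overwrites the first letters. With an awk that accepts a regex in RS (such as GNU awk), you can make the \r part of the record separator:
cut --delimiter=, -s -f2 wordlist.csv | awk -v RS='\r?\n' '{
cnts[$0] += 1 } END { for (v in cnts) print cnts[v], v}' OFS="," | head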
However if you show your csv file I believe even cut and head can be removed from above commands.
PS: Thanks to @Bammar, you can also run:
dos2unix file.csv
to convert your CSV file to Unix line endings.
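If changing the file or RS is not an option, a portable alternative is to strip the trailing \r inside awk before counting. A minimal sketch, using made-up sample words rather than the real wordlist.csv:

```shell
#!/bin/sh
# Assumed sample input: three words with DOS (\r\n) line endings.
counts=$(printf 'alpha\r\nbeta\r\nalpha\r\n' |
  awk -v OFS=, '{ sub(/\r$/, ""); cnts[$0]++ }        # drop the carriage return first
                END { for (v in cnts) print v, cnts[v] }' |
  sort)

printf '%s\n' "$counts"
```

With the \r removed, printing the word first and the count second no longer overwrites anything.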

SSH call inside ruby, using %x

I am trying to make a single line ssh call from a ruby script. My script takes a hostname, and then sets out to return the hostname's machine info.
return_value = %x{ ssh #{hostname} "#{number_of_users}; #{number_of_processes};
#{number_of_processes_running}; #{number_of_processes_sleeping}; "}
Where the variables are formatted like this.
number_of_users = %Q(users | wc -w | cat | awk '{print "Number of Users: "\$1}')
number_of_processes = %Q(ps -el | awk '{print $2}' | wc -l | awk '{print "Number of Processes: "$1}')
I have tried %q, %Q, and just plain "", and I cannot get the awk to print anything before the output. I either get this error (if I include the colon)
awk: line 1: syntax error at or near :
or if I don't include the slash in front of $1 I just get empty output for that line. Is there any solution for this? I thought it might be because I was using %q, but it even happens with just double quotes.
Use backticks to capture the output of the command and return the output as a string:
number_of_users = `users | wc -w | cat | awk '{print "Number of Users:", $1}'`
puts number_of_users
Results on my system:
48
But you can improve your pipeline:
users | awk '{ print "Number of Users:", NF }'
ps -e | awk 'END { print "Number of Processes:", NR }'
So the solution to this problem is:
%q(users | wc -w | awk '{print \"Number of Users: \"\$1}')
Here you have to use %q, not %, not %Q, and not "".
You must backslash-escape the double quotes and the dollar sign in front of any awk field variables.
If somebody could improve upon this answer by explaining why, that would be most appreciated.
Though, as Steve pointed out, I could have improved my code using users | awk '{ print \"Number of Users:\", NF }'
In which case there is no need to backslash the NF.
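A likely explanation, offered as a sketch rather than a definitive account: the command string passes through several layers of quoting. Ruby's %q behaves like a single-quoted string, so \" and \$ keep their backslashes, whereas %Q and "" consume the backslashes in Ruby itself. The string is then interpolated inside ssh host "...", where the local shell applies its double-quote rules: an unescaped " ends the string early and an unescaped $1 is expanded locally, before awk ever sees it. The example below uses sh -c as a stand-in for ssh (an assumption, so that it runs without a remote host).

```shell
#!/bin/sh
set --   # clear positional parameters so the local $1 is empty

# Escaped: the local shell hands "..." and $1 through to the inner shell's awk.
escaped=$(echo 3 | sh -c "awk '{print \"Number of Users: \"\$1}'")

# Unescaped dollar sign: the local shell expands $1 (empty) before sh -c runs.
unescaped=$(echo 3 | sh -c "awk '{print \"Number of Users: \"$1}'")

printf '[%s]\n[%s]\n' "$escaped" "$unescaped"
```

The second result has the label but no number, which resembles the empty output described in the question. NF needs no backslash because it contains nothing the local shell would expand.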

bash awk first 1st column and 3rd column with everything after

I am working on the following bash script:
# contents of dbfake file
1 100% file 1
2 99% file name 2
3 100% file name 3
#!/bin/bash
# cat out data
cat dbfake |
# select lines containing 100%
grep 100% |
# print the first and third columns
awk '{print $1, $3}' |
# echo out id and file name and log
xargs -rI % sh -c '{ echo %; echo "%" >> "fake.log"; }'
exit 0
This script works ok, but how do I print everything in column $3 and then all columns after?
You can use cut instead of awk in this case:
cut -f1,3- -d ' '
awk '{ $2 = ""; print }' # remove col 2
If you don't mind a little whitespace:
awk '{ $2="" }1'
But you can also drop the useless use of cat (UUOC) and the grep, printing only the matching lines:
< dbfake awk '/100%/ { $2=""; print }' | ...
If you'd like to trim that whitespace:
< dbfake awk '/100%/ { $2=""; sub(FS "+", FS); print }' | ...
For fun, here's another way using GNU sed:
< dbfake sed -rn '/100%/s/^(\S+)\s+\S+(.*)/\1\2/p' | ...
All you need is:
awk 'sub(/.*100% /,"")' dbfake | tee "fake.log"
Others responded in various ways, but I want to point out that using xargs to multiplex output is a rather bad idea.
Instead, why don't you:
awk '$2=="100%" { sub("100%[[:space:]]*",""); print; print >>"fake.log"}' dbfake
That's all. You don't need grep, you don't need multiple pipes, and definitely you don't need to fork shell for every line you're outputting.
You could do awk ...; print}' | tee fake.log, but there is not much point in forking tee, if awk can handle it as well.
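As a quick check of the single-awk approach, here is a sketch that feeds the three sample rows from the question through stdin instead of reading dbfake, and leaves out the log file (both simplifications are assumptions made for the example):

```shell
#!/bin/sh
# Sample rows copied from the dbfake listing in the question.
result=$(printf '%s\n' '1 100% file 1' '2 99% file name 2' '3 100% file name 3' |
  awk '$2 == "100%" { sub(/ 100%/, ""); print }')   # keep the id, drop the percentage

printf '%s\n' "$result"
```

Only the rows whose second field is exactly 100% are printed, with the first column followed by everything from the third column on.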

Why uniq -c output with space instead of \t?

I run uniq -c on a text file.
Its output looks like this:
123(space)first word(tab)other things
2(space)second word(tab)other things
....
So I need to extract the count (like 123 and 2 above), but I can't figure out how, because if I split the line on spaces I get ['123', 'first', 'word(tab)other', 'things'].
Why doesn't uniq -c separate the count with a tab?
And how can I extract the count in the shell? (I finally extracted it with Python, WTF)
Update: Sorry, I didn't describe my question correctly. I don't want to sum the counts; I just want to replace the (space) after the count with a (tab) without affecting the spaces inside the words, because I still need the data that follows. Like this:
123(tab)first word(tab)other things
2(tab)second word(tab)other things
Try this:
uniq -c | sed -r 's/^( *[^ ]+) +/\1\t/'
Try:
uniq -c text.file | sed -e 's/ *//' -e 's/ /\t/'
That will remove the spaces prior to the line count, and then replace only the first space with a tab.
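A small sketch of that idea on made-up input lines. It assumes the usual uniq -c layout of leading spaces, the count, and one space, and it inserts a literal tab through a shell variable so the sed script does not depend on \t being understood in the replacement:

```shell
#!/bin/sh
tab=$(printf '\t')   # literal tab character

# Assumed sample input: words containing a space, followed by a tab and more data.
result=$(printf 'first word\tother things\nfirst word\tother things\nsecond word\tother things\n' |
  uniq -c |
  sed -e 's/^ *//' -e "s/ /$tab/")   # strip the padding, then turn only the first space into a tab

printf '%s\n' "$result"
```

The spaces inside "first word" and "other things" are left alone because the second expression replaces only the first space on each line.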
To replace all spaces with tabs, use tr:
uniq -c text.file | tr ' ' '\t'
To replace each continuous run of spaces with a single tab, use -s:
uniq -c text.file | tr -s ' ' '\t'
You can sum all the numbers using awk:
awk '{s+=$1}END{print s}'
$ cat <file> | uniq -c | awk -F" " '{sum += $1} END {print sum}'
One possible solution to getting tabs after counts is to write a uniq -c-like script that formats exactly how you want. Here's a quick attempt (that seems to pass my minute or so of testing):
awk '
(NR == 1) || ($0 != lastLine) {
    if (NR != 1) {
        printf("%d\t%s\n", count, lastLine);
    }
    lastLine = $0;
    count = 1;
    next;
}
{
    count++;
}
END {
    printf("%d\t%s\n", count, lastLine);
}
' yourFile.txt
Another solution. This is equivalent to the earlier sed solution, but it does use awk as requested / tagged!
cat yourFile.txt \
| uniq -c \
| awk '{
match($0, /^ *[^ ]* /);
printf("%s\t%s\n", $1, substr($0, RLENGTH + 1));
}'
Based on William Pursell's answer: if you like Perl-compatible regular expressions (PCRE), a perhaps more elegant and modern way would be
perl -pe 's/ *(\d+) /$1\t/'
Options are to execute (-e) and print (-p).
