Print a comma except on the last line in Awk - shell

I have the following script
awk '{printf "%s", $1"-"$2", "}' $a >> positions;
where $a stores the name of the file. I am actually writing multiple column values into one row. However, I would like to print a comma only if I am not on the last line.

Single pass approach:
cat "$a" | # look, I can use this in a pipeline!
awk 'NR > 1 { printf(", ") } { printf("%s-%s", $1, $2) }'
Note that I've also simplified the string formatting.

Enjoy this one:
awk '{printf t $1"-"$2} {t=", "}' $a >> positions
Yeh, looks a bit tricky at first sight. So I'll explain, first of all let's change printf onto print for clarity:
awk '{print t $1"-"$2} {t=", "}' file
and have a look what it does, for example, for file with this simple content:
1 A
2 B
3 C
4 D
so it will produce the following:
1-A
, 2-B
, 3-C
, 4-D
The trick is the preceding t variable which is empty at the beginning. The variable will be set {t=...} only on the next step of processing after it was shown {print t ...}. So if we (awk) continue iterating we will got the desired sequence.

I would do it by finding the number of lines before running the script, e.g. with coreutils and bash:
awk -v nlines=$(wc -l < $a) '{printf "%s", $1"-"$2} NR != nlines { printf ", " }' $a >>positions
If your file only has 2 columns, the following coreutils alternative also works. Example data:
paste <(seq 5) <(seq 5 -1 1) | tee testfile
Output:
1 5
2 4
3 3
4 2
5 1
Now replacing tabs with newlines, paste easily assembles the date into the desired format:
<testfile tr '\t' '\n' | paste -sd-,
Output:
1-5,2-4,3-3,4-2,5-1

You might think that awk's ORS and OFS would be a reasonable way to handle this:
$ awk '{print $1,$2}' OFS="-" ORS=", " input.txt
But this results in a final ORS because the input contains a newline on the last line. The newline is a record separator, so from awk's perspective there is an empty last record in the input. You can work around this with a bit of hackery, but the resultant complexity eliminates the elegance of the one-liner.
So here's my take on this. Since you say you're "writing multiple column values", it's possible that mucking with ORS and OFS would cause problems. So we can achieve the desired output entirely with formatting.
$ cat input.txt
3 2
5 4
1 8
$ awk '{printf "%s%d-%d",t,$1,$2; t=", "} END{print ""}' input.txt
3-2, 5-4, 1-8
This is similar to Michael's and rook's single-pass approaches, but it uses a single printf and correctly uses the format string for formatting.
This will likely perform negligibly better than Michael's solution because an assignment should take less CPU than a test, and noticeably better than any of the multi-pass solutions because the file only needs to be read once.

Here's a better way, without resorting to coreutils:
awk 'FNR==NR { c++; next } { ORS = (FNR==c ? "\n" : ", "); print $1, $2 }' OFS="-" file file

awk '{a[NR]=$1"-"$2;next}END{for(i=1;i<NR;i++){print a[i]", " }}' $a > positions

Related

How can one dynamically create a new csv from selected columns of another csv file?

I dynamically iterate through a csv file and select columns that fit the criteria I need. My CSV is separated by commas.
I save these indexes to an array that looks like
echo "${cols_needed[#]}"
1 3 4 7 8
I then need to write these columns to a new file and I've tried the following cut and awk commands, however, as the array is dynamically created, I cant seem to find the right commands that can select them all at once. I have tried cut, awk and paste commands.
awk -v fields=${cols_needed[#]} 'BEGIN{ n = split(fields,f) }
{ for (i=1; i<=n; ++i) printf "%s%s", $f[i], (i<n?OFS:ORS) }' test.csv
This throws an error as it cannot split the fields unless I hard code them (even then, it can only do 2), split on spaces.
fields="1 2’
I have tried to dynamically create -f parameters, but can only do so with one variable in a loop like so
for item in "${cols_needed[#]}";
do
cat test.csv | cut -f$item
done
which outputs one column at a time.
And I have tried to dynamically create it with commas - input as 1,3,4,7...
cat test.csv | cut -f${cols_needed[#]};
which also does not work!
Any help is appreciated! I understand awk does not work like bash and we cannot pass variables around in the same way. I feel like I'm going around in circles a bit! Thanks in advance.
Your first approach is ok, just:
change -v fields=${cols_needed[#]} to -v fields="${cols_needed[*]}", to pass the array as a single shell word
add FS=OFS="," to BEGIN, after splitting (you want to split on spaces, before FS is changed to ,)
ie. BEGIN {n = split(fields, f); FS=OFS=","}
Also, if there are no commas embedded in quoted csv fields, you can use cut:
IFS=,; cut -d, -f "${cols_needed[*]}" test.csv
If there are embedded commas, you can use gawk's FPAT, to only split fields on unquoted commas.
Here's an example using that.
# prepend $ to each number
for i in "${cols_needed[#]}"; do
fields[j++]="\$$i"
done
IFS=,
gawk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, "{print ${fields[*]}}"
Injecting shell code in to an awk command is generally not great practice, but it's ok here IMO.
Expanding on my comments re: passing the bash array into awk:
Passing the array in as an awk variable:
$ cols_needed=(1 3 4 7 8)
$ typeset -p cols_needed
declare -a cols_needed=([0]="1" [1]="3" [2]="4" [3]="7" [4]="8")
$ awk -v fields="${cols_needed[*]}" 'BEGIN{n=split(fields,f); for (i=1;i<=n;i++) print i,f[i]}'
1 1
2 3
3 4
4 7
5 8
Passing the array in as a 'file' via process substitution:
$ awk 'FNR==NR{f[++n]=$1;next} END {for (i=1;i<=n;i++) print i,f[i]}' <(printf "%s\n" "${cols_needed[#]}")
1 1
2 3
3 4
4 7
5 8
As for OP's main question of extracting a specific set of columns from a .csv file ...
Borrowing dawg's .csv file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
Expanding on the suggestion for passing the bash array in as an awk variable:
awk -v fields="${cols_needed[*]}" '
BEGIN { FS=OFS=","
n=split(fields,f," ")
}
{ pfx=""
for (i=1;i<=n;i++) {
printf "%s%s", pfx, $(f[i])
pfx=OFS
}
print ""
}
' file.csv
NOTE: this assumes OP has provided a valid list of column numbers; if there's some doubt as to the validity of the input (column) numbers then OP can add some logic to address said doubts (eg, are they integers? are they positive integers? do they reference a field (in file.csv) that actually exists?, etc)
This generates:
1,3,4,7,8
11,13,14,17,18
21,23,24,27,28
Suppose you have this variable in bash:
$ echo "${cols_needed[#]}"
3 4 7 8
And this CSV file:
$ cat file.csv
1,2,3,4,5,6,7,8
11,12,13,14,15,16,17,18
21,22,23,24,25,26,27,28
You can select columns of that csv file in awk this way:
awk '
BEGIN{FS=OFS=","}
FNR==NR{split($0, cols," "); next}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' <(echo "${cols_needed[#]}") file.csv
Prints:
3,4,7,8
13,14,17,18
23,24,27,28
Or, you can do:
awk -v cw="${cols_needed[*]}" '
BEGIN{FS=OFS=","; split(cw, cols," ")}
{
s=""
for (e=1;e<=length(cols); e++)
s=e<length(cols) ? s $(cols[e]) OFS : s $(cols[e])
print s
}' file.csv
# same output
BTW, you can do this entirely with cut:
cut -d ',' -f $(IFS=, ; echo "${cols_needed[*]}") file.csv
3,4,7,8
13,14,17,18
23,24,27,28

I want to use awk to print rearranged fields then print from the 4th field to the end

I have a text file containing filesize, filedate, filetime, and filepath records. The filepath can contain spaces and can be very long (classical music names). I would like to print the file with filedate, filetime, filesize, and filepath. The first part, without the filepath is easy:
awk '{print $2,$3,$1}' filelist.txt
This works, but it prints the record on two lines:
awk '{print $2,$3,$1,$1=$2=$3=""; print $0}' filelist.txt
I've tried using cut -d' ' -f '2 3 1 4-' , but that doesn't allow rearranging fields. I can fix the two line issue using sed to join. There must be a way to only use awk. In summary, I want to print the 2nd, 3rd, 1st, and from the 4th field to the end. Can anyone help?
Since the print statement in awk always prints a newline in the end (technically ORS, which defaults to a newline), your first print will break the output in two lines.
With printf, on the other hand, you completely control the output with your format string. So, you can print the first three fields with printf (without the newline), then set them to "", and just finish off with the print $0 (which is equivalent to print without arguments):
awk '{ printf("%s %s %s",$2,$3,$1); $1=$2=$3=""; print }' file
I avoid awk when I can. If I understand correctly what you have said -
while read size date time path
do echo "$date $time $size $path"
done < filelist.txt
You could printf instead of echo for more formatting options.
Embedded spaces in $path won't matter since it's the last field.
I have no awk at hand to test but I suppose you may use printf to format a one-line output. Just locate the third space in $0 and take a substring from that position through the end of the input line.
You may also try to swap fields before a standard print, although I'm not sure it will produce desired results...
It always helps to delimit your fields with something like <tab>, so subsequent operations are easier... (I can see you used cut without -d, so maybe your data is already tab delimited.)
echo 1 2 3 very long name |
sed -e 's/ /\t/' -e 's/ /\t/' -e 's/ /\t/' |
awk -v FS='\t' -v OFS='\t' '{print $2, $3, $1, $4}'
The first line generates data. The sed command substitutes first three spaces in each row with \t. Then the awk works flawlessly, outputting tab delimited data again (you need a reasonably new awk).
With GNU awk for gensub():
$ echo '1 2 3 4 5 6' | awk '{print $3, $2, $1, gensub(/([^ ]+){3}/,"",1)}'
3 2 1 4 5 6
With any awk:
$ echo '1 2 3 4 5 6' | awk '{rest=$0; sub(/([^ ]+ ){3}/,"",rest); print $3, $2, $1, rest}'
3 2 1 4 5 6

Can I have multiple awk actions without inserting newlines?

I'm a newbie with very small and specific needs. I'm using awk to parse something and I need to generate uninterrupted lines of text assembled from several pieces in the original text. But awk inserts a newline in the output whenever I use a semicolon.
Simplest example of what I mean:
Original text:
1 2
awk command:
{ print $1; print $2 }
The output will be:
1
2
The thing is that I need the output to be a single line, and I also need to use the semicolons, because I have to do multiple actions on the original text, not all of them print.
Also, using ORS=" " causes a whole lot of different problems, so it's not an option.
Is there any other way that I can have multiple actions in the same line without newline insertion?
Thanks!
The newlines in the output are nothing to do with you using semicolons to separate statements in your script, they are because print outputs the arguments you give it followed by the contents of ORS and the default value of ORS is newline.
You may want some version of either of these:
$ echo '1 2' | awk '{printf "%s ", $1; printf "%s ", $2; print ""}'
1 2
$
$ echo '1 2' | awk -v ORS=' ' '{print $1; print $2; print "\n"}'
1 2
$
$ echo '1 2' | awk -v ORS= '{print $1; print " "; print $2; print "\n"}'
1 2
$
but it's hard to say without knowing more about what you're trying to do.
At least scan through the book Effective Awk Programming, 4th Edition, by Arnold Robbins to get some understanding of the basics before trying to program in awk or you're going to waste a lot of your time and learn a lot of bad habits first.
You have better control of the output if you use printf, e.g.
awk '{ printf "%s %s\n",$1,$2 }'
awk '{print $1 $2}'
Is the solution in this case
TL;DR
You're getting newlines because print sends OFS to standard output after each print statement. You can format the output in a variety of other ways, but the key is generally to invoke only a single print or printf statement regardless of how many fields or values you want to print.
Use Commas
One way to do this is to use a single call to print using commas to separate arguments. This will insert OFS between the printed arguments. For example:
$ echo '1 2' | awk '{print $1, $2}'
1 2
Don't Separate Arguments
If you don't want any separation in your output, just pass all the arguments to a single print statement. For example:
$ echo '1 2' | awk '{print $1 $2}'
12
Formatted Strings
If you want more control than that, use formatted strings using printf. For example:
$ echo '1 2' | awk '{printf "%s...%s\n", $1, $2}'
1...2
$ echo "1 2" | awk '{print $1 " " $2}'
1 2

creating a ":" delimited list in bash script using awk

I have following lines
380:<CHECKSUM_VALIDATION>
393:</CHECKSUM_VALIDATION>
437:<CHECKSUM_VALIDATION>
441:</CHECKSUM_VALIDATION>
I need to format it as below
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Is it possible to achieve above output using "awk"? [I'm using bash]
Thanks you!
Here you go:
awk -F '[:<>/]+' '{ n = $1; getline; print $2 ":" n ":" $1 }'
Explanation:
Set the field separator with -F to be a sequence of a mix of :<>/ characters, this way the first field will be the number, and the second will be CHECKSUM_VALIDATION
Save the first field in variable n and read the next line (which would overwrite $1)
Print the line: a combination of the number from the previous line, and the fields on the current line
Another approach without using getline:
awk -F '[:<>/]+' 'NR % 2 { n = $1 } NR % 2 == 0 { print $2 ":" n ":" $1 }'
This one uses the record counter NR to determine whether it's time to print: if NR is odd, save the first field in n, if NR is even, then print.
You can try this sed,
sed 'N; s/\([0-9]\+\):<\(.*\)>\n\([0-9]\+\):<\(.*\)>/\2:\1:\3/' file.txt
Test:
sat:~$ sed 'N; s/\([0-9]\+\):<\(.*\)>\n\([0-9]\+\):<\(.*\)>/\2:\1:\3/' file.txt
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Another way:
awk -F: '/<C/ {printf "CHECKSUM_VALIDATION:%d:",$1; next} {print $1}'
Here is one gnu awk
awk -F"[:\n<>]" 'NR==1{print $3,$1,$5;f=$3;next} $3{print f,$3,$7}' OFS=":" RS="</CH" file
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Based on Jonas post and avoiding getline, this awk should do:
awk -F '[:<>/]+' '/<C/ {f=$1;next} { print $2,f,$1}' OFS=\: file
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441

Join lines based on pattern

I have the following file:
test
1
My
2
Hi
3
i need a way to use cat ,grep or awk to give the following output:
test1
My2
Hi3
How can i achieve this in a single command? something like
cat file.txt | grep ... | awk ...
Note that its always a string followed by a number in the original text file.
sed 'N;s/\n//' file.txt
This should give the desired output when the content is in file.txt
paste -d "" - - < filename
This takes consecutive lines and pastes them together delimited by the empty string.
awk '{printf("%s", $0);} !(NR%2){printf("\n");}' file.txt
EDIT: I just noticed that your question requires the use of cat and grep. Both of those programs are unnecessary to achieve your stated aims. If you have some reason for including them that you haven't mentioned, try this (uselessly inefficient) version of the line I wrote immediately above:
cat file.txt | grep '^' | awk '{printf("%s", $0);} !(NR%2){printf("\n");}'
It is possible that this command uses features not present in the original awk program. You may need to invoke the new awk program, nawk instead.
If your input file is always 1 number then 1 string, and you only want the strings, all you have to do is take every other line.
If you only want the odd lines, you can do awk 'NR % 2' file.txt
If you want the evens, this becomes awk 'NR % 2==0' data
Here is the answer:
cat file.txt | awk 'BEGIN { lno = 0 } { val=$0; if (lno % 2 == 1) {printf "%s\n", $0} else {printf "%s", $0}; ++lno}'

Resources