Shell Script to generate specific columns as separate files - bash

I want to print the first and second columns from radius.dat and save them to rad.2.out, the first column with the third column as rad.3.out, and so on.
However, this script doesn't seem to be working.
#!/bin/bash
for i in {2..30}
do
awk '{print $1, $i}' radius.dat > 'rad.'$i'.out'
done

Using awk you can do:
awk '{for(i=2;i<=NF;i++) print $1, $i > ("rad."i".out")}' radius.dat
The only caveat is that this keeps many files open at once; that is usually not a problem unless you are on an ancient awk that enforces a low open-file limit.
What we are doing here is iterating over the columns starting from the second, and on each iteration printing the first column together with the current column to an output file named according to the convention you want.
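As a quick sanity check, here is that one-liner run against a small made-up radius.dat (the numbers are arbitrary):

```shell
# Hypothetical 3-column input
printf '1 10 100\n2 20 200\n3 30 300\n' > radius.dat

# Pair column 1 with each later column, one output file per column
awk '{for(i=2;i<=NF;i++) print $1, $i > ("rad."i".out")}' radius.dat

cat rad.2.out   # 1 10 / 2 20 / 3 30
cat rad.3.out   # 1 100 / 2 200 / 3 300
```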
Update (based on your comment to your question):
If you run into a "too many open files" error, then you can do:
awk '{
    for (i=2; i<=NF; i++) {
        print $1, $i >> ("rad."i".out");
        close("rad."i".out")
    }
}' file
Notice in the second option we use >> instead of >. This is due to the fact that we are closing the file after each iteration so we need to make sure we don't overwrite the existing files.

Your quoting is quite off: inside the single quotes the shell never expands $i, so awk never gets the column number. Try this:
#!/bin/bash
for i in {2..30}; do
    awk "{print \$1, \$$i;}" radius.dat > "rad.$i.out"
done
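Alternatively, you can sidestep the escaping altogether by passing the loop counter to awk with -v; inside awk, $col then means "field number col":

```shell
#!/bin/bash
# Pass the shell loop variable in as an awk variable instead of
# splicing it into the awk program text
for i in {2..30}; do
    awk -v col="$i" '{print $1, $col}' radius.dat > "rad.$i.out"
done
```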

Related

How do I pass a stored value as the column number parameter to edit in awk?

I have a .dat file with | separator and I want to change the value of the column which is defined by a number passed as argument and stored in a var. My code is
awk -v var="$value" -F'|' '{ FS = OFS = "|" } $1=="$id" {$"\{$var}"=8}1' myfile.dat > tmp && mv tmp myfiletemp.dat
This changes the whole line to 8, obviously doesn't work. I was wondering what is the right way to write this part
{$"\{$var}"=8}1
For example, if I want to change the fourth column to 8 and I have value=4, how do I get {$4=8}?
The other answer is mostly correct, but I just wanted to add a couple of notes in case it wasn't totally clear.
Referring to a variable with a $ in front of it turns it into a reference to the column. So i=3; print $i; print i will print the third column and then the number 3.
Putting all your variables in the command line will avoid any problems with trying to include bash variables inside your single-quoted awk code, which won't work.
You can let awk do the output to the specific file instead of relying on bash to redirect output and move files.
The -F option on the command line specifies FS for you, so no need to redeclare it in your code.
Here's how I would do this:
#!/bin/bash
column=4
value=8
id=1
awk -v col="$column" -v val="$value" -v id="$id" -F"|" '
BEGIN {OFS="|"}
$1 == id {$col = val}
{print > "myfiletemp.dat"}
' myfile.dat
You can refer to the awk variable directly by its name. Here is a slight rewrite of your script with a correct reference to the column-number variable:
awk -F'|' -v var="$value" 'BEGIN{OFS=FS} $1=="$id"{$var=8}1'
should work as long as $value is a number. If id is another bash variable, pass it the same way as an awk variable
awk -F'|' -v var="$value" -v id="$id" 'BEGIN{OFS=FS} $1==id{$var=8}1'
Not only can you use a number in a variable by putting a $ in front of it, you can also put a $ in front of an expression!
$ date | tee /dev/stderr | awk '{print $(2+2)}'
Mon Aug 3 12:47:39 CDT 2020
12:47:39

How do I write an awk print command in a loop?

I would like to write a loop creating various output files with the first column of each input file, respectively.
So I wrote
for i in $(\ls -d /home/*paired.isoforms.results)
do
awk -F"\t" {print $1}' $i > $i.transcript_ids.txt
done
As an example if there were 5 files in the home directory named
A_paired.isoforms.results
B_paired.isoforms.results
C_paired.isoforms.results
D_paired.isoforms.results
E_paired.isoforms.results
I would like to print the first column of each of these files into a seperate output file, i.e. I would like to have 5 output files called
A.transcript_ids.txt
B.transcript_ids.txt
C.transcript_ids.txt
D.transcript_ids.txt
E.transcript_ids.txt
or any other name as long as it is 5 different names and I can still link them back to the original files.
I understand that there is a problem with the double usage of $ in both the awk command and the loop, but I don't know how to change that.
Is it possible to write a command like this in a loop?
This should do the job:
for file in /home/*paired.isoforms.results
do
    base=${file##*/}
    base=${base%%_*}
    awk -F"\t" '{print $1}' "$file" > "$base.transcript_ids.txt"
done
I assume that there can be spaces in the first field since you set the delimiter explicitly to tab. This runs awk once per file. There are ways to run awk once for all files, but I'm not convinced the benefit is significant. You could also consider using cut instead of awk '{print $1}'. Note that using ls as you did is less satisfactory than using globbing directly; it runs afoul of file names with oddball characters (spaces, tabs, etc.) in the name.
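For completeness, the cut variant mentioned above might look like this (cut splits on tab by default, matching the -F"\t" in the awk version):

```shell
# Same loop, but extracting field 1 with cut instead of awk
for file in /home/*paired.isoforms.results
do
    base=${file##*/}
    base=${base%%_*}
    cut -f1 "$file" > "$base.transcript_ids.txt"
done
```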
You can do that entirely in awk:
awk -F"\t" '{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; print $1 > out}' *_paired.isoforms.results
If your input files don't have names as indicated in the question, you'd have to split on something else (as well as use a different pattern match for the input files).
My original answer is actually doing extra name resolution every time something is printed. Here's a version that only updates the output filename when FILENAME changes:
awk -F"\t" 'FILENAME!=lf{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results

prevent duplicate variable and print using awk statement

I am iterating through a file and printing a set of values using awk:
echo $value | awk ' {print $4}' >> 'some location'
The command works fine, but I want to prevent duplicate values from being stored in the file.
Thanks in advance.
Instead of processing the file line by line, you should use a single awk command for the entire file.
For example:
awk '!a[$4]++{print $4}' file >> 'some location'
This will keep only the first occurrence of each fourth-column value.
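To see the !a[$4]++ idiom in action on a few made-up lines (the first occurrence of each fourth-column value passes; repeats are suppressed):

```shell
# $4 is "a", "b", "a"; the second "a" increments a["a"] past 0,
# so !a[$4]++ is false and the duplicate is skipped
printf 'w x y a\nw x y b\nw x y a\n' | awk '!a[$4]++{print $4}'
# a
# b
```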
Using only one instance of awk, as suggested by user000001, is certainly the right thing to do. Since very little detail is given in the question this is pure speculation, but the simplest change may be a trivial refactor of your loop. For example, if the current code is:
while ...; do
...
echo $value | awk ...
...
done
You can simply change it to:
while ...; do
...
echo $value >&5
...
done 5>&1 | awk '!a[$4]++{print $4}' >> /p/a/t/h
Note that although this is a "simple" fix in terms of code to change, it is almost certainly not the correct fix! Removing the while loop completely and just using awk is the right thing to do.

Remove first columns then leave remaining line untouched in awk

I am trying to use awk to remove first three fields in a text file. Removing the first three fields is easy. But the rest of the line gets messed up by awk: the delimiters are changed from tab to space
Here is what I have tried:
head pivot.threeb.tsv | awk 'BEGIN {IFS="\t"} {$1=$2=$3=""; print }'
The first three columns are properly removed. The problem is that the output ends up with the tabs between columns $4, $5, $6, etc. converted to spaces.
Update: the other question for which this was marked as a duplicate was created later than this one; look at the dates.
First, as Ed commented, you have to use FS (not IFS) as the field separator in awk.
The tab becomes a space in your output because you didn't define OFS:
awk 'BEGIN{FS=OFS="\t"}{$1=$2=$3="";print}' file
this will remove the first 3 fields and leave the rest of the text "untouched" (you will see the three leading tabs); in the output the <tab> separators are kept.
awk 'BEGIN{FS=OFS="\t"}{print $4,$5,$6}' file
will output without leading spaces/tabs. But if you have 500 columns you have to do it in a loop, or use the sub function, or consider other tools, cut, for example.
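A hedged sketch of the sub approach mentioned above (written without regex intervals so it also works on older awks): it deletes the first three tab-separated fields and their tabs in one substitution, and since no field is assigned, awk never rebuilds the line, so the remaining tabs are preserved exactly:

```shell
# Remove fields 1-3 (and their trailing tabs) via sub() on $0
awk '{sub(/^[^\t]*\t[^\t]*\t[^\t]*\t/, ""); print}' file
```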
Actually this can be done with a very simple cut command:
cut -f4- inFile
If you don't want the field separation altered then use sed to remove the first 3 columns instead:
sed -r 's/(\S+\s+){3}//' file
To store the changes back to the file you can use the -i option:
sed -ri 's/(\S+\s+){3}//' file
awk '{for (i=4; i<NF; i++) printf "%s ", $i; print $NF}'

Processing CSV items one by one using awk

Using the following script to access CSV items.
#!/bin/bash
awk -F "|" 'NR > 0 {print $1}' UserAgents.csv
When running the script I get the correct output, i.e. the entire set of values in the first 'column' of the CSV is printed to the terminal. What I would like to add is to read these items one by one, perform some operation on them (like concatenating each with a string), and then output them (to a file, a pipe, or the terminal) one by one.
This should make it clear what your awk script is doing:
awk -F '|' '{
    print NR, NF, $1, "with some trailing text"
}' UserAgents.csv
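To cover the concatenation part of the question: string concatenation in awk is plain juxtaposition, so you can build the combined value directly (the "UA: " prefix here is just an illustrative example):

```shell
# Prepend an arbitrary string to each first-column value
awk -F '|' '{print "UA: " $1}' UserAgents.csv
```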
