Processing CSV items one by one using awk - bash

I am using the following script to access CSV items:
#!/bin/bash
awk -F "|" 'NR > 0 {print $1}' UserAgents.csv
When I run the script I get the correct output, i.e. every value in the first 'column' of the CSV is printed to the terminal. What I would like to add is to read these items one by one, perform some operation on each of them (for example concatenating it with a string), and then output them (to a file, a pipe, or the terminal) one by one.

This should make it clear what your awk script is doing:
awk -F '|' '{
print NR, NF, $1, "with some trailing text"
}' UserAgents.csv
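If you would rather do the per-item work in the shell than inside awk, one sketch (assuming the same pipe-delimited UserAgents.csv) is to pipe the first field into a while read loop:
awk -F '|' '{print $1}' UserAgents.csv |
while IFS= read -r item; do
    # operate on each item here, e.g. concatenate it with a string
    printf '%s%s\n' "$item" " with some trailing text"
done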

Related

Convert hex to dec using awk

I have large CSV files where a few of the column values are in hex, and I need to convert them to decimal. The files are very big, so processing each row in the shell takes a long time to execute; I want to know how this can be done more efficiently with an awk command.
Processing the file line by line works. I currently process the files like this:
while read -r line; do
    start_time=`echo "$line" | awk -F "," '{ print $1 }'`
    end_time=`echo "$line" | awk -F "," '{ print $2 }'`
    st_time=$((16#$start_time))
    en_time=$((16#$end_time))
    # then echo the required fields to the output file
done < input_file
Sample Input file:
16a91f90539,16a91f931a9,e,0
16a91f90bab,16a91f931a9,e,0
Expected output:
1557227177273,1557227188649,e,0
1557227178923,1557227188649,e,0
I need to know how the statement "$((16#$start_time))" can be used in awk.
I tried
awk -F',' '{OFS=",";}{print '"(($1#16))"','$en_time',$3'
But this syntax does not work.
With GNU awk for strtonum() you don't need to spawn multiple shells on each input line:
$ awk 'BEGIN{FS=OFS=","} {for (i=1;i<=2;i++) $i=strtonum("0x"$i)} 1' file
1557227177273,1557227188649,e,0
1557227178923,1557227188649,e,0
You can execute system calls from within awk with system(...). Don't forget to close the command afterwards.
awk -F "," '{ cmd=sprintf("echo $((0x%s))\n", $1); system(cmd); close(cmd); }' input
(For some reason the system call does not work with $((16#...)) on my system, but does work with $((0x...)); system() runs the command through /bin/sh, which may not support bash's 16# base syntax.)
With getline you can assign the echoed output to a variable. See https://www.gnu.org/software/gawk/manual/html_node/Getline-Notes.html to get you started.
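For example, a minimal sketch along those lines (assuming, as in the sample, that the first two fields are the hex columns):
awk 'BEGIN{FS=OFS=","} {for (i=1;i<=2;i++) {cmd=sprintf("echo $((0x%s))", $i); cmd | getline dec; close(cmd); $i=dec}} 1' input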

Join two files with AWK, one file from console [duplicate]

I was wondering how I can get awk to take a string from piped output as well as a file.
I basically have a chain of commands that will eventually spit out a string. I want to check this string against a CSV file (columns separated by commas). Then I want to find the first row in the file whose 7th column contains the string, and print out the contents of the 5th column of that line. Also, I don't know linux command line utilities/awk too well, so feel free to suggest completely different methods. :)
CSV file contents look like this:
col1,col2,col3,col4,col5,etc...
col1,col2,col3,col4,col5,etc...
etc...
My general line of thought:
(rest of commands that will give a string) | awk -F ',' 'if($5 == string){print $7;exit}' filename.txt
Can this be done? If so, how do I tell awk to compare against that string?
I've found some stuff about using a - symbol with ARGV[] before the filename, but couldn't get it working.
As Karoly suggests,
str=$( rest of commands that will give a string )
awk -v s="$str" -F, '$7==s {print $5; exit}' file
If you want to feed awk with a pipe:
cmds | awk -F, 'NR==FNR {str=$0; next}; $7==str {print $5}' - file
I think the first option is more readable.
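For example, with echo standing in for the real command chain (the string and file name here are only placeholders):
echo "some string from the pipeline" | awk -F, 'NR==FNR {str=$0; next}; $7==str {print $5}' - file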

Shell script copying all columns of text file instead of specified ones

I'm trying to copy 3 columns from one text file and paste them into a new text file. However, whenever I execute this script, all of the columns in the original text file get copied. Here is the code I used:
cut -f 1,2,6 PROFILES.1.0.profile > compiledfile.txt
paste compiledfile.txt > myNewFile
Any suggestions as to what I'm doing wrong? Also, is there a simpler way to do this? Thanks!
Let's suppose that the input is comma-separated:
$ cat File
1,2,3,4,5,6,7
a,b,c,d,e,f,g
We can extract columns 1, 2, and 6 using cut:
$ cut -d, -f 1,2,6 File
1,2,6
a,b,f
Note the use of option -d, to specify that the column separator is a comma.
By default, cut uses a tab as the column separator. If the separator in your file is anything else, you must use the -d option.
Using awk
awk -vFS=your_delimiter_here -vOFS=your_delimiter_here '{print $1,$2,$6}' PROFILES.1.0.profile > compiledfile.txt
should do it.
For comma separated fields the solution would be
awk -vFS=, -vOFS=, '{print $1,$2,$6}' PROFILES.1.0.profile > compiledfile.txt
FS is an awk builtin variable which stands for field-separator.
Similarly OFS stands for output-field-separator.
And the handy -v option with awk lets you assign a value to a variable.
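With the comma-separated sample File shown earlier, that prints:
$ awk -vFS=, -vOFS=, '{print $1,$2,$6}' File
1,2,6
a,b,f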
You could use awk to do this.
awk -F "delimiter" '
{
print $1,$2 ,$3 #Where $1,$2 and so are column numbers
}' filename > newfile

Separate and add numbers from an external file with .sh

Question #1
How can I read a column and add each entry from a file using .sh?
Example file:
10000:max:100:1,2:3,4
10001:jill:50:7,8:3,2
10002:fred:300:5,6:7,8
How can I use IFS=':' in a .sh script to read that file line by line and add up the third field, so that it outputs the total, e.g. 450?
$ ./myProgram myFile.txt
450
A simple awk one-liner command would do this job.
$ awk -F: '{sum+=$3}END{print sum}' file
450
For each line, awk adds the value in column 3 to the variable sum. Printing sum in the END block gives you the total. -F: sets the field separator to a colon.
It's simple. Try using awk like:
awk -F':' '{sum+=$3} END {print sum}' myfile.txt
Here -F sets the delimiter, telling awk that the fields in myfile.txt are delimited by a colon ":".
We add the value of $3 to sum, and once all lines are read, we print the value of sum.
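If you specifically want a pure shell script using IFS=':' as asked, a minimal sketch (taking the file name as the first argument) would be:
#!/bin/bash
# sum the third colon-separated field of the file given as the first argument
sum=0
while IFS=':' read -r id name amount rest; do
    sum=$((sum + amount))
done < "$1"
echo "$sum"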

How do I write an awk print command in a loop?

I would like to write a loop that creates, for each input file, an output file containing that file's first column.
So I wrote
for i in $(\ls -d /home/*paired.isoforms.results)
do
awk -F"\t" {print $1}' $i > $i.transcript_ids.txt
done
As an example if there were 5 files in the home directory named
A_paired.isoforms.results
B_paired.isoforms.results
C_paired.isoforms.results
D_paired.isoforms.results
E_paired.isoforms.results
I would like to print the first column of each of these files into a separate output file, i.e. I would like to have 5 output files called
A.transcript_ids.txt
B.transcript_ids.txt
C.transcript_ids.txt
D.transcript_ids.txt
E.transcript_ids.txt
or any other name as long as it is 5 different names and I can still link them back to the original files.
I understand that there is a problem with the double usage of $ in both the awk and the loop command, but I don't know how to change that.
Is it possible to write a command like this in a loop?
This should do the job:
for file in /home/*paired.isoforms.results
do
base=${file##*/}
base=${base%%_*}
awk -F"\t" '{print $1}' $file > $base.transcript_ids.txt
done
I assume that there can be spaces in the first field since you set the delimiter explicitly to tab. This runs awk once per file. There are ways to do it running awk once for all files, but I'm not convinced the benefit is significant. You could consider using cut instead of awk '{print $1}', too. Note that using ls as you did is less satisfactory than using globbing directly; it runs foul of file names with oddball characters (spaces, tabs, etc) in the name.
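For example, the cut variant of the same loop body (using the same file and base variables, and relying on cut's default tab delimiter) would be:
cut -f 1 "$file" > "$base".transcript_ids.txt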
You can do that entirely in awk:
awk -F"\t" '{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; print $1 > out}' *_paired.isoforms.results
If your input files don't have names as indicated in the question, you'd have to split on something else (as well as use a different pattern match for the input files).
My original answer is actually doing extra name resolution every time something is printed. Here's a version that only updates the output filename when FILENAME changes:
awk -F"\t" 'FILENAME!=lf{split(FILENAME,a,"_"); out=a[1]".transcript_ids.txt"; lf=FILENAME} {print $1 > out}' *_paired.isoforms.results
