Separate and add numbers from an external file with .sh - bash

Question #1
How can I read a column and add each entry from a file using .sh?
Example file:
10000:max:100:1,2:3,4
10001:jill:50:7,8:3,2
10002:fred:300:5,6:7,8
How can I use IFS=':' in a .sh script to read that file line by line and sum the third field, so that it outputs the total, e.g. 450:
$ ./myProgram myFile.txt
450

A simple awk one-liner command would do this job.
$ awk -F: '{sum+=$3}END{print sum}' file
450
For each line, awk adds the column 3 value to the variable sum; printing sum in the END block gives you the total. -F: sets the field separator to a colon.

It's simple. Try using awk like:
awk -F':' '{sum+=$3} END {print sum}' myfile.txt
Here -F sets the delimiter; we tell awk that fields in myfile.txt are delimited with a colon ":".
We add $3 value to sum. And once that's done, we print the value of sum.
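Since the question asks specifically about using IFS=':' in a .sh script, here is a minimal pure-bash sketch (assuming, as in the question, that the file name is passed as the first argument; the variable names are just illustrative):
#!/bin/bash
# Sum the third colon-separated field of each line.
sum=0
while IFS=':' read -r id name amount rest; do
    sum=$(( sum + amount ))
done < "$1"
echo "$sum"
With the example file above, ./myProgram myFile.txt prints 450, matching the awk versions.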

Related

AWK remove blank lines and append empty columns to all csv files in the directory

Hi, I am looking for a way to combine all the commands below.
Remove blank lines in the csv file (comma delimited)
Add multiple empty columns to each line up to 100th column
Perform action 1 & 2 on all the files in the folder
I am still learning and this is the best I could get:
awk '!/^[[:space:]]*$/' x.csv > tmp && mv tmp x.csv
awk -F"," '($100="")1' OFS="," x.csv > tmp && mv tmp x.csv
They work individually, but I don't know how to put them together, and I am looking for a way to have it run through all the files under the directory.
Looking for concrete AWK code or shell script calling AWK.
Thank you!
An example input would be:
a,b,c
x,y,z
Expected output would be:
a,b,c,,,,,,,,,,
x,y,z,,,,,,,,,,
You can combine them in one script, without any loops:
$ awk 'BEGIN{FS=OFS=","} FNR==1{close(f); f=FILENAME".updated"} /[^[:space:]]/{$100=""; print > f}' files...
It won't overwrite the original files; each input file gets a matching FILENAME.updated output. (Note the /[^[:space:]]/ guard, which also skips whitespace-only lines, matching the question's requirement.)
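For example, with two hypothetical files x.csv and y.csv in the current directory, the originals stay untouched and each gets a .updated counterpart:
$ awk 'BEGIN{FS=OFS=","} FNR==1{close(f); f=FILENAME".updated"} /[^[:space:]]/{$100=""; print > f}' x.csv y.csv
$ ls
x.csv  x.csv.updated  y.csv  y.csv.updated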
You can pipe the output of the first to the other:
awk '!/^[[:space:]]*$/' x.csv | awk -F"," '($100="")1' OFS="," > new_x.csv
If you wanted to run the above on all the files in your directory, you would do:
shopt -s nullglob
for f in yourdirectory/*.csv; do
  awk '!/^[[:space:]]*$/' "${f}" | awk -F"," '($100="")1' OFS="," > "${f%/*}/new_${f##*/}"
done
The shopt -s nullglob is so that an empty directory won't give you a literal *. (Taken from a good reference on looping through files. Also note the output path: writing to new_"${f}" would prepend new_ to the directory name, so the new_ prefix goes on the file name instead.)
With recent enough GNU awk you could:
$ gawk -i inplace 'BEGIN{FS=OFS=","}/\S/{NF=100;$1=$1;print}' *
Explained:
$ gawk -i inplace ' # using GNU awk and in-place file editing
BEGIN {
FS=OFS="," # set delimiters to a comma
}
/\S/ { # gawk-specific regex escape matching a non-whitespace character; the rule only runs for records that contain one
NF=100 # set the field count to 100 which truncates fields above it
$1=$1 # edit the first field to rebuild the record to actually get the extra commas
print # output records
}' *
Some test data (the first empty record is completely empty, the second contains only a space and a tab, so it appears blank below):
$ cat file
1,2,3

1,2,3,4,5,6,
 	
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
Output of cat file after the execution of the GNU awk program:
1,2,3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100

Shell script copying all columns of text file instead of specified ones

I am trying to copy 3 columns from one text file and paste them into a new text file. However, whenever I execute this script, all of the columns in the original text file get copied. Here is the code I used:
cut -f 1,2,6 PROFILES.1.0.profile > compiledfile.txt
paste compiledfile.txt > myNewFile
Any suggestions as to what I'm doing wrong? Also, is there a simpler way to do this? Thanks!
Let's suppose that the input is comma-separated:
$ cat File
1,2,3,4,5,6,7
a,b,c,d,e,f,g
We can extract columns 1, 2, and 6 using cut:
$ cut -d, -f 1,2,6 File
1,2,6
a,b,f
Note the use of option -d, to specify that the column separator is a comma.
By default, cut uses a tab as the column separator. If the separator in your file is anything else, you must use the -d option.
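A quick sanity check of that default (not from the original thread), using tab-separated input so cut works without -d:
$ printf '1\t2\t3\t4\t5\t6\n' | cut -f 1,2,6
1	2	6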
Using awk
awk -vFS=your_delimiter_here -vOFS=your_delimiter_here '{print $1,$2,$6}' PROFILES.1.0.profile > compiledfile.txt
should do it.
For comma separated fields the solution would be
awk -vFS=, -vOFS=, '{print $1,$2,$6}' PROFILES.1.0.profile > compiledfile.txt
FS is an awk builtin variable which stands for field-separator.
Similarly OFS stands for output-field-separator.
And the handy -v option with awk helps you assign a value to a variable.
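To see the two separators play different roles, here is a small illustration (a throwaway example, not from the thread) that reads comma-separated input and joins the selected fields with a dash:
$ echo "1,2,3,4,5,6,7" | awk -vFS=, -vOFS=- '{print $1,$2,$6}'
1-2-6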
You could use awk to do this:
awk -F "delimiter" '
{
    print $1, $2, $3   # $1, $2, and so on are column numbers
}' filename > newfile

How do I pass a stored value as the column number parameter to edit in awk?

I have a .dat file with | separator and I want to change the value of the column which is defined by a number passed as argument and stored in a var. My code is
awk -v var="$value" -F'|' '{ FS = OFS = "|" } $1=="$id" {$"\{$var}"=8}1'
myfile.dat > tmp && mv tmp myfiletemp.dat
This changes the whole line to 8, which obviously doesn't work. I was wondering what the right way is to write this part:
{$"\{$var}"=8}1
For example, if I want to change the fourth column to 8 and I have value=4, how do I get {$4=8}?
The other answer is mostly correct, but I just wanted to add a couple of notes, in case it isn't totally clear.
Referring to a variable with a $ in front of it turns it into a reference to the column. So i=3; print $i; print i will print the third column and then the number 3.
Putting all your variables in the command line will avoid any problems with trying to include bash variables inside your single-quoted awk code, which won't work.
You can let awk do the output to the specific file instead of relying on bash to redirect output and move files.
The -F option on the command line specifies FS for you, so no need to redeclare it in your code.
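A quick one-liner illustrating the first point (a throwaway example, not part of the original answer):
$ echo "a b c" | awk '{ i = 3; print $i; print i }'
c
3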
Here's how I would do this:
#!/bin/bash
column=4
value=8
id=1
awk -v col="$column" -v val="$value" -v id="$id" -F"|" '
BEGIN {OFS="|"}
{if ($1==id) $col=val; print > "myfiletemp.dat"}
' myfile.dat
You can refer to the awk variable directly by its name. Here is a slight rewrite of your script with a correct reference to the column-number variable:
awk -F'|' -v var="$value" 'BEGIN{OFS=FS} $1=="$id"{$var=8}1'
This should work as long as $value is a number. Note that $1=="$id" compares against the literal string $id; if id is another bash variable, pass it the same way as an awk variable:
awk -F'|' -v var="$value" -v id="$id" 'BEGIN{OFS=FS} $1==id{$var=8}1'
Not only can you use a number in a variable by putting a $ in front of it, you can also put a $ in front of an expression!
$ date | tee /dev/stderr | awk '{print $(2+2)}'
Mon Aug 3 12:47:39 CDT 2020
12:47:39

egrep -v match lines containing some same text on each line

So I have two files.
Example of file 1 content.
/n01/mysqldata1/mysql-bin.000001
/n01/mysqldata1/mysql-bin.000002
/n01/mysqldata1/mysql-bin.000003
/n01/mysqldata1/mysql-bin.000004
/n01/mysqldata1/mysql-bin.000005
/n01/mysqldata1/mysql-bin.000006
Example of file 2 content.
/n01/mysqlarch1/mysql-bin.000004
/n01/mysqlarch1/mysql-bin.000001
/n01/mysqlarch2/mysql-bin.000005
So I want to match based only on mysql-bin.00000X and not the rest of the file path in each file as they differ between file1 and file2.
Here's the command I'm trying to run
cat file1 | egrep -v file2
The output I'm hoping for here would be...
/n01/mysqldata1/mysql-bin.000002
/n01/mysqldata1/mysql-bin.000003
/n01/mysqldata1/mysql-bin.000006
Any help would be much appreciated.
Just compare based on everything after the last /:
$ awk -F/ 'FNR==NR {a[$NF]; next} !($NF in a)' f2 f1
/n01/mysqldata1/mysql-bin.000002
/n01/mysqldata1/mysql-bin.000003
/n01/mysqldata1/mysql-bin.000006
Explanation
This reads file2 in memory and then compares with file1.
-F/ sets the field separator to /.
FNR==NR {a[$NF]; next} while reading the first file (file2), store the last field of every line as a key in the array a[]. Since the field separator is /, this is the mysql-bin.00000X part.
!($NF in a) when reading the second file (file1), check whether the last field (the mysql-bin.00000X part) is a key in the array a[]. If it is not, print the line.
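To see why FNR==NR singles out the first file: NR counts records across all input files while FNR resets to 1 for each new file, so the two are only equal while the first file is being read. Using the two example files from the question (file2 as f2 with 3 lines, file1 as f1 with 6):
$ awk '{ print FILENAME, NR, FNR }' f2 f1
f2 1 1
f2 2 2
f2 3 3
f1 4 1
f1 5 2
f1 6 3
f1 7 4
f1 8 5
f1 9 6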
I'm having one problem that I've noticed when testing. If file2 is
empty, nothing is returned at all, whereas I would expect every line
in file1 to be returned. Is this something you could help me with
please? – user2841861
Then the problem is that when file2 is empty, FNR==NR is also true while reading file1 (both counters start counting together at its first record). To prevent this, cross-check that the "read into the a[] array" action happens only on the first file:
awk -F/ 'FNR==NR && ARGV[1]==FILENAME {a[$NF]; next} !($NF in a)' f2 f1
                 ^^^^^^^^^^^^^^^^^^^^
From man awk:
ARGV
The command-line arguments available to awk programs are stored in an
array called ARGV. ARGC is the number of command-line arguments
present. See section Other Command Line Arguments. Unlike most awk
arrays, ARGV is indexed from zero to ARGC - 1
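A short demonstration of that indexing (output assumes awk is invoked exactly as shown):
$ awk 'BEGIN { for (i = 0; i < ARGC; i++) print i, ARGV[i] }' f2 f1
0 awk
1 f2
2 f1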

Processing CSV items one by one using awk

I am using the following script to access CSV items.
#!/bin/bash
awk -F "|" 'NR > 0 {print $1}' UserAgents.csv
When running the script I get the correct output, i.e. the entire set of values in the first 'column' of the CSV is printed to the terminal. What I would like to add is to read these items one by one, perform some operation on each (like concatenating it with a string), and then output them (to a file, a pipe, or the terminal) one by one.
This should make it clear what your awk script is doing:
awk -F '|' '{
print NR, NF, $1, "with some trailing text"
}' UserAgents.csv
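To actually operate on each item, you can either do the work inside awk or pipe the values to a shell loop; a sketch of both, where the "agent: " prefix is just a made-up example operation:
# Concatenate inside awk, one record at a time:
awk -F '|' '{ print "agent: " $1 }' UserAgents.csv

# Or pipe the items into a shell while-read loop:
awk -F '|' '{ print $1 }' UserAgents.csv |
while IFS= read -r item; do
    printf '%s\n' "agent: $item"
done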
