Remove comma using awk with multiple records - shell

Let's say I've got records like this.
Input
1,1,1,1.213,1,1,1.23
2,2,2,2.345,2,2,2.33
3,3,3,3.456,3,3,3.44
I want to be like this
Output
1,1,1,1,1,1,1.23
2,2,2,2,2,2,2.33
3,3,3,3,3,3,3.44
How do I remove the decimal part only in the 4th column? I don't want to change the last column.

You can use:
awk -F"," '{print $1,$2,$3,int($4),$5,$6,$7}'
The int() function is what you are looking for, I guess.
Example:
$ cat test
1,1,1,1.213,1,1,1.23
2,2,2,2.345,2,2,2.33
3,3,3,3.456,3,3,3.44
$ awk -F"," '{print $1,$2,$3,int($4),$5,$6,$7}' test
1 1 1 1 1 1 1.23
2 2 2 2 2 2 2.33
3 3 3 3 3 3 3.44
Edit (Good suggestion from ccf):
You could use this instead of the longer awk command above.
$ awk -F',' '{$4=int($4); print}'
1,1,1,1.213,1,1,1.23
1 1 1 1 1 1 1.23
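If the output should keep its commas, as in the desired Output at the top, set OFS to a comma as well; a minimal sketch (sample data piped in instead of read from a file):

```shell
# Setting both FS and OFS to "," makes awk rebuild the record
# with commas when $4 is reassigned
result=$(printf '1,1,1,1.213,1,1,1.23\n2,2,2,2.345,2,2,2.33\n' |
  awk 'BEGIN{FS=OFS=","} {$4=int($4)} 1')
echo "$result"
```

The trailing 1 is the always-true pattern whose default action is to print the rebuilt record.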

If temp.txt has the input, then
$ cat temp.txt | sed 's/\.[0-9]\+//1'
1,1,1,1,1,1,1.23
2,2,2,2,2,2,2.33
3,3,3,3,3,3,3.44
The 1 at the end means: replace only the first match. (That is also sed's default behaviour, so the flag can be omitted.)
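Since \+ is a GNU extension, a portable sketch uses [0-9][0-9]* and relies on sed's default first-match behaviour (sample data piped in for illustration):

```shell
# sed replaces only the first match per line by default,
# so the ".213"-style decimal part in column 4 is removed
result=$(printf '1,1,1,1.213,1,1,1.23\n2,2,2,2.345,2,2,2.33\n' |
  sed 's/\.[0-9][0-9]*//')
echo "$result"
```

Note this relies on column 4 containing the first decimal point on the line, as in the sample data.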

Bash - Compare rows then print just original rows

I've got files which look like this (there can be more columns or rows):
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
dif-2-3-4-5.com 1 1 1
And I want to compare these numbers:
1 1 1
1 1 2
1 2 1
2 1 1
1 1 1
And print only those rows which do not repeat, so I get this:
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
Another simple approach is sort with uniq, using a KEYDEF for fields 2-4 in sort and skipping field 1 in uniq, e.g.
$ sort file.txt -k 2,4 | uniq -f1
Example Use/Output
$ sort file.txt -k 2,4 | uniq -f1
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
Keep a running record of the triples already seen and only print the first time they appear:
$ awk '!(($2,$3,$4) in seen) {print; seen[$2,$3,$4]}' file
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
Try the following awk code too:
awk '!a[$2,$3,$4]++' Input_file
Explanation:
Create an array named a whose indexes are $2,$3,$4. The condition !a[$2,$3,$4] is true only when the line's $2,$3,$4 have NOT been seen before in array a; when it is true, awk does two things:
It increments that index's value, so the condition will NOT be true the next time the same $2,$3,$4 appear.
Since no action is specified (awk works in condition-then-action mode), the default action is to print the current line. This goes on for every line of Input_file; the last line is not printed because its $2,$3,$4 are already present in array a.
I hope this helps.
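That one-liner is easy to verify with a here-document holding the sample data from the question:

```shell
# !a[$2,$3,$4]++ is true only the first time a given $2,$3,$4
# triple is seen, so later duplicates are suppressed
result=$(awk '!a[$2,$3,$4]++' <<'EOF'
dif-1-2-3-4.com 1 1 1
dif-1-2-3-5.com 1 1 2
dif-1-2-4-5.com 1 2 1
dif-1-3-4-5.com 2 1 1
dif-2-3-4-5.com 1 1 1
EOF
)
echo "$result"
```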
This works with POSIX and gnu awk:
$ awk '{s=""
for (i=2;i<=NF; i++)
s=s $i "|"}
s in seen { next }
++seen[s]' file
Which can be shortened to:
$ awk '{s=""; for (i=2;i<=NF; i++) s=s $i "|"} !seen[s]++' file
Also supports a variable number of columns.
If you want a sort/uniq solution that also respects file order (i.e. the first of a set of duplicates is printed, not a later one), you need a decorate, sort, undecorate approach.
You can:
use cat -n to decorate the file with line numbers;
sort -k3 -k1n to sort first on all the fields from the 3rd through the end of the line, then numerically on the added line number;
uniq -f2 to skip the line number and the first field, so only the first of each group of dups is kept;
sort -k1n to put the survivors back in input order;
finally sed -e 's/^[[:space:]]*[0-9]*[[:space:]]*//' to remove the added line numbers:
cat -n file | sort -k3 -k1n | uniq -f2 | sort -k1n | sed -e 's/^[[:space:]]*[0-9]*[[:space:]]*//'
Awk is easier and faster in this case.
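For completeness, the pipeline can be exercised end to end. Note this sketch skips two fields in uniq (the added line number plus the name field, so the three numbers are compared) and re-sorts on the line number afterwards to restore input order:

```shell
# decorate (cat -n) -> sort by data then by line number ->
# drop later duplicates (uniq -f2 skips line number + name) ->
# restore input order (sort -k1n) -> undecorate (sed)
result=$(printf '%s\n' \
  'dif-1-2-3-4.com 1 1 1' \
  'dif-1-2-3-5.com 1 1 2' \
  'dif-1-2-4-5.com 1 2 1' \
  'dif-1-3-4-5.com 2 1 1' \
  'dif-2-3-4-5.com 1 1 1' |
  cat -n | sort -k3 -k1n | uniq -f2 | sort -k1n |
  sed 's/^[[:space:]]*[0-9]*[[:space:]]*//')
echo "$result"
```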

Variable in commands in bash

I wrote a program that should print the words from example.txt from the longest to the shortest. I don't know how exactly '^.{$v}$' should look to make it work.
#!/bin/bash
v=30
while [ $v -gt 0 ] ; do
grep -P '^.{$v}$' example.txt
v=$(($v - 1))
done
I tried:
${v}
$v
"$v"
It is my first question, sorry for any mistakes :)
What you're doing is not how you'd approach this problem in shell. Read why-is-using-a-shell-loop-to-process-text-considered-bad-practice to learn some of the issues. Then, this is how you'd really do what you're trying to do in a shell script:
$ cat file
now
is
the
winter
of
our
discontent
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n | cut -f3-
discontent
winter
now
the
our
is
of
To understand what that's doing, look at the awk output:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file
3 1 now
2 2 is
3 3 the
6 4 winter
2 5 of
3 6 our
10 7 discontent
The first number is the length of each line and the second number is the order the lines appeared in the input file so when we come to sort it:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n
10 7 discontent
6 4 winter
3 1 now
3 3 the
3 6 our
2 2 is
2 5 of
we can sort by length (longest first) with -k1rn but retain the order from the input file for lines that are the same length by adding -k2n. Then the cut just removes the 2 leading numbers that awk added for sort to use.
Use double quotes so the shell expands $v inside the pattern:
grep -P "^.{$v}$" example.txt
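With that quoting fix, the whole loop runs (slowly, as noted above). A self-contained sketch, using a variable in place of example.txt and -E instead of -P since the {n} repetition count needs no PCRE:

```shell
# Longest-to-shortest: try every length from 30 down to 1 and
# print the lines of exactly that length; "$v" expands because
# the regex is in double quotes
words='now
is
winter
discontent'
result=$(
  v=30
  while [ "$v" -gt 0 ]; do
    printf '%s\n' "$words" | grep -E "^.{$v}$"
    v=$((v - 1))
  done
)
echo "$result"
```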

awk space delimiter with empty content

I have a text file which is delimited by spaces:
1 dsfsdf 2
2  3
4 sdfsdf 4
5 sdfsdf 5
When I run
awk -F' ' '{s+=$3} END {print s}' test
It returns 11, but it should return 14. I believe awk gets confused by the second line, where there is nothing between the two spaces. How should I modify my command?
Thanks
Try:
awk -F' {1}' '{s+=$3} END {print s}' test
you get
14
Note
if test file contains
1 dsfsdf 2 1
2 3 1
4 sdfsdf 4 1
5 sdfsdf 5 1
it also works; I use gnu-awk.
Edit
As @EdMorton and @"(9 )*" say, it is better to use a literal space class [ ]:
awk -F'[ ]' '{s+=$3} END {print s}' test
This should work too, if only the second column has missing values:
awk '{s+=$(NF-1)} END{print s}'
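The difference between the two separators is easy to check: the default FS collapses runs of blanks, so the empty middle column vanishes, while -F'[ ]' treats every single space as a delimiter and keeps the empty field. A sketch with the sample data (note the two spaces in the second line):

```shell
sample='1 dsfsdf 2
2  3
4 sdfsdf 4
5 sdfsdf 5'
# Default FS: "2  3" has only two fields, $3 is empty -> sum 11
bad=$(printf '%s\n' "$sample" | awk '{s+=$3} END {print s}')
# Literal-space FS: "2  3" has an empty $2 and $3 is 3 -> sum 14
good=$(printf '%s\n' "$sample" | awk -F'[ ]' '{s+=$3} END {print s}')
echo "$bad $good"
```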

AWK doesn't recognise more than one field when changing value of an element

I have a csv file, a simplified version of which is:
#data
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
1,2,3,4,normal
When I do:
awk -F',' '{print NF}' myfile.csv
I get:
1
5
5
5
5
5
5
5
5
5
5
I am trying to change the 5th element of the 10th line in this dataset, but I noticed a strange behavior upon doing so. More specifically, when I run:
awk -F',' 'NR==10{$5="abnormal"}1' myfile.csv | awk -F',' '{print NF}'
I get:
1
5
5
5
5
5
5
5
5
1
5
Does anyone has an explanation or any thought on this?
Thanks to @EdMorton for the valuable comment: assigning a value to any field causes the record to be recompiled using the OFS value, which by default is a space.
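The recompilation is easy to see in isolation: assigning any field, even to itself, makes awk rejoin the record with OFS:

```shell
# $1=$1 forces the record to be rebuilt; without OFS=',' the
# rebuilt record uses the default OFS, a single space
before=$(echo 'a,b,c' | awk -F',' '{$1=$1} 1')
after=$(echo 'a,b,c' | awk -F',' -v OFS=',' '{$1=$1} 1')
echo "$before / $after"
```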
Updating as per the updated question: @drDoom, you are missing OFS=','.
See the difference in the two outputs below with your sample data:
/home/amit/$ awk -F',' 'NR==10{$5="abnormal"}1' OFS=',' myfile.csv | awk -F',' '{print NF}'
1
5
5
5
5
5
5
5
5
5
5
/home/amit/$ awk -F',' 'NR==10{$5="abnormal"}1' myfile.csv | awk -F',' '{print NF}'
1
5
5
5
5
5
5
5
5
1
5
For changing the 150th field on the 100th line, you can do as below:
awk -F',' 'NR==100{ $150 = "NewValue"}1' OFS=',' myfile.csv
Any or all of these are the issue:
a) Your csv file was created on Windows and so has extraneous control-M (carriage return) characters in it.
b) Your separator is not a comma [on every line].
c) You are miscounting which line is the 100th one.
Do this and update your question with the output:
dos2unix file
awk -F',' -v OFS=':' 'NR>98 && NR<102{print NR, NF, $1, $0}' file
Note that I said update your question with the output - do NOT post the output as a comment, as we will not be able to see the format.

Computing differences between columns of tab delimited file

I have a tab delimited file of 4 columns and n number of rows.
I want to find the difference in values present in column 3 and 2 and want to store them in another file.
This is what I am doing
cat filename | awk '{print $3 - $2}'>difference
and it is not working. How can I improve the code?
Solution:
I was missing the closing single quote, and my eyes were so tuned to the screen that I couldn't spot what was going wrong in 35 lines of code... and out of frustration I wrote the question on the forum... and, to complete the comedy of errors, the syntax I wrote here in the question is correct (as it contains both single quotes).
Thank you all for your help.
Set the field separator if you have other whitespace in the lines.
BEGIN {
FS="\t"
}
Try using -F to force the delimiter to a tab:
cat filename | awk -F"\t" '{print $3 - $2}' > difference
Does anyone test before they give their answers? awk splits on whitespace, not just spaces.
I just did this:
awk '{print $3 - $2}' temp.txt
And it works perfectly.
Here's my file:
1 2 7 4
11 12 13 14
1 12 3 4
1 2 3 4
1 2 3 4
And here's my results:
$ awk '{print $3 - $2}' temp.txt
5
1
-9
1
1
$
In fact, I used your exact command and got the same results.
Can you explain what's not working for you? What data are you using, and what results are you getting?
Try this:
cat filename | awk -F'^T' '{print $3 - $2}' > difference
where ^T is a literal tab character (insert it by pressing Ctrl+V and then Tab)
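Either tab-aware variant can be checked without typing literal tabs, since printf '\t' produces them:

```shell
# Two tab-delimited rows; $3 - $2 gives 7-2=5 and 13-12=1
result=$(printf '1\t2\t7\t4\n11\t12\t13\t14\n' |
  awk -F'\t' '{print $3 - $2}')
echo "$result"
```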