How does value matching in awk work?

I have got a CSV file with various columns, one of which is a Date column. For a given date, I want to fetch all matching records from the CSV. The reason I can't use grep is that the date might also appear in some other column, and I don't want those matches.
So far this is what I have got. The date starts with the year (e.g. 2012):
sed 's/\"//g' kk.csv | awk -F ',' '{print $4}' '$4 ~ /^2012.*/'
First I remove all of the double quotes, then I specify the separator ',', then I apply the condition to the 4th column of the file, which is the date column. But it is not working, even though I am doing exactly what the book says.
Can anyone point out what I am doing wrong? Is it a quote-related issue?

awk only accepts one script, while you're giving it two ({print $4} and $4 ~ /^2012.*/). Join them into one awk script using awk's condition { commands } syntax:
sed 's/\"//g' kk.csv | awk -F ',' '$4 ~ /^2012.*/ {print $4}'

There is no need to use sed; awk alone is enough to search:
awk -F, '$4 ~ "^\"2012"' kk.csv
This will print every line whose 4th column starts with "2012 (including the leading quote). If you want formatted output (removing the quotes, etc.), awk can do that too.
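For instance, a minimal sketch that strips the quotes inside awk itself, assuming plain quoted fields with no embedded commas:
$ awk -F, '$4 ~ "^\"2012" {gsub(/"/, ""); print $4}' kk.csv
Here gsub(/"/, "") deletes every quote from the record, awk re-splits the modified line on commas, and print $4 then emits the bare date.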

Related

awk: remove multiple tabs between fields and output a line where each field is separated by a single tab

I have a file whose 11th line should in theory have 1011 columns, yet it looks like there is more than one tab between some of its fields. More specifically,
If I use
awk '{print NF}' file
then I can see that the 11th line has the same number of fields as all the rest (except for the first ten lines, which have a different format; that's expected).
But if I use
awk 'BEGIN{FS="\t"} {print NF}' file
I can see that the 11th line has 2001 fields. Based on that, I suspect some of its fields are separated by more than one tab.
I'd like to have each field separated by 1 tab only, so I tried
awk 'BEGIN{OFS="\t"} {print}' file > file.modified
However, this doesn't solve the problem as
awk 'BEGIN{FS="\t"} {print NF}' file.modified
still indicates that the 11th line has 2001 fields.
Can anyone point out a way to achieve my goal? Thanks a lot! I have put the first 100 lines of my file at the following Google Drive link.
https://drive.google.com/file/d/1qOjzjUnJKJpc4VpDxwKPBcqMS7MUuyKy/view?usp=sharing
To squeeze multiple tabs to one tab, you could use tr:
tr -s '\t' <file >file.modified
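A quick check on a made-up line containing runs of tabs:
$ printf 'a\t\t\tb\t\tc\n' | tr -s '\t'
a	b	c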
This might help with GNU awk (FS="\t+" treats each run of tabs as a single separator, and the $1=$1 assignment forces awk to rebuild the record with the one-tab OFS):
awk 'BEGIN{FS="\t+"; OFS="\t"} {$1=$1; print}' file
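Demonstrated on the same made-up line:
$ printf 'a\t\t\tb\t\tc\n' | awk 'BEGIN{FS="\t+"; OFS="\t"} {$1=$1; print}'
a	b	c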
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

Add quotes to strings between commas

I have a string like
1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Now, I am trying to achieve the string below:
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
So, I am trying to wrap everything after the 8th occurrence of a comma (,) from the start and before the 12th occurrence of a comma (,) from the end in quotes.
I tried some awk options but was unable to achieve it. Is there any way to get this done?
Thanks in advance.
Try:
awk -v s1="\"" -F, '{$9=s1 $9;$(NF-12)=$(NF-12) s1} 1' OFS=, Input_file
Here I make a variable s1 which holds ", and set the field separator to comma. Then I prepend s1 to the 9th field, and append s1 to $(NF-12), the 13th field from the end (note there is no hardcoded field count here, so the line may have any number of fields). The trailing 1 prints the line, and OFS (the output field separator) is set to comma as well.
x='1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P'
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' <<< "$x"
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Explanation: FS and OFS are set to comma, since the input stream is CSV. A double quote is stored in a variable named q. Then the values of the desired columns are altered to get the desired result; you can change which columns are altered to get other results.
For files:
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' inputfile
$ awk -v FS=',' -v OFS=',' '{$9="\"" $9;$11=$11"\""; print}' your_file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
This might work for you (GNU sed):
sed 's/,/&"/8;s/,/"&/11' file
Insert " after and before ' eight and eleven.
awk '{sub(/\^A,-C,-T/,"\42^A,-C,-T\42")}1' file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
The fine points here are that the caret must be escaped and that \42 is the octal escape for the double quote.

Get only part of file using sed or awk

I have a file which contains text as follows:
Directory /home/user/ "test_user"
bunch of code
another bunch of code
How can I get from this file only the /home/user/ part?
I've managed to use awk -F '"' 'NR==1{print $1}' file.txt to get rid of the rest of the file, and I'm getting output like this:
Directory /home/user/
How can I change this command to get only /home/user/ part? I'd like to make it as simple as possible. Unfortunately, I can't modify this file to add/change the content.
This should be the fastest, noticeably so if your file is large:
awk '{print $2; exit}' file
It prints the second field of the first line and stops processing the rest of the file.
With awk it should be:
awk 'NR==1{print $2}' file.txt
Setting the field delimiter to " was wrong, since it splits the line into these fields:
$1 = 'Directory /home/user/'
$2 = 'test_user'
$3 = '' (empty)
The default field separator (any run of whitespace) splits it like this:
$1 = 'Directory'
$2 = '/home/user/'
$3 = '"test_user"'
As an alternate, you can use head and cut:
$ head -n 1 file | cut -d' ' -f2
Not sure why you are using -F '"', as that changes the delimiter. If you remove it, then $2 will get you what you want.
awk 'NR==1{print $2}' file.txt
You can also use awk to execute the print when the line contains /home/user instead of counting records:
awk '/\/home\/user\//{print $2}' file.txt
In this case, if the line were buried in the file, or if you had multiple instances, you would get the path for every occurrence, wherever it was.
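Checked against the sample line:
$ echo 'Directory /home/user/ "test_user"' | awk '/\/home\/user\//{print $2}'
/home/user/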
Adding some grep:
grep Directory file.txt|awk '{print $2}'

Add blank column using awk or sed

I have a file with the following structure (comma delimited)
116,1,89458180,17,FFFF,0403254F98
I want to add a blank column as the 4th field, such that the line becomes:
116,1,89458180,,17,FFFF,0403254F98
Any inputs as to how to do this using awk or sed, if possible? Thank you.
Assuming that none of the fields contain embedded commas, you can restate the task as replacing the third comma with two commas. This is just:
sed 's/,/,,/3'
With the example line from the file:
$ echo "116,1,89458180,17,FFFF,0403254F98" | sed 's/,/,,/3'
116,1,89458180,,17,FFFF,0403254F98
You can use this awk:
awk -F, '$4="," $4' OFS=, yourfile
or
awk -F, '$4=FS$4' OFS=, yourfile
If you want to add blank columns before several fields (here before the 1st, 4th and 6th original fields):
awk -F, '{$4=FS$4; $1=FS$1; $6=FS$6}1' OFS=, yourfile
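For example, the second one-liner applied to the sample line:
$ echo '116,1,89458180,17,FFFF,0403254F98' | awk -F, '$4=FS$4' OFS=,
116,1,89458180,,17,FFFF,0403254F98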
Through awk
$ echo '116,1,89458180,17,FFFF,0403254F98' | awk -F, -v OFS="," '{print $1,$2,$3,","$4,$5,$6}'
116,1,89458180,,17,FFFF,0403254F98
It prints an extra , after the third ,-delimited field.
Through GNU sed
$ echo 116,1,89458180,17,FFFF,0403254F98| sed -r 's/^([^,]*,[^,]*,[^,]*)(.*)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
It captures all the characters up to (but not including) the third comma into one group. The characters from the third , up to the end of the line are stored in another group. In the replacement part, we just add a , between these two captured groups.
Through basic sed:
$ echo 116,1,89458180,17,FFFF,0403254F98| sed 's/^\([^,]*,[^,]*,[^,]*\)\(.*\)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
echo 116,1,89458180,17,FFFF,0403254F98|awk -F',' '{print $1","$2","$3",,"$4","$5","$6}'
Non-awk
t="116,1,89458180,17,FFFF,0403254F98"
echo "$(echo "$t" | cut -d, -f1-3)",,"$(echo "$t" | cut -d, -f4-)"
You can use the awk command below to achieve that. Replace the $3 with whichever field the blank column should be inserted in front of ($4 for the example in the question):
awk -F, '{$3="" FS $3;}1' OFS=, filename
sed -e 's/\([^,]*,\)\{3\}/&,/' YourFile
This matches the sequence of three [non-comma content followed by a comma] groups and replaces it with itself followed by an extra comma.
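Checked against the sample line:
$ echo '116,1,89458180,17,FFFF,0403254F98' | sed 's/\([^,]*,\)\{3\}/&,/'
116,1,89458180,,17,FFFF,0403254F98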

Unix cut: Print same Field twice

Say I have a file, a.csv:
ram,33,professional,doc
shaym,23,salaried,eng
Now I need this output (please don't ask me why):
ram,doc,doc,
shayam,eng,eng,
I am using the cut command:
cut -d',' -f1,4,4 a.csv
But the output remains
ram,doc
shyam,eng
It seems that cut can print a given field only once. I need to print the same field twice, or n times.
Why do I need this? (Optional to read.)
Ah. It's a long story. I have a file like this:
#,#,-,-
#,#,#,#,#,#,#,-
#,#,#,-
I have to convert this to:
#,#,-,-,-,-,-
#,#,#,#,#,#,#,-
#,#,#,-,-,-,-
Here each '#' and '-' refers to different numerical data. Thanks.
You can't print the same field twice. cut prints a selection of fields (or characters or bytes) in order. See Combining 2 different cut outputs in a single command? and Reorder fields/characters with cut command for some very similar requests.
The right tool to use here is awk, if your CSV doesn't have quotes around fields.
awk -F , -v OFS=, '{print $1, $4, $4}'
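If you also need the trailing comma shown in the desired output, you can concatenate one onto the last argument; a small sketch against the question's a.csv:
$ awk -F , -v OFS=, '{print $1, $4, $4 ","}' a.csv
ram,doc,doc,
shaym,eng,eng,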
If you don't want to use awk (why? what strange system has cut and sed but no awk?), you can use sed (still assuming that your CSV doesn't have quotes around fields). Match the first four comma-separated fields and select the ones you want in the order you want.
sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
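Checked against the first sample line (note there is no trailing comma with this version):
$ echo 'ram,33,professional,doc' | sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
ram,doc,doc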
$ sed 's/,.*,/,/; s/\(,.*\)/\1\1,/' a.csv
ram,doc,doc,
shaym,eng,eng,
What this does:
Replace everything between the first and last comma with just a comma
Repeat the last ",something" part and tack on a comma. Voilà!
Assumptions made:
You want the first field, then twice the last field
No escaped commas within the first and last fields
Why do you need exactly this output? :-)
Using perl:
perl -F, -ane 'chomp($F[3]);$a=$F[0].",".$F[3].",".$F[3];print $a."\n"' your_file
Using sed:
sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g' your_file
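The sed version on the first sample line:
$ echo 'ram,33,professional,doc' | sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g'
ram,doc,doc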
As others have noted, cut doesn't support field repetition.
You can combine cut and sed, for example if the repeated element is at the end (& in the replacement stands for the whole matched text, so && writes the matched ,doc twice before the final ,):
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/&&,/'
Output:
ram,doc,doc,
shaym,eng,eng,
Edit
To make the repetition count variable, you could do something like this (assuming you have coreutils available; seq emits n lines, the sed turns each line into a literal &, and tr joins them, so $rep expands to n copies of & and the final sed repeats the match n times):
n=10
rep=$(seq $n | sed 's:.*:\&:' | tr -d '\n')
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/'"$rep"',/'
Output:
ram,doc,doc,doc,doc,doc,doc,doc,doc,doc,doc,
shaym,eng,eng,eng,eng,eng,eng,eng,eng,eng,eng,
I had the same problem, but instead of listing all the columns in awk, I just used this (to duplicate the 2nd column):
awk -v OFS='\t' '$2=$2"\t"$2' # for tab-delimited files
For CSVs you can just use
awk -F , -v OFS=, '$2=$2","$2'
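A quick sanity check on a made-up line:
$ echo 'a,b,c' | awk -F , -v OFS=, '$2=$2","$2'
a,b,b,c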
