I have a file whose 11th line should in theory have 1011 columns, yet it looks like there is more than one tab between some of its fields. More specifically,
If I use
awk '{print NF}' file
then I can see that the 11th line has the same number of fields as all the rest (except for the first ten lines, which have a different format; that's expected).
But if I use
awk 'BEGIN{FS="\t"} {print NF}' file
I can see that the 11th line has 2001 fields. Based on that, I suspect some of its fields are separated by more than one whitespace character.
I'd like to have each field separated by 1 tab only, so I tried
awk 'BEGIN{OFS="\t"} {print}' file > file.modified
However, this doesn't solve the problem as
awk 'BEGIN{FS="\t"} {print NF}' file.modified
still indicates that the 11th line has 2001 fields.
Can anyone point out a way to achieve my goal? Thanks a lot! I have put the first 100 lines of my file in the following Google Drive link:
https://drive.google.com/file/d/1qOjzjUnJKJpc4VpDxwKPBcqMS7MUuyKy/view?usp=sharing
To squeeze multiple tabs to one tab, you could use tr:
tr -s '\t' <file >file.modified
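For example, squeezing a fabricated line with doubled and tripled tabs (a minimal check):

```shell
# tr -s replaces each run of repeated tabs with a single tab
printf 'a\t\tb\t\t\tc\n' | tr -s '\t'
# prints a, b and c separated by one tab each
```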
This might help with GNU awk:
awk 'BEGIN{FS="\t+"; OFS="\t"} {$1=$1; print}' file
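A quick way to verify: FS="\t+" treats any run of tabs as one separator, and the $1=$1 assignment forces awk to rebuild the record using OFS.

```shell
# A line with one, two and three tabs between fields; after the rebuild
# every separator is a single tab and NF counts 4 real fields.
printf 'a\tb\t\tc\t\t\td\n' |
  awk 'BEGIN{FS="\t+"; OFS="\t"} {$1=$1; print NF, $0}'
# the first output field is 4: the repeated tabs no longer create empty fields
```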
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
I have a string like
1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Now, I am trying to achieve below string
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
So, I am trying to wrap everything after the 8th occurrence of comma (,) from the start and before the 12th occurrence of comma (,) from the end in quotes.
I tried some options of awk but was unable to achieve it. Any way to get this done?
Thanks in advance.
try:
awk -v s1="\"" -F, '{$9=s1 $9;$(NF-12)=$(NF-12) s1} 1' OFS=, Input_file
So here I am making a variable s1 which holds a double quote (") and setting the field separator to a comma. Then I re-create the 9th field by prepending s1 to $9, and re-create the 13th field from the last (point to be noted: no hardcoding of the field number here, so the line may have any number of fields) by appending s1's value to its current value. The trailing 1 prints the line. OFS (the output field separator) is set to a comma too.
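For example, feeding the sample string on stdin instead of an Input_file:

```shell
# $9 gets a leading quote; $(NF-12) is the 11th of 23 fields here,
# which gets the closing quote.
echo '1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P' |
  awk -v s1='"' -F, '{$9=s1 $9; $(NF-12)=$(NF-12) s1} 1' OFS=,
# 1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
```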
x='1,2,A,N,53,3,R,R,^A,-C,-T,2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P'
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' <<< "$x"
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
Explanation: here FS and OFS are set to a comma, as the input stream is CSV. A double quote is stored in a variable named q. Then the values of the desired columns are altered to get the desired result. You can change the column values to get other results.
For files:
awk -F, -v OFS=, -v q='"' '{$9=q $9;$11=$11 q}1' inputfile
$ awk -v FS=',' -v OFS=',' '{$9="\"" $9;$11=$11"\""; print}' your_file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
This might work for you (GNU sed):
sed 's/,/&"/8;s/,/"&/11' file
Insert " after and before ' eight and eleven.
awk '{sub(/\^A,-C,-T/,"\42^A,-C,-T\42")}1' file
1,2,A,N,53,3,R,R,"^A,-C,-T",2,S,9,l,8,8Z,sl,138Z,l,Y,75680,P
The fine point here is to escape the caret (a regex metacharacter); \42 is the octal escape for a double quote.
I have a file which contains text as follows:
Directory /home/user/ "test_user"
bunch of code
another bunch of code
How can I get from this file only the /home/user/ part?
I've managed to use awk -F '"' 'NR==1{print $1}' file.txt to get rid of the rest of the file, and I'm getting output like this:
Directory /home/user/
How can I change this command to get only /home/user/ part? I'd like to make it as simple as possible. Unfortunately, I can't modify this file to add/change the content.
This should be the fastest, which is noticeable if your file is large:
awk '{print $2; exit}' file
It will print the second field of the first line and stop processing the rest of the file.
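For example, with the question's sample on stdin:

```shell
# Only the first line is read; awk exits before touching the rest.
printf 'Directory /home/user/ "test_user"\nbunch of code\n' |
  awk '{print $2; exit}'
# /home/user/
```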
With awk it should be:
awk 'NR==1{print $2}' file.txt
Setting the field delimiter to " was wrong, since it splits the line into these fields:
$1 = 'Directory /home/user/'
$2 = 'test_user'
$3 = '' (empty)
The default field separator, which is [[:space:]]+, splits like this:
$1 = 'Directory'
$2 = '/home/user/'
$3 = '"test_user"'
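The two splits are easy to compare side by side:

```shell
line='Directory /home/user/ "test_user"'
echo "$line" | awk -F'"' '{print $2}'   # test_user (split on double quotes)
echo "$line" | awk '{print $2}'         # /home/user/ (default whitespace split)
```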
As an alternate, you can use head and cut:
$ head -n 1 file | cut -d' ' -f2
Not sure why you are using -F'"', as that changes the delimiter. If you remove that, then $2 will get you what you want.
awk 'NR==1{print $2}' file.txt
You can also use awk to execute the print when the line contains /home/user instead of counting records:
awk '/\/home\/user\//{print $2}' file.txt
In this case, if the line were buried in the file, or if you had multiple instances, you would get the name for every occurrence wherever it was.
Adding some grep:
grep Directory file.txt | awk '{print $2}'
I have a file with the following structure (comma delimited)
116,1,89458180,17,FFFF,0403254F98
I want to add a blank column on the 4th field such that it becomes
116,1,89458180,,17,FFFF,0403254F98
Any inputs as to how to do this using awk or sed, if possible?
Thank you.
Assuming that none of the fields contain embedded commas, you can restate the task as replacing the third comma with two commas. This is just:
sed 's/,/,,/3'
With the example line from the file:
$ echo "116,1,89458180,17,FFFF,0403254F98" | sed 's/,/,,/3'
116,1,89458180,,17,FFFF,0403254F98
You can use this awk,
awk -F, '$4="," $4' OFS=, yourfile
(OR)
awk -F, '$4=FS$4' OFS=, yourfile
If you want to add blank fields in several places at once (here before the original 1st, 4th and 6th fields):
awk -F, '{$4=FS$4; $1=FS$1; $6=FS$6}1' OFS=, yourfile
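On the sample line this inserts a blank before the original 1st, 4th and 6th fields (note the leading comma in the result):

```shell
# Each assignment prepends FS (a comma) to the field, so an empty
# field appears just before it when the record is rebuilt.
echo '116,1,89458180,17,FFFF,0403254F98' |
  awk -F, '{$4=FS$4; $1=FS$1; $6=FS$6}1' OFS=,
# ,116,1,89458180,,17,FFFF,,0403254F98
```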
Through awk
$ echo '116,1,89458180,17,FFFF,0403254F98' | awk -F, -v OFS="," '{print $1,$2,$3,","$4,$5,$6}'
116,1,89458180,,17,FFFF,0403254F98
It prints an extra , after the third ,-delimited field.
Through GNU sed
$ echo 116,1,89458180,17,FFFF,0403254F98| sed -r 's/^([^,]*,[^,]*,[^,]*)(.*)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
It captures all the characters up to (but not including) the third comma and stores them in a group. The characters from the third , up to the end are stored in another group. In the replacement part, we just add a , between these two captured groups.
Through Basic sed
$ echo 116,1,89458180,17,FFFF,0403254F98| sed 's/^\([^,]*,[^,]*,[^,]*\)\(.*\)$/\1,\2/'
116,1,89458180,,17,FFFF,0403254F98
echo 116,1,89458180,17,FFFF,0403254F98|awk -F',' '{print $1","$2","$3",,"$4","$5","$6}'
Non-awk
t="116,1,89458180,17,FFFF,0403254F98"
echo $(echo $t|cut -d, -f1-3),,$(echo $t|cut -d, -f4-)
You can use the awk command below to achieve that. Replace the $3 with whatever column you want to make blank:
awk -F, '{$3="" FS $3;}1' OFS=, filename
sed -e 's/\([^,]*,\)\{3\}/&,/' YourFile
replace the sequence of 3 [content (non-comma) then comma] by itself followed by a comma (with \{4\} the blank would land in the 5th column, not the 4th)
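The repetition count is the fine point: three groups of "non-comma content then comma" land the blank in the 4th column, as a quick check shows:

```shell
# & in the replacement is the whole match ("116,1,89458180,"),
# so appending a comma to it creates the empty 4th field.
echo '116,1,89458180,17,FFFF,0403254F98' |
  sed -e 's/\([^,]*,\)\{3\}/&,/'
# 116,1,89458180,,17,FFFF,0403254F98
```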
Say I have file - a.csv
ram,33,professional,doc
shaym,23,salaried,eng
Now I need this output (please don't ask me why):
ram,doc,doc,
shayam,eng,eng,
I am using cut command
cut -d',' -f1,4,4 a.csv
But the output remains
ram,doc
shyam,eng
That means cut can only print a field once. I need to print the same field twice or n times.
Why do I need this? (Optional to read)
Ah. It's a long story. I have a file like this
#,#,-,-
#,#,#,#,#,#,#,-
#,#,#,-
I have to convert this to
#,#,-,-,-,-,-
#,#,#,#,#,#,#,-
#,#,#,-,-,-,-
Here each '#' and '-' refers to different numerical data. Thanks.
You can't print the same field twice. cut prints a selection of fields (or characters or bytes) in order. See Combining 2 different cut outputs in a single command? and Reorder fields/characters with cut command for some very similar requests.
The right tool to use here is awk, if your CSV doesn't have quotes around fields.
awk -F , -v OFS=, '{print $1, $4, $4}'
If you don't want to use awk (why? what strange system has cut and sed but no awk?), you can use sed (still assuming that your CSV doesn't have quotes around fields). Match the first four comma-separated fields and select the ones you want in the order you want.
sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
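Both commands can be checked against the first sample line (unlike the output shown in the question, neither adds a trailing comma):

```shell
s='ram,33,professional,doc'
echo "$s" | awk -F , -v OFS=, '{print $1, $4, $4}'
# ram,doc,doc
echo "$s" | sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
# ram,doc,doc
```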
$ sed 's/,.*,/,/; s/\(,.*\)/\1\1,/' a.csv
ram,doc,doc,
shaym,eng,eng,
What this does:
Replace everything between the first and last comma with just a comma
Repeat the last ",something" part and tack on a comma. Voilà!
Assumptions made:
You want the first field, then twice the last field
No escaped commas within the first and last fields
Why do you need exactly this output? :-)
using perl:
perl -F, -ane 'chomp($F[3]);$a=$F[0].",".$F[3].",".$F[3];print $a."\n"' your_file
using sed:
sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g' your_file
As others have noted, cut doesn't support field repetition.
You can combine cut and sed, for example if the repeated element is at the end:
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/&&,/'
Output:
ram,doc,doc,
shaym,eng,eng,
Edit
To make the repetition variable, you could do something like this (assuming you have coreutils available):
n=10
rep=$(seq $n | sed 's:.*:\&:' | tr -d '\n')
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/'"$rep"',/'
Output:
ram,doc,doc,doc,doc,doc,doc,doc,doc,doc,doc,
shaym,eng,eng,eng,eng,eng,eng,eng,eng,eng,eng,
I had the same problem, but instead of adding all the columns to awk, I just used (to duplicate the 2nd column):
awk -v OFS='\t' '$2=$2"\t"$2' # for tab-delimited files
For CSVs you can just use
awk -F , -v OFS=, '$2=$2","$2'
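A quick check of the CSV variant; the pattern is an assignment whose value always contains a comma, so it is a non-empty string, the pattern is true, and the default print action fires:

```shell
# $2 is rewritten as "b,b" and the record is rebuilt with OFS
echo 'a,b,c' | awk -F , -v OFS=, '$2=$2","$2'
# a,b,b,c
```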