Bash conversion file - bash

I have a .csv file and I want to convert it to a .txt file using a bash script.
The input looks like an ordinary .csv file.
I want to obtain a .txt file in this format:
velocity
List<vector>
300 // number of points
(
(U0 U1 U2)
(U0 U1 U2)
...
...
...
)
Many thanks for your help

This is a little ugly, but it does the trick I think:
wc -l < Inlet.csv | cat - Inlet.csv | awk -F, 'BEGIN{printf "velocity\nList<vector>\n"} NR==1{printf "%s\n(\n",$1} NR>2{print "("$1" "$2" "$3")"}END{print ")"}'
This does the following:
1. Gets the line count from wc and pipes it to
2. cat, which concatenates the count with the file, count first, and pipes that to
3. awk, which splits fields on commas (-F,)
4. Prints the boilerplate at the top of your file: BEGIN{printf "velocity\nList<vector>\n"}
5. On the first record (the count, NR==1), prints it followed by a newline, an opening parenthesis, and another newline: {printf "%s\n(\n",$1}
6. Once we are past the header (record number greater than 2, NR>2), prints the first three fields, separated by spaces and wrapped in parentheses: {print "("$1" "$2" "$3")"}
7. Finally, at the end of processing, prints a closing parenthesis to match the one opened at record 1 in step 5: END{print ")"}
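For illustration, suppose Inlet.csv contains a header line plus three data rows (a made-up sample; the real file layout isn't shown in the question):
ux,uy,uz
0.1,0.2,0.3
1.1,1.2,1.3
2.1,2.2,2.3
Running the pipeline on that file should print:
velocity
List<vector>
4
(
(0.1 0.2 0.3)
(1.1 1.2 1.3)
(2.1 2.2 2.3)
)
Note that the count comes straight from wc -l, so it includes the header line; if you want the number of data points only, use $1-1 in the NR==1 printf instead.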

Related

Searching for a string between two characters

I need to find two numbers in lines which look like this:
>Chr14:453901-458800
I have a large quantity of those lines mixed with lines that don't contain ":", so we can search for the colon to find the lines with numbers. Every line has different numbers.
I need to find the two numbers after ":", which are separated by "-", then subtract the first number from the second one and print the result on the screen for each line.
I'd like this to be done using awk
I managed to do something like this:
awk -e '$1 ~ /\:/ {print $0}' file.txt
but it's nowhere near the end result
For the example I showed above, my result would be:
4899
Because it is the result of 458800 - 453901 = 4899
I can't figure it out on my own and would appreciate some help
With GNU awk: separate the row into multiple columns using the : and - separators. In each row containing :, subtract the contents of column 2 from the contents of column 3 and print the result.
awk -F '[:-]' '/:/{print $3-$2}' file
Output:
4899
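With more than one input line (the second coordinate pair is made up for illustration), only the lines containing a colon produce output:
>Chr14:453901-458800
a line without a colon
>Chr2:100-250
gives:
4899
150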
Using awk
$ awk -F: '/:/ {split($2,a,"-"); print a[2] - a[1]}' input_file
4899

Padding columns of csv

I have a csv file which contains a large number of comma-separated lines of data. I want to find the maximum line length, pad the shorter lines with commas, and then print NO in a new final column.
file.csv
1,2,3,4,
1,4,7,8,9,10,11,13
1,2,
1,1,2,4,5,6,7,8,9,10,11
abc,def,ghi,jkl
expected result
1,2,3,4,,,,,,,,,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,,NO
1,2,,,,,,,,,,,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,1,NO
abc,def,ghi,jkl,,,,,,,,NO
cat file | cat > file.csv
echo "N0" >> file.csv
output obtained
1,2,3,4,NO
1,4,7,8,9,10,11,13,NO
1,2,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,NO
You need to read the file twice, once to get the maximum number of columns, once to print the output:
awk -F, 'NR==FNR{if(m<=NF)m=NF;next} # Runs only on first iteration
{printf "%s",$0;for(i=0;i<=(m-NF);i++)printf ",";print "NO"}' file file
filename twice -----^
Output (12 columns in each row):
1,2,3,4,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,NO
1,2,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,,,,,,,,NO
It's hard to imagine why you'd want to pad the lines with commas to the same character length, so here's what I think you really want, which is to make every line have the same number of fields:
$ awk 'BEGIN{FS=OFS=","} NR==FNR{m=(m>NF?m:NF);next} {$(m+1)="NO"} 1' file file
1,2,3,4,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,NO
1,2,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,,,,,,,,NO
and here's what you said you want anyway:
$ awk '{n=length()} NR==FNR{m=(m>n?m:n);next} {p=sprintf("%*s",m-n+1,""); gsub(/ /,",",p); $0=$0 p "NO"} 1' file file
1,2,3,4,,,,,,,,,,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,,,NO
1,2,,,,,,,,,,,,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,,,,,,,,,NO
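The sprintf("%*s",m-n+1,"") call builds a string of m-n+1 spaces, and gsub then turns every space into a comma; a tiny standalone demonstration of that padding trick:
awk 'BEGIN{p=sprintf("%*s",5,""); gsub(/ /,",",p); print "[" p "]"}'
[,,,,,]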
awk -F, 'BEGIN{m=0}
{if(NF>m)m=NF;ar[NR]=$0;ars[NR]=NF;}
END{for(i=1;i<=NR;i++)
{for(j=ars[i];j<m;j++){ar[i]=ar[i]","}ar[i]=ar[i]"NO";
print ar[i]}}' <<<'1,2,3,4,
1,4,7,8,9,10,11,13
1,2,
1,1,2,4,5,6,7,8,9,10,11,12
abc,def,ghi,jkl
a,b'
output:
1,2,3,4,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,NO
1,2,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,12NO
abc,def,ghi,jkl,,,,,,,,NO
a,b,,,,,,,,,,NO
If the lines must all have the same size (character length):
awk -F, 'BEGIN{m=0}
{if(length($0)>m)m=length($0);ar[NR]=$0;ars[NR]=length($0);}
END{for(i=1;i<=NR;i++)
{for(j=ars[i];j<m;j++)
{ar[i]=ar[i]","}ar[i]=ar[i]"NO";
print ar[i]}}' <<<'1,2,3,4,
1,4,7,8,9,10,11,13
1,2,
1,1,2,4,5,6,7,8,9,10,11,12
abc,def,ghi,jkl
a,b'
output:
1,2,3,4,,,,,,,,,,,,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,,,,,NO
1,2,,,,,,,,,,,,,,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,12NO
abc,def,ghi,jkl,,,,,,,,,,,NO
a,b,,,,,,,,,,,,,,,,,,,,,,,NO
If you also want a comma after the max-length line, run the for loop until m+1, as in the sketch below.
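That is, in the length-based version above, change the inner loop bound from j<m to j<m+1 so that even the longest line gets one separating comma before NO (an untested sketch; file.csv stands in for your input):
awk -F, 'BEGIN{m=0}
{if(length($0)>m)m=length($0);ar[NR]=$0;ars[NR]=length($0);}
END{for(i=1;i<=NR;i++)
{for(j=ars[i];j<m+1;j++)
{ar[i]=ar[i]","}ar[i]=ar[i]"NO";
print ar[i]}}' file.csv
With that change, the fourth sample line comes out as 1,1,2,4,5,6,7,8,9,10,11,12,NO.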

filtering a complex text file in bash

I have a text file like this:
#M00872:408:000000000-D31AB:1:1102:15653:1337 1:N:0:ATCACG
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAAGCA
+
BCCBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHHHHH
#M00872:408:000000000-D31AB:1:1102:15388:1343 1:N:0:ATCACG
CGCGACCTCATGAATTTAAGGGCGACCCGCTGAATTTAAGCA
+
CBBBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHGHHH
Every 4 lines belong to one group, and the first line of each group starts with #.
The 2nd line of each group is the important one, so I would like to filter the groups based on it: if the specific sequence "GATCAGACGTGGCGAC" is present in the 2nd line, I want to remove the whole group and make a new file containing the other groups.
So the result for this example is:
#M00872:408:000000000-D31AB:1:1102:15388:1343 1:N:0:ATCACG
CGCGACCTCATGAATTTAAGGGCGACCCGCTGAATTTAAGCA
+
CBBBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHGHHH
I tried the following command, but it returns only the 2nd lines, and only the ones which do contain this piece of sequence; I want the whole group, and only when the 2nd line does not contain this sequence.
grep -i GATCAGACGTGGCGAC myfile.txt > output.txt
Do you know how to fix it?
Single awk solution:
awk -v kw='GATCAGACGTGGCGAC' '/^#/{if (txt !~ kw) printf "%s", txt; n=4; txt=""} n-->0{
txt=txt $0 RS} END{if (txt !~ kw) printf "%s", txt}' file
#M00872:408:000000000-D31AB:1:1102:15388:1343 1:N:0:ATCACG
CGCGACCTCATGAATTTAAGGGCGACCCGCTGAATTTAAGCA
+
CBBBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHGHHH
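For readability, here is the same program laid out over multiple lines with comments (functionally identical to the one-liner above):
awk -v kw='GATCAGACGTGGCGAC' '
/^#/ {                      # a header line starts a new 4-line group
    if (txt !~ kw)          # flush the previous group unless it contains the keyword
        printf "%s", txt
    n = 4                   # buffer the next 4 lines (this header included)
    txt = ""
}
n-- > 0 {                   # collect the current group line by line
    txt = txt $0 RS
}
END {                       # flush the last buffered group the same way
    if (txt !~ kw)
        printf "%s", txt
}' file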
Alternative grep + gnu awk solution:
grep -A 3 '^#' file | awk -v RS='--\n' -v ORS= '!/GATCAGACGTGGCGAC/'
#M00872:408:000000000-D31AB:1:1102:15388:1343 1:N:0:ATCACG
CGCGACCTCATGAATTTAAGGGCGACCCGCTGAATTTAAGCA
+
CBBBGGGGGGGGGGHHHHGGGGGGGGGGGGGGGHHHHHGHHH

How do you grep a file and get the next 2 lines with a tab?

I can grep the lines containing the pattern "gene" and the next 2 consecutive lines from a text file using the following shell command:
grep -A 2 ' gene' file
output:
gene 1..1515
/locus_tag="MSMEI_RS00005"
/old_locus_tag="MSMEI_0001"
gene 2109..3302
/locus_tag="MSMEI_RS00010"
/old_locus_tag="MSMEI_0003"
Now my aim is to print the consecutive lines joined with tabs, like:
gene 1..1515 /locus_tag="MSMEI_RS00005" /old_locus_tag="MSMEI_0001"
gene 2109..3302 /locus_tag="MSMEI_RS00010" /old_locus_tag="MSMEI_0003"
How do I do this with the same grep command in the shell?
With sed:
sed -nEe '/^ gene/{N;N;s/\n */\t/g;p}' file
/^ gene/ selects lines matching that pattern; we then read the two Next lines (N;N), substitute each newline and the spaces following it with a tab, and print.
You don't need grep for it; awk can do this manipulation using getline to read the next two lines:
awk 'BEGIN{OFS="\t"}/gene/{getline n1; getline n2; print $0, n1,n2}' file
Or, if you are bothered by the leading spaces, strip them from all three lines with gsub():
awk 'BEGIN{OFS="\t"} /gene/{getline n1; getline n2; gsub(/^[[:space:]]+/,"",$0); gsub(/^[[:space:]]+/,"",n1); gsub(/^[[:space:]]+/,"",n2); print $0, n1, n2}' file

Need an awk script or any other way to do this on unix

I have a small file with around 50 lines and 2 fields, like below:
file1
-----
12345 8373
65236 7376
82738 2872
..
..
..
I have around 100 files which are comma (",") separated, as below:
file2
-----
1,3,4,4,12345,,,23,3,,,2,8373,1,1
Each file has many lines similar to the above line.
I want to extract, from all these 100 files, the lines whose
5th field is equal to the 1st field in the first file and whose
13th field is equal to the 2nd field in the first file.
I want to search all 100 files using that single file.
I came up with the below for the case of a single comma-separated file. I am not even sure whether this is correct!
But I have multiple comma-separated files.
awk -F"\t|," 'FNR==NR{a[$1$2]++;next}($5$13 in a)' file1 file2
Can anyone help me, please?
EDIT:
The above command works fine in the case of a single file.
Here is another using an array, avoiding multiple work files:
#!/bin/awk -f
# Remember both fields of every file1 line as lookup keys
FILENAME == "file1" {
    keys[$1] = ""
    keys[$2] = ""
    next
}
# For the other (comma-separated) files, print the lines whose
# 5th and 13th fields are both known keys
{
    split($0, fields, ",")
    if (fields[5] in keys && fields[13] in keys) print "*:", $0
}
I am using split because the field separator in the two files is different; you could swap it around if necessary. You should call the script thus:
./runit.awk file1 file2 (after chmod +x runit.awk)
An alternative is to read the first file explicitly with getline in a BEGIN block; a sketch follows.
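A minimal sketch of that BEGIN-block variant (assuming file1's name is hard-coded, as in the script above, and the comma-separated files are passed on the command line; the name runit2.awk is just an example):
#!/bin/awk -f
BEGIN {
    # read file1 up front and remember both of its fields as lookup keys
    while ((getline line < "file1") > 0) {
        split(line, f, " ")
        keys[f[1]] = ""
        keys[f[2]] = ""
    }
    close("file1")
    FS = ","            # the remaining files are comma-separated
}
$5 in keys && $13 in keys { print "*:", $0 }
Call it as ./runit2.awk file2 file3 ...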
Here is a simple approach. Extract each line from the small file, split it into fields and then use awk to print lines from the other files which match those fields:
while read line
do
    f1=$(echo $line | awk '{print $1}')
    f2=$(echo $line | awk '{print $2}')
    awk -v f1="$f1" -v f2="$f2" -F, '$5==f1 && $13==f2' file*
done < small_file
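Run against the sample data in the question (file1 containing the line 12345 8373, and a data file containing the long comma-separated line), the loop would print the matching line:
1,3,4,4,12345,,,23,3,,,2,8373,1,1
since its 5th field is 12345 and its 13th field is 8373.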
