Combine multiple text files (row wise) into columns - bash

I have multiple text files that I want to merge columnwise.
For example:
File 1
0.698501 -0.0747351 0.122993 -2.13516
File 2
-5.27203 -3.5916 -0.871368 1.53945
I want the output file to be like:
0.698501, -5.27203
-0.0747351, -3.5916
0.122993, -0.871368
-2.13516, 1.53945
Is there a one-line bash command that can accomplish this?
I'd appreciate any help.
---Lyndz

With awk:
awk '{if(NR==1) {n=split($0,a1," ")} else {split($0,a2," ")}} END{for(i=1;i<=n;i++) print a1[i] ", " a2[i]}' file1 file2
Output:
0.698501, -5.27203
-0.0747351, -3.5916
0.122993, -0.871368
-2.13516, 1.53945
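If there are more than two single-line files, the same idea generalizes. A sketch assuming GNU awk (for ARGIND) and exactly one line per file:
awk '{for (j = 1; j <= NF; j++) col[j] = col[j] (ARGIND > 1 ? ", " : "") $j}
     END {for (j = 1; j <= NF; j++) print col[j]}' file1 file2 file3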

paste <(cat file1 | sed -E 's/ +/&,\n/g') <(cat file2 | sed -E 's/ +/&\n/g') | column -s $',' -t | sed -E 's/\s+/, /g' | sed -E 's/, $//g'
It got a bit complicated, but it can probably be done more simply.
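One simpler possibility, assuming each input file is a single space-separated line: turn each file into a column with tr, then join the columns with paste:
paste -d, <(tr -s ' ' '\n' < file1) <(tr -s ' ' '\n' < file2) | sed 's/,/, /'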
P.S.: Please look up the man pages of each command to see what they do.

Related

Genebank files manipulation with bash

I have this GenBank file and I need your help manipulating it.
I'm picking a random part of the file:
CDS complement(1750..1956)
/gene="MAMA_L4"
/note="similar to MIMI_L9"
/codon_start=1
/product="hypothetical protein"
/protein_id="AEQ60146.1"
/translation="MHFLDDDNDESNNCFDDKEKARDKIIIDMLNLIIGKKKTSYKCL
DYILSEQEYKFAILSIVENSIFLF"
misc_feature complement(2020..2235)
/note="MAMA_L5; similar to replication origin binding
protein (fragment)"
gene complement(2461..2718)
/gene="MAMA_L6"
CDS complement(2461..2718)
/gene="MAMA_L6"
/codon_start=1
/product="T5orf172 domain-containing protein"
/protein_id="AEQ60147.1"
/translation="MSNNLAFYIITTNYHQSQNIYKIGIHTGNPYDLITRYITYFPDV
IITYFQYTDKAKKVESDLKEKLSKCRITNIKGNLSEWIVID"
My target is to extract the info of /translation= and /product= like the following:
T5orf172 domain-containing protein
MSNNLAFYIITTNYHQSQNIYKIGIHTGNPYDLITRYITYFPDVIITYFQYTDKAKKVESDLKEKLSKCRITNIKGNLSEWIVID
(The issue I had is with the part of the /translation= value that wraps onto a second line in the file.)
I am trying to write a bash script so I was thinking to apply something like:
grep -w /product= genebank.file |cut -d= -f2| sed 's/"//'g > File1
grep -w /translation= genebank.file |cut -d= -f2| sed 's/"//'g > File2
paste File1 File2
The problem is that in the translation entries grep returns only the first line, so the output is truncated at the line wrap, like:
T5orf172 domain-containing protein MSNNLAFYIITTNYHQSQNIYKIGIHTGNPYDLITRYITYFPDV
Can anybody help me get past this issue? Thank you in advance!
With GNU sed:
sed -En '/^\s*\/(product|translation)="/{
s///
:a
/"$/! { N; s/\n\s*//; ba; }
s/"$//p
}' file |
sed 'N; s/\n/\t/'
Note: This assumes the second occurrence of the delimiter " is immediately followed by a newline in the input file.
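For readability, here is the same script with comments (GNU sed allows # comment lines inside a script; the behavior is unchanged):
sed -En '
  # a line that opens a /product= or /translation= value
  /^\s*\/(product|translation)="/ {
    # the empty pattern reuses that regex: strip the key and the opening quote
    s///
    :a
    # until the line ends with the closing quote, append the next line
    # and delete the embedded newline plus the continuation indentation
    /"$/! { N; s/\n\s*//; ba; }
    # strip the closing quote and print the value
    s/"$//p
  }' file |
sed 'N; s/\n/\t/'
The final sed joins each product/translation pair into one tab-separated line.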
I haven't fully tested this but if you add -A1 to your grep command you'll get one line after the match.
grep -w /product= genebank.file |cut -d= -f2| sed 's/"//'g > File1
grep -A1 -w /translation= genebank.file |cut -d= -f2| sed 's/^ *//g' > File2
paste File1 File2
You would need to delete that extra newline but that should get you close.
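For comparison, here is an awk sketch of the same accumulate-until-the-closing-quote idea. It assumes /product= always precedes its /translation= within a record and that the quoted values contain no literal =" inside them, so treat it as a starting point, not a tested solution:
awk '
  /\/product="/ {                           # remember the product of the current record
    p = $0; sub(/.*\/product="/, "", p); sub(/"$/, "", p)
  }
  /\/translation="/ { intr = 1; seq = "" }  # start collecting a translation
  intr {
    line = $0
    sub(/.*="/, "", line)                   # strip the key on the first line (no-op on continuations)
    gsub(/[ \t]/, "", line)                 # drop the continuation indentation
    if (line ~ /"$/) {                      # closing quote: print the product/translation pair
      sub(/"$/, "", line)
      print p "\t" seq line
      intr = 0
    } else seq = seq line
  }
' genebank.file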

How to merge in one file, two files in bash line by line [duplicate]

What's the easiest/quickest way to interleave the lines of two (or more) text files? Example:
File 1:
line1.1
line1.2
line1.3
File 2:
line2.1
line2.2
line2.3
Interleaved:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Sure, it's easy to write a little Perl script that opens them both and does the task. But I was wondering if it's possible to get away with less code, maybe a one-liner using Unix tools?
paste -d '\n' file1 file2
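The delimiter list given to -d is applied cyclically between columns, so the same one-liner extends to more than two files:
paste -d '\n' file1 file2 file3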
Here's a solution using awk:
awk '{print; if(getline < "file2") print}' file1
produces this output:
line 1 from file1
line 1 from file2
line 2 from file1
line 2 from file2
...etc
Using awk can be useful if you want to add some extra formatting to the output, for example if you want to label each line based on which file it comes from:
awk '{print "1: "$0; if(getline < "file2") print "2: "$0}' file1
produces this output:
1: line 1 from file1
2: line 1 from file2
1: line 2 from file1
2: line 2 from file2
...etc
Note: this code assumes that file1 is at least as long as file2.
If file1 contains more lines than file2 and you want to output blank lines for file2 after it finishes, add an else clause to the getline test:
awk '{print; if(getline < "file2") print; else print ""}' file1
or
awk '{print "1: "$0; if(getline < "file2") print "2: "$0; else print"2: "}' file1
@Sujoy's answer points in a useful direction. You can add line numbers, sort stably (so ties keep their input order), and strip the line numbers:
(cat -n file1 ; cat -n file2 ) | sort -sn | cut -f2-
Note (of interest to me) this needs a little more work to get the ordering right if instead of static files you use the output of commands that may run slower or faster than one another. In that case you need to add/sort/remove another tag in addition to the line numbers:
(cat -n <(command1...) | sed 's/^/1\t/' ; cat -n <(command2...) | sed 's/^/2\t/' ; cat -n <(command3) | sed 's/^/3\t/' ) \
| sort -sn | cut -f2- | sort -sn | cut -f2-
With GNU sed (the R command reads one line from file2 and prints it after each line of file1):
sed 'R file2' file1
Output:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Here's a GUI way to do it: Paste them into two columns in a spreadsheet, copy all cells out, then use regular expressions to replace tabs with newlines.
cat file1 file2 | sort -t. -k 2.1
Here it's specified that the separator is "." and that we are sorting on the first character of the second field.

How to remove symbols and add file name to fasta headers

I have several fasta files with the following headers:
M01498:408:000000000-BLBYD:1:1101:11790:1823 1:N:0:1
I want to remove all symbols (colon, dash, and space), and add "barcodelabel=FILENAME;"
I can do it for one file using:
cat A1.fasta |sed s/-//g | sed s/://g| sed s/\ //g|sed 's/^>/>barcodelabel=A1;/g' >A1.renamed.fasta
How can I do this but for all of my files at once? I tried the code below but it didn't work:
for i in {A..H}{1..6}; do cat ${i}.fasta |sed s/-//g | sed s/://g| sed s/\ //g | sed 's/^>/>barcodelabel=${i};/g' >${i}.named.fasta; done
Any help would be appreciated!
If you want to substitute -, : or space with nothing and append the label to the end of the first line, the following may help (FILENAME is awk's built-in variable holding the current file name):
awk 'FNR==1{gsub(/:|-| +/,""); print $0, "barcodelabel=" FILENAME ";"; next} 1' Input_file
In case you want to save the output into the same Input_file, append the following to the command above: > temp_file && mv temp_file Input_file
I figured it out. First, I reduced the number of sed calls to simplify the code. The mistake was in the final sed: I had single quotation marks where double quotes were needed so that ${i} could be expanded. The final code is:
for i in {A..H}{1..6}; do cat ${i}.fasta |
sed 's/[-: ]//g' |
sed "s/^>/>barcodelabel=${i};/g" > ${i}.final4.fasta; done

Extracting multiple lines of data between two delimiters

I have a log file containing multiple lines of data. I need to extract all the text between the delimiters and save it to the output file.
input.log
Some data
<delim_begin>ABC<delim_end>
some data
<delim_begin>DEF<delim_end>
some data
The output.log file should look like
ABC
DEF
I tried this code, but it does not work; it prints all the content of input.log:
sed 's/<delim_begin>\(.*\)<delim_end>/\1/g' input.log > output.log
Using awk you can do it with a custom field separator:
awk -F '<(delim_begin|delim_end)>' 'NF>2{print $2}' file
ABC
DEF
Using grep -P (PCRE):
grep -oP '(?<=<delim_begin>).*(?=<delim_end>)' file
ABC
DEF
A sed alternative (the original attempt printed everything because sed echoes every input line by default; -n suppresses that, and the p flag prints only the lines where the substitution succeeded):
$ sed -nr 's/<delim_begin>(.*)<delim_end>/\1/p' file
ABC
DEF
This should do it (NF>2 skips lines without delimiters, which would otherwise print as blank lines):
awk -F '<(delim_begin|delim_end)>' 'NF>2{print $2}' file
You can use this command:
cat file | grep "<delim_begin>.*<delim_end>" | sed 's/<delim_begin>//g' | sed 's/<delim_end>//' > output.log

How to get word from text file BASH

I want to get only one word from this txt file: http://pastebin.com/jFDu0Le5 . The word is in the last row: WER: 45.67% Correct: 65.87% Acc: 54.33%
I want to get only the value 45.67 and save it to the file value.txt. I want to create a bash script to get this value. Can you give me an example of how to do it? I am new to Bash and I need it for school. The whole .txt file is saved on my server as file.txt.
Try this:
grep WER file.txt | awk '{print $2}' | uniq | sed -e 's/%//' > value.txt
Note that this will overwrite value.txt each time you run the command.
You want grep "WER:" file.txt | cut -???
I have ??? because I do not know the structure of the file. Tab delimited? Fixed width?
Run man cut and you can get the arguments you need.
There are many ways and tools to do the task:
sed
tac file.txt | sed -n '/^WER: /{s///;s/%.*//;p;q}' > value.txt
awk
tac file.txt | awk -F'[ %]' '/^WER:/{print $2;exit}' > value.txt
bash
while read -r a b c
do
  if [ "$a" = "WER:" ]
  then
    b=${b%\%*}
    echo "$b"
    break
  fi
done < <(tac file.txt) > value.txt
If the format is as you said, then this also works
awk -F'[: %]' '/^WER/{print $3}' file.txt > value.txt
Explanation
-F specifies the field separator as one of [: %]
/<PATTERN>/ {<ACTION>} refers to: if a line matches some PATTERN, then do some ACTION
in my case,
the PATTERN is: the line starts with (^) the string WER
the ACTION is: print field $3 (as split by the -F field separators)
> sends the output to value.txt
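Since the goal is a standalone bash script, here is a minimal sketch wrapping that last command (the script name is arbitrary, and file.txt is assumed to be in the current directory):
#!/usr/bin/env bash
# extract_wer.sh - write the WER percentage (e.g. 45.67) from file.txt to value.txt
set -euo pipefail
awk -F'[: %]' '/^WER/{print $3}' file.txt > value.txt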
