select a column from a text file using windows batch - windows

I have a text file which has below data
162 y 1 0 518 home47 1
163 y 1 0 520 home41 1
164 y 1 0 522 home43 1
165 y 1 0 524 home45 1
166 y 1 0 526 home46 1
169 y 1 0 531 home50 1
170 y 1 0 533 home52 1
171 y 1 0 535 home54 1
172 y 1 0 537 home56 1
173 y 1 0 539 home58 1
I would like to copy the 6th column of the data above (home47 to home58) into another text file using a Windows batch file. How can I do that?
I have tried the command below, which was mentioned in another question, but it is not working for me:
CMD /f:off
FOR /f "tokens=6 delims= " %B in (TabFile.txt) do @echo %B >> 2ColFile.txt
CMD /f:on

@echo off
break>2ColFile.txt
for /f "tokens=6 delims= " %%c in (TabFile.txt) do (
    echo %%c
)>>2ColFile.txt
EDIT: Keep in mind that the delimiters are delims=<tab><space>, and they could have been changed by the Stack Overflow formatter.
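For comparison, here is a minimal Python sketch of the same column extraction. It is only an illustration and not part of the batch answer: the function name `extract_column` is hypothetical, and it assumes the fields are separated by spaces or tabs, as in the sample data.

```python
def extract_column(lines, index=5):
    """Return the field at `index` (0-based) from each whitespace-separated line."""
    out = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) > index:  # skip blank or short lines
            out.append(fields[index])
    return out


# Two rows taken from the question's sample data.
sample = [
    "162 y 1 0 518 home47 1",
    "163 y 1 0 520 home41 1",
]
print("\n".join(extract_column(sample)))
```

To process the real file, the lines of TabFile.txt could be passed in place of `sample` and the result written to 2ColFile.txt.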

Related

How to merge files depending on a string in a specific column

I have two files that I need to merge together based on what string they contain in a specific column.
File 1 looks like this:
1 1655 1552 189
1 1433 1552 185
1 1623 1553 175
1 691 1554 182
1 1770 1554 184
1 1923 1554 182
1 1336 1554 181
1 660 1592 179
1 743 1597 179
File 2 looks like this:
1 1552 0 0 2 -9 G A A A
1 1553 0 0 2 -9 A A G A
1 1554 0 751 2 -9 A A A A
1 1592 0 577 1 -9 G A A A
1 1597 0 749 2 -9 A A G A
1 1598 0 420 1 -9 A A A A
1 1600 0 0 1 -9 A A G G
1 1604 0 1583 1 -9 A A A A
1 1605 0 1080 2 -9 G A A A
I want to match column 3 of file 1 against column 2 of file 2, with my output looking like:
1 1655 1552 189 0 0 2 -9 G A A A
1 1433 1552 185 0 0 2 -9 G A A A
1 1623 1553 175 0 0 2 -9 A A G A
1 691 1554 182 0 751 2 -9 A A A A
1 1770 1554 184 0 751 2 -9 A A A A
1 1923 1554 182 0 751 2 -9 A A A A
1 1336 1554 181 0 751 2 -9 A A A A
1 660 1592 179 0 577 1 -9 G A A A
1 743 1597 179 0 749 2 -9 A A G A
I am not interested in keeping any lines in file 2 that are not in file 1. Thanks in advance!
Thanks to @Abelisto I managed to figure something out 4 hours later!
sort -k 3,3 File1.txt > Pheno1.txt
awk '($2 >0)' File2.ped > Ped1.ped
sort -k 2,2 Ped1.ped > Ped2.ped
join -1 3 -2 2 Pheno1.txt Ped2.ped > Ped3.txt
cut -d ' ' -f 1,4,5 --complement Ped3.txt > Output.ped
My real File2 actually contained negative values in the 2nd column (thankfully my real File1 didn't have any negatives) hence the use of awk to remove those rows
Using awk:
awk 'NR == FNR { arr[$2]=$3" "$4" "$5" "$6" "$7" "$8" "$9" "$10 } NR != FNR { print $1" "$2" "$3" "$4" "arr[$3] }' file2 file1
Process file2 first (NR==FNR): set up an array called arr with the 2nd space-delimited field as the index and the 3rd to 10th fields, separated by spaces, as the value. Then, when processing file1 (NR!=FNR), print the 1st to 4th space-delimited fields followed by the contents of arr indexed by field 3.
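The lookup-table idea described above can be sketched in Python as follows. This is an illustration under assumptions, not the answer's own code: `merge_on_key` is a hypothetical name, and unlike the awk one-liner it drops file1 rows that have no match in file2 instead of printing them with an empty tail.

```python
def merge_on_key(file1_lines, file2_lines):
    """Append file2's fields 3..end to each file1 row whose field 3 equals file2's field 2."""
    lookup = {}
    for line in file2_lines:
        fields = line.split()
        if len(fields) >= 3:
            lookup[fields[1]] = fields[2:]  # key: 2nd field, value: 3rd field onward
    merged = []
    for line in file1_lines:
        fields = line.split()
        # Assumption: file1 rows without a match in file2 are skipped.
        if len(fields) >= 3 and fields[2] in lookup:
            merged.append(" ".join(fields + lookup[fields[2]]))
    return merged


# Rows taken from the question's sample files.
file1 = ["1 1655 1552 189", "1 691 1554 182"]
file2 = [
    "1 1552 0 0 2 -9 G A A A",
    "1 1554 0 751 2 -9 A A A A",
    "1 1598 0 420 1 -9 A A A A",
]
for row in merge_on_key(file1, file2):
    print(row)
```

Rows of file2 that never match (such as 1598 here) are simply never looked up, which matches the requirement of not keeping them.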
Since $1 seems like constant 1 and I have no idea about rowcounts of either file (800,000 columns in file2 sounded a lot) I'm hashing file1 instead:
$ awk '
NR==FNR {
a[$3]=a[$3] (a[$3]==""?"":ORS) $2 OFS $3 OFS $4
next
}
($2 in a) {
n=split(a[$2],t,ORS)
for(i=1;i<=n;i++) {
$2=t[i]
print
}
}' file1 file2
Output:
1 1655 1552 189 0 0 2 -9 G A A A
1 1433 1552 185 0 0 2 -9 G A A A
1 1623 1553 175 0 0 2 -9 A A G A
1 691 1554 182 0 751 2 -9 A A A A
1 1770 1554 184 0 751 2 -9 A A A A
1 1923 1554 182 0 751 2 -9 A A A A
1 1336 1554 181 0 751 2 -9 A A A A
1 660 1592 179 0 577 1 -9 G A A A
1 743 1597 179 0 749 2 -9 A A G A
When posting a question, please add details such as row and column counts to it. Better requirements yield better answers.

how to add text to next line in tab separated file from other file?

I have a set of files containing tab-separated values; the values I want are on the third line from the end of each file. I have extracted that line with
cat result1.tsv | tail -3 | head -1 > final1.tsv
cat result2.tsv | tail -3 | head -1 >final2.tsv
..... and so on (I have almost 30-40 files)
I want the content of each final tsv file on its own line in a single new file.
I tried
cat final1.tsv final2.tsv > final.tsv
but this only works for a small number of files; it is difficult to write out the names of all of them.
I tried to put the file names in a loop as variables, but that did not work.
final1.tsv contains:
270 96 284 139 271 331 915 719 591 1679 1751 1490 968 1363 1513 1184 1525 490 839 425 967 855 356
final2.tsv contains:
1 1 0 2 6 5 1 1 11 7 1 3 4 1 0 3 2 1 0 3 2 1 28
All the files (final1.tsv, final2.tsv, final3.tsv, final5, ...) contain the same number of columns but different values.
I want the row from each file merged into a new file, like
final.tsv
final1 270 96 284 139 271 331 915 719 591 1679 1751 1490 968 1363 1513 1184 1525 490 839 425 967 855 356
final2 1 1 0 2 6 5 1 1 11 7 1 3 4 1 0 3 2 1 0 3 2 1 28
final3 270 96 284 139 271 331 915 719 591 1679 1751 1490 968 1363 1513 1184 1525 490 839 425 967 855 356
final4 1 1 0 2 6 5 1 1 11 7 1 3 4 1 0 3 2 1 0 3 2 1 28
here you go...
for f in final{1..4}.tsv
do
    printf '%s\t' "${f%.tsv}" >> final.tsv
    cat "$f" >> final.tsv
done
Try this:
rm -f final.tsv
for FILE in result*.tsv
do
    tail -3 "$FILE" | head -1 >> final.tsv
done
As long as the files aren't enormous, it's simplest to read each file into an array and select the third record from the end
This solves your problem for you. It looks for all files in the current directory that match result*.tsv and writes the required line from each of them to final.tsv
use strict;
use warnings 'all';
my @results = sort {
    my ($aa, $bb) = map /(\d+)/, ($a, $b);
    $aa <=> $bb;
} glob 'result*.tsv';
open my $out_fh, '>', 'final.tsv' or die qq{Unable to open "final.tsv" for output: $!};
for my $result_file ( @results ) {
    open my $fh, '<', $result_file or die qq{Unable to open "$result_file" for input: $!};
    my @data = <$fh>;
    next unless @data >= 3;
    my ($name) = $result_file =~ /([^.]+)/;
    print { $out_fh } "$name\t$data[-3]";
}
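The same "read everything, take the third record from the end" approach might look like this in Python. It is a sketch only: `third_from_last` and `build_rows` are hypothetical names, and the in-memory `demo` data stands in for the real result*.tsv files.

```python
def third_from_last(lines):
    """Return the third line from the end, or None if there are fewer than three lines."""
    lines = list(lines)
    return lines[-3] if len(lines) >= 3 else None


def build_rows(named_files):
    """named_files: iterable of (name, lines) pairs. Returns 'name<TAB>line' rows."""
    rows = []
    for name, lines in named_files:
        picked = third_from_last(lines)
        if picked is not None:  # files shorter than three lines are skipped
            rows.append(name + "\t" + picked.rstrip("\n"))
    return rows


# Stand-in data: the second "file" is too short and is skipped.
demo = [
    ("result1", ["a\n", "270\t96\n", "x\n", "y\n"]),
    ("result2", ["only\n"]),
]
print("\n".join(build_rows(demo)))
```

With real files, each `lines` value could be an open file handle, since iterating over a file yields its lines.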

How to insert pipes into (edited) FASTA sequence headers to restore original format?

I have a file that contains (edited) sequence headers in the first column, e.g.:
gi399604265gbAKZP01155332.1 10 255 L1-1_STu 1 -
gi399594056gbCM001217.1 19 203 L1-4_VC 1 -
gi399591950gbKE558403.1 1 185 L1-4_VC 1 +
gi399591329gbAKZP01168266.1 4 285 L1-1_STu 1 +
gi399589894gbAKZP01169701.1 28 502 L1-3_NV 1 +
I'd like to convert these seq headers back to their original format, which
includes pipes, e.g. desired output:
gi|399604265|gb|AKZP01155332.1| 10 255 L1-1_STu 1 -
gi|399594056|gb|CM001217.1| 19 203 L1-4_VC 1 -
gi|399591950|gb|KE558403.1| 1 185 L1-4_VC 1 +
gi|399591329|gb|AKZP01168266.1| 4 285 L1-1_STu 1 +
gi|399589894|gb|AKZP01169701.1| 28 502 L1-3_NV 1 +
The pipes always occur after the first "gi", before "gb", after "gb", and at the end. Is there an easy way to automate this for all my files? (some of which are very large!)
Many thanks for any suggestions :)
A simple one liner will do:
perl -pe 's/(gi)(\d+)(gb)(\S+)/$1|$2|$3|$4|/' file.txt
Output
gi|399604265|gb|AKZP01155332.1| 10 255 L1-1_STu 1 -
gi|399594056|gb|CM001217.1| 19 203 L1-4_VC 1 -
gi|399591950|gb|KE558403.1| 1 185 L1-4_VC 1 +
gi|399591329|gb|AKZP01168266.1| 4 285 L1-1_STu 1 +
gi|399589894|gb|AKZP01169701.1| 28 502 L1-3_NV 1 +
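An equivalent substitution can be sketched with Python's `re` module. The helper name `restore_pipes` is hypothetical, and anchoring the pattern at the start of the line is an added assumption (the Perl one-liner is not anchored); it relies on the header always being the first column.

```python
import re

# gi, digits, gb, then the accession up to the first whitespace.
HEADER = re.compile(r"^(gi)(\d+)(gb)(\S+)")


def restore_pipes(line):
    """Re-insert the pipes into an edited 'gi...gb...' header at the start of a line."""
    return HEADER.sub(r"\1|\2|\3|\4|", line, count=1)


print(restore_pipes("gi399604265gbAKZP01155332.1 10 255 L1-1_STu 1 -"))
```

For very large files, the function can be applied line by line while streaming, so the whole file never has to be held in memory.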

Maths in a while loop causing random negative numbers

So I have done this in both Python and bash. The code I am about to post probably has a world of things wrong with it, but it is very basic and I cannot see why it would cause the 'bug' I explain below. I have done the same in Python, much more cleanly, and it shows the same problem: at some point the maths generates a negative number, which makes no sense.
#!/bin/bash
while [ 1 ];
do
zero=0
ARRAY=()
ARRAY2=()
first=`command to generate a list of numbers`
sleep 1
second=`command to generate a list of numbers`
# so now we have two data sets, 1 second between the capture of each.
for i in $first;
do
ARRAY+=($i)
done
for i in $second;
do
ARRAY2+=($i)
done
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
ARRAY=()
ARRAY2=()
zero=0
c=0
first=``
second=``
math=''
done
So the script grabs a set of data, waits 1 second, grabs it again, does math on the two sets to get the difference, that difference is printed. It's very simple, and I have done it elegantly in Python too - no matter how I would do it every now and then, could be anywhere from 3 loops in to 30 loops in, we will get negative numbers.. like so:
START 0 0 0 0 0 19 10 563 0
-34 19 14 2 0
-1302 1198
-532 639
-1078 1119 1 0 0
-843 33 880 0 5
-8
-13508 8773 4541 988 181
-12
-205 217
-9 7 1
-360 303 60 1 0 0
-12
-96 98 3
-870 904
-130
-2105 2264 6
-3084 1576 1650
-939 971
-2249 1150 1281
-693 9 513 142 76 expr: syntax error
Please help, I simply can't find anything about this.
Sample OUTPUT as requested:
ARRAY1 OUTPUT
1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 781 947 1 1 206 9 1 3 2 81 2602 7 158 1 1 43 91 1 120 6589 6 2534 1092 1 6014 7 2 2 37 1 1 1 80 2 1 1270 15448 66 1 10238 1 10794 16061 4 1 1 1 9754 5617 1123 926 3 24 10 16
ARRAY2 OUTPUT
1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47 787 947 1 1 206 9 1 3 2 81 2602 7 159 1 1 43 91 1 120 6869 6 2534 1092 1 6044 7 2 2 37 1 1 1 80 2 1 1270 15563 66 1 10293 1 10804 16134 4 1 1 1 9755 5633 1135 928 3 24 10 16
START
The answer lies in Russell Uhl's comment above. Your loop runs one time too many (this is your code):
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
To fix, you need to change the test condition from c <= ${#ARRAY2[@]} to c < ${#ARRAY2[@]}:
for (( c=$zero; c < ${#ARRAY2[@]}; c++ ))
do
echo $((${ARRAY2[$c]} - ${ARRAY[$c]}))
done
I've also changed the expr to use arithmetic evaluation builtin $((...)).
The test script (sum.sh):
#!/bin/bash
zero=0
ARRAY=()
ARRAY2=()
first="1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 7
second="1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47
for i in $first; do
ARRAY+=($i)
done
# Alternately as chepner suggested:
ARRAY2=($second)
for (( c=$zero; c < ${#ARRAY2[@]}; c++ )); do
echo -n $((${ARRAY2[$c]} - ${ARRAY[$c]})) " "
done
Running it:
samveen@precise:/tmp$ echo $BASH_VERSION
4.2.25(1)-release
samveen@precise:/tmp$ bash sum.sh
0 0 0 0 0 0 0 0 14 6 476 0 0 0 4 4 0 0 0 0 0 0 0 16 4 0 0 0 48 0 0 0 0 27 0 0 0 0 0 16 0 0 0 0 501 62 36 0 8 0 0 0 5 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 280 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 115 0 0 55 0 10 73 0 0 0 0 1 16 12 2 0 0 0 0
EDIT:
* Added improvements from suggestions in comments.
I think the problem has to be when the two arrays don't have the same size. It's easy to reproduce that syntax error -- one of the operands for the minus operator is an empty string:
$ a=5; b=3; expr $a - $b
2
$ a=""; b=3; expr $a - $b
expr: syntax error
$ a=5; b=""; expr $a - $b
expr: syntax error
$ a=""; b=""; expr $a - $b
-
Try
ARRAY=( $(command to generate a list of numbers) )
sleep 1
ARRAY2=( $(command to generate a list of numbers) )
if (( ${#ARRAY[@]} != ${#ARRAY2[@]} )); then
    echo "error: different size arrays!"
    echo "ARRAY: ${#ARRAY[@]} (${ARRAY[*]})"
    echo "ARRAY2: ${#ARRAY2[@]} (${ARRAY2[*]})"
fi
"The error occurs whenever the first array is smaller than the second" -- of course. You're looping from 0 to the array size of ARRAY2. When ARRAY has fewer elements, you'll eventually try to access an index that does not exist in the array. When you try to reference an unset variable, bash gives you the empty string.
$ a=(1 2 3)
$ b=(4 5 6 7)
$ i=2; expr ${a[i]} - ${b[i]}
-3
$ i=3; expr ${a[i]} - ${b[i]}
expr: syntax error
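Since the question mentions a Python version with the same symptom, here is a minimal Python sketch of the size check suggested above. The function name `differences` is hypothetical; the idea is to refuse to subtract when the two captures differ in length instead of silently pairing up values that do not belong together.

```python
def differences(first, second):
    """Return second[i] - first[i] for every index, refusing mismatched lengths."""
    if len(first) != len(second):
        raise ValueError(
            "different size arrays: %d vs %d" % (len(first), len(second))
        )
    return [b - a for a, b in zip(first, second)]


# Three value pairs taken from the sample ARRAY1/ARRAY2 output in the question.
print(differences([3541, 853, 94567], [3555, 859, 95043]))
```

If the command's output can gain or lose entries between the two captures, matching the values by a stable key rather than by position would likely be more robust; that is beyond what the source shows.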

File handling in Batch programming?

I have a text file that has a number in every new line and all are in ascending order.
Contents are like :
1
13
25
37
49
97
109
121
I want to extract only those numbers that differ from the previous number by more than 12. I would like to use a batch program for this.
How can I do that?
I would have liked to see you make an attempt but anyway I had a go and this is the closest I could get
c:\temp>type test.txt
1 line 1
10 line 1a
13 line 2
25 line 3
22 line 3a
37 line 4
49 line 5
97 line 6
109 line 7
121 line 8
c:\temp>test.bat
25 line 3
37 line 4
49 line 5
97 line 6
109 line 7
121 line 8
c:\temp>
using this code in test.bat:
@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
set /a cur="0"
for /f "tokens=1,* delims= " %%a in ('type test.txt') do (
    set line=%%a %%b
    set /a num="%%a"
    set /a dif="!num!-!cur!"
    if !dif! geq 12 @echo !line!
    set /a cur="%%a"
)
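The logic of the batch file above can be mirrored in Python to make the comparison explicit. This is a sketch, with the hypothetical name `lines_with_gap`; it keeps the batch file's "at least 12" test (`geq`) as the default.

```python
def lines_with_gap(lines, min_gap=12):
    """Keep lines whose leading number is at least `min_gap` above the previous line's number."""
    kept = []
    previous = 0  # mirrors `set /a cur="0"` in the batch file
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        number = int(fields[0])
        if number - previous >= min_gap:
            kept.append(line)
        previous = number
    return kept


# First rows of the test.txt shown above.
sample = ["1 line 1", "10 line 1a", "13 line 2", "25 line 3", "22 line 3a", "37 line 4"]
print("\n".join(lines_with_gap(sample)))
```

The question asks for a difference strictly greater than 12; under that reading the comparison would be `>` instead of `>=`, and the batch equivalent would be `gtr` instead of `geq`.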
