Alternating columns of two files into one file - shell

I have two text files (TSV format), each with 240 columns and 100 lines. I would like to interleave their columns alternately into one file (480 columns and 100 lines). How could I achieve this with standard Linux command-line tools?
Example (for a single line):
FileA:
1 2 3 4 5 ・・・
FileB:
001 002 003 004 005 ・・・
Expected Result:
1 001 2 002 3 003 ・・・

Just awk with getline:
==> file1 <==
a b c d e f g h i j k l m
n o p q r s t u v w x y z
==> file2 <==
1 2 3 4 5 6 7 8 9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26
$ awk '{split($0,f1);
getline < "file2";
for(i=1;i<=NF;i++) printf "%s%s%s%s", f1[i], OFS, $i, (i==NF?ORS:OFS)}' file1
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13
n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
If space is not the required output delimiter, set OFS accordingly.
P.S. Use of getline is normally discouraged for any non-trivial script, and it should usually be avoided by beginners.
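For instance, a tab-separated variant of the same approach, run on two tiny stand-in files (the file names and contents here are just for the demo):

```shell
# Two small hypothetical input files
printf 'a b c\nd e f\n' > file1
printf '1 2 3\n4 5 6\n' > file2
# Same logic as above, with OFS set to a tab
awk -v OFS='\t' '{split($0,f1);
  getline < "file2";
  for(i=1;i<=NF;i++) printf "%s%s%s%s", f1[i], OFS, $i, (i==NF?ORS:OFS)}' file1
```

This prints `a`, `1`, `b`, `2`, `c`, `3` tab-separated on the first line, and likewise for the second.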

paste + awk solution:
Sample file1:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample file2:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
paste file1 file2 \
| awk '{ len=NF/2;
for (i=1; i<=len; i++)
printf "%s %s%s", $i, $(i+len),(i==len? ORS:OFS)
}'
The output:
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26

Use bash to make some dummy files that match the spec, with a per-line letter suffix to tell the lines apart:
for f in {A..z} {A..j} ; do echo $( seq -f '%g'"$f" 240 ) ; done > FileA
for f in {z..A} {j..A} ; do echo $( seq -f '%03.3g'"$f" 240 ) ; done > FileB
Use bash, paste and xargs:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo
Since the output of that is a bit unwieldy, show just the first ten lines, with the first six and last five columns:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo |
head | cut -d' ' -f1-6,476-480
1A 001z 2A 002z 3A 003z 238z 239A 239z 240A 240z
1B 001y 2B 002y 3B 003y 238y 239B 239y 240B 240y
1C 001x 2C 002x 3C 003x 238x 239C 239x 240C 240x
1D 001w 2D 002w 3D 003w 238w 239D 239w 240D 240w
1E 001v 2E 002v 3E 003v 238v 239E 239v 240E 240v
1F 001u 2F 002u 3F 003u 238u 239F 239u 240F 240u
1G 001t 2G 002t 3G 003t 238t 239G 239t 240G 240t
1H 001s 2H 002s 3H 003s 238s 239H 239s 240H 240s
1I 001r 2I 002r 3I 003r 238r 239I 239r 240I 240r
1J 001q 2J 002q 3J 003q 238q 239J 239q 240J 240q
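The mechanism is easier to see at a small scale; here is a sketch with 3 columns instead of 240 (the file contents are stand-ins, and process substitution requires bash):

```shell
# Two tiny stand-in files
printf '1 2 3\n4 5 6\n' > FileA
printf '001 002 003\n004 005 006\n' > FileB
# Flatten each file to one column, pair the columns up, then re-wrap every 3 pairs into one line
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 3 echo
# 1 001 2 002 3 003
# 4 004 5 005 6 006
```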


Regularly spaced numbers between bounds without jot

I want to generate a sequence of integers between two bounds, both included. I tried with seq, but I could only get the following:
$ low=10
$ high=100
$ n=8
$ seq $low $(( (high-low) / (n-1) )) $high
10
22
34
46
58
70
82
94
As you can see, the 100 is not included in the sequence.
I know that I can get something like that using jot:
$ jot 8 10 100
10
23
36
49
61
74
87
100
But the server I use does not have jot installed, and I do not have permission to install it.
Is there a simple method that I could use to reproduce this behaviour without jot?
If you don't mind launching an extra process (bc) and if it's available on that machine, you could also do it like this:
$ seq -f'%.f' 10 $(bc <<<'scale=2; (100 - 10) / 7') 100
10
23
36
49
61
74
87
100
Or, building on oguz ismail's idea (but using a precision of 4 decimal places):
$ declare -i low=10
$ declare -i high=100
$ declare -i n=8
$ declare incr=$(( (${high}0000 - ${low}0000) / (n - 1) ))
$
$ incr=${incr::-4}.${incr: -4}
$
$ seq -f'%.f' "$low" "$incr" "$high"
10
23
36
49
61
74
87
100
You can try this naive implementation of jot:
jot_naive() {
local -i reps=$1 begin=${2}00 ender=${3}00
local -i x step='(ender - begin) / (reps - 1)'
for ((x = begin; x <= ender; x += step)); do
printf '%.f\n' ${x::-2}.${x: -2}
done
}
You could use awk for that:
awk -v reps=8 -v begin=10 -v end=100 '
BEGIN{
step = (end - begin) / (reps-1);
for ( f = i = begin; i <= end; i = int(f += step) )
print i
}
'
10
22
35
48
61
74
87
100
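A variant of the same awk idea: rounding each term with printf instead of truncating with int() reproduces jot's endpoint-inclusive output exactly for this input (a sketch, not a general jot replacement):

```shell
awk -v reps=8 -v begin=10 -v end=100 'BEGIN{
  step = (end - begin) / (reps - 1)   # 90/7, roughly 12.857
  for (i = 0; i < reps; i++)
    printf "%.0f\n", begin + i*step   # round, do not truncate
}'
# 10 23 36 49 61 74 87 100, one per line
```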
UPDATE 1: fixed the double-printing of the final row, which was caused by a difference smaller than a tiny epsilon value.
To maintain directional consistency, rounding is performed based on the sign of the final value: e.g. if the final value is negative, any rounding is done as if the current step value (CurrSV) were negative, regardless of CurrSV's actual sign.
While I haven't tested every possible edge case, I believe this version of the code handles both positive and negative rounding properly, for the most part.
That said, this isn't a jot replacement at all; it only implements a very small subset of the step-counting feature rather than being a full-blown clone:
{m,g}awk '
function __________(_) {
return -_<+_?_:-_
}
BEGIN {
CONVFMT = "%.250g"; OFMT = "%.13f"
_____ = (_+=_^=_______=______="")^-_^!_
} {
____ = (((_=$(__=(___=$NF)^(_<_)))^(_______=______="")*___\
)-(__=$++__))/--_
_________ = (_____=(-(_^=_<_))^(________=+____<-____\
)*(_/++_))^(++_^_++-+_*_--+-_)
if (-___<=+___) {
_____=__________(_____)
_________=__________(_________)
}
do { print ______,
++_______, int(__+_____), -____+(__+=____)
} while(________? ___<(__-_________) : (__+_________)<___)
print ______, ++_______, int(___+_____), ___, ORS
}' <<< $'8 -3 -100\n8 10 100\n5 -15 -100\n5 15 100\n11 100 11\n10 100 11'
Output:
1 -3 -3
2 -17 -16.8571428571429
3 -31 -30.7142857142857
4 -45 -44.5714285714286
5 -58 -58.4285714285714
6 -72 -72.2857142857143
7 -86 -86.1428571428572
8 -100 -100
1 10 10
2 23 22.8571428571429
3 36 35.7142857142857
4 49 48.5714285714286
5 61 61.4285714285714
6 74 74.2857142857143
7 87 87.1428571428572
8 100 100
1 -15 -15
2 -36 -36.2500000000000
3 -58 -57.5000000000000
4 -79 -78.7500000000000
5 -100 -100
1 15 15
2 36 36.2500000000000
3 58 57.5000000000000
4 79 78.7500000000000
5 100 100
1 100 100
2 91 91.1000000000000
3 82 82.2000000000000
4 73 73.3000000000000
5 64 64.4000000000000
6 55 55.5000000000000
7 47 46.6000000000000
8 38 37.7000000000000
9 29 28.8000000000000
10 20 19.9000000000000
11 11 11
1 100 100
2 90 90.1111111111111
3 80 80.2222222222222
4 70 70.3333333333333
5 60 60.4444444444445
6 51 50.5555555555556
7 41 40.6666666666667
8 31 30.7777777777778
9 21 20.8888888888889
10 11 11

How to check whether a number range from one file is a subset of a number range from another file?

I'm trying to find out whether each range1 row (columns a and b) is a subset of, i.e. lies between, range2's bounds (columns b and c).
range1
a b
15 20
8 10
37 44
32 37
range2
a b c
chr1 6 12
chr2 13 21
chr3 31 35
chr4 36 45
output:
a b c
chr1 6 12 8 10
chr2 13 21 15 20
chr4 36 45 37 44
I wanted to compare range1[a] with range2[b] and range1[b] with range2[c], one-to-all.
For example, in the first run: the first row of range1 against all rows of range2. But range1[a] should be compared only with range2[b], and similarly range1[b] only with range2[c]. Based on this I have written a criterion:
lbsf1[j] >= lbs[i] && lbsf1[j] <= ubs[i] && ubsf1[j] >= lbs[i] && ubsf1[j] <= ubs[i]
r1[a] r2[b] r1[b] r2[c]
15 > 6 20 < 12 False
15 > 13 20 < 21 True
15 > 31 20 < 35 False
15 > 36 20 < 45 False
I tried to learn from this code [which works for checking whether a single number lies in a specific range] and modify it for two numbers. But it did not work; I suspect I'm not reading the second file properly.
Code: [from the reference, slightly modified]
#!/bin/bash
awk -F'\t' '
# 1st pass (fileB): read the lower and upper range bounds
FNR==NR { lbs[++count] = $2+0; ubs[count] = $3+0; next }
# 2nd pass (fileA): check each line against all ranges.
{ lbsf1[++countf1] = $1+0; ubsf1[countf1] = $2+0;
for(i=1;i<=count;++i)
{
for(j=1;j<=countf1;++j)
{
if (lbsf1[j] >= lbs[i] && lbsf1[j] <= ubs[i] && ubsf1[j] >= lbs[i] && ubsf1[j] <= ubs[i])
{ print lbs[i]"\t"ubs[i]"\t"lbsf1[j]"\t"ubsf1[j] ; next }
}
}
}
' range2 range1
This code gave me output:
6 12 8 10
6 12 8 10
6 12 8 10
Thank you.
Assumptions:
input files do not have a b nor a b c as the first line (we can modify the proposed code if these lines really do exist in the data)
lines in range2 do not have leading white space (as shown in the provided sample)
while not demonstrated by the small sample provided, going to assume that a row from range1 may 'match' with multiple rows from range2 and that we want to print all matches (we can modify the proposed code if we need to stop processing a range1 row once we find the first 'match')
Sample data:
$ cat range1
15 20
8 10
37 44
32 37
$ cat range2
chr1 6 12
chr2 13 21
chr3 31 35
chr4 36 45
chr15 36 67 # added to demonstrate multi-match for range1 [ 37 , 44 ]
Issues with current code:
loads the range1 data into an array and then loops over this (ever growing array) for each line read from range1; this array is unnecessary as we just need to process the current row from range1
the dual loop logic is aborted (; next) upon printing the first matching set of records; this premature cancellation means we only see the first match ... over and over; the ; next can be removed
the range2[a] column is not captured during range2 input processing so we're unable to display this column in the final output
Updating OP's current code to address these issues:
awk '
BEGIN { FS=OFS="\t" }
FNR==NR { chromo[++count]=$1
lbs[count]=$2
ubs[count]=$3
next
}
{ lb=$1
ub=$2
for (i=1;i<=count;++i)
if ( lb >= lbs[i] && lb <= ubs[i] && ub >= lbs[i] && ub <= ubs[i] )
print chromo[i],lbs[i],ubs[i],lb,ub
}
' range2 range1
This generates:
chr2 13 21 15 20
chr1 6 12 8 10
chr4 36 45 37 44
chr15 36 67 37 44
If the output needs to be sorted, we could modify the awk code to store the results in another array and then, during END {...} processing, sort and print the array. But for simplicity's sake we'll just pipe the output to sort, e.g.:
$ awk ' BEGIN { FS=OFS="\t" } FNR==NR ....' range2 range1 | sort -V
chr1 6 12 8 10
chr2 13 21 15 20
chr4 36 45 37 44
chr15 36 67 37 44
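Incidentally, since every range1 row already satisfies lb <= ub, the middle two comparisons are implied, and the test reduces to a plain two-comparison subset check. A minimal sketch, using small inline sample data rather than the real files:

```shell
# Hypothetical miniature versions of range2 and range1
printf 'chr1\t6\t12\nchr2\t13\t21\n' > range2
printf '15\t20\n8\t10\n' > range1
awk 'BEGIN { FS=OFS="\t" }
     FNR==NR { c[++n]=$1; lo[n]=$2; hi[n]=$3; next }   # 1st pass: load range2
     { for (i=1; i<=n; i++)
         if ($1 >= lo[i] && $2 <= hi[i])               # subset test: two comparisons suffice
           print c[i], lo[i], hi[i], $1, $2 }
' range2 range1
```

This prints the tab-separated matches: chr2 13 21 15 20, then chr1 6 12 8 10.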

Sed capitalize first letter of word in key-value pair

I'm currently working on my karaoke files and I see a lot of non-capitalized words.
The .txt files are structured as key-value pairs, and I was wondering how to capitalize the first letter of every word in the value.
Example txt:
#TITLE:fire and Water
#ARTIST:Some band
#CREATOR:yunho
#LANGUAGE:Korean
#EDITION:UAS
#MP3:2NE1 - Fire.mp3
#COVER:2NE1 - Fire.jpg
#VIDEO:2NE1 - Fire.avi
#VIDEOGAP:11.6
#BPM:595
#GAP:3860
F -4 4 16 I
F 2 4 16 go
F 8 6 16 by
F 16 4 16 the
F 22 6 16 name
F 30 4 16 of
F 36 10 16 C
F 46 10 16 L
F 58 6 16 of
F 66 5 16 2
F 71 3 16 N
F 74 4 16 E
F 78 18 16 1
I'd like to capitalize the words after the keys TITLE, ARTIST, LANGUAGE and EDITION,
so for the example txt:
#TITLE:**F**ire **A**nd **W**ater
#ARTIST:**S**ome **B**and
#CREATOR:yunho
#LANGUAGE:**K**orean
#EDITION:**U**AS
#MP3:2NE1 - Fire.mp3
#COVER:2NE1 - Fire.jpg
#VIDEO:2NE1 - Fire.avi
#VIDEOGAP:11.6
#BPM:595
#GAP:3860
F -4 4 16 I
F 2 4 16 go
F 8 6 16 by
F 16 4 16 the
F 22 6 16 name
F 30 4 16 of
F 36 10 16 C
F 46 10 16 L
F 58 6 16 of
F 66 5 16 2
F 71 3 16 N
F 74 4 16 E
F 78 18 16 1
Another thing: I have loads of these .txt files, each in its own directory. I want to run the command recursively from the parent directory on all *.txt files.
Example directories:
Library/Some Band/Some Band - Some Song/some txt file.txt
Library/Some Band2/Some Band2 - Some Song/sometxtfile.txt
Library/Some Band3/Some Band3 - Some Song/some3333 txt file.txt
I've tried to do so with find . -name '*.txt' -exec sed -i command {} +,
but I got stuck on the search and replace with sed... would anyone care to help me out?
You can use this GNU sed command to uppercase the starting letter of each word on matching lines:
sed -E '/^#(TITLE|ARTIST|LANGUAGE|EDITION):/s/\b([a-z])/\u\1/g' file
#TITLE:Fire And Water
#ARTIST:Some Band
#CREATOR:yunho
#LANGUAGE:Korean
#EDITION:UAS
#MP3:2NE1 - Fire.mp3
#COVER:2NE1 - Fire.jpg
#VIDEO:2NE1 - Fire.avi
#VIDEOGAP:11.6
#BPM:595
#GAP:3860
F -4 4 16 I
F 2 4 16 go
F 8 6 16 by
F 16 4 16 the
F 22 6 16 name
F 30 4 16 of
F 36 10 16 C
F 46 10 16 L
F 58 6 16 of
F 66 5 16 2
F 71 3 16 N
F 74 4 16 E
F 78 18 16 1
For find + sed command use:
find . -name '*.txt' -exec \
sed -E -i '/^#(TITLE|ARTIST|LANGUAGE|EDITION):/s/\b([a-z])/\u\1/g' {} +

Moving average with successive elements using awk

I am trying to write a script in which each row gives the average of the next N rows (including itself). I know how to do it with preceding rows, i.e. the Nth row gives the average of the preceding N rows. Here is the script for that:
awk '
BEGIN{
N = 5;
}
{
x = $2;
i = NR % N;
aveg += (x - X[i]) / N;
X[i] = x;
print $1, $2, aveg;
}' < file > aveg.txt
where file looks like this
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20
21 21
22 22
23 23
24 24
25 25
26 26
27 27
28 28
29 29
30 30
31 31
32 32
33 33
34 34
35 35
36 36
37 37
38 38
39 39
40 40
I want the first row to have the average of the next 5 elements, i.e.
(1+2+3+4+5)/5=3
second row (2+3+4+5+6)/5=4
third row (3+4+5+6+7)/5=5
and so on. The rows should look like
1 1 3
2 2 4
3 3 5
4 4 6 ...
Can it be done as simply as the script shown above? I was thinking of assigning each row the value of the nth row below it and then proceeding with the above script, but unfortunately I can't assign a row a value from further down the file. Can someone help me write this script to find the moving average? I am open to other shell commands as well.
$ cat test.awk
BEGIN {
N=5 # the window size
}
{
n[NR]=$1 # store the value in an array
}
NR>=N { # for records where NR >= N
x=0 # reset the sum variable
delete n[NR-N] # delete the one out the window of N
for(i in n) # all array elements
x+=n[i] # ... must be summed
print n[NR-(N-1)],x/N # print the row from the beginning of window
} # and the related window average
Try it:
$ for i in {1..36}; do echo $i $i >> test.in ; done
$ awk -f test.awk test.in
1 3
2 4
3 5
...
30 32
31 33
32 34
It could be done with a running sum: add the current value and subtract n[NR-N], like this:
BEGIN {
N=5
}
{
n[NR]=$1
x+=$1-n[NR-N]
}
NR>=N {
delete n[NR-N]
print n[NR-(N-1)],x/N
}
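A quick sanity check of this running-sum variant, shrunk to a window of 3 over a single column (the seq input is just an assumption for the demo):

```shell
# Moving average over the NEXT 3 values of 1..5, printed per window start
seq 5 | awk '
BEGIN { N=3 }
{
  n[NR]=$1
  x+=$1-n[NR-N]        # add the newcomer, drop the value that left the window
}
NR>=N {
  delete n[NR-N]
  print n[NR-(N-1)], x/N
}'
# 1 2
# 2 3
# 3 4
```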
Using an N-sized array:
BEGIN { N=5 }
{
s+=array[i++]=$1
if (i>=N) i=0
}
NR>=N {
print array[i], s/N
s-=array[i]
}
$ cat tst.awk
BEGIN { OFS="\t"; range=5 }
{ recs[NR%range] = $0 }
NR >= range {
sum = 0
for (i in recs) {
split(recs[i],flds)
sum += flds[2]
}
print recs[(NR+1-range)%range], sum / range
}
Running it:
$ awk -f tst.awk file
1 1 3
2 2 4
3 3 5
4 4 6
5 5 7
6 6 8
7 7 9
8 8 10
9 9 11
10 10 12
11 11 13
12 12 14
13 13 15
14 14 16
15 15 17
16 16 18
17 17 19
18 18 20
19 19 21
20 20 22
21 21 23
22 22 24
23 23 25
24 24 26
25 25 27
26 26 28
27 27 29
28 28 30
29 29 31
30 30 32
31 31 33
32 32 34
33 33 35
34 34 36
35 35 37
36 36 38

Appending matching strings to specific lines (sed/bash)

Using bash/sed, I am trying to search for a matching string and, when a match is found, append a variable to the end of the applicable line.
Two lists:
[linuxbox tmp]$ cat lista
a 23
c 4
e 55
b 2
f 44
d 74
[linuxbox tmp]$ cat listb
a 3
e 34
c 84
b 1
f 500
d 666666
#!/bin/bash
rm -rf listc
cat listb |while read rec
do
var1="$(echo $rec | awk '{ print $1 }')"
var2="$(echo $rec | awk '{ print $2 }')"
if egrep "^$var1" lista; then
sed "/^$var1/ s/$/ $var2/1" lista >> listc
fi
done
when I run it I get:
[linuxbox tmp]$ ./blah.sh
a 23
e 55
c 4
b 2
f 44
d 74
[linuxbox tmp]$ cat listc
a 23 3
c 4
e 55
b 2
f 44
d 74
a 23
c 4
e 55 34
b 2
f 44
d 74
a 23
c 4 84
e 55
b 2
f 44
d 74
a 23
c 4
e 55
b 2 1
f 44
d 74
a 23
c 4
e 55
b 2
f 44 500
d 74
a 23
c 4
e 55
b 2
f 44
d 74 666666
The output I'm trying to get is:
a 23 3
e 55 34
c 4 84
b 2 1
f 44 500
d 74 666666
What am I doing wrong here? Is there a better way to accomplish this?
Thank you in advance.
If you don't mind getting a sorted output:
join <(sort lista) <(sort listb)
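join requires both inputs to be sorted on the join field, which the process substitutions handle (bash syntax). At a small scale, with trimmed copies of the sample files:

```shell
# Trimmed copies of the sample lists
printf 'a 23\nc 4\ne 55\n' > lista
printf 'a 3\ne 34\nc 84\n' > listb
# Join on the first field after sorting both inputs
join <(sort lista) <(sort listb)
# a 23 3
# c 4 84
# e 55 34
```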
One way using awk:
awk 'FNR==NR { array[$1]=$2; next } { if ($1 in array) print $1, array[$1], $2 }' lista listb
Results:
a 23 3
e 55 34
c 4 84
b 2 1
f 44 500
d 74 666666
Based on your input files (no duplicate keys in a single file), the following will do the trick:
>> for key in $(awk '{print $1}' lista) ; do
+> echo $key $(awk -vK=$key '$1==K{$1="";print}' lista listb)
+> done
a 23 3
c 4 84
e 55 34
b 2 1
f 44 500
d 74 666666