Regularly spaced numbers between bounds without jot - bash

I want to generate a sequence of evenly spaced integers between two bounds, both inclusive. I tried with seq, but I could only get the following:
$ low=10
$ high=100
$ n=8
$ seq $low $(( (high-low) / (n-1) )) $high
10
22
34
46
58
70
82
94
As you can see, the 100 is not included in the sequence.
I know that I can get something like that using jot:
$ jot 8 10 100
10
23
36
49
61
74
87
100
But the server I use does not have jot installed, and I do not have permission to install it.
Is there a simple method that I could use to reproduce this behaviour without jot?

If you don't mind launching an extra process (bc) and if it's available on that machine, you could also do it like this:
$ seq -f'%.f' 10 $(bc <<<'scale=2; (100 - 10) / 7') 100
10
23
36
49
61
74
87
100
Or, building on oguz ismail's idea (but using a precision of 4 decimal places):
$ declare -i low=10
$ declare -i high=100
$ declare -i n=8
$ declare incr=$(( (${high}0000 - ${low}0000) / (n - 1) ))
$
$ incr=${incr::-4}.${incr: -4}
$
$ seq -f'%.f' "$low" "$incr" "$high"
10
23
36
49
61
74
87
100

You can try this naive implementation of jot:
jot_naive() {
    local -i reps=$1 begin=${2}00 ender=${3}00
    local -i x step='(ender - begin) / (reps - 1)'
    for ((x = begin; x <= ender; x += step)); do
        printf '%.f\n' ${x::-2}.${x: -2}
    done
}
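Because the step in the function above is truncated before it is accumulated, the rounding error can grow with each iteration. A minimal sketch that instead computes every term directly from its index, using integer arithmetic only. The name spaced_ints is hypothetical, and the sketch assumes at least two values and high >= low:

```shell
# spaced_ints N LOW HIGH: print N integers spread evenly from LOW to HIGH inclusive.
# Hypothetical helper; assumes N >= 2 and HIGH >= LOW.
spaced_ints() {
    local -i n=$1 low=$2 high=$3 i
    local -i span=$(( high - low )) gaps=$(( n - 1 ))
    for (( i = 0; i < n; i++ )); do
        # Round i*span/gaps to the nearest integer without floating point.
        echo $(( low + (2 * i * span + gaps) / (2 * gaps) ))
    done
}

spaced_ints 8 10 100
```

For the bounds in the question this prints 10 23 36 49 61 74 87 100, one per line, the same values as the jot listing.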

You could use awk for that:
awk -v reps=8 -v begin=10 -v end=100 '
BEGIN{
step = (end - begin) / (reps-1);
for ( f = i = begin; i <= end; i = int(f += step) )
print i
}
'
10
22
35
48
61
74
87
100
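Note that int() truncates, which is why this output has 22, 35 and 48 where jot prints 23, 36 and 49. A sketch of a rounding variant that counts by index instead of accumulating. The wrapper name spaced_awk is hypothetical, and it assumes reps >= 2 and non-negative bounds with end >= begin:

```shell
# Hypothetical wrapper; assumes reps >= 2, begin >= 0 and end >= begin.
spaced_awk() {
    awk -v reps="$1" -v begin="$2" -v end="$3" '
    BEGIN {
        step = (end - begin) / (reps - 1)
        for (i = 0; i < reps; i++)
            print int(begin + i * step + 0.5)   # adding 0.5 turns truncation into rounding
    }'
}

spaced_awk 8 10 100
```

Adding 0.5 before int() only rounds correctly for non-negative values, so negative or descending ranges would need extra handling.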

UPDATE 1: fixed double-printing of the final row, which happened when the remaining difference was smaller than a tiny epsilon.
To keep the rounding direction consistent, rounding is based on the sign of the final value: e.g. if the final value is negative, any rounding is done as if the current step value (CurrSV) were negative, regardless of its actual sign.
While I haven't tested every possible edge case, I believe this version of the code handles both positive and negative rounding properly, for the most part.
That said, this isn't a jot replacement at all: it implements only a very small subset of the step-counting feature rather than being a full-blown clone of it:
{m,g}awk '
function __________(_) {
return -_<+_?_:-_
}
BEGIN {
CONVFMT = "%.250g"; OFMT = "%.13f"
_____ = (_+=_^=_______=______="")^-_^!_
} {
____ = (((_=$(__=(___=$NF)^(_<_)))^(_______=______="")*___\
)-(__=$++__))/--_
_________ = (_____=(-(_^=_<_))^(________=+____<-____\
)*(_/++_))^(++_^_++-+_*_--+-_)
if (-___<=+___) {
_____=__________(_____)
_________=__________(_________)
}
do { print ______,
++_______, int(__+_____), -____+(__+=____)
} while(________? ___<(__-_________) : (__+_________)<___)
print ______, ++_______, int(___+_____), ___, ORS
}' <<< $'8 -3 -100\n8 10 100\n5 -15 -100\n5 15 100\n11 100 11\n10 100 11'
|
1 -3 -3
2 -17 -16.8571428571429
3 -31 -30.7142857142857
4 -45 -44.5714285714286
5 -58 -58.4285714285714
6 -72 -72.2857142857143
7 -86 -86.1428571428572
8 -100 -100
1 10 10
2 23 22.8571428571429
3 36 35.7142857142857
4 49 48.5714285714286
5 61 61.4285714285714
6 74 74.2857142857143
7 87 87.1428571428572
8 100 100
1 -15 -15
2 -36 -36.2500000000000
3 -58 -57.5000000000000
4 -79 -78.7500000000000
5 -100 -100
1 15 15
2 36 36.2500000000000
3 58 57.5000000000000
4 79 78.7500000000000
5 100 100
1 100 100
2 91 91.1000000000000
3 82 82.2000000000000
4 73 73.3000000000000
5 64 64.4000000000000
6 55 55.5000000000000
7 47 46.6000000000000
8 38 37.7000000000000
9 29 28.8000000000000
10 20 19.9000000000000
11 11 11
1 100 100
2 90 90.1111111111111
3 80 80.2222222222222
4 70 70.3333333333333
5 60 60.4444444444445
6 51 50.5555555555556
7 41 40.6666666666667
8 31 30.7777777777778
9 21 20.8888888888889
10 11 11

Related

How to check whether one number range from one file is the subset of other number range from other file?

I'm trying to find out whether each range in range1 [columns a and b] is a subset of, i.e. lies within, a range in range2 [columns b and c].
range1
a b
15 20
8 10
37 44
32 37
range2
a b c
chr1 6 12
chr2 13 21
chr3 31 35
chr4 36 45
output:
a b c
chr1 6 12 8 10
chr2 13 21 15 20
chr4 36 45 37 44
I wanted to compare range1[a] with range2[b] and range1[b] with range2[c]. One to all comparison.
For example, in the first run the first row of range1 is compared with all rows of range2. But range1[a] should be compared only with range2[b] and, similarly, range1[b] only with range2[c]. Based on this I have written the criterion:
lbsf1[j] >= lbs[i] && lbsf1[j] <= ubs[i] && ubsf1[j] >= lbs[i] && ubsf1[j] <= ubs[i]
r1[a]    r2[b]    r1[b]    r2[c]
15    >  6        20    <  12       False
15    >  13       20    <  21       True
15    >  31       20    <  35       False
15    >  36       20    <  45       False
I have tried to learn from this code [which works if we want to check whether a single number lies in a specific range], so I tried modifying it to handle both numbers. It did not work; I suspect I'm not reading the second file properly.
Code: [reference but little modified]
#!/bin/bash
awk -F'\t' '
# 1st pass (fileB): read the lower and upper range bounds
FNR==NR { lbs[++count] = $2+0; ubs[count] = $3+0; next }
# 2nd pass (fileA): check each line against all ranges.
{ lbsf1[++countf1] = $1+0; ubsf1[countf1] = $2+0;
for(i=1;i<=count;++i)
{
for(j=1;j<=countf1;++j)
{
if (lbsf1[j] >= lbs[i] && lbsf1[j] <= ubs[i] && ubsf1[j] >= lbs[i] && ubsf1[j] <= ubs[i])
{ print lbs[i]"\t"ubs[i]"\t"lbsf1[j]"\t"ubsf1[j] ; next }
}
}
}
' range2 range1
This code gave me output:
6 12 8 10
6 12 8 10
6 12 8 10
Thank you.
Assumptions:
input files do not have a b nor a b c as the first line (we can modify the proposed code if these lines really do exist in the data)
lines in range2 do not have leading white space (as shown in the provided sample)
while not demonstrated by the small sample provided, going to assume that a row from range1 may 'match' with multiple rows from range2 and that we want to print all matches (we can modify the proposed code if we need to stop processing a range1 row once we find the first 'match')
Sample data:
$ cat range1
15 20
8 10
37 44
32 37
$ cat range2
chr1 6 12
chr2 13 21
chr3 31 35
chr4 36 45
chr15 36 67 # added to demonstrate multi-match for range1 [ 37 , 44 ]
Issues with current code:
loads the range1 data into an array and then loops over this (ever growing array) for each line read from range1; this array is unnecessary as we just need to process the current row from range1
the dual loop logic is aborted (; next) upon printing the first matching set of records; this premature cancellation means we only see the first match ... over and over; the ; next can be removed
the range2[a] column is not captured during range2 input processing so we're unable to display this column in the final output
Updating OP's current code to address these issues:
awk '
BEGIN { FS=OFS="\t" }
FNR==NR { chromo[++count]=$1
lbs[count]=$2
ubs[count]=$3
next
}
{ lb=$1
ub=$2
for (i=1;i<=count;++i)
if ( lb >= lbs[i] && lb <= ubs[i] && ub >= lbs[i] && ub <= ubs[i] )
print chromo[i],lbs[i],ubs[i],lb,ub
}
' range2 range1
This generates:
chr2 13 21 15 20
chr1 6 12 8 10
chr4 36 45 37 44
chr15 36 67 37 44
If the output needs to be sorted, we could modify the awk code to store the results in another array and then sort and print that array during END {...} processing. But for simplicity's sake we'll just pipe the output to sort, e.g.:
$ awk ' BEGIN { FS=OFS="\t" } FNR==NR ....' range2 range1 | sort -V
chr1 6 12 8 10
chr2 13 21 15 20
chr4 36 45 37 44
chr15 36 67 37 44
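If only the first match per range1 row is wanted (the alternative mentioned in the assumptions), a break inside the loop is enough. The sketch below also shortens the test: when each row's lower bound does not exceed its upper bound, lb >= lbs[i] && ub <= ubs[i] already implies the other two comparisons. The function name first_container is hypothetical, and the sketch assumes whitespace-separated files without header lines:

```shell
# Hypothetical helper: first_container RANGE2_FILE RANGE1_FILE
# Assumes whitespace-separated fields, no header lines, and lower <= upper in every row.
first_container() {
    awk '
    FNR == NR { name[++n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; next }
    {
        for (i = 1; i <= n; i++)
            if ($1 + 0 >= lo[i] && $2 + 0 <= hi[i]) {
                print name[i], lo[i], hi[i], $1, $2
                break                      # stop at the first containing range
            }
    }' "$1" "$2"
}

# Demo on the sample data, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'chr1 6 12' 'chr2 13 21' 'chr3 31 35' 'chr4 36 45' 'chr15 36 67' > "$dir/range2"
printf '%s\n' '15 20' '8 10' '37 44' '32 37' > "$dir/range1"
first_container "$dir/range2" "$dir/range1"
```

With the sample data, the row 37 44 is reported once (against chr4) and the later chr15 match is skipped.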

Remove rows that have a specific numeric value in a field

I have a very bulky file about 1M lines like this:
4001 168991 11191 74554 60123 37667 125750 28474
8 145 25 101 83 51 124 43
2985 136287 4424 62832 50788 26847 89132 19184
3 129 14 101 88 61 83 32 1 14 10 12 7 13 4
6136 158525 14054 100072 134506 78254 146543 41638
1 40 4 14 19 10 35 4
2981 112734 7708 54280 50701 33795 75774 19046
7762 339477 26805 148550 155464 119060 254938 59592
1 22 2 12 10 6 17 2
6 136 16 118 184 85 112 56 1 28 1 5 18 25 40 2
1 26 2 19 28 6 18 3
4071 122584 14031 69911 75930 52394 89733 30088
1 9 1 3 4 3 11 2 14 314 32 206 253 105 284 66
I want to remove rows that have a value less than 100 in the second column.
How to do this with sed?
I would use awk to do this. Example:
awk ' $2 >= 100 ' file.txt
This displays only the rows from file.txt whose second column ($2) is greater than or equal to 100.
Use the following approach:
sed -E '/^\w+\s+([0-9]{1,2}|[0][0-9]+)\b/d' /tmp/test.txt
(replace /tmp/test.txt with your current file path)
([0-9]{1,2}|[0][0-9]+) - matches either a one- or two-digit number (0 to 99) or a number written with a leading zero (e.g. 012, 00982); note that the second alternative also matches values of 100 or more, such as 00982
d - delete the pattern space
-E (--regexp-extended) - use extended regular expressions rather than basic regular expressions
To remove matched lines in place use -i option:
sed -i -E '/^\w+\s+([0-9]{1,2}|[0][0-9]+)\b/d' /tmp/test.txt
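For comparison, a numeric test avoids encoding the threshold in a regular expression. A sketch of an in-place variant built on awk. The helper name filter_second_col is hypothetical, and the file is rewritten through a temporary copy because plain awk has no in-place option:

```shell
# Hypothetical helper: filter_second_col FILE [MIN]
# Keeps only the rows of FILE whose second field is numerically >= MIN (default 100).
filter_second_col() {
    local file=$1 min=${2:-100} tmp
    tmp=$(mktemp) || return 1
    # $2 + 0 forces a numeric comparison, so "099" is treated as 99.
    awk -v min="$min" '($2 + 0) >= min' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch file.
demo=$(mktemp)
printf '%s\n' '4001 168991 11191' '1 40 4' '3 129 14' > "$demo"
filter_second_col "$demo" 100
cat "$demo"
```

Writing back with cat rather than mv keeps the original file's permissions and ownership.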

convert comma separated list in text file into columns in bash

I've managed to extract data (from an html page) that goes into a table, and I've isolated the columns of said table into a text file that contains the lines below:
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
Each bracketed list of numbers represents a column. What I'd like to do is turn these lists into actual columns that I can work with in different data formats. I'd also like to be sure to include the blank parts of these lists too (i.e., "[,,,]").
This is basically what I'm trying to accomplish:
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
. . . .
. . . .
. . . .
I'm parsing data from a web page, and ultimately planning to make the process as automated as possible so I can easily work with the data after I output it to a nice format.
Anyone know how to do this, have any suggestions, or thoughts on scripting this?
Since you have your lists in Python, just do it in Python:
l = [["30", "30", "32"], ["28", "6", "6"], ["-7", "", ""], ["0", "", ""]]
for i in zip(*l):
    print("\t".join(i))
produces
30 28 -7 0
30 6
32 6
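awk-based solution (it tracks the longest list, since NF in END only reflects the last line):
awk -F, '{gsub(/\[|\],?/, ""); if (NF > n) n = NF; for (i=1; i<=NF; i++) a[i] = (i in a) ? a[i] OFS $i : $i}
END {for (i=1; i<=n; i++) print a[i]}' file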
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
..........
..........
Another solution, but it works only for file with 4 lines:
$ paste \
<(sed -n '1{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '2{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '3{s,\[,,g;s,\],,g;s|,|\n|g;p}' t) \
<(sed -n '4{s,\[,,g;s,\],,g;s|,|\n|g;p}' t)
30 28 -7 0
30 6
32 6
35 50 43 3
34 58 71 5
43 56 30 1.5
52 64 23 1
68 87 28 1.5
88 99 13 0.5
97 110 13 0.5
105 116 10 0
107 119 11 0.5
107 120 12 0.5
105 117 11 0.5
101 114 13 0.5
93 113 22 1
88 103 17 0.5
80 82 3 0
69 6 -0.5
55 47 -15 -0.5
-20 2.5
38
71
Updated: or another version with preprocessing:
$ sed 's|\[||;s|\][,]\?||' t >t2
$ paste \
<(sed -n '1{s|,|\n|g;p}' t2) \
<(sed -n '2{s|,|\n|g;p}' t2) \
<(sed -n '3{s|,|\n|g;p}' t2) \
<(sed -n '4{s|,|\n|g;p}' t2)
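If a file named data contains the data given in the problem (exactly as defined above), then the following bash command line transposes it into columns. Note that turning the commas into spaces drops the empty entries, so values that follow a blank move up a row, as the example output shows: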
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
Example:
$ cat data
[30,30,32,35,34,43,52,68,88,97,105,107,107,105,101,93,88,80,69,55],
[28,6,6,50,58,56,64,87,99,110,116,119,120,117,114,113,103,82,6,47],
[-7,,,43,71,30,23,28,13,13,10,11,12,11,13,22,17,3,,-15,-20,,38,71],
[0,,,3,5,1.5,1,1.5,0.5,0.5,0,0.5,0.5,0.5,0.5,1,0.5,0,-0.5,-0.5,2.5]
$ sed -e 's/\[//' -e 's/\]//' -e 's/,/ /g' <data | rs -T
30 28 -7 0
30 6 43 3
32 6 71 5
35 50 30 1.5
34 58 23 1
43 56 28 1.5
52 64 13 0.5
68 87 13 0.5
88 99 10 0
97 110 11 0.5
105 116 12 0.5
107 119 11 0.5
107 120 13 0.5
105 117 22 1
101 114 17 0.5
93 113 3 0
88 103 -15 -0.5
80 82 -20 -0.5
69 6 38 2.5
55 47 71
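Since the question asks to keep the blank entries, here is a sketch that writes tab-separated output, so empty cells stay in their row, and that works for any number of lists of any length. The function name lists_to_columns is hypothetical; it assumes one bracketed list per line, optionally followed by a comma:

```shell
# Hypothetical helper: read bracketed comma lists on stdin, print them as tab-separated columns.
lists_to_columns() {
    awk -F, '
    {
        sub(/^\[/, "")                 # drop the opening bracket
        sub(/\],?[ \t]*$/, "")         # drop the closing bracket and optional trailing comma
        for (i = 1; i <= NF; i++) cell[i, NR] = $i
        if (NF > rows) rows = NF       # the longest list decides the number of output rows
        cols = NR
    }
    END {
        for (r = 1; r <= rows; r++) {
            line = cell[r, 1]
            for (c = 2; c <= cols; c++) line = line "\t" cell[r, c]
            print line
        }
    }'
}

printf '%s\n' '[30,30,32],' '[28,6,6],' '[-7,,,43],' '[0,,,3]' | lists_to_columns
```

Each output row has one cell per input list, with empty cells left empty between the tabs, so the result can be loaded directly as TSV.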

Replace repeated elements in a list with unique identifiers

I have a list like the below:
1 . Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 . Sam 3 4 56 6 89
3 . Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 . Pig 2 5 67 2 21
(except the real list is 40 million lines long).
There are repeated elements in the second column (i.e. the ".")
I want to replace these with unique identifiers (e.g. ".1", ".2", ".3"...".n")
I tried to do this with a bash loop / sed combination, but it didn't work...
Failed attempt:
for i in 1..4
do
sed -i "s_//._//."$i"_"$i""
done
(Essentially, I was trying to get sed to replace each n th "." with ".n", but this didn't work).
Here's a way to do it with awk (assuming your file is called input):
$ awk '$2=="."{$2="."++counter}{print}' input
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .2 Sam 3 4 56 6 89
3 .3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .4 Pig 2 5 67 2 21
The awk program replaces the second column ($2) with a string formed by concatenating "." and a pre-incremented counter (++counter), but only if the second column is exactly ".". It then prints every line, whether or not $2 was modified ({print}).
Plain bash alternative:
c=1
while read -r a b line ; do
if [ "$b" == "." ] ; then
echo "$a ."$((c++))" $line"
else
echo "$a $b $line"
fi
done < input
Since your question is tagged sed and bash, here are a few examples for completeness.
Bash only
Use parameter expansion. The second column will be unique, but not sequential:
i=1; while read line; do echo ${line/\./.$((i++))}; done < input
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .3 Sam 3 4 56 6 89
3 .4 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .6 Pig 2 5 67 2 21
Bash + sed
sed cannot increment variables, it has to be done externally.
For each line, increment $i if line contains a ., then let sed append $i after the .
i=0
while read line; do
[[ $line == *.* ]] && i=$((i+1))
sed "s#\.#.$i#" <<<"$line"
done < input
Output:
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .2 Sam 3 4 56 6 89
3 .3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .4 Pig 2 5 67 2 21
you can use this command:
awk '{gsub(/\./,c++);print}' filename
Output:
1 0 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 2 Sam 3 4 56 6 89
3 3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 5 Pig 2 5 67 2 21

Multiply variable ranges with Bash brace expansion

I've a question extending the code in this question: Can you multiply two variable ranges in Bash using brace expansion (not seq) and not using loops?
This is what I've tried so far
Work out how variable boundary ranges work (finally, a good use of eval):
$ echo {1..10}
1 2 3 4 5 6 7 8 9 10
$ boundary=10
$ echo {1..$boundary}
{1..10}
$ eval echo {1..$boundary}
1 2 3 4 5 6 7 8 9 10
But how can you multiply two variable boundary ranges?
$ echo $(({1..10}*{1..10}))
1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20 24 28 32 36 40 5 10 15 20 25 30 35 40 45 50 6 12 18 24 30 36 42 48 54 60 7 14 21 28 35 42 49 56 63 70 8 16 24 32 40 48 56 64 72 80 9 18 27 36 45 54 63 72 81 90 10 20 30 40 50 60 70 80 90 100
$ boundary=10
$ echo $(({1..$boundary}*{1..$boundary}))
bash: {1..10}*{1..10}: syntax error: operand expected (error token is "{1..10}*{1..10}")
$ eval echo $(({1..$boundary}*{1..$boundary}))
bash: {1..10}*{1..10}: syntax error: operand expected (error token is "{1..10}*{1..10}")
This seems to work; I just escaped the $ and [] to delay their evaluation (so that they are echoed first, then evaluated):
eval echo \$\[{1..$boundary}*{1..$boundary}\]
That said, I now need to go look up what $[] does ;-)
Second answer, using the non-deprecated $(( )) syntax instead of $[] (but two evals):
eval eval echo "\$\(\("{1..$boundary}*{1..$boundary}"\)\)"
or
eval eval echo \\\$\\\(\\\({1..$boundary}*{1..$boundary}\\\)\\\)
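To see what each eval layer contributes, one can swap an eval for an echo and print the intermediate command instead of running it. A sketch using the quoted form above, with a small boundary chosen only for illustration:

```shell
boundary=3   # small value chosen for illustration

# Stage 1: the words handed to the outer eval ($boundary expanded, braces untouched).
echo eval echo "\$\(\("{1..$boundary}*{1..$boundary}"\)\)"

# Stage 2: replace the inner eval with echo to print the command the second eval would run.
eval echo echo "\$\(\("{1..$boundary}*{1..$boundary}"\)\)"

# Stage 3: the full command, which evaluates every generated arithmetic expansion.
eval eval echo "\$\(\("{1..$boundary}*{1..$boundary}"\)\)"
```

The first pass only substitutes the variable, the outer eval performs the brace expansion (the escaped parentheses keep the shell from treating the word as an arithmetic expansion at that point), and the inner eval finally computes the products.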

Resources