I am working on a large text file made up of rows and rows of numbers.
Below is just a short excerpt to illustrate the question, but the real data could be any series of incrementing numbers with arbitrary numerical gaps between rows.
267
368
758
936
1248
1415
1739
1917
I am looking for a way to set a fixed numerical gap (such as 100) between consecutive pairs of numbers, starting between the 2nd and 3rd numbers, while maintaining the numerical difference within each pair, which can be any value.
Such that if the numerical gap was set to 100 the above example would become:
267
368
# gap of 100
468
646
# gap of 100
746
913
# gap of 100
1013
1191
Would anybody know of a possible one-liner to do this in the terminal or in a shell script? Thanks.
A little clumsy, but for starters this should do. It reads the list of numbers from stdin (i.e. start with cat numbers.txt or the like and pipe it into the rest):
paste - - | {
read -r x m; echo $x; echo $m
while read -r x y; do echo $((m+=100)); echo $((m+=y-x)); done
}
267
368
468
646
746
913
1013
1191
Explanation: paste - - joins every two lines, so each read pulls in one pair of numbers. The first pair is printed unchanged; each subsequent pair only serves as the base for calculating the within-pair difference, which is added onto a running variable that is also incremented by 100 (the gap) on each iteration.
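If you'd prefer a genuine one-liner, the same running-offset idea can be expressed in awk. This is only a sketch, with the gap hard-coded to 100 and numbers.txt as an assumed file name:
awk 'NR==1 { print; next }                # first number: print unchanged
     NR==2 { m=$1; print; next }          # second number: seed the running value
     NR%2  { x=$1; next }                 # odd lines: remember the start of the pair
     { print (m+=100); print (m+=$1-x) }  # even lines: add the gap, then keep the pair difference
' numbers.txt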
Here's a rewrite as a parametrized function, without the use of paste:
addgaps() {
read -r m; echo $m; read -r m; echo $m
while read -r x; read -r y; do echo $((m+=$1)); echo $((m+=y-x)); done;
}
cat numbers.txt | addgaps 100
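The cat isn't strictly needed, by the way; a plain redirection does the same job:
addgaps 100 < numbers.txt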
I want to create a file of only 2 bytes containing the text "00".
When I run the command echo -n ok > myFile to get just 2 bytes in myFile, ls -l reports 6 bytes.
It shows 6 bytes in the size column, but I want just 2 bytes.
Use printf:
printf '%s' ok >myFile
...or more simply (but less reliably, as this doesn't always work right if ok is replaced by a string containing %s, literal backslashes, etc.):
printf ok >myFile
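Either way, you can confirm the size with wc (shown here with the ok string and the myFile name from the question):
printf '%s' ok > myFile
wc -c < myFile    # reports 2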
You can use the dd utility:
dd if=/dev/zero of=twobytes.txt count=2 bs=1
2+0 records in
2+0 records out
2 bytes copied, 0.000128232 s, 15.6 kB/s
cat twobytes.txt | hexdump
0000000 0000
0000002
ls -l twobytes.txt
-rw-r--r-- 1 lmc users 2 Aug 4 17:39 twobytes.txt
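Note that /dev/zero yields two NUL bytes (0x00), which is what the hexdump above shows, not the ASCII text "00" the question asks for. If you want the literal characters while still using dd, you could feed it from printf; a small sketch with the same file name:
printf '00' | dd of=twobytes.txt bs=2 count=1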
This would also work (thanks @CharlesDuffy):
tr -d '\n' <<<'00' > twobytes.txt
Not recommended, as echo behavior might differ between implementations (see comments below):
echo -ne "\060\060" > twobytes.txt
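A more portable way to write the same octal escapes is printf, which handles them consistently across shells:
printf '\060\060' > twobytes.txt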
I'm researching the rhythmic elements of prime number sequences in binary. I have multiple sets of files containing vertical lists, and I want to apply bitwise logic operators between any two of them line-by-line.
i.e.
$cat binary-superprimes.txt
11
101
1011
10001
11111
101001
111011
1000011
1010011
1101101
$cat binary-perniciousprimes.txt
11
101
111
1011
1101
10001
10011
11111
100101
101001
I'm looking for commands, a script, or an application (I'd prefer commands or a script, but it's not a deal-breaker) that will let me AND/OR/XOR/etc. these lists line by line, in much the same style as the output of diff or comm.
Using CentOS 7 / Ubuntu 18.04 / macOS 10.15.
Edit:
Expected output (the result of XORing each pair of entries above, shown in binary):
0
0
1100
11010
10010
111000
101000
1011100
1110110
1000100
As for what I've tried: as I said, I've played around with for loops, but I don't know how (or whether it's possible) to iterate over two lists for comparison in this context (i.e. two "for i in" loops with a single "done", using $i and $x as inputs to a basic echo $(($x^$i))).
I've also tried a program called "bitwise", but its output is too verbose and it doesn't seem to read files, only values.
Assuming your bash version is >= 4.0 and supports mapfile,
would you try the following:
mapfile -t x < "binary-superprimes.txt"
mapfile -t y < "binary-perniciousprimes.txt"
for (( i=0; i<${#x[@]}; i++ )); do
echo "obase=2;" $(( 2#${x[i]} ^ 2#${y[i]} )) | bc
done
Output:
0
0
1100
11010
10010
111000
101000
1011100
1110110
1000100
In case your bash does not support the mapfile command, please try this alternative:
while read -r line; do
x+=($line)
done < "binary-superprimes.txt"
while read -r line; do
y+=($line)
done < "binary-perniciousprimes.txt"
for (( i=0; i<${#x[@]}; i++ )); do
echo "obase=2;" $(( 2#${x[i]} ^ 2#${y[i]} )) | bc
done
Hope this helps.
You can use bc for this purpose. First, create the file
xor.bc:
define xor(x,y) {
auto n,z,t,a,b,c,os,qx,qy;
os=scale;scale=0
n=0;x/=1;y/=1
if(x<0){x=-1-x;n=!n}
if(y<0){y=-1-y;n=!n}
z=0;t=1;while(x||y){
qx=x/4;qy=y/4;
c=(a=x-4*qx)+(b=y-4*qy)
if(!c%2)c=a+4-b
z+=t*(c%4)
t*=4;x=qx;y=qy
}
if(n)z=-1-z
scale=os;return (z)
}
Then you loop over the numbers pair by pair and execute the XOR like this:
paste binary-superprimes.txt binary-perniciousprimes.txt | while read -r var1 var2
do
    echo "obase=2; ibase=2; xor($var1,$var2)" | bc -l xor.bc
done
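For completeness: since bash arithmetic already understands the 2# binary prefix and the ^ XOR operator, you can also combine the two files with paste and use bc only to print the result back in binary. A minimal sketch, assuming the same two file names:
paste binary-superprimes.txt binary-perniciousprimes.txt |
while read -r a b; do
    # 2#... makes bash parse the value as binary; ^ is bitwise XOR
    echo "obase=2; $(( 2#$a ^ 2#$b ))" | bc
done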
I have a text file which has thousands of number values like
1
2
3
4
5
.
.
.
.
n
I know we can use awk to separate these values, but is there a way to fetch the first 10, 20, 40, 80, 160, ..., n values into different text files?
I was using Python to do this, but it takes a lot of time to produce the separate files. Here is the sample Python code:
import numpy as np
from itertools import islice

data = np.loadtxt('ABC.txt',
                  unpack=True,
                  delimiter=',',
                  skiprows=1)

n = 10
iterator = list(islice(data[0], n))
for item in range(n):
    np.savetxt('output1.txt', iterator, delimiter=',', fmt='%10.5f')

iterator = list(islice(data[0], n*2))
for item in iterator:
    np.savetxt('output2.txt', iterator, delimiter=',', fmt='%10.5f')

iterator = list(islice(data[0], n*4))
for item in iterator:
    np.savetxt('output3.txt', iterator, delimiter=',', fmt='%10.5f')

iterator = list(islice(data[0], n*8))
for item in iterator:
    np.savetxt('output4.txt', iterator, delimiter=',', fmt='%10.5f')
and so on.
Is there a better way to do this in bash or in Python? Thank you in advance!
An inefficient but quick-to-implement approach:
s=5; for i in {1..10}; do ((s*=2)); head -n $s file > sub$i; done
Since the output files overlap, there are better ways, but depending on the size of the file and how many times this needs to be run, it might be good enough.
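If the repeated head passes get too slow on a very large file, a single awk pass can write every prefix at once. A sketch, assuming five output files sub1 ... sub5 holding the first 10, 20, 40, 80 and 160 lines:
awk -v n=10 -v files=5 '{
    lim = n
    for (i = 1; i <= files; i++) {      # each prefix file whose limit covers this line gets a copy
        if (NR <= lim) print > ("sub" i)
        lim *= 2
    }
    if (NR > lim / 2) exit              # past the largest prefix, stop reading
}' file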
You didn't provide sample input and expected output, and the text of your question is ambiguous, so this is just a guess, but this MAY be what you're looking for:
$ seq 1000 | awk -v c=10 'NR==c{print; c=2*c}'
10
20
40
80
160
320
640
If not then edit your question to clarify.
SED is your friend:
$ numlines=$( wc -l big_text_file.txt | cut -d' ' -f1 )
$ step=100
$ echo $numlines
861
$ for (( ii=1; ii<=$numlines; ii+=$step )); do echo $ii,$(( ii+step-1 ))w big_text_file.${ii}.txt; done > break.sed
$ cat break.sed
1,100w big_text_file.1.txt
101,200w big_text_file.101.txt
201,300w big_text_file.201.txt
301,400w big_text_file.301.txt
401,500w big_text_file.401.txt
501,600w big_text_file.501.txt
601,700w big_text_file.601.txt
701,800w big_text_file.701.txt
801,900w big_text_file.801.txt
$ sed -n -f break.sed big_text_file.txt
$ wc -l big_text_file*.txt
100 big_text_file.101.txt
100 big_text_file.1.txt
100 big_text_file.201.txt
100 big_text_file.301.txt
100 big_text_file.401.txt
100 big_text_file.501.txt
100 big_text_file.601.txt
100 big_text_file.701.txt
61 big_text_file.801.txt
861 big_text_file.txt
1722 total
I need to loop some values,
for i in $(seq $first $last)
do
  # does something here
done
For $first and $last, I need them to have a fixed length of 5. So if the input is 1, I need to add zeros in front so that it becomes 00001. It loops up to 99999, for example, but the length has to be 5.
E.g.: 00002, 00042, 00212, 12312 and so forth.
Any idea on how I can do that?
In your specific case though it's probably easiest to use the -f flag to seq to get it to format the numbers as it outputs the list. For example:
for i in $(seq -f "%05g" 10 15)
do
echo $i
done
will produce the following output:
00010
00011
00012
00013
00014
00015
More generally, bash has printf as a built-in so you can pad output with zeroes as follows:
$ i=99
$ printf "%05d\n" $i
00099
You can use the -v flag to store the output in another variable:
$ i=99
$ printf -v j "%05d" $i
$ echo $j
00099
Notice that printf supports a slightly different format to seq so you need to use %05d instead of %05g.
Easier still you can just do
for i in {00001..99999}; do
echo $i
done
If the last number of the sequence already has the maximum width you want to pad to (for example, if you want 5 digits and the command is seq 1 10000), then you can use the -w flag for seq; it adds the padding itself.
seq -w 1 10
would produce
01
02
03
04
05
06
07
08
09
10
Use printf with "%05d", e.g.:
printf "%05d" 1
Very simple using printf
[jaypal:~/Temp] printf "%05d\n" 1
00001
[jaypal:~/Temp] printf "%05d\n" 2
00002
Use awk like this:
awk -v start=1 -v end=10 'BEGIN{for (i=start; i<=end; i++) printf("%05d\n", i)}'
OUTPUT:
00001
00002
00003
00004
00005
00006
00007
00008
00009
00010
Update:
As a pure bash alternative, you can do this to get the same output:
for i in {1..10}
do
printf "%05d\n" $i
done
This way you can avoid using an external program, seq, which is NOT available on all flavors of *nix.
I pad the output with more zeros than I need, then use tail to keep only the number of digits I am looking for. Notice that you have to use '6' in tail to get the last five digits :)
for i in $(seq 1 10)
do
RESULT=$(echo 00000$i | tail -c 6)
echo $RESULT
done
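The 6 is needed because echo appends a newline, so the last six bytes are the five digits you want plus that trailing newline. A quick check:
$ echo 0000042 | tail -c 6
00042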
If you want N digits, add 10^N and delete the first digit.
for (( num=100; num<=105; num++ ))
do
echo ${num:1:3}
done
Output:
00
01
02
03
04
05
Another way:
zeroos="000"
for num in {99..105}; do
    echo ${zeroos:${#num}:${#zeroos}}${num}
done
So a simple function to zero-pad any number would be:
function leading_zero(){
    local num=$1
    local zeroos=00000
    echo ${zeroos:${#num}:${#zeroos}}${num}
}
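Called with an arbitrary number (42 here is just an example value):
$ leading_zero 42
00042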
One way to avoid forking external processes is string manipulation; in a generic case it would look like this:
# start value
CNT=1
for [whatever iterative loop: seq, cat, find...]; do
    # the number of 0s is at least the number of digits needed; simple concatenation
    TEMP="000000$CNT"
    # for example 6 digits zero padded: take the last 6 characters of the string
    echo ${TEMP:(-6)}
    # increment the counter, if the for loop doesn't provide the number directly
    CNT=$(( CNT + 1 ))
done
This works quite well on WSL, where forking is a really heavy operation. With a list of 110000 files, using printf "%06d" $NUM took over a minute; the solution above ran in about 1 second.
This will also work:
for i in {0..9}{0..9}{0..9}{0..9}
do
echo "$i"
done
If you're just after padding numbers with zeros to achieve a fixed length, just add the appropriate power of 10: e.g. for 2 digits, add 10^2, then remove the leading 1 before displaying the output.
This works for padding/formatting single numbers of any length, or a whole sequence of numbers using a for loop.
# Padding with zeros:
# pure bash, no externals (awk, sed, seq, head, tail, etc.)
# works with echo, no need for printf
pad=100000   # 5-digit fixed width
for i in {0..99999}; do
    ((j=pad+i))
    echo ${j#?}
done
Tested on Mac OSX 10.6.8, Bash ver 3.2.48
TL;DR
$ seq 1 10 | awk '{printf("%05d\n", $1)}'
Input (Pattern 1, slow):
$ seq 1 10 | xargs -n 1 printf "%05d\n"
Input (Pattern 2, fast):
$ seq 1 10 | awk '{printf("%05d\n", $1)}'
Output (same result in each case):
00001
00002
00003
00004
00005
00006
00007
00008
00009
00010
Explanation
I'd like to suggest the patterns above. These implementations can be used as commands, so they are easy to reuse. All you have to care about is the width of the numbers after conversion (e.g. changing %05d to %09d). They are also applicable to other pipelines, such as the following. The example depends on my environment, so your output might differ, but I think the usefulness is easy to see.
$ wc -l * | awk '{printf("%05d\n", $1)}'
00007
00001
00001
00001
00013
00017
00001
00001
00001
00043
And like this:
$ wc -l * | awk '{printf("%05d\n", $1)}' | sort | uniq
00001
00007
00013
00017
00043
Moreover, written in this manner, the commands can also be executed asynchronously (I found a nice article:
https://www.dataart.com/en/blog/linux-pipes-tips-tricks).
Disclaimer: I'm not sure of this, and I am not a *nix expert.
Performance test:
Super Slow:
$ time seq 1 1000 | xargs -n 1 printf "%09d\n" > test
seq 1 1000 0.00s user 0.00s system 48% cpu 0.008 total
xargs -n 1 printf "%09d\n" > test 1.14s user 2.17s system 84% cpu 3.929 total
Relatively Fast:
for i in {1..1000}
do
printf "%09d\n" $i
done
$ time sh k.sh > test
sh k.sh > test 0.01s user 0.01s system 74% cpu 0.021 total
for i in {1..1000000}
do
printf "%09d\n" $i
done
$ time sh k.sh > test
sh k.sh > test 7.10s user 1.52s system 99% cpu 8.669 total
Fast:
$ time seq 1 1000 | awk '{printf("%09d\n", $1)}' > test
seq 1 1000 0.00s user 0.00s system 47% cpu 0.008 total
awk '{printf("%09d\n", $1)}' > test 0.00s user 0.00s system 52% cpu 0.009 total
$ time seq 1 1000000 | awk '{printf("%09d\n", $1)}' > test
seq 1 1000000 0.27s user 0.00s system 28% cpu 0.927 total
awk '{printf("%09d\n", $1)}' > test 0.92s user 0.01s system 99% cpu 0.937 total
If you have to implement a higher-performance solution, it will probably require other techniques beyond shell scripting.
1.) Create a sequence of numbers with 'seq' from 1 to 1000, and fix the width with '-w' (the width is determined by the length of the ending number, in this case 4 digits for 1000).
2.) Also, select which numbers you want using 'sed -n' (in this case, we select numbers 1-100).
3.) 'echo' out each number. Numbers are stored in the variable 'i', accessed using the '$'.
Pros: This code is pretty clean.
Cons: 'seq' isn't native to all Linux systems (as I understand)
for i in `seq -w 1 1000 | sed -n '1,100p'`;
do
echo $i;
done
You don't need awk for that; either seq or jot alone suffices:
% seq -f '%05.f' 6 # bsd-seq
00001
00002
00003
00004
00005
00006
% gseq -f '%05.f' 6 # gnu-seq
00001
00002
00003
00004
00005
00006
% jot -w '%05.f' 6
00001
00002
00003
00004
00005
00006
…… unless you're going into bigint territory :
% gawk -Mbe '
function __(_,___) {
return +_<+___?___:_
}
BEGIN {
_+=_^=_<_
____="%0*.f\n"
} {
___=__($--_, !+$++_)
_____=__(++_+--_, length(______=+$NF))
do {
printf(____,_____,___)
} while (___++<______)
}' <<< '999999999999999999996 1000000000000000000003'
0999999999999999999996
0999999999999999999997
0999999999999999999998
0999999999999999999999
1000000000000000000000
1000000000000000000001
1000000000000000000002
1000000000000000000003
——————————————————————————————————————————————————
If you need to print out a HUGE range of numbers, then this approach may be a bit faster: printing every integer from 1 to 1 million, left-zero-padded to 9 digits wide, in 0.049s.
*Caveat: I didn't have the spare time to make it cover all input ranges; it's just a proof of concept that accepts increments in powers of 10.
——————————————————————————————————————————————————
( time ( LC_ALL=C mawk2 '
function jot(____,_______,_____,_,__,___,______) {
if(____==(____^!____)) {
return +____<+_______\
? sprintf("%0*.f",_______,____)\
: +____
}
_______= (_______-=____=length(____)-\
(_=!(_<_)))<+_ \
? "" \
: sprintf("%0*.f",_______,!_)
__=_= (!(__=_+=_+_))(__=(-+--_)+(__+=_)^++_)\
(__+=_=(((_--^_--+_++)^++_-_^!_)/_))(__+_)
_____= "."
gsub(_____,"\\&&",__)
____--
do {
gsub(_____,__,_)
_____=_____"."
} while(--____)
gsub(_____,(_______)"&\n",_)
sub("^[^\n]+[\n]","",_)
sub(".$",""~"",_______)
return \
(_)(_______)\
sprintf("%0*.f",length(_____),__<__)
} { print jot($1,$2) }' <<< '10000000 9'
) | pvE9 ) |xxh128sum |ggXy3 | lgp3
sleep 2
( time ( LC_ALL=C jot 1000000 |
LC_ALL=C mawk2 '{ printf("%09.f\n", $1) }'
) | pvE9 ) |xxh128sum |ggXy3 | lgp3
out9: 9.54MiB 0:00:00 [ 275MiB/s] [ 275MiB/s] [<=> ]
( LC_ALL=C mawk2 <<< '1000000 9'; )
0.04s user 0.01s system 93% cpu 0.049 total
e0491043bdb4c8bc16769072f3b71f98 stdin
out9: 9.54MiB 0:00:00 [36.5MiB/s] [36.5MiB/s] [ <=> ]
( LC_ALL=C jot 1000000 | LC_ALL=C mawk2 '{printf("%09.f\n", $1)}'; )
0.43s user 0.01s system 158% cpu 0.275 total
e0491043bdb4c8bc16769072f3b71f98 stdin
By the time you're doing 10 million, the time differences become noticeable :
out9: 95.4MiB 0:00:00 [ 216MiB/s] [ 216MiB/s] [<=> ]
( LC_ALL=C mawk2 <<< '10000000 9'; )
0.38s user 0.06s system 95% cpu 0.458 total
be3ed6c8e9ee947e5ba4ce51af753663 stdin
out9: 95.4MiB 0:00:02 [36.3MiB/s] [36.3MiB/s] [ <=> ]
( LC_ALL=C jot 10000000 | LC_ALL=C mawk2 '{printf("%09.f\n", $1)}'; )
4.30s user 0.04s system 164% cpu 2.638 total
be3ed6c8e9ee947e5ba4ce51af753663 stdin
out9: 95.4MiB 0:00:02 [35.2MiB/s] [35.2MiB/s] [ <=> ]
( time ( LC_ALL=C python3 -c '__=1; ___=10**7;
[ print("{0:09d}".format(_)) for _ in range(__,___+__) ]'
) | pvE9 ) | xxh128sum |ggXy3 | lgp3
2.68s user 0.04s system 99% cpu 2.725 total
be3ed6c8e9ee947e5ba4ce51af753663 stdin