ELF go binaries default byte alignment - go

I empirically see that Go ELF binaries use 16-byte function alignment. For example:
$ wget https://github.com/gardener/gardenctl/releases/download/v0.24.2/gardenctl-linux-amd64
$ readelf -W -s gardenctl-linux-amd64 | grep -E "FUNC" | wc -l
44746
$ readelf -W -s gardenctl-linux-amd64 | grep -E "0[ ]+[0-9]* FUNC" | wc -l
44744
so the vast majority have 0 as the least significant hex digit of their address. Is it always like that in Go binaries?

This depends on the platform. If you have a source repo checked out:
% cd go/src/cmd/link/internal
% grep "funcAlign =" */*.go
amd64/l.go: funcAlign = 32
arm/l.go: funcAlign = 4 // single-instruction alignment
arm64/l.go: funcAlign = 16
mips64/l.go: funcAlign = 8
ppc64/l.go: funcAlign = 16
riscv64/l.go: funcAlign = 8
s390x/l.go: funcAlign = 16
x86/l.go: funcAlign = 16
The alignment for amd64 may go back down to 16 in the future; it has been 32 for a while because of https://github.com/golang/go/issues/35881
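If you only have a binary and want to check the effective alignment empirically, one rough approach (a sketch; it assumes GNU awk for strtonum and the readelf column layout shown above) is to bucket the FUNC symbol addresses by the largest power of two that divides them:
readelf -W -s gardenctl-linux-amd64 |
awk '$4 == "FUNC" {
    a = strtonum("0x" $2)                        # symbol value (hex) as a number
    if (a == 0) next
    n = 1
    while (a % (n * 2) == 0 && n < 64) n *= 2    # largest power of two dividing a, capped at 64
    count[n]++
} END { for (n in count) print n, count[n] }' | sort -n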

Related

How to convert file size to human readable and print with other columns?

I want to convert the 5th column in this command output to human-readable format.
For example, if this is my input:
-rw-rw-r-- 1 bhagyaraj bhagyaraj 280000 Jun 17 18:34 demo1
-rw-rw-r-- 1 bhagyaraj bhagyaraj 2800000 Jun 17 18:34 demo2
-rw-rw-r-- 1 bhagyaraj bhagyaraj 28000000 Jun 17 18:35 demo3
To something like this :
-rw-rw-r-- 280K demo1
-rw-rw-r-- 2.8M demo2
-rw-rw-r-- 28M demo3
I tried this command, but it returns only the file size column.
ls -l | tail -n +2 |awk '{print $5 | "numfmt --to=si"}'
ls is just an example; my actual use case is very large and repeated execution must be avoided.
Any help would be appreciated :)
Just use -h --si
-h, --human-readable with -l and -s, print sizes like 1K 234M 2G etc.
--si likewise, but use powers of 1000 not 1024
So the command would be
ls -lh --si | tail -n +2
If you don't use ls, and the command you intend to run doesn't have an option similar to ls's -h --si, then numfmt already has the --field option to specify which column you want to format. For example:
$ df | LC_ALL=en_US.UTF-8 numfmt --header --field 2-4 --to=si
Filesystem 1K-blocks Used Available Use% Mounted on
udev 66M 0 66M 0% /dev
tmpfs 14M 7.2K 14M 1% /run
/dev/mapper/vg0-lv--0 4.1G 3.7G 416M 90% /
tmpfs 5.2K 4 5.2K 1% /run/lock
/dev/nvme2n1p1 524K 5.4K 518K 2% /boot/efi
Unfortunately, although numfmt does try to preserve the columnation, it fails when there is a large variation in line length after inserting the group separators, as you can see above. So sometimes you might still need to reformat the table with column:
df | LC_ALL=en_US.UTF-8 numfmt --header --field 2-4 --to=si | column -t -R 2,3,4,5
The -R 2,3,4,5 option is for right alignment, but some column versions, like the default one in Ubuntu, don't support it, so you may need to remove it.
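The same --field approach works with any command whose output has a numeric column; for instance (an illustrative example, assuming GNU du and numfmt):
# Format the raw byte counts in column 1 of du's output
du -sb * | numfmt --field 1 --to=iec | column -t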
Alternatively, you can use awk to format only the columns you want, for example column 5 in the case of ls:
$ ls -l demo* | awk -v K=1e3 -v M=1e6 -v G=1e9 'func format(v) {
if (v > G) return v/G "G"; else if (v > M) return v/M "M";
else if (v > K) return v/K "K"; else return v
} { $5 = format($5); print $0 }' | column -t
-rw-rw-r-- 1 ph ph 280K Jun 18 09:23 demo1
-rw-rw-r-- 1 ph ph 2.8M Jun 18 09:24 demo2
-rw-rw-r-- 1 ph ph 28M Jun 18 09:23 demo3
-rw-rw-r-- 1 ph ph 2.8G Jun 18 09:30 demo4
And columns 2, 3 and 4 in the case of df:
# M=1000 and G=1000000 because df output is 1K-block, not bytes
$ df | awk -v M=1000 -v G=1000000 'func format(v) {
if (v > G) return v/G "G"; else if (v > M) return v/M "M"; else return v
}
{
# Format only columns 2, 3 and 4, ignore header
if (NR > 1) { $2 = format($2); $3 = format($3); $4 = format($4) }
print $0
}' OFS="\t" | column -t
Filesystem 1K-blocks Used Available Use% Mounted on
udev 65.8273G 0 65.8273G 0% /dev
tmpfs 13.1772G 7M 13.1702G 1% /run
/dev/mapper/vg0-lv--0 4073.78G 3619.05G 415.651G 90% /
tmpfs 65.8861G 0 65.8861G 0% /dev/shm
tmpfs 5.12M 4 5.116M 1% /run/lock
tmpfs 65.8861G 0 65.8861G 0% /sys/fs/cgroup
/dev/nvme2n1p2 999.32M 363.412M 567.096M 40% /boot
UPDATE 1 :
If you need just a barebones module for byte-size formatting (it's set up for base-2 now, but modifying it for --si should be trivial):
{m,g}awk '
BEGIN { OFS="="
_____=log(__=(_+=_+=_^=_)^(____=++_))
} gsub("[.]0+ bytes",
" -bytes-",
$(!__*($++NF = sprintf("%#10.*f %s" ,____,
(_ = $!!__) / __^(___=int(log(_)/_____)),
!___ ? "bytes" : substr("KMGTPEZY",___,!!__)"iB"))))^!__'
=
734 734 -bytes-
180043 175.82324 KiB
232819 227.36230 KiB
421548373 402.01986 MiB
838593829 799.74540 MiB
3739382399 3.48257 GiB
116601682159 108.59378 GiB
147480014471 137.35147 GiB
11010032230111 10.01357 TiB
19830700070261 18.03592 TiB
111120670776601 101.06366 TiB
15023323323323321 13.34339 PiB
85255542224555233 75.72213 PiB
444444666677766611 394.74616 PiB
106941916666944416909 92.75733 EiB
111919999919911191991 97.07513 EiB
767777766776776776777767 650.33306 ZiB
5558888858993555888686669 4.59821 YiB
========================
This is probably waaaay overkill, but I wrote it a while back; it can calculate the human-readable value as well as a comma-formatted version of the raw byte value, supporting everything from kilobits to yottabytes,
with options for:
base 2 or base 10 (enter 10 or "M/m" for metric)
bytes (B) or bits (b)
The only things that need to be hard-coded are the letters themselves (see the sketch after this list), since they grow by one step upon either
every 3rd power of 10 (1,000), or
every 5th power of 4 (1,024)
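To make the underlying idea concrete without the obfuscation, here is a minimal sketch of my own (base-2 only; it assumes the input value is at least 1 KiB): pick the exponent with a logarithm and index into the letter string.
# 421548373 bytes -> "402.0199 MiB"
awk 'BEGIN {
    v = 421548373
    split("K M G T P E Z Y", u, " ")
    e = int(log(v) / log(1024))          # how many whole powers of 1024 fit
    printf "%.4f %siB\n", v / 1024^e, u[e]
}'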
{m,g}awk '
BEGIN {
  FS = OFS = "="
}
$!NF = substr(bytesformat($2, 10, "B"), 1, 15)\
       substr(bytesformat($2, 2, "B"), 1, 15)\
       bytesformat($2, 2, "b")

# Functions, listed alphabetically
function bytesformat(_,_______,________,__, ___, ____, _____, ______)
{
  _____=__=(____^=___*=((____=___+=___^= "")/___)+___+___)
  ___/=___
  sub("^0+","",_)
  ____=_____-= substr(_____,index(_____,index(_____,!__))) * (_______~"^(10|[Mm])$")
  _______=length((____)____)^(________~"^b(it)?$")
  if ((____*__) < (_______*_)) {
    do {
      ____*=_____
      ++___
    } while ((____*__) < (_______*_))
  }
  __=_
  sub("(...)+$", ",&", __)
  gsub("[^#-.][^#-.][^#-.]", "&,", __)
  gsub("[,]*$|^[,]+", "", __)
  sub("^[.]", "0&", __)
  return \
  sprintf("%10.4f %s%s | %s byte%.*s",
    _=="" ? +_:_/(_____^___)*_______,
    substr("KMGTPEZY", ___, _^(_<_)),
    --_______?"b":"B",__==""?+__:__,(_^(_<_))<_,"s")
}'
In this sample, it's showing metric bytes, binary bytes, binary bits, and the raw input byte value:
180.0430 KB | 175.8232 KB | 1.3736 Mb | 180,043 bytes
232.8190 KB | 227.3623 KB | 1.7763 Mb | 232,819 bytes
421.5484 MB | 402.0199 MB | 3.1408 Gb | 421,548,373 bytes
838.5938 MB | 799.7454 MB | 6.2480 Gb | 838,593,829 bytes
3.7394 GB | 3.4826 GB | 27.8606 Gb | 3,739,382,399 bytes
116.6017 GB | 108.5938 GB | 868.7502 Gb | 116,601,682,159 bytes
147.4800 GB | 137.3515 GB | 1.0731 Tb | 147,480,014,471 bytes
11.0100 TB | 10.0136 TB | 80.1085 Tb | 11,010,032,230,111 bytes
19.8307 TB | 18.0359 TB | 144.2873 Tb | 19,830,700,070,261 bytes
111.1207 TB | 101.0637 TB | 808.5093 Tb | 111,120,670,776,601 bytes
15.0233 PB | 13.3434 PB | 106.7471 Pb | 15,023,323,323,323,321 bytes
85.2555 PB | 75.7221 PB | 605.7771 Pb | 85,255,542,224,555,233 bytes
444.4447 PB | 394.7462 PB | 3.0840 Eb | 444,444,666,677,766,611 bytes
106.9419 EB | 92.7573 EB | 742.0586 Eb | 106,941,916,666,944,416,909 bytes
111.9200 EB | 97.0751 EB | 776.6010 Eb | 111,919,999,919,911,191,991 bytes
767.7778 ZB | 650.3331 ZB | 5.0807 Yb | 767,777,766,776,776,776,777,767 bytes
5.5589 YB | 4.5982 YB | 36.7856 Yb | 5,558,888,858,993,555,888,686,669 bytes

Why would a cygwin program lose path prefix when running under gdb?

I just ran into a really strange problem with gdb with a relatively large software build in Cygwin64 on Windows 10, which I cannot really reproduce with a minimal example.
So, let me first provide a minimal example that works fine (and does not reproduce the error): basically, the below code (test_cygwin.cpp) just wants to create exclusive access file in /var/run.
Let's recall that /var/run is a Unix path, and as such does not exist in Windows; rather, it is mapped to the Windows filesystem through a directory in the Cygwin installation:
$ cygpath -w /var/run
C:\cygwin64\var\run
Here is test_cygwin.cpp:
// compile with:
// g++ test_cygwin.cpp -g -o test_cygwin.exe
#include <iostream> // cout
#include <string> // std::string
#include <errno.h> // errno
#include <string.h> // strerror
#include <fcntl.h> // open
#include <unistd.h> // close
int main(void) {
std::string filepath("/var/run/test");
std::cout << "opts: " << O_RDWR << " | " << O_CREAT << " | " << O_EXCL << " , " << S_IRUSR << " | " << S_IWUSR << " | " << S_IRGRP << " | " << S_IWGRP << " | " << S_IROTH << " | " << S_IWOTH << std::endl;
int my_fd = open(filepath.c_str(), O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
if (my_fd < 0) {
std::cout << "Failed to open " << filepath << ": " << strerror(errno) << std::endl;
return 1;
}
std::cout << "Opened " << filepath << std::endl;
close(my_fd);
return 0;
}
So, basically, if I run this program and the file does not exist, the file is created; if the file exists, the program reports an error - in Cygwin's bash shell:
user@DESKTOP-COMPUTER /tmp
$ ls -la /var/run/test
ls: cannot access '/var/run/test': No such file or directory
user@DESKTOP-COMPUTER /tmp
$ ./test_cygwin.exe
opts: 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2
Opened /var/run/test
user@DESKTOP-COMPUTER /tmp
$ ls -la /var/run/test
-rw-r--r-- 1 user None 0 Oct 8 14:38 /var/run/test
user@DESKTOP-COMPUTER /tmp
$ ./test_cygwin.exe
opts: 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2
Failed to open /var/run/test: File exists
user@DESKTOP-COMPUTER /tmp
$ rm /var/run/test && ls -la /var/run/test
ls: cannot access '/var/run/test': No such file or directory
Ok, so far so good. Now, the opts are printed out, so that I could re-run the open command from within a gdb session; again in Cygwin bash:
$ gdb --args ./test_cygwin.exe
GNU gdb (GDB) (Cygwin 10.2-1) 10.2
...
Reading symbols from ./test_cygwin.exe...
(gdb) b test_cygwin.cpp:13
Breakpoint 1 at 0x1004011f2: file test_cygwin.cpp, line 13.
(gdb) r
Starting program: /tmp/test_cygwin.exe
[New Thread 12044.0x467c]
[New Thread 12044.0x4f8]
[New Thread 12044.0x280]
[New Thread 12044.0x9b4]
opts: 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2
Thread 1 "test_cygwin" hit Breakpoint 1, main () at test_cygwin.cpp:13
13 int my_fd = open(filepath.c_str(), O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
(gdb) p (int)open("/var/run/test", 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2)
[New Thread 12044.0x42a0]
[New Thread 12044.0x2ff8]
$1 = 3
(gdb) p (int)open("/var/run/test", 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2)
$2 = -1
(gdb) c
Continuing.
Failed to open /var/run/test: File exists
[Thread 12044.0x1c14 exited with code 1]
[Thread 12044.0x9b4 exited with code 1]
[Thread 12044.0x280 exited with code 1]
[Thread 12044.0x4f8 exited with code 1]
[Thread 12044.0x42a0 exited with code 1]
[Thread 12044.0x2ff8 exited with code 1]
Program terminated with signal SIGHUP, Hangup.
The program no longer exists.
(gdb) quit
So, in the above snippet, the breakpoint halts the program right before it attempts to open the file, and the file is created manually by running p (int)open("/var/run/test", 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2) in the gdb shell.
The first invocation of this command in the gdb shell succeeds (3 is returned, as the numeric file descriptor of the newly opened file); the second fails (-1 is returned) - and when the program proceeds, it clearly fails, as the requested file exists already.
So far, so good - all is as expected.
Now, here is the problem I have in my actual build - which I cannot reproduce here:
When I run the program normally (e.g. ./myprogram.exe --arg1=1 ...), the open call succeeds
When I run the program via gdb (that is, gdb --args ./myprogram.exe --arg1=1 ...), the open call always fails, with "No such file or directory"
So, similar to the above, I placed a breakpoint right before that open command in the actual program, and I tried (making sure I had deleted the file with rm /var/run/test at the start):
(gdb) p (int)open("/var/run/test", 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2)
$27 = -1
(gdb) p (int)open("/var/run/test", 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2)
$28 = -1
Yup, so it fails already at the very start; however, if I now add the cygdrive prefix:
(gdb) p (int)open("/cygdrive/c/cygwin64/var/run/test", 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2)
$29 = 26
(gdb) p (int)open("/cygdrive/c/cygwin64/var/run/test", 2 | 512 | 2048 , 256 | 128 | 32 | 16 | 4 | 2)
$30 = -1
... then it succeeds, from within gdb?!?! And indeed, the Cygwin bash shell also sees this file afterwards:
$ ls -la /var/run/test
-rw-r--r-- 1 user None 0 Oct 8 14:50 /var/run/test
Recalling the error message from the first example: if I want to create a file (/var/run/test) with open and I get a "No such file or directory" error, I interpret that as the parent directory (here /var/run) not existing, at least from the perspective of the program.
So, in this particular case, when gdb debugs this particular myprogram.exe, the program has somehow "lost" the reference to the Unix paths under the root (I've confirmed that creating a file in /tmp also fails in this case); in other words, the program under gdb cannot see these paths (so they don't exist for this program) - yet it can still access the same folder locations, and successfully open/create a file, when the Cygwin installation path is prefixed.
The strange thing is that /cygdrive is also a Unix path formulation, albeit one specific to Cygwin?! Also, interestingly, even when debugging this program, I can "see" these directories from gdb itself:
(gdb) cd /var/run
Working directory /var/run.
(gdb) pwd
Working directory /var/run.
... so gdb itself still "sees" these directories - even if the debuggee program does not?!
So, my question is: while the minimal example clearly shows that this is not a general issue with gdb in Cygwin - has anyone ever experienced a Cygwin program "losing the reference" to (or "not seeing") Unix system paths when running under gdb, while otherwise seeing these paths fine when run directly from the Cygwin bash shell?
If so, would anyone have an explanation for why this situation occurs - and how to rectify it (that is, make the open succeed when the program runs under gdb, just as it does when the program runs standalone in the Cygwin bash shell)?
(Just as a clarification of why I want to know this: the actual program I'm debugging segfaults after the open call, which I'd like to catch in gdb; however, since the open call fails under gdb, the program currently cannot even get to the point where it otherwise segfaults.)
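As a diagnostic sketch (not from the original report; it assumes gdb can call system() in the inferior, just as the open() calls above are made): dump the mount table the debuggee sees and compare it with the shell's view, since Cygwin resolves POSIX paths such as /var/run and /tmp through that table.
# Mount table as seen by the interactive shell
cat /proc/mounts > mounts_shell.txt

# Mount table as seen by the process while it runs under gdb
gdb -batch ./test_cygwin.exe \
    -ex 'break main' \
    -ex 'run' \
    -ex 'call (int)system("cat /proc/mounts > mounts_gdb.txt")' \
    -ex 'continue'

diff mounts_shell.txt mounts_gdb.txt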

Debugging why SPI Master is Reading Arbitrary Values

I have an SPI bus between a MAX V device and an AM335x processor.
The MAX V device has its SPI logic set up to repeatedly send a STD_LOGIC_VECTOR defined as x"0100".
This seems to work fine: the output on a scope is repeatedly the same value.
In Linux, however, I seem to get either shifted data or some random data. I am using spi-tools from https://github.com/cpb-/spi-tools
When these tools are used, I get the following:
# spi-config -d /dev/spidev0.0 -m 1 -s 10000000
# spi-pipe -d /dev/spidev0.0 -b 2 -n 1 < /dev/urandom | hexdump
0000000 0202
0000002
# spi-pipe -d /dev/spidev0.0 -b 2 -n 1 < /dev/urandom | hexdump
0000000 0a0a
0000002
# spi-pipe -d /dev/spidev0.0 -b 2 -n 1 < /dev/urandom | hexdump
0000000 2a2a
0000002
# spi-pipe -d /dev/spidev0.0 -b 2 -n 1 < /dev/urandom | hexdump
0000000 aaaa
0000002
# spi-pipe -d /dev/spidev0.0 -b 2 -n 1 < /dev/urandom | hexdump
0000000 aaaa
0000002
You can see how the device is configured there. On the scope, the MISO pin is clearly outputting "00000010 00000000" for every 16 clock cycles on SCLK. What is happening here? How can I repeatedly get the correct value from the device?
For clarity, here are the relevant parts of the device tree and the kernel configuration.
Kernel
CONFIG_SPI=y
CONFIG_SPI_MASTER=y
CONFIG_SPI_GPIO=y
CONFIG_SPI_BITBANG=y
CONFIG_SPI_OMAP24XX=y
CONFIG_SPI_TI_QSPI=y
CONFIG_SPI_SPIDEV=y
CONFIG_REGMAP_SPI=y
CONFIG_MTD_SPI_NOR=y
CONFIG_SPI_CADENCE_QUADSPI=y
Device Tree
&spi1 {
/* spi1 bus is connected to the CPLD only on CS0 */
status = "okay";
pinctrl-names = "default";
pinctrl-0 = <&spi1_pins>;
ti,pindir-d0-out-d1-in;
cpld_spidev: cpld_spidev@0 {
status = "okay";
compatible = "linux,spidev";
spi-max-frequency = <1000000>;
reg = <0>;
};
};
Also here is a screengrab of the waveforms produced.
Really, the end goal is an app that reports the version encoded in the STD_LOGIC_VECTOR on the MAX V device. So 0100 is intended to mean version 1.00.
Use the uboot_overlay in /boot/uEnv.txt called BB-SPIDEV0-00A0.dtbo.
If you need any more info, please ask. Oh! And there is a fellow, Dr. Molloy, who produced a book a while back.
chp08/spi/ is the location of the file you will need to test the SPI device.
The command is simply spidev_test.
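For reference, a typical invocation looks roughly like this (a sketch; the flags are assumed from the mainline kernel's tools/spi/spidev_test.c, so check --help on your build and adjust the device, speed and word size to the setup above):
# Do a 16-bit transfer on spidev0.0 at 1 MHz and print what comes back
./spidev_test -D /dev/spidev0.0 -s 1000000 -b 16 -v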

Optimised random number generation in bash

I'd like to generate a lot of integers between 0 and 1 using bash.
I tried shuf but the generation is very slow. Is there another way to generate numbers?
This will output an infinite stream of bytes, written in binary and separated by spaces:
cat /dev/urandom | xxd -b | cut -d" " -f 2-7 | tr "\n" " "
As an example:
10100010 10001101 10101110 11111000 10011001 01111011 11001010 00011010 11101001 01111101 10100111 00111011 10100110 01010110 11101110 01000011 00101011 10111000 01010110 10011101 01000011 00000010 10100001 11000110 11101100 11001011 10011100 10010001 01000111 01000010 01001011 11001101 11000111 11110111 00101011 00111011 10110000 01110101 01001111 01101000 01100000 11011101 11111111 11110001 10001011 11100001 11100110 10101100 11011001 11010100 10011010 00010001 00111001 01011010 00100101 00100100 00000101 10101010 00001011 10101101 11000001 10001111 10010111 01000111 11011000 01111011 10010110 00111100 11010000 11110000 11111011 00000110 00011011 11110110 00011011 11000111 11101100 11111001 10000110 11011101 01000000 00010000 00111111 11111011 01001101 10001001 00000010 10010000 00000001 10010101 11001011 00001101 00101110 01010101 11110101 10111011 01011100 00110111 10001001 00100100 01111001 01101101 10011011 00100001 01101101 01001111 01101000 00100001 10100011 00011000 01000001 00100100 10001101 10110110 11111000 01110111 10110111 11001000 00101000 01101000 01001100 10000001 11011000 11101110 11001010 10001101 00010011^C
If you don't want spaces between bytes (thanks @Chris):
cat /dev/urandom | xxd -b | head | cut -d" " -f 2-7 | tr -d "\n "
1000110001000101011111000010011011011111111001000000011000000100111101000001110110011011000000001101111111011000000100101001001110110001111000010100100100010110110000100111111110111011111100101000011000010010111010010001001001111000010101000110010010011011110000000011100110000000100111010001110000000011001011010101111001
tr -dc '01' < /dev/urandom is a quick and dirty way to do this.
If you're on OSX, tr can work a little weird, so you can use perl instead: perl -pe 'tr/01//dc' < /dev/urandom
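If you need a fixed number of bits rather than an endless stream, you can cap it with head (a small usage example on top of the above, not from the original answer):
# 1000 random bits on a single line, then a newline
tr -dc '01' < /dev/urandom | head -c 1000; echo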
Just for fun --
A native-bash function to print a specified number of random bits, extracted from the smallest possible number of evaluations of $RANDOM:
randbits() {
    local x x_bits num_bits
    num_bits=$1
    while (( num_bits > 0 )); do
        x=$RANDOM
        x_bits="$(( x % 2 ))$(( x / 2 % 2 ))$(( x / 4 % 2 ))$(( x / 8 % 2 ))$(( x / 16 % 2 ))$(( x / 32 % 2 ))$(( x / 64 % 2 ))$(( x / 128 % 2 ))$(( x / 256 % 2 ))$(( x / 512 % 2 ))$(( x / 1024 % 2 ))$(( x / 2048 % 2 ))$(( x / 4096 % 2 ))$(( x / 8192 % 2 ))$(( x / 16384 % 2 ))"
        if (( ${#x_bits} < $num_bits )); then
            printf '%s' "$x_bits"
            (( num_bits -= ${#x_bits} ))
        else
            printf '%s' "${x_bits:0:num_bits}"
            break
        fi
    done
    printf '\n'
}
Usage:
$ randbits 64
1011010001010011010110010110101010101010101011101100011101010010
Because this uses $RANDOM, its behavior can be made reproducible by assigning a seed value to $RANDOM before invoking it. This can be handy if you want to be able to reproduce bugs in software that uses "random" inputs.
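For example (a usage sketch of that property):
RANDOM=42; randbits 16     # prints some 16-bit string
RANDOM=42; randbits 16     # seeding again reproduces exactly the same bits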
Since the question asks for numbers between 0 and 1, there is this very fast method based on hashing the current nanosecond timestamp. A good one-liner for sure:
echo "0.$(printf $(date +'%N') | md5sum | tr -d '[:alpha:][:punct:]')"
This command will give you output similar to the following when run inside a for loop with 10 iterations (a sketch of such a loop follows the sample output):
0.97238535471032972041395
0.8642459339189067551494
0.18109959700829495487820
0.39135471514800072505703651
0.624084503017958530984255
0.41997456791539740171
0.689027289676627803
0.22698852059605560195614
0.037745437519184791498537
0.428629619193662260133
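The loop referred to above could look like this (illustrative; it simply wraps the one-liner):
# each iteration hashes the current nanosecond counter and keeps only the digits
for i in {1..10}; do
    echo "0.$(printf $(date +'%N') | md5sum | tr -d '[:alpha:][:punct:]')"
done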
And if you need to print random strings of 1's and 0's, as others have assumed, you can make a slight change to the command like this:
printf $(date +'%N') | sha512sum | tr -d '[2-9][:alpha:][:punct:]'
Which will yield an output of random 0's and 1's similar to this when thrown into a for loop with 10 iterations:
011101001110
001110011011
0010100010111111
0000001101101001111011111111
1110101100
00010110100
1100101101110010
101100110101100
1100010100
0000111101100010001001
To my knowledge, and from what I have found online, this is the closest to true randomness we can get in bash. I have even made a game of dice (where the die has 10 sides, 0-9) to test the randomness, using this method to generate a single number from 0 to 9. Out of 100 throws, each side lands almost exactly 10 times. Out of 1000 throws, each side comes up around 890-1100 times. The distribution doesn't change much after 1000 throws. So you can be fairly confident that this method is well suited for the job, at least among bash tools for generating pseudo-random numbers.
And if you need just an absolutely mind-blowingly ridiculous amount of randomness, the simple md5sum checksum command can be compounded upon itself many, many times and still be very fast. As an example:
printf $(date +'%N') | md5sum | md5sum | md5sum | tr -d '[:punct:][:space:]'
This takes a not-so-random number, obtained by printing the date command's nanosecond field, and pipes it into md5sum. That md5 hash is piped into md5sum again, and then that hash is sent through md5sum one last time. The output is a thoroughly scrambled hash that you can then filter with tools like awk, sed, grep, and tr to control what gets printed.
Hope this helps.

To Split into fixed sequences and leave extra out

I would like all the output files to have the same fixed length, except that the last one can be any size up to 557 bytes.
This means that the number of files can be larger than the count given with split's -n flag.
Code 1 (ok)
$ seq -w 1 1671 > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
557 xaa
557 xao
where xaa is the first file of the sequence, while xao the last one.
I increase the sequence by one unit, but it causes a 5-byte increase (557 -> 562) in the last file xao, which I do not understand:
$ seq -w 1 1672 > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
557 xaa
562 xao
Why does a one-unit increase in the sequence increase the last file (xao) by 5 bytes?
Code 2
$ seq -w 1 1671 | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k&& wc -c xaa && wc -c xao
445 xaa
455 xao
$ seq -w 1 1672 | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k&& wc -c xaa && wc -c xao
445 xaa
459 xao
so increasing the whole length by one sequence element (4 characters) leads to a 4-character increase (455 -> 459), in contrast to the first code where the increase is 5 characters.
Code 3
Let's now keep each element of the sequence at a fixed width with seq -w 0 0.0001 1 | gsed 's/\.//g' (5 characters per number once the dot is removed):
$ seq -w 0 0.0001 1 | gsed 's/\.//g' | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k&& wc -c xaa && wc -c xao
3333 xaa
3344 xao
$ seq -w 0 0.0001 1.0001 | gsed 's/\.//g' | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k&& wc -c xaa && wc -c xao
3334 xaa
3335 xao
so increasing the sequence by one number increases xaa by one byte but decreases xao by 9 bytes.
This behavior does not seem logical to me.
How can you fix the chunk size first, for instance at 557 bytes, and only then determine how many files result?
Original answer — for Code 1
Because seq -w 1 1671 generates 5 characters per number — 4 digits and 1 newline. So adding one number to the output adds 5 bytes to the output.
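You can verify this directly (a quick check with GNU coreutils, matching the numbers above):
seq -w 1 1671 | wc -c    # 8355 = 1671 * 5, and 8355 / 15 = 557 exactly
seq -w 1 1672 | wc -c    # 8360 = 557 * 15 + 5, so the last chunk gets the extra 5 bytes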
Extra answer — for Code 2
You've asked GNU split (aka gsplit) to split the file input into 15 chunks. It does its best to even the values out. But there's a limit to what it can do when the total number of bytes is not a multiple of 15. There are options to control what happens.
However, in the basic form, the -n 15 option means that the first 14 output files each get 445 characters, and the last gets 455 because there are 6685 = 445 * 15 + 10 characters in the output file. When you add another 4 characters to the file (because you delete the newlines), then the last file gets an additional 4 characters (because 6689 = 445 * 15 + 14).
Extra answer — for Code 3
First of all, the output from seq -w 0 0.0001 1 looks like:
0.0000
0.0001
0.0002
…
0.9998
0.9999
1.0000
So after the output is edited with the first sed, the numbers from 00000 to 10000 are present, one per line, with 6 characters per line (including the newline). The second sed then eliminates the newlines, as in Code 2.
There are 50006 bytes in /tmp/k on one line. That's equal to 15 * 3333 + 11, hence the first output. The second variant has 50011 bytes in /tmp/k, which is 15 * 3334 + 1. Hence the difference of only one.
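A quick check of those counts, plus the flag that matches the "fix the size first" goal (a sketch; gsplit -b is standard GNU split, though this option is not discussed in the answers above):
seq -w 0 0.0001 1 | gsed 's/\.//g' | gsed ':a;N;$!ba;s/\n//g' | wc -c
# 50006 = 15 * 3333 + 11, so xaa..xan are 3333 bytes and xao gets 3344

# To fix the chunk size and let the number of files vary instead:
gsplit -b 557 /tmp/k     # every output file is 557 bytes except possibly the last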
