show the difference in the opcodes for different architectures - gcc

I have a script which (as intended) lets me see the difference in the opcodes generated for different architectures (I'm especially interested in x87 instructions on x86 vs x86_64).
#!/usr/bin/env bash
cat - <<__EOF > a.cpp
int main()
{
    double const x(1.0);
    asm volatile ("fldl (%0)" : : "a" (&x));
    return 0;
}
__EOF
EXEC_STR='g++ a.cpp -c -O0 -o /dev/null -m${BITNESS} -mfpmath=387 -Wa,-adhlns="${BITNESS}.lst"'
FILTER_STR='awk "/\/APP/, /\/NO_APP/" ${BITNESS}.lst | cut -f 2- > ${BITNESS}_.lst'
BITNESS=32 bash -c "${EXEC_STR} && ${FILTER_STR}"
BITNESS=64 bash -c "${EXEC_STR} && ${FILTER_STR}"
diff -w -B 32_.lst 64_.lst
Output:
3c3
< fldl (%eax)
---
> fldl (%rax)
But cut -f 2- cuts out the column with the opcodes too (which is undesirable), in addition to cutting out the first column.
Are there other ways to get the desired result? And how should the listing text be filtered?
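One alternative (a sketch, untested here): skip the assembler listing and diff the disassembly instead. objdump separates the address, the raw opcode bytes, and the mnemonic with tabs, so cut -f2- keeps the opcodes while dropping only the address column. Note this diffs all of main rather than just the asm block, so expect a few extra hunks:
for BITNESS in 32 64; do
    g++ a.cpp -c -O0 -m${BITNESS} -mfpmath=387 -o ${BITNESS}.o
    # keep opcode bytes + mnemonic, drop the address field
    objdump -d ${BITNESS}.o | sed -n '/<main>:/,$p' | cut -f2- > ${BITNESS}_.lst
done
diff -w -B 32_.lst 64_.lst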

Related

Which efficient & portable shell statement on GNU/Linux can zero-pad piped bytes to word boundary?

I need to pad NUL bytes at the end of a byte stream exceeding available storage & memory, so output length is divisible by N. Context of the function I am implementing:
#!/bin/sh
generate_arbitrary_length | paddingN | work_with_padded
Working code for N=8192:
padding8192(){ dd status=none bs=8192 conv=sync ; }
But reducing the copy block size is orders of magnitude slower for small N; this did not finish:
padding4(){ dd status=none bs=4 conv=sync ; }
I can express the counting & padding using wc and dd, after duplicating the input stream:
padding4(){ { { tee /dev/fd/3 >&2 ; } 3>&1 | wc -c | { read -r isize ; pad=$(( 4 - isize % 4)) ; [ 0 -lt $pad ] && dd status=none if=/dev/zero bs=$pad count=1 >&2 ; } } 2>&1 ; }
Much faster already. But very difficult to read - who could even tell why padding ends up at EOF?
Any better approach?
Though I only need to keep enough state to store the byte count modulo the word size, I cannot think of a simple yet performant implementation using shell builtins. Dependencies should remain minimal: GNU coreutils/cpio/tar, and no compiler, perl, or features that differ between busybox/dash/bash. I have not come up with an awk solution either, as I failed to make it perform well (GB/s) on binary input that is not evenly NL/NUL-separated into lines.
Since you mention there's a compiler available, here's a tiny, portable C program. It doesn't get any faster or more memory-economical than this. It's even readable for most people in the programming community; if not, you can always sprinkle in /* Comments! */. :-)
#!/bin/sh
#
# pad.sh - pad input, reading in large blocks from stdin, writing stdout.
# padding $1:padchar $2:alignment $3:blocksize
padding () {
    aout="./a$$.out"
    cc -x c -o "$aout" - <<EOF
#include <stdio.h>

int main (void) {
    size_t align = $2, nwritten = 0, nread;
    char buffer[$3];

    while ((nread = fread (buffer, 1, sizeof buffer, stdin)) > 0)
        nwritten += fwrite (buffer, 1, nread, stdout);

    if ((nwritten % align) != 0)
        for (align -= nwritten % align; align != 0; --align)
            putchar ($1);

    return 0;
}
EOF
    "$aout" && rm "$aout"
}
printf '%s' 123456789 | padding 0 4 16384 | od -c
printf '%s' abcdefghi | padding "'\n'" 16 BUFSIZ | od -c
printf '%s' PAGE_SIZE | padding 65 32 "$(getconf PAGE_SIZE)" | od -c
In action:
$ ./pad.sh
0000000 1 2 3 4 5 6 7 8 9 \0 \0 \0
0000014
0000000 a b c d e f g h i \n \n \n \n \n \n \n
0000020
0000000 P A G E _ S I Z E A A A A A A A
0000020 A A A A A A A A A A A A A A A A
0000040
If you are concerned about the non-POSIX compiler option -x c, you can easily write the C program to pad.c and compile it from there. Advanced error handling for fwrite, fread and putchar is left to the reader.
Note how the here-document avoids main having to parse arguments. You can even pass macro names like BUFSIZ, if your stdio makes them available by default.
I just realized that compiling C like this is not much different from a nifty awk script -- awk also compiles an internal program and then executes it. What's better than compiling to the machine's CPU and running the executable?
The POSIX thing to do would be to use a temporary file.
padding() (
    tmpf=$(mktemp) &&
    trap 'rm "$tmpf"' EXIT &&
    tee "$tmpf" &&
    isize=$(wc -c <"$tmpf") &&
    pad=$(( ($1 - isize % $1) % $1 )) &&  # outer % keeps pad at 0 when already aligned
    if [ "$pad" -ne 0 ]; then
        dd status=none if=/dev/zero bs="$pad" count=1
    fi
)
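A quick check of the function (od -c only makes the NUL padding visible; compare the first example output above):
$ printf '%s' 123456789 | padding 4 | od -c
0000000 1 2 3 4 5 6 7 8 9 \0 \0 \0
0000014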

Remove last argument in shell script (POSIX)

I am currently working on a language that aims to compile to POSIX shell languages and I want to introduce a pop feature. Just like how you can use "shift" to remove the first argument passed to a function:
f() {
    shift
    printf '%s' "$*"
}
f 1 2 3 #=> 2 3
I want some code that when introduced below can remove the last argument.
g() {
    # pop
    printf '%s' "$*"
}
g 1 2 3 #=> 1 2
I am aware of the array method as detailed in (Remove last argument from argument list of shell script (bash)), but I want something portable that will work in at least the following shells: ash, dash, ksh (Unix), bash, and zsh. I also want something reasonably speedy; something that opens external processes/subshells would be too heavy for small argument counts, though if you have a creative solution I wouldn't mind seeing it regardless (and it could still be used as a fallback for large argument counts). Something as fast as those array methods would be ideal.
This is my current answer:
pop() {
    local n=$(($1 - ${2:-1}))
    if [ -n "$ZSH_VERSION" -o -n "$BASH_VERSION" ]; then
        POP_EXPR='set -- "${@:1:'$n'}"'
    elif [ $n -ge 500 ]; then
        POP_EXPR="set -- $(seq -s " " 1 $n | sed 's/[0-9]\+/"${\0}"/g')"
    else
        local index=0
        local arguments=""
        while [ $index -lt $n ]; do
            index=$((index+1))
            arguments="$arguments \"\${$index}\""
        done
        POP_EXPR="set -- $arguments"
    fi
}
Note that local is not POSIX, but since all major sh shells support it (and specifically the ones I asked for in my question) and not having it can cause serious bugs, I decided to include it in the leading function above. But here's a fully POSIX-compliant version with obfuscated variable names to reduce the chance of collisions:
pop() {
    __pop_n=$(($1 - ${2:-1}))
    if [ -n "$ZSH_VERSION" -o -n "$BASH_VERSION" ]; then
        POP_EXPR='set -- "${@:1:'$__pop_n'}"'
    elif [ $__pop_n -ge 500 ]; then
        POP_EXPR="set -- $(seq -s " " 1 $__pop_n | sed 's/[0-9]\+/"${\0}"/g')"
    else
        __pop_index=0
        __pop_arguments=""
        while [ $__pop_index -lt $__pop_n ]; do
            __pop_index=$((__pop_index+1))
            __pop_arguments="$__pop_arguments \"\${$__pop_index}\""
        done
        POP_EXPR="set -- $__pop_arguments"
    fi
}
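To make the mechanism concrete, here is a small trace (run under dash, so the portable branch is taken; bash and zsh would get the subarray form instead):
$ pop 3
$ printf '%s\n' "$POP_EXPR"
set --  "${1}" "${2}"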
Usage
pop1() {
    pop $#
    eval "$POP_EXPR"
    echo "$@"
}

pop2() {
    pop $# 2
    eval "$POP_EXPR"
    echo "$@"
}
pop1 a b c #=> a b
pop1 $(seq 1 1000) #=> 1 .. 999
pop2 $(seq 1 1000) #=> 1 .. 998
pop_next
Once you've created the POP_EXPR variable with pop, you can use the following
function to change it to omit further arguments:
pop_next() {
    if [ -n "$BASH_VERSION" -o -n "$ZSH_VERSION" ]; then
        local np="${POP_EXPR##*:}"
        np="${np%\}*}"
        POP_EXPR="${POP_EXPR%:*}:$((np == 0 ? 0 : np - 1))}\""
        return
    fi
    POP_EXPR="${POP_EXPR% \"*}"
}
pop_next is a much simpler operation than pop in POSIX shells (though it's slightly more complex than pop on zsh and bash).
It's used like this:
main() {
pop $#
pop_next
eval "$POP_EXPR"
}
main 1 2 3 #=> 1
POP_EXPR and variable scope
Note that if you're not going to use eval "$POP_EXPR" immediately after pop and pop_next, and you're not careful with scoping, some function call in between the operations could change the POP_EXPR variable and mess things up. To avoid this, simply put local POP_EXPR at the start of every function that uses pop, if it's available.
f() {
    local POP_EXPR
    pop $#
    g 1 2
    eval "$POP_EXPR"
    printf '%s' "f=$*"
}

g() {
    local POP_EXPR
    pop $#
    eval "$POP_EXPR"
    printf '%s, ' "g=$*"
}
f a b c #=> g=1, f=a b
popgen.sh
This particular function is good enough for my purposes, but I did create a
script to generate further optimized functions.
https://gist.github.com/fcard/e26c5a1f7c8b0674c17c7554fb0cd35c#file-popgen-sh
One of the ways to improve performance without using external tools here is to realize that several small string concatenations are slow, so doing them in batches makes the function considerably faster. Calling the script as popgen.sh -gN1,N2,N3 creates a pop function that handles the operations in batches of N1, N2, or N3 depending on the argument count. The script also contains other tricks, exemplified and explained below:
$ sh popgen \
> -g 10,100 \ # concatenate strings in batches\
> -w \ # overwrite current file\
> -x9 \ # hardcode the result of the first 9 argument counts\
> -t1000 \ # starting at argument count 1000, use external tools\
> -p posix \ # prefix to add to the function name (with a underscore)\
> -s '' \ # suffix to add to the function name (with a underscore)\
> -c \ # use the command popsh instead of seq/sed as the external tool\
> -# \ # on zsh and bash, use the subarray method (checks on runtime)\
> -+ \ # use bash/zsh extensions (removes runtime check from -#)\
> -nl \ # don't use 'local'\
> -f \ # use 'function' syntax\
> -o pop.sh # output file
An equivalent to the above function can be generated with popgen.sh -t500 -g1 -#.
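As a rough illustration of the -g batching trick (a hypothetical sketch, not code from the gist): instead of appending one quoted argument reference per iteration, the generated loop appends several at a time, dividing the number of slow string concatenations by the batch size:
# Sketch: batch size 4; the second loop mops up the remainder.
while [ $((index + 4)) -le $n ]; do
    arguments="$arguments \"\${$((index+1))}\" \"\${$((index+2))}\" \"\${$((index+3))}\" \"\${$((index+4))}\""
    index=$((index + 4))
done
while [ $index -lt $n ]; do
    index=$((index + 1))
    arguments="$arguments \"\${$index}\""
done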
In the gist containing popgen.sh you will find a popsh.c file that can be compiled and used as a specialized, faster alternative to the default external shell tools; it will be used by any function generated with popgen.sh -c ... if it's accessible as popsh by the shell. Alternatively, you can create any function or tool named popsh and use it in its place.
Benchmark
Benchmark functions:
The script I used for benchmarking can be found on this gist:
https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-popbench-sh
The benchmark functions are found in these lines:
https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-popbench-sh-L233-L301
The script can be used as such:
$ sh popbench.sh \
> -s dash \ # shell used by the benchmark, can be dash/bash/ash/zsh/ksh.\
> -f posix \ # function to be tested\
> -i 10000 \ # number of times that the function will be called per test\
> -a '\0' \ # replacement pattern to model arguments by index (uses sed)\
> -o /dev/stdout \ # where to print the results to (concatenates, defaults to stdout)\
> -n 5,10,1000 # argument sizes to test
It will output timings in the style of time -p, with real, user and sys values, as well as an int ("internal") value that is calculated inside the benchmark process using date.
Times
The following are the int results of calls to
$ sh popbench.sh -s $shell -f $function -i 10000 -n 1,5,10,100,1000,10000
posix refers to the second and third clauses, subarray refers to the first,
while final refers to the whole.
value count 1 5 10 100 1000 10000
---------------------------------------------------------------------------------------
dash/final 0m0.109s 0m0.183s 0m0.275s 0m2.270s 0m16.122s 1m10.239s
ash/final 0m0.104s 0m0.175s 0m0.273s 0m2.337s 0m15.428s 1m11.673s
ksh/final 0m0.409s 0m0.557s 0m0.737s 0m3.558s 0m19.200s 1m40.264s
bash/final 0m0.343s 0m0.414s 0m0.470s 0m1.719s 0m17.508s 3m12.496s
---------------------------------------------------------------------------------------
bash/subarray 0m0.135s 0m0.179s 0m0.224s 0m1.357s 0m18.911s 3m18.007s
dash/posix 0m0.171s 0m0.290s 0m0.447s 0m3.610s 0m17.376s 1m8.852s
ash/posix 0m0.109s 0m0.192s 0m0.285s 0m2.457s 0m14.942s 1m10.062s
ksh/posix 0m0.416s 0m0.581s 0m0.768s 0m4.677s 0m18.790s 1m40.407s
bash/posix 0m0.409s 0m0.739s 0m1.145s 0m10.048s 0m58.449s 40m33.024s
On zsh
For large argument counts, setting the positional parameters with eval is very slow on zsh no matter the method, save for eval 'set -- "${@:1:$# - 1}"'. Even as simple a modification as changing it to eval "set -- ${@:1:$# - 1}" (ignoring that it doesn't work for arguments with spaces) makes it two orders of magnitude slower.
value count 1 5 10 100 1000 10000
---------------------------------------------------------------------------------------
zsh/subarray 0m0.203s 0m0.227s 0m0.233s 0m0.461s 0m3.643s 0m38.396s
zsh/final 0m0.399s 0m0.416s 0m0.441s 0m0.722s 0m4.205s 0m37.217s
zsh/posix 0m0.718s 0m0.913s 0m1.182s 0m6.200s 0m46.516s 42m27.224s
zsh/eval-zsh 0m0.419s 0m0.353s 0m0.375s 0m0.853s 0m5.771s 32m59.576s
More benchmarks
For more benchmarks, including only using external tools, the c popsh tool or the naive algorithm, see this file:
https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-benchmarks-md
It's generated like this:
$ git clone https://gist.github.com/f4aec7e567da2a8e97962d5d3f025ad4.git popbench
$ cd popbench
$ sh popgen_run.sh
$ sh popbench_run.sh --fast # or without --fast if you have a day to spare
$ sh poptable.sh -g >benchmarks.md
Conclusion
This has been the result of a week-long research into the subject, and I thought I'd share it. Hopefully it's not too long; I tried to trim it down to the main information, with links to the gists. This was initially written as an answer to (Remove last argument from argument list of shell script (bash)), but I felt the focus on POSIX made it off topic there.
All the code in the gists linked here is licensed under the MIT license.
alias pop='set -- $(eval printf '\''%s\\n'\'' $(seq $(expr $# - 1) | sed '\''s/^/\$/;H;$!d;x;s/\n/ /g'\'') )'
EDIT:
This is a POSIX shell solution that uses aliases instead of functions; when invoked inside a function, it gives the desired effect (it resets the function's arguments to the same arguments minus the last; being an alias, and using eval, it can change the values of the enclosing function). Note that the command substitution is left unquoted, so arguments containing whitespace or glob characters will be mangled:
func () {
    pop
    pop
    echo "$@"
}
func a b c d e # prints a b c
pop () {
    i=0
    while [ $((i+=1)) -lt $# ]; do
        set -- "$@" "$1"
        shift
    done               # 1 2 3 -> 3 1 2
    printf '%s' "$1"   # last argument
    shift              # $# is now without last argument
}
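A quick demonstration of the function form (unlike the alias, a function's shift only affects its own copy of the positional parameters, so the caller's argument list is untouched):
set -- a b c
pop "$@"    # prints: c
echo " $#"  # the caller still has 3 arguments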

How to calculate crc32 checksum from a string on linux bash

I used crc32 to calculate checksums from strings a long time ago, but I cannot remember how I did it.
echo -n "LongString" | crc32 # no output
I found a solution [1] to calculate them with Python, but is there not a direct way to calculate that from a string?
# signed
python -c 'import binascii; print binascii.crc32("LongString")'
python -c 'import zlib; print zlib.crc32("LongString")'
# unsigned
python -c 'import binascii; print binascii.crc32("LongString") % (1<<32)'
python -c 'import zlib; print zlib.crc32("LongString") % (1<<32)'
[1] How to calculate CRC32 with Python to match online results?
I came up against this problem myself and I didn't want to go to the "hassle" of installing crc32. I came up with this, and although it's a little nasty it should work on most platforms, or most modern linux anyway ...
echo -n "LongString" | gzip -1 -c | tail -c8 | hexdump -n4 -e '"%u"'
Just to provide some technical details: the last 8 bytes of gzip output are its trailer, which starts with the CRC32 of the uncompressed data; the -c option causes gzip to write to standard output, and tail -c8 keeps only those last 8 bytes. (-1 as suggested by @MarkAdler, so we don't waste time actually doing the compression.)
hexdump was a little trickier and I had to futz about with it for a while before I came up with something satisfactory, but the format here seems to correctly parse the gzip crc32 as a single 32-bit number:
-n4 takes only the relevant first 4 bytes of the gzip footer.
'"%u"' is your standard fprintf format string that formats the bytes as a single unsigned 32-bit integer. Notice that there are double quotes nested within single quotes here.
If you want a hexadecimal checksum you can change the format string to '"%08x"' (or '"%08X"' for upper case hex) which will format the checksum as 8 character (0 padded) hexadecimal.
Like I say, not the most elegant solution, and perhaps not an approach you'd want to use in a performance-sensitive scenario but an approach that might appeal given the near universality of the commands used.
The weak point here for cross-platform usability is probably the hexdump configuration, since I have seen variations on it from platform to platform and it's a bit fiddly. I'd suggest if you're using this you should try some test values and compare with the results of an online tool.
EDIT: As suggested by @PedroGimeno in the comments, you can pipe the output into od instead of hexdump for identical results without the fiddly options: ... | od -t x4 -N 4 -A n for hex, ... | od -t d4 -N 4 -A n for decimal.
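Putting the pieces together (123456789 is the standard CRC-32 check input, and cbf43926 its expected value, as the Python answers below also show; od interprets the 4 bytes in host byte order, so this assumes a little-endian machine):
$ echo -n 123456789 | gzip -1 -c | tail -c8 | od -t x4 -N 4 -A n
 cbf43926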
Or just use process substitution (with echo -n, to match the question and avoid including a trailing newline in the checksum):
crc32 <(echo -n "LongString")
Your question already has most of the answer.
echo -n 123456789 | python3 -c 'import sys,zlib;print(zlib.crc32(sys.stdin.buffer.read())%(1<<32))'
correctly gives 3421780262
I prefer hex:
echo -n 123456789 | python3 -c 'import sys,zlib;print("%08x"%(zlib.crc32(sys.stdin.buffer.read())%(1<<32)))'
cbf43926
Be aware that there are several CRC-32 algorithms:
http://reveng.sourceforge.net/crc-catalogue/all.htm#crc.cat-bits.32
On Ubuntu, at least, /usr/bin/crc32 is a short Perl script, and you can see quite clearly from its source that all it can do is open files. It has no facility to read from stdin -- it doesn't have special handling for - as a filename, or a -c parameter or anything like that.
So your easiest approach is to live with it, and make a temporary file.
tmpfile=$(mktemp)
echo -n "LongString" > "$tmpfile"
crc32 "$tmpfile"
rm -f "$tmpfile"
If you really don't want to write a file (e.g. it's more data than your filesystem can take -- unlikely if it's really a "long string", but for the sake of argument...) you could use a named pipe. To a simple non-random-access reader this is indistinguishable from a file:
fifo=$(mktemp -u)
mkfifo "$fifo"
echo -n "LongString" > "$fifo" &
crc32 "$fifo"
rm -f "$fifo"
Note the & to background the process that writes to the fifo: the write will block until the next command reads it.
To be more fastidious about temporary file creation, see: https://unix.stackexchange.com/questions/181937/how-create-a-temporary-file-in-shell-script
Alternatively, use what's in the script as an example from which to write your own Perl one-liner (the presence of crc32 on your system indicates that Perl and the necessary module are installed), or use the Python one-liner you've already found.
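For example, a Perl one-liner along those lines might look like this (a sketch; it assumes the Archive::Zip module that Ubuntu's crc32 script is built on):
printf '%s' LongString | perl -MArchive::Zip \
    -e 'undef $/; printf "%08x\n", Archive::Zip::computeCRC32(<STDIN>)'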
Here is a pure Bash implementation:
#!/usr/bin/env bash
declare -i -a CRC32_LOOKUP_TABLE

__generate_crc_lookup_table() {
    local -i -r LSB_CRC32_POLY=0xEDB88320 # The CRC32 polynomial, LSB-first order
    local -i index byte lsb
    for index in {0..255}; do
        ((byte = 255 - index))
        for _ in {0..7}; do # 8-bit lsb shift
            ((lsb = byte & 0x01, byte = ((byte >> 1) & 0x7FFFFFFF) ^ (lsb == 0 ? LSB_CRC32_POLY : 0)))
        done
        ((CRC32_LOOKUP_TABLE[index] = byte))
    done
}
__generate_crc_lookup_table
typeset -r CRC32_LOOKUP_TABLE

crc32_string() {
    [[ ${#} -eq 1 ]] || return
    local -i i byte crc=0xFFFFFFFF index
    for ((i = 0; i < ${#1}; i++)); do
        byte=$(printf '%d' "'${1:i:1}") # Get byte value of character at i
        ((index = (crc ^ byte) & 0xFF, crc = (CRC32_LOOKUP_TABLE[index] ^ (crc >> 8)) & 0xFFFFFFFF))
    done
    echo $((crc ^ 0xFFFFFFFF))
}

printf 'The CRC32 of: %s\nis: %08x\n' "${1}" "$(crc32_string "${1}")"
# crc32_string "The quick brown fox jumps over the lazy dog"
# yields 414fa339
Testing:
bash ./crc32.sh "The quick brown fox jumps over the lazy dog"
The CRC32 of: The quick brown fox jumps over the lazy dog
is: 414fa339
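The value can be cross-checked against the gzip trailer trick from the earlier answer (again assuming a little-endian host for the od step):
$ printf '%s' "The quick brown fox jumps over the lazy dog" \
    | gzip -1 -c | tail -c8 | od -t x4 -N 4 -A n
 414fa339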
I use cksum and convert to hex using the shell builtin printf:
$ echo -n "LongString" | cksum | cut -d\ -f1 | xargs echo printf '%0X\\n' | sh
5751BDB2
The cksum command first appeared in 4.4BSD UNIX and should be present in all modern systems. Be aware, though, that cksum implements a different CRC-32 variant (it processes bits in the opposite order and appends the input length), so its output will not match the zlib/gzip CRC32 values shown elsewhere on this page.
You can try to use rhash.
http://rhash.sourceforge.net/
https://github.com/rhash/RHash
http://manpages.ubuntu.com/manpages/bionic/man1/rhash.1.html
Testing:
## install 'rhash'...
$ sudo apt-get install rhash
## test CRC32...
$ echo -n 123456789 | rhash --simple -
cbf43926 (stdin)

how to determine object code size on Linux when "size" gives the wrong answer?

I want to know precisely how much object code is generated by GCC for each of a collection of compilation units, but I'm having an odd problem where the "size" command from binutils is not giving the correct result.
Let's take a C file containing only this function:
int foo (int a, int b)
{
    return a+b;
}
We can compile it and check the object code size using both "size" and "objdump":
$ gcc -O foo.c -c
$ size foo.o
text data bss dec hex filename
52 0 0 52 34 foo.o
$ objdump -d foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 8d 04 37 lea (%rdi,%rsi,1),%eax
3: c3 retq
From the objdump output, it is clear that the object code size is 4 bytes. However, size reports 52 bytes, which is incorrect.
From using the "-D" option to objdump, it looks like the exception handling code and maybe some other stuff is getting measured by "size" and added to the size of the code that I actually care about. Does anyone know a relatively straightforward way to get size to ignore these extras?
Do you have to stick with size? It has many issues similar to what you ran into, so I usually use this readelf snippet instead (note: strtonum is a GNU awk extension):
OBJ=foo.o
SEC=.text
readelf -SW "$OBJ" \
    | sed 's/^ *\[[0-9 ]*\] *//' \
    | awk '
        /NOBITS/    { next; }                  # skip SHT_NOBITS sections such as .bss
        /^'$SEC'\>/ { s += strtonum("0x" $5) } # $5 is the Size column
        END         { print s }'
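Saved as a small script (textsize.sh is a hypothetical name), it should print 4 for the foo.o from the question, matching the objdump disassembly rather than size's 52:
$ sh textsize.sh
4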

Building sse switches for GCC from /proc/cpuinfo

I've got a Makefile in which I'd like to parse the flags in /proc/cpuinfo and build up a list of available SSE instruction sets to pass to gcc (-msse, -msse2, etc.). This is the best I've come up with so far, which Make isn't happy with at all:
DUMM = $(foreach tag,$(SSE_TAGS),
ifneq ($(shell cat /proc/cpuinfo | grep $(tag) | wc -l),"")
OPT_FLAG += -m$(tag)
endif)
So I thought I'd see here if anyone had any ideas.
For anyone that comes after me, this does what I want:
SSE_TAGS = $(shell /bin/grep -m 1 flags /proc/cpuinfo | /bin/grep -o \
    'sse\|sse2\|sse3\|ssse3\|sse4a\|sse4.1\|sse4.2\|sse5')
NUM_PROC = $(shell cat /proc/cpuinfo | grep processor | wc -l)

ifneq (${SSE_TAGS},)
  CCOPTS += -mfpmath=sse
  CCOPTS += $(foreach tag,$(SSE_TAGS),-m$(tag))
endif
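To sanity-check the detection outside make, the same pipeline can be run by hand; grep -o prints one matching tag per line, and $(shell ...) joins those lines with spaces before the foreach turns them into -m flags:
grep -m1 flags /proc/cpuinfo \
    | grep -o 'sse\|sse2\|sse3\|ssse3\|sse4a\|sse4.1\|sse4.2\|sse5'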
