I am confused about a 0a byte (the ASCII newline, NL) that shows up in the output of some bash commands. Consider the following:
$ echo | sha1sum $1 | awk '{print $1;}' | xxd -r -ps > test.bin
$ echo | sha1sum $1 | awk '{print $1;}' > test.hex
$ xxd test.bin
00000000: adc8 3b19 e793 491b 1c6e a0fd 8b46 cd9f ..;...I..n...F..
00000010: 32e5 92fc 2...
$ xxd test.hex
00000000: 6164 6338 3362 3139 6537 3933 3439 3162 adc83b19e793491b
00000010: 3163 3665 6130 6664 3862 3436 6364 3966 1c6ea0fd8b46cd9f
00000020: 3332 6535 3932 6663 0a 32e592fc.
What causes the 0a byte to be present in test.hex but not in test.bin?
Note 1: this question came up after I used the solution from:
Dump a ```sha``` checksum output to disk in binary format instead of plaintext hex in bash
Note 2: I know how to suppress the 0a byte, so that is not the question. I am just curious about why it is present in one case but not the other:
$ echo | sha1sum $1 | awk '{print $1;}' | head -c-1 > test_2.hex
$ xxd test_2.hex
00000000: 6164 6338 3362 3139 6537 3933 3439 3162 adc83b19e793491b
00000010: 3163 3665 6130 6664 3862 3436 6364 3966 1c6ea0fd8b46cd9f
00000020: 3332 6535 3932 6663 32e592fc
The 0a that you are seeing comes from awk. By default awk's output record separator (ORS) is \n, and you can remove it by setting ORS to an empty string (e.g. with BEGIN {ORS=""}).
You lose it when you pipe through xxd -r -ps, because in reverse (-r) plain-hex mode xxd skips whitespace. From the man page: "Additional Whitespace and line-breaks are allowed anywhere."
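Both effects can be reproduced without sha1sum. The sketch below assumes only a POSIX shell, awk, od and tr. It first shows awk appending ORS, then uses `hex_to_bin`, a hypothetical and much slower stand-in for `xxd -r -p` (not a real tool), which, like xxd's plain-hex mode, discards whitespace before decoding, so the newline never reaches the binary output.

```shell
#!/bin/sh
# 1) awk ends every print with ORS, which defaults to "\n" (the 0a byte).
printf 'adc8\n' | awk '{print $1}' | od -An -tx1                  # 61 64 63 38 0a
printf 'adc8\n' | awk 'BEGIN {ORS=""} {print $1}' | od -An -tx1   # 61 64 63 38

# 2) Hypothetical stand-in for "xxd -r -p": whitespace (including the
#    newline) is thrown away before the hex digits are decoded.
hex_to_bin() {
    hex=$(tr -d ' \t\r\n')                # read stdin, drop all whitespace
    while [ "${#hex}" -ge 2 ]; do         # an odd trailing digit is ignored
        pair=${hex%"${hex#??}"}           # first two hex digits
        hex=${hex#??}                     # remove them from the buffer
        # printf '%03o' turns e.g. 0xad into the octal escape 255,
        # and the outer printf emits that single byte
        printf "\\$(printf '%03o' "0x$pair")"
    done
}

printf 'adc8\n' | hex_to_bin | od -An -tx1                         # ad c8
```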
Related
I'm trying to do a hex search for a pattern.
I search the file for the pattern with:
xxd -g 2 -c 32 -u file | grep "0045 5804 0001 0000"
This returns the lines that contain that pattern:
FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001
But I want it to return the 4 hex digits before that pattern, which is 08B9 in this case. How can I do that?
With GNU grep and a Perl-compatible regular expression:
xxd -g 2 -c 32 -u file | grep -Po '....(?= 0045 5804 0001 0000)'
Output:
08B9
Don't use grep, use sed (any sed will do):
$ xxd -g 2 -c 32 -u file | sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
08B9
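To try the sed command without the original file, the sample line from the question can be fed to it directly. This is a minimal sketch: `line` merely stands in for the xxd output, and `digits_before` is a hypothetical wrapper name. The greedy `.*` backtracks until `\(....\)` captures the four characters right before ` 0045 5804 0001 0000`, and `-n` together with the `p` flag prints only lines where the substitution happened.

```shell
#!/bin/sh
# Sample line copied from the question; in real use, pipe in the output of
#   xxd -g 2 -c 32 -u file
line='FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001'

# Hypothetical wrapper around the sed command above
digits_before() {
    sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
}

printf '%s\n' "$line" | digits_before    # 08B9
```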
A not very elegant but intuitively simple approach might be to pipe your grep result into sed and use a simple regex to replace everything from your search term to the end of the line with an empty string. This leaves the block you want as the last space-separated 'word' of the result, which can be retrieved by piping into awk and printing the last field (the steps are shown on separate lines for readability; since each line ends with a pipe, the shell reads them as one command):
xxd -g 2 -c 32 -u file |
grep "0045 5804 0001 0000" |
sed 's/0045 5804 0001 0000.*//' |
awk '{print $NF}'
nawk 'sub(".* ",_, $!--NF)^_' OFS= FS=' 0045 5804 0001 0000.*$'
mawk '$!NF = $--NF' FS=' 0045 5804 0001 0000.*$| '
gawk ' $_ = $--NF' FS=' 0045 5804 0001 0000.*$| '
08B9
I would use GNU AWK for this task in the following way. Let file.txt content be
FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001
then
awk 'match($0, /[[:xdigit:]]{4} 0045 5804 0001 0000/){print substr($0,RSTART,4)}' file.txt
gives output
08B9
Explanation: I use two String Functions: match to check whether the current line ($0) contains the pattern and to set the RSTART variable, then substr to get the first 4 characters of the match. [[:xdigit:]] denotes a base-16 digit and {4} the number of repeats.
(tested in gawk 4.2.1)
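Some older awk implementations (for example older mawk releases) do not support interval expressions like {4} or bracket classes like [[:xdigit:]]. A sketch of the same match/substr idea with both spelled out as explicit character ranges, using the sample line from file.txt fed through printf; `hex_before` is a hypothetical helper name:

```shell
#!/bin/sh
line='FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001'

# Same logic as the gawk version, with [[:xdigit:]]{4} written out as four ranges
hex_before() {
    awk 'match($0, /[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f] 0045 5804 0001 0000/) {
        print substr($0, RSTART, 4)   # RSTART is where the 4 hex digits start
    }'
}

printf '%s\n' "$line" | hex_before    # 08B9
```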
My xxd prints an 8-digit address, a :, sixteen 4-digit hex codes (separated by spaces), and finally the corresponding raw data from the file, e.g.:
$ xxd -g 2 -c 32 -u file
         1         2         3         4         5         6         7         8         9        10        11        12        13
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
00000020: 6466 6C6B 6173 6A64 660A 4220 302E 3836 3820 6173 646C 666B 6A61 7364 666C 6B61 dflkasjdf.B 0.868 asdlfkjasdflka
00000040: 322E 3135 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A 6466 2.152 asdlfkjasdflkajsdf;lkasjdf
00000060: 6C6B 6173 6A64 660A lkasjdf.
NOTE: the first two lines (a ruler) were added to show column numbering
OP appears to be interested solely in the 4-digit hex codes which means we're interested in the data in columns 11-89 (inclusive).
From here we need to address four different scenarios:
- the match occurs at the very beginning of the xxd output, in which case there is no preceding 4-digit hex code
- the match occurs at the beginning of a line, so we're interested in the 4-digit hex code at the end of the previous line
- the match occurs in the middle of a line, in which case we're interested in the 4-digit hex code just prior to the match
- the match spans two lines, in which case we're interested in the 4-digit hex code just prior to the match on the 1st line
A contrived set of xxd output to demonstrate all four scenarios:
$ cat xxd.out
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
#         ^^^^^^^^^^^^^^^^^^^
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804 dflkasjdf.B 0.868 asdlfkjasdflka
#         ^^^^^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^                     ^^^^^^^^^
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C jsdf;lkasjdflkasjdf.C 3.467 asdl
#         ^^^^^^^^^
NOTE: comments added to highlight our matches
One idea using awk:
x='0045 5804 0001 0000'
cat xxd.out |                   # simulate feeding xxd output to awk
awk -v x="${x}" '
function parse_string(final) {
    # final is empty for the mid-stream calls and 1 for the call from END;
    # on the final call a match in the last 2*lenx characters is still reported
    while ( final || length(string) > (2 * lenx) ) {
        pos= index(string,x)
        if (pos) {
            if (pos==1) output= "NA (at front of file)"
            else        output= substr(string,pos - 5,4)    # 4-digit code before the match
            cnt++
            printf "Match #%s: %s\n", cnt, output
            string= substr(string,pos + lenx)                # resume after this match
        }
        else {
            # keep a tail long enough to catch a match that continues on the next line
            if (!final) string= substr(string,length(string) - (2 * lenx))
            break
        }
    }
}
BEGIN { lenx = length(x) }
      { string=string substr($0,11,80)     # strip off address & raw data, append 4-digit hex codes into one long string
        if ( length(string) > (1000 * lenx) )
           parse_string()
      }
END   { parse_string(1) }
'
NOTE: the parse_string() function and the assorted if (length(string) > ...) tests allow us to limit memory usage to 1000x the length of our search pattern (in this example => 1000 x 19 = 19,000); granted, 'overkill' in the case of small files but it allows us to process large(r) files without having to worry about hogging memory (or in a worst case scenario: an OOM - Out Of Memory - error)
This generates:
Match #1: NA (at front of file)
Match #2: 736A
Match #3: 4220
Match #4: 7364
Just use a lookahead and print only the matched string:
$ xxd -g 2 -c 32 -u file | grep -Po "[0-9A-F]{4}(?= 0045 5804 0001 0000)"
$ xxd -g 2 -c 32 -u file | perl -lne 'print for /([0-9A-F]{4}) (?=0045 5804 0001 0000)/'
But searching the hex representation like that is just silly because:
- it won't work when the pattern 0045 5804 0001 0000 is at the beginning of a line (the wanted 4 digits are then at the end of the previous line), or when the pattern is split across two lines
- it'll be much slower than searching the binary directly
So just search the binary directly with grep, then decode the match like this:
grep -Pao "..\x00\x45\x58\x04\x00\x01\x00\x00" file | xxd -p -u -l 2
It matches any 2 bytes followed by your byte pattern, then prints those first 2 bytes as hex. Since xxd -l 2 stops after 2 bytes, only the first match is decoded.
grep -ao $'..\x12\x34<remaining bytes of hex pattern>' file | xxd -p -u -l 2 also works, but not in every case, because a null byte cannot be passed inside a command-line argument.
If the pattern contains LF \n then you'll also need the -z option
grep -Pzao "..<hex pattern>" file | xxd -p -u -l 2
grep -zao $'..<hex pattern>' file | xxd -p -u -l 2
See also
Using grep to search for hex strings in a file
How can I grep a hex value in a string in a binary file?
I'm trying to find 7zip version 3 file headers in a file. According to the documentation they should look like this:
00: 6 bytes: 37 7A BC AF 27 1C - Signature
06: 2 bytes: 00 04 - Format version
So I constructed this grep command which should match them:
grep --only-matching --byte-offset --binary --text $'7z\xBC\xAF\x27\x1C\x00\x03'
Yet it also matches the string ending in 0000:
% xxd -p -r <<< "aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003" | grep --only-matching --byte-offset --binary --text $'7z\xBC\xAF\x27\x1C\x00\x03'
2:7z'
13:7z'
The output I expect to have is just 13:7z'
It's not possible to pass a zero byte as part of an argument. C strings end with a zero byte, so grep (e.g. when running strlen(argv[...])) will not "see" anything after the first zero byte.
If there are no newlines in the regex, you can read it from a file with --file= (-f):
xxd -p -r <<< "aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003" |
LC_ALL=C grep --only-matching --byte-offset --binary --text -f <(
echo -n 7z;
echo BCAF271C0003 | xxd -r -p
)
see https://www.gnu.org/software/grep/manual/grep.html#Matching-Non_002dASCII
Alternatively, use a Perl-compatible regex (-P), which interprets the \xHH escapes itself:
xxd -p -r <<< "aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003" |
LC_ALL=C grep --only-matching --byte-offset --binary --text -P '7z\xBC\xAF\x27\x1C\x00\x03'
When dealing with binary, remember to disable UTF-8 sequences handling with locale setting LC_ALL=C.
Note: <<<"" and $'string' are not available in every shell (they are not POSIX); they are available in bash.
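In a plain POSIX shell, the same -f approach can be sketched with printf octal escapes, which can write a zero byte into a file even though one cannot appear in an argument. This assumes GNU grep (for --byte-offset, --only-matching and NUL bytes in a pattern file); the temporary file names are only for illustration. The bytes BC, AF, 27, 1C, 00, 03 are written as \274, \257, \047, \034, \000, \003.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Test data from the question: aaaa 377a bcaf 271c 0000 bbbb 00 377a bcaf 271c 0003
printf '\252\2527z\274\257\047\034\000\000\273\273\0007z\274\257\047\034\000\003' > "$dir/data"

# Pattern: "7z" BC AF 27 1C 00 03, with no trailing newline
printf '7z\274\257\047\034\000\003' > "$dir/pattern"

# LC_ALL=C makes grep treat every byte as one character; print only the offset
LC_ALL=C grep --only-matching --byte-offset --text -f "$dir/pattern" "$dir/data" | cut -d: -f1
```

Only the second header (offset 13) should be reported, because the pattern file still contains the 00 03 bytes.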
I came across a shell script that is fast at eliminating duplicates without sorting the lines. Later I also found that a similar method is suggested for awk and perl.
However, I noticed that these methods work a bit differently than the usual sort -u and sort | uniq.
$ dd if=/dev/urandom of=sortable bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.10448 s, 1.0 GB/s
$ wc -l sortable
409650 sortable
$ cat sortable | sort -u | wc -l
404983
$ cat sortable | sort | uniq | wc -l
404983
$ cat sortable | ./sortuniq | wc -l
406650
$ cat sortable | awk '!x[$0]++' | wc -l
406651
$ cat sortable | perl -ne 'print if ! $x{$_}++' | wc -l
406650
Why the differences? I set up a small test file with empty lines, lines of 0 and lines padded with whitespace, and found that all the methods give the same result on it.
Using cmp, I found that the awk line count is greater simply because awk adds a newline at the end (the random data does not end with one). But I was not able to get my head around the other cases. I sorted the array-uniqued file and found that in my case the first difference was in line 12. I printed some lines from both files (awk '!x[$0]++;' file | sort and sort -u file), and it looks as if lines 12, 13 and 14 of the awk output are inserted between lines 11 and 12 of the sort output:
$ sed -n '11p' sorted | hexdump
0000000 9300 000a
0000003
$ sed -n '12p' sorted | hexdump
0000000 b700 ef00 d062 d0b4 6478 1de1 a9e8 c6fd
0000010 4e05 e385 e678 7cbb 5f46 ce85 3d26 1875
0000020 56e4 baf1 b34a 0006 1dda 06cc efd6 9627
0000030 edbe 3bf7 a2c7 8b3f 1fe0 790e 9b1b 237e
0000040 42ac 3f5b 827b 535d 2e59 4001 3ce1 bd7d
0000050 7712 21c9 e025 751d c591 84ce b809 4038
0000060 372a 07d4 220f 59cc 3e2f 7ac3 88bb 23b1
0000070 fe37 1a36 31f8 fde6 7368 bd89 631b f3a9
0000080 8467 b413 9a28 000a
0000087
$ sed -n '13p' sorted | hexdump
0000000 f800 cb00 f473 583d e2c5 2a8c 7c81 cbcd
0000010 3af1 9cf7 4992 2aab 90ed b018 5f4f b03b
0000020 40f1 8731 17fa d74a ba7e db12 6f8d 5a37
0000030 dd97 837e 4eb2 05d4 7d28 722e 8e49 7ffa
0000040 176d c54b a0a0 a63a 26a2 db5e 4ea8 5f44
0000050 33fe 26a7 40bb 98b0 6660 62bd b56a 949e
0000060 eaa7 9dd1 9427 5fab 7840 f509 4fbf 06ea
0000070 d389 15c8 fbf0 3ea6 4a53 909f 1c75 2acb
0000080 7074 d41e 40f2 14b7 b8aa 04e2 00bf 7b6e
0000090 ff3f 4822 c3e6 b3e9 1708 6a93 55fd a5f6
00000a0 ad3b 9b7d 7c2e faa1 4d25 2f32 c434 4a8c
00000b0 a42e 6d8c 138f 030b accd 086b baa2 6f92
00000c0 6256 e959 b19a c371 f7bf 7c63 773c 9e4d
00000d0 bb2b f555 bc05 9454 29a6 f221 e088 c259
00000e0 9bed ab59 0591 2d30 9162 1dd1 91ea c928
00000f0 cb8f 60bc 6f25 62b2 a424 2f97 0058 0d3e
0000100 95f2 7cf4 d53b 6208 6cba c013 3505 9704
0000110 5a1f f63f 9bea 7d45 2dd6 8084 d078 d8b1
0000120 5fdc fb57 8cf8 6ae8 b791 23bd f2f5 70eb
0000130 9094 407a 228d 5818 a0fa d480 53f7 eb8e
0000140 f07b b288 e39b 60c7 a581 8481 97da 68d9
0000150 7240 2fb1 6ec6 fc57 78cd 4988 90a2 52d3
0000160 2fb6 3efd c140 d890 c2ff 2c0c ad02 47db
0000170 106e da82 dd0f 3f7f 49c1 2d2c dc0f 4a1e
0000180 01d3 95de 000a
0000185
$ sed -n '14p' sorted | hexdump
0000000 c400 0ac8
0000004
$ sed -n '11p' awksorted | hexdump
0000000 9300 000a
0000003
$ sed -n '12p' awksorted | hexdump
0000000 a100 000a
0000003
$ sed -n '13p' awksorted | hexdump
0000000 ff00 000a
0000003
$ sed -n '14p' awksorted | hexdump
0000000 d200 000a
0000003
$ sed -n '15p' awksorted | hexdump
0000000 b700 ef00 d062 d0b4 6478 1de1 a9e8 c6fd
0000010 4e05 e385 e678 7cbb 5f46 ce85 3d26 1875
0000020 56e4 baf1 b34a 0006 1dda 06cc efd6 9627
0000030 edbe 3bf7 a2c7 8b3f 1fe0 790e 9b1b 237e
0000040 42ac 3f5b 827b 535d 2e59 4001 3ce1 bd7d
0000050 7712 21c9 e025 751d c591 84ce b809 4038
0000060 372a 07d4 220f 59cc 3e2f 7ac3 88bb 23b1
0000070 fe37 1a36 31f8 fde6 7368 bd89 631b f3a9
0000080 8467 b413 9a28 000a
0000087
$ sed -n '16p' awksorted | hexdump
0000000 f800 cb00 f473 583d e2c5 2a8c 7c81 cbcd
0000010 3af1 9cf7 4992 2aab 90ed b018 5f4f b03b
0000020 40f1 8731 17fa d74a ba7e db12 6f8d 5a37
0000030 dd97 837e 4eb2 05d4 7d28 722e 8e49 7ffa
0000040 176d c54b a0a0 a63a 26a2 db5e 4ea8 5f44
0000050 33fe 26a7 40bb 98b0 6660 62bd b56a 949e
0000060 eaa7 9dd1 9427 5fab 7840 f509 4fbf 06ea
0000070 d389 15c8 fbf0 3ea6 4a53 909f 1c75 2acb
0000080 7074 d41e 40f2 14b7 b8aa 04e2 00bf 7b6e
0000090 ff3f 4822 c3e6 b3e9 1708 6a93 55fd a5f6
00000a0 ad3b 9b7d 7c2e faa1 4d25 2f32 c434 4a8c
00000b0 a42e 6d8c 138f 030b accd 086b baa2 6f92
00000c0 6256 e959 b19a c371 f7bf 7c63 773c 9e4d
00000d0 bb2b f555 bc05 9454 29a6 f221 e088 c259
00000e0 9bed ab59 0591 2d30 9162 1dd1 91ea c928
00000f0 cb8f 60bc 6f25 62b2 a424 2f97 0058 0d3e
0000100 95f2 7cf4 d53b 6208 6cba c013 3505 9704
0000110 5a1f f63f 9bea 7d45 2dd6 8084 d078 d8b1
0000120 5fdc fb57 8cf8 6ae8 b791 23bd f2f5 70eb
0000130 9094 407a 228d 5818 a0fa d480 53f7 eb8e
0000140 f07b b288 e39b 60c7 a581 8481 97da 68d9
0000150 7240 2fb1 6ec6 fc57 78cd 4988 90a2 52d3
0000160 2fb6 3efd c140 d890 c2ff 2c0c ad02 47db
0000170 106e da82 dd0f 3f7f 49c1 2d2c dc0f 4a1e
0000180 01d3 95de 000a
0000185
$ sed -n '17p' awksorted | hexdump
0000000 c400 0ac8
0000004
sortuniq
Here is the sortuniq code. I found it in this shell script collection (that's why I refer to it as a "shell script").
#!/usr/bin/php
<?php
$in = fopen('php://stdin',"r");
$d = array();
while(($z = fgets($in)) !== false)
	@$d[$z]++;	// count each distinct line; @ hides the "undefined index" notice
if($argc > 1 and ($argv[1] == 'c' or $argv[1] == '-c'))
foreach($d as $a => $b)
echo ("$b $a");
else
foreach($d as $a => $b)
echo ("$a");
Just be careful, this is dangerously fast. I was planning to ask about the speed itself before I found this issue during performance tests.
Coreutils uniq (and sort -u) does not actually check whether lines are byte-for-byte identical, but whether they compare as equal in the current locale's sort order.
We can check that: with a collation that's not "hard", which effectively disables collation and just compares the bytes, we get the same results as with awk:
$ ( LC_COLLATE=POSIX sort -u sortable | wc -l)
406651
$ ( LC_COLLATE=C sort -u sortable | wc -l)
406651
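A small, locale-independent check of the byte-wise case, assuming only POSIX sort and awk (the sample lines are made up): in the C locale, sort -u and the awk one-liner see exactly the same duplicates; the only difference is that awk keeps the input order.

```shell
#!/bin/sh
sample() { printf 'b\na\nb\nc\n'; }   # made-up input: "b" appears twice

# awk prints a line only the first time its exact bytes are seen
echo "awk distinct lines:     $(sample | awk '!seen[$0]++' | wc -l)"

# in the C locale, sort -u treats lines as equal only if their bytes are equal
echo "sort -u distinct lines: $(sample | LC_ALL=C sort -u | wc -l)"
```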
Example
Knowing the reason, it's simple to reproduce this behaviour with a valid text file: take Japanese, Arabic or any other characters and use a locale where these characters have no defined sort order.
$ echo 'あ' > utf8file
$ echo 'い' >> utf8file
$ file utf8file
utf8file: UTF-8 Unicode text
$ sort -u utf8file
あ
$ (LC_COLLATE=en_US.UTF-8 sort -u utf8file)
あ
$ (LC_COLLATE=C sort -u utf8file)
あ
い
$ (LC_COLLATE=POSIX sort -u utf8file)
あ
い
$ (LC_COLLATE=C.UTF-8 sort -u utf8file)
あ
い
The code
We can trace this starting with the different function in util. It uses xmemcoll if it determines that the locale is hard - not C or POSIX. xmemcoll seems to be a memcoll wrapper that adds error reporting. The memcoll source explains that bytewise-different strings will be rechecked for equality using strcoll:
/* strcoll is slow on many platforms, so check for the common case
where the arguments are bytewise equal. Otherwise, walk through
the buffers using strcoll on each substring. */
if (s1len == s2len && memcmp (s1, s2, s1len) == 0)
{
errno = 0;
diff = 0;
}
else
{
//
}
Interestingly, a \0 byte is not a problem for memcoll. While C strcoll stops at \0, the memcoll function works around this by advancing past each \0 and calling strcoll again on the rest of both strings - see lines 39 to 55.
I am seeing some inconsistencies between hexdump and xxd. When I run the following command:
echo -n "a42d9dfe8f93515d0d5f608a576044ce4c61e61e" \
| sed 's/\(..\)/\1\n/g' \
| awk '/^[a-fA-F0-9]{2}$/ { printf("%c",strtonum("0x" $0)); }' \
| xxd
it returns the following results:
00000000: c2a4 2dc2 9dc3 bec2 8fc2 9351 5d0d 5f60 ..-........Q]._`
00000010: c28a 5760 44c3 8e4c 61c3 a61e ..W`D..La...
Note the "c2" characters. This also happens when I run xxd -p.
When I run the same command except with hexdump -C:
echo -n "a42d9dfe8f93515d0d5f608a576044ce4c61e61e" \
| sed 's/\(..\)/\1\n/g' \
| awk '/^[a-fA-F0-9]{2}$/ { printf("%c",strtonum("0x" $0)); }' \
| hexdump -C
I get the same results (as far as including the "c2" character):
00000000 c2 a4 2d c2 9d c3 be c2 8f c2 93 51 5d 0d 5f 60 |..-........Q]._`|
00000010 c2 8a 57 60 44 c3 8e 4c 61 c3 a6 1e |..W`D..La...|
However, when I run hexdump with no arguments:
echo -n "a42d9dfe8f93515d0d5f608a576044ce4c61e61e" \
| sed 's/\(..\)/\1\n/g' \
| awk '/^[a-fA-F0-9]{2}$/ { printf("%c",strtonum("0x" $0)); }' \
| hexdump
I get the following [correct] results:
0000000 a4c2 c22d c39d c2be c28f 5193 0d5d 605f
0000010 8ac2 6057 c344 4c8e c361 1ea6
For the purpose of this script, I'd rather use xxd as opposed to hexdump. Thoughts?
The problem that you observe is due to UTF-8 encoding and little-endianness.
First, note that when you try to print any Unicode character in AWK, like 0xA4 (CURRENCY SIGN), it actually produces two bytes of output, like the two bytes 0xC2 0xA4 that you see in your output:
$ echo 1 | awk 'BEGIN { printf("%c", 0xA4) }' | hexdump -C
Output:
00000000 c2 a4 |..|
00000002
This holds for any character bigger than 0x7F and it is due to UTF-8 encoding, which is probably the one set in your locale. (Note: some AWK implementations will have different behavior for the above code.)
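One way to get exactly one byte per %c, sketched under the assumption that awk is gawk or mawk on a typical Linux system: run awk in the C locale, where every character is a single byte (gawk also offers a -b / --characters-as-bytes option for this). The `byte_of` helper name is made up for this example. Prefixing the awk in the question's pipeline with LC_ALL=C should have the same effect.

```shell
#!/bin/sh
# Hypothetical helper: write the single byte whose decimal value is $1 (0-255).
byte_of() {
    # "+ 0" forces a number; with a string argument %c would print its first character
    LC_ALL=C awk -v n="$1" 'BEGIN { printf("%c", n + 0) }'
}

byte_of 164 | od -An -tx1    # a4, not c2 a4
```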
Secondly, when you use hexdump without the -C argument, it displays each pair of bytes in swapped order due to the little-endianness of your machine. This is because each pair of bytes is then treated as a single 16-bit word, instead of treating each byte separately, as done by the xxd and hexdump -C commands. So the xxd output that you get is actually the correct byte-for-byte representation of the input.
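The byte-order effect can also be seen with od, as a minimal sketch: -t x1 prints one byte at a time, like xxd and hexdump -C, while -x groups the bytes into 16-bit words in the machine's native order, like plain hexdump. The second result depends on the host.

```shell
#!/bin/sh
printf 'AB' | od -An -tx1   # 41 42 on every machine
printf 'AB' | od -An -x     # 4241 on little-endian hosts, 4142 on big-endian ones
```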
Thirdly, if you want to produce the precise byte string that is encoded in the hexadecimal string that you are feeding to sed, you can use this Python solution:
echo -n "a42d9dfe8f93515d0d5f608a576044ce4c61e61e" | sed 's/\(..\)/0x\1,/g' | python3 -c "import sys;[open('tmp','wb').write(bytearray(eval('[' + line + ']'))) for line in sys.stdin]" && cat tmp | xxd
Output:
00000000: a42d 9dfe 8f93 515d 0d5f 608a 5760 44ce .-....Q]._`.W`D.
00000010: 4c61 e61e La..
Why not use xxd with -r and -p?
echo a42d9dfe8f93515d0d5f608a576044ce4c61e61e | xxd -r -p | xxd
output
0000000: a42d 9dfe 8f93 515d 0d5f 608a 5760 44ce .-....Q]._`.W`D.
0000010: 4c61 e61e La..
Hopefully this is fairly straightforward. When I run the following command (OS X 10.6):
$ pwd | pbcopy
the pasteboard contains a newline character at the end. I'd like to get rid of it.
pwd | tr -d '\n' | pbcopy
printf '%s' "$(pwd)" | pbcopy
or
echo -n $(pwd) | pbcopy
Note that these should really be quoted in case there are whitespace characters in the directory name. For example:
echo -n "$(pwd)" | pbcopy
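The printf and echo -n variants work because command substitution removes all trailing newlines from the command's output, and printf '%s' then writes the text back without appending one. A minimal check, with od standing in for pbcopy and a made-up path:

```shell
#!/bin/sh
# Command substitution removes every trailing newline...
path=$(printf '/Users/example\n\n')
# ...and printf '%s' writes the string back without adding one.
printf '%s' "$path" | od -An -c
```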
I wrote a utility called noeol to solve this problem. It pipes stdin to stdout, but leaves out the trailing newline if there is one. E.g.
pwd | noeol | pbcopy
…I aliased copy to noeol | pbcopy.
Check it out here: https://github.com/Sidnicious/noeol
I was having issues with the tr -d '\n' approach. On OS X I happened to have the coreutils package installed via brew install coreutils. This provides all the "normal" GNU utilities with a g prefixed to their usual names, so head would be ghead, for example.
Using this worked more safely IMO:
pwd | ghead -c -1 | pbcopy
You can use od to see what's happening with the output:
$ pwd | ghead -c -1 | /usr/bin/od -h
0000000 552f 6573 7372 732f 696d 676e 6c6f 6c65
0000020 696c
0000022
vs.
$ pwd | /usr/bin/od -h
0000000 552f 6573 7372 732f 696d 676e 6c6f 6c65
0000020 696c 000a
0000023
The difference?
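The 0a is the hex code for a newline. The 00 is not a nul byte from pwd: od -h prints 2-byte words, and since the output has an odd number of bytes (the final offset 0000023 is octal, i.e. 19), the last word is padded with a zero byte. The ghead -c -1 merely "chomps" the last byte from the output before handing it off to | pbcopy.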
$ man ascii | grep -E '\b00\b|\b0a\b'
00 nul 01 soh 02 stx 03 etx 04 eot 05 enq 06 ack 07 bel
08 bs 09 ht 0a nl 0b vt 0c np 0d cr 0e so 0f si
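A byte-by-byte dump makes the difference visible. As a minimal sketch with a made-up 3-byte string (ab plus a newline), od -tx1 shows no 00 at all, while od -h pads the odd final byte into a full word:

```shell
#!/bin/sh
printf 'ab\n' | od -An -tx1   # 61 62 0a: three bytes, no 00
printf 'ab\n' | od -An -h     # 6261 000a on little-endian hosts: the 00 is padding
```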
We can first delete the trailing newline if any, and then give it to pbcopy as follows:
your_command | perl -0 -pe 's/\n\Z//' | pbcopy
We can also create an alias of this:
alias pbc="perl -0 -pe 's/\n\Z//' | pbcopy"
Then the command would become:
pwd | pbc