JQ explode function is returning incorrect chars - utf-8

I am trying to decode base64-encoded binary content in jq using the explode function.
When I run the string through explode and then implode, I expect to get the same string back, but I do not. Try it here: https://jqplay.org/s/Rt8H1qv8VRP
Base64 encoded string: "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/"
JQ: '@base64d | explode | implode | @base64'
Output: "AQEAAAABAQAyGWRkZBXvv71bBwAAAAABAgMEBQYHCAkKUgbvv70/Ugbvv70/"
Debugging further,
@base64d | explode | .[14]
returns
65533
Running the following on Ubuntu, you can see that the byte at index 14 is 315 (octal) == 205 (decimal)
$ echo "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/" | base64 -d | od -bc
0000000 001 001 000 000 000 001 001 000 062 031 144 144 144 025 315 133
001 001 \0 \0 \0 001 001 \0 2 031 d d d 025 315 [
0000020 007 000 000 000 000 001 002 003 004 005 006 007 010 011 012 122
\a \0 \0 \0 \0 001 002 003 004 005 006 \a \b \t \n R
0000040 006 236 077 122 006 236 077
006 236 ? R 006 236 ?
0000047
Why is JQ returning this weird 65533 (0xFFFD) character? What am I missing?

First of all, the issue has nothing to do with explode or implode. Using just @base64d | @base64 produces the same result.
jq expects the string encoded with base64 to be text encoded with UTF-8.
If the decoded string is not UTF-8, the results are undefined.
Your input is not UTF-8.
U+FFFD REPLACEMENT CHARACTER is a character used to mark input errors.
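Since jq treats the decoded data as text, one way to inspect the raw bytes is to decode outside jq, as the question already does with base64 -d and od. The following is only a sketch under that assumption: decode_bytes is a hypothetical helper name, and it assumes a base64 that accepts -d (GNU coreutils does).

```shell
# The Base64 string from the question.
b64='AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/'

# Hypothetical helper: decode outside jq and print each byte
# as an unsigned decimal number, one per line.
decode_bytes() {
  printf '%s' "$1" | base64 -d | od -An -v -tu1 | tr -s ' \n' '\n' | sed '/^$/d'
}

# Index 14 (0-based) is line 15 of the output.
decode_bytes "$b64" | sed -n '15p'   # prints 205
```

Each output line is a valid JSON number, so piping the list into jq -s . should give an array of byte values that jq can process without any UTF-8 decoding.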

Related

reverse engineer binary encoded data

Is there a general approach? It doesn't appear to be encrypted, and I know the file must contain numeric tabular data of some kind:
$ od -tc filename.hobo | head
0000000 H O B O 210 \r 004 \0 \0 001 d 210 035 004 \0 \0
0000020 001 c 210 " 035 001 \0 \0 001 035 q - \0 $ 070
0000040 8 E 001 d 377 377 235 220 \0 \0 001 \0 \0 \v \f
0000060 030 002 210 5 032 001 003 003 001 \0 \a \0 \0 \0 \0 \a
0000100 \0 \0 \0 \0 \0 \0 \0 005 004 \0 \0 \0 \0 \0 \0 210
0000120 c 001 \0 210 033 002 \a ҈ ** 034 002 \0 001 210 001 002
0000140 017 033 210 002 002 001 035 210 003 002 001 \n 210 004 032 O
0000160 n s e t C o m p u t e r C o
0000200 r p o r a t i o n 210 005 024 H O B O
0000220 U 2 3 - 0 0 1 T e m p / R H

unexpected result: grep from a changing line

I wrote a bash command to test grep from a changing line:
for i in $(seq 0 9); do echo -e -n "\r"$i; sleep 0.1; done | grep 5
The result shows:
9
Update
The real problem is as follows:
mplayer shows and refreshes a single-line playing progress when playing a media file. A sample result is:
A: 17.2 (17.2) of 213.0 (03:33.0) 0.5%
And I'm trying to grep this playing progress and ignore other lines. I used this command:
mplayer xxx.mp3 | grep ^A:
The result does not contain the line expected.
Update 2
mplayer xxx.mp3 | od -xda
shows:
0002140 4a5b 410d 203a 2020 2e31 2033 3028 2e31
[ J \r A : 1 . 3 ( 0 1 .
133 112 015 101 072 040 040 040 061 056 063 040 050 060 061 056
0002160 2932 6f20 2066 3132 2e33 2030 3028 3a33
2 ) o f 2 1 3 . 0 ( 0 3 :
062 051 040 157 146 040 062 061 063 056 060 040 050 060 063 072
0002200 3333 302e 2029 3020 342e 2025 5b1b 0d4a
3 3 . 0 ) 0 . 4 % 033 [ J \r
063 063 056 060 051 040 040 060 056 064 045 040 033 133 112 015
0002220 3a41 2020 3120 352e 2820 3130 342e 2029
A : 1 . 5 ( 0 1 . 4 )
101 072 040 040 040 061 056 065 040 050 060 061 056 064 051 040
0002240 666f 3220 3331 302e 2820 3330 333a 2e33
o f 2 1 3 . 0 ( 0 3 : 3 3 .
157 146 040 062 061 063 056 060 040 050 060 063 072 063 063 056
And
mplayer xxx.mp3 | tr '\r' '\n'
shows
A: 0.2 (00.1) of 213.0 (03:33.0) 0.3%
A: 0.3 (00.3) of 213.0 (03:33.0) 0.3%
A: 0.5 (00.5) of 213.0 (03:33.0) 0.4%
A: 0.6 (00.6) of 213.0 (03:33.0) 0.4%
A: 0.8 (00.8) of 213.0 (03:33.0) 0.4%
A: 1.0 (01.0) of 213.0 (03:33.0) 0.4%
While,
mplayer xxx.mp3 | tr '\r' '\n' | grep ^A
shows empty result.
Any tip will be appreciated.
It's your definition of "line" that's causing the problem here. The -n means that all the numbers are output on a single line, according to the definition used by grep (a series of characters, terminated by the \n character):
\r1\r2\r3\r4\r5\r6\r7\r8\r9
If you pipe the output through something like a hex dump, you can see what's happening:
$ for i in $(seq 0 9); do echo -e -n "\r"$i; sleep 0.1; done | grep 5 | od -xcb
0000000 300d 310d 320d 330d 340d 350d 360d 370d
\r 0 \r 1 \r 2 \r 3 \r 4 \r 5 \r 6 \r 7
015 060 015 061 015 062 015 063 015 064 015 065 015 066 015 067
0000020 380d 390d 000a
\r 8 \r 9 \n
015 070 015 071 012
0000025
That single line containing all the carriage returns (and not newlines) will, when output, appear to be a single line with just the 9 on it. Removing the -n will result instead in:
$ for i in $(seq 0 9); do echo -e "\r"$i; sleep 0.1; done | grep 5 | od -xcb
0000000 350d 000a
\r 5 \n
015 065 012
0000003
which would look like just the 5 was being output.
If you have a process that outputs "lines" separated by carriage returns rather than newlines, there's nothing to stop you changing them on the fly so as to be able to handle them as real lines:
$ echo -e "junk\rA: good 1\rjunk\rA: good 2\rjunk" | tr '\r' '\n' | grep '^A'
A: good 1
A: good 2
Applying that back to your original question, it would be (with the sleep removed since it's irrelevant):
$ for i in $(seq 0 9); do echo -e -n "\r"$i; done | tr '\r' '\n' | grep 5
5
$ for i in $(seq 0 9); do echo -e -n "\r"$i; done | tr '\r' '\n' | grep 5 | od -xcb
0000000 0a35
5 \n
065 012
0000002
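As an alternative to tr, awk can be told to treat the carriage return as its record separator, so each refresh becomes its own record. This is a minimal sketch, not part of the original answer: progress_lines is a hypothetical helper name, and the sample input is the same made-up data used above.

```shell
# Hypothetical helper: split the input on carriage returns and
# print only the records that start with "A:".
progress_lines() {
  awk -v RS='\r' '/^A:/'
}

printf 'junk\rA: good 1\rjunk\rA: good 2\rjunk' | progress_lines
```

awk writes each matching record followed by a newline, so anything further down the pipeline sees ordinary lines.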

Program to view a files bytes

There are some file formats that I am looking into, and in order to use them I want to understand them at the byte level. I have been trying to find a program that displays a file's bytes, but the ones I have found so far only display them in hexadecimal. I would prefer them to be displayed in decimal, since the format I am looking at uses decimal. I could write my own program to do this, but that would be less readable and would take more time.
You can try HexEdit. It gives you the option to show content as integer values (in the Property window).
You can download it from www.hexedit.com.
For me, the following works:
$ od -t u1 /bin/ls | head -5
0000000 127 069 076 070 001 002 001 000 000 000 000 000 000 000 000 000
0000020 000 002 000 002 000 000 000 001 000 001 015 192 000 000 000 052
0000040 000 000 103 092 000 000 000 000 000 052 000 032 000 006 000 040
0000060 000 023 000 021 000 000 000 006 000 000 000 052 000 001 000 052
0000100 000 000 000 000 000 000 000 192 000 000 000 192 000 000 000 005
I find it a bad idea, though, to look at it in decimal.
One should be able to mentally convert hexadecimal byte values to decimal or to ASCII characters if one dives into such matters.
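If decimal is what you want throughout, od can also print the offsets in decimal. A small sketch, assuming an od that supports the POSIX -A and -t options; dump_decimal is a hypothetical wrapper name:

```shell
# Hypothetical wrapper around od:
#   -A d   print offsets in decimal instead of octal
#   -t u1  print each byte as an unsigned decimal number
#   -v     do not collapse runs of identical lines into "*"
dump_decimal() {
  od -A d -t u1 -v "$@"
}

# Works on files (dump_decimal somefile) or on standard input:
printf 'HOBO' | dump_decimal
```

The bytes of "HOBO" come out as 72 79 66 79, followed by a final line holding the total length as a decimal offset.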

awk: multi-line output from a single string with an easy-looking replacement

000 000 000 000 (4 fields each of which is a group of 3 zeros separated by a space)
Process to generate 4 new lines
100 000 000 000
000 100 000 000
000 000 100 000
000 000 000 100
On each line, one group of three zeros is replaced by 100
How can I do this ?
tom
$ echo '000 000 000 000' | awk '{for (i=1;i<=NF;i++) {f=$i; $i="100"; print; $i=f}}'
100 000 000 000
000 100 000 000
000 000 100 000
000 000 000 100
Edit:
The fields are iterated over using the for loop. Each field ($i - for the field number i) is saved to a temporary variable f. Then the contents of the field are replaced. The record ($0) is printed. The field is returned to its previous value using the temporary value.
It might be easier to follow if this data was used: 001 002 003 004. Then the output would look like:
100 002 003 004
001 100 003 004
001 002 100 004
001 002 003 100
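The same loop, spelled out with comments. This is only an expanded sketch of the one-liner: replace_each is a hypothetical function name and saved stands in for the temporary variable f.

```shell
# Hypothetical wrapper around the awk loop described above.
replace_each() {
  awk '{
    for (i = 1; i <= NF; i++) {
      saved = $i     # remember the original value of field i
      $i = "100"     # overwrite it; awk rebuilds $0 from the fields
      print          # print the modified record
      $i = saved     # restore the field before the next pass
    }
  }'
}

echo '001 002 003 004' | replace_each
```

Because assigning to a field makes awk rebuild the record with its output field separator, the fields in the output are always separated by single spaces.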
Here's a shell script version using sed:
data='001 002 003 004' # or 000 000 000 000
for i in {1..4}; do echo "$data" | sed "s/\<[0-9]\{3\}\>/100/$i"; done
or
count=$(echo "$data" | wc -w)
for ((i=1;i<=count;i++)); do echo "$data" | sed "s/\<[0-9]\{3\}\>/100/$i"; done
or Bash without any external utilities:
data=(001 002 003 004) # use an array
count=${#data[@]}
for ((i=0;i<count;i++)); do f=${data[i]}; data[i]=100; echo "${data[@]}"; data[i]=$f; done
Or many other possibilities.

unexpected result from gnu sort

when I try to sort the following text file 'input':
test1 3
test3 2
test 4
with the command
sort input
the output is exactly the input. Here is the output of
od -bc input
:
0000000 164 145 163 164 061 011 063 012 164 145 163 164 063 011 062 012
t e s t 1 \t 3 \n t e s t 3 \t 2 \n
0000020 164 145 163 164 011 064 012
t e s t \t 4 \n
0000027
It's just a tab separated file with two columns. When I do
sort -k 2
The output changes to
test3 2
test1 3
test 4
which is what I would expect. But if I do
sort -k 1
nothing changes with respect to the input, whereas I would expect 'test' to sort before 'test1'. Finally, if I do
cat input | cut -f 1 | sort
I get
test
test1
test3
as expected. Is there a logical explanation for this? What exactly is sort supposed to do by default, something like:
sort -k 1
?
My version of sort:
sort (GNU coreutils) 7.4
From the man pages:
*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
So it seems that export LC_ALL=C should help.
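A minimal sketch of that suggestion, using the three-line sample from the question (sample is a hypothetical helper that recreates it). Note that -k 1 on its own makes the key run from field 1 to the end of the line, so it compares whole lines just like plain sort; -k 1,1 limits the key to the first field. In many non-C locales the collation rules probably give the tab little or no weight, which would explain why "test1" and "test3" sorted before "test" in the question.

```shell
# Hypothetical helper: the tab-separated sample from the question.
sample() {
  printf 'test1\t3\ntest3\t2\ntest\t4\n'
}

# Byte-order comparison: the tab (octal 011) sorts before "1" (octal 061),
# so the line starting with "test" followed by a tab comes first.
sample | LC_ALL=C sort

# Alternatively, compare only the first tab-separated field.
sample | LC_ALL=C sort -t "$(printf '\t')" -k 1,1
```

Setting LC_ALL=C on the command line affects only that one sort invocation, whereas export LC_ALL=C changes the locale for the rest of the shell session.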
