What's the best way to read and convert a JSON field's decimal value to an ASCII character? For instance, converting 107 to 'k'. The manual doesn't appear to mention a direct way to do so.
$ jq -n '[107] | implode'
"k"
implode works for both ASCII and non-ASCII codes: it converts an array of Unicode code points, given as decimal numbers, to the equivalent UTF-8 string.
Here's an example that converts a field in place:
$ jq -n -c '{"a": [107, 108]} | .a |= implode'
{"a":"kl"}
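For comparison, the same decimal-to-character conversion can be sketched in plain bash without jq. This is not a jq feature: codes_to_string is a hypothetical helper name, and unlike implode it only handles ASCII codes 0-127, because it writes single bytes through printf's octal escape instead of encoding UTF-8.

```shell
# Hypothetical helper (not part of jq): print the ASCII character for each
# decimal code given as an argument. Codes above 127 are rejected because
# this writes single bytes, not UTF-8 sequences.
codes_to_string() {
  local code oct
  for code in "$@"; do
    (( code >= 0 && code <= 127 )) || return 1
    printf -v oct '%03o' "$code"   # 107 -> "153"
    printf "\\$oct"                # format "\153" prints the byte 0x6B, "k"
  done
}
```

For example, codes_to_string 107 108 prints kl (with no trailing newline).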
I've been experiencing some weird issues today while debugging, and I've managed to trace this to something I overlooked at first.
Take a look at the outputs of these two commands:
root@test:~# printf '%X' 10 | xxd -r -p | xxd -p
root@test:~# printf '%X' 43 | xxd -r -p | xxd -p
2b
root@test:~#
The first xxd command converts hex to ASCII, and the second converts ASCII back to hex (43 decimal = 2b hex).
Unfortunately, it seems that converting hex to ASCII does not preserve non-printable characters. For example, the raw hex "A" (10 decimal = A hex) somehow gets eaten up by xxd -r -p, so when I perform the inverse operation, I get an empty result.
What I am trying to do is feed some data into minimodem. I need to generate Call Waiting Caller ID (FSK), effectively via bit banging. My bash script has the right bits, but if I do a hexdump, the non-printable characters are missing. Unfortunately, it seems that minimodem only accepts ASCII characters, and I need to feed it raw hex, but that gets eaten up in the conversion. Is it possible to preserve these characters somehow? I don't see an option for it, so I'm wondering if there's a better way.
xxd -r -p expects two hex digits per byte. A single A is not a complete byte, so it gets dropped. Do:
printf '%02X' 10 | xxd -r -p | xxd -p
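The same fix applies when the hex text is generated from a list of decimal values. A minimal bash sketch, where to_hex_pairs is a hypothetical helper name: "%02X" always yields two digits per byte, whereas "%X" turns 10 into the lone A that xxd drops.

```shell
# Hypothetical helper: turn decimal byte values into the hex text that
# xxd -r -p expects, always two digits per byte.
to_hex_pairs() {
  local d
  for d in "$@"; do
    printf '%02X' "$d"   # 10 -> "0A" (plain %X would give just "A")
  done
}
```

With xxd installed, to_hex_pairs 10 43 | xxd -r -p | xxd -p should then print 0a2b.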
How to convert hex to ASCII while preserving non-printable characters
Use xxd. If a byte's hex value has only one digit, pad it with a leading 0.
ASCII does not preserve non-printable characters
It does preserve all bytes; xxd is the usual tool for working with arbitrary binary data in the shell.
Is it possible to preserve these characters somehow?
Yes: give xxd two hex digits per byte.
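If xxd is not available, bash's own printf can do the hex-to-bytes step. This is a sketch assuming bash (for the \xHH escape in printf, [[ =~ ]] and substring expansion); hex_to_bytes is a hypothetical name. It pads an odd-length string with a leading 0, as suggested above, and the usage below checks the result with od so that non-printable bytes such as 0x0A are visible.

```shell
# Hypothetical helper: convert a hex string to raw bytes without xxd,
# using bash's printf "\xHH" escape. Non-printable bytes survive intact.
hex_to_bytes() {
  local hex=$1 i
  # Reject anything that is not a hex digit.
  [[ $hex =~ ^[0-9A-Fa-f]*$ ]] || return 1
  # Odd length: pad with a leading 0 so every byte has two digits ("A" -> "0A").
  if (( ${#hex} % 2 == 1 )); then
    hex="0$hex"
  fi
  for (( i = 0; i < ${#hex}; i += 2 )); do
    printf "\\x${hex:i:2}"   # e.g. the format "\x0A" prints a newline byte
  done
}
```

For example, hex_to_bytes 0A2B | od -An -tx1 shows the two bytes 0a and 2b.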
If I base64-encode a string which consists of seven characters, e.g. abcdefg, with the website https://www.base64encode.org/, the result is YWJjZGVmZw==. The trailing "==" characters are padding, because base64 encodes the input in 3-byte groups and the number of input characters (7) is not a multiple of 3.
I have to reproduce this result in bash, so I tried the following command:
echo "abcdefg" | base64
However, the result is different now:
YWJjZGVmZwo=
I'm using Ubuntu where base64 (GNU coreutils) 8.25 is installed.
I would be glad if someone could give me a hint.
I've just noticed that the reason for this behaviour is the newline that echo writes at the end. So the correct command is the following, which suppresses the newline:
echo -n "abcdefg" | base64
Then the output is like I expect it:
YWJjZGVmZw==
A here-string is also tricky: it produces the same unexpected output, because bash appends a newline to the here-string's contents, just as echo does. printf without a \n avoids this:
$ base64 <<<"abcdefg"
YWJjZGVmZwo=
$ printf 'abcdefg' | base64
YWJjZGVmZw==
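The padding count follows directly from the 3-byte grouping: each group of 3 input bytes becomes 4 characters, and a final group of 1 or 2 bytes is padded with two or one "=". A small bash sketch of that arithmetic (the function names are hypothetical):

```shell
# Number of "=" characters base64 appends for an input of $1 bytes.
b64_padding() {
  echo $(( (3 - $1 % 3) % 3 ))
}

# Length of the padded encoding (excluding base64's own trailing newline).
b64_length() {
  echo $(( ($1 + 2) / 3 * 4 ))
}
```

b64_padding 7 prints 2, matching YWJjZGVmZw==; b64_padding 8 (the same string plus echo's newline) prints 1, matching YWJjZGVmZwo=. Both encodings are 12 characters long.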
I'm trying to run:
sed 's/[\xE0-\xEF]/_/g'
but am getting a complaint about an "invalid collation character". What's wrong with my range of characters in the square brackets?
Try setting the LC_ALL environment variable to the C locale (aka the POSIX locale):
LC_ALL=C sed 's/[\xE0-\xEF]/_/g'
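In a UTF-8 locale, the bytes \xE0-\xEF are not valid characters on their own (they only start multibyte sequences), so sed cannot use them as range endpoints; in the C locale every byte is a character. Note that it works fine with standard ASCII ranges: sed 's/[\x41-\x42]/_/g'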
Here's a way with tr:
tr "\340-\357" "_" < input > output
(those are octal values for the hex codes you provided).
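A minimal bash sketch combining both answers: run tr in the C locale so the input is handled as single bytes. replace_e0_ef is a hypothetical name; the octal range \340-\357 corresponds to 0xE0-0xEF. It relies on tr extending the one-character second set by repeating its last character, which GNU tr does.

```shell
# Hypothetical helper: replace every byte in 0xE0-0xEF with "_".
# LC_ALL=C makes tr treat the input as plain bytes, not UTF-8 characters.
replace_e0_ef() {
  LC_ALL=C tr '\340-\357' '_'
}
```

For example, printf 'a\xe9b\n' | replace_e0_ef prints a_b, while bytes outside the range pass through unchanged.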
Can anyone explain this?
[vagrant@centos ~]$ echo "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm" | base64
MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQo=
[vagrant@centos ~]$ echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==" | base64 -d
10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm
Encoding the first string gives a result ending in o=, but the encoded string ending in == instead decodes to the same original string.
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
Compare these:
echo "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm" | base64 | od -c
echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==" | base64 -d | od -c
echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQo=" | base64 -d | od -c
If we don't send the newline when using echo, the o is missing. Have a look at this:
echo -n "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm" | base64
It's the encoded newline that gives the o in o=.
The = is padding, and it might not always be there. Have a look here:
https://en.wikipedia.org/wiki/Base64#Padding
Different implementations may also use different padding characters. You can see some of the differences here:
https://en.wikipedia.org/wiki/Base64#Variants_summary_table
From RFC 4648:
3.2. Padding of Encoded Data
In some circumstances, the use of padding ("=") in base-encoded
data is not required or used. In the general case, when
assumptions about the size of transported data cannot be made,
padding is required to yield correct decoded data.
Implementations MUST include appropriate pad characters at the end
of encoded data unless the specification referring to this document
explicitly states otherwise.
The base64 and base32 alphabets use padding, as described below in
sections 4 and 6, but the base16 alphabet does not need it; see
section 8.
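To see where the o comes from, here is a minimal base64 encoder written in bash. It is an illustrative sketch (hypothetical b64_encode name, RFC 4648 alphabet, reading stdin via od), not a substitute for the base64 utility. Encoding m gives bQ==, while m followed by a newline gives bQo=: the newline (0x0A) supplies the bits that make the third 6-bit group equal 40, the letter o, and without it that position becomes padding.

```shell
# Minimal base64 encoder (RFC 4648 alphabet, "=" padding), reading stdin.
# Illustrative only; it is slow on large input.
b64_encode() {
  local alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
  local -a bytes
  # od prints every input byte as an unsigned decimal number.
  read -r -a bytes <<< "$(od -An -v -tu1 | tr '\n' ' ')"
  local n=${#bytes[@]} i group out=''
  for (( i = 0; i < n; i += 3 )); do
    # Pack up to 3 bytes into 24 bits; missing bytes count as 0.
    group=$(( (bytes[i] << 16) | (${bytes[i+1]:-0} << 8) | ${bytes[i+2]:-0} ))
    out+=${alphabet:$(( (group >> 18) & 63 )):1}
    out+=${alphabet:$(( (group >> 12) & 63 )):1}
    # A final 1-byte group ends in "==", a final 2-byte group in "=".
    if (( i + 1 < n )); then out+=${alphabet:$(( (group >> 6) & 63 )):1}; else out+='='; fi
    if (( i + 2 < n )); then out+=${alphabet:$(( group & 63 )):1}; else out+='='; fi
  done
  printf '%s\n' "$out"
}
```

For example, printf 'm' | b64_encode prints bQ==, and printf 'm\n' | b64_encode prints bQo=.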
When you use echo, a newline is appended to the end of the output, and this newline character is part of what gets base64-encoded. When you change the 'o' to a '=', you remove the encoded newline, so the decoded output simply lacks it. Because a newline is not a printable character, the two decoded results look the same.
In my terminal, decoding the two strings yields the same visible output, but the string ending in "o=" has a trailing newline, and the string ending in "==" does not (note where the prompt ends up below):
$> echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQo=" | base64 -d
10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm
$> echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==" | base64 -d
10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm $>
Using echo -n passes the string to base64 without the trailing newline, and the string without the newline encodes to the string ending in "==".
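On Macs, I noticed I had to append "\c" to the end of my string, like this:
[vagrant@centos ~]$ echo "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm\c" | base64
Note, however, that "\c" only suppresses the newline when echo interprets backslash escapes (for example, bash's echo with -e or with the xpg_echo option set). Here it did not: the result ends in bVxjCg==, which decodes to m\c followed by a newline, so the backslash and the c were encoded literally: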
MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbVxjCg==
PHP produces the "==" ending because the PHP string contains no trailing newline. So this is not an issue with the base64 program (which comes from GNU coreutils, not bash): the 'o' is not a padding character but part of the encoding of the newline that echo appends.
php > echo base64_encode("10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm");
MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==
I'm parsing a file that may contain control characters (ASCII 0-31). I want to replace each of those control characters with its ASCII code in hexadecimal representation. A rather simple example of what I have in mind:
$ echo -e "a\011b" | sed -e 's/\o11/\\x09/g'
a\x09b
This converts the tab (\011) to \x09, so the a<tab>b becomes a\x09b.
Obviously I could use 32 -e expressions, but I consider that bad style. Is there a generic approach to this?
BTW, it's not a problem if the \n remains a \n. sed isn't required.
I would use Perl. Note that tab is 9 (octal 011), so its encoding is \x09. This should do the trick for all control characters, including DEL (\177):
echo -e "a\011b" | perl -lpe 's/[\0-\037\177]/sprintf "\\x%02x", ord $&/ge'
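For comparison, a pure-bash sketch of the same transformation, assuming bash and od are available and perl is not. escape_controls is a hypothetical name; like perl -l it leaves newlines alone, and it also escapes DEL (0x7F) as the Perl one-liner does. It is meant for illustration rather than for large files.

```shell
# Hypothetical helper: read stdin and replace every byte in 0x00-0x1F or
# 0x7F (except newline) with the text \xHH; other bytes pass through.
escape_controls() {
  local -a bytes
  # od prints every input byte as an unsigned decimal number.
  read -r -a bytes <<< "$(od -An -v -tu1 | tr '\n' ' ')"
  local b oct ch out=''
  for b in "${bytes[@]}"; do
    if (( b == 10 )); then
      out+=$'\n'                     # keep newlines, like perl -l
    elif (( b < 32 || b == 127 )); then
      printf -v ch '\\x%02x' "$b"    # e.g. 9 -> the four characters \x09
      out+=$ch
    else
      printf -v oct '%03o' "$b"
      printf -v ch "\\$oct"          # copy the byte through unchanged
      out+=$ch
    fi
  done
  printf '%s' "$out"
}
```

For example, printf 'a\tb\n' | escape_controls prints a\x09b followed by a newline.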