Display and format aligned binary data using bash

I have a binary file that stores a collection of C structs, e.g.
typedef struct{
uint8_t field1;
uint16_t field2;
uint32_t field3;
}example;
I would like to dump the file aligned, i.e. have one instance per line. I don't really need to have space separated values for each field, this would be enough for example :
# field 1 == 0xaa, field 2 == 0xbbcc, field 3 == 0x00112233
$ command my_file.bin
aabbcc00112233 # output is repeated for each struct
Considering the example above, file content is the following :
$ hexdump my_file.bin
0000000 ccaa 33bb 1122 aa00 bbcc 2233 0011 ccaa
0000010 33bb 1122 aa00 bbcc 2233 0011 ccaa 33bb
0000020 1122 aa00 bbcc 2233 0011 ccaa 33bb 1122
0000030 aa00 bbcc 2233 0011 ccaa 33bb 1122 aa00
0000040 bbcc 2233 0011
0000046
od is a perfect fit when the struct size is a multiple of 4 bytes (e.g. od -tx --width=8), but it does not work properly in this example, where each struct is 7 bytes wide. Is this possible in bash?

Tell od to print 7 bytes per line, each individually, and get rid of spaces using tr.
$ od -An -v -tx1 -w7 file | tr -d ' '
aabbcc00112233
...
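Note that od prints the bytes in file order, so the fields read as expected only for big-endian input. For a little-endian file such as the one in the hexdump above, each line would come out as aaccbb33221100.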
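If the file is little-endian, as the hexdump in the question suggests (its first seven bytes are aa cc bb 33 22 11 00), the bytes inside each field can be reordered after od has split the records. The following is a minimal sketch, not a general solution: it assumes GNU od (for -w), packed 7-byte records with no padding, and a hypothetical helper name dump_structs.

```shell
# dump_structs is a hypothetical helper name, not part of od.
# Assumes GNU od (-w) and packed 7-byte little-endian records:
# 1 byte (field1) + 2 bytes (field2) + 4 bytes (field3).
dump_structs() {
    od -An -v -tx1 -w7 "$1" |
    # Bytes arrive in file order; print each multi-byte field high byte first.
    # Incomplete trailing records (fewer than 7 bytes) are skipped.
    awk 'NF == 7 { print $1 $3 $2 $7 $6 $5 $4 }'
}
```

Called as dump_structs my_file.bin, this should print aabbcc00112233 once per struct for the file shown in the question.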

Related

JQ explode function is returning incorrect chars

I am trying to decode base64 encoded binary content in JQ using explode function.
When I pass the decoded string through explode and then implode, I expect to get the same string back, but I do not. Try it here: https://jqplay.org/s/Rt8H1qv8VRP
Base64 encoded string: "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/"
JQ: '@base64d | explode | implode | @base64'
Output: "AQEAAAABAQAyGWRkZBXvv71bBwAAAAABAgMEBQYHCAkKUgbvv70/Ugbvv70/"
Debugging further,
@base64d | explode | .[14]
returns
65533
Running the following on Ubuntu, you can see that the [14] char is 315 (octal) == 205 (decimal)
$ echo "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/" | base64 -d | od -bc
0000000 001 001 000 000 000 001 001 000 062 031 144 144 144 025 315 133
001 001 \0 \0 \0 001 001 \0 2 031 d d d 025 315 [
0000020 007 000 000 000 000 001 002 003 004 005 006 007 010 011 012 122
\a \0 \0 \0 \0 001 002 003 004 005 006 \a \b \t \n R
0000040 006 236 077 122 006 236 077
006 236 ? R 006 236 ?
0000047
Why is JQ returning this weird 65533 (0xFFFD) character? What am I missing?
First of all, the issue has nothing to do with explode or implode. Using just @base64d | @base64 produces the same result.
jq expects the string encoded with base64 to be text encoded with UTF-8.
If the decoded string is not UTF-8, the results are undefined.
Your input is not UTF-8.
U+FFFD REPLACEMENT CHARACTER is a character used to mark input errors.
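To look at the raw bytes without any UTF-8 interpretation, the decoding can be done outside jq. This is a small sketch assuming GNU base64 (-d) and od; byte_at is a hypothetical helper name introduced here for illustration.

```shell
# byte_at is a hypothetical helper: print the decimal value of one decoded byte.
# $1 = base64 string, $2 = zero-based byte offset.
# Assumes GNU base64 (-d) and od (-j to skip, -N to limit, -tu1 for unsigned bytes).
byte_at() {
    printf '%s' "$1" | base64 -d | od -An -v -tu1 -j "$2" -N 1 | tr -d ' '
}

byte_at "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/" 14   # prints 205
```

Byte 205 (0xCD) is a UTF-8 lead byte that must be followed by a continuation byte, but the next byte here is 0x5B ('['), so the sequence is not valid UTF-8 and jq substitutes U+FFFD.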

output to a variable file name in for loop in bash

I am doing some tasks inside a for loop and trying to redirect stdout to a different file name on every iteration. But I only get a single file, and its name contains only part of what I assigned.
This is my script:
#!/bin/sh
me1_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/h3k4me1_data"
me3_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/h3k4me3_data"
dnase_dir="/Users/njayavel/Downloads/Silencer_project/roadmap_analysis/data/dnase_data"
index=(003 004)
#index=(003 004 005 006 007 008 017 021 022 028 029 032 033 034 046 050 051 055 056 057 059 080 081 082 083 084 085 086 088 089 090 091 092 093 094 097 098 100 109)
#index=(006 007 008 017 021 022 028 029 032 033 034 046 050 051 055 056 057 059 080 081 082 083 084 085 086 088 089 090 091 092 093 094 097 098 100 109)
for i in "${index[@]}"; do
dnase_file="$dnase_dir/E$i-DNase.hotspot.fdr0.01.broad.bed"
me1_fil="$me1_dir/E$i-H3K4me1.broadPeak"
me3_fil="$me3_dir/E$i-H3K4me3.broadPeak"
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $me1_fil > me1_file.bed
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $me3_fil > me3_file.bed
ctcf_file="CTCFsites_hg19_sorted_bedmerged.bed"
tss_file="TSS_gene_2kbupstrm_0.5kbdownstrm.bed"
cat me1_file.bed me3_file.bed $ctcf_file $tss_file | sort -k1,1 -k2,2n > file2.bed
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' $dnase_file | sort -k1,1 -k2,2n > file1.bed
bedtools intersect -v -a file1.bed -b file2.bed > E$i_file.txt;
done
The redirection on the last line of the for loop produces only one output file, "E.txt". I am expecting E003_file.txt and E004_file.txt.
I am a newbie, please help me out.
Thank you
When you write
E$i_file.txt
the shell is looking for a variable named i_file, because _ is a valid character in a variable name, not a delimiter. You need to use braces to delimit the variable name:
bedtools intersect -v -a file1.bed -b file2.bed > "E${i}_file.txt"
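The difference can be seen in isolation with a couple of assignments. This is only an illustrative sketch; the variable names wrong and right are made up for the demonstration, and i_file is assumed to be unset.

```shell
i=003

# Without braces the shell expands the variable i_file, which is unset here,
# so the result is just "E" followed by ".txt".
wrong="E$i_file.txt"

# With braces the variable name ends at the closing brace.
right="E${i}_file.txt"

echo "$wrong"   # prints E.txt
echo "$right"   # prints E003_file.txt
```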

unexpected result: grep from a changing line

I wrote a bash command to test grep from a changing line:
for i in $(seq 0 9); do echo -e -n "\r"$i; sleep 0.1; done | grep 5
The result shows:
9
Update
The real problem is as follows:
mplayer shows and refreshes a single-line playing progress when playing a media file. A sample result is:
A: 17.2 (17.2) of 213.0 (03:33.0) 0.5%
And I'm trying to grep this playing progress and ignore other lines. I used this command:
mplayer xxx.mp3 | grep ^A:
The result does not contain the line expected.
Update 2
mplayer xxx.mp3 | od -xda
shows:
0002140 4a5b 410d 203a 2020 2e31 2033 3028 2e31
[ J \r A : 1 . 3 ( 0 1 .
133 112 015 101 072 040 040 040 061 056 063 040 050 060 061 056
0002160 2932 6f20 2066 3132 2e33 2030 3028 3a33
2 ) o f 2 1 3 . 0 ( 0 3 :
062 051 040 157 146 040 062 061 063 056 060 040 050 060 063 072
0002200 3333 302e 2029 3020 342e 2025 5b1b 0d4a
3 3 . 0 ) 0 . 4 % 033 [ J \r
063 063 056 060 051 040 040 060 056 064 045 040 033 133 112 015
0002220 3a41 2020 3120 352e 2820 3130 342e 2029
A : 1 . 5 ( 0 1 . 4 )
101 072 040 040 040 061 056 065 040 050 060 061 056 064 051 040
0002240 666f 3220 3331 302e 2820 3330 333a 2e33
o f 2 1 3 . 0 ( 0 3 : 3 3 .
157 146 040 062 061 063 056 060 040 050 060 063 072 063 063 056
And
mplayer xxx.mp3 | tr '\r' '\n'
shows
A: 0.2 (00.1) of 213.0 (03:33.0) 0.3%
A: 0.3 (00.3) of 213.0 (03:33.0) 0.3%
A: 0.5 (00.5) of 213.0 (03:33.0) 0.4%
A: 0.6 (00.6) of 213.0 (03:33.0) 0.4%
A: 0.8 (00.8) of 213.0 (03:33.0) 0.4%
A: 1.0 (01.0) of 213.0 (03:33.0) 0.4%
While,
mplayer xxx.mp3 | tr '\r' '\n' | grep ^A
shows empty result.
Any tip will be appreciated.
It's your definition of "line" that's causing the problem here. The -n means that all the numbers are output on a single line, according to the definition used by grep (a series of characters terminated by the \n character):
\r1\r2\r3\r4\r5\r6\r7\r8\r9
If you pipe the output through something like a hex dump, you can see what's happening:
$ for i in $(seq 0 9); do echo -e -n "\r"$i; sleep 0.1; done | grep 5 | od -xcb
0000000 300d 310d 320d 330d 340d 350d 360d 370d
\r 0 \r 1 \r 2 \r 3 \r 4 \r 5 \r 6 \r 7
015 060 015 061 015 062 015 063 015 064 015 065 015 066 015 067
0000020 380d 390d 000a
\r 8 \r 9 \n
015 070 015 071 012
0000025
That single line containing all the carriage returns (and not newlines) will, when output, appear to be a single line with just the 9 on it. Removing the -n will result instead in:
$ for i in $(seq 0 9); do echo -e "\r"$i; sleep 0.1; done | grep 5 | od -xcb
0000000 350d 000a
\r 5 \n
015 065 012
0000003
which would look like just the 5 was being output.
If you have a process that outputs "lines" separated by carriage returns rather than newlines, there's nothing to stop you changing them on the fly so as to be able to handle them as real lines:
$ echo -e "junk\rA: good 1\rjunk\rA: good 2\rjunk" | tr '\r' '\n' | grep '^A'
A: good 1
A: good 2
Applying that back to your original question, it would be (with the sleep removed since it's irrelevant):
$ for i in $(seq 0 9); do echo -e -n "\r"$i; done | tr '\r' '\n' | grep 5
5
$ for i in $(seq 0 9); do echo -e -n "\r"$i; done | tr '\r' '\n' | grep 5 | od -xcb
0000000 0a35
5 \n
065 012
0000002
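For a long-running producer such as mplayer, one possible reason the tr | grep pipeline appears to print nothing is output buffering: when tr writes to a pipe rather than a terminal, its output is typically block-buffered, so grep may not see anything for a while. The sketch below assumes GNU coreutils (stdbuf) and GNU grep (--line-buffered); cr_grep is a hypothetical helper name, and whether buffering is really the cause in the mplayer case is an assumption.

```shell
# cr_grep is a hypothetical helper: treat carriage returns as line ends and
# filter the resulting lines with the pattern given in $1.
# stdbuf -oL asks tr to flush at each newline; --line-buffered does the same for grep.
cr_grep() {
    stdbuf -oL tr '\r' '\n' | grep --line-buffered "$1"
}

printf 'junk\rA: good 1\rjunk\rA: good 2\rjunk' | cr_grep '^A'
```

With the real player this would be used as mplayer xxx.mp3 | cr_grep '^A:'.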

computer basics problem

Hi everybody, can anyone tell me the answer to this question?
I created a simple txt file. It contains only two words, "hello word". From what I have studied, a computer uses ASCII code to store text on disk or in memory. In ASCII, each letter or symbol is represented by one byte; in simple words, one byte is used to store one symbol.
Now the problem is this: whenever I look at the size of the file, it shows 11 bytes. I understand 9 bytes for the words plus one byte for the space, which makes a total of 10, so why is it showing a size of 11 bytes? I tried different things, such as saving the file with the shortest or the longest name possible, but the size did not change.
So can anybody explain why this is happening? I tried this on Windows and Linux (Ubuntu, CentOS) systems and the result is the same.
pax> echo hello word >outfile.txt
pax> ls -al outfile.txt
-rw-r--r-- 1 pax pax 11 2010-11-19 15:34 outfile.txt
pax> od -xcb outfile.txt
0000000 6568 6c6c 206f 6f77 6472 000a
h e l l o w o r d \n
150 145 154 154 157 040 167 157 162 144 012
pax> hd outfile.txt
00000000 68 65 6c 6c 6f 20 77 6f 72 64 0a |hello word.|
0000000b
As per above, you're storing "hello word" and the newline character. That's 11 characters in total. If you don't want the newline, you can use something like the -n option of echo (which doesn't add the newline):
pax> echo -n hello word >outfile.txt
pax> ls -al outfile.txt
-rw-r--r-- 1 pax pax 10 2010-11-19 15:36 outfile.txt
pax> od -xcb outfile.txt
0000000 6568 6c6c 206f 6f77 6472
h e l l o w o r d
150 145 154 154 157 040 167 157 162 144
pax> hd outfile.txt
00000000 68 65 6c 6c 6f 20 77 6f 72 64 |hello word|
0000000a
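The same byte counts can be checked without creating a file by piping the text into wc -c. A minimal sketch, using printf so that the presence of the trailing newline is explicit; the variable names are chosen here for illustration only.

```shell
# Count bytes with and without the trailing newline character.
with_newline=$(printf 'hello word\n' | wc -c)
without_newline=$(printf 'hello word' | wc -c)

echo "$with_newline"      # 11: ten visible characters plus \n
echo "$without_newline"   # 10
```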
If you want to see the content of the file, you can perform an octal dump of it using the "od" command under Linux. Most probably what you will see at the end is a CR (carriage return) and/or an LF (line feed).
The name of the file has nothing to do with its size.
Luis
Did you add a newline (\n) in the text file? Just because this character cannot be seen does not mean it is not there.

unexpected result from gnu sort

when I try to sort the following text file 'input':
test1 3
test3 2
test 4
with the command
sort input
the output is exactly the input. Here is the output of
od -bc input
:
0000000 164 145 163 164 061 011 063 012 164 145 163 164 063 011 062 012
t e s t 1 \t 3 \n t e s t 3 \t 2 \n
0000020 164 145 163 164 011 064 012
t e s t \t 4 \n
0000027
It's just a tab separated file with two columns. When I do
sort -k 2
The output changes to
test3 2
test1 3
test 4
which is what I would expect. But if I do
sort -k 1
nothing changes with respect to the input, whereas I would expect 'test' to sort before 'test1'. Finally, if I do
cat input | cut -f 1 | sort
I get
test
test1
test3
as expected. Is there a logical explanation for this? What exactly is sort supposed to do by default, something like:
sort -k 1
?
My version of sort:
sort (GNU coreutils) 7.4
From the man pages:
*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
So setting LC_ALL=C (for example with export LC_ALL=C) should help.
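The effect can be tried on the three-line input from the question without changing the environment of the whole shell. This is a small sketch; the variable name sorted is made up for the example.

```shell
# Tab-separated input from the question, sorted by native byte values.
# In the C locale the tab (byte 9) sorts before the digits, so "test" comes first.
sorted=$(printf 'test1\t3\ntest3\t2\ntest\t4\n' | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

Note also that -k 1 without an end field makes the key run from field 1 to the end of the line, which is why it behaves like a plain sort; -k1,1 would restrict the key to the first field only.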
