Cut -b does not stop at expected point - bash

I'm trying to extract a range of bytes from a file. The file contains a continuous stream of 16-bit sample data. I would think cut -b should work, but I am getting errors in the data.
Extracting 20 bytes (10 samples):
cut -b188231561-188231580 file.dat > out.dat
I expect it to create a 20-byte file containing 10 samples (the last sample should be the -79). However, it creates a 5749-byte file with the following contents (displayed using od -s):
0000000 -69 -87 -75 -68 -83 -94 -68 -67
0000020 -82 -79 2570 2570 2570 2570 2570 2570
0000040 2570 2570 2570 2570 2570 2570 2570 2570
*
0013140 -65 -67 -69 -69 -71 -66 -72 -68
0013160 -69 -80 10
0013165
As you can see, there is a whole bunch of repeated 2570 values where cut was supposed to stop.
What am I doing wrong here? Also, the Wikipedia article on cut says -b is limited to 1023-byte lines, although the man page for cut doesn't seem to mention this limitation.
Is there a better bash command to extract bytes N-M from a binary file? I've already written a Perl script to do it; I'm just curious.

cut -b selects bytes from each line of its input; it can't be used to get bytes from the file as a whole. It also appends a newline to every line it outputs, which is where the repeated 2570 values come from: 2570 is 0x0a0a, i.e. two newline bytes read back as one little-endian 16-bit sample.
You can use head/tail instead (tail -c +N starts at byte N, counting from 1, and head -c then takes M-N+1 bytes so the range is inclusive):
N=120
M=143
tail -c +$N file | head -c $((M-N+1))
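For reference, applied to the range in the question (bytes 188231561 through 188231580, inclusive), together with an equivalent dd invocation; note that dd's skip counts from zero while tail -c + counts from one:
tail -c +188231561 file.dat | head -c 20 > out.dat
dd if=file.dat of=out.dat bs=1 skip=188231560 count=20
dd with bs=1 is slow for offsets this large, but it is simple and portable.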


Display and format aligned binary data using bash

I have a binary file that stores a collection of C structs, e.g.
typedef struct {
    uint8_t field1;
    uint16_t field2;
    uint32_t field3;
} example;
I would like to dump the file aligned, i.e. have one instance per line. I don't really need space-separated values for each field; for example, this would be enough:
# field 1 == 0xaa, field 2 == 0xbbcc, field 3 == 0x00112233
$ command my_file.bin
aabbcc00112233 # output is repeated for each struct
Considering the example above, the file content is the following:
$ hexdump my_file.bin
0000000 ccaa 33bb 1122 aa00 bbcc 2233 0011 ccaa
0000010 33bb 1122 aa00 bbcc 2233 0011 ccaa 33bb
0000020 1122 aa00 bbcc 2233 0011 ccaa 33bb 1122
0000030 aa00 bbcc 2233 0011 ccaa 33bb 1122 aa00
0000040 bbcc 2233 0011
0000046
od is a perfect fit when the struct size is a multiple of 4 (e.g. od -tx --width=8), but it does not work properly in this example, where the width is 7 bytes. Is this possible in bash?
Tell od to print 7 bytes per line, each byte individually, and strip the spaces with tr:
$ od -An -v -tx1 -w7 file | tr -d ' '
aabbcc00112233
...
Note that this prints the bytes in file order, so it is only good for big-endian inputs.
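For the little-endian layout that the hexdump in the question actually shows (field2 stored as cc bb, field3 as 33 22 11 00), here is a hypothetical sketch that puts each record's bytes back into display order with awk; the 1-, 2- and 4-byte field widths are hard-coded for this particular struct:
od -An -v -tx1 -w7 my_file.bin |
awk '{ printf "%s%s%s%s%s%s%s\n", $1, $3, $2, $7, $6, $5, $4 }'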

awk script to return results recursively [duplicate]

This question already has an answer here:
awk script along with for loop
(1 answer)
Closed 7 years ago.
I have a data set as below (t.txt):
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
I have a second file as below (t1.txt):
752
728
856
693
713
792
812
706
737
751
745
For each value in the second file, I am trying to extract the matching column 2 element from the first data set, using a for loop.
I have tried:
for i in `cat t1.txt`
do
awk -F " " '$1=i {print $2}' t.txt > t0.txt
done
The desired output is:
694
820
774
Unfortunately, I am getting a blank file.
I have tried doing it manually, like: awk -F " " '$1==752 {print $2}' t.txt > t0.txt
The results obtained are:
694
774
How can I do it for the entire t1 file in one go?
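For what it's worth, the loop fails because the shell's $i is not visible inside the single-quoted awk program (i is an empty awk variable there), $1=i is an assignment rather than the == comparison, and > truncates t0.txt on every iteration. A minimal fix, for reference, though the single-pass answers below are better:
for i in $(cat t1.txt)
do
    awk -v key="$i" '$1 == key {print $2}' t.txt
done > t0.txt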
Simplest way: using join
$ join -o 1.2 <(sort t.txt) <(sort t1.txt)
694
774
820
join requires the files to be lexically sorted on the comparison field (the first field by default). The -o 1.2 option instructs join to output the 2nd field from the 1st file.
With awk
$ awk 'NR==FNR {key[$1]; next} $1 in key {print $2}' t1.txt t.txt
694
820
774
This remembers the keys from t1.txt (while the overall record number NR equals the per-file record number FNR, i.e. while reading the first file), then loops over t.txt: if the first field occurred in t1.txt, it prints the second field. Unlike join, this preserves the order of t.txt.

The simplest way to delete a section of text, n times

I have a file bigger than 4 GB, which is bad news for me because I can't open it in Notepad++ and use the macro feature to record and repeat a process to the end of the file.
What I'd like to do is, say, leave the first 20 lines of text, then delete the next 80, then repeat that process to the end of the file.
What would be the easiest way to do this?
I'm looking at these files on a Linux server, so running a script of some kind would be the easiest way, or maybe someone knows a way to do this in vi? (Hence the lame tagging.)
Thanks in advance
awk can do this fairly easily: (NR-1)%100 < 20 is true for the first 20 lines of every 100-line block, and awk prints the lines for which the condition holds.
awk '(NR-1)%100 < 20' bigfile.txt
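If the block sizes ever change, the same idea parameterizes cleanly; keep and drop here are hypothetical variable names:
awk -v keep=20 -v drop=80 '(NR-1) % (keep+drop) < keep' bigfile.txt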
I would go with the awk solution, but here's one way you could do the same thing with GNU sed. seq 20 generates the sed script 1~100p, 2~100p, ..., 20~100p; the first~step address is a GNU extension that selects every step-th line starting at line first:
seq 20 | sed 's/$/~100p/' | sed -nf - bigfile.txt
Testing:
seq 20 | sed 's/$/~100p/' | sed -nf - <(seq 200)
Output:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120

computer basics problem

Hi everybody, can anyone tell me the answer to this question?
I created a simple txt file. It contains only two words, and the words are "hello word". According to what I studied, the computer uses ASCII code to store text on disk or in memory. In ASCII, each letter or symbol is represented by one byte; in simple words, one byte is used to store one symbol.
Now the problem is this: whenever I look at the size of the file, it shows 11 bytes. I count 9 bytes for the words plus one byte for the space, making a total of 10, so why is it showing a size of 11 bytes? I tried different things, such as changing the name of the file and saving it with the shortest or longest name possible, but that did not change the total storage.
So can anybody explain why this is happening? I tried this on both Windows and Linux (Ubuntu, CentOS) systems; the result is the same.
pax> echo hello word >outfile.txt
pax> ls -al outfile.txt
-rw-r--r-- 1 pax pax 11 2010-11-19 15:34 outfile.txt
pax> od -xcb outfile.txt
0000000 6568 6c6c 206f 6f77 6472 000a
h e l l o w o r d \n
150 145 154 154 157 040 167 157 162 144 012
pax> hd outfile.txt
00000000 68 65 6c 6c 6f 20 77 6f 72 64 0a |hello word.|
0000000b
As per above, you're storing "hello word" and the newline character. That's 11 characters in total. If you don't want the newline, you can use something like the -n option of echo (which doesn't add the newline):
pax> echo -n hello word >outfile.txt
pax> ls -al outfile.txt
-rw-r--r-- 1 pax pax 10 2010-11-19 15:36 outfile.txt
pax> od -xcb outfile.txt
0000000 6568 6c6c 206f 6f77 6472
h e l l o w o r d
150 145 154 154 157 040 167 157 162 144
pax> hd outfile.txt
00000000 68 65 6c 6c 6f 20 77 6f 72 64 |hello word|
0000000a
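Since echo -n is not portable across shells, printf is a safer way to write the string without the trailing newline; a minimal alternative:
printf 'hello word' > outfile.txt
This writes exactly the 10 bytes shown in the hd dump above.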
If you want to see the content of the file, you can perform an octal dump of it using the "od" command under Linux. Most probably what you will see is an LF (line feed), possibly preceded by a CR (carriage return) on Windows.
The name of the file has nothing to do with its size.
Luis
Did you add a new line (\n) in the text file? Just because this character cannot be seen does not mean it is not there.

unexpected result from gnu sort

When I try to sort the following text file 'input':
test1 3
test3 2
test 4
with the command
sort input
the output is exactly the input. Here is the output of od -bc input:
0000000 164 145 163 164 061 011 063 012 164 145 163 164 063 011 062 012
t e s t 1 \t 3 \n t e s t 3 \t 2 \n
0000020 164 145 163 164 011 064 012
t e s t \t 4 \n
0000027
It's just a tab-separated file with two columns. When I do
sort -k 2
the output changes to:
test3 2
test1 3
test 4
which is what I would expect. But if I do
sort -k 1
nothing changes with respect to the input, whereas I would expect 'test' to sort before 'test1'. Finally, if I do
cat input | cut -f 1 | sort
I get
test
test1
test3
as expected. Is there a logical explanation for this? What exactly is sort supposed to do by default? Something like sort -k 1?
My version of sort:
sort (GNU coreutils) 7.4
From the man pages:
*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
So it seems export LC_ALL=C should help. In most locales, collation essentially ignores the tab, so the lines compare as test13 < test32 < test4, which happens to be exactly the input order; with LC_ALL=C, lines are compared by raw byte values instead.
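To illustrate: under byte-order collation the tab (octal 011) sorts before the digits, so 'test' now comes first:
$ LC_ALL=C sort input
test 4
test1 3
test3 2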
