bash/unix toolchain binary stream processing/slicing


I have a binary stream on standard input in a fixed-size format: a continuous stream of packets, each with a header of length X and a body of length Y.
So if X=2 and Y=6, it looks something like 00abcdef01ghijkl02mnopqr03stuvwx, except that it's binary and both the header and the data can contain any "characters" (including '\0' and newline); the example is just for readability.
I want to get rid of the header data so that the output looks like this: abcdefghijklmnopqrstuvwx.
Are there any commands in the Unix toolchain that allow me to do this? And in general, are there any tools for handling binary data? The only tool I could think of is od/hexdump, but how do you convert the result back to binary?

Use xxd, which converts to and from a hexdump.
xxd -c 123 -ps
will output your stream with 123 bytes per line. To reverse use
xxd -r -p
You should now be able to combine this with cut to drop the header, since you can do something like
cut -c 3-
to get all characters from the 3rd to the end of each line. Remember to skip 2X characters, because each byte is written as two hex characters.
So something along the lines of
xxd -c X+Y -ps | cut -c 2X+1- | xxd -r -p
where X+Y and 2X+1 are replaced with actual numerical values. Feed your data stream into the above command on standard input.
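As a concrete sketch of the pipeline above for X=2 and Y=6 (the function name strip_headers is made up for illustration, and xxd is assumed to be installed):

```shell
# Hypothetical helper: strip_headers X Y
# Reads packets of X header bytes + Y body bytes on stdin, writes the bodies to stdout.
strip_headers() {
    hdr=$1
    body=$2
    xxd -p -c "$((hdr + body))" |    # one packet per line, two hex digits per byte
        cut -c "$((2 * hdr + 1))-" | # drop the first 2*X hex digits of each line
        xxd -r -p                    # turn the remaining hex back into bytes
}

printf '00abcdef01ghijkl02mnopqr03stuvwx' | strip_headers 2 6
# prints abcdefghijklmnopqrstuvwx (no trailing newline)
```

Because the data travels through the middle of the pipeline as hex text, NUL bytes and newlines in the packets are not a problem for cut.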

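Perl is a pretty standard Unix tool, so pipe the stream to perl. If the format is fixed-length and byte-aligned, a simple substr operation should work. Here is a Perl sample that should work:
#!/usr/bin/env perl
use strict;
use warnings;
binmode STDIN;    # treat input and output as raw bytes
binmode STDOUT;
my $buf;
my $len = 8;      # packet size: header (X) + body (Y)
my $off = 2;      # header size (X)
# read() keeps reading until it has $len bytes or hits EOF;
# sysread() may return short reads on a pipe and lose packet alignment
while (read(STDIN, $buf, $len)) {
    print substr($buf, $off) if length($buf) > $off;
}
exit 0;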

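As a one-liner, I'd write:
perl -0777 -ne 'while (/(?:..)(......)/sg) {print $1}'
Here -0777 slurps the whole input as a single record; paragraph mode (-00) together with chomp would drop bytes from runs of consecutive newlines in binary data.
example:
echo '00abcdef01ghijkl02mnopqr03stuvw
00abcdef01ghi
kl02mnopqr' | perl -0777 -ne 'while (/(?:..)(......)/sg) {print $1}' | od -c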
produces
0000000 a b c d e f g h i j k l m n o p
0000020 q r s t u v w \n a b c d e f g h
0000040 i \n k l m n o p q r
0000052

There's also bbe - binary block editor, which is kind of binary sed for handling binary data the Unix way.
http://bbe-.sourceforge.net

The Binary Stream Editor is a tool written in Java for handling streams.
It can be used from Java as well as from the command line.
https://sourceforge.net/projects/bistreameditor/
DISCLAIMER: I am the author of this tool.
Unlike newline-based tools such as sed, it allows custom traversal and data storage via a traverser and a buffer. Binary data can be treated as one-byte chars, so string operations and matches are allowed. It can write to multiple outputs and use different encodings. Because of this flexibility, the command line currently takes a lot of parameters, which need to be simplified.
Download the bse.zip file to use it.
For the above example, we simply need to do a substr(2) on each input of length 8. The full command line is
java -classpath "./bin:$CMN_LIB_PATH/commons-logging-1.1.1.jar:$CMN_LIB_PATH/commons-io-2.1.jar:$CMN_LIB_PATH/commons-jexl-2.1.1.jar:$CMN_LIB_PATH/commons-lang3-3.1.jar"
-Dinputsrc=file:/fullpathtofile|URL|System.in
-Dtraverser=org.milunsagle.io.streameditor.FixedLengthTraverser
-Dtraversercons=size -Dtraverserconsarg0=8
-Dbuffer=org.milunsagle.io.streameditor.CircularBuffer
-Dbuffercons=size -Dbufferconsarg0=8
-Dcommands='PRN V $$__INPUT.substring(2)'
org.milunsagle.io.streameditor.BinaryStreamEditorInvoker

Related

Transliteration in sed

I'm trying to convert Arabic numerals to Roman numerals using sed (just as a learning exercise), but I'm not getting the expected output.
The sed manual says
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
Input
echo "1 5 15 20" | sed 'y/151520/IVXVXX/'
Output
I V IV XX
Expected output
I V XV XX
I've tried replacing first X with any character, and the output is the same for each, so I gather that 1 is mapped to I by the sed program. However, according to the description of the y command, shouldn't the program be transliterating character by character? How would I do this?

I think you misunderstand the description. What y does is whenever any character from the left-hand side occurs in the input, replace it with the corresponding character from the right-hand side.
Specifying one character multiple times doesn't really make sense and I'm not sure the behavior of sed is defined in this case, although your version apparently takes the first occurrence and uses that.
To illustrate:
$ echo HELLO WORLD | sed 'y/L/x/'
HExxO WORxD
$ echo HELLO WORLD | sed 'y/LL/xy/'
HExxO WORxD
Fundamentally, your problem is that it's impossible to accomplish this task with just transliteration.
Your case illustrates that quite nicely: 15 is really 1 and 5 and sed has no way of distinguishing between the two.
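To see the difference, here is a small sketch contrasting y with a chain of s commands. The s chain is only an illustration for these four values (longest numbers first), not a general Roman-numeral converter; a number such as 51 would still come out wrong. The function names are made up for the example.

```shell
# y works on single characters: 1 -> I and 5 -> V, so "15" becomes "IV"
transliterate() { sed 'y/15/IV/'; }

# s works on whole strings; replacing the longest numbers first keeps "15" intact
substitute() { sed -e 's/20/XX/g' -e 's/15/XV/g' -e 's/5/V/g' -e 's/1/I/g'; }

echo "1 5 15 20" | transliterate   # I V IV 20
echo "1 5 15 20" | substitute      # I V XV XX
```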

You already got the "why" answer, but FYI here's how to do what you wanted using a standard UNIX tool:
$ echo "1 5 15 20" | awk '
BEGIN { split("1 5 15 20",a); split("I V XV XX",r); for (i in a) map[a[i]]=r[i] }
{ for (i=1; i<=NF; i++) $i=map[$i] }
1'
I V XV XX

Replace and increment letters and numbers with awk or sed

I have a string that contains
fastcgi_cache_path /var/run/nginx-cache15 levels=1:2 keys_zone=MYSITEP:100m inactive=60m;
One of the goals of this script is to increment the two-digit number after nginx-cache, based on the value found in the previous file. To do that I used this code:
# Replace cache_path
PREV=$(ls -t /etc/nginx/sites-available | head -n1) #find the previous cache_path number
CACHE=$(grep fastcgi_cache_path $PREV | awk '{print $2}' |cut -d/ -f4) #take the string to change
SUB=$(echo $CACHE |sed "s/nginx-cache[0-9]*[0-9]/&#/g;:a {s/0#/1/g;s/1#/2/g;s/2#/3/g;s/3#/4/g;s/4#/5/g;s/5#/6/g;s/6#/7/g;s/7#/8/g;s/8#/9/g;s/9#/#0/g;t a};s/#/1/g") #increment number
sed -i "s/nginx-cache[0-9]*/$SUB/g" $SITENAME #replace number
Maybe not so elegant, but it works.
The other goal is to increment the last letter of all occurrences of MYSITEx: MYSITEP, in this case, should become MYSITEQ, and so on; once MYSITEZ is reached, another letter should be added, giving MYSITEAA, MYSITEAB, etc.
I thought of something like:
sed -i "s/MYSITEP[A-Z]*/MYSITEGG/g" $SITENAME
but it can't work because MYSITEGG is a static value.
How can I take the last letter, increment it to the next one and, once Z is reached, add another letter?
Thank you!

Perl's auto-increment works on letters as well as digits, in exactly the manner you describe.
We may as well tidy up your nginx-cache increment while we're at it.
I assume SITENAME holds the name of the file to be modified?
It would look like this. I have to assign the capture $1 to an ordinary variable $n to increment it, as $1 is read-only:
perl -i -pe 's/nginx-cache\K(\d+)/ ++($n = $1) /e; s/MYSITE\K(\w+)/ ++($n = $1) /e;' $SITENAME
If you wish, this can be done in a single substitution, like this
perl -i -pe 's/(?:nginx-cache|MYSITE)\K(\w+)/ ++($n = $1) /ge' $SITENAME
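A quick way to try the substitution without touching any file is to drop -i and feed it a sample line on standard input. The sketch below assumes perl 5.10 or later (needed for \K); the function name bump is made up for illustration.

```shell
# Hypothetical helper: increment the suffix after nginx-cache and MYSITE on each input line
bump() {
    perl -pe 's/(?:nginx-cache|MYSITE)\K(\w+)/ ++($n = $1) /ge'
}

echo 'fastcgi_cache_path /var/run/nginx-cache15 levels=1:2 keys_zone=MYSITEP:100m inactive=60m;' | bump
# fastcgi_cache_path /var/run/nginx-cache16 levels=1:2 keys_zone=MYSITEQ:100m inactive=60m;

echo 'keys_zone=MYSITEZ:100m' | bump
# keys_zone=MYSITEAA:100m
```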

Note: The solution below is needlessly complicated, because as Borodin's helpful answer demonstrates (and @stevesliva's comment on the question hinted at), Perl directly supports incrementing letters alphabetically in the manner described in the question, by applying the ++ operator to a variable containing a letter (sequence); e.g.:
$ perl -E '$letters = "ZZ"; say ++$letters'
AAA
The solution below may still be of interest as an annotated showcase of how Perl's power can be harnessed from the shell, showing techniques such as:
use of s///e to determine the replacement string with an expression.
splitting a string into a character array (split //, "....")
use of the ord and chr functions to get the codepoint of a char., and convert a(n incremented) codepoint back to a char.
string replication (x operator)
array indexing and slices:
getting an array's last element ($chars[-1])
getting all but the last element of an array (@chars[0..$#chars-1])
A perl solution (in effect a re-implementation of what ++ can do directly):
perl -pe 's/\bMYSITE\K([A-Z]+)/
@chars = split qr(), $1; $chars[-1] eq "Z" ?
"A" x (1 + scalar @chars)
:
join "", @chars[0..$#chars-1], chr (1 + ord $chars[-1])
/e' <<'EOF'
...=MYSITEP:...
...=MYSITEZP:...
...=MYSITEZZ:...
EOF
yields:
...=MYSITEQ:... # P -> Q
...=MYSITEZQ:... # ZP -> ZQ
...=MYSITEAAA:... # ZZ -> AAA
You can use perl's -i option to replace the input file with the result
(perl -i -pe '...' "$SITENAME").
As Borodin's answer demonstrates, it's not hard to solve all tasks in the question using perl alone.
The s function's /e option allows use of a Perl expression for determining the replacement string, which enables sophisticated replacements:
$1 references the current MYSITE suffix in the expression.
@chars = split qr(), $1 splits the suffix into a character array.
$chars[-1] eq "Z" tests if the last suffix char. is Z
If so: The suffix is replaced with all As, with an additional A appended
("A" x (1 + scalar @chars)).
Otherwise: The last suffix char. is replaced with the following letter in the alphabet
(join "", @chars[0..$#chars-1], chr (1 + ord $chars[-1]))

sorting numerically by first row

I have a file with almost 900 lines in Excel that I've saved as a tab-delimited .txt file. I'd like to sort the text file by the numbers given in the first column (they range between 0 and 2250). The other columns contain both numbers and letters of varying length, e.g.
myfile.txt:
0251 abcd 1234,24 bcde
2240 efgh 2345,98 ikgpppm
0001 lkjsi 879,09 ikol
I've tried
sort -k1 -n myfile.txt > myfile_num.txt
but I just get an identical file with new name. I'd like to get:
myfile_num.txt
0001 lkjsi 879,09 ikol
0251 abcd 1234,24 bcde
2240 efgh 2345,98 ikgpppm
What am I doing wrong? I'm guessing that it's quite simple, but I'd appreciate any help I can get! I only know a little bash scripting, so it'd be nice if the script is a very simple one-liner that I can understand :)
Thanks :)

Use this to convert old Mac OS carriage return to newline:
tr '\r' '\n' < myfile.txt | sort

As stated here, you can have problems with this (and, as the other pseudo-follow-up-duplicate question you asked shows, you did):
tr '\r' '\n' < myfile.txt | sort -n
It works fine here on MSYS, but on some platforms you may have to add:
export LC_CTYPE=C
Otherwise tr will treat the file as text and will probably flag it as corrupt once it reaches the maximum line length.
Obviously I could not test it, but I'm confident it will solve the problem, given what I read in the linked answer.
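If the file really uses bare carriage returns as line endings, sort sees it as one single line, which would explain why the output looked identical to the input. A minimal sketch with made-up sample data (the function name sort_cr_lines and the use of LC_ALL=C for sort are assumptions for illustration):

```shell
# Hypothetical helper: convert CR line endings to LF, then sort numerically
sort_cr_lines() {
    tr '\r' '\n' | LC_ALL=C sort -n
}

# Three CR-terminated records, as an old Mac OS export would produce
printf '0251\tabcd\r2240\tefgh\r0001\tlkjsi\r' | sort_cr_lines
# 0001	lkjsi
# 0251	abcd
# 2240	efgh
```

Running od -c on the first few bytes of the file is a quick way to check whether the line endings are \r, \n or \r\n before choosing the conversion.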

A Python approach (Python 2 and 3 compatible), immune to all shell problems. It works well and is portable. I noticed that the input file has some 0x8C chars (exotic dots), which probably confuse the tr command.
That is handled properly below:
import csv, sys
# read the file as binary, as it is not really text
with open("Proteins.txt", "rb") as f:
    data = bytearray(f.read())
# replace 0x8c chars (and any other non-ASCII byte) by classical dots
for i, c in enumerate(data):
    if c > 0x7F:  # non-ascii: replace by dot
        data[i] = ord(".")
# convert to list of ASCII strings (split using the old Mac separator)
lines = "".join(map(chr, data)).split("\r")
# treat our lines as input for the CSV reader
cr = csv.reader(lines, delimiter='\t', quotechar='"')
# read all the rows into a list, skipping empty ones (e.g. after a trailing "\r")
rows = [row for row in cr if row]
# sort numerically on the first column; int() accepts leading zeros in strings,
# and a non-numeric key (e.g. a header) evaluates to False (0), so it goes to the top
rows = sorted(rows, key=lambda x: x[0].isdigit() and int(x[0]))
# write back the file as a nice, legal, ASCII TSV file
if sys.version_info < (3,):
    f = open("Proteins_sorted_2.txt", "wb")
else:
    f = open("Proteins_sorted_2.txt", "w", newline='')
cw = csv.writer(f, delimiter='\t', quotechar='"')
cw.writerows(rows)
f.close()

Caesar Cypher Code Not Working

I am meant to create a Caesar Cypher that takes in a parameter and shifts the code based on that parameter but my code messes up with the Upper Case and lower case.
So, it's meant to be like:
$ echo "I came, I saw, I conquered." | ./caesar.sh
V pnzr, V fnj, V pbadhrerq.
but I get:
V pnzr, V FnJ, V pBADHERrq.
My code is:
#!/bin/sh
if [ -z "$#" ];then
rotation=13;
else
rotation=$((# % 16));
fi
tr $(printf %${rotation}s | tr ' ' '.')\a-zA-Z a-zA-Z
How can I fix this?

You are rotating across the entire double-alphabet, 'a-zA-Z', so 's' maps to 'F':
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
                  |------------^
You apparently want to preserve case, so I would recommend that you apply two separate mappings: first, map 'a-z' to 'n-za-m' (or whatever, as appropriate for your input parameter). Then in the second pass, map capitals, 'A-Z' -> 'N-ZA-M'.
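For the default rotation of 13, the two mappings can also be written as a single tr call with both ranges listed side by side, which has the same effect as two passes because the lower-case and upper-case sets do not overlap. A minimal sketch (the function name rot13 is just for illustration):

```shell
# Rotate lower-case and upper-case letters separately by 13 places;
# everything else (spaces, punctuation) passes through unchanged
rot13() {
    tr 'a-zA-Z' 'n-za-mN-ZA-M'
}

echo "I came, I saw, I conquered." | rot13
# V pnzr, V fnj, V pbadhrerq.
```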

A basic adaptation of your scheme that works is:
rotation=$((${1:-13} % 26))
padding=$(printf "%${rotation}s" "" | tr ' ' '\001')
tr "${padding}a-z" "a-za-z" |
tr "${padding}A-Z" "A-ZA-Z"
This uses parameter expansion and arithmetic to determine the rotation.
It uses your basic mechanism for setting the padding, but uses Control-A instead of . as the padding character; you seldom have Control-A in your text.
The actual rotation commands deal with lower case separately from upper case.
With the script contained in a file script.sh, I got:
$ bash script.sh
I came, I saw, I conquered
Can you say SYZYGY after midnight?
V pnzr, V fnj, V pbadhrerq
Pna lbh fnl FLMLTL nsgre zvqavtug?
$ bash script.sh 3
I came, I saw, I conquered, and O, was it ever worthwhile!
Can you say SYZYGY after midnight? ABC...XYZ abc...xyz
L fdph, L vdz, L frqtxhuhg, dqg R, zdv lw hyhu zruwkzkloh!
Fdq brx vdb VBCBJB diwhu plgqljkw? DEF...ABC def...abc
$
Because of buffering in the pipeline, the first line of input was not pushed through to the second tr command when that line ended, which is why the output only appears after all the input.

How to line wrap output in bash?

I have a command which outputs in this format:
A
B
C
D
E
F
G
I
J
etc
I want the output to be in this format
A B C D E F G I J
I tried using ./script | tr "\n" " " but all it does is remove n from the output
How do I get all the output on one line (line-wrapped)?
Edit: I accidentally put in grep while asking the question. I removed
it. My original question still stands.

The grep is superfluous.
This should work:
./script | tr '\n' ' '
It did for me with a command al that lists its arguments one per line:
$ al A B C D E F G H I J
A
B
C
D
E
F
G
H
I
J
$ al A B C D E F G H I J | tr '\n' ' '
A B C D E F G H I J $

As Jonathan Leffler points out, you don't want the grep. The command you're using:
./script | grep tr "\n" " "
doesn't even invoke the tr command; it should search for the pattern "tr" in files named "\n" and " ". Since that's not the output you reported, I suspect you've mistyped the command you're using.
You can do this:
./script | tr '\n' ' '
but (a) it joins all its input into a single line, and (b) it doesn't append a newline to the end of the line. Typically that means your shell prompt will be printed at the end of the line of output.
If you want everything on one line, you can do this:
./script | tr '\n' ' ' ; echo ''
Or, if you want the output wrapped to a reasonable width:
./script | fmt
The fmt command has a number of options to control things like the maximum line length; read its documentation (man fmt or info fmt) for details.
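A short sketch of the single-line variants, using printf to stand in for ./script. The paste variant is not from the answer above; it is a swapped-in alternative that joins lines without leaving a trailing space. All function names are made up for illustration.

```shell
# Stand-in for ./script: prints one letter per line
letters() { printf '%s\n' A B C D; }

# tr turns every newline into a space, including the last one,
# so the trailing echo supplies the final newline
join_tr() { tr '\n' ' '; echo; }

# paste -s joins all input lines with the given delimiter and ends with a newline
join_paste() { paste -s -d ' ' -; }

letters | join_tr      # "A B C D " (note the trailing space)
letters | join_paste   # "A B C D"
```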

No need to use other programs, why not use Bash to do the job? (-- added in edit)
line=$(./script.sh)
set -- $line
echo "$*"
The set builtin here sets the positional parameters (the command-line arguments), and one of the default separators used to split $line is a newline. EDIT: This will overwrite any existing command-line arguments, but good coding practice would suggest that you reassign these to named variables early in the script.
When we use "$*" (note the quotes) it joins them all together again using the first character of IFS as the glue. By default that is a space.
tr is an unnecessary child process.
By the way, there is a command called script, so be careful of using that name.
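The way "$*" uses the first character of IFS as glue can be seen in isolation with a small sketch (join_args is a made-up helper, not part of the answer above):

```shell
# Hypothetical helper: join_args SEP ARG...
# The parentheses run the body in a subshell, so the caller's IFS and arguments are untouched
join_args() (
    IFS=$1
    shift
    printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
)

join_args ' ' A B C   # A B C
join_args ',' A B C   # A,B,C
```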

If I'm not mistaken, when a variable is expanded unquoted, the shell splits its value on whitespace (including newlines), and echo then prints the resulting words separated by single spaces:
tmp=$(./script.sh)
echo $tmp
results in
A B C D E F G H I J
whereas
tmp=$(./script.sh)
echo "$tmp"
results in
A
B
C
D
E
F
G
H
I
J
If needed, you can re-assign the output of the echo command to another variable:
tmp=$(./script.sh)
tmp2=$(echo $tmp)
The $tmp2 variable will then contain no newlines.
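The splitting is done by the shell before echo runs, which can be checked by counting the arguments a command actually receives. A minimal sketch (count_args is a made-up helper; the default IFS is assumed):

```shell
# Hypothetical helper: print how many arguments it received
count_args() { echo "$#"; }

tmp=$(printf 'A\nB\nC\n')

count_args $tmp     # 3: the unquoted expansion is split into three words
count_args "$tmp"   # 1: the quoted expansion stays one word, newlines included
```

Note that an unquoted expansion is also subject to filename globbing, so output containing characters such as * may expand unexpectedly.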
