tcp packet/receive and compared numbers, optimization learning question


I use this code on an embedded device to receive a chapter number (the number after the "/" sign); if it is greater than 01, the script executes an action:
echo -n "REMOTE QCH" | /tmp/nc 0.0.0.0 48360 > /tmp/QCH
sleep 1s
a=$(cat /tmp/QCH | grep -o '[^/":]\+$' | grep -o '[[:digit:]]*')
if [ "$a" -gt "01" ]; then
echo "action"
fi
This line sends a TCP packet, receives the reply and saves it to /tmp/QCH. The reply contains values such as 01/12, 04/18, ...
echo -n "REMOTE QCH" | /tmp/nc 0.0.0.0 48360 > /tmp/QCH
Everything works fine (I wrote the code myself), but is it well optimized? Could it be faster or better?
Greetings

No, it is not optimized. Bash is, in general, slower than a compiled language such as C. Pipes also cost resources, since each stage is a separate process; most, if not all, of them could be removed. grep and regular expressions can also be expensive; replacing them with exact string matches, where possible, is almost always faster (I'm not sure it is possible in this case). Not storing intermediate variables might also save a trivial amount of memory.
Another issue, regarding correctness, is that /tmp/QCH might change during that one second, which would break the cat.
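A minimal sketch of the pipe-free version, assuming the reply is a single line like 01/12. It keeps the reply in a variable, so there is no temp file for anything else to overwrite. Because nc runs in the foreground, the reply is already complete when it exits, so the sleep 1s is not needed (assuming your nc exits once the server closes the connection; some versions may need a timeout option). The helper name should_act is hypothetical.

```shell
#!/bin/bash
# Sketch only: parse a one-line reply such as "01/12" with parameter expansion
# instead of cat | grep | grep. should_act is a hypothetical helper name.
should_act() {
    local reply=$1 rest n
    rest=${reply##*[/\":]}          # text after the last /, " or :
    n=${rest//[!0-9]/}              # keep only the digits
    [[ -n $n ]] && (( 10#$n > 1 ))  # 10# forces base 10, so "08" is not read as octal
}

# Usage (not run here; /tmp/nc and port 48360 come from the question):
#   reply=$(printf 'REMOTE QCH' | /tmp/nc 0.0.0.0 48360)
#   if should_act "$reply"; then echo "action"; fi
```

The only process left is nc itself; the parsing and the comparison are builtins.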

Related

Bash split stdin by null and pipe to pipeline

I have a stream that is null delimited, with an unknown number of sections. For each delimited section I want to pipe it into another pipeline until the last section has been read, and then terminate.
In practice, each section is very large (~1GB), so I would like to do this without reading each section into memory.
For example, imagine I have the stream created by:
for I in {3..5}; do seq $I; echo -ne '\0';
done
I'll get a stream that looks like this when piped through cat -v:
1
2
3
^@1
2
3
4
^@1
2
3
4
5
^@
I would like to pipe each section through paste -sd+ | bc, so I get a stream that looks like:
6
10
15
This is simply an example. In actuality the stream is much larger and the pipeline is more complicated, so solutions that don't rely on streams are not feasible.
I've tried something like:
set -eo pipefail
while head -zn1 | head -c-1 | ifne -n false | paste -sd+ | bc; do :; done
but I only get
6
10
If I leave off bc I get
1+2+3
1+2+3+4
1+2+3+4+5
which is basically correct. This leads me to believe that the issue is potentially related to buffering and the way each process is actually interacting with the pipes between them.
Is there some way to fix the way that these commands exchange streams so that I can get the desired output? Or, alternatively, is there a way to accomplish this with other means?
In principle this is related to this question, and I could certainly write a program that reads stdin into a buffer, looks for the null character, and pipes the output to a spawned subprocess, as the accepted answer does for that question. Given the general support of streams and null delimiters in bash, I'm hoping to do something that's a little more "native". In particular, if I want to go this route, I'll have to escape the pipeline (paste -sd+ | bc) in a string instead of just letting the same shell interpret it. There's nothing too inherently bad about this, but it's a little ugly and will require a bunch of somewhat error prone escaping.
Edit
As was pointed out in an answer, head makes no guarantees about how much it buffers. Unless it buffers only a single byte at a time, which would be impractical, this will never work. Thus, it seems the only solution would be to read each section into memory, or to write a specific program.

The issue with your original code is that head doesn't guarantee that it won't read more than it outputs. Thus, it can consume more than one (NUL-delimited) chunk of input, even if it's emitting only one chunk of output.
read, by contrast, guarantees that it won't consume more than you ask it for.
set -o pipefail
while IFS= read -r -d '' line; do
bc <<<"${line//$'\n'/+}"
done < <(build_a_stream)
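The substitution above works with build_a_stream, whose sections have no trailing newline. The question's seq-based stream ends every section with a newline, which would leave a trailing + for bc. A minimal pure-bash sketch that strips it and does the sum with arithmetic expansion instead of bc; it assumes trusted, purely numeric input, because bash arithmetic on untrusted text can be abused to run commands. The function name sum_sections is hypothetical.

```shell
#!/bin/bash
# Sketch: sum each NUL-delimited section of newline-separated integers.
# Assumes trusted input: the section text is evaluated as arithmetic.
sum_sections() {
    local section expr
    # "|| [[ -n $section ]]" also handles a final section with no trailing NUL
    while IFS= read -r -d '' section || [[ -n $section ]]; do
        section=${section%$'\n'}     # drop the newline that seq leaves at the end
        expr=${section//$'\n'/+}     # "1\n2\n3" becomes "1+2+3"
        printf '%s\n' "$(( expr ))"  # an empty section evaluates to 0
    done
}

# Usage with the question's stream:
#   for I in {3..5}; do seq "$I"; printf '\0'; done | sum_sections
```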
If you want native logic, there's nothing more native than just writing the whole thing in shell.
Calling external tools -- including bc, cut, paste, or others -- involves a fork() penalty. If you're only processing small amounts of data per invocation, the efficiency of the tools is overwhelmed by the cost of starting them.
while read -r -d '' -a numbers; do # read up to the next NUL into an array
sum=0 # initialize an accumulator
for number in "${numbers[@]}"; do # iterate over that array
(( sum += number )) # ...using an arithmetic context for our math
done
printf '%s\n' "$sum"
done < <(build_a_stream)
For all of the above, I tested with the following build_a_stream implementation:
build_a_stream() {
local i j IFS=$'\n'
local -a numbers
for ((i=3; i<=5; i++)); do
numbers=( )
for ((j=0; j<=i; j++)); do
numbers+=( "$j" )
done
printf '%s\0' "${numbers[*]}"
done
}

As discussed, the only real solution seemed to be writing a program specifically for this. I wrote one in Rust called xstream-util. After installing it with cargo install xstream-util, you can pipe the input into
xstream -0 -- bash -c 'paste -sd+ | bc'
to get the desired output
6
10
15
It doesn't avoid having to run the program in bash, so it still needs escaping if the pipeline is complicated. Also, it currently only supports single byte delimiters.

Bash running time optimization

I am trying to solve an optimization problem and to find the most efficient way of performing the following commands:
whois -> sed -> while (exit while) ->perform action
The while loop currently looks like this:
while [ "$x" -eq "$smth" ]; do
x=$((x+1))
done
some action
Maybe it would be more efficient to use while true with an if inside (with the same condition as the while). Also, what is the best way in bash to measure the time required by every single step?

By far the biggest performance penalty, and the most common performance problem, in Bash is unnecessary forking.
while [[ something ]]
do
var+=$(echo "$expression" | awk '{print $1}')
done
will be thousands of times slower than
while [[ something ]]
do
var+=${expression%% *}
done
The former causes at least two forks per iteration (a subshell for the command substitution, plus awk), while the latter causes none.
Things that cause forks include but are not limited to pipe | lines, $(command expansion), <(process substitution), (explicit subshells), and using any command not listed in help (which type somecmd will identify as 'builtin' or 'shell keyword').
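For the timing part of the question, a sketch that measures the two loops above with bash's time keyword; TIMEFORMAT controls what time prints. The iteration count, the sample string and the function names are arbitrary choices for illustration, and the actual numbers depend on your machine.

```shell
#!/bin/bash
# Sketch: time a forking loop against a fork-free one with bash's `time`.
TIMEFORMAT='%R seconds'           # make `time` print only elapsed real time
expression="hello world"          # sample data (assumption)
iterations=100                    # arbitrary iteration count

with_forks() {
    local i var=
    for (( i = 0; i < iterations; i++ )); do
        var+=$(echo "$expression" | awk '{print $1}')  # subshell + pipeline each pass
    done
    printf '%s\n' "${#var}"
}

without_forks() {
    local i var=
    for (( i = 0; i < iterations; i++ )); do
        var+=${expression%% *}    # parameter expansion: no new process
    done
    printf '%s\n' "${#var}"
}

time with_forks
time without_forks
```

Both functions build the same string; only the cost of getting there differs. The same keyword works on any single step, e.g. time whois example.com > /dev/null.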

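Well, for starters you could use the arithmetic command (( x++ )) instead of x=$((x+1)). Strictly speaking, $((x+1)) is arithmetic expansion, not command substitution, so it does not create a subshell; the difference is mostly readability.
while [ "$x" -eq "$smth" ]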
do
(( x++ ))
done
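On the "while true with an if inside" idea, a sketch of both forms, neither of which forks. It uses x=$(( x + 1 )) rather than (( x++ )) because (( x++ )) returns exit status 1 when the old value of x is 0, which matters under set -e. The names count_with_condition, count_with_break and limit are hypothetical stand-ins for the question's loop and smth.

```shell
#!/bin/bash
# Sketch: two fork-free ways to write the question's counting loop.
count_with_condition() {        # test in the while header
    local x=$1 limit=$2
    while (( x < limit )); do
        x=$(( x + 1 ))          # arithmetic expansion: no subshell
    done
    printf '%s\n' "$x"
}

count_with_break() {            # while true with an if inside
    local x=$1 limit=$2
    while true; do
        if (( x >= limit )); then break; fi
        x=$(( x + 1 ))
    done
    printf '%s\n' "$x"
}
```

Any difference between the two is small compared with a single fork; prefix a call with time (e.g. time count_with_break 0 100000) to measure it on your system rather than guess.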

Bash: Too many arguments

I've coded the following script to add users from a text file. It works, but I'm getting an error that says "too many arguments"; what is the problem?
#!/bin/bash
file=users.csv
while IFS="," read USRNM DOB SCH PRG PST ENROLSTAT ; do
if [ $ENROLSTAT == Complete ] ;
then
useradd $USRNM -p $DOB
else
echo "User $USRNM is not fully enrolled"
fi
done < $file
#cat users.csv | head -n 2 | tail -n 1

Use quotes. Liberally.
if [ "$ENROLSTAT" = Complete ]
(It's a single equal sign in POSIX test, too.) My greatest problem in shell programming is always hidden spaces. It's one of the reasons I write so much in Perl, and why, in Perl, I tell everyone on my team to avoid the shell whenever running external programs. There is so much power in the shell, with so many little things that can trip you up, that I avoid it where possible. (But not where it isn't possible.)
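With $ENROLSTAT unquoted, a status that contains a space (for example "In progress") splits into several words, so [ receives too many arguments. A sketch of the question's loop with every expansion quoted; to keep it safe to try, it prints what it would do instead of calling useradd. The function name list_users and the CRLF stripping are assumptions, not part of the original script.

```shell
#!/bin/bash
# Sketch: the question's loop with every expansion quoted.
list_users() {
    local usrnm dob sch prg pst enrolstat
    while IFS=, read -r usrnm dob sch prg pst enrolstat; do
        enrolstat=${enrolstat%$'\r'}   # assumption: the CSV may have CRLF line endings
        if [ "$enrolstat" = Complete ]; then
            # Real script: useradd -p "$dob" "$usrnm"
            # (useradd -p expects an already-encrypted password)
            printf 'add %s\n' "$usrnm"
        else
            printf 'User %s is not fully enrolled\n' "$usrnm"
        fi
    done
}

# Usage: list_users < users.csv
```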

Shell script takes a list of commands as input, tries to execute them, and fails

I am, like many non-engineers or non-mathematicians who try writing algorithms, an intuitive. My exact psychological typology makes it quite difficult for me to learn anything serious like computers or math. Generally, I prefer audio, because I can engage my imagination more effectively in the learning process.
That said, I am trying to write a shell script that will help me master Linux. To that end, I copied and pasted a list of Linux commands from the index of the book Python In a Nutshell on the O'Reilly website. I doubt they'll mind, and I thank them for providing it. The commands are in the text file 'massivelistoflinuxcommands', not included in full below to save space...
OK, now comes the fun part. How do I get this script to work?
#/bin/sh
read -d 'massivelistoflinuxcommands' commands <<EOF
accept
bison
bzcmp
bzdiff
bzgrep
bzip2
bzless
bzmore
c++
lastb
lastlog
strace
strfile
zmore
znew
EOF
for i in $commands
do
$i --help | less | cat > masterlinuxnow
text2wave masterlinuxnow -o ml.wav
done

It really helps when you include error messages or specific ways that something deviates from expected behavior.
However, your problem is here:
read -d 'massivelistoflinuxcommands' commands <<EOF
It should be:
read -d '' commands <<EOF
read -d takes a single delimiter character: only the first character of the string you pass is used. So read stops after "bzc", because the next character is "m", which matches the "m" at the beginning of "massive...".
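A small sketch that reproduces the truncation with a shortened list, then shows mapfile (bash 4 or later) as one alternative that loads one name per line into an array. The variable names broken and commands are illustrative.

```shell
#!/bin/bash
# Reproduce the bug: only the first character of the -d argument ('m') counts,
# so read stops right before the "m" in "bzcmp".
read -d 'massivelistoflinuxcommands' broken <<'EOF'
accept
bison
bzcmp
EOF
printf 'broken read got: %q\n' "$broken"

# One alternative: mapfile reads one command name per line into an array.
mapfile -t commands <<'EOF'
accept
bison
bzcmp
EOF
printf 'mapfile got %d names\n' "${#commands[@]}"
for c in "${commands[@]}"; do
    printf '%s\n' "$c"
done
```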
Also, I have no idea what this is supposed to do:
$i --help | less | cat > masterlinuxnow
but it probably should be:
$i --help > masterlinuxnow
However, you should be able to pipe directly into text2wave and skip creating an intermediate file:
$i --help | text2wave -o ml.wav
Also, you may want to prevent each file from overwriting the previous one:
$i --help | text2wave -o ml-$i.wav
That will create files named like "ml-accept.wav" and "ml-bison.wav".
I would point out that if you're learning Linux commands, you should prioritize them by frequency of use and/or relevance to a beginner. For example, you probably won't be using bison right away.

The first problem here is that not every command has a --help option. In fact, the very first command, accept, has no such option! A better approach might be running man on each command, since a manual page is more likely to exist for each of them. Thus change:
$i --help | less | cat > masterlinuxnow
to
man $i >> masterlinuxnow
Note that it is essential to use the append operator ">>" instead of the create operator ">" in this loop. Using ">" recreates the file "masterlinuxnow" on each iteration, so it would contain only the output of the last "man $i" processed.
You also need to worry about whether the command exists on your version of Linux (many commands are not included in the standard distribution, or may have different names). Thus you probably want something more like this, where the number after head -n is how many lines of the --help output you want to keep (2 here):
if command -v "$i" >/dev/null 2>&1
then
"$i" --help | head -n 2 >> masterlinuxnow
fi
And instead of the read command, simply define the variable commands like so:
commands="
bison
bzcmp
bzdiff
bzgrep
bzip2
bzless
bzmore
c++
lastb
lastlog
strace
strfile
zmore
znew
"
Putting this all together, the following script works quite nicely:
commands="
bison
bzcmp
bzdiff
bzgrep
bzip2
bzless
bzmore
c++
lastb
lastlog
strace
strfile
zmore
znew
"
for i in $commands
do
if command -v "$i" >/dev/null 2>&1
then
"$i" --help 2>/dev/null | head -n 1 >> masterlinuxnow
fi
done

You're going to learn to use Linux by listening to help descriptions? I really think that's a bad idea.
Those help commands usually list every obscure option to a command, including many that you will never use, especially as a beginner.
A guided tutorial or book would be much better. It would present only the commands and options that will be most useful. For example, that list of commands you gave has many that I don't know, and I've been using Linux/Unix extensively for 10 years.

Easy parallelisation

I often find myself writing simple for loops to perform an operation to many files, for example:
for i in `find . | grep ".xml$"`; do bzip2 $i; done
It seems a bit depressing that on my 4-core machine only one core is being used. Is there an easy way I can add parallelism to my shell scripting?
EDIT: To give a bit more context to my problems (sorry I was not clearer to start with):
I often want to run simple(ish) scripts, such as plotting a graph, compressing or uncompressing, or running some program, on reasonably sized datasets (usually between 100 and 10,000). The scripts I use to solve such problems look like the one above, but might have a different command, or even a sequence of commands to execute.
For example, just now I am running:
for i in `find . | grep ".xml.bz2$"`; do find_graph -build_graph $i.graph $i; done
So my problems are in no way bzip specific! (Although parallel bzip does look cool, I intend to use it in future).

Solution: Use xargs to run in parallel (don't forget the -n option!)
find -name \*.xml -print0 | xargs -0 -n 1 -P 3 bzip2

This Perl program fits your needs fairly well; you would just do this:
runN -n 4 bzip2 `find . | grep ".xml$"`

GNU make has a nice parallelism feature (e.g. -j 5) that would work in your case. Create a Makefile (the recipe line must start with a tab):
all: $(patsubst %.xml,%.xml.bz2,$(shell find . -name '*.xml'))
%.xml.bz2 : %.xml
	bzip2 $<
then run
nice make -j 5
Replace '5' with some number, probably one more than the number of CPUs. You might want to nice it just in case someone else wants to use the machine while you are on it.

The answer to the general question is difficult, because it depends on the details of the things you are parallelizing.
On the other hand, for this specific purpose, you should use pbzip2 instead of plain bzip2 (chances are that pbzip2 is already installed, or at least in the repositories of your distro). See here for details: http://compression.ca/pbzip2/

I find this kind of operation counterproductive. The more processes access the disk at the same time, the higher the read/write time goes, so the whole job ends up taking longer. The bottleneck here won't be the CPU, no matter how many cores you have.
Haven't you ever copied two big files at the same time on the same hard drive? It is usually faster to copy one and then the other.
I know this task involves some CPU power (bzip2 is a demanding compression method), but try measuring the CPU load first before going down the "challenging" path we technicians tend to choose much more often than needed.

I did something like this for bash. The parallel make trick is probably a lot faster for one-offs, but here is the main code section to implement something like this in bash, you will need to modify it for your purposes though:
#!/bin/bash
# Replace NNN with the number of loops you want to run through
# and CMD with the command you want to parallel-ize.
set -m
nodes=`grep processor /proc/cpuinfo | wc -l`
job=($(yes 0 | head -n $nodes | tr '\n' ' '))
isin()
{
local v=$1
shift 1
while (( $# > 0 ))
do
if [ $v = $1 ]; then return 0; fi
shift 1
done
return 1
}
dowait()
{
while true
do
nj=( $(jobs -p) )
if (( ${#nj[@]} < nodes ))
then
for (( o=0; o<nodes; o++ ))
do
if ! isin ${job[$o]} ${nj[*]}; then let job[o]=0; fi
done
return;
fi
sleep 1
done
}
let x=0
while (( x < NNN ))
do
for (( o=0; o<nodes; o++ ))
do
if (( job[o] == 0 )); then break; fi
done
if (( o == nodes )); then
dowait;
continue;
fi
CMD &
let job[o]=$!
let x++
done
wait

If you had to solve the problem today you would probably use a tool like GNU Parallel (unless there is a specialized parallelized tool for your task like pbzip2):
find . | grep ".xml$" | parallel bzip2
To learn more:
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

I think you could do the following:
for i in `find . | grep ".xml$"`; do bzip2 "$i" & done
But that would spin off as many processes as you have files, all at once, which isn't as optimal as running just four processes at a time.
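To cap the number of simultaneous jobs without the polling loop from the longer bash answer, a sketch using bash's wait -n (bash 4.3 or later), which returns as soon as any one background job exits. The function name run_parallel is hypothetical, and it is a plain-bash alternative, not a replacement for xargs -P or GNU Parallel.

```shell
#!/bin/bash
# Sketch: run a command once per argument, keeping at most max_jobs running.
run_parallel() {
    local max_jobs=$1 cmd=$2 running=0 arg
    shift 2
    for arg in "$@"; do
        if (( running >= max_jobs )); then
            wait -n                   # block until any one background job exits
            running=$(( running - 1 ))
        fi
        "$cmd" "$arg" &               # start the next job in the background
        running=$(( running + 1 ))
    done
    wait                              # collect the jobs that are still running
}

# Usage: run_parallel 4 bzip2 ./*.xml
```

Passing files as arguments (from a glob) avoids the word splitting that `find` inside backticks suffers from; nproc from GNU coreutils can supply the core count, e.g. run_parallel "$(nproc)" bzip2 ./*.xml.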
