I've seen a lot of answers and comments on Stack Overflow that mention doing something to avoid a subshell. In some cases a functional reason for this is given (most often, the potential need to read a variable outside the subshell that was assigned inside it), but in other cases the avoidance seems to be viewed as an end in itself. For example:

union of two columns of a tsv file -- suggesting { ... ; } | ... rather than ( ... ) | ..., so there's a subshell either way.

unhide hidden files in unix with sed and mv commands

Linux bash script to copy files -- explicitly stating, "the goal is just to avoid a subshell"

Why is this? Is it for style/elegance/beauty? For performance (avoiding a fork)? For preventing likely bugs? Something else?
There are a few things going on.
First, forking a subshell might be unnoticeable when it happens only once, but if you do it in a loop, it adds up to a measurable performance impact. The performance impact is also greater on platforms such as Windows, where forking is not as cheap as it is on modern Unix-likes.
Second, forking a subshell means you have more than one context, and information is lost in switching between them -- if you change your code to set a variable in a subshell, that variable is lost when the subshell exits. Thus, the more your code has subshells in it, the more careful you have to be when modifying it later to be sure that any state changes you make will actually persist.
See BashFAQ #24 for some examples of surprising behavior caused by subshells.
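A minimal illustration of that state loss (this is essentially the BashFAQ #24 situation): the while loop below sits on the right-hand side of a pipe, so bash runs it in a subshell, and the increments vanish when that subshell exits.

count=0
printf '%s\n' one two three | while read -r line; do
    count=$((count + 1))   # increments happen inside the pipeline's subshell
done
echo "$count"              # prints 0 -- the parent shell never saw the changes

Common workarounds are shopt -s lastpipe (bash 4.2+, with job control off, as in scripts) or feeding the loop from process substitution instead of a pipe.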
Sometimes examples are helpful.
f='fred';y=0;time for ((i=0;i<1000;i++));do if [[ -n "$( grep 're' <<< $f )" ]];then ((y++));fi;done;echo $y
real 0m3.878s
user 0m0.794s
sys 0m2.346s
1000
f='fred';y=0;time for ((i=0;i<1000;i++));do if [[ -z "${f/*re*/}" ]];then ((y++));fi;done;echo $y
real 0m0.041s
user 0m0.027s
sys 0m0.001s
1000
f='fred';y=0;time for ((i=0;i<1000;i++));do if grep -q 're' <<< $f ;then ((y++));fi;done >/dev/null;echo $y
real 0m2.709s
user 0m0.661s
sys 0m1.731s
1000
As you can see, in this case, the difference between using grep in a subshell and parameter expansion to do the same basic test is close to 100x in overall time.
Following the question further, and taking into account the comments below (which do not demonstrate what they set out to demonstrate), I checked the following code:
https://unix.stackexchange.com/questions/284268/what-is-the-overhead-of-using-subshells
time for((i=0;i<10000;i++)); do echo "$(echo hello)"; done >/dev/null
real 0m12.375s
user 0m1.048s
sys 0m2.822s
time for((i=0;i<10000;i++)); do echo hello; done >/dev/null
real 0m0.174s
user 0m0.165s
sys 0m0.004s
This is actually far, far worse than I expected: almost two orders of magnitude slower in overall time, and almost three orders of magnitude slower in sys call time, which is absolutely incredible.
https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html
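As an aside, if you are not sure whether a given name is a builtin, a shell keyword, or an external program, type -t will tell you:

type -t echo    # prints: builtin
type -t grep    # prints: file
type -t time    # prints: keyword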
Note the point of demonstrating this: it is very easy to fall into the habit of testing things via a subshell running grep, sed, or gawk (or even a bash builtin like echo) -- for me it's a bad habit I tend to fall into when hacking fast -- and it's worth realizing that this carries a significant performance hit, so it's probably worth the time to avoid those constructs when bash builtins can handle the job natively.
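A couple of common substitutions, as a sketch (the variable names are made up); each first form forks at least one extra process, each second form stays inside the current shell:

# substring test: subshell grep vs. builtin pattern match
[[ -n "$(grep 're' <<< "$f")" ]]      # forks a subshell plus grep
[[ $f == *re* ]]                      # pure builtin

# strip a directory prefix: external basename vs. parameter expansion
name=$(basename "$path")              # forks a subshell plus basename
name=${path##*/}                      # pure builtin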
By carefully reviewing a large program's use of subshells and replacing them with other methods where possible, I was able to cut about 10% of the overall execution time in a just-completed set of optimizations (not the first, and not the last, time I have done this; the program has already been optimized several times, so gaining another 10% is actually quite significant).
So it's worth being aware of.
Because I was curious, I wanted to confirm what 'time' is telling us here:
https://en.wikipedia.org/wiki/Time_(Unix)
The total CPU time is the combination of the amount of time the CPU or CPUs spent performing some action for a program and the amount of time they spent performing system calls for the kernel on the program's behalf. When a program loops through an array, it is accumulating user CPU time. Conversely, when a program executes a system call such as exec or fork, it is accumulating system CPU time.
As you can see, particularly in the echo loop test, the cost of the forks is very high in terms of system calls to the kernel; those forks really add up (700x more time spent on sys calls!).
I'm in an ongoing process of resolving some of these issues, so these questions are quite relevant to me and to the global community of users who like the program in question. That is, this is not an arcane academic point for me; it's real-world, with real impacts.
Well, here's my interpretation of why this is important: it's answer #2!
No performance gain is too small to matter, even when it's just about avoiding one subshell… Call me Mr Obvious, but the concept behind that thinking is the same one that's behind avoiding useless use of <insert tool here>, like cat|grep, sort|uniq, or even cat|sort|uniq, etc.
That concept is the Unix philosophy, which ESR summed up well by a reference to KISS: Keep It Simple, Stupid!
What I mean is that when you write a script, you never know how it may end up being used, so every byte or cycle you can spare matters: if your script ends up chewing through billions of lines of input, it will be better off by that many fewer forks/bytes/…
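A couple of the classic substitutions being alluded to (the file names are placeholders):

# useless use of cat: an extra process and an extra pipe
cat logfile | grep 'ERROR'
grep 'ERROR' logfile          # same result, one fork fewer

# two processes where one flag will do
sort names.txt | uniq
sort -u names.txt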
I think the general idea is it makes sense to avoid creating an extra shell process unless otherwise required.
However, there are too many situations where either can be used, and where one makes more sense than the other, to say that one way is better overall. It seems to me to be purely situational.
Related
I'm developing an app that may have to run many AppleScripts in sequence, and I am looking for any way to run AppleScripts faster. I'm not talking about the internals, just about the execution startup/cleanup.
Example:
onePlusOne.applescript:
1+1
I then compiled this with osacompile -o onePlusOne.scpt onePlusOne.applescript
Now, the test (in a bash shell):
time zsh -c "(( val = 1 + 1 ))":
real 0m0.013s
user 0m0.004s
sys 0m0.007s
time osascript onePlusOne.scpt:
real 0m0.054s
user 0m0.026s
sys 0m0.022s
I'm wondering if anything can be done about the additional overhead and to execute applescripts faster.
You don’t say what type of app, or what kind of AppleScripts, or whether the scripts are part of the app or user-supplied. Nor do you say if you’ve actually built a prototype and identified actual performance problems, or are talking purely hypotheticals at this point.
Assuming a typical sandboxed Swift/ObjC desktop app:
For arbitrary user-supplied AppleScripts, you usually want to use NSUserAppleScriptTask as that runs scripts outside the app’s own sandbox. One big limitation: its subprocesses aren’t persistent, so you can’t load a script once and call its handlers multiple times; you have to create a new subprocess each time, and if you want to preserve scripts’ state between calls then you’ll have to arrange something yourself.
Exception to the above: if your user-supplied scripts can run inside the limitations of your app’s sandbox then running them via NSAppleScript/OSAScript is an option, and those do allow you to load a script once and call it multiple times.
For built-in AppleScripts, use the AppleScript-ObjC bridge to expose your scripts as ObjC subclasses, and call their handlers directly. The overheads there are mostly in the time it takes to cross the bridge.
Otherwise, the overheads are the overheads; there’s usually not a lot you can do about them. TBH, your average AppleScript will spend more time waiting on IO (Apple event IPC is powerful but slow) or churning through pathological algorithms (e.g. iterating a list is generally O(n²) due to AS’s shonky internals, and most AppleScripters are amateurs so will write inefficient code). There may be ways to ameliorate poor runtime performance, but that is a far larger discussion.
One last consideration: with AppleScript-based automation, "Is it fast?" is less important than "Is it faster than doing the same task by hand?" And even a slow AppleScript will do a job 10–100x quicker than a fast human.
This question is concerned with the negative impacts of using external programs as opposed to built-in constructs -- specifically sed, and external programs in general.
My thinking is that in order to maximize compatibility across UNIX systems, one should use builtin commands. However, some programs are virtually standard. Consider this example:
# Both functions print an array definition for use in
# assignments, for loops, etc.
uses_external() {
    declare -p $1 \
        | sed -e "s/declare \-a [^=]*=\'\(.*\)\'\$/\1/" \
        | sed "s/\[[0-9]*\]\=//g"
}

uses_builtin() {
    local r=$( declare -p $1 )
    r=${r#declare\ -a\ *=}
    echo ${r//\[[0-9]\]=}
}
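For context, a hypothetical round trip with these helpers (the array name is made up, and this assumes a bash recent enough that declare -p does not single-quote the array value):

fruits=(apple banana cherry)
eval "copy=$(uses_builtin fruits)"    # copy is now a separate array with the same elements
printf '%s\n' "${copy[@]}"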
In terms of compatibility, is there much of a difference between uses_builtin() and uses_external()?
With regards to compatibility, is there a certain class of external programs that are nearly universal? Is there a resource that gives this kind of info? (For the example above, I had to read though many sources before I felt comfortable assuming that sed is a more compatible choice than awk or a second language.)
I really want to weigh the pros and cons, so feel free to point out other considerations between builtin commands and external programs (i.e. performance, robustness, support, etc.). Or is the question of "builtin vs external" generally a per-program matter?
Objectively speaking, using built-in commands is more efficient, since you don't need to fork any new processes for them. (Subjectively speaking, the overhead of such forking may be negligible.) On the other hand, a long sequence of built-in commands that could be subsumed by a single call to an external program may end up slower.
Use of built-ins may or may not produce more readable code. It depends on who is reading the code.
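A rough illustration of that trade-off (the file name is a placeholder, and timings will vary): a tight bash loop that processes input line by line with builtins can lose badly to one external process that streams the whole file.

# many builtin invocations: uppercase a file line by line in pure bash
while IFS= read -r line; do
    printf '%s\n' "${line^^}"          # ${var^^} needs bash 4+
done < big.txt > /dev/null

# one external process doing the same job, usually much faster on large input
tr '[:lower:]' '[:upper:]' < big.txt > /dev/null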
Counterintuitively, the builtin is slower for large data sets with your example; see Parameter expansion slow for large data sets.
I have a question that could use a theoretical answer.
I'm searching over a large, 100+TB volume for all files with a specific attribute. To do this, I've been using the "find" command, since it does everything that I want.
That is, except run in a reasonable amount of time. I realize that traversing a huge filesystem will be time-consuming in any case, but a possible solution occurred to me.
What if one were to just use ls and grep, recursively, where possible? Note: the function below is only a rough sketch for illustration.
my_ls() {
    # get a long listing of everything in the directory passed
    local var
    var=$(ls -lsa "$1")

    # recurse into each subdirectory
    local entry
    for entry in "$1"/*; do
        if [[ -d $entry ]]; then
            my_ls "$entry"
        fi
    done

    # search the lines output from ls for the attributes
    echo "$var" | grep "$searchstring"
}
Would this idea be faster overall than find for a large filesystem? The memory requirements could potentially get large quickly, but not too much so. It might also be possible to parallelize this, and offload the threads to a GPU for faster processing (not in bash I know, but in general).
Edit: Yes, I am quite dim for suggesting parallelization of an I/O-bound operation in most cases.
Using ls and grep is not only slower (adding overhead for forking, waiting, reading and writing to the pipeline, etc); it's also incorrect.
See http://mywiki.wooledge.org/ParsingLs for a description of why using ls in scripts is evil (in the "causes bugs, some of them security-exploitable" sense).
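For the record, a sketch of a recursive glob-based alternative that never parses ls at all (the search root and the attribute test are stand-ins for whatever you are actually matching on):

shopt -s globstar dotglob nullglob   # bash 4+: ** recurses into subdirectories
for f in /search/root/**; do
    [[ -O $f ]] && printf '%s\n' "$f"   # stand-in predicate: files owned by me
done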
I strongly suspect the overhead of spawning processes repeatedly would far outweigh how much resource a find would take. You should consider where the resource bottleneck is, and for navigating a filesystem, it's going to be the disk access. The CPU will be negligible.
I'm guessing no. Both are synchronous operations, but you have to start up a whole new process to recurse, which has its own overhead. If you're looking to speed up the operation, I would suggest using a map/reduce model.
Typically map/reduce is used when parsing file or database contents, but the idea can be adapted to your situation. Here's an introduction to map/reduce: http://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce/
EDIT:
As many have noted here, this is an IO bound process, and the typical implementation of map/reduce is a parallel system with many mappers and reducers, but this doesn't mean you can't benefit from splitting your task into a map function and a reduce function. The map/reduce model is still useful.
For what I'm proposing, the mapper should be one thread which recursively finds all files under a specified path. The reducer then evaluates whether the file is owned by the right user (or whatever predicate you have).
This decouples the IO from the evaluation, meaning the IO thread is never pausing to evaluate. This might only save you a microsecond per file, but on a large filesystem it could add up to significant savings.
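One way to sketch that split in shell terms (the path and the ownership predicate are placeholders): one process does nothing but walk the tree and emit paths, while a second process evaluates each path.

# "mapper": walks the filesystem and only emits NUL-delimited paths
# "reducer": reads each path and applies the predicate
find /search/root -print0 |
while IFS= read -r -d '' f; do
    [[ -O $f ]] && printf '%s\n' "$f"   # stand-in predicate: files owned by me
done

(In practice find can evaluate many predicates itself, e.g. find /search/root -user "$USER", which keeps everything in a single process.)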
What I'm describing is not EXACTLY the map/reduce people know and are comfortable with, but it's similar enough to be a useful starting point.
I will start with the question and then proceed to explain the need:
Given a single C++ source code file which compiles well in modern g++ and uses nothing more than the standard library, can I set up a single-task operating system and run it there?
EDIT: Following the comment by Chris Lively, I would have better asked: What's the easiest way you can suggest to tweak Linux into effectively giving me single-tasking behavior?
Nevertheless, it seems like I did get a good answer although I did not phrase my question well enough. See the second paragraph in sarnold's answer regarding the scheduler.
Motivation:
In some programming competitions the communication between a contestant's program and the grading program involves a huge number of very short interactions.
Thus, using getrusage to measure the time spent by a contestant's program is inaccurate because getrusage works by sampling a process at constant intervals (usually around once per 10ms) which are too large compared to the duration of each interaction.
Another approach to timing would be to measure time before and after the program is run using something like clock_gettime and then subtract the values. We should also subtract the amount of time spent on I/O, which can be done by intercepting printf and scanf using something like LD_PRELOAD and accumulating the time spent in each of these functions by checking the time just before and just after each call to printf/scanf (it's OK to require contestants to use these functions only for I/O).
The method proposed in the last paragraph is of course only valid assuming that the contestant's program is the only program running, which is why I want a single-tasking OS.

To run both the contestant's program and the grading program at the same time, I would need a mechanism which, when one of these programs tries to read input and blocks, runs the other program until it writes enough output. I still consider this to be single-tasking because the programs are not going to run at the same time; the "context switch" would occur only when it is needed.
Note: I am aware of the fact that there are additional issues to timing such as CPU power management, but I would like to start by addressing the issue of context switches and multitasking.
First things first, what I think would suit your needs best would actually be a language interpreter -- one that you can instrument to keep track of "execution time" of the program in some purpose-made units like "mems" to represent memory accesses or "cycles" to represent the speeds of different instructions. Knuth's MMIX in his The Art of Computer Programming may provide exactly that functionality, though Python, Ruby, Java, Erlang, are all reasonable enough interpreters that can provide some instruction count / cost / memory access costs should you do enough re-writing. (But losing C++, obviously.)
Another approach that might work well for you -- if used with extreme care -- is to run your programming problems in the SCHED_FIFO or SCHED_RR real-time processing class. Programs run in one of these real-time priority classes will not yield for other processes on the system, allowing them to dominate all other tasks. (Be sure to run your sshd(8) and sh(1) in a higher real-time class to allow you to kill runaway tasks.)
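A minimal sketch of that second approach on Linux, using util-linux's chrt (the binary name and priorities here are made up; you need root or CAP_SYS_NICE):

# run the contestant's program under SCHED_FIFO so ordinary (SCHED_OTHER)
# processes cannot preempt it; 50 is an arbitrary real-time priority
sudo chrt -f 50 ./contestant_binary < testcase.in > contestant.out

# as suggested above, keep a rescue shell at a higher real-time priority
# so a runaway contestant can still be killed:
#   sudo chrt -f 90 bash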
This is something I've always wondered about, so here goes.
When writing code, I was/am taught to space out lines, comment them, etc., to improve readability (as I guess most of us are). I obviously don't see this as any kind of problem, but it got me thinking: if all of this whitespace and all these commented sections are ignored by the compiler/interpreter or whatever else, how much does this impact its performance?
Admittedly, I don't know a lot about how a compiler operates - only the basic concepts. However, I have a fair idea that for one to be able to "ignore whitespace", it would first need to identify it (at least), and that takes work, and therefore time.
So then I thought, what about whitespace or comments at extreme levels? Say, millions or billions of sections of them?
I guess the question I'm asking is: At what point (ie. extreme level) will ignored sections of code impact a compiler's/interpreter's ability to produce a timely result and therefore impact on a user's experience?
Thanks.
Try this:
Do comments affect Perl performance?
Edit for comment.
Simple example using a hello world in Scheme with varying zillions of comment lines:
netbsd1# ls -l file*
-rw-r--r-- 1 root wheel 1061 Mar 11 00:01 file.out
-rw-r--r-- 1 root wheel 102041 Mar 11 00:01 file1.out
-rw-r--r-- 1 root wheel 10200041 Mar 11 00:01 file2.out
-rw-r--r-- 1 root wheel 1020000041 Mar 11 00:03 file3.out
netbsd1# for i in file*
> do
> echo $i
> time ./scm $i
> done
file.out
hello world
0.06s real 0.01s user 0.01s system
file1.out
hello world
0.03s real 0.01s user 0.02s system
file2.out
hello world
0.64s real 0.28s user 0.30s system
file3.out
hello world
61.36s real 11.78s user 41.10s system
netbsd1#
Clearly, the 1GB file had major impact which is not necessarily surprising considering I only have 512M of RAM on this box.
Also, this is interpreting/compiling speed. If you actually compiled these files, the runtimes would all be identical. You can draw your own conclusions about what counts as impact.
It will not affect the compiled output. However, please don't go for comment diarrhea; it will affect other programmers' performance.
It depends.
In compiled languages, compilation could take longer, but you probably don't care since this is done once.
In interpreted languages, there will be wasted load time, execution time and more memory usage if the interpreter keeps the text in memory.
In something like JavaScript being delivered to a browser, you don't only have to worry about parse time, but also transmitting all of those comments to your client's browser. Because of this, many people will run their scripts through a minifier that pulls out comments and does other tricks to reduce code size.
For severely over-commented code, plagued by comments emitted by code-generators, over-zealous revision control systems, and "religious commenters", I would actually worry more about the poor reader / reviewer that has to wade through all of that mostly useless and probably out of synch text to get to the code.
Compiling (and linking) is phase 1.
Execution is phase 2.
Since phase 1 is at least O(input length) you can expect phase 1 to take time proportional (at least) to the input length.
If the file length is under 10^4 lines it probably won't bother you too much.
If the file length is 10^12 lines it might take years, if something doesn't break first.
But that will not affect phase 2. What affects phase 2 is how much work the program does and how much it needs to do.
If you're talking about a compiled binary then there's exactly 0 impact on performance - they're just a sequence of instructions executed, whitespace doesn't really exist as a concept in this sense. If you're talking about interpreted languages then I guess theoretically millions of lines of whitespace could have a very slight impact on performance, but not enough to ever be noticeable.
In short, whilst an interesting question from an academic viewpoint, it's not something you should ever worry about, whether you're using a compiled or interpreted language. Always favour readability and comments. If you cite performance as a reason not to use whitespace or comments, future maintainers of your code will be out to get you!
Whitespace in source files has zero impact on a user's experience. Once the binary is compiled, that's that. It doesn't matter if the compiler took delta-t longer to parse your source code because there were millions of comments.
Whitespace affects performance when whitespace compiles into machine instructions. Luckily, most sane languages don't do this.