The Unix philosophy teaches that we should develop small programs that do one thing well. It also teaches that we should separate policy from mechanism. I guess one way to take this is to design a text-based shell command first and build a GUI on top of that later (if desired).
I truly like the idea that small programs can be composed (piped together) into more complex systems. I also like the fact that simple, focused designs should theoretically need less maintenance than a monolithic system that binds all its rules together.
How sound would it be to program something (in Ruby or Python for example) that delegates some of its functionality to shell commands called straight from the code? Taking this a step further, does it make sense to deliberately design a shell command that is intended to be called directly from code (compiled or scripted)? Obviously, this would only make sense if the shell command had some worthy console use.
I can't say from my experience that this is a practice I've seen much of. More often than not, task-specific code relies on task-specific libraries. Of course, it's possible that, unbeknownst to me, I have made use of libraries which are actually just wrappers around shell commands. (Or rather the shell command is a wrapper around some library.)
The Unix paradigm is modularity. You should write your program as a bunch of modules, which can then be extracted into multiple programs if you want to. However, executing a new program whenever you'd like to make a function call is slow and impractical.
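A rough way to see that cost is to time the same small task as an in-process function and as a freshly spawned program. This is only a sketch: the helper names are made up for illustration, and a Python one-liner stands in for a real command-line tool.

```python
import subprocess
import sys
import time


def word_count(text):
    """The 'module' version: an ordinary in-process function."""
    return len(text.split())


def word_count_via_process(text):
    """The 'separate program' version: spawn a child process on every call.

    The child is a Python one-liner standing in for a small command-line tool.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(len(sys.stdin.read().split()))"],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return int(result.stdout)


def seconds_per_call(func, arg, repeats=5):
    """Average wall-clock time of func(arg) over a few repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    sample = "small programs that do one thing well"
    print("in-process:", seconds_per_call(word_count, sample))
    print("subprocess:", seconds_per_call(word_count_via_process, sample))
```

Both versions return the same answer; the difference is the process startup paid on every call, which matters most when the call sits inside a loop.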
At work we use Docker and docker-compose. Our developers need to start many containers locally and import a large database, and many services need to run together for development to be successful and easy.
So we define reusable functions as make targets to keep the code easier to maintain. Is there a better way than make to define and reuse many shell commands?
For us, due to network limitations, running Docker locally is the only option.
We solved this challenge and made our developers' lives easier by abstracting complex shell commands behind make targets. To organize the many targets that control our Docker infrastructure and containers, we split them among several files with the .mk extension.
There are about 40 make commands. Some of them are low level; some are meant to be called by developers to do certain tasks:
make launch_app
make import_db_script
make build_docker_images
But lately things have started to become slow. Make commands call other make commands internally, and each make call takes a significant amount of time, because each lower-level call has to go through all the defined .mk files and do some calculations, as make -d shows. It adds up to considerable overhead.
Is there any way to manage a set of complex shell commands using something other than make, while still being easy for our developers to call?
Thanks in advance.
Well, you could always just write your shell commands in a shell script instead of a makefile. Using shell functions, shell variables, etc. it can be managed. You don't give examples of how complex your use of make constructs is.
StackOverflow is not really a place to ask open-ended questions like "what's the best XYZ". So instead I'll treat this question as, "how can I speed up my makefiles".
To me it sounds like you just have poorly written makefiles. Again, you don't show any examples but it sounds like your rules are invoking lots of sub-makes (e.g., your rule recipes run $(MAKE) etc.) That means lots of processes invoked, lots of makefiles parsed, etc. Why don't you just have a single instance of make and use prerequisites, instead of sub-makes, to run other targets? You can still split the makefiles up into separate files then use include ... to gather them all into a single instance of make.
Also, if you don't need to rebuild the makefiles themselves you should be sure to disable the built-in rules that might try to do that. In fact, if you are just using make to run docker stuff you can disable all the built-in rules and speed things up a good bit. Just add this to your makefile:
MAKEFLAGS += -r
(see Options Summary for details of this option).
ETA
You don't say what version of GNU make you're using, or what operating system you're running on. You don't show any examples of the recipes you're using so we can see how they are structured.
The problem is that your issue, "things are slow", is not actionable, or even defined. As an example, the software I work on every day has 41 makefiles containing 22,500 lines (generated from cmake, which means they are not as efficient as they could be: they are generic makefiles and not using GNU make features). The time it takes for my build to run when there is nothing to actually do (so, basically the entire time is taken by parsing the makefiles), is 0.35 seconds.
In your comments you suggest you have 10 makefiles and 50 variables... I can't imagine how any detectable slowness could be caused by this size of makefile. I'm not surprised, given this information, that -r didn't make much difference.
So, there must be something about your particular makefiles which is causing the slowness: the slowness is not inherent in make. Obviously we cannot just guess what that might be. You will have to investigate this.
Use time make launch_app. How long does that take?
Now use time make -n launch_app. This will read all makefiles but not actually run any commands. How long does that take?
If make -n takes no discernible time then the issue is not with make, but rather with the recipes you've written and switching to a different tool to run those same recipes won't help.
If make -n takes a noticeable amount of time then something in your makefiles is slow. You should examine it for uses of $(shell ...) and possibly $(wildcard ...); those are where the slowness will happen. You can add $(info ...) statements around them to get output before and after they run: maybe they're running lots of times unexpectedly.
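As a first pass before adding $(info ...) statements, a small script can list where those constructs appear. The sketch below is an assumption-laden helper, not part of make: it only pattern-matches the text of a makefile, skips comment lines, and cannot tell how often make would actually evaluate each call.

```python
import re

# Matches the opening of $(shell ...) or $(wildcard ...), in either the
# $(...) or ${...} spelling that GNU make accepts.
SLOW_CALL = re.compile(r"\$[({](shell|wildcard)\s")


def find_slow_calls(makefile_text):
    """Return (line_number, function_name) for each shell/wildcard call."""
    hits = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore whole-line comments
        for match in SLOW_CALL.finditer(line):
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical makefile fragment, for illustration only.
    sample = (
        "SRC := $(wildcard *.c)\n"
        "# $(shell ignored)\n"
        "REV = $(shell git rev-parse HEAD)\n"
    )
    for number, name in find_slow_calls(sample):
        print(f"line {number}: $({name} ...)")
```

Running it over each .mk file gives a short list of places worth wrapping in $(info ...) to see how often they really run.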
Without specific examples of things that are slow, there's nothing else we can do to help.
TLDP's Advanced Bash Scripting Guide states that shell scripts shouldn't be used for "situations where security is important, where you need to guarantee the integrity of your system and protect against intrusion, cracking, and vandalism."
What makes shell scripts unsuitable for such a use case?
Because of the malleability of the shell, it is difficult to verify that a shell script performs its intended function and only that function in the face of adversarial input. The way the shell behaves depends on the environment, plus the settings of its own numerous configuration variables. Each command line is subject to multiple levels of expansion, evaluation and interpolation. Some shell constructs run in subprocesses while the variables the construct contains are expanded in the parent process. All of this is counter to the KISS principle when designing systems that might be attacked.
Probably because it's just easy to screw up. When the PATH is not set correctly, your script will start executing the wrong commands. Putting a space somewhere in a string might cause it to become two strings later on. These can lead to exploitable security holes. In short: shells give you some guarantees as to how your script will behave, but they're too weak or too complex for truly secure programming.
(To this I would like to add that secure programming is an art in itself, and screwing up is possible in any language.)
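The "space in a string" problem can be illustrated from Python. This is a sketch, with the assumption that shlex.split approximates POSIX shell word splitting closely enough for the demonstration; the file name is a hypothetical input.

```python
import shlex
import subprocess
import sys

filename = "my file.txt"  # hypothetical, possibly attacker-influenced input

# Interpolating the value into a command string: word splitting turns the
# one intended argument into two.
unsafe_command = "rm " + filename
print(shlex.split(unsafe_command))

# Quoting keeps it as a single word when a command string is unavoidable.
safe_command = "rm " + shlex.quote(filename)
print(shlex.split(safe_command))


def count_args(args):
    """Pass args as a list (no shell involved) and let the child count them."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# With an argument list there is no shell expansion at all, so the
# file name arrives as exactly one argument.
print(count_args([filename]))
```

The same reasoning applies to PATH: giving the child an absolute program path or a controlled environment removes one more thing an attacker can influence.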
I would disagree with that statement, as there is nothing about scripts that makes them inherently unsafe. Bash scripts are perfectly safe if some simple guidelines are followed:
Does the script contain info that others shouldn't be able to view? If so, make sure it's only readable by the owner.
Does the script depend on input data from somewhere? If so, ensure that the input data cannot be tainted in any way, or that tainted data can be detected and discarded.
Does it matter if others were to try and run the script? If so, as with the first point, ensure that nobody else can execute it, and preferably not read it. chmod 0700 is generally a good idea for scripts that perform system functions.
And the cases where you'd want a script to have setuid (via its interpreter) are extremely rare.
The two points that separate a script from a compiled program would be that the source is visible, and that an interpreter executes it. As long as the interpreter hasn't been compromised (such as having a setuid bit on it), you'd be fine.
When writing scripts to do system tasks, typos, screwups and general human error do to some extent represent a potential security failure, but that would also be the case with compiled programs (and a lot of people tend to ignore the fact that compiled programs can also be disassembled).
It is worth noting that in most (if not all) Linux flavors, most services (if not all; in fact, I can't think of any that aren't) are started via a shell script.
It's easier for bad boys to make a shell script work differently (it interacts a lot with other processes, PATH, shell functions, the profile).
It's harder for good boys to deal with sensitive data (passing passwords, etc.).
Let me start with giving an example of what I'm dealing with first:
I often call existing Perl scripts written by previous engineers to process some data, and then proceed further with my script. I use either system or backticks to call other people's scripts within my script.
Now, I'm wondering if I rewrite those scripts as packages and use require or use to include those packages in my script, will it increase the processing speed? How big of a difference would it be?
Benefits:
It would save the time taken to load the shell, load perl, compile the script and the modules it uses. That's a couple of milliseconds minimum, but it could be much larger.
If you had to serialize data to pass to the child, you also save the time taken to serialize and deserialize the data.
It would allow more flexible interfaces.
It would make error handling easier and more flexible.
Downsides:
Since everything is now in the same process, the child can have a much larger effect on the parent. e.g. A crash in the child will take down the parent.
Is there a way to hide a system call from strace and a dynamic library call from ltrace? For example, the use of system (<stdlib.h>).
In the last class of my software construction course this semester, the instructor revealed to us that we could have gotten away with using the system library function call in many parts of the command shell project we were assigned instead of the more complicated fork, exec, readdir, stat, dup, and pipe system calls we were told to use.
The way system works, he said, is you simply pass in a string of the command you want to execute: system("cmd [flags] [args]; cmd && cmd"); and there you are.
We were not supposed to use this function, but he said he didn't check our programs for it. One way to hide its use would have been to obscure it through macro definitions and such. However, ltrace is still able to track system down when used through macros. I believe it even finds it when it's called from a separate program, like execvp("./prgrm_with_system", ...).
My chance to use it is gone, but I am really curious about whether there is a way to hide system from even ltrace.
system() doesn't do anything that's magic. It doesn't even do anything that's smart (and using it is often a code smell). It also isn't a system call in the sense that the term "syscall" refers to.
You could trivially create your own version of system() using the underlying syscalls fork() and execve(), and bypass detection with ltrace... but strace would still show those calls happening.
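The structure of such a replacement can be sketched in Python, whose os.fork, os.execv and os.waitpid are thin wrappers over the corresponding calls. It is a simplified stand-in, assuming a Unix system with /bin/sh: it omits the signal handling that the C library's system() performs, and being Python it illustrates the sequence of calls rather than anything that would evade ltrace.

```python
import os


def my_system(command):
    """Run command via /bin/sh -c and return its exit status.

    Returns -1 if the child was terminated by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the shell.
        try:
            os.execv("/bin/sh", ["sh", "-c", command])
        finally:
            os._exit(127)  # reached only if exec failed
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1


if __name__ == "__main__":
    print(my_system("exit 3"))
```

Whatever the wrapper looks like, the process creation and exec still have to go through the kernel, which is why strace keeps seeing them.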
You also could bypass ltrace with static linking, but since syscalls are by definition for things that require the OS kernel's help, you can't do without them entirely -- so tools such as strace, sysdig, truss, dtrace, and local equivalents can't be so easily avoided (without exploiting security vulnerabilities in the OS or the tools themselves).
I am aware that this is nothing new and has been done several times. But I am looking for some reference implementation (or even just reference design) as a "best practices guide". We have a real-time embedded environment and the idea is to be able to use a "debug shell" in order to invoke some commands. Example: "SomeDevice print reg xyz" will request the SomeDevice sub-system to print the value of the register named xyz.
I have a small set of routines that is essentially made up of 3 functions and a lookup table:
a function that gathers a command line - it's simple; there's no command line history or anything, just the ability to backspace or press escape to discard the whole thing. But if I thought fancier editing capabilities were needed, it wouldn't be too hard to add them here.
a function that parses a line of text argc/argv style (see Parse string into argv/argc for some ideas on this)
a function that takes the first arg on the parsed command line and looks it up in a table of commands & function pointers to determine which function to call for the command, so the command handlers just need to match the prototype:
int command_handler( int argc, char* argv[]);
Then that function is called with the appropriate argc/argv parameters.
Actually, the lookup table also has pointers to basic help text for each command, and if the command is followed by '-?' or '/?' that bit of help text is displayed. Also, if 'help' is used for a command, the command table is dumped (possibly only a subset if a parameter is passed to the 'help' command).
Sorry, I can't post the actual source - but it's pretty simple and straightforward to implement, and functional enough for pretty much all the command line handling needs I've had for embedded systems development.
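The shape of that parser-plus-table design can be modelled in a few lines of Python. This is not the author's code (which is not posted) and an embedded target would use C with function pointers; the command names, help strings and the use of shlex for argv-style parsing are all assumptions made for illustration.

```python
import shlex


def cmd_print(argv):
    """Hypothetical handler: echo its arguments back."""
    print(" ".join(argv[1:]))
    return 0


def cmd_help(argv):
    """Dump the command table, or only the commands named as parameters."""
    wanted = argv[1:] or sorted(COMMANDS)
    for name in wanted:
        if name in COMMANDS:
            print(f"{name}: {COMMANDS[name][1]}")
    return 0


# Lookup table: command name -> (handler, help text).
COMMANDS = {
    "print": (cmd_print, "print ARGS... - echo the arguments"),
    "help": (cmd_help, "help [CMD...] - list commands"),
}


def execute(line):
    """Parse a line argv-style and dispatch on the first word."""
    argv = shlex.split(line)
    if not argv:
        return 0  # empty line: nothing to do
    entry = COMMANDS.get(argv[0])
    if entry is None:
        print(f"unknown command: {argv[0]}")
        return -1
    handler, help_text = entry
    if len(argv) > 1 and argv[1] in ("-?", "/?"):
        print(help_text)
        return 0
    return handler(argv)


if __name__ == "__main__":
    execute("print reg xyz")
    execute("print -?")
    execute("help")
```

Adding a command means writing one handler with the common signature and adding one table entry; the parser and dispatcher stay untouched.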
You might bristle at this response, but many years ago we did something like this for a large-scale embedded telecom system using lex/yacc (nowadays I guess it would be flex/bison, this was literally 20 years ago).
Define your grammar, define ranges for parameters, etc... and then let lex/yacc generate the code.
There is a bit of a learning curve, as opposed to rolling a one-off custom implementation, but then you can extend the grammar, add new commands & parameters, change ranges, etc... extremely quickly.
You could check out libcli. It emulates Cisco's CLI and apparently also includes a telnet server. That might be more than you are looking for, but it might still be useful as a reference.
If your needs are quite basic, a debug menu which accepts simple keystrokes, rather than a command shell, is one way of doing this.
For registers and RAM, you could have a sub-menu which just does a memory dump on demand.
Likewise, to enable or disable individual features, you can control them via keystrokes from the main menu or sub-menus.
One way of implementing this is via a simple state machine. Each screen has a corresponding state which waits for a keystroke, and then changes state and/or updates the screen as required.
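A minimal model of such a menu state machine, written in Python for brevity (on a target it would more likely be a C switch or table). The screen names and key bindings are hypothetical.

```python
# Each state (screen) maps a keystroke to the next state; keys that a
# screen does not handle leave the state unchanged.
MENU = {
    "main": {"m": "memory", "f": "features"},
    "memory": {"q": "main"},
    "features": {"q": "main"},
}


def next_state(state, key):
    """Return the screen to show after `key` is pressed in `state`."""
    return MENU[state].get(key, state)


def run_keys(keys, state="main"):
    """Feed a sequence of keystrokes through the machine."""
    for key in keys:
        state = next_state(state, key)
    return state


if __name__ == "__main__":
    print(run_keys("m"))
    print(run_keys("mqf"))
```

A real implementation would also attach an action to each transition, such as redrawing the screen or dumping memory.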
vxWorks includes a command shell, that embeds the symbol table and implements a C expression evaluator so that you can call functions, evaluate expressions, and access global symbols at runtime. The expression evaluator supports integer and string constants.
When I worked on a project that migrated from vxWorks to embOS, I implemented the same functionality. Embedding the symbol table required a bit of gymnastics since it does not exist until after linking. I used a post-build step to parse the output of the GNU nm tool to create a symbol table as a separate load module. In an earlier version I did not embed the symbol table at all, but rather created a host-shell program that ran on the development host where the symbol table resided, and communicated with a debug stub on the target that could perform function calls to arbitrary addresses and read/write arbitrary memory. This approach is better suited to memory-constrained devices, but you have to be careful that the symbol table you are using and the code on the target are for the same build. Again, that was an idea I borrowed from vxWorks, which supports both the target- and host-based shell with the same functionality. For the host shell, vxWorks checksums the code to ensure the symbol table matches; in my case it was a manual (and error-prone) process, which is why I implemented the embedded symbol table.
Although initially I only implemented memory read/write and function call capability I later added an expression evaluator based on the algorithm (but not the code) described here. Then after that I added simple scripting capabilities in the form of if-else, while, and procedure call constructs (using a very simple non-C syntax). So if you wanted new functionality or test, you could either write a new function, or create a script (if performance was not an issue), so the functions were rather like 'built-ins' to the scripting language.
To perform the arbitrary function calls, I used a function pointer typedef that took an arbitrarily large number (24) of arguments. Using the symbol table, you find the function address, cast it to the function pointer type, and pass it the real arguments plus enough dummy arguments to make up the expected number, thus maintaining a suitable (if wasteful) stack frame.
On other systems I have implemented a Forth threaded interpreter, which is a very simple language to implement, but has a less than user friendly syntax perhaps. You could equally embed an existing solution such as Lua or Ch.
For a small lightweight thing you could use Forth. It's easy to get going (Forth kernels are SMALL).
Look at figForth, LINa and GnuForth.
Disclaimer: I don't Forth, but OpenBoot and the PCI bus do, and I've used them and they work really well.
Alternative UIs
Deploy a web server on your embedded device instead. Even serial will work with SLIP, and the UI can be reasonably complex (or even serve up a JAR and get really, really complex).
If you really need a CLI, then you can point at a link and get a telnet session.
One alternative is to use a very simple binary protocol to transfer the data you need, and then make a user interface on the PC, using e.g. Python or whatever is your favourite development tool.
The advantage is that it minimises the code in the embedded device, and shifts as much of it as possible to the PC side. That's good because:
It uses up less embedded code space—much of the code is on the PC instead.
In many cases it's easier to develop a given functionality on the PC, with the PC's greater tools and resources.
It gives you more interface options. You can use just a command line interface if you want. Or, you could go for a GUI, with graphs, data logging, whatever fancy stuff you might want.
It gives you flexibility. Embedded code is harder to upgrade than PC code. You can change and improve your PC-based tool whenever you want, without having to make any changes to the embedded device.
If you want to look at variables—If your PC tool is able to read the ELF file generated by the linker, then it can find out a variable's location from the symbol table. Even better, read the DWARF debug data and know the variable's type as well. Then all you need is a "read-memory" protocol message on the embedded device to get the data, and the PC does the decoding and displaying.
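On the PC side, a "read-memory" message of this kind might be packed and unpacked as below. The wire format (opcode value, field widths, little-endian byte order) is invented for the sketch and is not taken from any existing protocol; the transport and the ELF/DWARF lookup are left out.

```python
import struct

# Hypothetical wire format, little-endian, no padding:
#   request : opcode (1 byte), address (4 bytes), length (2 bytes)
#   response: opcode (1 byte), length (2 bytes), then `length` data bytes
READ_MEMORY = 0x01
REQUEST = struct.Struct("<BIH")
RESPONSE_HEADER = struct.Struct("<BH")


def encode_read_request(address, length):
    """Build the bytes that ask the device for `length` bytes at `address`."""
    return REQUEST.pack(READ_MEMORY, address, length)


def decode_read_response(payload):
    """Validate a response and return its data bytes."""
    opcode, length = RESPONSE_HEADER.unpack_from(payload)
    if opcode != READ_MEMORY:
        raise ValueError(f"unexpected opcode {opcode:#x}")
    start = RESPONSE_HEADER.size
    data = payload[start:start + length]
    if len(data) != length:
        raise ValueError("truncated response")
    return data


def decode_uint32(data):
    """Interpret 4 bytes as a little-endian unsigned 32-bit variable."""
    return struct.unpack("<I", data)[0]


if __name__ == "__main__":
    # The address would normally come from the symbol table in the ELF file.
    print(encode_read_request(0x20000010, 4).hex())
    reply = b"\x01\x04\x00" + b"\x2a\x00\x00\x00"  # simulated device reply
    print(decode_uint32(decode_read_response(reply)))
```

The embedded side then only needs a handler that copies the requested bytes into a reply; all naming, typing and formatting of variables stays in the PC tool.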