accelerate Tcl eval - performance

I'm currently writing a Tcl-based tool for symbolic matrix manipulation, but the code is getting slow. I'm looking for ways to accelerate my Tcl code (Tcl version 8.6).
I have one suspicion. My code builds lists with a command name as the first element and command arguments as the following elements (this comes from emulating an object-oriented approach). I use eval to invoke these commands (and this is done often in the recursive processing). I read at https://wiki.tcl-lang.org/page/eval and https://wiki.tcl-lang.org/page/Tcl+Performance that eval may be slow.
I have three questions:
What would be the fastest way to invoke a command from a list (command name first, then its arguments) that was constructed just beforehand?
Would it accelerate the code to separate the command name myCmd and the parameter list myPar and invoke the command with [$myCmd {*}$myPar] instead (suggested at https://stackoverflow.com/a/27619692/3852630)?
Is the trick with if 1 instead of eval still promising in 8.6?
Thanks a lot for your help!

Above all, don't assume: time it to be sure. Be aware when timing things that repeatedly running a thing may change the time it takes to run it (as caches warm up). Think carefully about what you want to actually get the speed of.
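For measuring, Tcl's built-in time command averages a script over many runs and reports microseconds per iteration. A minimal sketch (myCmd and the iteration count are placeholders):

# Average over 10000 runs; repeat the measurement so caches have warmed up
puts [time {myCmd $a $b} 10000]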
The eval command is usually slow, but not in all cases. If you give it a list that you've constructed (e.g., with list or linsert or lappend or…) then it's fairly fast as it can avoid reparsing the input; it knows, but only in that case, that it can skip straight to dispatching to the command implementation. The other case that is fast is when you give it a value that was previously given to eval; the bytecode is already built and cached. These notes also apply with uplevel.
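For example, a sketch of the fast case (myCmd and the arguments are placeholders):

# Built with list, so the value is a pure list; eval dispatches without reparsing
set call [list myCmd $arg1 $arg2]
eval $call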
Doing $myCmd {*}$myParameters is fairly fast too; that's bytecoded into “assemble the words on the Tcl operand stack and do the right command dispatch”, which is very close to what it would be for an arbitrary user command anyway (such commands very rarely have direct bytecode implementations).
I'd expect things with if 1 to be very quick in some cases and very slow in others; it forces full compilation, so if things can be cached well it will be fast, and if they can't it will be slow. And if you're just calling a single command, at best it won't make much difference at all. The cases where it wins are when the thing being called is itself a bytecoded command and where you can cache things correctly.
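For reference, the substitution being asked about looks like this (a sketch; $script stands for whatever value would otherwise be passed to eval):

# Instead of: eval $script
if 1 $script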
If you're dealing with an ordinary command (e.g., a procedure, or one of Tcl's commands that touch the OS), I'd go with option 2: $myCmd {*}$myParameters or variants on it. It's about as fast as you're going to get. But I would not do:
set myParameters [linsert $myOriginalValues 0 "literal1" [cmdOutput2] $value3]
$myCmd {*}$myParameters
That's ridiculous. This is clearer and cleaner and faster:
$myCmd "literal1" [cmdOutput2] $value3 {*}$myOriginalValues
Part of the point of expansion syntax ({*}) is that you don't need to do complex argument marshalling, and that's good because complexity is hard to get right all the time.

A note about K and unsharing objects
Avoid copying data in memory. Change
set mylist [linsert $mylist 0 some new content]
to
set mylist [linsert $mylist[set mylist ""] 0 some new content]
This dereferences the value of the variable and then sets the variable to the empty string. That drops the value's reference count to one, which lets linsert modify the list in place instead of copying it.
See also https://stackoverflow.com/a/64117854/7552

Related

Bash: ensuring a variable is set without erasing any existing value

Let's say I'm running a bash script under set -u. Obviously, for any given variable, I need to ensure that it's set. Something like:
foo=
However, if I want to keep any pre-existing value that might be set by my caller, this would overwrite it. A simple solution to this problem is to do this instead:
: ${foo:=}
But I have some code that does this (more complicated) way:
foo=${foo+$foo}
Now, I know this second way works. My question is, is there any advantage to it over the first way? I am assuming there is but now can't remember what it was. Can anyone either think of an edge case (no matter how obscure) where these two constructs would behave differently, or provide a compelling explanation that they can't?
I can't think of any case where they would differ. They're just alternative logic for the same thing.
The meaning of the simple solution is: If foo is unset/empty, set it to the empty string.
The meaning of your code is: If foo is set, set it to itself, otherwise set it to an empty string.
Your code seems like more work -- why set something to itself? Just do nothing and it will keep its value. That's what the simpler version does.
You can also simplify the simple solution further by removing the : in the parameter expansion.
: ${foo=}
This makes it only test whether foo is unset. If it's set to the empty string, no default needs to be assigned.
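A quick demonstration of the distinction (foo and the default value are illustrative):

unset foo
: ${foo=default}     # foo was unset, so it becomes "default"
foo=""
: ${foo=default}     # foo is set (to empty), so it is left alone
: ${foo:=default}    # the : form also tests for empty, so now foo is "default"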
My question is, is there any advantage to it over the first way?
Maybe this is subjective, but one advantage is that it clearly looks like a variable assignment. Anyone who sees the command foo=${foo+$foo} will immediately understand that it sets the variable foo (even if they need to look up the ${parameter+word} notation to figure out what it sets it to); but someone who sees the command : ${foo:=} is likely to completely miss that it has the side-effect of modifying foo. (The use of : is definitely a hint that something might be happening, since : itself does nothing; but it's not as blatant.)
And of course, someone searching the script for foo= will find the former but not the latter.
That said, I would personally write this as either foo="${foo-}" or foo="${foo:-}", which still makes clear that it sets foo, but is a bit simpler than foo=${foo+$foo}. I also think that readers are more likely to be familiar with ${parameter-word} than ${parameter+word}, but I haven't asked around to check.

How can I generate a list of every valid syntactic operator in Bash including input and output?

According to the Bash Reference Manual, the Bash scripting language is constituted of 4 distinct subclasses of syntactic elements:
built-in commands (alias, cd)
reserved words (if, function)
parameters and variables ($, IFS)
functions (abort, end-of-file - activated with keybindings such as Ctrl-d)
Apart from reading the manual, I became curious whether there is a programmatic way to list out or generate all such keywords, at least from one of the above categories. I think this could be useful in some contexts. Sometimes I wish I could see all the options available to me for what I can write at any given moment, and having that information as data, instead of a formatted manual, is convenient, focused, and can be edited, in case you want to strike out commands you know well, or ones that are too obscure for now.
My understanding is that Bash takes the input into stdin and passes it to the running shell process. When code is distributed in a production-ready form, it is compiled, so it runs faster. Unlike using a Python REPL, you don’t have access to the Bash source code from within Bash, so it is not a very direct route to write a program that searches through source files to find various defined commands. I mean that if you wanted to list all functions, Python has the dir() function which programmatically looks for function names in the namespace. But I don’t think Bash can do that. I think it doesn’t have a special syntax in its source files which makes it easy to find and identify all the keywords. Instead, they will be found if you simply enter them - like cd will “find” the program cd because $PATH returns the path to that command - but there’s no special way to discover them.
Or am I wrong? Technically, you could run a “brute force” search by generating every combination of symbols of every length and recording when you did not get “error: unknown command” as a response.
Is there any other clever programmatic way to do this?
I mean I want to see a list of every symbol or string that the bash compiler
Bash is not a compiler. It and every other shell I know are interpreters of various languages.
recognises and knows what to do with, including commands like “ls” or just a symbol like “*”. I also want to see the inputs and outputs for each symbol, i.e., some commands are executed in the shell prompt by themselves, but what data type do they return?
All commands executed by the shell have an exit status, which is a number between 0 and 255. This is as close to a "return type" as you get. Many of them also produce idiosyncratic output to one or two streams (a standard output stream and a standard error stream) under some conditions, and many have other effects on the shell environment or operating environment.
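For example:

true;  echo $?      # prints 0
false; echo $?      # prints 1
ls /nonexistent     # writes a diagnostic to the standard error stream
echo $?             # prints a nonzero status (2 with GNU ls)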
And some require a certain data type to standard input.
I can't think of a built-in utility whose expected input is well characterized as having a particular data type. That's not really a stream-oriented concept.
I want to do this just as a rigorous way to study the language.
If you want to rigorously study the language, then you should study its manual, where everything you describe has already been compiled. You might also want to study the POSIX shell command language manual for a slightly different perspective, which is more thorough in some areas, though what it documents differs in a few details from Bash's default behavior.
If you want to compile your own summary of Bash syntax and behavior, then those are the best source materials for such an effort.
You can get a list of all reserved words and syntactic elements of bash using this trick:
help -s '*' | cut -d: -f1
Or more accurately:
help -s \* | awk -F ': ' 'NR>2&&!/variables/{print $1}'
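compgen, the completion helper built into bash, can also enumerate several of these categories directly:

compgen -A keyword     # reserved words: if, then, fi, function, ...
compgen -A builtin     # built-in commands: alias, cd, ...
compgen -A function    # shell functions currently defined
compgen -A variable    # names of shell variables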

Refactor eval(some_variable).is_a?(Proc) to not use eval

I have some old code that looks like:
some_variable = "-> (params) { Company.search_by_params(params) }"
if eval(some_variable).is_a?(Proc)
...
Rubocop is complaining about the use of eval. Any ideas on how to remove the usage of eval?
I don't really understand Procs so any guidance on that would be appreciated.
Simple. Don't define your variable object as a string but as a lambda Proc
my_lambda = -> (params) { Company.search_by_params(params) }
if my_lambda.is_a?(Proc)
#do stuff
end
But why would you instantiate a string object which contains what appears to be a normal lambda which is a Proc, when you can define a Proc instead?
I am going to answer the question "If I want to run code at a later time, what is the difference between using a proc and an eval'd string?" (which I think is part of your question and confusion):
What eval does is take a string, parse it as code, and then run it. The string can come from anywhere, including user input. That makes eval very unsafe and problematic, especially when used with raw user input.
The problems with eval are usually:
There is almost always a better way to do it
Very dangerous and insecure
Makes debugging difficult
Slow
Using eval allows full control of the Ruby process, and if the Ruby process has high permissions, potentially even root access to the machine. So the general recommendation is to use eval only if you absolutely have no other option, and especially not with user input.
Procs/lambdas/blocks also let you save code for later (and solve most of the problems with eval; they are the "better way"), but instead of storing arbitrary code as a string to parse later, they are code already, parsed and ready to go. In some ways, they are methods you can pass around. Making a proc/lambda gives you an object with a #call method; when you later want to run the proc/block/lambda, you invoke call with whatever arguments it needs. What you can't do with procs is let users write arbitrary code (and generally that's good): you have to write the code for the proc in a file Ruby loads (most of the time). Eval does get around that, but you really should rethink whether you want that to be possible.
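A minimal sketch of that workflow (Company.search_by_params comes from the question; the argument hash is illustrative):

# Already-parsed code, stored for later
search = ->(params) { Company.search_by_params(params) }
search.is_a?(Proc)            # => true
search.call({name: "Acme"})   # runs the stored body now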
Your code sample oddly combines both these methods: it evaluates a string to get a lambda. What's happening here is that eval runs the code in the string right away and returns the (last) result, which in this case happens to be a lambda/proc. (Note that this happens every time you run eval, producing multiple copies of the proc with different identities but the same behavior.) Since the code inside the string makes a lambda, the value returned is a Proc that can later be #call'd. So when eval runs, the code is parsed and a new lambda is created, with the code in the lambda stored to run at a later time. If the code inside the string did not create a lambda, all of that code would run immediately when eval was called with the string.
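To make that concrete (the string is the one from the question):

code = "-> (params) { Company.search_by_params(params) }"
a = eval(code)   # parses the string and builds a fresh lambda
b = eval(code)   # parses it again: another lambda, same behavior
a.is_a?(Proc)    # => true
a.equal?(b)      # => false: distinct identities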
This behavior might be desired, but there is probably a better way to do this, and this is definitely a foot-gun: there are at least a half dozen subtle ways this code could do unintended things if you weren't really careful with it.

Reading a TCL variable in a Makefile

I am trying to read (and eventually set to a Make variable) the value of a variable set in a TCL script in order to echo it to a file.
(I know I can simply puts the var to the file, but it's a more complicated flow, and doing it in the Make would be easier. Just want to know if this is possible)
#file.tcl
set final_val "Test finished! No Failures"

I then want to use the value of final_val (set in the TCL) in the Makefile that calls the script:

#Makefile
echo final_val >> $(out_file)
P.S. I am on TCL 8.5
It's not trivial to get a value from a Tcl script into a make run. The simplest way might be to arrange for the script to produce that variable as its only output, as you can then use a fairly simple approach:
On the Tcl side:
#!/usr/bin/env tclsh
set final_val "Test finished! No Failures"
puts $final_val
On the Make side (assuming you've made everything executable):
FINAL_VAL := $(shell thescript.tcl)
There's a lot more complexity possible than just this, of course, but this is the simplest technique that could possibly work.
If you're producing a lot of output in that script, you might need to instead use a separate post-processing of the output to get the part you want. That can get really complex; those cases are often better off being converted to using intermediate files, possibly with recursive makes, as you don't really want to have significant processing done during the variable definition phase of processing a makefile (as it causes real headaches when it goes wrong, and puts you in a world of pain with debugging).
One way that's more complex but which works really well for me is to make the Tcl code generate some file, perhaps outputinfo.mk, that contains the make variable definitions that I want. (On the Tcl side, you're just writing to a file. There's loads of ways to do that.)
Then, I'd give the main makefile a dependency rule saying that to generate outputinfo.mk you need to run the Tcl script, and have the makefile include that outputinfo.mk:
outputinfo.mk:
	thescript.tcl > outputinfo.mk

include outputinfo.mk
(For the sake of argument, I'm assuming here that the script writes the file contents to stdout with puts.)
This works well, since make knows about such dependencies caused by include and does the right thing.
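A minimal sketch of such a generator script (FINAL_VAL is an assumed variable name; the rule above redirects its stdout into outputinfo.mk):

#!/usr/bin/env tclsh
set final_val "Test finished! No Failures"
# Emit a make-style definition for the include above to pick up
puts "FINAL_VAL := $final_val"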

`functions` in bash shell

In the Z shell there's a handy command that returns a list of all available functions. The command is, conveniently, called functions. I cannot find a similar alternative in Bash. I threw together a quick & dirty (and wholly unacceptable) function to approximately do the same thing, but it has at least one glaring problem: since it relies on parsing files you must either list all the files to look in (which may become stale) or give an expression (which is guaranteed to give files you don't want to look in, such as .bash_history).
Here's the function, since I know someone will ask for it if I don't post it, but I'm pretty sure it's a dead end, or at least the wrong approach.
functions() {
    grep "^function " "$HOME/."{bashrc,bash_profile,aliases,functions,projects,variables} | sort | sed -e 's/{//' | uniq
}
I could improve on this wrong-headed approach by parsing .bash_profile and getting a list of all sourced files and then parsing them for functions, but by the time you add the following complications into the mix, it's really not worth it:
You can source files with . or source.
I also happen to use a function to source files, which checks for the file's existence first.
You could easily source after && or ;: it's not necessarily the first or only thing on a line.
You have to account for the fact that functions don't necessarily have the keyword function before them.
You can omit the () after the function name.
There are probably other complicating factors I haven't thought of.
Fundamentally this is wrong because it is parsing files rather than reporting what is loaded in memory.
Is there any reasonable way to do this—get a list of all functions loaded in memory—in Bash? It seems like an enormous omission, if not.
(And for those looking for duplicate questions, this one is very different, as it's asking for a way to list only those functions that come from a specific file.)
Use typeset -f in bash. In zsh, functions is just a synonym for the same command.
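If you only want the names without the bodies, declare -F (typeset -F works too) prints one declare -f line per function, with the name in the third field:

declare -F                       # e.g. "declare -f functions"
declare -F | awk '{print $3}'    # just the names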
