In the Z shell there's a handy command that returns a list of all available functions. The command is, conveniently, called functions. I cannot find a similar alternative in Bash. I threw together a quick & dirty (and wholly unacceptable) function to approximately do the same thing, but it has at least one glaring problem: since it relies on parsing files you must either list all the files to look in (which may become stale) or give an expression (which is guaranteed to give files you don't want to look in, such as .bash_history).
Here's the function, since I know someone will ask for it if I don't post it, but I'm pretty sure it's a dead end, or at least the wrong approach.
functions() {
grep "^function " "$HOME/."{bashrc,bash_profile,aliases,functions,projects,variables} | sort | sed -e 's/{//' | uniq
}
I could improve on this wrong-headed approach by parsing .bash_profile and getting a list of all sourced files and then parsing them for functions, but by the time you add the following complications into the mix, it's really not worth it:
You can source files with . or source.
I also happen to use a function to source files, which checks for the file's existence first.
You could easily source after && or ;: it's not necessarily the first or only thing on a line.
You have to account for the fact that functions don't necessarily have the keyword function before them.
You can omit the () after the function name.
There are probably other complicating factors I haven't thought of.
Fundamentally this is wrong because it is parsing files rather than reporting what is loaded in memory.
Is there any reasonable way to do this—get a list of all functions loaded in memory—in Bash? It seems like an enormous omission, if not.
(And for those looking for duplicate questions, this one is very different, as it's asking for a way to list only those functions that come from a specific file.)
Use typeset -f in bash. In zsh, functions is just a synonym for the same command.
Related
For context, i'm trying to create an overly simplified version of bash, not like a bash full script interpreter, just a series of commands and operators (|,||, &&, <, >, <<, >>,$, $?) small interpreter, The mental model which i used in a nutshell is:
Lexer + Expander: in the first stage i used a simple state machine to lex and store data (commands, arguments, redirection files etc.) and lex input into tokens, i expand env variables and i handle lexical errors too.(as simple as checking finite states of valid characters).
Parser: in the second i stage i intend to create an AST out of the tokens + data, and handle parsing errors.
Executor: Finally i'll execute the AST.
No i'm at the parser stage, and i'm trying to think about how might i handle parsing errors, now the thought i had is out of the possible range of valid statements, it seems very difficult to check the validity of such an input cause the range is too big or at least that's what i think, and i'm sure there's some generalized solution for the problem, why i'm sure? because bash have done it.
For example this statement:
$ < $FILE || && > outfile
From the lexer point of view it's all bright and shiny, but it's surely not a valid input from the parser's perspective. Now one possible solution to this is to check whether there's a command token in the input if not then invalid. but what about this one:
$ || ls > $FILE && cat < $FILE
Again all valid lexeme, but unparsable statement, maybe that too could be checked against "if the line start with an OR or AND token error.".
Now the specific question is how bash exactly parse these combination of commands and operators, either there's some sort of more generalized solution or i'm left with an if&else error checking against inputs that i think is invalid. which honestly seems stupid and cumbersome.
Most of the complexity of shell parsing is in the tokenisation, although you certainly don't need to worry about all of the complications which have crept in over the years. The grammar itself is pretty simple; it's designed to be parsed by a parser generated with a tool like Bison (or some other yacc derivative), and that's precisely how Bash works.
The various syntactic rules recognised by Bash are scattered throughout the Bash manual, but the grammar is based on the standard shell grammar specified in the Posix standard, which is probably an easier starting point. In that document, the grammar is included as what is basically a Yacc input file (without any of the semantic actions necessary for an actual implementation); you can find it at the end of section 2.10. Make sure to read the initial part of that section, though, because it contains important information about how tokens are classified. Also, take note of section 2.3, token recognition.
Between these two sections you'll find a precise description of shell quoting rules and the various expansions which are done prior to parsing (or, better said, intermingled with parsing because command substitution makes the whole process recursive.) You might not want to absorb all of that on a first reading, although it will also help you be more effective in your use of the shell.
Bash implements a lot more features, but probably most or all of them go beyond your needs.
#choroba has the right idea - to understand exactly how Bash parses scripts you need to look at the source of Bash. There are basically fractal rules of thumb for how Bash works in increasingly complex cases, and any description short enough to fit in a SO response is probably not detailed enough to give you the full picture.
According to the Bash Reference Manual, the Bash scripting language is constituted of 4 distinct subclasses of syntactic elements:
built-in commands (alias, cd)
reserved words (if, function)
parameters and variables ($, IFS)
functions (abort, end-of-file - activated with keybindings such as Ctrl-d)
Apart from reading the manual, I became inherently curious if there was a programmatic way to list out or generate all such keywords, at least from one of the above categories. I think this could be useful in some contexts. Sometimes I wish I could see all the options available to me for what I can write in any given moment, and having that information as data, instead of a formatted manual, is convenient, focused, and can be edited, in case you want to strike out commands you know well, or that are too obscure for now.
My understanding is that Bash takes the input into stdin and passes it to the running shell process. When code is distributed in a production-ready form, it is compiled, so it runs faster. Unlike using a Python REPL, you don’t have access to the Bash source code from within Bash, so it is not a very direct route to write a program that searches through source files to find various defined commands. I mean that if you wanted to list all functions, Python has the dir() function which programmatically looks for function names in the namespace. But I don’t think Bash can do that. I think it doesn’t have a special syntax in its source files which makes it easy to find and identify all the keywords. Instead, they will be found if you simply enter them - like cd will “find” the program cd because $PATH returns the path to that command - but there’s no special way to discover them.
Or am I wrong? Technically, you could run a “brute force” search by generating every combination of symbols of every length and record when you did not get “error: unknown command” as a response.
Is there any other clever programmatic way to do this?
I mean I want to see a list of every symbol or string that the bash
compiler
Bash is not a compiler. It and every other shell I know are interpreters of various languages.
recognises and knows what to do with, including commands like
“ls” or just a symbol like “*”. I also want to see the inputs and
outputs for each symbol, i.e., some commands are executed in the shell
prompt by themselves, but what data type do they return?
All commands executed by the shell have an exit status, which is a number between 0 and 255. This is as close to a "return type" as you get. Many of them also produce idiosyncratic output to one or two streams (a standard output stream and a standard error stream) under some conditions, and many have other effects on the shell environment or operating environment.
And some
require a certain data type to standard input.
I can't think of a built-in utility whose expected input is well characterized as having a particular data type. That's not really a stream-oriented concept.
I want to do this just as a rigorous way to study the language.
If you want to rigorously study the language, then you should study its manual, where everything you describe has already been compiled. You might also want to study the POSIX shell command language manual for a slightly different perspective, which is more thorough in some areas, though what it documents differs in a few details from Bash's default behavior.
If you want to compile your own summary of Bash syntax and behavior, then those are the best source materials for such an effort.
You can get a list of all reserved words and syntactic elements of bash using this trick:
help -s '*' | cut -d: -f1
Or more accurately:
help -s \* | awk -F ': ' 'NR>2&&!/variables/{print $1}'
I'm currently writing a Tcl-based tool for symbolic matrix manipulation, but the code is getting slow. I'm looking for ways to accelerate my Tcl code (Tcl version 8.6).
I have one suspicion. My code builds lists with a command name as the first element and command arguments as the following elements (this comes from emulating an object-oriented approach). I use eval to invoke these commands (and this is done often in the recursive processing). I read at https://wiki.tcl-lang.org/page/eval and https://wiki.tcl-lang.org/page/Tcl+Performance that eval may be slow.
I have three questions:
What would be the fastest way to invoke a command from a list with command name and parameters which is constructed just beforehand?
Would it accelerate the code to separate the command name myCmd and the parameter list myPar and invoke the command with [$myCmd {*}$myPar] instead (suggested at https://stackoverflow.com/a/27619692/3852630)?
Is the trick with if 1 instead of eval still promising in 8.6?
Thanks a lot for your help!
Above all, don't assume: time it to be sure. Be aware when timing things that repeatedly running a thing may change the time it takes to run it (as caches warm up). Think carefully about what you want to actually get the speed of.
The eval command is usually slow, but not in all cases. If you give it a list that you've constructed (e.g., with list or linsert or lappend or…) then it's fairly fast as it can avoid reparsing the input; it knows, but only in that case, that it can skip straight to dispatching to the command implementation. The other case that is fast is when you give it a value that was previously given to eval; the bytecode is already built and cached. These notes also apply with uplevel.
Doing $myCmd {*}$myParameters is fairly fast too; that's bytecoded into “assemble the words on the Tcl operand stack and do the right command dispatch” which is very close to what it would be for an arbitrary user command anyway (which very rarely have direct bytecode implementations).
I'd expect things with if 1 to be very quick in some cases and very slow in others; it forces full compilation, so if things can be cached well then that will be fast and if things can't it will be slow. And if you're just calling a command, it won't make much difference at all at best. The cases where it wins are when the thing being called is itself a bytecoded command and where you can cache things correctly.
If you're dealing with an ordinary command (e.g., a procedure, or one of Tcl's commands that touch the OS), I'd go with option 2: $myCmd {*}$myParameters or variants on it. It's about as fast as you're going to get. But I would not do:
set myParameters [linsert $myOriginalValues 0 "literal1" [cmdOutput2] $value3]
$myCmd {*}$myParameters
That's ridiculous. This is clearer and cleaner and faster:
$myCmd "literal1" [cmdOutput2] $value3 {*}$myOriginalValues
Part of the point of expansion syntax ({*}) is that you don't need to do complex argument marshalling, and that's good because complexity is hard to get right all the time.
A note about K and unsharing objects
Avoid copying data in memory. Change
set mylist [linsert $mylist 0 some new content]
to
set mylist [linsert $mylist[set mylist ""] 0 some new content]
This dereferences the value of the variable and then sets the variable to
the empty string. This reduces the variable's reference count.
See also https://stackoverflow.com/a/64117854/7552
I'm writing an experimental Bash module system that would allow local function namespaces, and my first idea was to write a Bash function parser that would read the function code line by line and prepend each function/variable name with <module-name>. (i.e. function func in module module would become module.func - which could again be imported in another module like module_2.module.func and so on; variables inside functions would be name-mangled - variable var within function func in module module would become __module_func_var).
However, in order to do that, I need a way to detect which names are variables and replace all their occurences in the function with the transported import-name. Trivial cases like variable=[...] are easily parsable, but there are countless of other cases where it's not that trivial - what about while read variable; do [...] done and variable2="asdf${variable//_/+}"?
It seems to me that in order to do this I need to dive into the parsing mechanisms of Bash or read a book on programming languages - but where do I start in order to achieve what I have explained above?
I need a way to detect which names are variables
I'm sorry to say this, but in general it's impossible.
Supporting only the static cases where variables can occur is possible but very tricky. Consider only variable assignments: Besides x= there are declare x=, printf -v x, read x, mapfile x, readarray x and probably many more. Even mature tools like shellcheck still have problems parsing all these cases correctly (for instance, see this issue).
However, even if you mastered parsing all the static cases correctly there still could by dynamic variables, for instance:
x=$(someCommand)
declare "$x=something"
In this example you cannot know the name of the new variable without executing someCommand. Other things which are equally (or even) worse are bash's indirection operator ${!x}, implicit indirection in arithmetic contexts (e.g. x=y; echo $((x))), and eval.
tl;dr: The only way to get all the variables in a script is to interpret/execute the script.
But here comes another problem: Executing the script is also not an option if there is non-determinism (declare "$(tr -cd a-z /dev/urandom | head -c1)=..."). Note that user-input is also non-deterministic (read x; declare "var$x=..."). You would have to write a static analyzer. But this is also not an option because of the halting problem. From the halting problem we can deduce that it is (in general) impossible to tell whether a given bash script has a finite amount of variables.
To implement your module system you could use another approach. For instance, if someone wants to implement a module for your framework then they have to specify the functions/variables in this module in an easy parsable format.
In writing a function for fish shell I want to know if a lone wildcard (not part of a bigger expression) was used in the command arguments. Fish does the wildcard expansion before passing arguments to my function, so there is no easy way that I can see to do that, aside from check whether the arguments are the same as the output of ls. The inefficiency of that method makes me sad, though. Is there a better way to do this, without going into fish's source code?
EDIT:
Thanks for the input. Specifically, I am looking to add some functionality like zshell has for warning if there is a * in the arguments of rm. I know that there was an issue opened on GitHub specifically about this but I couldn't find the link again. I have typod, for example, rm * .o instead of rm *.o, and accidentally deleted all my code (... which I brought back from git, but still).
EDIT 2:
Here is the issue on GitHub: https://github.com/fish-shell/fish-shell/issues/1511
No, there's no way for a function to tell where its arguments came from. Maybe if you give more details about what you're really trying to accomplish, we can give another suggestion.