Maximum command length not reached, yet still exceeded - bash

I'm having a weird problem when calling a MATLAB function from a (bash) shell script run in Cygwin.
This is the problematic command:
"$MATLAB_PATH/matlab" -wait -nojvm -nosplash -automation -logfile "$MATLAB_LOGFILE" -r "myFunction $(echo ${FUNCTION_ARGS[#]}); quit;"
which, when echoed on the bash command line, evaluates to something like the following:
/cygdrive/c/Program Files (x86)/MATLAB/R2010a/bin/matlab -wait -nojvm \
-nosplash -automation -logfile MATLAB_output.txt -r myFunction \
/path/to/relevant/data/data1.txt /path/to/other/relevant/data/data2.txt \
<<several more such arguments>>; quit;
In total, the length of the command is ~2000 characters, depending a bit on which path the script is called from.
The problem is that my MATLAB function receives only 17 arguments (~1017 characters), while I send it well over 30 arguments.
Other observed behavior:
When I copy-paste the echoed command line into a regular MATLAB session (that is, not the automation server), there seems to be no problem and the function executes just fine on all ~30 arguments.
When I reduce the length of the command line (for example, by removing the -wait option), the MATLAB function will suddenly receive 18 arguments, with the last argument a portion of the 18th string that I passed in.
Reducing or increasing the command line length by a few characters in other ways (duplicating slashes in paths, duplicating spaces, etc.) has no effect.
EDIT: The copy-pasted command line seems to have a maximum length of 1014 characters.
So apparently, somewhere along the tool chain, there is a limitation on the maximum length a command can have. I'm not finding anything relevant in the docs of MATLAB, its automation server, bash, or Cygwin -- they all have limits, but on the order of 32K characters, way more than what I'm passing in.
So...I'm at a loss. I'm not sure how to diagnose which tool is causing this...any ideas?
EDIT:
Output of xargs --show-limits:
Your environment variables take up 5556 bytes
POSIX upper limit on argument length (this system): 24396
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 18840
Size of command buffer we are actually using: 24396
Output of expr $(getconf ARG_MAX) - $(env|wc -c) - $(env|wc -l) \* 4 - 2048:
24098
So, as I said, even the smallest of these limits is still far above my ~2000 characters.
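For diagnosis, a probe along these lines might help narrow down where the cutoff sits (a rough sketch; countArgs is a hypothetical one-line MATLAB function that just prints nargin to the log):
for n in 900 1000 1014 1100 2000; do
    # Build n/2 two-character arguments ("a "), i.e. ~n characters total.
    args=$(printf 'a %.0s' $(seq 1 $((n / 2))))
    "$MATLAB_PATH/matlab" -wait -automation -logfile "probe_$n.txt" \
        -r "countArgs $args; quit;"
done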

There may be a limit in MATLAB's processing of the command line. For example, the buffer allocated for the function string may be limited to 1014 characters.
Instead of passing the whole invocation with -r, save the function invocation in a MATLAB script file and pass just the script name to MATLAB on the command line.
http://www.mathworks.com/help/matlab/ref/matlabwindows.html says:
matlab -r "statement" starts MATLAB and executes the specified MATLAB statement. If statement is the name of a MATLAB function or script, do not specify the file extension. Any required file must be on the MATLAB search path or in the startup folder.

Related

Output time to a file with the Unix "time" command, but leave the output of the command to the console

I time a command that has some output. I want to output the real time from the time command to a file, but leave the output of the command to the console.
For example, if I do time my_command I get this printed in the console:
several lines of output from my_command ...
real 1m25.970s
user 0m0.427s
sys 0m0.518s
In this case, I want to store only 1m25.970s to a file, but still print the output of the command to the console.
The time command is tricky. The POSIX specification of time
doesn't define the default output format, but does define a format for the -p (presumably for 'POSIX') option. Note the (not easily understood) discussion of command sequences in pipelines.
The Bash specification says time prefixes a 'pipeline', which means that time cmd1 | cmd2 times both cmd1 and cmd2. It writes its results to standard error. The Korn shell is similar.
The POSIX format requires a single space between the tags such as real and the time; the default format often uses a tab instead of a space. Note that the /usr/bin/time command may have yet another output format. It does on macOS, for example, listing 3 times on a single line, by default, with the label after the time value; it supports -p to print in an approximation to the POSIX format (but it has multiple spaces between label and time).
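For example, Bash's time keyword with -p prints plain seconds with single spaces (the figures will of course vary):
$ time -p sleep 1
real 1.00
user 0.00
sys 0.00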
You can easily get all the information written to standard error into a file:
(time my_command) 2> log.file
If my_command or any program it invokes reports errors to standard error, those will go to the log file too. And you will get all three lines of the output from time written to the file.
If your shell is Bash, you may be able to use process substitution to filter some of the output.
I wouldn't try it with a single command line; the hieroglyphs needed to make it work are ghastly and best encapsulated in shell scripts.
For example, here is a shell script time.filter to capture the output from time and write only the real time to a log file (default log.file, configurable by providing an alternative log file name as the first argument):
#!/bin/sh
output="${1:-log.file}"
[ "$#" -gt 0 ] && shift   # only shift if a log file name was given
sed -E '/^real[[:space:]]+(([0-9]+m)?[0-9]+[.][0-9]+s?)/{ s//\1/; w '"$output"'
d;}
/^(user|sys)[[:space:]]+(([0-9]+m)?[0-9]+[.][0-9]+s?)/d' "$@"
This assumes your sed uses -E to enable extended regular expressions.
The first line of the script finds the line containing the real label and the time after it (in a number of possible formats, but not all). It accepts an optional minutes value such as 60m05.003s, or just a seconds value 5.00s, or just 5.0 (POSIX formats; at least one digit after the decimal point is required). It captures the time part and writes it to the chosen file (by default, log.file; you can specify an alternative name as the first argument on the command line). Note that even GNU sed treats everything after the w command as the file name; you have to continue the d (delete) command and the closing brace } on a new line. GNU sed does not require the semicolon after d; BSD (macOS) sed does. The second line recognizes and deletes the lines reporting the user and sys times. Everything else is passed through unaltered.
The script processes any files you give it after the log file name, or standard input if you give it none. A better command line notation would use an explicit option (-l logfile) and getopts to specify the log file.
With that in place, we can devise a program that reports to standard error and standard output — my_command:
echo "nonsense: error: positive numbers are required for argument 1" >&2
dribbler -s 0.4 -r 0.1 -i data -t
echo "apoplexy: unforeseen problems induced temporary amnesia" >&2
You could use cat data instead of the dribbler command. The dribbler command as shown reads lines from data and writes them to standard output, with a random Gaussian-distributed delay between lines. The mean delay is 0.4 seconds; the standard deviation is 0.1 seconds. The other two lines are pretending to be commands that report errors to standard error.
My data file contained a nonsense 'poem' called 'The Great Panjandrum'.
With this background in place, we can run the command and capture the real time in log.file, delete (ignore) the user and system time values, while sending the rest of standard error to standard error by using:
$ (time my_command) 2> >(tee raw.stderr | time.filter >&2)
nonsense: error: positive numbers are required for argument 1
So she went into the garden
to cut a cabbage-leaf
to make an apple-pie
and at the same time
a great she-bear coming down the street
pops its head into the shop
What no soap
So he died
and she very imprudently married the Barber
and there were present
the Picninnies
and the Joblillies
and the Garyulies
and the great Panjandrum himself
with the little round button at top
and they all fell to playing the game of catch-as-catch-can
till the gunpowder ran out at the heels of their boots
apoplexy: unforeseen problems induced temporary amnesia
$ cat log.file
0m7.278s
(The time taken is normally between 6 and 8 seconds. There are 17 lines, so you'd expect it to take around 6.8 seconds at 0.4 seconds per line.) The blank line is from time; it is pretty hard to remove that blank line, and only that blank line, especially as POSIX says it is optional. It isn't worth it.

Bash adding unknown extra characters to command in script

I am currently trying to create a script that executes a program 100 times with different parameters. Typically pretty simple, but it's adding strange characters into the output filename that is passed in the command call to the program. The script I have written goes as follows:
#!/bin/bash
for i in {1..100}
do ./generaterandomizedlist 10 input/input_10_$i.txt
done
I've taken a small screenshot of the output file name here:
https://imgur.com/I855Hof
(The extra characters are not recognized by Chrome, so simply pasting the name doesn't work.)
It doesn't do this when I manually run the command issued in the script. Any ideas?
Your script has some stray CRs (carriage returns, i.e. DOS/Windows line endings) in it; the trailing CR on each line becomes part of the generated file name. Use dos2unix or tr to fix it.
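For example (assuming the script is named generate.sh):
cat -v generate.sh                    # stray CRs show up as ^M
dos2unix generate.sh                  # convert CRLF line endings in place
tr -d '\r' < generate.sh > fixed.sh   # or strip the CRs with tr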

Bash script output of a top command

When I execute command
top -c -b -n | grep XXX > TEST
on the Red Hat Linux command line, all the paths and the rest of the output from the top command land in the TEST file (single lines of well over 100 columns/characters, for example).
However, when I put the same command in a script and run it via crontab, the output is truncated at the 81st column/character every time.
How can I get the full-width information in the cron script?
The explanation is in section 1 of man top:
-w :Output-width-override as: -w [ number ]
In Batch mode, when used without an argument top will
format output using the COLUMNS= and LINES= environment
variables, if set. Otherwise, width will be fixed at
the maximum 512 columns. With an argument, output
width can be decreased or increased (up to 512) but the
number of rows is considered unlimited.
In normal display mode, when used without an argument
top will attempt to format output using the COLUMNS=
and LINES= environment variables, if set. With an
argument, output width can only be decreased, not
increased. Whether using environment variables or an
argument with -w, when not in Batch mode actual terminal
dimensions can never be exceeded.
Note: Without the use of this command-line option, output
width is always based on the terminal at which top
was invoked whether or not in Batch mode.
COLUMNS and LINES are set by your terminal when you start it and when you resize its window.
You could try the following:
COLUMNS=1000 top -c -b -n 1 | grep XXX > TEST
This will ensure that COLUMNS environment variable value is set to a large value.
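In a crontab that could look something like this (schedule and paths are illustrative); cron jobs run without a terminal, so COLUMNS is never set unless you set it yourself:
*/5 * * * * COLUMNS=1000 top -c -b -n 1 | grep XXX > /path/to/TEST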
Courtesy: ServerFault

Bash command line and input limit

Is there some sort of character limit imposed in bash (or other shells) for how long an input can be? If so, what is that character limit?
I.e. Is it possible to write a command in bash that is too long for the command line to execute?
If there is not a required limit, is there a suggested limit?
The limit for the length of a command line is not imposed by the shell, but by the operating system. This limit is usually on the order of hundreds of kilobytes. POSIX denotes this limit ARG_MAX, and on POSIX-conformant systems you can query it with
$ getconf ARG_MAX # Get argument limit in bytes
E.g. on Cygwin this is 32000, and on the different BSDs and Linux systems I use it is anywhere from 131072 to 2621440.
If you need to process a list of files exceeding this limit, you might want to look at the xargs utility, which calls a program repeatedly with a subset of arguments not exceeding ARG_MAX.
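For example (path illustrative), this pattern stays under ARG_MAX no matter how many files match, because xargs splits the list into suitably sized batches:
find /var/log/myapp -name '*.log' -print0 | xargs -0 rm --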
To answer your specific question: yes, it is possible to attempt to run a command with too long an argument list. The shell will fail with a message along the lines of "Argument list too long".
Note that the input to a program (as read on stdin or any other file descriptor) is not limited in this way (only by available program resources). So if your shell script reads a string into a variable, you are not restricted by ARG_MAX. The restriction also does not apply to shell builtins.
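A quick sketch of that distinction (on a typical Linux system the last line fails): building a huge string in a variable involves no exec, so ARG_MAX only bites when the string is handed to an external command as an argument:
big=$(head -c 10000000 /dev/zero | tr '\0' 'x')   # ~10 MB string, no exec
echo "${#big}"     # builtin echo: fine, prints 10000000
/bin/echo "$big"   # external command: "Argument list too long"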
Ok, Denizens. So I have accepted the command line length limits as gospel for quite some time. So, what to do with one's assumptions? Naturally: check them.
I have a Fedora 22 machine at my disposal (meaning: Linux with bash 4). I created a directory with 500,000 inodes (files) in it, each with an 18-character name. The total command-line length comes to 9,500,000 characters. Created thus:
seq 1 500000 | while read digit; do
touch $(printf "abigfilename%06d\n" $digit);
done
And we note:
$ getconf ARG_MAX
2097152
Note however I can do this:
$ echo * > /dev/null
But this fails:
$ /bin/echo * > /dev/null
bash: /bin/echo: Argument list too long
I can run a for loop:
$ for f in *; do :; done
which is another shell builtin.
Careful reading of the documentation for ARG_MAX states: "Maximum length of argument to the exec functions". This means: without calling exec, there is no ARG_MAX limitation, which explains why shell builtins are not restricted by ARG_MAX.
And indeed, I can ls my directory if my argument list is 109,948 files long, or about 2,089,000 characters (give or take). Once I add one more 18-character filename, though, I get an Argument list too long error. So ARG_MAX is working as advertised: the exec fails with more than ARG_MAX characters on the argument list, including, it should be noted, the environment data.
There is a buffer limit of something like 1024 characters. A plain read will simply hang mid-paste or mid-input. To solve this, use the -e option.
http://linuxcommand.org/lc3_man_pages/readh.html
-e use Readline to obtain the line in an interactive shell
Change your read to read -e and the annoying input hang goes away.
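A minimal sketch:
# read -e hands input to Readline, which is not subject to the tty
# driver's canonical-mode line buffer.
read -e -p "Paste a long line: " line
printf 'got %d characters\n' "${#line}"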
In the old days, tcsh had a limit of 1024 characters per command line, which made it difficult if you had a very long $PATH. I was forced to rebuild a private version of tcsh with the buffer size increased to allow users to have long $PATH settings. That was 2 decades ago. That was when I gave up using tcsh, and switched to zsh which did not have that limitation. Now I just use plain old bash because it is good enough.

What does & do at the end of a wc command?

I am learning the bash environment and cannot understand what I get when running this command:
wc filename.txt &
It prints a pair of integers: a one-digit integer in brackets and another, much bigger integer. Neither matches any result I can get from wc's options (-l, -m, -w, -c); the second integer is, for example, much bigger than the byte count. So I really wonder.
I browsed forums and found explanations of the various uses of the ampersand in a Unix/Linux environment, but nothing I could relate to this.
I don't need it, but I won't flush this mystery away; I wish to understand!
Thanks
I imagine the integers you see are similar to this:
[1] 1830
& launches a command in the background, and the shell prints its job number (1) and process id (1830). On a longer-running job, you can use those two numbers to control its execution. See the JOB CONTROL section of the bash man page for more details.
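For example (job number and PID are illustrative):
$ wc filename.txt &
[1] 1830
$ jobs       # list background jobs
$ fg %1      # bring job 1 back to the foreground
$ kill %1    # or terminate it by job number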
An ampersand at the end of the wc command tells the shell to start executing the command in the background and to return immediately, ready for further commands.
