zsh: argument list too long: sudo

I have a command I need to run in which one of the args is a list of comma-separated ids. The list has over 50k ids in it. I've stored the list of ids in a file and I'm running the command in the following way:
sudo ./mycommand --ids `cat /tmp/ids.txt`
However, I get the error zsh: argument list too long: sudo.
This, I believe, is because the kernel has a maximum size for the argument list it will accept. One option for me is to manually split the file into smaller pieces (since the ids are comma-separated, I can't just break the file at even intervals) and then run the command once for each piece.
Is there a better approach?
ids.txt file looks like this:
24342,24324234,122,54545,565656,234235

Converting comments into a semi-coherent answer.
The file ids.txt contains a single line of comma-separated values, and the total size of the file can be too big to pass as the argument list to a program.
Under many circumstances, using xargs is the right answer, but it relies on being able to split the input up into manageable chunks of work, and it must be OK to run the program several times to get the job done.
In this case, xargs doesn't help because of the size and format of the file.
It isn't stated absolutely clearly that all the values in the file must be processed in a single invocation of the command. It also isn't absolutely clear whether the list of numbers must all be in a single argument or whether multiple arguments would work instead. If multiple invocations are not an issue, it is feasible to reformat the file so that xargs can split it into manageable chunks. If need be, each chunk can even be rejoined into a single comma-separated argument.
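For illustration, a sketch of that reformatting approach (it assumes multiple invocations are acceptable, and the chunk size of 5000 is arbitrary):
tr ',' '\n' < /tmp/ids.txt |      # one ID per line, so xargs can split the list
    xargs -n 5000 |               # group up to 5000 IDs per line (xargs runs echo)
    while read -r chunk; do
        sudo ./mycommand --ids "${chunk// /,}"    # rejoin each chunk with commas
    done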
However, it appears that these options are not acceptable. In that case, something has to change.
If you must supply a single argument that is too big for your system, you're hosed until you change something — either the system parameters or your program.
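You can see the limit on your system (it is reported in bytes) with:
getconf ARG_MAX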
Changing the program is usually easier than reconfiguring the o/s, especially once you take into account having to redo that reconfiguration after every upgrade of the o/s.
One option worth reviewing is changing the program to accept a file name instead of the list of numbers on the command line:
sudo ./mycommand --ids-list=/tmp/ids.txt
and the program opens the file and reads the ID numbers from it. Note that this preserves the existing --ids …comma,separated,list,of,IDs notation. The use of the = is optional; a space also works.
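For what it's worth, the reading side can be trivial. A hypothetical sketch, if mycommand were itself a shell script (process_id is a made-up per-ID handler, and $idfile holds the name given to --ids-list):
IFS=',' read -r -a ids < "$idfile"    # split the single line on commas
for id in "${ids[@]}"; do
    process_id "$id"                  # hypothetical per-ID handler
done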
Indeed, many programs work on the basis that the arguments provided to them are the names of files to be processed (the Unix filter programs — think grep, sed, sort, cat, …), so simply using:
sudo ./mycommand /tmp/ids.txt
might be sufficient, and you could have multiple files in a single invocation by supplying multiple names:
sudo ./mycommand /tmp/ids1.txt /tmp/ids2.txt /tmp/ids3.txt …
Each file could be processed in turn. Whether the set of files constitutes a single batch operation or each file is its own batch operation depends on what mycommand is really doing.

Related

How can I pass a file argument to my bash script twice and modify the filename

We have a large number of files in a directory which need to be processed by a program called process, which takes two arguments, infile and outfile.
We want to name the outfile after infile and add a suffix. E.g. for processing a single file, we would do:
process somefile123 somefile123-processed
How can we process all files at once from the command line?
(This is in a bash command line)
As @Cyrus says in the comments, the usual way to do that would be:
for f in *; do process "$f" "${f}-processed" & done
However, that may be undesirable and lead to a "thundering herd" of processes if you have thousands of files, so you might consider GNU Parallel, which:
is more controllable,
can give you progress reports,
is easier to type,
and by default runs one process per CPU core to keep them all busy. So, that would become:
parallel process {} {}-processed ::: *
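GNU Parallel's usual options apply here too; for example, to cap the job count at four and show a progress bar:
parallel -j4 --bar process {} {}-processed ::: *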

Multiple bash scripts to execute at a time in a single command

I have several bash scripts which will give output with options.
./script_1 list
./script_2 list
./script_3 list
./script_9 list
and these numbers differ between servers, but on every server the names start with "script".
Now I want to run all these scripts together with the same option, list. I need something like ./script_* list, or a command with ls or awk or anything else.
Running the processes in the background does not solve my problem, as I have embedded this in another script.
for i in ./script_*; do
    "$i" list
done
If this is not the use case you're seeking, you'll have to be more specific in your question.

What is the standard usage argument style?

I'm making some command-line tools for some research I'm doing. I'd like these tools to follow commonly used conventions regarding command line programs in Unix.
Should I use flags or just list parameters?
program one two three
program -a one -b two -c three
Where in the list of commands does the input file normally go, or is it better to < it into the program?
What about the output filename?
Should I specify the file extension for the output format, or have my program automatically put the correct extension on?
When the user enters an invalid command, is there a prototypical "correct usage" message?
Is "--help" or "-h" required?
Also, is there some sort of header file I can include that would help with managing these?
If you're looking for a "standard", then you could do worse than look at GNU's Standards for Command Line Interfaces. Other standards are available.
As far as coding for this goes, take a look at boost::program_options. Not only will this save you from rolling a lot of your own code, but it also does a good job of formatting the options for presentation to the user (the prototypical "correct usage" message you asked for).
In answer to your specific questions:
Where in the list of commands does the input file normally go, or is it better to < it into the program?
I would expect these to come at the end of a command line. Like in GNU grep. If you are only processing one file and would like to make stdin available as an input source, that would not surprise most users.
If your command processes lots of files, then it would be unusual to have to specify a switch before the filenames. Think cat.
What about the output filename?
A -o or --output option is fairly common. If your program takes exactly one input and one output, then program inputfile outputfile would not surprise many users. If no output file is specified, perhaps you'll output to stdout; that would not be unusual behaviour and would allow your users to pipe the output through other commands (such as grep, less, etc.). They could also redirect stdout to a file using >.
Should I specify the file extension for the output format, or have my program automatically put the correct extension on?
This is probably a matter for debate. If I specified an output filename, I would expect to find that file created (or replaced, after a prompt) without the program changing the name.
When the user enters an invalid command, is there a prototypical "correct usage" message?
Using GNU grep as an example again:
grep: unrecognized option '--incorrect'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
This wouldn't surprise too many users and points them in the right direction if they've made a typo without swamping them with information.
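If you end up hand-rolling such a message in a shell tool rather than getting it from a library like boost::program_options, a minimal sketch of the same pattern (the option and operand names are illustrative):
usage() {
    echo "Usage: ${0##*/} [OPTION]... PATTERN [FILE]..." >&2
    echo "Try '${0##*/} --help' for more information." >&2
    exit 2    # conventional exit status for usage errors
}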
Is "--help" or "-h" required?
That depends on your customer! I find it frustrating when this option isn't available.
Generally speaking, flags are there for providing options and parameters are for passing information. If you have input and output files as command-line arguments, use flags like -i and -o, so their order will not matter. -h is required if you want to (and need to) provide documentation.

Speeding up file comparisons (with `cmp`) on Cygwin?

I've written a bash script on Cygwin which is rather like rsync, although different enough that I believe I can't actually use rsync for what I need. It iterates over about a thousand pairs of files in corresponding directories, comparing them with cmp.
Unfortunately, this seems to run abysmally slowly -- taking about ten (Edit: actually 25!) times as long as it takes to generate one of the sets of files using a Python program.
Am I right in thinking that this is surprisingly slow? Are there any simple alternatives that would go faster?
(To elaborate a bit on my use-case: I am autogenerating a bunch of .c files in a temporary directory, and when I re-generate them, I'd like to copy only the ones that have changed into the actual source directory, leaving the unchanged ones untouched (with their old creation times) so that make will know that it doesn't need to recompile them. Not all the generated files are .c files, though, so I need to do binary comparisons rather than text comparisons.)
Maybe you should use Python to do some - or even all - of the comparison work too?
One improvement would be to only bother running cmp if the file sizes are the same; if they're different, clearly the file has changed. Instead of running cmp, you could think about generating a hash for each file, using MD5 or SHA1 or SHA-256 or whatever takes your fancy (using Python modules or extensions, if that's the correct term). If you don't think you'll be dealing with malicious intent, then MD5 is probably sufficient to identify differences.
Even in a shell script, you could run an external hashing command, and give it the names of all the files in one directory, then give it the names of all the files in the other directory. Then you can read the two sets of hash values plus file names and decide which have changed.
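In a shell script that could look something like this (a sketch: it assumes GNU md5sum, flat directories holding the same file names, and dir-old/ and dir-new/ are illustrative):
( cd dir-old && md5sum * ) > /tmp/old.sums
( cd dir-new && md5sum * ) > /tmp/new.sums
# any line that differs names a file whose contents changed
diff /tmp/old.sums /tmp/new.sums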
Yes, it does sound like it is taking too long. But the trouble includes having to launch 1000 copies of cmp, plus the other processing. Both the Python and the shell script suggestions above have in common that they avoid running a program 1000 times; they try to minimize the number of programs executed. This reduction in the number of processes executed will give you a pretty big bang for your buck, I expect.
If you can keep the hashes from 'the current set of files' around and simply generate new hashes for the new set of files, and then compare them, you will do well. Clearly, if the file containing the 'old hashes' (current set of files) is missing, you'll have to regenerate it from the existing files. This is slightly fleshing out information in the comments.
One other possibility: can you track changes in the data that you use to generate these files and use that to tell you which files will have changed (or, at least, limit the set of files that may have changed and that therefore need to be compared, as your comments indicate that most files are the same each time)?
If you can reasonably do the comparison of a thousand odd files within one process rather than spawning and executing a thousand additional programs, that would probably be ideal.
The short answer: Add --silent to your cmp call, if it isn't there already.
You might be able to speed up the Python version by doing some file size checks before checking the data.
First, a quick-and-hacky bash(1) technique that might be far easier if you can change to a single build directory: use the bash -N test:
$ echo foo > file
$ if [ -N file ] ; then echo newer than last read ; else echo older than last read ; fi
newer than last read
$ cat file
foo
$ if [ -N file ] ; then echo newer than last read ; else echo older than last read ; fi
older than last read
$ echo blort > file # regenerate the file here
$ if [ -N file ] ; then echo newer than last read ; else echo older than last read ; fi
newer than last read
$
Of course, if some subset of the files depend upon some other subset of the generated files, this approach won't work at all. (This might be reason enough to avoid this technique; it's up to you.)
Within your Python program, you could also check the file sizes using os.stat() to determine whether or not you should call your comparison routine; if the files are different sizes, you don't really care which bytes changed, so you can skip reading both files. (This would be difficult to do in bash(1) -- I know of no mechanism to get the file size in bash(1) without executing another program, which defeats the whole point of this check.)
The cmp program will do the size comparison internally if, and only if, you are using the --silent flag, both files are regular files, and both files are positioned at the same place (their starting offsets can be set via the --ignore-initial flag). If you're not using --silent, add it and see what the difference is.
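Putting the pieces together, a minimal sketch of the copy-if-changed pass (gen/ and src/ are illustrative directory names):
for f in gen/*; do
    base=${f##*/}
    # a missing target also compares unequal, so brand-new files get copied
    if ! cmp --silent "$f" "src/$base" 2>/dev/null; then
        cp "$f" "src/$base"    # only changed files get fresh timestamps
    fi
done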

Get last bash command including pipes

I wrote a script that's retrieving the currently run command using $BASH_COMMAND. The script is basically doing some logic to figure out the current command and file being opened for each tmux session. Everything works great, except when the user runs a piped command (e.g. cat file | less), in which case $BASH_COMMAND only seems to store the first command before the pipe. As a result, instead of showing the command as less[file] (which is the actual program that has the file open), the script outputs it as cat[file].
One alternative I tried is relying on history 1 instead of $BASH_COMMAND. There are a couple of issues with this alternative as well. First, it does not auto-expand aliases like $BASH_COMMAND does, which in some cases could confuse the script (for example, if I tell it to ignore ls but use ll instead (aliased to ls -l), the script will not ignore the command and will process it anyway), and adding extra conditionals for each alias doesn't seem like a clean solution. The second problem is that I'm using HISTIGNORE to filter out some common commands which I still want the script to be aware of; using history will make the script miss the last command unless history tracks it.
I also tried using ${#PIPESTATUS[@]} to see if the array length is 1 (no pipes) or higher (pipes used, in which case I would retrieve the history instead), but it always seems to report just one command as well.
Is anyone aware of other alternatives that could work for me (such as another variable that would store $BASH_COMMAND for the other subcalls that are to be executed after the current subcall is complete, or some way to be aware if the pipe was used in the last command)?
I think you will need to change your implementation a bit and use the history command to get this to work. Also, use the alias command to list all of the configured aliases, and the which command to check whether a command is actually stored in any directory on PATH. Good luck.
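For what it's worth, a history-based variant that does keep the whole pipeline is fc; a sketch, run from the interactive shell whose commands you want to inspect (it is still subject to the alias and HISTIGNORE caveats in the question):
fc -ln -1 | sed 's/^[[:space:]]*//'    # last command line, pipes included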
