Why is xargs not replacing the second {} - xargs

I'm using xargs to try to echo the name of a file, followed by its contents. Here is the command
find output/ -type f | xargs -I {} sh -c "echo {} ; cat {}"
However, for some reason, the second {} (the one after cat) is not being replaced. This happens only for some files; others work correctly.
To be clear, I'm not looking for a command that lets me echo the name of a file followed by its contents, I'm trying to understand why this specific command does not work.

It turns out that the command was too long: it worked with shorter file names and failed with longer ones. From man xargs:
-I replstr
Execute utility for each input line, replacing one or more occurrences of replstr in up to replacements (or 5 if no -R flag is specified) arguments to utility with the
entire line of input. The resulting arguments, after replacement is done, will not be allowed to grow beyond 255 bytes; this is implemented by concatenating as much
of the argument containing replstr as possible, to the constructed arguments to utility, up to 255 bytes. The 255 byte limit does not apply to arguments to utility
which do not contain replstr, and furthermore, no replacement will be done on utility itself. Implies -x.

The root cause of the problem is pointed out in Carlos' answer, but without a solution.
After some googling, I couldn't find a way to lift the 255-byte limit.
So a possible workaround is to use a shell variable for the substitution.
Example:
find . | xargs -I% sh -c 'F="%";iconv -f gb2312 -t utf-8 "$F">"$F.out";mv "$F.out" "$F"'
Remember to use single quotes around the outermost sh -c parameter string; we don't want the $F inside to be expanded by the parent shell.
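A variant of the same idea, sketched here assuming a POSIX sh: keep {} out of the script text entirely and hand the file name to sh as a positional parameter, so the name is never re-parsed as shell code. On an xargs with the 255-byte limit quoted above, the limit should then apply to the file name alone rather than to the script plus the file name, though that has not been checked on every implementation. The demo_dir directory and the file in it are made up for illustration.

```shell
# Throwaway demo data (hypothetical): one file with a fairly long name.
demo_dir=$(mktemp -d)
long_name="$demo_dir/a_rather_long_file_name_used_only_for_this_demo.txt"
printf 'hello\n' > "$long_name"

# {} is a standalone argument; inside the script it is available as "$1".
# The second "sh" only fills $0.
result=$(find "$demo_dir" -type f | xargs -I {} sh -c 'echo "$1"; cat "$1"' sh {})
printf '%s\n' "$result"

rm -rf "$demo_dir"
```

Because the script text is constant, file names containing quotes or $ are passed through untouched.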

Is it files with white space in the name that create problems? Try adding \", like this:
find output/ -type f | xargs -I {} sh -c "echo \"{}\" ; cat \"{}\""
This worked for me using Bash.

Related

How to find misspelled words in a bunch of files

I have around 10k Java files, and I need to find misspelled words in the double-quoted strings in those files.
The following gives me the strings in double quotes:
find . -name "*.java" -exec grep -Po '".*?"' {} \;
But I do not know how to run spell on top of this.
I only have Linux and ispell available, so if you are not on Linux the following might not work for you as is. If you just want to find misspelled words and get suggestions listed, you could use
find . -name "*.java" -exec grep -Po '"([^"\\]|\\.)*"' {} \; \
| ispell -a -S
The -a selects pipe-mode, -S disables sorting which tends to list better replacements first.
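To see why the escape-aware pattern is used here instead of the lazy '".*?"' from the question, here is a small sketch on a made-up Demo.java line that contains escaped quotes. It assumes a grep built with -P (PCRE) support, as in the question.

```shell
demo_dir=$(mktemp -d)
cat > "$demo_dir/Demo.java" <<'EOF'
String s = "say \"hi\" now"; String t = "ok";
EOF

# Lazy pattern from the question: stops at the first quote, escaped or not.
naive=$(grep -Po '".*?"' "$demo_dir/Demo.java")
# Escape-aware pattern from the answer: treats \" as part of the literal.
aware=$(grep -Po '"([^"\\]|\\.)*"' "$demo_dir/Demo.java")

printf 'naive:\n%s\n\naware:\n%s\n' "$naive" "$aware"
rm -rf "$demo_dir"
```

The lazy pattern cuts the first literal short at the escaped quote, while the second pattern returns each string literal whole.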
If you want to fix the strings in-place, then you may want to use something like
TEMP=$(mktemp)
find . -name "*.java" | xargs grep -l '"...*"' \
| xargs echo /usr/bin/ispell -F ./so20836228-java-deformatter.sh > "$TEMP"
source "$TEMP"
This generates spell-checking commands which use the following ispell Java "deformatter":
#!/bin/sh
# Experimental Java ispell deformatter: use at your own risk!
/bin/sed -e '1,$ {
# introduce per-character state
s/\(.\)/\1_/g
# mark string literals
s/"_\(\(\([^"\\]_\|\\_._\)\)*\)"_/"B\1"E/g
# wipe out chars before string literals
:b s/._\(.\)B/ B\1B/g ; t b
# wipe out chars after string literals
:e s/\(.\)E._/\1E E/g ; t e
# remove per-character state
s/\(.\)./\1/g
# get rid of escape sequences
s/\\./ /g
}'
Use this experimental deformatter at your own risk.
Back up files before you work on them.
(Errors in the deformatter may damage spell-checked files.
See ispell manual page:
The program must produce exactly one character of output for each character of input, or ispell will lose synchronization and corrupt the output file.
)

Remove everything but one file tcsh

I basically want to do the following bash command but in tcsh:
rm !(file1)
Thanks
You can use ls -1 (that's the number one, not the lowercase letter L) to list one file per line, and then use grep -vx <pattern> to exclude (-v) lines that exactly (-x) match <pattern>, and then xargs it to your command, rm. For example,
ls -1 | grep -vx file1 | xargs rm
In case your version of grep doesn't support the -x option, you can use anchors:
ls -1 | grep -v '^file1$' | xargs rm
To use this with commands other than rm that may not take an arbitrary number of arguments, remember to add the -n 1 option to xargs so that arguments are handled one by one:
ls -1 | grep -v '^file1$' | xargs -n 1 rm
I believe you can also achieve this with find by negating its -name test (for example, ! -name file1), so that find itself does the excluding, though you'll still have to pipe the results to xargs.
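A sketch of the find variant mentioned above, run in a scratch directory so that nothing real is deleted. It assumes a find that supports -maxdepth (GNU and BSD find do; it is not in POSIX), and the file names are made up.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/file1" "$demo_dir/file2" "$demo_dir/other file"

# ! negates the -name test; -maxdepth 1 keeps find out of subdirectories.
# -print0 / -0 keep names with spaces intact.
find "$demo_dir" -maxdepth 1 -type f ! -name file1 -print0 | xargs -0 rm --

remaining=$(ls "$demo_dir")
printf '%s\n' "$remaining"
rm -rf "$demo_dir"
```

Replacing rm -- with echo first is a cheap way to preview what would be removed.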
tcsh has a special ^ syntax for glob patterns (not supported in csh, sh, or bash). Prefixing a glob pattern with ^ negates it, causing it to match all file names that don't match the pattern.
Quoting the tcsh manual:
An entire glob-pattern can also be negated with `^':
> echo *
bang crash crunch ouch
> echo ^cr*
bang ouch
A single file name is not a glob pattern, and so the ^ prefix doesn't apply to it, but it can be turned into one by, for example, surrounding the first character with square brackets.
So this:
rm ^[f]ile1
should remove all files in the current directory other than file1.
I strongly recommend testing this before using it, either by using an echo command first:
echo ^[f]ile1
or by using Ctrl-X * to expand the pattern to a list of files before hitting Enter.
UPDATE: I've since learned that bash supports similar functionality but with a different syntax. In bash, !(PATTERN) matches anything not matched by the pattern. This is not recognized unless the extglob shell option is enabled. Unlike tcsh's ^ syntax, the pattern can be a single file name. This isn't relevant to what you're asking, but it could be useful if you ever decide to switch to bash.
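For reference, a minimal sketch of the bash form, run in a scratch directory with made-up file names. It assumes bash is installed; -O extglob enables the option at startup so that !(file1) is recognized when the command string is parsed.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/file1" "$demo_dir/file2" "$demo_dir/file3"

# Start a bash with extglob enabled and remove everything except file1.
( cd "$demo_dir" && bash -O extglob -c 'rm -- !(file1)' )

kept=$(ls "$demo_dir")
printf '%s\n' "$kept"
rm -rf "$demo_dir"
```

In an interactive bash session the equivalent would be shopt -s extglob followed by the rm command on a later line.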
zsh probably has something similar.

xargs input involving spaces

I am working on a Mac using OSX and I'm using bash as my shell. I have a script that goes something to the effect of:
VAR1="pass me into parallel please!"
VAR2="oh me too, and there's actually a lot of us, but its best we stay here too"
printf "%s\n" {0..249} | xargs -0 -P 8 -n 1 . ./parallel.sh
I get the error: xargs: .: Permission denied. The purpose is to run another script in parallel (called parallel.sh) which gets fed the numbers 0-249. Additionally, I want to make sure that parallel.sh can see and use VAR1 and VAR2. But when I try to source the script with . ./parallel.sh, xargs doesn't like that. The point of sourcing is that the script has other variables I want parallel.sh to have access to.
I have read something about using -print0, since xargs separates its inputs by spaces, but I really didn't understand what -print0 does or how to use it. Thanks for any help you can offer.
If you want several processes running the script, then they can't be part of the parent process and therefore they can't access the exact same variables. However, if you export your variables, then each process can get a copy of them:
export VAR1="pass me into parallel please!"
export VAR2="oh me too, and there's actually a lot of us, but its best we stay here too"
printf "%s\n" {0..249} | xargs -P 8 -n 1 ./parallel.sh
Now you can just drop the extra dot since you aren't sourcing the parallel.sh script, you are just running it.
Also there is no need to use -0 since your input is just a series of numbers, one on each line.
To avoid the space problem, I'd use the newline character as the separator for xargs with the -d option (a GNU xargs extension):
xargs -d '\n' ...
I think you have a permission issue; try giving the file parallel.sh execute permission.
The command works fine for me:
Kaizen ~/so_test $ printf "%s\n" {0..4} | xargs -0 -P 8 -n 1 echo
0
1
2
3
4
man find :
-print0
True; print the full file name on the standard output, followed by a
null character (instead of the newline character that -print uses).
This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find
output. This option corresponds to the -0 option of xargs.
For how to use -print0, check out this Stack Overflow question:
Capturing output of find . -print0 into a bash array
The issue of passing arguments is related to xargs' interpretation of white space. From the xargs man page:
-0 Change xargs to expect NUL (``\0'') characters as separators, instead of spaces and newlines.
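A small sketch of the difference, using made-up input strings that contain spaces. By default xargs splits on blanks and newlines; with -0 only NUL separates items.

```shell
# Default splitting: "a b" and "c d" become four separate arguments.
by_space=$(printf '%s\n' "a b" "c d" | xargs -n 1 printf '[%s]\n')
# NUL-separated input with -0: each item stays whole.
by_nul=$(printf '%s\0' "a b" "c d" | xargs -0 -n 1 printf '[%s]\n')

printf 'default:\n%s\n\nwith -0:\n%s\n' "$by_space" "$by_nul"
```

For plain numbers, one per line, the two modes only differ in that -0 would treat the whole newline-separated input as a single item.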
The issue of environment variables can be solved by using export to make the variables available to subprocesses:
say.sh
echo "$1 $V"
result
bash$ export V=whatevs
bash$ printf "%s\n" {0..3} | xargs -P 8 -n 1 ./say.sh
1 whatevs
2 whatevs
0 whatevs
3 whatevs

To understand xargs better

I want to understand the use of xargs man in Rampion's code:
screen -t man /bin/sh -c 'xargs man || read'
Thanks to Rampion: we do not need cat!
Why do we need xargs in the command?
I understand the xargs -part as follows
cat nothing to xargs
xargs makes a list of man -commands
I have had an idea that xargs makes a list of commands. For instance,
find . -type f -print0 | xargs -0 grep masi
is the same as a list of commands:
find fileA AND grep masi in it
find fileB AND grep masi in it
and so on for fileC, fileD, ...
No, I don't cat nothing. I cat whatever input I get after I run the command. cat is actually extraneous here, so let's ignore it.
xargs man waits on user input. Which is necessary. Since in the script you grabbed that from, I can't paste in the argument for man until after I create the window. So the command that runs in the window needs to wait for me to give it something, before it tries to run man.
If we just ran screen /bin/sh -c 'man || read', it would always complain "What manual page do you want?" since we never told it.
xargs gathers arguments from stdin and executes the command given with those arguments.
So cat is waiting for something to be typed, and then xargs runs man with that input.
xargs is useful if you have a lot of files to process, I often use it with output from find.
xargs will stuff as many arguments as it can onto the command line.
It's great for doing something like
find . -name '*.o' -print | xargs rm
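To see the batching, compare what xargs runs with and without -n 1. This is only a sketch: echo rm prints the command line instead of running it, and the .o names are made up.

```shell
# One invocation receives all three arguments.
batched=$(printf '%s\n' a.o b.o c.o | xargs echo rm)
# With -n 1 the command is run once per argument.
one_each=$(printf '%s\n' a.o b.o c.o | xargs -n 1 echo rm)

printf 'batched:\n%s\n\none at a time:\n%s\n' "$batched" "$one_each"
```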
The cat command does not operate on nothing; it operates on standard input, up until it is told that the input is ended. As Rampion notes, the cat command is not necessary here, but it is operating on its implicit input (standard input), not on nothing.
The xargs command reads the output from cat, and groups the information into arguments to the man command specified as its (only) argument. When it reaches a limit (configurable on the command line), it will execute the man command.
The find ... -print0 | xargs -0 ... idiom deals with file names that contain awkward characters such as blanks, tabs and newlines. The find command prints each filename followed by an ASCII NUL ('\0'); this is one of two characters that cannot appear in a simple file name - the other being '/' (which appears in path names, of course, but not in simple file names). It is not directly equivalent to the sequence you provide; xargs groups collections of file names into a single argument list, up to a size limit. If the names are short enough (they usually are), then there will be fewer executions of grep than there are file names.
Note, too, that grep only prints the file name where the material is found if it has more than one file to search -- or if it supports an option so that it always prints the file names and the option is used: '-H' is a GNU extension to grep that does this. The portable way to ensure that the file names always appear is to list /dev/null as the first file (so 'xargs grep something /dev/null'); it doesn't take long to search /dev/null.
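The /dev/null trick can be checked with a scratch directory holding a single file (the directory and file name are made up for the demonstration):

```shell
demo_dir=$(mktemp -d)
printf 'masi was here\n' > "$demo_dir/only.txt"

# A single file to search: grep prints just the matching line.
plain=$(find "$demo_dir" -type f -print0 | xargs -0 grep masi)
# /dev/null as an extra file: grep now prefixes the file name.
named=$(find "$demo_dir" -type f -print0 | xargs -0 grep masi /dev/null)

printf '%s\n%s\n' "$plain" "$named"
rm -rf "$demo_dir"
```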

How can I process a list of files that includes spaces in its names in Unix?

I'm trying to list the files in a directory and do something to them in the Mac OS X prompt.
It should go like this: for f in $(ls -1); do echo $f; done
If I have files without spaces in their names (fileA.txt, fileB.txt), the echo works fine.
If the files include spaces in their names ("file A.txt", "file B.txt"), I get 4 strings (file, A.txt, file, B.txt).
I've tried quoting the listing command, but it only changed the problem.
If I do this: for f in "$(ls -1)"; do echo $f; done
I get: file A.txt\nfile B.txt
(It displays correctly, but it is a single string and I need the 2 lines separated.)
Step away from ls if at all possible. Use find from the findutils package.
find /target/path -type f -print0 | xargs -0 your_command_here
-print0 will cause find to output the names separated by NUL characters (ASCII zero). The -0 argument to xargs tells it to expect the arguments separated by NUL characters too, so everything will work just fine.
Replace /target/path with the path under which your files are located.
-type f will only locate files. Use -type d for directories, or omit altogether to get both.
Replace your_command_here with the command you'll use to process the file names. (Note: If you run this from a shell using echo for your_command_here you'll get everything on one line - don't get confused by that shell artifact, xargs will do the expected right thing anyway.)
Edit: Alternatively (or if you don't have xargs), you can use the much less efficient
find /target/path -type f -exec your_command_here \{\} \;
\{\} \; is the shell-escaped form of {} ;. Here {} is the placeholder for the file currently being processed, and ; marks the end of the command. find will then invoke your_command_here with {} replaced by the file name, and since your_command_here is launched by find and not by the shell, the spaces won't matter.
The second version is less efficient because find launches a new process for each and every file found, whereas xargs packs as many file names as it can onto the command line of each process it launches. Prefer the xargs version if you have the choice.
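If xargs is not available, find can do similar batching itself with the + terminator of -exec (specified by POSIX). A minimal sketch with made-up file names; the inline sh script merely reports how many names it received per invocation.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/file A.txt" "$demo_dir/file B.txt"

# With \; the script runs once per file; with + it runs once for the batch.
per_file=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} \;)
batched=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'per file: %s\nbatched: %s\n' "$(echo $per_file)" "$batched"
rm -rf "$demo_dir"
```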
for f in *; do echo "$f"; done
should do what you want. Why are you using ls instead of * ?
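The difference is easy to check in a scratch directory (the file names are the ones from the question; demo_dir is made up, and the default $IFS is assumed):

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/file A.txt" "$demo_dir/file B.txt"

# Unquoted $(ls) is split on whitespace: four words for two files.
ls_count=0
for f in $(ls "$demo_dir"); do ls_count=$((ls_count + 1)); done

# The glob expands to one word per file, spaces included.
glob_count=0
for f in "$demo_dir"/*; do glob_count=$((glob_count + 1)); done

printf 'ls: %s words, glob: %s words\n' "$ls_count" "$glob_count"
rm -rf "$demo_dir"
```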
In general, dealing with spaces in shell is a PITA. Take a look at the $IFS variable, or better yet at Perl, Ruby, Python, etc.
Here's an answer using $IFS as discussed by derobert
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
You can pipe the arguments into read. For example, to cat all files in the directory:
ls -1 | while IFS= read -r FILENAME; do cat "$FILENAME"; done
This means you can still use ls, as you have in your question, or any other command that produces newline-delimited output. IFS= and -r keep leading or trailing whitespace and backslashes in the names intact.
The while loop makes it much easier to do several things to the argument, and makes complex processing more readable in my opinion. A contrived example:
ls -1 | while IFS= read -r FILE
do
echo 1: "$FILE"
echo 2: "$FILE"
done
Look at the --quoting-style option of ls.
For instance, --quoting-style=c would produce:
$ ls --quoting-style=c
"file1" "file2" "dir one"
Check out the manpage for xargs:
it works like this:
ls -1 /tmp/*.jpeg | xargs rm
