I am having trouble with this simple task:
cat file | grep -E ^[0-9]+$ > file_grep
diff file file_grep
The problem is, I want to do this without the intermediate file_grep.
I have tried:
diff file `cat file | grep -E ^[0-9]+$`
and
diff file "`cat file | grep -E ^[0-9]+$`"
and a few other combinations :-) but I can't get it to work.
I always get an error, because diff receives extra arguments: the content of the file as filtered by grep.
Something similar has always worked for me when I wanted to echo a command's output from within a script, using backticks like this:
echo `ls`
Thanks
If you're using bash:
diff file <(grep -E '^[0-9]+$' file)
The <(COMMAND) sequence expands to the name of a pseudo-file (such as /dev/fd/63) from which you can read the output of the command.
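You can see the pseudo-file name for yourself (the exact descriptor number will vary):
echo <(true)
/dev/fd/63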
But for this particular case, ruakh's solution is simpler. It takes advantage of the fact that - as an argument to diff causes it to read its standard input. The <(COMMAND) syntax becomes more useful when both arguments to diff are command output, such as:
diff <(this_command) <(that_command)
The simplest approach is:
grep -E '^[0-9]+$' file | diff file -
The hyphen - as the filename is a specific notation that tells diff "use standard input"; it's documented in the diff man-page. (Most of the common utilities support the same notation.)
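For example, cat accepts it the same way:
echo hello | cat -
which prints hello, reading from the pipe rather than from a named file.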
The reason that backticks don't work is that they capture the output of a command and pass it as an argument. For example, this:
cat `echo file`
is equivalent to this:
cat file
and this:
diff file "`cat file | grep -E ^[0-9]+$`"
is equivalent to something like this:
diff file "123
234
456"
That is, it actually tries to pass 123, 234, and 456, joined by newlines, as a single filename, rather than as the contents of a file. Technically, you could achieve the latter by using Bash's "process substitution" feature, which actually creates a sort of temporary file:
diff file <(cat file | grep -E '^[0-9]+$')
but in your case it's not needed, because of diff's support for -.
grep -E '^[0-9]+$' file | diff - file
where - means "read from standard input".
Try process substitution:
$ diff file <(grep -E "^[0-9]+$" file)
From the bash manpage:
Process Substitution
Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of
naming open files. It takes the form of <(list) or >(list). The process list is run with its input or
output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to
the current command as the result of the expansion. If the >(list) form is used, writing to the file
will provide input for list. If the <(list) form is used, the file passed as an argument should be read
to obtain the output of list.
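A quick sketch of both forms (file is the sample file from the question; upper.txt is a hypothetical output file):
# <(list): read a command's output as if it were a file
wc -l <(grep -E '^[0-9]+$' file)
# >(list): write into a command as if it were a file
echo hello | tee >(tr a-z A-Z > upper.txt)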
In bash, the syntax is:
diff file <(cat file | grep -E '^[0-9]+$')
Related
Okay, I am a newbie to Unix scripting. I was given the task of finding a temporary workaround for this:
cat /directory/filename1.xml |sed -e "s/ABCXYZ/${c}/g" > /directory/filename2.xml
$c is a variable from a sqlplus count query. I totally understand how this sed command works. But here is where I am stuck: I am storing the count associated with the variable in another file called filename3, as count[$c] where $c is replaced with a number. So my question is: how can I update this sed command to substitute ABCXYZ with the count from filename3?
UPDATE: In case anyone has a similar issue I got mine to work using:
rm /directory/folder/variablefilename.dat
echo $c >> /directory/folder/variablefilename.dat
d=$(grep '[0-9]' /directory/folder/variablefilename.dat)
sed -e "s/ABC123/${d}/g" /directory/folder/inputfile.xml >> /directory/folder/outputfile.xml
Thank you to Kaz for pointing me in the right direction.
Store the count in filename3 using the syntax c=number. Then you can source the file as a shell script:
. ./filename3 # get the c variable
sed -e "s/ABCXYZ/${c}/g" /directory/filename1.xml > /directory/filename2.xml
If you can't change the format of filename3, you can write a shell function which scrapes the number out of that file and sets the c variable, or you can scrape the number out with an external program like grep and interpolate its output into a variable assignment using command substitution: the $(command arg ...) syntax.
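Such a function might look like this (a minimal sketch; the grep-based extraction itself is spelled out just below):
get_count() {
    # scrape the digits out of filename3 into c (assumes a single count[NUMBER] line)
    c=$(grep -E -o '[0-9]+' filename3)
}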
Suppose we can rely on filename3 to contain exactly one line of the form count[42]. Then we can just extract the digits with grep -o:
c=$(grep -E -o '[0-9]+' filename3)
sed -e "s/ABCXYZ/$c/g" /directory/filename1.xml > /directory/filename2.xml
The c variable can be eliminated, of course; you can stick the $(grep ...) into the sed command line in place of $c.
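That is, as a one-liner:
sed -e "s/ABCXYZ/$(grep -E -o '[0-9]+' filename3)/g" /directory/filename1.xml > /directory/filename2.xml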
A file which contains numerous instances of syntax like count[42] for various variables could be transformed into a set of shell variable assignments using sed, and then sourced into the current shell to make those assignments happen:
$ sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]\+\)\[\(.*\)\]/\1=\2/p' filename3 > vars.sh
$ . ./vars.sh
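For instance, a filename3 line of count[42] comes out as a ready-to-source assignment:
$ sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]\+\)\[\(.*\)\]/\1=\2/p' <<< 'count[42]'
count=42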
You can use sed like this:
sed -r "s/ABCXYZ/$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' path_to_file)/g" path_to_file
The expression is double-quoted, which allows the shell to run the embedded command substitution; it finds the number inside count[$c] in the file and uses it as the replacement text:
$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' path_to_file)
I am trying to add the lines from a text file to a sed command.
observered_list.txt
Uncaught SlingException
cannot render resource
IncludeTag Error
Recursive invocation
Reference component error
I need it to be coded like the following:
sed '/Uncaught SlingException\|cannot render resource\|IncludeTag Error\|Recursive invocation\|Reference component error/ d'
Help me to do this.
I would suggest you create a sed script and delete each pattern consecutively:
while read -r pattern; do
printf "/%s/ d;\n" "$pattern"
done < observered_list.txt >> remove_patterns.sed
# now invoke sed on the file you want to modify
sed -f remove_patterns.sed file_to_clean
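Given the observered_list.txt above, remove_patterns.sed would contain:
/Uncaught SlingException/ d;
/cannot render resource/ d;
/IncludeTag Error/ d;
/Recursive invocation/ d;
/Reference component error/ d;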
Alternatively you could construct the sed command like this:
pattern=
while read -r line; do
pattern=$pattern'\|'$line
done < observered_list.txt
# strip off the leading and any trailing \|
pattern=${pattern#\\\|}
pattern=${pattern%\\\|}
printf "sed '/%s/ d'\n" "$pattern"
# you still need to invoke the command, it's just printed
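For example, once pattern has been built you can skip the printing step and run sed directly (this relies on GNU sed, whose basic regular expressions support the \| alternation):
sed "/$pattern/ d" file_to_clean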
You can use grep for that:
grep -vFf /file/with/patterns.txt /file/to/process.txt
Explanation:
-v excludes from the output the lines of process.txt that match one of the patterns
-F treats patterns in patterns.txt as fixed strings instead of regexes (looks like this is desired here)
-f reads patterns from patterns.txt
Check man grep for further information.
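For example, with the pattern list from the question above (logfile.txt stands for whatever file you want to clean):
grep -vFf observered_list.txt logfile.txt > cleaned.txt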
I am using the code below to grep some string:
grep 'string' *.log | grep -v 'string1'
I am getting output in a particular file. My requirement is to extract that file name into a variable. How can I do that effectively?
In general, you can capture the output of any command into a shell variable via command substitution like this:
variable=$(command arg1 arg2)
This is appropriate for your particular case if you are sure that there will be only one file name produced by the grep pipeline. In that case, you capture its name into shell variable fname via:
fname=$(grep -lZ string *.log | xargs -0 grep -lv string1)
This is safe for difficult file names because, via the -Z and -0 options, we use NUL-separated lists. The -l option to grep is useful here because it suppresses the normal grep output and just prints the file names.
If there might be multiple file matches, then, if you can use an advanced shell like bash, try:
grep -lZ string *.log | xargs -0 grep -lvZ string1 | while IFS= read -r -d $'\0' fname
do
# Process file "$fname"
done
This is also safe for difficult file names because, throughout the pipeline, it uses NUL-separated lists.
For a POSIX shell, read only works with newline-separated input; the -d option used above to keep the loop safe for difficult file names is supported by bash, zsh, and other advanced shells.
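A POSIX-only sketch, safe as long as the file names contain no newlines:
grep -l string *.log | while IFS= read -r fname
do
    # process file "$fname" here
    printf '%s\n' "$fname"
done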
Use the basename command:
basename "filePath" "fileExtension"
ex: basename /home/john/xyz.txt .txt
output: xyz
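For example, applied to a name captured by the grep pipeline above (assuming fname holds a single .log file name):
basename "$fname" .log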
I am using the below script. When I have it echo $f as shown below, it gives the correct result:
#!/bin/bash
var="\/home\/"
while read p; do
f=$(echo $p | sed "s/${var}/\\n/g")
f=${f%.sliced.bam}.fastq
echo $f
~/bin/samtools view $p | awk '{print "#"$1"\n"$10"\n+\n"$11}' > $f
./run.sh $f ${f%.fastq}
rm ${f%.sliced.bam}.fastq
done < $1
I get the output as expected
test.fastq
But the file being created by awk > $f has the name
?test.fastq
Note that the overall goal here is to run this loop on every file listed (by absolute path) in a file, but to write the output locally, which is what the sed call is for.
Edit: run directly on the command line (without the variables), the samtools | awk line works correctly.
Awk cannot possibly have anything to do with your problem. The shell is completely responsible for file redirection, so f MUST have a weird character in it.
Most likely whatever you are sending to this script has a special character in it (e.g. a non-ASCII UTF-8 character, while your terminal displays only ASCII). When you do the echo, the shell doesn't know how to display the character and probably shows it as whitespace, and when the name goes through ls (which might be doing things like colorization) it combines in a strange way and ends up showing the ?.
Oh wait... why are you putting a newline into the filename with sed? That is probably your problem. Try just:
sed "s/${var}//g"
How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what #triplee's comment describes, except that it is safe for file names that contain newlines.
What's going on here?
grep with --null will return output delimited with nulls instead of newline. Since file names can have newlines in them delimiting with newline makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
xargs takes a stream of delimited items and executes a given command, passing those items to it as parameters (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four, since by default xargs splits its input on blanks as well as newlines. This is not safe for file names, which may contain embedded spaces or newlines.
The -0 switch to xargs makes it use a null delimiter instead (and disables the blank and newline splitting). This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this to substituting each input item for the specified replacement string (with -I, items are delimited by newlines and the command is run once per item). To get the idea, try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% "Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n'"${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
If the following prints the commands you want, pipe its output through /bin/sh:
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\)\.ext1/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while read -r file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that they match the file glob "*.ext1", AND that the result of the -exec is successful. The -q tells grep to look for RE in {} (the file supplied by find) and to exit with TRUE or FALSE without generating any output of its own.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
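Assembled with the while loop from above (echo left in as a dry run):
find /path/to/start -name '*.ext1' -exec grep -q 'RE' {} \; -print | while read -r file; do
    echo rm "$file" "${file%.ext1}.ext2"
done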
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual, use xargs echo rm when testing to make a dry run; I haven't tested it, and it may not work correctly with filenames containing spaces):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs; it will allow you to run a command for each match.
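For example (echo included as a dry run; drop it to actually delete):
grep -l '<pattern>' directory/*.extension1 | xargs -n 1 echo rm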