Problems using grep and sed in bash

I'm extracting domains, subdomains and ips from a text file using:
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' "extra-domains.txt" | sed 's/www.//' | sort -u > outputfile.txt
And I'm using this bash script so I can run it more quickly, as: extract-domains.sh text-with-domains.txt
#!/bin/bash
FILE="$1"
while read LINE; do
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' "$LINE" | sed 's/www.//' | sort -u > outputfile.txt
done < ${FILE}
but I keep getting multiple "No such file or directory" errors when running the script.
Can anyone give me a hand? Thanks.

The way you wrote it, grep takes "$LINE" as a filename. Is that what it is supposed to do?
Edit: There is no point in using a while loop to read your file line by line; it will be much slower. You should probably write your script like this:
#!/bin/bash
grep -oE '[[:alnum:]]+[.][[:alnum:]_.-]+' "$1" |
sed 's/www.//' |
sort -u
and call it :
extract-domains.sh "extra-domains.txt" > outputfile.txt
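One more detail worth noting: in sed 's/www.//' the dot is unescaped, so it is a regex wildcard and would also strip strings like "wwwX". Since the match is at the start of each extracted domain, anchoring and escaping it is safer:
sed 's/^www\.//'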

Related

How to apply a command to each line in pipe?

I want to apply a command to each line of piped stdin like so:
cat file.txt | grep ... | ./filter | wc -l
the problem is that ./filter accepts only a single line of input and gives a single line of output. I've tried xargs, but it spawns a subprocess and I can't capture its output to continue working with the result. Is there an easy way to do that?
If it accepts only a single line, then you should put it in a loop if you want to process multiple lines:
cat file.txt |
grep ... |
while read line ; do
echo "$line" | ./filter
done |
wc -l
To call a command for each line, you can read a line into a variable and use the variable as standard input. (Also, let's avoid UUOC, the useless use of cat.)
grep ... < file.txt |
while IFS= read -r line; do
./filter <<< "$line"
done |
wc -l
In this case it looks like things may get much easier if you instead write the whole filter in awk, because it gives you wc -l for free (NR), plus line and record splitting, and filtering beyond what grep can do.
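For illustration, a minimal sketch of that idea; the /pattern/ test stands in for the grep, and the length check stands in for whatever ./filter actually does:
awk '/pattern/ { if (length($0) > 3) n++ } END { print n }' file.txt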

Multiple outputs from a single-line shell command with pipes only

For example:
ls -l -d */ | wc -l | awk '{print $1}' | tee /dev/tty | ls -l
This shell command prints the result of wc and of ls -l from a single command line, but tee is used.
Is it possible to achieve multiple outputs with a single shell command line, without using "&&", "||", ">", ">>", "<", ";", "&", tee, or a temp file?
When you want the output of date and ls -rtl | head -1 on one line, you can use
echo "$(date): $(ls -rtl | head -1)"
Yes, you can write to multiple files with awk, which is not on the list of things you want to avoid:
echo hi | awk '{print > "a.txt"; print > "b.txt"}'
Then check a.txt and b.txt.
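In the same spirit, awk can duplicate a stream to the terminal and to stdout, which covers the tee-like case, assuming /dev/tty is available (it is on typical Linux and macOS systems):
echo hi | awk '{ print > "/dev/tty"; print }' | wc -l
This shows hi on the terminal while wc -l still receives the line through the pipe.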

Pipe output to terminal and file using tee from grep and sed pipe

I'm trying to get the output from a grep and sed pipe to go to the terminal and a text file.
Neither
grep -Filr "string1" * 2>&1 | tee ~/outputfile.txt | sed -i "s|string1|string2|g"
nor
grep -Filr "string1" * | sed -i "s|string1|string2|g" 2>&1 | tee ~/outputfile.txt
work. I get "sed: no input files" on the terminal, so sed is not getting the correct input. I just want to see, and write out to a text file, which files are modified by the search and replace. I know using find instead of grep would be more efficient, since the search wouldn't be done twice, but I'm not sure how to output the file name using find and sed when there is a search hit.
EDIT:
Oops, I forgot to include xargs in the code. It should have been:
grep -Filr "string1" * 2>&1 | tee ~/outputfile.txt | xargs sed -i "s|string1|string2|g"
and
grep -Filr "string1" * | xargs sed -i "s|string1|string2|g" 2>&1 | tee ~/outputfile.txt
To be clear, I'm looking for a solution that modifies the matched files with the search and replace, and then outputs the modified files' file names to the terminal and a log file.
The -i option to sed is only useful when sed operates on a file, not on standard input. In your original commands (without xargs), sed was reading standard input, so -i had no files to work on, hence "sed: no input files". Drop it, and your first option is correct.
I'd use a loop:
for i in `grep -lr string1 *`; do sed -i . 's/string1/string2/g' "$i"; echo "$i" >> ~/outputfile.txt; done
I'd advise against using the -i option for grep, because it would match files that the case-sensitive sed command won't actually modify.
You can do the same with find and exec, but that's a dangerous tool.
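For reference, here is an untested sketch of that find variant (it assumes GNU sed; on OS X you would use sed -i '' instead). Each file is tested with grep -q, rewritten with sed, and printed only if both steps succeed:
find . -type f -exec grep -qF "string1" {} \; -exec sed -i "s|string1|string2|g" {} \; -print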
I almost forgot about this. I eventually went with a for loop in a bash script:
#!/bin/bash
for i in $( grep -Flr "string1" * ); do
sed -i "s|string1|string2|g" $i
echo $i
echo $i >> ~/outputfile.txt
done
I'm using the vertical pipe | as the separator, because I'm replacing URL paths with lots of forward slashes.
Thank you both for your help.
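A caveat that applies to both loop-based answers above: for i in $(grep -l ...) splits on whitespace, so it breaks on file names containing spaces. A more robust sketch (again assuming GNU sed) reads grep's output line by line and uses tee -a to write to the terminal and the log file in one step:
grep -Flr "string1" * | while IFS= read -r f; do
sed -i "s|string1|string2|g" "$f"
printf '%s\n' "$f"
done | tee -a ~/outputfile.txt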

Getting head to display all but the last line of a file: command substitution and standard I/O redirection

I have been trying to get the head utility to display all but the last line of standard input. The actual code that I needed is something along the lines of cat myfile.txt | head -n $(($(wc -l)-1)). But that didn't work. I'm doing this on Darwin/OS X which doesn't have the nice semantics of head -n -1 that would have gotten me similar output.
None of these variations work either.
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
I tested out more variations and in particular found this to work:
cat <<EOF | echo $(($(wc -l)-1))
>Hola
>Raul
>Como Esta
>Bueno?
>EOF
3
Here's something simpler that also works.
echo "hello world" | echo $(($(wc -w)+10))
This one understandably gives me an illegal line count error. But it at least tells me that the head program is not consuming the standard input before passing stuff on to the subshell/command substitution, a remote possibility, but one that I wanted to rule out anyway.
echo "hello" | head -n $(cat && echo 1)
What explains the behavior of head and wc and their interaction through subshells here? Thanks for your help.
head -n -1 will give you all except the last line of its input. (That requires GNU head, though; as noted in the question, the BSD head on OS X doesn't support negative counts.)
head is the wrong tool. If you want to see all but the last line, use:
sed \$d
The reason that
# Sample of incorrect code:
echo "hello" | head -n $(wc -l | sed -E -e 's/\s//g')
fails is that wc consumes all of the input and there is nothing left for head to see. wc inherits its stdin from the subshell in which it is running, which is reading from the output of the echo. Once it consumes the input, it returns and then head tries to read the data...but it is all gone. If you want to read the input twice, the data will have to be saved somewhere.
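A minimal sketch of saving it somewhere, using a temporary file as the buffer (the mktemp usage is illustrative):
tmp=$(mktemp)
cat myfile.txt > "$tmp"
head -n $(($(wc -l < "$tmp") - 1)) "$tmp"
rm -f "$tmp"
Here wc reads the temporary file rather than the pipe, so head still has the full input available.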
Using sed:
sed '$d' filename
will delete the last line of the file.
$ seq 1 10 | sed '$d'
1
2
3
4
5
6
7
8
9
For Mac OS X specifically, I found an answer from a comment to this Q&A.
Assuming you are using Homebrew, run brew install coreutils then use the ghead command:
cat myfile.txt | ghead -n -1
Or, equivalently:
ghead -n -1 myfile.txt
Lastly, see brew info coreutils if you'd like to use the commands without the g prefix (e.g., head instead of ghead).
cat myfile.txt | echo $(($(wc -l)-1))
This works, but it's overly complicated: you could just write echo $(($(wc -l)-1)) <myfile.txt or echo $(($(wc -l <myfile.txt)-1)). The problem is in how you're using the same idea with head:
cat myfile.txt | head -n $(wc -l | sed -E -e 's/\s//g')
wc consumes all the input as it's counting the lines. So there is no data left to read in the pipe by the time head is started.
If your input comes from a file, you can redirect both wc and head from that file.
head -n $(($(wc -l <myfile.txt) - 1)) <myfile.txt
If your data may come from a pipe, you need to duplicate it. The usual tool to duplicate a stream is tee, but that isn't enough here, because the two outputs from tee are produced at the same rate, whereas here wc needs to fully consume its input before head can start. So instead, you'll need to use a single tool that can detect the last line, which is a more efficient approach anyway.
Conveniently, sed offers a way of matching the last line. Either printing all lines but the last, or suppressing the last output line, will work:
sed -n '$! p'
sed '$ d'
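For example, both seq 3 | sed -n '$! p' and seq 3 | sed '$ d' print 1 and 2.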
Here is a one-liner that gets you the desired output, and it can be used more generally for getting all lines from a file except the last n lines (here n = 4):
grep -n "" myfile.txt \ # output the line number for each line
| sort -nr \ # reverse the file by using those line numbers
| sed '1,4d' \ # delete first 4 lines (last 4 of the original file)
| sort -n \ # reverse the reversed file (correct the line order)
| sed 's/^[0-9]*://' # remove the added line numbers
Here is the above command as an actual runnable single line (the version above can't be executed because of the added comments):
grep -n "" myfile.txt | sort -nr | sed '1,4d' | sort -n | sed 's/^[0-9]*://'
It's a little cumbersome, and this problem can be solved with more comprehensive commands like ghead, but when you can't or don't want to download such tools, it's nice to be able to do this with the more basic options. I've been in situations where it's simply not an option to get better tools.
awk 'NR>1{print p}{p=$0}'
For this job, an awk one-liner is a bit longer than a sed one.
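For example:
seq 5 | awk 'NR>1{print p}{p=$0}'
prints 1 through 4: each line is buffered in p and only printed once the next line proves it wasn't the last.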

Shell script to read contents from one file and grep another file

I am working in the shell, and I want to write a one-liner which reads the contents of file A and executes a grep command on file B.
for example, suppose there are two file
dataFile.log, which has the following values:
abc
xyz
... and so on
Now read abc and grep for it in searchFile.log, like grep abc searchFile.log.
I have a shell script for this but want a one-liner for it:
for i in `cat dataFile.log`; do grep $i searchFile.log; done
try this:
grep -f dataFile.log searchFile.log
Note that if you want to grep for fixed strings, you need -F; if you want to match the text in dataFile.log as regexes, use -E or -P.
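For example, a line like 1.2.3.4 in dataFile.log matches only that literal string under -F; without it, the dots are regex wildcards and the line would also match 1x2y3z4 in searchFile.log.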
How about the following: it even ignores blank lines and # comments:
while read -r FILE; do if [[ "$FILE" != [/a-zA-Z0-9]* ]]; then continue; fi; grep -h pattern "$FILE"; done < dataFile.log
Beware: have not compiled this.
You can use grep -f option:
cat dataFile.log | grep -f searchFile.log
Edit
OK, now I understand the problem. You want to use every line from dataFile.log to grep in searchFile.log. I also see you have value1|value2|..., so instead of grep you need egrep.
Try with this:
for i in `cat dataFile.log`
do
egrep "$i" searchFile.log
done
Edit 2
Following chepner's suggestion:
egrep -f dataFile.log searchFile.log
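Note that egrep is nowadays a deprecated alias for grep -E, so the modern spelling of the same command is:
grep -E -f dataFile.log searchFile.log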
