Print the contents of files from the output of a program - bash

Let's say I have a program foo that finds files with a certain specification and that the output of running foo is:
file1.txt
file2.txt
file3.txt
I want to print the contents of each of those files (preferably with the file name prepended). How would I do this? I would've thought piping it to cat like so:
foo | cat
would work but it doesn't.
EDIT:
My solution to this problem, which prints out each file and prepends the filename to each line of output, is:
foo | xargs grep .
This gives output similar to:
file1.txt: Hello world
file2.txt: My name is foobar.

<your command> | xargs cat

You need xargs here:
foo | xargs cat

In order to allow for file names that have spaces in them, you'll need something like this:
#!/bin/bash
while read -r file
do
    # Check for existence of the file before using cat on it.
    if [[ -f $file ]]; then
        cat "$file"
    # Don't bother with empty lines
    elif [[ -n $file ]]; then
        echo "There is no file named '$file'"
    fi
done
Put this in a script. Let's call it myscript.sh. Then, execute:
foo | myscript.sh
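Note: for foo | myscript.sh to work as written, myscript.sh must be executable and on your PATH; from the script's directory you would typically run:
chmod +x myscript.sh
foo | ./myscript.sh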

foo | xargs grep '^' /dev/null
Why grep on ^? To display empty lines as well (replace it with "." if you want only non-empty lines).
Why the /dev/null? So that, in addition to any filename provided in foo's output, there is at least one additional file (and a file NOT matching anything, such as /dev/null). That way grep is always given at least two filenames, and thus will always show the matching filename.
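A quick illustration of why the extra filename matters, assuming file1.txt from the question:
grep '^' file1.txt              # one file argument: prints  Hello world
grep '^' /dev/null file1.txt    # two file arguments: prints  file1.txt:Hello world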

Related

Copy file with filename based on grep output

I have a collection of files that all have a specific sequence in them. The files are named sequentially, and I want to copy over the first instance of each file that has a unique sequence.
For example,
1.txt Content: 1[Block]Alpha[/Block]1
2.txt Content: 2[Block]Beta[/Block]2
3.txt Content: 3[Block]Charlie[/Block]3
4.txt Content: 4[Block]Alpha[/Block]4
I want the output to be
Alpha.txt Content: 1[Block]Alpha[/Block]1
Beta.txt Content: 2[Block]Beta[/Block]2
Charlie.txt Content: 3[Block]Charlie[/Block]3
4.txt is missing, as it contains 'Alpha', which a previous file already matched on.
Currently, I Have the following:
ls | sort -r | xargs grep -oE -m 1 '[Block].{0,40}[/Block]'
#which returns:
1.txt:[Block]Alpha[Block]
2.txt:[Block]Beta[Block]
3.txt:[Block]Charlie[Block]
4.txt:[Block]Alpha[Block]
I want to separate the filename from the left of the ':' and rename it to either everything to the right of it (including Block).txt, or just Alpha.txt (for example).
cp has the -n flag for no overwriting, so as long as I do it in sequence I should have no issue there, but I am a bit lost as to how to continue.
Here is a solution that uses one awk process to do the search and extract the filenames and the text between the blocks. For the first occurrence in each file, it checks whether the matched text has been used already; if not, it prints, then goes to the next file. The output is piped to xargs -n2 with the cp command.
#!/bin/bash
awk '/\[Block\].*\[\/Block\]/ {
    gsub(/^.*\[Block\]/,""); gsub(/\[\/Block\].*$/,"")
    if (!a[$0]++) print FILENAME, $0 ".txt"; nextfile
}' *.txt | xargs -n2 echo cp -n --
Note: remove echo after you are done with testing.
Testing with your sample files:
> sh test.sh
cp -n -- 1.txt Alpha.txt
cp -n -- 2.txt Beta.txt
cp -n -- 3.txt Charlie.txt
In your case, you want to rename the files in a directory with a pattern matched from the contents of those files, and remove any file whose pattern duplicates another's?
I have tested this in the directory /tmp/test. In this dir, I have 4 files (1.txt, 2.txt, 3.txt, 4.txt) and wrote a shell script to perform the requirement.
The shell script is as below:
#!/bin/bash
cd /tmp/test
files=$(ls)
for i in $files; do
    pattern=$(cat "$i" | sed "s/Block//g" | grep -o "[A-Za-z][A-Za-z]*")
    if ! echo "$pattern_list" | grep -qw "$pattern"; then
        echo "Rename $i to ${pattern}.txt"
        mv "$i" "${pattern}.txt"
        pattern_list+="$pattern "
    else
        rm "$i"
    fi
done
Brief explanation:
List all current files in /tmp/test
Read each file to capture the file name and the pattern (Alpha, Beta, Charlie, ...)
Rename the file with the new pattern
Remove the file if the pattern is duplicated
The result is as below:
sh /tmp/myscript.sh
Rename 1.txt to Alpha.txt
Rename 2.txt to Beta.txt
Rename 3.txt to Charlie.txt
ls
Alpha.txt Beta.txt Charlie.txt

cat multiple files in separate directories file1 file2 file3....file100 using loop in bash script

I have several files in multiple directories, like 1/file1, 2/file2, 3/file3, ..., 100/file100. I want to cat all those files into a single file, using a loop over the index in a bash script. Is there an easy loop for doing so?
Thanks,
seq 100 | sed 's:.*:dir&/file&:' | xargs cat
seq 100 generates the list of numbers from 1 to 100
sed
s substitutes
: separates the parts of the command
.* matches the whole line
: separator. Usually / is used, but / appears in the replacement string here.
dir&/file& replaces the line with dir<whole line>/file<whole line>
: separator
so it generates the list dir1/file1 ... dir100/file100
xargs - passes its input as arguments to ...
cat - so it will execute cat dir1/file1 dir2/file2 ... dir100/file100.
(If your directories are named just 1 ... 100, as in the question, change the replacement to &/file&.)
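To preview the list that sed generates, try a smaller count, e.g.:
seq 3 | sed 's:.*:dir&/file&:'
# dir1/file1
# dir2/file2
# dir3/file3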
This code should do the trick:
for ((i=1; i<=$(ls | wc -l); i++)); do cat "dir${i}/file${i}" >> output; done
I made an example of what you're describing for your directory structure and files. Create the directories and files, each with its own content:
for ((i=1;i<=100;i++)); do
    mkdir "$i" && touch "$i/file$i" && echo content of "$(pwd) $i" > "$i/file$i"
done
Check the created directories.
ls */*
ls */* | sort -n
If you see that the directories and files have been created, proceed to the next step.
This solution does not involve any external command from the shell except, of course, cat :-)
Now we can check the contents of each file using Bash syntax.
i=1
while [[ -e "$i" ]]; do
    cat "$i"/*
    ((i++))
done
This code was tested in dash.
i=1
while [ -e "$i" ]; do
    cat "$i"/*
    i=$((i+1))
done
Just add the redirection of the output to the file after the done.
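For example, collecting everything into a single file (combined.txt is just an illustrative name):
i=1
while [ -e "$i" ]; do
    cat "$i"/*
    i=$((i+1))
done > combined.txt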
You can add some more tests if you like; see help test.
One more thing :-) you can also just check the contents using tail and brace expansion:
tail -n +1 {1..100}/*
Using cat you can also redirect the output directly; just remember that brace expansion is a Bash 3+ feature/syntax.
cat {1..100}/*
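Here too the output can be redirected into a single file (again, the name is illustrative):
cat {1..100}/* > combined.txt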

How to determine if a line contains a character in bash?

I would like to make a bash script that takes 2 arguments, file1 and file2, and copies all lines from file1 that contain the letter b into file2. I have found a solution to determine whether a string contains the letter:
if [[ $string == *"b"* ]]; then
    echo "It's there!"
fi
I just can't figure out how to apply this code to my problem and run through each line of an arbitrary file.
In the course description I have found that this problem can be solved with the usage of head -n, tail -n, cat, echo, wc -c, wc -l, wc -w, if, case, and test, but we don't have to limit ourselves to the usage of just these commands.
This is the reason grep was invented:
grep "b" file1.txt >>file2.txt
(This copies all lines containing the character b from file1.txt to file2.txt.)
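If the assignment requires a script that takes the two file names as arguments, a minimal sketch (the script name is hypothetical):
#!/bin/bash
# copylines.sh: copy all lines of $1 that contain the letter b into $2
grep "b" "$1" > "$2"
Invoke it as: ./copylines.sh file1 file2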

shell script to read contain from file and grep on other file

I am working in the shell, and I want to write a one-liner that reads the contents of file A and runs grep with them on file B.
For example, suppose there are two files.
dataFile.log has the following values:
abc
xyz
... and so on
Now read abc and grep for it in searchFile.log, like grep abc searchFile.log.
I have a shell script for the same but want a one-liner for it:
for i in `cat dataFile.log`; do grep $i searchFile.log; done
Try this:
grep -f dataFile.log searchFile.log
Note that if you want to grep for fixed strings, you need -F; if you want to match the text in dataFile.log as regexes, use -E or -P.
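For example:
grep -Ff dataFile.log searchFile.log   # treat each line of dataFile.log as a fixed string
grep -Ef dataFile.log searchFile.log   # treat each line as an extended regex, e.g. value1|value2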
How about the following? It even ignores blank lines and # comments:
while read FILE; do if [[ "$FILE" != [/a-zA-Z0-9]* ]]; then continue; fi; grep -h pattern "$FILE"; done < dataFile.log
Beware: I have not tested this.
You can use the grep -f option:
cat dataFile.log | grep -f searchFile.log
Edit
OK, now I understand the problem. You want to use every line from dataFile.log to grep in searchFile.log. I also see you have value1|value2|..., so instead of grep you need egrep.
Try with this:
for i in `cat dataFile.log`
do
    egrep "$i" searchFile.log
done
Edit 2
Following chepner's suggestion:
egrep -f dataFile.log searchFile.log

Bash and filenames with spaces

The following is a simple Bash command line:
grep -li 'regex' "filename with spaces" "filename"
No problems. Also the following works just fine:
grep -li 'regex' $(<listOfFiles.txt)
where listOfFiles.txt contains a list of filenames to be grepped, one
filename per line.
The problem occurs when listOfFiles.txt contains filenames with
embedded spaces. In all cases I've tried (see below), Bash splits the
filenames at the spaces so, for example, a line in listOfFiles.txt
containing a name like ./this is a file.xml ends up trying to run
grep on each piece (./this, is, a and file.xml).
I thought I was a relatively advanced Bash user, but I cannot find a
simple magic incantation to get this to work. Here are the things I've
tried.
grep -li 'regex' `cat listOfFiles.txt`
Fails as described above (I didn't really expect this to work), so I
thought I'd put quotes around each filename:
grep -li 'regex' `sed -e 's/.*/"&"/' listOfFiles.txt`
Bash interprets the quotes as part of the filename and gives "No such
file or directory" for each file (and still splits the filenames with
blanks).
for i in $(<listOfFiles.txt); do grep -li 'regex' "$i"; done
This fails as for the original attempt (that is, it behaves as if the
quotes are ignored) and is very slow since it has to launch one 'grep'
process per file instead of processing all files in one invocation.
The following works, but requires some careful double-escaping if
the regular expression contains shell metacharacters:
eval grep -li 'regex' `sed -e 's/.*/"&"/' listOfFiles.txt`
Is this the only way to construct the command line so it will
correctly handle filenames with spaces?
Try this:
(IFS=$'\n'; grep -li 'regex' $(<listOfFiles.txt))
IFS is the Internal Field Separator. Setting it to $'\n' tells Bash to use the newline character to delimit filenames. Its default value is $' \t\n' and can be printed using cat -etv <<<"$IFS".
Enclosing the command in parentheses starts a subshell, so that only commands within the parentheses are affected by the custom IFS value.
cat listOfFiles.txt | tr '\n' '\0' | xargs -0 grep -li 'regex'
The -0 option on xargs tells xargs to use a null character rather than white space as a filename terminator. The tr command converts the incoming newlines to a null character.
This meets the OP's requirement that grep not be invoked multiple times. It has been my experience that for a large number of files avoiding the multiple invocations of grep improves performance considerably.
This scheme also avoids a bug in the OP's original method because his scheme will break where listOfFiles.txt contains a number of files that would exceed the buffer size for the commands. xargs knows about the maximum command size and will invoke grep multiple times to avoid that problem.
A related problem with using xargs and grep is that grep will prefix the output with the filename when invoked with multiple files. Because xargs invokes grep with multiple files one will receive output with the filename prefixed, but not for the case of one file in listOfFiles.txt or the case of multiple invocations where the last invocation contains one filename. To achieve consistent output add /dev/null to the grep command:
cat listOfFiles.txt | tr '\n' '\0' | xargs -0 grep -i 'regex' /dev/null
Note that this was not an issue for the OP because he was using the -l option on grep; however, it is likely to be an issue for others.
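If you have GNU xargs, the tr step can be skipped by setting the delimiter directly (this option is GNU-specific):
xargs -d '\n' grep -li 'regex' < listOfFiles.txt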
This works:
while read file; do grep -li dtw "$file"; done < listOfFiles.txt
With Bash 4, you can also use the built-in mapfile function to set an array containing each line and iterate over this array:
$ tree
.
├── a
│   ├── a 1
│   └── a 2
├── b
│   ├── b 1
│   └── b 2
└── c
    ├── c 1
    └── c 2
3 directories, 6 files
$ mapfile -t files < <(find -type f)
$ for file in "${files[@]}"; do
> echo "file: $file"
> done
file: ./a/a 2
file: ./a/a 1
file: ./b/b 2
file: ./b/b 1
file: ./c/c 2
file: ./c/c 1
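Applied to the question, the whole array can then be passed to a single grep invocation:
mapfile -t files < listOfFiles.txt
grep -li 'regex' "${files[@]}"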
Though it may overmatch, this is my favorite solution:
grep -i 'regex' $(cat listOfFiles.txt | sed -e "s/ /?/g")
Do note that if you somehow ended up with a list file that has Windows line endings (\r\n), NONE of the notes above about the internal field separator $IFS (and quoting the argument) will work; so make sure that the line endings are correctly \n (I use SciTE to show the line endings and easily change them from one to the other).
Also, cat piped into while read file ... seems to work (apparently without any need to set separators):
cat <(echo -e "AA AA\nBB BB") | while read file; do echo "$file"; done
... although for me it was more relevant for a "grep" through a directory with spaces in filenames:
grep -rlI 'search' "My Dir"/ | while read file; do echo "$file"; grep 'search\|else' "$file"; done
