Copy file with filename based on grep output - bash

I have a collection of files that all have a specific sequence in them. The files are named sequentially, and I want to copy over the first instance of each file that has a unique sequence.
For example,
1.txt Content: 1[Block]Alpha[/Block]1
2.txt Content: 2[Block]Beta[/Block]2
3.txt Content: 3[Block]Charlie[/Block]3
4.txt Content: 4[Block]Alpha[/Block]4
I want the output to be
Alpha.txt Content: 1[Block]Alpha[/Block]1
Beta.txt Content: 2[Block]Beta[/Block]2
Charlie.txt Content: 3[Block]Charlie[/Block]3
4.txt is not copied, as it contains 'Alpha', which a previous file already matched on.
Currently, I Have the following:
ls | sort -r | xargs grep -oE -m 1 '\[Block\].{0,40}\[/Block\]'
# which returns:
1.txt:[Block]Alpha[/Block]
2.txt:[Block]Beta[/Block]
3.txt:[Block]Charlie[/Block]
4.txt:[Block]Alpha[/Block]
I want to separate the filename on the left of the ':' from the match on the right, and name the copy after either the full match (Block tags included) plus .txt, or just the inner text, e.g. Alpha.txt.
cp has a -n flag for no overwriting, so as long as I process the files in sequence I should have no issue there, but I am a bit lost as to how to continue.
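For example, one way to continue from that grep output (a sketch: it assumes no filename contains ':' or whitespace, and uses plain sort so that cp -n keeps the first file) is to split each line at the first ':' with parameter expansion:
ls | sort | xargs grep -oE -m 1 '\[Block\].{0,40}\[/Block\]' |
while IFS= read -r line; do
    file=${line%%:*}          # filename: everything left of the first ':'
    match=${line#*:}          # match: everything right of the first ':'
    name=${match#'[Block]'}   # strip the opening tag...
    name=${name%'[/Block]'}   # ...and the closing tag, leaving e.g. Alpha
    cp -n -- "$file" "${name}.txt"
done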

Here is a solution that uses a single awk process to do the search and extract the filenames and the text between the block tags. For the first occurrence in each file, it checks whether the matched text has been used already; if not, it prints, then moves on to the next file. The output is piped to xargs -n2 with the cp command.
#!/bin/bash
awk '/\[Block\].*\[\/Block\]/ {
    gsub(/^.*\[Block\]/, ""); gsub(/\[\/Block\].*$/, "")
    if (!a[$0]++) print FILENAME, $0 ".txt"
    nextfile
}' *.txt | xargs -n2 echo cp -n --
Note: remove the echo once you are done testing.
Testing with your sample files:
> sh test.sh
cp -n -- 1.txt Alpha.txt
cp -n -- 2.txt Beta.txt
cp -n -- 3.txt Charlie.txt
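One caveat: xargs -n2 splits on any whitespace, so the pairing breaks if a filename or block text contains spaces. Here is a sketch of a variant that reads tab-separated pairs instead (assuming tabs never occur in the names):
awk '/\[Block\].*\[\/Block\]/ {
    gsub(/^.*\[Block\]/, ""); gsub(/\[\/Block\].*$/, "")
    if (!a[$0]++) printf "%s\t%s.txt\n", FILENAME, $0
    nextfile
}' *.txt |
while IFS=$'\t' read -r src dst; do
    cp -n -- "$src" "$dst"
done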

In your case, you want to rename the files in a directory using a pattern matched from the content of those files, and remove any file whose pattern duplicates an earlier one?
I tested this in the directory /tmp/test. In this dir I have 4 files (1.txt, 2.txt, 3.txt, 4.txt) and wrote a shell script to perform the requirement.
The shell script is as below:
#!/bin/bash
cd /tmp/test || exit
for i in *.txt; do
    pattern=$(sed "s/Block//g" "$i" | grep -o "[A-Za-z][A-Za-z]*")
    if ! echo "$pattern_list" | grep -qw "$pattern"; then
        echo "Rename $i to ${pattern}.txt"
        mv "$i" "${pattern}.txt"
        pattern_list+="$pattern "
    else
        rm "$i"
    fi
done
Brief explanation:
List all current files in /tmp/test
Read each file to capture the file name and the pattern (Alpha, Beta, Charlie, ...)
Rename the file with the new pattern
Remove the file if its pattern is duplicated
The result is as below:
bash /tmp/myscript.sh
Rename 1.txt to Alpha.txt
Rename 2.txt to Beta.txt
Rename 3.txt to Charlie.txt
ls
Alpha.txt Beta.txt Charlie.txt

Related

How to save a list of all files in a directory in a single text file and add prefixes and suffixes?

I am trying to save a list of files in a directory into a single file using
ls > output.txt
Let's say we have in the directory:
a.txt
b.txt
c.txt
I want to modify the names of these files in the output.txt to be like:
1a.txt$
1b.txt$
1c.txt$
Another easy way is to use awk to change the content and save it back via a .tmp file.
This command will print the content the way you want, adding "1" to the beginning and "$" to the ending of each line:
awk '{print "1"$1"$"}' output.txt
And then you can save it back to the original file by extending the command with && (if the first step succeeds, run the next):
awk '{print "1"$1"$"}' output.txt > output.txt.tmp && mv output.txt.tmp output.txt
#!/bin/sh -x
for f in *.txt
do
    nf=$(echo "${f}" | sed 's#^#1#')
    mv -v "${f}" "${nf}"
done
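Since the prefix is a fixed string, the sed call isn't strictly needed; plain string concatenation does the same thing:
for f in *.txt
do
    mv -v "${f}" "1${f}"
done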

cat multiple files in separate directories file1 file2 file3....file100 using loop in bash script

I have several files in multiple directories, like 1/file1, 2/file2, 3/file3, ..., 100/file100. I want to cat all those files into a single file using a loop over the index in a bash script. Is there an easy loop for doing so?
Thanks,
seq 100 | sed 's:.*:dir&/file&:' | xargs cat
seq 100 generates the list of numbers from 1 to 100
sed
s substitutes
: separates the parts of the command
.* the whole line
: separator (usually / is used, but / already appears in the replacement string here, so : avoids escaping)
dir&/file& by dir<whole line>/file<whole line>
: separator
so it generates the list dir1/file1 ... dir100/file100 (drop the dir prefix if your directories are named just 1 ... 100)
xargs - passes its input as arguments to ...
cat - so it will execute cat dir1/file1 dir2/file2 ... dir100/file100.
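To collect everything into a single file (the name all.txt here is just an example), redirect the output of xargs:
seq 100 | sed 's:.*:dir&/file&:' | xargs cat > all.txt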
This code should do the trick:
for ((i=1; i<=$(ls -d */ | wc -l); i++)); do cat "dir${i}/file${i}" >> output; done
I made an example of what you're describing with your directory structure and files. Create the directories and files, each with its own content:
for ((i=1;i<=100;i++)); do
mkdir "$i" && touch "$i/file$i" && echo content of "$(pwd) $i" > "$i/file$i"
done
Check the created directories.
ls */*
ls */* | sort -n
If you see that the directories and files are created then proceed to the next step.
This solution does not involve any external command from the shell, except of course cat :-)
Now we can check the contents of each file using bash syntax.
i=1
while [[ -e "$i" ]]; do
cat "$i"/*
((i++))
done
This code was tested in dash.
i=1
while [ -e "$i" ]; do
cat "$i"/*
i=$((i+1))
done
Just add the redirection of the output to a file after the done.
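For example (the output file name all_files.txt is just a placeholder):
i=1
while [ -e "$i" ]; do
    cat "$i"/*
    i=$((i+1))
done > all_files.txt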
You can add some more tests if you like; see help test.
One more thing :-), you can also check the contents using tail and brace expansion:
tail -n +1 {1..100}/*
Using cat you can also redirect the output directly; just remember that brace expansion is a bash 3+ feature.
cat {1..100}/*

How to rename a CSV file from a value in the CSV file

I have 100 1-line CSV files. The files are currently labeled AAA.txt, AAB.txt, ABB.txt (after I used split -l 1 on them). The first field in each of these files is what I want to rename the file as, so instead of AAA, AAB and ABB it would be the first value.
Input CSV (filename AAA.txt)
1234ABC, stuff, stuff
Desired Output (filename 1234ABC.csv)
1234ABC, stuff, stuff
I don't want to edit the content of the CSV itself, just change the filename
Something like this should work:
for f in ./*; do new_name=$(head -1 "$f" | cut -d, -f1); cp "$f" "dir/${new_name}.csv"; done
Move them into a new dir just in case something goes wrong, or in case you need the original file names.
Starting with your original file before splitting:
$ awk -F, '{print > ($1".csv")}' originalFile.csv
and do it all in one shot.
This stores the content of the input file into a .csv file named after its first column:
awk -F, '{print $0 > ($1".csv")}' aaa.txt
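If the original file has many distinct first fields, some awk implementations hit an open-file limit; here is a sketch that closes each output file after writing (note the >>, so a file reopened later in the same run is appended to rather than truncated):
awk -F, '{out = $1 ".csv"; print >> out; close(out)}' originalFile.csv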
In a terminal, change directory (e.g. cd /path/to/directory) to where the files are, and then use the following compound command:
for f in *.txt; do echo mv -n "$f" "$(awk -F, '{print $1}' "$f").csv"; done
Note: There is an intentional echo command there for you to test with; it will only print out the mv commands so you can check that the outcome is what you wish. You can then run it again, removing just echo from the compound command, to actually rename the files via mv.

Remove exact name from txt file

I have the following code which removes certain lines from the file
ret=$(grep -m2 "$var" log.txt | tail -n1 )
mv $var $ret
grep -F -v $var log.txt > log.txt.tmp
mv log.txt.tmp log.txt
my log file looks like this
2.txt
/home/etc/2.txt
basically the file name and its original location
I want to restore the file, and my program does do that, but I also want
to delete the name of the file and its location from log.txt file
Now the above code works, but it removes all instances. For example, it will remove 2.txt and its path, which is fine, but if I had a completely different file called 22.txt, it removes that as well. I need it to remove just 2.txt, but it appears to remove any line with "2.txt" in it.
Is it possible to remove just 2.txt and, since the directory line has /2.txt in it, remove that entire line as well?
So in your log.txt file, you want to:
match 2.txt on a single line
match 2.txt preceded by some path.
You can do this with grep:
# drop lines that are exactly $var (2.txt); -x forces a whole-line match,
# since -F treats the pattern as a literal string (a leading ^ would be literal too)
grep -Fxv "$var" log.txt > log.txt.tmp
# then drop lines containing "/$var" ("/2.txt")
grep -Fv "/$var" log.txt.tmp > log.txt

Print the contents of files from the output of a program

Let's say I have a program foo that finds files with a certain specification and that the output of running foo is:
file1.txt
file2.txt
file3.txt
I want to print the contents of each of those files (preferably with the file name prepended). How would I do this? I would've thought piping it to cat like so:
foo | cat
would work but it doesn't.
EDIT:
My solution to this problem, which prints out each file and prepends the filename to each line of output, is:
foo | xargs grep .
This gets output similar to:
file1.txt: Hello world
file2.txt: My name is foobar.
<your command> | xargs cat
You need xargs here:
foo | xargs cat
In order to allow for file names that have spaces in them, you'll need something like this:
#!/bin/bash
while read -r file
do
    # Check for existence of the file before using cat on it.
    if [[ -f $file ]]; then
        cat "$file"
    # Don't bother with empty lines
    elif [[ -n $file ]]; then
        echo "There is no file named '$file'"
    fi
done
Put this in a script. Let's call it myscript.sh. Then, execute:
foo | ./myscript.sh
foo | xargs grep '^' /dev/null
Why grep on ^? To display empty lines as well (replace with "." if you want only non-empty lines).
Why is there a /dev/null? So that, in addition to any filenames provided in foo's output, there is at least one additional file (and one NOT matching anything, such as /dev/null). That way there are at least two filenames given to grep, and thus grep will always show the matching filename.
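With a grep that supports -H (GNU and BSD grep both do), the /dev/null trick can be replaced by asking for the filename explicitly:
foo | xargs grep -H '^'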
