Bash for loop to pull text from a file recursively

I am having trouble writing a Bash for loop script that can extract the contents of a specific file that is common to many child directories under a parent directory.
Directory structure:
/Parent/child/grand_child/great_grand_child/file
There are many child, grandchild, and great-grandchild folders.
I want my script to do (in pseudo-code):
For EVERY grand_child folder, in EVERY child folder:
Search through ONLY ONE great_grand_child folder
find the file named 0001.txt
print the text in row 10 of 0001.txt to an output file
In the next column of the output file, print the full directory path to the file the text was extracted from.
My script so far:
for i in /Parent/**; do
    if [ -d "$i" ]; then
        echo "$i"
    fi
done
Can I have some help designing this script?
So far this gives me the path to each grand_child folder, but I don't know how to isolate just one great_grand_child folder, and then ask for text in row 10 of the 0001.txt file inside the great_grand_child folder.

# For every grandchild directory like Parent/Child/Grandchild
for grandchild in Parent/*/*
do
    # Look for a file like $grandchild/Greatgrandchild/0001.txt
    for file in "$grandchild/"*/0001.txt
    do
        # If there is no such file, just skip this Grandchild directory.
        if [ ! -f "$file" ]
        then
            echo "Skipping $grandchild, no 0001.txt files" >&2
            continue
        fi
        # Otherwise print the 10th line and the file that it came from.
        awk 'FNR == 10 { print $0, FILENAME }' "$file"
        # Don't look at any more 0001.txt files in this Grandchild directory,
        # we only care about one of them.
        break
    done
done

Given that the names are sane (no spaces or other awkward characters), I'd probably go with:
find /Parent -name '0001.txt' |
sort -t / -k1,1 -k2,2 -k3,3 -k4,4 -u |
xargs awk 'FNR == 10 { print $0, FILENAME }' > output.file
Find the files named 0001.txt under /Parent. Sort the list so that there is just one entry per /Parent/Child/Grandchild. Run awk as often as necessary, printing line 10 of each file along with the file name. Capture the output in output.file.
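If the names might contain spaces after all, a NUL-delimited pipeline avoids the word-splitting problem. The sketch below is a self-contained demo (a throwaway tree built under mktemp, with spaces in the paths on purpose); note it prints every matching 0001.txt rather than one per grandchild:

```shell
# Sketch of a space-tolerant variant (assumes GNU find and xargs).
# Demo tree built in a throwaway directory; paths contain spaces on purpose.
tmp=$(mktemp -d)
mkdir -p "$tmp/Parent/child one/grand child/great grand"
printf 'line %d\n' 1 2 3 4 5 6 7 8 9 > "$tmp/Parent/child one/grand child/great grand/0001.txt"
echo "row ten" >> "$tmp/Parent/child one/grand child/great grand/0001.txt"

# NUL-delimited pipeline: file names pass through without word-splitting.
find "$tmp/Parent" -name '0001.txt' -print0 |
    xargs -0 awk 'FNR == 10 { print $0, FILENAME }' > "$tmp/output.file"

cat "$tmp/output.file"
```

Restoring the one-file-per-grandchild deduplication in the NUL-delimited world would need GNU sort -z between find and xargs, analogous to the sort step above.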

Related

Batch rename files based on part of the name of other files

I want to batch rename some files based on part of the name of other files.
Let me explain my question with an example; I think it's clearer that way.
I have some files with these names in one folder:
sc_gen_08-bigfile4-0-data.txt
signal_1_1-bigfile8.1-0-data.txt
and these files in another folder:
sc_gen_08-acaaf2d4180b743b7b642e8c875a9765-1-data.txt
signal_1_1-dacaaf280b743b7b642e8c875a9765-4-data.txt
I want to batch rename the first files to the names of the second files. How can I do this? Files in both folders share a common part of their names:
name(common to files in both folders)-[only this part is different in each file]-data.txt
Thanks (sorry if it's not a good question for everyone, but it's a question for me).
Let's call the original folder "folder1" and the other folder "folder2". Then would you please try the following:
#!/bin/bash
folder1="folder1"                   # the original folder name
folder2="folder2"                   # the other folder name
declare -A map                      # create an associative array
for f in "$folder2"/*-data.txt; do  # find files in the "other folder"
    f=${f##*/}                      # remove directory name
    common=${f%%-*}                 # extract the common substring
    map[$common]=$f                 # associate common name with full filename
done
for f in "$folder1"/*-data.txt; do  # find files in the original folder
    f=${f##*/}                      # remove directory name
    common=${f%%-*}                 # extract the common substring
    mv -- "$folder1/$f" "$folder1/${map[$common]}"
                                    # rename the file based on the value in map
done
If your files are all named as you mentioned, the following script will work.
It assumes the directory layout below:
root#vm:~/test# ll
folder1/
folder2/
script.sh
The script is as follows:
#Declare folders
folder1=./folder1
folder2=./folder2

#Create new folder if it does not exist
if [ ! -d ./new ]; then
    mkdir ./new
fi

#Iterate over first directory
for file1 in "$folder1"/*; do
    #Iterate over second directory
    for file2 in "$folder2"/*; do
        #Compare the beginning of each file name; if they match, copy.
        if [[ $(basename "$file1" | cut -f1 -d-) == "$(basename "$file2" | cut -f1 -d-)" ]]; then
            echo "$(basename "$file1") $(basename "$file2") Match"
            cp "$folder1/$(basename "$file1")" "new/$(basename "$file2")"
        fi
    done
done
It creates a folder called new and copies all your files there. If you want to move them instead, use mv. I avoided mv on the first attempt, just in case of undesired effects.

Bash; How to combine multiple files into one file

I have multiple files in one directory, and I want to combine them into a single file using Bash. The output needs to contain each file name followed by its contents. An example would be
$ cat File 1
store
$ cat File 2
bank
$ cat File 3
car
Desired output is in a single file named master
$ cat master
File 1
store
File 2
bank
File 3
car
for FILE in "File 1" "File 2" "File 3"; do
    echo "$FILE"
    cat "$FILE"
done > master
What you have asked for is what cat is meant for; it's short for concatenate, because it concatenates the contents of files together.
But it doesn't inject the filenames into the output. If you want the filenames there, your best bet is probably a loop:
for f in "File 1" "File 2" "File 3"; do
    printf '%s\n' "$f"
    cat "$f"
done > master
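To see the loop in action, here's a self-contained run in a throwaway directory with the three sample files from the question:

```shell
# Create the sample files (names contain spaces) and build master.
tmp=$(mktemp -d)
cd "$tmp"
echo store > "File 1"
echo bank  > "File 2"
echo car   > "File 3"

for f in "File 1" "File 2" "File 3"; do
    printf '%s\n' "$f"
    cat "$f"
done > master

cat master
```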
This will do the job:
for f in File{1..3}; do
    echo "$f" >> master
    cat "$f" >> master
done
Note that File{1..3} expands to File1 File2 File3; names containing spaces need to be quoted explicitly.
With GNU sed:
sed -s '1F' *
'-s'
'--separate'
    By default, 'sed' will consider the files specified on the command
    line as a single continuous long stream. This GNU 'sed' extension
    allows the user to consider them as separate files: range addresses
    (such as '/abc/,/def/') are not allowed to span several files, line
    numbers are relative to the start of each file, '$' refers to the
    last line of each file, and files invoked from the 'R' commands are
    rewound at the start of each file.

'F'
    Print out the file name of the current input file (with a trailing
    newline).
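A short demonstration of sed -s '1F' (assumes GNU sed; file names without spaces for simplicity):

```shell
# Build three sample files in a temp directory, then let GNU sed print
# each file's name (F) at its first line (-s resets line numbers per file).
tmp=$(mktemp -d)
cd "$tmp"
echo store > File1
echo bank  > File2
echo car   > File3

sed -s '1F' File1 File2 File3 > master
cat master
```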

Comparing two directories to produce output

I am writing a Bash script that will replace files in folder A (source) with files from folder B (target). But before this happens, I want to record two files.
The first file will contain a list of files in folder B that are newer than those in folder A, along with files that are different or orphaned in folder B relative to folder A.
The second file will contain a list of files in folder A that are newer than those in folder B, along with files that are different or orphaned in folder A relative to folder B.
How do I accomplish this in Bash? I've tried using diff -qr, but it yields the following output:
Files old/VERSION and new/VERSION differ
Files old/conf/mime.conf and new/conf/mime.conf differ
Only in new/data/pages: playground
Files old/doku.php and new/doku.php differ
Files old/inc/auth.php and new/inc/auth.php differ
Files old/inc/lang/no/lang.php and new/inc/lang/no/lang.php differ
Files old/lib/plugins/acl/remote.php and new/lib/plugins/acl/remote.php differ
Files old/lib/plugins/authplain/auth.php and new/lib/plugins/authplain/auth.php differ
Files old/lib/plugins/usermanager/admin.php and new/lib/plugins/usermanager/admin.php differ
I've also tried this
(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq
but it doesn't give me the scope of results I require. The struggle here is that the output isn't in the format I need; I just want files, not directories, to show in the text files, e.g.:
conf/mime.conf
data/pages/playground/
data/pages/playground/playground.txt
doku.php
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php
List of files in directory B (new/) that are newer than directory A (old/):
find new -newermm old
This simply runs find on new/, filtered by -newerXY with X and Y both set to m (modification time) and the reference being the old directory itself.
Files that are missing in directory B (new/) but are present in directory A (old/):
A=old B=new
diff -u <(find "$B" |sed "s:$B::") <(find "$A" |sed "s:$A::") \
|sed "/^+\//!d; s::$A/:"
This sets variables $A and $B to your target directories, then runs a unified diff on their contents (using process substitution to list each tree with find and strip the directory name with sed so diff isn't confused). The final sed command matches the additions (lines starting with +/), replaces that +/ with the directory name and a slash, and prints them (other lines are removed).
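As a small check, here is the pipeline run on a tiny invented tree (a sort is added after each find so the diff input is deterministic):

```shell
# Build old/ and new/ where doku.php exists only in old/, then list
# files missing from new/ using the diff/sed pipeline above.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p old/conf new/conf
echo x > old/conf/mime.conf
cp old/conf/mime.conf new/conf/
echo y > old/doku.php          # present only in old/

A=old B=new
missing=$(diff -u <(find "$B" | sed "s:$B::" | sort) \
                  <(find "$A" | sed "s:$A::" | sort) \
          | sed "/^+\//!d; s::$A/:")
echo "$missing"
```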
Here is a bash script that will create the file:
#!/bin/bash
# Usage: bash script.bash OLD_DIR NEW_DIR [OUTPUT_FILE]
# compare given directories

if [ -n "$3" ]; then  # the optional 3rd argument is the output file
    OUTPUT="$3"
else                  # if it isn't provided, escape path slashes to underscores
    OUTPUT="${2////_}-newer-than-${1////_}"
fi

{
    find "$2" -newermm "$1"
    diff -u <(find "$2" |sed "s:$2::") <(find "$1" |sed "s:$1::") \
        |sed "/^+\//!d; s::$1/:"
} |sort > "$OUTPUT"
First, this determines the output file, which either comes from the third argument or else is created from the other inputs using a replacement to convert slashes to underscores in case there are paths, so for example, running as bash script.bash /usr/local/bin /usr/bin would output its file list to _usr_local_bin-newer-than-_usr_bin in the current working directory.
This combines the two commands and then ensures they are sorted. There won't be any duplicates, so you don't need to worry about that (if there were, you'd use sort -u).
You can get your first and second files by changing the order of arguments as you invoke this script.

bash to update filename in directory based on partial match to another

I am trying to use bash to rename/update the filename of a text file in /home/cmccabe/Desktop/percent based on a partial match of digits with another text file, /home/cmccabe/Desktop/analysis.txt. The match will always be in line 3, 4, or 5 of this file. I have not been able to do this, but hopefully the `bash` below is a start. Thank you :).
text file in /home/cmccabe/Desktop/percent - there could be a maximum of 3 files in this directory
00-0000_fbn1_20xcoverage.txt
text file in /home/cmccabe/Desktop/analysis.txt
status: complete
id names:
00-0000_Last-First
01-0101_LastN-FirstN
02-0202_La-Fi
desired result in /home/cmccabe/Desktop/percent
00-0000_Last-First_fbn1_20xcoverage.txt
bash
for filename in /home/cmccabe/Desktop/percent/*.txt; do echo mv \"$filename\" \"${filename//[0-9]-[0-9]/}\"; done < /home/cmccabe/Desktop/analysis.txt
Using process substitution with a while loop,
you can run this script from /home/cmccabe/Desktop/percent:
#!/bin/bash
#      ^^^^ needed for associative arrays

# declare the associative array
declare -A mapArray

# Read the file from the 3rd line onwards and build a hash-map,
# e.g. mapArray[00-0000]=00-0000_Last-First and so on.
while IFS= read -r line; do
    mapArray["${line%_*}"]="$line"
done < <(tail -n +3 /home/cmccabe/Desktop/analysis.txt)

# Once the hash-map is constructed, rename the text files accordingly.
# echo prints the file and its new name; uncomment 'mv' to actually rename.
for file in *.txt; do
    echo "$file" "${mapArray[${file%%_*}]}_${file#*_}"
    # mv "$file" "${mapArray[${file%%_*}]}_${file#*_}"
done
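A quick check of the parameter expansions used for the keys (sample names from the question):

```shell
# The map key is the line up to its underscore; the lookup key is the
# file name before its first underscore.
line="00-0000_Last-First"
file="00-0000_fbn1_20xcoverage.txt"

echo "${line%_*}"            # key stored in mapArray
echo "${file%%_*}"           # key looked up for each file
echo "${line}_${file#*_}"    # the resulting new file name
```

The last expansion reproduces the desired result from the question, 00-0000_Last-First_fbn1_20xcoverage.txt.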
This is another, similar bash approach:
while IFS="_" read -r id newname; do
    #echo "id=$id - newname=$newname"   # for cross-checking
    oldfilename=$(find . -name "${id}*.txt" -printf %f)
    [ -n "$oldfilename" ] && echo mv \"$oldfilename\" \"${id}_${newname}_${oldfilename#*_}\"
done < <(tail -n +3 analysis.txt)
We read the analysis.txt file and split each line (e.g. 00-0000_Last-First) into two fields, using _ as the delimiter:
id=00-0000
newname=Last-First
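The splitting step can be checked in isolation (using a bash here-string):

```shell
# Split a sample line on '_' into the id and the new name.
IFS="_" read -r id newname <<< "00-0000_Last-First"
echo "$id"
echo "$newname"
```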
Then, using the id read from analysis.txt, we check (using find) whether a file exists whose name starts with the same id.
If such a file exists, its filename is returned in the variable $oldfilename.
If this variable is not empty, we do the mv.
tail -n +3 is used to skip the first two lines of analysis.txt (output starts at line 3).

Renames numbered files using names from list in other file

I have a folder containing books, and a file with the real name of each one. I renamed the books so that I can easily see whether they are ordered, say "00.pdf", "01.pdf", and so on.
I want to know if there is a way, using the shell, to match each line of the file, say "names", with each book. That is, match line i of the file with the book in position i in sort order.
<name-of-the-book-in-the-1-line> -> <book-in-the-1-position>
<name-of-the-book-in-the-2-line> -> <book-in-the-2-position>
.
.
.
<name-of-the-book-in-the-i-line> -> <book-in-the-i-position>
.
.
.
I'm doing this in Windows, using Total Commander, but I want to do it in Ubuntu, so I don't have to reboot.
I know about mv and rename, but I'm not as good as I'd like with regular expressions...
renamer.sh:
#!/bin/bash
for i in $(ls -v | grep -Ev '(renamer.sh|names.txt)'); do
    read name
    mv "$i" "$name.pdf"
    echo "$i" renamed to "$name.pdf"
done < names.txt
names.txt: (line count must exactly equal the number of numbered files)
name of first book
second-great-book
...
explanation:
ls -v returns a naturally sorted file list
grep excludes this script's name and the input file so they are not renamed
we cycle through the found file names, read a value from names.txt, and rename each target file to that value
For testing purposes, you can comment out the mv command:
#mv "$i" "$name.pdf"
And now, simply run the script:
bash renamer.sh
This loops through names.txt, creates a filename based on a counter (padding to two digits with printf, assigning to a variable using -v), then renames using mv. ((++i)) increases the counter for the next filename.
#!/bin/bash
i=0
while IFS= read -r line; do
    printf -v fname "%02d.pdf" "$i"
    mv "$fname" "$line.pdf"
    ((++i))
done < names.txt
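A self-contained dry run (sample names from the question; appending a .pdf extension to the target names is an assumption here, since the names file holds bare titles):

```shell
# Create two numbered files and a names list in a temp directory,
# then run the counter-based rename loop over them.
tmp=$(mktemp -d)
cd "$tmp"
echo a > 00.pdf
echo b > 01.pdf
printf '%s\n' "name of first book" "second-great-book" > names.txt

i=0
while IFS= read -r line; do
    printf -v fname "%02d.pdf" "$i"
    mv "$fname" "$line.pdf"
    ((++i))
done < names.txt

ls
```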