Get only file name in variable in a for loop - bash

I have a for loop that writes text to a file:
for f in $DATA_DIRECTORY
do
echo ' input.'$f '{'
echo ' copy = ${source.copy}"'$f'.CPY"'
echo ' data = ${source.data}"'$f'.CSV"'
echo ' }'
done
But the "f" variable here looks like this:
/path/to/my/file/FILE.TXT
What I want is only the name of the file, without the path and without the extension:
FILE
By the way, I tried to change my f variable like this to drop the extension, but it did not work:
{$f%%.*}

You need two lines; chained operators aren't allowed.
f=${f##*/} # Strip the directory
f=${f%%.*} # Strip the extensions
Or, if you know the extension, you can use the basename command to strip the directory and that one extension in a single line. The suffix has to match exactly, including case.
f=$(basename "$f" .TXT)
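As a minimal sketch, the two expansions can be wrapped in a small function. The name bare_name is made up for illustration, and the second sample path is an assumption, not taken from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a path to its bare file name.
bare_name() {
    local name=${1##*/}   # remove the longest prefix ending in "/", i.e. the directory
    name=${name%%.*}      # remove the longest suffix starting at a ".", i.e. all extensions
    printf '%s\n' "$name"
}

bare_name /path/to/my/file/FILE.TXT   # prints FILE
bare_name /path/to/archive.tar.gz     # prints archive
```

Using % instead of %% would remove only the last extension, so archive.tar.gz would become archive.tar.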

Related

Rename multiple datetime files in Unix by inserting - and _ characters

I have many files in a directory that I want to rename so that they are recognizable according to a certain convention:
SURFACE_OBS:2019062200
SURFACE_OBS:2019062206
SURFACE_OBS:2019062212
SURFACE_OBS:2019062218
SURFACE_OBS:2019062300
etc.
How can I rename them in UNIX to be as follows?
SURFACE_OBS:2019-06-22_00
SURFACE_OBS:2019-06-22_06
SURFACE_OBS:2019-06-22_12
SURFACE_OBS:2019-06-22_18
SURFACE_OBS:2019-06-23_00
A bash shell loop using mv and parameter expansion could do it:
for file in *:[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]]
do
prefix=${file%:*}
suffix=${file#*:}
mv -- "${file}" "${prefix}:${suffix:0:4}-${suffix:4:2}-${suffix:6:2}_${suffix:8:2}"
done
This loop picks up every file that matches the pattern:
* -- anything
: -- a colon
[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]] -- 10 digits
... and then renames it by inserting dashes and an underscore in the desired locations.
I've chosen the wildcard for the loop carefully so that it tries to match the "input" files and not the renamed files. Adjust the pattern as needed if your actual filenames have edge cases that cause the wildcard to fail (and thus rename the files a second time).
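To check the substring arithmetic without touching any files, the name computation can be pulled into a function and run on a sample name first. This is only a sketch; new_name is a hypothetical helper, not part of the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the new name for one file without renaming it.
new_name() {
    local file=$1
    local prefix=${file%:*}   # everything before the last colon
    local suffix=${file#*:}   # everything after the first colon (the 10 digits)
    # ${suffix:offset:length} slices the digits into year, month, day and hour
    printf '%s:%s-%s-%s_%s\n' "$prefix" \
        "${suffix:0:4}" "${suffix:4:2}" "${suffix:6:2}" "${suffix:8:2}"
}

new_name SURFACE_OBS:2019062218   # prints SURFACE_OBS:2019-06-22_18
```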
#!/bin/bash
strindex() {
# get position of character in string
x="${1%%"$2"*}"
[[ "$x" = "$1" ]] && echo -1 || echo "${#x}"
}
get_new_filename() {
# change filenames like: SURFACE_OBS:2019062218
# into filenames like: SURFACE_OBS:2019-06-22_18
src_str="${1}"
# add last underscore 2 characters from end of string
final_underscore_pos=${#src_str}-2
src_str="${src_str:0:final_underscore_pos}_${src_str:final_underscore_pos}"
# get position of colon in string
colon_pos=$(strindex "${src_str}" ":")
# get dash locations relative to colon position
y_dash_pos=${colon_pos}+5
m_dash_pos=${colon_pos}+8
# now add dashes in date
src_str="${src_str:0:y_dash_pos}-${src_str:y_dash_pos}"
src_str="${src_str:0:m_dash_pos}-${src_str:m_dash_pos}"
echo "${src_str}"
}
# accept path as argument or default to /tmp/baz/data
target_dir="${1:-/tmp/baz/data}"
while read -r line ; do
# since file renaming depends on position of colon extract
# base filename without path in case path has colons
base_dir=${line%/*}
filename_to_change=$(basename "${line}")
echo "mv ${line} ${base_dir}/$(get_new_filename "${filename_to_change}")"
# find cmd attempts to exclude files that have already been renamed
done < <(find "${target_dir}" -name 'SURFACE*' -a ! -name '*_[0-9][0-9]')
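One detail worth checking in a scratch directory is the exclusion test. The -name test of find matches shell globs, not regular expressions, so "two digits" is written as [0-9][0-9] rather than with a repetition count. A small sketch, where pending_files is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under $1 that have not been renamed yet.
pending_files() {
    # Keep names starting with SURFACE, drop those already ending in _NN.
    find "$1" -type f -name 'SURFACE*' ! -name '*_[0-9][0-9]'
}
```

Running it on a directory that holds both SURFACE_OBS:2019062218 and SURFACE_OBS:2019-06-22_18 should list only the first file.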

Bash string substitution with %

I have a list of files named with this format:
S2_7-CHX-2-5_Chr5.bed
S2_7-CHX-2-13_Chr27.bed
S2_7-CHX-2-0_Chr1.bed
I need to loop through each file to perform a task. Previously, I had named them without the step 2 indicator ("S2"), and this format had worked perfectly:
for FASTQ in *_clean.bam; do
SAMPLE=${FASTQ%_clean.bam}
echo $SAMPLE
echo $(samtools view -c ${SAMPLE}_clean.bam)
done
But now that I have the S2 preceding what I would like to set as the variable, this returns a list of empty "SAMPLE" variables. How can I rewrite the following code to specify only S2_*.bed?
for FASTQ in S2_*.bed; do
SAMPLE=${S2_FASTQ%.bed}
echo $SAMPLE
done
Edit: I'm trying to isolate the unique name from each file, for example "7-CHX-2-13_Chr27" so that I can refer to it later. I can't use the "S2" as part of this because I want to rename the file with "S3" for the next step, and so on.
Example of what I'm trying to use it for:
for FASTQ in S2_*.bed; do
SAMPLE=${S2_FASTQ%.bed}
echo $SAMPLE
#rename each mapping position with UCSC chromosome name using sed
while IFS=, read -r f1 f2; do
#rename each file
echo " sed "s/${f1}.1/chr${f2}/g" S2_${SAMPLE}_Chr${f2}.bed > S3_${SAMPLE}_Chr${f2}.bed" >> $SCRIPT
done < $INPUT
done
The variable is still named FASTQ. The S2_ is part of its value, not part of its name.
sample=${FASTQ%.bed}
#        ~~~~~|~~~~
#          |  |  |
#   Variable  | What to remove
#     name    |
#          Remove
#       from the right
If you want to remove the S2_ from the $sample, use left hand side removal:
sample=${sample#S2_}
The removals can't be combined; you have to proceed in two steps.
Note that I use lower case variable names. Upper case should be reserved for environment and internal shell variables.
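Put together, the two steps could look like the sketch below. The function name sample_name is made up for illustration, and the S3_ output name simply follows the renaming scheme described in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the unique sample name from an S2_ file name.
sample_name() {
    local sample=${1%.bed}   # step 1: remove the .bed suffix from the right
    sample=${sample#S2_}     # step 2: remove the S2_ prefix from the left
    printf '%s\n' "$sample"
}

sample=$(sample_name S2_7-CHX-2-13_Chr27.bed)
echo "$sample"           # prints 7-CHX-2-13_Chr27
echo "S3_${sample}.bed"  # prints S3_7-CHX-2-13_Chr27.bed
```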

If found a pattern, how to paste the last line before that contain another pattern in bash?

After saving a list of all folders and subfolders to list.txt with the command ls -R, I have this kind of data:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01:
DSCF0214.JPG
DSCF0215.JPG
DSCF0231.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae:
Sp_02
Sp_03
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02:
DSCF8981.JPG
DSCF8988.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_03:
DSCF0638.JPG
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae:
Sp_07
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae/Sp_07:
DSCF0724.JPG
I would like to add a line of code that puts the path in front of each picture name ("XXX.JPG").
So I tried to say in bash: "if a line contains the ".JPG" pattern, paste in front of the picture name the last preceding line that contains "/Sp*", and replace : with /".
In order to obtain this:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01/DSCF0214.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01/DSCF0215.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01/DSCF0231.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae:
Sp_02
Sp_03
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02/DSCF8981.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02/DSCF8988.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_03
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_03/DSCF0638.JPG
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae:
Sp_07
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae/Sp_07:
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae/Sp_07/DSCF0724.JPG
I didn't find a way to express "the last preceding line that contains /Sp*" in bash.
This is my code:
# Find the .JPG pattern and catch the picture name ("(.*\).JPG") and add "the last line before" that contain "/Sp*" and reput the .JPG pattern with the picture name:
sed 's/\(.*\).JPG/"the last line before" that contain "/Sp*""\1.JPG/' list.txt > list2.txt
sed -e 's/\:/\//g' list2.txt > list3.txt
Any advice to help me to complete this part of code is greatly appreciated.
While there may be a better alternative for getting the list of files, if that is not an option, for your specific problem I would write a simple bash script.
prefix=""
outfile=list2.txt
> "$outfile" # truncate any existing content; remove this line to append instead
while read -r line; do
if [[ $line =~ (.*):$ ]]; then
echo "$line" >> "$outfile"
prefix="${BASH_REMATCH[1]}"
elif [[ $line =~ \.JPG$ ]]; then
echo "${prefix}/${line}" >> $outfile
else
echo "${line}" >> $outfile
fi
done < list.txt
If I understand your question correctly you are actually looking for a way to find all files in this folder and all sub-folders and get the full path to them. If that is the case you should use find instead of ls. Like:
find .
or if you do want the full path from root you could do:
find /home/yourname/thedirectory/you/are/looking/in
If your data is in a file named 'd', try GNU sed:
sed -E '/Sp_[0-9]+:$/{h;p;:c N;/\.JPG$/{s!:\n\s*!/!p;g;bc}; z}' d
Although it is misguided, this can be done with sed:
sed -n -e '/:$/{p;s#:$#/#;h}' -e '/\.JPG$/{H;x;h;s/\n//;p;x;s/\n.*//;h}'
The first expression is used when a directory is encountered (based on the fact that the line ends with :), prints it and saves the directory path in the hold buffer after having replaced the : by the / path-separator.
The second expression is used when a .JPG file is encountered, and performs this sequence of actions:
appends the line to the hold buffer (pattern space : picture.JPG ; hold buffer : dir/\npicture.JPG)
exchanges the pattern space and the hold buffer (pattern space : dir/\npicture.JPG ; hold buffer : picture.JPG)
saves the pattern space to the hold buffer (pattern space : dir/\npicture.JPG ; hold buffer : dir/\npicture.JPG)
removes the linefeed from the pattern space (pattern space : dir/picture.JPG ; hold buffer : dir/\npicture.JPG)
prints the pattern space (buffers unchanged)
exchanges the hold buffer and the pattern space (pattern space : dir/\npicture.JPG ; hold buffer : dir/picture.JPG)
removes the linefeed and what follows from the pattern space (pattern space : dir/ ; hold buffer : dir/picture.JPG)
saves the pattern space to the hold buffer (pattern space : dir/ ; hold buffer : dir/)
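The sequence can be tried on a tiny input to watch the hold buffer at work. The sketch below wraps the same sed command in a function that reads standard input; the name add_paths and the sample lines are made up for illustration, and GNU sed is assumed (other sed implementations may want a semicolon before each closing brace).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed command above; reads ls -R style text on stdin.
add_paths() {
    sed -n -e '/:$/{p;s#:$#/#;h}' -e '/\.JPG$/{H;x;h;s/\n//;p;x;s/\n.*//;h}'
}

printf '%s\n' 'dir/Sp_01:' 'a.JPG' 'b.JPG' | add_paths
# prints:
# dir/Sp_01:
# dir/Sp_01/a.JPG
# dir/Sp_01/b.JPG
```

Because of -n, lines that are neither a directory header nor a .JPG name are not printed at all.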

Can I find similar named files ignoring case, dashes, spaces or other characters?

EDIT 2:
Let's say I have 2 directories. The first one contains:
/dir1/Test File Name.txt
/dir1/This is anotherfile.txt
/dir1/And-Another File.txt
Directory 2 looks like:
/dir2/test-File_Name.txt
/dir2/test file_Name.txt
/dir2/This Is another file.txt
/dir2/And another_file.txt
How can I find (or match) files that have similar names? In this example, file 1 from dir1 would match files 1 and 2 in dir2, and so on.
Trying to do this in bash. Say I have a file named "Test File 1.txt" I want to find any file that is named similar like:
test-file 1.txt
test file 1.txt
Test-file-1.txt
test-file_1.zip
etc etc
I can ignore case with find ./files/ -maxdepth 1 -iname $FILE but don't know how to ignore all the other characters.
Is there a way I can do this in bash?
EDIT:
Sorry, I forgot to mention that I need to iterate on all files, the file name is not always the same, I just used an example.
so it could be named "Test File 1.txt" or it could also be named something completely different "Something Else.txt"
So I want to look for all similar named files using a complete file name as base, but this file name can be different, hope I make more sense.
If Perl is an option, please try the following:
perl -e '
@files1 = glob "dir1/*";
@files2 = glob "dir2/*";
foreach (@files2) {
    $f2 = $_;
    s#.*/##; # remove directory name
    # s#\..*?$##; # remove extension (wrong)
    s#\.[^.]*$##; # remove extension (corrected)
    s#[\W_]#[\\W_]?#g; # replace non-alphanumeric chars
    $pat = $_ . "\\.\\w+\$";
    # print $pat, "\n"; # uncomment to see the regex pattern
    foreach $f1 (@files1) {
        if ($f1 =~ m#/$pat#i) {
            print "$f1 <=> $f2\n";
        }
    }
}'
Output:
dir1/And-Another File.txt <=> dir2/And another_file.txt
dir1/Test File Name.txt <=> dir2/test file_Name.txt
dir1/Test File Name.txt <=> dir2/test-File_Name.txt
dir1/This is anotherfile.txt <=> dir2/This Is another file.txt
[Explanations]
The concept is to generate a regex pattern on the fly from a filename
in one directory and match it against the files in the other directory.
The file extension is replaced with a pattern which matches any extension.
Each non-alphanumeric character or underscore is replaced with a pattern
which matches such a character, or nothing at all, so that
anotherfile and another file match.
The i option added to the pattern enables case-insensitive matching.
You can see the generated regex by uncommenting the noted line.
A possible problem is that the matching is one-directional: from the
filename anotherfile we cannot generate a pattern which matches
another file, because there is no separator to make optional.
A possible workaround is to ignore non-alphanumeric characters and underscores altogether when matching. That may overmatch unexpectedly, depending on the words and punctuation involved. To go further we would need a precise definition of "similar".
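The workaround of ignoring separators entirely could be sketched in bash as follows. This is not the Perl approach above but a swapped-in technique: both names are reduced to a normalized key and the keys are compared. The function name normalize is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a file name to a key of lowercase letters and digits.
normalize() {
    local name=${1##*/}            # drop the directory
    name=${name%.*}                # drop the last extension
    name=${name//[^[:alnum:]]/}    # drop everything that is not a letter or digit
    printf '%s\n' "$name" | tr '[:upper:]' '[:lower:]'
}

normalize "dir1/This is anotherfile.txt"    # prints thisisanotherfile
normalize "dir2/This Is another file.txt"   # prints thisisanotherfile
```

Two files are then treated as similar when their keys are equal, which works in both directions but also matches pairs that differ only in punctuation.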
[Edit]
In order to get the result back to bash variables, please try:
while read -r -d "" line; do
# do something with the bash variable "line"
echo "$line"
done < <(
perl -e '
@files1 = glob "dir1/*";
@files2 = glob "dir2/*";
foreach (@files2) {
    $f2 = $_;
    s#.*/##; # remove directory name
    # s#\..*?$##; # remove extension (wrong)
    s#\.[^.]*$##; # remove extension (corrected)
    s#[\W_]#[\\W_]?#g; # replace non-alphanumeric chars
    $pat = $_ . "\\.\\w+\$";
    # print $pat, "\n"; # uncomment to see the regex pattern
    foreach $f1 (@files1) {
        if ($f1 =~ m#/$pat#i) {
            push(@result, "$f1 <=> $f2");
            # if you want just the list of filenames, comment out the line above
            # and uncomment the line below
            # push(@result, $f1, $f2);
        }
    }
}
print join("\0", @result) . "\0";
')
The results are stored in the bash variable line, one result per iteration.
If you want to tweak the output format, please modify the line push(@result, ...).
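The NUL-delimited hand-off can be tested on its own, without Perl. In this sketch printf stands in for the Perl script, and the two sample records are made up for illustration.

```shell
#!/usr/bin/env bash
# Collect NUL-terminated records into a bash array.
items=()
while IFS= read -r -d "" item; do
    items+=("$item")   # spaces and other odd characters in a record survive intact
done < <(printf '%s\0' "dir1/a b.txt <=> dir2/a_b.txt" "dir1/c.txt <=> dir2/C.txt")

printf 'got %d records\n' "${#items[@]}"   # prints: got 2 records
```

Because the loop reads from a process substitution rather than a pipe, it runs in the current shell, so the items array is still populated after the loop ends.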
[EDIT]
Modified to work with the following filename pairs:
"Sample Filename.txt" <=> "Sample Filename (100).txt"
"Sample.Filename.txt" <=> "Sample Filename.txt"
Here's the updated code:
while read -r -d "" line; do
    # do something with the bash variable "line"
    echo "$line"
done < <(
perl -e '
@files1 = glob "dir1/*";
@files2 = glob "dir2/*";
foreach (@files2) {
    $f2 = $_;
    s#.*/##; # remove directory name
    s#\.[^.]*$##; # remove extension
    s#\s*\(.*?\)##; # remove parenthesis if any
    s#\s*\[.*?\]##; # remove square bracket if any
    s#[\W_]#[\\W_]?#g; # replace non-alphanumeric chars
    $pat = $_ . "\\s?((\\(.*?\\))|(\\[.*?\\]))?" . "\\.\\w+\$";
    # print $pat . "\n"; # uncomment to see the regex pattern
    foreach $f1 (@files1) {
        if ($f1 =~ m#/$pat#i) {
            push(@result, "$f1 <=> $f2");
            # if you want just the list of filenames, comment out the line above
            # and uncomment the line below
            # push(@result, $f1, $f2);
        }
    }
}
print join("\0", @result) . "\0";
')

Adding file information to an AWK comparison

I'm using awk to perform a file comparison against a file listing in found.txt
while read line; do
awk 'FNR==NR{a[$1]++;next}$1 in a' $line compare.txt >> $CHECKFILE
done < found.txt
found.txt contains full path information to a number of files that may contain the data. While I am able to determine that data exists in both files and output that data to $CHECKFILE, I wanted to be able to put the line from found.txt (the filename) where the line was found.
In other words I end up with something like:
File " /xxxx/yyy/zzz/data.txt "contains the following lines in found.txt $line
I'm just not sure how to get the /xxxx/yyy/zzz/data.txt information into the stream.
Appended for clarification:
The file found.txt contains the full path information to several files on the system
/path/to/data/directory1/file.txt
/path/to/data/directory2/file2.txt
/path/to/data/directory3/file3.txt
each of the files has a list of parameters that need to be checked for existence before appending additional information to them later in the script.
so for example, file.txt contains the following fields
parameter1 = true
parameter2 = false
...
parameter35 = true
the compare.txt file contains a number of parameters as well.
So if parameter35 (or any other parameter) shows up in one of the three files, I get its output dropped into the Checkfile.
Both of the scripts (yours and the one I posted) will give me that output but I would also like to echo in the line that is being read at that point in the loop. Sounds like I would just be able to somehow pipe it in, but my awk expertise is limited.
It's not really clear what you want but try this (no shell loop required):
awk '
ARGIND==1 { ARGV[ARGC] = $0; ARGC++; next }
ARGIND==2 { keys[$1]; next }
$1 in keys { print FILENAME, $1 }
' found.txt compare.txt > "$CHECKFILE"
ARGIND is gawk-specific. If your awk doesn't have it, add FNR==1{ARGIND++} as the first rule.
Pass the name into awk inside a variable like this:
awk -v file="$line" '{... print "File: " file }'
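A fuller sketch of that idea, keeping the comparison from the question and prefixing every hit with the file name, might look like this. The function name report_matches and the output format are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 is a file listed in found.txt, $2 is compare.txt.
report_matches() {
    awk -v file="$1" '
        FNR == NR  { seen[$1]; next }                # first file: remember each key
        $1 in seen { print "File: " file ": " $0 }   # second file: report shared keys
    ' "$1" "$2"
}

# Possible use inside the loop from the question:
# while IFS= read -r line; do
#     report_matches "$line" compare.txt >> "$CHECKFILE"
# done < found.txt
```

As with any FNR==NR script, an empty first file would cause the second file to be treated as the first, so an empty data file may need a separate check.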
