Bash removes underscores from my filenames? - bash

I'm trying to move files from one S3 bucket to another and put them into a folder structure by date. Put simply, all the files are currently going into one folder, and that folder now has over 500,000 files inside it. I need to sort through all these files and put them into folders by month.
The file names are similar to:
"This_is_a_file_20150403.xml"
So I loop through all the files within an S3 bucket, tokenize each filename, and get the date. I create a yearmonth variable ignoring the day and move the file into another S3 bucket. But the filename changes to:
"This is a file 20150403.xml"
So when I try to move it, AWS can't find the file. Why has bash removed the underscores from the filename? I tried temporarily storing the filename in tempFilename but it still had the underscores removed.
The code I have at the moment is:
#!/bin/bash
count=0
for filename in $(aws s3 ls s3://stagingbucket)
do
echo $filename
tempFilename=$filename
(IFS='_'; for word in $filename;
do
echo $filename
if [ "$count" -eq 2 ]; then
yearmonth=${word:0:6}
echo $tempFilename
aws s3 cp s3://stagingbucket/$filename s3://archivebucket/$yearmonth/
fi
count=$((count + 1))
done)
done
Any ideas?

Let's look at what your code actually does.
echo $foo
String-splits $foo, breaking it into pieces based on characters in IFS
Evaluates each piece as a glob expression
Passes each result of those glob expressions to echo as a separate argument
echo then prints those arguments with spaces between them.
Instead, use:
echo "$foo"
...to keep your string together -- and, likewise, quote all your other expansions as well.
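A quick way to see the difference (foo here is just a sample value):

```shell
foo="This_is_a_file"
IFS='_'          # make underscore a word-splitting character, as in the script
echo $foo        # unquoted: split into words, printed with spaces between them
echo "$foo"      # quoted: the string stays intact
unset IFS
```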
For much the same reasons (unintended glob expressions), for word in $filename is evil; don't do that.
IFS=_ read -r -a words <<<"$filename"
for word in "${words[@]}"; do
echo "Processing $word"
done
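Applied to the original task, the date can be pulled out without any loop at all. A sketch, assuming the date is always the last underscore-separated token before the extension (the aws call is the original one, re-quoted and left as a comment since the bucket names are specific to the question):

```shell
filename="This_is_a_file_20150403.xml"
base=${filename%.*}         # strip the extension: This_is_a_file_20150403
datepart=${base##*_}        # keep what follows the last underscore: 20150403
yearmonth=${datepart:0:6}   # first six characters: 201504
echo "$yearmonth"
# aws s3 cp "s3://stagingbucket/$filename" "s3://archivebucket/$yearmonth/"
```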

Related

Bash script MV is disappearing files

I've written a script to go through all the files in the directory the script is located in, identify if a file name contains a certain string and then modify the filename. When I run this script, the files that are supposed to be modified are disappearing. It appears my usage of the mv command is incorrect and the files are likely going to an unknown directory.
#!/bin/bash
string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"
for FILE in *
do
if [[ "$FILE" == *"$string_contains"* ]];then
if [[ "$FILE" != *"$string_dontwant"* ]];then
filename= echo $FILE | head -c 15
combined_name="$filename$file_extension"
echo $combined_name
mv $FILE $combined_name
echo $FILE
fi
fi
done
I've done my best to go through the possible errors I've made in the MV command but I haven't had any success so far.
There are a couple of problems and several places where your script can be improved.
filename= echo $FILE | head -c 15
This pipeline runs echo $FILE with the variable filename set to the null string in its environment. That value of the variable is visible only to the echo command; the variable is not set in the current shell. echo does not care about it anyway.
You probably want to capture the output of echo $FILE | head -c 15 into the variable filename but this is not the way to do it.
You need to use command substitution for this purpose:
filename=$(echo $FILE | head -c 15)
head -c 15 outputs only the first 15 bytes of the input (they could span multiple lines, though that does not happen here). head is not the most appropriate tool for this. Use cut -c-15 instead.
But for what you need (extract the first 15 characters of the value stored in the variable $FILE), there is a much simpler way; use a form of parameter expansion called "substring expansion":
filename=${FILE:0:15}
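For example, with a name like the ones the script matches:

```shell
FILE="dummy_axial_y_position_01"
filename=${FILE:0:15}   # first 15 characters, no subshell or pipeline needed
echo "$filename"        # dummy_axial_y_p
```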
mv $FILE $combined_name
Before running mv, the variables $FILE and $combined_name are expanded (this is called "parameter expansion"). This means the variables are replaced by their values.
For example, if the value of FILE is abc def and the value of combined_name is mnp opq, the line above becomes:
mv abc def mnp opq
The mv command receives 4 arguments and it attempts to move the files denoted by the first three arguments into the directory denoted by the fourth argument (and it probably fails).
In order to keep the values of the variables as single words (if they contain spaces), always enclose them in double quotes. The correct command is:
mv "$FILE" "$combined_name"
This way, in the example above, the command becomes:
mv "abc def" "mnp opq"
... and mv is invoked with two arguments: abc def and mnp opq.
combined_name="$filename$file_extension"
There isn't any problem in this line. The quotes are simply not needed.
The variables filename and file_extension are expanded (replaced by their values), but word splitting is not applied to assignments. Whatever value results after the replacement is assigned to combined_name, even if it contains spaces or other word-separator characters (tabs, newlines).
The quotes are also not needed here because the values do not contain spaces or other characters that are special in the command line. They must be quoted if they contain such characters.
string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"
It is not incorrect to quote the values, though.
for FILE in *
do
if [[ "$FILE" == *"$string_contains"* ]];then
if [[ "$FILE" != *"$string_dontwant"* ]]; then
This is also not wrong but it is inefficient.
You can use the expression from the if condition directly in the for statement (and get rid of the if statement):
for FILE in *"$string_contains"*; do
if [[ "$FILE" != *"$string_dontwant"* ]]; then
...
If you have read and understood the above (and some of the linked documentation), you will be able to figure out for yourself where your files were moved :-)
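For reference, here is one shape the whole loop could take with the fixes applied (a self-contained sketch set up in a throwaway temp directory; the sample file names are hypothetical):

```shell
#!/bin/bash
# Self-contained demo: create sample files in a temp directory first.
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1
touch "dummy_axial_y_position_01" "dummy_axial_y_position_time_01"

string_contains="dummy_axial_y_position"
string_dontwant="dummy_axial_y_position_time"
file_extension=".csv"

for file in *"$string_contains"*; do
    if [[ "$file" != *"$string_dontwant"* ]]; then
        filename=${file:0:15}                    # substring expansion instead of echo | head
        combined_name="$filename$file_extension"
        mv -- "$file" "$combined_name"           # quoted, so names with spaces survive
    fi
done
ls
```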

Bash Parameter Expansion - get part of directory path

Let's say I have this variable:
FILE=/data/tenant/dummy/TEST/logs/2020-02-03_16.20-LSTRM.log
I'm trying to get the name of the 4th sub directory, in this example TEST
job=${FILE%/*/*} # This gives me /data/tenant/dummy/TEST
Then
name=${job##*/}
This gives me exactly what I want: TEST
However when I try to use this in for loop like this:
for FILE in "/data/tenant/dummy/*/logs/*$year-*"; do
job=${FILE%/*/*}
echo $job # /data/tenant/dummy/TEST(and few other directories, so far so good)
name=${job##*/}
echo $name
done
The result of echo $name shows a list of the files in the directory I'm currently in, instead of TEST
Main issue is that the double quotes prevent the glob from expanding, so the loop runs once with the whole pattern as a single string:
for FILE in "/data/tenant/dummy/*/logs/*$year-*"
You can see this if you do something like:
for FILE in "/data/tenant/dummy/*/logs/*$year-*"
do
echo "+++++++++++++++++++"
echo "${FILE}"
done
This should print ++++++++++++++ just once, followed by the unexpanded pattern on a single line.
To process the files individually remove the double quotes, eg:
for FILE in /data/tenant/dummy/*/logs/*$year-*
It's also good practice to wrap individual variable references in double quotes, eg:
job="${FILE%/*/*}"
echo "$job"
name="${job##*/}"
echo "$name"
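A self-contained sketch of the fixed loop (using a throwaway directory tree under mktemp in place of /data/tenant/dummy):

```shell
#!/bin/bash
base=$(mktemp -d)
mkdir -p "$base/TEST/logs" "$base/PROD/logs"
touch "$base/TEST/logs/2020-02-03_16.20-LSTRM.log" "$base/PROD/logs/2020-02-04_10.00-LSTRM.log"

year=2020
for FILE in "$base"/*/logs/*"$year"-*; do   # glob characters unquoted, variables quoted
    job="${FILE%/*/*}"                      # drop /logs/<file>: .../TEST
    name="${job##*/}"                       # keep the last path component: TEST
    echo "$name"
done
```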

Comparing files in the same directory with same name different extension [duplicate]

I have a bash script that looks through a directory and creates a .pdf from a .ppt, but I want to be able to check whether a .pdf already exists for the .ppt, because if there is one I don't want to create it, and if the .pdf is timestamped older than the .ppt I want to update it. I know for the timestamp I can use (date -r bar +%s), but I can't figure out how to compare files with the same name when they are in the same folder.
This is what I have:
#!/bin/env bash
#checks to see if argument is clean if so it deletes the .pdf and archive files
if [ "$1" = "clean" ]; then
rm -f *pdf
else
#reads the files that are PPT in the directory and copies them and changes the extension to .pdf
ls *.ppt|while read FILE
do
NEWFILE=$(echo $FILE|cut -d"." -f1)
echo $FILE": " $FILE " "$NEWFILE: " " $NEWFILE.pdf
cp $FILE $NEWFILE.pdf
done
fi
EDITS:
#!/bin/env bash
#checks to see if argument is clean if so it deletes the .pdf and archive files
if [ "$1" = "clean" ]; then
rm -f *pdf lectures.tar.gz
else
#reads the files that are in the directory and copies them and changes the extension to .pdf
for f in *.ppt
do
[ "$f" -nt "${f%ppt}pdf" ] &&
nf="${f%.*}"
echo $f": " $f " "$nf: " " $nf.pdf
cp $f $nf.pdf
done
fi
To loop through all ppt files in the current directory and test to see if they are newer than the corresponding pdf and then do_something if they are:
for f in *.ppt
do
[ "$f" -nt "${f%ppt}pdf" ] && do_something
done
-nt is the bash test for one file being newer than another.
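A small demonstration of -nt (in a temp directory; the sleep just guarantees distinct timestamps). Note that in bash's test, f1 -nt f2 is also true when f2 does not exist, which is exactly what you want here: no pdf yet means create one.

```shell
tmp=$(mktemp -d)
touch "$tmp/talk.pdf"
sleep 1                     # ensure the next file gets a newer timestamp
touch "$tmp/talk.ppt"

if [ "$tmp/talk.ppt" -nt "$tmp/talk.pdf" ]; then
    echo "ppt is newer, regenerate the pdf"
fi
```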
Notes:
Do not parse ls. The output from ls often contains a "displayable" form of the filename, not the actual filename.
The construct for f in *.ppt will work reliably with all file names, even ones with spaces, tabs, or newlines in their names.
Avoid using all caps for shell variables. The system uses all caps for its variables and you do not want to accidentally overwrite one. Thus, use lower case or mixed case.
The shell has built-in capabilities for suffix removal. So, for example, newfile=$(echo $file |cut -d"." -f1) can be replaced with the much more efficient and more reliable form newfile="${file%%.*}". This is particularly important in the odd case that the file's name ends with a newline: command substitution removes all trailing newlines but the bash variable expansions don't.
Further, note that cut -d"." -f1 removes everything after the first period. If a file name has more than one period, this is likely not what you want. The form, ${file%.*}, with just one %, removes everything after the last period in the name. This is more likely what you want when you are trying to remove standard extensions like ppt.
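A quick illustration of the difference, with a sample name that has two periods:

```shell
file="archive.2024.ppt"
echo "${file%.*}"    # one %, shortest match from the end:  archive.2024
echo "${file%%.*}"   # two %, longest match from the end:   archive
```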
Putting it all together
#!/bin/env bash
#checks to see if argument is clean if so it deletes the .pdf and archive files
if [ "$1" = "clean" ]; then
rm -f ./*pdf lectures.tar.gz
else
#reads the files that are in the directory and copies them and changes the extension to .pdf
for f in ./*.ppt
do
if [ "$f" -nt "${f%ppt}pdf" ]; then
nf="${f%.*}"
echo "$f: $f $nf: $nf.pdf"
cp "$f" "$nf.pdf"
fi
done
fi

Handling spaces with 'cp'

I've got an external drive with over 1TB of project files on it. I need to reformat this drive so I can reorganize it, but before I do that I need to transfer everything off. The issue is I'm on a Mac and the drive is formatted as NTFS, so all I can do is copy from it. I tried to simply copy and paste in Finder, but the drive seems to lock up after roughly 15 minutes of copying that way. So I decided to write a bash script to iterate through all 1000+ files one at a time. This seems to work for files without spaces but skips any file that has one.
Here is what I've hacked together so far. I'm not too advanced in bash, so any suggestions on how to handle the spaces would be great.
quota=800
size=`du -sg /Users/work/Desktop/TEMP`
files="/Volumes/Lacie/EXR_files/*"
for file in $files
do
if [[ ${size%%$'\t'*} -lt $quota ]];
then
echo still under quota;
cp -v $file /Users/work/Desktop/TEMP_EXR;
du -sg /Users/work/Desktop/TEMP_EXR;
else
echo over quota;
fi
done
(I'm checking for directory size because I'm having to split this temporary copy onto a few different place before I copy it all back onto the one reformatted drive.)
Hope I'm not misunderstanding. If you have a problem with a space character in a filename, quote it. If you want bash to expand parameters inside it, use double quotes.
cp -v "$file" /Users/work/Desktop/TEMP_EXR
You can put all the file names in an array, then iterate over that.
quota=800
files=( /Volumes/Lacie/EXR_files/* )
for file in "${files[@]}"
do
size=$(du -sg /Users/work/Desktop/TEMP_EXR)   # recompute each pass, on the actual destination, so the quota check tracks the growing copy
if [[ ${size%%$'\t'*} -lt $quota ]]
then
echo still under quota
cp -v "$file" /Users/work/Desktop/TEMP_EXR
du -sg /Users/work/Desktop/TEMP_EXR
else
echo over quota
fi
done
The two things to note are 1) quoting the array expansion in the for list, and 2) quoting $file for the cp command.
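A minimal reproduction of why the quoting matters (with a temp directory standing in for the Lacie volume):

```shell
#!/bin/bash
src=$(mktemp -d)
touch "$src/file one.exr" "$src/file two.exr"

files=( "$src"/* )            # each filename becomes one array element, spaces and all
for f in "${files[@]}"; do    # quoted expansion: one word per file
    echo "got: $f"
done
echo "count: ${#files[@]}"
```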

Basename puts single quotes around variable

I am writing a simple shell script to make automated backups, and I am trying to use basename to create a list of directories and then parse this list to get the first and the last directory from the list.
The problem is: when I use basename in the terminal, all goes fine and it gives me the list exactly as I want it. For example:
basename -a /var/*/
gives me a list of all the directories inside /var without the / in the end of the name, one per line.
BUT, when I use it inside a script and pass a variable to basename, it puts single quotes around the variable:
while read line; do
dir_name=$(echo $line)
basename -a $dir_name/*/ > dir_list.tmp
done < file_with_list.txt
When running with +x:
+ basename -a '/Volumes/OUTROS/backup/test/*/'
and, therefore, the result is not what I need.
Now, I know there must be a thousand ways to go around the basename problem, but then I'd learn nothing, right? ;)
How to get rid of the single quotes?
And if my directory name has spaces in it?
If your directory name could include spaces, you need to quote the value of dir_name (which is a good idea for any variable expansion, whether you expect spaces or not).
while read line; do
dir_name=$line
basename -a "$dir_name"/*/ > dir_list.tmp
done < file_with_list.txt
(As jordanm points out, you don't need to quote the RHS of a variable assignment.)
Assuming your goal is to populate dir_list.tmp with a list of directories found under each directory listed in file_with_list.txt, this might do.
#!/bin/bash
inputfile=file_with_list.txt
outputfile=dir_list.tmp
rm -f "$outputfile" # the -f makes rm fail silently if file does not exist
while read line; do
# basic syntax checking
if [[ ! ${line} =~ ^/[a-z][a-z0-9/-]*$ ]]; then
continue
fi
# collect targets using globbing
for target in "$line"/*; do
if [[ -d "$target" ]]; then
printf "%s\n" "$target" >> "$outputfile"
fi
done
done < "$inputfile"
As you develop whatever tool will process your dir_list.tmp file, be careful of special characters (including spaces) in that file.
Note that I'm using printf instead of echo so that targets whose first character is a hyphen won't cause errors.
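For example, with a target whose name is exactly -e (which bash's echo would consume as an option and print nothing useful for):

```shell
target="-e"
printf "%s\n" "$target"   # prints -e verbatim; it is data, not an option
```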
This might work
while read -r; do
find "$REPLY" >> dir_list.tmp
done < file_with_list.txt
