I want to read all file names from a particular directory and then create new files in another directory, named by appending a string to each original name.
For example, if 'A', 'B', 'C' are in the 'logs' directory,
then the script should create 'A_tmp', 'B_tmp', 'C_tmp' in the 'tmp' directory.
What I am using is:
tempDir=./tmp/
logDir=./logs/
for file in $( find `echo $logDir` -type f )
do
name=eval basename $file
echo $name
name=$(echo $name | sed 's/.$//')
echo $tempDir
opFile=$tempDir$name
echo $opFile
done
But as far as I understand, $file contains '\n' as its last character, so I am unable to concatenate the strings.
Right now I am not creating files, just printing all the names.
So, how can I remove the '\n' from the file name, and is my understanding correct?
Analysis
There are multiple issues to address in your script. Let's take it step by step:
tempDir=./tmp/
logDir=./logs/
for file in $( find `echo $logDir` -type f )
This scheme assumes no spaces in the file names (which is not an unusual restriction; avoiding problems with spaces in names is relatively tricky). Also, there's no need for the echo; just write:
for file in $(find "$logDir" -type f)
Continuing:
do
name=eval basename $file
This runs the basename command with the environment variable name set to the value eval and the argument $file. What you need here is:
name=$(basename "$file")
where the double quotes aren't strictly necessary because the name can't contain spaces (but it's not a bad habit to get into to quote all file names because sometimes the names do contain spaces).
echo $name
This would echo a blank line because name was not set.
name=$(echo $name | sed 's/.$//')
If name was set, this would chop off the last character, but if the name was A, you'd have nothing left.
echo $tempDir
opFile=$tempDir$name
echo $opFile
done
Give or take double quotes and the fact that you've not added the _tmp suffix to opFile, there's nothing wrong with the rest.
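To see the difference between the two assignment forms discussed above, here is a minimal sketch. The path ./logs/A is only an illustrative value, and the helper variable after_prefix is introduced here just to record the intermediate state.

```shell
#!/bin/sh
# Hypothetical path, used only for illustration; the file need not exist.
file=./logs/A

unset name
# "VAR=value command" puts VAR into the environment of that single command;
# it does not assign VAR in the current shell.
name=eval basename "$file" >/dev/null
after_prefix=${name-unset}
echo "after prefix form: $after_prefix"

# Command substitution captures the command's output, minus trailing newlines.
name=$(basename "$file")
echo "after command substitution: $name"
```

The first echo shows that name is still unset after the prefix form, which is why the original script printed a blank line.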
Synthesis
Putting the changes together, you end up with:
tempDir=./tmp/
logDir=./logs/
for file in $(find "$logDir" -type f)
do
name=$(basename "$file")
echo "$name" # Debug only
echo "$tempDir" # Debug only
opFile="$tempDir${name}_tmp"
echo "$opFile"
done
That shows all the intermediate results. You could perfectly well compress that down to:
tempDir=./tmp/
logDir=./logs/
for file in $(find "$logDir" -type f)
do
opFile="$tempDir"$(basename "$file")"_tmp"
echo "$opFile"
done
Or, using a simpler combination of double quotes because the names contain no spaces:
tempDir=./tmp/
logDir=./logs/
for file in $(find "$logDir" -type f)
do
opFile="$tempDir$(basename $file)_tmp"
echo "$opFile"
done
The echo is there as a surrogate for the copy or move operation you plan to execute, of course.
EDIT: ...and to remove restrictions on file names containing spaces and globbing characters, do it as:
tempDir=./tmp/
logDir=./logs/
find "$logDir" -type f |
while IFS= read -r file
do
opFile="${tempDir}${file##*/}_tmp"
echo "$opFile"
done
It will still fail for file names containing newlines. If you want to handle that then investigate a solution using find ... -print0 | xargs -0 or find ... -exec.
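One possible shape for the find ... -exec variant is sketched below. It is an illustration under assumptions, not a drop-in replacement: the scratch directories come from mktemp, and the loop creates empty files rather than copying or moving anything.

```shell
#!/bin/sh
# Scratch directories created only for this demonstration.
workdir=$(mktemp -d)
mkdir -p "$workdir/logs" "$workdir/tmp"
touch "$workdir/logs/A" "$workdir/logs/B C"

# find hands each path to the inner shell as a separate argument, so the
# output of find is never parsed and newlines in names cannot split a path.
find "$workdir/logs" -type f -exec sh -c '
    dest=$1
    shift
    for file in "$@"; do
        : > "$dest/${file##*/}_tmp"    # create an empty <name>_tmp file
    done
' sh "$workdir/tmp" {} +

ls "$workdir/tmp"
```

Passing the destination as the first argument keeps the inner script free of interpolated variables, which avoids quoting problems inside the single-quoted block.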
Try the following.
#!/bin/sh
tmpDir=./tmp/
logDir=./logs/
# List all files in the log directory and pipe them into a loop that reads
# each path line by line.
# Note that there is no trailing newline to remove: 'read' swallows it.
find "$logDir" -type f | while IFS= read -r path; do
# get the basename of the path
name=$(basename "$path")
# build the destination path in the temporary directory
dest="$tmpDir/${name}_tmp"
echo "$dest"
done
Shell scripts can concatenate strings simply by writing them next to each other, as demonstrated with $tmpDir/${name}_tmp. There is no need to post-process the output, since read swallows the trailing newline.
find ... | while read is a very useful construct whenever you want to process multiple lines of anything; it works for files too.
while read line; do
echo $line
done < filename.txt
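A plain read also trims leading and trailing whitespace and treats backslashes as escapes, which matters for unusual file names. The sketch below compares it with IFS= read -r; the sample line and the variables plain and raw are made up for the comparison.

```shell
#!/bin/sh
input=$(mktemp)
# One sample line with leading spaces, a double space and a backslash.
printf '%s\n' '  two  spaces\here' > "$input"

# Plain read: leading whitespace is trimmed and the backslash is consumed.
while read line; do plain=$line; done < "$input"

# IFS= keeps the whitespace, -r keeps the backslash.
while IFS= read -r line; do raw=$line; done < "$input"

printf 'plain: [%s]\nraw:   [%s]\n' "$plain" "$raw"
rm -f "$input"
```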
Edit: clarified
Try something like this:
tempDir=./tmp/
logDir=./logs/
for file in $( find `echo $logDir` -type f )
do
name=`eval basename $file|tr -d "\n"`_tmp
echo $name
done
If you change
name=eval basename $file
to
name=`eval basename $file`
then afterwards name contains what you want.
Related
I have a parent directory with over 800+ directories, each of these has a unique name. Some of these directories house a sub-directory called y in which a file called z, (if it exists) can be found.
I need to script a loop that will check each of the 800+ for z, and if it's there, I need to append the name of the directory (the directory before y) into a text file. I'm not sure how to do this.
This is what I have
#!/bin/bash
for d in *; do
if [ -d "y"]; then
for f in *; do
if [ -f "x"]
echo $d >> IDlist.txt
fi
fi
done
Let's assume that any foo/y/z is a file (that is, you do not have directories with such names). If you had a really large number of such files, storing all paths in a bash variable could lead to memory issues, and would advocate for another solution, but about 800 paths is not large. So, something like this should be OK:
declare -a names=(*/y/z)
printf '%s\n' "${names[@]%%/*}" > IDlist.txt
Explanation: the paths of all z files are first stored in the array names, thanks to the glob pattern */y/z. Then a suffix-removal expansion is applied to each array element to strip everything from the first slash onward (the /y/z part): "${names[@]%%/*}". The result is printed, one name per line, by printf '%s\n'.
If you also had directories named z, or if you had millions of files, find could be used, instead, with a bit of awk to retain only the leading directory name:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
awk -F/ '{print $2}' > IDlist.txt
If you prefer sed for the post-processing:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
sed 's|^\./\(.*\)/y/z|\1|' > IDlist.txt
These two are probably also more efficient (faster).
Note: your initial attempt could also work, even if using bash loops is far less efficient, but it needs several changes:
#!/bin/bash
for d in *; do
if [ -d "$d/y" ]; then
for f in "$d"/y/*; do
if [ "$f" = "$d/y/z" ]; then
printf '%s\n' "$d" >> IDlist.txt
fi
done
fi
done
As noted by @LéaGris, printf is better than echo because if d is the string -e, for instance, echo "$d" interprets it as an option of the echo command and does not print it.
But a simpler and more efficient version (even if not as efficient as the first proposal or the find-based ones) would be:
#!/bin/bash
for d in *; do
if [ -f "$d/y/z" ]; then
printf '%s\n' "$d"
fi
done > IDlist.txt
As you can see, there is another improvement (also suggested by @LéaGris), which consists of redirecting the output of the entire loop to the IDlist.txt file. This opens and closes the file only once, instead of once per iteration.
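The loop can be tried out safely in a scratch tree before running it on the real data. In this sketch the directory names (dir001, dir002, -e) are invented for the test, and the list is written to a temporary file instead of IDlist.txt.

```shell
#!/bin/sh
work=$(mktemp -d)
list=$(mktemp)
mkdir -p "$work/dir001/y" "$work/dir002/y" "$work/-e/y"
touch "$work/dir001/y/z" "$work/-e/y/z"    # dir002 has y but no z

(
    cd "$work" || exit 1
    for d in *; do
        if [ -f "$d/y/z" ]; then
            printf '%s\n' "$d"             # safe even for a name like -e
        fi
    done
) > "$list"                                # file opened once for the whole loop

cat "$list"
```

The subshell keeps the cd local, so the rest of the script stays in its original directory.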
This should solve it:
for f in */y/z; do
[ -f "$f" ] && echo ${f%%/*}
done
Note:
If there is a possibility of weird top level directory name like "-e", use printf instead of echo, as in the comment below.
This should do it:
shopt -s nullglob
outfile=IDlist.txt
>$outfile
for found in */y/z
do
[[ -f $found ]] && echo "${found%%/*}" >>"$outfile" # Drop the /y/z part
done
The nullglob ensures that the loop is skipped if there is no match, and the quotes in the echo ensure that the directory name is output correctly even if it contains two successive spaces.
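Without nullglob (for example in plain sh), an unmatched pattern is passed to the loop as a literal string, which is why a file test inside the loop still matters. A minimal sketch in an empty scratch directory; the variable seen is introduced here only to record what the loop received.

```shell
#!/bin/sh
empty=$(mktemp -d)
count=0
cd "$empty" || exit 1

for found in */y/z; do
    seen=$found                      # remember what the loop was handed
    if [ -f "$found" ]; then
        count=$((count + 1))
    fi
done

cd - >/dev/null || exit 1
echo "loop variable was: $seen"
echo "real matches: $count"
```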
You can first try to do some filtering using find
Below will list all z files recursively within current directory
Then let's say the one of the output was
./dir001/y/z
Then you can extract the required part in several ways: grep, sed, awk, etc.
For example, with grep:
find . -type f | grep z | grep -E -o "y.*$"
will give
y/z
The first example doesn't check that z is a file, but I think it's worth showing compgen:
#!/bin/bash
compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt
Doing glob expansion, file check and path splitting with perl only:
perl -E 'foreach $p (glob "*/y/z") {say substr($p, 0, index($p, "/")) if -f $p}' > IDlist.txt
I constantly get a bunch of files named "Unknown.png" into a folder, and often times they get renamed "unknown (1).png, unknown (2).png" etc. This is a bit of a problem as sometimes when cleaning up files and moving them somewhere else I get asked if I want to replace or rename, etc.
So I decided to make a crontab task that renames the files to CB_RANDOM. This way I don't even have to worry about potentially overwriting two files with the same name.
I have figured out this much so far: I find the files, replace the name Unknown with CB_ and add a random number.
The problem comes with the (x) at the end of the filename. I also managed to figure out how to solve that: I just strip away any parentheses and numbers.
The problem is that I can't figure out how to make the rename command follow both rules.
for u in (find -name unknown*); do
rCode = random
rename -v 's/unknown/CB_$rCode' $u
rename -v 's/[ ()0123456789]//g' $u
Ideally I'd like to apply both rules in the same line of code, especially since once the first line has run, $u no longer matches the file for the second step.
No need for a loop:
find -name 'unknown*' -exec rename 's/unknown \([0-9]+\)\.(.*)$/"CB_".sprintf("%04s",int(rand(10000))).".".$1/e' {} \;
find all the files, starting in the current directory, recursively, with names similar to "unknown (1).png"
rename them with a resulting filename similar to "CB_0135.png"
This produces an error message if a filename already exists.
Your code should first be changed into
# find is a subcommand, use $()
# find a file with wildcard, use quotes
for u in $(find -name "unknown*"); do
# Is random a command? Use $()
rCode=$(random)
# Debug with echo, will show other problem
echo "File $u"
# $rCode will not be replaced by its value in single quotes
# Write a filename in double quotes, so it will not be split by a space
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done
The new line with echo shows that the loop is breaking up the filenames at the spaces. You can change this to
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
echo "File $u"
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done < <(find -name "unknown*")
I never use rename and would use
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
# construct new filename.
# Restriction: Path to file is without newlines, spaces or parentheses
newfile=$(sed 's/[ ()]//g; s/.*unknown/&_'"${rCode}"'_/' <<< "$u")
echo "Moving file $u to ${newfile}"
mv "$u" "${newfile}"
done < <(find -name "unknown*")
EDIT:
I removed a sed command for renaming files with (something) in it:
# Removed command
newfile=$(sed 's/\(.*\)(\(.*\))/\1'"${rCode}"'_\2/' <<< "$u")
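If the real goal is never to overwrite an existing file, a counter that is bumped until the target name is free is one alternative to random numbers or timestamps. The sketch below is an assumption-laden illustration: the CB_<n>.<ext> naming scheme is made up, every file is assumed to have an extension, and it runs in a scratch directory.

```shell
#!/bin/sh
work=$(mktemp -d)
touch "$work/unknown.png" "$work/unknown (1).png" "$work/unknown (2).png"

find "$work" -type f -name 'unknown*' | while IFS= read -r u; do
    dir=${u%/*}          # directory part of the path
    base=${u##*/}        # file name without the directory
    ext=${base##*.}      # text after the last dot (assumes there is one)
    n=1
    # Bump the counter until the target name is free, so nothing is overwritten.
    while [ -e "$dir/CB_$n.$ext" ]; do
        n=$((n + 1))
    done
    mv -- "$u" "$dir/CB_$n.$ext"
done

ls "$work"
```

Like the other find | while loops here, this still assumes no newlines in the file names.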
I found an old past paper question with content not covered in my course. I hope I don't get examined on it, but what does this bash script do? I know grep takes input and outputs the lines containing a pattern, echo just repeats its input, and cat just displays its input. But I have no idea what this does as a whole. Any help, please?
#!/bin/bash
outputFile=$1
for file in $(find -name '*txt' | grep data)
do
echo $file >> $outputFile
cat $file >> $outputFile
done
Each line:
#!/bin/bash
Hash-bang the script to use bash
outputFile=$1
Set the variable named "outputFile" to the first parameter passed into the script. Running the script would look like bash myScript.sh "/some/file/to/output.txt"
for file in $(find -name '*txt' | grep data)
do
Loop through every file in this directory and its subdirectories, looking for files whose name ends with the characters "txt" and whose path contains the characters "data" somewhere. For each iteration of the for loop (each file found), store the file name in the variable "file".
echo $file >> $outputFile
Echo out/print the file name stored in the variable "file" to the outputFile
cat $file >> $outputFile
Take the contents of the file and stick it in the outputFile.
done
End the For Loop
There are some issues with this script, though. If $outputFile or $file has a space in its name or path, it will fail. It's good practice to put double quotes around variables, like:
cat "$file" >> "$outputFile"
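The effect of the quotes can be seen by counting how many arguments the shell produces from a value. The file name below is hypothetical.

```shell
#!/bin/sh
file='my data.txt'    # hypothetical name containing a space

set -- $file          # unquoted: the shell splits the value into words
unquoted=$#
set -- "$file"        # quoted: the value stays a single word
quoted=$#

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the unquoted form, cat would be asked to open two files, my and data.txt, neither of which is the intended one.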
#!/bin/bash
The shebang. If this script is executable and invoked directly as in ./this_script or found in the PATH, it will be invoked with /bin/bash.
outputFile=$1
Assign the first argument to the name outputFile.
++ find -name '*txt'
Recursively list all files with a name ending in "txt". It would be more standard to include the path and write this as find . -name '*.txt'.
+ … | grep data
Filter the previous list of file names. Only list those containing the string "data" somewhere in their path (grep sees the whole line, directories included). If only the file name itself matters, this pipe could be eliminated by writing find . -name '*data*txt'.
for file in $(find -name '*txt' | grep data)
For every word in the output of the find | grep pipeline, assign that word to the name file and run the loop. This can break down if any of the found names have whitespace or glob characters in them. It would be better to use find's native -exec flag to handle this.
echo $file >> $outputFile
Append the expansion of the variable "file" to a new or existing file at the path found by expanding $outputFile. If the former expansion starts with a dash, it could cause echo to treat it as an argument. If the latter expansion has whitespace or a glob character in it, this may cause an "ambiguous redirect" error. It would be better to quote the expansions, and use printf to avoid the argument edge-case to echo, as in printf '%s\n' "$file" >> "$outputFile".
cat $file >> $outputFile
Append the contents of the file found at the expansion of the variable "file" to the path found by expanding $outputFile, or cause another ambiguous redirect error. It would be better to quote the expansions, like cat "$file" >> "$outputFile".
Assuming that none of the aforementioned expansion edge-cases were expected, it would be better to write this entire script like this:
find . -name '*data*txt' -print -exec cat {} \; >> "$1"
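A quick way to check what that one-liner produces is to run it in a scratch directory. The file names and contents below are invented, and the output file is deliberately kept outside the searched tree.

```shell
#!/bin/sh
work=$(mktemp -d)
out=$(mktemp)                          # kept outside the searched tree
printf 'hello\n' > "$work/data1.txt"
printf 'skip me\n' > "$work/notes.txt"

# -print writes the path, then -exec cat appends that file's contents.
( cd "$work" && find . -name '*data*txt' -print -exec cat {} \; ) >> "$out"

cat "$out"
```

Only data1.txt matches, so the output is its path followed by its single line of content.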
I have got a directory with files, some of which end with an underscore.
I would like to test each file to see if it ends with an underscore and then strip off the underscore.
I am currently running the following code:
for file in *;do
echo $file;
if [[ "${file:$length:1}" == "_" ]];then
mv $file $(echo $file | sed "s/.$//g");
fi
done
But it does not seem to be renaming the files with an underscore. For example, if I have a file called all_indoors_, I expect it to give me all_indoors.
You could use built-in string substitution:
for file in *_; do
mv "$file" "${file%_}"
done
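The same loop can be tried in a scratch directory first. This sketch adds a guard for the case where no file matches, since the pattern is then left as a literal string; the file names are only examples.

```shell
#!/bin/sh
work=$(mktemp -d)
touch "$work/all_indoors_" "$work/keep_me"

for file in "$work"/*_; do
    [ -e "$file" ] || continue       # the pattern matched nothing
    mv -- "$file" "${file%_}"        # % removes the shortest matching suffix
done

ls "$work"
```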
Just use a regex to check the string:
for file in *
do
[[ $file =~ _$ ]] && echo mv "$file" "${file%%_}"
done
Once you are sure it works as intended, remove the echo so that the mv command executes!
It may even be cleaner to use *_ so that the for will just loop over the files with a name ending with _, as hek2mgl suggests in comments.
for file in *_
do
echo mv "$file" "${file%%_}"
done
You can use find, which will be recursive:
while IFS= read -r f; do
mv "$f" "${f:0:-1}" # Remove last character from $f
done < <(find . -type f -name '*_')
Although not a pure bash approach, you can use the Perl-based rename (originally written by Larry Wall, the person behind Perl). It is not part of every default Linux environment and, depending on the distribution, may be installed as rename, prename or file-rename. Do not confuse it with the rename from util-linux (rename.ul on Debian-based systems), which takes a different syntax.
You use this rename with:
rename perlexpr files
(some flags omitted).
So you could use:
rename 's/_$//' *
to strip a single trailing underscore from every file name that has one.
As @hek2mgl points out, there are multiple rename commands, so first test whether you have picked the right one.
I'm getting an error while trying to find a way to replace a string in a directory path with another string:
sed: Error trying to read from {directory_path}: It's a directory
The shell script
#!/bin/sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Procesando archivos desde $R2K_SOURCE "
for file in $(find $R2K_SOURCE )
do
if [ -d $file ]
then
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
echo "directorio $R2K_TEMP_DIR"
else
# some code executes
:
fi
done
# find $R2K_PROCCESED -type f -size -200c -delete
I understand that the error is in this line:
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
but I don't know how to tell sh to treat the $file variable as a string and not as a directory object.
If you want to replace part of a path name, you can echo the path name and pipe it to sed.
You must also enable variable expansion by placing the sed command in double quotes instead of single quotes, and change the separator of the 's' command, like this:
R2K_TEMP_DIR=$(echo "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")
Then you will be able to use slashes inside the 's' command.
Update:
Even better is to remove the useless echo and use a "here string" instead (a bash feature, not available in plain POSIX sh):
R2K_TEMP_DIR=$(sed "s:$R2K_SOURCE:$R2K_PROCESSED:g" <<< "$file")
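The effect of the quoting can be checked without touching any files. This sketch feeds a sample path to sed both ways; the path and the variables single and double are illustrative only, and printf is used instead of a here string so that it also runs under plain sh.

```shell
#!/bin/sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
file="source/foo/bar"    # sample path, not read from disk

# Single quotes: sed receives the literal text $R2K_SOURCE, so nothing matches.
single=$(printf '%s\n' "$file" | sed 's:$R2K_SOURCE:$R2K_PROCESSED:g')

# Double quotes: the shell expands both variables before sed runs.
double=$(printf '%s\n' "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")

printf 'single quotes: %s\ndouble quotes: %s\n' "$single" "$double"
```

Note that the expanded values become part of the sed expression, so this only works while they contain neither the : separator nor regex metacharacters.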
First, don't use:
for item in $(find ...)
because you might overload the command line. Besides, the for loop cannot start until the process in $(...) finishes. Instead:
find ... | while read item
You also need to watch out for funky file names. The for loop will choke on any file with spaces in its name. The find | while version copes with spaces inside names, but a plain read still strips leading and trailing whitespace and mangles backslashes. Better:
find ... -print0 | while IFS= read -r -d '' item
This puts NUL characters between file names, and read splits on those NULs (read -d is a bash extension). This way, files with spaces, tabs, newlines, or anything else that could cause problems can be read safely.
Your sed line is:
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
What this attempts to do is edit the contents of $file, which is a directory. What you want is to munge the directory name itself. Therefore you have to echo the name into sed through a pipe, and use double quotes so the variables are expanded (with a separator other than /, since the values contain slashes):
R2K_TEMP_DIR=$(echo "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")
However, you might be better off using shell parameter expansion to filter the variable.
Basically, you have a directory called source/ and all of the files you're looking for are under that directory. You simply want to change:
source/foo/bar
to
processed/foo/bar
You could do something like this: ${file#source/}. The # marks a left-side filter: it removes the shortest prefix that matches the glob expression after the #. Check the bash manpage under Parameter Expansion.
Thus, you could do something like this:
#!/bin/bash
# read -d is a bash extension, so this needs bash rather than plain sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Procesando archivos desde $R2K_SOURCE "
find "$R2K_SOURCE" -print0 | while IFS= read -r -d '' file
do
if [ -d "$file" ]
then
R2K_TEMP_DIR="processed/${file#source/}"
echo "directorio $R2K_TEMP_DIR"
else
# some code executes
:
fi
done
R2K_TEMP_DIR="processed/${file#source/}" removes the source/ from the start of $file and you merely prepend processed/ in its place.
Even better, it's way more efficient. In your original script, the $(..) creates a subshell to run echo, whose output is piped to another process running sed (assuming you use loentar's solution). With parameter expansion there are no subprocesses at all; the whole modification of the directory name happens inside the shell.
By the way, this should also work (R2K_PROCESSED already ends with a slash, so none is added):
R2K_TEMP_DIR="$R2K_PROCESSED${file#"$R2K_SOURCE"}"
I just didn't test that.
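The variable-based form can be checked quickly on a couple of sample paths. In this sketch the helper to_processed and the paths are hypothetical. The inner quotes around $R2K_SOURCE make its value match literally rather than as a glob pattern, and no extra slash is inserted because R2K_PROCESSED already ends with one.

```shell
#!/bin/sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"

# Hypothetical helper: swap the leading source/ for processed/.
to_processed() {
    # '#' strips the shortest matching prefix, so only the leading
    # occurrence of source/ is removed.
    printf '%s\n' "$R2K_PROCESSED${1#"$R2K_SOURCE"}"
}

for file in source/foo/bar source/a/source/b; do
    printf '%s -> %s\n' "$file" "$(to_processed "$file")"
done
```

The helper uses a command substitution only for the demonstration; inside the real loop the expansion can be assigned directly, with no subprocess.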