bash check for new directories and do a diff

I need to write a Bash script to check if there are any new folders in a path; if yes, do something, and if not, simply exit.
My thinking is to create a text file to keep track of all the old folders and do a diff; if there is something new, then perform an action. Please help me achieve this.
I've tried tracking with two files, but I don't think I've got it right.
The /tmp/ folder has multiple subfolders:
#/bin/sh
BASEDIR=/tmp/
cd $BASEDIR
ls -A $BASEDIR >> newfiles.txt
DIRDIFF=$(diff oldfiles.txt newfiles.txt | cut -f 2 -d "")
for file in $DIRDIFF
do
if [ -e $BASEDIR/$file ]
then echo $file
fi
done

Generally don't use ls in scripts. Here is a simple refactoring which avoids it.
#!/bin/sh
printf '%s\n' /tmp/.[!.]* /tmp/* >newfiles.txt
if comm -13 oldfiles.txt newfiles.txt | grep .; then
    # New entries found: update the baseline and report success
    rc=0
    mv newfiles.txt oldfiles.txt
else
    # Nothing new: remember grep's failure status and discard the fresh list
    rc=$?
    rm newfiles.txt
fi
exit "$rc"
Using comm instead of diff simplifies processing somewhat (the wildcard should expand the files in sorted order, so the requirement for sorted input will be satisfied) and keeping the files in the current directory (instead of in /tmp) should avoid having the script trigger itself. The output from comm will be the new files, so there is no need to process it further. The grep . checks if there are any output lines, so that we can set the exit status to reflect whether or not there were new entries.
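As a quick illustration of what comm -13 produces, here is a made-up example run in an interactive bash shell (process substitution stands in for the two list files):
$ comm -13 <(printf 'a\nb\n') <(printf 'a\nb\nc\n')
c
Only the line unique to the second (newer) list comes out, which is exactly what grep . then tests for.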
Your script looks for files, not directories. If you really want to look for new directories, add a slash after each wildcard expression:
printf '%s\n' /tmp/.[!.]*/ /tmp/*/ >newfiles.txt
This will not notice if an existing file or directory is modified. Probably switch to inotifywait if you need anything more sophisticated (or perhaps even if this is all you need).
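For reference, a minimal inotifywait sketch (from the inotify-tools package); the event names and format string used here are standard, but treat this as a starting point rather than a tested drop-in:
#!/bin/sh
# Watch /tmp and report each newly created (or moved-in) directory as it appears.
inotifywait -m -q -e create,moved_to --format '%f' /tmp |
while IFS= read -r name; do
    [ -d "/tmp/$name" ] && echo "new directory: $name"
done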

Let's assume you are interested in sub-directories only. Let's assume too that you do not have directory names with newlines in them.
Do not process ls output; it is for humans only. Prefer find. Then simply sort the old and new lists with sort and compare them with comm, keeping only the lines found in the new list but not in the old list (man comm will tell you why the -13 option does this). In the following, the "do something" is just echo; replace it with whatever is needed:
#!/bin/sh
BASEDIR=/tmp/
cd "$BASEDIR" || exit 1
find . -type d | sort > new.txt
if [ -f old.txt ]; then
    comm -13 old.txt new.txt | while IFS= read -r name; do
        echo "$name"
    done
fi
mv new.txt old.txt
This will explore the complete hierarchy under the starting point, and consider any directory. If you do not want to explore the complete hierarchy under the starting point but only the current level:
find . -maxdepth 1 -type d | sort > new.txt

Related

linux show head of the first file from ls command

I have a folder, e.g. named 'folder'. There are 50000 txt files under it, e.g. '00001.txt, 00002.txt', etc.
Now I want to use one command line to show the head 10 lines in '00001.txt'. I have tried:
ls folder | head -1
which will show the filename of the first:
00001.txt
But I want to show the contents of folder/00001.txt
So, how do I do something like os.path.join(folder, xx) and show its head -10?
The better way to do this is not to use ls at all; see Why you shouldn't parse the output of ls, and the corresponding UNIX & Linux question Why not parse ls (and what to do instead?).
On a shell with arrays, you can glob into an array, and refer to items it contains by index.
#!/usr/bin/env bash
# ^^^^- bash, NOT sh; sh does not support arrays

# make the array "files" contain entries like folder/0001.txt, folder/0002.txt, etc.
files=( folder/* ) # note: if no files are found, this becomes files=( "folder/*" )

# make sure the first item in that array exists; if it doesn't, that means
# the glob failed to expand because no files matching the string exist.
if [[ -e ${files[0]} || -L ${files[0]} ]]; then
  # file exists; pass the name to head
  head -n 10 <"${files[0]}"
else
  # file does not exist; spit out an error
  echo "No files found in folder/" >&2
fi
If you wanted more control, I'd probably use find. For example, to skip directories, the -type f predicate can be used (with -maxdepth 1 to turn off recursion):
IFS= read -r -d '' file < <(find folder -maxdepth 1 -type f -print0 | sort -z)
head -10 -- "$file"
Although it's hard to understand what you are asking, I think something like this will work:
head -10 $(ls | head -1)
Basically, you get the file from $(ls | head -1) and then print the content.
If you invoke the ls command as ls "$PWD"/folder/*, it will include the absolute path of each file in the output.
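Putting that hint together with the question (still subject to the ls-parsing caveats above), something along these lines would print the head of the first file; 'folder' is the directory name from the question:
head -n 10 "$(ls "$PWD"/folder/* | head -n 1)"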

How to compare filenames in bash

I need to compare if "dir1" has the same files as "dir2" and ideally remove the similar contents in "dir2".
So far I have tried using the find command:
$ find "$dir1" "$dir2/" "$dir2/" -printf '%P\n' | sort | uniq -u
But this doesn't work because, while the filenames are similar, the extensions of the files are different in the two folders.
So how do I go about comparing filenames in bash?
Sounds like you just need to use a loop:
for path in "$dir1"/*; do
    base=${path##*/} # remove everything up to and including the last / to get the name
    if [ -e "$dir2/$base" ]; then
        echo rm -r "$dir2/$base"
    fi
done
Loop through everything in $dir1 and if $dir2 has a file with the same name, then remove it.
Remove the echo when you're happy that the script is going to remove the right files.
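Since the question mentions that the extensions differ between the two directories, a hedged variation that matches on the name without its extension could look like this (the ${base%.*} handling assumes one extension per file):
for path in "$dir1"/*; do
    base=${path##*/}     # filename only
    stem=${base%.*}      # filename without its (last) extension
    for match in "$dir2/$stem".*; do
        [ -e "$match" ] && echo rm -r "$match"
    done
done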

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files all with a random number prefix (eg 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one because they are duplicates.
To find the duplicated names I've used
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list of a for/next loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
Note that you are dealing with the rm command, so make sure that you have a backup of the existing directory in case something goes south.
Create a backup directory and take a backup of the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temp directory where we will keep only one file for each similar name, so all the unique file names will be there.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
Explanation of the command is at the end. Once executed, check in the tmp/ directory that you got unique instances of the files.
Remove all *.jpg files from main directory. Saying again, please verify that all files have been backed up before executing rm command.
rm *.jpg
Copy the unique instances back from the temp directory.
cp tmp/*.jpg .
Explanation of command in step 2.
Command to get unique file names for step 2 will be
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) will get the unique file names, like file1.jpg, file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not be using ls in scripts and there is no reason to use a separate file list like in userunknown's reply.
keepone () {
    # keep the first argument, remove the rest
    shift
    rm "$@"
}
keepone *name.jpg
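To apply it across every duplicated base name rather than a single hard-coded pattern, one possible (untested) wrapper is to feed it the uniq -d list from the question, assuming the flat, no-whitespace layout described there:
find . -maxdepth 1 -type f -name "*.jpg" |
sed 's|^\./[0-9]*--||' | sort | uniq -d |
while IFS= read -r name; do
    keepone ./*--"$name"   # the glob expands in sorted order; keepone keeps the first match
done
Putting an echo in front of the rm inside keepone is a sensible dry run before letting it loose.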
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/^.*\//, "", f); sub(/^[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
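For example, on made-up input:
$ echo '12--34--name' | sed 's/--/_/'
12_34--name
$ echo '12--34--name' | sed 's/--/_/g'
12_34_name
$ echo '12--name.jpg' | sed 's/^[0-9]*--//'
name.jpg
The anchored expression in the last command can only ever match at the start of the line, so adding /g to it changes nothing.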
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's|^\./[0-9]*--||' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read n; do removebutone *--"$n"; done
or, better readable:
removebutone () {
    shift
    echo rm "$@"
}
cat namelist | while read n; do removebutone *--"$n"; done
Shift takes the first parameter from $* off.
Note that the parens around the -name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there which needed to be covered.
If it looks promising, you will, of course, have to remove the 'echo' in front of 'rm'.

How to cd to a random path?

I want to generate a random existing path to cd into and write a file in it, if possible starting from a specified root (the point is that it can be used like cd /home/foobar/$RANDOM).
I think I can do this by listing all paths with ls and awk with this command, according to "ls command: how can I get a recursive full-path listing, one line per file?":
ls -R /path | awk '
/:$/&&f{s=$0;f=0}
/:$/&&!f{sub(/:$/,"");s=$0;f=1;next}
NF&&f{ print s"/"$0 }'
And then feeding the result line by line to $(dirname "${LINE_FILE}") and finishing by purging duplicates.
It should work, but I need to generate a lot of random existing paths and I hope a quicker solution exists (this generation may take a lot of time; it works but it's dirty). Any better ideas?
Another variant with find + sort -R:
startDir='/your/path'
maxPathRange=10
endPath=$startDir

for (( i=1; i<=$((1 + RANDOM % $maxPathRange)); i++ )); do
    directory=$(find "$endPath" -maxdepth 1 -type d ! -path "$endPath" -printf "%f\n" | sort -R | head -n 1)
    if [[ -z $directory ]]; then break; fi
    endPath=$endPath/$directory
done

cd "$endPath" && pwd
I'm not too proficient in bash, so I won't give you a straight coded answer, but here's how you can achieve it.
From what I understand, you are using one ls command to list the whole file system.
That will take a long time. Instead, use ls to list only the folders in the current directory, pick a random one, and then do the same in the one you picked.
This could go on until you get to a folder with no folders inside it.
If you want the possibility of picking a folder which does contain other folders and stopping there, include in your random picking an option that picks nothing and stops in the current directory. This option should have a small probability of occurring.
Here is a quick and dirty attempt.
randomcd () {
    local p=$(printf '%s\n' . */ | sort -R | head -n 1)
    case $p in '.' | '*/') pwd; return;; esac
    ( cd "$p"; randomcd )
}
Unfortunately sort -R is not entirely portable. If you don't have shuf either, something like this might work:
shuffle () {
    perl -MList::Util -e 'print List::Util::shuffle <>'
}
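With that helper defined, the randomcd above only needs its sort -R swapped for shuffle; the rest of the logic is unchanged:
randomcd () {
    local p=$(printf '%s\n' . */ | shuffle | head -n 1)
    case $p in '.' | '*/') pwd; return;; esac
    ( cd "$p"; randomcd )
}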

Bash: remove first line of file, create new file with prefix in new dir

I have a bunch of files in a directory, old_dir. I want to:
remove the first line of each file (e.g. using "sed '1d'")
save the output as a new file with a prefix, new_, added to the original filename (e.g. using "{,new_}old_filename")
add these files to a different directory, new_dir, overwriting any conflicting filenames
How do I do this with a Bash script? Having trouble putting the pieces together.
#!/usr/bin/env bash

old_dir="/path/to/somewhere"
new_dir="/path/to/somewhere_else"
prefix="new_"

if [ ! -d "$old_dir" -o ! -d "$new_dir" ]; then
    echo "ERROR: We're missing a directory. Aborting." >&2
    exit 1
fi

for file in "$old_dir"/*; do
    tail +2 "$file" > "$new_dir"/"${prefix}${file##*/}"
done
The important parts of this are:
The for loop, which allows you do to work on each $file.
tail +2, which is notation that removes the first line of the file (it prints from line 2 onward). If your tail does not support this, you can get the same result with sed -e 1d.
${file##*/} which is functionally equivalent to basename "$file" but without spawning a child.
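For instance, with a made-up path:
file="/path/to/somewhere/example.txt"   # hypothetical path
echo "${file##*/}"    # prints: example.txt
basename "$file"      # same output, but spawns an extra process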
Really, none of this is bash-specific. You could run this in /bin/sh in most operating systems.
Note that the code above is intended to explain a process. Once you understand that process, you may be able to come up with faster, shorter strategies for achieving the same thing. For example:
find "$old_dir" -depth 1 -type f -exec sh -c "tail +2 \"{}\" > \"$new_dir/$prefix\$(basename {})\"" \;
Note: I haven't tested this. If you plan to use either of these solutions, do make sure you understand them before you try, so that you don't clobber your data by accident.

Resources