How to cd to a random path? - bash

I want to generate a random existing path to cd into and write a file in, if possible chosen at random under a specified root (the point is that it could be used like cd /home/foobar/$RANDOM).
I think I can do this by listing all paths with ls and awk, using the command from ls command: how can I get a recursive full-path listing, one line per file?
ls -R /path | awk '
/:$/&&f{s=$0;f=0}
/:$/&&!f{sub(/:$/,"");s=$0;f=1;next}
NF&&f{ print s"/"$0 }'
Then I would feed the result line by line to $(dirname "${LINE_FILE}") and finish by purging duplicates.
It should work, but I need to generate a lot of random existing paths and I hope a quicker solution exists (this generation could take a lot of time; it works, but it's dirty). Any better ideas?

Another variant with find + sort -R:
startDir='/your/path'
maxPathRange=10
endPath=$startDir
depth=$((1 + RANDOM % maxPathRange))
for (( i=1; i<=depth; i++ )); do
    directory=$(find "$endPath" -maxdepth 1 -type d ! -path "$endPath" -printf "%f\n" | sort -R | head -n 1)
    if [[ -z $directory ]]; then break; fi
    endPath=$endPath/$directory
done
cd "$endPath" && pwd

I'm not too proficient in bash, so I won't give you a straight coded answer, but here's how you can achieve it.
From what I understand, you are using a single ls command to list the whole file system.
That will take a long time. Instead, use ls to list only folders in the current directory, pick a random one and then do the same to the one you picked.
This could go on until you get to a folder with no folders inside it.
If you want the possibility of stopping at a folder which does contain other folders, include in your random pick an option that selects nothing and stops in the current directory. This option should have a small probability of occurring.

Here is a quick and dirty attempt.
randomcd () {
    local p=$(printf '%s\n' . */ | sort -R | head -n 1)
    case $p in '.' | '*/') pwd; return;; esac
    ( cd "$p"; randomcd )
}
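Note that the recursion runs in subshells, so randomcd only prints the random path; it does not change your current directory. Assuming the path contains no newlines, you can capture its output to actually go there, or to start from a specific root as in the question:
cd "$(randomcd)"                     # random path under the current directory
cd "$(cd /home/foobar && randomcd)"  # random path under a specified root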
Unfortunately sort -R is not entirely portable. If you don't have shuf either, something like this might work:
shuffle () {
    perl -MList::Util -e 'print List::Util::shuffle <>'
}
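With that helper defined, the sort -R pick inside randomcd could presumably be swapped out like this:
local p=$(printf '%s\n' . */ | shuffle | head -n 1)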

Related

bash check for new directories and do a diff

I need to write a Bash script to check if there are any new folders in a path; if yes, do something, and if not, simply exit.
My thinking is to create a text file to keep track of all the old folders and do a diff; if there is something new, then perform an action. Please help me achieve this:
I've tried to use two tracking files, but I don't think I've got this right.
The /tmp/ folder has multiple subfolders.
#!/bin/sh
BASEDIR=/tmp/
cd $BASEDIR
ls -A $BASEDIR >> newfiles.txt
DIRDIFF=$(diff oldfiles.txt newfiles.txt | cut -f 2 -d " ")
for file in $DIRDIFF
do
if [ -e $BASEDIR/$file ]
then echo $file
fi
done
Generally don't use ls in scripts. Here is a simple refactoring which avoids it.
#!/bin/sh
printf '%s\n' /tmp/.[!.]* /tmp/* >newfiles.txt
if comm -13 oldfiles.txt newfiles.txt | grep .; then
    rc=0
    mv newfiles.txt oldfiles.txt
else
    rc=$?
    rm newfiles.txt
fi
exit "$rc"
Using comm instead of diff simplifies processing somewhat (the wildcard should expand the files in sorted order, so the requirement for sorted input will be satisfied) and keeping the files in the current directory (instead of in /tmp) should avoid having the script trigger itself. The output from comm will be the new files, so there is no need to process it further. The grep . checks if there are any output lines, so that we can set the exit status to reflect whether or not there were new entries.
Your script looks for files, not directories. If you really want to look for new directories, add a slash after each wildcard expression:
printf '%s\n' /tmp/.[!.]*/ /tmp/*/ >newfiles.txt
This will not notice if an existing file or directory is modified. Probably switch to inotifywait if you need anything more sophisticated (or perhaps even if this is all you need).
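For reference, a minimal inotifywait loop (from inotify-tools) that reports directories as they are created under /tmp might look like this sketch:
#!/bin/sh
# watch /tmp and report each directory created directly inside it
inotifywait -m -e create --format '%w%f' /tmp |
while IFS= read -r path; do
    [ -d "$path" ] && echo "new directory: $path"
done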
Let's assume you are interested in sub-directories only. Let's assume too that you do not have directory names with newlines in them.
Do not process ls output, it is for humans only. Prefer find. Then, simply sort the old and new lists with sort and compare them with comm, keeping only the lines found in the new list but not in the old list (man comm will tell you why the -13 option does this). In the following, the do something is just echo; replace it with whatever is needed:
#!/bin/sh
BASEDIR=/tmp/
cd "$BASEDIR"
find . -type d | sort > new.txt
if [ -f old.txt ]; then
    comm -13 old.txt new.txt | while IFS= read -r name; do
        echo "$name"
    done
fi
mv new.txt old.txt
This will explore the complete hierarchy under the starting point, and consider any directory. If you do not want to explore the complete hierarchy under the starting point but only the current level:
find . -maxdepth 1 -type d | sort > new.txt

linux show head of the first file from ls command

I have a folder, e.g. named 'folder'. There are 50000 txt files under it, e.g. '00001.txt, 00002.txt', etc.
Now I want to use one command line to show the first 10 lines of '00001.txt'. I have tried:
ls folder | head -1
which will show the filename of the first:
00001.txt
But I want to show the contents of folder/00001.txt
So, how do I do something like os.path.join(folder, xx) and show its head -10?
The better way to do this is not to use ls at all; see Why you shouldn't parse the output of ls, and the corresponding UNIX & Linux question Why not parse ls (and what to do instead?).
On a shell with arrays, you can glob into an array, and refer to items it contains by index.
#!/usr/bin/env bash
# ^^^^- bash, NOT sh; sh does not support arrays
# make array files contain entries like folder/0001.txt, folder/0002.txt, etc
files=( folder/* ) # note: if no files found, it will be files=( "folder/*" )
# make sure the first item in that array exists; if it doesn't, that means
# the glob failed to expand because no files matching the string exist.
if [[ -e ${files[0]} || -L ${files[0]} ]]; then
# file exists; pass the name to head
head -n 10 <"${files[0]}"
else
# file does not exist; spit out an error
echo "No files found in folder/" >&2
fi
If you wanted more control, I'd probably use find. For example, to skip directories, the -type f predicate can be used (with -maxdepth 1 to turn off recursion):
IFS= read -r -d '' file < <(find folder -maxdepth 1 -type f -print0 | sort -z)
head -10 -- "$file"
Although it's hard to understand what you are asking, I think something like this will work:
head -10 $(ls | head -1)
Basically, you get the file name from $(ls | head -1) and then print its contents.
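If you do stick with ls despite the caveats above, the folder name still has to be joined back onto the result, e.g.:
head -10 "folder/$(ls folder | head -n 1)"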
If you invoke the ls command as ls "$PWD"/folder, it will include the absolute path of the file in the output.

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files, all with a random number prefix (e.g. 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one, because they are duplicates.
To find duplicated names I've used
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list of a for/next loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
Note that you are dealing with the rm command, so make sure that you have a backup of the existing directory in case something goes south.
Create a backup directory and take a backup of the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temp directory where we will keep only one file for each similar name. So all unique file names will be here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
Explanation of the command is at the end. Once executed, check in the tmp/ directory that you got unique instances of the files.
Remove all *.jpg files from the main directory. Saying it again: please verify that all files have been backed up before executing the rm command.
rm *.jpg
Copy the unique instances back from the temp directory.
cp tmp/*.jpg .
Explanation of command in step 2.
Command to get unique file names for step 2 will be
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) will get the unique file names like file1.jpg , file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not be using ls in scripts and there is no reason to use a separate file list like in userunknown's reply.
keepone () {
    shift
    rm "$@"
}
keepone *name.jpg
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/^.*\//, "", f); sub(/^[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
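If the file names might contain whitespace, a NUL-delimited variant along these lines (a bash sketch that keeps the first file seen for each stripped name) should be safer:
#!/usr/bin/env bash
declare -A seen
while IFS= read -r -d '' path; do
    name=${path##*/}    # basename
    key=${name#*--}     # name with the leading number and -- stripped
    if [[ -n ${seen[$key]} ]]; then
        echo rm -- "$path"   # duplicate; drop the echo once the output looks right
    else
        seen[$key]=1
    fi
done < <(find . -maxdepth 1 -type f -name '*.jpg' -print0)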
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's/^[0-9]*--//' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read n; do removebutone *--"$n"; done
or, better readable:
removebutone () {
    shift
    echo rm "$@"
}
cat namelist | while read n; do removebutone *--"$n"; done
Shift takes the first parameter off the argument list, so the first matching file is kept.
Note that the parens around the name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there which needed to be covered.
If it looks promising, you have, of course, to remove the 'echo' in front of 'rm'.

Find all duplicate subdirectories in directory

I need to make a shell script that "lists all identical sub-directories (recursively) under the current working directory."
I'm new to shell scripts. How do I approach this?
To me, this means:
for each directory under some starting directory, compare it to every other directory that shares its name.
if the other directory has the same name, check size.
if same size also, recursively compare contents of each directory item by item, maybe by md5sum(?) and continuing to do so for each subdirectory within the directories (recursively?)
then, continue by recursively calling this on every subdirectory encountered
then, repeat for every other directory in the directory structure
It would have been the most complicated program I'd have ever written, so I assume I'm just not aware of some shell command to do most of it for me?
I.e., how should I have approached this? All the other parts were about googling until I discovered the shell command that did 90% of it for me.
(For a previous assignment that I wasn't able to finish, took a zero on this part, need to know how to approach it in the future.)
I'd be surprised to hear that there is a special Unix tool or special usage of a standard Unix tool to do exactly what you describe. Maybe your understanding of the task is more complex than what the task giver intended. Maybe with "identical" something concerning linking was meant. Normally, hardlinking directories is not allowed, so this probably also isn't meant.
Anyway, I'd approach this task by creating checksums for all nodes in your tree, i. e. recursively:
For a directory take the names of all entries and their checksums (recursion) and compute a checksum of them,
for a plain file compute a checksum of its contents,
for symlinks and special files (devices, etc.) consider what you want (I'll leave this out).
After creating checksums for all elements, search for duplicates (by sorting a list of all and searching for consecutive lines).
A quick solution could be like this:
#!/bin/bash
dirchecksum() {
    if [ -f "$1" ]
    then
        checksum=$(md5sum < "$1")
    elif [ -d "$1" ]
    then
        checksum=$(
            find "$1" -maxdepth 1 -printf "%P " \( ! -path "$1" \) \
                -exec bash -c "dirchecksum {}" \; |
                md5sum
        )
    fi
    echo "$checksum"
    echo "$checksum $1" 1>&3
}
export -f dirchecksum
list=$(dirchecksum "$1" 3>&1 1>/dev/null)
lastChecksum=''
while read checksum _ path
do
    if [ "$checksum" = "$lastChecksum" ]
    then
        echo "duplicate found: $path = $lastPath"
    fi
    lastChecksum=$checksum
    lastPath=$path
done < <(sort <<< "$list")
This script uses two tricks which might not be clear, so I mention them:
To pass a shell function to find -exec one can export -f it (done right after the function definition) and then call bash -c ... to execute it.
The shell function has two output streams, one for returning the result checksum (this is via stdout, i. e. fd 1), and one for giving out each checksum found on the way to this (this is via fd 3).
The sorting at the end uses the list given out via fd 3 as input.
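Assuming the script is saved as, say, dupdirs.sh (the name is made up here), it is invoked with the starting directory as its single argument:
bash dupdirs.sh /some/starting/directory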
Maybe something like this:
$ find -type d -exec sh -c "echo -n {}\ ; sh -c \"ls -s {}; basename {}\"|md5sum " \; | awk '$2 in a {print "Match:"; print a[$2], $1; next} a[$2]=$1{next}'
Match:
./bar/foo ./foo
find all directories: find -type d, output:
.
./bar
./bar/foo
./foo
ls -s {}; basename {} will print the simplified directory listing and the basename of the directory listed, for example for directory foo: ls -s foo; basename foo
total 0
0 test
foo
Those will cover the files in each dir, their sizes, and the dir name. That output will be sent to md5sum, and the resulting hash, together with the dir name:
. 674e2573b49826d4e32dfe81d9680369 -
./bar 4c2d588c5fa9781ad63ad8e86e575e01 -
./bar/foo ff8d1569685be86366f18ea89851db35 -
./foo ff8d1569685be86366f18ea89851db35 -
will be sent to awk:
$2 in a {              # hash as array key
    print "Match:"     # separate hits in output
    print a[$2], $1    # print matching dirs
    next               # next record
}
a[$2]=$1 {next}        # only the first match is stored and compared to
Test dir structure:
$ mkdir -p test/foo; mkdir -p test/bar/foo; touch test/foo/test; touch test/bar/foo/test
$ find test/
test/
test/bar
test/bar/foo
test/bar/foo/test # touch test
test/foo
test/foo/test # touch test

How to get the number of files in a folder as a variable?

Using bash, how can one get the number of files in a folder, excluding directories, from a shell script without the interpreter complaining?
With the help of a friend, I've tried
$files=$(find ../ -maxdepth 1 -type f | sort -n)
$num=$("ls -l" | "grep ^-" | "wc -l")
which returns from the command line:
../1-prefix_blended_fused.jpg: No such file or directory
ls -l : command not found
grep ^-: command not found
wc -l: command not found
respectively. These commands work on the command line, but NOT with a bash script.
Given a folder filled with image files named like 1-pano.jpg, I want to grab all the images in the directory to find the largest-numbered file, to tack onto the next image being processed.
Why the discrepancy?
The quotes are causing the error messages.
To get a count of files in the directory:
shopt -s nullglob
numfiles=(*)
numfiles=${#numfiles[@]}
which creates an array and then replaces it with the count of its elements. This will include files and directories, but not dotfiles or . or .. or other dotted directories.
Use nullglob so an empty directory gives a count of 0 instead of 1.
You can instead use find -type f or you can count the directories and subtract:
# continuing from above
numdirs=(*/)
numdirs=${#numdirs[@]}
(( numfiles -= numdirs ))
Also see "How can I find the latest (newest, earliest, oldest) file in a directory?"
You can have as many spaces as you want inside an execution block. They often aid in readability. The only downside is that they make the file a little larger and may slow initial parsing (only) slightly. There are a few places that must have spaces (e.g. around [, [[, ], ]] and = in comparisons) and a few that must not (e.g. around = in an assignment).
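For instance:
if [ "$a" = "$b" ]; then echo equal; fi   # spaces required around [, ] and the = comparison
count=0                                   # no spaces allowed around = in this assignment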
ls -l | grep -v ^d | wc -l
One line.
How about:
count=$(find .. -maxdepth 1 -type f|wc -l)
echo $count
let count=count+1 # Increase by one, for the next file number
echo $count
Note that this solution is not efficient: it spawns sub shells for the find and wc commands, but it should work.
file_num=$(ls -1 --file-type | grep -v '/$' | wc -l)
This is a bit more lightweight than a find command, and counts all files in the current directory.
The most straightforward, reliable way I can think of is using the find command to create a reliably countable output.
Counting characters output of find with wc:
find . -maxdepth 1 -type f -printf '.' | wc --char
or string length of the find output:
a=$(find . -maxdepth 1 -type f -printf '.')
echo ${#a}
or using find output to populate an arithmetic expression:
echo $(($(find . -maxdepth 1 -type f -printf '+1')))
Simple efficient method:
#!/bin/bash
RES=$(find ${SOURCE} -type f | wc -l)
Get rid of the quotes. The shell is treating them like one file, so it's looking for "ls -l".
Remove the quotes and you will be fine.
Expanding on the accepted answer (by Dennis W): when I tried this approach I got incorrect counts for dirs without subdirs in Bash 4.4.5.
The issue is that by default nullglob is not set in Bash, so numdirs=(*/) creates a 1-element array containing the literal glob pattern */. Likewise, I suspect numfiles=(*) would have 1 element for an empty folder.
Setting shopt -s nullglob to enable nullglobbing resolves the issue for me. For an excellent discussion on why nullglob is not set by default in Bash, see the answer here: Why is nullglob not default?
Note: I would have commented on the answer directly but lack the reputation points.
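An illustrative session in a directory that has no subdirectories shows the difference:
$ shopt -u nullglob; a=(*/); echo "${#a[@]}"
1
$ shopt -s nullglob; a=(*/); echo "${#a[@]}"
0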
Here's one way you could do it as a function. Note: you can pass this function dirs (for a directory count), files (for a file count), or anything else (e.g. "all") for a count of everything in the directory. It does not traverse the tree, as we aren't looking to do that.
function get_counts_dir() {
    # -- handle inputs (e.g. get_counts_dir "files" /path/to/folder)
    [[ -z "${1}" ]] && type="files" || type="${1,,}"
    [[ -z "${2}" ]] && dir="$(pwd)" || dir="${2}"
    shopt -s nullglob
    prev_dir="$(pwd)"
    cd "${dir}"
    numfiles=(*)
    numfiles=${#numfiles[@]}
    numdirs=(*/)
    numdirs=${#numdirs[@]}
    # -- handle input types files/dirs/or both
    result=0
    case "${type,,}" in
        "files")
            result=$(( numfiles - numdirs ))
            ;;
        "dirs")
            result=${numdirs}
            ;;
        *) # -- returns all files/dirs
            result=${numfiles}
            ;;
    esac
    cd "${prev_dir}"
    shopt -u nullglob
    # -- return result --
    [[ -z ${result} ]] && echo 0 || echo ${result}
}
Examples of using the function :
folder="/home"
get_counts_dir "files" "${folder}"
get_counts_dir "dirs" "${folder}"
get_counts_dir "both" "${folder}"
Will print something like :
2
4
6
Short and sweet method which also ignores symlinked directories.
count=$(ls -l | grep ^- | wc -l)
or if you have a target:
count=$(ls -l /path/to/target | grep ^- | wc -l)
