I have a directory full of directories containing exam subjects I would like to work on randomly to simulate the real exam.
They are classified by difficulty level:
0-0, 0-1 .. 1-0, 1-1 .. 2-0, 2-1 ..
I am trying to write a shell script allowing me to pick one subject (directory) randomly based on the parameter I pass when executing the script (0, 1, 2 ..).
I can't quite figure it out; here is my progress so far:
ls | find . -name "1$~" | sort -r | head -n 1
What am I missing here?
There's no need for any external commands (ls, find, sort, head) for this at all:
#!/usr/bin/env bash
shopt -s nullglob # make globs expand to nothing, not themselves, when no matches are found
dirs=( "$1"*/ ) # list directories starting with $1 into an array
# Validate that our glob actually had at least one match
(( ${#dirs[@]} )) || { printf 'No directories start with %q at all\n' "$1" >&2; exit 1; }
idx=$(( RANDOM % ${#dirs[@]} )) # pick a random index into our array
echo "${dirs[$idx]}" # and look up what's at that index
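One refinement worth considering: once there are ten or more levels, the glob "$1"*/ for level 1 would also match a directory such as 10-0. Here is a minimal sketch that anchors the pattern on the dash and restores the caller's nullglob setting afterwards. It assumes the subject directories are named <level>-<number> under a base directory given as the second argument, and pick_subject is a hypothetical name:

```shell
#!/usr/bin/env bash
# pick_subject LEVEL [BASE]: print the name of one random subject directory
# for LEVEL. Hypothetical helper; assumes names like 1-0, 1-1, ... under BASE.
pick_subject() {
  local level=$1 base=${2:-.}
  local saved pick
  local -a dirs
  saved=$(shopt -p nullglob) || true  # remember the caller's nullglob setting
  shopt -s nullglob
  dirs=( "$base/$level"-*/ )          # "1-*" cannot match "10-0"
  eval "$saved"                       # restore the caller's setting
  if (( ${#dirs[@]} == 0 )); then
    printf 'No subjects for level %q\n' "$level" >&2
    return 1
  fi
  pick=${dirs[RANDOM % ${#dirs[@]}]}  # random index into the array
  pick=${pick%/}                      # drop the trailing slash from the glob
  printf '%s\n' "${pick##*/}"         # print just the directory name
}

# Demo in a throwaway directory
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo"/{0-0,0-1,1-0,1-1,1-2,10-0}
pick_subject 1 "$demo"
```

Calling it as pick_subject 1 from inside the exam directory prints one of 1-0, 1-1 or 1-2, and never 10-0.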
I'm trying to create a shell script that creates a batch of files, a specified number at a time. When that number is reached, the script stops. When the script is re-executed, the numbering should pick up after the last file created. So if the script creates files 1-10 on the first run, the next execution should create 11-20, and so on.
#!/bin/bash
NAME=XXXX
valid=true
NUMBER=1
while [ $NUMBER -le 5 ];
do
touch $NAME$NUMBER
((NUMBER++))
echo $NUMBER + "batch created"
if [ $NUMBER == 5 ];
then
break
fi
touch $NAME$NUMBER
((NUMBER+5))
echo "batch complete"
done
Based on my comment above and your description, you can write a script that creates 10 numbered files (by default) each time it is run, starting with the next available number. As mentioned, rather than using raw, unpadded numbers, it's better for general sorting and listing to use zero-padded numbers, e.g. 001, 002, ...
If you just use 1, 2, ... then you end up with odd sorting when you reach each power of 10. Consider the first 12 files numbered 1...12 without padding. A general (lexical) listing sort would produce:
file1
file10
file11
file12
file2
file3
file4
...
Where 10, 11 and 12 are sorted before 2. Adding leading zeros with printf -v avoids the problem.
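To see the effect, this sketch builds the first 12 names both ways and sorts them with LC_ALL=C sort (plain byte order, so the result doesn't depend on your locale):

```shell
#!/usr/bin/env bash
# Unpadded names: a lexical sort puts file10..file12 before file2
unpadded=$(printf 'file%d\n' {1..12} | LC_ALL=C sort)

# Zero-padded names built with printf -v sort in numeric order
padded=()
for n in {1..12}; do
  printf -v name 'file%03d' "$n"   # 7 -> file007
  padded+=("$name")
done
sorted=$(printf '%s\n' "${padded[@]}" | LC_ALL=C sort)

printf 'unpadded:\n%s\n\npadded:\n%s\n' "$unpadded" "$sorted"
```

The padded list comes out of sort in exactly the order it was created, file001 through file012.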
Taking that into account, and allowing the user to change the prefix (first part of the file name) by giving it as an argument, and also change the number of new files to create by passing the count as the 2nd argument, you could do something like:
#!/bin/bash
prefix="${1:-file_}" ## beginning of filename
number=1 ## start number to look for
ext="txt" ## file extension to add
newcount="${2:-10}" ## count of new files to create
printf -v num "%03d" "$number" ## create 3-digit start number
fname="$prefix$num.$ext" ## form first filename
while [ -e "$fname" ]; do ## while filename exists
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
while ((newcount--)); do ## loop newcount times
touch "$fname" ## create filename
((! newcount)) && break; ## newcount 0, break (optional)
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
Running the script without arguments creates the first 10 files, file_001.txt - file_010.txt. Running it a second time creates 10 more files, file_011.txt to file_020.txt.
To create a new group of 5 files with the prefix of list_, you would do:
bash scriptname list_ 5
Which would result in the 5 files list_001.txt to list_005.txt. Running again with the same options would create list_006.txt to list_010.txt.
Since the scheme above with 3 digits is limited to 1000 files max (if you include 000), there isn't a big need to get the number from the last file written (bash can count to 1000 quite fast). However, if you used 7-digits, for 10 million files, then you would want to parse the last number with ls -1 | tail -n 1 (or version sort and choose the last file). Something like the following would do:
number=$(ls -1 "$prefix"* | tail -n 1 | grep -o '[1-9][0-9]*')
(note: that is ls -(one) not ls -(ell))
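For the 7-digit case, a glob can also stand in for the ls | tail | grep pipeline, because bash returns glob matches in sorted order and fixed-width numbers sort correctly as strings. A minimal sketch, assuming every matching name is prefix + zero-padded digits + .ext with the same width (next_number is a hypothetical helper):

```shell
#!/usr/bin/env bash
# next_number PREFIX EXT: print the number after the highest existing
# PREFIX<digits>.EXT file, or 1 if none exist. Hypothetical helper; assumes
# fixed-width, zero-padded numbers so the glob's sorted order is numeric order.
next_number() {
  local prefix=$1 ext=$2 last
  local -a files
  files=( "$prefix"[0-9]*."$ext" )
  if [[ ! -e ${files[0]} ]]; then     # glob matched nothing
    echo 1
    return
  fi
  last=${files[${#files[@]}-1]}       # highest name (works before bash 4.3 too)
  last=${last#"$prefix"}              # strip the prefix...
  last=${last%."$ext"}                # ...and the extension, leaving the digits
  echo $(( 10#$last + 1 ))            # 10# so 0000012 is not read as octal
}

# Demo: create file_0000001.txt .. file_0000012.txt in a throwaway directory
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
for (( i = 1; i <= 12; i++ )); do
  printf -v f '%s%07d.txt' "$demo/file_" "$i"
  touch "$f"
done
next_number "$demo/file_" txt
```

The result could then replace the initial number=1 in the script before the existence loop runs.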
Let me know if that is what you are looking for.
This question already has answers here:
Is there a way to get the git root directory in one command?
I'm attempting to find the "root" of a folder. I'm doing this in a Bash script with the following (at least in my head):
# Get current directory (e.g. /foo/bar/my/subdir)
CURR_DIR=$(pwd)
# Break down into array of folder names
DIR_ARRAY=(${CURR_DIR//\// })
# Iterate over items in DIR_ARRAY starting with "subdir"
<HELP WITH FOR LOOP SYNTAX>
# Each loop:
# build path to current item in DIR_ITER; e.g.
# iter N: DIR_ITER=/foo/bar/my/subdir
# iter N-1: DIR_ITER=/foo/bar/my
# iter N-2: DIR_ITER=/foo/bar
# iter 0: DIR_ITER=/foo
# In each loop:
# get the contents of directory using "ls -a"
# look for .git
# set ROOT=DIR_ITER
export ROOT
I've Googled for looping in Bash but it all uses the "for i in ARRAY" form, which doesn't guarantee reverse iteration order. What's the recommended way to achieve what I want to do?
One idea on reverse index referencing.
First our data:
$ CURR_DIR=/a/b/c/d/e/f
$ DIR_ARRAY=( ${CURR_DIR//\// } )
$ typeset -p DIR_ARRAY
declare -a DIR_ARRAY=([0]="a" [1]="b" [2]="c" [3]="d" [4]="e" [5]="f")
Our list of indices:
$ echo "${!DIR_ARRAY[@]}"
0 1 2 3 4 5
Our list of indices in reverse:
$ echo "${!DIR_ARRAY[@]}" | rev
5 4 3 2 1 0
Looping through our reverse list of indices:
$ for i in $(echo "${!DIR_ARRAY[@]}" | rev)
do
echo $i
done
5
4
3
2
1
0
As for working your way up the directory structure using this 'reverse' index strategy:
$ LOOP_DIR="${CURR_DIR}"
$ for i in $(echo "${!DIR_ARRAY[@]}" | rev)
do
echo "${DIR_ARRAY[${i}]}:${LOOP_DIR}"
LOOP_DIR="${LOOP_DIR%/*}"
done
f:/a/b/c/d/e/f
e:/a/b/c/d/e
d:/a/b/c/d
c:/a/b/c
b:/a/b
a:/a
Though we could accomplish the same thing a) without the array and b) using some basic parameter expansions, e.g.:
$ LOOP_DIR="${CURR_DIR}"
$ while [ "${LOOP_DIR}" != '' ]
do
subdir="${LOOP_DIR##*/}"
echo "${subdir}:${LOOP_DIR}"
LOOP_DIR="${LOOP_DIR%/*}"
done
f:/a/b/c/d/e/f
e:/a/b/c/d/e
d:/a/b/c/d
c:/a/b/c
b:/a/b
a:/a
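Two caveats on the array approach: rev reverses characters, not words, so once the array has more than 10 elements an index such as 10 comes out as 01; and ${CURR_DIR//\// } relies on word splitting, so directory names containing spaces are split apart. A sketch that avoids both, splitting on / with read -a and counting the index down with an arithmetic for loop (the 12-component path is just test data):

```shell
#!/usr/bin/env bash
CURR_DIR=/a/b/c/d/e/f/g/h/i/j/k/l   # 12 components, indices 0..11

# Split on "/" without word splitting (strip the leading "/" first)
IFS=/ read -r -a DIR_ARRAY <<< "${CURR_DIR#/}"

# Count the index down from the last element to 0
LOOP_DIR=$CURR_DIR
lines=()
for (( i = ${#DIR_ARRAY[@]} - 1; i >= 0; i-- )); do
  lines+=( "${DIR_ARRAY[i]}:${LOOP_DIR}" )
  LOOP_DIR=${LOOP_DIR%/*}          # drop the last path component
done
printf '%s\n' "${lines[@]}"
```

This prints the same kind of listing as above, starting with l:/a/b/c/d/e/f/g/h/i/j/k/l and ending with a:/a, with no index mangling past 9.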
You can use dirname in a loop to find the parent folder, then keep moving up until you find, e.g., the .git folder.
Quick example:
#!/usr/bin/env bash
set -eu
for arg in "$@"
do
current=$arg
while true
do
if [ -d "$current/.git" ]
then
echo "$arg: .git in $current"
break
fi
parent="$(dirname "$current")"
if [ "$parent" == "$current" ]
then
echo "No .git in $arg"
break
fi
current=$parent
done
done
For each parameter you pass to this script, it will print where it found the .git folder up the directory tree, or print an error if it didn't find it.
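The same walk can be done with parameter expansion alone, without starting a dirname subshell at every level. A sketch, assuming you only need the nearest directory that contains .git (find_git_root is a hypothetical name). It tests with -e rather than -d because in git worktrees and submodules .git can be a file. If git itself is available, git rev-parse --show-toplevel answers the same question directly.

```shell
#!/usr/bin/env bash
# find_git_root [DIR]: print the nearest ancestor of DIR (default: .) that
# contains .git; return 1 if there is none. Hypothetical helper.
find_git_root() {
  local dir
  dir=$(cd -- "${1:-.}" && pwd -P) || return 1  # absolute path, symlinks resolved
  while :; do
    if [[ -e $dir/.git ]]; then      # .git can be a directory or a file
      printf '%s\n' "${dir:-/}"      # an empty $dir stands for "/"
      return 0
    fi
    [[ -n $dir ]] || return 1        # "/" has been checked: give up
    dir=${dir%/*}                    # /a/b -> /a, /a -> ""
  done
}

# Demo: a fake repository with a nested subdirectory
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/repo/.git" "$demo/repo/src/lib"
find_git_root "$demo/repo/src/lib"
```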
I have a set of data files across a number of directories with format
ls lcp01/output/
> dst000.dat dst001.dat ... dst075.dat nn000.dat nn001.dat ... nn036.dat aa000.dat aa001.dat ... aa040.dat
That is to say, there are a set of directories lcp01 through lcp25 with a collection of different data files in their output folders. I want to know what the highest number dstXXX.dat file is in each directory (in the example shown the result would be 75).
I wrote a script which achieves this, but I'm not satisfied with the final step which feels a bit hacky:
#!/bin/bash
for i in `seq -f "%02g" 1 25`; #specify dir extensions 1 through 25
do
echo " "
echo $i
names=($(ls lcp$i/output | grep dst )) #dir containing dst files
NUMS=()
for j in "${names[@]}";
do
temp="$(echo $j | tr -dc '0-9' && printf " ")" # record suffixes for each dst file
NUMS+=("$((10#$temp))") #force base 10 interpretation of dst suffixes
done
numList="$(echo "${NUMS[*]}" | sort -nr | head -n1)"
echo ${numList:(-3)} #print out the last 3 characters of the sorted list - the largest file suffix
done
The final two steps organise the list of output indices, then I show the last 3 characters of that list which will be my largest file number (providing the file numbers are smaller than 100).
Is there a cleaner way of doing this? Ideally I would like more control over the output format, but mainly it's the step of reading the last 3 characters out. I would like to be able to just output the largest number, which should be the last element of the list but I cannot figure out how.
You could do something like the following:
for d in lcp[0-9][0-9]; do find "$d" -name 'dst*.dat' -print | sort -u | tail -n1; done
The above command only works if all the numbers have the same number of digits (dst000.dat to dst999.dat), because they are sorted as strings; if that's not the case:
for d in lcp[0-9][0-9]; do echo -n "$d: "; find "$d" -name 'dst*.dat' -print | grep -o '[0-9]*\.dat' | sort -n | tail -n1; done
Using filename expansion:
for d in lcp*/output; do
files=( "$d"/dst*.dat )
file=${files[-1]} # last match; negative indices need bash 4.3+
[[ -e $file ]] || continue # the glob matched nothing
file=${file##*/dst} # strip the path and the "dst" prefix
echo "${file%.dat}"
done
or with the extglob option, to restrict the patterns to digits:
shopt -s extglob
... lcp*([0-9])/output
... "$d"/dst*([0-9]).dat
...
file=${file##*/dst*(0)}
...
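All of the approaches above take the last name in sorted order, which is the largest number only while every dstNNN.dat uses the same number of digits. A sketch that compares the numbers arithmetically instead; max_dst is a hypothetical helper, and the lcpNN/output layout is taken from the question:

```shell
#!/usr/bin/env bash
# max_dst DIR: print the largest NNN among DIR/dstNNN.dat, compared as numbers.
max_dst() {
  local f n max=-1
  for f in "$1"/dst[0-9]*.dat; do
    [[ -e $f ]] || continue             # glob matched nothing
    n=${f##*/dst}                       # lcp01/output/dst075.dat -> 075.dat
    n=${n%.dat}                         # 075.dat -> 075
    [[ $n =~ ^[0-9]+$ ]] || continue    # skip names like dst1a.dat
    if (( 10#$n > max )); then
      max=$(( 10#$n ))                  # 10# so 075 is 75, not octal 61
    fi
  done
  (( max >= 0 )) || return 1            # no dst files at all
  echo "$max"
}

# Demo: two output directories, the second with mixed-width numbers
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo"/lcp01/output "$demo"/lcp02/output
touch "$demo"/lcp01/output/{dst000,dst001,dst075,nn036,aa040}.dat
touch "$demo"/lcp02/output/{dst9,dst10}.dat
for d in "$demo"/lcp*/output; do
  d_name=${d%/output}
  printf '%s: %s\n' "${d_name##*/}" "$(max_dst "$d")"
done
```

Because the value is printed with echo rather than sliced out of a string, you also get full control over the output format (the printf line is the only place to change).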
I have this in my local directory ~/Report:
Rep_{ReportType}_{Date}_{Seq}.csv
Rep_0001_20150102_0.csv
Rep_0001_20150102_1.csv
Rep_0102_20150102_0.csv
Rep_0503_20150102_0.csv
Using shell-script,
How do I get multiple files from a local directory with a fixed batch size?
How do I segregate/group the files together by report type (0001 files are grouped together, 0102 grouped together, 0503 grouped together, etc.)
I will generate a sequence file (using forqlift) for EACH group/report type. The output would be Report0001.seq, Report0102.seq, Report0503.seq (3 sequence files), which I will save to a different directory.
Note: In sequence files, the key is the filename of csv (Rep_0001_20150102.csv), and the value is the content of the file. It is stored as [String, BytesWritable].
This is my code:
1 reportTypes=(0001 0102 8902)
2
3 # collect all files matching expression into an array
4 filesWithDir=(~/Report/Rep_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-1].csv)
5
6 # take only the first hundred
7 filesWithDir=( "${filesWithDir[@]:0:100}" )
8
9 # files="${filesWithDir[@]##*/}" #### commented out since forqlift cannot create sequence file without the path/to/file
10 # echo ${files[@]}
11
12 shopt -s nullglob
13
14 # Line 21 is commented out since it has a bug. It collects files in
15 # current directory when it should be filtering the "files array" created
16 # in line 7
17
18
19 for i in ${reportTypes[@]}; do
20 printf -v val '%04d' "$i"
21 # files=("Rep_${val}_"*.csv)
# solution to BUG: (filter files array)
groupFiles=( $( for j in ${filesWithDir[@]} ; do echo $j ; done | grep ${val} ) )
22
23 # Generate sequence file for EACH Report Type
24 forqlift create --file="Report${val}.seq" "${groupFiles[@]}"
25 done
(Note: The sequence file output should be in current directory, not in ~/Report)
It's easy to take only a subset of an array:
# collect all files matching expression into an array
files=( ~/Report/Rep_[0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9]*.csv )
# take only the first hundred
files=( "${files[@]:0:100}" )
The second part is trickier: Bash has associative arrays ("maps"), but the only legal values which can be stored in arrays are strings -- not other arrays -- so you can't store a list of filenames as a value associated with a single entry (without serializing the array to and from a string -- a moderately tricky thing to do safely, since file paths in UNIX can contain any character other than NUL, newlines included).
It's better, then, to just generate the array as you need it.
shopt -s nullglob # allow a glob to expand to zero arguments
for ((i=1; i<=9999; i++)); do
printf -v val '%04d' "$i" # pad digits: 12 -> 0012
files=( ~/Report/"Rep_${val}_"*.csv ) # collect files that match
(( ${#files[@]} )) || continue # skip report types with no files
## emit NUL-separated list of files
#printf '%s\0' "${files[@]}" >"Reports.$val.txt"
# Create a sequence file with forqlift
forqlift create --file="Reports-${val}.seq" "${files[@]}"
done
If you really don't want to do that, then we can put something together that uses namerefs (declare -n) for indirection:
#!/bin/bash
# Requires bash 4.3 or newer (for declare -n)
re='^Rep_([[:digit:]]{4})_[[:digit:]]{8}(_[[:digit:]]+)?\.csv$'
counter=0
for f in *; do
[[ $f =~ $re ]] || continue # skip files not matching regex
if ((++counter > 100)); then break; fi # stop after 100 files
group=${BASH_REMATCH[1]} # retrieve first regex group
declare -g -a "array${group}" # declare an array
declare -n group_arr="array${group}" # redirect group_arr to that array
group_arr+=( "$f" ) # append to the array
done
for varname in "${!array@}"; do
declare -n group_arr="$varname"
## NUL-delimited form
#printf '%s\0' "${group_arr[@]}" \
# >"collection${varname#array}" # write to files named collection0001, etc.
# forqlift sequence file form
forqlift create --file="Reports-${varname#array}.seq" "${group_arr[@]}"
done
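Rather than trying every possible four-digit report type in turn, you could first collect the report types that actually occur and then build each group's array on demand. A sketch, assuming bash 4+ for the associative array (the stock /bin/bash on macOS is 3.2) and the Rep_{ReportType}_{Date}_{Seq}.csv naming from the question; report_types is a hypothetical helper, and the forqlift call is only echoed so the lists can be checked first:

```shell
#!/usr/bin/env bash
# report_types DIR: print each distinct report type found in DIR, once.
# Hypothetical helper; assumes names like Rep_0001_20150102_0.csv.
report_types() {
  local dir=$1 f base type
  local -A seen=()                      # associative array: needs bash 4+
  for f in "$dir"/Rep_[0-9][0-9][0-9][0-9]_*.csv; do
    [[ -e $f ]] || continue             # glob matched nothing
    base=${f##*/}                       # Rep_0001_20150102_0.csv
    type=${base:4:4}                    # 0001
    if [[ -z ${seen[$type]+set} ]]; then
      seen[$type]=1
      printf '%s\n' "$type"
    fi
  done
}

# Demo with throwaway files named as in the question
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo"/Rep_0001_20150102_{0,1}.csv \
      "$demo"/Rep_0102_20150102_0.csv "$demo"/Rep_0503_20150102_0.csv

while IFS= read -r type; do
  files=( "$demo"/Rep_"$type"_*.csv )   # this group's files, built on demand
  files=( "${files[@]:0:100}" )         # at most 100 per group
  # Swap echo for the real command once the lists look right
  echo forqlift create --file="Report${type}.seq" "${files[@]}"
done < <(report_types "$demo")
```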
I would move away from shell scripts and start looking towards Perl.
#!/usr/bin/env perl
use strict;
use warnings;
my %groups;
while ( my $filename = glob( "~/Report/Rep_*.csv" ) ) {
my ( $group, $id ) = ( $filename =~ m,/Rep_(\d{4})_(\d{8})(?:_\d+)?\.csv$, );
next unless defined $group; #undefined means it didn't match
#anything past 100 in a group is discarded:
if ( @{ $groups{$group} || [] } < 100 ) {
push( @{ $groups{$group} }, $filename );
}
}
foreach my $group ( keys %groups ) {
print "$group contains:\n";
print join( "\n", @{ $groups{$group} } ), "\n";
}
Another alternative is to cobble some bash commands together with a regexp.
See the implementation below:
# Explanation:
# ls -p = list the local directory, appending / to directory names
# grep -v / = ignore subdirectories
# grep -E '^Rep_[0-9]{4}_[0-9]{8}(_[0-9]+)?\.csv$' = keep only files matching your naming scheme
# head -n 100 = keep the first 100 results
for file in $(ls -p | grep -v / | grep -E '^Rep_[0-9]{4}_[0-9]{8}(_[0-9]+)?\.csv$' | head -n 100);
do echo "$file";
# Use a regexp to extract the desired sequence (the report type)
re='^Rep_([[:digit:]]{4})_([[:digit:]]{8})(_[[:digit:]]+)?\.csv$';
if [[ $file =~ $re ]]; then
sequence=${BASH_REMATCH[1]};
# Didn't end up using date, but in case you want it
# date=${BASH_REMATCH[2]};
# Just in case the sequence file doesn't exist
if [ ! -f "$sequence" ] ; then
touch "$sequence"
fi
# Output/Concat your filename to the sequence file, which you can
# read in later to do whatever administrative tasks you wish to do
# to them
echo "$file" >> "$sequence"
fi
done;
I am creating a script to run on OS X which will be run often by a novice user, and so want to protect a directory structure by creating a fresh one each time with an n+1 over the last:
target001 with the next run creating target002
I have so far:
lastDir=$(find /tmp/target* | tail -1 | cut -c 6-)
let n=$n+1
mkdir "$lastDir""$n"
However, the math isn't working here.
What about mktemp?
Create a temporary file or directory, safely, and print its name.
TEMPLATE must contain at least 3 consecutive `X's in last component.
If TEMPLATE is not specified, use tmp.XXXXXXXXXX, and --tmpdir is
implied. Files are created u+rw, and directories u+rwx, minus umask
restrictions.
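In a script that could look something like the sketch below. Note that mktemp replaces the X's with random characters, so you get a unique name such as target.a1B2c3 rather than the next number in a sequence; the target. prefix is only an example.

```shell
#!/usr/bin/env bash
# Create a fresh, uniquely named directory; mktemp fills in the X's.
# -d and a full template path work with both GNU and BSD (macOS) mktemp.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/target.XXXXXX") || exit 1
trap 'rm -rf "$workdir"' EXIT   # demo only: remove the directory afterwards
echo "Working in $workdir"
```

In the real script you would drop the trap so the directory survives for the user.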
Use this line to calculate the new sequence number:
...
n=$(printf "%03d" $(( 10#$n + 1 )) )
mkdir "$lastDir""$n"
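The 10# prefix forces base-10 arithmetic; without it, a leading zero makes bash read the number as octal, so values such as 008 or 009 cause an error. This assumes $n already holds the last sequence number, e.g. "001".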
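A quick way to see why the 10# matters, using 008 as an example of a last sequence number:

```shell
#!/usr/bin/env bash
n=008                                   # last sequence number, zero-padded

# Without 10#, bash reads a leading 0 as octal, and 8 is not an octal digit
if ( : $(( n + 1 )) ) 2>/dev/null; then
  octal_failed=no
else
  octal_failed=yes
fi

next=$(( 10#$n + 1 ))                   # base 10: 8 + 1 = 9
printf -v padded '%03d' "$next"         # 9 -> 009
echo "octal arithmetic failed: $octal_failed; next: $padded"
```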
No pipes and subprocesses:
targets=( /tmp/target* ) # all dirs in an array
lastdir=${targets[@]: (-1):1} # select filename from last array element
lastdir=${lastdir##*/} # remove path
lastnumber=${lastdir/target/} # remove 'target'
lastnumber=00$(( 10#$lastnumber + 1 )) # increment number (base 10), add leading zeros
mkdir /tmp/target${lastnumber: -3} # make dir; last 3 chars from lastnumber
A version with 2 parameters:
path='/tmp/x/y/z' # path without last part
basename='target' # last part
targets=( "$path/$basename"* ) # all dirs in an array
lastdir=${targets[@]: (-1):1} # select path from last entry
lastdir=${lastdir##*/} # select filename
lastnumber=${lastdir/"$basename"/} # remove the basename ('target')
lastnumber=00$(( 10#$lastnumber + 1 )) # increment number (base 10), add leading zeros
mkdir "$path/$basename${lastnumber: -3}" # make dir; last 3 chars from lastnumber
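The glob-based versions break on the very first run, when nothing matches yet, and two runs started at the same moment could compute the same name. A sketch that handles both: it starts from 1 when nothing exists and, if mkdir reports that the directory is already there, simply tries the next number. make_next_dir is a hypothetical helper, and it assumes fixed-width numbers so that glob order is numeric order:

```shell
#!/usr/bin/env bash
# make_next_dir BASE [WIDTH]: create BASE followed by the next zero-padded
# number (e.g. /tmp/target001, /tmp/target002, ...) and print its path.
make_next_dir() {
  local base=$1 width=${2:-3} last name n=0
  local -a existing
  existing=( "$base"[0-9]* )
  if [[ -e ${existing[0]} ]]; then          # at least one match
    last=${existing[${#existing[@]}-1]}     # highest, given fixed width
    last=${last#"$base"}                    # keep only the digits
    n=$(( 10#$last ))
  fi
  while :; do
    n=$(( n + 1 ))
    printf -v name "%s%0${width}d" "$base" "$n"
    if mkdir -- "$name" 2>/dev/null; then   # mkdir fails if it already exists
      printf '%s\n' "$name"
      return 0
    fi
    [[ -e $name ]] || return 1              # failed for another reason
  done
}

# Demo in a throwaway directory
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
make_next_dir "$demo/target"    # first run: target001
make_next_dir "$demo/target"    # second run: target002
```

For the question's setup this would be called as make_next_dir /tmp/target.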
Complete solution using the extended test [[ and BASH_REMATCH:
[[ $(find /tmp/target* -maxdepth 0 | tail -1) =~ ^(.*)([0-9]{3})$ ]]
mkdir $(printf "${BASH_REMATCH[1]}%03d" $(( 10#${BASH_REMATCH[2]} + 1 )) )
This assumes your directories follow the pattern /tmp/target001 (-maxdepth 0 stops find from also listing the contents of each directory).
Like this:
lastDir=$(find /tmp/target* -maxdepth 0 | tail -1)
n=$(( 10#${lastDir##/tmp/target} + 1 )) # 10# so that e.g. 008 is not treated as octal
mkdir /tmp/target$(printf "%03d" $n)