Using awk to print ALL spaces within filenames which have a varied number of spaces - bash

I'm executing the following in bash with awk to get the potentially space-filled filename, a colon, and the file size (in the ls -l output, column 5 contains the size, and columns 9 to end-of-line the file name):
src="Desktop"
echo "Constructing $src files list. `date`"
cat /dev/null > "$src"Files.txt
find -s ~/"$src" -type f -exec ls -l {} \; |
awk '{for(i=9;i<=NF;i++) {printf("%s", $i " ")} print ":" $5}' |
grep -v ".DS_Store" | grep -v "Icon\r" |
while read line ; do filespacesize=`basename "$line"`; filesize=`echo "$filespacesize" |
sed -e 's/ :/:/1'`
path=`dirname "$line"`; echo "$filesize:$path" >> "$src"Files.txt ;
done
And it works fine, BUT…
If a filename has more than one space between parts, I only get one space between the filename parts, then the colon, followed by the file size.
How can I get the full filename, :, and then the file size?

That happens because awk splits its input on runs of whitespace, so rejoining $9 through $NF with single spaces loses the original spacing. It seems you want the following (provided your find handles -printf with the %f, %s and %h directives):
src=Desktop
echo "Constructing $src files list. $(date)"
find ~/"$src" -type f -printf '%f:%s:%h\n' > "$src"Files.txt
Much shorter and much more efficient than your method!
This will not discard the .DS_Store and Icon\r things… but I'm not really sure what you really want to discard. Note that .DS_Store is normally a file, not a directory, so to skip those files:
find ~/"$src" ! -name '.DS_Store' -type f -printf '%f:%s:%h\n' > "$src"Files.txt
@guido seems to have guessed what you mean by grep -v "Icon\r": ignore files whose names end with Icon; if his guess is right, then this will do:
find ~/"$src" ! -name '.DS_Store' ! -name '*Icon' -type f -printf '%f:%s:%h\n' > "$src"Files.txt

Related

sed to replace string in file only displayed but not executed

I want to find all files with certain name (Myfile.txt) that do not contain certain string (my-wished-string) and then do a sed in order to do a replace in the found files. I tried with:
find . -type f -name "Myfile.txt" -exec grep -H -E -L "my-wished-string" {} + | sed 's/similar-to-my-wished-string/my-wished-string/'
But this only displays all files with the wished name that miss "my-wished-string"; it does not execute the replacement. Am I missing something here?
With a for loop, invoking a shell:
find . -type f -name "Myfile.txt" -exec sh -c '
for f; do
grep -H -E -L "my-wished-string" "$f" &&
sed -i "s/similar-to-my-wished-string/my-wished-string/" "$f"
done' sh {} +
To silence the printing to stdout, redirect grep's output to /dev/null rather than adding -q (with -L, -q may change the exit status that the && relies on); and do not add -n to sed: combined with -i and only an s command, -n would empty the files.
You can do this by constructing two stacks; the first containing the files to search, and the second containing negative hits, which will then be iterated over to perform the replacement.
find . -type f -name "Myfile.txt" > stack1
while read -r line; do
    [ -z "$(sed -n '/my-wished-string/p' "${line}")" ] && echo "${line}" >> stack2
done < stack1
while read -r line; do
    sed -i "s/similar-to-my-wished-string/my-wished-string/" "${line}"
done < stack2
With some versions of sed, you can use -i to edit the file. But don't pipe the list of names to sed, just execute sed in the find:
find . -type f -name Myfile.txt -not -exec grep -q "my-wished-string" {} \; -exec sed -i 's/similar-to-my-wished-string/my-wished-string/g' {} \;
Note that any file which contains similar-to-my-wished-string also contains the string my-wished-string as a substring, so with these exact strings the command is a no-op, but I suppose your actual strings are different than these.
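One portability note: BSD/macOS sed requires an explicit (possibly empty) suffix argument after -i, so there the same command would be:
# BSD/macOS sed: -i takes a mandatory suffix argument ('' = no backup file)
find . -type f -name Myfile.txt -not -exec grep -q "my-wished-string" {} \; \
    -exec sed -i '' 's/similar-to-my-wished-string/my-wished-string/g' {} \;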

Get files from directories alphabetically sorted with bash

I have this code that works in the directory that I execute:
pathtrabajo=.
filedirs="files.txt"
dirs=$(find . -maxdepth 1 -mindepth 1 -type d | sort -n)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for entry in $dirs
do
echo "${entry}" >> "${filedirs}"
find "$entry" -maxdepth 1 -mindepth 1 -name '*.md' -printf '%f\n' | sort | sed 's/\.md$//1' | awk '{print "- [["$0"]]"}' >> "${filedirs}"
done
IFS=$SAVEIFS
But when I try to make it global to work with variables, find gives error:
pathtrabajo="/path/to/a/files"
filedirs="files.txt"
dirs=$(find "${pathtrabajo}" -maxdepth 1 -mindepth 1 -type d | sort -n)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for entry in "${dirs[@]}"
do
echo "${entry}" >> "${pathtrabajo}"/"${filedirs}"
find "${entry}" -maxdepth 1 -mindepth 1 -name '*.md' -printf '%f\n' | sort | sed 's/\.md$//1' | awk '{print "- [["$0"]]"}' >> "${pathtrabajo}"/"${filedirs}"
done
IFS=$SAVEIFS
What did I do wrong?
It's really not clear why you are using find here at all. The following will probably do what you are trying to do, if I guess correctly from your code.
dirs=([0-9][!0-9]*/ [0-9][0-9][!0-9]*/ [0-9][0-9][0-9][!0-9]*/ [!0-9]*/)
printf "%s\n" "${dirs[@]}" > "$filedirs"
for dir in "${dirs[@]}"; do
    printf "%s\n" "$dir"/*.md |
    awk '{ sub(/\.md$/, ""); print "- [["$0"]]" }'
done >> "$filedirs"
The shell already expands wildcards alphabetically. The dirs assignment will expand all directories which start with a single digit, then the ones with two digits, then the ones with three digits -- extend if you need more digits -- then the ones which do not start with a digit.
It would not be hard, but cumbersome, to extend the code to run in an arbitrary directory. My proposed solution would be (for once!) to cd to the directory where you want the listing, then run this script.
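Concretely, that could look like the following (a sketch reusing the question's pathtrabajo and filedirs variables):
pathtrabajo="/path/to/a/files"
filedirs="files.txt"
# run the globbing from inside the target directory
cd "$pathtrabajo" || exit 1
dirs=([0-9][!0-9]*/ [0-9][0-9][!0-9]*/ [0-9][0-9][0-9][!0-9]*/ [!0-9]*/)
printf "%s\n" "${dirs[@]}" > "$filedirs"
for dir in "${dirs[@]}"; do
    printf "%s\n" "$dir"*.md |
    awk '{ sub(/\.md$/, ""); print "- [["$0"]]" }'
done >> "$filedirs"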

How to get list of certain strings in a list of files using bash?

The title is maybe not really descriptive, but I couldn't find a more concise way to describe the problem.
I have a directory containing different files which have a name that e.g. looks like this:
{some text}2019Q2{some text}.pdf
So the filenames have somewhere in the name a year followed by a capital Q and then another number. The other text can be anything, but it won't contain anything matching the format year-Q-number. There will also be no numbers directly before or after this format.
I can work something out to get this from one filename, but I actually need a 'list' so I can do a for-loop over this in bash.
So, if my directory contains the files:
costumerA_2019Q2_something.pdf
costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerB_2019Q3_something.pdf
costumerC_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerD2020Q2something.pdf
I want a for loop that goes over 2019Q2, 2019Q3, 2020Q1, and 2020Q2.
EDIT:
This is what I have so far. It extracts the substrings, but the result still contains duplicates, and since I'm already inside the loop I don't see how to remove them.
find original/*.pdf -type f -print0 | while IFS= read -r -d '' line; do
    echo "$line" | grep -oP '[0-9]{4}Q[0-9]'
done
# list all _filenames_ that end with .pdf from the folder original
find original -maxdepth 1 -name '*.pdf' -type f -printf '%f\n' |
# extract the pattern
sed -E 's/.*([0-9]{4}Q[0-9]).*/\1/' |
# iterate
while IFS= read -r file; do
    echo "$file"
done
I used -printf '%f\n' to print just the filename instead of the full path. GNU sed has a -z option that you can use with find's -print0 (or -printf '%f\0').
With how you wanted to do this: if your file names contain no newlines, there is no need to loop over the list in bash (as a rule of thumb, try to avoid while read line; it's very slow):
find original -maxdepth 1 -name '*.pdf' -type f | grep -oP '[0-9]{4}Q[0-9]'
or with a zero-separated stream:
find original -maxdepth 1 -name '*.pdf' -type f -print0 |
grep -zoP '[0-9]{4}Q[0-9]' | tr '\0' '\n'
If you want to remove duplicate elements from the list, pipe it to sort -u.
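For example, to get the deduplicated list directly:
# unique quarter tags, one per line
find original -maxdepth 1 -name '*.pdf' -type f |
    grep -oP '[0-9]{4}Q[0-9]' | sort -u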
Try this, in bash:
~ > $ ls
costumerA_2019Q2_something.pdf costumerB_2019Q2_something.pdf
costumerA_2019Q3_something.pdf other.pdf
costumerA_2020Q1_something.pdf someother.file.txt
~ > $ for x in `(ls)`; do [[ ${x} =~ [0-9]Q[1-4] ]] && echo $x; done;
costumerA_2019Q2_something.pdf
costumerA_2019Q3_something.pdf
costumerA_2020Q1_something.pdf
costumerB_2019Q2_something.pdf
~ > $ (for x in *; do [[ ${x} =~ ([0-9]{4}Q[1-4]).+pdf ]] && echo ${BASH_REMATCH[1]}; done;) | sort -u
2019Q2
2019Q3
2020Q1
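Since the goal was a for-loop over the tags, here is a minimal sketch building on the BASH_REMATCH approach above (the directory name original is taken from the question):
# collect the unique yearQn tags, then iterate over them
quarters=$(
    for x in original/*.pdf; do
        [[ $x =~ ([0-9]{4}Q[1-4]) ]] && echo "${BASH_REMATCH[1]}"
    done | sort -u
)
for q in $quarters; do
    echo "processing $q"    # per-quarter work goes here
done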

Print the content of all the files in the newest directory in BASH [duplicate]

Is there any sort option available in the find command to get the directory with the least access date/time?
find . -type d -printf "%A# %p\n" | sort -n | tail -n 1 | cut -d " " -f 2-
If you prefer the filename without the leading path, replace %p with %f. Note that sort -n | tail -n 1 picks the most recently accessed directory; use head -n 1 for the least recently accessed one.
On Linux, the stat command displays the access and modification times along with the size (stat -f would show filesystem status instead):
stat <dir>
find -type d -printf '%T+ %p\n' | sort | head -1
(%T+ is the modification time; use %A+ for the access time.) To list all directories sorted, not just the first:
find -type d -printf '%T+ %p\n' | sort
This sounds like more of a job for ls:
ls -ultd * | grep ^d
The problem with using find, at least on my system (cygwin/bash), is that find accesses the dirs, so all access times end up as the current time, defeating your apparent purpose.
A simple shell script will also do:
unset -v oldest
for i in "$dir"/*; do
    [ "$i" -ot "$oldest" -o "$oldest" = "" ] && oldest="$i"
done
Note: to find the oldest directory, use "$dir"/*/ above (thanks Cyrus) and -type d below with the find command.
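Spelled out, the directory variant of that loop would be:
# oldest directory: the trailing slash makes the glob match only directories
unset -v oldest
for i in "$dir"/*/; do
    [ "$i" -ot "$oldest" -o "$oldest" = "" ] && oldest="$i"
done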
In bash, if you need a recursive solution, you can rewrite it as a while loop with process substitution, using find:
unset -v oldest
while IFS= read -r i; do
    [ "$i" -ot "$oldest" -o "$oldest" = "" ] && oldest="$i"
done < <(find "$dir" -type f)
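Once the directory has been found, the question's actual goal, printing the content of all its files, is a short extra step; a sketch using the $oldest result from the loops above:
# print every regular file in the selected directory
for f in "$oldest"/*; do
    [ -f "$f" ] && cat -- "$f"
done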

How to remove files starting with #! or ending with .sh in the name

I am new to shell programming. I want to move any executable file, any file starting with a shebang (#!), and any file whose name ends in .sh from a directory to /tmp/backup, and log the names of the files moved.
This is what I have done till now
Searching for files starting with #:
grep -ircl --exclude=*.{png,jpg,gif,html,jar} "^#" /home
Finding executables
find . -type f -perm +111 or find . -type f -perm -u+x
Now I am struggling with how to combine these two commands to get a final output that I can use to perform the backup and remove the files from the current directory.
Thanks
Use the xargs command:
"find command" | xargs "grep command"
You could put everything in a file, sort it, then process it with Awk:
# Select all files to move
grep -ircl --exclude=*.{png,jpg,gif,html,jar} '^#\!' /home > list.txt
find /home -type f \( -perm -u+x -o -name "*.sh" \) -print >> list.txt
# Feed them to Awk that will log and move the file
sort list.txt | uniq | awk -v LOGFILE="mylog.txt" '
    { print "Moving " $0 >> LOGFILE
      "mv -v --backup \"" $0 "\" /tmp/backup" | getline
      print >> LOGFILE }'
EDIT: you can make a formal script out of this skeleton by adding some variables and some additional checks:
#!/bin/bash
LIST="$( mktemp || exit 1 )"
LOG="/tmp/mylog.txt"
SOURCE="/home"
TARGET="/tmp/backup"
mkdir -p "${TARGET}"
cd "${SOURCE}" || exit 1
# Select all files to move
grep -ircl --exclude=*.{png,jpg,gif,html,jar} '^#\!' "${SOURCE}" > "${LIST}"
find "${SOURCE}" -type f \( -perm -u+x -o -name "*.sh" \) -print >> "${LIST}"
# Feed them to Awk that will log and move the file
sort "${LIST}" | uniq | awk -v LOGFILE="${LOG}" -v TARGET="${TARGET}" '
{ print "Moving " $0 >> LOGFILE
"mv -v --backup \"" $0 "\" " TARGET | getline
print >> LOGFILE }'
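Driving mv through awk's getline is fragile with unusual file names; a simpler variant of that last step is a plain shell loop over the sorted list (a sketch reusing the same LIST, LOG and TARGET variables; file names containing newlines aside):
# read the deduplicated list and move each file, logging as we go
sort -u "${LIST}" | while IFS= read -r file; do
    printf 'Moving %s\n' "${file}" >> "${LOG}"
    mv -v --backup "${file}" "${TARGET}" >> "${LOG}" 2>&1
done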
