Replacing the duplicate uuids across multiple files - bash

I am trying to replace the duplicate UUIDs from multiple files in a directory. Even the same file can have duplicate UUIDs.
I am using Unix utilities to solve this.
So far I have used grep, cut, sort and uniq to find all the duplicate UUIDs across the folder and store them in a file (say duplicate_uuids).
Then I tried sed to replace the UUIDs by looping through that file.
while read line; do
sed -i'.original' -e "s/$line/$uuid/g" *.java
done < "$filename"
As you would expect, I ended up replacing all the duplicate UUIDs with new UUID but still, it is duplicated throughout the file!
Is there any sed trick that can work for me?

There are a bunch of ways this can likely be done. Taking a multi-command approach using a function might give you greater flexibility if you want to customize things later, for example:
checkdupes() {
    for filename in "$@"; do
        printf "Searching File: %s\n" "${filename}"
        while read -r line; do
            # line numbers on which this duplicate appears
            arr=( $(grep -n "${line}" "${filename}" | awk 'BEGIN { FS = ":" } ; {print $1" "}') )
            # skip the first occurrence, replace the rest (sed -i '' is the BSD/macOS form)
            for i in "${arr[@]:1}"; do
                sed -i '' "${i}s/${line}/$(uuidgen)/g" "${filename}"
                printf "Replaced UUID [%s] at line %s, first found on line %s\n" "${line}" "${i}" "${arr[0]}"
            done
        done < <( sort "${filename}" | uniq -d )
    done
}
checkdupes /path/to/*.java
This series of commands first finds the duplicated lines (if any) in each file you pass in, using sort and uniq -d. It then uses grep and awk to build an array of the line numbers on which each duplicate is found. Looping through that array while skipping the first value replaces every later occurrence with a new UUID and saves the file in place.
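As an alternative to the line-number approach above, here is a minimal sketch in pure Bash that keeps the first occurrence of each UUID and replaces every later one, across all the files it is given. The function name dedupe_uuids and the UUID_CMD override are hypothetical, introduced only for illustration. It assumes Bash 4+ (associative arrays) and that uuidgen is installed.

```shell
# Sketch only: dedupe_uuids and UUID_CMD are illustrative names, not part of the answers above.
dedupe_uuids() {
    local -A seen=()   # UUIDs already encountered, shared across all files
    local re='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
    local file tmp line rest out id
    for file in "$@"; do
        tmp=$(mktemp) || return 1
        while IFS= read -r line || [[ -n $line ]]; do
            out='' rest=$line
            while [[ $rest =~ $re ]]; do
                id=${BASH_REMATCH[0]}
                out+=${rest%%"$id"*}    # text before the match
                rest=${rest#*"$id"}     # text after the match
                if [[ -n ${seen[$id]} ]]; then
                    id=$("${UUID_CMD:-uuidgen}")   # repeat: swap in a fresh UUID
                fi
                seen[$id]=1
                out+=$id
            done
            printf '%s\n' "$out$rest"
        done < "$file" > "$tmp"
        # overwrite in place so the original file's permissions are kept
        cat "$tmp" > "$file" && rm -f "$tmp"
    done
}
```

It could be called as dedupe_uuids *.java. Because the seen map lives for the whole call, a UUID that first appears in one file is also treated as a duplicate when it turns up in a later file.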
Using a duplicate list file:
If you want to use a file with a list of dupes to search other files and replace every matching UUID in each of them, it's just a matter of changing two lines. Change
for i in "${arr[@]:1}"; do
to
for i in "${arr[@]}"; do
and change
done< <( sort "${filename}" | uniq -d )
to
done< <( cat /path/to/dupes_list )
NOTE: If you don't want to overwrite the file, then remove sed -i '' at the beginning of the command.

This worked for me:
# store file names in array
find . -name "*.java" > file_names
IFS=$'\n' read -d '' -r -a file_list < file_names
# store duplicate uuids from file to array
IFS=$'\n' read -d '' -r -a dup_uuids < "$duplicate_uuid"
# START was not defined in the snippet as posted; 1 is assumed
START=1
# loop through all files
for file in "${file_list[@]}"
do
    echo "$file"
    # Loop through all repeated uuids
    for old_uuid in "${dup_uuids[@]}"
    do
        # Get the number of lines containing the uuid in this file
        END=$(grep -c "$old_uuid" "$file")
        if (( END > 0 )) ; then
            echo " Replacing $old_uuid"
            # Loop through them one by one and change the uuid
            for (( c=START; c<=END; c++ ))
            do
                # uuid was not defined in the snippet as posted; uuidgen is assumed
                uuid=$(uuidgen)
                echo " [$c of $END] with $uuid"
                # replace only the first remaining occurrence (sed -i '.original' is the BSD/macOS form)
                sed -i '.original' -e "1,/$old_uuid/s/$old_uuid/$uuid/" "$file"
            done
            rm "$file.original"
        fi
    done
done
rm file_names


More "random" alternative to shuf for selecting files in a directory

I put together the following Bash function (in my .bashrc) to open a "random" image from a given folder, one at a time until the user types N, after which it exits. The script works fine aside from the actual randomness of the images generated - in a quick test of 10 runs, only 4 images are unique.
Is this simply unavoidable due to the limited number of images in the directory (20), or is there an alternative to the shuf command that will yield more random results?
If it is unavoidable, what's the best way to adapt the function to avoid repeats (i.e. discard images that have already been selected)?
function generate_image() {
    while true; do
        command cd "D:\Users\Hashim\Pictures\Data" &&
        image="$(find . -type f -exec file --mime-type {} \+ | awk -F: '{if ($2 ~/image\//) print $1}' | shuf -n1)" &&
        echo "Opening $image" &&
        cygstart "$image"
        read -p "Open another random image? [Y/n]"$'\n' -n 1 -r
        if [[ $REPLY =~ ^[Nn]$ ]]
        then exit
        fi
    done
}
One way to handle this is by searching the filesystem and creating an array with a list of files in randomized order, and going through everything in that list before searching again.
Because you go through everything from one batch of shuf output before starting the next batch of shuf output, there's no longer a risk of repeats until everything has been seen.
refresh_image_list() {
    # respect prior image_dir value if set before the function is called
    # (the default below is assumed from the path in the question)
    [[ $image_dir ]] || image_dir='D:\Users\Hashim\Pictures\Data'
    readarray -d '' image_list < <(
        find "$image_dir" -type f -exec file -0 --mime-type -- {} + \
        | while IFS= read -r -d '' filename && IFS= read -r desc; do
            [[ $desc = *image* ]] && printf '%s\0' "$filename"
        done \
        | shuf -z
    )
}
generate_image() {
    while true; do
        (( ${#image_list[@]} )) || refresh_image_list # if list is empty, recreate
        set -- "${image_list[@]}" # set argument list from image list
        while (( $# )); do # argument list isn't empty?
            echo "Opening $1" # ...try the first item on it
            cygstart "$1"
            shift # ...and then discard that item
            read -p $'Open another random image? [Y/n]\n' -n 1 -r
            if [[ $REPLY = [Nn] ]]; then # user wants to quit?
                image_list=( "$@" ) # store unused images back to list
                return 0
            fi
        done
        image_list=() # batch exhausted, so the next pass reshuffles
    done
}
We can simplify this if we're willing to just stop after the user has seen every image once, instead of generating a new batch, and don't need persistence across invocations:
generate_image() {
    while IFS= read -r -d '' filename <&3; do
        echo "Opening $filename"
        cygstart "$filename"
        read -p $'Open another random image? [Y/n]\n' -n 1 -r
        [[ $REPLY = [Nn] ]] && return 0
    done 3< <(
        find "$image_dir" -type f -exec file -0 --mime-type -- {} + \
        | while IFS= read -r -d '' filename && IFS= read -r desc; do
            [[ $desc = *image* ]] && printf '%s\0' "$filename"
        done \
        | shuf -z
    )
}
File listings are rarely so large that they can't fit into RAM for awk:
find … -print0 |
mawk 'BEGIN { FS = "\0"
_^= RS = "^$"
} END { printf("%*s", srand()*!_, $(int(rand()*(NF-_))+_)) }'
That'll randomly print out the filename for one of the image files found, with no trailing byte of either \0 or \n, without having to perform any sort of sorting/shuffling.
NF - 1 because find prints out final \0, so NF count is always 1 more than # of files found.
It also protects against an empty input instead of referencing a negative field number - simply nothing gets printed at all.
From there, you can decide you want to open this image file.
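For readers who find the mawk one-liner hard to follow, the same idea (read the whole NUL-delimited listing, then pick one entry at random with no shuffling) can be sketched in plain Bash. The function name pick_random is hypothetical. It assumes Bash 4.4+ for readarray -d '', and $RANDOM only yields values up to 32767 with a slight modulo bias, which should not matter for a small image folder.

```shell
# Sketch only: pick_random is an illustrative name.
# Reads NUL-delimited names on stdin and prints one of them, with no trailing newline.
pick_random() {
    local -a items
    readarray -d '' items
    (( ${#items[@]} )) || return 1     # empty input: print nothing
    printf '%s' "${items[RANDOM % ${#items[@]}]}"
}
```

A call might look like find . -type f -print0 | pick_random.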
Charles' answer is definitely the superior answer here, but for completeness I thought I would also add a middle-ground solution that I stumbled across while experimenting earlier on.
I learnt that shuf can be seeded with an external source of randomness, so by seeding it with /dev/urandom - the randomness generator device available on all UNIX-like systems - it can be made more random:
shuf -n1 --random-source=/dev/urandom
From my tests this appears to result in significantly fewer repeats than a standard shuf command, and could be an ideal solution if you want a little more randomness but can tolerate the occasional repeat.
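To judge whether repeats come from the random source or simply from drawing with replacement, note that the expected number of distinct images in k independent uniform draws from n files is n * (1 - ((n-1)/n)^k). A small sketch of that arithmetic follows; the function name expected_unique is hypothetical.

```shell
# Sketch only: expected_unique is an illustrative name.
# Usage: expected_unique N K  -> expected number of distinct items in K uniform draws from N
expected_unique() {
    awk -v n="$1" -v k="$2" 'BEGIN { printf "%.2f\n", n * (1 - ((n - 1) / n) ^ k) }'
}

expected_unique 20 10   # prints 8.03
```

For the question's 20 images and 10 runs this gives about 8 distinct images, so some repeats are expected from any random source, although seeing only 4 is on the low side.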

Rename files matching pattern in a loop - Bash

I have been trying to rename some specific files based on a table but with no success. It either renames all files or gives error.
The directory contains hundreds of files named with long barcodes and I want to rename only the files containing the pattern _1_.
barcode_1_barcode_SL484171.fastq.gz
barcode_2_barcode_SL484171.fastq.gz
barcode_1_barcode_SL484370.fastq.gz
barcode_2_barcode_SL484370.fastq.gz
Desired output:
Description1.R1.fastq.gz
Description2.R1.fastq.gz
As you can see in the table there are two files per description but I only want to rename the ones with the _1_ pattern.
Code I have tried:
for i in *_1_*.fastq.gz; do read oldname newname; mv "$oldname" "$newname".R1.fastq.gz; done < mytable.txt
for i in $(grep '_1_' mytable.txt); do read -r oldname newname; mv ${oldname} ${newname}.R1.fastq.gz; done < mytable.txt
for i in $(grep '_1_' mytable.txt); do oldname=$(cut -f1 $i);newname=$(cut -f2 $i); ln -s ${oldname} ${newname}.R1.fastq.gz; done
while read -r oldname newname
do
    if [[ $oldname =~ "_1_" ]]
    then
        mv $oldname $newname
    fi
done < mytable.txt
Something like this.
#!/usr/bin/env bash
while IFS= read -r files; do ##: loop through the output of `find . -name 'barcode_1_barcode*.gz' | grep -f <(cut -d' ' -f1 table.txt)`
    while read -ru9 old_name prefix; do ##: loop through the output of `grep 'barcode_1_barcode.*' table.txt`
        if [[ $files == *"$old_name"* ]]; then ##: If the filename from find matches the first field of table.txt (space delimited)
            old_filename="${files%.fastq.gz}" ##: Extract the filename without the .fastq.gz extension
            extension="${files#"$old_filename"}" ##: Extract the .fastq.gz extension without the filename
            # mv -v "$files" "$prefix.R1${extension}"
            printf '%s %s %s ==> %s\n' mv -v "$files" "$prefix.R1${extension}" ##: Preview the rename to the desired output
        fi
    done 9< <(grep 'barcode_1_barcode.*' table.txt)
done < <(find . -name 'barcode_1_barcode*.gz' | grep -f <(cut -d' ' -f1 table.txt) ) ##: Keep only the files listed in the first column/field of table.txt
Output from the OP's sample data/files.
renamed './barcode_1_barcode_SL484370.fastq.gz' -> 'Description2.R1.fastq.gz'
renamed './barcode_1_barcode_SL484171.fastq.gz' -> 'Description1.R1.fastq.gz'
If you're satisfied with the output either move the # from the front of mv to the
front of printf or just delete the entire line with printf and remove the # from
mv in order for mv to actually rename the files.
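A plainer variant of the same idea is to read the table once and skip every row whose old name does not contain _1_. This is only a sketch: the function name rename_r1 is hypothetical, and it assumes the table has two whitespace-separated columns (old file name, then description), since the table itself is not shown in the question. It prints the mv commands as a dry run instead of executing them.

```shell
# Sketch only: rename_r1 is an illustrative name; the two-column table layout is assumed.
rename_r1() {
    local table=$1 oldname newname
    while read -r oldname newname; do
        [[ $oldname == *_1_* ]] || continue   # only the _1_ files
        [[ -e $oldname ]] || continue         # skip rows whose file is missing
        echo mv -- "$oldname" "$newname.R1.fastq.gz"
    done < "$table"
}
```

Run it from the directory holding the files, for example rename_r1 mytable.txt, and drop the echo once the printed commands look right.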

Search file of directories and find file names, save to new file - bash

I'm trying to find the paths for some fastq.gz files in a mess of a system.
I have some folder paths in a file called temp (subset):
Let's assume 2 fastq.gz files are found in each directory in temp except for /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/.
I want to find the fastq.gz files and print them (if found) next to the directory I'm searching in.
Ideal output:
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/NG167_S19_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/ not_found
I'm part of the way there:
wc -l temp
while read -r line; do cd $line; echo ${line} >> ~/tmp; find `pwd -P` -name "*fastq.gz" >> ~/tmp; done < temp
cd ~
less tmp
Current output:
My code places the directory searched for first, then any matching files on subsequent lines. I'm not sure how to get the output I desire...
Any help, gratefully received!
This is not your original script. Instead of running cd and find once per line (that is, once per directory), this version runs find a single time over all the directories and does the parsing inside the while read loop.
#!/usr/bin/env bash
mapfile -t to_search < temp.txt
while IFS= read -rd '' files; do
    if [[ $files == *.fastq.gz ]]; then
        printf '%s found %s\n' "${files%/*}/" "$files"
    else
        printf '%s not_found!\n' "$files" >&2
    fi
done < <(find "${to_search[@]%/*.fastq.gz*}" -print0) | column -t
This is how I would rewrite your script. Using cd in a subshell
#!/usr/bin/env bash
while read -r line; do
    if [[ -d "$line" ]]; then
        (   # subshell, so the cd does not affect the next iteration
            cd "$line" || exit
            varname=$(find "$(pwd -P)" -name '*fastq.gz')
            if [[ -n $varname ]]; then
                printf '%s found %s\n' "$line" "$line${varname#*./}"
            else
                printf '%s not_found!\n' "$line"
            fi
        )
    fi
done < temp.txt | column -t
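The per-directory approach can also be sketched so that every matching file gets its own "found" line, which is what the ideal output in the question shows. The function name report_fastq is hypothetical. The sketch assumes one directory per line in the list file and a find that supports -print0 (GNU and BSD find both do).

```shell
# Sketch only: report_fastq is an illustrative name.
# Usage: report_fastq LISTFILE   (LISTFILE holds one directory per line)
report_fastq() {
    local list=$1 dir file found
    while IFS= read -r dir; do
        [[ -d $dir ]] || continue
        found=0
        while IFS= read -r -d '' file; do
            printf '%s found %s\n' "$dir" "$file"
            found=1
        done < <(find "$dir" -name '*fastq.gz' -print0)
        (( found )) || printf '%s not_found\n' "$dir"
    done < "$list"
}
```

Reading the find output through process substitution, rather than a pipe, keeps the found flag in the current shell so it is still set after the inner loop ends.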
Given a line -
you can get what you want for the found lines quite easily with sed - just feed the lines to it.
... | sed -e 's#^\(.*/\)\([^/]*\)$#\1 found \1\2#'
However, that doesn't eliminate the line before.
To do that you can either use something like awk (with a simple state machine) or do something like this in sed; the general idea is:
... | sed -e '#/$#{$!N;#\n.*gz$#!P;D}'
(although I think I have a typo as it is not working for me on osx).
So then you'd be left with the .gz lines already converted, and the lines ending in / where you can also use sed to then append the "not found".
... | sed -e 's#/$#/ not found#'
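The awk state machine mentioned above could look roughly like the following sketch. It post-processes the asker's current output, assuming every directory line ends in / and is followed by zero or more file lines. The function name pair_dirs is hypothetical.

```shell
# Sketch only: pair_dirs is an illustrative name; directory lines are assumed to end in "/".
pair_dirs() {
    awk '
        /\/$/ {                                   # a directory line starts a new group
            if (dir != "" && !hits) print dir " not_found"
            dir = $0; hits = 0; next
        }
        { print dir " found " $0; hits++ }        # a file line belongs to the current directory
        END { if (dir != "" && !hits) print dir " not_found" }
    '
}
```

With the loop from the question it would be used as pair_dirs < ~/tmp.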

Grep -rl from a .txt list

I'm trying to search for a list of strings taken from a .txt file. The search target is a directory of multiple .csv files, and I want to know which .csv files contain each string.
I already found how to do it manually:
grep -rl doggo C:\dirofcsv\
The next step is to do it from a list of hundreds of terms.
I tried grep -rl -f list.txt C:\dirofcsv > print.txt but only the last term is printed. I want the results line by line.
I'm missing something but I don't know what.
I'm working on Windows with a terminal emulator.
EDIT: I've found how to list the terms from a file. Now I need to see which terms have which results, like "doggo => file2, file4". Do I need to write a loop?
Thanks community.
grep -rl -f list.txt C:\dirofcsv >> print.txt
You are looking to append lines to the print.txt file and so will need to use >> as opposed to > which will overwrite what is already in the file.
To get the output format from your edit, you can feed the output of a loop into awk:
awk '/^FILE -/ { fil=$3     # When the line starts with "FILE -", set fil to the third space-delimited field
                 next       # Skip to the next line
               }
     { arr[fil][$0]="" }    # Two-dimensional array (needs gawk): search term (fil) as the first index, file name as the second
     END { for (i in arr) { # Loop through the array
             printf "%s => ",i       # First print the search term in the required format
             for (j in arr[i]) {
               printf "%s,",j        # Print the file name followed by a comma
             }
             printf "\n"             # Print a new line
           }
         }' <<< "$(while read line   # Read list.txt line by line
                   do
                     echo "FILE - $line"           # Echo a marker for identification in awk
                     grep -rl "$line" C:\dirofcsv  # Grep for the line
                   done < list.txt)" >> print.txt
One liner:
awk '/^FILE -/ { fil=$3;next } { arr[fil][$0]="" } END { for (i in arr) { printf "%s => ",i;for (j in arr[i]) { printf "%s,",j } printf "\n" } }' <<< "$(while read line;do echo "FILE - $line";grep -rl "$line" C:\dirofcsv; done < list.txt)" >> print.txt
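The same "term => files" report can also be sketched without awk, by running grep once per term and joining the matching file names with paste. The function name grep_terms is hypothetical. It treats each term as a fixed string (-F) rather than a regular expression, which is an assumption about what the list contains, and a list saved with Windows line endings would need its carriage returns stripped first.

```shell
# Sketch only: grep_terms is an illustrative name.
# Usage: grep_terms LISTFILE DIR   -> one "term => file1,file2" line per term
grep_terms() {
    local list=$1 dir=$2 term files
    while IFS= read -r term; do
        [[ -n $term ]] || continue    # skip blank lines
        files=$(grep -rlF -- "$term" "$dir" | sort | paste -sd, -)
        printf '%s => %s\n' "$term" "$files"
    done < "$list"
}
```

Keep the list file outside the searched directory, otherwise every term will also match the list itself.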
I think you meant to pass the command as:
grep -rl -f list.txt C:\dirofcsv >> print.txt
Give it a shot. It should take all patterns from list.txt line by line and search in the directory C:\dirofcsv for files with matching patterns and print their names to print.txt file.
Try this for printing without a loop (just like you asked in comments ;-)
One Line Answer
eval $(jq -Rsr 'split("\n") | map(select(length > 0)) | reduce .[] as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' --arg dir "$dir" < "$listfile")
Another solution, for explanation say:
unset li
readarray li < "$listfile"
quoted_commands="$(jq -R 'reduce inputs as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' \
--arg dir "$dir" \
<<< "$(echo; printf "%s" "${li[@]}")")"
commands=$(sed -e 's/^"//' -e 's/"$//' <<< "$quoted_commands")
eval "$commands"
Breaking down the command, with the explanation in comments:
# read contents of listfile in li
unset li && readarray li < "$listfile"
# add the content to new list so that it prints the list elements in new-lines
# also add a newline at top as it will be discarded by jq (in this case only)
list="$(echo; printf "%s" "${li[@]}";)"
# pass jq command
quoted_commands="$(jq -R 'reduce inputs as $line
([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"])
| (join("; "))' \
--arg dir $dir <<< "$list")"
# the elements are read with reduce filter and converted to JSON Array of corresponding commands to execute
# the commands for all elements of list are joined with join filter
# trim quotes to execute commands properly
commands=$(sed -e 's/^"//' -e 's/"$//' <<< "$quoted_commands")
# run commands
eval "$commands"
You may want to print the above variables. Take care to use quotes in echo/printf while doing so, i.e., echo "$variable".
Replacement of sed command:
echo "$commands"
I am now using the following implementations (though the dictionary implementation uses a for loop, the key : value implementation doesn't, and is a single line command):
# print an Associative bash array as a JSON dictionary
function print_dict()
{
    declare -n ref=$1
    for k in $(echo "${!ref[@]}")
    do
        printf '{"name":"%s", "value":"%s"}\n' "$k" "${ref[$k]}"
    done | jq -s 'reduce .[] as $i ({}; .[$i.name] = $i.value)'
}
# print the grep output in key : value format
function list_grep()
{
    local listfile=$1
    local dir=$2
    eval $(jq -Rsr 'split("\n") | map(select(length > 0)) | reduce .[] as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' --arg dir "$dir" < "$listfile")
}
# print the grep output as JSON dictionary
function dict_grep()
{
    local listfile=$1
    local dir=$2
    eval declare -A Arr=\($(eval echo $(jq -Rrs 'split("\n") | map(select(length > 0)) | reduce .[] as $k ([]; . + ["[\($k)]=\\\"$(grep -rl \($k) \($dir))\\\""]) | (join(" "))' --arg dir "$dir" < "$listfile"))\)
    print_dict Arr
}
# call:
list_grep $listfile $dir
dict_grep $listfile $dir

Extract a line from a text file using grep?

I have a text file called log.txt, which logs each file name and the path it came from, so something like this:
Basically, the file name and its previous location. I want to use grep to grab the file's directory, save it in a variable, and move the file back to its original location.
for var in "$@"
do
    if grep "$var" log.txt
    then
        : # code if found
    else
        : # code if not found
    fi
done
This just prints 2.txt and its directory to the console, since the directory path also has 2.txt in it.
Maybe flip the logic to make it more efficient?
while read prev
do case "$prev" in
   */*) f="${prev##*/}"; continue;; # remember the name
   *) [[ -e "$f" ]] && mv "$f" "$prev";;
   esac
done < log.txt
That walks through all the files in the log and if they exist locally, move them back. Should be functionally the same without a grep per file.
If the name is always the same then why save it in the log at all?
If it is, then
while read prev
do f="${prev##*/}" # strip the path info
[[ -e "$f" ]] && mv "$f" "$prev"
done < <( grep / log.txt )
Having the file names on the same line would significantly simplify your script. But maybe try something like
# Convert from command-line arguments to lines
printf '%s\n' "$@" |
# Pair up with entries in file
awk 'NR==FNR { f[$0]; next }
FNR%2 { if ($0 in f) p=$0; else p=""; next }
p { print "mv \"" p "\" \"" $0 "\"" }' - log.txt |
sh
Test it by replacing sh with cat and see what you get. If it looks correct, switch back.
Briefly, something similar could perhaps be pulled off with printf '%s\n' "$@" | grep -A 1 -Fxf - log.txt but you end up having to parse the output to pair up the output lines anyway.
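The pairing can also be sketched directly in Bash by reading the log two lines at a time. This assumes the layout the awk answer above relies on, with the file name on one line and its original path on the next. The function name restore_from_log is hypothetical, and it prints the mv commands as a dry run instead of running them.

```shell
# Sketch only: restore_from_log is an illustrative name; the name-line/path-line layout is assumed.
# Usage: restore_from_log LOGFILE NAME...
restore_from_log() {
    local log=$1; shift
    local name path want
    while IFS= read -r name && IFS= read -r path; do
        for want in "$@"; do
            if [[ $name == "$want" ]]; then     # exact match on the name line only
                echo mv -- "$name" "$path"
            fi
        done
    done < "$log"
}
```

Because only the name line is compared, a directory path that happens to contain 2.txt no longer produces a false match.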
Another solution:
for f in `grep -v "/" log.txt`; do
    grep "/$f" log.txt | xargs -I{} cp $f {}
done
grep -q (for "quiet") stops the output
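As a small illustration of that point, the test from the question could be written so that it prints nothing and only matches whole lines. The helper name in_log is hypothetical. -x (whole-line match) and -F (fixed string) are added here as an assumption about what the asker wants, since they stop a name like 2.txt from matching a path that merely contains it.

```shell
# Sketch only: in_log is an illustrative name.
# Succeeds when NAME appears as a complete line in LOGFILE, printing nothing.
in_log() {
    grep -qxF -- "$1" "$2"
}
```

It would slot into the loop as if in_log "$var" log.txt; then ... fi.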
