Editing line with sed in for loop from other file - bash

I'm new to bash.
I'm trying to edit a line with sed in a for loop, taking the replacement values from another file.
Can you tell me what I'm doing wrong in my small script?
Am I missing another loop?
#!/bin/bash
# read the file and take the line needed:
for j in `cat /tmp/check.txt`; do
# replace the old value with the new value:
sed -i "s+/tmp/old_name/+/${j}/+gi" file_destantion.txt$$
#giving numbers to the logs for checking
Num=j +1
# move the changed file to a numbered .log (to see that it changed):
mv file_destantion.txt$$ file_destantion.txt$$.log$Num
# create a new source file for the next value from /tmp/check:
cp -rp file_destantion.txt file_destantion.txt$$
done
/tmp/check contains the values I want to substitute, one per loop iteration.
Contents of /tmp/check:
/tmp/check70
/tmp/check70_1
/tmp/_check7007
In the end, this is what I want it to look like:
.log1 > will contain /tmp/check70
.log2 > will contain /tmp/check70_1
.log3 > will contain /tmp/_check7007

I found this solution, which worked for me:
#!/bin/bash
count=0
grep -v '^ *#' < /tmp/check | while IFS= read -r line ;do
cp -rp file_destantion.txt file_destantion.txt$$
sed -i "s+/tmp/old_name/+${line}/+gi" file_destantion.txt$$
(( count++ ))
mv file_destantion.txt$$ "file_destantion.txt$$.log${count}"
cp -rp file_destantion.txt file_destantion.txt$$
done
Thank you very much @Cyrus for your guidance.
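For anyone trying this later, here is a self-contained sketch of the accepted pattern with throwaway files (all names here are illustrative; assumes GNU sed for -i without a suffix):

```shell
#!/usr/bin/env bash
# Demo of the numbered-log pattern: one .logN file per value read
# from the check file, each produced from a fresh copy of the source.
set -euo pipefail
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' /tmp/check70 /tmp/check70_1 /tmp/_check7007 > check
printf 'path=/tmp/old_name/data\n' > source.txt
count=0
while IFS= read -r line; do
  cp -p source.txt work.txt                      # fresh copy of the source each turn
  sed -i "s+/tmp/old_name/+${line}/+g" work.txt  # GNU sed; BSD/macOS needs -i ''
  (( ++count ))                                  # pre-increment: never evaluates to 0
  mv work.txt "source.txt.log${count}"
done < check
```

The pre-increment matters under `set -e`: `(( count++ ))` with `count=0` evaluates to 0 and would abort the script.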


How to add lines at the beginning of either empty or not file?

I want to add lines at the beginning of a file. It works with:
sed -i '1s/^/#INFO\tFORMAT\tunknown\n/' file
sed -i '1s/^/##phasing=none\n/' file
However it doesn't work when my file is empty. I found these commands:
echo > file && sed '1s/^/#INFO\tFORMAT\tunknown\n/' -i file
echo > file && sed '1s/^/##phasing=none\n/' -i file
but the last one erases the first one (and also when the file isn't empty).
I would like to know how to add lines at the beginning of a file, whether the file is empty or not.
I tried a loop with if [ -s file ], but without success.
Thanks!
You can use the insert command (i).
if [ -s file ]; then
sed -i '1i\
#INFO\tFORMAT\tunknown\
##phasing=none' file
else
printf '#INFO\tFORMAT\tunknown\n##phasing=none' > file
fi
Note that \t for tab is not POSIX, and does not work on all sed implementations (eg BSD/Apple, -i works differently there too). You can use a raw tab instead, or a variable: tab=$(printf '\t').
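A sketch combining both branches with a literal tab captured in a variable, so it does not depend on sed's \t at all (the file name is made up):

```shell
#!/bin/sh
# Portable header insertion: works whether the file is empty or not,
# using a literal tab in a variable instead of sed's non-POSIX \t.
tab=$(printf '\t')
header="#INFO${tab}FORMAT${tab}unknown
##phasing=none"
file=demo.txt                    # illustrative name
printf 'body line\n' > "$file"   # try it on a non-empty file
if [ -s "$file" ]; then
  { printf '%s\n' "$header"; cat "$file"; } > "$file.new" && mv "$file.new" "$file"
else
  printf '%s\n' "$header" > "$file"
fi
```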
You should use the i command in sed:
file='inputFile'
# insert a line break if file is empty
[[ ! -s $file ]] && echo > "$file"
sed -i.bak $'1i\
#INFO\tFORMAT\tunknown
' "$file"
Or you can ditch sed and do it in the shell using printf:
{ printf '#INFO\tFORMAT\tunknown\n'; cat file; } > file.new &&
mv file.new file
With plain bash and shell utilities:
#!/bin/bash
header=(
$'#INFO\tFORMAT\tunknown'
$'##phasing=none'
)
mv file file.bak &&
{ printf '%s\n' "${header[@]}"; cat file.bak; } > file &&
rm file.bak
Explicitly creating a new file, then moving it:
#!/bin/bash
echo -e '#INFO\tFORMAT\tunknown' | cat - file > file.new
mv file.new file
or slurping the whole content of the file into memory:
#!/bin/bash
printf '#INFO\tFORMAT\tunknown\n%s' "$(<file)" > file
It is trivial with ed if available/acceptable.
printf '%s\n' '0a' $'#INFO\tFORMAT\tunknown' $'##phasing=none' . ,p w | ed -s file
It even creates the file if it does not exist.

Search file of directories and find file names, save to new file - bash

I'm trying to find the paths for some fastq.gz files in a mess of a system.
I have some folder paths in a file called temp (subset):
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/
Let's assume 2 fastq.gz files are found in each directory in temp except for /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/.
I want to find the fastq.gz files and print them (if found) next to the directory I'm searching in.
Ideal output:
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/NG167_S19_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/ found /temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/ not_found
I'm part of the way there:
wc -l temp
while read -r line; do cd $line; echo ${line} >> ~/tmp; find `pwd -P` -name "*fastq.gz" >> ~/tmp; done < temp
cd ~
less tmp
Current output:
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/NG167_S19_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG167/NG167_S19_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG178/NG178_S1_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG213/NG213_S20_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG230/NG230_S23_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG234/NG234_S18_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG250/NG250_S2_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG251/NG251_S3_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG257/NG257_S4_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R1_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R2_001.fastq.gz
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG265/temp/
My code places the directory searched for first, then any matching files on subsequent lines. I'm not sure how to get the output I desire...
Any help, gratefully received!
Thanks,
Not your original script, but this version does not run cd and find for each directory; it walks the whole directory tree just once, and the parsing is done inside the while read loop.
#!/usr/bin/env bash
mapfile -t to_search < temp.txt
while IFS= read -rd '' files; do
if [[ $files == *.fastq.gz ]]; then
printf '%s found %s\n' "${files%/*}/" "$files"
else
printf '%s not_found!\n' "$files" >&2
fi
done < <(find "${to_search[@]%/*.fastq.gz*}" -print0) | column -t
This is how I would rewrite your script, using cd in a subshell:
#!/usr/bin/env bash
while read -r line; do
if [[ -d "$line" ]]; then
(
cd "$line" || exit
varname=$(find "$(pwd -P)" -name '*fastq.gz')
if [[ -n $varname ]]; then
printf '%s found %s\n' "$line" "$line${varname#*./}"
else
printf '%s not_found!\n' "$line"
fi
)
fi
done < temp.txt | column -t
Given a line -
/temp/CC49/DATA/Gh7d/NYSTAG_TSO_Mar16/NG263/NG263_S22_R2_001.fastq.gz
you can get what you want for the found lines quite easily with sed - just feed the lines to it.
... | sed -e 's#^\(.*/\)\([^/]*\)$#\1 found \1\2#'
However, that doesn't eliminate the line before.
To do that you either use something like awk (and write a simple state machine), or something like this in sed (general idea here: https://stackoverflow.com/a/25203093):
... | sed -e '\#/$#{$!N;\#\n.*gz$#!P;D}'
(note that a custom address delimiter needs a leading backslash, as in \#...#, and matching \n in the pattern space this way is GNU sed behaviour; it may still fail on macOS/BSD sed).
So then you'd be left with the .gz lines already converted, and the lines ending in / where you can also use sed to then append the "not found".
... | sed -e 's#/$#/ not found#'
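If the sed gymnastics get hairy, a plain loop with a found flag also produces the exact format asked for. A self-contained sketch (the directory names and demo data are invented):

```shell
#!/usr/bin/env bash
# Alternative: one find per listed directory; a flag decides between
# "found" lines and a single "not_found" line per directory.
set -u
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p NG167 NG265/temp
touch NG167/NG167_S19_R1_001.fastq.gz NG167/NG167_S19_R2_001.fastq.gz
printf '%s\n' "$tmp/NG167/" "$tmp/NG265/temp/" > temp.txt
while IFS= read -r dir; do
  found=0
  while IFS= read -r f; do
    printf '%s found %s\n' "$dir" "$f"
    found=1
  done < <(find "$dir" -name '*.fastq.gz')
  if [ "$found" -eq 0 ]; then
    printf '%s not_found\n' "$dir"
  fi
done < temp.txt > out.txt
```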

Replacing the duplicate uuids across multiple files

I am trying to replace the duplicate UUIDs from multiple files in a directory. Even the same file can have duplicate UUIDs.
I am using Unix utilities to solve this.
Till now I have used grep, cut, sort and uniq to find all the duplicate UUIDs across the folder and store it in a file (say duplicate_uuids)
Then I tried sed to replace the UUIDs by looping through the file.
filename="$1"
re="*.java"
while read line; do
uuid=$(uuidgen)
sed -i'.original' -e "s/$line/$uuid/g" *.java
done < "$filename"
As you might expect, I ended up replacing all occurrences of each duplicate UUID with the same new UUID, so it is still duplicated throughout the files!
Is there any sed trick that can work for me?
There are a bunch of ways this can be done. A multi-command approach using a function might give you greater flexibility if you want to customize things later, for example:
#!/bin/bash
checkdupes() {
  files="$*"
  for f in $files; do
    filename="$f"
    printf "Searching File: %s\n" "$filename"
    while read -r line; do
      arr=( $(grep -n "${line}" "$filename" | awk 'BEGIN { FS = ":" } ; {print $1" "}') )
      for i in "${arr[@]:1}"; do
        sed -i '' ''"${i}"'s/'"${line}"'/'"$(uuidgen)"'/g' "$filename"
        printf "Replaced UUID [%s] at line %s, first found on line %s\n" "${line}" "${i}" "${arr[0]}"
      done
    done < <( sort "$filename" | uniq -d )
  done
}
checkdupes /path/to/*.java
So what this series of commands does is to first sort the duplicates (if any) in whatever file you choose. It takes those duplicates and uses grep and awk to create an array of the line numbers at which each duplicate is found. Looping through the array (while skipping the first value) allows each duplicate to be replaced with a new UUID, re-saving the file each time.
Using a duplicate list file:
If you want to use a file with a list of dupes to search other files and replace the UUID in each of them that match it's just a matter of changing two lines:
Replace:
for i in "${arr[@]:1}"; do
With:
for i in "${arr[@]}"; do
Replace:
done < <( sort "$filename" | uniq -d )
With:
done< <( cat /path/to/dupes_list )
NOTE: If you don't want to overwrite the files, remove the -i '' at the beginning of the sed command.
This worked for me:
#!/bin/bash
duplicate_uuid=$1
# store file names in array
find . -name "*.java" > file_names
IFS=$'\n' read -d '' -r -a file_list < file_names
# store file duplicate uuids from file to array
IFS=$'\n' read -d '' -r -a dup_uuids < $duplicate_uuid
# loop through all files
for file in "${file_list[#]}"
do
echo "$file"
# Loop through all repeated uuids
for old_uuid in "${dup_uuids[#]}"
do
START=1
# Get the number of times uuid present in this file
END=$(grep -c "$old_uuid" "$file")
if (( $END > 0 )) ; then
echo " Replacing $old_uuid"
fi
# Loop through them one by one and change the uuid
for (( c=$START; c<=$END; c++ ))
do
uuid=$(uuidgen)
echo " [$c of $END] with $uuid"
sed -i '.original' -e "1,/$old_uuid/s/$old_uuid/$uuid/" "$file"
done
done
rm $file.original
done
rm file_names

Remove lines partially matching other lines in a file

I have the following lines in input.txt:
client_citic_plat_fix44;CITICHK;interbridge_ulnet_se_eqx
client_citic_plat_fix44;CITICHK;interbridge_ulnet_se_eqx;CITICHK;interbridge_hk_eqx
client_dkp_crd;DELIVERTOCOMPID;DESTINATION
client_dkp_crd;NORD;interbridge_fr
client_dkp_crd;NORD;interbridge_fr;broker_nordea_2
client_dkp_crd;AVIA;interbridge_fr
client_dkp_crd;AVIA;interbridge_fr;interbridge_ld
client_dkp_crd;SEBAP;interbridge_fr
client_dkp_crd;SEBAP;interbridge_fr;broker_seb_ss_thl
client_epf_crd;DELIVERTOCOMPID;DESTINATION
I need a bash (awk/sed) script to remove the lines that are partially similar to others. Desired output:
client_citic_plat_fix44;CITICHK;interbridge_ulnet_se_eqx;CITICHK;interbridge_hk_eqx
client_dkp_crd;DELIVERTOCOMPID;DESTINATION
client_dkp_crd;NORD;interbridge_fr;broker_nordea_2
client_dkp_crd;AVIA;interbridge_fr;interbridge_ld
client_dkp_crd;SEBAP;interbridge_fr;broker_seb_ss_thl
client_epf_crd;DELIVERTOCOMPID;DESTINATION
Columns 1, 2 and 3 are always the same between the two compared lines, and I always want to remove the shorter of the two.
Thanks!
Here's a solution using grep and sed:
#!/bin/bash
file="filepath"
while IFS= read -r line;do
(($(grep -c "$line" "$file")>1)) && sed -i "/^$line$/d" "$file"
done <"$file"
Note: This will replace your file.
To not replace your file and to put the output to another file, you can do this:
#!/bin/bash
infile="infilepath"
outfile="outfilepath"
cp "$infile" "$outfile"
while IFS= read -r line;do
(($(grep -c "$line" "$infile")>1)) && sed -i "/^$line$/d" "$outfile"
done <"$infile"
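Since the shorter line is always a prefix of the longer one, sorting also works: after sort, each partial line sits directly before its extension, so awk can drop it. A sketch with a subset of the data (note the output comes out sorted, not in the original order):

```shell
#!/usr/bin/env bash
# After sort, a line that is a prefix (plus ';') of the next line is the
# shorter "partial" line and gets dropped; everything else is printed.
tmp=$(mktemp -d) && cd "$tmp"
cat > input.txt <<'EOF'
client_dkp_crd;NORD;interbridge_fr
client_dkp_crd;NORD;interbridge_fr;broker_nordea_2
client_epf_crd;DELIVERTOCOMPID;DESTINATION
EOF
sort input.txt | awk '
  NR > 1 && index($0, prev ";") != 1 { print prev }
  { prev = $0 }
  END { if (NR) print prev }
' > out.txt
```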

Bash: iterate over several files and perform a different change for each file

I want to change a variable in several txt files, but the variable shouldn't get the same value in every file.
I tried this, but it doesn't work:
for file in ./*.txt ;do
for value in $(seq 1 5); do
sed -i 's/x_max=.*/x_max='$value'/ ' $file
done
done
So every x_max ends up with the value 5.
This should do the trick. Replace each file only once, with a different value each time.
value=1
for file in *.txt; do
sed -i "s/x_max=.*/x_max=${value}/" "$file"
value=$((value + 1))
done
That should do it - only iterate once and raise the counter by 1 after each run:
counter=1
for file in ./*.txt ;do
sed -i "s/x_max=.*/x_max=${counter}/" "$file"
(( counter++ ))
done
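Worth noting: which value each file gets depends on glob expansion order, which is alphabetical. A quick self-contained check of the counter pattern (assumes GNU sed for -i):

```shell
#!/usr/bin/env bash
# The counter approach in miniature: a.txt gets 1, b.txt gets 2,
# because ./*.txt expands alphabetically.
tmp=$(mktemp -d) && cd "$tmp"
printf 'x_max=0\n' > a.txt
printf 'x_max=0\n' > b.txt
counter=1
for file in ./*.txt; do
  sed -i "s/x_max=.*/x_max=${counter}/" "$file"   # GNU sed; use -i '' on BSD/macOS
  (( counter++ ))
done
```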
