Problems with shell scripting using sed - shell

A friend and I are working on a project in which we have to create a script that goes through a file and replaces all occurrences of a certain expression, word, or letter with another using sed. The script is run against multiple tests, and we don't know in advance what the patterns will be, so we have to anticipate anything. We are having trouble with one test where we need to replace 'l*' with 'L' in several files using a loop. The code that I have is:
#!/bin/sh
p1="$1"
shift
p2="$1"
shift
for file in "$#" #for any file in the directory
do
# A="$1"
#echo $A
#B="$2"
echo "$p1" | sed -e 's/\([*.[^$]\)/\\\1/g' > temporary #treat all special characters as plain text
A="`cat 'temporary'`"
rm temporary
echo "$p1"
echo "$file"
sed "s/$p1/$p2/g" "$file" > myFile.txt.updated #replace occurances
mv myFile.txt.updated "$file"
cat "$file"
done
I have tried testing this on practice files that contain different words as well as 'l*', but whenever I run it, it deletes all the text in the file. Can someone help me with this? We would like to get it done soon. Thanks.

It looks like you are trying to set A to a version of p1 with all special characters escaped. But you use p1 later instead of A. Try using the variable A, and also try setting it without a temporary file:
A=$( echo "$p1" | sed -e 's/\([*.[^$]\)/\\\1/g' )
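Putting that advice together, a minimal sketch of the whole script might look like the following. The function names are hypothetical, and the sketch assumes the pattern and replacement contain no newlines. It also iterates over "$@" (the remaining arguments) rather than "$#" (which is only the argument count), and escapes the characters that are special on the replacement side as well.

```shell
#!/bin/sh
# Hypothetical helper names; a sketch, not a drop-in solution.

# Escape characters that are special in a basic regular expression,
# plus the / used as the s command delimiter below.
escape_pattern() {
    printf '%s\n' "$1" | sed 's|[][\\/.*^$]|\\&|g'
}

# Escape characters that are special in the replacement part: \ / &
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# replace_in_files PATTERN REPLACEMENT FILE...
replace_in_files() {
    pattern=$(escape_pattern "$1")
    replacement=$(escape_replacement "$2")
    shift 2
    for file in "$@"; do
        # Write to a temporary file first, then move it over the original.
        sed "s/$pattern/$replacement/g" "$file" > "$file.updated" &&
            mv "$file.updated" "$file"
    done
}

# Example call (file names are placeholders):
# replace_in_files 'l*' 'L' notes.txt other.txt
```

With the pattern escaped, 'l*' is matched as the literal two characters l and * instead of "zero or more l".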

Related

Adding test_ in front of a file name with path

I have a list of files stored in a text file, and if a Python file is found in that list, I want to run the corresponding test file using pytest.
My file looks like this:
/folder1/file1.txt
/folder1/file2.jpg
/folder1/file3.md
/folder1/file4.py
/folder1/folder2/file5.py
When the 4th and 5th files are found, I want to run pytest like this:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
Currently, I am using this command:
cat /workspace/filelist.txt | while read line; do if [[ $$line == *.py ]]; then exec "pytest test_$${line}"; fi; done;
which is not working correctly, because the lines in the text file include the file path as well. Any idea how to implement this?
Use Bash's variable substring removal to add the test_ prefix. As a one-liner:
$ while read line; do if [[ $line == *.py ]]; then echo "pytest ${line%/*}/test_${line##*/}"; fi; done < file
In more readable form:
while read line
do
if [[ $line == *.py ]]
then
echo "pytest ${line%/*}/test_${line##*/}"
fi
done < file
Output:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
I don't know anything about Google Cloud Build, so I'll let you experiment with the double dollar signs.
Update:
In case there are files already with test_ prefix, use this bash script that utilizes extglob in variable substring removal:
shopt -s extglob # notice
while read line
do
if [[ $line == *.py ]]
then
echo "pytest ${line%/*}/test_${line##*/?(test_)}" # notice
fi
done < file
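As a sketch of how the two variants above could be combined without extglob, the hypothetical helper test_path below splits the path with plain parameter expansion, leaves an existing test_ prefix alone, and also copes with entries that have no directory part. The sample list is inlined with a here-document and echo only prints the commands, so nothing is executed.

```shell
#!/usr/bin/env bash
# test_path is a hypothetical helper name, not part of pytest or bash.
test_path() {
    local path=$1 dir='' base
    base=${path##*/}                         # file name without directories
    [[ $path == */* ]] && dir=${path%/*}/    # directory part, kept only if present
    [[ $base == test_* ]] || base=test_$base # add the prefix only when missing
    printf '%s%s\n' "$dir" "$base"
}

# Dry run on sample lines from the question.
while IFS= read -r line; do
    [[ $line == *.py ]] && echo pytest "$(test_path "$line")"
done <<'EOF'
/folder1/file1.txt
/folder1/file4.py
/folder1/folder2/file5.py
EOF
```

Using IFS= read -r keeps leading whitespace and backslashes in the list entries intact.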
You can easily refactor all your conditions into a simple sed script. This also gets rid of the useless cat and the similarly useless exec.
sed -n 's%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The regular expression matches anything after the last slash, which means the entire line if there is no slash; we include the .py suffix to make sure this only matches those files.
The pipe to xargs is a common way to convert standard input into command-line arguments. The -n 1 says to pass one argument at a time, rather than as many as possible. (Maybe pytest allows you to specify many tests; then, you can take out the -n 1 and let xargs pass in as many as it can fit.)
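To check what the pipeline would run before letting it loose, one option is to put echo in front of pytest. The sketch below assumes a throwaway list file created with mktemp in place of /workspace/filelist.txt, and preview_tests is a hypothetical helper name.

```shell
#!/bin/sh
# preview_tests LISTFILE: print the pytest commands instead of running them.
preview_tests() {
    sed -n 's%[^/]*\.py$%test_&%p' "$1" | xargs -n 1 echo pytest
}

# Temporary stand-in for the real list, using the sample from the question.
list=$(mktemp)
printf '%s\n' /folder1/file1.txt /folder1/file2.jpg /folder1/file3.md \
    /folder1/file4.py /folder1/folder2/file5.py > "$list"

preview_tests "$list"
rm -f "$list"
```

Once the printed commands look right, dropping the echo runs them for real.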
If you want to avoid adding the test_ prefix to files which already have it, one solution is to break up the sed script into two separate actions:
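sed -n 's%test_[^/]*\.py$%&%p;t;s%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The first substitution replaces the match with itself, so its p simply prints the matching lines verbatim; the t says if that substitution succeeded, skip the rest of the script for this input. (A plain /regex/p address would not be enough here, because t only reacts to a successful s command.)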
(MacOS / BSD sed will want a newline instead of a semicolon after the t command.)
sed is arguably a bit of a write-only language; this is already pushing toward the boundary where you would perhaps rewrite it in Awk instead.
You may want to focus on lines that end with the ".py" string.
You can achieve that using grep with a regex that checks whether a line ends with .py, which eliminates the if statement.
IFS=$'\n'
for file in $(cat /workspace/filelist.txt|grep '\.py$');do pytest $file;done

Replacing 1 character with sed

I am trying to process a change of a specific character with regex using sed.
Essentially I am running a bash script that is renaming files that have a specific string and I need to keep this string mostly constant. Here is an example file name:
_FILE20210714.023.jpg
So I am trying to create a variable nfile that is used for the mv command and will convert it to the following:
_FILE20210714.123.jpg
Keep in mind that I only want to change the last 0 to a 1.
I came up with the following regex to grab that specific character, but I'm lost on how to substitute with sed:
_FILE\d{8}\.\K0
nfile=$(echo ${file}| sed -e 's/_FILE\d{8}\.\K0/_FILE\d{8}\.\K1/')
When I then echo the nfile variable, I get the original name, and I'm not sure how to resolve this.
echo ${file}
echo ${nfile}
/home/user/_FILE20210714.023.jpg
/home/user/_FILE20210714.023.jpg
So essentially, once I can change 023 to 123 I'm set. The only problem is that I have multiple files that end in something like .034.jpg, so I can't match the string directly.
sed doesn't support the \d escape sequence, you need to use [0-9].
Unless you use the -E option, you have to escape {} quantifiers.
sed doesn't support \K, but I don't think it's needed here.
You need to use a capture group to copy the digits from the original name to the replacement.
nfile=$(echo "${file}"| sed -E -e 's/(_FILE[0-9]{8}\.)0/\11/')
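As a quick way to try that substitution on a couple of names without touching any files, here is a small sketch. bump_name is a hypothetical helper name, and the second path is a made-up example in the style of the .034.jpg files mentioned in the question.

```shell
#!/bin/sh
# bump_name PATH: print PATH with the 0 right after "_FILE<8 digits>." changed to 1.
bump_name() {
    # The capture group keeps the prefix; \1 followed by a literal 1 rebuilds the name.
    printf '%s\n' "$1" | sed -E 's/(_FILE[0-9]{8}\.)0/\11/'
}

bump_name /home/user/_FILE20210714.023.jpg
bump_name /home/user/_FILE20210714.034.jpg
```

Names that do not have a 0 in that position are printed unchanged, since the substitution simply does not match.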
For this particular case a simple parameter substitution should suffice:
for file in '_FILE20210714.023.jpg' '/home/user/_ACH20210714.023.jpg'
do
nfile="${file//.0/.1}"
echo "######################"
echo " file: ${file}"
echo "nfile: ${nfile}"
done
This generates:
######################
file: _FILE20210714.023.jpg
nfile: _FILE20210714.123.jpg
######################
file: /home/user/_ACH20210714.023.jpg
nfile: /home/user/_ACH20210714.123.jpg
If you have the perl rename on your system, you'd write
rename -v 's/\.0(\d+\.jpg)$/.1$1/' *.jpg
Since you tagged bash, here is a pure-bash version:
newname () {
local parts=() IFS="."
read -ra parts <<< "$1"
parts[1]="1${parts[1]#0}"
echo "${parts[*]}"
}
for file in *.jpg; do
mv -v "$file" "$(newname "$file")"
done

unix for loop modify the output filename

I am working in a directory with file names ending in fastq.gz. I will be running a tool using a loop like the following:
for i in `ls`; do if [[ "$i" == *".gz" ]]; then bwa aln ../hg38.fa $i > $i | sed 's/fastq.gz/sai/g'; fi; done
I want the output filename to end with .sai instead of fastq.gz while keeping the rest of the filename the same. However, because the shell sees $i right after >, it overwrites the input file itself. I also tried <($i | sed 's/fastq.gz/sai/g'), but that does not work either. What is the right way to write this?
You can use string replacements to compute the filename and the extension.
Moreover, you shouldn't rely on the ls output but loop directly over the expression you are looking for.
for file in *.fastq.gz; do
name="${file%.fastq.gz}" # strip the whole .fastq.gz suffix
file_output="${name}.sai"
bwa aln ../hg38.fa "${file}" > "${file_output}"
done
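To see which output names a loop like this would produce without running bwa, a dry run can help. The sketch below uses a hypothetical helper sai_name and made-up sample file names; it only prints the commands.

```shell
#!/bin/sh
# sai_name FILE: print FILE with a trailing .fastq.gz replaced by .sai.
sai_name() {
    printf '%s\n' "${1%.fastq.gz}.sai"
}

# Sample names are placeholders; echo prints the command instead of running it.
for f in sample1.fastq.gz sample2.fastq.gz; do
    echo "bwa aln ../hg38.fa $f > $(sai_name "$f")"
done
```

The redirection target has to be computed before the command runs: in something like cmd > "$i" | sed ..., the shell opens (and truncates) "$i" itself as the output file, and sed never gets a chance to change that name.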

automatically renaming files

I have a bunch of files (more than 1000) named like the following:
$ ls
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
org.allenai.ari.solvers.termselector.ExpandedLearner.lex
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lc
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lex
....
I have to rename these files by adding learners right before the capitalized name. For example,
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
would change to
org.allenai.ari.solvers.termselector.learners.BaselineLearnersurfaceForm.lex
and this one
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
would change to
org.allenai.ari.solvers.termselector.learners.ExpandedLearner.lc
Any ideas how to do this automatically?
for f in org.*; do
echo mv "$f" "$( sed 's/\.\([A-Z]\)/.learners.\1/' <<< "$f" )"
done
This short loop outputs an mv command that renames the files in the manner that you wanted. Run it as-is first, and when you are certain it's doing what you want, remove the echo and run again.
The sed bit in the middle takes a filename ($f, via a here-string, so this requires bash) and replaces the first occurrence of a capital letter after a dot with .learners. followed by that same capital letter.
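The substitution itself can be checked on a few names from the question before any file is renamed. This is a small sketch with a hypothetical helper add_learners; it pipes the name into sed with printf instead of a here-string, so it also works in plain sh, and it inserts learners as the question asks.

```shell
#!/bin/sh
# add_learners NAME: insert "learners." before the first dot-separated
# component that starts with a capital letter.
add_learners() {
    printf '%s\n' "$1" | sed 's/\.\([A-Z]\)/.learners.\1/'
}

add_learners org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lex
add_learners org.allenai.ari.solvers.termselector.ExpandedLearner.lc
```

Because the s command has no g flag, only the first match in each name is changed.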
There is a tool called perl-rename, sometimes rename. Not to be confused with rename from util-linux.
It's very good for tasks like this as it takes a perl expression and renames accordingly:
perl-rename 's/(?=\.[A-Z])/.learners/' *
You can play with the regex online
Alternatively, you can use a for loop and $BASH_REMATCH:
for file in *; do
[ -e "$file" ] || continue
[[ "$file" =~ ^([^A-Z]*)(.*)$ ]]
mv -- "$file" "${BASH_REMATCH[1]}learners.${BASH_REMATCH[2]}"
done
A very simple approach (useful if you only need to do this one time) is to list the files into a text file with ls >dummy, and then use find/replace in a text editor to turn each line into the form mv xxx.yyy xxx.learners.yyy. Then you can simply execute the resulting file with sh dummy (or ./dummy after making it executable).
The exact find/replace commands depend on the text editor you use, but something like this:
replace org. with mv org.. That gets you the mv at the beginning.
replace mv org.allenai.ari.solvers.termselector.$1 with mv org.allenai.ari.solvers.termselector.$1 org.allenai.ari.solvers.termselector.learners.$1 to duplicate the filename and insert learners.
There is also a syntax using for that can probably do it in one (long) line, but I cannot explain it; try help for if you want to learn about it.

bash script rename multiple files [duplicate]

Let's say I have a bunch of files named something like bsdsa120226.nai and bdeqa140223.nai, and I want to rename them to 120226.nai and 140223.nai. How can I achieve this using the script below?
#!/bin/bash
name1=`ls *nai*`
names=`ls *nai*| grep -Po '(?<=.{5}).+'`
for i in $name1
do
for y in $names
do
mv $i $y
done
done
Solution:
name1=`ls *nai*`
for i in $name1
do
y=$(echo "$i" | grep -Po '(?<=.{5}).+')
mv $i $y
done
This:
#!/bin/bash
shopt -s extglob nullglob
for file in *+([[:digit:]]).nai; do
echo mv -nv -- "$file" "${file##+([^[:digit:]])}"
done
Remove the echo if you're happy with the mv commands.
Note. This solution does not assume that there are 5 leading characters to delete. It will delete all the leading non-numeric characters.
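To illustrate that note, the sketch below applies the same extglob expansion to a few sample names with prefixes of different lengths. strip_prefix is a hypothetical helper name, and the third sample name is made up for the comparison.

```shell
#!/usr/bin/env bash
shopt -s extglob  # needed for the +( ... ) pattern

# strip_prefix NAME: remove the longest leading run of non-digit characters.
strip_prefix() {
    printf '%s\n' "${1##+([^[:digit:]])}"
}

for name in bsdsa120226.nai bdeqa140223.nai ab99.nai; do
    echo "$name -> $(strip_prefix "$name")"
done
```

A fixed-width approach such as removing exactly five characters would mangle a name like ab99.nai, which is the case this variant is meant to cover.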
Using only bash, you could do this:
for file in *nai* ; do
echo mv -- "$file" "${file:5}"
done
(Remove the echo when satisfied with the output.)
Avoid ls in scripts, except for displaying information. Use plain globbing instead.
See also How do I do string manipulations in bash? for more string manipulation techniques.
Your script can't work with that structure: if you have 5 files, it will call mv five times for the first file (once for each element in the second list), five times for the second, etc. You'd need to iterate over the two sets of names in lockstep. (It also doesn't deal with things like whitespace in filenames.)
You would be better off using rename (prename on some systems) since that allows you to use Perl regular expressions to do the renaming, along the lines of:
prename 's/^.{5}//' *.nai
The reason your script is not behaving is that, for every source file, you're attempting to rename it to every target file.
If you need to limit yourself to using that script, you need to work out the single target file for each source file, something like:
#!/bin/bash
for i in *.nai; do
y=$(echo "$i" | cut -c6-)
mv "$i" "$y"
done
If your system has the Perl rename tool, it's better to go with the simple rename command:
rename 's/^.{5}//' *.nai
It just removes the first 5 characters from the file name.
Or:
for i in *.nai; do mv "$i" "$(grep -oP '(?<=^.{5}).+' <<< "$i")"; done
