Replacing 1 character with sed - bash

I am trying to change a specific character with a regex using sed.
Essentially, I am running a bash script that renames files containing a specific string, and I need to keep this string mostly unchanged. Here is an example file name:
_FILE20210714.023.jpg
So I am trying to create a variable nfile that is used for the mv command and will convert it to the following:
_FILE20210714.123.jpg
Keep in mind that I only want to change the last 0 to a 1.
I came up with the following regex to grab that specific character, but I'm lost on how to substitute with sed:
_FILE\d{8}\.\K0
nfile=$(echo ${file}| sed -e 's/_FILE\d{8}\.\K0/_FILE\d{8}\.\K1/')
When I then echo the nfile variable, I get the original name, and I'm not sure how to resolve this.
echo ${file}
echo ${nfile}
/home/user/_FILE20210714.023.jpg
/home/user/_FILE20210714.023.jpg
So essentially, once I can substitute 023 with 123 I'm set. The only problem is that I have multiple files ending in something like .034.jpg, so I can't match a literal string.

sed doesn't support the \d escape sequence; you need to use [0-9].
Unless you use the -E option, you have to escape {} quantifiers.
sed doesn't support \K, but I don't think it's needed here.
You need to use a capture group to copy the digits from the original name to the replacement.
nfile=$(echo "${file}"| sed -E -e 's/(_FILE[0-9]{8}\.)0/\11/')
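To see the escaping rule above in practice, here is a small sketch that runs the same substitution twice on the path from the question: once as an extended regex with -E, and once as a basic regex without it, where the group and the {8} interval have to be escaped.

```shell
#!/bin/bash
file='/home/user/_FILE20210714.023.jpg'

# ERE (-E): parentheses and {8} are written bare.
ere=$(printf '%s\n' "$file" | sed -E 's/(_FILE[0-9]{8}\.)0/\11/')

# BRE (no -E): the same expression with \( \) and \{ \} escaped.
bre=$(printf '%s\n' "$file" | sed 's/\(_FILE[0-9]\{8\}\.\)0/\11/')

# In the replacement, \1 is the captured prefix and the next 1 is a literal digit:
# sed back-references only go from \1 to \9, so \11 is not "group eleven".
printf '%s\n' "$ere" "$bre"
```

Both lines print /home/user/_FILE20210714.123.jpg.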

For this particular case a simple parameter substitution should suffice:
for file in '_FILE20210714.023.jpg' '/home/user/_ACH20210714.023.jpg'
do
    nfile="${file//.0/.1}"
    echo "######################"
    echo " file: ${file}"
    echo "nfile: ${nfile}"
done
This generates:
######################
 file: _FILE20210714.023.jpg
nfile: _FILE20210714.123.jpg
######################
 file: /home/user/_ACH20210714.023.jpg
nfile: /home/user/_ACH20210714.123.jpg
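One caveat with ${file//.0/.1}: the doubled slash replaces every .0 in the string, including any that happen to appear in directory names. A small sketch of the difference; the directory /data/v.0 is a made-up example, and limiting the change to the file name is just one possible workaround.

```shell
#!/bin/bash
file='/data/v.0/_FILE20210714.023.jpg'   # hypothetical path with a ".0" directory

echo "${file//.0/.1}"   # every ".0" changes:   /data/v.1/_FILE20210714.123.jpg
echo "${file/.0/.1}"    # only the first ".0":  /data/v.1/_FILE20210714.023.jpg

# Apply the substitution to the file name only, then put the directory back.
base=${file##*/}
echo "${file%/*}/${base/.0/.1}"          #      /data/v.0/_FILE20210714.123.jpg
```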

If you have the Perl rename on your system, you'd write:
rename -v 's/\.0(\d+\.jpg)$/.1$1/' *.jpg
Since you tagged bash, here is a pure-bash version:
newname () {
    local parts=() IFS="."
    read -ra parts <<< "$1"
    parts[1]="1${parts[1]#0}"
    echo "${parts[*]}"
}
for file in *.jpg; do
    mv -v "$file" "$(newname "$file")"
done
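Both approaches above rename every *.jpg they see. A minimal sketch that touches only names shaped like _FILE<8 digits>.0<digits>.jpg (that shape is an assumption inferred from the single example in the question) and refuses to overwrite an existing file; new_name and rename_all are hypothetical helper names.

```shell
#!/bin/bash
# new_name PATH: print PATH with the 0 after _FILEYYYYMMDD. turned into 1,
# or PATH unchanged if it does not have the assumed shape.
new_name() {
  local re='^(.*_FILE[0-9]{8}\.)0([0-9]+\.jpg)$'
  if [[ $1 =~ $re ]]; then
    printf '%s1%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$1"
  fi
}

# rename_all: apply new_name to every *.jpg in the current directory.
rename_all() {
  local file target
  for file in *.jpg; do
    [[ -e $file ]] || continue              # no match: the glob stays literal
    target=$(new_name "$file")
    [[ $target == "$file" ]] && continue    # nothing to change for this name
    mv -n -v -- "$file" "$target"           # -n: never overwrite an existing file
  done
}
```

Run rename_all from the directory that holds the images; the regex-with-BASH_REMATCH check plays the same role as the sed capture group in the first answer.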


Adding test_ in front of a file name with path

I have a list of files stored in a text file, and if a Python file is found in that list, I want to run the corresponding test file using pytest.
My file looks like this:
/folder1/file1.txt
/folder1/file2.jpg
/folder1/file3.md
/folder1/file4.py
/folder1/folder2/file5.py
When the 4th and 5th files are found, I want to run pytest like this:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
Currently, I am using this command:
cat /workspace/filelist.txt | while read line; do if [[ $$line == *.py ]]; then exec "pytest test_$${line}"; fi; done;
which doesn't work correctly, because the text file contains the full path as well. Any idea how to implement this?
Use Bash's substring removal to insert the test_ prefix. As a one-liner:
$ while IFS= read -r line; do if [[ $line == *.py ]]; then echo "pytest ${line%/*}/test_${line##*/}"; fi; done < file
In more readable form:
while IFS= read -r line
do
    if [[ $line == *.py ]]
    then
        echo "pytest ${line%/*}/test_${line##*/}"
    fi
done < file
Output:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
I don't know anything about Google Cloud Build, so I'll let you experiment with the double dollar signs.
Update:
In case some files already have the test_ prefix, use this bash script, which uses an extglob pattern in the substring removal:
shopt -s extglob # notice
while IFS= read -r line
do
    if [[ $line == *.py ]]
    then
        echo "pytest ${line%/*}/test_${line##*/?(test_)}" # notice
    fi
done < file
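The same logic can be wrapped in a function. This is a sketch: test_path is a hypothetical name, it swaps the extglob pattern for a plain ${base#test_} removal, and it also handles a bare file name with no directory part, where ${line%/*} would return the whole line.

```shell
#!/bin/bash
# test_path PATH: print PATH with test_ prefixed to its file name, exactly once.
test_path() {
  local base=${1##*/}                 # file name without directories
  base=test_${base#test_}             # drop an existing test_, then add it back
  if [[ $1 == */* ]]; then
    printf '%s/%s\n' "${1%/*}" "$base"
  else
    printf '%s\n' "$base"
  fi
}

# Example use with the file list from the question (not run here):
# while IFS= read -r line; do
#   [[ $line == *.py ]] && pytest "$(test_path "$line")"
# done < /workspace/filelist.txt
```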
You can easily refactor all your conditions into a simple sed script. This also gets rid of the useless cat and the similarly useless exec.
sed -n 's%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The regular expression matches anything after the last slash, which means the entire line if there is no slash; we include the .py suffix to make sure this only matches those files.
The pipe to xargs is a common way to convert standard input into command-line arguments. The -n 1 says to pass one argument at a time, rather than as many as possible. (pytest accepts several test files in one invocation, so you can probably take out the -n 1 and let xargs pass in as many as it can fit.)
If you want to avoid adding the test_ prefix to files which already have it, one solution is to break up the sed script into two separate actions:
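sed -n -e '/\/test_[^/]*\.py$/{p;d;}' -e 's%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The first expression prints lines whose file name already starts with test_ verbatim, and d then discards them, so the substitution in the second expression never sees them. (A t command would not work here: t only branches after a successful s substitution, not after p, so already-prefixed lines would come out a second time as test_test_....)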
sed is arguably a bit of a read-only language; this is already pressing towards the boundary where perhaps you would rewrite this in Awk instead.
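Taking up that suggestion, here is a sketch of the same filter in Awk, wrapped in a shell function (to_test_paths is a hypothetical name). It prints one pytest path per .py line and adds test_ only when the file name does not already start with it.

```shell
#!/bin/sh
# to_test_paths [FILE...]: read paths (stdin if no FILE) and print test paths.
to_test_paths() {
  awk '
    /\.py$/ {
      n = split($0, part, "/")          # part[n] is the file name
      if (part[n] !~ /^test_/)
        part[n] = "test_" part[n]
      out = part[1]                     # empty for absolute paths
      for (i = 2; i <= n; i++)
        out = out "/" part[i]
      print out
    }' "$@"
}

# Usage with the list from the question:
# to_test_paths /workspace/filelist.txt | xargs -n 1 pytest
```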
You may want to focus on lines that end with the ".py" string.
You can do that with grep and a regex that checks whether a line ends with .py, which eliminates the if statement.
IFS=$'\n'
for file in $(grep '\.py$' /workspace/filelist.txt); do pytest "${file%/*}/test_${file##*/}"; done

Remove suffix as well as prefix from path in bash

I have filepaths of the form:
../healthy_data/F35_HC_532d.dat
I want to extract F35_HC_532d from this. I can remove the prefix and the suffix separately in bash like this:
for i in ../healthy_data/*; do echo ${i#../healthy_data/}; done # REMOVES PREFIX
for i in ../healthy_data/*; do echo ${i%.dat}; done # REMOVES SUFFIX
How can I combine these so that in a single command I would be able to remove both and extract only the part that I want?
You can use a Bash regex match for this and print capture group #1:
for file in ../healthy_data/*; do
    [[ $file =~ .*/([_[:alnum:]]+)\.dat$ ]] && echo "${BASH_REMATCH[1]}"
done
If you can use Awk, it is pretty simple:
for i in ../healthy_data/*
do
    stringNeeded=$(awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"$i")
    printf "%s\n" "$stringNeeded"
done
The -F/ splits the input string on the / character, so $NF is the last field, in this case F35_HC_532d.dat. The split() function then splits that field on . to extract the part before the dot.
The options/functions in the above Awk are POSIX compatible.
Also, bash does not support nested parameter expansions, so you need to do it in two steps, something like this:
tempString="${i#*/*/}"
echo "${tempString%.dat}"
In a single loop:
for i in ../healthy_data/*; do tempString="${i#*/*/}"; echo "${tempString%.dat}" ; done
Here "${i#*/*/}" stores F35_HC_532d.dat in the variable tempString, and "${tempString%.dat}" then removes the .dat suffix from it.
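The "${i#*/*/}" form relies on the path having exactly two leading components (../ and healthy_data/). A sketch that uses ## instead, so every directory component is removed at any depth; stem is a hypothetical helper name, and the suffix is passed as an argument.

```shell
#!/bin/sh
# stem PATH SUFFIX: strip all directories, then SUFFIX.
stem() {
  name=${1##*/}                 # longest */ prefix: drops every directory
  printf '%s\n' "${name%"$2"}"  # quoted so SUFFIX is matched literally
}

for i in ../healthy_data/*.dat; do
  [ -e "$i" ] || continue       # skip the literal pattern when nothing matches
  stem "$i" .dat
done
```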
If all files end with .dat (as you confirmed) you can use the basename command:
basename -s .dat /path/to/files/*
If there are many(!) of those files, use find to avoid an argument list too long error:
find /path/to/files -maxdepth 1 -name '*.dat' -exec basename -s .dat {} +
For a shell script that needs to deal with any number of .dat files, use the second command!
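The -s option is an extension (GNU and BSD basename have it) rather than part of POSIX. The portable form takes a single path plus the suffix as a second operand, so it needs a loop; a minimal sketch:

```shell
#!/bin/sh
# POSIX basename: basename STRING [SUFFIX], one path per call.
for f in ../healthy_data/*.dat; do
  [ -e "$f" ] || continue       # no .dat files: the glob stays literal
  basename "$f" .dat
done
```

This starts one basename process per file, so for large directories the -s form (or pure parameter expansion) is cheaper.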
Do you count this as one step?
for i in ../healthy_data/*; do
    sed 's#\.[^.]*##' <<< "${i##*/}"
done
You can't strip both a prefix and suffix in a single parameter expansion.
However, this can be accomplished in a single loop using parameter expansion operations only. Just save the prefix-stripped expansion to a variable and use expansion again to remove its suffix:
for file in ../healthy_data/*; do
    prefix_stripped="${file##*\/healthy_data\/}"
    echo "${prefix_stripped%.dat}"
done
If you are on zsh, one way to achieve this without defining another variable is:
for i in ../healthy_data/*; do echo "${${i#../healthy_data/}%.dat}"; done
This removes prefix and suffix in one step.
In your specific example, the prefix stems from the fact that the files are located in a different directory, so you can get rid of it by cd-ing into that directory:
(cd ../healthy_data ; for i in *; do echo ${i%.dat}; done)
The (parentheses) run the commands in a subshell, so your current shell stays where it is. If you don't want a subshell, you can easily cd back:
cd ../healthy_data ; for i in *; do echo ${i%.dat}; done; cd -

Create variable by combining text + another variable

Long story short, I'm trying to grep a value contained in the first column of a text file by using a variable.
Here's a sample of the script, with the grep command that doesn't work:
for ii in `cat list.txt`
do
grep '^$ii' >outfile.txt
done
Contents of list.txt :
123,"first product",description,20.456789
456,"second product",description,30.123456
789,"third product",description,40.123456
If I run grep '^123' list.txt, it produces the correct output: just the first line of list.txt.
If I try to use the variable (i.e. grep '^ii' list.txt) I get a "^ii command not found" error. I tried to combine text with the variable to get it to work:
VAR1= "'^"$ii"'"
but the VAR1 variable contained a carriage return after the $ii variable:
'^123
'
I've tried a laundry list of things to remove the CR/LF (i.e. with sed and awk), but to no avail. There has to be an easier way to perform the grep command using the variable. I would prefer to stay with the grep command because it works perfectly when I run it manually.
You have mixed things up in the command grep '^ii' list.txt. The character ^ anchors the match at the beginning of the line, while $ expands the value of a variable.
When you want to grep for 123 in the variable ii at the beginning of the line, use
ii="123"
grep "^$ii" list.txt
(You should use double quotes here)
This is a good moment to learn good habits: keep using lowercase variable names (well done) and use curly braces (they do no harm and are needed in other cases):
ii="123"
grep "^${ii}" list.txt
Now we are both forgetting something: our grep will also match
1234,"4-digit product",description,11.1111. Include a , in the grep:
ii="123"
grep "^${ii}," list.txt
And how did you get the "^ii command not found" error? I think you used backquotes (the old way of nesting a command; better is echo "example: $(date)") and wrote
grep `^ii` list.txt # wrong !
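Since the question tried to build the pattern in a variable first (VAR1), here is a minimal sketch of that step done directly. In VAR1= "'^"$ii"'" the space after = makes the shell run the quoted value as a command with VAR1 set to an empty string, and the inner single quotes would become literal characters of the pattern. The sample lines are taken from the question.

```shell
#!/bin/sh
ii=123
# No spaces around '=', and no extra single quotes inside the value.
pattern="^${ii},"
printf '%s\n' '123,"first product"' '1234,"4-digit product"' | grep -e "$pattern"
```

This prints only 123,"first product".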
#!/bin/sh
# Read every character before the first comma into the variable ii.
while IFS=, read -r ii rest; do
    # Echo the value of ii. If these values are what you want, you're done; no
    # need for grep.
    echo "ii = $ii"
    # If you want to find something associated with these values in another
    # file, however, you can grep the file for the values. Use double quotes so
    # that the value of $ii is substituted in the argument to grep.
    # Append with >> so each iteration doesn't overwrite the previous results.
    grep "^$ii" some_other_file.txt >>outfile.txt
done <list.txt
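A different technique, in case a regex is more than you need: compare the first field as a string with awk. This sketch (rows_for_id is a hypothetical name) also strips a trailing carriage return from each line, on the assumption that the stray CR described in the question comes from Windows-style line endings in the data file.

```shell
#!/bin/sh
# rows_for_id ID FILE: print lines of FILE whose first comma-separated field is exactly ID.
rows_for_id() {
  awk -F, -v id="$1" '
    { sub(/\r$/, "") }          # drop a trailing carriage return; fields are re-split
    ($1 "") == id               # string comparison, so 123 does not match 1234 or 0123
  ' "$2"
}

# Example: rows_for_id 123 list.txt
```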

convert a file path into string

I get an error when trying to replace a string in a directory path with another string:
sed: Error trying to read from {directory_path}: It's a directory
The shell script:
#!/bin/sh
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Procesando archivos desde $R2K_SOURCE "
for file in $(find $R2K_SOURCE )
do
if [ -d $file ]
then
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
echo "directorio $R2K_TEMP_DIR"
else
# some code executes
:
fi
done
# find $R2K_PROCCESED -type f -size -200c -delete
I understand that the error is in this line:
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
but I don't know how to tell sh to treat the $file variable as a string rather than as a directory.
If you want to replace part of a path name, you can echo the path name and pipe it to sed.
Also, you must put the sed command in double quotes instead of single quotes so that the shell expands the variables, and change the delimiter of the s command, like this:
R2K_TEMP_DIR=$(echo "$file" | sed "s:$R2K_SOURCE:$R2K_PROCESSED:g")
Then the slashes inside the variables no longer clash with the delimiter of the s command.
Update:
Even better is to remove the useless echo and use a here-string instead (here-strings are a bash feature, so the shebang must be #!/bin/bash rather than #!/bin/sh):
R2K_TEMP_DIR=$(sed "s:$R2K_SOURCE:$R2K_PROCESSED:g" <<< "$file")
First, don't use:
for item in $(find ...)
because the whole list has to be built in memory first. Besides, the for loop cannot start until the process in $(...) finishes. Instead:
find ... | while read item
You also need to watch out for funky file names. The for loop will choke on any file name containing spaces. The find | while loop copes better, but read still strips leading and trailing whitespace and treats backslashes specially unless you set IFS= and use -r, and it breaks on newlines in file names. Better (in bash, since read -d is not POSIX):
find ... -print0 | while IFS= read -r -d '' item
This puts null bytes between file names, and read splits on those nulls. This way, files with spaces, tabs, newlines, or anything else that could cause problems are read correctly.
Your sed line is:
R2K_TEMP_DIR=$( sed 's/"$R2K_SOURCE"/"$R2K_PROCESSED"/g' $file )
This asks sed to read $file as its input file, which fails because $file is a directory. What you want to do is edit the directory name itself, so you have to echo the name into sed through a pipe, with double quotes so that the variables expand and a delimiter other than /:
R2K_TEMP_DIR=$(echo "$file" | sed "s|$R2K_SOURCE|$R2K_PROCESSED|g")
However, you might be better off using shell parameter expansion to edit the variable directly.
Basically, you have a directory called source/ and all of the files you're looking for are under that directory. You simply want to change:
source/foo/bar
to
processed/foo/bar
You could use ${file#source/}. The # removes the shortest prefix that matches the glob pattern after it. See the Parameter Expansion section of the bash manpage.
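For reference, a small sketch of the four POSIX prefix/suffix removal operators (#, ##, %, %%) applied to a made-up path:

```shell
#!/bin/sh
file=source/foo/bar.tar.gz
echo "${file#*/}"    # shortest prefix matching */ : foo/bar.tar.gz
echo "${file##*/}"   # longest prefix matching */  : bar.tar.gz
echo "${file%.*}"    # shortest suffix matching .* : source/foo/bar.tar
echo "${file%%.*}"   # longest suffix matching .*  : source/foo/bar
```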
Thus, you could do something like this:
#!/bin/bash
R2K_SOURCE="source/"
R2K_PROCESSED="processed/"
R2K_TEMP_DIR=""
echo " Procesando archivos desde $R2K_SOURCE "
find "$R2K_SOURCE" -print0 | while IFS= read -r -d '' file
do
    if [ -d "$file" ]
    then
        R2K_TEMP_DIR="processed/${file#source/}"
        echo "directorio $R2K_TEMP_DIR"
    else
        # some code executes
        :
    fi
done
R2K_TEMP_DIR="processed/${file#source/}" removes the source/ from the start of $file and you merely prepend processed/ in its place.
Even better, it's much more efficient. In your original script (assuming you use loentar's solution), the $(...) creates another shell process to run echo, which then pipes into yet another process running sed. Here no subprocesses are spawned: the whole modification of the directory name happens inside the shell.
By the way, this should also work, using the variables instead of hard-coded strings:
R2K_TEMP_DIR="${R2K_PROCESSED}${file#"$R2K_SOURCE"}"
Since R2K_PROCESSED already ends in /, no extra slash is needed, and quoting $R2K_SOURCE inside the expansion makes it match literally rather than as a glob pattern.
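The same idea as a reusable function, as a sketch: map_dirs is a hypothetical name, it lets find do the directory test with -type d instead of [ -d ], and it takes both roots as arguments with any trailing slash removed, so no double slash appears in the result.

```shell
#!/bin/bash
# map_dirs SRC DST: print DST/<relative path> for every directory under SRC.
map_dirs() {
  local src=${1%/} dst=${2%/} dir
  find "$src" -type d -print0 |
    while IFS= read -r -d '' dir; do
      printf '%s\n' "$dst${dir#"$src"}"   # swap the SRC prefix for DST
    done
}

# Example: map_dirs source/ processed/
```

Because the names travel NUL-separated and read uses IFS= with -r, leading spaces and doubled spaces in directory names survive intact.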

Problems with shell scripting using Sed

A friend and I are working on a project, and we have to create a script that goes into a file and replaces all occurrences of a certain expression/word/letter with another using sed. It is designed to go through multiple tests replacing all these occurrences, and we don't know what they will be, so we have to anticipate anything. We are having trouble with a certain test where we need to replace 'l*' with 'L' in different files using a loop. The code that I have is:
#!/bin/sh
p1="$1"
shift
p2="$1"
shift
for file in "$#" #for any file in the directory
do
# A="$1"
#echo $A
#B="$2"
echo "$p1" | sed -e 's/\([*.[^$]\)/\\\1/g' > temporary #treat all special characters as plain text
A="`cat 'temporary'`"
rm temporary
echo "$p1"
echo "$file"
sed "s/$p1/$p2/g" "$file" > myFile.txt.updated #replace occurances
mv myFile.txt.updated "$file"
cat "$file"
done
I have tried testing this on practice files that contain different words and also 'l*', but whenever I test it, it deletes all the text in the file. Can someone help me with this? We would like to get it done soon. Thanks
It looks like you are trying to set A to a version of p1 with all special characters escaped, but you use p1 later instead of A. Try using the variable A, and also try setting it without a temporary file:
A=$( echo "$p1" | sed -e 's/\([*.[^$]\)/\\\1/g' )
Also note that "$#" expands to the number of arguments, not the arguments themselves, so the loop should be for file in "$@".
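Putting the fixes together, here is a sketch of the loop. It escapes both the search text and the replacement (the original escape list misses \ and /, and the replacement side also needs & escaped), loops over "$@", and uses the hypothetical helper names escape_pattern, escape_replacement and replace_literal. It assumes neither string contains a newline.

```shell
#!/bin/sh
# Escape BRE metacharacters so sed matches the text literally.
# | is used as the delimiter here so the bracket list can hold both \ and /.
escape_pattern() { printf '%s\n' "$1" | sed -e 's|[\/$*.^[]|\\&|g'; }

# Escape the characters that are special in a replacement: \ / and &.
escape_replacement() { printf '%s\n' "$1" | sed -e 's|[\/&]|\\&|g'; }

# replace_literal FIND REPL FILE...: replace every literal FIND with REPL in each FILE.
replace_literal() {
  pat=$(escape_pattern "$1")
  rep=$(escape_replacement "$2")
  shift 2
  for file in "$@"; do          # "$@", not "$#": $# is only the argument count
    sed "s/$pat/$rep/g" "$file" > "$file.updated" && mv "$file.updated" "$file"
  done
}

# Example: replace_literal 'l*' 'L' file1.txt file2.txt
```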
