Remove and replace characters from string with one command - bash

I need to remove characters from a string and then replace other characters.
This is the initial string:
something/filename.txt
I need to remove the directory "something" (can be any other name) and replace .txt with .gz
The following two commands work perfectly:
newfile=${newfile#*/}
newfile=${newfile::-4}.gz
So the output will be: filename.gz
Is there a way to do it in a single command? Something like:
${${$newfile#*/}::-4}.gz
With the above command I get a "bad substitution" error.
Thank you
Lucas

Perhaps you could use basename, i.e.
name_of_file="something/filename.txt"
newfile=$(basename "${name_of_file%%.*}".gz)
echo "$newfile"
filename.gz

Since your question is tagged bash, you can use Bash builtin regex to capture the group you need like this:
#!/usr/bin/env bash
filepath=something/filename.txt
# Capture the basename without its extension in a regex group, or exit if the path does not match
[[ $filepath =~ .*/(.*)\.[^.]* ]] || exit
# Compose new file name from Regex captured group and new .gz extension
newfilename=${BASH_REMATCH[1]}.gz
# debug dump variables
declare -p filepath newfilename

new_file=something/filename.txt
new_file="${new_file#*/}"
new_file="${new_file%.*}.gz"
Is there a way to do it in a single command?
echo something/filename.txt | sed 's|.*/||;s|\..*$|.gz|'
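If the goal is simply a single call site, the two parameter expansions can be wrapped in a small function. This is only a sketch: the name to_gz_name is made up for illustration, and it uses ##*/ (strip up to the last slash) rather than #*/ so that nested directories are handled too.

```bash
# to_gz_name is a hypothetical helper, not something from the answers above.
to_gz_name() {
    local name=${1##*/}             # drop everything up to the last slash
    printf '%s\n' "${name%.*}.gz"   # drop the last extension, append .gz
}

newfile=$(to_gz_name "something/filename.txt")
echo "$newfile"
```

A name without any dot is left as it is and simply gets .gz appended.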

A combination of cut and sed can help as below
oldfile='somethingelse/filename.txt'
newfile=$(echo "$oldfile" | cut -d "/" -f2 | sed 's!\.txt$!.gz!')
echo "$newfile"
This displays filename.gz
EDIT
In case there are subdirectories and you want only file name
oldfile='somethingelse/other/filename.txt'
newfile=$(echo "$oldfile" | rev | cut -d "/" -f1 | rev | sed 's!\.txt$!.gz!')
echo "$newfile"
Because the line is reversed first, cut -f1 picks what was originally the last field delimited by "/", and the second rev restores it.
Happy to be corrected and learn.

Related

Pipe the output of basename to string substitution

I need the basename of a file that is given as an argument to a bash script. The basename should be stripped of its file extension.
Let's assume $1 = "/somefolder/andanotherfolder/myfile.txt", the desired output would be "myfile".
The current attempt creates an intermediate variable that I would like to avoid:
BASE=$(basename "$1")
NOEXT="${BASE%.*}"
My attempt to make this a one-liner would be piping the output of basename. However, I do not know how to pipe stdout to a string substitution.
EDIT: this needs to work for multiple file extensions with possibly differing lengths, hence the string substitution attempt as given above.
Why not Zoidberg ?
Ehhmm.. I meant why not remove the ext before going for basename ?
basename "${1%.*}"
Unless of course you have directory paths with dots, then you'll have to use basename before and remove the extension later:
echo $(basename "$1") | awk 'BEGIN { FS = "." }; { print $1 }'
The awk solution will remove anything after the first dot from the filename.
There's a regular expression based solution which uses sed to remove only the extension after last dot if it exists:
echo $(basename "$1") | sed 's/\(.*\)\..*/\1/'
This could even be improved if you're sure that you've got alphanumeric extensions of 3-4 characters (eg: mp3, mpeg, jpg, txt, json...)
echo $(basename "$1") | sed 's/\(.*\)\.[[:alnum:]]\{3,4\}$/\1/'
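The difference between cutting at the first dot and cutting at the last dot shows up with names like archive.tar.gz. A small sketch using only parameter expansion (the sample path is made up):

```bash
path='/some.dir/archive.tar.gz'   # made-up sample path with a dot in the directory
base=${path##*/}                  # strip the directory part first
first_dot=${base%%.*}             # longest suffix match: cut at the first dot
last_dot=${base%.*}               # shortest suffix match: cut at the last dot
echo "$first_dot"                 # archive
echo "$last_dot"                  # archive.tar
```

Stripping the directory before the extension keeps dots in directory names from interfering.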
How about this?
NEXT="$(basename -- "${1%.*}")"
Testing:
set -- '/somefolder/andanotherfolder/myfile.txt'
NEXT="$(basename -- "${1%.*}")"
echo "$NEXT"
myfile
Alternatively:
set -- "${1%.*}"; NEXT="${1##*/}"
NOEXT="${1##*/}"; NOEXT="${NOEXT%.*}"
How about:
$ [[ $var =~ [^/]*$ ]] && echo ${BASH_REMATCH%.*}
myfile

How can I save only a substring of file names from a directory without the file extension?

I have a directory that I'm reading from and I want to save only the date representation as a string.
I am close to getting it, although I know there is probably an easier way. Here is what I have so far:
#files are in the format of "THIS_20200420.csv" so I want only "20200420"
declare -a arr
declare -a arr2
FILES=test2/*.csv
for file in $FILES
do
arr=(${arr[*]} "${file##*/}")
done
for i in "${arr[@]}"
do
arr2+=$(echo $i | cut -c6-13)
done
for item in "${arr2[@]}"
do
echo $item
done
The output shows the array having only one element, which is all the strings concatenated:
20200110202001202020021920200220202004202020042220200110202001202020021920200220202004202020042220200219202002202020042020200422
I'm bashing my head against my computer at this point.
arr=(
"THIS_20200420.csv"
"THIS_20200421.csv"
"THIS_20200422.csv"
"THIS_20200423.csv"
"THIS_20200424.csv"
"THIS_20200425.csv"
"THIS_20200426.csv"
"THIS_20200427.csv"
"THIS_20200428.csv"
"THIS_20200429.csv"
"THIS_20200430.csv" )
arr=( ${arr[@]//*_} )
arr=( ${arr[@]//.*} )
echo "arr: ${arr[@]}"
Explanation:
arr=( ${arr[@]//*_} ) matches everything up to and including '_' in each element and replaces it with the empty string.
arr=( ${arr[@]//.*} ) matches everything from the '.' onwards in each element and replaces it with the empty string.
For more information on parameter expansion, a good reference is TLDP's guide on parameter expansion.
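For what it's worth, the single concatenated element in the question appears to come from arr2+=$(...): without parentheses, += on an array appends the string to element 0 instead of adding a new element. A minimal sketch with made-up sample data and variable names:

```bash
names=(THIS_20200420.csv THIS_20200421.csv)   # sample data

joined=()
split=()
for name in "${names[@]}"; do
    joined+=${name:5:8}      # string append: everything lands in joined[0]
    split+=("${name:5:8}")   # array append: one new element per iteration
done

echo "${#joined[@]}: ${joined[0]}"   # 1: 2020042020200421
echo "${#split[@]}: ${split[*]}"     # 2: 20200420 20200421
```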
Try this
declare -a arrayname=($(ls -1 test2/*.csv | grep -o '[0-9]*'))
Demo:
$ ls -1 *csv
THIS_20200420.csv
THIS_20200421.csv
THIS_20200422.csv
THIS_20200423.csv
THIS_20200424.csv
THIS_20200425.csv
THIS_20200426.csv
THIS_20200427.csv
THIS_20200428.csv
THIS_20200429.csv
THIS_20200430.csv
$ declare -a arrayname=($(ls -1 *csv | grep -o '[0-9]*'))
$ echo ${arrayname[@]}
20200420 20200421 20200422 20200423 20200424 20200425 20200426 20200427 20200428 20200429 20200430
$ echo ${arrayname[2]}
20200422
$
You could achieve this using a loop with awk:
$ for file in *.csv; do echo $file | awk -F '[^[:alnum:]]' '{print $2}'; done
The -F '[^[:alnum:]]' tells awk to use non alphanumeric characters as the delimiter.
Another way to do this is to use bash shell parameter expansion to echo only the part of the filename you want. This obviously only works if your filenames have consistent formatting:
$ for file in *.csv; do echo "${file:5:8}"; done
I thought it would be nice to use bash parameter expansion to strip the unwanted prefix and suffix but you can't have nested expansion (afaict) so this is the best I could come up with:
$ for file in *.csv; do echo "$(tmp=${file%.csv}; echo ${tmp#THIS_})"; done
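Nested expansion is indeed not available, but a bash regex with a capture group can pull the date out in one step. This is a sketch under the assumption that the date is always eight digits between an underscore and .csv; extract_date is a made-up helper name.

```bash
# extract_date is a hypothetical helper: prints the 8 digits between "_" and ".csv".
extract_date() {
    local re='_([0-9]{8})\.csv$'   # keep the pattern in a variable to avoid quoting surprises
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_date "test2/THIS_20200420.csv"
```

The function returns a non-zero status when the name does not match, so callers can skip unexpected files.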
Meet Cut! A good friend of Linux Users
for file in ./*.csv; do echo $file | cut -d "_" -f 2 | cut -d "." -f 1 ; done
This one line should do the trick!
Example:
Use an array for the files assignment and parameter expansion.
#!/usr/bin/env bash
shopt -s nullglob
##: Save the files ending in .csv in an array so the glob expands properly;
##: a plain variable assignment does not expand the glob *
files=(test2/*.csv)
##: Strip the leading pathname from each element (longest match of */)
files=("${files[@]##*/}")
##: Strip the .csv extension from each element
files=("${files[@]%.csv}")
##: Strip everything up to the first _ (shortest match from the beginning)
files=("${files[@]#*_}")
printf '%s ' "${files[@]}"

Create files using strings which delimited by specific character in BASH

Suppose we have the following command and its related output :
gsettings list-recursively org.gnome.Terminal.ProfilesList | head -n 1 | grep -oP '(?<=\[).*?(?=\])'
Output :
'b1dcc9dd-5262-4d8d-a863-c897e6d979b9', 'ca4b733c-53f2-4a7e-8a47-dce8de182546', '802e8bb8-1b78-4e1b-b97a-538d7e2f9c63', '892cd84f-9718-46ef-be06-eeda0a0550b1', '6a7d836f-b2e8-4a1e-87c9-e64e9692c8a8', '2b9e8848-0b4a-44c7-98c7-3a7e880e9b45', 'b23a4a62-3e25-40ae-844f-00fb1fc244d9'
I need to use the gsettings command in a script and create filenames from the output of the gsettings command. For example, a file name should be
b1dcc9dd-5262-4d8d-a863-c897e6d979b9
the next one :
ca4b733c-53f2-4a7e-8a47-dce8de182546
and so on.
How can I do this?
Another solution... just pipe the output of your command to:
your_command | sed "s/[ ']//g" | xargs -d, touch
You can use process substitution to read your gsettings output and store it in an array :
IFS=', ' read -r -a array < <(gsettings)
for f in "${array[@]}"
do
file=$(echo $f |tr -d "'" ) # removes leading and trailing quotes
touch "$file"
done
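The same idea can be tried without gsettings by feeding a sample string. The sketch below uses two of the IDs from the question and writes into a scratch directory created with mktemp; the directory and variable names are assumptions for the demo only.

```bash
# Sample of the grep output from the question, shortened to two IDs.
output="'b1dcc9dd-5262-4d8d-a863-c897e6d979b9', 'ca4b733c-53f2-4a7e-8a47-dce8de182546'"

target_dir=$(mktemp -d)                  # scratch directory, only for this demo
IFS=', ' read -r -a ids <<< "$output"    # split on commas and spaces
for id in "${ids[@]}"; do
    id=${id//\'/}                        # strip the single quotes
    touch "$target_dir/$id"
done
ls "$target_dir"
```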

bash script command output execution doesn't assign full output when using backticks

I have used backticks (``) many times to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
Why doesn't XVAR get all the values?
However, if I use $() to capture the output instead of ``, it works fine. Why are backticks not working?
Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P grep is using Perl compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o prints each match on its own line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
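The use of tr replaces a sed expression that didn't actually do what you thought it did: inside backticks, \$ is reduced to a plain $ before sed ever runs, so sed sees s/$/\n/g and matches the end of the line rather than a literal dollar sign. Try looking at its output to see.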
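The difference between the two substitution forms can be reproduced in isolation. In this sketch the sample string and variable names are made up; the only thing that changes between the two lines is the kind of command substitution.

```bash
line='a$b'   # made-up sample containing a literal dollar sign

# Backticks strip the backslash before "$", so sed receives s/$/X/ (end of line).
with_backticks=`printf '%s\n' "$line" | sed 's/\$/X/'`

# $(...) passes the script through untouched, so sed receives s/\$/X/ (literal $).
with_dollar_paren=$(printf '%s\n' "$line" | sed 's/\$/X/')

printf '%s\n' "$with_backticks"      # a$bX
printf '%s\n' "$with_dollar_paren"   # aXb
```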
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"

Shell script to find a string with spaces and wrap with quotes

I am new to shell scripts. I want to read a file line by line. Each line contains an argument, and if an argument's value contains spaces, I want to enclose that value in quotes.
For example if the file (test.dat) contains:
-DtestArgument1=/path/to a/text file
-DtestArgument2=/path/to a/text file
After parsing the above file, shell script should prepare the string with following:
-DtestArgument1="/path/to a/text file" -DtestArgument2="/path/to a/text file"
Here is my shell script:
while read ARGUMENT; do
ARGUMENT=`echo ${ARGUMENT} | tr "\n" " "`
if [[ "${ARGUMENT}" =~ " " ]]; then
ARGUMENT=`echo $ARGUMENT | sed 's/\^(-D.*\)=(.*)/\1=\"\2\"/g'`
NEW_ARGUMENT="${NEW_ARGUMENT} ${ARGUMENT}"
else
echo "doesn't contains spaces"
NEW_ARGUMENT="${NEW_ARGUMENT} ${ARGUMENT}"
fi
done < test.dat
But it's throwing the following error:
sed: -e expression #1, char 28: Unmatched ) or \)
The code should be compatible with all shells.
I think you should simplify the problem. Rather than worrying about spaces, just quote the argument after the =. Something like:
sed -e 's/=/="/' -e 's/$/"/' test.dat | paste -s -d\ -
Should be sufficient. If you really care about spaces, you could try something like:
sed -e '/=.* /{ s/=/="/; s/$/"/; }' test.dat | paste -s -d\ -
That will only notice spaces after the =. Just use / / if you really want to change any line that has a space anywhere.
There's no need to use a while/read loop: just let sed read the file directly.
The sed parentheses should be escaped, and the ^ anchor should not be (\^ matches a literal caret):
ARGUMENT=`echo "$ARGUMENT" | sed "s/^\(-D.*\)=\(.*\)/\1=\"\2\"/"`
You escaped one parenthesis and forgot the other three. BTW, I generally use double quotes.
If you prefer single quotes, do it like this:
ARGUMENT=`echo "$ARGUMENT" | sed 's/^\(-D.*\)=\(.*\)/\1="\2"/'`
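If the while/read loop is to be kept, the quoting can also be done without sed, using only POSIX parameter expansion and case. This is a sketch, not a drop-in replacement: the temporary file stands in for test.dat, and the second sample line (-DnoSpaces) is made up to show the untouched case.

```bash
# Build a stand-in for test.dat in a temporary file (demo data only).
file=$(mktemp)
printf '%s\n' '-DtestArgument1=/path/to a/text file' '-DnoSpaces=/tmp/x' > "$file"

new_argument=
while IFS= read -r argument; do
    case $argument in
        *=*" "*)   # a space somewhere after an "=": quote the value
            argument="${argument%%=*}=\"${argument#*=}\"" ;;
    esac
    # Join with a single space, without a leading space on the first item.
    new_argument="${new_argument:+$new_argument }$argument"
done < "$file"
rm -f "$file"

printf '%s\n' "$new_argument"
```

Here ${argument%%=*} keeps the part before the first "=" and ${argument#*=} keeps the part after it, so values that themselves contain "=" stay intact.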

Resources