Make string more user friendly/readable in bash script [duplicate] - bash

This question already has answers here:
Remove/replace html tags in bash
(2 answers)
Closed 6 years ago.
I've been trying to convert the following string into a more readable and usable form in a bash script. Certain parts are not static.
(<font color='whisper'>[ <name shortname='UserName' src='whisper-from'>UserName</name> whispers, "test" to you. ]</font>
A lot of the stuff in fact is not static. Basically, I want the end result to look like:
([UserName whispers, "test" to you. ]
I have done this time and time again in Java, PHP, and even VB6. However I am new to bash scripts, and can't seem to get it to work.
Could someone help me convert this Java code to bash script?
data = MyString.replaceAll("<.*?>", "");
data = MyString.replaceAll("&lt;", "<");
data = MyString.replaceAll("&gt;", ">");

In bash, you can use pattern substitution. Let's start with this string:
$ s='&lt;Name&gt;'
And, let's do substitutions on it:
$ s="${s//&lt;/<}"
$ s="${s//&gt;/>}"
$ echo "$s"
<Name>
Bash works on globs. If you need regular expressions, try sed:
$ s='&lt;Name&gt;'
$ echo "$s" | sed 's/&lt;/</g; s/&gt;/>/g; s/<[^>]*>/<>/g'
<>
In a more complex example:
$ MyStr='(&lt;font color='whisper'&gt;[ &lt;name shortname='UserName' src='whisper-from'&gt;UserName&lt;/name&gt; whispers, "test" to you. ]&lt;/font&gt;'
$ echo "$MyStr" | sed 's/&lt;/</g; s/&gt;/>/g; s/<[^>]*>//g'
([ UserName whispers, "test" to you. ]
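The same pipeline can be wrapped in a small function for reuse inside a script. This is only a sketch: the name strip_markup is made up for illustration, and it assumes the input carries the entities &lt; and &gt; and that no tag contains a literal > inside an attribute value.

```shell
#!/bin/bash
# strip_markup: hypothetical helper name, not part of the original answer.
# Reads text on stdin, decodes &lt; and &gt;, then deletes anything
# that looks like a tag (from "<" up to the next ">").
strip_markup() {
    sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/<[^>]*>//g'
}

msg="(&lt;font color='whisper'&gt;[ &lt;name shortname='UserName' src='whisper-from'&gt;UserName&lt;/name&gt; whispers, \"test\" to you. ]&lt;/font&gt;"
printf '%s\n' "$msg" | strip_markup
# prints: ([ UserName whispers, "test" to you. ]
```

Decoding the entities first matters here: the tag pattern can only match once the angle brackets are real characters.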

Use sed by itself or, since you mentioned bash, use sed within a bash script (for example b.sh):
#!/bin/bash
sed 's/&gt;/>/g' | sed 's/&lt;/</g' | sed 's/<.*?>//g'
Input data (for example b.txt file):
asdasdasdds<.*?>dasdasassxrh
sadaswqqw&lt;ssadasdasdsdvvxc
sadssadadsads&gt;dsdsdewpppp
Output results:
asdasdasddsdasdasassxrh
sadaswqqw<ssadasdasdsdvvxc
sadssadadsads>dsdsdewpppp
Usage:
b.sh < b.txt
NOTE: I broke each replace all into separate sed calls in case you wanted to modify or add more formatting changes.
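One caveat about the last pattern: in sed's default (basic) regular expressions the ? is an ordinary character, not a non-greedy modifier, so <.*?> only matches when a literal ? sits right before the >, as it happens to in the sample input above. A minimal sketch of the difference, using the bracket form from the other answer (strip_tags is a hypothetical name):

```shell
#!/bin/bash
# strip_tags: hypothetical helper; removes "<", any run of non-">" characters, ">".
strip_tags() {
    sed 's/<[^>]*>//g'
}

# The Java-style pattern finds no literal "?" here, so the line is left unchanged.
printf '%s\n' '<b>bold</b> text' | sed 's/<.*?>//g'
# prints: <b>bold</b> text

# The bracket expression stops at the first ">", which is what non-greedy
# matching was meant to achieve.
printf '%s\n' '<b>bold</b> text' | strip_tags
# prints: bold text
```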

Related

Combining variable concatenation and for loops in bash

I have this function in R, which I use to produce a list of dates:
#!/usr/bin/env Rscript
date_seq = function(){
args = commandArgs(trailingOnly = TRUE)
library(lubridate)
days = seq(ymd(args[1]),ymd(args[2]),1)
days =format(days, "%Y%m%d")
return(days)
}
date_seq()
I call this function in a bash script to create a vector of dates:
Rscript date_seq.R 20160730 20160801 > dates
I define a couple of other string variables in the bash script:
home_url="https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"
file_name="/hrrr.t{00-23}z.wrfsfcf00.grib2"
The final goal is to create a vector of download links, that incorporates the three variables home_url, date and file_name, like so:
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160730/hrrr.t{00-23}z.wrfsfcf00.grib2"
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160731/hrrr.t{00-23}z.wrfsfcf00.grib2"
"https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20160801/hrrr.t{00-23}z.wrfsfcf00.grib2"
I tried a few lines in bash script:
for date in $dates; do download_url=$home_url$date$hrrr_file; cat $download_url; done
for date in $dates; do download_url="${home_url}${date}${hrrr_file}"; cat $download_url; done
for date in $dates; do download_url="$home_url"; download_url+="$date"; download_url+="$hrrr_file"; cat $download_url; done
None of these produce the output I expect. I am not sure if the download_url variable is not being produced, or is being produced and stored somewhere, and I am not able to reproduce it. Can anyone please help me understand?
Edit
Results of trying the suggestions below:
@triplee suggested using
sed "s#.*#$home_url&$hrrr_file#" "dates"
and
while read -r date; do printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"; done <dates
Both of these produce this output:
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/[1] "20160730" "20160731" "20160801"/hrrr.t{00-23}z.wrfsfcf00.grib2
@xdhmoore suggested using
for date in $(cat dates); do echo "${home_url}${date}${hrrr_file}"; done
which produces this output:
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/[1]/hrrr.t{00-23}z.wrfsfcf00.grib2
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160730"/hrrr.t{00-23}z.wrfsfcf00.grib2
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160731"/hrrr.t{00-23}z.wrfsfcf00.grib2
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"20160801"/hrrr.t{00-23}z.wrfsfcf00.grib2
Neither is the output I am expecting, though the solution by @xdhmoore is closer. But I see another problem in @xdhmoore's solution: the quotation marks around the date in the output. The output of cat dates looks like this: "20160730" "20160731" "20160801", so I think I have to rework the function or the way I call it in the bash script as well.
I'll keep updating the question to reflect the output of all suggestions, since it is simpler to do so than trying to answer each comment. As always, thanks a lot!
The for statement loops over the tokens you give it as arguments, not the contents of files.
You seem to be looking for
sed "s#.*#$home_url&$hrrr_file#" "dates"
The token & recalls the text which was matched by the regex in a sed substitution.
The same thing could be done vastly more slowly with a shell loop;
while read -r date; do
printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"
done <dates
which illustrates how to (slowly) iterate over the lines in a file without the use of external utilities.
Either of these can be piped to xargs curl (or perhaps xargs -n 1 curl); or you could refactor the while loop;
while read -r date; do
curl "$home_url$date$hrrr_file"
done <dates
As noted in comments, cat is a command for copying files, not echoing text; for the latter, use echo or (for any nontrivial formatting) printf.
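Putting the pieces together, here is a rough sketch of the while-read approach as a reusable function. The name build_urls is invented for this example, hrrr_file is set to the value the question stores in file_name, and the input is assumed to be one bare date per line.

```shell
#!/bin/bash
home_url="https://pando-rgw01.chpc.utah.edu/hrrr/sfc/"
hrrr_file="/hrrr.t{00-23}z.wrfsfcf00.grib2"

# build_urls: hypothetical helper; reads one date per line on stdin and
# prints one URL per line on stdout.
build_urls() {
    local date
    while IFS= read -r date; do
        [ -n "$date" ] || continue          # skip empty lines
        printf '%s%s%s\n' "$home_url" "$date" "$hrrr_file"
    done
}

printf '%s\n' 20160730 20160731 20160801 | build_urls
```

With a real file the call would be build_urls <dates, and the output could then be piped on to xargs -n 1 curl.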
Update: The above assumes your R output generated one date per line. To split the file into lines and remove quotes around the values, you can preprocess with sed 's/"\([^"]*\)" */\1\n/g' "dates" (provided your sed dialect supports \n as an escape for newline); or perhaps do
sed "s#\"\([^\"]*\)\" *#$home_url\\1$hrrr_file\\
#g" "dates"
again with some reservation for differences between sed dialects. In the worst case, maybe switch to Perl, which actually brings some relief to the backslashitis, but requires new backslashes in other places:
perl -pe "s#\"(\d+)\" *#$home_url\$1$hrrr_file\n#g" "dates"
But probably a better solution is to change your R script so it doesn't produce wacky output. Or just don't use R in the first place. See e.g. https://stackoverflow.com/a/3494814/874188 for how to get dates from Perl. Or if you have GNU date, try
#!/bin/bash
start=$(date -d "$1" +%s)
end=$(date -d "$2" +%s)
for ((i=start; i<=end; i+=60*60*24)); do
date -d "@$i" +%Y%m%d
done
(If you are on a Mac or similar, the date program won't accept a date as an argument to -d and you will have to use slightly different syntax. It's not hard to do but this answer has too many speculations already.)
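One possible refinement of the GNU date loop, offered as a sketch rather than a drop-in replacement: doing the arithmetic in UTC (-u) keeps every step exactly 86400 seconds, so a daylight-saving change in the local time zone cannot skip or repeat a day. The function name date_range is made up here, and GNU date is still assumed.

```shell
#!/bin/bash
# date_range START END: hypothetical helper, prints YYYYMMDD for each day
# from START to END inclusive. Assumes GNU date (-d and @epoch are GNU
# extensions).
date_range() {
    local start end i
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    i=$start
    while [ "$i" -le "$end" ]; do
        date -u -d "@$i" +%Y%m%d
        i=$((i + 86400))        # one UTC day
    done
}

date_range 20160730 20160801
```

Redirecting the output to the dates file gives one date per line, which is the form the sed and while-read commands above expect.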

Using wildcard in bash-variable for echo and grep [duplicate]

This question already has answers here:
How to grep asterisk without escaping?
(2 answers)
When to wrap quotes around a shell variable?
(5 answers)
Closed 3 years ago.
I am trying to come up with a function that searches a given file for a given pattern. If the pattern is not found it shall be appended to the file.
This works fine for some cases, but when my pattern includes special characters, like wildcards (*) this method fails.
Patterns that work:
pattern="some normal string"
Patterns that don't work:
pattern='#include "/path/to/dir/\*.conf"'
This is my function:
check_pattern() {
if ! grep -q "$1" $2
then
echo $1 >> $2
fi
}
I'm calling my function like this:
check_pattern $pattern $target_file
When escaping the wildcard in my pattern variable to get grep running correctly echo interprets the \ as character.
So when I run my script a second time my grep does not find the pattern and appends it again.
To put it simply:
Stuff that gets appended:
#include "/path/to/dir/\*.conf"
Stuff that I want to have appended:
#include "/path/to/dir/*.conf"
Is there some way to get my expected result without storing my echo-pattern in a second variable?
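Use
grep -F
(fixed-string matching, so * and other regex characters are taken literally) and quote the arguments when calling the function:
check_pattern "$pattern" "$target_file"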
Thanks all, I got it now.
Using grep -F as pointed out by Gem Taylor in combination with calling my function like this check_pattern "$pattern" "$target_file" did the trick.
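For reference, a minimal sketch of the function with both fixes applied. The -x flag (match whole lines only) and the -- separator are additions of this sketch, not part of the accepted fix, and the temporary file only stands in for the real target file.

```shell
#!/bin/bash
# Append a line to a file unless the file already contains it verbatim.
check_pattern() {
    # -F: treat the pattern as a fixed string, so * is not a regex operator
    # -x: only accept whole-line matches (an assumption of this sketch)
    # -q: no output, only the exit status
    # --: end of options, in case the pattern starts with a dash
    if ! grep -qxF -- "$1" "$2"; then
        printf '%s\n' "$1" >> "$2"
    fi
}

target_file=$(mktemp)                     # stand-in for the real file
pattern='#include "/path/to/dir/*.conf"'  # no backslash needed any more

check_pattern "$pattern" "$target_file"
check_pattern "$pattern" "$target_file"   # second call finds the line, appends nothing
cat "$target_file"
rm -f "$target_file"
```

printf is used instead of echo so the line is written unchanged regardless of any backslashes it may contain.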

A bash script to concatenate strings and editing a line in the same file using a bash script

I was trying to write a bash script to run several cases where the steps are the following:
reading a .txt file in an array
concatenate each element of an array with a specific string
opening several screens in detached mode and then changing a specific file (in this case seq.txt) and run a few commands
The code is as follows:
COUNTER=0
readarray t1Gad < t1_gad.txt
while [ $COUNTER -lt 5 ]; do
NUM_DEVICE=8
DEVICE_NO=`expr $COUNTER % $NUM_DEVICE`
string1='objGa_'
string2=${t1Gad[$COUNTER]}
#string2='hello'
string=$string1$string2
echo $string2
echo $string1
echo $string
screen -d -m -S "$COUNTER" bash -c 'cd $HOME/Downloads && sed -i '2s/.*/$string/' seq.txt && cat seq.txt; exec sh'
let COUNTER=COUNTER+1
done
The funny thing is that if I replace string2 with a fixed string, it works fine, but it does not work with elements of the array.
I would be happy if someone explains it to me. I am very new to bash scripting but desperately want to learn this very useful but ugly scripting language.
I found the problem but I do not know how to solve it. While doing the string concatenation, say e.g.
string1="objGa_"
string="$string1${t1Gad[$COUNTER]}"
(say 192 is that specific element), and in that case it is substituting
objGa_192 ' '
inside the screen. I do not know how to get rid of that space and where it is coming from.
t1_gad.txt:
100
200
300
400
500
seq.txt:
abc
objw
cde
efg
xyz
readarray keeps the trailing newline of each line in every array element (unless you call it with -t), which breaks the sed command.
One simple way to get rid of trailing whitespace is to pipe the string to xargs:
string2=$(echo ${t1Gad[$COUNTER]} | xargs)
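An alternative sketch that avoids the extra xargs process: readarray -t strips the newline while reading, and the value is handed to the inner shell as a positional parameter instead of being spliced into the quoted command text. The scratch directory and file contents below are only for demonstration, screen is left out so the example runs anywhere, and sed -i without a suffix assumes GNU sed.

```shell
#!/bin/bash
# Demo files in a scratch directory (paths are illustrative).
workdir=$(mktemp -d)
printf '%s\n' 100 200 300 > "$workdir/t1_gad.txt"
printf '%s\n' abc objw cde > "$workdir/seq.txt"

# -t drops the trailing newline from every element.
readarray -t t1Gad < "$workdir/t1_gad.txt"
string="objGa_${t1Gad[0]}"

# Hand the value to the inner shell as $1 rather than embedding it in the
# single-quoted command string; $2 is the file to edit. "_" fills $0.
bash -c 'sed -i "2s/.*/$1/" "$2"' _ "$string" "$workdir/seq.txt"

cat "$workdir/seq.txt"
```

The same argument-passing form should carry over to screen -d -m -S "$COUNTER" bash -c '...' _ "$string", since everything after the command string becomes the inner shell's positional parameters.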

Replace a string in shell script by a string with special character [duplicate]

This question already has answers here:
Is it possible to escape regex metacharacters reliably with sed
(4 answers)
Closed 7 years ago.
I am trying to replace a string inside a shell script by a string with special character:
name="test&commit"
echo "{{name}}" | sed "s/{{name}}/$name/g"
and the result I am getting is
test{{name}}commit
I know that adding \ before & will make it work, but the name parameter is given by the user, so I'd like my code to somehow handle that. Does anybody know how to achieve this?
You need to use another sed command to add a backslash before all the special characters in the given input string.
$ name="test&commit"
$ name1=$(sed 's/[^[:alpha:][:digit:][:blank:]]/\\&/g' <<<"$name")
$ echo $name1
test\&commit
$ echo "{{name}}" | sed "s/{{name}}/$name1/g"
test&commit
This can be shortened to:
$ name="test&commit"
$ echo "{{name}}" | sed "s/{{name}}/$(sed 's/[^[:alpha:][:digit:][:blank:]]/\\&/g' <<<"$name")/g"
test&commit
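A narrower variant, sketched under the assumption that the substitution uses / as its delimiter and that the value contains no newline: only backslash, ampersand and the delimiter are special on the replacement side of s///, so only those need escaping. The helper name escape_replacement is made up for this example.

```shell
#!/bin/bash
# escape_replacement: hypothetical helper. Puts a backslash in front of the
# three characters that are special in the replacement part of s/.../.../
# (backslash, ampersand, and the "/" delimiter). Inside a bracket
# expression the backslash is an ordinary character.
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[&/\]|\\&|g'
}

name='test&commit'
echo "{{name}}" | sed "s/{{name}}/$(escape_replacement "$name")/g"
# prints: test&commit
```

Escaping every non-alphanumeric character, as above, can turn some characters into escape sequences in certain sed dialects, which is why this sketch limits itself to the three that matter.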
A slightly different way of providing the template and values:
$ cat template
Dear {{name}},
I hope to see you {{day}}.
(template is a file with {{var}}, to be instantiated with values)
$ name='Mary&Susan' day=tomorrow perl -pe 's/{{(\w+)}}/$ENV{$1}/g' template
Dear Mary&Susan,
I hope to see you tomorrow.
In Perl you can switch off regex metacharacters in part of a pattern with \Q ... \E.
I set the variables template, placeholder and name:
$ echo "template=$template"
template=The name is {{name}} and we like that.
$ echo "placeholder=$placeholder"
placeholder={{name}}
$ echo "name=$name"
name=test&commit
The replacement will be performed with
$ echo $template | perl -pe 's/\Q'$placeholder'\E/'$name'/g'
The name is test&commit and we like that.

find a substring inside a bash variable [duplicate]

This question already has answers here:
Extract substring in Bash
(26 answers)
Closed 9 years ago.
We were trying to find the username of a mercurial url:
default = ssh://someone@acme.com//srv/hg/repo
Suppose that there's always a username, I came up with:
tmp=${a#*//}
user=${tmp%%@*}
Is there a way to do this in one line?
Assuming your string is in a variable like this:
url='default = ssh://someone@acme.com//srv/hg/repo'
You can do:
[[ $url =~ //([^@]*)@ ]]
Then your username is here:
echo ${BASH_REMATCH[1]}
This works in Bash versions 3.2 and higher.
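As a sketch, the regex match can be wrapped in a function so the caller gets the username in one line. The name get_user is invented here, and the URL is assumed to have the usual user@host form after the double slash.

```shell
#!/bin/bash
# get_user: hypothetical helper; prints the text between "//" and the first "@".
# Returns non-zero and prints nothing if the string has no such part.
get_user() {
    local re='//([^@]*)@'     # keeping the regex in a variable avoids quoting surprises
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

url='default = ssh://someone@acme.com//srv/hg/repo'
user=$(get_user "$url")
echo "$user"
# prints: someone
```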
You pretty much need more than one statement or to call out to external tools. I think sed is best for this.
sed -r -e 's|.*://(.*)@.*|\1|' <<< "$default"
Not within bash itself. You'd have to delegate to an external tool such as sed.
Not familiar with mercurial, but using your url, you can do
echo 'ssh://someone@acme.com/srv/hg/repo' |grep -E --only-matching '\w+@' |cut --delimiter=\@ -f 1
Probably not the most efficient way with the two pipes, but works.
