Xidel extract number/float - xpath

I would like to extract number/float value from this code using Xidel:
<p class="price">
<span class="woocommerce-Price-amount amount">
<bdi>
304.00
<span class="woocommerce-Price-currencySymbol">
€
</span>
</bdi>
</span>
</p>
I am trying the following command:
xidel -s '<p class="price"><span class="woocommerce-Price-amount amount"><bdi>304.00 <span class="woocommerce-Price-currencySymbol">€</span></bdi></span></p>' -e "//p[@class='price']/translate(normalize-space(substring-before(., '€')),' ','')"
The translate() call should remove the space, but it is not working: the output still has one space after the number ("304.00_").

You have to process the no-break space (U+00A0) separately, with one of the following queries. In the first and third, the quoted space is a literal no-break space, not an ordinary space:
-e "//p[@class='price']/span/bdi/substring-before(text(),' ')"
-e "//p[@class='price']/span/bdi/translate(text(),x:cps(160),'')"
-e "//p[@class='price']/span/bdi/replace(text(),' ','')"
You can't use normalize-space(), because...
https://www.w3.org/TR/xpath-functions-31/#func-normalize-space:
The definition of whitespace is unchanged in [Extensible Markup Language (XML) 1.1 Recommendation]. It is repeated here for convenience:
S ::= (#x20 | #x9 | #xD | #xA)+
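...it processes spaces, tabs, carriage returns and line feeds, but not no-break spaces. In the first two commands below the spaces around "test" are ordinary spaces; in the remaining ones they (and the quoted space passed to translate() and replace()) are no-break spaces: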
xidel -s "<x> test </x>" -e "x'[{x}]'"
[ test ]
xidel -s "<x> test </x>" -e "x'[{normalize-space(x)}]'"
[test]
xidel -s "<x> test </x>" -e "x'[{x}]'"
[ test ]
xidel -s "<x> test </x>" -e "x'[{normalize-space(x)}]'"
[ test ]
xidel -s "<x> test </x>" -e "x'[{translate(x,' ','')}]'"
xidel -s "<x> test </x>" -e "x'[{replace(x,x:cps(160),'')}]'"
xidel -s "<x> test </x>" -e "x'[{replace(x,' ','')}]'"
[test]
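If you would rather clean up the value in the shell after xidel has printed it, the no-break space can be cut off there as well. A minimal sketch, assuming the output is UTF-8 (where U+00A0 is the byte pair C2 A0); strip_price is a hypothetical helper name, not part of xidel:

```shell
# U+00A0 (no-break space) encoded as UTF-8: bytes 0xC2 0xA0.
nbsp=$(printf '\302\240')

# Hypothetical helper: print everything before the first no-break space.
# A value without a no-break space is printed unchanged.
strip_price() {
  printf '%s\n' "${1%%"$nbsp"*}"
}

# Stand-in for the text xidel extracts: "304.00", a no-break space, "€".
raw=$(printf '304.00\302\240\342\202\254')
strip_price "$raw"   # prints 304.00
```

This only uses parameter expansion, so no extra process is started per value.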
Btw, an alternative to get the price on that website:
xidel -s "https://kenzel.sk/produkt/bicykle/zivotny-styl/signora/" -e ^"^
parse-json(^
//body/script[@type='application/ld+json']^
)//priceSpecification/price^
"
304.00

Try changing the XPath expression to
-e "substring-before(//p[@class='price']//bdi/normalize-space(.),' ')"
or
-e "substring-before(//p[@class='price']//bdi/.,' ')"
or use tokenize()
-e "tokenize(//p[@class='price']//bdi/.,' ')[1]"
The output should be
'304.00'

Related

How can I pass external variable with Xidel tool?

I have a XQuery expression stored in a file
(: file process.xq :)
declare variable $var external;
...
and use it with Xidel.
xidel --silent --color=never --xml --xquery "$(< process.xq)" my.xml
How can I pass such external variable?
It doesn't seem to be possible with "external",
but it can be achieved with an extra query expression that assigns the variable first:
xidel --silent --color=never --xml --xquery "foo := bar" --xquery "$(< process.xq)" my.xml
Then just use $foo as usual:
(: file process.xq :)
$foo
...
I'm no XQuery expert, but at least for xidel this is how you declare a variable in a query-file:
declare variable $var := "external";
()
And don't forget the (), or you'll get err:XPST0003: Unexpected query end.
Then to load the query-file:
$ xidel -s --extract-file=process.xq -e '$var'
#or
$ xidel -s -e @process.xq -e '$var'
external

Problem Converting Morse Code to Text using sed

I am kinda new to bash scripting so bear with me.
I have a task to convert text to morse code and vice versa from a given .txt file.
I successfully did the first part but when i try to run the second script to translate from morse to text i get this error:
sed: can't read s/.-/A/g: No such file or directory
S
My original code:
#!/bin/bash
sed 's/.-/A/g' -e 's/-.../B/g' -e 's/-.-./C/g' -e 's/-../D/g' -e 's/./E/g' -e 's/..-./F/g' -e 's/-
-./G/g' -e 's/..../H/g' -e 's/../I/g' -e 's/.---/J/g' -e 's/-.-/K/g' -e 's/.-../L/g' -e 's/--/M/g' -e
's/-./N/g' -e 's/---/O/g' -e 's/.--./P/g' -e 's/--.-/Q/g' -e 's/.-./R/g' -e 's/.../S/g' -e 's/-/T/g'
-e 's/..-/U/g' -e 's/...-/V/g' -e 's/.--/W/g' -e 's/-..-/X/g' -e 's/-.--/Y/g' -e 's/--../Z/g' -e
's/...../1/g' -e 's/....-/2/g' -e 's/...--/3/g' -e 's/....-/4/g' -e 's/...../5/g' -e 's/-..../6/g' -e
's/--.../7/g' -e 's/---../8/g' -e 's/----./9/g' -e 's/-----/0/g' Morse_Text.txt
The Morse_Text.txt file:
-- --- .-. ... . -.- --- -.-. .
You have to escape the . characters (prepend a \), as they have a special meaning in regular expressions.
Like this: sed 's/\.-/A/g'
Have a look at https://en.wikipedia.org/wiki/Regular_expression
Also you are missing the first -e after your sed command:
░ tamasgal@greybox.local:~
░ 16:37:45 > echo '... --- ...'| sed -e 's/\.\.\./S/g' -e 's/---/O/g'
S O S
Btw, you will hit more issues because of the way sed works: each expression is applied in turn, so an earlier one can replace part of a longer Morse code.
You should consider requiring a trailing space or end of line after each Morse group.
Here is an example in stages (decoding SOS9):
░ tamasgal@greybox.local:~
░ 16:43:47 > echo '... --- ... ----.'| sed -e 's/\.\.\./S/g'
S --- S ----.
░ tamasgal@greybox.local:~
░ 16:43:56 > echo '... --- ... ----.'| sed -e 's/\.\.\./S/g' -e 's/---/O/g'
S O S O-.
░ tamasgal@greybox.local:~
░ 16:44:00 > echo '... --- ... ----.'| sed -e 's/\.\.\./S/g' -e 's/---/O/g' -e 's/----\./9/g'
S O S O-.
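One solution is to require a trailing space after each group. Note that inside a bracket expression $ is a literal dollar sign rather than an end-of-line anchor, which is why the input below ends with a space:
░ tamasgal@greybox.local:~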
░ 16:44:02 > echo '... --- ... ----. '| sed -e 's/\.\.\.[ $]/S/g' -e 's/---[ $]/O/g' -e 's/----\.[ $]/9/g'
SOS9
Btw. this is by far an ugly solution, but I'll not do your homework ;)
You have to make sure that the largest matches are replaced first.
It is convenient to store the commands in morse.sed like this:
s/-----/0/g
s/\.\.\.\.\./5/g
s/\.----/1/g
s/\.\.\.\.-/4/g
s/\.\.---/2/g
s/\.\.\.--/3/g
s/-\.\.\.\./6/g
s/--\.\.\./7/g
s/---\.\./8/g
s/----\./9/g
s/\.\.\.\./H/g
s/\.\.\.-/V/g
s/\.\.-\./F/g
s/\.-\.\./L/g
s/\.--\./P/g
s/\.---/J/g
s/-\.\.\./B/g
s/-\.\.-/X/g
s/-\.-\./C/g
s/-\.--/Y/g
s/--\.\./Z/g
s/--\.-/Q/g
s/\.\.\./S/g
s/\.\.-/U/g
s/\.-\./R/g
s/\.--/W/g
s/-\.\./D/g
s/-\.-/K/g
s/--\./G/g
s/---/O/g
s/\.\./I/g
s/\.-/A/g
s/-\./N/g
s/--/M/g
s/\./E/g
s/-/T/g
Now you can use
sed -Ef morse.sed Morse_Text.txt
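An alternative that sidesteps the ordering problem altogether is to decode one space-separated group at a time, so a short code can never match inside a longer one. A rough sketch in POSIX shell, assuming groups are separated by whitespace and ignoring word gaps; decode_morse is a name made up for this example:

```shell
# Decode the space-separated Morse groups passed as one string argument.
# Groups that are not in the table are printed as "?".
decode_morse() {
  out=
  for code in $1; do    # unquoted on purpose: split the argument on whitespace
    case $code in
      '.-')    c=A ;;  '-...')  c=B ;;  '-.-.')  c=C ;;  '-..')   c=D ;;
      '.')     c=E ;;  '..-.')  c=F ;;  '--.')   c=G ;;  '....')  c=H ;;
      '..')    c=I ;;  '.---')  c=J ;;  '-.-')   c=K ;;  '.-..')  c=L ;;
      '--')    c=M ;;  '-.')    c=N ;;  '---')   c=O ;;  '.--.')  c=P ;;
      '--.-')  c=Q ;;  '.-.')   c=R ;;  '...')   c=S ;;  '-')     c=T ;;
      '..-')   c=U ;;  '...-')  c=V ;;  '.--')   c=W ;;  '-..-')  c=X ;;
      '-.--')  c=Y ;;  '--..')  c=Z ;;
      '.----') c=1 ;;  '..---') c=2 ;;  '...--') c=3 ;;  '....-') c=4 ;;
      '.....') c=5 ;;  '-....') c=6 ;;  '--...') c=7 ;;  '---..') c=8 ;;
      '----.') c=9 ;;  '-----') c=0 ;;
      *)       c='?' ;;
    esac
    out=$out$c
  done
  printf '%s\n' "$out"
}

decode_morse '... --- ... ----.'   # prints SOS9
```

To decode a whole file, one could feed it line by line: while IFS= read -r line; do decode_morse "$line"; done < Morse_Text.txt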

Use XMLStarlet to insert a single value too long to fit on a command line

Suppose I have an xml file:
<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<map>
<string name="a"></string>
</map>
And I want to set the value of the string element whose name attribute is a to something big:
$ xmlstarlet ed -u '/map/string[@name="a"]' -v $(for ((i=0;i<200000;i++)); do echo -n a; done) example.xml > o.xml
This results in the bash error "Argument list too long". I was unable to find an option in xmlstarlet that accepts the value from a file. So, how would I set the value of an XML tag to 200 KB or more of data?
Solution
After trying to feed chunks into the xmlstarlet by argument -a (append), I realized that I am having additional difficulties like escape of special characters and the order in which xmlstarlet accepts these chunks.
Eventually I reverted to simpler tools like xml2/sed/2xml. I am dropping the code as a separate post below.
This, as a workaround for your own example that bombs because of the ARG_MAX limit:
#!/bin/bash
# (remove 'echo' commands and quotes around '>' characters when it looks good)
echo xmlstarlet ed -u '/map/string[@name="a"]' -v '' example.xml '>' o.xml
for ((i = 0; i < 100; i++))
do
echo xmlstarlet ed -u '/map/string[@name="a"]' -a -v $(for ((i=0;i<2000;i++)); do echo -n a; done) example.xml '>>' o.xml
done
SOLUTION
I am not proud of it, but at least it works.
a.xml - what was proposed as an example in the starting post
source.txt - what has to be inserted into a.xml as xml tag
b.xml - output
#!/usr/bin/env bash
ixml="a.xml"
oxml="b.xml"
s="source.txt"
echo "$ixml --> $oxml"
t="$ixml.xml2"
t2="$ixml.xml2.edited"
t3="$ixml.2xml"
# Convert xml into simple string representation
cat "$ixml" | xml2 > "$t"
# Get the string number of xml tag of interest, increment it by one and delete everything after it
# For this to work, the tag of interest should be at the very end of xml file
cat "$t" | grep -n -E 'string.*name=.*a' | cut -f1 -d: | xargs -I{} echo "{}+1" | bc | xargs -I{} sed '{},$d' "$t" > "$t2"
# Rebuild the deleted end of the xml2-file with the escaped content of s-file and convert everything back to xml
# * The apostrophe escape is necessary for apk xml files
sed "s:':\\\':g" "$s" | sed -e 's:^:/map/string=:' >> "$t2"
cat "$t2" | 2xml > "$t3"
# Make xml more readable
xmllint --pretty 1 --encode utf-8 "$t3" > "$oxml"
# Delete temporary files
rm -f "$t"
rm -f "$t2"
rm -f "$t3"

How do I include a variable in a ksh command substitution?

I'm trying to find a number of lines that match a regex pattern in grep received as a variable. When I do the grep with the pattern directly in the command substitution, it works. When I use a variable for the pattern, it doesn't.
#!/bin/bash
pattern="'^\\\".*\\\"$'"
echo "pattern : $(echo $pattern)"
NB=$(grep -c -E -v -e ${pattern} abc.txt)
NB2=$(grep -v -c -E -e '^\".*\"$' abc.txt)
echo " -> $NB , $NB2"
Besides what's in the code, I've tried:
NB=$(grep -c -E -v -e $(echo $pattern) abc.txt)
No success.
cmd="grep -c -E -v -e ${pattern} abc.txt"
NB="$($cmd)"
No success.
In the example, abc.txt file contains 3 lines:
"abc"
"abc
abc"
The pattern in the variable seems ok:
pattern : '^\".*\"$'
I'm expecting that the 2 numbers in NB and NB2 are the same. If you look in the code, the actual result is:
pattern : '^\".*\"$'
-> 3 , 2
I expect:
pattern : '^\".*\"$'
-> 2 , 2
NB2=$(grep -v -c -E -e '^\".*\"$' abc.txt)
If that works, then assign that exact regex to $pattern. Don't add more backslashes and quotes.
pattern='^\".*\"$'
It's always a good idea to quote variable expansions to prevent unwanted wildcard expansion and word splitting.
NB=$(grep -c -E -v -e "${pattern}" abc.txt)
# ^ ^
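To see the difference, the two variants can be run side by side. A small sketch, assuming a throwaway copy of the three-line abc.txt from the question; count_not_matching is a hypothetical helper name. Note that inside single quotes the double quotes need no backslashes at all:

```shell
# Hypothetical helper: count the lines of FILE that do NOT match PATTERN.
# The quoted "$1" hands the pattern to grep as one unchanged argument.
count_not_matching() {
  grep -c -E -v -e "$1" "$2"
}

# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' '"abc"' '"abc' 'abc"' > "$sample"

good='^".*"$'       # the bare regex: a line that starts and ends with "
bad="'^\".*\"\$'"   # the same regex wrapped in literal single quotes

count_not_matching "$good" "$sample"   # prints 2
count_not_matching "$bad" "$sample"    # prints 3: the stray ' never matches

rm -f "$sample"
```

The second call reproduces the result from the question: the quotes stored inside the variable become part of the regex, so no line matches and -v counts all three.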

BASH sed does not evaluate array

So i have an array with filenames.
for i in "${!array_FILE[@]}"; do
printf "%s\t%s\n" "$i" "${array_FILE[$i]}"
sed -e "s/\${USERNAME_VAR}/$USERNAME_VAR/" -e "s/\${USERNAME}/$USERNAME/" template > array_FILE[$i].js
done
The printf works and gives me
0 app_calander
1 app_contacts
2 app_search
3 app_index
but the files created are:
array_FILE[0].js
array_FILE[1].js
array_FILE[2].js
array_FILE[3].js
instead of
app_calander.js
app_contacts.js
app_search.js
app_index.js
If you can help me i appreciate it, it has to be changed by index because i have two array and i need to change values at the same index.
My temporary solution is:
filename="${array_FILE[$i]}"
sed -e "s/\${USERNAME_VAR}/$USERNAME_VAR/" -e "s/\${USERNAME}/$USERNAME/" template > $filename.js
but i was wondering if there is a better way!
REAL ISSUE
Now the real issue is when i try to pass a URL
for i in "${!array_FILE[@]}"; do
#printf "%s\t%s\n" "$i" "${array_FILE[$i]}"
filename="${array_FILE[$i]}"
url="${array_URL[$i]}"
sed -e "s/\${USERNAME_VAR}/$USERNAME_VAR/" -e "s/\${USERNAME}/$USERNAME/" -e "s/\${URL}/'$url'/" template > $filename.js
done
sed: -e expression #3, char 18: unknown option to `s'
sed: -e expression #3, char 18: unknown option to `s'
sed: -e expression #3, char 18: unknown option to `s'
sed: -e expression #3, char 18: unknown option to `s'
sample value of url is URL:https://example.com/app/index.html
EDIT FOR CLARIFICATION
data.txt
USERNAME_VAR:input_username
USERNAME:user01
PASSWORD_VAR:input_password
PASSWORD:password1
SUBMIT:submit
AUTH:cas
URL:https://example.com/app/calander.html
FILE:app_calander
URL:https://example.com/app/contacts.html
FILE:app_contacts
URL:https://example.com/app/search.html
FILE:app_search
URL:https://example.com/app/index.html
FILE:app_index
template
${USERNAME_VAR} = ${USERNAME}
${SUBMIT} IS TRUE
${PASSWORD_VAR} = ${PASSWORD}
${AUTH} = AUTH IS
URL TO HIT IS ${URL}
inject.sh
#!/bin/bash
USERNAME_VAR=($(grep -o 'USERNAME_VAR.*' data.txt | cut -f2- -d':'))
USERNAME=($(grep -o 'USERNAME.*' data.txt | grep -v 'VAR.*' | cut -f2- -d':'))
echo $USERNAME_VAR
echo $USERNAME
array_URL=($(grep -o 'URL.*' data.txt | cut -f2- -d':'))
array_FILE=($(grep -o 'FILE.*' data.txt | cut -f2- -d':'))
for i in "${!array_FILE[@]}"; do
#printf "%s\t%s\n" "$i" "${array_FILE[$i]}"
FILENAME="${array_FILE[$i]}"
URL="${array_URL[$i]}"
echo $URL
sed -e "s/\${URL}/$URL/" -e "s/\${USERNAME_VAR}/$USERNAME_VAR/" -e "s/\${USERNAME}/$USERNAME/" template > $FILENAME.js
done
Continuing from the comment, you could do something like:
for i in "${!array_FILE[@]}"; do
#printf "%s\t%s\n" "$i" "${array_FILE[$i]}"
filename="${array_FILE[$i]}"
url="${array_URL[$i]}"
sed -e "s#\${USERNAME_VAR}#$USERNAME_VAR#" \
-e "s#\${USERNAME}#$USERNAME#" \
-e "s#\${URL}#$url#" template > $filename.js
done
The answer to the first question is staring right at you: "${array_FILE[$i]}" is obviously different from "array_FILE[$i]"
To understand the "REAL ISSUE", just look at the error messages. They are telling you the problem is with the third sed expression, which assumes that $url does not have a "/" in it.
Unless you are certain that $USERNAME, $USERNAME_VAR and $url do not have "/" in them, then those sed commands will not work in the way you seem to expect.
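Switching the delimiter only moves the problem, since a value may contain the new delimiter too (a URL can contain #, for instance). One way around that is to escape the value before handing it to sed. A minimal sketch, assuming the values contain no newlines; escape_sed_replacement and fill_placeholder are hypothetical helper names:

```shell
# Escape the characters that are special in the replacement half of
# s/.../.../: the backslash, the "/" delimiter and "&".
escape_sed_replacement() {
  printf '%s\n' "$1" | sed 's,[\\/&],\\&,g'
}

# Hypothetical helper: replace every ${NAME} read from stdin with VALUE.
# Usage: fill_placeholder NAME VALUE < template
fill_placeholder() {
  esc=$(escape_sed_replacement "$2")
  sed 's/\${'"$1"'}/'"$esc"'/g'
}

printf 'URL TO HIT IS ${URL}\n' |
  fill_placeholder URL 'https://example.com/app/index.html?a=1&b=2'
# prints: URL TO HIT IS https://example.com/app/index.html?a=1&b=2
```

Several placeholders can be filled by chaining calls in a pipeline, for example fill_placeholder URL "$url" < template | fill_placeholder USERNAME "$USERNAME" > "$filename.js"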

Resources