XPath: get text with <br>

Let's say that I have this HTML:
<div class="html">
<div class="offer">
text1
<br>
text2
<br>
text3
<br>
text4
<br>
text5
</div>
</div>
I'm trying to get the full text with the XPath //div[@class='offer']/text(), but as a result I only get text1.
I tried adding [preceding-sibling::br], but the result is the same.
What I need:
text1 text2 text3 text4 text5

I fixed it using the extract() method.
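The div holds several separate text nodes: one before each <br> and one after the last. Code that keeps only the first of them returns text1 alone. Here is a minimal sketch of joining all of them with Python's standard xml.etree.ElementTree. It assumes the markup is made well-formed by writing <br/>, because ElementTree is an XML parser and not an HTML parser. The helper name offer_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: <br> is written as <br/> so ElementTree (an XML parser) accepts it.
SNIPPET = """<div class="html">
<div class="offer">
text1
<br/>
text2
<br/>
text3
<br/>
text4
<br/>
text5
</div>
</div>"""


def offer_text(markup):
    """Join every text node of div.offer (hypothetical helper)."""
    root = ET.fromstring(markup)
    offer = root.find(".//div[@class='offer']")
    # offer.text holds only "text1"; the other chunks are stored as each <br/>'s tail.
    # itertext() yields all of them in document order.
    parts = [chunk.strip() for chunk in offer.itertext() if chunk.strip()]
    return " ".join(parts)


print(offer_text(SNIPPET))  # text1 text2 text3 text4 text5
```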

Related

Laravel pass dynamic variables in loops

In my Blade file I have the following variables:
text1
text2
text3
text4
text5
So in the Blade file I want to show them in a loop, like:
@for($i = 1; $i <= 5; $i++)
{{text[$i]}}
@endfor
So how can I achieve that?
Try:
{{ ${'text'.$i} }}
I hope this helps.
I think this could help:
text{{ $i }}
You have to use it like this:
@for($i=1;$i<6;$i++)
text{{$i}}
@endfor

Find Only 1st Level XPath Children

I have the following HTML code:
<div>
<div>
<span>test1</span>
</div>
<span>test2</span>
<span>test3</span>
<span>test4</span>
<div>
<span>test5</span>
</div>
<span>test6</span>
</div>
How can I select all span elements that are direct children of the first div (the elements with innerText test2, test3, test4 and test6)?
This XPath will get you what you want:
'//span[not(parent::div[parent::div])]'
xmllint --html --xpath '//span[not(parent::div[parent::div])]' test.html | sed -re 's%(</[^>]+>)%\1\n%g'
<span>test2</span>
<span>test3</span>
<span>test4</span>
<span>test6</span>
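The expression above works by excluding any span whose parent div is itself inside a div. When the outer div can be addressed directly, a single child step is enough. Here is a minimal sketch with Python's standard xml.etree.ElementTree, which supports only a subset of XPath. It treats the snippet as well-formed XML with the outer div as the root element. Both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<div>
<span>test1</span>
</div>
<span>test2</span>
<span>test3</span>
<span>test4</span>
<div>
<span>test5</span>
</div>
<span>test6</span>
</div>"""


def direct_span_texts(markup):
    root = ET.fromstring(markup)  # root is the outer <div>
    # A bare "span" path matches direct children only, like the XPath child step div/span.
    return [span.text for span in root.findall("span")]


def all_span_texts(markup):
    root = ET.fromstring(markup)
    # ".//span" matches descendants at any depth, like //span.
    return [span.text for span in root.findall(".//span")]


print(direct_span_texts(SNIPPET))  # ['test2', 'test3', 'test4', 'test6']
print(all_span_texts(SNIPPET))     # ['test1', 'test2', 'test3', 'test4', 'test5', 'test6']
```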

Multiple occurrences in sed substitution

I am trying to retrieve some data within a specific div tag in my HTML file.
My current HTML code is in the following format:
<div class = "class0">
<div class = "class1">
<div class = "class2">
some text some text
</div>
Some more text
</div>
Too much text
</div>
When I try to extract just the content of the div with class2, using this bash code:
sed -e ':a;N;$!ba
s/[[:space:]]\+/ /g
s/.*<div class\="class2">\(.*\)<\/div>.*/\1/g' test.html > out.html
I get an output HTML file containing:
some text some text </div> Some more text </div> Too much text
I want everything after the first </div> to be removed, but instead the match extends to the final </div>.
Can someone please explain my mistake?
You could do this in awk:
awk '/class2/,/<\/div>/ {a[++i]=$0}END{for (j=2;j<i;++j) print a[j]}' file
Between the lines that match /class2/ and /<\/div>/, write the contents to an array. At the end of the file, loop through the array, skipping the first and last lines.
Instead of making an array, you could check for the first and last lines using a regular expression:
awk '/class2/,/<\/div>/ {if (!/class2|<\/div>/) print}' file
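An awk range pattern /start/,/end/ becomes active on the first line matching start. It stays active through the next line matching end, inclusive, and the end pattern is also tested on the opening line. The Python sketch below illustrates only those semantics for the second awk command; it is not a replacement for awk. The helper name is hypothetical, and the input is the sample markup from the question.

```python
import re

HTML = """<div class = "class0">
<div class = "class1">
<div class = "class2">
some text some text
</div>
Some more text
</div>
Too much text
</div>"""


def inner_range_lines(lines, start, end):
    """Emulate awk '/start/,/end/ {if (!/start|end/) print}' (hypothetical helper)."""
    out = []
    in_range = False
    for line in lines:
        # The range opens on a line matching start.
        if not in_range and re.search(start, line):
            in_range = True
        if in_range:
            # Keep only lines that match neither boundary pattern.
            if not (re.search(start, line) or re.search(end, line)):
                out.append(line)
            # The range closes on a line matching end (the opening line is checked too).
            if re.search(end, line):
                in_range = False
    return out


print(inner_range_lines(HTML.splitlines(), r"class2", r"</div>"))  # ['some text some text']
```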
This works for retrieving the text inside the div class = "class2" tag:
#!/bin/bash
htmlcode='
<div class = "class0">
<div class = "class1">
<div class = "class2">
some text some text
</div>
Some more text
</div>
Too much text
</div>
'
echo $htmlcode |
sed -e's,<,\
<,g' |
grep 'div class = "class2"' |
sed -e's,>,>\
,g'|
grep -v 'div class = "class2"'
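The behaviour described in the question comes from greedy matching. In a pattern like \(.*\)<\/div>, the .* grabs as much as it can and then gives text back only until a </div> fits, so the match ends at the last </div>. POSIX sed regular expressions have no lazy quantifier. Python's re does have one (*?), so here is a short sketch contrasting the two. It runs on the question's markup after whitespace has been collapsed, and the spacing in the opening tag is an assumption taken from the sample.

```python
import re

# The question's markup after sed collapsed all whitespace to single spaces.
FLAT = ('<div class = "class0"> <div class = "class1"> <div class = "class2"> '
        'some text some text </div> Some more text </div> Too much text </div>')

OPEN = r'<div class = "class2">'

# Greedy: (.*) takes as much as possible, so the match ends at the LAST </div>.
greedy = re.search(OPEN + r'(.*)</div>', FLAT).group(1)

# Lazy: (.*?) takes as little as possible, so the match ends at the FIRST </div>.
lazy = re.search(OPEN + r'(.*?)</div>', FLAT).group(1)

print(repr(greedy))  # ' some text some text </div> Some more text </div> Too much text '
print(repr(lazy))    # ' some text some text '
```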

Parse HTML snippet with awk

I am trying to parse an HTML document with awk.
The document contains several <div class="p_header_bottom"></div> blocks:
<div class="p_header_bottom">
<span class="fl_r"></span>
287,489 people
</div>
<div class="p_header_bottom">
<span class="fl_r"></span>
5 links
</div>
I am using
awk '/<div class="p_header_bottom">/,/<\/div>/'
to get all such divs.
How can I get the number 287,489 from the first one?
I tried awk '/<\/span>/,/people/', but it doesn't work correctly.
With gawk, and assuming that the only digits and commas within each <div> </div> block occur in the numeric portion of interest:
awk -v RS='<[/]?div[^>]*>' '/span/ && /people/{gsub(/[^[:digit:],]/, ""); print}' file.txt
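When RS is set to a regular expression, gawk treats every <div ...> or </div> tag as a record separator, so each block becomes one record. The program then filters records and strips characters. The rough Python equivalent below, using only re, makes those three steps explicit. It rests on the same assumption as the answer, and the helper name is hypothetical.

```python
import re

DOC = """<div class="p_header_bottom">
<span class="fl_r"></span>
287,489 people
</div>
<div class="p_header_bottom">
<span class="fl_r"></span>
5 links
</div>"""


def people_counts(text):
    # Like RS='<[/]?div[^>]*>': every <div ...> or </div> tag ends a record.
    records = re.split(r'</?div[^>]*>', text)
    # Like /span/ && /people/: keep records containing both strings.
    wanted = [r for r in records if 'span' in r and 'people' in r]
    # Like gsub(/[^[:digit:],]/, ""): drop everything except digits and commas.
    return [re.sub(r'[^0-9,]', '', r) for r in wanted]


print(people_counts(DOC))  # ['287,489']
```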

Nokogiri: how to parse text fragment?

I have this example:
require 'nokogiri'
html = <<EOT
<div>Some text1
<p>Some text2</p>
</div>
EOT
doc = Nokogiri::HTML(html)
puts doc.css('div').text
This prints:
Some text1
Some text2
But I need only "Some text1".
doc.css('div').children.first.text
# => "Some text1\n "
doc.css('div').children.first.text.rstrip
# => "Some text1"
One XPath expression and a strip will get you there:
some_text1 = doc.xpath('//div/text()[1]').text.strip
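The answers separate two things: the div's first text node, which //div/text()[1] selects, and all of its descendant text, which is what .text on the div returns. Python's standard xml.etree.ElementTree has the same split, shown in the sketch below. It assumes the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

HTML = """<div>Some text1
<p>Some text2</p>
</div>"""

div = ET.fromstring(HTML)
# div.text is only the text before the first child element, i.e. the first text node;
# the content of <p> is not included.
first_text = div.text.strip()
# itertext() walks every descendant text node, roughly what .text gives in Nokogiri.
all_text = " ".join(t.strip() for t in div.itertext() if t.strip())

print(first_text)  # Some text1
print(all_text)    # Some text1 Some text2
```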
