Hi, I have the markup structure below. I've grabbed just one column from the entire table.
Within the td with the "mon" class, every tag other than the ones with the "monTime" class will be hidden (that part is done with CSS). All the "staffBox" divs should then be re-sorted according to the value in the p tag with the "monTime" class.
So in the example below, the order should be Vanessa, Adele, then Zoe (sorted naturally by their starting time in "monTime").
I don't mind changing the class name structure or anything else if needed.
<td class="mon">
<div class="staffBox">
<h4>Adele</h4>
<p class="monTime">7AM - 7AM</p>
<p class="tueTime">7AM - 6AM</p>
<p class="wedTime">12AM - 5AM</p>
<p class="thuTime">8AM - 12AM</p>
<p class="friTime">6AM - 12AM</p>
<p class="satTime">12AM - 10AM</p>
<p class="sunTime">12AM - 9AM</p>
</div>
<div class="staffBox">
<h4>Zoe</h4>
<p class="monTime">1PM - 6PM</p>
<p class="tueTime"> - </p>
<p class="wedTime"> - </p>
<p class="thuTime"> - </p>
<p class="friTime"> - </p>
<p class="satTime"> - </p>
<p class="sunTime"> - </p>
</div>
<div class="staffBox">
<h4>Vanessa</h4>
<p class="monTime">3AM - 6AM</p>
<p class="tueTime"> - </p>
<p class="wedTime"> - </p>
<p class="thuTime"> - </p>
<p class="friTime"> - </p>
<p class="satTime"> - </p>
<p class="sunTime"> - </p>
</div>
</td>
Have a look at the jQuery tablesorter plugin, it's what I always use for sorting.
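If a full plugin is more than you need, the same reordering can be sketched with plain DOM calls. This is only a minimal sketch, not the tablesorter API: startMinutes and sortByStart are hypothetical helper names, and it assumes every time range starts with something like "7AM" or "1PM" (empty ranges such as " - " are pushed to the end).

```javascript
// Convert the start of a range like "7AM - 7AM" into minutes after midnight.
// Ranges that do not start with a time (e.g. " - ") sort last.
function startMinutes(range) {
  const m = /^\s*(\d{1,2})(?::(\d{2}))?\s*(AM|PM)/i.exec(range);
  if (!m) return Number.MAX_SAFE_INTEGER;
  // 12AM is hour 0 and 12PM is hour 12, hence the modulo.
  const hour = (Number(m[1]) % 12) + (m[3].toUpperCase() === 'PM' ? 12 : 0);
  return hour * 60 + Number(m[2] || 0);
}

// Return a sorted copy of items; getRange extracts the time text from an item.
function sortByStart(items, getRange) {
  return items
    .slice()
    .sort((a, b) => startMinutes(getRange(a)) - startMinutes(getRange(b)));
}

// Browser-only part: re-append the boxes in sorted order.
// appendChild moves an existing node, so nothing is duplicated.
// Assumes every .staffBox contains a .monTime paragraph, as in the question.
if (typeof document !== 'undefined') {
  const cell = document.querySelector('td.mon');
  if (cell) {
    const boxes = Array.from(cell.querySelectorAll('.staffBox'));
    sortByStart(boxes, (box) => box.querySelector('.monTime').textContent)
      .forEach((box) => cell.appendChild(box));
  }
}
```

With the markup from the question this puts Vanessa (3AM) first, then Adele (7AM), then Zoe (1PM). For another day, swap 'td.mon' and '.monTime' for the matching class names.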
I am using laravel-mpdf (https://github.com/mccarlosen/laravel-mpdf/issues) to generate a PDF.
Everything works except when I use multiple footers:
right now the footers show only on page 1 (MyFooter1) and page 2 (MyFooter2).
I need MyFooter2 to show on every page starting from page 2. I am looking for the correct syntax; my code is below.
I used <sethtmlpagefooter name="MyFooter2" page="ON" value="1" write="true" /> to show the footer, but it shows only on the page where the tag appears.
<!DOCTYPE html>
<html>
<head>
<title>My title</title>
</head>
@include('pdf.css2')
<body>
<htmlpagefooter name="MyFooter1" style="display:none">
<div class="print-footer">
<div class="footer-section-1" >
<div style="width:20% ; float:left; padding-right: 5px;">
My company name
</div>
<div style="width:20% ;float:left; padding-right: 5px;">
<div> <b>T</b><span> : +98 123 123</span></div>
<div> <b>T/F</b><span>: +98 123 123</span></div>
<div> <b>Email</b><span>: cdoom@cc.co</span></div>
</div>
<div style="width:35% ; float:left; padding-right: 5px;">
<div> <b>Head Office</b></div>
<div> <span>test test,</span> </div>
<div> <span>Road, test, </span></div>
<div> <span> USA</span></div>
<div> <span>P.O. Box 211265 </span></div>
</div>
<div style="width:11% ; float:left; padding-right: 5px;">
<div><b>Branches</b></div>
<div><span>B1</span></div>
<div><span>B2</span></div>
</div>
<div style=" float:left; padding-right: 20px;padding-top: 20px;">
WE ARE THE BEST
</div>
</div>
</div>
</htmlpagefooter>
<htmlpagefooter name="MyFooter2" style="display:none">
<div class="print-footer">
My company name
</div>
</htmlpagefooter>
<div class="page1" >
<div style=" text-align: center ;" >
<div class="logo">
@include('pdf.company logo')
</div>
<div class="title" >
<span class="span-title" > <b>INDIVIDUAL OFFER </b> </span>
<div class="div-name" ><span class="span-name" >client </span></div>
<div>
<div class="info" style="width:50% ;float:left; padding-right: 5px;">DATE: December 2, 2020</div>
<div class="info" style="float:right; padding-right: 5px;">REFERENCE NUMBER: vd 123 xl</div>
</div>
</div>
</div>
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<sethtmlpagefooter name="MyFooter1" value="1" />
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<sethtmlpagefooter name="MyFooter2" page="ON" value="1" write="true" />
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
</body>
</html>
In the CSS, I used this:
@page {
    footer: html_MyFooter2;
    margin-bottom: 170px;
}
@page :first {
    footer: html_MyFooter1;
}
and in the HTML I removed:
<sethtmlpagefooter name="MyFooter1" value="1" />
<sethtmlpagefooter name="MyFooter2" value="1" />
This fixed the issue for me.
Reference: the 3rd answer (by Ningappa) to "Mpdf different header for first page".
I would like to parse schema data using XPATH.
Here's a simple structure.
<div itemscope itemtype="http://www.schema.org/Product">
<div itemscope itemtype="http://www.schema.org/Person">
<span itemprop="birthday" datetime="2009-05-10">May 10th 2009</span>
</div>
<div itemprop="name"> Product name </div>
<div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
<span itemprop="price" content="500.00"> USD 500 </span>
</div>
</div>
The result I would like to parse is like this:
1. Category: http://www.schema.org/Product
v name: Product name
v Offers
- price: USD 500
2. Category: http://www.schema.org/Person
v birthday: May 10th 2009
To categorize "http://www.schema.org/Product" and "http://www.schema.org/Person", I used this code:
var category = $x("//*[@itemtype and not(@itemprop)]");
So category[0]:
<div itemscope itemtype="http://www.schema.org/Product">
<div itemscope itemtype="http://www.schema.org/Person">
<span itemprop="birthday" datetime="2009-05-10">May 10th 2009</span>
</div>
<div itemprop="name"> Product name </div>
<div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
<span itemprop="price" content="500.00"> USD 500 </span>
</div>
</div>
Category[1]:
<div itemscope itemtype="http://www.schema.org/Person">
<span itemprop="birthday" datetime="2009-05-10">May 10th 2009</span>
</div>
Before parsing the itemprop values, I have to remove this nested block from category[0] to avoid duplicated data:
...
<div itemscope itemtype="http://www.schema.org/Person">
<span itemprop="birthday" datetime="2009-05-10">May 10th 2009</span>
</div>
...
How can I exclude that nested item from category[0]?
The final expression I would like to build is something like:
Select category[0], but not ([contains(@itemtype,'schema.org/') and not(@itemprop)]/descendant-or-self::*)
Please shed some light on the matter.
Thank you :)
I'm trying to scrape user review data from a website. I hope to end up with two columns of data (ratings and reviews).
Here is a sample XML file that emulates my scraping problem. I tried it on https://www.freeformatter.com/xpath-tester.html#ad-output to get the outputs.
<root>
<div class="user-review">
<div class="rating"> 5,0 </div>
<p class="review-content"> Reiew text of item/movie.
<span class="details">
<span class="details-header">Detail: </span>
<span class="details-content">Some details to emphasis</span>
</span>
Continue to review
</p>
</div>
<div class="user-review">
<div class="rating"> 4,0 </div>
<p class="review-content">Reiew text of item/movie.
</p>
</div>
<div class="user-review">
<div class="rating"> 4,0 </div>
<p class="review-content">Reiew text of item/movie.
</p>
</div>
</root>
I can get the 3 rating values with the query below.
/root/div/div[@class="rating"]/text()
Output:
Text=' 5,0 '
Text=' 4,0 '
Text=' 4,0 '
When I try to get the review part, the first text is split into 2 sections. Because of that I end up with two lists of different sizes (3 ratings and 4 reviews) and cannot match reviews with ratings.
//p[@class="review-content"]/text()
Output:
Text=' Reiew text of item/movie.
'
Text='
Continue to review
'
Text='Reiew text of item/movie.
'
Text='Reiew text of item/movie.
Can anybody help me get one of my expected outputs?
Expected output1:
Text=' Reiew text of item/movie.
Continue to review
'
Text='Reiew text of item/movie.
'
Text='Reiew text of item/movie.
Expected output2:
Text=' Reiew text of item/movie. Some details to emphasis
Continue to review
'
Text='Reiew text of item/movie.
'
Text='Reiew text of item/movie.
Try this; sel here is the selector (in your case it may be response):
tags = sel.xpath('//p[@class="review-content"]')
reviews = []
for tag in tags:
    # Join every text node under the <p>, including the nested spans.
    text = " ".join(tag.xpath('.//text()').extract())
    reviews.append(text)
You'll have to loop over div elements with user-review class and extract the review content from each of these. If you want a one-liner, look at this:
import scrapy
text = """
<root>
<div class="user-review">
<div class="rating"> 5,0 </div>
<p class="review-content"> Reiew text of item/movie.
<span class="details">
<span class="details-header">Detail: </span>
<span class="details-content">Some details to emphasis</span>
</span>
Continue to review
</p>
</div>
<div class="user-review">
<div class="rating"> 4,0 </div>
<p class="review-content">Reiew text of item/movie.
</p>
</div>
<div class="user-review">
<div class="rating"> 4,0 </div>
<p class="review-content">Reiew text of item/movie.
</p>
</div>
</root>
"""
selector = scrapy.Selector(text=text)
review_content = [
    review.xpath('normalize-space(.//p[@class="review-content"])').extract_first()
    for review in selector.xpath('//div[@class="user-review"]')
]
<tr><td class=term>1st param</td>
<td>PUTIN
<div class='info-icon'>
<a href='#' onmouseover='show_pd(351);' onmouseout='hide_pd(351);' id='info-icon-351'></a>
</div>
<div id='pd-351' style='display: none; position: absolute;'>
<b>СПРАВКА</b>
<br /><br />
<P align=justify><NOBR><STRONG>ABS</STRONG></NOBR>bla-bla-bla text</P>
<P align=justify>bla-bla-bla text 2</P>
<P align=justify>bla-bla-bla text 3</P>
<P align=justify>bla-bla-bla text 4</P>
</div>
</td>
I need to extract only "PUTIN".
So far I have:
//td[@class="term"][contains(text(), "1st param")]/following-sibling::td/[not(self::p)]
With some adjustments to your XML, the following XPath
//td[@class="term"][contains(text(), "1st param")]/following-sibling::td/node()[1]
returns PUTIN.
The adjustments were changing <td class=term> to <td class="term"> and every <P align=justify> to <P align="justify"> (maybe not necessary in your setup, but required by the XPath evaluator I used).
I receive HTML like the one below from a server. I rebuild the textual part by running the XPath expression @"//text()" and appending each node's "nodeContent" value to a string. The code is something like this:
for (int i = 2; i < [resultXPathQuery count]; i++) {
    [mytext appendString:[[resultXPathQuery objectAtIndex:i] objectForKey:@"nodeContent"]];
    [mytext appendString:@"\n"];
}
I obtain:
Line 1
line 2
line 3
line 4
How could I build the textual part so that the empty nodes are also taken into account?
I would like to obtain:
Line 1
line 2

line 3



line 4
<html><head><title>A title</title><style type="text/css">
ol{margin:0;padding:0}p{margin:0}
.c0{font-size:12pt;background-color:#ffffff;font-family:Times New Roman}
.c6{width:432.0pt;background-color:#ffffff;padding:72.0pt 90.0pt 72.0pt 90.0pt}
.c7{color:#aaaaaa;font-family:Times New Roman}
.c3{color:#0000ee;text-decoration:underline}
.c5{color:inherit;text-decoration:inherit}
.c2{font-size:12pt;font-family:Times New Roman}
.c4{height:12pt}.c1{direction:ltr}
body{color:#000000;font-size:12pt;font-family:Times New Roman}
h1{padding-top:12.0pt;line-height:1.0;text-align:left;color:#000000;font-size:24pt;font-family:Times New Roman;font-weight:bold;padding-bottom:12.0pt}
h2{padding-top:11.25pt;line-height:1.0;text-align:left;color:#000000;font-size:18pt;font-family:Times New Roman;font-weight:bold;padding-bottom:11.25pt}
h3{padding-top:12.0pt;line-height:1.0;text-align:left;color:#000000;font-size:14pt;font-family:Times New Roman;font-weight:bold;padding-bottom:12.0pt}
h4{padding-top:12.75pt;line-height:1.0;text-align:left;color:#000000;font-size:12pt;font-family:Times New Roman;font-weight:bold;padding-bottom:12.75pt}
h5{padding-top:12.75pt;line-height:1.0;text-align:left;color:#000000;font-size:9pt;font-family:Times New Roman;font-weight:bold;padding-bottom:12.75pt}
h6{padding-top:18.0pt;line-height:1.0;text-align:left;color:#000000;font-size:8pt;font-family:Times New Roman;font-weight:bold;padding-bottom:18.0pt}</style>
</head>
<body class="c6">
<p class="c1"><span class="c2">A title</span></p>
<p class="c1 c4"><span class="c2"></span></p>
<p class="c4 c1"><span class="c2"></span></p>
<p class="c1"><span class="c7">Line 1</span></p>
<p class="c1"><span class="c7">line 2</span></p>
<p class="c4 c1"><span class="c7"></span></p>
<p class="c1"><span class="c7">line 3</span></p>
<p class="c4 c1"><span class="c7"></span></p>
<p class="c4 c1"><span class="c7"></span></p>
<p class="c3 c2"><span class="c1"></span></p>
<p class="c1"><span class="c7">line 4</span></p>
</body></html>
EDIT
Actually, I noticed that the HTML can be more "complicated", so selecting all the span elements or p elements is not enough. Moreover, several span elements can appear in the same p element, and in that case I must not start a new line in my string.
This is the body of a more complicated returned HTML:
<body class="c13">
<p class="c5"><span>gfgfgfd</span></p>
<p class="c1"><span></span></p>
<p class="c5 c10"><span>ghhgfhgfh hghg hgkfhjgk ghjgkh ghjgjhg gjhjg gjhj gjhgjhgjhg gfhjkgjg jghjgfhjgf fghfj jghfj fghjggf jhgjgjgkjg</span></p>
<p class="c1 c10"><span></span></p>
<p class="c4"><span>gfgfgfd</span></p>
<p class="c4"><span>f</span></p>
<p class="c4">
<span>gfdgfdg</span>
<span class="c7">hg</span></p>
<p class="c4"><span class="c7">ghgfhgfh</span></p>
<p class="c4"><span class="c7">gfhgfhgf</span></p>
<p class="c5">
<span class="c7">hgfh </span>
<span class="c0">gfdgfg</span></p>
<p class="c5"><span class="c0">fgfdgfdgfd</span></p>
<p class="c5"><span class="c0">gdfgdfgfd</span></p>
<p class="c5"><span class="c0">gfgf</span></p>
<p class="c1"><span class="c0"></span></p>
<p class="c5"><span class="c0 c8"><a class="c12" href="http://www.google.com">www.google.com</a></span></p>
<p class="c1"><span class="c0"></span></p>
<p class="c5"><span class="c0">fgfdgfdg</span></p>
<p class="c5">
<span class="c0">fgffgfdgfg</span>
<span class="c0 c11">gfgfdgfd fgd fd</span>
<span class="c0">fdgfdg</span></p>
<p class="c5"><span class="c0">fgfdgfdgf</span></p>
<p class="c5"><span class="c0">gfd</span></p>
<p class="c5"><span class="c0">gfgf</span></p>
<p class="c1"><span class="c0"></span></p>
<p class="c5"><span class="c0 c8"><a class="c12" href="mailto:….">...</a></span></p>
<p class="c1"><span class="c0"></span></p>
<ol class="c9" start="1">
<li class="c3"><span class="c0">gfgfd</span></li>
<li class="c3"><span class="c0">gfdgfd</span></li>
<li class="c3"><span class="c0">gfdgfd</span></li>
<li class="c3"><span class="c0">gdfgfd</span></li>
</ol>
<p class="c1"><span class="c0"></span></p>
<p class="c5"><span class="c0">hgfhgf</span></p>
<p class="c5"><span class="c0">gfhgfh</span></p>
<p class="c5"><span class="c0">hgfhgf</span></p>
<p class="c1"><span class="c0"></span></p>
<ol class="c2" start="1">
<li class="c3"><span class="c0">gfhg</span></li>
<li class="c3"><span class="c0">hgfh</span></li>
<li class="c3"><span class="c0">hgf</span></li>
</ol>
<p class="c1"><span class="c0"></span></p>
<h1 class="c5 c15"><a name="h.kafwflosthlg"></a><span class="c7 c14">hgfhgfh</span></h1>
<p class="c1"><span class="c6"></span></p>
<p class="c1"><span class="c6"></span></p>
<p class="c1"><span class="c6"></span></p>
</body>
I need an XPath expression that selects the p, h1, h2, ..., h6 and li elements and handles their inner text in such a way that new lines and empty lines are properly detected.
For the example above you can use //span which will return all the <span> elements regardless of their contents. It looks like you are doing some other filtering also because //text() should also return your CSS block and A Title from the <title> and first <span>.
I would rather use a regex for this one:
1. Grab all the content between the body tags (you can also do that with XPath).
2. Replace </p> by </p>\n.
3. Strip the tags.
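The three steps above could look roughly like this. It is only a sketch written in JavaScript for illustration (the question's own code is Objective-C, but the same regular expressions carry over). htmlToPlainText is a hypothetical name, and the handling of h1-h6 and li is an extension taken from the question's EDIT, not part of the steps as stated.

```javascript
// Sketch of the regex approach: body content -> newline per block -> strip tags.
function htmlToPlainText(html) {
  // 1. Grab the content between the body tags (fall back to the whole input).
  const match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  const body = match ? match[1] : html;

  return body
    // Assumption: whitespace between adjacent tags is only source formatting.
    .replace(/>\s+</g, '><')
    // 2. End each block element (p, h1-h6, li) with a newline.
    .replace(/<\/(?:p|h[1-6]|li)\s*>/gi, '\n')
    // 3. Strip every remaining tag.
    .replace(/<[^>]+>/g, '');
}

const sample =
  '<body class="c6">' +
  '<p class="c1"><span class="c7">line 2</span></p>\n' +
  '<p class="c4 c1"><span class="c7"></span></p>\n' +
  '<p class="c1"><span class="c7">line 3</span></p>' +
  '</body>';

// Prints "line 2\n\nline 3\n": the empty paragraph becomes an empty line.
console.log(JSON.stringify(htmlToPlainText(sample)));
```

Because several spans inside one p only produce a newline at the closing </p>, they stay on the same line. Note that this does not decode HTML entities such as &amp;, and regexes are fragile on malformed markup, so treat it as a starting point.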