Jsoup - not able to get desired output, help required - xpath

I am new to Jsoup. While trying a few exercises, I came across a scenario where I was not able to fetch the product links from the URL below.
Original URL - https://www.amazon.co.jp/gp/new-releases/digital-text/2275256051/ref=zg_bsnr_2275256051_pg_1?ie=UTF8&pg=1
The selected node, pasted for reference:
<div class="zg_title">
僕が本当に好きな和食
</div>
my Code
Elements ele = doc.select("div.zg_title > a ");
for (org.jsoup.nodes.Element element : ele)
{
System.out.println(element.toString());
}
Required Output
https://www.amazon.co.jp/%E5%83%95%E3%81%8C%E6%9C%AC%E5%BD%93%E3%81%AB%E5%A5%BD%E3%81%8D%E3%81%AA%E5%92%8C%E9%A3%9F-%E7%AC%A0%E5%8E%9F-%E5%B0%86%E5%BC%98-ebook/dp/B01LYCVBW3/ref=zg_bsnr_2275256051_1
I get the correct output with xpath - "//div[@class='zg_title']//a/@href"
How can I do this with Jsoup?

Here it is:
Elements ele = doc.select("div.zg_title > a");
for (org.jsoup.nodes.Element element : ele) {
System.out.println(element.absUrl("href"));
}
Things to check:
CSS query (you have an additional space in the query);
if you want to retrieve the href attribute you should use the element.absUrl("href") method.
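To illustrate what turning an href into an absolute URL means, here is a minimal JDK-only sketch. It does not use Jsoup itself; it only mimics the resolution step with java.net.URI, which should behave the same way for simple cases like this one. The relative href below is hypothetical (the anchor in the pasted node was lost), and the class and method names are made up for the example.

```java
import java.net.URI;

public class AbsUrlSketch {
    // Resolve an href against the page's base URI, similar in effect
    // to what element.absUrl("href") returns for a relative link.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "https://www.amazon.co.jp/gp/new-releases/digital-text/2275256051/";
        // Hypothetical relative href, as it might appear in the page source.
        System.out.println(resolve(base, "/dp/B01LYCVBW3/ref=zg_bsnr_2275256051_1"));
        // An href that is already absolute is returned unchanged.
        System.out.println(resolve(base, "https://example.com/x"));
    }
}
```

By contrast, element.attr("href") would return the attribute value exactly as written in the page, relative or not.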


Ckeditor in Drupal 8 : how to remove <span> tags if they don't have class attributes?

I'm using the "Allowed html tags" filter in Ckeditor - Drupal 8.
I want Ckeditor to keep <span> tags that have specific classes or IDs, and to remove them if they have no attributes.
For example :
Keep span: <span class="apple">text sample</span>
Keep span : <span id="fruit">text sample</span>
Remove span : <span>text sample</span> -> text sample
Currently, when I configure a text format, I have this code in the allowed tags field:
<p><sup><sub><span id class="apple"><a href !href accesskey id rel target title>
It keeps <span> with IDs or wanted classes, but I cannot get rid of the unwanted <span> with no attribute.
Is there any way to solve this problem with code input?
Thanks in advance,
Emilie
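Here is the custom module I wrote to make it work and to get around this major bug in CKEditor:
<?php
use Drupal\editor\Entity\Editor;

/**
 * Implements hook_editor_js_settings_alter().
 */
function MODULENAME_editor_js_settings_alter(array &$settings) {
  // Replace this with the machine name of your text format.
  $format = 'machine_name_of_your_text_editor_profile';
  if (isset($settings['editor']['formats'][$format])) {
    // ACF rules are separated by semicolons; "!" marks a required attribute or class.
    // PHP concatenates strings with ".", not "+".
    $settings['editor']['formats'][$format]['editorSettings']['allowedContent'] =
      'p sup h1 h2 h3; ' .
      'span[!id]; ' .
      'span(!foo); span(!bar); span(!jane); span(!doe);';
  }
}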
Result: spans are removed entirely if they have no ID, or if they use a class that is not mentioned in this list (foo, bar, jane or doe). You must declare every element you need to be displayed, because this config overwrites all previous inputs in the ACF field.
For this solution, I was inspired by:
The ACF Custom doc: https://ckeditor.com/docs/ckeditor4/latest/examples/acfcustom.html
A thread about hook_editor_js_settings_alter: https://drupal.stackexchange.com/questions/268311/hook-editor-js-setting...
Note: the "Limit allowed HTML tags and correct faulty HTML" filter (in /admin/config/content/formats) does not act consistently with the CKEditor API. Only some of the options are actually implemented in this field, and uses of "!" don't work. This is why the solution provided uses "hook_editor_js_settings_alter".
function MODULENAME_editor_js_settings_alter(array &$settings) {
$formats = ['basic_html', 'full_html'];
foreach ($formats as $format) {
$settings['editor']['formats'][$format]['editorSettings']['allowedContent']['span']['attributes'] = '!class';
}
}
allowedContent is an array when loaded by Drupal. Instead of replacing it with a string, you can use the ACF rules to specify whether attributes are required. This allows the config from the UI to still apply.

nightwatch select current element's inner div

I'm new to Nightwatch and was wondering if there's a good way to select an inner element of the current element and then get its text. Assuming I have the following markup, I'm trying to retrieve the text inside the (a) tag of each (li).
So I would like to get 'text to retrieve' and 'text to retrieve 2'.
...
<div class="mywrapperhere">
<ul>
<li>
<a>.....
<div>text to retrieve</div>
</a>
</li>
<li>
<a>.....
<div>text to retrieve 2</div>
</a>
</li>
<li>...
...
</div>
I'm thinking along these lines..
module.exports = {
'Demo test 1' : function (browser) {
....
//some sort of selector then gets from the anchor list
...'.mywrapperhere li a') : {
..
//for each element of the anchor..
{
//is there anyway to get it through something like
element.('div').innerHTML eg..
//or am i forced to use browser.execute( ...getElementsByTag method
//to achieve this?
}
}
browser.end();
}
};
Looking at the Nightwatch API, I couldn't find anything that allows me to do that. In particular, the 'Element State' examples don't seem to offer a way to select a child element of the current element:
http://nightwatchjs.org/api/elementIdAttribute.html
The reason I have to loop at the anchor tag level is that I'll need to retrieve a few more pieces of data besides the one from the div tag. Thanks!
You can use elementIdElement and elementIdText to get text from a child element. First you can get all the li elements by using .elements(). Then you use elementIdElement to get a child element. Then you can use elementIdText to get the text of this child element. Here is an example that will allow you to get the text of both list items in your snippet and log the values to the console.
browser.elements('css selector', 'li', function(listItems) {
  listItems.value.forEach(function(listItem) {
    // Find the <a> inside this <li>
    browser.elementIdElement(listItem.ELEMENT, 'css selector', 'a', function(anchor) {
      // The child element's id is on anchor.value, not on anchor itself
      browser.elementIdText(anchor.value.ELEMENT, function(text) {
        console.log(text.value);
      });
    });
  }, browser); //have to pass in browser for scoping
});

How to refer Element using xpath inside multiple <li> that in turn contains <a> tag

How can I refer to multiple elements under an li tag using XPath?
<div id="accordian">
<ul>
<li>
<h3 class="classroom"></h3>
<ul style="display: block;">
<li>name1</li>
<li>name2</li>
<li>name3</li>
<li>name4</li>
</ul>
</li>
I am using Selenium WebDriver. I tried the following code to refer to the elements, but it returns a blank value.
List<WebElement> listelement=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li"));
for(WebElement list: listelement)
{
    System.out.println(list.getText());
}
List<WebElement> list=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li"));
Just add an a tag at the end of your XPath and it will work:
"//div[@id='accordian']/ul/li/ul/li/a"
*** This is a comment as I don't have access to Comments section ****
Hi,
Limit the XPath to /ul and don't use the final /li. It will return the list, and you can then iterate over its child elements.
xpath("//div[@id='accordian']/ul/li/ul")
I have doubts about the XPath you tried, but below is a way you can achieve it.
List<WebElement> list=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li"));
System.out.println("No of names present="+ list.size());
// use of for loop for iteration
for(int i=0;i<list.size();i++)
{
System.out.println(list.get(i).getText());
}
System.out.println("-------------------------");
//use of for each for iteration
for(WebElement wb: list)
System.out.println(wb.getText());
Call getText() on the a tag elements. I always prefer using CSS over XPath, so here is my solution:
By byCss = By.cssSelector("#accordian>ul>li>ul>li>a");
List<WebElement> listElement = driver.findElements(byCss);
for(WebElement list: listElement)
{
System.out.println(list.getText());
}
I had the same problem and struggled with it for a few days. You can use the href attribute to fetch the elements as well. You can also try targeting the 'a' tag. It will be something like this:
List<WebElement> listelement=driver.findElements(By.xpath("//div[@id='accordian']/ul/li/ul/li/a"));
for(WebElement list: listelement) {
String name= list.getAttribute("href");
System.out.println(name);
}
---should be comment, but don't have enough reputation---
I tried your solution on given HTML,
For me it works fine with chromedriver and Firefox (printing all four values from the list).
With the InternetExplorer driver I am not able to get the values, but that is because listElement.size() is 0.
You can try
element.getAttribute("value") or elem.getAttribute("innerHTML");
to check what is happening here.
swapnil, your code with xpath is working for me, I get all the 4 elements, still you can try this as well
List<WebElement> listElements = driver.findElements(By.tagName("a"));
for(WebElement a : listElements){
System.out.println(a.getText());
}
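As a cross-check outside the browser, the XPath from the question can be evaluated with the JDK's own XPath engine. This is only a sketch: it parses a well-formed copy of the posted markup (the closing tags for the outer li, ul and div are assumed, since the snippet was cut off), and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AccordianXPathCheck {
    // Well-formed copy of the markup from the question; the closing
    // tags for the outer <li>, <ul> and <div> are assumed.
    static final String HTML =
        "<div id=\"accordian\"><ul><li>"
        + "<h3 class=\"classroom\"></h3>"
        + "<ul style=\"display: block;\">"
        + "<li>name1</li><li>name2</li><li>name3</li><li>name4</li>"
        + "</ul></li></ul></div>";

    // Return the text content of every node matched by the expression.
    static List<String> names(String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(HTML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Matches the four inner <li> elements.
        System.out.println(names("//div[@id='accordian']/ul/li/ul/li"));
        // Matches nothing here: the posted markup has no <a> inside the <li>.
        System.out.println(names("//div[@id='accordian']/ul/li/ul/li/a"));
    }
}
```

Against the markup as posted, the original expression selects all four items, while the variants ending in /a select nothing; they would only match if the real page wraps each name in an anchor, which the snippet does not show.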

read class attribute in CasperJs

I'm wondering whether it's possible to get the class value of an li item. The HTML looks something like this:
<div id="cardsdeck">
<ul id="cards">
<li id="card-0" class="card-image card-shown" .... >
......
I'm trying to get card-shown out of the li.
I'm unsure if this is what you're trying to do, but to get an array of the classes that an element has, you can use:
document.querySelector('#card-0').className.split(' ');
However, if you're trying to get elements that have the card-shown class, then you can use:
document.querySelector('.card-shown');
Edit: better suited for your comment below:
casper.then(function() {
var num = 0;
var shown = this.evaluate(function isShown(k) {
return document.querySelector('#cards li.card-shown').id == ('card-'+k);
}, num);
console.log(shown);
})
This will look for an element with the card-shown class and then check to see if the id matches card-k, with k being a number.

How to perform click event on an element present in the anchor tag?

<div class="buttonClear_bottomRight">
<div class="buttonBlueOnWhite">
<a onclick="$find('{0}').close(true); callPostBackFromAlert();" href="#">Ok</a><div
class='rightImg'>
</div>
</div>
</div>
In the above code I want to click on the Ok button in the anchor tag, but no id is generated for it, so I cannot perform a click action on it directly. I tried the workaround below.
IElementContainer elm_container = (IElementContainer)pw.Element(Find.ByClass(classname));
foreach (Element element in elm_container.Elements)
{
if (element.TagName.ToString().ToUpper() == "A")
{
element.Click();
}
}
But here elm_container returns null in the initial instances, so we cannot traverse through it. Is there any other easy method to do it?
Try this...
Div div = browser.Div(Find.ByClass("buttonClear_bottomRight")).Div(Find.ByClass("buttonBlueOnWhite"));
Debug.Assert(div.Exists);
Link link = div.Link(lnk => lnk.GetAttributeValue("onclick").ToLower().Contains(".close(true)"));
Debug.Assert(link.Exists);
link.Click();
Hope it helps!
You can simply click on the link by finding its text:
var OkButton = Browser.Link(Find.ByText("Ok"));
if(!OkButton.Exists)
{
    // Log error here
}
OkButton.Click();
Browser.WaitForComplete();
Or you can find the div containing the link, like this:
var ContainerDiv = Browser.Div(Find.ByClass("buttonBlueOnWhite"));
if(!ContainerDiv.Exists)
{
    // Log error here
}
ContainerDiv.Links.First().Click();
Browser.WaitForComplete();
