Select attribute in simplexml - simplexml

The xml data looks like this:
<feed>
<entry>
<id>12345</id>
<title>Lorem ipsum</title>
<link type="type1" href="https://foo.bar" />
<link type="type2" href="https://foo2.bar"/>
</entry>
<entry>
<id>56789</id>
<title>ipsum</title>
<link type="type2" href="https://foo4.bar"/>
<link type="type1" href="https://foo3.bar" />
</entry>
</feed>
I want to select the content of the href attribute from a link with a certain type. (Note that the type1 link is not always the first link.)
Part of the code that works:
for($i=0; $i<=5; $i++) {
foreach($xml->entry[$i]->link as $a) {
if($a["type"] == "type2")
$link = (string)($a["href"]);
}
}
However, I wonder if there is a faster and more elegant solution to this that does not require a foreach loop. Any ideas?

Have you tried using XPath? http://php.net/manual/en/simplexmlelement.xpath.php
This lets you search for nodes with a given tag and attribute.
$nodes = $xml->xpath('//link[@type="type2"]');
foreach ($nodes as $node)
{
$link = (string) $node['href'];
}
// Updated
You can skip the loop if you are interested in only the first value. The xpath() method returns an array of SimpleXMLElement objects, so you can use index 0 to retrieve the first element and then read its attribute.
Note - If no element matches, xpath() returns an empty array (and a non-array value on error), so indexing [0] as below will fail. The code is for illustration only, so add your own checks when implementing it.
// This works if the XML always has the required attribute - it will error if the attribute is missing
$link = (string) $xml->xpath('//link[@type="type2"]')[0]['href'];

Use xpath:
$xml->xpath('//link[@type="type2"]');
More on the language at w3.org
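The attribute predicate itself is not specific to PHP. As a quick way to check the expression against the sample feed, here is a minimal sketch using Python's standard library instead of SimpleXML (an assumption made purely for illustration; ElementTree supports only a subset of XPath, and the names FEED, all_type2 and per_entry are made up for this example):

```python
import xml.etree.ElementTree as ET

# The sample feed from the question.
FEED = """<feed>
  <entry>
    <id>12345</id>
    <title>Lorem ipsum</title>
    <link type="type1" href="https://foo.bar" />
    <link type="type2" href="https://foo2.bar"/>
  </entry>
  <entry>
    <id>56789</id>
    <title>ipsum</title>
    <link type="type2" href="https://foo4.bar"/>
    <link type="type1" href="https://foo3.bar" />
  </entry>
</feed>"""

feed = ET.fromstring(FEED)

# Every type2 link anywhere in the feed, in document order.
all_type2 = [link.get("href") for link in feed.findall(".//link[@type='type2']")]

# One href per entry, keyed by the entry's <id>; None if an entry has no type2 link.
per_entry = {}
for entry in feed.findall("entry"):
    link = entry.find("link[@type='type2']")
    per_entry[entry.findtext("id")] = link.get("href") if link is not None else None

print(all_type2)
print(per_entry)
```

The position of the type2 link inside each entry does not matter, because the predicate matches on the attribute value rather than on the index.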

Related

Yii2 translation does not work in config/params

I have the following config/params.php in my yii2-basic-app:
<?php
$siteName = Yii::t('app','Site Name'); //previously, this value had been placed directly in the array just a try to make it available to the translation
return [
'adminEmail' => 'admin@example.com',
'siteName' => $siteName,
'textToPrint' => null,
'meta-description' => $siteName,
];
The message Site Name already has a translation in @app/messages/ar/app.php, and the translation works fine on the website.
However, the problem appears when I use the meta description tag in the main layout like this:
<meta name="description" content="<?= Yii::$app->params['meta-description'] ?>" />
So, in any view, if I have set a value for Yii::$app->params['meta-description'], it should be printed in the layout; when no value is supplied, the layout should print the initial value defined in config/params.php.
The problem is that the initial value is printed without translation. This issue can be solved by translating the string in the main layout as follows:
<meta name="description" content="<?= Yii::t('app',Yii::$app->params['meta-description']) ?>" />
Given the above solution, I have two questions:
Why was the string not translated in config/params.php?
Does heavy use of Yii::t() with many untranslated strings (in my case, when I override the value of Yii::$app->params['meta-description'] in a view) cause any performance issue?
Answers:
Because config/params.php is merged into the main config before the main application is initialized. Translation is performed by the \yii\i18n\I18N component, which is not yet available at that point.
Yii::t() is not a heavy method. If you do run into performance problems, you can override it so that Yii::$app->getI18n()->translate() runs only for strings that have a translation, or you can cache these values.
You can use something like this
public static function translateParams($param)
{
if (is_array($param)) {
array_walk($param, function (&$value) {
$value = \Yii::t("app", $value);
});
return $param;
} else {
return \Yii::t("app", $param);
}
}

Scraping framework with xpath support

I'm looking for a web scraping framework that lets me
Hit a given endpoint and load the html response
Search for elements by some css selector
Recover the xpath for that element
Any suggestions? I've seen many that let me search by xpath, but none that actually generate the xpath for an element.
It seems to be true that not many people search by CSS selector yet want a result as an XPath instead, but there are some options to get there.
First, I wound up doing this with jQuery plus an additional function, because jQuery has pretty nice selection and is easy to find support for. You can use jQuery in Node.js, so you should be able to run my code on a server instead of on the client (as shown in my simple example). If that's not an option, look below for my other potential solution using Python, or at the bottom for a C# starter.
For the jQuery approach, the pure JavaScript function for returning the XPath is pretty simple. In the following example (also on JSFiddle) I retrieve the example anchor element with a jQuery selector, get the underlying DOM element, and pass it to my getXPath function:
<html>
<head>
<title>The jQuery Example</title>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script type="text/javascript">
function getXPath( element )
{
var xpath = '';
for ( ; element && element.nodeType == 1; element = element.parentNode )
{
var id = $(element.parentNode).children(element.tagName).index(element) + 1;
id > 1 ? (id = '[' + id + ']') : (id = '');
xpath = '/' + element.tagName.toLowerCase() + id + xpath;
}
return xpath;
}
$(document).ready(function() {
$("#example").click(function() {
alert("Link Xpath: " + getXPath($("#example")[0]));
});
});
</script>
</head>
<body>
<p id="p1">This is an example paragraph.</p>
<p id="p2">This is an example paragraph with a <a id="example" href="#">link inside.</a></p>
</body>
</html>
There is a full library for more robust CSS selector to XPath conversions called css2xpath if you need more complexity than what I provided.
Python (lxml):
For Python you'll want to use lxml's CSS selector class (see link for full tutorial and docs) to get the xml node.
The CSSSelector class
The most important class in the lxml.cssselect module is CSSSelector.
It provides the same interface as the XPath class, but accepts a CSS
selector expression as input:
>>> from lxml.cssselect import CSSSelector
>>> sel = CSSSelector('div.content')
>>> sel  #doctest: +ELLIPSIS
<CSSSelector ... for 'div.content'>
>>> sel.css
'div.content'
The selector actually compiles to XPath, and you can see the
expression by inspecting the object:
>>> sel.path
"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]"
To use the selector, simply call it with a document or element object:
>>> from lxml.etree import fromstring
>>> h = fromstring('''<div id="outer">
... <div id="inner" class="content body">
... text
... </div></div>''')
>>> [e.get('id') for e in sel(h)]
['inner']
Using CSSSelector is equivalent to translating with cssselect and
using the XPath class:
>>> from cssselect import GenericTranslator
>>> from lxml.etree import XPath
>>> sel = XPath(GenericTranslator().css_to_xpath('div.content'))
CSSSelector takes a translator parameter to let you choose which
translator to use. It can be 'xml' (the default), 'xhtml', 'html' or a
Translator object.
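If you're looking to load from a URL, etree.parse() accepts a URL as well as a filename: root = etree.parse("http://where.it/is/from.xml").getroot(). If you have already fetched the text yourself, pass base_url so that relative references resolve: root = etree.fromstring(xml, base_url="http://where.it/is/from.xml")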
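The excerpt above covers selecting elements, but not the last step of recovering a path for an element once it has been selected. lxml's ElementTree objects offer getpath(element) for that. As a dependency-free illustration of the same idea, here is a minimal sketch that mirrors the JavaScript getXPath function using only Python's standard library. The helper name get_xpath and the sample markup are assumptions for this example, and the sketch expects well-formed XML/XHTML input:

```python
import xml.etree.ElementTree as ET


def get_xpath(root, target):
    """Build an absolute, position-based XPath from root down to target."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            # The root element needs no positional index.
            steps.append(node.tag)
        else:
            # 1-based position among siblings that share the same tag name.
            same_tag = [child for child in parent if child.tag == node.tag]
            position = same_tag.index(node) + 1
            # Like the JavaScript version, omit the index for the first sibling.
            steps.append(node.tag if position == 1 else f"{node.tag}[{position}]")
        node = parent
    return "/" + "/".join(reversed(steps))


doc = ET.fromstring(
    "<html><body>"
    "<p id='p1'>This is an example paragraph.</p>"
    "<p id='p2'>This is an example paragraph with a "
    "<a id='example' href='#'>link inside.</a></p>"
    "</body></html>"
)

link = doc.find(".//a[@id='example']")
print(get_xpath(doc, link))  # /html/body/p[2]/a
```

The element is looked up here with an attribute predicate rather than a CSS selector, because the standard library has no CSS selector engine; with lxml you would obtain the element from CSSSelector instead.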
C#
There is a library called css2xpath-reloaded which does nothing but CSS to XPath conversion.
String css = "div#test .note span:first-child";
String xpath = css2xpath.Transform(css);
// 'xpath' will contain:
// //div[@id='test']//*[contains(concat(' ',normalize-space(@class),' '),' note ')]*[1]/self::span
Of course, getting a string from the url is very easy with C# utility classes and needs little discussion:
using(WebClient client = new WebClient()) {
string s = client.DownloadString(url);
}
As for the selection with CSS Selectors, you could try Fizzler, which is pretty powerful. Here's the front page example, though you can do much more:
// Load the document using HTMLAgilityPack as normal
var html = new HtmlDocument();
html.LoadHtml(@"
<html>
<head></head>
<body>
<div>
<p class='content'>Fizzler</p>
<p>CSS Selector Engine</p></div>
</body>
</html>");
// Fizzler for HtmlAgilityPack is implemented as the
// QuerySelectorAll extension method on HtmlNode
var document = html.DocumentNode;
// yields: [<p class="content">Fizzler</p>]
document.QuerySelectorAll(".content");
// yields: [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("p");
// yields empty sequence
document.QuerySelectorAll("body>p");
// yields [<p class="content">Fizzler</p>,<p>CSS Selector Engine</p>]
document.QuerySelectorAll("body p");
// yields [<p class="content">Fizzler</p>]
document.QuerySelectorAll("p:first-child");

Manually control <head> markup in Joomla

Is there a way to manually configure the contents of the <head> section of the site in Joomla 3.1? I want to use the templating system for the entire markup of the page, including everything between <html></html>.
I just read this: http://forum.joomla.org/viewtopic.php?f=466&t=230787 and I am astonished at the response. Surely this is template/data separation 101. Has this been fixed in the latest Joomla release?
If you are developing a template and want all of its markup, including the head section, kept separate from the Joomla libraries and core files, here is how the head is normally handled.
Normally the head section is included like this:
<jdoc:include type="head" />
This loads its content from libraries\joomla\document\html\renderer\head.php.
If you want to override the content of the head, you can write a module for the task.
Create the module and include it in place of this head include. Make sure it contains all the code the $document class needs to work; otherwise you lose many of Joomla's document-related features.
As explained by the answer from Jobin, normally, you would include the head data by using the <jdoc:include type="head" /> tag, but if you want more control over this, you can use the JDocument.
Example code in your template's PHP:
$doc = JFactory::getDocument();
$my_head_data = $doc->getHeadData();
This will give you an array of the data that JDocument would normally print, so that you can completely choose what to print and how.
To make jQuery load from CDN and get it on top of the script list, I made a little patch just after the $doc = JFactory::getDocument(); that manipulates the header array directly inside the $this object as follows:
$my_jquery = "//ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js";
$my_jquery_ui = "//ajax.googleapis.com/ajax/libs/jqueryui/1.11.2/jquery-ui.min.js";
$my_jquery_cx = $this->baseurl."/media/jui/js/jquery-noconflict.js";
foreach($this->_scripts as $k=>$v) {
// put our own jquery-noconflict, jquery-ui and jquery on top of the list
// (strpos() returns 0 for a match at the start, so compare against false)
if( strpos($k,'jquery.min.js') !== false) {
unset($this->_scripts[$k]);
$r = array( $my_jquery_cx => $v);
$this->_scripts = $r + $this->_scripts;
$r = array( $my_jquery_ui => $v);
$this->_scripts = $r + $this->_scripts;
$r = array( $my_jquery => $v);
$this->_scripts = $r + $this->_scripts;
}
else if( strpos($k,'jquery.ui.min.js') !== false) {
unset($this->_scripts[$k]);
}
else if( strpos($k,'jquery-noconflict.js') !== false) {
unset($this->_scripts[$k]);
}
}
Replace $my_jquery_xxx with editable config parameters in your templateDetails.xml file

Html tags in xml (rss)

I followed http://damieng.com/blog/2010/04/26/creating-rss-feeds-in-asp-net-mvc to create an RSS feed for my blog. Everything works except HTML tags in the XML document. Typical problem:
&lt;br /&gt;
instead of
<br />
Normally I would use
@Html.Raw()
or
MvcHtmlString()
But how can I fix it in an XML document created with SyndicationFeed?
Edit:
Ok, I'm starting to think that my question is pointless.
Should I just leave my RSS as it is?
Within the XML element, you can wrap the text that contains your HTML in a CDATA section:
<![CDATA[
your html
]]>
I don't recommend doing that, however.
Wrap the text inside a CDATA section:
var xml= '<person><name><![CDATA[<h1>john smith</h1>]]></name></person>',
xmlDoc = $.parseXML( xml ),
$xml = $( xmlDoc ),
$title = $xml.find( "name" );
$($title.text()).appendTo("body");
DEMO
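On the question of whether the feed can simply be left as it is: escaped markup in an XML text node is decoded again by any XML parser, so a reader sees the same string whether the HTML was escaped or wrapped in CDATA. A minimal sketch with Python's standard library (not SyndicationFeed; the element names and sample text are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

html_fragment = "First line<br />Second line"

# Writing: the serializer escapes the markup characters in text content.
item = ET.Element("item")
description = ET.SubElement(item, "description")
description.text = html_fragment
serialized = ET.tostring(item, encoding="unicode")
print(serialized)
# <item><description>First line&lt;br /&gt;Second line</description></item>

# Reading: the parser turns the entities back into the original string.
from_escaped = ET.fromstring(serialized).find("description").text

# The CDATA form of the same item parses to the identical text.
cdata_xml = "<item><description><![CDATA[" + html_fragment + "]]></description></item>"
from_cdata = ET.fromstring(cdata_xml).find("description").text

print(from_escaped == from_cdata == html_fragment)  # True
```

Seeing &lt;br /&gt; in the raw feed source is therefore the expected encoding of HTML inside XML rather than a sign that the feed is broken.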

Google Documents List API - How to Publish a Document

I'm utterly lost as to how one can programmatically publish a Google Document (specifically a spreadsheet).
I've read the Google Documents List API Protocol Guide and have found this:
http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#GettingRevisions
The next section of the article begins with 'Publishing documents by publishing a single revision' and this is where I found this example:
PUT /feeds/default/private/full/resource_id/revisions/revision_number
GData-Version: 3.0
Authorization: <your authorization header here>
Content-Length: 722
Content-Type: application/atom+xml
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:gd='http://schemas.google.com/g/2005'
xmlns:docs="http://schemas.google.com/docs/2007" gd:etag="W/&quot;DkIBR3st7ImA9WxNbF0o.&quot;">
<id>https://docs.google.com/feeds/id/resource_id/revisions/1</id>
<updated>2009-08-17T04:22:10.440Z</updated>
<app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-06T03:25:07.799Z</app:edited>
<title>Revision 1</title>
<content type="text/html" src="https://docs.google.com/feeds/download/documents/Export?docId=doc_id&amp;revision=1"/>
<link rel="alternate" type="text/html"
href="https://docs.google.com/Doc?id=doc_id&amp;revision=1"/>
<link rel="self" type="application/atom+xml"
href="https://docs.google.com/feeds/default/private/full/resource_id/revisions/1"/>
<author>
<name>user</name>
<email>user@gmail.com</email>
</author>
<docs:publish value="true"/>
<docs:publishAuto value="false"/>
</entry>
I have been retrieving document list feeds and CRUDing worksheets but I cannot get the publishing to work nor do I understand how it is supposed to work. My basic setup for establishing a connection to my feed and preparing the data to be PUT is as follows:
<?php
set_include_path($_SERVER['DOCUMENT_ROOT'].'/library/');
require_once 'Zend/Loader/Autoloader.php';
$autoloader = Zend_Loader_Autoloader::getInstance();
$autoloader->setFallbackAutoloader(true);
$theId = 'my-worksheet-id';
$user = "my-gmail-account-name";
$pass = "my-gmail-account-password";
$service = Zend_Gdata_Docs::AUTH_SERVICE_NAME;
$client = Zend_Gdata_ClientLogin::getHttpClient($user, $pass, $service);
$service = new Zend_Gdata($client);
$xml = "<entry xmlns='http://www.w3.org/2005/Atom' xmlns:gd='http://schemas.google.com/g/2005'
xmlns:docs='http://schemas.google.com/docs/2007' gd:etag='W/\"DkIBR3st7ImA9WxNbF0o.\"'>
<id>https://docs.google.com/feeds/id/spreadsheet:$theId/revisions/1</id>
<updated>2009-08-17T04:22:10.440Z</updated>
<app:edited xmlns:app='http://www.w3.org/2007/app'>2009-08-06T03:25:07.799Z</app:edited>
<title>Revision 1</title>
<content type='text/html' src='https://docs.google.com/feeds/download/documents/Export?docId=$theId&revision=1'/>
<link rel='alternate' type='text/html'
href='https://docs.google.com/Doc?id=$theId&revision=1'/>
<link rel='self' type='application/atom+xml'
href='https://docs.google.com/feeds/default/private/full/spreadsheet:$theId/revisions/1'/>
<author>
<name>$user</name>
<email>$user</email>
</author>
<docs:publish value='true'/>
<docs:publishAuto value='false'/>
</entry>";
$putURL = "http://docs.google.com/feeds/default/private/full/spreadsheet:".$theId."/revisions/0";
$data = $service->put($xml, $putURL);
?>
Which results in a
Fatal error: Uncaught exception 'Zend_Gdata_App_HttpException' with message 'Expected response code 200, got 400 Invalid request URI
Can someone help me out? Has anyone successfully published a Google Document programmatically?
Assuming the document has already been created and has a document id of XXXX
What you need to do is send a "PUT" request with specific headers, and XML (an entry describing your document) as the body, to a specific URL.
Since you are not changing any content of the doc (only the meta-data), your target URL will look like this...
https://docs.google.com/feeds/default/private/full/XXXX/revisions/0
The first thing you need to do is authenticate with the proper Google service.
$client = Zend_Gdata_ClientLogin::getHttpClient(GDOC_LOGIN, GDOC_PASS,'writely');
Use the returned object to grab your auth token.
$auth_token = $client->getClientLoginToken();
In Zend/Gdata/App.php is a helper function for executing the PUT request.
Prepare parameters for this method like so...
$method = "PUT";
$url ="https://docs.google.com/feeds/default/private/full/XXXX/revisions/0";
$headers['GData-Version'] = '3.0';
$headers['If-Match'] = '*';
$headers['Authorization'] = "GoogleLogin auth=$auth_token";
$headers['Content-Length'] = '380';
$headers['Content-Type'] = 'application/atom+xml';
$body = <<<XML
<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:docs="http://schemas.google.com/docs/2007"
xmlns:gd="http://schemas.google.com/g/2005">
<category scheme="http://schemas.google.com/g/2005#kind"
term="http://schemas.google.com/docs/2007#spreadsheet"/>
<docs:publish value="true"/>
<docs:publishAuto value="true"/>
</entry>
XML;
$contentType = "application/atom+xml";
$remainingRedirects = 99;
Then call the helper function...
$app = new Zend_Gdata_App();
$app->performHttpRequest($method, $url, $headers, $body, $contentType, $remainingRedirects);
Good luck!
Let me know if this helps!
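One detail in the parameters above is easy to get wrong: the Content-Length is hard-coded, so it goes stale as soon as the body changes. A minimal sketch of assembling the same request pieces with the length computed from the encoded body, written in Python purely for illustration (build_publish_request and the placeholder arguments are hypothetical, and nothing is sent over the network):

```python
import xml.etree.ElementTree as ET

# The Atom entry from the answer above, with no whitespace before the XML declaration.
PUBLISH_ENTRY = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<entry xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:docs="http://schemas.google.com/docs/2007" '
    'xmlns:gd="http://schemas.google.com/g/2005">\n'
    '<category scheme="http://schemas.google.com/g/2005#kind" '
    'term="http://schemas.google.com/docs/2007#spreadsheet"/>\n'
    '<docs:publish value="true"/>\n'
    '<docs:publishAuto value="true"/>\n'
    "</entry>\n"
)


def build_publish_request(resource_id, auth_token):
    """Return (method, url, headers, body) for the publish PUT request."""
    body = PUBLISH_ENTRY.encode("utf-8")
    url = (
        "https://docs.google.com/feeds/default/private/full/"
        f"{resource_id}/revisions/0"
    )
    headers = {
        "GData-Version": "3.0",
        "If-Match": "*",
        "Authorization": f"GoogleLogin auth={auth_token}",
        "Content-Type": "application/atom+xml",
        # Length of the encoded bytes, never a hand-counted constant.
        "Content-Length": str(len(body)),
    }
    return "PUT", url, headers, body


method, url, headers, body = build_publish_request("XXXX", "example-token")

# Sanity check: the body must be well-formed XML before it is sent.
entry = ET.fromstring(body)
docs_ns = {"docs": "http://schemas.google.com/docs/2007"}
print(method, url)
print(headers["Content-Length"], entry.find("docs:publish", docs_ns).get("value"))
```

In the PHP version the equivalent change would be to define $body first and set the header from strlen($body).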
Ok... where do I start?
First of all, your URL is incorrect. (The resource ID form you're using is for the JSON/XML, not for the URL.)
You have
$putURL = "http://docs.google.com/feeds/default/private/full/spreadsheet:".$theId."/revisions/0";
and you should have
$putURL = "http://docs.google.com/feeds/default/private/full/$theId/revisions/0";
(You can omit . for concatenation if you use " as delimiters.)
There are other problems as well, since you're creating the XML entry by hand:
Your authorization header is missing.
In your XML you're using revision 1, but in your URL you have revisions/0.
The date values are written by hand, and I'm pretty sure you are not trying to publish a two-year-old file. The same goes for the other hand-copied values.
The etag in your entry MUST MATCH the retrieved etag or you won't be able to perform any PUT request.
You can change these values by assigning variables manually, but I think it's better to use the structured object that Zend GData returns.
In any case:
Retrieve from google the document you want to publish.
Find the correct entry (in this case the entry with the ID https://docs.google.com/feeds/id/spreadsheet:$theId/revisions/1)
change docs:publish value property to "true"
send a put request with the modified entry
that should work
I am new to Zend_Gdata myself but have successfully uploaded to Google Docs.
I don't know if this is what you are after but here is my code:
$client = Zend_Gdata_ClientLogin::getHttpClient(
'my@googleDocsEmail.address',
'MyPassword',
Zend_Gdata_Docs::AUTH_SERVICE_NAME
);
$gdClient = new Zend_Gdata_Docs($client);
$newDocumentEntry = $gdClient->uploadFile(
$file,
null,
null,
Zend_Gdata_Docs::DOCUMENTS_LIST_FEED_URI
);
I hope this helps,
Garry
Google is telling you that the data you PUT is invalid, which is why it responds with code 400.
Try placing this line
<?xml version='1.0' encoding='UTF-8'?>
before
<entry xmlns='http://www.w3.org/2005/Atom'...
