PHP + Wikipedia: Get content from the first paragraph in a Wikipedia article? - xpath

I’m trying to use Wikipedia’s API (api.php) to get the content of a Wikipedia article provided by a link (like: http://en.wikipedia.org/wiki/Stackoverflow).
And what I want is to get the first paragraph (which in the example of the Stackoverflow wiki article is: Stack Overflow is a website part of the Stack Exchange network[2][3] featuring questions and answers on a wide range of topics in computer programming.[4][5][6]).
I’m going to do some data manipulation with it.
I’ve tried with the API url: http://en.wikipedia.org/w/api.php?action=parse&page=Stackoverflow&format=xml but it gives me some kind of error. It outputs:
<api>
<parse displaytitle="Stackoverflow" revid="289948401">
<text xml:space="preserve">
<ol> <li>REDIRECT Stack Overflow</li> </ol> <!-- NewPP limit report Preprocessor node count: 1/1000000 Post-expand include size: 0/2048000 bytes Template argument size: 0/2048000 bytes Expensive parser function count: 0/500 --> <!-- Saved in parser cache with key enwiki:pcache:idhash:21772484-0!*!0!!*!* and timestamp 20110525165333 -->
</text>
<langlinks/>
<categories/>
<links>
<pl ns="0" exists="" xml:space="preserve">Stack Overflow</pl>
</links>
<templates/>
<images/>
<externallinks/>
<sections/>
</parse>
</api>
I found this snippet of code that I’ve tried
$doc = new DOMDocument();
$doc->loadHTML($wikiPage);
$xpath = new DOMXpath($doc);
$nlPNodes = $xpath->query('//div[@id="bodyContent"]/p');
$nFirstP = $nlPNodes->item(0);
$sFirstP = $doc->saveXML($nFirstP);
echo $sFirstP;
but I can’t get the HTML content in the variable $wikiPage.
I don't know whether this is the best way to do it, so please feel free to comment on that. Any suggestions or solutions would be much appreciated.
Thank you
- Mestika

You're getting the contents of a redirect page. Replace 'Stackoverflow' with 'Stack_Overflow' and it should work.
The API also supports a &redirects option, which will resolve redirects for you.
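As a rough illustration of the answer above, the sketch below builds the api.php URL with the redirects parameter set and pulls the first <p> element out of an HTML string. It is written in JavaScript rather than PHP. The helper names buildParseUrl and firstParagraph are hypothetical, and the regular expression is a simplification that assumes the first <p> in the returned HTML is the paragraph you want.

```javascript
// Hypothetical helper: build an action=parse URL that asks the API to
// resolve redirects (for example "Stackoverflow" -> "Stack Overflow").
function buildParseUrl(title) {
  const params = new URLSearchParams({
    action: "parse",
    page: title,
    format: "xml",
    redirects: "1",
  });
  return "http://en.wikipedia.org/w/api.php?" + params.toString();
}

// Hypothetical helper: return the inner HTML of the first <p> element,
// or null if there is none. A regex is only a sketch here; a real HTML
// parser is more robust against nested or malformed markup.
function firstParagraph(html) {
  const match = /<p\b[^>]*>([\s\S]*?)<\/p>/i.exec(html);
  return match ? match[1].trim() : null;
}

console.log(buildParseUrl("Stackoverflow"));
console.log(firstParagraph("<div><p>Stack Overflow is a website.</p><p>More.</p></div>"));
```

The same two steps carry over to PHP: add redirects to the query string, then run the XPath query on the HTML taken from the API response.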

Related

How to edit the data in CK editor that comes from API?

I post data to an API and want to edit that data after fetching it back from the API. When I try to edit the data, I get the following error:
CKEditorError: datacontroller-set-non-existent-root: Attempting to set data on a non-existing root. Read more: https://ckeditor.com/docs/ckeditor5/latest/framework/guides/support/error-codes.html#error-datacontroller-set-non-existent-root
<CKEditor
editor={ClassicEditor}
onChange={this.handleChange}
data={html}
></CKEditor>
Here is a workaround for now; I don't know the actual cause of the error:
Create a variable:
let a = "";
Replace the content of the variable a with the content coming from the API,
parsed with htmlparser:
let data = a.replace("", htmlparser(/*data coming from your api*/))

TagHelper cached output by calling GetChildContentAsync() and Content.GetContent()

According to this article, if we use several tag helpers (targeting the same tag) and each of them calls await output.GetChildContentAsync() to receive the HTML content, we run into a problem with cached output:
The problem is that the tag helper output is cached, and when the WWW tag helper is run, it overwrites the cached output from the HTTP tag helper.
The problem is fixed by using a statement like:
var childContent = output.Content.IsModified ? output.Content.GetContent() :
(await output.GetChildContentAsync()).GetContent();
Description of this behaviour:
The code above checks to see if the content has been modified, and if
it has, it gets the content from the output buffer.
The questions are:
1) What is the difference between TagHelperOutput.GetChildContentAsync() and TagHelperOutput.Content.GetContent() under the hood?
2) Which method writes the result to the buffer?
3) What does "cached output" mean: does ASP.NET Core MVC cache the initial Razor markup, or the HTML markup produced by the tag helper call?
Thanks in advance!
The explanation in the other answer was not clear to me, so I tested it, and here is the summary:
await output.GetChildContentAsync(); ⇒ gets the original content inside the tag, as hard-coded in the Razor file. Note that it is cached on the first call and never changes on subsequent calls, so it does not reflect changes made by other tag helpers at runtime!
output.Content.GetContent(); ⇒ should be used only to get content modified by some tag helper; otherwise it returns empty content!
Usage samples:
Getting the latest content (whether initial razor or content modified by other tag helpers):
var curContent = output.IsContentModified ? output.Content : await output.GetChildContentAsync();
string strContent = curContent.GetContent();

Issues in Updating Metadata while Generating PDF

I am working on an ExtendScript script that saves a FrameMaker book as a PDF. The script is able to save the PDF, but when I try to add PDF metadata (Author/CreationDate/Keywords/Subject/Title, etc.), it is not reflected in the generated PDF.
On closer inspection I found that the metadata elements were not being added to the PDFDocInfo property of the book.
Here is the code I wrote to update the author details in PDFDocInfo:
$.writeln("Length before" + doc.PDFDocInfo.length);
doc.PDFDocInfo.push("Author");
doc.PDFDocInfo.push("Mr Bond");
$.writeln("Length after" + doc.PDFDocInfo.length);
where doc is an Object of type Book
The output is
Length before0
Length after0
Shouldn't PDFDocInfo have 2 elements in it now? Am I missing anything here?
The following code did the trick...
var pdfDocInfo = new Strings();
pdfDocInfo.push("Author");
pdfDocInfo.push("Mr Bond");
book.PDFDocInfo = pdfDocInfo;
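One possible explanation, which is an assumption and not something the thread confirms, is that reading PDFDocInfo returns a copy of the list, so push only changes that copy, while assigning a whole list replaces the stored value. The sketch below models that behaviour in plain modern JavaScript with a hypothetical FakeBook class. It does not use the FrameMaker API and is not ExtendScript syntax.

```javascript
// Hypothetical stand-in for a Book object; NOT the FrameMaker API.
// Assumption: the real property hands out a copy on every read.
class FakeBook {
  constructor() {
    this._info = [];
  }
  get PDFDocInfo() {
    return this._info.slice(); // each read returns a fresh copy
  }
  set PDFDocInfo(values) {
    this._info = values.slice(); // assignment replaces the stored list
  }
}

const fake = new FakeBook();
fake.PDFDocInfo.push("Author", "Mr Bond"); // mutates a throwaway copy
const lengthAfterPush = fake.PDFDocInfo.length;

const info = [];
info.push("Author");
info.push("Mr Bond");
fake.PDFDocInfo = info; // goes through the setter
const lengthAfterAssign = fake.PDFDocInfo.length;

console.log(lengthAfterPush, lengthAfterAssign);
```

Under that assumption, the pattern in the answer (build the list first, then assign it) is the one that sticks.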

Parsing Liquid in a Jekyll generator before converting to JSON

Best to start by saying that I am very new to Ruby and Liquid. I have searched around looking for some resource on this issue, but as yet haven't been able to find anything of real use.
I have a Jekyll site, which utilises the HTML5 History API. I have a Jekyll generator plugin which creates a single JSON file which holds all the post and page content, ready for use with HTML5 PushState and PopState. This part is functioning properly and is tested.
My problem comes when I have a post/page on the site which has Liquid tags in it. I am guessing I need to parse these Liquid tags to get the template output before I create my JSON object for each post/page. Here is what I have for pages as an example:
# Iterate over all pages
site.pages.each do |page|
# Encode the page HTML content to JSON
link = page.url
@content = Liquid::Template.parse(page.content)
hash[link] = { "body_class" => page.data['body_class'], "content" => converter.convert(@content.render), "title" => '<h1>' + page.data["content_title"] + '</h1>' }
end
At the moment, this removes all Liquid tags from the generated JSON file, leaving nothing in their place.
Here is my full generator file on Github which is based very heavily on nice work by Jezen Thomas.
The output JSON file is also in that repo with the site, or can be accessed quickly here. The blog.html content is the last item in the JSON file and shows the empty h1 and div tags which should have content.

ClientGlobalContext.js.aspx broken in Dynamics 2011?

I am trying to implement a custom web resource using jquery/ajax and odata. I ran into trouble and eventually found that when I call:
var serverUrl = context.getServerUrl();
The code throws exceptions.
However, when I change serverUrl to the literal url, it works. I then found forum posts that said I should verify my .aspx page manually by going to https://[org url]//WebResources/ClientGlobalContext.js.aspx to verify that it is working. When I did that I received a warning page:
The XML page cannot be displayed
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.
--------------------------------------------------------------------------------
Invalid at the top level of the document. Error processing resource 'https://[org url]//WebResources/Clien...
document.write('<script type="text/javascript" src="'+'\x26\x2347\x3b_common\x26\x2347\x3bglobal.ashx\x26\x2363\x3bver\x2...
What the heck does that mean?
Hard to tell outside of context (pun not intended) of your code, but why aren't you doing this?
var serverUrl = Xrm.Page.context.getServerUrl();
(Presumably, because you have defined your own context var?)
Also, this method is deprecated as of Rollup 12, see here: http://msdn.microsoft.com/en-us/library/d7d0b052-abca-4f81-9b86-0b9dc5e62a66. You can now use getClientUrl instead.
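A minimal sketch of that advice, assuming you want code that works both before and after the rollup: prefer getClientUrl when the context object provides it and fall back to getServerUrl otherwise. The function name resolveCrmUrl and the mock context objects are hypothetical and exist only so the example runs outside CRM.

```javascript
// Prefer getClientUrl when the context offers it; otherwise fall back to
// the older getServerUrl. The context is passed in, so this sketch makes
// no assumption about how the page obtained it.
function resolveCrmUrl(context) {
  if (context && typeof context.getClientUrl === "function") {
    return context.getClientUrl();
  }
  if (context && typeof context.getServerUrl === "function") {
    return context.getServerUrl();
  }
  throw new Error("No usable context object was provided");
}

// Mock contexts with made-up URLs, for illustration only.
const newerContext = {
  getClientUrl: () => "https://crm.example.com/org",
  getServerUrl: () => "https://old.example.com",
};
const olderContext = {
  getServerUrl: () => "https://crm.example.com/org",
};

console.log(resolveCrmUrl(newerContext));
console.log(resolveCrmUrl(olderContext));
```

In a web resource this would be called as resolveCrmUrl(Xrm.Page.context).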
I know it is late, but I hope this will be useful for other people who face this problem.
To this day, even with R15, there are two ClientGlobalContext.js.aspx URLs available:
1. https://[org url]/WebResources/ClientGlobalContext.js.aspx (the bad one)
2. https://[org url]/[organization name]/[publication id]/WebResources/ClientGlobalContext.js.aspx (the good one)
I don't know why the first one exists, but it causes many issues, such as:
It might not be published or might not hold any information (your case, @Steve).
In a deployment with multiple organizations, it seems to save info only for the last organization deployed, so the methods under Xrm.Page.context return info from one fixed organization. This affects every method that internally uses the constants included in ClientGlobalContext.js.aspx: USER_GUID, ORG_LANGUAGE_CODE, ORG_UNIQUE_NAME, SERVER_URL, USER_LANGUAGE_CODE, USER_ROLES, CRM2007_WEBSERVICE_NS, CRM2007_CORETYPES_NS, AUTHENTICATION_TYPE, CURRENT_THEME_TYPE, CURRENT_WEB_THEME, IS_OUTLOOK_CLIENT, IS_OUTLOOK_LAPTOP_CLIENT, IS_OUTLOOK_14_CLIENT, IS_ONLINE, LOCID_UNRECOGNIZE_DOTC, EDIT_PRELOAD, WEB_SERVER_HOST, WEB_SERVER_PORT, IS_PATHBASEDURLS, LOCID_UNRECOGNIZE_DOTC, EDIT_PRELOAD, WEB_RESOURCE_ORG_VERSION_NUMBER, YAMMER_IS_INSTALLED, YAMMER_IS_CONFIGURED_FOR_ORG, YAMMER_APP_ID, YAMMER_NETWORK_NAME, YAMMER_GROUP_ID, YAMMER_TOKEN_EXPIRED, YAMMER_IS_CONFIGURED_FOR_USER, YAMMER_HAS_CONFIGURE_PRIVILEGE, YAMMER_POST_METHOD. For instance, the method Xrm.Page.context.getUserId() is implemented as return window.USER_GUID;
To be sure that your URL is the correct one, just follow the link posted by @Chris.
