Producing a pdf with internal anchor links using dompdf

I am using dompdf to collate a load of existing HTML pages. A lot of these pages have anchor links in them that I would like to preserve. When I collate these articles the pdf comes out very nicely, but the anchor links don't work: the text is underlined like a link, but clicking it doesn't take you anywhere.
I have some test HTML that I am using to try out anchor links, such as:
$content .= '<div style="page-break-after: always;">blah</div>
<div><a id="blah">link location</a></div>';
I have also tried using name instead of id, based on this forum post: http://www.dashinteractive.net/dompdf/index.php?v=1530231. For example:
$content .= '<div style="page-break-after: always;">blah</div>
<div><a name="blah">link location</a></div>';
Of course neither of these are working as I would expect.
I can't find much on the internet about how dompdf handles internal links, apart from this page, http://webresourcesdepot.com/html-to-pdf-rendering-engine-dompdf/, which says it can handle links and anchors. I'm not sure how reliable it is, though.
How do you put internal anchor links in pdfs using dompdf please? Can it do it?

dompdf up through 0.6.2 should work so long as you use the <a name="blah">...</a> format. The only problem in that release is that if the A tag is empty it will be removed before the link is rendered.
Your second sample should be fine, though perhaps the actual anchor reference was just mistyped when you wrote up the question. The following should work:
<p><a href="#blah">jump to link location</a></p>
<div style="page-break-after: always;">blah</div>
<div><a name="blah">link location</a></div>
The current beta for 0.7.0 has a bug that mangles the anchor, which results in a misinterpreted link type. That issue should be addressed for the stable 0.7.0 release.
Note that no version up to and including v0.7.0 supports linking based on ID.
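If the collated pages already use id-based targets, one option is to copy each id onto a name attribute before handing the HTML to dompdf. The sketch below is in Ruby purely for illustration. The collation in the question is PHP, where preg_replace_callback could do the same job. The helper name add_name_to_id_anchors is hypothetical. It only handles <a> elements with a double-quoted id and leaves tags that already have a name untouched. It does not handle ids on other elements, such as headings, or empty anchors, which 0.6.2 drops anyway.

```ruby
# Hypothetical pre-processing step: copy id="x" onto name="x" for <a> tags,
# since (per the answer above) dompdf up to 0.7.0 only resolves name-based anchors.
def add_name_to_id_anchors(html)
  html.gsub(/<a\b[^>]*>/i) do |tag|
    next tag if tag =~ /(?<=\s)name\s*=/i      # already a named anchor, leave it alone
    id = tag[/(?<=\s)id\s*=\s*"([^"]*)"/i, 1]  # only double-quoted ids are recognised
    next tag unless id
    tag.sub(/\A<a\b/i) { %(<a name="#{id}") }  # prepend the name attribute
  end
end

html = %q(<p><a href="#blah">jump</a></p><div><a id="blah">link location</a></div>)
puts add_name_to_id_anchors(html)
```

Links pointing at the anchors keep the usual href="#blah" form. Running the helper twice leaves the output unchanged, because tags that already carry a name are skipped.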

Related

How to provide details/summary HTML element in TYPO3's CKEditor?

Unfortunately there's no details/summary element in the default CKEditor configuration of TYPO3 and I'm looking for a way to add it.
What I've been trying to do:
I searched and found a widget on https://ckeditor.com/cke4/addon/detail, but its repository on GitHub has been archived and the widget does not work as expected. It requires 'api,widget', and this generates a JavaScript error:
[CKEDITOR.resourceManager.load] Resource name "api" was not found at "/typo3/sysext/rte_ckeditor/Resources/Public/JavaScript/Contrib/plugins/api/plugin.js?t=K24B-4e13cc129f".
When I remove this requirement for "api", there's an error at line 72:
CKEDITOR.api.parser.add(item, el);.
Then I found a similar widget at GitHub , which looks like an older version of the former without requirement for "api".
It already looks quite good, but is still a bit buggy: the HTML structure is changed when saving and the summary is duplicated. When switching to the source code, the HTML structure specified in the template ...
<details><summary>Summary</summary><div class="details-content"></div></details>
… gets partially lost.
I'm not sure if the widgets are buggy or if the editor is limited by the integration into TYPO3 and I was also not able to combine the two in a way that would lead to a working solution.
Update (Jul 22):
I successfully modified the Creating a Simple CKEditor Widget (Part 1) example to create
a widget with the following HTML structure:
<div class="expander">
<p class="expander-title">Title</p>
<div class="expander-content"><p>Content...</p></div>
</div>
With the help of a small JavaScript snippet and some CSS it now behaves almost like a details-summary element, but is not quite as good in terms of SEO and accessibility.
If I replace the elements <div class="expander"> and <p class="expander-title"> with <details> and <summary> in the widget, it unfortunately doesn't work properly anymore and changes the structure when saving. For some reason the RTE treats them differently.
I have already manually added the following to the RTE configuration:
processing:
  allowTags:
    - details
    - summary
  allowTagsOutside:
    - details
    - summary

CKEDITOR adds extra HTML tags when pasting from Word

I have a blog page on my website, and we write its content in CKEDITOR, copying it from Word. But when the content is published, the HTML has too many extra tags that are not useful.
How can we filter out and delete the extra tags and keep only the main HTML tags for the blog page?
Thank you.
As you did not provide any code sample, it's hard to pin down your problem. I'm writing this answer based on my understanding; I hope it helps.
When you save the text from the CKEDITOR fields, the HTML tags are saved along with the text. So when displaying the saved text, you can use the Laravel Blade engine's unescaped output syntax, like this:
{!! $variable->field_name !!}
This outputs the stored HTML without escaping it, so the browser renders the markup and only the formatted text is shown, instead of the tags appearing as literal text. It does not remove any tags from the stored HTML.
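Displaying the HTML with Blade does not change what is stored, so the extra Word markup is still saved. To actually filter it, the pasted HTML can be passed through a tag whitelist before saving. Here is a minimal sketch, written in Ruby for illustration; in a Laravel app the same logic would live in PHP. The ALLOWED_TAGS list and the clean_pasted_html name are hypothetical. The approach is regex-based, which is adequate for tidying editor output but is not a security sanitizer, so untrusted input should go through a proper HTML sanitizer library instead.

```ruby
# Hypothetical whitelist: adjust to the tags the blog layout actually needs.
ALLOWED_TAGS = %w[p br b strong i em u ul ol li h2 h3 h4 blockquote].freeze

# Keep only whitelisted tags (with all attributes stripped) and drop every other
# tag, such as <span> or Word's <o:p>, while keeping the text inside it.
def clean_pasted_html(html)
  # Drop HTML comments, including the conditional-comment blocks Word emits.
  without_comments = html.gsub(/<!--.*?-->/m, "")
  without_comments.gsub(%r{<(/?)([a-zA-Z][\w:-]*)[^>]*>}) do
    closing = Regexp.last_match(1)            # "/" for closing tags, "" otherwise
    tag = Regexp.last_match(2).downcase
    ALLOWED_TAGS.include?(tag) ? "<#{closing}#{tag}>" : ""
  end
end

puts clean_pasted_html(%q(<p class="MsoNormal"><span lang="EN-US">Hello</span> <b>world</b><o:p></o:p></p>))
```

Attributes are discarded even on allowed tags. An <a href> would therefore lose its target, so links would need their own handling if they should survive.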

Web scraping from youtube with nokogiri

I want to scrape all the names of the users who commented below a youtube video.
I'm using ruby and nokogiri.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "https://www.youtube.com/watch?v=tntOCGkgt98"
doc = Nokogiri::HTML(URI.open(url))  # Kernel#open no longer opens URLs on Ruby 3.0+
doc.css(".comment-thread-renderer > .comment-renderer").each do |comment|
name = comment.css("#comment-section-renderer-items .g-hovercard").text
puts name
end
But it's not working: I'm not getting any output, and no error either.
I won't be able to give you a solution, but at least I can give you a couple of hints that may help you to move forward.
The code you have is not working because the comments section is loaded via an AJAX call after the page has loaded. If you do a hard reload in your browser, you will see a spinner icon and a "Loading..." text in the comments section while it waits for the content to be loaded. When Nokogiri fetches the page via the HTTP request, it gets the HTML as it is before the comments are loaded. In fact, the placeholder where the comments are later added looks like:
<div id="watch-discussion" class="branded-page-box yt-card">
<div id="comment-section-renderer"
class="comment-section-renderer vve-check"
data-visibility-tracking="CCsQuy8iEwjr3P3u1uzNAhXIepAKHRV9D8Ao-B0=">
<div class="action-panel-loading">
<p class="yt-spinner ">
<span class="yt-spinner-img yt-sprite" title="Loading icon">
</span>
<span class="yt-spinner-message">Loading...</span>
</p>
</div>
</div>
</div>
That is why you won't find the divs you are looking for: they aren't part of the HTML you have.
Looking at the network console in the browser, it seems that the ajax request to get the comments data is being sent to https://www.youtube.com/watch_fragments_ajax?v=tntOCGkgt98&tr=time&distiller=1&ctoken=EhYSC3RudE9DR2tndDk4wAEAyAEA4AEBGAY%253D&frags=comments&spf=load. As you can see the v parameter is the video id, however there are a couple of caveats:
There is a ctoken param, which you can get by scraping the original page contents. It is inside a <script> tag, in the form of
'COMMENTS_TOKEN': "<token>".
However, you still need to send a session_token as form data in the body of the AJAX request (which is a POST). I don't know where that comes from :(.
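The steps described above are reading the token from the page and then POSTing to the fragment endpoint. They can be sketched with Ruby's standard library alone. This sketch assumes the page and the endpoint still look as described in this answer, which may no longer be true. The names comments_token, comments_fragment_url and build_comments_request are hypothetical helpers. The caller passes in session_token because, as noted, its source is unknown. The request is only built here, not sent.

```ruby
require "uri"
require "net/http"

# Extract the ctoken from the inline script, which (per the answer) contains
# something like: 'COMMENTS_TOKEN': "<token>". Returns nil if it is not found.
def comments_token(page_html)
  page_html[/'COMMENTS_TOKEN':\s*"([^"]*)"/, 1]
end

# Rebuild the watch_fragments_ajax URL observed in the network console.
# The parameter set is copied from that one observed request and may have changed.
def comments_fragment_url(video_id, ctoken)
  params = { v: video_id, tr: "time", distiller: 1, ctoken: ctoken, frags: "comments", spf: "load" }
  "https://www.youtube.com/watch_fragments_ajax?#{URI.encode_www_form(params)}"
end

# Build (but do not send) the POST request. session_token is a hypothetical input:
# where the real value comes from is still unknown.
def build_comments_request(url, session_token)
  request = Net::HTTP::Post.new(URI(url))
  request.set_form_data("session_token" => session_token)  # form-encoded body
  request
end
```

To send it, the request would be passed to Net::HTTP.start(host, port, use_ssl: true) { |http| http.request(request) }. Without a valid session_token, though, there is no reason to expect a useful response.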
I think you will be pushing the limits of Nokogiri here, as AFAIK it is not intended to follow AJAX requests or handle JavaScript. Maybe the Ruby Selenium driver is better suited for this.
HTH
I think you need name.css("#comment-section..."
The each statement will iterate over the elements, using the variable name.
You may want to use node instead of name:
doc.css(".comment-thread-renderer > .comment-renderer").each do |node|
name = node.css(".g-hovercard").text  # search only within this comment node
puts name
end
I wrote this rails app using nokogiri to see all the tags that a page has before any javascript is run in the browser. The source code is here, so you can adjust it if you need to add more info about the node in the view.
That can easily tell you if the particular tag element that you are looking for is something you can retrieve without having to do some JS eval.
Most web crawlers don't support client-side rendering, which gives you an idea that it's not a trivial task to execute JS when scraping content.
YouTube is a dynamically rendered JavaScript website, though it could be parsed with Nokogiri without using Selenium or another package. Try opening the Network tab in dev tools, scroll to the comment section, and see which request is being sent.
You need to make a post request in order to fetch comments data. You can preview the output in the "Preview" tab.
Note: since this answer currently brings very little value, it will be updated with the code once a working solution is available.

DOMDocument and FBML/Google + Button not working

I'm having a bit of an issue: I have an application that uses DOMDocument to display some content, but it is removing some code that FBML and the Google +1 button need in order to display.
For example, Facebook's like button is <fb:like>, it is removing fb: from the string. Google's +1 button is like <g:plusone> and it's removing g:.
Is there any way to make it not remove that part of the code?
You can solve both issues.
With Facebook like button, you will want to use the HTML5 version. See: https://developers.facebook.com/docs/reference/plugins/like/
ex: <div class="fb-like" data-send="true" data-width="450" data-show-faces="true"></div>
With Google's Plus one you can use the HTML5 version as well. See: https://developers.google.com/+/plugins/+1button/
ex: <div class="g-plusone" data-size="tall" ... ></div>

facebook javascript, image picker

I'm printing many img tags dynamically from Facebook album pictures, like below:
echo '<img src="' . $photo_detail['src_small'] . '" id="imageurl" onclick="return false" />';
What I need is this: when an image is clicked, its source should be set as the value of the <input type="hidden" id="imagesrc"/> in the form, so that the value is submitted along with the form, like an image picker.
To do what you need you will need to have an understanding of Javascript as well as Facebook's custom version of javascript called FBJS (if you are building an FBML canvas application).
If you do not yet have a strong understanding of how to do this outside of Facebook then I recommend reading through a good book on Javascript until you do.
Once you understand how to do this outside of Facebook the following wiki page should be a good guide on how you can use the same technique with FBJS: http://developers.facebook.com/docs/fbjs/
