OK, as the moderators requested, I have refined my question:
I am looking for local proxy solutions that can tweak HTTPS / HSTS websites. They should be able to modify both the content and the headers of a site. Do you know of such proxies? I would prefer Python solutions because they are hackable.
Yes, there are solutions that work with browser plugins, and I have posted an answer containing an example using Yarip, but the problem is: as soon as the browser developers decide to remove APIs, not that anyone would do that, the plugin stops working.
Therefore I want a solution that works at the protocol level. So, which proxies can tweak HTTPS / HSTS websites? I don't care about performance; my internet is slow anyway and I'm not in a hurry. Please also give a small example of how to tweak the content of a website and a small example of how to tweak a header, using your solution.
Hopefully my question is clear now.
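For what it's worth, one proxy in this category is mitmproxy: a scriptable, Python-based man-in-the-middle proxy that can rewrite both headers and bodies of HTTPS responses once its CA certificate is trusted by the browser. An untested addon sketch (the header value and the injected script tag are placeholders, not part of this question):

# tweak.py -- run with: mitmdump -s tweak.py
class Tweaker:
    def response(self, flow):
        # tweak a response header
        flow.response.headers["Content-Security-Policy"] = "connect-src *"
        # tweak the content of HTML pages
        if "text/html" in flow.response.headers.get("content-type", ""):
            flow.response.text = flow.response.text.replace(
                "</body>",
                '<script src="http://localhost:9030/static/logic.js"></script></body>')

addons = [Tweaker()]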
OK, here is my solution to this problem. AJAX with bottle.py and the jQuery docs were helpful.
Use Firefox (Developer Edition), version 52. Later versions of Firefox do not support Yarip, which is my tool of choice for injecting JavaScript into the document and for tweaking the Content-Security-Policy response header, if there is one. I am interested in solutions for later versions of Firefox/Chrome/whatever.
Stop Firefox from blocking mixed content (loading http resources from https sites), which is nonsense on localhost. According to this discussion and this wiki entry, starting with Firefox 55 localhost is finally whitelisted by default. However, as I need Yarip and therefore cannot use Firefox 55, I still need to disable this policy manually.
This can be done globally by setting about:config -> security.mixed_content.block_active_content to false, which is lazy and very dangerous, as it affects every website, or by doing it temporarily per page, which is not as lazy but a little less dangerous.
Install Python 3
pip install bottle
Create a file server.py with the following contents:
import os, json
from bottle import request, response, route, static_file, debug, run

@route('/inc')  # this handles the ajax calls
def inc():
    # only needed when you pass parameters to the ajax call.
    response.set_header('Access-Control-Allow-Origin', '*')
    number = request.params.get('number', 0, type=int)
    return json.dumps({'increased_number': number + 1})

@route('/static/<filename:path>')  # this serves the two static javascript files
def send_static(filename):
    return static_file(filename, root=os.path.dirname(__file__))

debug(True)
run(port=9030, reloader=True)
Run it
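Optionally, sanity-check the endpoint before wiring up the browser side (a quick test against the port configured above; it expects server.py to be running):

import json, urllib.request

# should print {'increased_number': 6}
with urllib.request.urlopen("http://localhost:9030/inc?number=5") as resp:
    print(json.loads(resp.read().decode()))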
Put a copy of jquery.js into the same directory
Create a file logic.js in the same directory with the following contents:
// http://api.jquery.com/jQuery.noConflict/
var my = {};
my.$ = $.noConflict(true);

// http://api.jquery.com/ready/
my.$(function() {
    var target = my.$(
        '<div id="my-ajax-result" style="position:absolute; padding:1em; background:'
        + 'white; cursor:pointer; border:3px solid blue; z-index:999;">0</div>'
    );
    my.$('body').prepend(target);

    function ajaxcall() {
        // http://api.jquery.com/jQuery.getJSON/
        my.$.getJSON(
            "http://localhost:9030/inc",
            {
                number: target.text() // parameters
            },
            function(result) {
                target.text(result.increased_number);
            }
        );
    }

    // http://api.jquery.com/click/
    target.click(function(event) {
        ajaxcall();
        return false;
    });

    ajaxcall();
});
Install Yarip
Save the following XML to a file and import it from Yarip's "manage pages" dialog. This should create a new rule for en.wikipedia.org. I'm too lazy to explain here how Yarip works, but it is worth learning. This rule will inject jquery.js and logic.js at the end of body, and it will tweak the Content-Security-Policy response header, if there is one.
<?xml version="1.0" encoding="UTF-8"?>
<yarip version="0.3.5">
  <page id="{56b07a5d-e2df-41f2-9ca8-34b4ecb04af8}" name="wikipedia.org" allowScript="true" created="1496845936929">
    <page>
      <header>
        <response>
          <item created="1496845998973">
            <regexp flags="i"><![CDATA[.*wikipedia\.org.*]]></regexp>
            <name><![CDATA[Content-Security-Policy]]></name>
            <script><![CDATA[function (value) {
    return "Content-Security-Policy: connect-src *";
}]]></script>
          </item>
        </response>
      </header>
      <stream>
        <item created="1496845985382">
          <regexp flags="i"><![CDATA[.*wikipedia\.org.*]]></regexp>
          <stream_regexp flags="gim"><![CDATA[</body>]]></stream_regexp>
          <script><![CDATA[function (match, p1, offset, string) {
    return '<script type="text/javascript" src="http://localhost:9030/static/jquery.js"></script><script type="text/javascript" src="http://localhost:9030/static/logic.js"></script></body>';
}]]></script>
        </item>
      </stream>
    </page>
  </page>
</yarip>
Make sure that Yarip is enabled.
Navigate to en.wikipedia.org. You should see a blue rectangle at the top left, with a number in it. If you click on it, an AJAX call to localhost is made and the contents of the blue rectangle are replaced with the result of that call: the number increased by 1.
Play around with this and tweak the web the way you want, including HTTPS sites. Get read/write access to your computer via the Python backend. Eat this, Firefox nanny devs.
I need to get the headline and the text separately out of a text content element. The reason is to give the editor a simple way to add content for a complicated section in my HTML theme.
I am new to TYPO3, and we run v11.5.16. I have read and watched some tutorials and got most of my site working already. Contents are dynamic and multilingual so far.
To get content from the backend, I use backend layouts and copy the content from styles.content.get inside my setup.typoscript. I think this is the common way to do it, and as I said, it works. I output the content using {contentXY->f:transform.html()} or {contentXY->f:format.raw()}.
For a text content element, I get something like:
<div id="c270" class="frame frame-default frame-type-text frame-layout-0">
    <header>
        <h2 class="">Headline</h2>
    </header>
    <p>Some Text</p>
</div>
Is it possible to get only "Headline"? And if so, can I also get "Some Text" out separately?
Something like: {contentXY->f:transform.html(filterBy('h2'))}
Thanks in advance!
EDIT:
According to Peter Krause's answer: I know there is an extra content element for headers. But I need the text content element, because for those places in the HTML I need header AND text, and the editors are not technically savvy enough to fill in separate content elements. Please don't ask for more detail. ):
You can handle the header and body of a CE separately, but not in a page context.
In a page context you get the result of rendering the CEs, which is a string (with HTML).
For each CE there is rendering information, which nowadays is also Fluid.
Depending on your installation it is probably FSC (ext:fluid_styled_content) or a Bootstrap extension.
This means there are Fluid templates which can be overridden and modified.
In these templates you have access to each field of a CE separately; see the sketch after the TypoScript below.
Look for the templates stored in the defined paths (in the TypoScript Object Browser) and add your own path for overrides:
lib.contentElement {
    templateRootPaths {
        1 = ...
        2 = ...
        3 = ...your path...
    }
    partialRootPaths {
        1 = ...
        2 = ...
        3 = ...your path...
    }
    layoutRootPaths {
        1 = ...
        2 = ...
        3 = ...your path...
    }
}
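For example, an overridden Text.html from fluid_styled_content might render the two fields separately, roughly like this (a sketch: the wrapper markup and class names are made up; the field names match the tt_content schema used by FSC):

<html xmlns:f="http://typo3.org/ns/TYPO3/CMS/Fluid/ViewHelpers" data-namespace-typo3-fluid="true">
    <!-- header and bodytext are separate fields of the content element here -->
    <h2 class="my-headline">{data.header}</h2>
    <div class="my-text">{data.bodytext -> f:format.html()}</div>
</html>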
Thanks for all the hints! I think there is no out-of-the-box solution for my requirement, so I made a custom CE with Mask and edited the template HTML. For non-technical editors it is the best solution in terms of data input. I hope this survives future upgrades...
This is my script written to scrape data inside the <iframe> tag using Nokogiri:
require 'nokogiri'
require 'restclient'
doc = Nokogiri::HTML(RestClient.get("http://www.sample_site.com/"))
doc.xpath('//iframe[@width="1001" and @height="973"]').children
I am getting this:
=> [#<Nokogiri::XML::Text:0x1913970 "\r\nYour browser does not support inline frames\r\n">]
Can anyone tell me why?
An iframe is used to embed another document within the current HTML document. That means the iframe loads its content from an external source, which is specified in the src attribute.
So, if you want to scrape the iframe's content, you should send a request to the external source it loads its content from.
# The iframe (notice the 'src' attribute)
<iframe src="iframe_source_url" height="973" width="1001">
# iframe content
</iframe>
# Code to do the scraping
doc = RestClient.get('iframe_source_url')
parsed_doc = Nokogiri::HTML(doc)
parsed_doc.css('#yourSelectorHere') # or parsed_doc.xpath('...')
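Putting the two steps together for the page from the question (the URL and selector are the question's placeholders; resolve frame_src against the page URL first if it turns out to be relative):

require 'nokogiri'
require 'restclient'

# 1) fetch the outer page and locate the iframe
page = Nokogiri::HTML(RestClient.get("http://www.sample_site.com/"))
frame_src = page.at_xpath('//iframe[@width="1001" and @height="973"]')['src']

# 2) fetch and parse the iframe's own document
frame = Nokogiri::HTML(RestClient.get(frame_src))
frame.css('#yourSelectorHere') # or frame.xpath('...')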
Note (about the error)
When you do scraping, the HTTP client you use acts as your browser (yours is RestClient). The text says your browser does not support inline frames; in other words, RestClient does not support inline frames, which is why it cannot load the content of the frame.
The issue should be addressed to RestClient, not to Nokogiri.
RestClient does not retrieve the content of iframes. You might want to examine the content of RestClient.get("http://www.sample_site.com/"); there will be a string like:
<iframe src="page-1.htm" name="test" height="120" width="600">
You need a Frames Capable browser to view this content.
</iframe>
Nokogiri deals with this just fine: it returns the content of the iframe node, which is apparently the only text node, holding exactly the string you got as a result.
I'm developing a web browser on Android and want to show a logo for the most visited sites, like Chrome does (4 x 2 grid). But the problem is that most favicons (e.g. http://www.bbc.co.uk/favicon.ico) are 16x16 or 32x32, and they don't look good when scaled up.
Is there a way to download a high-resolution icon/bitmap for a URL in a standard way? What about opening the home page, extracting all the image links, and choosing an image with "logo" in its name; would that method work for all URLs? I want to know whether there is a standard way to obtain a high-resolution icon for a given URL, or whether the favicon is the only standard way to get a website's logo.
You can code it yourself or use an existing solution.
Do-it-yourself algorithm
Look for Apple touch icon declarations in the code, such as <link rel="apple-touch-icon" href="/apple-touch-icon.png">. These pictures range from 57x57 to 152x152. See the Apple specs for the full reference.
Even if you find no Apple touch icon declaration, try to load them anyway, based on Apple naming convention. For example, you might find something at /apple-touch-icon.png. Again, see Apple specs for reference.
Look for high definition PNG favicon in the code, such as <link rel="icon" type="image/png" href="/favicon-196x196.png" sizes="196x196">. In this example, you have a 196x196 picture.
Look for Windows 8 / IE10 and Windows 8.1 / IE11 tile pictures, such as <meta name="msapplication-TileImage" content="/mstile-144x144.png">. These pictures range from 70x70 to 310x310, or even more. See these Windows 8 and Windows 8.1 references.
Look for /browserconfig.xml, dedicated to Windows 8.1 / IE11. This is the other place where you can find tile pictures. See Microsoft specs.
Look for the og:image declaration such as <meta property="og:image" content="http://somesite.com/somepic.png"/>. This is how a web site indicates to FB/Pinterest/whatever the preferred picture to represent it. See Open Graph Protocol for reference.
At this point, you have found no suitable logo... damn! You can still load all the pictures on the page and make a guess to pick the best one.
Note: steps 1, 2 and 3 are basically what Chrome does to get suitable icons for bookmarks and home screen links. Coast by Opera even uses the MS tile pictures to get the job done. Read this list to figure out which browser uses which picture (full disclosure: I am the author of that page).
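As a rough illustration of steps 1, 3 and 6, here is a short sketch in Python (it assumes the requests and beautifulsoup4 packages; relative URLs are not resolved):

import requests
from bs4 import BeautifulSoup

def icon_candidates(page_url):
    # Collect icon URLs declared in the page markup (steps 1, 3 and 6 above).
    soup = BeautifulSoup(requests.get(page_url).text, "html.parser")
    selectors = [
        ('link[rel="apple-touch-icon"]', 'href'),        # step 1: Apple touch icons
        ('link[rel="icon"][type="image/png"]', 'href'),  # step 3: high-res PNG favicons
        ('meta[property="og:image"]', 'content'),        # step 6: Open Graph image
    ]
    return [tag.get(attr)
            for sel, attr in selectors
            for tag in soup.select(sel)]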
APIs and open source projects
RealFaviconGenerator: you can get any web site's favicon or related icon (such as the touch icon) with this favicon retrieval API. Full disclosure: I'm the author of this service.
Besticon: although less comprehensive, Besticon offers a good alternative, especially if you want to host the code yourself. There is also a hosted version you can use right away.
The Go code at https://github.com/mat/besticon tries to solve this problem.
For example
$ besticon http://github.com
http://github.com: https://github.com/apple-touch-icon-144.png
There is also an accompanying hosted version of the code; see for example http://icons.better-idea.org/icons?url=github.com.
(Disclaimer: I wrote it because I needed to solve the same problem a while ago.)
Another option is getting favicons for any domain using a hidden Google API.
The favicon link pattern is:
https://www.google.com/s2/favicons?domain={domain}&sz={size}
for example
https://www.google.com/s2/favicons?domain=stackoverflow.com&sz=64
Logos are not going to be consistently named and are very difficult to identify reliably. Consider putting the favicon on a colour tile of suitable dimensions; people will quickly associate the colour with the website. You could either extract a dominant colour from the website or favicon using something like Color Thief, or make each tile unique by using a golden-angle formula to choose a hue.
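A small sketch of both ideas (it assumes the colorthief Python package and a locally downloaded favicon.ico; 137.508° is the golden angle):

from colorthief import ColorThief

# Dominant colour of the favicon, usable as the tile background
dominant_rgb = ColorThief("favicon.ico").get_color(quality=1)

# ...or give every site a distinct hue, evenly spread around the colour wheel
def golden_hue(site_index):
    return (site_index * 137.508) % 360  # degrees; use as the H in HSL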
Here is another approach which gives good results.
WebChromeClient provides the onReceivedTouchIconUrl callback for websites; hold on to the URL it gives you.
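A minimal sketch of that callback (the WebView setup and the touchiconUrl field around it are assumed):

webView.setWebChromeClient(new WebChromeClient() {
    @Override
    public void onReceivedTouchIconUrl(WebView view, String url, boolean precomposed) {
        // url is the touch icon URL announced by the page; keep it for the next step
        touchiconUrl = url;
    }
});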
The next step is to convert this URL to a bitmap, which can be done like this:
try {
    URL url = new URL(touchiconUrl);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoInput(true);
    connection.connect();
    InputStream input = connection.getInputStream();
    Bitmap myBitmap = BitmapFactory.decodeStream(input);
    return myBitmap;
} catch (IOException e) {
    e.printStackTrace();
    return null;
}
The next step is to use this bitmap for the shortcut.
Note: remember to create the bitmap on a background thread, for example with an AsyncTask.
This HTML document takes a base URL and the "View Page Source" HTML of a web page, and outputs the icon candidates it finds:
<!doctype html>
<input type=text id=base placeholder="Base URL"><br>
Paste the "View Page Source" of the HTML homepage<br>
<textarea id=HTML placeholder="HTML content of webpage">
</textarea><br>
<input type=Submit>
<script>
// Read attribute n of element u, resolve it against the base URL, return an img tag.
function url(u, n){
    try{
        u = u.getAttribute(n);
    }
    catch(e){
        return 'null';
    }
    if(u.slice(0,2) == "//"){ // protocol-relative URL
        u = "http:" + u;
    }
    else if(u.slice(0,1) == "/"){ // root-relative URL: prepend the base URL
        u = document.getElementById('base').value.replace(/\/+$/, '') + u;
    }
    return '<img src="' + u + '">';
}
document.querySelector('input[type=Submit]').onclick = function(){
    var output = '';
    var HTML = document.getElementById('HTML').value;
    // Parse the pasted source into a detached document so nothing is fetched or executed.
    var doc = document.implementation.createHTMLDocument("New Document");
    doc.documentElement.innerHTML = HTML;
    output = output + "apple-touch-icon<br>" + url([].slice.apply(doc.querySelectorAll('link[rel="apple-touch-icon"]')).reverse()[0], 'href');
    // deprecated: output = output + "apple-touch-icon-precomposed<br>" + url([].slice.apply(doc.querySelectorAll('link[rel="apple-touch-icon-precomposed"]')).reverse()[0], 'href');
    output = output + "<br>image/png<br>" + url(doc.querySelectorAll('link[rel="icon"][type="image/png"]')[0], 'href');
    // <meta name="msapplication-TileImage" content="/mstile-144x144.png">
    // deprecated: output = output + "<br>msapplication-TileImage<br>" + url(doc.querySelectorAll('meta[name="msapplication-TileImage"]')[0], 'content');
    // <meta name="msapplication-config" content="/browserconfig.xml">
    //output = output + "<br>msapplication-config: " + url(doc.querySelectorAll('meta[name="msapplication-config"]')[0], 'content');
    // <meta property="og:image" content="http://somesite.com/somepic.png"/>
    output = output + "<br>og:image<br>" + url(doc.querySelectorAll('meta[property="og:image"]')[0], 'content');
    // <link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a">
    output = output + "<br>image_src<br>" + url(doc.querySelectorAll('link[rel="image_src"]')[0], 'href');
    document.getElementById('output').innerHTML = output;
};
</script>
<div id=output></div>
If you would like to automate retrieval of the HTML, you could use something like the following PHP:
<?php echo file_get_contents($_GET["url"]);
The favicon is usually small (like 16x16 or 32x32). If you need bigger dimensions, extract not the favicon but the logo from the homepage/header.
How can I programmatically add a script or stylesheet tag to a page specified in the page's YAML front matter (meta)?
Assuming there is src/documents/posts/a.html.eco with the following contents:
---
layout: default
scripts: ['a.js']
---
Blog post that requires a special javascript
and layout src/layouts/default.html.eco with following contents:
...
@getBlock('scripts').toHTML()
</body>
...
The final result for posts/a.html should be:
...
<!-- some extra stuff that was added when processing script tag -->
<script src="/scripts/a.js"></script>
</body>
...
...while other pages shouldn't have a reference to /scripts/a.js.
The comment above the tag is just to show that there may be some processing involved before injecting the tag.
I tried many approaches using different events in the docpad.coffee file (including the approach taken from the docpad-plugin-livereload plugin), but every time I faced the same problem: the script tag was applied to all pages instead of only a.html. Here is one of my tries:
renderDocument: (opts) ->
    {extension, templateData, file, content} = opts
    if extension == 'html' and scripts = file.get('scripts')
        if typeof scripts != 'undefined'
            scripts.forEach (scriptName) ->
                @docpad.getBlock('scripts').add('<!-- custom script tag here -->')
I've also tried the render event, populateCollections (which is not documented, but I found it in the docpad-plugin-livereload plugin) and even the extendTemplateData event, with no luck so far.
I know there is a method of doing this right inside a layout:
@getBlock('scripts').add(@document.scripts or [])
...which is totally fine and really works as expected. However, it doesn't seem to give me enough freedom to manipulate the content before it's injected into the page. And even if it were possible, I don't like the idea of having heavy logic inside a layout template; I want it in a plugin/docpad.coffee.
Hopefully that makes sense
Try templateData.getBlock('scripts').add instead of docpad.getBlock('scripts').add
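Applied to the handler from the question, that would look something like this (an untested sketch; the script tag built from scriptName is an assumption):

renderDocument: (opts) ->
    {extension, templateData, file} = opts
    scripts = file.get('scripts')
    if extension == 'html' and scripts
        scripts.forEach (scriptName) ->
            # add to the per-render template data instead of the global docpad blocks
            templateData.getBlock('scripts').add("<script src=\"/scripts/#{scriptName}\"></script>")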
My brand new job is full of wonderful and awful surprises. One of the most interesting parts of this job is the drive to enhance, accelerate, and make everything scale.
And today, first real problem.
Here's the deal: we get up to 20 list elements, each one displaying its own Facebook Share, Twitter Share, and Facebook Like button.
As you can imagine, opening 60 iframes is just a pain for the user experience.
My question: has anybody faced such problems before, and what would you recommend to address these performance issues?
While I'm thinking of an AddThis implementation, I hope there are other solutions I could consider.
The best way to improve performance is to not copy-paste the code from the Facebook plugins page as-is.
Facebook 'Like Button' code looks like:
<div id="fb-root"></div>
<script src="http://connect.facebook.net/en_US/all.js#appId=127702313984818&xfbml=1"></script>
<fb:like href="example.com" send="true" width="450" show_faces="true" font=""></fb:like>
The issue with this is that if you have 20 Like buttons, 20 divs with id="fb-root" are created and the script for all.js is requested 20 times. The best way is to move the
<div id="fb-root"></div>
<script src="http://connect.facebook.net/en_US/all.js#appId=127702313984818&xfbml=1"></script>
to the header of the page, and whenever you want a Like button, use only
<fb:like href="example.com" send="true" width="450" show_faces="true" font=""></fb:like>
The same goes for Facebook Comments and other plugins.
Also, for some plugins Facebook provides the option to use either XFBML or iframe code. Always pick the iframe code, because Facebook's JS has to parse all the XFBML and convert it to iframes; that causes a lot of DOM insertions and slows down the page. A rough sketch of the iframe variant follows.
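For reference, the iframe version of the Like button looks roughly like this (a sketch; the exact query parameters are illustrative, not copied from Facebook's docs):

<iframe src="https://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fexample.com&width=450&show_faces=true&send=true"
        scrolling="no" frameborder="0"
        style="border:none; overflow:hidden; width:450px; height:80px;"></iframe>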
Hope this helps!