HTML efficiency, best practices - performance

In the past, I created some divs to act like articles. Now I am thinking about changing them to the HTML5 <article> tag...
Is there an important difference (in terms of efficiency) between using semantic HTML elements and using equivalent divs created by the user?
For example: will the browser load pages faster if they are built only with native HTML5 elements?

Short answer: No.
Long answer: maybe, if it decreases the amount of markup you use. But that's not likely.
The benefit of using semantic tags is to add more meaning to the markup, not improve performance.

Maybe. When you create a div and add styling to it, the browser first has to interpret the element, then apply the styles to it, and then render it. If you use the appropriate HTML element, it puts less of a burden on the rendering engine.

How to slideshow through multiple jpegs/png

This is follow up question to this question: mclapply vs for loops for plotting: speed and scalability focus
If I have a bunch of images in the pattern "n_mu_stdev.jpeg", as given in Chase's solution to the above question, is there an R-based method that allows some sort of slideshow with a slider according to the parameter values? From the packages that I have looked at, most of the image readers tend to read in the image as a data file and then re-plot it rather than just displaying the raw image, which slows it down dramatically...
I am not amazingly familiar with JavaScript, CSS or jQuery, so if that is the proposed solution, then I would probably need a step-by-step guide in terms of what to install, etc.
One other solution I had considered was uploading all the individual images upon generation to a photo-sharing site like Flickr or Imgur (kind of like in knitr), keeping note of the URLs; then all the slider does is choose which row of a data frame to look at, which provides the correct URL from which to fetch and display the image...
Use jQuery. Read the documentation:
jquery library: http://jquery.com/
jquery slideshow plugin: http://www.twospy.com/galleriffic/
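If you just want something self-contained to click through, one low-tech option (a sketch of my own, not part of the plugin above) is to generate a single static HTML page from the plot files and flip between them with a range slider. The glob pattern and output filename below are assumptions based on the question:

```python
# Minimal sketch: build a static slideshow page from the generated plots.
# The "n_mu_stdev.jpeg" glob pattern and "slideshow.html" output are assumptions.
import glob
import json

files = sorted(glob.glob("*_*_*.jpeg"))  # e.g. "10_0_1.jpeg" for n=10, mu=0, stdev=1

page = """<!DOCTYPE html>
<html>
<body>
  <input type="range" id="slider" min="0" max="%d" value="0">
  <div id="label"></div>
  <img id="plot" src="%s">
  <script>
    var files = %s;
    document.getElementById("slider").oninput = function () {
      document.getElementById("plot").src = files[this.value];
      document.getElementById("label").textContent = files[this.value];
    };
  </script>
</body>
</html>""" % (len(files) - 1, files[0] if files else "", json.dumps(files))

with open("slideshow.html", "w") as out:
    out.write(page)  # open slideshow.html in a browser and drag the slider
```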

algorithm to find 'article' in webpage?

Some browser plugins, like Readability, can extract the 'article' from a webpage. Does anyone have an idea of how to do it? What's the difference between the real article and the ads or comments?
Well, it depends how you want to define "real articles"...
Taking HTML5 into consideration, a webpage is constructed of semantic tags. Pages no longer have to be built with elements like <div> that have no semantic meaning at all. In HTML5 you may use <section>, <article>, <header> and so on. Those elements can give an application a pretty good sense of what the main content of a webpage is (e.g. print the <article>s and skip the <nav>s...).
Of course, not many pages use those tags yet. Furthermore, the tags might get abused and lose their meaning. In that case I'd stick to some statistics, e.g. selecting the largest elements in an HTML document. Moreover, if you have to scrape a webpage, you could use a modification of some pattern-matching algorithm, DIPRE for instance.
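To make those two heuristics concrete, here is a rough sketch (my own illustration; the choice of BeautifulSoup is an assumption, the answer names no library): prefer the semantic tags when they exist, otherwise fall back to the block element carrying the most text.

```python
# Sketch: prefer semantic <article> tags, else pick the block with the most text.
from bs4 import BeautifulSoup

def main_content(html):
    soup = BeautifulSoup(html, "html.parser")

    # 1. Semantic tags first: if the page uses HTML5 markup, trust it.
    articles = soup.find_all("article")
    if articles:
        return max(articles, key=lambda el: len(el.get_text(strip=True)))

    # 2. Statistical fallback: the block element with the largest amount of text.
    candidates = soup.find_all(["div", "section", "td"])
    if not candidates:
        return soup.body or soup
    return max(candidates, key=lambda el: len(el.get_text(strip=True)))
```

In practice you would also want to penalise wrapper elements that merely contain everything, but the idea is the same.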

What algorithms could I use to identify content on a web page

I have a web page loaded up in the browser (i.e. its DOM and element positioning are both accessible to me) and I want to find the block element (or a sorted list of these elements), which likely contains the most content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers and such.
This is my personal favorite: VIPS: a Vision-based Page Segmentation Algorithm
First, if you need to parse a web page, I would use HTMLAgilityPack to transform it to XML. It will speed everything up and will enable you, using a simple XPath expression, to go directly to the BODY.
After that, you can iterate over all the divs (you can get all the DIV elements as a list from the Agility Pack) and extract whatever you want.
There's a simple technique to do this, based on analysing how "noisy" the HTML is, i.e. the ratio of markup to displayed text throughout an HTML page. The Easy Way to Extract Useful Text from Arbitrary HTML describes this technique, giving some Python code to illustrate it.
Cf. also the HTML::ContentExtractor Perl module, which implements the same idea. If you want to use this, it would make sense to clean the HTML first, e.g. using BeautifulSoup.
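As a rough illustration of that ratio idea (my own sketch, not the code from the linked article; BeautifulSoup is again an assumption):

```python
# Sketch: score each block by visible text vs. raw markup, keep the densest ones.
from bs4 import BeautifulSoup

def text_density(element):
    markup_len = len(str(element))                 # length including all tags
    text_len = len(element.get_text(strip=True))   # displayed text only
    return text_len / markup_len if markup_len else 0.0

def densest_blocks(html, top_n=3, min_text=100):
    soup = BeautifulSoup(html, "html.parser")
    blocks = [el for el in soup.find_all(["div", "p", "td", "article"])
              if len(el.get_text(strip=True)) >= min_text]  # skip tiny fragments
    return sorted(blocks, key=text_density, reverse=True)[:top_n]
```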
I would recommend Vit Baisa's thesis on Web Content Cleaning, I think he has some code too, but I can't find a link for it. There is also a discussion of the very same problem on the natural language processing LingPipe blog.

MVC Framework - Server-side DOM manipulation

I'm building an MVC framework, and I'm looking for native solutions / frameworks / tag libraries to draw from or to replace my framework entirely.
I'm interested in the following features specifically:
server-side DOM manipulation
server-side events (page reload, form submit, node insertion, etc.)
traversing the DOM tree using css selectors
validation of html nodes nesting
validation of html nodes allowed attributes
support for tag libraries / user controls
Pretty much what you get with JavaScript, but on the server-side and with some little extras.
Any solution will do (even if partial), any language will do, any pointers are appreciated (even from client-side languages, as long as it's possible to check the source code). Dealing with malformed HTML is not a prerequisite. Outputting valid markup is a big plus.
Please offer practical solutions by pointing the language/framework that is being discussed and, if possible, what features it provides.
Have you checked out Aptana Jaxer?
If you load your page into a DOM parser you would be able to modify it from there. Then outputting it to the output buffer seems trivial.
But you would need to store the entire document in memory, which will hurt performance.
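As a language-agnostic sketch of that approach (Python with BeautifulSoup here purely for illustration; in PHP the DOM extension or the simplehtmldom parser mentioned below plays the same role):

```python
# Sketch: parse the page server-side, manipulate the tree, serialize it back out.
from bs4 import BeautifulSoup

html = "<html><body><div id='menu'>...</div><div id='main'>Hello</div></body></html>"
soup = BeautifulSoup(html, "html.parser")

main = soup.find("div", id="main")
p = soup.new_tag("p")                 # server-side node insertion
p.string = "Rendered on the server"
main.append(p)
main["class"] = "rendered"            # server-side attribute change

print(str(soup))  # the whole (modified) document goes to the output buffer
```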
So, jQuery has a sort of selector API implemented; I guess I can take a look at their source code. Also, PHP has support for XPath, which could help too.
Found a PHP HTML DOM parser that also implements some HTML selectors here: http://simplehtmldom.sourceforge.net
Fizzler uses HTMLAgilityPack and adds a server-side querySelectorAll to provide CSS selection: http://code.google.com/p/fizzler/
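The same kind of server-side CSS selection exists in most languages; a hedged Python example using BeautifulSoup's select() (not simplehtmldom or Fizzler themselves) just to show the shape of it:

```python
# Sketch: querySelectorAll-style lookups with CSS selectors, but on the server.
from bs4 import BeautifulSoup

html = "<ul><li class='item active'>one</li><li class='item'>two</li></ul>"
soup = BeautifulSoup(html, "html.parser")

for li in soup.select("li.item"):             # all matching elements
    print(li.get_text())
print(soup.select_one("li.active")["class"])  # ['item', 'active']
```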
Maybe you are looking for ItsNat

Why does my website need so much time to render?

When cached, my starting page only needs to load one element (the "root document") - but then it needs some time until it's rendered completely:
(Screenshot of the Firebug Net panel: http://www.walkner.biz/_temp/firebug_net.png)
The elements that follow are loaded asynchronously via JavaScript.
Two questions:
Why does it take so "long" from loading the root document until the DOMContentLoaded event?
Does it make sense to load some not-so-important things asynchronously? Is it important to have the DOMContentLoaded event fire as early as possible? Unfortunately there's not much documentation about that event, but I don't think it's the moment when the page is displayed, is it?
I'm not sure YSlow is going to help him, as that will download all the elements for a page and run performance tests on them, whereas swalkner's problem is how long it takes to render the HTML page itself when all the other elements (images, CSS, etc.) are cached.
At least that's what I think he's saying.
In the original question you said, "The elements that follow are loaded asynchronously via JavaScript," but then listed nothing. What is loaded?
I would suggest checking for JavaScript errors in the first instance. Then try removing your asynchronous loading calls one by one until you hit the bottleneck. In fact, remove them all: how long does the downloaded HTML take to render then? Take that time and work from there.
Is your HTML document very big? Does it use lots of inline styles that could be in the CSS file?
Perhaps if you posted a link to the site then people would have a look at it.
