MVC Framework - Server-side DOM manipulation - model-view-controller

I'm building an MVC framework, and I'm looking for native solutions / frameworks / tag libraries to draw from or to replace my framework entirely.
I'm interested in the following features specifically:
server-side DOM manipulation
server-side events (page reload, form submit, node insertion, etc.)
traversing the DOM tree using css selectors
validation of html nodes nesting
validation of html nodes allowed attributes
support for tag libraries / user controls
Pretty much what you get with JavaScript, but on the server-side and with some little extras.
Any solution will do (even if partial), any language will do, any pointers are appreciated (even from client-side languages, as long as it's possible to check the source code). Dealing with malformed HTML is not a prerequisite. Outputting valid markup is a big plus.
Please offer practical solutions by pointing the language/framework that is being discussed and, if possible, what features it provides.

Have you checked out Aptana Jaxer?

If you load your page into a DOM-parser you would be able to modify it from there. Then outputting it to the output buffer seems trivial.
But you would need to hold the entire document in memory, which will hurt performance.
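As a rough illustration of the in-memory approach, here is a minimal sketch in JavaScript. It is a hand-rolled node tree, not a real DOM parser: the `ElementNode` and `TextNode` classes and the two rule tables are hypothetical and cover only a handful of tags. Nesting and attributes are checked as nodes are added, and the tree is serialized to markup at the end.

```javascript
// Hypothetical rule tables; a real framework would need to cover the full HTML spec.
const ALLOWED_CHILDREN = {
  ul: ['li'],
  li: ['a', 'span', '#text'],
  a: ['span', '#text'],
  span: ['#text'],
};
const ALLOWED_ATTRIBUTES = {
  ul: ['id', 'class'],
  li: ['id', 'class'],
  a: ['id', 'class', 'href'],
  span: ['id', 'class'],
};

// Escape characters that would otherwise break the generated markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

class TextNode {
  constructor(text) {
    this.tag = '#text';
    this.text = text;
  }
  toHtml() {
    return escapeHtml(this.text);
  }
}

class ElementNode {
  constructor(tag, attributes = {}) {
    if (!(tag in ALLOWED_CHILDREN)) {
      throw new Error(`Unknown tag <${tag}>`);
    }
    this.tag = tag;
    this.attributes = {};
    this.children = [];
    for (const [name, value] of Object.entries(attributes)) {
      this.setAttribute(name, value);
    }
  }
  // Reject attributes that the rule table does not list for this tag.
  setAttribute(name, value) {
    if (!ALLOWED_ATTRIBUTES[this.tag].includes(name)) {
      throw new Error(`Attribute "${name}" is not allowed on <${this.tag}>`);
    }
    this.attributes[name] = value;
    return this;
  }
  // Reject children that the rule table does not allow inside this tag.
  append(child) {
    const node = typeof child === 'string' ? new TextNode(child) : child;
    if (!ALLOWED_CHILDREN[this.tag].includes(node.tag)) {
      throw new Error(`${node.tag} cannot be nested inside <${this.tag}>`);
    }
    this.children.push(node);
    return this;
  }
  toHtml() {
    const attrs = Object.entries(this.attributes)
      .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
      .join('');
    const inner = this.children.map((child) => child.toHtml()).join('');
    return `<${this.tag}${attrs}>${inner}</${this.tag}>`;
  }
}

// Build a small tree, then serialize it in one go.
const link = new ElementNode('a', { href: '/home?a=1&b=2' }).append('Home');
const menu = new ElementNode('ul', { id: 'menu' });
menu.append(new ElementNode('li').append(link));
const html = menu.toHtml();
```

The whole tree stays in memory until `toHtml()` is called, which is the cost mentioned above.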

So, jQuery has a sort of selectors API implemented, I guess I can take a look at their source code. Also, PHP has support for XPath, this could help too.

Found a PHP HTML DOM parser that also implements some HTML selectors here: http://simplehtmldom.sourceforge.net

Fizzler uses the HTML Agility Pack and adds a server-side querySelectorAll to provide CSS selection: http://code.google.com/p/fizzler/

Maybe you are looking for ItsNat

Related

Unable to identify element in Blue Prism using XPath

I have spied an input text box using the Application Modeller of Blue Prism and was able to successfully highlight the text box using the below XPath:
/HTML/BODY(1)/DIV(4)/main(1)/DIV(1)/DIV(1)/DIV(1)/DIV(2)/DIV(1)/DIV(1)/DIV(2)/IFRAME(1)/HTML/BODY(1)/DIV(2)/FORM(1)/DIV(3)/TABLE(2)/TBODY(1)/TR(1)/TD(1)/DIV(1)/DIV(1)/DIV(1)/DIV(2)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/SPAN(1)/DIV(1)/DIV(2)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/DIV(1)/TABLE(1)/TBODY(1)/TR(1)/TD(1)/INPUT(1)
I wanted to use a more robust XPath and to achieve that I was trying to use the below XPath:
//*[@id="CT"]/div/div/div/div[1]/div[1]/table/tbody[1]/tr/td/input[1]
The above XPath identified the element correctly in Chrome, but I got the below error message when trying the same in Blue Prism:
Error - Highlighting results - Object reference not set to an instance of an object.
Let me know if I am doing anything incorrectly.
Sorry for replying to a pretty old one! The workaround we've devised for this scenario (where making the path dynamic requires too long of a loop / search) is to use jQuery snippets. If the page uses jQuery, it is trivial to run these queries very quickly using Blue Prism's ability to execute JavaScript functions.
And we put in an enhancement request, because it'd be supremely useful functionality.
Update: As a user points out below, the vanilla JS querySelector method is probably safer and more future-proof than jQuery, where it can be used.
Blue Prism does not fully support the XPath spec; alas the construct you're attempting to use here won't work.
Alternatively, you can set the Path attribute of an application modeler entry to be Dynamic, which allows you to insert dynamic parameters from the process/object level to pinpoint elements you'd like to interact with.
Unfortunately Blue Prism doesn't actually use "real" XPaths, but only an extremely limited subset: Absolute paths without wildcards. (Note: It is technically possible to match the XPath to a string with wildcards, but this seemingly causes BP to check every single element in the document, and is so slow it is almost never the right solution.)
For cases where an element can't be robustly identified via the BP application modeler (maybe because it requires complex or dynamic selectors), my workaround is to inject a JS snippet. JS can select elements much more reliably, and it can then generate the BluePrism path for that element.
Returning data from JS to BluePrism is not trivial, but one of the nicer solutions is to have JS create a <script id="_output"> element, put JSON inside it, then have BluePrism read the contents of this element.
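A minimal sketch of both steps, assuming the snippet runs inside the page. The function names `buildPath` and `publishResult` are made up for illustration, and the path format (tag names as the DOM reports them, with 1-based indexes counted among same-tag siblings) is a guess based on the paths the Application Modeller shows, so verify it against a spied element first. It does not cross IFRAME boundaries. The `application/json` type is an addition, meant to keep the browser from trying to run the JSON as script.

```javascript
// Builds a path such as /HTML/BODY(1)/DIV(2)/INPUT(1) for an element.
// Assumption: each index counts preceding siblings with the same tag name.
function buildPath(element) {
  const segments = [];
  let node = element;
  while (node.parentElement) {
    let index = 1;
    let sibling = node.previousElementSibling;
    while (sibling) {
      if (sibling.tagName === node.tagName) index += 1;
      sibling = sibling.previousElementSibling;
    }
    segments.unshift(`${node.tagName}(${index})`);
    node = node.parentElement;
  }
  // node is now the root element, which carries no index.
  segments.unshift(node.tagName);
  return '/' + segments.join('/');
}

// Writes data as JSON into <script id="_output">, creating it on first use.
function publishResult(doc, data) {
  let output = doc.getElementById('_output');
  if (!output) {
    output = doc.createElement('script');
    output.id = '_output';
    // A non-JavaScript type so the browser does not try to execute the JSON.
    output.type = 'application/json';
    doc.body.appendChild(output);
  }
  output.textContent = JSON.stringify(data);
  return output;
}
```

In the page this could be called as `publishResult(document, { path: buildPath(document.querySelector('#CT input')) })`, after which BluePrism reads the text of the `_output` element.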

Combining Require.js, Backbone.js and a server side MVC framework

We're actually planning a really complex web application, at least by my own standards.
In the past we have always been using a combination of a server side MVC Framework (Codeigniter) and client side functionality (jQuery and plugins). We have simply written inline javascript code in our views. This worked as expected, but of course with several disadvantages:
no caching
duplicated js code
maintainability issues
...
My main goal now is to organize the client side code in an efficient and easily maintainable way. But I want to stay with the server side MVC because of the existing know how and some existing interfaces. Furthermore I want to reduce complex DOM manipulation with jQuery and "spaghetti code".
Now I thought about a combination of Backbone.js and Require.js but I really can't find a tutorial or any solid description about how to combine them with a server side MVC.
Is it even recommended?
In my old apps I got a file structure like this:
application (CodeIgniter)
assets
js
css
imgs
Are there any ideas or best practices?
Thank you!
To add to mexique1's advice, it might be worth looking at the backbone-boilerplate project. It should provide you with best-practice solutions for many of the problems you're currently considering, such as the combination of require and backbone, the organisation of the client-side of your project, and the reduction of complex DOM manipulation (see templating).
The challenge, as you anticipate, will most likely be in combining the boilerplate approach with the approach you're used to. However, it will almost certainly be worth the effort since it should give you a solid foundation for this and future projects.
I think Backbone is a good choice, and Require is not mandatory here.
Require will just help you organize your source code and maybe improve performance. I think you can start right away with Backbone, which will be the thing you are going to use most, and add Require later.
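If Require is added later, a configuration along these lines could map the existing assets/js folder. This is only a sketch: the `libs/` paths and the `requireConfig` name are assumptions, and the `shim` entries are only needed for library builds that do not register themselves as AMD modules.

```javascript
// Hypothetical layout: adjust the paths to wherever the libraries live under assets/js.
const requireConfig = {
  baseUrl: '/assets/js',
  paths: {
    jquery: 'libs/jquery',
    underscore: 'libs/underscore',
    backbone: 'libs/backbone',
  },
  // Tells Require which globals non-AMD builds export and what they depend on.
  shim: {
    underscore: { exports: '_' },
    backbone: { deps: ['underscore', 'jquery'], exports: 'Backbone' },
  },
};

// In the page, once require.js has loaded, the object would be passed on:
//   require.config(requireConfig);
```

The server-side views then only need one script tag for require.js, and each page loads the modules it needs instead of carrying inline code.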
Regarding Backbone, yes, it's easy to use its Model with an existing MVC application, provided it returns JSON. To load your existing data you will want to use the fetch method combined with url to adapt to your existing code, or your own method.
Generally, think about which models are displayed in which views. Backbone helps you think this way: I'm displaying Models represented as JSON data in Views which are made of HTML.
Also, for the view layer, it's very easy to reuse your existing HTML, because views are not tied to anything, no JavaScript templating or anything.
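Simple example:
<div id="user">
  <span class="name">John</span>
</div>
var UserView = Backbone.View.extend({
  render: function() {
    // this.$ scopes the selector to the view's element; this.$el is a jQuery object, not a function
    this.$('.name').html(this.model.get('name'));
    return this; // conventional, allows chaining
  }
});
var userView = new UserView({el: $('#user')[0], model: ...});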
In this example the #user div reflects the state of a User model, with its name.
Also check the Todo App example in Backbone.

Algorithm / API for converting HTML to email friendly HTML (for newsletters)

I'm sure this is a very old question, but I could not find a straight answer
I'm looking for a works-mostly algorithm to take regular HTML content, and make it email client friendly.
I can rewrite any nice DIV layout to table layout, this is OK, but is there anything that will do it for me?
Here are my concerns
Overflow content - gmail etc ignores any overflow:hidden, the algorithm should address it
Clipped images - same as above, but here the solution will probably be server side clipping
CSS / Script / non standard tags - the algorithm should remove but keep the general look and feel
DIV layout to table layout, I heard it's a must, but I'm sure it's not an easy task to automate
There are many HTML to PDF converters, but I could not find a good HTML to "HTEMAIL" converter
Is there any standard or proposed standard for HTML for email clients? or is it an open jungle out there?
There is no way to make a converter that will be compatible across email clients. The closest you can get is using templates and adding text in certain sections using PHP or .NET.
I've been creating emails for 6 months, and the amount of time you spend correcting email client differences is normally around 50% of the time you spend making the email.
Here is some reading that may help you:
http://www.sitepoint.com/code-html-email-newsletters/
http://www.campaignmonitor.com/css/
As you can see from that last link there is no way to create an algorithm that can sort out all these issues.
Hope this helps
Another option that I've been using is to build the email in HTML or directly in Mailchimp. Once I'm happy with it, using Mailchimp, I click on preview and I get the email in a popup. The source code from the popup is email-client friendly (in tables). I then copy that code and use it for my emails.
Not ideal and a bit of trouble, but so far the best solution I can find.
And before people ask, I mostly use Mailchimp directly, but there is one situation where I have to kick it old school.

HTML efficiency, best practices

In the past, I created some divs to act like articles. Now I am thinking about changing it to HTML5 tag article...
Is there an important difference (in terms of efficiency) between using HTML elements or using equivalent divs created by the user?
For example: Will the browser load the pages faster if they are built only with HTML elements?
Short answer: No.
Long answer: maybe, if it will decrease the amount of markup you use. But not likely.
The benefit of using semantic tags is to add more meaning to the markup, not improve performance.
Maybe. When you create a div and add styling to it, the browser needs to first interpret the element and then process the style over it and render it. If you use the appropriate HTML element, it would put less burden on the rendering engine.

What algorithms could I use to identify content on a web page

I have a web page loaded up in the browser (i.e. its DOM and element positioning are both accessible to me) and I want to find the block element (or a sorted list of these elements), which likely contains the most content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers and such.
This is my personal favorite: VIPS: a Vision-based Page Segmentation Algorithm
First, if you need to parse a web page, I would use HtmlAgilityPack to transform it into XML. It will speed everything up and will let you go directly to the BODY using a simple XPath.
After that, you have to iterate over all the divs (you can get all the DIV elements in a list from the Agility Pack), and get whatever you want.
There's a simple technique to do this, based on analysing how "noisy" the HTML is, i.e., what the ratio of markup to displayed text is through an HTML page. The Easy Way to Extract Useful Text from Arbitrary HTML describes this technique, giving some Python code to illustrate.
Cf. also the HTML::ContentExtractor Perl module, which implements this idea. It would make sense to clean the HTML first, if you wanted to use this, using BeautifulSoup.
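A simplified sketch of the text-to-markup ratio idea, in JavaScript. It is not the code from the article or from HTML::ContentExtractor: it just scores each line by the share of characters left after stripping tags and keeps the dense ones. The function name `extractDenseLines` and both thresholds are arbitrary choices for illustration.

```javascript
// Returns the lines of an HTML document whose visible text is at least
// `minLength` characters long and makes up at least `minRatio` of the line.
function extractDenseLines(html, minRatio = 0.5, minLength = 20) {
  // Drop script and style blocks first: their contents are never displayed.
  const cleaned = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '');
  const kept = [];
  for (const line of cleaned.split('\n')) {
    const raw = line.trim();
    if (raw.length === 0) continue;
    // Crude tag stripper, good enough for a density estimate.
    const text = raw.replace(/<[^>]*>/g, '').trim();
    const ratio = text.length / raw.length;
    if (text.length >= minLength && ratio >= minRatio) kept.push(text);
  }
  return kept;
}

const sample = [
  '<div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>',
  '<p>This paragraph holds the actual article text that a reader cares about.</p>',
  '<script>var tracking = "a long string that should never count as text";</script>',
  '<div class="footer"><span>(c)</span></div>',
].join('\n');
const contentLines = extractDenseLines(sample);
```

On the sample, the navigation and footer lines are too short and too markup-heavy to pass, so only the paragraph text survives. Real pages that put everything on one line would need to be split differently, for example per block element.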
I would recommend Vit Baisa's thesis on Web Content Cleaning, I think he has some code too, but I can't find a link for it. There is also a discussion of the very same problem on the natural language processing LingPipe blog.

Resources