When loading custom fonts from Google, they offer a way to optimize the request: https://developers.google.com/webfonts/docs/getting_started#Optimizing_Requests
Meaning that if you are using a custom font for just a header, rather than loading the whole font, you can tell it which letters you need so that it doesn't load the entire alphabet. So far so good.
However, in the example it only has one font style. How do you do it with two?
For instance, I'm using this to load two styles of a font:
The normal 400 I use for a lot of text, but the 400italic is only used for one short header.
If I do:
Will it load the entire 400 and just the "sample header" for 400italic like I want or will it do something else?
If you're doing character subsetting and trying to combine multiple fonts into one CSS request, then the specified characters need to be the union of all the required characters between all the fonts.
Take a look here: http://jsbin.com/welcome/22005/edit
<link href="http://fonts.googleapis.com/css?family=Inconsolata|Skranji&text=ABCDHIJK" rel="stylesheet">
Two things are going on above. We're asking for two fonts (Inconsolata and Skranji) and specifying that both fonts should serve the characters ABCDHIJK. If you only need A-D for one and H-K for the other, then you'll have to split the above into two distinct CSS requests.
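As a rough sketch of the two options (one combined request with the union of characters, or one request per font), the URLs could be built like this. The helper names `combinedFontCssUrl` and `fontCssUrls` are made up for illustration; only the `family` and `text` parameters come from the example above.

```javascript
// Hypothetical helpers, not part of any Google API.
// `subsets` maps a font family to the characters it needs.
const BASE = "http://fonts.googleapis.com/css";

// One combined request: every font is served the union of all characters.
function combinedFontCssUrl(subsets) {
  const families = Object.keys(subsets).map(encodeURIComponent).join("|");
  const union = [...new Set(Object.values(subsets).join(""))].join("");
  return BASE + "?family=" + families + "&text=" + encodeURIComponent(union);
}

// One request per font: each font is served only its own characters.
function fontCssUrls(subsets) {
  return Object.entries(subsets).map(([family, text]) => {
    const chars = [...new Set(text)].join(""); // drop duplicate characters
    return BASE + "?family=" + encodeURIComponent(family) +
      "&text=" + encodeURIComponent(chars);
  });
}
```

Each URL returned by `fontCssUrls` would go into its own `<link rel="stylesheet">` tag.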
I'm trying to "upload" an html-converted .docx file into a CKEditor. So far, the conversion from .docx to html is nearly perfect and I'm able to pass the code from Java (Spring/Maven) to my webapp (ZK framework, using native CKEditor and JavaScript).
The problems I've had so far revolve around the fact that the loaded text is either half-formatted or not formatted at all, and that's the actual reason I'm working on this (to avoid the loss of formatting that happens when copy-pasting). I've managed to find the reason for this behaviour: CK likes HTML tags, OR it won't use multiple styles per container (i.e. style="font-weight: bold" is OK, but style="font-style: italic; font-weight: bold" isn't; it will pick one or the other), and Docx4j uses inline styling for formatting because of XHTML (as far as I've read).
After that I tried to force the styles in CKEditor through the config file, but that wasn't the solution, as one element will overwrite another, resulting in only one style being used.
With all that, I decided to manipulate a test docx (It's literally a "hello world" line bold, with italics and underline), converted it and forced the tags b, i and u on the resulting HTML file through Java. The result was the desired one.
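The tag-forcing step above was done in Java; purely as an illustration, here is a minimal JavaScript sketch of the same idea. It assumes the converter emits flat, non-nested `<span style="...">` elements, which may not match real docx4j output, and `spansToTags` and `STYLE_TO_TAG` are hypothetical names.

```javascript
// Inline declarations from the question, mapped to the tags CKEditor kept.
const STYLE_TO_TAG = [
  [/font-weight\s*:\s*bold/i, "b"],
  [/font-style\s*:\s*italic/i, "i"],
  [/text-decoration\s*:\s*underline/i, "u"],
];

function spansToTags(html) {
  // Assumption: spans are not nested and contain only text.
  // Nested markup would need a real HTML parser instead of a regex.
  return html.replace(
    /<span\s+style="([^"]*)"\s*>([^<]*)<\/span>/gi,
    (match, style, text) => {
      const tags = STYLE_TO_TAG
        .filter(([pattern]) => pattern.test(style))
        .map(([, tag]) => tag);
      if (tags.length === 0) return match; // leave unrelated spans untouched
      const open = tags.map((t) => `<${t}>`).join("");
      const close = tags.slice().reverse().map((t) => `</${t}>`).join("");
      return open + text + close;
    }
  );
}
```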
Now my focus is to configure docx4j so it uses tags instead of inline css, as so far it's the "easiest" solution and I liked the resulting html from it. After reading some more I came across an old class with a method that (by name) will do that, but it's not present in my imported library. I tried both the new and old methods to convert to html, but the results are the same.
Is there a setting or a way to let docx4j (v8.2.3 for reference) know that I want html tags instead of css styles? I've seen the examples and looked into the javadoc, but it's a bit outdated and didn't really help me that much. This seems to be the only way to do this, other than building my own parser, which is simply not an option due to time constraints.
Thanks!
I'm using Full HTML filter, with CKEditor. The following filters are enabled:
Align images
Caption images
Track images uploaded via a Text Editor
Collapsible text blocks
Note that Limit allowed HTML tags and correct faulty HTML is NOT enabled.
When I add a style attribute to a table element in CKEditor using the Source view, specifically "width=75%", it is stripped when the page is rendered. When I edit the page again and go to Source view, the style attribute is still there.
What is stripping it on render?
I believe inline styles are removed by default for security reasons. But, there has been a lot of discussion about this issue on Drupal.org over the past few years. If you're looking for a workaround and accept the risk, here are two approaches I have found:
How to fix: CKEditor is removing style attributes. Drupal 8.
Refactor Xss::attributes() to allow filtering of style attribute values
Fair warning: I have not personally implemented either of these.
Inline styles are stripped by default with the Basic HTML formatter. Unless you have a specific reason not to turn on Limit allowed HTML tags, I highly recommend that you do, because it gives you a lot of control over which tags you and others can use in the wysiwyg. In addition, it allows you to add a "Styles" button with pre-configured styles, so you don't have to insert inline CSS code repetitively.
I have been working on a new MVC3 project where I wanted to introduce the concept of dynamic themes.
Instead of creating a bunch of .css files and dynamically linking to the right one, I wanted to use a <style> section in the master <head> section that specifies the values to use for the selectors and their properties.
The values would be pulled from a database and written to the <style> section in the header, like this:
<head>
<style type="text/css">
.testClass { color:Purple;background-color:LightGreen; }
</style>
</head>
Not an answer on how to achieve this end, per se, as much as a suggestion that you reconsider. I have seen this approach taken firsthand several times over the years, and it invariably ends up first with writing a proprietary tool to edit the database themes and subsequently with an expensive rewrite to extract all the themes out of the database and into proper css files.
One typical reason to go down the path of putting styles in the database tends to be a desire to allow a given style to be "overridden" on a case-by-case basis - for instance, in an application service provider model, where one customer wants to change only one or two of the default styles. However, the "cascading" in "cascading style sheets" allows this exact behavior, without abandoning all the goodness of proper css and the associated tools - as long as you sequence the stylesheets in the correct order in the page head (e.g. "maintheme.css" first, then "customerX.css"), you only need to redefine the styles of interest in the customer's stylesheet and they will automatically override those in the main theme's stylesheet (assuming the css selectors otherwise have the same precedence).
A related, but slightly different reason given for going with database-driven stylesheets is to allow end users or business owners to edit the styles themselves. With a few exceptions, that sort of feature turns out to be less used and more difficult to maintain in practice than it seems when drawing it up. In this case, the number of styles being customized is theoretically quite small - presumably entirely constrained - and you'd be writing a proprietary tool to allow them to be edited, regardless, so again I would suggest simply writing out the customized styles to a css file on a filesystem, rather than a database (or as a blob to a CDN, etc.).
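A minimal sketch of that alternative, assuming the customized values arrive as a plain object of selectors and declarations (a made-up shape) and that the file names follow the "maintheme.css" then "customerX.css" example above. `renderOverrides` and `stylesheetLinks` are hypothetical helpers; writing the returned text to disk or a CDN is left to the caller.

```javascript
// Turns { ".selector": { property: value } } into CSS text for a customer file.
function renderOverrides(overrides) {
  return Object.entries(overrides)
    .map(([selector, declarations]) => {
      const body = Object.entries(declarations)
        .map(([property, value]) => `  ${property}: ${value};`)
        .join("\n");
      return `${selector} {\n${body}\n}`;
    })
    .join("\n\n") + "\n";
}

// Emits the <link> tags in cascade order: main theme first, overrides last,
// so rules in the customer file win over equally specific theme rules.
function stylesheetLinks(customerId) {
  return ["maintheme.css", `${customerId}.css`]
    .map((href) => `<link rel="stylesheet" href="${href}">`)
    .join("\n");
}
```

Only the handful of rules a customer actually changes need to appear in the generated file; everything else falls through to the main theme.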
Is there any way to fetch the raw contents of a CSS file?
Let's imagine that I wanted to fetch any vendor-specific css properties from a CSS file. I would need to somehow grab the CSS contents and parse them accordingly. Or I could just use the DOM to access the rules of a CSS file.
The problem is that when using the DOM, almost all browsers (except for <= IE8) tend to strip out all of the custom properties that do not relate to their browser engine (webkit strips out -moz and -o and -ms). Therefore it wouldn't be possible to fetch the CSS contents that way.
If I were to use AJAX to fetch the contents of the CSS file, and that CSS file is hosted on another domain, then the same-origin policy would block the request and the CSS contents could not be fetched.
If one were to use a cross-domain AJAX approach, then there would only be a JSONP solution, which wouldn't work since we're not parsing any javascript code (therefore there is no callback).
Is there any other way to fetch the contents?
If a CSS file is on the same domain as the page you're running the script on, you can just use AJAX to pull in the CSS file:
$.get("/path/to/the.css", function(data) {/* ... */});
If not, you could try using Yahoo! Pipes as a proxy and get the CSS with JSONP.
As for parsing, you can check out Sizzle to parse the selectors. You could also use the CSS grammar (posted in the CSS standards) to use a JS lex/yacc parser to parse out the document. I'll leave you to get creative with that.
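If a full grammar-based parser is more than you need, a naive regex pass over the raw text may be enough to list vendor-prefixed declarations. This is only a sketch: `vendorDeclarations` is a made-up name, the prefix list is an assumption, and it can be fooled by prefixed pseudo-classes such as `:-moz-any-link:hover` or by strings containing CSS-like text.

```javascript
// Naive scan of raw CSS text for vendor-prefixed property declarations.
function vendorDeclarations(cssText) {
  // Drop comments first so commented-out declarations are not reported.
  const withoutComments = cssText.replace(/\/\*[\s\S]*?\*\//g, "");
  // A prefixed name, not preceded by a word character or hyphen,
  // followed by a colon and a value running up to ";" or "}".
  const pattern = /(?<![\w-])(-(?:webkit|moz|ms|o)-[a-z-]+)\s*:\s*([^;}]+)/gi;
  const found = [];
  let m;
  while ((m = pattern.exec(withoutComments)) !== null) {
    found.push({ property: m[1].toLowerCase(), value: m[2].trim() });
  }
  return found;
}
```

The `data` string handed to the `$.get` callback above could be passed straight to this function.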
Good luck!
No, you've pretty much covered it. Browsers other than IE strip out unknown rules from their object models both in the style/currentStyle objects and in the document.styleSheets interface. (It's usually IE6-7 whose CSS you want to patch up, of course.)
If you wanted to suck a stylesheet from an external domain you would need proxy-assisted AJAX. And parsing CSS would be a big nasty job, especially if you needed to replicate browser quirks. I would strenuously avoid any such thing!
JSONP is still a valid solution, though it would hurt the eyes somewhat. Basically, in addition to the callback padding, you would have to add one JSON property "padding" and pass the CSS as a value. For example, a call to a script, http://myserver.com/file2jsonp/?jsonp=myCallback&textwrapper=cssContents could return this:
myCallback({"cssContents":"body{text-decoration:blink;}\nb{text-size:10em;}"});
You'd have to text-encode all line breaks and wrap the contents of the CSS file in quotes (after encoding any existing quotes). I had to resort to doing this with a Twitter XML feed. It felt like such a horrible idea when I built it, but it did its job.
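On the server side, the encoding step could be sketched as below, assuming a JavaScript environment where `JSON.stringify` is available to handle the quoting. `cssToJsonp` is a hypothetical helper name, and the callback and property names are taken from the example URL above.

```javascript
// Wraps raw CSS text in a JSONP response body.
function cssToJsonp(callbackName, propertyName, cssText) {
  // JSON.stringify escapes quotes, backslashes, and line breaks for us.
  const payload = JSON.stringify({ [propertyName]: cssText })
    // U+2028/U+2029 are valid in JSON but break string literals in older JS engines.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
  return `${callbackName}(${payload});`;
}
```

A real endpoint should also validate `callbackName` (for example, allow only identifier characters) before echoing it back into a script.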
How would you solve this problem?
You're scraping HTML of blogs. Some of the HTML of a blog is blog posts, some of it is formatting, sidebars, etc. You want to be able to tell what text in the HTML belongs to which post (i.e. a permalink) if any.
I know what you're thinking: You could just look at the RSS and ignore the HTML altogether! However, RSS very often contains only very short excerpts or strips away links that you might be interested in. You want to essentially defeat the excerptedness of the RSS by using the HTML and RSS of the same page together.
An RSS entry looks like:
title
excerpt of post body
permalink
A blog post in HTML looks like:
title (surrounded by permalink, maybe)
...
permalink, maybe
...
post body
...
permalink, maybe
So the HTML page contains the same fields but the placement of the permalink is not known in advance, and the fields will be separated by some noise text that is mostly HTML and white space but also could contain some additional metadata such as "posted by Johnny" or the date or something like that. The text may also be represented slightly different in HTML vs. RSS, as described below.
Additional rules/caveats:
Titles may not be unique. This happens more often than you might think. Examples I've seen: "Monday roundup", "TGIF", etc.
Titles may even be left blank.
Excerpts in RSS are also optional, but assume there must be at least either a non-blank excerpt or a non-blank title
The RSS excerpt may contain the full post content but more likely contains a short excerpt of the start of the post body
Assume that permalinks must be unique and must be the same in both HTML and RSS.
The title and the excerpt and post body may be formatted slightly differently in RSS and in HTML. For example:
RSS may have HTML inside of title or body stripped, or on the HTML page more HTML could be added (such as surrounding the first letter of the post body with something) or could be formatted slightly differently
Text may be encoded slightly differently, such as being utf8 in RSS while non-ascii characters in HTML are always encoded using ampersand encoding. However, assume that this is English text where non-ascii characters are rare.
There could be badly encoded Windows-1252 horribleness. This happens a lot for symbol characters like curly quotes. However, it is safe to assume that most of the text is ascii.
There could be case-folding in either direction, especially in the title. So, they could all-uppercase the title in the HTML page but not in RSS.
The number of entries in the RSS feed and the HTML page is not assumed to be the same. Either could have more or fewer older entries. We can only expect to get those posts that appear in both.
RSS could be lagged. There may be a new entry in the HTML page that does not appear in the RSS feed yet. This can happen if the RSS is syndicated through Feedburner. Again, we can only expect to resolve those posts that appear in both RSS and HTML.
The body of a post can be very short or very long.
100% accuracy is not a constraint. However, the more accurate the better.
Well, what would you do?
I would create a scraper for each of the major blogging engines. Start with the main text for a single post per page.
If you're lucky then the engine will provide reasonable XHTML, so you can come up with a number of useful XPath expressions to get the node which corresponds to the article. If not, then I'm afraid it's TagSoup or Tidy to coerce it into well formed XML.
From there, you can look for the metadata and the full text. This should safely remove the headers/footers/sidebars/widgets/ads, though may leave embedded objects etc.
It should also be fairly easy (TM) to segment the page into article metadata, text, comments, etc etc and put it into fairly sensible RSS/Atom item.
This would be the basis of taking an RSS feed (non-full text) and turning it into a full text one (by following the permalinks given in the official RSS).
Once you have a scraper for a blog engine, you can start looking at writing a detector - something that will be the basis of the "given a page, what blog engine was it published with".
With enough scrapers and detectors, it should be possible to point the tool at a given RSS/Atom feed and convert it into a full-text feed.
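One cheap signal a detector might start with is the `<meta name="generator">` tag that some engines emit. A minimal sketch follows; `detectEngine` and `pickScraper` are hypothetical names, the registry keyed by lowercase engine name is an assumption, and plenty of blogs strip or fake this tag, so it can only be one heuristic among several.

```javascript
// Returns the content of <meta name="generator" ...>, or null if absent.
function detectEngine(html) {
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    if (!/\bname\s*=\s*["']generator["']/i.test(tag)) continue;
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (content) return content[1];
  }
  return null; // no hint found; other heuristics would have to take over
}

// Picks a scraper from a registry keyed by lowercase engine-name prefix.
function pickScraper(generator, scrapers) {
  if (!generator) return null;
  const name = generator.toLowerCase();
  const key = Object.keys(scrapers).find((k) => name.startsWith(k));
  return key ? scrapers[key] : null;
}
```

Because the generator string usually carries a version number, the same lookup could later be refined to choose between scrapers for different versions of one engine.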
However, this approach has a number of issues:
while you may be able to target the big 5 blog engines, there may be some blogs which you just have to have that aren't covered by them: e.g. there are 61 engines listed on Wikipedia; people who write their own blogging engines each need their own scraper.
each time a blog engine changes versions, you need to change your detectors and scrapers. More accurately, you need to add a new scraper and detector. The detectors have to become increasingly fussy to distinguish between one version of the same engine and the next (e.g. every time slashcode changes, it usually changes the HTML, but different sites use different versions of slash).
I'm trying to think of a decent fallback, but I'll edit once I have.
RSS is actually quite simple to parse using XPath or any XML parser (or regexes, but that's not recommended): you go through the <item> tags, looking for <title>, <link>, and <description>.
You can then post them as different fields in a database, or merge them directly into HTML. In case the <description> is missing, you could scrape the link (one way would be to compare multiple pages to weed out the layout parts of the HTML).
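When pairing a parsed RSS item with text scraped from the page, the formatting differences listed in the question (stripped tags, entity encoding, case changes, mangled curly quotes) get in the way of a direct comparison. One rough approach is to normalize both sides aggressively before matching. In this sketch `normalizeText` and `findItemForText` are made-up names, the `{title, excerpt, permalink}` item shape is an assumption, and the text is assumed to be mostly ASCII English as the question states.

```javascript
// Guard against invalid code points, which String.fromCodePoint rejects.
const decodeCodePoint = (n) =>
  n >= 0 && n <= 0x10ffff ? String.fromCodePoint(n) : " ";

// Reduces text to lowercase ASCII words separated by single spaces.
function normalizeText(s) {
  return s
    .replace(/<[^>]*>/g, " ") // strip tags
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => decodeCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => decodeCodePoint(Number(dec)))
    .replace(/&[a-z][a-z0-9]*;/gi, " ") // named entities are mostly punctuation
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ") // non-ASCII and punctuation become separators
    .trim();
}

// Finds the first RSS item whose excerpt (or, failing that, title)
// appears in the scraped text after normalization.
function findItemForText(pageText, items) {
  const haystack = normalizeText(pageText);
  return items.find((item) => {
    const needle = normalizeText(item.excerpt || item.title || "");
    return needle.length > 0 && haystack.includes(needle);
  }) || null;
}
```

Matching on the excerpt first sidesteps duplicate titles in most cases, but very short excerpts can still match the wrong post, so the permalink remains the more reliable key whenever it can be located in the HTML.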