I am looking to create an application that is effectively a very simple blog - with a new article added every few days. In order to control the appearance of the articles, I would like to write them directly in HTML so pictures, links, etc. can be put in appropriate places and formatted as required. However, storing these articles in a database would seem to make maintenance a lot easier and provide additional searching capabilities. Is it sensible to store such HTML/ERB code in a database in this way? If not, what are the alternatives?
The standard way to do this is to use a markup library, such as RedCloth. You use something similar even here on Stack Overflow; if you read the help link, it looks like Markdown. The content can certainly be put in a database. It can be done with HTML and ERB, but the reason it is not often done is safety.
If you are the only one using it, it may not be an issue, but if you allow anyone else to insert data, you open yourself up to XSS attacks with HTML, or even code exploits if you allow raw ERB. Markup languages exist to limit the set of markup allowed and to remove the ability to mount scripting attacks.
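To illustrate why a restricted markup language is safer than raw HTML or ERB, here is a minimal sketch in Python using only the standard library. The function name `render_limited_markup` is hypothetical, and this is not RedCloth's or Redcarpet's API. It escapes everything the author typed and only then generates the few tags it chooses to support, here `**strong**` and `*emphasis*`.

```python
import html
import re


def render_limited_markup(source: str) -> str:
    # 1. Escape everything first: any <script> or <iframe> the user typed
    #    becomes inert text such as &lt;script&gt;.
    text = html.escape(source, quote=True)
    # 2. Only then emit the small set of tags we allow.
    #    **strong** is handled before *em* so the double asterisks win.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text
```

A real Markdown or Textile library does far more, but the principle is the same: user text never becomes markup on its own; only the converter emits tags.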
see also: Better ruby markdown interpreter?
Update: What great timing, there was a Railscasts episode released today about this: http://railscasts.com/episodes/272-markdown-with-redcarpet
I am getting data from a broken RSS feed that gives me a wrong link. To fix the link, I wrote this regular expression:
<link.*>(.*)&.*tid(.*)</link>
A broken link looks like this:
www.somedomain.com/?value=50&burrrdurrrr;tid=120
But the real working link is in this form:
www.somedomain.com/?value=50&tid=120
My measure currently looks like this:
[FeedURL]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[Feed]
StringIndex=2
; with StringIndex=2 I only get www.somedomain.com/?value=50
Substitute=#SubstituteFeed#
How am I supposed to concatenate the strings to build the complete URL?
I'm guessing that rather than &burrrdurrrr;, the link actually contains &amp;, which is how you have to write & in an HTML or XML file.
If that's the case, you just need to set the DecodeCharacterReference option, as described in this handy-looking tutorial. Another option mentioned there is Substitute, which would be able to strip it out even if it really was &burrrdurrrr;.
None of this is a particularly sensible way of dealing with HTML or XML - a much better approach would be a plugin which actually parsed the document structure and let you reference nodes using XPath or CSS rules - but you work with what you've got, I guess. (I've never heard of this "Rainmeter" before, despite its claim to be "the best known and most popular desktop customization program for Windows"; maybe because nobody else calls their program that, instead almost universally using the word "widget"?)
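Outside Rainmeter, the two fixes described above amount to the following string operations. This is a hypothetical Python sketch (the name `fix_link` is made up, and it is not WebParser syntax): first decode character references such as `&amp;`, which is roughly what the DecodeCharacterReference option is for, then use a regular-expression substitution to drop any leftover junk entity before `tid=`, the role Substitute would play.

```python
import html
import re


def fix_link(raw: str) -> str:
    # Step 1: decode character references, so "&amp;" becomes "&".
    decoded = html.unescape(raw)
    # Step 2: if an unknown entity such as "&burrrdurrrr;" survived decoding,
    # replace it with a plain "&" (only when it sits right before "tid=").
    return re.sub(r"&[A-Za-z]+;(?=tid=)", "&", decoded)


print(fix_link("www.somedomain.com/?value=50&amp;tid=120"))
```

Both inputs from the question end up as `www.somedomain.com/?value=50&tid=120`, so no concatenation of separate StringIndex values is needed.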
I would like to use tags like {{headline}} in the CodeIgniter views instead of PHP and I'm looking for a template parser. CodeIgniter has a built-in template parser: http://www.ellislab.com/codeigniter/user-guide/libraries/parser.html
My question is whether it's better to use the built-in parser or another parser. Are there any limitations with the CI template parser, such as not supporting loops, if statements, etc.?
If so, there are a number of other parsers, but it seems that a developer works on each one for a while and then it stagnates once it is no longer supported. I'm looking for a parser that will still be supported in a year:
- Bucket: http://backstack.ca/projects/bucket/
- Comper Template Parser: http://parser.comper.sk/en/
- Ocular-Template-Library: http://github.com/lonnieezell/Ocular-Template-Library
- Phil Sturgeon Template library: http://philsturgeon.co.uk/code/codeigniter-template
- PyroCMS Lex Parser: http://github.com/pyrocms/lex
- Template Library for CodeIgniter: http://www.williamsconcepts.com/ci/codeigniter/libraries/template/
The most active seem to be Comper and Lex Parser. What is the difference between the Phil Sturgeon Template library and the PyroCMS Lex Parser, given that they come from the same developer?
What I am looking for is:
- Separation of PHP and HTML/CSS in views
- Solidly supported so that it's not stalled within a year
- Use of simple tags but also loops, if statements and other functions
Can anyone give me a tip? The existing information on the CI forum and elsewhere has not been very useful.
Many thanks!
Philip
How to Choose a Template Engine
I went through a similar exercise for choosing a PHP/CMS system, and here are some points that may carry over to your decision making process.
I first look at the documentation to get a sense of how much support there is for the system, evaluate the range of features, and so on. I also check whether there is an online forum with enough activity to get some help if needed.
I then try out the installation to see if it goes smoothly. If I have trouble at this stage, I may simply quit and try another system unless there is an online forum or help desk with a ready answer.
I then set up a sample website (2-3 pages) and try out the features that I need. In the case of CodeIgniter, I may have content stored in multiple database tables and I evaluate how much effort it takes to get the data from my SQL queries into the array structure that can be used by the template system. This is usually the step that takes the most effort when developing the website.
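As an example of that step, suppose a JOIN returns one flat row per article/tag pair, while the template wants one entry per article with a nested list to loop over. A sketch in Python, assuming hypothetical column names `article_id`, `title` and `tag` (in CodeIgniter the same reshaping would be done in PHP on the query's result array):

```python
from itertools import groupby
from operator import itemgetter


def nest_rows(rows):
    """Turn flat JOIN rows (one per article/tag pair) into one dict per article."""
    # groupby only groups consecutive items, so sort by the key first
    # (sorted is stable, so tag order within an article is preserved).
    ordered = sorted(rows, key=itemgetter("article_id"))
    articles = []
    for article_id, group in groupby(ordered, key=itemgetter("article_id")):
        group = list(group)
        articles.append({
            "article_id": article_id,
            "title": group[0]["title"],
            # A LEFT JOIN yields tag=None for articles without tags.
            "tags": [{"tag": r["tag"]} for r in group if r["tag"] is not None],
        })
    return articles
```

The nested `tags` list is what a template loop would iterate over; the more tables involved, the more of this glue code each template system tends to require.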
I also check how easily I can integrate a PHP function into the mix. For example, I once had to build a specialized function to determine a range of dates, and these dates had to be passed to the template engine. I was able to do it, but it took a lot of effort: the template system had almost no support for parsing dates, so I had to resort to a PHP function to do the work.
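A sketch of that kind of helper, written in Python for illustration (the function `date_range` and its output keys `iso` and `label` are hypothetical): compute the dates in plain code and hand the template a ready-made list, so the template only has to loop over it.

```python
from datetime import date, timedelta


def date_range(start: date, end: date, step_days: int = 1):
    """Return one template-ready dict per date from start to end, inclusive."""
    if step_days < 1:
        raise ValueError("step_days must be a positive integer")
    rows = []
    current = start
    while current <= end:
        rows.append({
            "iso": current.isoformat(),                # e.g. 2011-07-11
            "label": current.strftime("%a %d %b %Y"),  # text depends on locale
        })
        current += timedelta(days=step_days)
    return rows
```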
Summary
Ultimately, you will need to try a few of these systems out to get a feel for them. Once that is done, pick one that makes sense for your coding style, ease of use, and your data structure.
PS
I have not used the systems that you listed above, but I have spent quite a bit of time using the template engine in ExpressionEngine (a CMS from the same company that created CodeIgniter). My comments are based on my experience implementing database-driven websites with ExpressionEngine and dealing with the limitations and quirks of that particular platform.
So what I would like to do is scrape this site: http://boxerbiography.blogspot.com/
and create one HTML page that I can either print or send to my Kindle.
I am thinking of using Hpricot, but am not too sure how to proceed.
How do I set it up so it recursively checks each link, gets the HTML, either stores it in a variable or dumps it to the main HTML page and then goes back to the table of contents and keeps doing that?
You don't have to tell me EXACTLY how to do it, but just the theory behind how I might want to approach it.
Do I literally have to look at the source of one of the articles (which is EXTREMELY ugly btw), e.g. view-source:http://boxerbiography.blogspot.com/2006/12/10-progamer-lim-yohwan-e-sports-icon.html and manually programme the script to extract text between certain tags (e.g. h3, p, etc.)?
If I do that approach, then I will have to look at each individual source for each chapter/article and then do that. Kinda defeats the purpose of writing a script to do it, no?
Ideally I would like a script that will be able to tell the difference between JS and other code and just the 'text' and dump it (formatted with the proper headings and such).
Would really appreciate some guidance.
Thanks.
I'd recommend using Nokogiri instead of Hpricot. It's more robust, uses fewer resources, has fewer bugs, and is both easier to use and faster.
I once did extensive scraping for work and had to switch to Nokogiri, because Hpricot would crash inexplicably on some pages.
Check this RailsCast:
http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
and:
http://nokogiri.org/
http://www.rubyinside.com/nokogiri-ruby-html-parser-and-xml-parser-1288.html
http://www.engineyard.com/blog/2010/getting-started-with-nokogiri/
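On the "theory" part of the question: you don't need per-chapter rules. Fetch the table of contents once, collect the chapter links, then fetch each chapter and keep only the text-bearing elements while discarding `<script>` and `<style>`. The sketch below swaps in Python's standard-library `html.parser` instead of Nokogiri so it is self-contained. The class and function names are hypothetical, `fetch` is any callable that returns a page's HTML (a network call in practice, a dict lookup in testing), and `link_filter` stands in for whatever rule picks out chapter URLs on the real blog.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


class TextExtractor(HTMLParser):
    """Keep the text of headings and paragraphs; ignore <script>/<style>."""

    KEEP = {"h1", "h2", "h3", "p"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.current = None   # tag of the block currently being collected
        self.blocks = []      # [tag, text] pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.KEEP:
            self.current = tag
            self.blocks.append([tag, ""])

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and not self.skip_depth:
            self.blocks[-1][1] += data


def build_book(toc_url, fetch, link_filter=lambda url: True):
    """Follow each chapter link from the TOC and return one combined HTML page."""
    toc = LinkCollector(toc_url)
    toc.feed(fetch(toc_url))
    toc.close()
    parts = ["<html><body>"]
    for url in dict.fromkeys(toc.links):  # de-duplicate, keep order
        if not link_filter(url):
            continue
        page = TextExtractor()
        page.feed(fetch(url))
        page.close()
        for tag, text in page.blocks:
            if text.strip():
                # Parsed text is already unescaped, so escape it again.
                parts.append(f"<{tag}>{escape(text.strip())}</{tag}>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

With Nokogiri the same two steps are usually a CSS or XPath query for the links and another for the post body, which is shorter and more robust than tracking tags by hand as above.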
I may need to implement this sometime in the future, but I think the trigger for the question now is mainly curiosity.
I thought about how to write a text editor for a website I'll build soon, saw how this site (and others) handle it, and thought: isn't it a bit too complicated? If tags have to be used in the first place, why not let users use HTML tags? The only reason I can think of is HTML injection, which I don't know much about, but it sounds like an easy issue to solve, doesn't it?
Thank you.
Simply because not all of your users will know HTML. *bold text* is a lot easier to understand (and to read in its raw form) than <b>bold text</b>, especially once you get into links.
The reason we use Markdown, Textile and the rest is to provide a nice alternative that's accessible to more users.
Of course you can still let your users use HTML (it's in the Markdown spec), but you'll have to do a lot of checking to make sure nothing malicious is going on: for example, blocking <script>, <iframe>, large images, JavaScript in the form <a href="javascript:alert('...');">, etc.
There are several reasons why you should not use HTML tags in such an editor:
1) It might be less complex for the user if you introduce your own reduced tag set.
2) HTML injection: there is a big risk of dangerous HTML code being injected.
If you really want to allow HTML code, you have to be very careful.
Historically, systems like BBCode were designed to limit the available formatting elements to things that would not break the layout of the site, but now, with more mature and smarter HTML parsers, it's not necessary to invent a new markup language just to bar certain unsafe HTML tags.
The main reason I see today is that HTML is foreign to most users, and the HTML substitutes aim to provide a simplified version of the formatting directives an everyday user would need.
HTML script injection is most emphatically not an easy problem to solve. HTML is a fairly complicated, non-regular language - detecting all possible vulnerabilities is a really hard problem. Many sites have tried, and failed. It's easier, from a vulnerability-prevention POV, to just prohibit HTML entirely, or allow only a small subset of tags.
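A minimal sketch of the "allow only a small subset of tags" approach, in Python with the standard-library `html.parser`: rebuild the fragment keeping only allowlisted tags and attributes, accept `href` only for http/https URLs, drop `<script>`/`<style>` content, and escape all text. The tag set and the names `AllowlistSanitizer`/`sanitize` are hypothetical choices for illustration. It does not rebalance unclosed tags, so treat it as a sketch of the idea rather than a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")


class AllowlistSanitizer(HTMLParser):
    """Rebuild HTML, keeping only allowlisted tags/attributes; escape the rest."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself but keep its text content
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue  # drops onclick, style, etc.
            # Allowlist the scheme instead of blocklisting "javascript:";
            # the parser has already decoded entities in the value.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

The allowlist is what makes this tractable: anything not explicitly permitted is dropped or escaped, instead of trying to enumerate every dangerous construct.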
In an application, I want a simple text input enriched with some markup for formatting or semantic labeling. I want the syntax to be as easy as possible, and I want to be able to include self-defined labels.
Example:
[bold]Stackoverflow[/bold] is a [tag]good[/tag] resource for programmers.
Tables would be needed too.
HTML/XML and LaTeX are powerful enough to allow this, but too complicated. Wiki syntax seems simple, but it uses a different symbol for each kind of markup, has unclear quoting, and every wiki seems to have its own syntax. For tables and similar structures, wiki syntax becomes very complicated.
Is there a language or syntax that matches my needs or can be slightly changed to do so? Or do I have to invent something myself? In that case, do you have any suggestions?
Definitely do NOT invent your own. There are plenty of simple markup languages already, and users HATE learning new ones. Trust me on this!
I would suggest using one of the following:
Textile
Markdown
BBCode
Make your decision based on your userbase, as well as what tools and parsers are available in your chosen language. For my site, we went with Textile, but I've found that BBCode tends to be the language that most people already know. However, this will vary with different user demographics.
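Since the example in the question is already BBCode-style, here is a hypothetical Python sketch of converting it with a fixed tag table that includes a self-defined label (`tag`, rendered as a `<span class="label">`). The tag names, CSS class and function names are assumptions; in practice an existing BBCode library for your language is preferable, for the reasons given above.

```python
import html
import re

# Hypothetical tag table: two formatting tags plus one self-defined label.
BBCODE_TAGS = {
    "bold": "<strong>{}</strong>",
    "italic": "<em>{}</em>",
    "tag": '<span class="label">{}</span>',
}
# [name]...[/name], with the closing tag matched by name; non-greedy body.
PAIR = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.S)


def _convert(escaped: str) -> str:
    def replace(match):
        name, inner = match.group(1), match.group(2)
        body = _convert(inner)  # handle nesting such as [bold][tag]x[/tag][/bold]
        template = BBCODE_TAGS.get(name)
        if template is None:
            return f"[{name}]{body}[/{name}]"  # unknown tag: keep it as literal text
        return template.format(body)

    return PAIR.sub(replace, escaped)


def bbcode_to_html(source: str) -> str:
    # Escape first so nothing the user typed can become a real HTML tag;
    # html.escape leaves [ and ] alone, so the BBCode markers survive.
    return _convert(html.escape(source))
```

Adding a new label is one entry in the table, which keeps the syntax uniform instead of inventing a new symbol per feature.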
Stack Overflow, along with several other sites, uses Markdown. I think it will give you the best balance between features and simplicity.
Let me add reStructuredText to the list.
An additional benefit of using it is the availability of the ReStructuredText to Anything service, which makes it extremely easy to create HTML or PDF versions of a document.
As already pointed out, there are a lot of lightweight markup languages (many are listed in the Wikipedia article on lightweight markup languages), so there should be no need to create your own.