CodeIgniter santizing POST values - codeigniter

I have a text area in which I am trying to add youtube embed code and other HTML tags. $this->input->post is converting the <iframe> tags to < and > respectively but not the <h1> and <h2> tags.
Any idea how I can store these values?

If you only have a small number of forms that you need to allow iframes in, I would just write a function to restore the iframe (while validating that it's a valid YouTube embed code).
You can also turn off global_xss_filtering in your config (or not implement it if you're using it), but that's not the ideal solution (turning off all of your security to get one thing to work is generally a horrible idea).
$config['global_xss_filtering'] = FALSE;
To see all of the tags that get filtered out, look in the CI_Input class and search for the '$naughty' variable. You'll see a pipe-delimited list (don't change anything in this class).

Why don't you avoid CIs auto sanitizing and use something like htmlspecialchars($_POST['var']); ? Or make a helper function for sanitizing youtube urls...

Or you could either just ask for the video ID code or parse the code from what you are getting.
This would let you use both the URL or the embed code.
Also storing just the ID takes less space in you database, and you could write a helper function to output the embed code/url.

In this case, use $_POST instead of $this->input->post to get the original text area value, and then use HTML Purifier to clean the contents without losing the <iframe> tag you want.
You will need to check HTML Purifier documentation for details. Please, check this specific documentation page about "Embedding YouTube Videos".

Related

Laravel Save Markdown to Database - Don't Understand

I am reluctant to post this, but I am having trouble understanding how markdown actually "saves" to a database.
When I'm creating a migration, I will add columns and specify the type of value (i.e. integer, text, string, etc.) and in the course of operation on the website, users will input different information that is then saved in the DB. No problem there.
I just can't seem to wrap my head around the process for markdown. I've read about saving the HTML or saving the markdown file, rendering at runtime, pros and cons all that.
So, say I use an editor like Tiny MCE which attaches itself to a textarea. When I click "Submit" on the form, how does that operate? How does validation work? Feel free to answer my question directly or offer some resource to help further my understanding. I have an app built on Laravel so I'm guessing I'll need to use a package like https://github.com/GrahamCampbell/Laravel-Markdown along with an editor (i.e. Tiny MCE).
Thanks!
Let's start with a more basic example: StackOverflow. When you are writing/editing a question or answer, you are typing Markdown text into a textarea field. And below that textarea is a preview, which displays the Markdown text converted to HTML.
The way this works (simplified a little) is that StackOverflow uses a JavaScript library to parse the Markdown into HTML. This parsing happens entirely client side (in the browser) and nothing is sent to the server. With each key press in the textarea the preview is updated quickly because there is no back-and-forth with the server.
However, when you submit your question/answer, the HTML in the preview is discarded and the Markdown text from the textarea is forwarded to the StackOverflow server where is is saved to the database. At some point the server also converts the Markdown to HTML so that when another user comes alone and requests to view that question/answer, the document is sent to the user as HTML by the server. I say "at some point" because this is where you have to decide when the conversion happens. You have two options:
If the server converts the HTML when is saves it to the Database, then it will save to two columns, one for the Markdown and one of for the HTML. Later, when a user requests to view the document, the HTML document will be retrieved from the database and returned to the user. However, if a user requests to edit the document, then the Markdown document will be retrieved from the database and returned to the user so that she can edit it.
If the server only stores the Markdown text to the database, then when a user requests to view the document, the Markdown document will be retrieved from the database, converted to HTML and then returned to the user. However, if a user requests to edit the document, then the Markdown document will be retrieved from the database and returned to the user (skipping the conversion step) so that she can edit it.
Note that in either option, the server is doing the conversion to HTML. The only time the conversion happens client-side (in the browser) is for preview. But the "preview" conversion is not used to display the document outside of edit mode or to store the document in the database.
The only difference between something like StackOverflow and TinyMCE is that in TinyMCE the preview is also the editor. Behind the scenes the same process is still happening and when you submit, it is the Markdown which is sent to the server. The HTML used for preview is still discarded.
The primary concern when implementing such a system is that if the Markdown implementation used for preview is dissimilar from the implementation used by the server, the preview may not be very accurate. Therefore, it is generally best to choose two implementations that are very similar or, if available, use the same implementations for both.
It is actually very simple.
Historally, in forums, there used be BBCodes, which are basically pseudo-tags that allow you to format your text in some say. For example [b][/b] used to mean "make this text bold". In Markdown, it happens the exact same thing, but with other characters like *text* or **text**.
This happens so that you only allow your users to use a specific formatting, otherwise if you'd allow to write pure HTML, XSS (cross-site scripting) issues would arise and it's not really a good idea.
You should then save the HTML on the database. You can use, for example, markdown-js which is a Markdown parser that parses Markdown to HTML.
I have seen TinyMCE does not make use of Markdown by default, since it's simple a WYSIWYG editor, however it seems like it also supports a markdown-like formatting.
Laravel-Markdown is a server-side markdown render helper, you can use this on Laravel Blade views. markdown-js is instead client-side, it can be used, for example, to show a preview of what you're writing in real-time.

CKEditor and HTML in Xpages

I am understanding this better but still not there yet.
I have a notes document with a rich text field. I want to edit it in Xpages, so that the user can enter text for an email that an agent will generate. The idea is that the user should be able to enter styled text, hopefully including pasted graphics, and this is saved to the rich text field in such a way that a later agent can copy that field to the body of an email.
On the form I have checked the field "Store contents as HTML and MIME.
In the Xpage I have bound the CKEditor directly to the field (can bind it to a scope variable if necessary).
The code in my agent is as follows:
Set rtItmFrm = emlDoc.getFirstItem("Body")
Set rtItmTo = New NotesRichTextItem(mail,"Body")
Set rtItmTo = rtItmFrm.Copyitemtodocument(mail,"Body")
Any further suggestions on reading up on MIME/CKEditor etc would also be much appreciated.
Bryan
=========================================================================
I just discovered how to modify the CKEditor in Xpages (the Rich Text Control). I have the full menu and one or two more things turned out. However, I am really puzzled by how it treats HTML. I would like to put a template for a nice HTML email (like a newsletter). Anything even a little complicated it munges and the output is messed up.
I read enough online to understand that it is not supposed to be a HTML editor, but I am really having trouble getting the results I want. I would love to put some basic skeleton HTML in there, but everything but the simplest code doesn't work.
Is there anyway to import HTML and it not get messed up using this editor?
as Per and Stephan said, Have a look at ACF filtering that is 'server side' (This is not related to CKEditor itself, but it is related to XPages).
If you have a look at the inputRichText control you will see 2 properties.
htmlFilter
htmlFilterIn
These properties determine how to filter Html on the way in to your data, and also on the way out.
This can be used to strip styling out, and also to prevent dangerous tags like some bad code here etc.
By Default the htmlFilter is set ACF (Active Content Filtering) if you look at the default rules, you will see it strips things like 'margin' out.
see /properties/acf-config.xml-sample
There is a filter called 'identity' which means don't filter anything, however beware if you use this you are not protected from and maliciously entered html.
You should look into defining your own set of rules for your ACF filter, this way you can choose which elements to remove. There is a section in Mastering XPages book about this.
If you still have any trouble, then there are some settings in CKEditor config which also control ACF (totally separate to XPages server side)
I don't think CKE changes the HTML, it is the writing back to a RT field.
Try and bind your RichText Editor to a scoped variable instead of a RichText field. This way you have access to the raw HTML and can use that to generate a MIME email. You might want to have a look at Mustache for mail merge.
Use this article series as starter how to prepare CK editor to make this possible.
And as Per mentioned: check the filtering.

regarding image downloading function

Can anyone help me to solve my issue regarding the image downloading function? The situation goes like this: actually I wanna download a gantt chart image from a site that generates some string url as the image! not even http://www.example.com/img/image.png but something like http://www.example.com/img/index.php?=task&d=&Work=0...
Disregarding what language or environment you are working in, simply using the full URL with all the GET variables in place will provide you with the image.
It should not matter.
Judging from your comments below, your code is not working because you are using PHP's htmlspecialchars function.
Htmlspecialchars will turn symbols that you cannot represent in HTML output simply by adding them to the source of an html document (such as &, < and >) into identifiers that will let the browser know what kind of character to render.
for instance the ampersand (&) could be rendered by & the HtmlSpecialChars function does this for you.
When your backend code is outputting parts of html source that aren't visible to the user, such as the source of an image in this case, you do not want to use that function.
It will invalidate the URL by replacing all the & instances in the url by &
Simply do this:
<?php print($url); ?>

Download pdf or image through ajax

I would like to send a lot of data through ajax request to my server which will generate pdf or jpg format according to that data.
Now i have done all that, my issue is to how output that generated pdf/jpg back to the user trough ajax? I guess i might be able to use json for that, but im not really sure how, and i think there would be a lot issues with pdf.
Also if some one gonna suggest using form with hidden inputs that will not work since i have really big multidimensional array with lot's of data and it would simply take to much effort to make it work.
By the way, i am using jquery, but anything else is acceptable as long as it does the job done without making me to rewrite half of my script.
To display a JPG
AJAX: You can return the data hex encoded (be sure to set the content type appropriately: header('Content-type: image/jpeg')). Then you just inject an <img/> element into the DOM and set it's src attribute to the returned Data URI.
HTML: Also, you could inject the <img/>'s with a normal src URL to some location on your server.
For PDF
It's a little more tricky. Some browsers display PDF's natively (Chrome/Firefox), others rely on optional third-party plugins. You can detect these plugins, but can't control whether the PDF is displayed in a window/frame or is downloaded.
If you choose to display, you can create a new window/tab to display it or display it in an iframe dynamically.

How to retrieve plain text from a formatted website to use in UIWebView

Not sure if what I want to do is possible, but what I am hoping to do is somehow gather certain pieces of text from a website, remove the header, footer, background, all formatting, and place it into my application in a scrollview or something similar...
I'll give you an example... Imagine I was making wikipedia's iPhone app, I want to download the information about the wiki on dogs, without the header, side bars etc, just the text. How would I go about doing this?
I understand that for this I have not provided any example code or what I've tried or started, but that's just because in this case I'm lost! That doesn't mean I want full chunks of code either. Any help will do. If this doesn't work, I will just have to make a 'mobile optimised' version of the webpages I want to include in my app.
Thanks
(Edit: the term I was trying to use was 'strip the web page of its HTML coding')
You may be going about this the wrong way, or perhaps even asking the wrong question.
Does the target website have an API or datafeed of some kind?
Can you get the information you need in JSON or XML format directly from the site?
I think you've misunderstood the technology. HTML is merely the framwork on which the formatting and data is hung.
Parsing the HTML page seems like an awfully big headache, I doubt you'll ever be able to get it to work, because almost all sites these days are partially or wholly generated on the server side, the page is only the result.
Some sites hide the information in memory and others get it dynamically through ajax for example, which means that simply trying to get the data by parsing the HTML will get zero data.
Another issue you should be aware of though, is that simply copying the data from generated websites may open yourself up to copyright issues.
You have to parse the html code and search for the part that you want and "throw" away the part that you do not need. This is more or less like bruteforcing and the code of the website should not change otherwise you are screwed. So you have to write the parser by hand with this method. But maybe there is a atom or rss feed and you can parse this one. This will be much more easier and you are not depending on the website layout because the rss/atom feed is just about the data. For parsing rss you could try out NSXMLParser.
And then you have to make a valid html page out of the data and present it in the UIWebView

Resources