How to prevent trimming whitespaces in JSP page - jstl

Example
String var = "welcome to JSP";
<c:out value="${var}"/>
The tag above appears to trim the spaces from the string var; I also tried displaying the same variable on the JSP without JSTL, and the whitespace was still taken out.

JSTL doesn't trim whitespace at all. Look at the generated HTML by right-clicking the page and choosing "View page source", and you'll see that the whitespace is there.
HTML does that. One whitespace character or 100 successive ones are rendered the same way in HTML (collapsed to a single space), unless you use a CSS style that makes them significant, for example:
<pre> Now white space is
relevant
</pre>

Related

CKEditor moving br tags

I'm having a problem with CKEditor changing my original paragraph formatting with negative side effects.
I start with a basic paragraph loaded into CKEditor using setData():
<p><span style="font-size:50px">My Text</span></p>
... more document content ...
In the editor, I move the cursor to the end of the phrase "My Text" and press enter (with config.enterMode=CKEDITOR.ENTER_BR setting enabled). Inspecting the markup inside the editor I now see:
<p><span style="font-size:50px">My Text<br><br></span></p>
... more document content ...
Then, when I call getData() to pull the contents from the editor and save the document to a database, the HTML extracted by getData() looks like this:
<p><span style="font-size:50px">My Text</span><br> </p>
... more document content ...
This is a problem because, while editing, the <br> tag was inside the <span> and was subject to the 50px font-size style: the user saw a 50px blank line before the next piece of document content. After saving the HTML to a database and reloading it later, the <br> tag is now outside the <span>, is no longer subject to the 50px font sizing, and the blank line appears much smaller than before.
The round-trip fidelity of the text formatting is not preserved, and the user is frustrated by the results.
Can someone help me understand the results I'm seeing with <br> tags being reformatted and moved around during the editing life cycle, and how I might fix this problem?
Using CKEditor v4.4.1

Regex encapsulate full line and surround it

I can find examples of surrounding a line but not surrounding and replacing, and I'm a bit new to Regex.
I'm trying to simplify my markdown so that I don't need to add HTML just to center images.
With pandoc, I apparently need to surround an image with DIV tags to get it centered, right-justified, or whatever.
Instead of typing that every time, I'd like to preprocess my markdown with a Ruby script and have Ruby add the DIVs for me.
So I can type:
center![](image.jpg)
and then run a ruby script that will change it to
<div class="center">
![](image.jpg)
</div>
I want the regex to find lines starting with "center!", strip the word "center", and surround the rest with DIV tags.
How would I accomplish this?
A little example using gsub:
s = "a\ncenter![](image.jpg)\nb\n"
puts s.gsub(/^center(.*)$/, "<div class=\"center\">\n\\1\n</div>")
Result is:
a
<div class="center">
![](image.jpg)
</div>
b
Should get you started. The (.*) captures the content after center, and \1 adds it back into the replacement. In this example I assumed the item was on a line by itself: ^ matches the start of a line and $ matches the end of a line. If that isn't the case, you'll need to determine what makes your target lines unique so that the regex doesn't replace any random occurrence of "center" in your text.
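If you only want to wrap the markdown image syntax (and leave other uses of "center" alone), you can tighten the pattern to match it explicitly. A minimal sketch; the character classes are one rough approximation of the ![alt](path) syntax, not a full markdown parser:

```ruby
s = "center is a nice word\ncenter![](image.jpg)\nb\n"

# Only match "center" when it is immediately followed by a
# markdown image: ![alt](path). $1 holds the image part.
out = s.gsub(/^center(!\[[^\]]*\]\([^)]*\))$/) do
  "<div class=\"center\">\n#{$1}\n</div>"
end

puts out
```

Here the first line is left alone because it isn't followed by an image, while the second line gets wrapped.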

Return strings separated by line break in Rack application

I have a Sinatra application and a method
get '/page123' do
"string1\nstring2"
end
If I go to '/page123', I see only string1 string2 on one line, with no line break between them.
How do I show them as
string1
string2
?
Since a <br> tag doesn't seem to work for you, try this:
get '/page123' do
content_type 'text/plain'
"string1\nstring2"
end
If you look at the source of your HTML, string1 string2 will actually be string1\nstring2, but, as expected, the \n is a newline and is invisible except for its effect of ending the line and starting a new one.
Browsers collapse (effectively ignore) embedded line breaks and instead only honor tags like <p> and <br>, unless the text is embedded inside <pre> blocks. Try copying the following into a text file and opening it in your browser:
<html>
<body>
<pre>
string1
string2
</pre>
<p>string1<br>string2</p>
</body>
</html>
They'll both show the two words, on separate lines, in different fonts. Both are valid ways of forcing a line-break in HTML.
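If you'd rather keep serving HTML than switch to text/plain, another option is to convert the newlines to <br> tags before returning the string. A small sketch using Ruby's stdlib CGI module for escaping:

```ruby
require 'cgi'

# Escape any HTML in the text first, then turn newlines into <br>
# so the browser honors the line breaks.
text = "string1\nstring2"
html = CGI.escapeHTML(text).gsub("\n", "<br>")

puts html  # string1<br>string2
```

In the Sinatra route you'd then return html instead of the raw string.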
It'd be good for you to learn the ways that browsers display HTML. They don't treat it as text, because it isn't. HTML is a language which the browser interprets and uses the tags as instructions about how to present the text.

HtmlUnit processing whitespace

I'm using HtmlUnit to do some processing of an Html page. My problem is that it does not seem to be correctly maintaining whitespace.
The original html looks like:
<div><cite>www.<b>example</b>.com</cite>
Which renders as:
www.example.com
After using html unit to do some parsing on other parts of the dom, I print the html back out using getXml(). Doing so causes the html to be pretty printed:
<div>
<cite>
www.
<b>
example
</b>
.com
</cite>
This ends up rendering as:
www. example .com
Note the extra space before and after example.
I tried just trimming the whitespace from resulting pretty-printed dom, but then you lose spaces in places where you actually want them.
Stepping through the generated dom, it appears that HtmlUnit trims all of the DomText nodes when it creates them, so the space information is lost.
Is there any way I can configure HtmlUnit to track this information? Or some alternative that better maintains the original html? I just need to be able to extract portions of the html via XPath.
I think this should return the original html:
WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.yourpage.com");
String originalHtml = page.getWebResponse().getContentAsString();
Using JavaScript gets the html without the extra whitespace:
WebClient client = new WebClient(BrowserVersion.FIREFOX_17);
HtmlPage page = client.getPage(url);
client.waitForBackgroundJavaScript(5000);
String html = page.executeJavaScript("document.body.parentNode.outerHTML")
.getJavaScriptResult()
.toString();

Convert HTML to plain text and maintain structure/formatting, with ruby

I'd like to convert html to plain text. I don't want to just strip the tags though, I'd like to intelligently retain as much formatting as possible. Inserting line breaks for <br> tags, detecting paragraphs and formatting them as such, etc.
The input is pretty simple, usually well-formatted html (not entire documents, just a bunch of content, usually with no anchors or images).
I could put together a couple of regexes that would get me 80% there, but I figured there might be some existing solutions with more intelligence.
First, don't try to use regex for this. The odds are really good you'll come up with a fragile/brittle solution that will break with changes in the HTML or will be very hard to manage and maintain.
You can get part of the way there very quickly using Nokogiri to parse the HTML and extract the text:
require 'nokogiri'
html = '
<html>
<body>
<p>This is
some text.</p>
<p>This is some more text.</p>
<pre>
This is
preformatted
text.
</pre>
</body>
</html>
'
doc = Nokogiri::HTML(html)
puts doc.text
>> This is
>> some text.
>> This is some more text.
>>
>> This is
>> preformatted
>> text.
The reason this works is Nokogiri is returning the text nodes, which are basically the whitespace surrounding the tags, along with the text contained in the tags. If you do a pre-flight cleanup of the HTML using tidy you can sometimes get a lot nicer output.
The problem is when you compare the output of a parser, or any means of looking at the HTML, with what a browser displays. The browser is concerned with presenting the HTML in as pleasing way as possible, ignoring the fact that the HTML can be horribly malformed and broken. The parser is not designed to do that.
You can massage the HTML before extracting the content: remove extraneous line breaks like "\n" and "\r", then replace <br> tags with line breaks. There are many questions here on SO explaining how to replace tags with something else; I think the Nokogiri site also covers it in one of the tutorials.
If you really want to do it right, you'll need to figure out what you want to do for <li> tags inside <ul> and <ol> tags, along with tables.
An alternate attack method would be to capture the output of one of the text browsers like lynx. Several years ago I needed to do text processing for keywords on websites that didn't use Meta-Keyword tags, and found one of the text-browsers that let me grab the rendered output that way. I don't have the source available so I can't check to see which one it was.