I have a file, index.html, saved as UTF-8:
<html>
<head></head>
<body>
<h1> THE TITLE </h1>
Please click here
<br> ... Some text... <br>
Image: <img src="nature.png"/>
<br> ... Some another text... <br>
Image2: <img src="nature2.png" />
</body>
</html>
I need to fetch all the text inside the BODY tag, modify it, and save it. This is what I do:
File input = new File("html/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
Elements body = doc.select("BODY");
//do some manipulations with the data and print it
System.out.println(body.html());
The result is:
?
<h1> THE TITLE </h1> Please click
here
...
That's fine, except for the question mark at the beginning. How can I avoid it?
Of course I can just delete it from the result string, but I would like to understand what causes it.
First of all you need to make a PrintStream that understands UTF-8:
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(body.html());
Then try redirecting the output to a file and check whether the garbage is still there when you read the file as UTF-8.
If it is not, then your console simply isn't UTF-8 and doesn't know how to display that character.
I am using Verify.PlayWright to take HTML element snapshots. When the comparison opens, all the HTML is on one line, which makes it hard to see the differences. Is there a way to format the HTML to get a nicer comparison?
var root = await page.QuerySelectorAsync("#sectionContainer .tree-root");
await Verifier.Verify(root);
You can use Verify.AngleSharp. It has a feature that pretty prints HTML (https://github.com/VerifyTests/Verify.AngleSharp#pretty-print) for comparison purposes.
Install https://nuget.org/packages/Verify.AngleSharp/
Call VerifyAngleSharpDiffing.Initialize() once at assembly load time.
Use PrettyPrintHtml in your test:
[Test]
public Task PrettyPrintHtml()
{
var html = @"<!DOCTYPE html>
<html><body><h1>My First Heading</h1>
<p>My first paragraph.</p></body></html>";
return Verifier.Verify(html)
.UseExtension("html")
.PrettyPrintHtml();
}
which will produce a verified file containing
<!DOCTYPE html>
<html>
<head></head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
I'm trying to write a docpad plugin that will allow me to insert meta tags unique to each page, for example og:title or og:description. I've been able to accomplish this globally with the populateCollections event for global values, but have not been able to do this per page.
I'd like for this to work without the need for a template function so that the meta tag is inserted automatically based on the document's meta. One way might be to grab the contentRendered value in the writeBefore event and do string manipulation that way, but that seems hacky.
Any ideas?
This worked for what I needed. Basically, I get the rendered content right before the file is written, using the writeBefore event, and do a very simple string replace that adds the meta tags with their unique values, which are pulled from the model in the collection.
writeBefore: (opts) ->
    docpad = @docpad
    templateData = docpad.getTemplateData()
    siteUrl = templateData.site.url
    for model in opts.collection.models
        if model.get('outExtension') == 'html'
            url = @getTag('og:url', siteUrl+model.get('url'))
            title = @getTag('og:title', model.get('title'))
            # description was used below without being defined; assumed here to come from the document's meta
            description = @getTag('og:description', model.get('description'))
            content = model.get('contentRendered')
            if content
                content = content.replace(/<\/title>/, '</title>'+url+title+description)
                model.set('contentRendered', content)
# Helper
getTag: (ogName, data) ->
    return "\n <meta property=\"#{ogName}\" content=\"#{data}\" />"
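The string replacement used above can be tried outside DocPad. The following is a minimal JavaScript sketch, not the plugin's actual code: `injectMetaTags` and the sample page are hypothetical, and `getTag` here additionally escapes the value, which the original helper does not do. It uses a replacer function so that a `$` sequence in a title is not interpreted as a replacement pattern.

```javascript
// Hypothetical stand-in for the plugin's getTag helper.
// Escapes the value so quotes in a title cannot break the attribute.
function getTag(ogName, data) {
  const escaped = String(data)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
  return `\n    <meta property="${ogName}" content="${escaped}" />`;
}

// Insert the tags right after the first </title>.
// Content without a <title> element is returned unchanged.
function injectMetaTags(html, tags) {
  return html.replace(/<\/title>/, (match) => match + tags.join(''));
}

// Sample data (made up for illustration).
const page = '<html><head><title>Hello</title></head><body></body></html>';
const result = injectMetaTags(page, [
  getTag('og:url', 'http://example.com/hello.html'),
  getTag('og:title', 'Say "Hello"'),
]);
console.log(result);
```

Note that a page without a `<title>` element gets no tags at all, which is the same limitation the writeBefore approach has.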
Great answer, David. I'm leaving this here in case someone faces the same issue I did.
Check whether the meta is broken and, if it is, don't render:
renderBefore: (opts) ->
    for model in opts.collection.models
        if model.get('date').toLocaleDateString()=='Invalid Date'
            model.set('write', false)
            docpad.log model.get('title')+' has broken date format!\n\n\n\n\n'
    false
I am using partials with collections, adding what is needed to the document's metadata like this:
```
title: Meetings and Events
layout: page
description: "This is my custom description."
tags: ['resources']
pageOrder: 3
pageclass: rc-events
```
I needed a custom CSS class per page. You can then use it in your default template like this:
<div id="main" class="container <%= @document.pageclass %>">
The same should work for meta tags:
<meta name="description" content="<%= @document.description %>" />
Or check your docpad.coffee file and put together a helper function that prepares the content from a default site value combined with a @document value. Then you can call something like the default:
<meta name="description" content="<%= @getPreparedDescription() %>" />
Which is built by this helper function:
# Get the prepared site/document description
getPreparedDescription: ->
    # if we have a document description, then we should use that, otherwise use the site's description
    @document.description or @site.description
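The fallback in that helper can be checked in isolation. This is a small JavaScript sketch of the same `or` logic, with the document and site passed in as plain objects (an assumption made for illustration; in DocPad they come from the template data):

```javascript
// Hypothetical standalone version of the helper: prefer the document's
// description, otherwise fall back to the site's description.
function getPreparedDescription(doc, site) {
  return doc.description || site.description;
}

const site = { description: 'Site-wide description' };

const withOwn = getPreparedDescription({ description: 'This is my custom description.' }, site);
const withoutOwn = getPreparedDescription({}, site);
const emptyOwn = getPreparedDescription({ description: '' }, site);

console.log(withOwn);
console.log(withoutOwn);
console.log(emptyOwn);
```

An empty description counts as missing here, so such a page also falls back to the site value.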
I developed a website in CodeIgniter, but it appends a hash string to the end of every URL.
For example:
http://my_website.com/#.UR46O6Wj12I
I want to remove this hash string from every URL, or prevent it from being appended in the first place.
It seems you have also posted this question at ellislab, so I checked out your page through there. The problem lies in your JavaScript, not in CodeIgniter.
The code causing the hash is this part of your HTML:
<div class="like_social">
<!-- AddThis Button BEGIN -->
<div class="addthis_toolbox addthis_default_style ">
<a class="addthis_button_facebook_like" fb:like:layout="button_count" style="width:75px; overflow:none"></a>
<a class="addthis_button_tweet" style="width:80px;overflow:none"></a>
<a class="addthis_button_linkedin_counter" style="width:100px;overflow:none"></a>
<a class="addthis_button_google_plusone" g:plusone:annotation="bubble"></a>
</div>
<script type="text/javascript">var addthis_config = {"data_track_addressbar":true};</script>
<script type="text/javascript" src="//s7.addthis.com/js/300/addthis_widget.js#pubid=ra-5112601232209e07"></script>
<!-- AddThis Button END -->
</div>
As Rick Calder suggested, it comes from an add-on, in this case a button from AddThis. Their support documentation has more information about that hash.
If you still want to keep that button, it seems you can disable the hash by going to the Advanced tab of the AddThis settings page and unchecking "Track address bar shares".
Alternatively, you can set data_track_addressbar to false.
Please put this code at the end of the index file:
<script type="text/javascript">
var addthis_config = addthis_config||{};
addthis_config.data_track_addressbar = false;
</script>
If you use a tpl template, don't forget to wrap it: {literal}this code{/literal}
This should help.
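The `addthis_config||{}` idiom in that snippet reuses an existing configuration object if the page already defined one, and creates an empty one otherwise. Here is a minimal sketch of that behaviour as a plain function; `withAddressBarTrackingOff` and `some_other_option` are hypothetical names used only for illustration:

```javascript
// Returns a config with address bar tracking turned off, keeping any
// options that were already set. Pass undefined when no config exists yet.
function withAddressBarTrackingOff(existingConfig) {
  const config = existingConfig || {};
  config.data_track_addressbar = false;
  return config;
}

// No config defined earlier on the page: a new object is created.
const fresh = withAddressBarTrackingOff(undefined);

// A config defined earlier with tracking on: the flag is overridden,
// other options survive. some_other_option is a made-up key.
const merged = withAddressBarTrackingOff({
  data_track_addressbar: true,
  some_other_option: 'kept',
});

console.log(fresh, merged);
```

Because the override only works on a config that has already been assigned, the snippet presumably needs to run after any earlier `addthis_config` definition, which would explain placing it at the end of the file.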
I want to remove all style attributes from HTML tags using ASP.NET.
string source = @" <div style=""font-size: 12pt;""> Hello world</div> <style id=fll margin:19px auto;text-align:center""></style>";
I want the result like this:
<div>Hello world </div>
For that I am using:
string expn = @"(?i)<(table|tr|td)(?:\s+(?:""[^""]""|'[^']'|[^""'>])*)?>";
return System.Text.RegularExpressions.Regex.Replace(source, expn, string.Empty);
I don't know which pattern to use.
Please tell me which expression I should use for this.
This should work (though I don't understand the style tag at the end of your example):
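string source = "<div style=\"font-size: 12pt;\"> Hello world</div>";
// [^\"]* stops at the attribute's closing quote, so only the style attribute is removed
// (a greedy .* would run on to the last quote in the string)
string pattern = "\\s*style=\"[^\"]*\"";
string result = Regex.Replace(source, pattern, "");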
I am trying to validate my XHTML and I have a small problem:
At the end of the document I have a small JS script that contains IMG and FONT tags, and I get these errors:
document type does not allow element "img" here;
document type does not allow element "font" here
$("#nick_name").change(function()
{var usr=$("#nick_name").val();if(usr.length>=4)
{$("#status").html('<img src="images/loader.gif" align="middle" alt="" title=""/>');
.
.
.
How can I get this to validate?
Thank you.
Put the script in a CDATA section so that it validates. I find it good practice when dealing with JavaScript and validation.
Something like this:
<script type="text/javascript">
// The // prefix hides the CDATA markers from the script engine when the page is served as text/html
//<![CDATA[
$("#nick_name").change(function()
{var usr=$("#nick_name").val();if(usr.length>=4)
{$("#status").html('<img src="images/loader.gif" align="middle" alt="" title=""/>');
.
.
.
//]]>
</script>