How to get the contents of an HTML element - go

I'm quite new to Go and I'm struggling a little at the moment with parsing some html.
The HTML looks like:
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<div>something</div>
<div id="publication">
<div>I want <span>this</span></div>
</div>
<div>
<div>not this</div>
</div>
</body>
</html>
And I want to get this as a string:
<div>I want <span>this</span></div>
I've tried html.NewTokenizer() (from golang.org/x/net/html) but can't seem to get the entire contents of an element back from a token or node. I've also tried using depth with this but it picked up other bits of code.
I've also had a go with goquery which seems perfect, code:
doc, err := goquery.NewDocument("{url}")
if err != nil {
log.Fatal(err)
}
doc.Find("#publication").Each(func(i int, s *goquery.Selection) {
fmt.Printf("Review %d: %s\n", i, s.Html())
})
But s.Text() will only print out the text and s.Html() doesn't seem to exist (?).
I think parsing it as XML would work, except the actual HTML is very deep and there would have to be a struct for each parent element...
Any help would be amazing!

You're not getting the result because s.Html() (which does exist) returns two values, the HTML string and an error, so its result can't be passed straight to fmt.Printf. Assign both return values first.
Change your code like this and it will work:
doc.Find("#publication").Each(func(i int, s *goquery.Selection) {
    innerHTML, err := s.Html() // returns the inner HTML and an error
    if err != nil {
        log.Print(err)
        return
    }
    fmt.Printf("Review %d: %s\n", i, innerHTML)
})

Related

How to inject Javascript in html template (html/template) with Golang?

Is there any way to inject JavaScript as a variable in a Go HTML template (html/template)? I was expecting the script to be injected into the template, but it is injected as a string inside double quotes.
template.html
...
<head>
{{ .Script }}
</head>
...
parser.go
...
fp := path.Join("dir", "shop_template.html")
tmpl, err := template.ParseFiles(fp)
if err != nil {
return err
}
return tmpl.Execute(writer, myObject{Script: "<script>console.log('Hello World!');</script>"})
...
rendered html output:
...
<head>
"<script>console.log('Hello World!');</script>"
</head>
...
Expected output
<head>
<script>console.log('Hello World!');</script>
// And should log Hello World! in the console.
</head>
Assuming you are using the html/template package, this is the expected behavior. Regular strings should not be able to inject HTML/JS code.
If you trust the content, you can use template.JS or template.HTML types to inject JS and HTML code.
return tmpl.Execute(writer, myObject{
Script: template.HTML("<script>console.log('Hello World!');</script>")})
Of course, you'll need to declare:
type myObject struct {
Script template.HTML
...
}
instead of string.

Why doesn't template.ParseFiles() detect this error?

If I specify a non-existent template in my template file, the error is not detected by ParseFiles() but by ExecuteTemplate(). One would expect parsing to detect any missing templates. Detecting such errors during parsing could also lead to performance improvements.
{{define "test"}}
<html>
<head>
<title> test </title>
</head>
<body>
<h1> Hello, world!</h1>
{{template "doesnotexist"}}
</body>
</html>
{{end}}
main.go
package main
import (
"html/template"
"os"
"fmt"
)
func main() {
t, err := template.ParseFiles("test.html")
if err != nil {
fmt.Printf("ParseFiles: %s\n", err)
return
}
err = t.ExecuteTemplate(os.Stdout, "test", nil)
if err != nil {
fmt.Printf("ExecuteTemplate: %s\n", err)
}
}
10:46:30 $ go run main.go
ExecuteTemplate: html/template:test.html:8:19: no such template "doesnotexist"
10:46:31 $
template.ParseFiles() doesn't report missing templates, because often not all the templates are parsed in a single step, and reporting missing templates (by template.ParseFiles()) would not allow this.
It's possible to parse templates using multiple calls, from multiple sources.
For example, if you call the Template.Parse() method on your template, you can add more templates to it:
_, err = t.Parse(`{{define "doesnotexist"}}the missing piece{{end}}`)
if err != nil {
fmt.Printf("Parse failed: %v", err)
return
}
The above code will add the missing piece, and your template execution will succeed and generate the output (try it on the Go Playground):
<html>
<head>
<title> test </title>
</head>
<body>
<h1> Hello, world!</h1>
the missing piece
</body>
</html>
Going further, not requiring all templates to be parsed and "presented" up front gives you optimization possibilities. There may be admin pages that are never used by "normal" users and are only required if an admin user starts or uses your app. In that case you can speed up startup and save memory by parsing the admin pages only when (and if) an admin user uses your app.
See related: Go template name
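The behaviour described in this answer can be reproduced with a small self-contained sketch. It parses from strings rather than from test.html, and the helper name render is an assumption. A fresh template set is built for each run because html/template refuses further Parse calls once a set has been executed.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// page references a template that is not defined anywhere in this string.
const page = `{{define "test"}}<h1>Hello, world!</h1>{{template "doesnotexist"}}{{end}}`

// missing supplies the definition in a second, separate parse step.
const missing = `{{define "doesnotexist"}}the missing piece{{end}}`

// render builds a new template set, optionally adds the missing
// definition before executing, and returns the generated output.
func render(addMissing bool) (string, error) {
	t, err := template.New("root").Parse(page)
	if err != nil {
		return "", err // not reached: parsing accepts the unresolved reference
	}
	if addMissing {
		if _, err := t.Parse(missing); err != nil {
			return "", err
		}
	}
	var buf bytes.Buffer
	if err := t.ExecuteTemplate(&buf, "test", nil); err != nil {
		return "", err // the unresolved reference is reported here
	}
	return buf.String(), nil
}

func main() {
	_, err := render(false)
	fmt.Println("without the definition:", err)

	out, err := render(true)
	fmt.Println("with the definition:", out, err)
}
```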

Using http.Redirect to redirect to a webpage that is generated using html template

I want to redirect to the homepage after a successful login, and I'm generating the HTML for the homepage using Go's html/template. When I log in, the URL does change to /home and the HTML page loads too, but it doesn't show the page variables passed from the server side.
func LoginHandler(w http.ResponseWriter, r *http.Request) {
....
PageVars := Login{
Username: username,
}
http.Redirect(w,r,"/home",302)
t, err := template.ParseFiles("static/home.html")
if err != nil {
log.Printf("template parsing error: %v", err)
}
err = t.Execute(w, PageVars)
if err != nil {
log.Printf("template executing error: %v", err)
}
}
my html page is as follows:
<!DOCTYPE html>
<html>
<body>
<div id="mainContainer" class="alignCenter">
<header class = "nav">
<div class = "nav-links">
<span class="headerTitle" id="headerTitle">{{.Username}}</span>
<form action="http://localhost:8080/api/logout" method="POST">
<span id="logout" onclick="document.forms[0].submit()">Logout</span>
</form>
</div>
</header>
</div>
</body>
After logging in, the page displays the literal {{.Username}} in the header instead of the logged-in username from the server side.
If I place http.Redirect(w,r,"/home",302) after the template execution, the username loads, but the URL stays on the API call, http://localhost:8080/api/login, instead of http://localhost:8080/home. I've only been coding in Go for two days. What am I doing wrong here?

How can I preserve line breaks in html table cell when scraping with gocolly

I'm trying to preserve the formatting in table cells when I extract the contents of a <td> cell.
If there are two lines of text (for example, an address) in the <td>, the code may look like:
<td> address line1<br>address line2</td>
When colly extracts this, I get the following:
address line1address line2
with no spacing or line breaks since all the html has been stripped from the text.
How can I work around or fix this so that I receive readable text from the <td>?
gocolly uses goquery under the hood. You can call all Selection methods, including the Html().
func (*Selection) Html
func (s *Selection) Html() (ret string, e error)
Html gets the HTML contents of the first element in the set of matched elements. It includes text and comment nodes.
This is how you can get the html content:
c.OnHTML("tr", func(e *colly.HTMLElement) {
    // Get the inner HTML of the first matching <td>...
    h, _ := e.DOM.Find("td").Html()
    fmt.Printf("=> %s \n", h)
    // ...or loop through all of them.
    e.DOM.Find("td").Each(func(_ int, s *goquery.Selection) {
        h, _ := s.Html()
        fmt.Printf("=> %s \n", h)
    })
})
As far as I know, gocolly does not support such formatting itself, but you can do something like the following using the OutputHTML function of the htmlquery package (which gocolly uses internally):
const htmlPage = `
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Your page title here</title>
</head>
<body>
<p>
AddressLine 1
<br>
AddresLine 2
</p>
</body>
</html>
`
doc, _ := htmlquery.Parse(strings.NewReader(htmlPage))
xmlNode := htmlquery.FindOne(doc, "//p")
result := htmlquery.OutputHTML(xmlNode, false)
output of result variable is like below now:
AddressLine 1
<br/>
AddresLine 2
You can now split result on the <br/> tag and achieve what you want.
I am also new to Go, so there may be a better way to do it.

Why does an unclosed HTML tag stop the HTML template from rendering in Go?

I ran into a very annoying problem. It took me about an hour to find the cause, but I still don't understand why it happens.
I am using html/template to render a web page, and the code looks like this:
t, _ := template.ParseFiles("template/index.tmpl")
...
t.Execute(w, modelView) // w is a http.ResponseWriter and modelView is a data struct.
But I accidentally left a <textarea> tag open:
<html>
<body>
<form id="batchAddUser" class="form-inline">
**this one** --> <textarea name="users" value="" row=3 placeholder="input username and password splited by space">
<button type="submit" class="btn btn-success" >Add</button>
</form>
</body>
</html>
Go then reports no exception or other hint; it just serves a blank page with status code 200.
It took effort to locate the problem since no information was offered. Why does that happen? How can an unclosed tag cause a problem like that, and how can I debug it?
It is telling you about the error, you are just ignoring it.
If you look at the error returned by Execute, it tells you that your html is bad.
You should always check for errors. Something like:
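t, err := template.New("test").Parse(ttxt)
if err != nil {
    log.Println(err) // e.g. a syntax error in a template action
}
err = t.Execute(os.Stdout, nil) // in a handler, write to the http.ResponseWriter instead
if err != nil {
    log.Println(err) // the unclosed-tag error is reported here
}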
Here it is (with error printing) on Playground
Here it is, fixed on Playground
Go's template package also provides the helper function Must, which panics if parsing fails, so your program fails fast at startup. This frees your code of the parse error check, but Must only covers parsing: the error returned by Execute still has to be checked.
t := template.Must(template.ParseFiles("template/index.tmpl"))
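To see where the error surfaces, here is a small sketch that runs a template with an unclosed <textarea> and a corrected one through the same parse and execute steps. The template strings are shortened stand-ins for index.tmpl, and the helper name render is an assumption.

```go
package main

import (
	"fmt"
	"html/template"
	"io"
)

// broken leaves the <textarea> open, as in the question; fixed closes it.
const broken = `<form><textarea name="users"><button type="submit">Add</button></form>`
const fixed = `<form><textarea name="users"></textarea><button type="submit">Add</button></form>`

// render parses src and executes it, returning the first error from
// either step, prefixed with the step that produced it.
func render(src string) error {
	t, err := template.New("index").Parse(src)
	if err != nil {
		return fmt.Errorf("parse: %w", err)
	}
	// html/template analyses the HTML context on the first Execute,
	// so structural problems such as an unclosed tag show up here.
	if err := t.Execute(io.Discard, nil); err != nil {
		return fmt.Errorf("execute: %w", err)
	}
	return nil
}

func main() {
	fmt.Println("broken:", render(broken))
	fmt.Println("fixed: ", render(fixed))
}
```

In an HTTP handler, logging the error from Execute (and responding with http.Error if nothing has been written yet) turns the silent blank page into a visible failure.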

Resources