How to open a URL in a browser with an authentication header? - go

In Golang we can launch a browser window to open a URL using the exec.Command method. An example can be found here.
My question is how can we open that URL with a header?

If you're using Chrome, you could use Chrome DevTools Protocol to attach to a running Chrome instance and issue a command to navigate to a URL with specific headers.
First, launch Chrome with the Chrome DevTools Protocol enabled by passing the flag --remote-debugging-port=9222
Chrome prints a line similar to: DevTools listening on ws://127.0.0.1:9222/devtools/browser/2393d6e8-a85d-40a2-a79e-13f1585ff336
Pass that ws://... URL into the program below:
package main
import (
"context"
"flag"
"log"
"github.com/chromedp/cdproto/network"
"github.com/chromedp/chromedp"
)
func main() {
var devToolWsURL string
flag.StringVar(&devToolWsURL, "devtools-ws-url", "", "DevTools Websocket URL")
flag.Parse()
// Create contexts.
actxt, cancelActxt := chromedp.NewRemoteAllocator(context.Background(), devToolWsURL)
defer cancelActxt()
// Create new tab.
ctxt, _ := chromedp.NewContext(actxt)
// Custom header.
headers := map[string]interface{}{
"X-Header": "my request header",
}
task := chromedp.Tasks{
network.Enable(),
network.SetExtraHTTPHeaders(network.Headers(headers)),
chromedp.Navigate("http://google.com"),
}
// Run task.
err := chromedp.Run(ctxt, task)
if err != nil {
log.Fatal(err)
}
}
Notes:
9222 is the default port for this protocol, but you can use any port you want.
I didn't include the exec.Command code for brevity.
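The omitted launch step could look roughly like the sketch below. It starts Chrome with exec.Command and reads the "DevTools listening on" line from the process's stderr. The helper names (parseDevToolsURL, launchChrome) are made up for illustration, and the Chrome binary path is an assumption that varies by OS, so it is taken from the command line.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"os/exec"
	"strings"
)

// devToolsPrefix is the start of the line Chrome prints when remote debugging is on.
const devToolsPrefix = "DevTools listening on "

// parseDevToolsURL extracts the ws:// URL from one line of Chrome's output.
func parseDevToolsURL(line string) (string, bool) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, devToolsPrefix) {
		return "", false
	}
	return strings.TrimPrefix(line, devToolsPrefix), true
}

// launchChrome starts the given Chrome binary (path is an assumption, it
// differs per OS) and returns the DevTools websocket URL read from stderr.
func launchChrome(bin string) (string, error) {
	cmd := exec.Command(bin, "--remote-debugging-port=9222")
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}
	sc := bufio.NewScanner(stderr)
	for sc.Scan() {
		if u, ok := parseDevToolsURL(sc.Text()); ok {
			// Keep draining stderr so Chrome never blocks on a full pipe.
			go io.Copy(io.Discard, stderr)
			return u, nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("DevTools line not found on stderr")
}

func main() {
	if len(os.Args) < 2 {
		// No binary given: only demonstrate the parsing step on a sample line.
		u, _ := parseDevToolsURL("DevTools listening on ws://127.0.0.1:9222/devtools/browser/2393d6e8-a85d-40a2-a79e-13f1585ff336")
		fmt.Println(u)
		return
	}
	u, err := launchChrome(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u)
}
```

The returned URL can then be passed to chromedp.NewRemoteAllocator in place of the devtools-ws-url flag.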
References:
Header example
Remote Chrome control example
UPDATE
Found a simpler way. You can just launch Chrome straight from chromedp by overriding the default headless option:
func main() {
// Create contexts.
opts := append(chromedp.DefaultExecAllocatorOptions[:], chromedp.Flag("headless", false))
actx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
ctx, cancel := chromedp.NewContext(actx)
// Call cancel() to close Chrome on some condition.
if false {
cancel()
}
// Custom header.
headers := map[string]interface{}{
"X-Header": "my request header",
}
task := chromedp.Tasks{
network.Enable(),
network.SetExtraHTTPHeaders(network.Headers(headers)),
chromedp.Navigate("http://tested.com"),
}
// Run task.
err := chromedp.Run(ctx, task)
if err != nil {
log.Fatal(err)
}
}

Related

How to reuse HTTP request instance in Go

I'm building an API that scrapes some data off a webpage.
To do so, I need to send a GET request to a home page, scrape a 'RequestVerificationToken' from the HTML, then send a POST request to the same URL with a username, password, and the RequestVerificationToken.
I've been able to do this previously with Python:
session_requests = requests.session()
result = session_requests.get(LOGIN_URL)
parser = createBS4Parser(result.text)
return parser.find('input', attrs={'name': '__RequestVerificationToken'})["value"]
pageDOM = session_requests.post(
LOGIN_URL,
data=requestPayload,  # RequestVerificationToken is in here
headers=requestHeaders
)
It seems like when I reuse the session_requests variable in Python, it's reusing the previous instance of the HTTP request.
However, when I try to do this in Go, I get an error due to an invalid token. I assume that this is because Go is using a new instance for the POST request.
Is there any way I can get the same behavior from Go as I was with Python?
package main
import (
"fmt"
"log"
"github.com/gocolly/colly"
"github.com/gocolly/colly/proxy"
)
func main() {
//initiates the configuration
c := colly.NewCollector(colly.AllowURLRevisit())
//defining the proxy chain
revpro, err := proxy.RoundRobinProxySwitcher("socks5://127.0.0.1:9050", "socks5://127.0.0.1:9050")
if err != nil {
log.Fatal(err)
}
c.SetProxyFunc(revpro)
//parsing the required field from html we are extracting the csrf_token required for the login
c.OnHTML("form[role=form] input[type=hidden][name=CSRF_TOKEN]", func(e *colly.HTMLElement) {
csrftok := e.Attr("value")
fmt.Println(csrftok)
//posting the csrf value along with password
err := c.Post("https://www.something.com/login.jsp", map[string]string{"CSRF_TOKEN": csrftok, "username": "username", "password": "password"})
if err != nil {
log.Fatal(err)
}
return
})
//The website to visit
c.Visit("https://www.something.com/login.jsp")
//maintaining the connection using clone not initiating a callback request
d := c.Clone()
d.OnHTML("a[href]", func(e *colly.HTMLElement) {
link := e.Attr("href")
fmt.Printf("Link found: %q -> %s\n", e.Text, link)
})
d.Visit("https://skkskskskk.htm")
}
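For comparison, here is a minimal sketch of the same GET-then-POST flow using only the standard library. What Python's requests.session() mainly carries between calls is cookies, and an http.Client with a cookie jar does the same in Go. Everything site-specific here is an assumption: the fake server, the field names, and the naive regular expression (which expects name before value) stand in for the real page, and a real scraper would use an HTML parser.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"net/url"
	"regexp"
)

// tokenRe is a naive matcher for the hidden input; it assumes name precedes value.
var tokenRe = regexp.MustCompile(`name="__RequestVerificationToken"[^>]*value="([^"]+)"`)

// login fetches the page, scrapes the token, and posts the form with the same
// client, so cookies set by the GET are sent back on the POST.
func login(client *http.Client, loginURL, user, pass string) (int, error) {
	resp, err := client.Get(loginURL)
	if err != nil {
		return 0, err
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return 0, err
	}
	m := tokenRe.FindStringSubmatch(string(body))
	if m == nil {
		return 0, errors.New("token not found")
	}
	form := url.Values{
		"__RequestVerificationToken": {m[1]},
		"username":                   {user},
		"password":                   {pass},
	}
	resp, err = client.PostForm(loginURL, form)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// newFakeSite is a stand-in server: GET sets a session cookie and a matching
// token, POST succeeds only if both come back together.
func newFakeSite() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodGet {
			http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc123"})
			fmt.Fprint(w, `<input type="hidden" name="__RequestVerificationToken" value="tok-abc123">`)
			return
		}
		c, err := r.Cookie("session")
		if err != nil || r.FormValue("__RequestVerificationToken") != "tok-"+c.Value {
			http.Error(w, "invalid token", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, "welcome")
	}))
}

// demo runs the login flow against the fake site, with or without a cookie jar.
func demo(withJar bool) (int, error) {
	srv := newFakeSite()
	defer srv.Close()
	client := &http.Client{}
	if withJar {
		jar, err := cookiejar.New(nil)
		if err != nil {
			return 0, err
		}
		client.Jar = jar
	}
	return login(client, srv.URL, "username", "password")
}

func main() {
	withJar, _ := demo(true)
	withoutJar, _ := demo(false)
	fmt.Println("with jar:", withJar, "without jar:", withoutJar)
}
```

Without the jar, the POST arrives with no session cookie and the fake site rejects the token, which mirrors the "invalid token" symptom described in the question.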

In Go, what is the proper way to use context with pgx within http handlers?

Update 1: it seems that using a context tied to the HTTP request may lead to the 'context canceled' error. However, using the context.Background() as the parent seems to work fine.
// This works, no 'context canceled' errors
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Second)
// However, this creates 'context canceled' errors under mild load
// ctx, cancel := context.WithTimeout(r.Context(), 100*time.Second)
defer cancel()
app.Insert(ctx, record)
(updated code sample below to produce a self-contained example for repro)
In Go, I have an HTTP handler like the following code. On the first HTTP request to this endpoint I get a context canceled error. However, the data is actually inserted into the database. On subsequent requests to this endpoint, no such error occurs and the data is also successfully inserted into the database.
Question: Am I setting up and passing the context correctly between the http handler and pgx QueryRow method? (if not is there a better way?)
If you copy this code into main.go and run go run main.go, go to localhost:4444/create and hold ctrl-R to produce a mild load, you should see some context canceled errors produced.
package main
import (
"context"
"fmt"
"log"
"math/rand"
"net/http"
"time"
"github.com/jackc/pgx/v4/pgxpool"
)
type application struct {
DB *pgxpool.Pool
}
type Task struct {
ID string
Name string
Status string
}
//HTTP GET /create
func (app *application) create(w http.ResponseWriter, r *http.Request) {
fmt.Println(r.URL.Path, time.Now())
task := &Task{Name: fmt.Sprintf("Task #%d", rand.Int()%1000), Status: "pending"}
// -------- problem code here ----
// This line works and does not generate any 'context canceled' errors
//ctx, cancel := context.WithTimeout(context.Background(), 100*time.Second)
// However, this line generates 'context canceled' errors under mild load
ctx, cancel := context.WithTimeout(r.Context(), 100*time.Second)
// -------- end -------
defer cancel()
err := app.insertTask(ctx, task)
if err != nil {
fmt.Println("insert error:", err)
return
}
fmt.Fprintf(w, "%+v", task)
}
func (app *application) insertTask(ctx context.Context, t *Task) error {
stmt := `INSERT INTO task (name, status) VALUES ($1, $2) RETURNING ID`
row := app.DB.QueryRow(ctx, stmt, t.Name, t.Status)
err := row.Scan(&t.ID)
if err != nil {
return err
}
return nil
}
func main() {
rand.Seed(time.Now().UnixNano())
db, err := pgxpool.Connect(context.Background(), "postgres://test:test123@localhost:5432/test")
if err != nil {
log.Fatal(err)
}
log.Println("db conn pool created")
stmt := `CREATE TABLE IF NOT EXISTS public.task (
id uuid NOT NULL DEFAULT gen_random_uuid(),
name text NULL,
status text NULL,
PRIMARY KEY (id)
); `
_, err = db.Exec(context.Background(), stmt)
if err != nil {
log.Fatal(err)
}
log.Println("task table created")
defer db.Close()
app := &application{
DB: db,
}
mux := http.NewServeMux()
mux.HandleFunc("/create", app.create)
log.Println("http server up at localhost:4444")
err = http.ListenAndServe(":4444", mux)
if err != nil {
log.Fatal(err)
}
}
TL;DR: Using r.Context() works fine in production; testing with a browser is the problem.
An HTTP request gets its own context, which is canceled when the request finishes. That is a feature, not a bug. Developers are expected to use it and shut down execution gracefully when the request is interrupted by the client or by a timeout. For example, a canceled request can mean that the client never sees the response (the transaction result), and the developer can decide to roll back that transaction.
In production, request cancellation does not happen very often for normally designed and built APIs. Typically, flow is controlled by the server, and the server returns the result before the request is canceled.
Multiple client requests do not affect each other because each gets its own goroutine and context. Again, we are talking about the happy path for normally designed and built applications. Your sample app looks good and should work fine.
The problem is how we test the app. Instead of creating multiple independent requests, we use a browser and refresh a single browser session. I did not check what exactly is going on, but I assume that the browser terminates the in-flight request in order to start a new one when you press Ctrl-R. The server sees that request termination and communicates it to your code as context cancellation.
Try testing your code with curl or some other script or utility that creates independent requests. I am sure you will not see cancellations in that case.
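As one way to generate independent requests, here is a small sketch in Go. The fire function sends n concurrent GET requests, each of which runs to completion, and demo points it at a stand-in handler that simulates a short insert and counts how often r.Context() is canceled. The handler, the 20 ms delay, and the function names are assumptions for illustration, not part of the original app.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// fire sends n independent, concurrent GET requests to url and returns how
// many of them came back with status 200.
func fire(url string, n int) int {
	var ok int64
	var wg sync.WaitGroup
	client := &http.Client{Timeout: 10 * time.Second}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get(url)
			if err != nil {
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				atomic.AddInt64(&ok, 1)
			}
		}()
	}
	wg.Wait()
	return int(ok)
}

// demo runs fire against a stand-in handler and reports the number of
// successful responses and the number of canceled request contexts.
func demo(n int) (int, int64) {
	var canceled int64
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(20 * time.Millisecond): // pretend to run the INSERT
			fmt.Fprint(w, "ok")
		case <-r.Context().Done(): // client went away before we finished
			atomic.AddInt64(&canceled, 1)
		}
	}))
	defer srv.Close()
	ok := fire(srv.URL, n)
	return ok, atomic.LoadInt64(&canceled)
}

func main() {
	ok, canceled := demo(20)
	fmt.Println("succeeded:", ok, "canceled:", canceled)
}
```

To exercise the real handler instead of the stand-in, fire could be called with http://localhost:4444/create while the app from the question is running.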

chromedp - Go - Show invalid printer settings error (-32000) - When setting WithMarginTop

I'm playing around with chromedp and have been trying to replicate functionality from puppeteer (Node.js) in Go.
I'm finding that the same JSON payload sent to Chromium causes an error when using chromedp:
package main
import (
"context"
"io/ioutil"
"log"
"github.com/chromedp/cdproto/page"
"github.com/chromedp/chromedp"
)
func main() {
// create context
ctx, cancel := chromedp.NewContext(context.Background())
defer cancel()
// capture pdf
var buf []byte
if err := chromedp.Run(ctx, printToPDF(`<html><body><h1>Yeeeew!</h1></body></html>`, &buf)); err != nil {
log.Fatal(err)
}
if err := ioutil.WriteFile("sample.pdf", buf, 0644); err != nil {
log.Fatal(err)
}
}
// https://github.com/puppeteer/puppeteer/blob/4d9dc8c0e613f22d4cdf237e8bd0b0da3c588edb/src/common/PDFOptions.ts#L74
// https://github.com/puppeteer/puppeteer/blob/4d9dc8c0e613f22d4cdf237e8bd0b0da3c588edb/src/common/Page.ts#L3366
//https://github.com/chromedp/chromedp/issues/836
func printToPDF(html string, res *[]byte) chromedp.Tasks {
return chromedp.Tasks{
chromedp.Navigate("about:blank"),
chromedp.ActionFunc(func(ctx context.Context) error {
frameTree, err := page.GetFrameTree().Do(ctx)
if err != nil {
return err
}
return page.SetDocumentContent(frameTree.Frame.ID, html).Do(ctx)
}),
chromedp.ActionFunc(func(ctx context.Context) error {
buf, _, err := page.PrintToPDF().
// WithPrintBackground(false).
WithMarginTop(20.0).
// WithMarginLeft(20).
// WithMarginBottom(20.0).
// WithMarginRight(20).
Do(ctx)
if err != nil {
return err
}
*res = buf
return nil
}),
}
}
I've vendored the modules and edited cdproto/page/page.go to print out the JSON being sent to chromium
{"marginTop":20,"marginBottom":0,"marginLeft":0,"marginRight":0,"transferMode":"ReturnAsStream"}
I've also done this in node.js and logged out the json to compare
node index.js
PDF command: 'Page.printToPDF' {
transferMode: 'ReturnAsStream',
marginTop: 20,
marginBottom: 20,
marginLeft: 0,
marginRight: 0
}
I'm not sure why I'm getting this error? Any ideas?
TL;DR
The margin value is too big. I think you meant to pass 20.0/96 inches:
- WithMarginTop(20.0).
+ WithMarginTop(20.0/96).
Explanation
The error message returned from Chromium is misleading. My guess at what happened: the provided margin settings are invalid, and since Chromium is running in headless mode, it cannot show the error dialog. The error message can be interpreted as "I cannot show a dialog to alert the user that the provided printer settings are invalid".
"marginTop":20 in the raw CDP message means 20 inches at the top, which is too big for an A4 page (A4: 8.27in x 11.7in). Note that a number in puppeteer is treated as pixels and is converted to inches (at 96 pixels per inch) before being sent to Chromium (see https://github.com/puppeteer/puppeteer/blob/4d9dc8c0e613f22d4cdf237e8bd0b0da3c588edb/src/common/Page.ts#L3366-L3402). So the fix is to pass the margin in inches, for example 20.0/96.
BTW, there are easy ways to see the raw CDP messages:
chromedp: use the chromedp.WithDebugf option:
ctx, cancel := chromedp.NewContext(context.Background(), chromedp.WithDebugf(log.Printf))
puppeteer: env DEBUG="puppeteer:*" node script.js. See https://github.com/puppeteer/puppeteer#debugging-tips
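The pixel-to-inch conversion from the explanation above can be written as a tiny helper. This is only a sketch: pxToInches is a hypothetical name, and the 96 pixels per inch factor is the one used in the 20.0/96 fix.

```go
package main

import "fmt"

// cssPixelsPerInch is the conversion factor behind the 20.0/96 fix.
const cssPixelsPerInch = 96.0

// pxToInches converts a margin given in pixels to the inches that
// Page.printToPDF expects.
func pxToInches(px float64) float64 {
	return px / cssPixelsPerInch
}

func main() {
	fmt.Printf("%.4f\n", pxToInches(20)) // prints 0.2083
}
```

With such a helper, the call in the question would become WithMarginTop(pxToInches(20)).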

How to link html

I use this library to create the GUI: github.com/Equanox/gotron
I need to run Go code with the username and password when I click the HTML button.
Do you know of any way to do this? I tried to google it, but I didn't find any code related to it; maybe I searched for it the wrong way.
My code:
package main
import (
"text/template"
"github.com/Equanox/gotron"
)
var tpl *template.Template
func LinkinInit() {
tpl = template.Must(template.ParseFiles("tpl.html"))
}
func main() {
// Create a new browser window instance
window, err := gotron.New()
if err != nil {
panic(err)
}
// Alter default window size and window title.
window.WindowOptions.Width = 720
window.WindowOptions.Height = 485
window.WindowOptions.Title = "Login APP"
// Start the browser window.
// This will establish a Go <=> nodejs bridge using websockets,
// to control ElectronBrowserWindow with our window object.
done, err := window.Start()
if err != nil {
panic(err)
}
<-done
}
HTML CODE
https://hastebin.com/usitorimil.xml
Look at the Gotron readme file
https://github.com/Equanox/gotron#communicate-between-backend-and-frontend
Backend: Handle incoming events
window.On(&gotron.Event{Event: "event-name"}, func(bin []byte) {
// Handle event here
})
Frontend: Send event to backend
ws.send(JSON.stringify({
"event": "event-name",
"AtrNameInFrontend": "Hello World!",
}))
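Inside the backend handler, the incoming bytes still have to be decoded. The sketch below assumes the handler receives the JSON message as the frontend sent it, and that the frontend includes username and password fields; the loginEvent struct, its field names, and the "login" event name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// loginEvent mirrors a hypothetical message sent by the frontend, e.g.
// ws.send(JSON.stringify({"event": "login", "username": "...", "password": "..."})).
type loginEvent struct {
	Event    string `json:"event"`
	Username string `json:"username"`
	Password string `json:"password"`
}

// decodeLogin turns the raw bytes passed to the event handler into a struct.
func decodeLogin(bin []byte) (loginEvent, error) {
	var ev loginEvent
	if err := json.Unmarshal(bin, &ev); err != nil {
		return loginEvent{}, err
	}
	return ev, nil
}

func main() {
	payload := []byte(`{"event":"login","username":"alice","password":"s3cret"}`)
	ev, err := decodeLogin(payload)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ev.Username) // prints alice
}
```

In the gotron app, decodeLogin(bin) would be called from the function passed to window.On, and the resulting username and password handed to the login code.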

How to take screenshot of a website using Golang?

What I'm looking to do is, given a URL, take a screenshot of the website using Golang. I searched for results but didn't find any. Can anyone please help me?
You can use a Go version of Selenium if you want to go that route. https://godoc.org/github.com/tebeka/selenium
There is no pure Go way to do this at the moment, since it must involve a browser in some form.
The easiest path to achieve this functionality is probably:
Find a nice NodeJS library to take website screenshots
Create a NodeJS script that suits your needs for taking screenshots (i/o and settings)
Execute this NodeJS script from Golang and handle the results in your Golang code
Not the cleanest method to get this done though - if you want it cleaner you probably have to build/find a golang package that controls a browser so you can skip the NodeJS middleman.
I solved this issue using https://github.com/mafredri/cdp and a Chrome headless docker container.
You can see my service example here: https://gist.github.com/efimovalex/9f9b815b0d5b1b7889a51d46860faf8a
A few more tools using Go and Chrome/Chromium include:
gowitness CLI app
screenshot library
web2image CLI app based on chromedp
I was writing a program for this specific task. Here is sample code that browses google.com and takes a screenshot.
package main
import (
"time"
driver "github.com/dreygur/webdriver"
)
func main() {
url := `https://google.com`
driver.RunServer("./geckodriver")
driver.GetSession()
driver.Get(url)
time.Sleep(8 * time.Second)
driver.Screenshot("google")
time.Sleep(8 * time.Second)
defer driver.Kill()
}
To install the module, run go get github.com/dreygur/webdriver
You can use chromedp, but you need to have the Chrome browser installed.
Example:
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/chromedp/chromedp"
)

// TakeScreenshot opens url in Chrome and returns a PNG of the viewport.
func TakeScreenshot(ctx context.Context, url string) ([]byte, error) {
	// Use a distinct variable name so the context package is not shadowed.
	cctx, cancel := chromedp.NewContext(ctx)
	defer cancel()
	var png []byte
	if err := chromedp.Run(cctx, chromedp.Tasks{
		chromedp.Navigate(url),
		// Crude wait for the page to settle before capturing.
		chromedp.Sleep(3 * time.Second),
		chromedp.CaptureScreenshot(&png),
	}); err != nil {
		return nil, err
	}
	return png, nil
}

func main() {
	url := "https://google.com"
	data, err := TakeScreenshot(context.Background(), url)
	if err != nil {
		panic(err)
	}
	// os.WriteFile creates the file and reports any write error.
	if err := os.WriteFile("./shot.png", data, 0o644); err != nil {
		panic(err)
	}
	fmt.Println("screenshot taken!")
}
