How do I scrape TLS certificates using go-colly?

I am using Colly to scrape a website and I am trying to also get the TLS certificate that the site is presenting during the TLS handshake. I looked through the documentation and the response object but did not find what I was looking for.
According to the docs, I can customize some HTTP options by changing the default HTTP round tripper. I tried setting custom GetCertificate and GetClientCertificate functions, assuming that these functions would be used during the TLS handshake, but neither callback is ever invoked, so the print statements never run.
// Instantiate default collector
c := colly.NewCollector(
// Visit only the pkg.go.dev domain
colly.AllowedDomains("pkg.go.dev"),
)
c.WithTransport(&http.Transport{
TLSClientConfig: &tls.Config{
GetCertificate: func(ch *tls.ClientHelloInfo) (*tls.Certificate, error) {
fmt.Println("~~~GETCERT CALLED~~")
return nil, nil
},
GetClientCertificate: func(cri *tls.CertificateRequestInfo) (*tls.Certificate, error) {
fmt.Println("~~~GETCLIENTCERT CALLED~~")
return nil, nil
},
},
})
Please help me scrape TLS certificates using Colly.

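In case you give up on getting the certificate through Colly, this snippet extracts the leaf certificate from a raw http.Response (res) and base64-encodes its DER bytes.
// The variable is not called "tls" so that it does not shadow the crypto/tls package.
certB64 := ""
if res.TLS != nil && len(res.TLS.PeerCertificates) > 0 {
	// PeerCertificates[0] is the leaf certificate presented by the server.
	cert := res.TLS.PeerCertificates[0]
	certB64 = base64.StdEncoding.EncodeToString(cert.Raw)
}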

Related

How can I fix x509: “Tom Akehurst” certificate is not trusted?

I'm trying to use testcontainers-go with HTTPS mode enabled in tests:
req := testcontainers.ContainerRequest{
Image: "wiremock/wiremock",
ExposedPorts: []string{"8080/tcp", "8443/tcp"},
Cmd: []string{"--https-port", "8443", "--verbose"},
}
uri := fmt.Sprintf("https://%s:%s", hostIP, mappedPort.Port())
// see
// https://github.com/testcontainers/testcontainers-go/blob/main/docs/examples/cockroachdb.md
// https://github.com/wiremock/wiremock-docker#start-a-wiremock-container-with-wiremock-arguments
together with walkerus/go-wiremock (WireMock go client) and I'm running into
Post "https://localhost:59279/foo": x509: “Tom Akehurst” certificate is not trusted
I think the reason is that go-wiremock converts wiremock.Post(wiremock.URLPathEqualTo(...)) into a direct HTTP call (i.e., it doesn't "expose" its HTTP client):
// A Client implements requests to the wiremock server.
type Client struct {
url string
}
so I can't override it to:
tr := &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
client := &http.Client{Transport: tr}
Is there another workaround?

I can't get `golang.org/x/crypto/acme/autocert` to work (for gRPC), I get an `acme_account+key` file and no X509 cert

UPDATE After much debugging, I uncovered Get "https://acme-v02.api.letsencrypt.org/directory": x509: certificate signed by unknown authority and suspect (!?) that this results from the recent expiration of Let's Encrypt's root cert.
I accept that "This package is a work in progress and makes no API stability promises." but, if it no longer works (and it's much more likely that my code or deployment is at fault), then perhaps the repo could be marked, e.g., "Here be dragons".
The code produces an acme_account+key file (EC PRIVATE KEY) but no certs. I'm struggling to get autocert to disclose (log) its magic so that I can understand where I'm going wrong.
The code is essentially the repo's Manager example with input from this answer. I assume that GetCertificate blocks until the ACME flow completes.
Code:
package main
import (
"crypto/tls"
"flag"
"fmt"
"log"
"net"
"net/http"
"golang.org/x/crypto/acme/autocert"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/health"
healthpb "google.golang.org/grpc/health/grpc_health_v1"
)
const email = "my@email.com"
var (
host = flag.String("host", "foo.example.org", "Fully-qualified domain name")
port = flag.Uint("port", 443, "gRPC service port")
path = flag.String("path", "", "Folder location for certificate")
)
func main() {
flag.Parse()
if *host == "" {
log.Fatal("Flag --host is required")
}
log.Printf("Host: %s", *host)
log.Printf("Port: %d", *port)
if *path == "" {
log.Fatal("Flag --path is required")
}
log.Printf("Path: %s", *path)
addr := fmt.Sprintf(":%d", *port)
lis, err := net.Listen("tcp", addr)
if err != nil {
log.Fatalf("failed to listen: %s", err)
}
m := &autocert.Manager{
Cache: autocert.DirCache(*path),
Prompt: autocert.AcceptTOS,
HostPolicy: autocert.HostWhitelist(*host),
Email: email,
}
go func() {
log.Println("Starting HTTP server w/ autocert handler")
if err := http.ListenAndServe(":http", m.HTTPHandler(nil)); err != nil {
log.Fatalf("HTTP failure\n%s", err)
}
}()
tlsConfig := &tls.Config{
ClientAuth: tls.RequireAndVerifyClientCert,
GetCertificate: func(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
cert, err := m.GetCertificate(hello)
if err != nil {
log.Fatalf("GetCertificate\n%s", err)
}
return cert, err
},
}
opts := grpc.Creds(credentials.NewTLS(tlsConfig))
server := grpc.NewServer(opts)
healthcheck := health.NewServer()
healthpb.RegisterHealthServer(server, healthcheck)
log.Println("Starting gRPC server")
if err := server.Serve(lis); err != nil {
log.Fatalf("gRPC failure\n%s", err)
}
}
I'm deploying to a (Google Compute Engine) Container VM; the equivalent docker command is:
docker run \
--name=autocert \
--detach \
--net=host \
--volume=/tmp/certs:/certs \
${IMAGE} \
--host=${HOST} \
--port=${PORT} \
--path=/certs
And container logs:
2021/11/25 17:30:00 Host: [HOST]
2021/11/25 17:30:00 Port: 443
2021/11/25 17:30:00 Path: /certs
2021/11/25 17:30:00 Starting gRPC server
2021/11/25 17:30:00 Starting HTTP server
The host's /tmp/certs directory receives acme_account+key (which I've struggled to find explained via Google), which I suspect (!?) belongs to the initial phase of Domain Validation. It contains a private key (BEGIN EC PRIVATE KEY).
Even after some time with the server running, no further files are persisted.
I receive no emails from Let's Encrypt at the configured email address.
Unfortunately, while easy to use, autocert produces little logging and I've been unable to determine whether I can log the ACME flow that's (hopefully) taking place.
Since adding the anonymous function for GetCertificate, acme_account+key is no longer created (I removed the previous file to check whether it would be recreated), and the function is never invoked, so I'm unable to gather any logging from it. Either my anonymous function is incorrect or I've exceeded the request limits of the ACME endpoint. Removing the function and reverting to m.GetCertificate does not result in acme_account+key being recreated, so I'm at a loss.
The autocert Manager type documents a Client field (*acme.Client) which I'm not setting. Its comment (quoted below) says "if the Client.Key is nil, a new ECDSA P-256 key is generated", which is perhaps what I'm experiencing, but it doesn't explain what I should do about it. Should I set this value to the content of acme_account+key?
UPDATE I tried decoding the private key, creating a crypto.Signer and passing it in as &acme.Client{Key: key}, but it made no evident difference.
// Client is used to perform low-level operations, such as account registration
// and requesting new certificates.
//
// If Client is nil, a zero-value acme.Client is used with DefaultACMEDirectory
// as the directory endpoint.
// If the Client.Key is nil, a new ECDSA P-256 key is generated and,
// if Cache is not nil, stored in cache.
//
// Mutating the field after the first call of GetCertificate method will have no effect.
Client *acme.Client
Evidently, I'm using this incorrectly. I'm not receiving a cert from Let's Encrypt and so I'm unable to get a cert from the endpoint and unable to invoke the gRPC endpoint:
openssl s_client -showcerts -connect ${HOST}:${PORT}
grpcurl \
-proto health.proto \
${HOST}:${PORT} \
grpc.health.v1.Health/Check
Failed to dial target host "${HOST}:${PORT}": remote error: tls: internal error
Guidance would be appreciated.
🤦‍♂️
Ugh :-(
I'd started using a scratch base image and hadn't copied the CA certificates into it.
Once the container had CA certs, everything worked almost flawlessly.
I continue to have problems when I build the config myself:
tlsConfig := &tls.Config{
	ClientAuth: tls.RequireAndVerifyClientCert,
	GetCertificate: m.GetCertificate,
}
so I am using m.TLSConfig() instead.
So, autocert works like a treat (though it's difficult to debug self-inflicted errors 😊)

How to confirm gRPC traffic from Go client is TLS encrypted

I wrote a sample gRPC client and server in Go, both configured for server-authenticated TLS.
The client gRPC call succeeds, which gives me the impression that TLS is configured properly. If the TLS handshake had failed, I would expect the client to fail and not make the gRPC request (i.e. not fall back to plaintext).
Yet I am puzzled by what I see when I attach Wireshark to that network to sniff TCP packets. I do not see any TLS packets; for example, I do not see the TLS CLIENT HELLO packet.
So is this because I'm misinterpreting what I see in Wireshark, or is my gRPC client actually doing plaintext gRPC?
The client code looks like this. Note the grpc.WithTransportCredentials option, which I think means it will use TLS or fail, but never use plaintext:
// block the dial until connection is successful or 3 sec timeout
dialOptions := []grpc.DialOption{
grpc.WithBlock(),
grpc.WithTimeout(3 * time.Second),
}
// Load TLS Configuration
tlsCredentials, err := LoadTLSCredentials()
if err != nil {
log.Fatalf("Failed to load TLS credentials: %v", err)
}
dialOptions = append(dialOptions, grpc.WithTransportCredentials(tlsCredentials))
// Dial the gRPC server
log.Printf("Dialing %v", *address)
conn, err := grpc.Dial(*address, dialOptions...)
if err != nil {
log.Fatalf("Failed to connect to the server: %v", err)
}
defer conn.Close()
// then this application sets up a gRPC request, and logs the response to stdout,
// in my testing stdout shows the expected gRPC response, so I'd assume TLS is working.
func LoadTLSCredentials() (credentials.TransportCredentials, error) {
rootCA, err := ioutil.ReadFile("ca.cert")
if err != nil {
return nil, err
}
certPool := x509.NewCertPool()
if !certPool.AppendCertsFromPEM(rootCA) {
return nil, fmt.Errorf("Failed to add rootCA to x509 certificate pool")
}
config := &tls.Config{
MinVersion: tls.VersionTLS12,
RootCAs: certPool,
}
return credentials.NewTLS(config), nil
}
And here's a screenshot of Wireshark showing no TLS packets: [screenshot not included]
whereas I would expect something similar to the following, which clearly shows some TLS activity (not my app; the image is from the web for illustration purposes): [screenshot not included]
I'm running Wireshark v2.6.10 on Ubuntu 16.04. The source and destination IPs match my gRPC client and server IPs (both are docker containers on the same docker network).
Not that it really matters, but as can be seen in my client code, I'm sharing a root CA certificate on the client (self signed). I can do this because I deploy both the client and the server.
As @steffanUllrich explained in the comments, this was a case of Wireshark needing to be configured to show TLS. I confirmed that the gRPC exchange is indeed TLS protected.
Right-click a packet in the packet list, choose the 'Decode As...' menu item, then select 'TLS' to force Wireshark to dissect traffic on this TCP port as TLS.

net::ERR_CERT_INVALID using Go backend

I currently have an API running on :443, as you can see below:
// RunAsRESTAPI runs the API as REST API
func (api *API) RunAsRESTAPI(restAddr string) error {
// Generate a `Certificate` struct
cert, err := tls.LoadX509KeyPair(".certificates/my-domain.crt", ".certificates/my-domain.key")
if err != nil {
return fmt.Errorf("couldn't load the X509 certificates: %w", err)
}
// create a custom server with `TLSConfig`
restAPI := &http.Server{
Addr: restAddr,
Handler: nil, // use `http.DefaultServeMux`
TLSConfig: &tls.Config{
Certificates: []tls.Certificate{ cert },
},
}
// Defining the routes
routes := map[string]func(http.ResponseWriter, *http.Request){
"": api.handleIndex,
}
// Initialize mux
mux := http.NewServeMux()
// Register endpoints handlers
for route, function := range routes {
endpoint := "/" + route
mux.HandleFunc(endpoint, function)
log.Printf("[%s] endpoint registered.\n", endpoint)
}
// cors.Default() setup the middleware with default options being
// all origins accepted with simple methods (GET, POST). See
// documentation below for more options.
restAPI.Handler = cors.Default().Handler(mux)
log.Printf("REST TLS Listening on %s\n", restAddr)
return restAPI.ListenAndServeTLS("", "")
}
I created my certificates like so:
$ openssl req -new -newkey rsa:2048 -nodes -keyout my-domain.key -out my-domain.csr
$ openssl x509 -req -days 365 -in my-domain.csr -signkey my-domain.key -out my-domain.crt
I then dockerized the API and deployed it to Google Compute Engine, but I still get net::ERR_CERT_INVALID when requesting my API from a ReactJS app (Google Chrome).
I have no issues with Postman, although it does say that this certificate has not been verified by a third party.
I am a bit lost, to be honest. How can I solve this so that my app can call my HTTPS backend?
Thanks
Options
Instead of using a self-signed cert, use Let's Encrypt to generate a free cert that is considered valid because it is backed by a trusted Certificate Authority.
If your cert is secure enough for your purposes, add it to your client (browser) as a root CA. No one else will be able to connect to the API without doing the same.
This line should have error checking:
cert, err := tls.LoadX509KeyPair( ".certificates/my-domain.crt", ".certificates/my-domain.key" )
if err != nil {
log.Fatalf("key-pair error: %v", err)
}
While it may work locally, you can't be sure it also works in your Docker container.
Are those key/cert files available? If not (and without error checking) you are effectively passing a nil cert to your TLS config, which will have no effect and produce no other discernible error.
You're using a self-signed certificate, so your browser doesn't trust it, by design. Postman is telling you this with that message, too. This isn't a Go issue; Go is doing precisely what you told it to do: serving your application with an untrusted certificate.
Thanks for your answers. I actually found a way to make it work in a SUPER EASY WAY:
func (api *API) RunAsRESTAPI() error {
// Defining the routes
routes := map[string]func(http.ResponseWriter, *http.Request){
"": api.handleIndex,
}
// Initialize mux
mux := http.NewServeMux()
// Register endpoints handlers
for route, function := range routes {
endpoint := "/" + route
mux.HandleFunc(endpoint, function)
log.Printf("[%s] endpoint registered.\n", endpoint)
}
// cors.Default() setup the middleware with default options being
// all origins accepted with simple methods (GET, POST). See
// documentation below for more options.
handler := cors.Default().Handler(mux)
log.Printf("REST TLS Listening on %s\n", "api.my-domain.com")
return http.Serve(autocert.NewListener("api.my-domain.com"), handler)
}
autocert.NewListener("api.my-domain.com") directly fixed my problem :)

golang: Send http request with certificate

First of all, I am new to Go.
I am trying to send an HTTPS request. I create the http.Client like this:
func httpClient(c *Config) (httpClient *http.Client) {
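cert, err := tls.LoadX509KeyPair(c.CertFile, c.KeyFile)
if err != nil {
// Fail early instead of silently sending an empty client certificate.
log.Fatalf("loading client key pair: %v", err)
}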
ssl := &tls.Config{
Certificates: []tls.Certificate{cert},
InsecureSkipVerify: true,
}
ssl.Rand = rand.Reader
return &http.Client{
Transport: &http.Transport{
TLSClientConfig: ssl,
},
}
}
But as a result I get local error: no renegotiation.
Thanks for any help!
This is likely a problem with the remote server you're accessing, but it is a known problem (with Microsoft Azure services, for one).
There may be a workaround on the way for Go 1.4, but until then the Go client doesn't support TLS renegotiation.
Relevant issue: https://code.google.com/p/go/issues/detail?id=5742
It looks as though renegotiation (and client certificate authentication) was previously unsupported. This looks to have been fixed by commit https://github.com/golang/go/commit/af125a5193c75dd59307fcf1b26d885010ce8bfd
