HTTP request fails when executed asynchronously - go

I'm trying to write a tiny application in Go that can send an HTTP request to all IP addresses in hopes to find a specific content. The issue is that the application seems to crash in a very peculiar way when the call is executed asynchronously.
ip/validator.go
package ip
import (
"io/ioutil"
"net/http"
"regexp"
"time"
)
type ipValidator struct {
httpClient http.Client
path string
exp *regexp.Regexp
confirmationChannel *chan string
}
func (this *ipValidator) validateUrl(url string) bool {
response, err := this.httpClient.Get(url)
if err != nil {
return false
}
defer response.Body.Close()
if response.StatusCode != http.StatusOK {
return false
}
bodyBytes, _ := ioutil.ReadAll(response.Body)
result := this.exp.Match(bodyBytes)
if result && this.confirmationChannel != nil {
*this.confirmationChannel <- url
}
return result
}
func (this *ipValidator) ValidateIp(addr ip) bool {
httpResult := this.validateUrl("http://" + addr.ToString() + this.path)
httpsResult := this.validateUrl("https://" + addr.ToString() + this.path)
return httpResult || httpsResult
}
func (this *ipValidator) GetSuccessChannel() *chan string {
return this.confirmationChannel
}
func NewIpValidadtor(path string, exp *regexp.Regexp) ipValidator {
return newValidator(path, exp, nil)
}
func NewAsyncIpValidator(path string, exp *regexp.Regexp) ipValidator {
c := make(chan string)
return newValidator(path, exp, &c)
}
func newValidator(path string, exp *regexp.Regexp, c *chan string) ipValidator {
httpClient := http.Client{
Timeout: time.Second * 2,
}
return ipValidator{httpClient, path, exp, c}
}
main.go
package main
import (
"./ip"
"fmt"
"os"
"regexp"
)
func processOutput(c *chan string) {
for true {
url := <- *c
fmt.Println(url)
}
}
func main() {
args := os.Args[1:]
fmt.Printf("path: %s regex: %s", args[0], args[1])
regexp, regexpError := regexp.Compile(args[1])
if regexpError != nil {
fmt.Println("The provided regexp is not valid")
return
}
currentIp, _ := ip.NewIp("172.217.22.174")
validator := ip.NewAsyncIpValidator(args[0], regexp)
successChannel := validator.GetSuccessChannel()
go processOutput(successChannel)
for currentIp.HasMore() {
go validator.ValidateIp(currentIp)
currentIp = currentIp.Increment()
}
}
Note the line that says go validator.ValidateIp(currentIp) in main.go. Should I remove the word "go" to execute everything within the main routine, the code works as expected -> it sends requests to IP addresses starting 172.217.22.174 and should one of them return a legitimate result that matches the regexp that the ipValidator was initialized with, the URL is passed to the channel and the value is printed out by processOutput function from main.go. The issue is that simply adding go in front of validator.ValidateIp(currentIp) breaks that functionality. In fact, according to the debugger, I never seem to go past the line that says response, err := this.httpClient.Get(url) in validator.go.
The struggle is real. Should I decide to scan the whole internet, there's 256^4 IP addresses to go through. It will take years, unless I find a way to split the process into multiple routines.

Related

How to validate API key with function in Martini?

So I currently have a function that will take in a string APIKey to check it against my Mongo collection. If nothing is found (not authenticated), it returns false - if a user is found, it returns true. My problem, however, is I'm unsure how to integrate this with a Martini POST route. Here is my code:
package main
import (
"context"
"fmt"
"log"
"os"
"github.com/go-martini/martini"
_ "github.com/joho/godotenv/autoload"
"go.mongodb.org/mongo-driver/bson"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
type User struct {
Name string
APIKey string
}
func validateAPIKey(users *mongo.Collection, APIKey string) bool {
var user User
filter := bson.D{{"APIKey", APIKey}}
if err := users.FindOne(context.TODO(), filter).Decode(&user); err != nil {
fmt.Printf("Found 0 results for API Key: %s\n", APIKey)
return false
}
fmt.Printf("Found: %s\n", user.Name)
return true
}
func uploadHandler() {
}
func main() {
mongoURI := os.Getenv("MONGO_URI")
mongoOptions := options.Client().ApplyURI(mongoURI)
client, _ := mongo.Connect(context.TODO(), mongoOptions)
defer client.Disconnect(context.TODO())
if err := client.Ping(context.TODO(), nil); err != nil {
log.Fatal(err, "Unable to access MongoDB server, exiting...")
}
// users := client.Database("sharex_api").Collection("authorized_users") // commented out when testing to ignore unused warnings
m := martini.Classic()
m.Post("/api/v1/upload", uploadHandler)
m.RunOnAddr(":8085")
}
The validateAPIKey function works exactly as intended if tested alone, I am just unsure how I would run this function for a specific endpoint (in this case, /api/v1/upload).

Golang web spider with pagination processing

I'm working on a golang web crawler that should parse the search results on some specific search engine. The main difficulty - parsing with concurrency, or rather, in processing pagination such as
← Previous 1 2 3 4 5 ... 34 Next →. All things work fine except recursive crawling of paginated results. Look at my code:
package main
import (
"bufio"
"errors"
"fmt"
"net"
"strings"
"github.com/antchfx/htmlquery"
"golang.org/x/net/html"
)
type Spider struct {
HandledUrls []string
}
func NewSpider(url string) *Spider {
// ...
}
func requestProvider(request string) string {
// Everything is good here
}
func connectProvider(url string) net.Conn {
// Also works
}
// getContents makes request to search engine and gets response body
func getContents(request string) *html.Node {
// ...
}
// CheckResult controls empty search results
func checkResult(node *html.Node) bool {
// ...
}
func (s *Spider) checkVisited(url string) bool {
// ...
}
// Here is the problems
func (s *Spider) Crawl(url string, channelDone chan bool, channelBody chan *html.Node) {
body := getContents(url)
defer func() {
channelDone <- true
}()
if checkResult(body) == false {
err := errors.New("Nothing found there")
ErrFatal(err)
}
channelBody <- body
s.HandledUrls = append(s.HandledUrls, url)
fmt.Println("Handled ", url)
newUrls := s.getPagination(body)
for _, u := range newUrls {
fmt.Println(u)
}
for i, newurl := range newUrls {
if s.checkVisited(newurl) == false {
fmt.Println(i)
go s.Crawl(newurl, channelDone, channelBody)
}
}
}
func (s *Spider) getPagination(node *html.Node) []string {
// ...
}
func main() {
request := requestProvider(*requestFlag)
channelBody := make(chan *html.Node, 120)
channelDone := make(chan bool)
var parsedHosts []*Host
s := NewSpider(request)
go s.Crawl(request, channelDone, channelBody)
for {
select {
case recievedNode := <-channelBody:
// ...
for _, h := range newHosts {
parsedHosts = append(parsedHosts, h)
fmt.Println("added", h.HostUrl)
}
case <-channelDone:
fmt.Println("Jobs finished")
}
break
}
}
It always returns the first page only, no pagination. Same GetPagination(...) works good. Please tell me, where is my error(s).
Hope Google Translate was correct.
The problem is probably that main exits before all goroutine finished.
First, there is a break after the select statement and it runs uncodintionally after first time a channel is read. That ensures the main func returns after the first time you send something over channelBody.
Secondly, using channelDone is not the right way here. The most idomatic approach would be using a sync.WaitGroup. Before starting each goroutine, use WG.Add(1) and replace the defer with defer WG.Done(); In main, use WG.Wait(). Please be aware that you should use a pointer to refer to the WaitGroup. You can read more here.

Golang downcasting list of structs

I want to be able to unmarshal yaml files less rigidly. That is, my library has a predefined number of options the yaml file must have. Then, the user should be able to extend this to include any custom options.
Here is what I have
package main
import (
"net/http"
"yamlcms"
"github.com/julienschmidt/httprouter"
)
type Page struct {
*yamlcms.Page
Title string
Date string
}
func getBlogRoutes() {
pages := []*Page{}
yamlcms.ReadDir("html", pages)
}
// This section is a work in progress, I only include it for loose context
func main() {
router := httprouter.New()
//blogRoutes := getBlogRoutes()
//for _, blogRoute := range *blogRoutes {
// router.Handle(blogRoute.Method, blogRoute.Pattern,
// func(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {})
//}
http.ListenAndServe(":8080", router)
}
Here is the yamlcms package:
package yamlcms
import (
"io/ioutil"
"os"
"strings"
"gopkg.in/yaml.v2"
)
type Page struct {
Slug string `yaml:"slug"`
File string `yaml:"file"`
}
func (page *Page) ReadFile(file string) (err error) {
fileContents, err := ioutil.ReadFile(file)
if err != nil {
return
}
err = yaml.Unmarshal(fileContents, &page)
return
}
func isYamlFile(fileInfo os.FileInfo) bool {
return !fileInfo.IsDir() && strings.HasSuffix(fileInfo.Name(), ".yaml")
}
func ReadDir(dir string, pages []*Page) (err error) {
filesInfo, err := ioutil.ReadDir(dir)
if err != nil {
return
}
for i, fileInfo := range filesInfo {
if isYamlFile(fileInfo) {
pages[i].ReadFile(fileInfo.Name())
}
}
return
}
There is a compiler issue here:
src/main.go:19: cannot use pages (type []*Page) as type []*yamlcms.Page in argument to yamlcms.ReadDir
My main intent in this question is to learn the idiomatic way of doing this kind of thing in Go. Other 3rd-party solutions may exist but I am not immediately interested in them because I have problems like this frequently in Go having to do with inheritance, etc. So along the lines of what I've presented, how can I best (idiomatically) accomplish what I am going for?
EDIT:
So I've made some changes as suggested. Now I have this:
type FileReader interface {
ReadFile(file string) error
}
func ReadDir(dir string, pages []*FileReader) (err error) {
filesInfo, err := ioutil.ReadDir(dir)
if err != nil {
return
}
for i, fileInfo := range filesInfo {
if isYamlFile(fileInfo) {
(*pages[i]).ReadFile(fileInfo.Name())
}
}
return
}
However, I still get a similar compiler error:
src/main.go:19: cannot use pages (type []*Page) as type []*yamlcms.FileReader in argument to yamlcms.ReadDir
Even though main.Page should be a FileReader because it embeds yamlcms.Page.
EDIT: I forgot that slices of interfaces don't work like that. You'd need to allocate a new slice, convert all pages to FileReaders, call the function, and convert them back.
Another possible solution is refactoring yamlcms.ReadDir to return the contents of the files, so that they could be unmarshaled later:
// In yamlcms.
func ReadYAMLFilesInDir(dir string) ([][]byte, error) { ... }
// In client code.
files := yamlcms.ReadYAMLFilesInDir("dir")
for i := range pages {
if err := yaml.Unmarshal(files[i], &pages[i]); err != nil { return err }
}
The original answer:
There are no such things as inheritance or casting in Go. Prefer composition and interfaces in your designs. In your case, you can redefine your yamlcms.ReadDir to accept an interface, FileReader.
type FileReader interface {
ReadFile(file string) error
}
Both yamlcms.Page and main.Page will implement this, as the latter embeds the former.

Go url parameters mapping

Is there a native way for inplace url parameters in native Go?
For Example, if I have a URL: http://localhost:8080/blob/123/test I want to use this URL as /blob/{id}/test.
This is not a question about finding go libraries. I am starting with the basic question, does go itself provide a basic facility to do this natively.
There is no built in simple way to do this, however, it is not hard to do.
This is how I do it, without adding a particular library. It is placed in a function so that you can invoke a simple getCode() function within your request handler.
Basically you just split the r.URL.Path into parts, and then analyse the parts.
// Extract a code from a URL. Return the default code if code
// is missing or code is not a valid number.
func getCode(r *http.Request, defaultCode int) (int, string) {
p := strings.Split(r.URL.Path, "/")
if len(p) == 1 {
return defaultCode, p[0]
} else if len(p) > 1 {
code, err := strconv.Atoi(p[0])
if err == nil {
return code, p[1]
} else {
return defaultCode, p[1]
}
} else {
return defaultCode, ""
}
}
Well, without external libraries you can't, but may I recommend two excellent ones:
httprouter - https://github.com/julienschmidt/httprouter - is extremely fast and very lightweight. It's faster than the standard library's router, and it creates 0 allocations per call, which is great in a GCed language.
Gorilla Mux - http://www.gorillatoolkit.org/pkg/mux -
Very popular, nice interface, nice community.
Example usage of httprouter:
func Hello(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
fmt.Fprintf(w, "hello, %s!\n", ps.ByName("name"))
}
func main() {
router := httprouter.New()
router.GET("/hello/:name", Hello)
log.Fatal(http.ListenAndServe(":8080", router))
}
What about trying using regex, and find a named group in your url, like playground:
package main
import (
"fmt"
"net/url"
"regexp"
)
var myExp = regexp.MustCompile(`/blob/(?P<id>\d+)/test`) // use (?P<id>[a-zA-Z]+) if the id is alphapatic
func main() {
s := "http://localhost:8080/blob/123/test"
u, err := url.Parse(s)
if err != nil {
panic(err)
}
fmt.Println(u.Path)
match := myExp.FindStringSubmatch(s) // or match := myExp.FindStringSubmatch(u.Path)
result := make(map[string]string)
for i, name := range myExp.SubexpNames() {
if i != 0 && name != "" {
result[name] = match[i]
}
}
fmt.Printf("id: %s\n", result["id"])
}
output
/blob/123/test
id: 123
Below full code to use it with url, that is receiving http://localhost:8000/hello/John/58 and returning http://localhost:8000/hello/John/58:
package main
import (
"fmt"
"net/http"
"regexp"
"strconv"
)
var helloExp = regexp.MustCompile(`/hello/(?P<name>[a-zA-Z]+)/(?P<age>\d+)`)
func hello(w http.ResponseWriter, req *http.Request) {
match := helloExp.FindStringSubmatch(req.URL.Path)
if len(match) > 0 {
result := make(map[string]string)
for i, name := range helloExp.SubexpNames() {
if i != 0 && name != "" {
result[name] = match[i]
}
}
if _, err := strconv.Atoi(result["age"]); err == nil {
fmt.Fprintf(w, "Hello, %v year old named %s!", result["age"], result["name"])
} else {
fmt.Fprintf(w, "Sorry, not accepted age!")
}
} else {
fmt.Fprintf(w, "Wrong url\n")
}
}
func main() {
http.HandleFunc("/hello/", hello)
http.ListenAndServe(":8090", nil)
}
How about writing your own url generator (extend net/url a little bit) as below.
// --- This is how does it work like --- //
url, _ := rest.NewURLGen("http", "stack.over.flow", "1234").
Pattern(foo/:foo_id/bar/:bar_id).
ParamQuery("foo_id", "abc").
ParamQuery("bar_id", "xyz").
ParamQuery("page", "1").
ParamQuery("offset", "5").
Do()
log.Printf("url: %s", url)
// url: http://stack.over.flow:1234/foo/abc/bar/xyz?page=1&offset=5
// --- Your own url generator would be like below --- //
package rest
import (
"log"
"net/url"
"strings"
"straas.io/base/errors"
"github.com/jinzhu/copier"
)
// URLGen generates request URL
type URLGen struct {
url.URL
pattern string
paramPath map[string]string
paramQuery map[string]string
}
// NewURLGen new a URLGen
func NewURLGen(scheme, host, port string) *URLGen {
h := host
if port != "" {
h += ":" + port
}
ug := URLGen{}
ug.Scheme = scheme
ug.Host = h
ug.paramPath = make(map[string]string)
ug.paramQuery = make(map[string]string)
return &ug
}
// Clone return copied self
func (u *URLGen) Clone() *URLGen {
cloned := &URLGen{}
cloned.paramPath = make(map[string]string)
cloned.paramQuery = make(map[string]string)
err := copier.Copy(cloned, u)
if err != nil {
log.Panic(err)
}
return cloned
}
// Pattern sets path pattern with placeholder (format `:<holder_name>`)
func (u *URLGen) Pattern(pattern string) *URLGen {
u.pattern = pattern
return u
}
// ParamPath builds path part of URL
func (u *URLGen) ParamPath(key, value string) *URLGen {
u.paramPath[key] = value
return u
}
// ParamQuery builds query part of URL
func (u *URLGen) ParamQuery(key, value string) *URLGen {
u.paramQuery[key] = value
return u
}
// Do returns final URL result.
// The result URL string is possible not escaped correctly.
// This is input for `gorequest`, `gorequest` will handle URL escape.
func (u *URLGen) Do() (string, error) {
err := u.buildPath()
if err != nil {
return "", err
}
u.buildQuery()
return u.String(), nil
}
func (u *URLGen) buildPath() error {
r := []string{}
p := strings.Split(u.pattern, "/")
for i := range p {
part := p[i]
if strings.Contains(part, ":") {
key := strings.TrimPrefix(p[i], ":")
if val, ok := u.paramPath[key]; ok {
r = append(r, val)
} else {
if i != len(p)-1 {
// if placeholder at the end of pattern, it could be not provided
return errors.Errorf("placeholder[%s] not provided", key)
}
}
continue
}
r = append(r, part)
}
u.Path = strings.Join(r, "/")
return nil
}
func (u *URLGen) buildQuery() {
q := u.URL.Query()
for k, v := range u.paramQuery {
q.Set(k, v)
}
u.RawQuery = q.Encode()
}
With net/http the following would trigger when calling localhost:8080/blob/123/test
http.HandleFunc("/blob/", yourHandlerFunction)
Then inside yourHandlerFunction, manually parse r.URL.Path to find 123.
Note that if you don't add a trailing / it won't work. The following would only trigger when calling localhost:8080/blob:
http.HandleFunc("/blob", yourHandlerFunction)
As of 19-Sep-22, with go version 1.19, instance of http.request URL has a method called Query, which will return a map, which is a parsed query string.
func helloHandler(res http.ResponseWriter, req *http.Request) {
// when request URL is `http://localhost:3000/?first=hello&second=world`
fmt.Println(req.URL.Query()) // outputs , map[second:[world] first:[hello]]
res.Write([]byte("Hello World Web"))
}
No way without standard library. Why you don't want to try some library? I think its not so hard to use it, just go get bla bla bla
I use Beego. Its MVC style.
how about a simple utility function ?
func withURLParams(u url.URL, param, val string) url.URL{
u.Path = strings.ReplaceAll(u.Path, param, val)
return u
}
you can use it like this:
u, err := url.Parse("http://localhost:8080/blob/:id/test")
if err != nil {
return nil, err
}
u := withURLParams(u, ":id","123")
// now u.String() is http://localhost:8080/blob/123/test
If you need a framework and you think it will be slow because it's 'bigger' than a router or net/http, then you 're wrong.
Iris is the fastest go web framework that you will ever find, so far according to all benchmarks.
Install by
go get gopkg.in/kataras/iris.v6
Django templates goes easy with iris:
import (
"gopkg.in/kataras/iris.v6"
"gopkg.in/kataras/iris.v6/adaptors/httprouter"
"gopkg.in/kataras/iris.v6/adaptors/view" // <-----
)
func main() {
app := iris.New()
app.Adapt(iris.DevLogger())
app.Adapt(httprouter.New()) // you can choose gorillamux too
app.Adapt(view.Django("./templates", ".html")) // <-----
// RESOURCE: http://127.0.0.1:8080/hi
// METHOD: "GET"
app.Get("/hi", hi)
app.Listen(":8080")
}
func hi(ctx *iris.Context){
ctx.Render("hi.html", iris.Map{"Name": "iris"})
}

Ensure a URI is valid

I'm trying to ensure that URLs passed to my go program are valid. However, I can't seem to work out how to. I thought I could just feed it through url.Parse, but that doesn't seem to do the job.
package main
import (
"fmt"
"net/url"
)
func main() {
url, err := url.Parse("http:::/not.valid/a//a??a?b=&&c#hi")
if err != nil {
panic(err)
}
fmt.Println("It's valid!", url.String())
}
playground
Is there anything along the lines of filter_var I can use?
You can check that your URL has a Scheme, Host, and/or a Path.
If you inspect the URL returned, you can see that the invalid part is inserted into the Opaque data section (so in a sense, it is valid).
url.URL{Scheme:"http", Opaque:"::/not.valid/a//a", Host:"", Path:"", RawQuery:"?a?b=&&c", Fragment:"hi"}
If you parse a URL and don't have a Scheme, Host and Path you can probably assume it's not valid. (though a host without a path is often OK, since it implies /, so you need to check for that)
u, err := url.Parse("http:::/not.valid/a//a??a?b=&&c#hi")
if err != nil {
log.Fatal(err)
}
if u.Scheme == "" || u.Host == "" || u.Path == "" {
log.Fatal("invalid URL")
}
have a try of this package go validator and the IsURL func is what you want.(you can use regexp package as well)
package main
import (
"fmt"
"regexp"
)
const URL string = `^((ftp|http|https):\/\/)?(\S+(:\S*)?#)?((([1-9]\d?|1\d\d|2[01]\d|22[0-3])(\.(1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.([0-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(((([a-z\x{00a1}-\x{ffff}0-9]+-?-?_?)*[a-z\x{00a1}-\x{ffff}0-9]+)\.)?)?(([a-z\x{00a1}-\x{ffff}0-9]+-?-?_?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.([a-z\x{00a1}-\x{ffff}]{2,}))?)|localhost)(:(\d{1,5}))?((\/|\?|#)[^\s]*)?$`
func Matches(str, pattern string) bool {
match, _ := regexp.MatchString(pattern, str)
return match
}
func main() {
u1 := "http:::/not.valid/a//a??a?b=&&c#hi"
u2 := "http://golang.fastrl.com"
func(us ...string) {
for _, u := range us {
fmt.Println(Matches(u, URL))
}
}(u1, u2)
}
The url.Parse() function will return an error mainly if viaRequest is true, meaning if the URL is assumed to have arrived via an HTTP request.
Which is not the case when you call url.Parse() directly: see the source code.
if url, err = parse(u, false); err != nil {
return nil, err
}
And func parse(rawurl string, viaRequest bool) (url *URL, err error) { only returns err if viaRequest is true.
That is why you never see any error when using url.Parse() directly.
In that latter case, where no err is ever returned, you can check the fields of the URL object returned.
An empty url.Scheme, or an url.Path which isn't the one expected would indicate an error.

Resources