How to join URLs in Go? - go

I'm creating a web crawler in Go. After parsing and scraping all the URLs on a page, I get hyperlinks in several different formats:
/my/next/page
my/next/page
//my_next_page
https://different-domain.com
As you can see, there are many combinations, and in some cases the URL points to an entirely different domain. Simple string or path joins will not work here. How do I join the URLs correctly so that each resulting URL can be fed back into the crawler to parse and scrape continuously?

URL handling is tricky here because the hyperlink in an anchor tag can come in many formats, as mentioned above. Here is a solution that joins the URL requested by the crawler (the base) with a hyperlink found on that page:
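import "net/url"

// joinURLs resolves hyperlink against baseURL and returns the absolute URL.
// It returns an empty string if either input cannot be parsed.
func joinURLs(baseURL, hyperlink string) string {
	parse, err := url.Parse(hyperlink)
	if err != nil {
		return ""
	}
	base, err := url.Parse(baseURL)
	if err != nil {
		return ""
	}
	// ResolveReference applies the RFC 3986 rules for relative references.
	nextURLToCrawl := base.ResolveReference(parse)
	return nextURLToCrawl.String()
}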
The best thing about this function is that it can handle hyperlinks that point to a different domain without you having to check whether the hostname is the same or not. Your code stays much more declarative.
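To see how the four hyperlink formats from the question are resolved, here is a small self-contained sketch. The base URL https://example.com/a/b is an assumption chosen purely for illustration; the function is the same as above, just written more compactly.

```go
package main

import (
	"fmt"
	"net/url"
)

// joinURLs resolves hyperlink against baseURL, returning "" on a parse error.
func joinURLs(baseURL, hyperlink string) string {
	ref, err := url.Parse(hyperlink)
	if err != nil {
		return ""
	}
	base, err := url.Parse(baseURL)
	if err != nil {
		return ""
	}
	return base.ResolveReference(ref).String()
}

func main() {
	// Hypothetical page the crawler has just fetched.
	base := "https://example.com/a/b"

	links := []string{
		"/my/next/page",                // absolute path on the same host
		"my/next/page",                 // path relative to the base's directory
		"//my_next_page",               // scheme-relative: a new host, same scheme
		"https://different-domain.com", // fully qualified URL
	}
	for _, link := range links {
		fmt.Printf("%-30s -> %s\n", link, joinURLs(base, link))
	}
}
```

Note that //my_next_page is treated as a host name rather than a path, because a leading double slash marks a scheme-relative reference.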

Related

I added a file to my API and got "invalid character '-' in numeric literal" in a POST API

I know this code expects JSON rather than form data to be sent to the API:
err := ctx.ShouldBindJSON(&modelAdd)
if err != nil {
	return err
}
But I need to upload a file as well. Is there anything like ShouldBindJSON for form data?
You can use ShouldBind to get data from form data, as the documentation says:
https://github.com/gin-gonic/gin#model-binding-and-validation

Serving image from string in http golang

I need to display an image via HTTP GET, but I can only use a string as the response body.
For example: headers: image/png, body: Aeacxxffsaf (an encoded representation or similar).
It's more or less like this site, https://codebeautify.org/base64-to-image-converter, but I want the string to be output as an image in response to an HTTP GET.
Some code snippets to explain:
//string that is generated from image (encoded)
encString := "iVBORw0KGgoAAAANSUhEUgAAANIAAAAzCAYAAADigVZl..."
//set http headers to png
//and assign the encString to the body
Is there any way to do that, serving an image using only a string?
Sorry if my question is a bit confusing, but it is the best I can describe it. I have been searching for an answer for several days.
You do that just like with any other content, just decode the base64 first.
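func handler(w http.ResponseWriter, r *http.Request) {
	encString := "iVBORw0KGgoAAAANSUhEUgAAANIAAAAzCAYAAADigVZl..."
	// Decode the base64 text back into the raw PNG bytes.
	data, err := base64.StdEncoding.DecodeString(encString)
	if err != nil {
		http.Error(w, "invalid image data", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "image/png")
	// The headers are already sent at this point, so a write error can only be logged.
	if _, err := w.Write(data); err != nil {
		log.Printf("writing image: %v", err)
	}
}
However, if you want to display it in the browser without decoding it on the server, you will have to do some client-side hacking.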
It depends on how you interpret it on the other side.
A base64 string, a base32 string, and a byte array are all just different representations of the same bytes. Whether they display as an image depends on how the receiver decodes them.
EDIT:
I see what you mean now. Have a look at the image/jpeg package.
I don't have a code snippet to share right now, but with it you should be able to load your image into a buffer and decode it to get an image.Image object. You can then encode that image.Image into your response body. Let me know if it works for you.
Remember to set the appropriate header on your writer:
w.Header().Set("Content-Type", "image/jpeg")

Basic web tweaks that all applications should have

Currently my web app is just a router and handlers.
What are some important things I am missing to make this production worthy?
I believe I have to set the # of procs to ensure this uses maximum goroutines?
Should I be using output buffering?
Is there anything else missing that is considered best practice?
var (
	templates = template.Must(template.ParseFiles("templates/home.html"))
)
func main() {
	r := mux.NewRouter()
	r.HandleFunc("/", WelcomeHandler)
	http.ListenAndServe(":9000", r)
}
func WelcomeHandler(w http.ResponseWriter, r *http.Request) {
	homePage, err := api.LoadHomePage()
	if err != nil {
	}
	tmpl := "home"
	renderTemplate(w, tmpl, homePage)
}
func renderTemplate(w http.ResponseWriter, tmpl string, hp *HomePage) {
	err := templates.ExecuteTemplate(w, tmpl+".html", hp)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}
You don't need to set or change runtime.GOMAXPROCS(), because since Go 1.5 it defaults to the number of available CPU cores.
Buffering output? From a performance point of view, you don't need to. But there may be other reasons to do it.
For example, your renderTemplate() function has a subtle problem. Once template execution starts writing to the output, the HTTP status code and headers are sent before the data. If a template execution error occurs after that, ExecuteTemplate returns an error and your code attempts to send back an error response. At that point the headers have already been written, so http.Error() cannot change the status code: net/http ignores the second WriteHeader call (logging it as superfluous), and the client receives a partially rendered page with a 200 status, followed by the error text.
One way to avoid this is to first render the template into a buffer (e.g. bytes.Buffer) and, if template execution returns no error, write the content of the buffer to the response writer. If an error occurs, you of course don't write the content of the buffer, but send back an error response just like you did.
To sum it up, your code is production ready performance-wise (excluding the way you handle template execution errors).
WelcomeHandler should return when err != nil is true.
Log the error when one is hit to help investigation.
Place templates = template.Must(template.ParseFiles("templates/home.html")) in init and split it into separate lines. If template.ParseFiles returns an error, log it as fatal. If you have multiple templates to initialize, initialize them in goroutines with a common WaitGroup to speed up startup.
Since you are using mux, "HTTP Server is too clean with its URLs" might also be good to know.
You might also want to reconsider letting users know why they got the http.StatusInternalServerError response.
Setting GOMAXPROCS > 1 if you have more than one core would definitely be a good idea, but I would keep it lower than the number of cores available.

Scandinavian characters not working in go-lang go-instagram API bindings

Hi, I'm trying to wrap my head around what seems to be a problem with multibyte support in this open source library (https://github.com/carbocation/go-instagram/). I am using the code below to retrieve information about the tag for "blue" in Swedish. However, I get an empty array when I try.
fmt.Println("Starting instagram download.")
client := instagram.NewClient(nil)
client.ClientID = "myid"
media, _, _ := client.Tags.RecentMedia("blå", nil)
fmt.Println(media)
I have tried using the API through the browser, and there are several pictures tagged with that tag. I have also tried the code snippet with English tags like blue, and that returns the latest pictures as expected. I would be glad if anyone could explain why this happens. I'd like to update the library so that it supports multibyte characters, but I don't have the Go knowledge required. Is this a Go problem or a problem with the library?
Thank you
The problem is in validTagName():
// Strip out things we know Instagram won't accept. For example, hyphens.
func validTagName(tagName string) (bool, error) {
	//\W matches any non-word character
	reg, err := regexp.Compile(`\W`)
	if err != nil {
		return false, err
	}
	if reg.MatchString(tagName) {
		return false, nil
	}
	return true, nil
}
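In Go, \W matches precisely [^0-9A-Za-z_], so it is ASCII-only: a non-ASCII letter such as å counts as a non-word character and the tag is rejected before any request is made. The validation check is therefore incorrect for such tags.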
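The behaviour is easy to reproduce in isolation. The sketch below also shows one possible Unicode-aware pattern built from the \p{L} and \p{N} classes. Treat validTagNameUnicode as a hypothetical replacement, not the library's actual fix: which characters Instagram really accepts in tags is an assumption here.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiNonWord is the pattern the library uses; in Go, \W is ASCII-only.
var asciiNonWord = regexp.MustCompile(`\W`)

// unicodeNonWord matches anything that is not a Unicode letter, a Unicode
// digit, or an underscore. It is an assumed replacement for illustration.
var unicodeNonWord = regexp.MustCompile(`[^\p{L}\p{N}_]`)

// validTagNameUnicode is a hypothetical Unicode-aware variant of validTagName.
func validTagNameUnicode(tagName string) bool {
	return !unicodeNonWord.MatchString(tagName)
}

func main() {
	// The original check sees "å" as a non-word character and rejects the tag.
	fmt.Println(asciiNonWord.MatchString("blå")) // true

	// The Unicode-aware check accepts it but still rejects hyphens.
	fmt.Println(validTagNameUnicode("blå"))        // true
	fmt.Println(validTagNameUnicode("light-blue")) // false
}
```

One caveat: a tag written with decomposed characters (a base letter followed by a combining mark) would still be rejected by this pattern, since combining marks are not in \p{L}.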

How do I extract a string from an interface{} variable in Go?

I'm new to the Go language.
I'm making a small web application with Go, the Gorilla toolkit, and the Mustache template engine.
Everything works great so far.
I use hoisie/mustache and gorilla/sessions, but I'm struggling with passing variables from one to the other. I have a map[string]interface{} that I pass to the template engine. When a user is logged in, I want to take the user's session data and merge it with my map[string]interface{} so that the data becomes available for rendering.
The problem is that gorilla/sessions returns a map[interface{}]interface{} so the merge cannot be done (with the skills I have in this language).
I thought about extracting the string inside the interface{} variable (reflection?).
I also thought about making my session data a map[interface{}]interface{}, just like what gorilla/sessions provides. But I'm new to Go and I don't know if that can be considered best practice. As a Java guy, it feels like working with variables of type Object.
I would like to know the best approach for this problem in your opinion.
Thanks in advance.
You'll need to perform a type assertion; see the section on interface conversions and type assertions in Effective Go.
str, ok := value.(string)
if ok {
	fmt.Printf("string value is: %q\n", str)
} else {
	fmt.Printf("value is not a string\n")
}
A more precise example given what you're trying to do:
// Declare userID outside the if so it is still in scope when rendering.
userID, ok := session.Values["userID"].(string)
if !ok {
	// User ID is not set/wrong type; raise an error/HTTP 500/re-direct
	return
}
// User ID is set
type M map[string]interface{}
err := t.ExecuteTemplate(w, "user_form.tmpl", M{"current_user": userID})
if err != nil {
	// handle it
}
What you're doing is ensuring that the userID you pull out of the interface{} container is actually a string. If it's not, you handle it (if you use the single-value form of the assertion instead and the type is wrong, your program will panic, as per the docs).
If it is, you pass it to your template where you can access it as {{ .current_user }}. M is a quick shortcut that I use to avoid having to type out map[string]interface{} every time I call my template rendering function.
