Right now I'm fetching URLs from Indiegogo as part of a side project, using the basic GET request template found [here][1]. I then translate the byte data into a string using
responseText, err := ioutil.ReadAll(response.Body)
trueText := string(responseText)
with appropriate error handling where needed
It works fine for repeated attempts at getting that URL, and for some other URLs of varying length (some at least as large as the previous URL, some longer).
Strangely, when I attempt to get one particular URL, it breaks and throws a runtime error of
panic: runtime error: index out of range
and exits with a status of 2. I'm curious as to what the issue could be.
I know it isn't Indiegogo getting angry about my once-a-minute requests and cutting my connection, because I can request continuously for 20 minutes with no issue. Give it a bit of downtime and it still completely breaks on that same URL.
Thanks for the assistance
EDIT: it appears it was a malformed bit of HTML in some of the pages that messed with a loop I was running over the content, and that managed to break Go at runtime on only some URLs. Thanks for the help.
[1]:
There is no error when getting from the url and converting the body to the Go string type. For example,
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	url := "http://www.indiegogo.com/projects/culcharge-smallest-usb-charge-and-data-cable-for-iphone-and-android"
	res, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	body, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	text := string(body)
	fmt.Println(len(body), len(text))
}
Output:
66363 66363
You didn't provide us with a small fragment of code which compiles, runs, and fails in the manner you describe. That leaves us all guessing.
I'd like to profile my Go HTTP server application to see if there are places where it can be optimized. I'm using the fasthttp package with the fasthttp/router package, and I'm struggling to figure out how to hook up pprof.
The basic setup looks like this, obviously very abridged:
func main() {
	flag.Parse()
	r := router.New()
	r.GET("/", index)
	r.GET("/myroute", myrouteFn)
	h := r.Handler
	if err := limitedListenAndServe(*addr, fasthttplogger.Tiny(h)); err != nil {
		log.Fatalf("Error in ListenAndServe: %s", err)
	}
}
First, I tried following a fairly straightforward guide like this one, and added this line in my main() function (per the guide) in addition to the corresponding import. That changed the above to this:
func main() {
	flag.Parse()
	r := router.New()
	r.GET("/", index)
	r.GET("/myroute", myrouteFn)
	h := r.Handler
	if err := limitedListenAndServe(*addr, fasthttplogger.Tiny(h)); err != nil {
		log.Fatalf("Error in ListenAndServe: %s", err)
	}
	defer profile.Start().Stop()
}
After doing that, I ran my program, made a bunch of requests that I was interested in profiling, and then terminated the server. That created a cpu.pprof file, but the file was empty (zero bytes) when I went to run it through the go tool pprof command to generate the graph.
After a bit more sleuthing I found this Gist that I suspect would work if I were using totally vanilla fasthttp.
Trying to combine that with my application, I'm still stuck. Conceptually, I think the solution is to use fasthttpadaptor to convert the net/http/pprof handlers to a fasthttp route. But the Gist I was looking at uses its own router to do the muxing, and I'd rather not rewrite all the routes in my server using a different router.
How would I go about making the profiling data available here?
You can use net/http/pprof for profiling.
fasthttp provides a custom implementation of the same handlers, which you can use just like net/http/pprof.
To use this, register the handler as:
import "github.com/valyala/fasthttp/pprofhandler"
...
r.GET("/debug/pprof/{profile:*}", pprofhandler.PprofHandler)
Then you can use this to profile.
go tool pprof http://host:port/debug/pprof/profile
Or you can also visit http://host:port/debug/pprof/ to see more types of profiles available.
I have a very strange problem with a simple HTTP Get Request in Golang.
Every request in Golang to https://www.alltron.ch/json/searchSuggestion?searchTerm=notebook takes about 6-8 seconds (!)
If the same request is fired in Chrome, with Postman, or with PowerShell, it takes less than a second.
Does somebody have a clue why this happens?
My Code:
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	client := &http.Client{}
	req, _ := http.NewRequest("GET", "https://www.alltron.ch/json/searchSuggestion?searchTerm=notebook", nil)
	response, err := client.Do(req)
	if err != nil && response == nil {
		log.Fatalf("Error on request. %v", err)
	}
	defer response.Body.Close()
	body, err := ioutil.ReadAll(response.Body)
	if err != nil {
		log.Fatalf("Couldn't get response body. %v", err)
	}
	fmt.Print(string(body))
}
The site you are trying to access is behind the Akamai CDN:
$ dig www.alltron.ch
...
www.alltron.ch. 152 IN CNAME competec.botmanager.edgekey.net.
competec.botmanager.edgekey.net. 7052 IN CNAME e9179.f.akamaiedge.net.
e9179.f.akamaiedge.net. 162 IN A 2.20.176.40
Akamai offers its customers a detection of web clients which are not browsers so that the customers can keep bots away or slowing bots down.
As can be seen from Strange CURL issue with a particular website SSL certificate and Scraping attempts getting 403 error, this kind of detection mainly cares about having an Accept-Language header, a Connection header with the value Keep-Alive, and a User-Agent which matches Mozilla/....
This means the following code changes result in an immediate response:
req, _ := http.NewRequest("GET", "https://www.alltron.ch/json/searchSuggestion?searchTerm=notebook", nil)
req.Header.Set("Connection", "Keep-Alive")
req.Header.Set("Accept-Language", "en-US")
req.Header.Set("User-Agent", "Mozilla/5.0")
Still, the site obviously does not like bots; you should respect that and not stress the site too much (for example with heavy scraping). Also, the bot detection done by Akamai might change without notice, i.e. even if this code fixes the problem now, it might no longer work in the future. Such changes are especially likely if many clients bypass the bot detection.
Try disabling the cache in Chrome and compare the timing with Go.
I am trying to figure out how to get a simple bq load command to work with https://godoc.org/cloud.google.com/go/bigquery#Table.LoaderFrom
Running it manually it looks like this:
bq load --source_format=AVRO --ignore_unknown_values --replace=true mydataset.mytable gs://mybucket/table/*
And running it successfully in my Go code with exec.Command() looks like this:
exec.Command("bq", "load", "--source_format=AVRO", "--ignore_unknown_values",
	"--replace=true", "mydataset.mytable",
	"gs://mybucket/table/*")
However, I cannot get the equivalent program below to run without a segmentation fault; it seems to hit a segmentation violation at the job.Wait line:
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project-id")
	if err != nil {
		// TODO: Handle error.
	}
	gcsRef := bigquery.NewGCSReference("gs://mybucket/table/*")
	gcsRef.SourceFormat = "AVRO"
	gcsRef.IgnoreUnknownValues = true
	// TODO: set other options on the GCSReference.
	ds := client.Dataset("mydataset")
	loader := ds.Table("mytable").LoaderFrom(gcsRef)
	// TODO: set other options on the Loader.
	job, err := loader.Run(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	status, err := job.Wait(ctx) // seg faults right here
	if err != nil {
		// TODO: Handle error.
	}
	if status.Err() != nil {
		// TODO: Handle error.
	}
}
The panic is probably coming from a nil pointer dereference of the job variable.
I would suggest including a log.Fatal(err) in all of your err != nil blocks.
This will help get you closer to why job is not being assigned correctly.
When you're writing one-off scripts like this one in Go, log.Fatal is a great way to exit the program and print exactly what the issue is.
With Go you're always trying to bubble errors up the stack to determine whether the code should continue to execute, whether things can be recovered, or whether it's a fatal problem and you should end the program.
For more info on the logging package, check out https://golang.org/pkg/log/
If you're just starting out learning Go, here are some awesome resources that can help give you ideas on how different types of programs can be designed:
https://github.com/dashpradeep99/https-github.com-miguellgt-books/tree/master/go
Best,
Christopher
I'm studying Go and am a real newbie in this field.
I am facing a problem when I try to copy some value.
What I am doing is:
I want to get some response into response using an HTTP request.
httpClient := &http.Client{}
response, err := httpClient.Do(req)
if err != nil {
	panic(err)
}
After that, I want to save the value stored in response to 'origin.txt':
origin_, _ := ioutil.ReadAll(response.Body)
f_, err := os.Create("origin.txt")
f_.Write(origin_)
And I want to get a specific value by using the goquery package.
doc, err := goquery.NewDocumentFromReader(response.Body)
if err != nil {
	log.Fatal(err)
}
doc.Find(".className").Each(func(i int, s *goquery.Selection) {
	w.WriteString("============" + strconv.Itoa(i) + "============")
	s.Find("tr").Each(func(i int, s_ *goquery.Selection) {
		fmt.Println(s_.Text())
		w.WriteString(s_.Text())
	})
})
But in this case, I get exactly the value I want from 2), but cannot get anything from 3).
At first I thought the problem was that the response object in 3) is affected by the action in 2), because it is a reference object.
So I tried to copy it to another object and then do it again.
origin := *response
but I got the same result as before.
What should I do?
How can I assign a reference value to another one by its value?
Should I request it twice for each attempt?
I actually don't see where you use shared resources between 2 and 3.
However, that being said, origin := *response won't buy you much. The data (response.Body) is an io.ReadCloser. ioutil.ReadAll() will consume and store all the data that the stream has; you only get to do this once.
However, you have the data stored in origin. If you need another io.Reader for that data (say for case 3), you can make that byte slice look like an io.Reader again: bytes.NewReader(origin).
I'm trying to use ServeContent to serve files (which may be large movie files, so it will use byte ranges), but I'm not sure how to handle the modified time. If I use the following program to serve a movie, it fails when I give the actual modified time of the file as shown. I think what happens is that the first request works, but subsequent requests (for different byte ranges of the file) think the client already has the file, therefore they fail and the movie doesn't play. Is there something I am doing wrong?
Note that the code works (and the movie plays properly) if I use time.Now() instead of the actual modified time of the file, but that isn't correct of course.
package main

import (
	"fmt"
	"net/http"
	"os"
	"path"
	"time"
)

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":3000", nil)
}

func handler(w http.ResponseWriter, r *http.Request) {
	filePath := "." + r.URL.Path
	file, err := os.Open(filePath)
	if err != nil {
		fmt.Printf("%s not found\n", filePath)
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, "<html><body style='font-size:100px'>four-oh-four</body></html>")
		return
	}
	defer file.Close()
	fileStat, err := os.Stat(filePath)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Printf("serve %s\n", filePath)
	_, filename := path.Split(filePath)
	t := fileStat.ModTime()
	fmt.Printf("time %+v\n", t)
	http.ServeContent(w, r, filename, t, file)
}
According to the documentation,
If modtime is not the zero time, ServeContent includes it in a Last-Modified header in the response. If the request includes an If-Modified-Since header, ServeContent uses modtime to decide whether the content needs to be sent at all.
So, depending on whether the client sends the If-Modified-Since header, this function will behave correctly or not. This seems to be the intended behaviour, and is indeed useful in normal situations to optimize the server's bandwidth.
In your case, however, as you have to handle partial-content requests, unless the first request returns a 30X HTTP code, you have no reason to handle this mechanism for subsequent requests.
The correct way to disable this behaviour is to pass a "zero" date to ServeContent:
http.ServeContent(w, r, filename, time.Time{}, file)
You could try to parse the request's Range header in order to pass a zero date only when necessary.