Parsing prometheus metrics from file and updating counters - go

I have a Go application that gets run periodically by a batch. Each run, it should read some Prometheus metrics from a file, run its logic, update a success/fail counter, and write the metrics back out to a file.
From looking at How to parse Prometheus data as well as the godocs for prometheus, I'm able to read in the file, but I don't know how to update app_processed_total with the value returned by expfmt.ExtractSamples().
This is what I've done so far. Could someone please tell me how I should proceed from here? How can I convert the Vector I got into a CounterVec?
package main
import (
"fmt"
"net/http"
"strings"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
dto "github.com/prometheus/client_model/go"
"github.com/prometheus/common/expfmt"
"github.com/prometheus/common/model"
)
var (
fileOnDisk = prometheus.NewRegistry()
processedTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
Name: "app_processed_total",
Help: "Number of times ran",
}, []string{"status"})
)
func doInit() {
prometheus.MustRegister(processedTotal)
}
func recordMetrics() {
go func() {
for {
processedTotal.With(prometheus.Labels{"status": "ok"}).Inc()
time.Sleep(5 * time.Second)
}
}()
}
func readExistingMetrics() {
var parser expfmt.TextParser
text := `
# HELP app_processed_total Number of times ran
# TYPE app_processed_total counter
app_processed_total{status="ok"} 300
`
parseText := func() ([]*dto.MetricFamily, error) {
parsed, err := parser.TextToMetricFamilies(strings.NewReader(text))
if err != nil {
return nil, err
}
var result []*dto.MetricFamily
for _, mf := range parsed {
result = append(result, mf)
}
return result, nil
}
gatherers := prometheus.Gatherers{
fileOnDisk,
prometheus.GathererFunc(parseText),
}
gathering, err := gatherers.Gather()
if err != nil {
fmt.Println(err)
}
fmt.Println("gathering: ", gathering)
for _, g := range gathering {
vector, err := expfmt.ExtractSamples(&expfmt.DecodeOptions{
Timestamp: model.Now(),
}, g)
fmt.Println("vector: ", vector)
if err != nil {
fmt.Println(err)
}
// How can I update processedTotal with this new value?
}
}
func main() {
doInit()
readExistingMetrics()
recordMetrics()
http.Handle("/metrics", promhttp.Handler())
http.ListenAndServe("localhost:2112", nil)
}

I believe you would need to use processedTotal.WithLabelValues("ok").Inc() or something similar to that.
A more complete example:
func ExampleCounterVec() {
httpReqs := prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "How many HTTP requests processed, partitioned by status code and HTTP method.",
},
[]string{"code", "method"},
)
prometheus.MustRegister(httpReqs)
httpReqs.WithLabelValues("404", "POST").Add(42)
// If you have to access the same set of labels very frequently, it
// might be good to retrieve the metric only once and keep a handle to
// it. But beware of deletion of that metric, see below!
m := httpReqs.WithLabelValues("200", "GET")
for i := 0; i < 1000000; i++ {
m.Inc()
}
// Delete a metric from the vector. If you have previously kept a handle
// to that metric (as above), future updates via that handle will go
// unseen (even if you re-create a metric with the same label set
// later).
httpReqs.DeleteLabelValues("200", "GET")
// Same thing with the more verbose Labels syntax.
httpReqs.Delete(prometheus.Labels{"method": "GET", "code": "200"})
}
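This is taken from the Prometheus examples on GitHub.
To use the value of vector you can do the following:
// model.SampleValue is declared as float64, so a plain conversion is enough;
// there is no need to go through a string and strconv.ParseFloat.
if len(vector) > 0 {
processedTotal.WithLabelValues("ok").Add(float64(vector[0].Value))
}
This assumes you will only ever get a single sample back. ExtractSamples returns a model.Vector (a slice of *model.Sample), and each sample's Value is a model.SampleValue, which is a float64 type, so it can be converted directly. If the file can hold several status values, loop over the samples and use string(sample.Metric["status"]) as the label value instead of the hard-coded "ok".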

Related

Is there a better way where I can check if a template property was not resolved?

I am trying to build a string using text/template, where the template string could have arbitrary properties that are resolved via a map.
What I am trying to accomplish is identifying where one/any of the template properties is not resolved and return an error.
At the moment I am using a regexp, but I'm reaching out to the community to see if there is a better solution.
package main
import (
"bytes"
"fmt"
"regexp"
"text/template"
)
func main() {
data := "teststring/{{.someData}}/{{.notExist}}/{{.another}}"
// the issue here is that data can be arbitrary so i cannot do
// a lot of unknown if statements
t := template.Must(template.New("").Parse(data))
var b bytes.Buffer
fillers := map[string]interface{}{
"someData": "123",
"another": true,
// in this case, notExist is not defined, so the template will
// not resolve it
}
if err := t.Execute(&b, fillers); err != nil {
panic(err)
}
fmt.Println(b.String())
// teststring/123/<no value>/true
// here I am trying to catch whether the template required a value that was not provided
hasResolved := regexp.MustCompile(`<no value>`)
fmt.Println(hasResolved.MatchString(b.String()))
// add notExist to the fillers map
fillers["notExist"] = "testdata"
b.Reset()
if err := t.Execute(&b, fillers); err != nil {
panic(err)
}
fmt.Println(b.String())
fmt.Println(hasResolved.MatchString(b.String()))
// Output:
// teststring/123/<no value>/true
// true
// teststring/123/testdata/true
// false
}
You can make it fail by setting an option on the template:
func (t *Template) Option(opt ...string) *Template
"missingkey=default" or "missingkey=invalid"
The default behavior: Do nothing and continue execution.
If printed, the result of the index operation is the string
"<no value>".
"missingkey=zero"
The operation returns the zero value for the map type's element.
"missingkey=error"
Execution stops immediately with an error.
If you set it to missingkey=error, you get what you want. Note that the method is Option (singular), and it must be called before Execute:
t = t.Option("missingkey=error")
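A minimal, self-contained sketch of the missingkey=error approach, reusing the template string and map from the question. The helper name render is made up for illustration; Option, Parse and Execute are the standard text/template methods.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render parses tmpl with missingkey=error, so executing it against a map
// that lacks one of the referenced keys returns an error instead of
// printing "<no value>".
func render(tmpl string, data map[string]interface{}) (string, error) {
	t, err := template.New("").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b bytes.Buffer
	if err := t.Execute(&b, data); err != nil {
		// b may already hold partial output; discard it.
		return "", err
	}
	return b.String(), nil
}

func main() {
	tmpl := "teststring/{{.someData}}/{{.notExist}}/{{.another}}"
	fillers := map[string]interface{}{
		"someData": "123",
		"another":  true,
	}

	// notExist is missing, so Execute fails.
	if _, err := render(tmpl, fillers); err != nil {
		fmt.Println("error:", err)
	}

	// With every key present, rendering succeeds.
	fillers["notExist"] = "testdata"
	out, err := render(tmpl, fillers)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // teststring/123/testdata/true
}
```

Keep in mind that missingkey only affects map lookups; referencing a field that does not exist on a struct already makes Execute return an error.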

Problems with map, goroutine and mutex

This project receives POST requests, each of which counts as a view, and later writes the totals to a database. The idea is to reduce the interaction with the database of another project that is in production. I decided to write it in Go, but I'm new to the language and struggling to understand it. I'm trying to make sure that no views are lost and that none are counted twice.
The project basically consists of a controller, a service and two models, just enough to meet the need for which it was created. In my controller I have the function that will be responsible for receiving the POST.
controllers/views.go:
func StoreViews(c *fiber.Ctx) error {
var songview models.SongView
err := c.BodyParser(&songview)
if err != nil {
return c.Status(403).JSON(fiber.Map{
"errors": fiber.Map{"request": err.Error()},
})
}
songview.Date = time.Now()
errs := utils.ValidateStruct(songview)
if len(errs) > 0 {
return c.Status(403).JSON(map[string]interface{}{"errors": errs})
}
go services.StoreViews(songview)
return c.SendStatus(fiber.StatusOK)
}
To handle the received data I made these three functions in my service:
services/views.go
var (
StoreViewsMap = make(map[string]*models.SongView)
StoreControl sync.RWMutex
)
func StoreViews(sview models.SongView) bool {
nameKey := strconv.Itoa(int(sview.SongId)) + sview.Lang + sview.Date.Format("2006-01-02")
songview := getSongView(nameKey)
initSongView(nameKey, songview, sview)
return true
}
func getSongView(name string) *models.SongView {
StoreControl.RLock()
defer StoreControl.RUnlock()
return StoreViewsMap[name]
}
func initSongView(name string, songview *models.SongView, sview models.SongView) bool {
StoreControl.Lock()
defer StoreControl.Unlock()
if songview == nil {
insert := models.SongView{
SongId: sview.SongId,
Lang: sview.Lang,
Date: sview.Date,
Views: 0,
}
songViewNew := &insert // see if & is needed
StoreViewsMap[name] = songViewNew
} else {
songview.Views = songview.Views + 1
}
return true
}
I tried to use an RWMutex so that everything runs without overlapping, but it's not working as it should: sometimes views disappear, other times getSongView retrieves the wrong "songview", among several other problems I found while modifying and reviewing my code. The current code is not the version that came closest to the expected result, but I didn't save that version, so I'm bringing the current code to show what I'm facing.
I would like help understanding how to deal with several concurrent requests competing for the same data and how to interact with the data in the best possible way; if I'm misusing a pointer, I'd like to understand that too. To simulate a POST "attack" on my code I'm using this code in a separate main.go I wrote for this test.
var limit int = 10
func main() {
channel := make(chan string)
for i := 0; i < limit; i++ {
go func(i int) {
post("http://localhost:3000/views/store", "lang=pt&song_id=296", i)
channel <- "ok"
}(i)
go func(i int) {
post("http://localhost:3000/views/store", "lang=en&song_id=3016", i)
channel <- "ok"
}(i)
go func(i int) {
post("http://localhost:3000/views/store", "lang=pt&song_id=3016", i)
channel <- "ok"
}(i)
}
for i := 0; i < limit*3; i++ {
<-channel
}
}
func post(url string, json string, index int) {
payload := strings.NewReader(json)
client := &http.Client{}
req, err := http.NewRequest("POST", url, payload)
if err != nil {
fmt.Println(err)
return
}
req.Header.Add("Content-Type", "application/x-www-form-urlencoded")
res, err := client.Do(req)
if err != nil {
fmt.Println(err)
return
}
defer res.Body.Close()
_, err = ioutil.ReadAll(res.Body)
if err != nil {
fmt.Println(err)
return
}
if res.StatusCode != 200 {
fmt.Println(res.StatusCode)
}
}
My song-view model is this (I'm just using it to organize the data; although the project is connected to the database of the project in production, it is read-only):
type SongView struct {
Id int64 `json:"id"`
SongId int64 `json:"song_id" form:"song_id" gorm:"notNull" validate:"required,number"`
ArtistId int64 `json:"artist_id"`
Lang string `json:"lang" validate:"required,oneof=pt en es de fr"`
Date time.Time `json:"date" gorm:"column:created_at" validate:"required"`
Views int64 `json:"views"`
}
I believe that this code can be written a little more easily in Go, but that is not the question. From your description, it appears that the data is lost somewhere. Have you tried the Go data race detector tool? Below is a link
https://go.dev/doc/articles/race_detector
Can you provide examples of input data where errors/missing items appear?
It happens because your code has a race condition between the read and the write of the map.
Example:
G1 - goroutine 1
G2 - goroutine 2
G1: RLock and read the songview named "MySong".
G1: MySong doesn't exist yet, so nil is returned.
G1: RUnlock.
G2: RLock and read the songview named "MySong".
G2: MySong still doesn't exist, so nil is returned.
G1: Lock.
G1: songview == nil, so create a new one with the counter set to 1 and insert it into the map under "MySong".
G1: Unlock.
G2: Lock. songview == nil (because G2 read it before G1 inserted it), so create a new SongView with the counter set to 1 and overwrite the map entry.
G2: Unlock.
As a result you have one "MySong" with a counter of 1 instead of 2, because G2 overwrote the value G1 had just written.
The point of locking is atomicity, so the whole read-check-write sequence must happen under a single lock:
// StoreViews now calls initSongView(nameKey, sview) directly; getSongView is no longer needed.
func initSongView(name string, sview models.SongView) bool {
StoreControl.Lock()
defer StoreControl.Unlock()
songview := StoreViewsMap[name]
if songview == nil {
insert := models.SongView{
SongId: sview.SongId,
Lang: sview.Lang,
Date: sview.Date,
Views: 1, // counter should be 1 because it's a first view
}
StoreViewsMap[name] = &insert
} else {
songview.Views = songview.Views + 1
}
return true
}

Mock/test basic http.get request

I am learning to write unit tests and I was wondering what the correct way to unit test a basic http.Get request is.
I found an API online that returns fake data and wrote a basic program that gets some user data and prints out an ID:
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"log"
"net/http"
)
type UserData struct {
Meta interface{} `json:"meta"`
Data struct {
ID int `json:"id"`
Name string `json:"name"`
Email string `json:"email"`
Gender string `json:"gender"`
Status string `json:"status"`
} `json:"data"`
}
func main() {
resp := sendRequest()
body := readBody(resp)
id := unmarshallData(body)
fmt.Println(id)
}
func sendRequest() *http.Response {
resp, err := http.Get("https://gorest.co.in/public/v1/users/1841")
if err != nil {
log.Fatalln(err)
}
return resp
}
func readBody(resp *http.Response) []byte {
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
log.Fatalln(err)
}
return body
}
func unmarshallData(body []byte) int {
var userData UserData
json.Unmarshal(body, &userData)
return userData.Data.ID
}
This works and prints out 1841. I then wanted to write some tests that validate that the code is behaving as expected, e.g. that it correctly fails if an error is returned, and that the data returned can be unmarshalled. I have been reading online and looking at examples, but they are all far more complex than what I feel I am trying to achieve.
I have started with the following test that ensures that the data passed to the unmarshallData function can be unmarshalled:
package main
import (
"testing"
)
func Test_unmarshallData(t *testing.T) {
type args struct {
body []byte
}
tests := []struct {
name string
args args
want int
}{
{name: "Unmarshall", args: struct{ body []byte }{body: []byte("{\"meta\":null,\"data\":{\"id\":1841,\"name\":\"Piya\",\"email\":\"priya#gmai.com\",\"gender\":\"female\",\"status\":\"active\"}}")}, want: 1841},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := unmarshallData(tt.args.body); got != tt.want {
t.Errorf("unmarshallData() = %v, want %v", got, tt.want)
}
})
}
}
Any advice on where to go from here would be appreciated.
Before moving on to testing: your code has a serious flaw, which will become a problem if you don't take care of it in your future programming tasks.
https://pkg.go.dev/net/http See the second example:
The client must close the response body when finished with it
Let's fix that now (we will come back to this subject later). There are two possibilities.
1/ Within main, use defer to close that resource once you are done with it:
func main() {
resp := sendRequest()
defer resp.Body.Close()
body := readBody(resp)
id := unmarshallData(body)
fmt.Println(id)
}
2/ Do that within readBody
func readBody(resp *http.Response) []byte {
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
log.Fatalln(err)
}
return body
}
Using defer is the expected way to close the resource. It helps the reader identify the lifetime of the resource and improves readability.
Note: I will not use the table-driven test pattern much here, but you should, as you did in your original post.
Moving on to the testing part.
Tests can be written in the same package or in its companion package with a trailing _test, such as [package target]_test. In both cases, files whose names end in _test.go are left out of the regular build, so test code never ends up in your binary. The difference is visibility:
Using a separate package, you test the API in a black-box manner: you can only access the identifiers it exports.
Your current tests are white-box tests, meaning they can access any declaration of package main, exported or not.
About sendRequest: writing a test around it is not very interesting, because it does too little, and your tests should not be written to test the standard library.
But for the sake of demonstration, and because there are good reasons not to rely on external resources when running tests, let's do it anyway.
To achieve that, the global dependency consumed inside it, http.Get, must become an injected dependency, so that a test can later replace the one thing the function depends on.
func sendRequest(client interface{ Get(url string) (*http.Response, error) }) *http.Response {
resp, err := client.Get("https://gorest.co.in/public/v1/users/1841")
if err != nil {
log.Fatalln(err)
}
return resp
}
Here I use an inlined interface declaration, interface{ Get(url string) (*http.Response, error) }. *http.Client has a Get method with exactly that signature, so main keeps working by calling sendRequest(http.DefaultClient).
Now we can add a new test that injects a fake returning exactly the values that trigger the behavior we want to test.
type fakeGetter struct {
resp *http.Response
err error
}
func (f fakeGetter) Get(u string) (*http.Response, error) {
return f.resp, f.err
}
func TestSendRequestReturnsNilResponseOnError(t *testing.T) {
c := fakeGetter{
err: fmt.Errorf("whatever error will do"),
}
resp := sendRequest(c)
if resp != nil {
t.Fatal("it should return a nil response when an error arises")
}
}
Now run this test and look at the result. It is not conclusive, because your function calls log.Fatalln, which in turn calls os.Exit; we cannot test that.
To change that, we could call panic instead, because a panic can be recovered.
I don't recommend doing that; in my opinion it is smelly, but it exists, so we can consider it. It is also the smallest possible change to the function signature: returning an error would break the current signatures even more, and I want to minimize that for this demonstration. But as a rule of thumb, return errors and always check them.
In the sendRequest function, replace the call log.Fatalln(err) with panic(err) and update the test to capture the panic.
func TestSendRequestPanicsOnError(t *testing.T) {
defer func() {
// recover returns the value given to panic, or nil if sendRequest returned
// normally. The deferred function runs in both cases, so the check lives here.
if r := recover(); r == nil {
t.Fatal("it should have panicked")
}
}()
c := fakeGetter{
err: fmt.Errorf("whatever error will do"),
}
_ = sendRequest(c)
}
When sendRequest panics, no statement after the call runs, so a check placed after it (on resp or on a hasPanicked flag) would never execute; the assertion has to be made in the deferred function.
We can now move on to the other execution path, the non error return.
For that, we forge the *http.Response instance we want our fake to return, then check the result to figure out whether what the function does is in line with what we expect.
Let's say we want to ensure it is returned unmodified.
The test below only sets two fields, mostly to demonstrate how to set the Body with ioutil.NopCloser and strings.NewReader, as you will often need that when writing Go tests.
I also use reflect.DeepEqual as a brute-force equality checker. Usually you can be more fine-grained and get better tests; DeepEqual does the job here, but it introduces complexity that does not justify using it systematically.
func TestSendRequestReturnsUnmodifiedResponse(t *testing.T) {
c := fakeGetter{
err: nil,
resp: &http.Response{
StatusCode: http.StatusOK, // Status is the string form ("200 OK"); StatusCode holds the int
Body: ioutil.NopCloser(strings.NewReader("some text")),
},
}
resp := sendRequest(c)
if !reflect.DeepEqual(resp, c.resp) {
t.Fatal("the response should not have been modified")
}
}
By now you may have figured out that this small function sendRequest is not good; if you have not, I assure you it is not. It does too little: it merely wraps the http.Get method, and testing it contributes little to the survival of the business logic.
Moving on to the readBody function.
All remarks that applied to sendRequest apply here too:
it does too little
it calls os.Exit (through log.Fatalln)
One thing does not apply: since the call to ioutil.ReadAll does not rely on external resources, there is no point in trying to inject that dependency. We can test around it.
However, for the sake of demonstration, it is time to talk about the missing call to defer resp.Body.Close().
Let us assume we go with the second option from the introduction (closing inside readBody) and test for that.
The http.Response struct conveniently exposes its Body field as an interface (io.ReadCloser).
To ensure the code calls Close, we can write a stub for it.
That stub records whether the call was made; the test can then check for it and report an error if it was not.
type closeCallRecorder struct {
hasClosed bool
}
func (c *closeCallRecorder) Close() error {
c.hasClosed = true
return nil
}
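// Returning io.EOF tells ioutil.ReadAll the body is empty; returning (0, nil)
// forever would make ReadAll loop indefinitely. (The test file must import io.)
func (c *closeCallRecorder) Read(p []byte) (int, error) {
return 0, io.EOF
}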
func TestReadBodyCallsClose(t *testing.T) {
body := &closeCallRecorder{}
res := &http.Response{
Body: body,
}
_ = readBody(res)
if !body.hasClosed {
t.Fatal("the response body was not closed")
}
}
Similarly, and for the sake of demonstration, we might want to test whether the function called Read.
type readCallRecorder struct {
hasRead bool
}
func (c *readCallRecorder) Read(p []byte) (int, error) {
c.hasRead = true
return 0, io.EOF // io.EOF, not nil, so ReadAll stops after the first call
}
func TestReadBodyHasReadAnything(t *testing.T) {
body := &readCallRecorder{}
res := &http.Response{
Body: ioutil.NopCloser(body),
}
_ = readBody(res)
if !body.hasRead {
t.Fatal("the response body was not read")
}
}
We can also verify that the body was not modified in between:
func TestReadBodyDidNotModifyTheResponse(t *testing.T) {
want := "this"
res := &http.Response{
Body: ioutil.NopCloser(strings.NewReader(want)),
}
resp := readBody(res)
if got := string(resp); want != got {
t.Fatalf("invalid response, wanted=%q got %q", want, got)
}
}
We are almost done; let's move on to the unmarshallData function.
You have already written a test for it. It is OK-ish, though I would write it this way to make it leaner:
type UserData struct {
Meta interface{} `json:"meta"`
Data Data `json:"data"`
}
type Data struct {
ID int `json:"id"`
Name string `json:"name"`
Email string `json:"email"`
Gender string `json:"gender"`
Status string `json:"status"`
}
func Test_unmarshallData(t *testing.T) {
tests := []UserData{
{Data: Data{ID: 1841}},
}
for _, u := range tests {
want := u.Data.ID
b, err := json.Marshal(u)
if err != nil {
t.Fatal(err)
}
t.Run("Unmarshal", func(t *testing.T) {
if got := unmarshallData(b); got != want {
t.Errorf("unmarshallData() = %v, want %v", got, want)
}
})
}
}
Then, the usual remarks apply:
don't ignore (or log.Fatal on) the error returned by json.Unmarshal
what are you testing? The JSON unmarshaller?
Finally, now that we have gathered all those pieces, we can refactor to write a more sensible function and reuse them to help us test such code.
I won't do it fully, but here is a starting point. It still panics, which I still don't recommend, but the previous demonstration has shown everything needed to test a version of it that returns an error.
type userFetcher struct {
Requester interface {
Get(u string) (*http.Response, error)
}
}
func (u userFetcher) Fetch() int {
resp, err := u.Requester.Get("https://gorest.co.in/public/v1/users/1841") // it does not really matter that this string is static, using the requester we can mock the response, its body and the error.
if err != nil {
panic(err)
}
defer resp.Body.Close() //always.
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
panic(err)
}
var userData UserData
err = json.Unmarshal(body, &userData)
if err != nil {
panic(err)
}
return userData.Data.ID
}

How to make goroutines work with anonymous functions returning value in a loop

I am working on a custom script to fetch data from a RackSpace cloudfiles container and make a list of all the files in a given container (the container has around 100 million files). I have been working on parallelizing the code and I'm currently stuck.
// function to read data from channel and display
// currently just displaying, but there will be a lot of processing done on this data
func extractObjectItemsFromList(objListChan <-chan []string) {
fmt.Println("ExtractObjectItemsFromList")
for _, c := range <-objListChan {
fmt.Println(urlPrefix, c, "\t", count)
}
}
func main()
// fetching data using flags
ao := gophercloud.AuthOptions{
Username: *userName,
APIKey: *apiKey,
}
provider, err := rackspace.AuthenticatedClient(ao)
client, err := rackspace.NewObjectStorageV1(provider,gophercloud.EndpointOpts{
Region: *region,
})
if err != nil {
logFatal(err)
}
// We have the option of filtering objects by their attributes
opts := &objects.ListOpts{
Full: true,
Prefix: *prefix,
}
var objectListChan = make(chan []string)
go extractObjectItemsFromList(objectListChan)
// Retrieve a pager (i.e. a paginated collection)
pager := objects.List(client, *containerName, opts)
// Not working
// By default EachPage contains 10000 records
// Define an anonymous function to be executed on each page's iteration
lerr := pager.EachPage(func(page pagination.Page) (bool, error) { // Get a slice of objects.Object structs
objectList, err := objects.ExtractNames(page)
if err != nil {
logFatal(err)
}
for _, o := range objectList {
_ = o
}
objectListChan <- objectList
return true, nil
})
if lerr != nil {
logFatal(lerr)
}
//---------------------------------------------------
// below code is working
//---------------------------------------------------
// working, but only works inside the loop, this keeps on fetching new pages and showing new records, 10000 per page
// By default EachPage contains 10000 records
// Define an anonymous function to be executed on each page's iteration
lerr := pager.EachPage(func(page pagination.Page) (bool, error) { // Get a slice of objects.Object structs
objectList, err := objects.ExtractNames(page)
if err != nil {
logFatal(err)
}
for _, o := range objectList {
fmt.Println(o)
}
return true, nil
})
if lerr != nil {
logFatal(lerr)
}
The first 10000 records are displayed, but then it gets stuck and nothing happens. If I don't use a channel and just run the plain loop, it works perfectly fine, but that defeats the purpose of parallelizing.
for _, c := range <-objListChan {
fmt.Println(urlPrefix, c, "\t", count)
}
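Your async worker pops one list from the channel, iterates over it and exits. After that nobody receives from objectListChan, so the second send inside EachPage blocks forever, which is why it gets stuck after the first 10000 records. You need two loops: an outer one reading the channel (for list := range objListChan) and an inner one reading the object list just retrieved. The producer should close the channel once EachPage returns, so that the outer loop can end.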
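A self-contained sketch of that fix. It assumes a hypothetical eachPage function standing in for gophercloud's pager.EachPage and fixed string slices standing in for the pages, so no Rackspace calls are made. The consumer ranges over the channel until the producer closes it, then reports on a done channel so the caller knows every page was processed.

```go
package main

import "fmt"

// eachPage is a hypothetical stand-in for pager.EachPage: it calls fn once
// per page until fn returns false, fn returns an error, or the pages run out.
func eachPage(pages [][]string, fn func(page []string) (bool, error)) error {
	for _, p := range pages {
		more, err := fn(p)
		if err != nil {
			return err
		}
		if !more {
			return nil
		}
	}
	return nil
}

// extractObjectItemsFromList consumes every list sent on the channel until
// it is closed, then reports how many names it saw on done.
func extractObjectItemsFromList(objListChan <-chan []string, done chan<- int) {
	count := 0
	for list := range objListChan { // outer loop: one iteration per page
		for _, name := range list { // inner loop: one iteration per object
			_ = name // real processing would go here
			count++
		}
	}
	done <- count
}

// run wires the producer (eachPage) to the consumer goroutine.
func run(pages [][]string) (int, error) {
	objectListChan := make(chan []string)
	done := make(chan int)
	go extractObjectItemsFromList(objectListChan, done)

	err := eachPage(pages, func(page []string) (bool, error) {
		objectListChan <- page
		return true, nil
	})
	close(objectListChan) // lets the consumer's range loop end
	count := <-done        // wait for the consumer to finish
	return count, err
}

func main() {
	pages := [][]string{{"a", "b"}, {"c"}, {"d", "e", "f"}}
	n, err := run(pages)
	fmt.Println(n, err) // 6 <nil>
}
```

With a single consumer, fetching the next page overlaps with processing the current one. To process several pages at once, one option is to start multiple consumers on the same channel and wait for them with a sync.WaitGroup instead of the single done channel.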

Output of GET request different to view source

I'm trying to extract match data from whoscored.com. When I view the source in Firefox, I find on line 816 a big JSON string with the data I want for that match ID. My goal is to eventually get this JSON.
To do this, I've tried to download every page of https://www.whoscored.com/Matches/ID/Live where ID is the id of the match. I wrote a little Go program to send a GET request for each ID up to a certain point:
package main
import (
"fmt"
"io/ioutil"
"net/http"
"os"
)
// http://www.whoscored.com/Matches/614052/Live is the match for
// Everton vs Manchester
const match_address = "http://www.whoscored.com/Matches/"
// the max id we get
const max_id = 300
const num_workers = 10
// function that get the bytes of the match id from the website
func match_fetch(matchid int) {
url := fmt.Sprintf("%s%d/Live", match_address, matchid)
resp, err := http.Get(url)
if err != nil {
fmt.Println(err)
return
}
// if we successfully got a response, store the
// body in memory
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
fmt.Println(err)
return
}
// write the body to disk
pwd, _ := os.Getwd()
filepath := fmt.Sprintf("%s/match_data/%d", pwd, matchid)
err = ioutil.WriteFile(filepath, body, 0644)
if err != nil {
fmt.Println(err)
return
}
}
// data type to send to the workers,
// last means this job is the last one
// matchid is the match id to be fetched
// a matchid of -1 means don't fetch a match
type job struct {
last bool
matchid int
}
func create_worker(jobs chan job) {
for {
next_job := <-jobs
if next_job.matchid != -1 {
match_fetch(next_job.matchid)
}
if next_job.last {
return
}
}
}
func main() {
// do the Everton match as a reference
match_fetch(614052)
var joblist [num_workers]chan job
var v int
for i := 0; i < num_workers; i++ {
job_chan := make(chan job)
joblist[i] = job_chan
go create_worker(job_chan)
}
for i := 0; i < max_id; i = i + num_workers {
for index, c := range joblist {
if i+index < max_id {
v = i + index
} else {
v = -1
}
c <- job{false, v}
}
}
for _, c := range joblist {
c <- job{true, -1}
}
}
The code seems to work in that it fills a directory called match_data with HTML. The problem is that this HTML is completely different from what I get in the browser! Here is the section that I think causes this (from the body of the GET response for http://www.whoscored.com/Matches/614052/Live):
(function() {
var z="";var b="7472797B766172207868723B76617220743D6E6577204461746528292E67657454696D6528293B766172207374617475733D227374617274223B7661722074696D696E673D6E65772041727261792833293B77696E646F772E6F6E756E6C6F61643D66756E6374696F6E28297B74696D696E675B325D3D22723A222B286E6577204461746528292E67657454696D6528292D74293B646F63756D656E742E637265617465456C656D656E742822696D6722292E7372633D222F5F496E63617073756C615F5265736F757263653F4553324C555243543D363726743D373826643D222B656E636F6465555249436F6D706F6E656E74287374617475732B222028222B74696D696E672E6A6F696E28292B222922297D3B69662877696E646F772E584D4C4874747052657175657374297B7868723D6E657720584D4C48747470526571756573747D656C73657B7868723D6E657720416374697665584F626A65637428224D6963726F736F66742E584D4C4854545022297D7868722E6F6E726561647973746174656368616E67653D66756E6374696F6E28297B737769746368287868722E72656164795374617465297B6361736520303A7374617475733D6E6577204461746528292E67657454696D6528292D742B223A2072657175657374206E6F7420696E697469616C697A656420223B627265616B3B6361736520313A7374617475733D6E6577204461746528292E67657454696D6528292D742B223A2073657276657220636F6E6E656374696F6E2065737461626C6973686564223B627265616B3B6361736520323A7374617475733D6E6577204461746528292E67657454696D6528292D742B223A2072657175657374207265636569766564223B627265616B3B6361736520333A7374617475733D6E6577204461746528292E67657454696D6528292D742B223A2070726F63657373696E672072657175657374223B627265616B3B6361736520343A7374617475733D22636F6D706C657465223B74696D696E675B315D3D22633A222B286E6577204461746528292E67657454696D6528292D74293B6966287868722E7374617475733D3D323030297B706172656E742E6C6F636174696F6E2E72656C6F616428297D627265616B7D7D3B74696D696E675B305D3D22733A222B286E6577204461746528292E67657454696D6528292D74293B7868722E6F70656E2822474554222C222F5F496E63617073756C615F5265736F757263653F535748414E45444C3D313536343032333530343538313538333938362C31373139363833393832313930303534313833392C31333935303737313737393531363432383234342C3132363636222C66616C7365293B7868
722E73656E64286E756C6C297D63617463682863297B7374617475732B3D6E6577204461746528292E67657454696D6528292D742B2220696E6361705F6578633A20222B633B646F63756D656E742E637265617465456C656D656E742822696D6722292E7372633D222F5F496E63617073756C615F5265736F757263653F4553324C555243543D363726743D373826643D222B656E636F6465555249436F6D706F6E656E74287374617475732B222028222B74696D696E672E6A6F696E28292B222922297D3B";for (var i=0;i<b.length;i+=2){z=z+parseInt(b.substring(i, i+2), 16)+",";}z = z.substring(0,z.length-1); eval(eval('String.fromCharCode('+z+')'));})();
I think this is the case because the JavaScript in the page fetches the data and edits the DOM into what I see in view source. How can I get Go to run the JavaScript? Is there a library to do this? Better still, could I grab the JSON directly from the servers?
This can be done with https://godoc.org/github.com/sourcegraph/webloop#View.EvaluateJavaScript
Read their main example: https://github.com/sourcegraph/webloop
More generally, what you need is a "headless browser".
In general, it is better to use a web API than to scrape. For example, whoscored themselves use OPTA, which you should be able to access directly.
http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/#opta
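To see what the obfuscated script in the question actually does, note that its loop turns every two hex digits of b into a character code, which is plain hex decoding. A small sketch using encoding/hex (decodeScript is a hypothetical helper name). Decoding the full string from the question gives a script that requests /_Incapsula_Resource and then calls parent.location.reload(), which suggests that a plain GET receives a bot-protection check (Incapsula) rather than the match page itself.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeScript does in Go what the page's inline loop does in JavaScript:
// every pair of hex digits becomes one byte of the hidden script.
func decodeScript(b string) (string, error) {
	// Drop any whitespace, e.g. the line break inside the pasted string.
	clean := strings.Join(strings.Fields(b), "")
	raw, err := hex.DecodeString(clean)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	// Only the first bytes of b are used here; paste the whole string
	// from the page to decode the complete script.
	s, err := decodeScript("7472797B766172207868723B")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // try{var xhr;
}
```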
