Same file gets downloaded multiple times when downloading concurrently - Go

I am concurrently downloading files (with a WaitGroup) from a slice of config objects, where each config object contains the URL to download. But when I run the downloads concurrently, the exact same data gets written for every file.
I believe I included everything below for a minimal reproducible example.
Here are my imports:
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"path"
	"path/filepath"
	"strconv"
	"strings"
	"sync"
)
The function that loops through my objects and launches a goroutine to download each file is here:
func downloadAllFiles(configs []Config) {
	var wg sync.WaitGroup
	for _, config := range configs {
		wg.Add(1)
		go config.downloadFile(&wg)
	}
	wg.Wait()
}
Basically, my function is downloading a file from a URL into a directory stored on NFS.
Here is the download function:
func (config *Config) downloadFile(wg *sync.WaitGroup) {
	resp, err := http.Get(config.ArtifactPathOrUrl)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("Downloading file: " + config.ArtifactPathOrUrl)
	fmt.Println(" to location: " + config.getNfsFullFileSystemPath())

	nfsDirectoryPath := config.getBaseNFSFileSystemPath()
	os.MkdirAll(nfsDirectoryPath, os.ModePerm)

	fullFilePath := config.getNfsFullFileSystemPath()
	out, err := os.Create(fullFilePath)
	if err != nil {
		panic(err)
	}
	defer out.Close()

	io.Copy(out, resp.Body)
	wg.Done()
}
Here's a minimal part of the Config struct:
type Config struct {
	Namespace         string `json:"namespace,omitempty"`
	Tenant            string `json:"tenant,omitempty"`
	Name              string `json:"name,omitempty"`
	ArtifactPathOrUrl string `json:"artifactPathOrUrl,omitempty"`
}
Here are the instance/helper functions:
func (config *Config) getDefaultNfsURLBase() string {
	return "http://example.domain.nfs.location.com/"
}

func (config *Config) getDefaultNfsFilesystemBase() string {
	return "/data/nfs/location/"
}

func (config *Config) getBaseNFSFileSystemPath() string {
	basePath := filepath.Dir(config.getNfsFullFileSystemPath())
	return basePath
}

func (config *Config) getNfsFullFileSystemPath() string {
	// basePath is like: /data/nfs/location/
	trimmedBasePath := strings.TrimSuffix(config.getDefaultNfsFilesystemBase(), "/")
	fileName := config.getBaseFileName()
	return trimmedBasePath + "/" + config.Tenant + "/" + config.Namespace + "/" + config.Name + "/" + fileName
}
Here is how I'm getting the configs and unmarshalling them:
func getConfigs() string {
	b, err := ioutil.ReadFile("pulsarDeploy_example.json")
	if err != nil {
		fmt.Print(err)
	}
	str := string(b) // convert content to a 'string'
	return str
}

func deserializeJSON(configJson string) []Config {
	jsonAsBytes := []byte(configJson)
	configs := make([]Config, 0)
	err := json.Unmarshal(jsonAsBytes, &configs)
	if err != nil {
		panic(err)
	}
	return configs
}
For a minimal example, I think this data for the pulsarDeploy_example.json file should work:
[
  {
    "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/sample/sample.jar.zip",
    "namespace": "exampleNamespace1",
    "name": "exampleName1",
    "tenant": "exampleTenant1"
  },
  {
    "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/sample-calculator/sample-calculator-bundle-2.0.jar.zip",
    "namespace": "exampleNamespace1",
    "name": "exampleName2",
    "tenant": "exampleTenant1"
  },
  {
    "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/helloworld/helloworld.jar.zip",
    "namespace": "exampleNamespace1",
    "name": "exampleName3",
    "tenant": "exampleTenant1"
  },
  {
    "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/fabric-activemq/fabric-activemq-demo-7.0.2.fuse-097.jar.zip",
    "namespace": "exampleNamespace1",
    "name": "exampleName4",
    "tenant": "exampleTenant1"
  }
]
(Note that the example file URLs were just random Jars I grabbed online.)
When I run my code, instead of downloading each file, it downloads the same file repeatedly, and the information printed to the console (the "Downloading file:" and "to location:" lines) is exactly the same for every object instead of showing each object's unique values, which is definitely a concurrency issue.
This issue reminds me of what happens if you try to run a for loop with a closure and end up locking a single object instance into your loop and executing repeatedly on the same object.
What is causing this behavior, and how do I resolve it?

I'm pretty sure that your guess
This issue reminds me of what happens if you try to run a for loop with a closure and end up locking a single object instance into your loop and executing repeatedly on the same object.
is correct. The simple fix is to assign the loop variable to a local variable inside the loop, like
for _, config := range configs {
	wg.Add(1)
	cur := config // copy the loop variable so each goroutine gets its own value
	go cur.downloadFile(&wg)
}
but I don't like APIs that take a WaitGroup as a parameter, so I suggest
for _, config := range configs {
	wg.Add(1)
	go func(cur Config) {
		defer wg.Done()
		cur.downloadFile()
	}(config)
}
and change the downloadFile signature to func (config *Config) downloadFile(), dropping the wg usage inside it.
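For completeness, a minimal sketch of what downloadFile could look like after that change (the wg parameter is gone because the caller's deferred wg.Done() handles it; the added error handling is illustrative, and the helper methods are the ones from the question):
func (config *Config) downloadFile() {
	resp, err := http.Get(config.ArtifactPathOrUrl)
	if err != nil {
		log.Println("download failed:", err)
		return
	}
	defer resp.Body.Close()

	// Make sure the target directory exists before creating the file.
	if err := os.MkdirAll(config.getBaseNFSFileSystemPath(), os.ModePerm); err != nil {
		log.Println("mkdir failed:", err)
		return
	}

	out, err := os.Create(config.getNfsFullFileSystemPath())
	if err != nil {
		log.Println("create failed:", err)
		return
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Println("copy failed:", err)
	}
}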

Related

Parsing prometheus metrics from file and updating counters

I have a Go application that gets run periodically by a batch job. On each run, it should read some Prometheus metrics from a file, run its logic, update a success/fail counter, and write the metrics back out to a file.
From looking at How to parse Prometheus data as well as the godocs for prometheus, I'm able to read in the file, but I don't know how to update app_processed_total with the value returned by expfmt.ExtractSamples().
This is what I've done so far. Could someone please tell me how I should proceed from here? How can I typecast the Vector I got into a CounterVec?
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
	"github.com/prometheus/common/expfmt"
	"github.com/prometheus/common/model"
)

var (
	fileOnDisk     = prometheus.NewRegistry()
	processedTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "app_processed_total",
		Help: "Number of times ran",
	}, []string{"status"})
)

func doInit() {
	prometheus.MustRegister(processedTotal)
}

func recordMetrics() {
	go func() {
		for {
			processedTotal.With(prometheus.Labels{"status": "ok"}).Inc()
			time.Sleep(5 * time.Second)
		}
	}()
}

func readExistingMetrics() {
	var parser expfmt.TextParser
	text := `
# HELP app_processed_total Number of times ran
# TYPE app_processed_total counter
app_processed_total{status="ok"} 300
`
	parseText := func() ([]*dto.MetricFamily, error) {
		parsed, err := parser.TextToMetricFamilies(strings.NewReader(text))
		if err != nil {
			return nil, err
		}
		var result []*dto.MetricFamily
		for _, mf := range parsed {
			result = append(result, mf)
		}
		return result, nil
	}
	gatherers := prometheus.Gatherers{
		fileOnDisk,
		prometheus.GathererFunc(parseText),
	}
	gathering, err := gatherers.Gather()
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println("gathering: ", gathering)
	for _, g := range gathering {
		vector, err := expfmt.ExtractSamples(&expfmt.DecodeOptions{
			Timestamp: model.Now(),
		}, g)
		fmt.Println("vector: ", vector)
		if err != nil {
			fmt.Println(err)
		}
		// How can I update processedTotal with this new value?
	}
}

func main() {
	doInit()
	readExistingMetrics()
	recordMetrics()
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe("localhost:2112", nil)
}
I believe you would need to use processedTotal.WithLabelValues("ok").Inc() or something similar to that.
The more complete example is here
func ExampleCounterVec() {
	httpReqs := prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "How many HTTP requests processed, partitioned by status code and HTTP method.",
		},
		[]string{"code", "method"},
	)
	prometheus.MustRegister(httpReqs)

	httpReqs.WithLabelValues("404", "POST").Add(42)

	// If you have to access the same set of labels very frequently, it
	// might be good to retrieve the metric only once and keep a handle to
	// it. But beware of deletion of that metric, see below!
	m := httpReqs.WithLabelValues("200", "GET")
	for i := 0; i < 1000000; i++ {
		m.Inc()
	}

	// Delete a metric from the vector. If you have previously kept a handle
	// to that metric (as above), future updates via that handle will go
	// unseen (even if you re-create a metric with the same label set
	// later).
	httpReqs.DeleteLabelValues("200", "GET")
	// Same thing with the more verbose Labels syntax.
	httpReqs.Delete(prometheus.Labels{"method": "GET", "code": "200"})
}
This is taken from the Prometheus examples on GitHub.
To use the value of vector you can do the following:
vectorFloat, err := strconv.ParseFloat(vector[0].Value.String(), 64)
if err != nil {
	panic(err)
}
processedTotal.WithLabelValues("ok").Add(vectorFloat)
This assumes you will only ever get a single sample in the vector. The sample's value is rendered as a string by its String() method, and you can convert it back to a float with strconv.ParseFloat.
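Putting the pieces together, a rough sketch (not from the original answer) of how the loop inside readExistingMetrics could seed the counter from the parsed file, assuming you only care about app_processed_total samples and that strconv gets added to the imports:
for _, g := range gathering {
	vector, err := expfmt.ExtractSamples(&expfmt.DecodeOptions{
		Timestamp: model.Now(),
	}, g)
	if err != nil {
		fmt.Println(err)
		continue
	}
	for _, sample := range vector {
		// Only seed our own counter; ignore anything else that might be in the file.
		if sample.Metric[model.MetricNameLabel] != "app_processed_total" {
			continue
		}
		value, err := strconv.ParseFloat(sample.Value.String(), 64)
		if err != nil {
			fmt.Println(err)
			continue
		}
		processedTotal.WithLabelValues(string(sample.Metric["status"])).Add(value)
	}
}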

HTTP request fails when executed asynchronously

I'm trying to write a tiny application in Go that sends an HTTP request to all IP addresses in the hope of finding specific content. The issue is that the application seems to crash in a very peculiar way when the call is executed asynchronously.
ip/validator.go
package ip

import (
	"io/ioutil"
	"net/http"
	"regexp"
	"time"
)

type ipValidator struct {
	httpClient          http.Client
	path                string
	exp                 *regexp.Regexp
	confirmationChannel *chan string
}

func (this *ipValidator) validateUrl(url string) bool {
	response, err := this.httpClient.Get(url)
	if err != nil {
		return false
	}
	defer response.Body.Close()
	if response.StatusCode != http.StatusOK {
		return false
	}
	bodyBytes, _ := ioutil.ReadAll(response.Body)
	result := this.exp.Match(bodyBytes)
	if result && this.confirmationChannel != nil {
		*this.confirmationChannel <- url
	}
	return result
}

func (this *ipValidator) ValidateIp(addr ip) bool {
	httpResult := this.validateUrl("http://" + addr.ToString() + this.path)
	httpsResult := this.validateUrl("https://" + addr.ToString() + this.path)
	return httpResult || httpsResult
}

func (this *ipValidator) GetSuccessChannel() *chan string {
	return this.confirmationChannel
}

func NewIpValidator(path string, exp *regexp.Regexp) ipValidator {
	return newValidator(path, exp, nil)
}

func NewAsyncIpValidator(path string, exp *regexp.Regexp) ipValidator {
	c := make(chan string)
	return newValidator(path, exp, &c)
}

func newValidator(path string, exp *regexp.Regexp, c *chan string) ipValidator {
	httpClient := http.Client{
		Timeout: time.Second * 2,
	}
	return ipValidator{httpClient, path, exp, c}
}
main.go
package main

import (
	"./ip"
	"fmt"
	"os"
	"regexp"
)

func processOutput(c *chan string) {
	for {
		url := <-*c
		fmt.Println(url)
	}
}

func main() {
	args := os.Args[1:]
	fmt.Printf("path: %s regex: %s", args[0], args[1])
	regexp, regexpError := regexp.Compile(args[1])
	if regexpError != nil {
		fmt.Println("The provided regexp is not valid")
		return
	}
	currentIp, _ := ip.NewIp("172.217.22.174")
	validator := ip.NewAsyncIpValidator(args[0], regexp)
	successChannel := validator.GetSuccessChannel()
	go processOutput(successChannel)
	for currentIp.HasMore() {
		go validator.ValidateIp(currentIp)
		currentIp = currentIp.Increment()
	}
}
Note the line go validator.ValidateIp(currentIp) in main.go. If I remove the word "go" and execute everything within the main goroutine, the code works as expected: it sends requests to IP addresses starting at 172.217.22.174, and should one of them return a legitimate result that matches the regexp the ipValidator was initialized with, the URL is passed to the channel and printed out by the processOutput function from main.go. The issue is that simply adding "go" in front of validator.ValidateIp(currentIp) breaks that functionality. In fact, according to the debugger, I never seem to get past the line response, err := this.httpClient.Get(url) in validator.go.
The struggle is real. Should I decide to scan the whole internet, there are 256^4 IP addresses to go through. It will take years unless I find a way to split the work across multiple goroutines.
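One likely factor with this pattern: main returns as soon as the loop finishes launching goroutines, and a Go program exits when main returns, without waiting for outstanding goroutines (or their 2-second HTTP requests) to complete. A minimal sketch of how the loop in main.go could wait for its workers with a sync.WaitGroup (the types and constructors are the ones from the question; the semaphore channel bounding in-flight requests is an added assumption, and "sync" would need to be added to main.go's imports):
var wg sync.WaitGroup
sem := make(chan struct{}, 100) // hypothetical cap on concurrent requests

for currentIp.HasMore() {
	ipToCheck := currentIp // copy for this iteration's goroutine
	wg.Add(1)
	sem <- struct{}{} // blocks once 100 requests are in flight
	go func() {
		defer wg.Done()
		defer func() { <-sem }()
		validator.ValidateIp(ipToCheck)
	}()
	currentIp = currentIp.Increment()
}
wg.Wait()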

Prevent access to files in a folder with a Go server

I have a server in Go that serves a folder path like this:
fs := http.FileServer(http.Dir("./assets"))
http.Handle("/Images/", fs)
http.ListenAndServe(":8000", nil)
But this folder contains private images, and it shouldn't be possible to access those files directly. How can I secure image access and prevent anybody from getting at the folder's contents?
If you want to block directory listings when using the http package, maybe this will be useful to you:
https://groups.google.com/forum/#!topic/golang-nuts/bStLPdIVM6w
package main

import (
	"net/http"
	"os"
)

type justFilesFilesystem struct {
	fs http.FileSystem
}

func (fs justFilesFilesystem) Open(name string) (http.File, error) {
	f, err := fs.fs.Open(name)
	if err != nil {
		return nil, err
	}
	return neuteredReaddirFile{f}, nil
}

type neuteredReaddirFile struct {
	http.File
}

func (f neuteredReaddirFile) Readdir(count int) ([]os.FileInfo, error) {
	return nil, nil
}

func main() {
	fs := justFilesFilesystem{http.Dir("/tmp/")}
	http.ListenAndServe(":8080", http.FileServer(fs))
}
A little wrapper over FileServer() solves your problem; now you have to add some sort of authorization logic. It looks like you have unique names, which is good, so here I just filter on the image name by creating a map of allowed names. You could add something more dynamic like a key/value store (memcached, Redis, etc.). Hope you can follow the comments.
package main

import (
	"log"
	"net/http"
	"strings"
)

// Put the allowed hashes or keys here.
// You may consider putting them in a key/value store.
var allowedImages = map[string]bool{
	"key-abc.jpg": true,
	"key-123.jpg": true,
}

func main() {
	http.Handle("/Images/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Here we can do any kind of checking. In this case we just split the URL and
		// check whether the image name is in the allowedImages map; this could also be
		// a lookup in a DB or something similar.
		parts := strings.Split(r.URL.Path, "/")
		imgName := parts[len(parts)-1]
		if _, contains := allowedImages[imgName]; !contains { // the image name is not in the allowed map
			log.Printf("Not found image: %q path: %s\n", imgName, r.URL.Path)
			// If the image is not allowed we write a 404.
			//
			// Bonus: we don't list the directory, so nobody can know what's inside :)
			http.NotFound(w, r)
			return
		}
		log.Printf("Serving allowed image: %q\n", imgName)
		fileServer := http.StripPrefix("/Images/", http.FileServer(http.Dir("./assets")))
		fileServer.ServeHTTP(w, r) // StripPrefix() and FileServer() return a Handler that implements ServeHTTP()
	}))
	http.ListenAndServe(":8000", nil)
}
https://play.golang.org/p/ehrd_AWXim

Runtime access to symbols in Go

In writing a Web server in Go, I'd like to be able to dereference symbols at runtime, to allow me to figure out which functions to call from a configuration file, something like the call to the fictional "eval" function in the example below. That would allow me to select handlers from a library of handlers, and to deploy a new server with just a config file. Is there any way to accomplish this in Go?
config.json
{ "url": "/api/apple", "handler": "Apple", "method": "get" }
{ "url": "/api/banana", "handler": "Banana", "method": "get" }
play.go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
	"os"

	"github.com/gorilla/mux"
)

type ConfigEntry struct {
	URL     string `json:"url"`
	Method  string `json:"method"`
	Handler string `json:"handler"`
}

func main() {
	ifp, err := os.Open("config.json")
	if err != nil {
		log.Fatal(err)
	}
	dec := json.NewDecoder(ifp)
	r := mux.NewRouter()
	for {
		var config ConfigEntry
		if err = dec.Decode(&config); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		r.HandleFunc(config.URL, eval(config.Handler+"Handler")).Methods(config.Method)
	}
	http.Handle("/", r)
	http.ListenAndServe(":8080", nil)
}

func AppleHandler(w http.ResponseWriter, r *http.Request) (status int, err error) {
	w.Write([]byte("Apples!\n"))
	return http.StatusOK, nil
}

func BananaHandler(w http.ResponseWriter, r *http.Request) (status int, err error) {
	w.Write([]byte("Bananas!\n"))
	return http.StatusOK, nil
}
There's nothing like eval in Go, which is a good thing since things like that are very dangerous.
What you can do is have a map mapping the handler strings in your config file to the handler functions in your code:
var handlers = map[string]func(http.ResponseWriter, *http.Request) (int, error){
	"Apple":  AppleHandler,
	"Banana": BananaHandler,
}
Then you can register those handlers by simply doing:
handler, ok := handlers[config.Handler]
if !ok {
	log.Fatal(fmt.Errorf("Handler not found"))
}
r.HandleFunc(config.URL, handler).Methods(config.Method)
There is some limited ability to access things at runtime with the reflect package. However, it doesn't let you look up arbitrary standalone functions in a package by name; it would be possible if they were all methods on a known struct type/value.
As an alternative for your example, you could simply use a map[string]func(...) to store all handlers, initialize it at startup (during init()), and fetch the handlers from there. But that is also more or less what the existing HTTP muxes are doing.
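For illustration, a rough sketch of that reflection route, under the assumption that the handlers are rewritten as methods on a (hypothetical) handlerSet struct and use the standard handler signature:
package main

import (
	"net/http"
	"reflect"
)

type handlerSet struct{}

func (handlerSet) AppleHandler(w http.ResponseWriter, r *http.Request)  { w.Write([]byte("Apples!\n")) }
func (handlerSet) BananaHandler(w http.ResponseWriter, r *http.Request) { w.Write([]byte("Bananas!\n")) }

// lookupHandler resolves a config name like "Apple" to the AppleHandler method via reflection.
func lookupHandler(name string) (func(http.ResponseWriter, *http.Request), bool) {
	m := reflect.ValueOf(handlerSet{}).MethodByName(name + "Handler")
	if !m.IsValid() {
		return nil, false
	}
	fn, ok := m.Interface().(func(http.ResponseWriter, *http.Request))
	return fn, ok
}
A handler resolved this way can be passed straight to r.HandleFunc; if lookupHandler reports false, fall back to an error just as in the map-based version.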

Golang downcasting list of structs

I want to be able to unmarshal YAML files less rigidly. That is, my library has a predefined set of options the YAML file must have, and the user should be able to extend this to include any custom options.
Here is what I have
package main

import (
	"net/http"
	"yamlcms"
	"github.com/julienschmidt/httprouter"
)

type Page struct {
	*yamlcms.Page
	Title string
	Date  string
}

func getBlogRoutes() {
	pages := []*Page{}
	yamlcms.ReadDir("html", pages)
}

// This section is a work in progress; I only include it for loose context.
func main() {
	router := httprouter.New()
	//blogRoutes := getBlogRoutes()
	//for _, blogRoute := range *blogRoutes {
	//	router.Handle(blogRoute.Method, blogRoute.Pattern,
	//		func(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {})
	//}
	http.ListenAndServe(":8080", router)
}
Here is the yamlcms package:
package yamlcms

import (
	"io/ioutil"
	"os"
	"strings"

	"gopkg.in/yaml.v2"
)

type Page struct {
	Slug string `yaml:"slug"`
	File string `yaml:"file"`
}

func (page *Page) ReadFile(file string) (err error) {
	fileContents, err := ioutil.ReadFile(file)
	if err != nil {
		return
	}
	err = yaml.Unmarshal(fileContents, &page)
	return
}

func isYamlFile(fileInfo os.FileInfo) bool {
	return !fileInfo.IsDir() && strings.HasSuffix(fileInfo.Name(), ".yaml")
}

func ReadDir(dir string, pages []*Page) (err error) {
	filesInfo, err := ioutil.ReadDir(dir)
	if err != nil {
		return
	}
	for i, fileInfo := range filesInfo {
		if isYamlFile(fileInfo) {
			pages[i].ReadFile(fileInfo.Name())
		}
	}
	return
}
There is a compiler issue here:
src/main.go:19: cannot use pages (type []*Page) as type []*yamlcms.Page in argument to yamlcms.ReadDir
My main intent with this question is to learn the idiomatic way of doing this kind of thing in Go. Third-party solutions may exist, but I'm not immediately interested in them, because I frequently run into problems like this in Go that have to do with inheritance, etc. So, along the lines of what I've presented, how can I best (idiomatically) accomplish what I'm going for?
EDIT:
So I've made some changes as suggested. Now I have this:
type FileReader interface {
	ReadFile(file string) error
}

func ReadDir(dir string, pages []*FileReader) (err error) {
	filesInfo, err := ioutil.ReadDir(dir)
	if err != nil {
		return
	}
	for i, fileInfo := range filesInfo {
		if isYamlFile(fileInfo) {
			(*pages[i]).ReadFile(fileInfo.Name())
		}
	}
	return
}
However, I still get a similar compiler error:
src/main.go:19: cannot use pages (type []*Page) as type []*yamlcms.FileReader in argument to yamlcms.ReadDir
Even though main.Page should be a FileReader because it embeds yamlcms.Page.
EDIT: I forgot that slices of interfaces don't work like that. You'd need to allocate a new slice, convert all pages to FileReaders, call the function, and convert them back.
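For illustration, a rough sketch of that conversion, assuming FileReader lives in the yamlcms package (as in your edit) and ReadDir is changed to take a plain []FileReader of interface values rather than pointers to interfaces:
// Client code: []*Page does not convert to []FileReader automatically,
// so build the interface slice element by element.
func getBlogRoutes() {
	pages := make([]*Page, 4) // pre-sized because ReadDir indexes into the slice
	for i := range pages {
		pages[i] = &Page{Page: &yamlcms.Page{}} // initialize the embedded pointer
	}

	readers := make([]yamlcms.FileReader, len(pages))
	for i, p := range pages {
		readers[i] = p // *Page satisfies FileReader via the embedded *yamlcms.Page
	}

	if err := yamlcms.ReadDir("html", readers); err != nil {
		// handle the error
	}
}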
Another possible solution is refactoring yamlcms.ReadDir to return the contents of the files, so that they could be unmarshaled later:
// In yamlcms.
func ReadYAMLFilesInDir(dir string) ([][]byte, error) { ... }

// In client code.
files, err := yamlcms.ReadYAMLFilesInDir("dir")
if err != nil {
	return err
}
for i := range pages {
	if err := yaml.Unmarshal(files[i], &pages[i]); err != nil {
		return err
	}
}
The original answer:
There are no such things as inheritance or casting in Go. Prefer composition and interfaces in your designs. In your case, you can redefine your yamlcms.ReadDir to accept an interface, FileReader.
type FileReader interface {
	ReadFile(file string) error
}
Both yamlcms.Page and main.Page will implement this, as the latter embeds the former.
