Gonum Plot Loop Through Slice - go

I'm trying to add multiple plots by using a loop, but I can't seem to figure out how to put the lines in. Here is the code I'm working on:
func plot_stochastic_processes(processes [][]float64, title string) {
p, err := plot.New()
if err != nil {
panic(err)
}
p.Title.Text = title
p.X.Label.Text = "X"
p.Y.Label.Text = "Y"
err = plotutil.AddLinePoints(p,
"Test", getPoints(processes[1]),
//Need to figure out how to loop through processes
)
if err != nil {
panic(err)
}
// Save the plot to a PNG file.
if err := p.Save(4*vg.Inch, 4*vg.Inch, "points.png"); err != nil {
panic(err)
}
}
My getPoints function looks like this:
func getPoints(line []float64) plotter.XYs {
pts := make(plotter.XYs, len(line))
for j, k := range line {
pts[j].X = float64(j)
pts[j].Y = k
}
return pts
}
I get an error when trying to put a loop where the commented section is. I know this should be fairly straightforward. Perhaps a loop prior to this to get the list of lines?
Something like
for i, process := range processes {
return "title", getPoints(process),
}
Obviously I know that isn't correct, but I'm not sure how to go about it.

I think you want to first extract your data into a []interface{}, and then call into AddLinePoints. Roughly (I didn't test):
lines := make([]interface{},0)
for i, v := range processes {
lines = append(lines, "Title" + strconv.Itoa(i))
lines = append(lines, getPoints(v))
}
plotutil.AddLinePoints(p, lines...)
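Putting it together, a minimal, untested sketch of the whole function with that loop folded in might look like this (it reuses getPoints from the question; the per-line titles are made up):
func plot_stochastic_processes(processes [][]float64, title string) {
    p, err := plot.New()
    if err != nil {
        panic(err)
    }
    p.Title.Text = title
    p.X.Label.Text = "X"
    p.Y.Label.Text = "Y"
    // Build the variadic arguments as alternating title, points pairs,
    // one pair per process.
    lines := make([]interface{}, 0, len(processes)*2)
    for i, process := range processes {
        lines = append(lines, "Process "+strconv.Itoa(i), getPoints(process))
    }
    if err := plotutil.AddLinePoints(p, lines...); err != nil {
        panic(err)
    }
    // Save the plot to a PNG file.
    if err := p.Save(4*vg.Inch, 4*vg.Inch, "points.png"); err != nil {
        panic(err)
    }
}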

Related

How to test a function that downloads a stream of HLS into one file?

I am trying to write a test function for a function that downloads an HLS stream into one .mp3 file:
// the download function
func DownloadM3u8(filepath string, dlbar *bar.ProgressBar, segments []string) error {
file, _ := os.OpenFile(filepath, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
// the go routine now
var wg sync.WaitGroup
for _, segment := range segments {
wg.Add(1)
downloadSeg(&wg, segment, file, dlbar)
}
return nil
}
func downloadSeg(wg *sync.WaitGroup, segmentURI string, file *os.File, dlbar *bar.ProgressBar) {
defer wg.Done()
resp, err := http.Get(segmentURI)
if err != nil {
return
}
defer resp.Body.Close()
// append to the file
if dlbar == nil {
_, err = io.Copy(io.MultiWriter(file), resp.Body)
} else {
_, err = io.Copy(io.MultiWriter(file, dlbar), resp.Body)
}
if err != nil {
return
}
}
The passed segments are a list of URIs like this sample:
[
https://url-adc.com/s/933030/30032,
https://url-adc.com/s/933030/303220,
https://url-adc.com/s/933030/34230,
https://url-adc.com/s/933030/35290,
]
The test function basically mimics the download. Since I have an actual downloaded file, Test[medium].mp3, I am trying to split that file into []byte ranges and assign each byte range to some URL; then, with the help of the httptest package, I return the corresponding []byte.
func TestDownloadM3u8(t *testing.T) {
path := "../../testdata/"
track := readTestFile("Test[medium].mp3") // []byte
fileName := "some filename that exists" // the filename of the tmp file.mp3 that is expected to be downloaded.
path = filepath.Join(path, fileName)
// the segments from the actual file, valid file
segments := extractSegments(fileResp, track)
// modifiying the seg urls to mimic the actual urls
testServer := httptest.NewServer(http.HandlerFunc(func(res http.ResponseWriter, req *http.Request) {
// write bytes based on the url
n, _ := strconv.Atoi(req.URL.String()[1:])
if bs, ok := segments[n]; ok {
res.WriteHeader(http.StatusOK)
res.Write(bs)
}
}))
defer testServer.Close()
segmentURIs := make([]string, 0)
for k := range segments {
segmentURIs = append(segmentURIs, testServer.URL+"/"+strconv.Itoa(k))
}
soundcloud.DownloadM3u8(path, nil, segmentURIs)
// read the downloaded file
file, err := ioutil.ReadFile(path)
if err != nil {
t.Errorf("An error happened while reading the track, error : %s", err)
}
// TODO: not the best method, as it loads the whole file into memory, but for this test it's OK since the size isn't that big + I have RAM
if !bytes.Equal(file, track) {
t.Errorf("Expected the 2 files to be the same")
}
// remove the file
// os.Remove(path)
}
I tried to extract the segments by splitting the file into []byte chunks, one per segment that I get from reading the playlist.m3u8 file (that file contains the segment URIs and is returned by another function).
func extractSegments(fileP []byte, testfile []byte) map[int][]byte {
segments := make(map[int][]byte, 0)
reader := bytes.NewReader(fileP)
pl, listType, err := m3u8.DecodeFrom(reader, true)
if err != nil {
return nil
}
switch listType {
case m3u8.MEDIA:
mediapl := pl.(*m3u8.MediaPlaylist)
for i, segment := range mediapl.Segments {
if segment == nil {
continue
}
// here I want to get portion of the testfile
segments[i] = ...
}
}
return segments
}
I tried to write a function that splits the testfile into [][]byte, but the test fails: I compared the two files and they aren't identical.
var segmentSize int
numSegments := len(segments)
fileSize := len(testfile)
segmentSize = fileSize / numSegments
for i := 0; i < numSegments-1; i++ {
start := i * segmentSize
end := (i + 1) * segmentSize
segments[i] = testfile[start:end]
}
// handle the last segment
start := (numSegments - 1) * segmentSize
end := fileSize
if fileSize%numSegments == 0 {
end = (numSegments-1)*segmentSize + segmentSize
}
segments[int(numSegments-1)] = testfile[start:end]
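For what it's worth, here is a minimal, untested sketch of that partitioning step as a standalone helper (splitIntoSegments is a made-up name for illustration); it gives any remainder bytes to the last segment so that the chunks concatenate back to the original file:
// splitIntoSegments partitions data into n contiguous chunks keyed by index.
// Every chunk holds len(data)/n bytes except the last one, which also takes
// the remainder, so appending the chunks in order reproduces data exactly.
func splitIntoSegments(data []byte, n int) map[int][]byte {
    segments := make(map[int][]byte, n)
    size := len(data) / n
    for i := 0; i < n; i++ {
        start := i * size
        end := start + size
        if i == n-1 {
            end = len(data) // the last segment keeps the remainder
        }
        segments[i] = data[start:end]
    }
    return segments
}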

Ignore a line containing a pattern from a long text file in Go

I'm trying to implement a function to ignore a line containing a pattern from a long text file (ASCII guaranteed) in Go
The functions I have below, withoutIgnore and withIgnore, both take a filename argument and return a *bytes.Buffer, which can subsequently be used to write to an io.Writer.
The withIgnore function takes an additional argument pattern to exclude lines containing the pattern from the file. The function works, but benchmarking shows it to be about 5x slower than withoutIgnore. Is there a way it could be improved?
package main
import (
"bufio"
"bytes"
"io"
"log"
"os"
)
func withoutIgnore(f string) (*bytes.Buffer, error) {
rfd, err := os.Open(f)
if err != nil {
log.Fatal(err)
}
defer func() {
if err := rfd.Close(); err != nil {
log.Fatal(err)
}
}()
inputBuffer := make([]byte, 1048576)
var bytesRead int
var bs []byte
opBuffer := bytes.NewBuffer(bs)
for {
bytesRead, err = rfd.Read(inputBuffer)
if err == io.EOF {
return opBuffer, nil
}
if err != nil {
return nil, err
}
_, err = opBuffer.Write(inputBuffer[:bytesRead])
if err != nil {
return nil, err
}
}
return opBuffer, nil
}
func withIgnore(f, pattern string) (*bytes.Buffer, error) {
rfd, err := os.Open(f)
if err != nil {
log.Fatal(err)
}
defer func() {
if err := rfd.Close(); err != nil {
log.Fatal(err)
}
}()
scanner := bufio.NewScanner(rfd)
var bs []byte
buffer := bytes.NewBuffer(bs)
for scanner.Scan() {
if !bytes.Contains(scanner.Bytes(), []byte(pattern)) {
_, err := buffer.WriteString(scanner.Text() + "\n")
if err != nil {
return nil, err
}
}
}
return buffer, nil
}
func main() {
// buff, err := withoutIgnore("base64dump.log")
buff, err := withIgnore("base64dump.log", "AUDIT")
if err != nil {
log.Fatal(err)
}
_, err = buff.WriteTo(os.Stdout)
if err != nil {
log.Fatal(err)
}
}
Benchmark test
package main
import "testing"
func BenchmarkTestWithoutIgnore(b *testing.B) {
for i := 0; i < b.N; i++ {
_, err := withoutIgnore("base64dump.log")
if err != nil {
b.Fatal(err)
}
}
}
func BenchmarkTestWithIgnore(b *testing.B) {
for i := 0; i < b.N; i++ {
_, err := withIgnore("base64dump.log", "AUDIT")
if err != nil {
b.Fatal(err)
}
}
}
and the "base64dump.log" can be generated in the command line using
base64 /dev/urandom | head -c 10000000 > base64dump.log
Since ASCII is guaranteed, one can work directly at the byte level.
Still, if one checks each byte for line breaks while reading the input and then searches for the pattern again within each line, operations are applied to every byte.
If, on the other hand, one reads chunks of the input and performs an optimized search for the pattern in the text, not even examining each input byte, one minimizes the operations per input byte.
For example, there is the Boyer-Moore string search algorithm. Go's built-in bytes.Index function is also optimized. The achieved speed depends of course on the input data and the actual pattern. For the input as specified in the question, bytes.Index turned out to be significantly more performant when measured.
Procedure
read in a chunk; the chunk size should be significantly larger than the maximum line length, a value >= 64 KB should probably be good (in the test, 1 MB was used, as in the question).
a chunk usually doesn't end at a linefeed, so search from the end of the chunk to the next linefeed, limit the search to this slice and remember the remaining data for the next pass
the last chunk does not necessarily end in a linefeed
with the help of the performant Go function bytes.Index you can find the places where the pattern occurs in the chunk
from the found location one searches for the preceding and the following linefeed
then the block is output up to the corresponding beginning of the line
and the search is continued from the end of the line where the pattern occurred
if the search does not find another location, the rest is output
read the next chunk and apply the described steps again until the end of the file is reached
Noteworthy
A read operation may return less data than the chunk size, so it makes sense to repeat the read operation until a full chunk has been read.
Benchmark
Optimized code is often significantly more complicated, but the performance is also significantly better, as we will see in a moment.
BenchmarkTestWithoutIgnore-8 270 4137267 ns/op
BenchmarkTestWithIgnore-8 54 22403931 ns/op
BenchmarkTestFilter-8 150 7947454 ns/op
Here, the optimized code BenchmarkTestFilter-8 is only about 1.9x slower than the operation without filtering while the BenchmarkTestWithIgnore-8 method is 5.4x slower than the comparison value without filtering.
Looked at another way: the optimized code is 2.8 times faster than the unoptimized one.
Code
Of course, here is the code for your own tests:
func filterFile(f, pattern string) (*bytes.Buffer, error) {
rfd, err := os.Open(f)
if err != nil {
log.Fatal(err)
}
defer func() {
if err := rfd.Close(); err != nil {
log.Fatal(err)
}
}()
reader := bufio.NewReader(rfd)
return filter(reader, []byte(pattern), 1024*1024)
}
// chunkSize must be larger than the longest line
// a reasonable size is probably >= 64K
func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error) {
var bs []byte
buffer := bytes.NewBuffer(bs)
chunk := make([]byte, chunkSize)
var remaining []byte
for lastChunk := false; !lastChunk; {
n, err := readChunk(reader, chunk, remaining, chunkSize)
if err != nil {
if err == io.EOF {
lastChunk = true
} else {
return nil, err
}
}
remaining = remaining[:0]
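// Unless this is the last chunk, it usually ends in the middle of a line:
// find the last linefeed, carry the bytes after it over to the next pass,
// and limit this pass to the complete lines up to and including it.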
if !lastChunk {
for i := n - 1; i > 0; i-- {
if chunk[i] == '\n' {
remaining = append(remaining, chunk[i+1:n]...)
n = i + 1
break
}
}
}
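// Scan the complete part of the chunk for the pattern, copying everything
// to the output except the lines in which the pattern occurs.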
s := 0
for s < n {
hit := bytes.Index(chunk[s:n], pattern)
if hit < 0 {
break
}
hit += s
startOfLine := hit
for ; startOfLine > 0; startOfLine-- {
if chunk[startOfLine] == '\n' {
startOfLine++
break
}
}
endOfLine := hit + len(pattern)
for ; endOfLine < n; endOfLine++ {
if chunk[endOfLine] == '\n' {
break
}
}
endOfLine++
_, err = buffer.Write(chunk[s:startOfLine])
if err != nil {
return nil, err
}
s = endOfLine
}
if s < n {
_, err = buffer.Write(chunk[s:n])
if err != nil {
return nil, err
}
}
}
return buffer, nil
}
func readChunk(reader io.Reader, chunk, remaining []byte, chunkSize int) (int, error) {
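// First restore the bytes carried over from the previous pass, then keep
// reading until the chunk is full or the reader returns an error
// (io.EOF on the final chunk).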
copy(chunk, remaining)
r := len(remaining)
for r < chunkSize {
n, err := reader.Read(chunk[r:])
r += n
if err != nil {
return r, err
}
}
return r, nil
}
And the benchmark part might look something like this:
func BenchmarkTestFilter(b *testing.B) {
for i := 0; i < b.N; i++ {
_, err := filterFile("base64dump.log", "AUDIT")
if err != nil {
b.Fatal(err)
}
}
}
The filtering was split into two functions, and the actual work is done in func filter(reader io.Reader, pattern []byte, chunkSize int) (*bytes.Buffer, error).
However, the main point here was to find a way to improve performance significantly.
By injecting a reader and a chunkSize, the function is already set up for unit tests; such tests are missing here, but they are definitely recommended when dealing with index arithmetic.
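As an illustration of that, a minimal test sketch (not from the original answer) could feed the injected reader an in-memory string, use a deliberately small chunk size, and check that the AUDIT line is dropped:
func TestFilter(t *testing.T) {
    input := "keep one\nAUDIT drop me\nkeep two\n"
    // chunkSize is 16 bytes here, larger than the longest line (14 bytes).
    buf, err := filter(strings.NewReader(input), []byte("AUDIT"), 16)
    if err != nil {
        t.Fatal(err)
    }
    want := "keep one\nkeep two\n"
    if got := buf.String(); got != want {
        t.Errorf("got %q, want %q", got, want)
    }
}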

Efficient way to use a csv.Reader() for a "chan string"

I have a "chan string", where each entry is a CSV log line that I would like to convert to columns ([]string). Currently I am (inefficiently) creating a csv.NewReader(strings.NewReader(i)) for each item, which looks like a lot more work than it really needs to be:
for i := range feederChan {
r := csv.NewReader(strings.NewReader(i))
a, err := r.Read()
if err != nil {
// log error...
continue
}
// then do stuff with 'a'
// ...
}
So I'd really appreciate it if someone could share a more efficient way to do that, like creating the csv.Reader once and then feeding it the chan content somehow (streaming the chan content to something that implements the io.Reader interface?).
Use the following to convert a channel of strings to a reader:
type chanReader struct {
c chan string
buf string
}
func (r *chanReader) Read(p []byte) (int, error) {
// Fill the buffer when we have no data to return to the caller
if len(r.buf) == 0 {
var ok bool
r.buf, ok = <-r.c
if !ok {
// Return eof on channel closed
return 0, io.EOF
}
}
n := copy(p, r.buf)
r.buf = r.buf[n:]
return n, nil
}
Use it like this:
r := csv.NewReader(&chanReader{c: feederChan})
for {
a, err := r.Read()
if err != nil {
// handle error, break out of loop
}
// do something with a
}
If the application assumes that newlines separate the values received from the channel, then append a newline to each value received:
...
var ok bool
r.buf, ok = <-r.c
if !ok {
// Return eof on channel closed
return 0, io.EOF
}
r.buf += "\n"
...
The += "\n" copies the string. If this does not meet the application's efficiency requirements, then introduce a new field to manage line separators.
type chanReader struct {
c chan string // source of lines
buf string // the current line
nl bool // true if line separator is pending
}
func (r *chanReader) Read(p []byte) (int, error) {
// Fill the buffer when we have no data to return to the caller
if len(r.buf) == 0 && !r.nl {
var ok bool
r.buf, ok = <-r.c
if !ok {
// Return eof on channel closed
return 0, io.EOF
}
r.nl = true
}
// Return data if we have it
if len(r.buf) > 0 {
n := copy(p, r.buf)
r.buf = r.buf[n:]
return n, nil
}
// No data, return the line separator
n := copy(p, "\n")
r.nl = n == 0
return n, nil
}
Another approach is to use an io.Pipe and a goroutine to convert the channel to an io.Reader, as suggested in a comment to the question. A first pass at this approach is:
var nl = []byte("\n")
func createChanReader(c chan string) io.Reader {
r, w := io.Pipe()
go func() {
defer w.Close()
for s := range c {
io.WriteString(w, s)
w.Write(nl)
}
}()
return r
}
Use it like this:
r := csv.NewReader(createChanReader(feederChan))
for {
a, err := r.Read()
if err != nil {
// handle error, break out of loop
}
// do something with a
}
This first pass at the io.Pipe solution leaks a goroutine when the application exits the loop before reading the pipe to EOF. The application might break out early because the CSV reader detected a syntax error, the application panicked because of a programmer error, or any number of other reasons.
To fix the goroutine leak, exit the writing goroutine on write error and close the pipe reader when done reading.
var nl = []byte("\n")
func createChanReader(c chan string) *io.PipeReader {
r, w := io.Pipe()
go func() {
defer w.Close()
for s := range c {
if _, err := io.WriteString(w, s); err != nil {
return
}
if _, err := w.Write(nl); err != nil {
return
}
}
}()
return r
}
Use it like this:
cr := createChanReader(feederChan)
defer cr.Close() // Required for goroutine cleanup
r := csv.NewReader(cr)
for {
a, err := r.Read()
if err != nil {
// handle error, break out of loop
}
// do something with a
}
Even though "ThunderCat's" answer was really useful and appreciated, I ended up using io.Pipe() "as mh-cbon mentioned", which is much simpler and looks more efficient (explained below):
rp, wp := io.Pipe()
go func() {
defer wp.Close()
for i := range feederChan {
fmt.Fprintln(wp, i)
}
}()
r := csv.NewReader(rp)
for { // keep reading
a, err := r.Read()
if err == io.EOF {
break
}
// do stuff with 'a'
// ...
}
io.Pipe() is synchronous and should be fairly efficient: it pipes data from a writer to a reader. I fed csv.NewReader() the reader part and created a goroutine that drains the chan, writing to the writer part.
Thanks a lot.
EDIT: ThunderCat added the io.Pipe approach to his answer (after I posted this I guess) ... his answer is much more comprehensive and was accepted as such.

Golang parse array

I am trying to figure out why my code is not working. I wish to take a slice of numbers and strings and separate it into three slices. For each element in the slice, if it is a string, append it to the strings slice; if it is a positive number, append it to the positives slice; and likewise with negatives. Yet here is the output:
Names:
EvTremblay
45.39934611083154
-75.71148292845268
[Crestview -75.73795670904249
BellevueManor -75.73886856878032
Dutchie'sHole -75.66809864107668 ...
Positives:[45.344387632924054 45.37223315413918 ... ]
Negatives: []
Here is my code. Can someone tell me what is causing the Negatives array to not have any values?
func main() {
fmt.Printf("%q\n", strings.Split("a,b,c", ","))
var names []string
var positives, negatives []float64
bs, err := ioutil.ReadFile("poolss.txt")
if err != nil {
return
}
str := string(bs)
fmt.Println(str)
tokens := strings.Split(str, ",")
for _, token := range tokens {
if num, err := strconv.ParseFloat(token, 64); err == nil {
if num > 0 {
positives = append(positives, num)
} else {
negatives = append(negatives, num)
}
} else {
names = append(names, token)
}
fmt.Println(token)
}
fmt.Println(fmt.Sprintf("Strings: %v",names))
fmt.Println(fmt.Sprintf("Positives: %v", positives))
fmt.Println(fmt.Sprintf("Negatives: %v",negatives))
for i := range names{
fmt.Println(names[i])
fmt.Println(positives[i])
fmt.Println(negatives[i])
}
}
Your code has strings as a variable name:
var strings []string
and strings as a package name:
tokens := strings.Split(str, ",")
Don't do that!
strings.Split undefined (type []string has no field or method Split)
Playground: https://play.golang.org/p/HfZGj0jOT-P
I think your problem above lies with the extra \n attached to each float: you get no negative entries if the file ends in a linefeed, and you would get one if there were no linefeed at the end. So insert a printf so that you can see the errors you're getting from strconv.ParseFloat, and all will become clear.
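For instance, a minimal sketch of that debugging step inside the existing loop (not the final fix, just making the ParseFloat errors visible):
if num, err := strconv.ParseFloat(token, 64); err == nil {
    // ... append to positives or negatives as before ...
} else {
    // Print the failing token and the error; tokens that span a line break
    // (a longitude glued to the next name) will likely show up here.
    fmt.Printf("ParseFloat(%q): %v\n", token, err)
    names = append(names, token)
}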
Some small points which may help:
Check errors, and don't depend on an error to be of only one type (this is what is confusing you here) - always print the error if it arrives, particularly when debugging
Don't use the name of a package for a variable (strings), it won't end well
Use a datastructure which reflects your data
Use the CSV package to read CSV data
So for example for storing the data you might want:
type Place struct {
Name string
Latitude float64
Longitude float64
}
Then read the data into that, relying on the columns being in a fixed order, and store it in a []Place.
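A rough, untested sketch of that (assuming a three-column CSV of name, latitude, longitude, with Latitude and Longitude as float64 since the values in the question are decimal degrees):
func readPlaces(filename string) ([]Place, error) {
    f, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    var places []Place
    r := csv.NewReader(f)
    for {
        record, err := r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            return nil, err
        }
        lat, err := strconv.ParseFloat(record[1], 64)
        if err != nil {
            return nil, err
        }
        long, err := strconv.ParseFloat(record[2], 64)
        if err != nil {
            return nil, err
        }
        places = append(places, Place{Name: record[0], Latitude: lat, Longitude: long})
    }
    return places, nil
}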
Here's what I tried, it works now! Thanks for the help, everyone!
func main() {
findRoute("poolss.csv", 5)
}
func findRoute( filename string, num int) []Edge {
var route []Edge
csvFile, err := os.Open(filename)
if err != nil {
return route
}
reader := csv.NewReader(bufio.NewReader(csvFile))
var pools []Pool
for {
line, error := reader.Read()
if error == io.EOF {
break
} else if error != nil {
log.Fatal(error)
}
lat, err := strconv.ParseFloat(line[1], 64)
long, err := strconv.ParseFloat(line[2], 64)
if err == nil {
pools = append(pools, Pool{
name: line[0],
latitude: lat,
longitude: long,
})
}
}
return route
}

How do I read in a large flat file

I have a flat file that has 339276 lines of text in it for a size of 62.1 MB. I am attempting to read in all the lines, parse them based on some conditions I have, and then insert them into a database.
I originally attempted to use a bufio.Scan() loop and bufio.Text() to get the line but I was running out of buffer space. I switched to using bufio.ReadLine/ReadString/ReadByte (I tried each) and had the same problem with each. I didn't have enough buffer space.
I tried using read and setting the buffer size, but as the documentation says it is actually a const that can be made smaller but never bigger than 64*1024 bytes. I then tried to use File.ReadAt, where I set the starting position and moved it along as I brought in each section, to no avail. I have looked at the following examples and explanations (not an exhaustive list):
Read text file into string array (and write)
How to Read last lines from a big file with Go every 10 secs
reading file line by line in go
How do I read in an entire file (either line by line or the whole thing at once) into a slice so I can then go do things to the lines?
Here is some code that I have tried:
file, err := os.Open(feedFolder + value)
handleError(err)
defer file.Close()
// fileInfo, _ := file.Stat()
var linesInFile []string
r := bufio.NewReader(file)
for {
path, err := r.ReadLine("\n") // 0x0A separator = newline
linesInFile = append(linesInFile, path)
if err == io.EOF {
fmt.Printf("End Of File: %s", err)
break
} else if err != nil {
handleError(err) // if you return error
}
}
fmt.Println("Last Line: ", linesInFile[len(linesInFile)-1])
Here is something else I tried:
var fileSize int64 = fileInfo.Size()
fmt.Printf("File Size: %d\t", fileSize)
var bufferSize int64 = 1024 * 60
bytes := make([]byte, bufferSize)
var fullFile []byte
var start int64 = 0
var interationCounter int64 = 1
var currentErr error = nil
for currentErr != io.EOF {
_, currentErr = file.ReadAt(bytes, start)
fullFile = append(fullFile, bytes...)
start = (bufferSize * interationCounter) + 1
interationCounter++
}
fmt.Printf("Err: %s\n", currentErr)
fmt.Printf("fullFile Size: %s\n", len(fullFile))
fmt.Printf("Start: %d", start)
var currentLine []string
for _, value := range fullFile {
if string(value) != "\n" {
currentLine = append(currentLine, string(value))
} else {
singleLine := strings.Join(currentLine, "")
linesInFile = append(linesInFile, singleLine)
currentLine = nil
}
}
I am at a loss. Either I don't understand exactly how the buffer works or I don't understand something else. Thanks for reading.
bufio.Scan() and bufio.Text() in a loop work perfectly for me on files of much larger size, so I suppose you have lines exceeding the buffer capacity. Then:
check your line endings
and check which Go version you use: path, err := r.ReadLine("\n") does not match the current API; func (b *bufio.Reader) ReadLine() (line []byte, isPrefix bool, err error) takes no argument and has the return value isPrefix specifically for your use case
http://golang.org/pkg/bufio/#Reader.ReadLine
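A minimal, untested sketch of that approach, accumulating ReadLine fragments until isPrefix reports a complete line (in newer Go versions, bufio.Scanner's Buffer method is another way to raise the line-length limit):
r := bufio.NewReader(file)
var linesInFile []string
var partial []byte
for {
    line, isPrefix, err := r.ReadLine()
    if err == io.EOF {
        break
    }
    if err != nil {
        handleError(err) // the question's error helper
        break
    }
    // ReadLine returns a slice into the reader's internal buffer,
    // so copy the bytes before the next read overwrites them.
    partial = append(partial, line...)
    if !isPrefix {
        linesInFile = append(linesInFile, string(partial))
        partial = partial[:0]
    }
}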
It's not clear that it's necessary to read in all the lines before parsing them and inserting them into a database. Try to avoid that.
You have a small file: "a flat file that has 339276 lines of text in it for a size of 62.1 MB." For example,
package main
import (
"bytes"
"fmt"
"io"
"io/ioutil"
)
func readLines(filename string) ([]string, error) {
var lines []string
file, err := ioutil.ReadFile(filename)
if err != nil {
return lines, err
}
buf := bytes.NewBuffer(file)
for {
line, err := buf.ReadString('\n')
if len(line) == 0 {
if err != nil {
if err == io.EOF {
break
}
return lines, err
}
}
lines = append(lines, line)
if err != nil && err != io.EOF {
return lines, err
}
}
return lines, nil
}
func main() {
// a flat file that has 339276 lines of text in it for a size of 62.1 MB
filename := "flat.file"
lines, err := readLines(filename)
fmt.Println(len(lines))
if err != nil {
fmt.Println(err)
return
}
}
It seems to me this variant of readLines is shorter and faster than the one peterSO suggested:
func readLines(filename string) (map[int]string, error) {
lines := make(map[int]string)
data, err := ioutil.ReadFile(filename)
if err != nil {
return nil, err
}
for n, line := range strings.Split(string(data), "\n") {
lines[n] = line
}
return lines, nil
}
package main
import (
"fmt"
"os"
"log"
"bufio"
)
func main() {
FileName := "assets/file.txt"
file, err := os.Open(FileName)
if err != nil {
log.Fatal(err)
}
defer file.Close()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
fmt.Println(scanner.Text())
}
}
