Getting IP addresses from big nfcapd binary files - go

I need to get information about source IPs and destination IPs from an nfcapd binary file. The problem is the file's size. I know it is not desirable to read a very large file (more than 1 GB) into memory in one go with the io or os packages.
Here is my rough first draft:
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"time"

	"github.com/tehmaze/netflow/netflow5"
)

type Message interface{}

func main() {
	startTime := time.Now()
	getFile := os.Args[1]
	processFile(getFile)
	endTime := time.Since(startTime)
	log.Printf("Program executes in %s", endTime)
}

func processFile(fileName string) {
	file, err := os.Open(fileName)
	// If the file could not be opened, report the error and exit
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	// Close the file once processFile returns
	defer file.Close()
	Read(file)
}

func Read(r io.Reader) (Message, error) {
	data := [2]byte{}
	if _, err := r.Read(data[:]); err != nil {
		return nil, err
	}
	buffer := bytes.NewBuffer(data[:])
	mr := io.MultiReader(buffer, r)
	return netflow5.Read(mr)
}
I want to split the file into chunks of 24 flows and process them concurrently after reading them with the netflow package, but I can't see how to do that without losing data at the chunk boundaries.
Please correct me if I have missed something in the code or the description. I have spent a lot of time searching the web for a solution and thinking about other possible implementations.
Any help and/or advice will be highly appreciated.
The file has the following properties (output of file -I <file_name> in the terminal):
file_name: application/octet-stream; charset=binary
The output of nfdump -r <file_name> has this structure:
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
Every property is in its own column.
UPDATE 1:
Unfortunately, it is impossible to parse the file with the netflow package because the binary structure differs once nfcapd has saved it to disk. This answer was given by one of the nfdump contributors.
For now, the only way is to run nfdump from within the Go program, as pynfdump does.
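For reference, a minimal sketch of shelling out to nfdump from Go might look like the following (it assumes nfdump is on PATH and that your version supports -r and the -o csv output mode; treat it as a starting point, not a finished tool):
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"os/exec"
)

func main() {
	// Run nfdump against the capture file and stream its CSV output line by line.
	cmd := exec.Command("nfdump", "-r", os.Args[1], "-o", "csv")
	out, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	scanner := bufio.NewScanner(out)
	for scanner.Scan() {
		fmt.Println(scanner.Text()) // parse the source/destination IP columns here
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
}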
Another possible solution in the future is to use gopacket.

IO is almost always going to be the limiting factor when parsing a file, and unless heavy computation is involved, reading a single file serially is the fastest way to process it.
Wrap the file in a bufio.Reader and give it to the Read function:
file, err := os.Open(fileName)
if err != nil {
	log.Fatal(err)
}
defer file.Close()

packet, err := netflow5.Read(bufio.NewReader(file))
Once it's parsed, you can then split up the records if you need to handle the chunks separately.
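For example, a minimal sketch of processing the records in chunks of 24 across goroutines (this assumes the parsed packet exposes its flows as a slice, e.g. a Records field in the netflow5 package; check the actual field names, and add "sync" to the imports):
const chunkSize = 24

records := packet.Records // assumed field name; see the netflow5 package docs
var wg sync.WaitGroup
for start := 0; start < len(records); start += chunkSize {
	end := start + chunkSize
	if end > len(records) {
		end = len(records)
	}
	wg.Add(1)
	go func(start, end int) {
		defer wg.Done()
		for _, rec := range records[start:end] {
			_ = rec // extract the source and destination IPs from each record here
		}
	}(start, end)
}
wg.Wait()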

Related

Getting `panic: os: invalid use of WriteAt on file opened with O_APPEND`

I am a newbie to Go and was starting to write my first program, in which I have to download a bunch of CSVs from AWS. I don't understand why it gives me the error below when I use the os.O_APPEND mode. If I remove os.O_APPEND, I only get the last file's data, which is not the objective.
The objective is to download all the CSV files into one local file. I'd like to understand what I'm doing incorrectly.
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

const (
	AccessKeyId     = "xxxxxxxxx"
	SecretAccessKey = "xxxxxxxxxxxxxxxxxxxx"
	Region          = "eu-central-1"
	Bucket          = "dexter-reports"
	bucketKey       = "Jenkins/pluginVersions/"
)

func main() {
	// Load the Shared AWS Configuration
	os.Setenv("AWS_ACCESS_KEY_ID", AccessKeyId)
	os.Setenv("AWS_SECRET_ACCESS_KEY", SecretAccessKey)

	filename := "JenkinsPluginDetais.txt"
	cred := credentials.NewStaticCredentials(AccessKeyId, SecretAccessKey, "")
	config := aws.Config{Credentials: cred, Region: aws.String(Region), Endpoint: aws.String("s3.amazonaws.com")}

	file, err := os.OpenFile(filename, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0666)
	if err != nil {
		panic(err)
	}
	defer file.Close()

	sess, err := session.NewSession(&config)
	if err != nil {
		fmt.Println(err)
	}

	// List bucket objects
	ObjectList := listBucketObjects(sess)

	// Loop over the object list. First initialize the s3 downloader via s3manager
	downloader := s3manager.NewDownloader(sess)
	for _, item := range ObjectList.Contents {
		csvFile := filepath.Base(*item.Key)
		if csvFile != "pluginVersions" {
			downloadBucketObjects(downloader, file, csvFile)
		}
	}
}

func listBucketObjects(sess *session.Session) *s3.ListObjectsV2Output {
	// Create a new S3 client
	svc := s3.New(sess)
	resp, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String(Bucket),
		Prefix: aws.String(bucketKey),
	})
	if err != nil {
		panic(err)
	}
	return resp
}

func downloadBucketObjects(downloader *s3manager.Downloader, file *os.File, keyobj string) {
	fileToDownload := bucketKey + keyobj
	numBytes, err := downloader.Download(file,
		&s3.GetObjectInput{
			Bucket: aws.String(Bucket),
			Key:    aws.String(fileToDownload),
		})
	if err != nil {
		panic(err)
	}
	fmt.Println("Downloaded", file.Name(), numBytes, "bytes")
}
First, I don't see why you need the os.O_APPEND flag in the first place. As far as I can tell, you can simply omit it.
Now, let's get to why this is actually happening:
Doc for O_APPEND (Ref: https://man7.org/linux/man-pages/man2/open.2.html):
O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as
if with lseek(2). The modification of the file offset and
the write operation are performed as a single atomic step.
So for every call to write the file offset is positioned at the end of the file.
But (*s3manager.Downloader).Download presumably uses the WriteAt method, i.e.,
Doc for WriteAt:
$ go doc os WriteAt
package os // import "os"
func (f *File) WriteAt(b []byte, off int64) (n int, err error)
WriteAt writes len(b) bytes to the File starting at byte offset off. It
returns the number of bytes written and an error, if any. WriteAt returns a
non-nil error when n != len(b).
If file was opened with the O_APPEND flag, WriteAt returns an error.
Notice the last line: if the file was opened with the O_APPEND flag, WriteAt returns an error. This is reasonable, because WriteAt's second argument is an explicit offset, and mixing O_APPEND's offset behaviour with WriteAt's offset seeking could produce unexpected results, so it errors out instead.
Consider the definition of s3manager.Downloader's Download method:
func (d Downloader) Download(w io.WriterAt, input *s3.GetObjectInput, options ...func(*Downloader)) (n int64, err error)
The first argument is an io.WriterAt; this interface is:
type WriterAt interface {
	WriteAt(p []byte, off int64) (n int, err error)
}
This means that the Download function is going to call the WriteAt method on the File you are passing it. As per the documentation for File.WriteAt:
If file was opened with the O_APPEND flag, WriteAt returns an error.
So this explains why you are getting the error but raises the question "why is Download using WriteAt and not accepting an io.Writer (and calling Write)?"; the answer can be found in the documentation:
The w io.WriterAt can be satisfied by an os.File to do multipart concurrent downloads, or in memory []byte wrapper using aws.WriteAtBuffer
So, to increase performance, Downloader may make multiple simultaneous requests for parts of the file and then write these out as they are received (meaning it may not write the data in order). This also explains why calling the function multiple times with the same File results in overwritten data (when Downloader retrieves each chunk of the file, it writes it out at the appropriate position in the output file; this overwrites any data already there).
The above quote from the documentation also points to a possible solution; use an aws.WriteAtBuffer and, once the download is finished, write the data to your file (which could then be opened with O_APPEND) - something like this:
buf := aws.NewWriteAtBuffer([]byte{})

numBytes, err := downloader.Download(buf,
	&s3.GetObjectInput{
		Bucket: aws.String(Bucket),
		Key:    aws.String(fileToDownload),
	})
if err != nil {
	panic(err)
}

_, err = file.Write(buf.Bytes())
if err != nil {
	panic(err)
}
An alternative would be to download into a temporary file and then append that to your output file (you may need to do this if the files are large).
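A rough sketch of that alternative, reusing downloader, file and fileToDownload from above (names are illustrative; it needs the io, io/ioutil and os imports):
// Download into a temporary file (which supports WriteAt), then append it.
tmp, err := ioutil.TempFile("", "s3dl")
if err != nil {
	panic(err)
}
defer os.Remove(tmp.Name())
defer tmp.Close()

_, err = downloader.Download(tmp, &s3.GetObjectInput{
	Bucket: aws.String(Bucket),
	Key:    aws.String(fileToDownload),
})
if err != nil {
	panic(err)
}

// Rewind and copy the temporary file into the output file,
// which can now safely be opened with O_APPEND.
if _, err := tmp.Seek(0, io.SeekStart); err != nil {
	panic(err)
}
if _, err := io.Copy(file, tmp); err != nil {
	panic(err)
}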

How to detect deleted file?

Writing to a non-existent file does not produce an error in Go.
For example, here's a sample program writing to a file in a loop:
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	f, err := os.OpenFile("mytest.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	for {
		n, err := f.WriteString("blah\n")
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("wrote %d bytes\n", n)
		time.Sleep(2 * time.Second)
	}
}
While this is running, I issue rm mytest.log from the command line and observe that the program does not produce an error on the next call to WriteString(). (I tested on Linux, it may be different for other OS's)
Is there a way to detect if the file was deleted (other than doing a stat on the file before every write)? And presumably the bytes written are simply discarded by the operating system?
While this is running, I issue rm mytest.log from the command line and observe that the program does not produce an error on the next call to WriteString()
Yes, that's exactly the specified behaviour. Also, the file hasn't actually been removed: the only thing rm removes is that particular path entry in the filesystem. A single file can have multiple paths, also called hard links.
The actual file is deleted only when the last reference to it, either a filesystem entry (link) or an open file descriptor in a program, has been closed.
This particular behaviour of the Unix file model was used for a long time to implement "unnamed" shared memory, by creating and opening a file in /dev/shm and then removing the filesystem entry. Because this way of doing things introduces a race condition, new syscalls were later introduced for security-sensitive applications that allow creating anonymous memory maps, and very recently Linux even gained the ability to create a file in a filesystem without creating a path entry (open with the O_TMPFILE flag).
On more recent versions of Linux you can even re-create filesystem entries for files whose last entry has already been removed, using the linkat syscall.
Update
The question is: do you really want to error out when the last filesystem entry vanishes? It is not an error condition as such; you can still safely write and read without problems. Just be aware that once you close the last file descriptor to the file, it will be lost.
It is perfectly possible to detect whether the last filesystem entry has been removed and to abort file operations if so. However, be aware that such code can introduce its own problems, for example if the program expects to create a new filesystem entry with linkat once everything has been written to the file properly.
Anyway, what you can do is fstat the file (file.Stat in Go) and look at the number of hard links it has. If that number drops to zero, all filesystem entries are gone. Actually getting at that number in Go is a little tricky; it is described in Counting hard links to a file in Go.
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
	"time"
)

func main() {
	fmt.Println("Test Operation")

	f, err := os.OpenFile("test.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}

	for {
		n, err := f.WriteString("blah\n")
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("wrote %d bytes\n", n)
		time.Sleep(2 * time.Second)

		stat, err := f.Stat()
		if err != nil {
			log.Fatal(err)
		}
		if sys := stat.Sys(); sys != nil {
			if stat, ok := sys.(*syscall.Stat_t); ok {
				nlink := uint64(stat.Nlink)
				if nlink == 0 {
					log.Printf("All filesystem entries to original file removed, exiting")
					break
				}
			}
		}
	}
}

Using Go-IPFS Programmatically

I would love to be able to use Go-IPFS within my Go program; however, it is totally undocumented. This is where my research led me:
package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"

	"gx/ipfs/QmSP88ryZkHSRn1fnngAaV2Vcn63WUJzAavnRM9CVdU1Ky/go-ipfs-cmdkit/files"

	"github.com/ipfs/go-ipfs/core"
	"github.com/ipfs/go-ipfs/core/coreunix"
)

func main() {
	ctx := context.Background()

	node, err := core.NewNode(ctx, &core.BuildCfg{})
	if err != nil {
		log.Fatalf("Failed to start IPFS node: %v", err)
	}

	reader, err := coreunix.Cat(ctx, node, "QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB")
	if err != nil {
		log.Fatalf("Failed to look up IPFS welcome page: %v", err)
	}

	blob, err := ioutil.ReadAll(reader)
	if err != nil {
		log.Fatalf("Failed to retrieve IPFS welcome page: %v", err)
	}

	fmt.Println(string(blob))
}
However, I am not sure about the difference between
context.Background(), context.TODO() and context.WithCancel(context.Background()).
And most importantly, how can I choose where IPFS will put the repo, and make sure it also gets initialized?
How can I enable and use Pubsub along with its commands subscribe and publish?
How can I add and pin a file to IPFS with the possibility to also input a stream for big files?
Is coreunix.Cat suitable to read files with a stream as well?
How can I keep the node "listening", like when you run the ipfs daemon from the CLI, with everything running on all the ports (webui, swarm, etc.)?
How about this to add files? Does this use streams, or does it read the entire file into memory? How can it be improved?
func addFile(ctx *context.Context, node *core.IpfsNode, path *string) error {
	file, err := os.Open(*path)
	if err != nil {
		return err
	}

	adder, err := coreunix.NewAdder(*ctx, node.Pinning, node.Blockstore, node.DAG)
	if err != nil {
		return err
	}

	filename := filepath.Base(*path)
	fileReader := files.NewReaderFile(filename, filename, file, nil)

	adder.AddFile(fileReader)
	adder.PinRoot()

	return nil
}
You may want to break your question down into smaller pieces. I have been playing with the go-ipfs source code for a while, and here is the general guidance I can give you:
Most of the data structures, such as context, DAG, IpfsNode and so on, are defined as Go structs, and you should be able to find them in the gx/.../... directory, along with detailed information about each field used (a simple string search through the directory should get you to the source file you need).
The methods are defined under the github.com/... directory.
Get comfortable with pointers, as the code passes parameters to functions by pointer most of the time.

Golang reading from serial

I'm trying to read from a serial port (a GPS device on a Raspberry Pi).
Following the instructions from http://www.modmypi.com/blog/raspberry-pi-gps-hat-and-python
I can read from shell using
stty -F /dev/ttyAMA0 raw 9600 cs8 clocal -cstopb
cat /dev/ttyAMA0
I get well-formatted output:
$GNGLL,5133.35213,N,00108.27278,W,160345.00,A,A*65
$GNRMC,160346.00,A,5153.35209,N,00108.27286,W,0.237,,290418,,,A*75
$GNVTG,,T,,M,0.237,N,0.439,K,A*35
$GNGGA,160346.00,5153.35209,N,00108.27286,W,1,12,0.67,81.5,M,46.9,M,,*6C
$GNGSA,A,3,29,25,31,20,26,23,21,16,05,27,,,1.11,0.67,0.89*10
$GNGSA,A,3,68,73,83,74,84,75,85,67,,,,,1.11,0.67,0.89*1D
$GPGSV,4,1,15,04,,,34,05,14,040,21,09,07,330,,16,45,298,34*40
$GPGSV,4,2,15,20,14,127,18,21,59,154,30,23,07,295,26,25,13,123,22*74
$GPGSV,4,3,15,26,76,281,40,27,15,255,20,29,40,068,19,31,34,199,33*7C
$GPGSV,4,4,15,33,29,198,,36,23,141,,49,30,172,*4C
$GLGSV,3,1,11,66,00,325,,67,13,011,20,68,09,062,16,73,12,156,21*60
$GLGSV,3,2,11,74,62,177,20,75,53,312,36,76,08,328,,83,17,046,25*69
$GLGSV,3,3,11,84,75,032,22,85,44,233,32,,,,35*62
$GNGLL,5153.35209,N,00108.27286,W,160346.00,A,A*6C
$GNRMC,160347.00,A,5153.35205,N,00108.27292,W,0.216,,290418,,,A*7E
$GNVTG,,T,,M,0.216,N,0.401,K,A*3D
$GNGGA,160347.00,5153.35205,N,00108.27292,W,1,12,0.67,81.7,M,46.9,M,,*66
$GNGSA,A,3,29,25,31,20,26,23,21,16,05,27,,,1.11,0.67,0.89*10
$GNGSA,A,3,68,73,83,74,84,75,85,67,,,,,1.11,0.67,0.89*1D
$GPGSV,4,1,15,04,,,34,05,14,040,21,09,07,330,,16,45,298,34*40
(I've put some random data in)
I'm trying to read this in Go. Currently, I have
package main

import (
	"fmt"
	"log"

	"github.com/tarm/serial"
)

func main() {
	config := &serial.Config{
		Name:        "/dev/ttyAMA0",
		Baud:        9600,
		ReadTimeout: 1,
		Size:        8,
	}

	stream, err := serial.OpenPort(config)
	if err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 1024)
	for {
		n, err := stream.Read(buf)
		if err != nil {
			log.Fatal(err)
		}
		s := string(buf[:n])
		fmt.Println(s)
	}
}
But this prints malformed data. I suspect that this is due to the buffer size or the value of Size in the config struct being wrong, but I'm not sure how to get those values from the stty settings.
Looking back, I think the issue is that I'm reading a stream and I want to iterate over lines of output, rather than over arbitrary chunks. This is how the stream comes out:
$GLGSV,3
,1,09,69
,10,017,
,70,43,0
69,,71,3
2,135,27
,76,23,2
32,22*6F
$GLGSV
,3,2,09,
77,35,30
0,21,78,
11,347,,
85,31,08
1,30,86,
72,355,3
6*6C
$G
LGSV,3,3
,09,87,2
4,285,30
*59
$GN
GLL,5153
.34919,N
,00108.2
7603,W,1
92901.00
,A,A*6A
The struct you get back from serial.OpenPort() contains a pointer to an open os.File corresponding to the opened serial port connection. When you Read() from this, the library calls Read() on the underlying os.File.
The documentation for this function call is:
Read reads up to len(b) bytes from the File. It returns the number of bytes read and any error encountered. At end of file, Read returns 0, io.EOF.
This means you have to keep track of how much data was read. You also have to keep track of whether there were newlines, if this is important to you. Unfortunately, the underlying *os.File is not exported, so you'll find it difficult to use tricks like bufio.ReadLine(). It may be worth modifying the library and sending a pull request.
As Matthew Rankin noted in a comment, Port implements io.ReadWriter so you can simply use bufio to read by lines.
stream, err := serial.OpenPort(config)
if err != nil {
	log.Fatal(err)
}

scanner := bufio.NewScanner(stream)
for scanner.Scan() {
	fmt.Println(scanner.Text()) // Println will add back the final '\n'
}
if err := scanner.Err(); err != nil {
	log.Fatal(err)
}
Change
fmt.Println(s)
to
fmt.Print(s)
and you will probably get what you want.
Or did I misunderstand the question?
Two additions to Michael Hampton's answer which can be useful:
line endings
You might receive data that is not newline-separated text. bufio.Scanner uses ScanLines by default to split the received data into lines - but you can also write your own line splitter based on the default function's signature and set it for the scanner:
scanner := bufio.NewScanner(stream)
scanner.Split(ownLineSplitter) // set custom line splitter function
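For example, a possible ownLineSplitter could treat both '\r' and '\n' as terminators (NMEA sentences end in "\r\n"). This is only a sketch following the bufio.SplitFunc signature, and it needs the bytes import:
// ownLineSplitter splits on '\r' or '\n'; empty tokens may appear for
// "\r\n" pairs, and the consumer can simply skip them.
func ownLineSplitter(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		return i + 1, data[:i], nil // a line without its terminator
	}
	if atEOF {
		return len(data), data, nil // final, unterminated chunk
	}
	return 0, nil, nil // request more data
}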
reader shutdown
You might not receive a constant stream but only some packets of bytes from time to time. If no bytes arrive at the port, the scanner will block and you can't just kill it. You'll have to close the stream to do so, effectively raising an error. To not block any outer loops and handle errors appropriately, you can wrap the scanner in a goroutine that takes a context. If the context was cancelled, ignore the error, otherwise forward the error. In principle, this can look like
var errChan = make(chan error)
var dataChan = make(chan []byte)

ctx, cancelPortScanner := context.WithCancel(context.Background())
go func(ctx context.Context) {
	scanner := bufio.NewScanner(stream)
	for scanner.Scan() { // will terminate if connection is closed
		dataChan <- scanner.Bytes()
	}
	// if execution reaches this point, something went wrong or stream was closed
	select {
	case <-ctx.Done():
		return // ctx was cancelled, just return without error
	default:
		errChan <- scanner.Err() // ctx wasn't cancelled, forward error
	}
}(ctx)
// handle data from dataChan, error from errChan
To stop the scanner, you would cancel the context and close the connection:
cancelPortScanner()
stream.Close()

How to edit a reader in Go

I'm trying to work out what the best practice is for changing some data in a stream without ioutil.ReadAll.
I need to remove lines beginning with a certain character and strip all instances of another.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"

	"gopkg.in/pg.v3"
)

func main() {
	fieldSep := "\x01"
	badChar := "\x02"
	comment := "#"
	dbName := "foo"

	db := pg.Connect(&pg.Options{})

	file, err := os.Open("/path/to/file")
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
	}
	defer file.Close()

	// I need to iterate my file Reader here
	// all lines that begin with comment and remove them
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		file := bytes.TrimRight(file, comment)
	}

	// all instances of badChar should be dropped
	file := bytes.Trim(file, badChar)

	_, err = db.CopyFrom(file, fmt.Sprintf("COPY %s FROM STDIN WITH DELIMITER e'%s'", dbName, fieldSep))
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
	}

	err = db.Close()
	if err != nil {
		fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
	}

	fmt.Println("Import Done")
}
Context:
I'm importing a large amount (>10 GB) of data into a database; it's spread across several files.
My database interface accepts a reader to load the data.
The data has non-standard line endings and I need to strip comments (because PG's COPY FROM is no fun).
I know the code I've got to edit the stream is woeful; I just can't find a good reference. Thanks!
If I were in your position, I'd make my own Reader and insert it between the source and the destination. That's what consistent interfaces are for. Your reader would work easily on the small chunks of data as they flow past.
Source (io.Reader)  ==>  Your filter (io.Reader)    ==>  Destination (expects an io.Reader)
provides the data        does the transformations        rock'n'rolls
A library example of such a reader, made to be inserted between a reader and its client, is bufio.Reader: it speeds up many kinds of readers by making larger, buffered calls to the source, while letting the client consume the data in small pieces if it likes. You can check out its source: http://golang.org/src/bufio/bufio.go
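For the case above, a sketch of such a filtering reader could be built on io.Pipe and bufio.Scanner (it assumes individual lines fit in the scanner's buffer and that Unix newlines are acceptable to COPY; the function and variable names are illustrative):
// newFilterReader streams src through a pipe, dropping lines that start
// with comment and removing every occurrence of badChar on the way.
// Imports needed: bufio, io, strings.
func newFilterReader(src io.Reader, comment, badChar string) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		scanner := bufio.NewScanner(src)
		for scanner.Scan() {
			line := scanner.Text()
			if strings.HasPrefix(line, comment) {
				continue // skip comment lines entirely
			}
			line = strings.Replace(line, badChar, "", -1)
			if _, err := io.WriteString(pw, line+"\n"); err != nil {
				pw.CloseWithError(err)
				return
			}
		}
		pw.CloseWithError(scanner.Err()) // a nil error closes the pipe normally
	}()
	return pr
}
db.CopyFrom(newFilterReader(file, comment, badChar), ...) would then consume the filtered stream without ever buffering the whole file in memory.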
