I would love to be able to use go-ipfs within my Go program; however, it is totally undocumented. This is where my research led me:
package main

import (
    "context"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"

    "gx/ipfs/QmSP88ryZkHSRn1fnngAaV2Vcn63WUJzAavnRM9CVdU1Ky/go-ipfs-cmdkit/files"

    "github.com/ipfs/go-ipfs/core"
    "github.com/ipfs/go-ipfs/core/coreunix"
)

func main() {
    ctx := context.Background()

    node, err := core.NewNode(ctx, &core.BuildCfg{})
    if err != nil {
        log.Fatalf("Failed to start IPFS node: %v", err)
    }

    reader, err := coreunix.Cat(ctx, node, "QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB")
    if err != nil {
        log.Fatalf("Failed to look up IPFS welcome page: %v", err)
    }

    blob, err := ioutil.ReadAll(reader)
    if err != nil {
        log.Fatalf("Failed to retrieve IPFS welcome page: %v", err)
    }

    fmt.Println(string(blob))
}
However, I am not sure about the difference between context.Background(), context.TODO(), and context.WithCancel(context.Background()).
And most importantly, how can I choose where IPFS will put the repo, and make sure it also gets initialized?
How can I enable and use Pubsub along with its commands subscribe and publish?
How can I add and pin a file to IPFS with the possibility to also input a stream for big files?
Is coreunix.Cat suitable to read files with a stream as well?
How can I keep the node "listening" like when you run the ipfs daemon from the CLI, with everything running on all the ports (webui, swarm, etc.)?
How about the following to add files? Does it use streams, or does it read the entire file into memory? How can it be improved?
func addFile(ctx *context.Context, node *core.IpfsNode, path *string) error {
    file, err := os.Open(*path)
    if err != nil {
        return err
    }

    adder, err := coreunix.NewAdder(*ctx, node.Pinning, node.Blockstore, node.DAG)
    if err != nil {
        return err
    }

    filename := filepath.Base(*path)
    fileReader := files.NewReaderFile(filename, filename, file, nil)

    adder.AddFile(fileReader)
    adder.PinRoot()

    return nil
}
You may want to break your question down into smaller pieces. I have been playing with the go-ipfs source code for a while, and here is the general guidance I can give you:

Most of the data structures, like context, DAG, IpfsNode, and so on, are defined as Go structs, and you should be able to find them in the gx/.../... directory, where you will also find detailed information about each field (a simple string search through the directory should get you to the source file you need).

The methods are defined under the github.com/../.. directory.

Get clear on the concept of pointers, as the codebase passes parameters to functions by pointer most of the time, as in the tiny sketch below.
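A generic illustration of that convention (bump is a made-up function; nothing here is go-ipfs API): passing a pointer lets the callee mutate the caller's value, which is why you see *core.IpfsNode and friends everywhere.

package main

import "fmt"

// bump receives a pointer, so the increment is visible to the caller.
func bump(counter *int) {
    *counter++
}

func main() {
    n := 0
    bump(&n)
    fmt.Println(n) // prints 1
}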
In Go, I am trying to write data to a temp file that I then turn around and read, but I have not been successful. Below is a stripped-down test program. I have verified that the data are being written to the file by inspecting the temporary file, so at least I know the data are making it into the file; I am just unable to read them back out.
Thank you in advance for your help.
package main

import (
    "bufio"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
)

func main() {
    tmpFile, err := ioutil.TempFile("", fmt.Sprintf("%s-", filepath.Base(os.Args[0])))
    if err != nil {
        log.Fatal("Could not create temporary file", err)
    }
    fmt.Println("Created temp file: ", tmpFile.Name())
    // defer os.Remove(tmpFile.Name())

    fmt.Println("Writing some data to the temp file")
    if _, err = tmpFile.WriteString("test data"); err != nil {
        log.Fatal("Unable to write to temporary file", err)
    } else {
        fmt.Println("data should have been written")
    }

    fmt.Println("Trying to read the temp file now")
    s := bufio.NewScanner(tmpFile)
    for s.Scan() {
        fmt.Println(s.Text())
    }
    err = s.Err()
    if err != nil {
        log.Fatal("error reading temp file", err)
    }
}
ioutil.TempFile creates a temp file, opens it for reading and writing, and returns the resulting *os.File (file descriptor). As you write to the file, the offset advances, so it is currently sitting at the end of the file.

But since your requirement is to read from the file, you need to Seek back to the beginning (or to whatever offset you want) using the *os.File.Seek method. So adding tmpFile.Seek(0, 0) will give you the desired behaviour.

Also, as good practice, do not forget to close the file. Notice I've used defer tmpFile.Close(), which closes the file before exiting.

Refer to the following example:
package main

import (
    "bufio"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
)

func main() {
    tmpFile, err := ioutil.TempFile("", fmt.Sprintf("%s-", filepath.Base(os.Args[0])))
    if err != nil {
        log.Fatal("Could not create temporary file", err)
    }
    defer tmpFile.Close()

    fmt.Println("Created temp file: ", tmpFile.Name())

    fmt.Println("Writing some data to the temp file")
    if _, err = tmpFile.WriteString("test data"); err != nil {
        log.Fatal("Unable to write to temporary file", err)
    } else {
        fmt.Println("Data should have been written")
    }

    fmt.Println("Trying to read the temp file now")

    // Seek the pointer to the beginning
    tmpFile.Seek(0, 0)

    s := bufio.NewScanner(tmpFile)
    for s.Scan() {
        fmt.Println(s.Text())
    }
    if err = s.Err(); err != nil {
        log.Fatal("error reading temp file", err)
    }
}
Update:

Comment from OP:

Is the deferred close needed given that deleting the actual file is also deferred? If so, I imagine order of deferral would matter.

That's a nice question. The basic rule of thumb is to close the file first and then remove it. It might even be possible to delete it first and close it later, but that is OS-dependent.

If you refer to the C++ documentation for remove:
If the file is currently open by the current or another process, the behavior of this function is implementation-defined (in particular, POSIX systems unlink the file name, although the file system space is not reclaimed even if this was the last hardlink to the file until the last running process closes the file, Windows does not allow the file to be deleted)
So on Windows, trying to delete the file before closing it is definitely a problem.
So, since defers are stacked (last in, first out), the order of execution would be:

defer os.Remove(tmpFile.Name()) // Called 2nd
defer tmpFile.Close()           // Called 1st
According to the official golang docs, reading a whole file into memory looks like this:

package main

import (
    "fmt"
    "io/ioutil"
    "log"
)

func main() {
    content, err := ioutil.ReadFile("testdata/hello")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("File contents: %s", content)
}
I am trying to make a program that checks for file duplicates based on their md5 checksums.
I am not really sure whether I am missing something or not, but this function, while reading the XCode installer app (which is about 8 GB), uses 16 GB of RAM:
func search() {
    unique := make(map[string]string)

    files, err := ioutil.ReadDir(".")
    if err != nil {
        log.Println(err)
    }

    for _, file := range files {
        fileName := file.Name()
        fmt.Println("CHECKING:", fileName)
        fi, err := os.Stat(fileName)
        if err != nil {
            fmt.Println(err)
            continue
        }
        if fi.Mode().IsRegular() {
            data, err := ioutil.ReadFile(fileName)
            if err != nil {
                fmt.Println(err)
                continue
            }
            sum := md5.Sum(data)
            hexDigest := hex.EncodeToString(sum[:])
            if _, ok := unique[hexDigest]; !ok {
                unique[hexDigest] = fileName
            } else {
                fmt.Println("DUPLICATE:", fileName)
            }
        }
    }
}
As far as I can tell from my debugging, the issue is with the file reading.
Is there a better approach?
Thanks
There is an example in the Golang documentation that covers your case:
package main

import (
    "crypto/md5"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("file.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    h := md5.New()
    if _, err := io.Copy(h, f); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("%x", h.Sum(nil))
}
For your case, just make sure to close the files inside the loop rather than deferring them, or put the logic into its own function, as sketched below.
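Here is a minimal sketch of that second option; hashFile is a made-up helper name, but every call in it is standard library:

package main

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// hashFile streams the file through the hash, so memory use stays
// constant regardless of the file's size. Because it is its own
// function, the deferred Close runs as soon as each file is hashed,
// not at program exit.
func hashFile(name string) (string, error) {
    f, err := os.Open(name)
    if err != nil {
        return "", err
    }
    defer f.Close()

    h := md5.New()
    if _, err := io.Copy(h, f); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
    sum, err := hashFile("file.txt")
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(sum)
}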
Sounds like the 16GB RAM is your problem, not speed per se.
Don't read the entire file into a variable with ReadFile; io.Copy from the Reader that Open gives you to the Writer that hash/md5 provides (md5.New returns a hash.Hash, which embeds an io.Writer). That only copies a little bit at a time instead of pulling all of the file into RAM.
This is a trick useful in a lot of places in Go; packages like text/template, compress/gzip, net/http, etc. work in terms of Readers and Writers. With them, you don't usually need to create huge []bytes or strings; you can hook I/O interfaces up to each other and let them pass around pieces of content for you. In a garbage collected language, saving memory tends to save you CPU work as well.
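As one hedged illustration of hooking such interfaces together (the file name data.gz is made up): hashing the decompressed contents of a gzip file without ever holding them in memory.

package main

import (
    "compress/gzip"
    "crypto/md5"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("data.gz") // hypothetical input file
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    zr, err := gzip.NewReader(f) // a Reader that decompresses on the fly
    if err != nil {
        log.Fatal(err)
    }
    defer zr.Close()

    h := md5.New()
    if _, err := io.Copy(h, zr); err != nil { // stream: file -> gunzip -> hash
        log.Fatal(err)
    }
    fmt.Printf("%x\n", h.Sum(nil))
}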
I need to get information about source and destination IPs from an nfcapd binary file. The problem is the file's size. I know it is not desirable to open and read very large files (more than 1 GB) with the io or os packages.
Here is my rough first draft:
package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "os"
    "time"

    "github.com/tehmaze/netflow/netflow5"
)

type Message interface{}

func main() {
    startTime := time.Now()
    getFile := os.Args[1]
    processFile(getFile)
    endTime := time.Since(startTime)
    log.Printf("Program executes in %s", endTime)
}

func processFile(fileName string) {
    file, err := os.Open(fileName)
    // Exit if the file could not be opened
    if err != nil {
        fmt.Println(err)
        os.Exit(1)
    }
    // Close the file once we are done with it
    defer file.Close()

    Read(file)
}

func Read(r io.Reader) (Message, error) {
    data := [2]byte{}
    if _, err := r.Read(data[:]); err != nil {
        return nil, err
    }
    buffer := bytes.NewBuffer(data[:])
    mr := io.MultiReader(buffer, r)
    return netflow5.Read(mr)
}
I want to split the file into chunks of 24 flows and process them concurrently after reading them with the netflow package, but I cannot see how to do that without losing data at the chunk boundaries.
Please correct me if I missed something in the code or the description. I have spent a lot of time searching the web for a solution and thinking about other possible implementations.
Any help and/or advice will be highly appreciated.
The file has the following properties (from the command file -I <file_name> in the terminal):
file_name: application/octet-stream; charset=binary
The output of nfdump -r <file_name> has this structure:
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
Every property is in its own column.
UPDATE 1:

Unfortunately, it is impossible to parse the file with the netflow package, due to a difference in the binary file structure after it is saved to disk via nfcapd. This answer was given by one of the nfdump contributors.
The only way for now is to run nfdump from the terminal inside the Go program, like pynfdump does.
Another possible solution in the future is to use gopacket.
IO is almost always going to be the limiting factor when parsing a file, and unless there is heavy computation involved, reading a single file serially is going to be the fastest way to process it.
Wrap the file in a bufio.Reader and give it to the Read function:
file, err := os.Open(fileName)
if err != nil {
    log.Fatal(err)
}
defer file.Close()

packet, err := netflow5.Read(bufio.NewReader(file))
Once it's parsed, you can then split up the records if you need to handle the chunks separately; a generic sketch of such a fan-out follows.
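A minimal sketch of fanning already-parsed records out to a fixed pool of worker goroutines. fanOut and process are made-up names, and the element type is left as interface{} because the exact record type exposed by the netflow package is not shown in the question:

package main

import (
    "fmt"
    "sync"
)

// fanOut distributes items across a fixed number of workers. Nothing
// is lost at "chunk" boundaries because the records are fully parsed
// before they are handed out.
func fanOut(records []interface{}, workers int, process func(interface{})) {
    ch := make(chan interface{})
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for r := range ch {
                process(r)
            }
        }()
    }
    for _, r := range records {
        ch <- r
    }
    close(ch)
    wg.Wait()
}

func main() {
    records := []interface{}{"flow-1", "flow-2", "flow-3"} // stand-ins for parsed flows
    fanOut(records, 2, func(r interface{}) {
        fmt.Println("processing", r)
    })
}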
I'm trying to work out the best practice for changing some data in a stream without ioutil.ReadAll.
I need to remove lines beginning with a certain character and strip all instances of another.
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"

    "gopkg.in/pg.v3"
)

func main() {
    fieldSep := "\x01"
    badChar := "\x02"
    comment := "#"
    dbName := "foo"

    db := pg.Connect(&pg.Options{})

    file, err := os.Open("/path/to/file")
    if err != nil {
        fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
    }
    defer file.Close()

    // I need to iterate my file Reader here and remove
    // all lines that begin with comment
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        file := bytes.TrimRight(file, comment)
    }

    // all instances of badChar should be dropped
    file := bytes.Trim(file, badChar)

    _, err = db.CopyFrom(file, fmt.Sprintf("COPY %s FROM STDIN WITH DELIMITER e'%s'", dbName, fieldSep))
    if err != nil {
        fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
    }

    err = db.Close()
    if err != nil {
        fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
    }

    fmt.Println("Import Done")
}
Context:
I'm importing a large amount (>10GB) of data into a database; it's spread across several files.
My database interface accepts a reader to load the data.
The data has non-standard line endings and I need to strip comments (because PG's COPY FROM is no fun).
I know the code I've got for editing the stream is woeful, I just can't find a good reference - thanks!
If I were in your position, I'd make my own Reader and insert it between the source and the destination. That's what consistent interfaces are for. Your reader would work easily on the small chunks of data as they flow past.

Source (io.Reader)  ==>  Your filter (io.Reader)    ==>  Destination (expects an io.Reader)
provides the data        does the transformations        rock'n'rolls
A library example of such a reader, made to be inserted between a reader and its client, is bufio.Reader: it speeds up many kinds of readers by issuing larger, buffered calls to the source while letting the client consume the data in small bits if it likes. You can check out its source: http://golang.org/src/bufio/bufio.go
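A minimal sketch of such a filter, built on io.Pipe so that the transformed stream is itself an io.Reader; filterReader is a made-up name, and the file path is the placeholder from the question:

package main

import (
    "bufio"
    "bytes"
    "io"
    "log"
    "os"
)

// filterReader wraps src in a new io.Reader that drops lines starting
// with comment and strips every occurrence of badChar from the rest.
func filterReader(src io.Reader, comment, badChar byte) io.Reader {
    pr, pw := io.Pipe()
    go func() {
        scanner := bufio.NewScanner(src)
        for scanner.Scan() {
            line := scanner.Bytes()
            if len(line) > 0 && line[0] == comment {
                continue // skip comment lines entirely
            }
            line = bytes.Replace(line, []byte{badChar}, nil, -1)
            if _, err := pw.Write(append(line, '\n')); err != nil {
                return // the reading side was closed early
            }
        }
        // propagate any scan error to the reading side (nil acts as EOF)
        pw.CloseWithError(scanner.Err())
    }()
    return pr
}

func main() {
    file, err := os.Open("/path/to/file")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    r := filterReader(file, '#', 0x02)
    // In the real program, r would be handed to db.CopyFrom; here we
    // just copy the filtered stream to stdout.
    if _, err := io.Copy(os.Stdout, r); err != nil {
        log.Fatal(err)
    }
}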
So I've been playing around with Network namespaces recently.
I put together some simple code, built it, and noticed something very weird happening.
The code is as follows:
package main

import (
    "fmt"
    "log"
    "net"
    "os"
    "path"
    "syscall"
)

const (
    NsRunDir  = "/var/run/netns"
    SelfNetNs = "/proc/self/ns/net"
)

func main() {
    netNsPath := path.Join(NsRunDir, "myns")

    os.Mkdir(NsRunDir, 0755)

    if err := syscall.Mount(NsRunDir, NsRunDir, "none", syscall.MS_BIND, ""); err != nil {
        log.Fatalf("Could not create Network namespace: %s", err)
    }

    fd, err := syscall.Open(netNsPath, syscall.O_RDONLY|syscall.O_CREAT|syscall.O_EXCL, 0)
    if err != nil {
        log.Fatalf("Could not create Network namespace: %s", err)
    }
    syscall.Close(fd)

    if err := syscall.Unshare(syscall.CLONE_NEWNET); err != nil {
        log.Fatalf("Could not clone new Network namespace: %s", err)
    }

    if err := syscall.Mount(SelfNetNs, netNsPath, "none", syscall.MS_BIND, ""); err != nil {
        log.Fatalf("Could not Mount Network namespace: %s", err)
    }

    if err := syscall.Unmount(netNsPath, syscall.MNT_DETACH); err != nil {
        log.Fatalf("Could not Unmount new Network namespace: %s", err)
    }

    if err := syscall.Unlink(netNsPath); err != nil {
        log.Fatalf("Could not Unlink new Network namespace: %s", err)
    }

    ifcs, _ := net.Interfaces()
    for _, ifc := range ifcs {
        fmt.Printf("%#v\n", ifc)
    }
}
Now, when you run this code on Trusty 14.04 several times in a row, you will see something weird happening: sometimes it prints out all of the host's interfaces, and sometimes it prints out just a loopback interface. In other words, the range loop at the end of the program sometimes executes while the namespace is still attached, and sometimes after it has already been detached.
I'm totally confused as to why this is happening, but I'm thinking it's either my code or maybe I'm just missing something in terms of program execution or some kernel detail.
Any help would be massively appreciated.
Thanks
Update1:

So it seems the "strange" behaviour has to do with how Go schedules goroutines across OS threads, which means you need to handle the runtime carefully: if you lock the code's execution to one OS thread, you will get consistent results. You can do this by adding the following runtime package statement:
runtime.LockOSThread()
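As a minimal sketch of where that call belongs (assuming all the namespace syscalls run from main's goroutine):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Pin this goroutine to its current OS thread, so that every
    // subsequent syscall runs on the thread whose namespace we change.
    runtime.LockOSThread()
    defer runtime.UnlockOSThread()

    // ... the Mount/Unshare/Mount/Unmount/Unlink sequence from the
    // question would go here ...

    fmt.Println("all namespace syscalls ran on a single OS thread")
}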
However, this still does not solve my problem; I now think it all comes down to an understanding of namespaces, which I need to look into more.
Update2:

To give you a little more context on why you should use the above OS thread lock when running a bunch of syscalls and seeing similarly "strange" behaviour, give this blog post a read. It describes the runtime and the Go scheduler; it was written for Go 1.1, but it still gives a very good overview.