How do I check the size of a Go project?

Is there an easy way to check the size of a Golang project? It's not an executable, it's a package that I'm importing in my own project.

You can see how big the library binaries are by looking in the $GOPATH/pkg directory (if $GOPATH is not exported go defaults to $HOME/go).
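You can confirm the effective value with:
$ go env GOPATH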
So, to check the size of some of the gorilla HTTP packages, install them first:
$ go get -u github.com/gorilla/mux
$ go get -u github.com/gorilla/securecookie
$ go get -u github.com/gorilla/sessions
The binary sizes in KB on my 64-bit macOS system (darwin_amd64):
$ cd $GOPATH/pkg/darwin_amd64/github.com/gorilla/
$ du -k *
284 mux.a
128 securecookie.a
128 sessions.a
EDIT:
Library (package) size is one thing, but how much space that takes up in your executable after the link stage can vary wildly. This is because packages have their own dependencies and with that comes extra baggage, but that baggage may be shared by other packages you import.
An example demonstrates this best:
empty.go:
package main
func main() {}
http.go:
package main
import "net/http"
var _ = http.Serve
func main() {}
mux.go:
package main
import "github.com/gorilla/mux"
var _ = mux.NewRouter
func main() {}
All 3 programs are functionally identical - executing zero user code - but their dependencies differ. The resulting binary sizes in KB:
$ du -k *
1028 empty
5812 http
5832 mux
What does this tell us? The core go pkg net/http adds significant size to our executable. The mux pkg is not large by itself, but it has an import dependency on net/http pkg - hence the significant file size for it too. Yet the delta between mux and http is only 20KB, whereas the listed file size of the mux.a library is 284KB. So we can't simply add the library pkg sizes to determine their true footprint.
Conclusion:
The go linker will strip out a lot of baggage from individual libraries during the build process, but to get a true sense of how much extra weight importing a certain package adds, one has to look at all of that package's sub-dependencies as well.
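A quick way to see that full set of sub-dependencies (my addition, not part of the original answer) is go list with the -deps flag, run inside a module that already requires the package:
$ go list -deps github.com/gorilla/mux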

Here is another solution that makes use of https://pkg.go.dev/golang.org/x/tools/go/packages.
I took the example provided by the author and slightly updated it; the demonstration binary is available here.
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"sort"

	"golang.org/x/tools/go/packages"
)

func main() {
	flag.Parse()

	// Many tools pass their command-line arguments (after any flags)
	// uninterpreted to packages.Load so that it can interpret them
	// according to the conventions of the underlying build system.
	// NeedDeps is required so that the transitively imported packages
	// also have their file lists populated.
	cfg := &packages.Config{Mode: packages.NeedFiles |
		packages.NeedSyntax |
		packages.NeedImports |
		packages.NeedDeps,
	}
	pkgs, err := packages.Load(cfg, flag.Args()...)
	if err != nil {
		fmt.Fprintf(os.Stderr, "load: %v\n", err)
		os.Exit(1)
	}
	if packages.PrintErrors(pkgs) > 0 {
		os.Exit(1)
	}

	// Sum the sizes of the source files of each package
	// listed on the command line.
	var size int64
	for _, pkg := range pkgs {
		for _, file := range pkg.GoFiles {
			s, err := os.Stat(file)
			if err != nil {
				log.Println(err)
				continue
			}
			size += s.Size()
		}
	}
	fmt.Printf("size of %v is %v b\n", pkgs[0].ID, size)

	// Now include every transitive dependency as well.
	size = 0
	for _, pkg := range allPkgs(pkgs) {
		for _, file := range pkg.GoFiles {
			s, err := os.Stat(file)
			if err != nil {
				log.Println(err)
				continue
			}
			size += s.Size()
		}
	}
	fmt.Printf("size of %v and deps is %v b\n", pkgs[0].ID, size)
}

func allPkgs(lpkgs []*packages.Package) []*packages.Package {
	var all []*packages.Package // postorder
	seen := make(map[*packages.Package]bool)
	var visit func(*packages.Package)
	visit = func(lpkg *packages.Package) {
		if !seen[lpkg] {
			seen[lpkg] = true

			// visit imports
			var importPaths []string
			for path := range lpkg.Imports {
				importPaths = append(importPaths, path)
			}
			sort.Strings(importPaths) // for determinism
			for _, path := range importPaths {
				visit(lpkg.Imports[path])
			}

			all = append(all, lpkg)
		}
	}
	for _, lpkg := range lpkgs {
		visit(lpkg)
	}
	return all
}
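To try the program, pass one or more package patterns on the command line; for example (assuming the package is already a dependency of the module you run it from):
$ go run . github.com/gorilla/mux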

You can download all the imported modules with go mod vendor, then count the lines of all the .go files that aren't test files:
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// count writes a tiny program that imports mod, vendors its dependencies,
// and returns the total number of lines in the vendored non-test .go files.
// Errors from the helper commands are ignored for brevity.
func count(mod string) int {
	imp := fmt.Sprintf("package main\nimport _ %q", mod)
	os.WriteFile("size.go", []byte(imp), os.ModePerm)
	exec.Command("go", "mod", "init", "size").Run()
	exec.Command("go", "mod", "vendor").Run()

	var count int
	filepath.WalkDir("vendor", func(s string, d fs.DirEntry, err error) error {
		if strings.HasSuffix(s, ".go") && !strings.HasSuffix(s, "_test.go") {
			data, err := os.ReadFile(s)
			if err != nil {
				return err
			}
			count += bytes.Count(data, []byte{'\n'})
		}
		return nil
	})
	return count
}

func main() {
	println(count("github.com/klauspost/compress/zstd"))
}
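A practical note (my addition): because count writes size.go, go.mod, and a vendor/ tree into the current working directory, run the program from an empty scratch directory. The printed number is the total line count of the vendored, non-test Go sources.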

Related

Are there RFC docs for PDF?

My goal is to convert a PDF to CSV; in this case, converting journal entries from a PDF file into a CSV file.
What I've tried:
Use pdftotext from Linux.
The Installation:
$ sudo apt-get install poppler-utils
The code:
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	body, err := exec.Command("pdftotext", "-layout", "-q", "-nopgbrk", "-enc", "UTF-8", "-eol", "unix", "/Volumes/T7Touch/Learn/e-statement-to-t-account/5725299769Jul2022.pdf", "-").Output()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
Does it work?
The -layout option makes the output workable. The fmt.Println(string(body)) call prints one journal entry per line:
row1column1 row1column2 row1column3
row2column1 row2column2 row2column3
Without the -layout option, the output will not be readable:
row1column1
row2column1
row1column2
row2column2
The problem is that this solution requires poppler-utils to be installed in order to use pdftotext. Without it, the program throws an error:
$ GOOS=darwin GOARCH=arm64 go build
$ ./main // will throw errors -> panic: exec: "pdftotext": executable file not found in $PATH
Use https://github.com/ledongthuc/pdf
The code:
package main

import (
	"fmt"
	"os"

	"github.com/ledongthuc/pdf"
)

func main() {
	content, err := readPdf(os.Args[1]) // Read local pdf file
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}

func readPdf(path string) (string, error) {
	f, r, err := pdf.Open(path)
	if err != nil {
		return "", err
	}
	// Only defer the close once we know the file was opened successfully.
	defer func() {
		_ = f.Close()
	}()

	totalPage := r.NumPage()
	for pageIndex := 1; pageIndex <= totalPage; pageIndex++ {
		p := r.Page(pageIndex)
		if p.V.IsNull() {
			continue
		}
		rows, _ := p.GetTextByRow()
		for _, row := range rows {
			println(">>>> row: ", row.Position)
			for _, word := range row.Content {
				fmt.Println(word.S)
			}
		}
	}
	return "", nil
}
I intend to create a PDF decoder that does exactly what $ pdftotext -layout does, without the need to run $ sudo apt-get install poppler-utils. I believe I can solve this if someone can help point out where to find the pdftotext repository or RFC docs for PDF.
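One possible direction for the -layout-like behaviour, sketched from the API already used above (untested, and it only reproduces the row grouping, not column alignment): replace the inner loops in readPdf with a helper like this, adding "strings" to the imports.
// printPageByRow prints each text row of a page on one line, with the
// words of the row joined by single spaces. The column alignment of
// `pdftotext -layout` is not reproduced; only the row grouping is.
func printPageByRow(p pdf.Page) error {
	rows, err := p.GetTextByRow()
	if err != nil {
		return err
	}
	for _, row := range rows {
		var words []string
		for _, word := range row.Content {
			words = append(words, word.S)
		}
		fmt.Println(strings.Join(words, " "))
	}
	return nil
}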

List FTP files with goftp

I am trying to write a simple Go program which connects to an FTP server, list the files in a specified directory and pulls them.
The code is this:
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"path"
	"time"

	"github.com/secsy/goftp"
)

func main() {
	config := goftp.Config{
		User:               "anonymous",
		Password:           "root#local.me",
		ConnectionsPerHost: 21,
		Timeout:            10 * time.Second,
		Logger:             os.Stderr,
	}

	// Connecting to the server
	client, dailErr := goftp.DialConfig(config, "ftp.example.com")
	if dailErr != nil {
		log.Fatal(dailErr)
		panic(dailErr)
	}

	// setting the search directory
	dir := "/downloads/"
	files, err := client.ReadDir(dir)
	if err != nil {
		for _, file := range files {
			if file.IsDir() {
				path.Join(dir, file.Name())
			} else {
				fmt.Println("the file is %s", file.Name())
			}
		}
	}

	// this section works, I am setting a file name and I can pull it
	// if I mark the search part
	ret_file := "example.PDF"
	fmt.Println("Retrieving file: ", ret_file)
	buf := new(bytes.Buffer)
	fullPathFile := dir + ret_file
	rferr := client.Retrieve(fullPathFile, buf)
	if rferr != nil {
		panic(rferr)
	}
	fmt.Println("writing data to file", ret_file)
	fmt.Println("Opening file", ret_file, "for writing")
	w, _ := ioutil.ReadAll(buf)
	ferr := ioutil.WriteFile(ret_file, w, 0644)
	if ferr != nil {
		log.Fatal(ferr)
		panic(ferr)
	} else {
		fmt.Println("Writing", ret_file, " completed")
	}
}
For some reason I am getting an error on the ReadDir function.
I need to grab the file names so I can download them.
You're attempting to loop through files when ReadDir() returns an error. That will never work, as any time an error is returned files is nil.
This is pretty standard behavior and can be confirmed by reading the implementation of ReadDir().
I'm guessing you may have used the example from the project used to demonstrate ReadDir() as a starting point. Within the example, the error handling is involved because it's deciding whether or not to continue walking the directory tree. However, note that when ReadDir() returns an error that doesn't result in stopping the program, the subsequent for loop is a no-op, since files is nil.
Here's a small program that demonstrates successfully using the results of ReadDir() in a straightforward manner:
package main

import (
	"fmt"

	"github.com/secsy/goftp"
)

const (
	ftpServerURL  = "ftp.us.debian.org"
	ftpServerPath = "/debian/"
)

func main() {
	client, err := goftp.Dial(ftpServerURL)
	if err != nil {
		panic(err)
	}
	files, err := client.ReadDir(ftpServerPath)
	if err != nil {
		panic(err)
	}
	for _, file := range files {
		fmt.Println(file.Name())
	}
}
It outputs (which matches the current listing at http://ftp.us.debian.org/debian/):
$ go run goftp-test.go
README
README.CD-manufacture
README.html
README.mirrors.html
README.mirrors.txt
dists
doc
extrafiles
indices
ls-lR.gz
pool
project
tools
zzz-dists
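If you then want to pull each regular file after listing it, here is a minimal, untested sketch continuing the loop above (add "bytes", "os", and "path" to the imports); client.Retrieve is the same call the question's code uses:
// Download each entry that is not a directory into a local file of the same name.
for _, file := range files {
	if file.IsDir() {
		continue
	}
	buf := new(bytes.Buffer)
	if err := client.Retrieve(path.Join(ftpServerPath, file.Name()), buf); err != nil {
		panic(err)
	}
	if err := os.WriteFile(file.Name(), buf.Bytes(), 0644); err != nil {
		panic(err)
	}
}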

How to read packed binary data in Go?

I'm trying to figure out the best way to read a packed binary file in Go that was produced by Python like the following:
import struct
f = open('tst.bin', 'wb')
fmt = 'iih' #please note this is packed binary: 4byte int, 4byte int, 2byte int
f.write(struct.pack(fmt,4, 185765, 1020))
f.write(struct.pack(fmt,4, 185765, 1022))
f.close()
I have been tinkering with some of the examples I've seen on GitHub and a few other sources, but I can't seem to get anything working correctly (the update below shows a working method). What is the idiomatic way to do this sort of thing in Go? This is one of several attempts.
UPDATE and WORKING
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

func main() {
	fp, err := os.Open("tst.bin")
	if err != nil {
		panic(err)
	}
	defer fp.Close()

	lineBuf := make([]byte, 10) // 4 byte int, 4 byte int, 2 byte int per line
	for {
		_, err := fp.Read(lineBuf)
		if err == io.EOF {
			break
		}
		aVal := int32(binary.LittleEndian.Uint32(lineBuf[0:4])) // same as: int32(uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24)
		bVal := int32(binary.LittleEndian.Uint32(lineBuf[4:8]))
		cVal := int16(binary.LittleEndian.Uint16(lineBuf[8:10])) // same as: int16(uint32(b[0]) | uint32(b[1])<<8)
		fmt.Println(aVal, bVal, cVal)
	}
}
A portable and rather easy way to handle the problem is Google's "Protocol Buffers". Though this is too late now since you got it working, I put some effort into explaining and coding it, so I am posting it anyway.
You can find the code on https://github.com/mwmahlberg/ProtoBufDemo
You need to install the protocol buffers for Python, using your preferred method (pip, OS package management, source), and for Go.
The .proto file
The .proto file is rather simple for our example. I called it data.proto
syntax = "proto2";
package main;

message Demo {
	required uint32 A = 1;
	required uint32 B = 2;
	// A shortcoming: no 16 bit ints.
	// We need to make sure of this in the applications.
	required uint32 C = 3;
}
Now you need to call protoc on the file and have it provide the code for both Python and Go:
protoc --go_out=. --python_out=. data.proto
which generates the files data_pb2.py and data.pb.go. Those files provide the language specific access to the protocol buffer data.
When using the code from github, all you need to do is to issue
go generate
in the source directory.
The Python code
import data_pb2

def main():
    # We create an instance of the message type "Demo"...
    data = data_pb2.Demo()
    # ...and fill it with data
    data.A = long(5)
    data.B = long(5)
    data.C = long(2015)

    print "* Python writing to file"
    f = open('tst.bin', 'wb')
    # Note that "data.SerializeToString()" counterintuitively
    # writes binary data
    f.write(data.SerializeToString())
    f.close()

    f = open('tst.bin', 'rb')
    read = data_pb2.Demo()
    read.ParseFromString(f.read())
    f.close()

    print "* Python reading from file"
    print "\tDemo.A: %d, Demo.B: %d, Demo.C: %d" % (read.A, read.B, read.C)

if __name__ == '__main__':
    main()
We import the file generated by protoc and use it. Not much magic here.
The Go File
package main

//go:generate protoc --python_out=. data.proto
//go:generate protoc --go_out=. data.proto

import (
	"fmt"
	"os"

	"github.com/golang/protobuf/proto"
)

func main() {
	// Note that we do not handle any errors for the sake of brevity
	d := Demo{}
	f, _ := os.Open("tst.bin")
	fi, _ := f.Stat()

	// We create a buffer which is big enough to hold the entire message
	b := make([]byte, fi.Size())
	f.Read(b)
	proto.Unmarshal(b, &d)

	fmt.Println("* Go reading from file")
	// Note the explicit pointer dereferences, as the generated fields are pointers
	fmt.Printf("\tDemo.A: %d, Demo.B: %d, Demo.C: %d\n", *d.A, *d.B, *d.C)
}
Note that we do not need an explicit import for the generated code, as the package of data.proto is main.
The result
After generation the required files and compiling the source, when you issue
$ python writer.py && ./ProtoBufDemo
the result is
* Python writing to file
* Python reading from file
Demo.A: 5, Demo.B: 5, Demo.C: 2015
* Go reading from file
Demo.A: 5, Demo.B: 5, Demo.C: 2015
Note that the Makefile in the repository offers a shortcut for generating the code, compiling the .go files and running both programs:
make run
The Python format string is iih, meaning two 32-bit signed integers and one 16-bit signed integer (see the docs). You can simply use your first example but change the struct to:
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

type binData struct {
	A int32
	B int32
	C int16
}

func main() {
	fp, err := os.Open("tst.bin")
	if err != nil {
		panic(err)
	}
	defer fp.Close()

	for {
		thing := binData{}
		err := binary.Read(fp, binary.LittleEndian, &thing)
		if err == io.EOF {
			break
		}
		fmt.Println(thing.A, thing.B, thing.C)
	}
}
Note that the Python packing didn't specify the endianness explicitly, but if you're sure the system that ran it generated little-endian binary, this should work.
Edit: Added main() function to explain what I mean.
Edit 2: Capitalized struct fields so binary.Read could write into them.
As I mentioned in my post, I'm not sure this is THE idiomatic way to do this in Go, but this is the solution I came up with after a fair bit of tinkering and adapting several different examples. Note again that this unpacks the 4- and 2-byte ints into Go int32 and int16 respectively. Posting so that there is a valid answer in case someone comes looking. Hopefully someone will post a more idiomatic way of accomplishing this, but for now, this works.
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

func main() {
	fp, err := os.Open("tst.bin")
	if err != nil {
		panic(err)
	}
	defer fp.Close()

	lineBuf := make([]byte, 10) // 4 byte int, 4 byte int, 2 byte int per line
	for {
		_, err := fp.Read(lineBuf)
		if err == io.EOF {
			break
		}
		aVal := int32(binary.LittleEndian.Uint32(lineBuf[0:4])) // same as: int32(uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24)
		bVal := int32(binary.LittleEndian.Uint32(lineBuf[4:8]))
		cVal := int16(binary.LittleEndian.Uint16(lineBuf[8:10])) // same as: int16(uint32(b[0]) | uint32(b[1])<<8)
		fmt.Println(aVal, bVal, cVal)
	}
}
Try the binpacker library.
Example:
Example data:
buffer := new(bytes.Buffer)
packer := binpacker.NewPacker(buffer)
unpacker := binpacker.NewUnpacker(buffer)
packer.PushByte(0x01)
packer.PushUint16(math.MaxUint16)
Unpack:
var val1 byte
var val2 uint16
var err error
val1, err = unpacker.ShiftByte()
val2, err = unpacker.ShiftUint16()
Or:
var val1 byte
var val2 uint16
var err error
unpacker.FetchByte(&val1).FetchUint16(&val2)
unpacker.Error() // Make sure error is nil

Finding imports and dependencies of a go program

The go list -json command run from the command line will tell you the imports and dependencies of a go program ( in json format). Is there a way to get this information from within a go program I.e at runtime, either by running the 'go list' command somehow or another way?
The following code uses the go/build package to get the imports for the application in the current working directory.
p, err := build.Default.Import(".", ".", 0)
if err != nil {
	// handle error
}
for _, i := range p.Imports {
	fmt.Println(i)
}
You can build a list of all dependencies using a simple recursive function.
To get the imports for a specific path, use:
p, err := build.Default.Import(path, ".", 0)
if err != nil {
	// handle error
}
for _, i := range p.Imports {
	fmt.Println(i)
}
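A minimal sketch of such a recursive walk (my own illustration; it assumes import paths resolvable by go/build and skips the cgo pseudo-import "C"):
// deps returns the transitive imports of path, depth-first, each path once.
// Call it as, e.g., deps("github.com/gorilla/mux", ".", map[string]bool{}).
func deps(path, srcDir string, seen map[string]bool) ([]string, error) {
	if seen[path] {
		return nil, nil
	}
	seen[path] = true

	p, err := build.Default.Import(path, srcDir, 0)
	if err != nil {
		return nil, err
	}

	var all []string
	for _, imp := range p.Imports {
		if imp == "C" { // cgo pseudo-import, not a real package
			continue
		}
		sub, err := deps(imp, p.Dir, seen)
		if err != nil {
			return nil, err
		}
		all = append(all, sub...)
		all = append(all, imp)
	}
	return all, nil
}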
I don't think you can do it without using the go binary since go needs to analyze your source code.
It's pretty easy to do, but it must have access to go and your source code at run time. Here's a quick example:
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

func main() {
	cmd := exec.Command("go", "list", "-json")
	stdout, err := cmd.Output()
	if err != nil {
		println(err.Error())
		return
	}

	var list GoList
	err = json.Unmarshal(stdout, &list)
	if err != nil {
		println(err.Error())
		return
	}

	for _, d := range list.Deps {
		fmt.Printf(" - %s\n", d)
	}
}

type GoList struct {
	Dir        string
	ImportPath string
	Name       string
	Target     string
	Stale      bool
	Root       string
	GoFiles    []string
	Imports    []string
	Deps       []string
}
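Note that exec.Command runs go list -json in the program's current working directory, so either start the program from the package directory you want to inspect or set cmd.Dir to that directory before calling cmd.Output().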
No, I don't think this is possible without the source in a reliable way. Go binaries on different platforms, compiled with different compilers, may or may not have (now or in the future) this information compiled in.
But as Go programs are compiled anyway: Why not record this information while you do have access to the source code?

Getting count of files in directory using Go

How might I get the count of items returned by io/ioutil.ReadDir()?
I have this code, which works, but I have to think it isn't the RightWay(tm) in Go.
package main

import "io/ioutil"
import "fmt"

func main() {
	files, _ := ioutil.ReadDir("/Users/dgolliher/Dropbox/INBOX")
	var count int
	for _, f := range files {
		fmt.Println(f.Name())
		count++
	}
	fmt.Println(count)
}
Lines 8-12 seem like way too much to go through to just count the results of ReadDir, but I can't find the correct syntax to get the count without iterating over the range. Help?
Found the answer in http://blog.golang.org/go-slices-usage-and-internals
package main

import "io/ioutil"
import "fmt"

func main() {
	files, _ := ioutil.ReadDir("/Users/dgolliher/Dropbox/INBOX")
	fmt.Println(len(files))
}
ReadDir returns a list of directory entries sorted by filename, so it is not just files. Here is a little function for those wanting to get a count of files only (and not dirs):
func fileCount(path string) (int, error) {
	i := 0
	files, err := ioutil.ReadDir(path)
	if err != nil {
		return 0, err
	}
	for _, file := range files {
		if !file.IsDir() {
			i++
		}
	}
	return i, nil
}
Starting with Go 1.16 (Feb 2021), a better option is os.ReadDir:
package main

import "os"

func main() {
	d, e := os.ReadDir(".")
	if e != nil {
		panic(e)
	}
	println(len(d))
}
os.ReadDir returns fs.DirEntry instead of fs.FileInfo, which means that
Size and ModTime methods are omitted, making the process more efficient if
you just need an entry count.
https://golang.org/pkg/os#ReadDir
If you want to count all entries (not recursive) you can use len(files). If you need just the files, without folders and hidden files, loop over them and increment a counter. And please don't ignore errors.
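A minimal sketch of that advice (my own illustration; it needs the "os" and "strings" imports and treats a leading dot as "hidden", which is a Unix convention):
// visibleFileCount counts regular, non-hidden entries in dir (not recursive).
// Uses os.ReadDir (Go 1.16+).
func visibleFileCount(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, e := range entries {
		if e.IsDir() || strings.HasPrefix(e.Name(), ".") {
			continue
		}
		n++
	}
	return n, nil
}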
By looking at the code of ioutil.ReadDir
func ReadDir(dirname string) ([]fs.FileInfo, error) {
	f, err := os.Open(dirname)
	if err != nil {
		return nil, err
	}
	list, err := f.Readdir(-1)
	f.Close()
	if err != nil {
		return nil, err
	}
	sort.Slice(list, func(i, j int) bool { return list[i].Name() < list[j].Name() })
	return list, nil
}
you would realize that it calls os.File.Readdir() then sorts the files.
If you only need a count, you don't need to sort, so you are better off calling os.File.Readdir() directly.
You can simply copy and paste this function then remove the sort.
But I did find out that f.Readdirnames(-1) is much faster than f.Readdir(-1).
Running time is almost half for /usr/bin/ with 2808 items (16ms vs 35ms).
So to summarize it in an example:
package main

import (
	"fmt"
	"os"
)

func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	list, err := f.Readdirnames(-1)
	f.Close()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(list))
}
