Converting data to Parquet in Go

I am getting data from DynamoDB and converting it to Parquet files.
For the conversion I am using the https://github.com/xitongsys/parquet-go library, but for some reason I am getting runtime error: invalid memory address or nil pointer dereference.
I know this error means that a pointer is nil and I am trying to use or dereference it.
I am not sure why pw.WriteStop() is giving this invalid memory error.
code:
fw, err := local.NewLocalFileWriter(parquetFile)
if err != nil {
	log.Errorf("local.NewLocalFileWriter() error - %s", err)
	return err
}
pw, err := writer.NewParquetWriter(fw, new(Struct), int64(len(tenantList)))
if err != nil {
	log.Errorf("writer.NewParquetWriter() error - %s", err)
	return err
}
pw.RowGroupSize = 128 * 1024 * 1024 // 128 MB
pw.CompressionType = parquet.CompressionCodec_SNAPPY
for _, data := range tenantList {
	if err = pw.Write(data); err != nil {
		return err
	}
}
// this line gives the invalid memory error
if err = pw.WriteStop(); err != nil {
	return err
}
fw.Close()
The only error I am getting is "runtime error: invalid memory address or nil pointer dereference". By adding print statements inside the if blocks, I found that the invalid memory address comes from pw.WriteStop().
I attached a screenshot of the error just to show that I really am getting only "runtime error: invalid memory address or nil pointer dereference".

I faced this too. I figured out that it was because I hadn't specified type=BOOLEAN for bool fields in the struct tags, e.g. parquet:"name=MyFieldName, type=BOOLEAN". For me, supplying BOOLEAN for bool, INT64 for int64, and UTF8 for string Go types worked.
Ideally the library would pick up the data types automatically, but that is probably either a feature that hasn't been implemented, or one that was intentionally left out because it involves too much overhead.

Related

G110: Potential DoS vulnerability via decompression bomb (gosec)

I'm getting the following golangci-lint message:
testdrive/utils.go:92:16: G110: Potential DoS vulnerability via decompression bomb (gosec)
if _, err := io.Copy(targetFile, fileReader); err != nil {
^
I read the corresponding CWE, but I'm not clear on how this is expected to be corrected.
Please offer pointers.
func unzip(archive, target string) error {
	reader, err := zip.OpenReader(archive)
	if err != nil {
		return err
	}
	for _, file := range reader.File {
		path := filepath.Join(target, file.Name) // nolint: gosec
		if file.FileInfo().IsDir() {
			if err := os.MkdirAll(path, file.Mode()); err != nil {
				return err
			}
			continue
		}
		fileReader, err := file.Open()
		if err != nil {
			return err
		}
		defer fileReader.Close() // nolint: errcheck
		targetFile, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, file.Mode())
		if err != nil {
			return err
		}
		defer targetFile.Close() // nolint: errcheck
		if _, err := io.Copy(targetFile, fileReader); err != nil {
			return err
		}
	}
	return nil
}
The warning comes from a rule provided by gosec.
The rule specifically detects usage of io.Copy during file decompression.
This is a potential issue because io.Copy:
copies from src to dst until either EOF is reached on src or an error occurs.
So a malicious payload might cause your program to decompress an unexpectedly large amount of data and run out of memory, causing the denial of service mentioned in the warning message.
In particular, gosec will check (source) the AST of your program and warn you about usage of io.Copy or io.CopyBuffer together with any one of the following:
"compress/gzip".NewReader
"compress/zlib".NewReader or NewReaderDict
"compress/bzip2".NewReader
"compress/flate".NewReader or NewReaderDict
"compress/lzw".NewReader
"archive/tar".NewReader
"archive/zip".NewReader
"*archive/zip".File.Open
Using io.CopyN removes the warning because (quote) it "copies n bytes (or until an error) from src to dst", thus giving you, the program writer, control over how many bytes to copy. So you could pass an n sized to the available resources of your application, or copy in chunks.
Based on various pointers provided, replaced
if _, err := io.Copy(targetFile, fileReader); err != nil {
	return err
}
with
for {
	_, err := io.CopyN(targetFile, fileReader, 1024)
	if err != nil {
		if err == io.EOF {
			break
		}
		return err
	}
}
PS: while this helps the memory footprint, it wouldn't stop a DoS attack that streams a very long or infinite input, since the loop still copies an unbounded total amount of data ...
Assuming that you're working on compressed data, you need to use io.CopyN.
You can try a workaround with the --nocompress flag, but this will cause the data to be included uncompressed.
See the following go-bindata PR and related issue: https://github.com/go-bindata/go-bindata/pull/50

How to determine the exact byte length of connection in golang?

I have the following code:
var buf []byte
read_len, err := conn.Read(buf)
if err != nil {
	fmt.Println("Error reading:", err.Error())
}
buffer := make([]byte, read_len)
_, err = conn.Read(buffer)
if err != nil {
	fmt.Println("Error reading:", err.Error())
}
The intention was to determine read_len from the first read into buf, then create a second buffer of exactly the length of the incoming JSON request. This just results in an error:
unexpected end of JSON input
when I try to unmarshal:
var request Device_Type_Request_Struct
err = json.Unmarshal(buffer, &request)
I'm assuming this error occurs because conn.Read(buffer) returns nothing, since another buffer has already read the data (not sure, though). How should I go about determining the length of the JSON request while also being able to read it into a buffer of exactly that length?
Read returns the number of bytes read into the buffer. Because the length of the buffer passed to the first call to conn.Read is zero, that first call always returns zero.
There is no way to determine how much data a peer has sent without reading the data.
The easy solution to this problem is to use the JSON decoder:
d := json.NewDecoder(conn)
var request Device_Type_Request_Struct
if err := d.Decode(&request); err != nil {
	// handle error
}
The decoder reads and decodes JSON values from a stream.

Placement of defer after error check

In Go, one often sees the following idiom:
func CopyFile(dstName, srcName string) (written int64, err error) {
	src, err := os.Open(srcName)
	if err != nil {
		return
	}
	defer src.Close()
	dst, err := os.Create(dstName)
	if err != nil {
		return
	}
	defer dst.Close()
	return io.Copy(dst, src)
}
Is there any reason why the defer statement comes after the error check? My guess is that this is done in order to avoid dereferencing nil values in case err was not nil.
If os.Open or os.Create fails, you don't have a valid *File to close. The problem wouldn't be a nil value for *File, as Close() checks for nil and simply returns immediately in that case; the problem would be a *File value that is non-nil but invalid. Since the documentation for os.Open() doesn't explicitly state that a failed call returns a nil *File, you can't rely on every underlying implementation to in fact return a nil value, or to always do so.

How to pass (type *common.MapStr) to type []byte?

Sorry if the question is too newbie; I just started learning Go yesterday.
I am trying to convert publishEvent into bytes, and the compiler shows the following error:
cannot convert publishEvent (type *common.MapStr) to type []byte
Can anyone show me the way?
Thank you.
var parsed map[string]interface{}
bytes := []byte(publishEvent) // ---> error occurs here
err := json.Unmarshal(bytes, &parsed)
if err != nil {
	fmt.Println("error: ", err)
}
I assume the struct you are working with is common.MapStr from https://github.com/elastic/libbeat
common.MapStr is already a map[string]interface{}, so I'm not sure why you are turning it into JSON and then parsing it back into the same kind of structure, but if that's what you really want to do, replacing the error line with:
bytes, err := json.Marshal(publishEvent)
should work. You will then get an error on the next line about redeclaring err, so change it to:
err = json.Unmarshal(bytes, &parsed)
Resulting in the following code (also added another error check):
var parsed map[string]interface{}
bytes, err := json.Marshal(publishEvent)
if err != nil {
	fmt.Println("error: ", err)
	// you'll want to exit or return here since we can't parse `bytes`
}
err = json.Unmarshal(bytes, &parsed)
if err != nil {
	fmt.Println("error: ", err)
}

'invalid memory address' error with go-mssql

I'm having an issue that I can't seem to resolve, probably due to my inexperience with Go. I have the following code working on one server, but not on another. Here is the code:
// Build the connection string to the database, then open the connection.
connString := fmt.Sprintf("server=%s;user id=%s;password=%s;port=%d", *server, *user, *password, *port)
if *debug {
	fmt.Printf(" connString:%s\n", connString)
}
db, err = sql.Open("mssql", connString)
if err != nil {
	log.Fatal("Open connection failed:", err.Error())
}
err = db.Ping()
if err != nil {
	fmt.Println("Cannot connect: ", err.Error())
	return
}
rows, _ := db.Query("SELECT Zip FROM Zip_Rural WHERE Zip = ?", ZipCode[0:5])
defer rows.Close()
if !rows.Next() {
	acreageRequirement = .5
}
On the line that reads if !rows.Next() I get the following error:
panic: runtime error: invalid memory address or nil pointer dereference
panic: runtime error: invalid memory address or nil pointer dereference [signal 0xc0000005 code=0x0 addr=0x20 pc=0x477918]
This same code works just fine on another server; both machines run Go 1.4.2. I have a feeling I just have some bad syntax somewhere, but I have no idea what the real problem is. A call to db.Exec within the same file works just fine, which leads me to believe that my database connection is fine, but for some reason db.Query is not executing correctly.
There is probably an error returned by db.Query, which your code discards with the blank identifier. Check that error, and if it is not nil, assume that rows is nil, i.e. calling rows.Next() will segfault.
If you display the error, you will probably find out what the issue is.
