How to read the first four bytes of a file, using Go? - go

I'm learning Go and am trying to read the first four bytes of a file, to check whether the file starts with a specific header. My code does not display the bytes I'm expecting, though. Does anybody know why the following code might not work? It does read some bytes, but they're not bytes I recognize or expected to see. They're not random, because they're the same every time I run it, so perhaps I'm printing a pointer to something else.
Also, I realize I'm ignoring errors but that's because I went into hack-mode while this wasn't working and removed as much of the cruft as I could, trying to get down to the issue.
package main
import (
"os"
"io"
"fmt"
)
type RoflFile struct {
identifier []byte
}
func main() {
arguments := os.Args[1:]
if len(arguments) != 1 {
fmt.Println("Usage: <path-to-rofl>")
return
}
inputfile := arguments[0]
if _, err := os.Stat(inputfile); os.IsNotExist(err) {
fmt.Printf("Error: the input file could not be found: %s", inputfile)
return
}
rofl := new(RoflFile)
rofl.identifier = make([]byte, 4)
// open the input file so that we can pull out data
f, _ := os.Open(inputfile)
// read in the file identifier
io.ReadAtLeast(f, rofl.identifier, 4)
f.Close()
fmt.Printf("Got: %+v", rofl)
}

When I run your code against an input file beginning with "9876", I get:
Got: &{identifier:[57 56 55 54]}
When run against an input file beginning with "1234", I get:
Got: &{identifier:[49 50 51 52]}
For me, the program works as expected. Either something is going wrong on your system, or you don't realize that you're getting the decimal value of the first four bytes in the file. Were you expecting hex? Or were you expecting to see the bytes interpreted according to some encoding (e.g., ASCII or UTF-8, seeing "9 8 7 6" instead of "57 56 55 54")?
For future reference (or if this didn't answer your question), it's helpful in these situations to include your input file, the output you get on your system, and the output you expected. "They're not bytes I recognized or expected to see" leaves a lot of possibilities on the table.

Related

LimitedReader reads only once

I'm trying to understand Go by studying gopl book. I'm stuck when trying to implement the LimitReader function. I realized that I have two problems so let me separate them.
First issue
The description from official doc is saying that:
A LimitedReader reads from R but limits the amount of data returned to just N bytes. Each call to Read updates N to reflect the new amount remaining. Read returns EOF when N <= 0 or when the underlying R returns EOF.
OK, so my understanding is that I can read from io.Reader type many times but I will be always limited to N bytes. Running this code shows me something different:
package main
import (
"fmt"
"io"
"log"
"strings"
)
func main() {
r := strings.NewReader("some io.Reader stream to be read\n")
lr := io.LimitReader(r, 4)
b := make([]byte, 7)
n, err := lr.Read(b)
if err != nil {
log.Fatal(err)
}
fmt.Printf("Read %d bytes: %s\n", n, b)
b = make([]byte, 5)
n, _ = lr.Read(b)
// If removed because EOF
fmt.Printf("Read %d bytes: %s\n", n, b)
}
// Output:
// Read 4 bytes: some
// Read 0 bytes:
// I expect next 4 bytes instead
It seems that this type of object is able to read only once. Not quite sure but maybe this line in io.go source code could be changed to l.N = 0. The main question is why this code is inconsistent with doc description?
Second issue
While struggling with the first issue, I tried to display the current value of N. If I add fmt.Println(lr.N) to the code above, it does not compile: lr.N undefined (type io.Reader has no field or method N). I realized that I still don't understand the concept of Go interfaces.
Here is my POV (based on the listing above). Using the io.LimitReader function I create a LimitedReader object (see source code). Because this object has a Read method with the proper signature, it satisfies the io.Reader interface. That is the reason why io.LimitReader returns io.Reader, right? OK, so everything works together.
The question is: why can't lr.N be accessed? If I understood the book correctly, an interface type only requires that the data type has some method(s). Nothing more.
LimitedReader limits the total size of data that can be read, not the amount of data that can be read at each read call. That is, if you set the limit to 4, you can perform 4 reads of 1 byte, or 1 read of 4 bytes, and after that, all reads will fail.
For your second question: lr is an io.Reader, so you cannot read lr.N. However, you can access the underlying concrete type using a type assertion: lr.(*io.LimitedReader).N should work.

Writing to a File in Golang

I'm rather new to Golang and not yet sure how to use certain language constructs. Currently I have the following code (with debug output for testing), which does not produce the expected result:
json, _ := json.Marshal(struct)
fmt.Println(json)
f,_ := os.Create(fmt.Sprintf("/tmp/%s.json", "asd"))
i,_ := f.Write(json)
fmt.Println(i)
b, err := ioutil.ReadAll(f)
fmt.Print(b)
I expect the following behaviour:
translating the struct to a byte array
creating a new file
append the byte array to the file
However, the file is always empty when I run the code in my environment (AWS Lambda), as well as using it in the Golang Playground.
The output of above code looks like this:
[123 34 ... <huge array of bytes>]
1384
[]
which leads me to believe I'm using f.Write() incorrectly, although I followed the package documentation. All other output indicates the expected behavior, so what is my mistake? I'm somewhat restricted to using the File interface, otherwise I'd have gone with ioutil.WriteFile(). My assumption is that I have misunderstood pointers/values at some point, but the compiler prevented me from using &f.
After f.Write(), your current position in the file is at the end of it, so ioutil.ReadAll() will read from that position and return nothing.
You need to call f.Seek(0, 0) to rewind to the beginning of the file before reading. Calling f.Sync() first is only necessary if you also want the data flushed to stable storage; it is not required for reading the data back through the same file handle.
Update: from comments, it seems that you only need to serialize the JSON and pass it forward as io.Reader, for that you don't really need a file, thanks to bytes.Buffer:
data, _ := json.Marshal(s)
buf := bytes.NewBuffer(data)
b, _ := ioutil.ReadAll(buf)
fmt.Print(string(b))

Write fixed length padded lines to file Go

Printing justified, fixed-length output seems to be what everyone asks about, and I have found many examples, like...
package main
import "fmt"
func main() {
values := []string{"Mustang", "10", "car"}
for i := range(values) {
fmt.Printf("%10v...\n", values[i])
}
for i := range(values) {
fmt.Printf("|%-10v|\n", values[i])
}
}
Situation
But what if I need to WRITE to a file with fixed length bytes?
For example: what if I have a requirement that says each line written to the file must be 32 bytes, left-justified and padded on the right with 0's?
Question
So, how do you accomplish this when writing to a file?
The fmt.PrintXX() functions have analogous variants whose names start with an F, of the form fmt.FprintXX(). These variants write the result to an io.Writer, which may be an *os.File as well.
So if you have the fmt.Printf() statements which you want to direct to a file, just change them to call fmt.Fprintf() instead, passing the file as the first argument:
var f *os.File = ... // Initialize / open file
fmt.Fprintf(f, "%10v...\n", values[i])
If you look into the implementation of fmt.Printf():
func Printf(format string, a ...interface{}) (n int, err error) {
return Fprintf(os.Stdout, format, a...)
}
It does exactly this: it calls fmt.Fprintf(), passing os.Stdout as the output to write to.
For how to open a file, see How to read/write from/to file using Go?
See related question: Format a Go string without printing?
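For the 32-byte, zero-padded requirement specifically, a width verb such as %-32s pads with spaces, so the sketch below builds the record by hand instead. It assumes the pad is the ASCII character '0', that the 32 bytes do not include the line terminator, and that over-long values should be rejected rather than truncated; record is a hypothetical helper name, and os.Stdout stands in for the opened file.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// record returns s left-justified in exactly width bytes,
// with the remainder filled by the pad byte.
func record(s string, width int, pad byte) ([]byte, error) {
	if len(s) > width {
		return nil, fmt.Errorf("value %q is %d bytes, wider than %d", s, len(s), width)
	}
	rec := make([]byte, 0, width)
	rec = append(rec, s...)
	rec = append(rec, bytes.Repeat([]byte{pad}, width-len(s))...)
	return rec, nil
}

func main() {
	for _, v := range []string{"Mustang", "10", "car"} {
		rec, err := record(v, 32, '0')
		if err != nil {
			panic(err)
		}
		// Any io.Writer works here, including an *os.File from os.Create.
		fmt.Fprintf(os.Stdout, "%s\n", rec)
	}
}
```

The length check counts bytes, not characters, which matters if the values can contain non-ASCII text.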

Read random lines off a text file in go

I am using encoding/csv to read and parse a very large .csv file.
I need to randomly select lines and pass them through some test.
My current solution is to read the whole file like
reader := csv.NewReader(file)
lines, err := reader.ReadAll()
then randomly select lines from lines
The obvious problem is it takes a long time to read the whole thing and I need lots of memory.
Question:
My question is: encoding/csv gives me an io/reader; is there a way to use that to read random lines instead of loading the whole thing at once?
This is more of a curiosity to learn more about io/reader than a practical question, since it is very likely that in the end it is more efficient to read the file once and access it in memory than to keep seeking to random lines on disk.
Apokalyptik's answer is the closest to what you want. Readers are streamers so you can't just hop to a random place (per-se).
Naively choosing a probability against which you keep any given line as you read it in can lead to problems: you may get to the end of the file without holding enough lines of input, or you may be too quick to hold lines and not get a good sample. Either is much more likely than guessing correctly, since you don't know beforehand how many lines are in the file (unless you first iterate it once to count them).
What you really need is reservoir sampling.
Basically, read the file line by line and decide for each line whether to hold it. The first line you read, you hold with probability 1/1. After you read the second line, you have a 1/2 chance of replacing what you're holding with it. After the third line, you replace what you're holding with probability 1/3, which means each of the first two lines has a 1/2 * 2/3 = 1/3 chance of still being held. In general, replacing with probability 1/N on the Nth line gives every line read so far a 1/N chance of being the one you hold. Here's a more detailed look at the algorithm (don't try to implement it just from what I've told you in this paragraph alone).
The simplest solution would be to make a decision as you read each line whether to test it or throw it away... make your decision random so that you don't have the requirement of keeping the entire thing in RAM... then pass through the file once running your tests... you can also do this same style with non-random distribution tests (e.g. after X bytes, or x lines, etc)
My suggestion would be to randomize the input file in advance, e.g. using shuf
http://en.wikipedia.org/wiki/Shuf
Then you can simply read the first n lines as needed.
This doesn't help you learning more about io/readers, but might solve your problem nevertheless.
I had a similar need: to randomly read (specific) lines from a massive text file. I wrote a package that I call ramcsv to do this.
It first reads through the entire file once and marks the byte offset of each line (it stores this information in memory, but does not store the full line).
When you request a line number, it will transparently seek to the correct offset and give you the csv-parsed line.
(Note that the csv.Reader parameter that is passed as the second argument to ramcsv.New is used only to copy the settings into a new reader.) This could no doubt be made more efficient, but it was sufficient for my needs and spared me from reading a ~20GB text file into memory.
encoding/csv does not give you an io.Reader; it gives you a csv.Reader (note the lack of package qualification on the definition of csv.NewReader [1], indicating that the Reader it returns belongs to the same package).
A csv.Reader implements only the methods you see there, so it looks like there is no way to do what you want short of writing your own CSV parser.
[1] http://golang.org/pkg/encoding/csv/#NewReader
Per this SO answer, there's a relatively memory efficient way to read a single random line from a large file.
package main
import (
"bufio"
"bytes"
"fmt"
"io"
"math/rand"
"strconv"
"time"
)
var words []byte
func main() {
prepareWordsVar()
var r = rand.New(rand.NewSource(time.Now().Unix()))
var line string
for len(line) == 0 {
line = getRandomLine(r)
}
fmt.Println(line)
}
func prepareWordsVar() {
base := []string{"some", "really", "file", "with", "many", "manyy", "manyyy", "manyyyy", "manyyyyy", "lines."}
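// Length 0 with preallocated capacity; a non-zero length would leave
// leading NUL bytes in front of the appended words.
words = make([]byte, 0, 200*len(base))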
for i := 0; i < 200; i++ {
for _, s := range base {
words = append(words, []byte(s+strconv.Itoa(i)+"\n")...)
}
}
}
func getRandomLine(r *rand.Rand) string {
wordsLen := int64(len(words))
offset := r.Int63n(wordsLen)
rd := bytes.NewReader(words)
scanner := bufio.NewScanner(rd)
_, _ = rd.Seek(offset, io.SeekStart)
// discard - bound to be partial line
if !scanner.Scan() {
return ""
}
scanner.Scan()
if err := scanner.Err(); err != nil {
fmt.Printf("err: %s\n", err)
return ""
}
// now we have a random line.
return scanner.Text()
}
Go Playground
Couple of caveats:
You should use crypto/rand if you need it to be cryptographically secure.
Note the bufio.Scanner's default MaxScanTokenSize, and adjust code accordingly.
As per original SO answer, this does introduce bias based on the length of the line.

Removing NUL characters from bytes

To teach myself Go I'm building a simple server that takes some input, does some processing, and sends output back to the client (that includes the original input).
The input can vary in length from around 5 - 13 characters + endlines and whatever other guff the client sends.
The input is read into a byte array and then converted to a string for some processing. Another string is appended to this string and the whole thing is converted back into a byte array to get sent back to the client.
The problem is that the input is padded with a bunch of NUL characters, and I'm not sure how to get rid of them.
So I could loop through the array and when I come to a nul character, note the length (n), create a new byte array of that length, and copy the first n characters over to the new byte array and use that. Is that the best way, or is there something to make this easier for me?
Some stripped down code:
data := make([]byte, 16)
c.Read(data)
s := strings.Replace(string(data[:]), "\n", "", -1)
s = strings.Replace(s, "\r", "", -1)
s += "some other string"
response := []byte(s)
c.Write(response)
c.Close()
Also if I'm doing anything else obviously stupid here it would be nice to know.
In package "bytes", func Trim(s []byte, cutset string) []byte is your friend:
Trim returns a subslice of s by slicing off all leading and trailing UTF-8-encoded Unicode code points contained in cutset.
// Remove any NULL characters from 'b'
b = bytes.Trim(b, "\x00")
Your approach sounds basically right. Some remarks:
When you have found the index of the first nul byte in data, you don't need to copy, just truncate the slice: data[:idx].
bytes.Index should be able to find that index for you.
There is also bytes.Replace so you don't need to convert to string.
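A minimal sketch of the truncation approach, using bytes.IndexByte (a single-byte variant of bytes.Index). truncateAtNUL is a hypothetical helper name, and the 16-byte buffer mirrors the one in the question.

```go
package main

import (
	"bytes"
	"fmt"
)

// truncateAtNUL returns the part of data before the first NUL byte,
// or data unchanged when it contains none. No bytes are copied.
func truncateAtNUL(data []byte) []byte {
	if i := bytes.IndexByte(data, 0); i >= 0 {
		return data[:i]
	}
	return data
}

func main() {
	// Simulate a 16-byte buffer that was only partly filled.
	data := make([]byte, 16)
	copy(data, "hello\r\n")

	fmt.Printf("%q\n", truncateAtNUL(data))               // "hello\r\n"
	fmt.Printf("%q\n", bytes.TrimRight(data, "\x00"))     // "hello\r\n"
	fmt.Printf("%q\n", bytes.TrimRight(data, "\x00\r\n")) // "hello"
}
```

Both forms return a subslice of the original buffer, so they are only safe to keep around if the buffer is not reused for the next read.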
The io.Reader documentation says:
Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered.
If the call to Read in the application does not read 16 bytes, then data will have trailing zero bytes. Use the number of bytes read to trim the zero bytes from the buffer.
data := make([]byte, 16)
n, err := c.Read(data)
if err != nil {
// handle error
}
data = data[:n]
There's another issue. There's no guarantee that Read slurps up all of the "message" sent by the peer. The application may need to call Read more than once to get the complete message.
You mention endlines in the question. If the message from the client is terminated by a newline, then use bufio.Scanner to read lines from the connection:
s := bufio.NewScanner(c)
if s.Scan() {
data = s.Bytes() // data is next line, not including end lines, etc.
}
if s.Err() != nil {
// handle error
}
You could utilize the return value of Read:
package main
import "strings"
func main() {
r, b := strings.NewReader("north east south west"), make([]byte, 16)
n, e := r.Read(b)
if e != nil {
panic(e)
}
b = b[:n]
println(string(b) == "north east south")
}
https://golang.org/pkg/io#Reader
