How to read a specific number of lines from a file? - go

I need to read a fixed number of lines from a file at a time (for example, 10 lines), and on the next read continue from the line after the last one read (line 11) and read another 10 lines.

There is no library function that reads a specific number of lines, but you can implement one yourself, for example:
func readLines(n int, r io.Reader) ([]string, error) {
rd := bufio.NewReader(r)
var (
lines = make([]string, 0, n)
bs []byte
)
for len(lines) < n {
bss, isPrefix, err := rd.ReadLine()
if err != nil {
if err != io.EOF {
return nil, err
}
// ReadLine never returns data together with an error,
// so nothing is lost by stopping at EOF
break
}
bs = append(bs, bss...)
if isPrefix {
// the line did not fit in the reader's buffer; keep collecting
continue
}
lines = append(lines, string(bs))
bs = bs[:0]
}
return lines, nil
}

This is the function I wrote, and it seems to work:
func ReadLine(inputFile io.ReadSeeker, startPos int64, lineNum int) (slice []string, lastPos int64, err error) {
// seek before wrapping the file, so the buffered reader starts at startPos
if _, err = inputFile.Seek(startPos, io.SeekStart); err != nil {
return nil, startPos, err
}
r := bufio.NewReader(inputFile)
lastPos = startPos
for i := 0; i < lineNum; i++ {
var line string
line, err = r.ReadString('\n')
if err != nil {
break
}
lastPos += int64(len(line))
slice = append(slice, line)
}
return
}

Related

Read line of numbers in Go

I have the following input, where on the first line is N - count of numbers, and on the second line N numbers, separated by space:
5
2 1 0 3 4
In Python I can read numbers without specifying its count (N):
_ = input()
numbers = list(map(int, input().split()))
How can I do the same in Go? Or I have to know exactly how many numbers are?
You can iterate through a file line-by-line using bufio, and the strings module can split a string into a slice. So that gets us something like:
package main
import (
"bufio"
"fmt"
"os"
"strconv"
"strings"
)
func main() {
readFile, err := os.Open("data.txt")
if err != nil {
fmt.Println(err)
return
}
defer readFile.Close()
fileScanner := bufio.NewScanner(readFile)
fileScanner.Split(bufio.ScanLines)
for fileScanner.Scan() {
// get next line from the file
line := fileScanner.Text()
// split it into a list of whitespace-delimited tokens
chars := strings.Fields(line)
// create a slice of ints the same length as
// the chars slice
ints := make([]int, len(chars))
for i, s := range chars {
// convert string to int
val, err := strconv.Atoi(s)
if err != nil {
panic(err)
}
// update the corresponding position in the
// ints slice
ints[i] = val
}
fmt.Printf("%v\n", ints)
}
}
Which given your sample data will output:
[5]
[2 1 0 3 4]
Since you know the delimiter and you only have 2 lines, this is also a more compact solution:
package main
import (
"fmt"
"os"
"regexp"
"strconv"
"strings"
)
func main() {
parts, err := readRaw("data.txt")
if err != nil {
panic(err)
}
n, nums, err := toNumbers(parts)
if err != nil {
panic(err)
}
fmt.Printf("%d: %v\n", n, nums)
}
// readRaw reads the file in input and returns the numbers inside as a slice of strings
func readRaw(fn string) ([]string, error) {
b, err := os.ReadFile(fn)
if err != nil {
return nil, err
}
return regexp.MustCompile(`\s`).Split(strings.TrimSpace(string(b)), -1), nil
}
// toNumbers plays with the input string to return the data as a slice of int
func toNumbers(parts []string) (int, []int, error) {
n, err := strconv.Atoi(parts[0])
if err != nil {
return 0, nil, err
}
nums := make([]int, 0)
for _, p := range parts[1:] {
num, err := strconv.Atoi(p)
if err != nil {
return n, nums, err
}
nums = append(nums, num)
}
return n, nums, nil
}
The output will be:
5: [2 1 0 3 4]

How to read large file by blocks with n length

I want to read a large text file (about 3 GB) and split it into blocks of n symbols each. I tried reading the file and splitting it by runes, but that uses a lot of memory.
func SplitSubN(s string, n int) []string {
sub := ""
subs := []string{}
runes := bytes.Runes([]byte(s))
l := len(runes)
for i, r := range runes {
sub = sub + string(r)
if (i+1)%n == 0 {
subs = append(subs, sub)
sub = ""
} else if (i + 1) == l {
subs = append(subs, sub)
}
}
return subs
}
I suppose it can be done in a smarter way, such as reading blocks of a certain length from the file in stages, but I don't know how to do it correctly.
Scan for rune start bytes and split based on that. This eliminates all allocations within the function except for the allocation of the result slice.
func SplitSubN(s string, n int) []string {
if len(s) == 0 {
return nil
}
m := 0
i := 0
j := 1
var result []string
for ; j < len(s); j++ {
if utf8.RuneStart(s[j]) {
if (m+1)%n == 0 {
result = append(result, s[i:j])
i = j
}
m++
}
}
if j > i {
result = append(result, s[i:j])
}
return result
}
The API specified in the question requires that the application allocate memory when converting the []byte read from the file to a string. This allocation can be avoided by changing the function to work on bytes:
func SplitSubN(s []byte, n int) [][]byte {
if len(s) == 0 {
return nil
}
m := 0
i := 0
j := 1
var result [][]byte
for ; j < len(s); j++ {
if utf8.RuneStart(s[j]) {
if (m+1)%n == 0 {
result = append(result, s[i:j])
i = j
}
m++
}
}
if j > i {
result = append(result, s[i:j])
}
return result
}
Both of these functions require that the application slurp the entire file into memory. I assume that's OK because the function in the question does as well. If you only need to process one chunk at a time, then the above code can be adapted to scan as the file is read incrementally.
Actually, the most interesting part is not parsing a chunk itself but handling characters that straddle chunk boundaries.
For example, if you read a file in chunks of N bytes, the last multi-byte character of a chunk may be read only partially (the remainder arrives in the next iteration).
Here is a solution that reads a text file in chunks of a given size and handles such split characters asynchronously:
package main
import (
"fmt"
"io"
"log"
"os"
"unicode/utf8"
)
func main() {
data, err := ReadInChunks("testfile", 1024*16)
completed := false
for !completed {
select {
case next := <-data:
if next == nil {
completed = true
break
}
fmt.Print(string(next))
case e := <-err:
if e != nil {
log.Fatalf("error: %s", e)
}
}
}
}
func ReadInChunks(path string, readChunkSize int) (data chan []rune, err chan error) {
var readChanel = make(chan []rune)
var errorChanel = make(chan error)
onDone := func() {
close(readChanel)
close(errorChanel)
}
onError := func(err error) {
errorChanel <- err
onDone()
}
go func() {
if _, err := os.Stat(path); os.IsNotExist(err) {
onError(fmt.Errorf("file [%s] does not exist", path))
return
}
f, err := os.Open(path)
if err != nil {
onError(err)
return
}
defer f.Close()
readBuf := make([]byte, readChunkSize)
reminder := 0
for {
read, err := f.Read(readBuf[reminder:])
if err == io.EOF {
onDone()
return
}
if err != nil {
onError(err)
return
}
// only the first reminder+read bytes of the buffer hold valid data
total := reminder + read
runes, parsed := runes(readBuf[:total])
// move the bytes of an incomplete trailing rune to the front
reminder = copy(readBuf, readBuf[parsed:total])
if len(runes) > 0 {
readChanel <- runes
}
}
}()
return readChanel, errorChanel
}
// runes decodes the complete runes in nextBuffer and returns them together
// with the number of bytes consumed.
func runes(nextBuffer []byte) ([]rune, int) {
t := make([]rune, 0, utf8.RuneCount(nextBuffer))
read := 0
for len(nextBuffer) > 0 {
// stop at a rune that is cut off by the end of the buffer;
// invalid bytes count as full (width-1) runes
if !utf8.FullRune(nextBuffer) {
break
}
r, l := utf8.DecodeRune(nextBuffer)
read += l
t = append(t, r)
nextBuffer = nextBuffer[l:]
}
return t, read
}
It can be greatly simplified if the file is ASCII.
Alternatively, if you need to support Unicode, you can work with UTF-32 (which is fixed-width) or UTF-16 (if you don't need characters outside the Basic Multilingual Plane, you can treat it as fixed-width as well).

reader.ReadLine() doesn't advance after a scanner.Scan() call

The code below reads its values from this file:
2 3\n
1.0 2.0 3.0\n
-1.0 -2.0 -3.0\n
And should print:
[ {1 2 3}, {-1 -2 -3} ]
But instead I get this:
[{2 [31 2 3]} {0 []}] strconv.ParseFloat: parsing "3.0-1.0": invalid syntax
It seems that the reader.ReadLine() stays at the same location. Is there a simpler way to scan lines, then values inside each line?
package main
import (
"bufio"
"bytes"
"fmt"
"os"
"strconv"
"strings"
)
type Example struct {
classLabel int
attributes []float64
}
func NewExample(classLabel int, attributes []float64) *Example {
return &Example{classLabel, attributes}
}
func readFile(path string) ([]Example, error) {
var (
result []Example
err error
file *os.File
part []byte
size int
attributeNum int
)
if file, err = os.Open(path); err != nil {
return result, err
}
defer file.Close()
reader := bufio.NewReader(file)
buffer := bytes.NewBuffer(make([]byte, 0))
if part, _, err = reader.ReadLine(); err != nil {
return result, err
}
buffer.Write(part)
newLine := buffer.String()
fmt.Println("newLine=" + newLine)
r := strings.NewReader(newLine)
scanner := bufio.NewScanner(r)
scanner.Split(bufio.ScanWords)
if scanner.Scan() {
size, err = strconv.Atoi(scanner.Text())
if err != nil {
return result, err
}
}
fmt.Println("size=" + strconv.Itoa(size))
if scanner.Scan() {
attributeNum, err = strconv.Atoi(scanner.Text())
if err != nil {
return result, err
}
}
fmt.Println("attributeNum=" + strconv.Itoa(attributeNum))
result = make([]Example, size)
var classLabel int
var attributes []float64
for k := 0; k < size; k++ {
if part, _, err = reader.ReadLine(); err != nil {
return result, err
}
buffer.Write(part)
newLine := buffer.String()
fmt.Println("newLine=" + newLine)
r := strings.NewReader(newLine)
scanner := bufio.NewScanner(r)
scanner.Split(bufio.ScanWords)
if scanner.Scan() {
classLabel, err = strconv.Atoi(scanner.Text())
if err != nil {
return result, err
}
}
fmt.Println("classLabel=" + strconv.Itoa(classLabel))
for i := 0; i < attributeNum; i++ {
var attribute float64
if scanner.Scan() {
attribute, err = strconv.ParseFloat(scanner.Text(), 64)
if err != nil {
return result, err
}
attributes = append(attributes, attribute)
fmt.Println("attribute=" + strconv.FormatFloat(attribute, 'f', -1, 64))
}
}
result[k] = *NewExample(classLabel, attributes)
}
return result, scanner.Err()
}
func main() {
example, err := readFile("test.txt")
fmt.Println(example, err)
}
When you do this inside the for loop:
buffer.Write(part)
newLine := buffer.String()
fmt.Println("newLine=" + newLine)
The next line gets appended to buffer.
That is,
before the loop begins, buffer contains 2 3,
and then after reading 1.0 2.0 3.0,
it gets appended to buffer,
so the content becomes 2 31.0 2.0 3.0,
which you store in newLine.
That's where things start to go sideways.
You probably want to clear the buffer before reading each new line:
buffer.Reset()
buffer.Write(part)
newLine := buffer.String()
fmt.Println("newLine=" + newLine)
But then you will have further problems still, here:
if scanner.Scan() {
classLabel, err = strconv.Atoi(scanner.Text())
if err != nil {
return result, err
}
}
Since the line contains 1.0 2.0 3.0, strconv.Atoi is going to fail on 1.0.
I don't understand the purpose of this snippet;
perhaps you can delete it (or comment it out).
With the above fixed, you will still have one more problem, on this line:
attributes = append(attributes, attribute)
Since attributes is never reset, it keeps growing.
That is, after the first line, it will contain 1 2 3,
and after the second line it will contain 1 2 3 -1 -2 -3.
You could correct that by moving the declaration of attributes inside the outer loop, like this:
var attributes []float64
for i := 0; i < attributeNum; i++ {
var attribute float64
if scanner.Scan() {
attribute, err = strconv.ParseFloat(scanner.Text(), 64)
if err != nil {
return result, err
}
attributes = append(attributes, attribute)
fmt.Println("attribute=" + strconv.FormatFloat(attribute, 'f', -1, 64))
}
}

golang scan a line of numbers from stdin

I'm trying to read input from stdin like
3 2 1<ENTER>
and save it in a list of ints. At the moment my code looks like this:
nums = make([]int, 0)
var i int
for {
_, err := fmt.Scan(&i)
if err != nil {
if err==io.EOF { break }
log.Fatal(err)
}
nums = append(nums, i)
}
At the moment the program never leaves the for loop. I can't find an easy way to check for a newline character in the documentation. How would I do this?
Edit:
Since I know that there will almost certainly be four numbers, I tried the following:
var i0,i1,i2,i3 int
fmt.Scanf("%d %d %d %d\n", &i0, &i1, &i2, &i3)
But this only scanned the first number and then exited the program. I'm not sure if that's because of the Z shell I'm using.
Edit:
To clarify, the program will pause and ask the user to input a list of n numbers separated by spaces and terminated with a newline. These numbers should be stored in an array.
Ok, I decided to bring out the large bufio hammer and solve it like this:
in := bufio.NewReader(os.Stdin)
line, err := in.ReadString('\n')
if err != nil {
log.Fatal(err)
}
strs := strings.Fields(line)
nums := make([]int, len(strs))
for i, str := range strs {
if nums[i], err = strconv.Atoi(str); err != nil {
log.Fatal(err)
}
}
It does seem like an awful lot of code, but it works.
It seems that you want https://golang.org/pkg/fmt/#Fscanln
Something like
ok := func(err error) { if err != nil { panic(err) } }
for {
var i, j, k int
_, err := fmt.Fscanln(os.Stdin, &i, &j, &k)
ok(err)
fmt.Println(i, j, k)
}
I suggest using the bufio package and its Scanner.Scan method.
The following code reads two lines from stdin and stores them in a slice.
Hope this helps.
package main
import (
"fmt"
"bufio"
"os"
"strconv"
"strings"
)
func ReadInput() []string{
var lines []string
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
lines = append(lines, scanner.Text())
//count, _ := strconv.Atoi(lines[0])
if len(lines) == 2 { break }
}
if err := scanner.Err(); err != nil {
fmt.Fprintln(os.Stderr, err)
}
return lines
}
func main(){
lines := ReadInput()
count ,_ := strconv.Atoi(lines[0])
num := strings.Fields(lines[1])
if count != len(num) { os.Exit(0) }
// Do whatever you want here
}
Two lines will be accepted. First line will have a count. Second line will have all the numbers. You can modify the same code as per your requirement.
Example:
3
1 5 10

limitation on bytes.Buffer?

I am trying to gzip a slice of bytes using the package "compress/gzip". I write 45976 bytes to a bytes.Buffer, but when I uncompress the content using a gzip.Reader and its Read function, not all of the content is recovered. Is there some limitation on bytes.Buffer? Is there a way to bypass or change it? Here is my code (edit):
func compress_and_uncompress() {
var buf bytes.Buffer
w := gzip.NewWriter(&buf)
i,err := w.Write([]byte(long_string))
if(err!=nil){
log.Fatal(err)
}
w.Close()
b2 := make([]byte, 80000)
r, _ := gzip.NewReader(&buf)
j, err := r.Read(b2)
if(err!=nil){
log.Fatal(err)
}
r.Close()
fmt.Println("Wrote:", i, "Read:", j)
}
output from testing (with a chosen string as long_string) would give
Wrote: 45976 Read: 32768
Continue reading to get the remaining 13208 bytes. The first read returns 32768 bytes, the second read returns 13208 bytes, and the third read returns zero bytes and EOF.
For example,
package main
import (
"bytes"
"compress/gzip"
"fmt"
"io"
"log"
)
func compress_and_uncompress() {
var buf bytes.Buffer
w := gzip.NewWriter(&buf)
i, err := w.Write([]byte(long_string))
if err != nil {
log.Fatal(err)
}
w.Close()
b2 := make([]byte, 80000)
r, _ := gzip.NewReader(&buf)
j := 0
for {
n, err := r.Read(b2[:cap(b2)])
b2 = b2[:n]
j += n
if err != nil {
if err != io.EOF {
log.Fatal(err)
}
if n == 0 {
break
}
}
fmt.Println(len(b2))
}
r.Close()
fmt.Println("Wrote:", i, "Read:", j)
}
var long_string string
func main() {
long_string = string(make([]byte, 45976))
compress_and_uncompress()
}
Output:
32768
13208
Wrote: 45976 Read: 45976
Use ioutil.ReadAll (available as io.ReadAll since Go 1.16). The contract for io.Reader says that a single Read does not have to return all the data, and there is a good reason for that: it depends on the sizes of internal buffers. ioutil.ReadAll takes an io.Reader and keeps reading until EOF.
E.g. (untested):
import "io/ioutil"
func compress_and_uncompress() {
var buf bytes.Buffer
w := gzip.NewWriter(&buf)
i,err := w.Write([]byte(long_string))
if err!=nil {
log.Fatal(err)
}
w.Close()
r, _ := gzip.NewReader(&buf)
b2, err := ioutil.ReadAll(r)
if err!=nil {
log.Fatal(err)
}
r.Close()
fmt.Println("Wrote:", i, "Read:", len(b2))
}
If a read from the gzip reader does not return the whole expected slice, you can keep re-reading until you have received all the data in the buffer.
Regarding your problem, where subsequent reads did not append to the end of the slice but wrote at the beginning instead: each call to Read fills the slice it is given starting at index 0, as can be seen in the implementation of gzip's Read function, which includes
z.digest.Write(p[0:n])
So passing b2 unchanged overwrites the start of the buffer on every call.
This can be solved as follows:
func compress_and_uncompress(long_string string) {
// Writer
var buf bytes.Buffer
w := gzip.NewWriter(&buf)
i,err := w.Write([]byte(long_string))
if(err!=nil){
log.Fatal(err)
}
w.Close()
// Reader
var j, k int
b2 := make([]byte, 80000)
r, _ := gzip.NewReader(&buf)
for j = 0; ; j += k {
k, err = r.Read(b2[j:]) // Add the offset here
if err != nil {
if err != io.EOF {
log.Fatal(err)
}
j += k // count any bytes returned together with EOF
break
}
}
r.Close()
fmt.Println("Wrote:", i, "Read:", j)
}
The result will be:
Wrote: 45976 Read: 45976
Also, after testing with a string of 45976 characters, I can confirm that the output matches the input exactly, with the second part correctly appended after the first part.
Source for gzip.Read: http://golang.org/src/pkg/compress/gzip/gunzip.go?s=4633:4683#L189
