Reading a random line from a file in constant time in Go

I have the following code to choose 2 random lines from a file containing lines of the form ip:port:
package main

import (
	"fmt"
	"log"
	"math/rand"
	"os"
	"time"
	"unicode/utf8"
	//"bufio"
)

func main() {
	fmt.Println("num bytes in line is: \n", utf8.RuneCountInString("10.244.1.8:8080"))
	file_pods_array, err_file_pods_array := os.Open("pods_array.txt")
	if err_file_pods_array != nil {
		log.Fatalf("failed opening file: %s", err_file_pods_array)
	}
	// 16 = num of bytes in ip:port pair (15 chars plus a newline)
	randsource := rand.NewSource(time.Now().UnixNano())
	randgenerator := rand.New(randsource)
	firstLoc := randgenerator.Intn(10)
	secondLoc := randgenerator.Intn(10)
	candidate1 := ""
	candidate2 := ""
	num_bytes_from_start_first := 16 * (firstLoc + 1)
	num_bytes_from_start_second := 16 * (secondLoc + 1)
	buf_ipport_first := make([]byte, int64(15))
	buf_ipport_second := make([]byte, int64(15))
	start_first := int64(num_bytes_from_start_first)
	start_second := int64(num_bytes_from_start_second)
	_, err_first := file_pods_array.ReadAt(buf_ipport_first, start_first)
	first_ipport_ep := buf_ipport_first
	if err_first == nil {
		candidate1 = string(first_ipport_ep)
	}
	_, err_second := file_pods_array.ReadAt(buf_ipport_second, start_second)
	second_ipport_ep := buf_ipport_second
	if err_second == nil {
		candidate2 = string(second_ipport_ep)
	}
	fmt.Println("first is: ", candidate1)
	fmt.Println("sec is: ", candidate2)
}
This sometimes prints empty or partial lines.
Why does this happen and how can I fix it?
Output example:
num bytes in line is:
15
first is: 10.244.1.17:808
sec is:
10.244.1.11:80
Thank you.

If your lines were of a fixed length you could do this in constant time.
Length of each line is L.
Check the size of the file, S.
Divide S/L to get the number of lines N.
Pick a random number R from 0 to N-1.
Seek to R*L in the file.
Read L bytes.
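For illustration, here is a minimal sketch of that fixed-length approach. It assumes 16-byte records (15 characters of ip:port plus a newline, matching the question's example) and a non-empty file:

package main

import (
	"fmt"
	"log"
	"math/rand"
	"os"
	"time"
)

const recordLen = 16 // 15 bytes of ip:port plus a newline; an assumption for this sketch

func main() {
	f, err := os.Open("pods_array.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		log.Fatal(err)
	}
	n := fi.Size() / recordLen // S/L = number of lines

	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	line := r.Int63n(n) // R in [0, N)

	buf := make([]byte, recordLen-1) // read the record minus its newline
	if _, err := f.ReadAt(buf, line*recordLen); err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(buf))
}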
But you don't have fixed-length lines. We can't do constant time, but we can do it in constant memory and O(n) time using reservoir sampling, the technique from The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
Read a line. Remember its line number M.
Pick a random number from 1 to M.
If it's 1, remember this line.
That is, as you read each line you have a 1/M chance of picking it. Cumulatively this adds up to 1/N for every line.
If we have three lines, the first line has a 1/1 chance of being picked. Then a 1/2 chance of remaining. Then a 2/3 chance of remaining. Total chance: 1 * 1/2 * 2/3 = 1/3.
The second line has a 1/2 chance of being picked and a 2/3 chance of remaining. Total chance: 1/2 * 2/3 = 1/3.
The third line has a 1/3 chance of being picked.
package main

import (
	"bufio"
	"fmt"
	"log"
	"math/rand"
	"os"
	"time"
)

func main() {
	file, err := os.Open("pods_array.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	randsource := rand.NewSource(time.Now().UnixNano())
	randgenerator := rand.New(randsource)

	lineNum := 1
	var pick string
	for scanner.Scan() {
		line := scanner.Text()
		fmt.Printf("Considering %v at 1/%v.\n", line, lineNum)
		// Instead of 1 to N it's 0 to N-1.
		roll := randgenerator.Intn(lineNum)
		fmt.Printf("We rolled a %v.\n", roll)
		if roll == 0 {
			fmt.Printf("Picking line.\n")
			pick = line
		}
		lineNum += 1
	}

	fmt.Printf("Picked: %v\n", pick)
}
Because rand.Intn(n) returns [0,n), that is from 0 to n-1, we check for 0, not 1.
Maybe you're thinking "what if I seek to a random point in the file and then read the next full line?" That wouldn't quite be constant time, it would be O(longest-line), and it wouldn't be uniformly random: longer lines would get picked more frequently.
Note that since these are (I assume) all IP addresses and ports, you could have constant record lengths. Store the IPv4 address as 32 bits and the port as 16 bits: 48 bits per line.
However, this will break on IPv6. For forward compatibility, store everything as IPv6: 128 bits for the IP and 16 bits for the port, 144 bits per line. Convert IPv4 addresses to IPv6 for storage.
This will allow you to pick random addresses in constant time, and it will save disk space.
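Here is a sketch of that fixed-width encoding, with hypothetical encode/decode helpers (18 bytes per record: 16 for the IPv6 form of the address, 2 for the port; the helper names are mine, not from the question):

package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

const recordSize = 18 // 16 bytes of IPv6 address + 2 bytes of port = 144 bits

// encode packs an IP (IPv4 is converted to its 16-byte form) and a port.
func encode(ip net.IP, port uint16) []byte {
	rec := make([]byte, recordSize)
	copy(rec[:16], ip.To16())
	binary.BigEndian.PutUint16(rec[16:], port)
	return rec
}

// decode unpacks a record back into an IP and a port.
func decode(rec []byte) (net.IP, uint16) {
	return net.IP(rec[:16]), binary.BigEndian.Uint16(rec[16:])
}

func main() {
	rec := encode(net.ParseIP("10.244.1.8"), 8080)
	ip, port := decode(rec)
	fmt.Printf("%s:%d\n", ip, port) // 10.244.1.8:8080
}

With every record exactly recordSize bytes, the seek-and-read trick above works unchanged.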
Alternatively, store them in SQLite.

Found a solution using ioutil and strings:
package main

import (
	"fmt"
	"io/ioutil"
	"math/rand"
	"strings"
	"time"
)

func main() {
	randsource := rand.NewSource(time.Now().UnixNano())
	randgenerator := rand.New(randsource)
	firstLoc := randgenerator.Intn(10)
	secondLoc := randgenerator.Intn(10)
	candidate1 := ""
	candidate2 := ""
	dat, err := ioutil.ReadFile("pods_array.txt")
	if err == nil {
		ascii := string(dat)
		splt := strings.Split(ascii, "\n")
		candidate1 = splt[firstLoc]
		candidate2 = splt[secondLoc]
	}
	fmt.Println(candidate1)
	fmt.Println(candidate2)
}
Output
10.244.1.3:8080
10.244.1.11:8080
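One caveat: Intn(10) hardcodes the number of lines, so this breaks if the file has fewer than 10 lines. A variant that derives the bound from the file itself (a sketch, using os.ReadFile, which supersedes ioutil.ReadFile in current Go):

package main

import (
	"fmt"
	"log"
	"math/rand"
	"os"
	"strings"
	"time"
)

func main() {
	dat, err := os.ReadFile("pods_array.txt")
	if err != nil {
		log.Fatal(err)
	}
	// TrimRight avoids a trailing empty element when the file ends in a newline.
	lines := strings.Split(strings.TrimRight(string(dat), "\n"), "\n")

	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	fmt.Println(lines[r.Intn(len(lines))])
	fmt.Println(lines[r.Intn(len(lines))])
}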

Related

GOLANG bufio Scanner handling 10000 characters

I want to read a 10000-character string from os.Stdin, but bufio.NewScanner seems to read only 4096 characters. How can I read more than 4096 characters?
Here is my code:
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	sc := bufio.NewScanner(os.Stdin)
	buf := make([]byte, 2048*2048)
	sc.Buffer(buf, 2048*2048)
	sc.Scan()
	s := sc.Bytes()
	fmt.Println(len(s)) // 9998; must be 10000
	str := make([]byte, len(s)+1)
	for i := 1; i < len(s)+1; i++ {
		str[i] = s[i-1]
	}
}
If I input 10000 characters, I get:
panic: runtime error: index out of range [9998] with length 9998
Just a hunch, but I suspect that the missing two characters are the CR+LF pair used by Windows as a line terminator.
This works for me. I'm using a text file ("pp1.txt") that contains the text of Jane Austen's Pride and Prejudice broken up into 10,000-character lines, making each line (on my macOS system) actually 10,001 bytes: 10,000 characters of text followed by a LF line terminator. It reads the entire line, discarding the line terminator and giving me 10,000 characters.
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	file, err := os.Open("./pp1.txt")
	if err != nil {
		panic(err)
	}
	one_megabyte := 1024 * 1024
	buf := make([]byte, one_megabyte)
	sc := bufio.NewScanner(file)
	sc.Buffer(buf, one_megabyte)
	sc.Scan()
	s := sc.Bytes()
	fmt.Println(len(s)) // 10000
	str := make([]byte, len(s)+1)
	for i := 1; i < len(s)+1; i++ {
		str[i] = s[i-1]
	}
}
I'm a little confused by this, though:
str := make([]byte, len(s)+1)
for i := 1; i < len(s)+1; i++ {
	str[i] = s[i-1]
}
If you're trying to convert the bytes from the scanner into a string, why wouldn't you just use sc.Text() instead of sc.Bytes()?
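That is, the whole copy loop could presumably be replaced with a single call:

str := sc.Text() // the scanned token, already copied into a fresh string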

Infinite loop in Go

I want the for loop to run 3 times, or until the user inputs something other than an integer. Below is my code, although this runs an infinite number of times and prints out the first value the user enters.
package main

import "fmt"
import "bufio"
import "strconv"
import "os"
import "sort"

func main() {
	emptySlice := make([]int, 3) // Capacity of 3
	fmt.Println(cap(emptySlice))
	scanner := bufio.NewScanner(os.Stdin) // Creating scanner object
	fmt.Printf("Please enter a number: ")
	scanner.Scan() // Will always scan in a string regardless if its a number
	for i := 0; i < cap(emptySlice); i++ { // Should this not run 3 times?
		input, err := strconv.ParseInt(scanner.Text(), 10, 16)
		if err != nil {
			fmt.Println("Not a valid entry! Ending program")
			break
		}
		emptySlice = append(emptySlice, int(input)) // adds input to the slice
		sort.Ints(emptySlice) // sorts the slice
		fmt.Println(emptySlice) // Prints the slice
	}
}
I think there are a couple of minor bugs, but this version should work correctly:
package main

import "fmt"
import "bufio"
import "strconv"
import "os"
import "sort"

func main() {
	emptySlice := make([]int, 3) // Capacity of 3
	fmt.Println(cap(emptySlice))
	scanner := bufio.NewScanner(os.Stdin) // Creating scanner object
	for i := 0; i < cap(emptySlice); i++ { // Should this not run 3 times?
		fmt.Printf("Please enter a number: ")
		scanner.Scan() // Will always scan in a string regardless if its a number
		input, err := strconv.ParseInt(scanner.Text(), 10, 16)
		if err != nil {
			fmt.Println("Not a valid entry! Ending program")
			break
		}
		// emptySlice = append(emptySlice, int(input)) // adds input to the slice
		emptySlice[i] = int(input)
	}
	sort.Ints(emptySlice) // sorts the slice
	fmt.Println(emptySlice) // Prints the slice
}
I've moved the prompt into the loop, and I've replaced the append call with a direct assignment to the previously allocated slice entries. Otherwise calling append will just increase the size of the slice.
I've moved the sort and the print outside of the loop, as these seemed to be incorrectly placed too.
The program in the question starts with cap(emptySlice) == 3. Given that each complete iteration of the loop appends a new value to emptySlice, we know that cap(emptySlice) >= 3 + i, so the condition i < cap(emptySlice) never becomes false. It follows that the loop does not terminate.
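A quick demonstration of that capacity growth (the exact new capacity is implementation-dependent):

package main

import "fmt"

func main() {
	s := make([]int, 3)
	fmt.Println(len(s), cap(s)) // 3 3
	s = append(s, 42)
	fmt.Println(len(s), cap(s)) // 4 6 (the new capacity depends on the runtime's growth policy)
}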
My homework assignment is slightly different: Read up to three integers and print them in sorted order. Here's how I did it:
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strconv"
)

func main() {
	var result []int
	scanner := bufio.NewScanner(os.Stdin)
	for i := 0; i < 3; i++ {
		fmt.Printf("Please enter a number: ")
		if !scanner.Scan() {
			// Exit on EOF or other errors.
			break
		}
		n, err := strconv.Atoi(scanner.Text())
		if err != nil {
			// Exit on bad input.
			fmt.Println(err)
			break
		}
		result = append(result, n)
	}
	sort.Ints(result)
	fmt.Println(result)
}

golang copy function understanding

Hey guys, I was playing around with some buffers and wrote some code to understand how Read() works:
package main

import (
	"bytes"
	"fmt"
	"io"
)

func main() {
	tmp := make([]byte, 2)
	data := []byte("HEL")
	dataReader := bytes.NewReader(data)
	dest := make([]byte, len(data))
	for {
		n, err := dataReader.Read(tmp)
		fmt.Println(n)
		fmt.Println(string(tmp))
		dest = append(dest, tmp[:]...)
		if err == io.EOF {
			break
		}
	}
	fmt.Println(string(dest))
}
output:
2 -> n
HE -> tmp[:]
1 -> n
LE -> tmp[:]
0 -> n
LE -> tmp[:]
HELELE -> dest
So I know the output is wrong and I should actually be doing tmp[:n] to write the bytes. But looking at the output I realised that the tmp buffer does not get cleared on every iteration. Also, when n is 1, shouldn't the contents of the buffer be EL? I mean, L is getting prepended to tmp, not appended. I took a look at the Read function but couldn't understand it. Can someone explain it to me?
In the first iteration, Read reads two bytes, and your program produces the HE output. In the second iteration, Read reads one byte into tmp. Read always fills the buffer starting at index 0, so tmp[0] now contains that byte, but tmp[1] still contains the E read during the first iteration; that is why you see LE rather than EL. Since you append all of tmp to dest, you get HELE. The third time around, Read reads 0 bytes, but you still append the LE left in tmp to dest.
The correct version of your program would be:
for {
	n, err := dataReader.Read(tmp)
	fmt.Println(n)
	fmt.Println(string(tmp))
	dest = append(dest, tmp[:n]...)
	if err == io.EOF {
		break
	}
}
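As a side note, the io.Reader contract allows a call to return n > 0 together with a non-nil error, so a more defensive sketch of this loop consumes the bytes before checking the error:

for {
	n, err := dataReader.Read(tmp)
	if n > 0 {
		dest = append(dest, tmp[:n]...) // use only the bytes actually read
	}
	if err != nil {
		break // io.EOF marks the end; real code should distinguish other errors
	}
}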

How to read inputs recursively in golang

In the following code, after one recursive call the inputs are no longer read (from stdin). The output is incorrect if N is greater than 1.
X is read as 0 after one recursive call, and hence the array is not read after that.
The program is supposed to print the sum of squares of the positive numbers in each array. P.S. It has to be done using only recursion.
package main

// Imports
import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Global Variables
var N int = 0
var X int = 0
var err error
var out int = 0
var T string = "0" // All set to 0 just in case there is no input, so we don't crash with nil values.

func main() {
	// Let's grab our input.
	fmt.Print("Enter N: ")
	fmt.Scanln(&N)
	// Make our own recursion.
	loop()
}

func loop() {
	if N == 0 {
		return
	}
	// Grab our array length.
	fmt.Scanln(&X)
	tNum := make([]string, X)
	// Grab our values and put them into an array.
	in := bufio.NewReader(os.Stdin)
	T, err = in.ReadString('\n')
	tNum = strings.Fields(T)
	// Parse the numbers, square, and add.
	add(tNum)
	// Output and reset.
	fmt.Print(out)
	out = 0
	N--
	loop()
}

// Another loop, until X is 0.
func add(tNum []string) {
	if X == 0 {
		return
	}
	// Parse a string to an integer.
	i, err := strconv.Atoi(tNum[X-1])
	if err != nil {
	}
	// If a number is negative, make it 0, so when we add its square, it does nothing.
	if i < 0 {
		i = 0
	}
	// Add to our total!
	out = out + i*i
	X--
	add(tNum)
}
Input:
2
4
2 4 6 8
3
1 3 9
Output:
1200
Expected output:
120
91
bufio.Reader, like the name suggests, uses a buffer to store what is in the underlying reader (os.Stdin here). That means each time you create a bufio.Reader and read from it once, more than what you consumed may have been pulled into its buffer, and thus the next time you read from the reader (os.Stdin), you do not start from where you left off.
You should only have one bufio.Reader for os.Stdin. Make it global (if that is a requirement) or pass it as an argument. In fact, the bufio package has a Scanner type that can split on spaces and newlines, so you don't even need to call strings.Fields.
I think you should practise doing this yourself, but here is a playground link: https://play.golang.org/p/7zBDYwqWEZ0
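As a minimal sketch, splitting on whitespace with bufio.ScanWords looks like this:

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Split(bufio.ScanWords) // one token per whitespace-separated word
	for sc.Scan() {
		fmt.Println(sc.Text())
	}
}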
Here is an example that illustrates the general principles.
// Print the sum of the squares of positive numbers in the input.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

func sumOfSquares(sum int, s *bufio.Scanner, err error) (int, *bufio.Scanner, error) {
	if err != nil {
		return sum, s, err
	}
	if !s.Scan() {
		err = s.Err()
		if err == nil {
			err = io.EOF
		}
		return sum, s, err
	}
	for _, f := range strings.Fields(s.Text()) {
		i, err := strconv.Atoi(f)
		if err != nil || i <= 0 {
			continue
		}
		sum += i * i
	}
	return sumOfSquares(sum, s, nil)
}

func main() {
	sum := 0
	s := bufio.NewScanner(os.Stdin)
	sum, s, err := sumOfSquares(sum, s, nil)
	if err != nil && err != io.EOF {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(sum)
}
Input:
2
4
2 4 6 8
3
1 3 9
Output:
240

Newbie: Properly sizing a []byte size in GO (Chunking)

Go newbie alert!
Not quite sure how to do this: as a learning project, I want to make a "file chunker" that grabs fixed-size slices out of a binary file for later upload.
I currently have this:
package main

import (
	"fmt"
	"os"
)

type (
	fileChunk  []byte
	fileChunks []fileChunk
)

func NumChunks(fi os.FileInfo, chunkSize int) int {
	chunks := fi.Size() / int64(chunkSize)
	if rem := fi.Size()%int64(chunkSize) != 0; rem {
		chunks++
	}
	return int(chunks)
}

// left out err checks for brevity
func chunker(filePtr *string) fileChunks {
	f, _ := os.Open(*filePtr)
	defer f.Close()
	// create the initial container to hold the slices
	file_chunks := make(fileChunks, 0)
	fi, _ := f.Stat()
	// show me how big the original file is
	fmt.Printf("File Name: %s, Size: %d\n", fi.Name(), fi.Size())
	// let's partition it into 10000 byte pieces
	chunkSize := 10000
	chunks := NumChunks(fi, chunkSize)
	fmt.Printf("Need %d chunks for this file", chunks)
	for i := 0; i < chunks; i++ {
		b := make(fileChunk, chunkSize) // allocate a chunk, 10000 bytes
		n1, _ := f.Read(b)
		fmt.Printf("Chunk: %d, %d bytes read\n", i, n1)
		// add chunk to "container"
		file_chunks = append(file_chunks, b)
	}
	fmt.Println(len(file_chunks))
	return file_chunks
}
This all works mostly fine, but here's what happens if my file size is 31234 bytes: I end up with three slices holding the first 30000 bytes from the file, and the final "chunk" consists of 1234 "file bytes" followed by "padding" out to the 10000-byte chunk size. I'd like the "remainder" fileChunk ([]byte) to be sized to 1234, not the full capacity. What would the proper way to do this be? On the receiving side I would then "stitch" together all the pieces to recreate the original file.
You need to re-slice the remainder chunk to be just the length of the last chunk read:
n1, err := f.Read(b)
fmt.Printf("Chunk: %d, %d bytes read\n", i, n1)
b = b[:n1]
This does the re-slicing for all chunks. Normally, n1 will be 10000 for all the non-remainder chunks, but there is no guarantee. The docs say "Read reads up to len(b) bytes from the File." So it's good to pay attention to n1 all the time.
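Putting that together, the read loop might look like this (a sketch; io.ReadFull with a sized final buffer would be another option):

for i := 0; i < chunks; i++ {
	b := make(fileChunk, chunkSize) // allocate a full-size chunk
	n1, err := f.Read(b)
	if n1 > 0 {
		file_chunks = append(file_chunks, b[:n1]) // keep only the bytes actually read
	}
	if err != nil {
		break // io.EOF at end of file; real code should check for other errors
	}
}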
