Find byte offset of a pattern in Golang - go

We can find the byte offset of a pattern in a file with
"grep -ob pattern filename";
however, grep is not UTF-8 safe.
How do I find the byte offset of a pattern in Go? The file is a process log, which can be terabytes in size.
This is what I want to get in Go:
$ cat fname
hello world
findme
hello 世界
findme again
...
$ grep -ob findme fname
12:findme
32:findme

Regexp.FindAllStringIndex(s string, n int) returns the start and end byte indexes of all successive matches of the expression, as a slice of two-element slices:
package main
import "fmt"
import "io/ioutil"
import "regexp"
func main() {
fname := "C:\\Users\\UserName\\go\\src\\so56798431\\fname"
b, err := ioutil.ReadFile(fname)
if err != nil {
panic(err)
}
re, err := regexp.Compile("findme")
if err != nil {
panic(err)
}
fmt.Println(re.FindAllStringIndex(string(b), -1))
}
Output:
[[12 18] [32 38]]
Note: I did this on Microsoft Windows, but saved the file in UNIX format (line feed only). If the input file is saved in Windows format (carriage return and line feed), the byte offsets increase to 13 and 35, respectively.
UPDATE: for large files, use bufio.Scanner; for example:
package main
import (
"bufio"
"fmt"
"log"
"os"
"regexp"
)
func main() {
fname, err := os.Open("C:\\Users\\UserName\\go\\src\\so56798431\\fname")
if err != nil {
log.Fatal(err)
}
defer fname.Close()
re, err := regexp.Compile("findme")
if err != nil {
log.Fatal(err)
}
scanner := bufio.NewScanner(fname)
bytesRead := 0
for scanner.Scan() {
b := scanner.Text()
//fmt.Println(b)
results := re.FindAllStringIndex(b, -1)
for _, result := range results {
fmt.Println(bytesRead + result[0])
}
// account for UNIX EOL marker
bytesRead += len(b) + 1
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}
Output:
12
32

Related

Go regExRepl-Script does not change the text file

My Go script should add one newline before every line that matches the regex search string ^(.+[,]+\n).
I tested the prototype in a text editor first,
using \n$1 as the replacement to add a newline before these lines.
This works when I try it in the text editor.
When I try it with my script (see line 24), nothing changes and no error is reported.
Any ideas what I am doing wrong?
Example
I would like to use PCRE as it works in this example: https://regex101.com/r/sB9wW6/17
The same example here:
Example source
Dear sir,
Thanks for your interest.
expected result
#### here is a newline ####
Dear sir,
Thanks for your interest.
result is (produced by the script below)
Dear sir,
Thanks for your interest.
go script:
// replace in files and store the new copy of it.
package main
import (
"fmt"
"io/ioutil"
"os"
"path/filepath"
"regexp"
"strings"
"time"
)
func visit(path string, fi os.FileInfo, err error) error {
matched, err := filepath.Match("*.csv", fi.Name())
if err != nil {
panic(err)
return err
}
if matched {
read, err := ioutil.ReadFile(path)
if err != nil {
panic(err)
}
newContents := string(read)
newContents = regExRepl(`^(.+[,]+\n)`, newContents, `\n\n\n$1`)
var re = regexp.MustCompile(`[\W]+`)
t_yymmdd := regexp.MustCompile(`[\W]+`).ReplaceAllString(time.Now().Format(time.RFC3339), `-`)[:10]
t_hhss := re.ReplaceAllString(time.Now().Format(time.RFC3339), `-`)[11:19]
t_yymmddhhss := t_yymmdd + "_" + t_hhss
fmt.Println(t_yymmddhhss)
filePath := fileNameWithoutExtension(path) + t_yymmddhhss + ".csv"
err = ioutil.WriteFile(filePath, []byte(newContents), 0)
if err != nil {
panic(err)
}
}
return nil
}
func regExRepl(regExPatt string, newContents string, regExRepl string) string {
return regexp.MustCompile(regExPatt).ReplaceAllString(newContents, regExRepl)
}
func main() {
err := filepath.Walk("./november2020messages.csv", visit) // <== read all files in current folder 20:12:06 22:44:42
if err != nil {
panic(err)
}
}
func fileNameWithoutExtension(fileName string) string {
return strings.TrimSuffix(fileName, filepath.Ext(fileName))
}
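To have \n interpreted as a newline, don't use the raw string `\n`;
use the interpreted string "\n" instead.
You may also use ^(.+[,]+) instead of ^(.+[,]+\n), and add (?m) at the start of the pattern so that ^ matches at the beginning of every line (multi-line mode).
You can test this suggestion here: https://play.golang.org/p/25_0GJ93oCT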
The following example illustrates the difference (in golang-playground here https://play.golang.org/p/FkPwElhx-Xu ):
// example from:
package main
import (
"fmt"
"regexp"
)
func main() {
newContents := `line 1,
line 2
line a,
line b`
newContents1 := regexp.MustCompile(`^(.+[,]+\n)`).ReplaceAllString(newContents, `\n$1`)
fmt.Println("hi\n" + newContents1)
newContents1 = regexp.MustCompile(`(?m)^(.+[,]+\n)`).ReplaceAllString(newContents, "\n$1")
fmt.Println("ho\n" + newContents1)
}
Result:
hi
\nline 1,
line 2
line a,
line b
ho

line 1,
line 2

line a,
line b

Read file at byte location

I need to read a file at a specific location, given by a byte offset.
filePath := "test_file.txt"
byteOffset := 6
// Read file
How can I achieve this, if possible without reading the whole file into memory?
Package os
import "os"
func (*File) Seek
func (f *File) Seek(offset int64, whence int) (ret int64, err error)
Seek sets the offset for the next Read or Write on file to offset,
interpreted according to whence: 0 means relative to the origin of the
file, 1 means relative to the current offset, and 2 means relative to
the end. It returns the new offset and an error, if any. The behavior
of Seek on a file opened with O_APPEND is not specified.
Package io
import "io"
Seek whence values.
const (
SeekStart = 0 // seek relative to the origin of the file
SeekCurrent = 1 // seek relative to the current offset
SeekEnd = 2 // seek relative to the end
)
For example,
package main
import (
"fmt"
"io"
"os"
)
func main() {
filePath := "test.file"
byteOffset := 6
f, err := os.Open(filePath)
if err != nil {
panic(err)
}
defer f.Close()
_, err = f.Seek(int64(byteOffset), io.SeekStart)
if err != nil {
panic(err)
}
buf := make([]byte, 16)
n, err := f.Read(buf[:cap(buf)])
buf = buf[:n]
if err != nil {
if err != io.EOF {
panic(err)
}
}
fmt.Printf("%s\n", buf)
}
Output:
$ cat test.file
0123456789
$ go run seek.go
6789
$

Go Lang Scan doesn't scan for next line

This scanner doesn't scan the rest of the line. I will explain it in more detail after the results below.
package main
import (
"fmt"
"io/ioutil"
"os"
"strings"
)
func main() {
var inputFileName string
var write string
fmt.Scanln(&inputFileName)
//func Join(a []string, sep string) string
s := []string{inputFileName, ".txt"}
inputFileName = strings.Join(s, "")
creator, err := os.Create(inputFileName)
check(err)
/*
*Writing
*/
fmt.Printf("The file name with %s what do you want to write?", inputFileName)
fmt.Scanln(&write)
if len(write) <= 0 {
panic("Cant be empty")
}
byteStringWrite := []byte(write)
//func (f *File) Write(b []byte) (n int, err error)
fmt.Println("BYTE : ", byteStringWrite)
fmt.Println("NONBYTE : ", write)
_, errWriter := creator.Write(byteStringWrite)
check(errWriter)
/**
*Reading File
*/
read, errRead := ioutil.ReadFile(inputFileName)
check(errRead)
readString := string(read)
fmt.Println("*******************FILE*********************")
fmt.Println(readString)
}
func check(e error) {
if e != nil {
panic(e)
}
}
Results:
Sample.txt //My User Input
The file name with Sample.txt what do you want to write?Hello World
BYTE : [72 101 108 108 111]
NONBYTE : Hello
*******************FILE*********************
Hello
As you can see, it stops at the space: everything after the first space is dropped. Can someone help me figure out this problem? Thank you.
EDIT
Using bufio.Reader.ReadString:
package main
import (
"fmt"
"io/ioutil"
"os"
"strings"
"bufio"
)
func main() {
var inputFileName string
var write string
bio := bufio.NewReader(os.Stdin)
inputFileName, err := bio.ReadString('\n')
fmt.Println(inputFileName)
//func Join(a []string, sep string) string
s := []string{inputFileName, ".txt"}
inputFileName = strings.Join(s, "")
creator, err := os.Create(inputFileName)
check(err)
/*
*Writing
*/
fmt.Printf("The file name with %s what do you want to write?", inputFileName)
fmt.Scanln(&write)
if len(write) <= 0 {
panic("Cant be empty")
}
byteStringWrite := []byte(write)
//func (f *File) Write(b []byte) (n int, err error)
fmt.Println("BYTE : ", byteStringWrite)
fmt.Println("NONBYTE : ", write)
_, errWriter := creator.Write(byteStringWrite)
check(errWriter)
/**
*Reading File
*/
read, errRead := ioutil.ReadFile(inputFileName)
check(errRead)
readString := string(read)
fmt.Println("*******************FILE*********************")
fmt.Println(readString)
}
func check(e error) {
if e != nil {
panic(e)
}
}
Results:
amanuel2:~/workspace/pkg_os/07_Practice $ go run main.go
Sample
The file name with Sample
.txt what do you want to write?Something Else
BYTE : [83 111 109 101 116 104 105 110 103]
NONBYTE : Something
*******************FILE*********************
Something
This gives me the correct .txt file name (apart from the newline before ".txt"), but the same issue as above remains: the text is cut off at the first space.
This is exactly what fmt.Scanln is supposed to do:
Scan scans text read from standard input, storing successive
space-separated values into successive arguments. Newlines count as
space. It returns the number of items successfully scanned. If that is
less than the number of arguments, err will report why.
If you want to read a line of text use bufio.Reader:
bio := bufio.NewReader(os.Stdin)
// in case you want a string which doesn't contain the newline
// (isPrefix is true when the line was too long for the buffer)
line, isPrefix, err := bio.ReadLine()
s := string(line)
fmt.Println(s, isPrefix, err)
// in case you need a string which contains the newline
s, err = bio.ReadString('\n')
fmt.Println(s, err)

How can I write a string to a binary file?

For example, I write 'A', but in the file it should appear as '1000001'.
How can I do this?
I have tried
buf := new(bytes.Buffer)
data := []int8{65, 80}
for _, i := range data {
binary.Write(buf, binary.LittleEndian, i)
fp.Write(buf.Bytes())
}
but I got the string 'AP' in the file, not binary code.
I didn't really understand the question, but perhaps you want something like:
package main
import (
"fmt"
"log"
"os"
)
func main() {
f, err := os.OpenFile("out.txt", os.O_TRUNC|os.O_CREATE|os.O_WRONLY, 0600)
if err != nil {
log.Fatal(err)
}
for _, v := range "AP" {
fmt.Fprintf(f, "%b\n", v)
}
f.Close()
}
which gives:
$ cat out.txt
1000001
1010000
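The loop above ranges over runes and prints each value without leading zeros, so a non-ASCII character would be written as its code point rather than its UTF-8 bytes. If fixed-width output per byte is wanted, a sketch along these lines may help; the helper name bitString and the space separator are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// bitString is a hypothetical helper: it renders each byte of s as eight
// '0'/'1' characters, separated by spaces.
func bitString(s string) string {
	parts := make([]string, 0, len(s))
	for i := 0; i < len(s); i++ { // index loop iterates bytes, not runes
		parts = append(parts, fmt.Sprintf("%08b", s[i]))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(bitString("AP")) // prints 01000001 01010000
}
```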

How to read a file, abort with error if it's not valid UTF-8?

In Go, I want to read in a file line by line, into strings or []rune slices.
The file should be encoded in UTF-8, but my program shouldn't trust it. If it contains invalid UTF-8, I want to properly handle the error.
There is bytes.Runes(s []byte) []rune, but that has no error return value. Will it panic on encountering invalid UTF-8?
For example,
package main
import (
"bufio"
"fmt"
"io/ioutil"
"os"
"strings"
"unicode/utf8"
)
func main() {
tFile := "text.txt"
t := []byte{'\xFF', '\n'}
ioutil.WriteFile(tFile, t, 0666)
f, err := os.Open(tFile)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
defer f.Close()
r := bufio.NewReader(f)
s, err := r.ReadString('\n')
if err != nil {
fmt.Println(err)
os.Exit(1)
}
s = strings.TrimRight(s, "\n")
fmt.Println(t, s, []byte(s))
if !utf8.ValidString(s) {
fmt.Println("!utf8.ValidString")
}
}
Output:
[255 10] � [255]
!utf8.ValidString
For example:
import (
"io/ioutil"
"log"
"unicode/utf8"
)
// ...
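buf, err := ioutil.ReadFile(fname) // fname is the path of the file to check
if err != nil {
log.Fatal(err)
}
size := 0
for start := 0; start < len(buf); start += size {
var r rune
// RuneError with size 1 marks an invalid byte; a valid U+FFFD in the file decodes with size 3
if r, size = utf8.DecodeRune(buf[start:]); r == utf8.RuneError && size == 1 {
log.Fatalf("invalid utf8 encoding at ofs %d", start)
}
}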
utf8.DecodeRune godocs:
DecodeRune unpacks the first UTF-8 encoding in p and returns the rune
and its width in bytes. If the encoding is invalid, it returns
(RuneError, 1), an impossible result for correct UTF-8.
