How to import a CSV and save it in a database - Go

This is a snippet of the first part of my code:
package main

import (
    "encoding/csv"
    "fmt"
    "os"
)

func main() {
    file, err := os.Open("Account_balances.csv")
    if err != nil {
        fmt.Println("Error", err)
        return
    }
    defer file.Close()

    reader := csv.NewReader(file)
    records, err := reader.ReadAll()
    if err != nil {
        fmt.Println("Error", err)
        return
    }
    for i := range records { // equivalent to: for i := 0; i < len(records); i++
        fmt.Println(records[i])
    }
}
I want to write code that saves the CSV data to any database (e.g. MySQL, SQLite, or PostgreSQL).

The Go MySQL driver supports loading data directly from a local file:
See https://github.com/go-sql-driver/mysql#load-data-local-infile-support and https://godoc.org/github.com/go-sql-driver/mysql#RegisterLocalFile.
RegisterLocalFile adds the given file to the file whitelist, so that it can be used by "LOAD DATA LOCAL INFILE <filepath>". Alternatively you can allow the use of all local files with the DSN parameter 'allowAllFiles=true'.
filePath := "/home/gopher/data.csv"
mysql.RegisterLocalFile(filePath)
_, err := db.Exec("LOAD DATA LOCAL INFILE '" + filePath + "' INTO TABLE foo")
if err != nil {
    ...
}

Each DB engine has its own optimized way of importing CSVs. You should use those instead of writing your own methods for reading CSVs and mass-inserting records.
Refs:
MySQL: https://dev.mysql.com/doc/refman/5.7/en/load-data.html
PgSQL: https://www.postgresql.org/docs/current/static/sql-copy.html
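For PostgreSQL specifically, the optimized path is COPY. Below is a minimal sketch (an illustration, not the driver docs) of streaming the question's CSV through lib/pq's CopyIn helper, which issues COPY ... FROM STDIN under the hood; the connection string, table, and column names are made-up assumptions:
package main

import (
    "database/sql"
    "encoding/csv"
    "log"
    "os"

    "github.com/lib/pq" // registers the "postgres" driver and provides CopyIn
)

func main() {
    // Hypothetical DSN; adjust for your server.
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/mydb?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    f, err := os.Open("Account_balances.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    records, err := csv.NewReader(f).ReadAll()
    if err != nil {
        log.Fatal(err)
    }

    tx, err := db.Begin()
    if err != nil {
        log.Fatal(err)
    }
    // CopyIn builds a COPY ... FROM STDIN statement, so all rows travel
    // over one stream instead of one INSERT per row.
    stmt, err := tx.Prepare(pq.CopyIn("account_balances", "account", "balance"))
    if err != nil {
        log.Fatal(err)
    }
    for _, rec := range records {
        if _, err := stmt.Exec(rec[0], rec[1]); err != nil {
            log.Fatal(err)
        }
    }
    // A final Exec with no arguments flushes the buffered COPY data.
    if _, err := stmt.Exec(); err != nil {
        log.Fatal(err)
    }
    if err := stmt.Close(); err != nil {
        log.Fatal(err)
    }
    if err := tx.Commit(); err != nil {
        log.Fatal(err)
    }
}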

Related

List FTP files with goftp

I am trying to write a simple Go program which connects to an FTP server, lists the files in a specified directory, and pulls them.
The code is this:
package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path"
    "time"

    "github.com/secsy/goftp"
)

func main() {
    config := goftp.Config{
        User:               "anonymous",
        Password:           "root#local.me",
        ConnectionsPerHost: 21,
        Timeout:            10 * time.Second,
        Logger:             os.Stderr,
    }
    // Connecting to the server
    client, dialErr := goftp.DialConfig(config, "ftp.example.com")
    if dialErr != nil {
        log.Fatal(dialErr)
    }
    // setting the search directory
    dir := "/downloads/"
    files, err := client.ReadDir(dir)
    if err != nil {
        for _, file := range files {
            if file.IsDir() {
                path.Join(dir, file.Name())
            } else {
                fmt.Printf("the file is %s\n", file.Name())
            }
        }
    }
    // this section works: I am setting a file name and I can pull it
    // if I comment out the search part
    retFile := "example.PDF"
    fmt.Println("Retrieving file: ", retFile)
    buf := new(bytes.Buffer)
    fullPathFile := dir + retFile
    if rferr := client.Retrieve(fullPathFile, buf); rferr != nil {
        panic(rferr)
    }
    fmt.Println("writing data to file", retFile)
    fmt.Println("Opening file", retFile, "for writing")
    w, _ := ioutil.ReadAll(buf)
    if ferr := ioutil.WriteFile(retFile, w, 0644); ferr != nil {
        log.Fatal(ferr)
    }
    fmt.Println("Writing", retFile, " completed")
}
For some reason I am getting an error on the ReadDir function.
I need to grab the file names so I can download them.
You're attempting to loop through files when ReadDir() returns an error. That will never work: any time an error is returned, files is nil.
This is pretty standard behavior and can be confirmed by reading the implementation of ReadDir().
I'm guessing you may have used the project's example that demonstrates ReadDir() as a starting point. In that example the error handling is more involved because it decides whether or not to continue walking the directory tree. However, note that when ReadDir() returns an error that doesn't stop the program, the subsequent for loop is a no-op, since files is nil.
Here's a small program that demonstrates successfully using the results of ReadDir() in a straightforward manner:
package main

import (
    "fmt"

    "github.com/secsy/goftp"
)

const (
    ftpServerURL  = "ftp.us.debian.org"
    ftpServerPath = "/debian/"
)

func main() {
    client, err := goftp.Dial(ftpServerURL)
    if err != nil {
        panic(err)
    }
    files, err := client.ReadDir(ftpServerPath)
    if err != nil {
        panic(err)
    }
    for _, file := range files {
        fmt.Println(file.Name())
    }
}
It outputs the following, which matches the current listing at http://ftp.us.debian.org/debian/:
$ go run goftp-test.go
README
README.CD-manufacture
README.html
README.mirrors.html
README.mirrors.txt
dists
doc
extrafiles
indices
ls-lR.gz
pool
project
tools
zzz-dists
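To finish the original task (downloading the files that the listing returns), here is a hedged sketch that combines ReadDir with Retrieve and streams each file to disk; the host and directory are the question's placeholders, and error handling is kept minimal:
package main

import (
    "log"
    "os"
    "path"

    "github.com/secsy/goftp"
)

func main() {
    client, err := goftp.Dial("ftp.example.com") // placeholder host from the question
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    dir := "/downloads/"
    files, err := client.ReadDir(dir)
    if err != nil {
        log.Fatal(err) // files is nil here, so looping over it would do nothing
    }
    for _, file := range files {
        if file.IsDir() {
            continue // this sketch skips subdirectories
        }
        local, err := os.Create(file.Name())
        if err != nil {
            log.Fatal(err)
        }
        // Retrieve writes the remote file into any io.Writer, so we can
        // stream straight to disk instead of buffering in memory.
        if err := client.Retrieve(path.Join(dir, file.Name()), local); err != nil {
            log.Fatal(err)
        }
        local.Close()
    }
}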

Date format in the CSV before it is imported into BigQuery

I have developed Go code that takes a CSV file from Google Cloud Storage and imports it into a BigQuery table.
Everything is OK except that my CSV contains a DATE column in the format "2017-06-14 00:49:52 PDT". This causes an issue: the CSV file cannot be imported into BigQuery, since the format must be "2017-06-14". I cannot manually edit the CSV before uploading it to Cloud Storage because the files are very large (and change every day).
Is there any option to update the CSV hosted in Storage using Go, leaving only the "2017-06-14" value in this DATE column, before executing the rest of the code that imports it into BigQuery, or any other solution?
Thank you in advance!
package storagetobigquery

import (
    "cloud.google.com/go/bigquery"
    "github.com/gin-gonic/gin"
    "google.golang.org/appengine"
)

// StoragetoBigquery function
func StoragetoBigquery(c *gin.Context) {
    ctx := appengine.NewContext(c.Request)
    client, err := bigquery.NewClient(ctx, "MY PROJECT ID")
    if err != nil {
        panic(err)
    }
    gcsRef := bigquery.NewGCSReference("PATH TO THE GOOGLE STORAGE CSV FILE")
    gcsRef.SourceFormat = bigquery.CSV
    gcsRef.AutoDetect = true
    gcsRef.SkipLeadingRows = 1
    loader := client.Dataset("DATASET NAME").Table("TABLE NAME").LoaderFrom(gcsRef)
    loader.WriteDisposition = bigquery.WriteTruncate
    job, err := loader.Run(ctx)
    if err != nil {
        panic(err)
    }
    status, err := job.Wait(ctx)
    if err != nil {
        panic(err)
    }
    if status.Err() != nil {
        panic(status.Err())
    }
}
If you have to use the Go client library, you need to fetch the object from GCS, change the format, and then re-upload it before importing into BigQuery as you're doing. There is no method documented in https://godoc.org/cloud.google.com/go/storage to update an object's content in GCS directly, only its metadata.
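As a rough sketch of that fetch-transform-reupload flow (bucket, object names, and the DATE column index are assumptions; this uses only the documented Reader/Writer methods of the storage package), the timestamp can be truncated to a date while streaming the object through encoding/csv:
package main

import (
    "context"
    "encoding/csv"
    "io"
    "strings"

    "cloud.google.com/go/storage"
)

// rewriteDateColumn streams a CSV object out of GCS, truncates the DATE
// column ("2017-06-14 00:49:52 PDT" -> "2017-06-14"), and writes the
// result to a new object that BigQuery can then load.
func rewriteDateColumn(ctx context.Context, bucket, src, dst string, dateCol int) error {
    client, err := storage.NewClient(ctx)
    if err != nil {
        return err
    }
    defer client.Close()

    r, err := client.Bucket(bucket).Object(src).NewReader(ctx)
    if err != nil {
        return err
    }
    defer r.Close()

    w := client.Bucket(bucket).Object(dst).NewWriter(ctx)
    cr := csv.NewReader(r)
    cw := csv.NewWriter(w)
    for {
        rec, err := cr.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            return err
        }
        if dateCol < len(rec) {
            // keep everything before the first space
            rec[dateCol] = strings.SplitN(rec[dateCol], " ", 2)[0]
        }
        if err := cw.Write(rec); err != nil {
            return err
        }
    }
    cw.Flush()
    if err := cw.Error(); err != nil {
        return err
    }
    // The rewritten object only becomes visible after a successful Close.
    return w.Close()
}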

Replacing a line within a file with Golang

I'm new to Golang, starting out with some examples. Currently, what I'm trying to do is read a file line by line and replace a line with another string in case it meets a certain condition.
The file I use for testing purposes contains four lines:
one
two
three
four
The code working on that file looks like this:
func main() {
    file, err := os.OpenFile("test.txt", os.O_RDWR, 0666)
    if err != nil {
        panic(err)
    }
    reader := bufio.NewReader(file)
    for {
        fmt.Print("Try to read ...\n")
        pos, _ := file.Seek(0, 1)
        log.Printf("Position in file is: %d", pos)
        bytes, _, _ := reader.ReadLine()
        if len(bytes) == 0 {
            break
        }
        lineString := string(bytes)
        if lineString == "two" {
            file.Seek(int64(-(len(lineString))), 1)
            file.WriteString("This is a test.")
        }
        fmt.Printf(lineString + "\n")
    }
    file.Close()
}
As you can see in the code snippet, I want to replace the string "two" with "This is a test." as soon as that string is read from the file.
In order to get the current position within the file I use Go's Seek method.
However, what happens is that the last line always gets replaced by "This is a test.", leaving the file looking like this:
one
two
three
This is a test
Examining the output of the print statement that writes the current file position to the terminal, I get this output after the first line has been read:
2016/12/28 21:10:31 Try to read ...
2016/12/28 21:10:31 Position in file is: 19
So after the first read, the position cursor already points to the end of my file, which explains why the new string gets appended at the end. Does anyone understand what is happening here, or rather what is causing this behavior?
The Reader is not controlled by file.Seek. You declared the reader as reader := bufio.NewReader(file) and then read one line at a time with bytes, _, _ := reader.ReadLine(), but a bufio.Reader fills its buffer with a large chunk on the first read (which is why the position jumps straight to 19, the end of your file), so file.Seek does not change the position the reader reads from.
I suggest reading about io.ReadSeeker in the docs and switching over to that. There is also an example using io.SectionReader.
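For reference, here is a tiny self-contained example of io.SectionReader (using an in-memory reader so it runs as-is); it shows how a fixed byte range of a ReaderAt becomes its own independent ReadSeeker:
package main

import (
    "fmt"
    "io"
    "strings"
)

func main() {
    src := strings.NewReader("one\ntwo\nthree\nfour\n")
    // Bytes 4..6 of src ("two") exposed as an independent reader whose
    // Seek offsets are relative to the section, not the whole input.
    sec := io.NewSectionReader(src, 4, 3)
    b, err := io.ReadAll(sec)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%s\n", b) // two
}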
Aside from the incorrect seek usage, the difficulty is that the line you're replacing isn't the same length as its replacement. The standard approach is to create a new (temporary) file with the modifications and, assuming that succeeds, replace the original file with the new one.
package main

import (
    "bufio"
    "io"
    "io/ioutil"
    "log"
    "os"
)

func main() {
    // file we're modifying
    name := "text.txt"

    // open original file
    f, err := os.Open(name)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // create temp file
    tmp, err := ioutil.TempFile("", "replace-*")
    if err != nil {
        log.Fatal(err)
    }
    defer tmp.Close()

    // replace while copying from f to tmp
    if err := replace(f, tmp); err != nil {
        log.Fatal(err)
    }

    // make sure the tmp file was successfully written to
    if err := tmp.Close(); err != nil {
        log.Fatal(err)
    }

    // close the file we're reading from
    if err := f.Close(); err != nil {
        log.Fatal(err)
    }

    // overwrite the original file with the temp file
    if err := os.Rename(tmp.Name(), name); err != nil {
        log.Fatal(err)
    }
}

func replace(r io.Reader, w io.Writer) error {
    // use a scanner to read line by line
    sc := bufio.NewScanner(r)
    for sc.Scan() {
        line := sc.Text()
        if line == "two" {
            line = "This is a test."
        }
        if _, err := io.WriteString(w, line+"\n"); err != nil {
            return err
        }
    }
    return sc.Err()
}
For more complex replacements, I've implemented a package that can replace regular-expression matches: https://github.com/icholy/replace
import (
    "io"
    "regexp"

    "github.com/icholy/replace"
    "golang.org/x/text/transform"
)

func replace2(r io.Reader, w io.Writer) error {
    // compile a multi-line regular expression
    re := regexp.MustCompile(`(?m)^two$`)
    // create the replace transformer
    tr := replace.RegexpString(re, "This is a test.")
    // copy while transforming
    _, err := io.Copy(w, transform.NewReader(r, tr))
    return err
}
The os package has an Expand function which I believe can be used to solve a similar problem.
Explanation:
file.txt
one
two
${num}
four
main.go
package main

import (
    "fmt"
    "os"
)

var FILENAME = "file.txt"

func main() {
    file, err := os.ReadFile(FILENAME)
    if err != nil {
        panic(err)
    }
    mapper := func(placeholderName string) string {
        switch placeholderName {
        case "num":
            return "three"
        }
        return ""
    }
    fmt.Println(os.Expand(string(file), mapper))
}
output
one
two
three
four
Additionally, you may create a config file (YAML or JSON) and populate its data into a map used as a lookup table, storing the placeholders and their replacement strings, then modify the mapper to look placeholders from the input file up in this table.
e.g. the map and mapper would look like this:
table := map[string]string{
    "num": "three",
}
mapper := func(placeholderName string) string {
    if val, ok := table[placeholderName]; ok {
        return val
    }
    return ""
}
References:
os.Expand documentation: https://pkg.go.dev/os#Expand

Reading from a text file in Golang?

I want to do this:
1. Read a line from a text file.
2. Process the line.
3. Delete the line.
My first thought was to read the entire file into memory with ioutil.ReadFile(),
but I'm not sure how to update the text file after the line has been processed,
and what happens if extra lines are added to the text file after it has been read into memory?
I normally write shell scripts and would do something like this:
while read -r line; do
echo "${line}"
sed -i 1d "${myList}"
done < "${myList}"
What is the best way to do this in Golang?
Use the bufio package.
Here's the basic syntax for opening a text file and looping through each line.
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
)

func main() {
    // Open the file.
    f, err := os.Open("C:\\programs\\file.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    // Create a new Scanner for the file.
    scanner := bufio.NewScanner(f)
    // Loop over all lines in the file and print them.
    for scanner.Scan() {
        line := scanner.Text()
        fmt.Println(line)
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}
You have some options:
1- Read the file, process it, then write it back (you need to lock that file).
2- Use a binary file and devise (make use of) a special data structure, like a linked list, to optimize text processing (with line locking).
3- Use a ready-made database.
4- Use a virtual filesystem inside your file and treat each line like one file; see https://github.com/lotrfan/vfs and https://github.com/blang/vfs
Using a file manager (like a database server) solves the file-locking dilemma.
And if the purpose of using a file is one-way communication, where the sender program just adds new lines and the receiver program just removes them, it is better to use OS pipes (a named pipe, FIFO) or other interop methods; a minimal receiver is sketched below the links.
See, for Linux: Unix FIFO in go?
For Windows: https://github.com/natefinch/npipe
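Here is a minimal, Linux-only sketch of that FIFO receiver (the path and permissions are illustrative): the reader blocks until a sender opens the pipe, and each line is consumed exactly once, without any file locking.
package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "syscall"
)

func main() {
    const fifo = "/tmp/queue" // hypothetical pipe path
    // Mkfifo creates a named pipe; ignore the error if it already exists.
    if err := syscall.Mkfifo(fifo, 0666); err != nil && !os.IsExist(err) {
        log.Fatal(err)
    }
    f, err := os.Open(fifo) // blocks until a writer connects
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        fmt.Println("received:", sc.Text())
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
}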
sample file writer:
package main

import (
    "bufio"
    "fmt"
    "os"
    "time"
)

func main() {
    f, err := os.OpenFile("/tmp/file.txt", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
    if err != nil {
        panic(err)
    }
    defer f.Close()
    for i := 0; ; i++ {
        w := bufio.NewWriter(f)
        _, err := fmt.Fprintln(w, i)
        if err != nil {
            panic(err)
        }
        w.Flush() // Flush writes any buffered data to the underlying io.Writer.
        f.Sync()  // commit the current contents of the file to stable storage
        fmt.Println("write", i)
        time.Sleep(500 * time.Millisecond)
    }
}
sample file reader:
package main

import (
    "fmt"
    "os"
    "time"
)

func main() {
    f, err := os.OpenFile("/tmp/file.txt", os.O_RDWR, 0666)
    if err != nil {
        panic(err)
    }
    defer f.Close()
    i := 0
    for {
        n, err := fmt.Fscanln(f, &i)
        if n == 1 {
            fmt.Println(i)
        }
        if err != nil {
            fmt.Println(err)
            return
        }
        time.Sleep(500 * time.Millisecond)
    }
}

How to edit a reader in Go

I'm trying to work out the best practice for changing some data in a stream without ioutil.ReadAll.
I need to remove lines beginning with a certain character and strip all instances of another.
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"

    "gopkg.in/pg.v3"
)

func main() {
    fieldSep := "\x01"
    badChar := "\x02"
    comment := "#"
    dbName := "foo"
    db := pg.Connect(&pg.Options{})
    file, err := os.Open("/path/to/file")
    if err != nil {
        fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
    }
    defer file.Close()
    // I need to iterate my file Reader here and remove
    // all lines that begin with comment
    // (the two Trim calls below are placeholders for the intent;
    // they don't compile as written)
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        file := bytes.TrimRight(file, comment)
    }
    // all instances of badChar should be dropped
    file := bytes.Trim(file, badChar)
    _, err = db.CopyFrom(file, fmt.Sprintf("COPY %s FROM STDIN WITH DELIMITER e'%s'", dbName, fieldSep))
    if err != nil {
        fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
    }
    err = db.Close()
    if err != nil {
        fmt.Fprintf(os.Stderr, "ERROR: %s\n", err)
    }
    fmt.Println("Import Done")
}
Context:
I'm importing a large amount (>10GB) of data into a database; it's spread across several files.
My database interface accepts a reader to load the data.
The data has non-standard line endings, and I need to strip comments (because PG's COPY FROM is no fun).
I know the code I've got to edit the stream is woeful, I just can't find a good reference - thanks!
If I were in your position, I'd make my own Reader and insert it between the source and the destination. That's what consistent interfaces are for. Your reader would work easily on the small chunks of data as they flow past.
Source (io.Reader)   ==>   Your filter (io.Reader)       ==>   Destination (expects an io.Reader)
(provides the data)        (does the transformations)          (rock'n'rolls)
A library example of such a reader, made to be inserted between a reader and its client, is bufio.Reader: it speeds up many types of readers by making larger, buffered calls to the source while letting the client consume the data in small bits if it likes. You can check out its source: http://golang.org/src/bufio/bufio.go
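Here is a hedged sketch of such a filter for this question's rules (drop lines that begin with the comment string, strip every occurrence of the bad character); filterReader is a made-up name and the line-buffered strategy is just one reasonable implementation:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "strings"
)

// filterReader is an io.Reader that drops lines starting with comment and
// removes every occurrence of bad from the lines it passes through.
type filterReader struct {
    sc      *bufio.Scanner
    buf     bytes.Buffer
    comment string
    bad     string
}

func newFilterReader(r io.Reader, comment, bad string) *filterReader {
    return &filterReader{sc: bufio.NewScanner(r), comment: comment, bad: bad}
}

func (f *filterReader) Read(p []byte) (int, error) {
    // Refill the internal buffer one line at a time until data is available.
    for f.buf.Len() == 0 {
        if !f.sc.Scan() {
            if err := f.sc.Err(); err != nil {
                return 0, err
            }
            return 0, io.EOF
        }
        line := f.sc.Text()
        if strings.HasPrefix(line, f.comment) {
            continue // drop comment lines entirely
        }
        f.buf.WriteString(strings.ReplaceAll(line, f.bad, ""))
        f.buf.WriteByte('\n')
    }
    return f.buf.Read(p)
}

func main() {
    src := strings.NewReader("# a comment line\nfoo\x02bar\nbaz\n")
    out, err := io.ReadAll(newFilterReader(src, "#", "\x02"))
    if err != nil {
        panic(err)
    }
    fmt.Printf("%s", out) // prints "foobar" and "baz" on separate lines
}
A reader built this way can be handed straight to something like db.CopyFrom, so the whole 10GB never needs to sit in memory at once.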
