How to dump huge CSV data (4GB) into MySQL - Go

If anyone has tried this before in Go, please share the approach, with code; it would be really appreciated.
I wrote a few lines, but they are slow:
// usersFileLoader reads the CSV file and sends each record on the channel.
func usersFileLoader(filename string, channel chan User) {
    defer close(channel)
    file, err := os.Open(filename)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    var user User
    reader := csv.NewReader(file)
    for {
        // Unmarshal is a custom helper (not shown) that decodes one CSV record.
        err := Unmarshal(reader, &user)
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }
        channel <- user
    }
}
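(Unmarshal here is not from the standard library and the question doesn't show it. A minimal version of such a helper, assuming the columns are ordered id, name, address, might look like this:)
// Hypothetical reconstruction of the Unmarshal helper used above:
// it reads one CSV record into the User struct.
func Unmarshal(reader *csv.Reader, user *User) error {
    record, err := reader.Read()
    if err != nil {
        return err // io.EOF is returned at end of file
    }
    id, err := strconv.Atoi(record[0])
    if err != nil {
        return err
    }
    user.ID = id
    user.Name = record[1]
    user.Address = record[2]
    return nil
}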
// saveUser inserts each user received on the channel into the database.
func saveUser(channel <-chan User, db *sql.DB) {
    stmt, err := db.Prepare(`
        INSERT INTO Users (id, name, address) VALUES (?, ?, ?)`)
    if err != nil {
        log.Fatal(err)
    }
    defer stmt.Close()
    for usr := range channel {
        _, err := stmt.Exec(
            usr.ID,
            usr.Name,
            usr.Address,
        )
        if err != nil {
            log.Fatal(err)
        }
    }
}
// User is the struct each CSV record is decoded into.
type User struct {
    ID      int    `csv:"id"`
    Name    string `csv:"name"`
    Address string `csv:"address"`
}
// main wires the loader and the writer together.
func main() {
    db := DBconnect(ConnectionString(dbConfig()))
    defer db.Close()
    channel := make(chan User)
    go usersFileLoader("../user.csv", channel)
    saveUser(channel, db)
}
This code works, but it is too slow for me.
Please share your thoughts and ideas.

I wouldn't attempt to use Go's built-in standard library functions for loading a very large CSV file into MySQL (unless, of course, you are simply trying to learn how they work).
For best performance, I would simply use MySQL's built-in LOAD DATA INFILE functionality.
For example:
result, err := db.Exec("LOAD DATA INFILE ? INTO TABLE Users", filename)
if err != nil {
    log.Fatal(err)
}
// RowsAffected returns (int64, error), so it cannot be passed
// directly to Printf.
n, err := result.RowsAffected()
if err != nil {
    log.Fatal(err)
}
log.Printf("%d rows inserted\n", n)
If you haven't used LOAD DATA INFILE before, note carefully the documentation regarding LOCAL. Depending on your server configuration and permissions, you might need to use LOAD DATA LOCAL INFILE instead. (If you intend to use Docker containers, for instance, you will absolutely need to use LOCAL.)
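For instance, with the github.com/go-sql-driver/mysql driver the client must explicitly allow the file before LOCAL will work, and the server must have local_infile enabled. A rough sketch under those assumptions, reusing the Users table from the question:
import "github.com/go-sql-driver/mysql"

// Whitelist this exact file so the driver will send it to the server.
mysql.RegisterLocalFile("/path/to/user.csv")

result, err := db.Exec(`LOAD DATA LOCAL INFILE '/path/to/user.csv'
    INTO TABLE Users
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (id, name, address)`)
if err != nil {
    log.Fatal(err)
}
n, _ := result.RowsAffected()
log.Printf("%d rows inserted\n", n)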

Related

Read BQ query result without struct

Has anybody tried storing the result of a query in a map?
I want to be able to read data from BQ tables without needing to define a struct that matches the BQ table schema.
I have tried following https://kylewbanks.com/blog/query-result-to-map-in-golang, but I want to use a RowIterator instead of the approach in that link.
Here's the code I am struggling with:
// Error handling removed for brevity.
ctx := context.Background()
client, _ := bigquery.NewClient(ctx, ProjectID)
query := fmt.Sprintf("SELECT * FROM `%s.%s.%s` LIMIT 5;", ProjectID, DatasetId, ResourceName)
queryResult := client.Query(query)
it, _ := queryResult.Read(ctx)
for {
    row := make(map[string]bigquery.Value)
    err := it.Next(&row)
    if err == iterator.Done {
        break
    }
    if err != nil {
        fmt.Printf("Error happened")
    }
}
I am not sure how to proceed after this; I would ideally like to convert the data into JSON format.
for {
    var values []bigquery.Value
    err := it.Next(&values)
    if err == iterator.Done {
        break
    }
    if err != nil {
        // TODO: Handle error.
    }
    fmt.Println(values)
}
You can place the rows into a slice, since a row can be stored in anything that implements the ValueLoader interface, or in a slice or map of bigquery.Value.
ref: the godoc for cloud.google.com/go/bigquery
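To get JSON, the map rows from the question can be collected and passed to the standard encoding/json package directly, since bigquery.Value is an empty-interface type. A minimal sketch:
var rows []map[string]bigquery.Value
for {
    row := make(map[string]bigquery.Value)
    err := it.Next(&row)
    if err == iterator.Done {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    rows = append(rows, row)
}
// json.Marshal handles the values BigQuery produces for scalar columns
// (strings, numbers, bools, timestamps).
out, err := json.Marshal(rows)
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(out))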

jmoiron/sqlx, ...interface{}, and abstracting some boilerplate

I thought I'd try to be a little bit "clever" and abstract some of my boilerplate SQL code (using sqlx -- https://github.com/jmoiron/sqlx). The idea is to pass the function a callback that processes the result, along with the SQL string and the args that produce the rows. As it happens, the code works fine provided I strip out the "sqlArgs ...interface{}" stuff, but in the "cleverer" form it errors with:
sql: converting Exec argument $1 type: unsupported type []interface {}, a slice of interface
Here are two versions, the first of which errors, and the second of which works but without parameterization:
// GetRows (doesn't work)
func GetRows(parseRows func(*sqlx.Rows), sql string, sqlArgs ...interface{}) {
    db := sqlx.MustConnect("mysql", ConnString)
    defer db.Close()
    // NOTE: this passes sqlArgs as a single []interface{} value,
    // not as individual arguments.
    rows, err := db.Queryx(sql, sqlArgs)
    if err != nil {
        panic(err)
    }
    defer rows.Close()
    parseRows(rows)
}
// GetRows ... (works, but doesn't allow parameterization)
func GetRows(fp func(*sqlx.Rows), sql string) {
    db := sqlx.MustConnect("mysql", ConnString)
    defer db.Close()
    rows, err := db.Queryx(sql)
    if err != nil {
        panic(err)
    }
    defer rows.Close()
    fp(rows)
}
The idea is to call the code something like this:
func getUser(userID string) User {
    var users []User
    parseRows := func(rows *sqlx.Rows) {
        for rows.Next() {
            var u User
            err := rows.StructScan(&u)
            if err != nil {
                panic(err)
            }
            users = append(users, u)
        }
    }
    sql := "SELECT * FROM users WHERE userid = ?;"
    sqlutils.GetRows(parseRows, sql, userID)
    if len(users) == 1 {
        return users[0]
    }
    return User{}
}
I guess my code doesn't actually pass through the userID from call to call, but instead passes an []interface{}, which the sql package can't handle. I'm not sure about that, however. In any case, is there any way to accomplish this idea? Thanks.
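That diagnosis is right: a variadic parameter arrives inside the function as a slice, and passing that slice on as one value wraps it in a single interface{} argument. Re-spreading it with ... forwards the original arguments individually; a minimal sketch of the fix:
// Spread the slice so each element becomes its own query argument.
rows, err := db.Queryx(sql, sqlArgs...)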

One file two different outputs - Windows Server 2012

My program reads in an SQL file and performs operations on a database.
I edited one of the SQL files on the server via Notepad yesterday.
I made one more change to the same file today, again via Notepad.
When the program reads in the file, the changes I made to the SQL are not there.
Printing the SQL contents to the console reveals that the binary is reading the version from yesterday.
What black magic is at play here?
Deleting the file does not work.
If I create it again, the Date created timestamp is from a month ago; Date modified is from yesterday.
Opening the file in Notepad, WordPad, or any other text reader you can think of shows the correct contents.
The binary reads the version from yesterday.
This is how the binary reads the file:
file, err := ioutil.ReadFile("appointment.sql")
if err != nil {
    log.Fatal(err)
}
The program was cross-compiled for Windows on a Mac.
The SQL files were originally written on a Mac via vim and then uploaded to the server.
EDIT: I have included the code from the method after the suggested debugging.
func (r *Icar) ReadAppointments(qCfg dms.QueryConfig) []dms.Appointment {
    // r.conn contains the db connection
    /* DEBUGGING */
    name := "appointment.sql"
    fmt.Printf("%q\n", name)
    path, err := filepath.Abs(name)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%q\n", path) // correct path
    file, err := ioutil.ReadFile("appointment.sql")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%q\n", file) // correct output
    /* END */
    appointmentQuery := string(file)
    fmt.Println(appointmentQuery) // correct output
    appointmentQuery = strings.Replace(appointmentQuery, "#", qCfg.QueryLocationID, -1)
    fmt.Println(appointmentQuery) // correct output
    rows, err := r.conn.Query(appointmentQuery)
    if err != nil {
        fmt.Println(appointmentQuery) // wrong output: contains edits from a previous version
        log.Fatalf("Error reading from the database: %s", err.Error())
    }
    appointments := []dms.Appointment{}
    var (
        ExternalID,
        WONumber,
        CustomerWaiting interface{}
    )
    for rows.Next() {
        appointment := dms.Appointment{}
        err = rows.Scan(&ExternalID, &WONumber, &appointment.AppointmentDate, &CustomerWaiting)
        if err != nil {
            fmt.Println(appointmentQuery)
            log.Fatal(err)
        }
        toStr := []interface{}{ExternalID, WONumber}
        toInt := []interface{}{CustomerWaiting}
        convertedString := d.ConvertToStr(toStr)
        convertedInt := d.ConvertToInt(toInt)
        appointment.ExternalID = convertedString[0]
        appointment.WONumber = convertedString[1]
        appointment.CustomerWaiting = convertedInt[0]
        appointments = append(appointments, appointment)
    }
    err = rows.Close()
    return appointments
}
I close the db connection in a deferred statement in my main func.
Here is the constructor for reference:
func New(config QueryConfig) (*Icar, func()) {
    db, err := sql.Open("odbc", config.Connection)
    if err != nil {
        log.Fatal("The database doesn't open correctly:\n", err.Error())
    }
    icar := &Icar{
        conn: db,
    }
    return icar, func() {
        icar.conn.Close()
    }
}
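(As an aside: sql.Open only validates its arguments and does not actually connect, so the error branch above won't catch a bad DSN or an unreachable server. A Ping forces a real connection; a minimal sketch:)
db, err := sql.Open("odbc", config.Connection)
if err != nil {
    log.Fatal("Invalid connection configuration:\n", err.Error())
}
// sql.Open is lazy; Ping actually dials the database.
if err := db.Ping(); err != nil {
    log.Fatal("Cannot reach the database:\n", err.Error())
}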
Basic debugging says: check your inputs and outputs. You may be looking at different files; "appointment.sql" is not necessarily unique in the file system. For example, does this give you the expected results?
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "path/filepath"
)

func main() {
    name := "appointment.sql"
    fmt.Printf("%q\n", name)
    path, err := filepath.Abs(name)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%q\n", path)
    file, err := ioutil.ReadFile(name)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%q\n", file)
}
Output:
"appointment.sql"
"C:\\Users\\peter\\gopath\\src\\so\\appointment.sql"
"SELECT * FROM appointments;\n"

Golang gRPC server-stream

I have an issue with a gRPC server stream in Go.
I had no problem with a single row; I just used a simple gRPC response for it.
But now I need to send a number of rows from my database, and I can't finish my server-stream application.
I am just learning Go and gRPC, and this task is a little difficult for me to solve right now. I would be very grateful if someone could help, because there are not many examples of this on the web.
Or maybe you know where I can find an example of how to stream data from a database using gRPC + Go. Thank you.
I have this code:
....
type record struct {
    id       int
    lastname string
}

type server struct {
}

func (s *server) QueryData(city *pb.City, stream pb.GetDataStream_QueryDataServer) error {
    db, err := sql.Open("postgres", "postgres://adresspass")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    rows, err := db.Query(`SELECT id, last_name FROM person WHERE city=$1`, city.City)
    if err != nil {
        log.Fatal(err)
    }
    var rs = make([]record, 0)
    var rec record
    for rows.Next() {
        err = rows.Scan(&rec.id, &rec.lastname)
        if err != nil {
            return err
        }
        rs = append(rs, rec)
    }
    for _, onedata := range rs {
        if err := stream.Send(onedata); err != nil {
            return err
        }
    }
    return nil
}
...
You should scan your values from the database into protocol buffer types, not native Go types.
You should have a message format defined in your proto file:
message DbVal {
    uint32 id = 1;
    string last_name = 2;
}
And pass that struct to the stream.
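A minimal sketch of what that looks like in the handler, assuming the generated Go type pb.DbVal with fields Id and LastName:
for rows.Next() {
    var rec pb.DbVal
    // Scan directly into the generated protobuf struct's fields.
    if err := rows.Scan(&rec.Id, &rec.LastName); err != nil {
        return err
    }
    // Stream each row as soon as it is scanned; no need to buffer them all.
    if err := stream.Send(&rec); err != nil {
        return err
    }
}
return rows.Err()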

Goroutine behaves differently on Windows and Linux

I'm new to Go. I have the following legacy code:
var db *sql.DB

func init() {
    go feedChan()
    connString := os.Getenv("DB_CONN")
    var err error
    db, err = sql.Open("postgres", connString)
    if err != nil {
        log.Fatalf("Failed to connect to database at %q: %q\n", connString, err)
    }
    // confirm connection
    if err = db.Ping(); err != nil {
        log.Fatalf("Unable to ping database at %q: %q\n", connString, err)
    }
}

func feedChan() {
    selectQuery, err := db.Prepare(`
        SELECT id, proxy
        FROM proxy
        WHERE fail_count < 2
        ORDER BY date_added DESC, last_used ASC, fail_count ASC
        LIMIT 5
    `)
    ....
The code works on Linux, but it fails on Windows with a nil pointer error on:
selectQuery, err := db.Prepare(`
That makes sense to me, since db is initialized after the feedChan goroutine is launched. What doesn't make sense to me is why it works on Linux.
So the question is: why does this code run on Linux without errors?
That's probably a race condition. Import "time", put this line after go feedChan(), and see if it still works on Linux:
time.Sleep(3 * time.Second)
In order to avoid this situation, you could either initialize db before you spawn the goroutine (which uses db) or use some sort of barrier:
func init() {
    barrier := make(chan int)
    go feedChan(barrier)
    connString := os.Getenv("DB_CONN")
    var err error
    db, err = sql.Open("postgres", connString)
    if err != nil {
        log.Fatalf("Failed to connect to database at %q: %q\n", connString, err)
        // Retry.
    } else {
        barrier <- 1 // Opens barrier.
    }
    // ...
}

func feedChan(barrier chan int) {
    <-barrier // Blocks until db is ready.
    selectQuery, err := db.Prepare(`
        SELECT id, proxy
        FROM proxy
        WHERE fail_count < 2
        ORDER BY date_added DESC, last_used ASC, fail_count ASC
        LIMIT 5
    `)
    // ...
}
After reading the first lines of your functions, I can tell you that your legacy code has a huge bug, and it can be easily fixed just by moving the line go feedChan() to the end of the init() function.
Also note that the main reason is not a race condition as such, just a failure to wait for the correct initialization of the db variable.
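For reference, a minimal sketch of that reordering, using the same code as the question with the goroutine started last:
func init() {
    connString := os.Getenv("DB_CONN")
    var err error
    db, err = sql.Open("postgres", connString)
    if err != nil {
        log.Fatalf("Failed to connect to database at %q: %q\n", connString, err)
    }
    if err = db.Ping(); err != nil {
        log.Fatalf("Unable to ping database at %q: %q\n", connString, err)
    }
    // db is fully assigned before the goroutine can touch it.
    go feedChan()
}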
