How to Delete Multiple Items from a DynamoDB Table in Go

I have a DynamoDB table that contains items like this:
type AuthEntry struct {
	UserID    string    `dynamodbav:"userId"`
	Token     string    `dynamodbav:"token"`
	CreatedOn time.Time `dynamodbav:"createdOn"`
}
I need to delete all the AuthEntry items older than 5 minutes (CreatedOn < now - 5 mins) and without a token (Token is empty). It is clear to me how to remove one item at a time, but I'm wondering how to delete multiple items in one shot. Thank you very much.

I was looking for an example like the one below, and I hope it helps other newbies like me. First I use Scan to retrieve the expired entries, and then I run BatchWriteItem to actually delete them.
import (
	"context"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

var tableName = "USER_AUTH"

...

type AuthRepository struct {
	ctx context.Context
	svc *dynamodb.Client
}

...

func NewAuthRepository(ctx context.Context) (*AuthRepository, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return nil, err
	}
	return &AuthRepository{ctx, dynamodb.NewFromConfig(cfg)}, nil
}

...
func (r *AuthRepository) Collect(maxAge int) (int32, error) {
	// maxAge is expressed in milliseconds here (5 minutes = 300000).
	t := time.Now().Add(-time.Duration(maxAge) * time.Millisecond).UTC()
	params := &dynamodb.ScanInput{
		TableName:            aws.String(tableName),
		ProjectionExpression: aws.String("userId"),
		ExpressionAttributeValues: map[string]types.AttributeValue{
			// The key must match the ":t" placeholder in the filter expression.
			// attributevalue stores time.Time as an RFC 3339 string, so format
			// the bound value the same way for the comparison to work.
			":t": &types.AttributeValueMemberS{Value: t.Format(time.RFC3339Nano)},
		},
		FilterExpression: aws.String("createdOn < :t"),
	}
	result, err := r.svc.Scan(r.ctx, params)
	if err != nil {
		return 0, err
	}
	if result.Count == 0 {
		return 0, nil // BatchWriteItem rejects an empty request list
	}
	// Preallocate capacity only: make([]T, n) plus append would leave n zero-value requests.
	wr := make([]types.WriteRequest, 0, result.Count)
	for _, v := range result.Items {
		authEntry := &AuthEntry{}
		if err := attributevalue.UnmarshalMap(v, authEntry); err != nil {
			return 0, err
		}
		wr = append(wr, types.WriteRequest{
			DeleteRequest: &types.DeleteRequest{
				Key: map[string]types.AttributeValue{
					"userId": &types.AttributeValueMemberS{Value: authEntry.UserID},
				},
			}})
	}
	input := &dynamodb.BatchWriteItemInput{
		RequestItems: map[string][]types.WriteRequest{
			tableName: wr,
		},
	}
	// BatchWriteItem accepts at most 25 requests per call; see the note below.
	if _, err = r.svc.BatchWriteItem(r.ctx, input); err != nil {
		return 0, err
	}
	return result.Count, nil
}

When it comes to deletion, you have a few options.
DeleteItem - deletes a single item from a table by primary key.
BatchWriteItem - puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, comprising as many as 25 put or delete requests.
TimeToLive - you can use DynamoDB's Time To Live (TTL) feature to delete items you no longer need. Keep in mind that TTL only marks your items for deletion; the actual deletion can take up to 48 hours. A minimal sketch of enabling it follows this list.
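For the question's "older than 5 minutes" requirement, TTL is often the simplest route: store an epoch-seconds expiry timestamp on each item and enable TTL once for that attribute. A sketch with aws-sdk-go-v2; the expireAt attribute name is my own choice, not something from the question's schema:

// One-time table configuration: tell DynamoDB which numeric attribute
// holds the expiry time in epoch seconds (e.g. CreatedOn + 5 minutes).
_, err := svc.UpdateTimeToLive(ctx, &dynamodb.UpdateTimeToLiveInput{
	TableName: aws.String(tableName),
	TimeToLiveSpecification: &types.TimeToLiveSpecification{
		AttributeName: aws.String("expireAt"),
		Enabled:       aws.Bool(true),
	},
})

Because TTL deletion can lag by up to 48 hours, reads that must not see stale entries still need a filter on the expiry attribute.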
I'm not sure which attributes make up your table's primary key, so it's difficult to give an exact example; however, BatchWriteItem is the preferred method for deleting multiple items at a time.
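One caveat with the scan-then-batch code above: BatchWriteItem caps each call at 25 requests, so a scan that returns more expired entries than that needs chunking. A minimal sketch, reusing the client and types from the question (chunkDelete is my own illustrative helper, not an SDK function):

// chunkDelete splits the write requests into batches of at most 25 (the
// BatchWriteItem limit) and re-queues anything reported back as unprocessed.
func chunkDelete(ctx context.Context, svc *dynamodb.Client, table string, wr []types.WriteRequest) error {
	const maxBatch = 25
	for len(wr) > 0 {
		n := len(wr)
		if n > maxBatch {
			n = maxBatch
		}
		batch, rest := wr[:n], wr[n:]
		out, err := svc.BatchWriteItem(ctx, &dynamodb.BatchWriteItemInput{
			RequestItems: map[string][]types.WriteRequest{table: batch},
		})
		if err != nil {
			return err
		}
		wr = rest
		// A production version should back off before retrying these.
		if pending := out.UnprocessedItems[table]; len(pending) > 0 {
			wr = append(wr, pending...)
		}
	}
	return nil
}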

Related

How can I do conditional actions in SQLX

I have a database store function:
func (p *ProductsRep) FindAll(PageNumber int, PaginationSize int, Query string) []*postgresmodels.Product {
I also have an SQL query that looks like this:

SELECT * FROM table_name

I want to append a conditional clause such as WHERE some_value=3 when a value (in this case Query) exists, so that I end up with SELECT * FROM table_name WHERE some_value=3.
I tried concatenating with fmt.Sprintf, strings.Join, and bytes.Buffer.WriteString, but every time I get this error (real values replaced for readability):

pq: column "Some value" does not exist

How can I build "adaptive" queries that depend on the function's input values?
I believe you are trying to query rows in the database using parameters.
Make sure you don't splice this data into the SQL as raw values, because of the risk of SQL injection; pass it as query parameters (or use stored procedures) instead.
You can use the Query function to pass in your query together with your parameters. In the example below the placeholder is $1; you could add $2, $3, and so on, depending on how many parameters you want to query with.
Here are two examples.
Postgres, using the "github.com/jackc/pgx/v4" driver:
ctx := context.Background()

type Bar struct {
	ID        int64
	SomeValue string
}

// conn is assumed to be an established connection, e.g. from pgx.Connect(ctx, connString).
rows, err := conn.Query(ctx, `SELECT * FROM main WHERE some_value=$1`, "foo")
if err != nil {
	fmt.Println("ERROR")
	panic(err) // handle error
}
defer rows.Close()

var items []Bar
for rows.Next() {
	var someValue string
	var id int64
	if err := rows.Scan(&id, &someValue); err != nil {
		log.Fatal(err) // handle error
	}
	item := Bar{
		ID:        id,
		SomeValue: someValue,
	}
	items = append(items, item)
}
fmt.Println(items)
MySQL driver, via database/sql (https://golang.org/pkg/database/sql/#DB.QueryRow):

type Bar struct {
	ID        int64
	SomeValue string
}

// Note: with the MySQL driver the placeholder syntax is ?, not $1.
rows, err := conn.Query(`SELECT * FROM main WHERE some_value=?`, "foo")
if err != nil {
	fmt.Println("ERROR")
	panic(err) // handle error
}
defer rows.Close()

var items []Bar
for rows.Next() {
	var someValue string
	var id int64
	if err := rows.Scan(&id, &someValue); err != nil {
		log.Fatal(err) // handle error
	}
	item := Bar{
		ID:        id,
		SomeValue: someValue,
	}
	items = append(items, item)
}
fmt.Println(items)
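To answer the "adaptive query" part directly: build the SQL string and the argument slice together, appending the WHERE clause only when the filter value is present, and still passing the value as a parameter. A minimal sketch against the pgx connection from the first example; the table and column names are illustrative:

func findAll(ctx context.Context, conn *pgx.Conn, query string) (pgx.Rows, error) {
	sql := `SELECT * FROM table_name`
	args := []interface{}{}
	if query != "" {
		// The clause is appended conditionally, but the value itself always
		// travels as a bound parameter, never via string concatenation.
		sql += ` WHERE some_value = $1`
		args = append(args, query)
	}
	return conn.Query(ctx, sql, args...)
}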

Parsing prometheus metrics from file and updating counters

I have a Go application that is run periodically by a batch job. On each run it should read some Prometheus metrics from a file, run its logic, update a success/fail counter, and write the metrics back out to a file.
From looking at How to parse Prometheus data as well as the godocs for prometheus, I'm able to read in the file, but I don't know how to update app_processed_total with the value returned by expfmt.ExtractSamples().
This is what I've done so far. Could someone please tell me how I should proceed from here? How can I typecast the Vector I got into a CounterVec?
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
	"github.com/prometheus/common/expfmt"
	"github.com/prometheus/common/model"
)

var (
	fileOnDisk     = prometheus.NewRegistry()
	processedTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "app_processed_total",
		Help: "Number of times ran",
	}, []string{"status"})
)

func doInit() {
	prometheus.MustRegister(processedTotal)
}

func recordMetrics() {
	go func() {
		for {
			processedTotal.With(prometheus.Labels{"status": "ok"}).Inc()
			time.Sleep(5 * time.Second)
		}
	}()
}

func readExistingMetrics() {
	var parser expfmt.TextParser
	text := `
# HELP app_processed_total Number of times ran
# TYPE app_processed_total counter
app_processed_total{status="ok"} 300
`
	parseText := func() ([]*dto.MetricFamily, error) {
		parsed, err := parser.TextToMetricFamilies(strings.NewReader(text))
		if err != nil {
			return nil, err
		}
		var result []*dto.MetricFamily
		for _, mf := range parsed {
			result = append(result, mf)
		}
		return result, nil
	}
	gatherers := prometheus.Gatherers{
		fileOnDisk,
		prometheus.GathererFunc(parseText),
	}
	gathering, err := gatherers.Gather()
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println("gathering: ", gathering)
	for _, g := range gathering {
		vector, err := expfmt.ExtractSamples(&expfmt.DecodeOptions{
			Timestamp: model.Now(),
		}, g)
		fmt.Println("vector: ", vector)
		if err != nil {
			fmt.Println(err)
		}
		// How can I update processedTotal with this new value?
	}
}

func main() {
	doInit()
	readExistingMetrics()
	recordMetrics()
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe("localhost:2112", nil)
}
I believe you would need to use processedTotal.WithLabelValues("ok").Inc() or something similar to that.
A more complete example:
func ExampleCounterVec() {
	httpReqs := prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "How many HTTP requests processed, partitioned by status code and HTTP method.",
		},
		[]string{"code", "method"},
	)
	prometheus.MustRegister(httpReqs)

	httpReqs.WithLabelValues("404", "POST").Add(42)

	// If you have to access the same set of labels very frequently, it
	// might be good to retrieve the metric only once and keep a handle to
	// it. But beware of deletion of that metric, see below!
	m := httpReqs.WithLabelValues("200", "GET")
	for i := 0; i < 1000000; i++ {
		m.Inc()
	}

	// Delete a metric from the vector. If you have previously kept a handle
	// to that metric (as above), future updates via that handle will go
	// unseen (even if you re-create a metric with the same label set
	// later).
	httpReqs.DeleteLabelValues("200", "GET")
	// Same thing with the more verbose Labels syntax.
	httpReqs.Delete(prometheus.Labels{"method": "GET", "code": "200"})
}
This is taken from the Prometheus examples on GitHub.
To use the value of vector you can do the following:

vectorFloat, err := strconv.ParseFloat(vector[0].Value.String(), 64)
if err != nil {
	panic(err)
}
processedTotal.WithLabelValues("ok").Add(vectorFloat)

This assumes you will only ever get a single sample in your response. The sample's Value renders as a string, which strconv.ParseFloat (from the strconv package) turns back into a float; since model.SampleValue is a float64 underneath, float64(vector[0].Value) also works as a more direct conversion.
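Putting it together, a sketch of the missing step in readExistingMetrics: walk the extracted samples, pick out the app_processed_total series, and seed the in-memory counter with each sample's value (the label handling here is my own illustration):

for _, sample := range vector {
	// The vector can contain other series; only seed the counter we own.
	if sample.Metric[model.MetricNameLabel] != "app_processed_total" {
		continue
	}
	status := string(sample.Metric["status"]) // e.g. "ok"
	// model.SampleValue is a float64 underneath, so a direct conversion works.
	processedTotal.WithLabelValues(status).Add(float64(sample.Value))
}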

Scan a dynamodb table and using contains on a list with go sdk

I have a DynamoDB field that looks like this:

[ { "S" : "test#gmail.com" }, { "S" : "test2#gmail.com" } ]

I am trying to run a scan that returns any record containing, e.g., test#gmail.com. I am not sure whether contains is what I should use here; it's currently not returning any records. Any pointers as to what I should use?
My Go is set up like this:
type Site struct {
	ID     string   `json:"id"`
	Site   string   `json:"site"`
	Emails []string `json:"emails,omitempty"`
}

func (ds *datastore) GetEmail(email string, out interface{}) error {
	filt := expression.Name("emails").Contains(strings.ToLower(email))
	fmt.Println("Get Email", filt)
	//filt := expression.Contains(expression.Name("emails"), expression.Value(email))
	proj := expression.NamesList(
		expression.Name("emails"),
		expression.Name("site"),
	)
	expr, err := expression.NewBuilder().
		WithFilter(filt).
		WithProjection(proj).
		Build()
	if err != nil {
		fmt.Println(err)
		return err // a failed Build leaves expr unusable, so bail out here
	}
	scanInput := &dynamodb.ScanInput{
		ExpressionAttributeNames:  expr.Names(),
		ExpressionAttributeValues: expr.Values(),
		FilterExpression:          expr.Filter(),
		ProjectionExpression:      expr.Projection(),
		TableName:                 aws.String(ds.TableName),
	}
	result, err := ds.DDB.Scan(scanInput)
	if err != nil {
		fmt.Println("what is the err", err)
		return err
	}
	if len(result.Items) == 0 {
		fmt.Println("No Email found")
		return errors.New(http.StatusText(http.StatusNotFound))
	}
	err = ds.Marshaler.UnmarshalMap(result.Items[0], out)
	return err
}
If you're doing a contains on a partial email, it won't match when the filter is applied to a set or list; against a collection, contains only matches complete elements, so it has to be an exact email match.

{
	"Email": "test#gmail.com"
}
// A string attribute like this will match a contains on "test#g".

{
	"Emails": ["test#gmail.com", "another#gmail.com"]
}
// A list like this will not match a contains on "test#g", but will match a
// contains on the full "test#gmail.com".
See contains in the Condition reference: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
Also note that you're doing a scan. Scans perform poorly in DynamoDB as soon as your data grows to any significant size. Consider storing your data in a shape you can query via partition keys, or use AWS RDS as an alternative.
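With that in mind, the scan in the question should return results once the filter value is a complete address. A minimal sketch of the working filter, assuming the emails attribute stores full lowercase addresses:

// contains only matches whole elements of a list or set,
// so pass the complete address rather than a fragment.
filt := expression.Name("emails").Contains("test#gmail.com")
expr, err := expression.NewBuilder().WithFilter(filt).Build()
if err != nil {
	return err
}
// expr.Names(), expr.Values(), and expr.Filter() then feed the ScanInput as above.

For partial matching (e.g. every address at one domain), a different data model is usually needed, such as one item per email address so each element can be matched on its own.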

How to make goroutines work with anonymous functions returning value in a loop

I am working on a custom script to fetch data from a RackSpace Cloud Files container and build a list of all the files in a given container (the container has around 100 million files). I have been working on parallelizing the code and am currently stuck.
// function to read data from channel and display
// currently just displaying, but there will be a lot of processing done on this data
func extractObjectItemsFromList(objListChan <-chan []string) {
	fmt.Println("ExtractObjectItemsFromList")
	for _, c := range <-objListChan {
		fmt.Println(urlPrefix, c, "\t", count)
	}
}
func main() {
	// fetching data using flags
	ao := gophercloud.AuthOptions{
		Username: *userName,
		APIKey:   *apiKey,
	}
	provider, err := rackspace.AuthenticatedClient(ao)
	if err != nil {
		logFatal(err) // check the auth error before using provider
	}
	client, err := rackspace.NewObjectStorageV1(provider, gophercloud.EndpointOpts{
		Region: *region,
	})
	if err != nil {
		logFatal(err)
	}

	// We have the option of filtering objects by their attributes
	opts := &objects.ListOpts{
		Full:   true,
		Prefix: *prefix,
	}

	var objectListChan = make(chan []string)
	go extractObjectItemsFromList(objectListChan)

	// Retrieve a pager (i.e. a paginated collection)
	pager := objects.List(client, *containerName, opts)

	// Not working
	// By default EachPage contains 10000 records
	// Define an anonymous function to be executed on each page's iteration
	lerr := pager.EachPage(func(page pagination.Page) (bool, error) {
		// Get a slice of objects.Object structs
		objectList, err := objects.ExtractNames(page)
		if err != nil {
			logFatal(err)
		}
		for _, o := range objectList {
			_ = o
		}
		objectListChan <- objectList
		return true, nil
	})
	if lerr != nil {
		logFatal(lerr)
	}

	//---------------------------------------------------
	// below code is working
	//---------------------------------------------------
	// working, but only inside the loop; this keeps fetching new pages and showing new records, 10000 per page
	// By default EachPage contains 10000 records
	// Define an anonymous function to be executed on each page's iteration
	lerr = pager.EachPage(func(page pagination.Page) (bool, error) {
		// Get a slice of objects.Object structs
		objectList, err := objects.ExtractNames(page)
		if err != nil {
			logFatal(err)
		}
		for _, o := range objectList {
			fmt.Println(o)
		}
		return true, nil
	})
	if lerr != nil {
		logFatal(lerr)
	}
}
The first 10000 records are displayed, but then it gets stuck and nothing happens. If I do not use a channel and just run the plain loop, it works perfectly fine, which defeats the purpose of parallelizing.

for _, c := range <-objListChan {
	fmt.Println(urlPrefix, c, "\t", count)
}
Your async worker pops one list from the channel, iterates it, and exits. You need two loops: an outer one reading the channel (range objListChan), and an inner one reading each object list just received, as in the sketch below.
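A minimal sketch of the corrected worker, keeping the original names; the close requirement mentioned below is my addition so the outer loop can end:

func extractObjectItemsFromList(objListChan <-chan []string) {
	// Outer loop: receive each page pushed into the channel;
	// it terminates when the producer closes the channel.
	for objectList := range objListChan {
		// Inner loop: the items of the page just received.
		for _, c := range objectList {
			fmt.Println(urlPrefix, c, "\t", count)
		}
	}
}

On the producer side, call close(objectListChan) once pager.EachPage has returned. Note that with an unbuffered channel the pager blocks on each send until the worker has consumed the previous page; a buffered channel loosens that coupling.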

Delete objects in s3 using wildcard matching

I have the following working code to delete an object from Amazon S3:

params := &s3.DeleteObjectInput{
	Bucket: aws.String("Bucketname"),
	Key:    aws.String("ObjectKey"),
}
s3Conn.DeleteObject(params) // DeleteObject (singular) takes a DeleteObjectInput

But what I want to do is delete all files under a folder using the wildcard **. I know Amazon S3 doesn't treat "x/y/file.jpg" as a folder y inside x, but what I want to achieve is, by mentioning "x/y*", to delete all subsequent objects having the same prefix. I tried the Amazon multi-object delete:
params := &s3.DeleteObjectsInput{
	Bucket: aws.String("BucketName"),
	Delete: &s3.Delete{
		Objects: []*s3.ObjectIdentifier{
			{
				Key: aws.String("x/y/.*"),
			},
		},
	},
}
result, err := s3Conn.DeleteObjects(params)

I know that in PHP it can be done easily with s3->delete_all_objects, as per this answer. Is the same action possible in Go?
Unfortunately the goamz package doesn't have a method similar to the PHP library's delete_all_objects.
However, the source code for the PHP delete_all_objects is available here (toggle source view): http://docs.aws.amazon.com/AWSSDKforPHP/latest/#m=AmazonS3/delete_all_objects
Here are the important lines of code:
public function delete_all_objects($bucket, $pcre = self::PCRE_ALL)
{
    // Collect all matches
    $list = $this->get_object_list($bucket, array('pcre' => $pcre));

    // As long as we have at least one match...
    if (count($list) > 0)
    {
        $objects = array();
        foreach ($list as $object)
        {
            $objects[] = array('key' => $object);
        }

        $batch = new CFBatchRequest();
        $batch->use_credentials($this->credentials);

        foreach (array_chunk($objects, 1000) as $object_set)
        {
            $this->batch($batch)->delete_objects($bucket, array(
                'objects' => $object_set
            ));
        }

        $responses = $this->batch($batch)->send();
As you can see, the PHP code will actually make an HTTP request on the bucket to first get all files matching PCRE_ALL, which is defined elsewhere as const PCRE_ALL = '/.*/i';.
You can only delete 1000 files at once, so delete_all_objects then creates a batch function to delete 1000 files at a time.
You would have to build the same functionality into your Go program, as the goamz package doesn't support this yet. Luckily it should only be a few lines of code, and you have a guide in the PHP library.
It might be worth submitting a pull request for the goamz package once you're done!
Using the mc tool you can do:
mc rm -r --force https://BucketName.s3.amazonaws.com/x/y
It will delete all the objects with the prefix "x/y".
You can achieve the same with Go using minio-go like this:
package main

import (
	"log"

	"github.com/minio/minio-go"
)

func main() {
	config := minio.Config{
		AccessKeyID:     "YOUR-ACCESS-KEY-HERE",
		SecretAccessKey: "YOUR-PASSWORD-HERE",
		Endpoint:        "https://s3.amazonaws.com",
	}
	// Find your S3 endpoint here: http://docs.aws.amazon.com/general/latest/gr/rande.html
	s3Client, err := minio.New(config)
	if err != nil {
		log.Fatalln(err)
	}
	isRecursive := true
	for object := range s3Client.ListObjects("BucketName", "x/y", isRecursive) {
		if object.Err != nil {
			log.Fatalln(object.Err)
		}
		err := s3Client.RemoveObject("BucketName", object.Key)
		if err != nil {
			// log and move on to the next object instead of aborting
			log.Println(err)
			continue
		}
		log.Println("Removed : " + object.Key)
	}
}
Since this question was asked, the AWS Go SDK for S3 has received some new methods in S3 Manager to handle this task (in response to #Itachi's PR).
See the GitHub record: https://github.com/aws/aws-sdk-go/issues/448#issuecomment-309078450
Here is their example in v1: https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/go/s3/DeleteObjects/DeleteObjects.go#L36
To get "wildcard matching" on paths inside the bucket, add the Prefix param to the example's ListObjectsInput call, as shown here:
iter := s3manager.NewDeleteListIterator(svc, &s3.ListObjectsInput{
Bucket: bucket,
Prefix: aws.String("somePathString"),
})
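For completeness, a sketch of actually running that iterator, mirroring the linked example (svc is assumed to be an *s3.S3 client):

// BatchDelete walks the iterator and issues DeleteObjects calls in batches.
batcher := s3manager.NewBatchDeleteWithClient(svc)
if err := batcher.Delete(aws.BackgroundContext(), iter); err != nil {
	log.Fatalln(err)
}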
A bit late in the game, but since I was having the same problem, I created a small pkg that you can copy to your code base and import as needed.
// ListKeysInPrefix returns the keys of the objects under the given prefix
// (first page of results only; see the pagination note below).
func ListKeysInPrefix(s s3iface.S3API, bucket, prefix string) ([]string, error) {
	res, err := s.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	if err != nil {
		return []string{}, err
	}
	var keys []string
	for _, key := range res.Contents {
		keys = append(keys, *key.Key)
	}
	return keys, nil
}

func createDeleteObjectsInput(keys []string) *s3.Delete {
	rm := []*s3.ObjectIdentifier{}
	for _, key := range keys {
		rm = append(rm, &s3.ObjectIdentifier{Key: aws.String(key)})
	}
	return &s3.Delete{Objects: rm, Quiet: aws.Bool(false)}
}

// DeletePrefix lists the keys under prefix and removes them in one
// DeleteObjects call, returning (rather than panicking on) any error.
func DeletePrefix(s s3iface.S3API, bucket, prefix string) error {
	keys, err := ListKeysInPrefix(s, bucket, prefix)
	if err != nil {
		return err
	}
	_, err = s.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: createDeleteObjectsInput(keys),
	})
	return err
}
So, if you have a bucket called "somebucket" with the structure s3://somebucket/foo/some-prefixed-folder/bar/test.txt and want to delete everything from some-prefixed-folder onwards, usage would be:

func main() {
	// create your s3 client here
	// client := ....
	err := DeletePrefix(client, "somebucket", "some-prefixed-folder")
	if err != nil {
		panic(err)
	}
}
This implementation only deletes a maximum of 1000 entries under the given prefix, because that is all a single ListObjectsV2 call returns; the listing is paginated, though, so it's just a matter of fetching further pages until everything has been listed.
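A sketch of that pagination, assuming the same aws-sdk-go v1 types: loop on the continuation token until the listing is no longer truncated (ListAllKeysInPrefix is my own variation on ListKeysInPrefix above):

// ListAllKeysInPrefix pages through ListObjectsV2 results until IsTruncated
// is false, collecting every key under the prefix.
func ListAllKeysInPrefix(s s3iface.S3API, bucket, prefix string) ([]string, error) {
	var keys []string
	input := &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	}
	for {
		res, err := s.ListObjectsV2(input)
		if err != nil {
			return nil, err
		}
		for _, obj := range res.Contents {
			keys = append(keys, *obj.Key)
		}
		if res.IsTruncated == nil || !*res.IsTruncated {
			return keys, nil
		}
		// Resume the listing where the previous page stopped.
		input.ContinuationToken = res.NextContinuationToken
	}
}

Keep in mind that DeleteObjects is also capped at 1000 keys per call, so the delete side needs the same chunking treatment.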
