Because Lambdas are stateless and multiple instances can run at the same time, it might be a bad idea to generate ids based on timestamps. I am currently using UUIDv1. I know the chance of generating the same id for the same timestamp is practically zero, and it's unique enough for my application. Out of curiosity, I'm thinking about ways to generate truly, mathematically unique ids on AWS Lambda.
UUIDv1 uses a node id to distinguish ids generated with the same timestamp. Random numbers or MAC addresses (a bad idea for virtual instances) are used to create node ids.
If I had a unique id for my active Lambda instance, I would be able to generate truly unique ids. There is an awsRequestId inside the context object, but it just seems to be another timestamp-based UUID.
Maybe you guys have more ideas?
AWS Lambda:
// Strips everything up to the trailing token of the log stream name; that token differs per container instance.
System.getenv("AWS_LAMBDA_LOG_STREAM_NAME").replaceFirst(".*?(?=\\w+$)", "")
Defined runtime environment variables
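For illustration, a minimal Java sketch of how that trailing token could be combined with a timestamp and a counter to build per-instance unique ids. The class and method names are my own, and it assumes the code runs inside a Lambda container where the environment variable is set:

import java.util.concurrent.atomic.AtomicLong;

// Sketch only: builds ids of the form <instanceToken>-<timestamp>-<counter>.
// Assumes the trailing token of AWS_LAMBDA_LOG_STREAM_NAME is unique per Lambda container.
public class LambdaIdGenerator {

    // e.g. "2021/01/01/[$LATEST]abcdef1234567890abcdef1234567890" -> "abcdef1234567890abcdef1234567890"
    private static final String INSTANCE_TOKEN =
            System.getenv("AWS_LAMBDA_LOG_STREAM_NAME").replaceFirst(".*?(?=\\w+$)", "");

    private static final AtomicLong COUNTER = new AtomicLong();

    public static String nextId() {
        return INSTANCE_TOKEN + "-" + System.currentTimeMillis() + "-" + COUNTER.incrementAndGet();
    }
}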
AWS EC2:
httpGet http://169.254.169.254/latest/meta-data/instance-id
Instance metadata and user data
AWS ECS:
httpGet http://localhost:51678/v1/metadata
How to get Task ID from within ECS container?
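For the EC2 and ECS endpoints above, a minimal Java sketch of the HTTP call (plain java.net, no AWS SDK; it only works from inside the corresponding instance/container, and the ECS response is JSON that still needs parsing):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch: fetch an identifier from an instance metadata endpoint, e.g.
//   EC2: http://169.254.169.254/latest/meta-data/instance-id
//   ECS: http://localhost:51678/v1/metadata
public class MetadataClient {

    public static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}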
Unique within subnet
// Returns "<pid>@<hostname>", e.g. "12345@ip-10-0-0-42"
String executableId = ManagementFactory.getRuntimeMXBean().getName();
Is it OK to store multiple tenants' QnAs in a single data source? For example, in Azure Table Storage, all QnAs would be stored in a single table, with each tenant's data differentiated by a unique key and results filtered by that key. This would help me reduce the Azure service cost, but are there any drawbacks to this approach?
Sharing a service/index in developer/test environments is fine, but there are additional concerns for production environments. These are some drawbacks, though you might not care about some of them:
competing queries: high traffic volume for one tenant can affect query latency/throughput for another tenant
harder to manage data for individual tenants: can you easily delete all documents for a particular tenant? Would the whole index ever need to be deleted or recreated for some reason, which would affect all tenants?
flexibility in location: multiple services allow you to put data physically closer to where the queries will be issued. There can also be legal requirements for where data is stored.
susceptible to bugs/human error: people make mistakes; how bad is it to return data for the wrong tenant? How would you guard against that? (One possible guard is sketched after this list.)
permission management: do you need to grant permissions to view data for only a subset of the tenants?
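On the "wrong tenant" point, one way to guard against that class of bug is to force every query through a wrapper that always applies the tenant key. A minimal Java sketch under assumptions: the QnaStore interface and QnaEntry type are hypothetical stand-ins for whatever client you use, and the filter string mimics the OData style Azure Table Storage accepts:

import java.util.List;

// Hypothetical low-level store; in practice this would wrap the Azure Tables client.
interface QnaStore {
    List<QnaEntry> query(String odataFilter);
}

record QnaEntry(String tenantKey, String question, String answer) {}

// Every read goes through this class, so the tenant filter cannot be forgotten.
class TenantScopedQnaRepository {
    private final QnaStore store;
    private final String tenantKey;

    TenantScopedQnaRepository(QnaStore store, String tenantKey) {
        this.store = store;
        this.tenantKey = tenantKey;
    }

    List<QnaEntry> findAll() {
        // The tenant key doubles as the PartitionKey in the single-table design.
        return store.query("PartitionKey eq '" + tenantKey + "'");
    }
}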
Given I have two microservices: Service A and Service B.
Service A owns full customer data and Service B requires a small subset of this data (which it gets from Service A through some bulk load say).
Both services store customers in their own database.
If Service B then needs to interact with Service A, say to get additional data (e.g. GET /customers/{id}), it clearly needs a unique identifier that is shared between the two services.
Because the ids are GUIDs I could simply use the PK from Service A when creating the record in Service B. So both PKs match.
However, this sounds extremely fragile. One option is to store the 'external id' (or 'source id') as a separate field in Service B and use that to interact with Service A. This would probably be a string, as one day it may not be a GUID.
Is there a 'best practice' around this?
Update
So I've done some more research and found a few related discussions:
Should you expose a primary key in REST API URLs?
Is it a bad practice to expose the database ID to the client in your REST API?
Slugs as Primary Keys
Conclusion
I think my idea of trying to keep both Primary Keys for Customer the same across Service A and B was just wrong. This is because:
Clearly PKs are service implementation specific, so they may be totally incompatible e.g. UUID vs auto-incremented INT.
Even if you can guarantee compatibility, and although the two entities both happen to be called 'Customer', they are effectively two (potentially very different) concepts, and Service A and Service B each 'own' their own 'Customer' record. What you may want to do, though, is synchronise some customer data across those services.
So I now think that each service can expose customer data via its own unique id (in my case the PK GUID), and if one service needs to obtain additional customer data from another service, it must store that other service's identifier/key and use it. So essentially I am back to my 'external id' or 'source id' idea, but perhaps named more specifically, e.g. 'service B id'.
I think it depends a bit on the data source and your design, but one thing I would avoid is sharing a primary key, whether a GUID or an auto-increment integer, with an external service. Those are internal details of your service and not something other services should take a dependency on.
I would rather have an external id which is better understood by other services, and perhaps by the business as a whole. It could be a unique customer number, order number or policy number, as opposed to a database id. You can also think of it as a "business id". One thing to keep in mind is that an external id can also be exposed to an end user. Hence, it is a ubiquitous way of identifying that "entity" across the entire organization and its services, irrespective of whether you have an event-driven design or your services talk through APIs. I would only expose the DB ids to the infrastructure or repository layer; beyond that, use only the business/external id.
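As a concrete illustration of that split, a minimal sketch of how the consuming service (Service B in the question) might model it; the field and class names are mine, not a prescribed standard:

import java.util.UUID;

// Service B's view of a customer: its own PK plus the identifier used to talk to Service A.
public class Customer {
    private final UUID id;                   // Service B's internal primary key, never shared
    private final String externalCustomerId; // business/"source" id, used for GET /customers/{id} on Service A

    public Customer(UUID id, String externalCustomerId) {
        this.id = id;
        this.externalCustomerId = externalCustomerId;
    }

    public UUID getId() { return id; }
    public String getExternalCustomerId() { return externalCustomerId; }
}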
Well, if you are familiar with the Value Object concept, a business ID is much better for the design.
DDD focuses on the business; a pure UUID or auto-increment ID can't express that.
Use an ID with business meaning (a UL id, i.e. one drawn from the ubiquitous language), like a Customer ID, instead of a plain ID.
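For example, a tiny value object wrapping the business id might look like this (a sketch; the name and validation rule are illustrative):

// Value object: identity is defined by the business id itself, not by a database key.
public record CustomerId(String value) {
    public CustomerId {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Customer id must not be blank");
        }
    }
}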
I have a simple web app UI which stores certain dataset parameters (for simplicity, assume they are all data tables in a single Redshift database, but the schema/table names can vary, and the Redshift cluster is in AWS). Tableau is installed on an EC2 instance in the same AWS account.
I am trying to determine an automated way of passing these 'parameters' into a data source (i.e. into the connection string inside Tableau on EC2/AWS), rather than manually creating data source connections and keying in the various customer requests.
The flow would be: say 50 users select various parameters in the UI (for simplicity, suppose the parameters are stored as a JSON file in AWS) -> the parameters are sent to Tableau and data sources are created -> the connection is established within Tableau without the customer 'seeing' anything in the back end -> the customer can play with the data in Tableau and create tables and charts accordingly.
How might I do this, at least through a batch job or CloudFormation setup? A "hacky" solution is fine.
Bonus: if the above is doable in real-time across multiple users that would be awesome.
** I am open to using other dashboard UI tools which solve this problem e.g. QuickSight **
After installing Tableau on EC2, I have had trouble finding any article or documentation on how to pass parameters into the connection string itself, or even how to parameterise it manually.
An example: customer1 selects "public_schema.dataset_currentdata" and "public_schema.dataset_yesterday", and customer2 selects "other_schema.dataset_currentdata", all of which exist in a single database.
3 data sources should be generated (one for each above) but only the data sources selected should be open to the customer that selected it i.e. customer2 should only see the connection for other_schema.dataset_currentdata.
One hack I was considering is to spin up a CloudFormation template with Tableau installed for a customer when they make a request, create the connection accordingly, and delete the stack when they are done. I am mainly unsure how I would get the connection established, though, i.e. how to pass in the parameters. I am also not sure that spinning up 50 EC2 instances is wise. :D
An issue I have seen so far is that creating a manual extract limits the number of rows, so I think I need a live connection per customer request, and I am trying to get around this issue.
You can do this with a combination of a basic embed and applied filters. This would load the Tableau workbook, and then you would apply a filter based on whatever values your user selects from the JSON.
The final missing piece is to use a parameter instead of a filter and pass those values to the database via initial SQL.
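If the selected values come from the users' JSON, one low-tech way to hand them to the view is via the URL, since Tableau can pick up filter and parameter values from query-string parameters on an embedded view. A rough Java sketch, assuming a workbook parameter named SchemaTable that is referenced in the workbook's initial/custom SQL (the server, workbook and parameter names are placeholders):

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build an embed URL that pre-sets a Tableau parameter for one customer.
// "SchemaTable" is assumed to exist as a parameter in the workbook.
public class EmbedUrlBuilder {

    public static String buildUrl(String selectedDataset) {
        String base = "https://tableau.example.com/views/CustomerWorkbook/Dashboard"; // placeholder
        return base + "?SchemaTable=" + URLEncoder.encode(selectedDataset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // e.g. customer1 picked "public_schema.dataset_currentdata" in the web UI
        System.out.println(buildUrl("public_schema.dataset_currentdata"));
    }
}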
I'm building a REST-API and while I was researching for what id type to give certain objects I saw the DigitalOcean API Documentation.
The objects volume, volume snapshot, certificate, domain, firewall and load balancer all have a string uuid.
The objects action, domain record, droplet, droplet snapshot, droplet kernel, droplet backup, droplet neighbor, image and SSH key have an integer id.
Droplets, in particular, have a unique integer id.
What is the reasoning behind using integer ids versus string ids for each of these objects?
The only explanation I could think of: DigitalOcean used string ids in the early years and couldn't simply switch all of them to integer ids.
Or
All objects which are short-lived or created very often have an integer id for performance reasons, while objects with string ids are the opposite: long-lived and created less often.
I've made two tables to see better which objects have a string/integer id.
At DigitalOcean, we have standardized on using string uuids going forward. One of the main motivations was that primary keys are tied to a specific datastore implementation and can make architecture refactoring more difficult. So the resources using integer IDs are doing so for backwards compatibility and have simply been around longer (i.e. Droplets were our first product, while things like Load Balancers and Firewalls are more recent additions).
Full disclosure: Among other things, I maintain DigitalOcean's API documentation.
I want to try implementing a basic chat system after having read many articles about Confluent Kafka, but I have run into some problems with the structural design.
When using MySQL as my database, I can give an id to every meaningful record, like user_id in the user table and message_id in the message table. Once a model has an id, it is very convenient for the client and server to communicate about it.
But with Kafka Streams, how can I give every meaningful model a unique id in a KTable? Or is it really necessary for me to do this?
Maybe I can answer the question myself.
In MySQL, we can directly use a sequence id because all data goes to one place and is automatically assigned a new id. But when a table grows too large, we need to split it into several smaller tables. In that case, we also have to regenerate a unique id for each record, because the auto-generated ids in those tables each start from 0.
Maybe it is the same in Kafka. When we only have one partition, we can use the id Kafka generates, because all messages go to one place and the ids will never be duplicated. But when we want more partitions, we have to be careful: ids generated in different partitions are not globally unique.
So what we should do is generate the id ourselves. A UUID is a quick way to do this, but if we want a number, we can implement it with a small algorithm. For example, use a structure like this in a distributed environment:
[nodeid+threadId+current_time+auto_increased_number]
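A rough Java sketch of such a generator, similar in spirit to Twitter's Snowflake. The bit widths and the epoch are arbitrary choices, the node id must be assigned uniquely per instance by some external means, and synchronization stands in for the threadId component:

// Sketch: packs [timestamp + nodeId + sequence] into one long.
// 41 bits of milliseconds since a custom epoch, 10 bits of node id, 12 bits of per-millisecond sequence.
public class SequenceIdGenerator {
    private static final long CUSTOM_EPOCH = 1577836800000L; // 2020-01-01, arbitrary
    private final long nodeId;   // must be unique per instance (0..1023)
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SequenceIdGenerator(long nodeId) {
        this.nodeId = nodeId & 0x3FF;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;
            if (sequence == 0) {
                // sequence exhausted for this millisecond, wait for the next one
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0; // note: does not handle the clock moving backwards
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH) << 22) | (nodeId << 12) | sequence;
    }
}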