Plaid API investment holdings don't have a unique identifier

/investments/holdings/get does not return a unique identifier, so it does not seem possible to update holdings in a local DB without first wiping all existing holdings data. Is there something I'm missing?
Sample response from Plaid:
{
  "account_id": "5d37rd4BdJkq1zZkR8XEI9ovVAn35Ph464637",
  "cost_basis": null,
  "institution_price": 15.24,
  "institution_price_as_of": null,
  "institution_value": 30.48,
  "iso_currency_code": "USD",
  "quantity": 2,
  "security_id": "3mg4qV4JZyckeZnYfgazubEhn8gLKkUeMVpx4",
  "unofficial_currency_code": null
}
Note that account_id + security_id is not a valid compound key. Plaid returns each "lot" of a holding, so there can be multiple holdings for the same security and account, as they would likely have different cost bases.
If Plaid is listening, it would be nice to add a unique holding_id to the response, which is surely being stored on Plaid's end, similar to transaction_id, item_id, or account_id.
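In the meantime, one workaround (a sketch only, not an official Plaid feature; the table and column names are hypothetical) is to derive a synthetic key from the fields that distinguish a lot and upsert on it, e.g. in Postgres:

-- Hypothetical local table keyed on a hash of the lot-identifying fields.
-- Caveat: if any hashed field changes (e.g. quantity after a partial sale),
-- the row appears as a new holding rather than an update to the old one.
CREATE TABLE holdings (
    synthetic_id      TEXT PRIMARY KEY,
    account_id        TEXT NOT NULL,
    security_id       TEXT NOT NULL,
    cost_basis        NUMERIC,
    quantity          NUMERIC NOT NULL,
    institution_value NUMERIC,
    updated_at        TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Upsert one holding from the Plaid response (placeholder values).
INSERT INTO holdings (synthetic_id, account_id, security_id, cost_basis, quantity, institution_value)
VALUES (md5('acct_1|sec_1||2'), 'acct_1', 'sec_1', NULL, 2, 30.48)
ON CONFLICT (synthetic_id) DO UPDATE
   SET institution_value = EXCLUDED.institution_value,
       updated_at        = now();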

Related

AggregatingMergeTree order by column not in the sorting key

What are some options to have an AggregatingMergeTree merge by one column but be ordered by a column that's not in the sorting key?
My application is similar to Zendesk tickets. A ticket has a category, status, and ID. The application emits ticket status-change events to ClickHouse, and I'm calculating statistics on how long tickets took to close after being created, over some time range R, grouped by some time period P.
For example, events look like this:
[
  {
    "ticket": "A",
    "event_time": "2022-12-08T15:00:00Z",
    "category": "bug",
    "status": "created"
  },
  {
    "ticket": "A",
    "event_time": "2022-12-08T15:30:00Z",
    "category": "bug",
    "status": "reviewing"
  },
  {
    "ticket": "A",
    "event_time": "2022-12-08T16:00:00Z",
    "category": "bug",
    "status": "reviewed"
  }
]
My AggregatingMergeTree (more specifically, it's replicated) has a sorting key on the ticket ID to aggregate two states into one.
CREATE TABLE ticket_created_to_reviewed
(
    `ticket` String,
    `created_ticket_event_id` SimpleAggregateFunction(max, String),
    `created_ticket_event_time` SimpleAggregateFunction(max, DateTime64(9)),
    `created_ticket_category` SimpleAggregateFunction(max, String),
    `close_ticket_event_id` SimpleAggregateFunction(max, String),
    `close_ticket_event_time` SimpleAggregateFunction(max, DateTime64(9)),
    `close_ticket_category` SimpleAggregateFunction(max, String)
)
ENGINE = ReplicatedAggregatingMergeTree('<path>', '{replica}')
PARTITION BY toYYYYMM(close_ticket_event_time)
PRIMARY KEY ticket
ORDER BY ticket
TTL date_trunc('second', if(close_ticket_event_time > created_ticket_event_time,
    close_ticket_event_time, created_ticket_event_time)) + toIntervalMonth(12)
SETTINGS index_granularity = 8192
Two materialized views SELECT from the raw events and insert into ticket_created_to_reviewed: one with WHERE status = 'created' and another with WHERE status = 'reviewed'. A sketch of the 'created' view is below.
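(A reconstructed sketch of that MV; the raw table name ticket_events and its columns event_id, event_time, category, and status are placeholders, since the real schema isn't shown here.)

CREATE MATERIALIZED VIEW mv_ticket_created TO ticket_created_to_reviewed AS
SELECT
    ticket,
    event_id           AS created_ticket_event_id,
    event_time         AS created_ticket_event_time,
    category           AS created_ticket_category,
    ''                 AS close_ticket_event_id,
    toDateTime64(0, 9) AS close_ticket_event_time,
    ''                 AS close_ticket_category
FROM ticket_events
WHERE status = 'created'

The 'reviewed' view mirrors this with the close_* columns populated instead; max() then keeps the non-empty values when parts merge.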
So far the data populates correctly, although I have to exclude rows where only one of the two status events is populated. Getting the hourly p90 of ticket time-to-close over the past day for each category looks something like this:
SELECT
    quantile(0.9)(date_diff('second', created_ticket_event_time, close_ticket_event_time)),
    date_trunc('hour', close_ticket_event_time) AS t,
    close_ticket_category AS category
FROM
(
    SELECT
        ticket,
        max(created_ticket_event_id) AS created_ticket_event_id,
        max(created_ticket_event_time) AS created_ticket_event_time,
        max(created_ticket_category) AS created_ticket_category,
        max(close_ticket_event_id) AS close_ticket_event_id,
        max(close_ticket_event_time) AS close_ticket_event_time,
        max(close_ticket_category) AS close_ticket_category
    FROM ticket_created_to_reviewed
    GROUP BY ticket
)
WHERE close_ticket_event_id != '' AND created_ticket_event_id != ''
  AND close_ticket_event_time > addDays(now(), -1)
GROUP BY t, category
The problem is that close_ticket_event_time is not in the sorting key, so the query scans the full table. But I can't add that column to the sorting key either, because then the table would no longer aggregate by ticket ID.
Any suggestions?
Things tried:
Adding an index and/or a projection ordered by close_ticket_event_time (see the sketch after this list). However, I think the core problem is that the sorting key is the ticket ID, so the data is not ordered by time and the matching time range can't be found efficiently; at the same time, adding close_ticket_event_time to the sorting key breaks the aggregation behavior of AggregatingMergeTree.
An MV that joins created and closed ticket events, writing to a different destination table whose sorting key is close_ticket_event_time. The destination table misses data whenever the right side of the JOIN isn't available at the time the MV is triggered (it only fires for the left side), which can happen when events are ingested out of order.
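For concreteness, the skip-index variant of the first attempt looked roughly like this (a reconstructed sketch; the index name and granularity are guesses). A minmax index can only skip granules, so it doesn't change the ordering problem described above:

ALTER TABLE ticket_created_to_reviewed
    ADD INDEX idx_close_time close_ticket_event_time TYPE minmax GRANULARITY 4;

ALTER TABLE ticket_created_to_reviewed
    MATERIALIZE INDEX idx_close_time;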
Ideally, what I'm looking for is something like the following in AggregatingMergeTree, but it isn't possible: ClickHouse requires the primary key to be a prefix of the sorting key, which follows from how the data is stored.
PRIMARY KEY ticket
ORDER BY close_ticket_event_time
Thanks in advance

Use cases (1. create customer, 2. update KYC, 3. deposit, 4. fund transfer) using 3 microservices, with API validation

Use case:
Create customer
Create a table named customers with the following columns: Id (numeric), customer_id (varchar), customer_name (varchar), customer_mobileNo (numeric), EmailId (varchar), Account_Id (varchar), Amount (numeric), address (varchar), is_KYC_DONE (boolean), active (boolean).
Make a POST REST endpoint to insert data into the customers table, and register 2 customers with a default amount of 0.
API validation: customer_id, customer_name, and Account_Id should be mandatory; the mobile number should be numeric; the email should be in a valid format; is_KYC_DONE and active should default to 0 and 1 respectively. A SQL sketch of this table is below.
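A minimal sketch of the customers table as described (generic SQL; exact types and lengths are assumptions):

CREATE TABLE customers (
    Id                NUMERIC PRIMARY KEY,
    customer_id       VARCHAR(64)  NOT NULL,
    customer_name     VARCHAR(128) NOT NULL,
    customer_mobileNo NUMERIC,
    EmailId           VARCHAR(128),
    Account_Id        VARCHAR(64)  NOT NULL,
    Amount            NUMERIC NOT NULL DEFAULT 0,        -- customers register with amount 0
    address           VARCHAR(256),
    is_KYC_DONE       BOOLEAN NOT NULL DEFAULT FALSE,    -- default 0
    active            BOOLEAN NOT NULL DEFAULT TRUE      -- default 1
);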
Update KYC to true
Make a PATCH REST endpoint to update KYC to true in the table, based on customer ID.
Deposit
Make a POST REST endpoint to deposit an amount into the table created above, based on account ID.
A deposit should only happen when KYC is done (i.e. true); otherwise the transaction is declined.
Fund transfer
Make a POST REST endpoint to transfer an amount from one account to another.
The payload should look like this:
{
  "transfer": {
    "Amount": 0,
    "withdraw": { "Account_id": "" },
    "Deposit": { "Account_id": "" }
  }
}
Validation:
There should be sufficient funds (see the SQL sketch after this list).
For both withdraw and deposit, the account ID and amount should be mandatory.
If the deposit fails, the amount withdrawn from the source account should be rolled back.
NOTE: Make sure the transaction happens exactly once (at least once and at most once).
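A minimal sketch of the withdraw-side sufficient-funds check (assuming the customers table above; the account ID and amount are placeholders). Note that the cross-service "rollback" is a compensating update issued by the fundtransfer service when the downstream deposit call fails, not a database ROLLBACK:

BEGIN;
-- Debit only if the balance covers the amount; zero rows updated
-- means insufficient funds, and the service should ROLLBACK and decline.
UPDATE customers
   SET Amount = Amount - 500
 WHERE Account_Id = 'acc_source'
   AND Amount >= 500;
COMMIT;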
There will be 3 microservices:
customer: has the create-customer and update-KYC endpoints.
deposit: has the deposit endpoint.
fundtransfer: has the fund-transfer endpoint; the withdrawal happens in this microservice, and for the deposit a call goes to the deposit microservice.

ClickHouse: how to enable performant queries against increasing user-defined attributes

I am designing a system that handles a large number of buried-point events. An event record contains:
buried_point_id, for example: 1 means app_launch, 2 means user_register.
happened_at: the event timestamp.
user_id: the user identifier.
other attributes, including basic ones (phone_number, city, country) and user-defined ones (click_item_id, for example; it can literally be any contextual information). PMs will keep adding user-defined attributes to the event record.
The query pattern is like:
SELECT COUNT(DISTINCT user_id) FROM buried_points WHERE buried_point_id = 1 AND city = 'San Francisco' AND click_item_id = 123;
Since my team has invested heavily in ClickHouse, I want to leverage it for this problem. Is it good practice to use the experimental Map data type to store all attributes in a Map-typed column such as {city: 'San Francisco', click_item_id: 123, ...}, or is there any other recommendation? Thanks.
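For reference, the Map approach might look like this (a sketch; the table layout is an assumption, uniqExact is ClickHouse's exact COUNT(DISTINCT), and on older versions where Map is experimental the allow_experimental_map_type setting must be enabled):

SET allow_experimental_map_type = 1;

CREATE TABLE buried_points
(
    buried_point_id UInt32,
    happened_at     DateTime,
    user_id         UInt64,
    attrs           Map(String, String)  -- user-defined attributes
)
ENGINE = MergeTree
ORDER BY (buried_point_id, happened_at);

SELECT uniqExact(user_id)
FROM buried_points
WHERE buried_point_id = 1
  AND attrs['city'] = 'San Francisco'
  AND attrs['click_item_id'] = '123';

Note that with a single String-valued Map, every attribute is stored and compared as a string; bloom_filter skip indexes over mapKeys/mapValues are one way to keep such filters fast as the attribute set grows.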

Convert column value from null to value of similar row with similar values

Sorry for the slightly strange title; I couldn't think of a succinct way to describe my problem.
I have a set of data that is created by one person; it is structured as follows:
ClientID ShortName WarehouseZone RevenueStream Name Budget Period
This data is manually inputted, but as there are many clients and many revenue streams, only lines where budget != 0 have been included.
This needs to connect to another data set to generate revenue, and there are times when revenue exists but no budget exists.
For this reason I have gathered all customers and cross-joined them to all codes, then appended these values into the main query; however, as WarehouseZone is manually inputted, there are a lot of entries where WarehouseZone is null.
WarehouseZone will always be the same for every instance of the customer.
Now, after my convoluted explanation, here's my question: how can I do the following?
Pseudocode that I hope makes sense:
SET WarehouseZone = WarehouseZone WHERE ClientID = ClientID AND
    WarehouseZone != NULL
Are you sure that a client has only one WarehouseZone? Otherwise you need an aggregation.
Let's check: you can add a custom column that returns a record like this:
Table.Max(
    Table.SelectColumns(
        Table.SelectRows(#"Last Step",
            // explicit parameter, so that _[ClientID] refers to the current
            // row of the outer `each` in the custom column
            (r) => r[ClientID] = _[ClientID]),
        "Warehousezone"),
    "Warehousezone"
)
This creates a new column that holds, for each row, a record with the max Warehousezone for that ClientID. At the end you can expand the record to get the value.
P.S. The calculation is not great for performance; a sketch of a faster variant follows.
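A likely faster alternative (a sketch, assuming the step and column names used above): compute one max zone per client with a single group-by, merge it back, and substitute it only where the original value is null.

let
    Source = #"Last Step",
    // One max Warehousezone per ClientID (List.Max skips nulls)
    Zones = Table.Group(Source, {"ClientID"},
        {{"MaxZone", each List.Max([Warehousezone])}}),
    // Bring the per-client max back onto every row
    Merged = Table.NestedJoin(Source, {"ClientID"}, Zones, {"ClientID"}, "Z", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Z", {"MaxZone"}),
    // Keep existing zones; fill only the nulls
    Result = Table.ReplaceValue(Expanded,
        each [Warehousezone],
        each if [Warehousezone] = null then [MaxZone] else [Warehousezone],
        Replacer.ReplaceValue, {"Warehousezone"})
in
    Result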

Fetch first charge of a customer in stripe

I am reading the Stripe documentation and I want to fetch the first charge of a customer. Currently I am doing:
charge_list = Stripe::Charge.list(
{
customer: "cus_xxx"
},
"sk_test_xxxxxx"
)
first_charge = charge_list.data.last
This relies on the Stripe API returning the charge list in sorted order, with the most recent charges appearing first. But I don't think it is a good approach. Can anyone help me with how to fetch a customer's first charge, or how to sort the list in ascending order of created date so that I can take the first object from the array?
It seems there is no reverse-order sorting feature in the Stripe API.
Also remember the first charge may not be on the first page of the result set, so you have to iterate using #auto_paging_each.
A quick possible solution:
charge_list = Stripe::Charge.list(
{customer: "cus_xxx", limit: 100 }, # limit 100 to reduce the number of requests
"sk_test_xxxxxx")
first_charge = nil
charge_list.auto_paging_each {|c| first_charge = c }
You may want to persist the result somewhere, since this is a heavy operation.
But the cleanest solution IMO would be to store all charge records in your own DB and make subsequent queries against that.
