SSAS: The way to hide certain fields in a table from certain users - dax

For a Microsoft Analysis Services Tabular (1500) data cube, given a Sales table:
CREATE TABLE SalesActual (
Id Int,
InvoiceNumber Char(10),
InvoiceLineNumber Char(3),
DateKey Date,
SalesAmount money,
CostAmount money )
Where the GP Calculation in DAX would be
GP := SUM('SalesActual'[SalesAmount]) - SUM('SalesActual'[CostAmount])
I want to limit some users from accessing cost / GP data. Which approach would you recommend?
I can think of the following:
1. Split all the sales and cost amounts into separate rows, add a MetricType flag ('C', 'S', etc.), and set up Row-Level Security (RLS) so that some people cannot see the cost rows.
2. Separate them into two different tables and handle it through Object-Level Security (OLS).
Any other recommendations?
I am leaning towards approach 1 as I have some other RLS set-up and OLS doesn't mix well with RLS, but I also want to hear from the experts what other approach could fulfill such requirements.
Thanks!
UPDATE: I ended up going with the first approach.
Tabular DB is fast for this kind of split
OLS renders the field invalid, and I would have to create and maintain two reports, which is undesirable
RLS is easier to control. Cost / GP is the only thing I need to exclude for now, but RLS also gives me some flexibility in the filter if I need to restrict other fields. My data will grow vertically, but I can also add additional data types such as sales budget, sales forecast, expenses and other costs to the model in the future, all easily controlled by RLS
The accepted answer works and would work for many scenarios. I appreciate the answerer sharing it; it just doesn't solve my particular situation.
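The row split from the first approach can be sketched outside the cube. This is a minimal Python illustration, not DAX or SSAS code: the sample row values, the `Amount` column name, and the `visible_rows` helper (which only imitates what an RLS filter on MetricType would do) are assumptions made for the example.

```python
from decimal import Decimal

# Flag -> source column, following the 'S' / 'C' idea from the question.
METRICS = (("S", "SalesAmount"), ("C", "CostAmount"))


def split_by_metric(rows):
    """Turn each wide row into one row per metric, tagged with MetricType."""
    amount_columns = {column for _, column in METRICS}
    long_rows = []
    for row in rows:
        keys = {k: v for k, v in row.items() if k not in amount_columns}
        for flag, column in METRICS:
            long_rows.append({**keys, "MetricType": flag, "Amount": row[column]})
    return long_rows


def visible_rows(rows, allowed_types):
    """Imitate an RLS filter: keep only rows whose MetricType is allowed."""
    return [r for r in rows if r["MetricType"] in allowed_types]


# Made-up sample row shaped like SalesActual.
wide = [{
    "Id": 1,
    "InvoiceNumber": "INV0000001",
    "InvoiceLineNumber": "001",
    "DateKey": "2021-01-19",
    "SalesAmount": Decimal("100.00"),
    "CostAmount": Decimal("60.00"),
}]

long_rows = split_by_metric(wide)
sales_only = visible_rows(long_rows, {"S"})
print(len(long_rows), [r["Amount"] for r in sales_only])
```

With this shape, a user restricted to 'S' rows simply has no cost rows to aggregate, so a GP-style measure would need to be written with that in mind.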

You can create a role in which column-level security (CLS) does the job. There is no GUI for CLS, but you can use a script. (You can script your current role from SSMS with "Script Role As" and then modify it, but it is better to test this on a new role first.)
{
  "createOrReplace": {
    "object": {
      "database": "YourDatabase",
      "role": "CLS1"
    },
    "role": {
      "name": "CLS1",
      "modelPermission": "read",
      "members": [
        {
          "memberName": "YourOrganization\\userName"
        }
      ],
      "tablePermissions": [
        {
          "name": "Sales",
          "columnPermissions": [
            {
              "name": "SalesBonus",
              "metadataPermission": "none"
            },
            {
              "name": "CostAmount",
              "metadataPermission": "none"
            }
          ]
        }
      ]
    }
  }
}
The key elements are tablePermissions and columnPermissions, in which we define the column or columns the user cannot use.
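If several roles or columns need the same treatment, the script can be generated instead of edited by hand. The following is a small sketch that only assembles the same JSON structure as above; the `build_cls_role` helper is hypothetical, and the resulting script would still have to be run against the server (for example from an XMLA query window in SSMS).

```python
import json


def build_cls_role(database, role, members, table, hidden_columns):
    """Assemble a createOrReplace role script that hides the given columns."""
    return {
        "createOrReplace": {
            "object": {"database": database, "role": role},
            "role": {
                "name": role,
                "modelPermission": "read",
                "members": [{"memberName": m} for m in members],
                "tablePermissions": [
                    {
                        "name": table,
                        "columnPermissions": [
                            {"name": c, "metadataPermission": "none"}
                            for c in hidden_columns
                        ],
                    }
                ],
            },
        }
    }


script = build_cls_role(
    database="YourDatabase",
    role="CLS1",
    members=["YourOrganization\\userName"],
    table="Sales",
    hidden_columns=["SalesBonus", "CostAmount"],
)
print(json.dumps(script, indent=2))
```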

Related

Questions regarding inserting relation when using Upsert in Laravel

I have a scheduled job that runs hourly, connects to my bank, and fetches my bank transactions.
Unfortunately, my bank's API returns transactions without any transaction IDs, and transaction details also change over time. For example:
When I pay for something, the amount is at first only reserved on the account, not actually withdrawn. So if I pay $100, the amount is reserved on my account and a transaction item is created with the following details:
{
"amount": {
"amount": -100,
"currencyCode": "USD"
},
"accountingDate": "2021-01-19",
"description": "PAYMENT",
"transactionCode": "123",
"transactionType": null
}
And a few hours later, or even maybe up to a day, this transaction entry in the API will have changed to something like:
{
"amount": {
"amount": -100,
"currencyCode": "USD"
},
"accountingDate": "2021-01-19",
"description": "WALMART",
"transactionCode": "R_123",
"transactionType": "Purchase"
}
In this case it is three different values being changed: description, transactionCode and transactionType.
The description field gets updated with details of the vendor or receiver of the money.
The transactionCode field references the bank's transaction categorization (which is not public), but when the code is prefixed with R_ it means it's withdrawn and accounted.
The transactionType field is updated with information about the type of transaction (not necessarily correlated with the transactionCode field). Examples are "Visa", "Purchase", "Bill payment", "Fees", "Transfer between own accounts" etc.
To handle this in my application, I use the upsert function, where I basically check for changes in any of the fields from the API; this works in 99% of cases.
However, I thought I would extract the transactionCode values to a separate table and use them as categories in my own app. What is the quickest and easiest way to extract the data (regardless of its R_ prefix) without doing lots of queries to check the database for already existing values?
I'm thinking of something like for each transaction:
Get transactionCode value
Strip the R_ prefix
Check if transaction code exists in database
Insert if not
Create relation for transaction
But for several hundred transactions at a time it might not be suitable to do a check for each transaction code value. Would it make more sense to fetch all transaction codes before looping the transactions and then create the relations afterwards?
And also, how do I insert relation when using upsert?
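The "fetch all codes first, then create the relations" idea can be sketched as follows. This is a language-neutral illustration in Python with SQLite, not Laravel code: the `categories` table, its columns, and the `category_id` key are assumptions made for the example. The point is that the whole batch needs one insert statement and one lookup query, regardless of how many transactions it contains.

```python
import sqlite3


def normalize(code):
    """Strip the 'R_' prefix so reserved and accounted codes match."""
    return code[2:] if code.startswith("R_") else code


def link_categories(conn, transactions):
    """Attach a category_id to every transaction using two statements in total."""
    if not transactions:
        return []
    codes = sorted({normalize(t["transactionCode"]) for t in transactions})
    # Insert every code that is missing; existing codes are skipped
    # thanks to the UNIQUE constraint on categories.code.
    conn.executemany(
        "INSERT OR IGNORE INTO categories (code) VALUES (?)",
        [(c,) for c in codes],
    )
    # Load the ids of all codes in this batch with a single query.
    placeholders = ",".join("?" * len(codes))
    ids = dict(conn.execute(
        f"SELECT code, id FROM categories WHERE code IN ({placeholders})",
        codes,
    ).fetchall())
    return [
        {**t, "category_id": ids[normalize(t["transactionCode"])]}
        for t in transactions
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categories (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL)"
)
batch = [
    {"description": "PAYMENT", "transactionCode": "123"},
    {"description": "WALMART", "transactionCode": "R_123"},
    {"description": "FEE", "transactionCode": "R_456"},
]
linked = link_categories(conn, batch)
print([(t["transactionCode"], t["category_id"]) for t in linked])
```

Once every row carries its `category_id`, the relation is just another column in the rows handed to the upsert, so no separate relation-insert step is needed for a simple belongs-to relationship.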

Sorting by product price considering special prices (client, group, country)

We have a shop with a few products (~5000).
There are, of course, category overview pages that show all products in the current category. A requirement is that all products can be sorted by price (ASC and DESC).
This already works, but only partially: our Elasticsearch index currently holds only the "original" price, so product discounts are not considered and the sorting is therefore incorrect.
My task now is to fix that.
But I am already struggling with how to persist the "special prices" in Elasticsearch.
The problem is that every product can be discounted in general, on a customer level, on a customer group level, and on a country level.
So I imagine a structure like this would be a start:
# current
{
  "articleNumber": "12345",
  ...
  "price": 9.99,
  ...
}
# new
{
  "articleNumber": "12345",
  ...
  "price": 9.99,
  ...
  "special_prices": [
    {
      "customer": "123456",
      "client_price": 5.99,
      "client_group_price": null,
      "country_de": null,
      "country_es": null,
      ...
    },
    ...
  ]
}
Following thoughts:
The special prices could be stored as a nested object inside the product index (but I am not sure how to do the sorting on it later)
Maybe I could create a second index with prices. Then I would have two queries, but I guess that would be OK? I would have to build a whole matrix of every customer we have (also ~5000) with every product and every possible price. But with a second index I would have to join, and the sorting might then be incorrect
If possible, I would like to persist prices only when a product has a special price; otherwise I don't want to blow up the index
I tried something with painless to return the special price if one exists for the product and customer, but this gives me this:
...
"script": "if (doc['special_prices.customer'] != null && doc['special_prices.customer'].value == '123456') { return 12.45; } else { return doc['price']; }",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [special_prices.customer] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
...
Maybe something like SQL ORDER BY CASE WHEN would be an option?
Any ideas on how I should model and persist the special prices? And how can I achieve the sorting?
Is joining a second index a good idea?
Best regards
The error occurs because special_prices.customer is indexed as text (which allows full-text search) rather than as keyword. If you didn't specify the mapping explicitly, Elasticsearch most likely created a keyword sub-field for you. Try replacing special_prices.customer with special_prices.customer.keyword in your script.
The idea of using a script for sorting is good, given that you only have 5000 documents. Scripts do not have good performance, but in your case this might not matter.
In general this looks like a tough case, because you need some kind of joining between products and prices, and Elasticsearch is not good at joins. It has got some joining options: nested datatype, join datatype (a.k.a. parent-child), and denormalization. The last one you have already considered - when you put different prices in the original product document.
Unfortunately I can't recommend one over another, because there is no single recipe. I would try with scripts, and if performance is not good enough consider remodelling the data.
Hope that helps!
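The "ORDER BY CASE WHEN" logic that such a sort script has to express can be written down in plain Python first, which makes it easier to check before porting it to Painless. This is only a sketch of the selection rule: it looks at the customer-level price alone, and the second product and its price are invented for the example.

```python
def effective_price(product, customer_id):
    """Return the customer's special price if one exists, else the base price."""
    for special in product.get("special_prices", []):
        if special["customer"] == customer_id and special.get("client_price") is not None:
            return special["client_price"]
    return product["price"]


def sort_by_price(products, customer_id, descending=False):
    """Sort products by the price this particular customer would pay."""
    return sorted(
        products,
        key=lambda p: effective_price(p, customer_id),
        reverse=descending,
    )


products = [
    {
        "articleNumber": "12345",
        "price": 9.99,
        "special_prices": [{"customer": "123456", "client_price": 5.99}],
    },
    # Hypothetical second product without any special prices.
    {"articleNumber": "67890", "price": 7.50},
]

for customer in ("123456", "999999"):
    ordered = sort_by_price(products, customer)
    print(customer, [p["articleNumber"] for p in ordered])
```

Products without a special price carry no extra data here, which matches the wish not to blow up the index.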

How to transform nested JSON-payloads with Kiba-ETL?

I want to transform nested JSON-payloads into relational tables with Kiba-ETL. Here's a simplified pseudo-JSON-payload:
{
"bookings": [
{
"bookingNumber": "1111",
"name": "Booking 1111",
"services": [
{
"serviceNumber": "45",
"serviceName": "Extra Service"
}
]
},
{
"bookingNumber": "2222",
"name": "Booking 2222",
"services": [
{
"serviceNumber": "1",
"serviceName": "Super Service"
},
{
"serviceNumber": "2",
"serviceName": "Bonus Service"
}
]
}
]
}
How can I transform this payload into two tables:
bookings
services (every service belongsTo a booking)
I read about yielding multiple rows with the help of Kiba::Common::Transforms::EnumerableExploder at wiki, blog, ... etc.
Would you solve my use-case by yielding multiple rows (the booking and multiple services), or would you implement a Destination which receives a whole booking and calls some Sub-Destinations (i.e. to create or update a service)?
Author of Kiba here!
This is a common requirement, but it can (and this is not specific to Kiba) be more or less complex to handle. Here are a few points you'll need to think about.
Handling of foreign keys
The main problem here is that you'll want to keep the relationships between services and bookings, once they are inserted.
Foreign keys using business keys
The first (and easiest) way to handle this is to use a foreign-key constraint on "booking number", and to make sure that booking number is inserted in each service row, so that you can leverage it later in your queries. If you do this (see https://stackoverflow.com/a/18435114/20302), you'll have to set a unique constraint on "booking number" in the target bookings table.
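To make the business-key idea concrete, here is a small language-neutral sketch in Python (not Kiba code) that flattens the payload from the question into two row sets, copying the booking number into every service row. The snake_case column names are assumptions.

```python
def explode(payload):
    """Flatten nested bookings into booking rows and service rows."""
    bookings, services = [], []
    for booking in payload["bookings"]:
        bookings.append({
            "booking_number": booking["bookingNumber"],
            "name": booking["name"],
        })
        for service in booking.get("services", []):
            services.append({
                # Business key copied down so the service can point at its booking.
                "booking_number": booking["bookingNumber"],
                "service_number": service["serviceNumber"],
                "service_name": service["serviceName"],
            })
    return bookings, services


payload = {
    "bookings": [
        {
            "bookingNumber": "1111",
            "name": "Booking 1111",
            "services": [{"serviceNumber": "45", "serviceName": "Extra Service"}],
        },
        {
            "bookingNumber": "2222",
            "name": "Booking 2222",
            "services": [
                {"serviceNumber": "1", "serviceName": "Super Service"},
                {"serviceNumber": "2", "serviceName": "Bonus Service"},
            ],
        },
    ]
}

bookings, services = explode(payload)
print(len(bookings), len(services))
```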
Foreign keys using primary keys
If you instead prefer to have a booking_id which points to the bookings table id key, things are a bit more complicated.
If this is a one-off import targeting an empty table, I recommend that you arbitrarily force the primary key using something like:
transform do |r|
  # instance variable, so the counter persists from one row to the next
  @row_index ||= 0
  @row_index += 1
  r.merge(id: @row_index)
end
If this is not a one-off import, you will have to:
* Upsert bookings in a first pass
* In a second pass, look-up (via SQL queries) "bookings" to figure out what is the id to store in booking_id, then upsert the services
As you can see, it's a bit more work, so stick with option 1 if you don't have strong requirements around this (although option 2 is more solid in the long run).
Example implementation (using Kiba Pro & business keys)
The simplest way to achieve this (assuming your target is Postgres) is to use Kiba Pro's SQL Bulk Insert/Upsert destination.
It would go this way (in single pass):
extend Kiba::DSLExtensions::Config
config :kiba, runner: Kiba::StreamingRunner

source Kiba::Common::Sources::Enumerable, -> { Dir["input/*.json"] }
transform { |r| JSON.parse(IO.read(r)).fetch('bookings') }
transform Kiba::Common::Transforms::EnumerableExploder
# SNIP (remapping / renaming of fields etc)

first_destination = nil

destination Kiba::Pro::Destinations::SQLBulkInsert,
  row_pre_processor: -> (row) { row.except("services") },
  dataset: -> (dataset) {
    dataset.insert_conflict(target: :booking_number)
  },
  after_read: -> (d) { first_destination = d }

destination Kiba::Pro::Destinations::SQLBulkInsert,
  row_pre_processor: -> (row) { row.fetch("services") },
  dataset: -> (dataset) {
    dataset.insert_conflict(target: :service_number)
  },
  before_flush: -> { first_destination.flush }
Here we iterate over each input file, parsing it and grabbing the "bookings", then generating one row per element of "bookings".
We have 2 destinations, each doing an "upsert" (insert or update), plus one trick to ensure that the parent rows are saved before the children are inserted, to avoid a failure caused by a missing referenced record.
You can of course implement this yourself, but it is a bit of work!
If you need to use primary-key based foreign keys, you'll (likely) have to split this into 2 passes (one for each destination), then add some form of lookup in the middle.
Conclusion
I know that this is not trivial (depending on what you'll need, & if you'll use Kiba Pro or not), but at least I'm sharing the patterns that I'm using in such situations.
Hope it helps a bit!

Dynamic Achievement System algorithm / design

I'm developing an achievement system that needs CRUD screens where admins can create new achievements and their rules. I need some help with the design and algorithm so that it can easily evolve with new rules as admins ask for them.
Rules sample
Medal one: must complete any 5 courses with a score of at least 90
Medal two: must complete two specific courses with a score of at least 85
Medal three: must be top 5 in general ranking at least once
Medal four: must have more than 5000 points
I'll basically store that as metadata in a relational database, probably with these columns below:
action
action quantity
course quantity
score
id course
ranking
position
points
I want to know if there is any known algorithm / design to this kind of problem? Or perhaps I should store them differently to make it easier? Don't know, I want suggestions.
Your doubts may be right. In my opinion, a database is the wrong way to organize this data. Every new kind of achievement you want to create would add extra columns to your database, and most achievements wouldn't use most of the columns. A more flexible data structure, one that doesn't expect for every entry to use all of the possible achievement criteria at once by default, would probably be more useful. Most languages support JSON, so I suggest you use that. The structure could be something like this:
[
{
"name": "Medal One",
"requirements": {
"coursesCompleted": 5,
"scoreMin": 90
}
},
{
"name": "Medal Two",
"requirements": {
"specificCoursesCompleted": [
"Course 1",
"Course 2"
],
"scoreMin": 85
}
},
{
"name": "Medal Three",
"requirements": {
"generalRankingMin": 5
}
},
{
"name": "Medal Four",
"requirements": {
"pointsMin": 5000
}
}
]
You can see here how the criteria types are sometimes reused, but they can be omitted when not needed and new ones can be added to a few achievements without bloating the rest of the dataset as well.
PS: I made the criteria names very verbose for demonstration purposes; shortening them or not in actual use is up to preference.
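One way to evaluate such a structure is to map each criterion name to a small check function, so that a new kind of rule only needs a new entry in that map. This is a sketch under assumptions: the player fields (`courseScores`, `bestRanking`, `points`) are invented, `scoreMin` is treated as a modifier on the course criteria, and total points are checked under a separate `pointsMin` key so that they are not confused with course scores.

```python
def meets(requirements, player):
    """Return True if the player satisfies every criterion of one achievement."""
    min_score = requirements.get("scoreMin", 0)
    # Courses that count as completed for this achievement.
    passed = {c for c, s in player["courseScores"].items() if s >= min_score}
    checks = {
        "coursesCompleted": lambda n: len(passed) >= n,
        "specificCoursesCompleted": lambda names: set(names) <= passed,
        "generalRankingMin": lambda n: player["bestRanking"] <= n,
        "pointsMin": lambda n: player["points"] >= n,
        "scoreMin": lambda _: True,  # already applied when building `passed`
    }
    # An unknown criterion raises KeyError instead of being silently ignored.
    return all(checks[key](value) for key, value in requirements.items())


achievements = [
    {"name": "Medal One", "requirements": {"coursesCompleted": 5, "scoreMin": 90}},
    {"name": "Medal Two", "requirements": {
        "specificCoursesCompleted": ["Course 1", "Course 2"], "scoreMin": 85}},
    {"name": "Medal Three", "requirements": {"generalRankingMin": 5}},
    {"name": "Medal Four", "requirements": {"pointsMin": 5000}},
]

# Hypothetical player record.
player = {
    "courseScores": {"Course 1": 92, "Course 2": 88, "Course 3": 95},
    "bestRanking": 7,
    "points": 6000,
}

earned = [a["name"] for a in achievements if meets(a["requirements"], player)]
print(earned)
```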

DynamoDB: What's the best way to structure and query a sorted list of timestamped logs?

In the interest of better understanding Amazon's DynamoDB, Lambda functions and IAM roles (I'll stick to DynamoDB in this question), I'm setting up a Linux device to listen for new DynamoDB items and audibly read out updates that are being added by other functions at a regular interval. My goal is to query or scan items, returning those items in ascending order since a specific timestamp (the last time the device checked).
Here's the item structure I'm using so far:
{
"id": {
"S": "1eb4520d44715b6daa5f9d907fe43aab" //md5sum of "time"
},
"message": {
"S": "I'm creating the audible reporting log now."
},
"status": {
"S": "working"
},
"time": {
"S": "1452297505" //timestamp: should probably add milliseconds for sake of unique "id"
}
}
"id" is the partition key. "time" is the sort key. Looking at this now, I'm guessing I should probably make "time" a number, not a string...
Query or scan? Query seems like the correct option for sorting, but it requires a specific partition key value in the query (at least in the AWS website query tool), so perhaps I'm adding those incorrectly. Scan loads all items, and I'm guessing that sorting is neither automatic nor an option (at least not in the AWS website query tool). I really only want to load items greater than a timestamp value, sorted.
Where am I off in my thinking? I appreciate the assistance in advance.
UPDATE
After further experimentation with AWS-CLI and DynamoDB, I ended up using a slightly different solution. Since this is a small scale "hello world" type of project, all update items are added to the same table with a single partition key, "SF Reporter", for now. This could scale if I decide to start monitoring additional "reporter"/service updates with separate queries and/or devices.
{
"datetime": { //sort key
"S": "2016-01-11T05:15:02"
},
"message": {
"S": "It is all good."
},
"reporter": { //primary partition key
"S": "SF Reporter"
},
"status": {
"S": "ok"
}
}
The JSON query itself looks something like this (abbreviated node.js example):
var AWS = require("aws-sdk");
AWS.config.credentials = new AWS.SharedIniFileCredentials({ profile: 'default' });
AWS.config.update({"region": "us-west-2"});
var docClient = new AWS.DynamoDB.DocumentClient();

var params = {
  TableName: "spoken_reports",
  // partition key must match exactly; sort key may use a range condition
  KeyConditionExpression: "#reporter = :reporter and #datetime >= :datetime",
  ExpressionAttributeNames: {
    "#reporter": "reporter",
    "#datetime": "datetime"
  },
  ExpressionAttributeValues: {
    ":reporter": "SF Reporter",
    ":datetime": "2016-01-11T05:15:02"
  }
};

// Function declaration (hoisted), so it is defined by the time query() is called
function onUpdatesReceived(err, data) {
  if (err) {
    console.log(err, err.stack);
  } else {
    console.log(data);
  }
}

docClient.query(params, onUpdatesReceived);
The query gets the latest updates sorted by a string timestamp (defaults to ascending order in this example). This allows for some scaling as I can have multiple devices checking the same table for the latest updates. I would create a scheduled query/function to clear out old updates once in a while to keep things light.
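Sorting by a string timestamp works here because ISO 8601 strings of the same format compare in chronological order. The short Python sketch below imitates the key condition in memory to show this; it does not call DynamoDB, and the sample messages are invented. It also shows why epoch seconds stored as strings are riskier, which supports the earlier thought about making "time" a number.

```python
def updates_since(items, reporter, since):
    """Imitate the key condition: same partition key, sort key >= since."""
    matching = [
        item for item in items
        if item["reporter"] == reporter and item["datetime"] >= since
    ]
    # A real query returns items ordered by sort key; sorted() stands in for that.
    return sorted(matching, key=lambda item: item["datetime"])


items = [
    {"reporter": "SF Reporter", "datetime": "2016-01-11T05:20:00", "message": "second"},
    {"reporter": "SF Reporter", "datetime": "2016-01-10T23:59:59", "message": "too old"},
    {"reporter": "SF Reporter", "datetime": "2016-01-11T05:15:02", "message": "first"},
]

recent = updates_since(items, "SF Reporter", "2016-01-11T05:15:02")
print([item["message"] for item in recent])

# Epoch seconds stored as strings only sort correctly while they have equal length:
# as strings, "999" compares greater than "1000".
print("999" > "1000", 999 > 1000)
```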
Dead simple way:
You should set up a global secondary index, and project "isNew" as the primary/hash key to it, with timestamp as the range key.
On creation of an entry, mark isNew as a UUID or something. This will make the table item project into the index.
When you need to check for data, scan the secondary index - the index will have only the results which are new. Then, updateItem the items you have read within the table itself to delete the isNew key on the item. The item will be removed from the secondary index, so it is not read again.
If you stick with this table design, scanning the entire table is the only option you have, for the reasons you've mentioned: for querying, you need a partition key, which is something your devices have no way of knowing beforehand.
There is another solution that comes to my mind:
Let's say your current table is called T1. Create another table, T2, that has deviceID as partition key and timestamp as sort key.
You define an AWS Lambda function on T1's stream that will, on any update, push that row into T2 as well, one copy per device.
Now whenever one of your devices wakes up, it queries (not scans) T2 with its own device ID, processes all the rows, and deletes them.
In other words, T2 will always have all the rows that a given device is yet to process.
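The fan-out idea can be modelled in a few lines of plain Python to see how the per-device backlog behaves. This is an in-memory simulation only, with no AWS calls: the `FanOut` class, its method names, and the device IDs are hypothetical, with `on_t1_insert` standing in for the Lambda function on T1's stream.

```python
from collections import defaultdict


class FanOut:
    """In-memory stand-in for T2: deviceID -> rows that device has not processed."""

    def __init__(self, device_ids):
        self.device_ids = list(device_ids)
        self.t2 = defaultdict(list)

    def on_t1_insert(self, item):
        """Role of the stream-triggered function: copy the row once per device."""
        for device_id in self.device_ids:
            self.t2[device_id].append((item["time"], item))

    def drain(self, device_id):
        """Role of the device: read its rows in timestamp order, then delete them."""
        pending = sorted(self.t2.pop(device_id, []), key=lambda pair: pair[0])
        return [item for _, item in pending]


fan_out = FanOut(["device-a", "device-b"])
fan_out.on_t1_insert({"time": 1452297510, "message": "later"})
fan_out.on_t1_insert({"time": 1452297505, "message": "earlier"})

first_read = fan_out.drain("device-a")
second_read = fan_out.drain("device-a")
other_device = fan_out.drain("device-b")
print([i["message"] for i in first_read], second_read, len(other_device))
```

The trade-off is write amplification: every new row is stored once per device, in exchange for each device being able to use a cheap query on its own partition.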

Resources