django-queryset: query many columns at once

I have a Django app (with a PostgreSQL database) that stores information on nest conditions for an endangered bird. Data is collected over multiple sites, with a different number of nests at each site. The nest conditions also have a unique date range per site.
DB Columns: site_name, date, nest_01, nest_02, nest_03 ... all the way to nest_1350.
The nests have values of either empty, 1E, 2E, 3E, or 4E.
Is there a way to do one query across all of the nest columns (1-1350), looking for '1E'?
Thanks

Do you actually have a model with 1350+ columns?
If I were you I'd normalize the whole setup like this:
from django.db import models

class Site(models.Model):
    site_name = models.CharField(max_length=100)  # max_length values are arbitrary
    date = models.DateField()

class Nest(models.Model):
    name = models.CharField(max_length=100)
    condition = models.CharField(max_length=2)  # empty, 1E, 2E, 3E, or 4E
    site = models.ForeignKey(Site, on_delete=models.CASCADE)
And then query it like this:
site = Site.objects.get(pk=1)  # just a Site, I assume you know a Site
nests = Nest.objects.filter(site=site, condition='1E')  # your desired nests
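If the wide table can't be restructured right away, the original question (one query across all 1,350 nest columns for '1E') can still be expressed with Q objects. A minimal sketch, assuming a model named WideNestTable whose fields follow the nest_01 ... nest_1350 naming from the question:

from functools import reduce
from operator import or_
from django.db.models import Q

# WideNestTable is a hypothetical model for the existing wide table;
# this ORs 1,350 conditions into a single (very large) WHERE clause.
nest_fields = [f"nest_{i:02d}" for i in range(1, 1351)]
query = reduce(or_, (Q(**{field: "1E"}) for field in nest_fields))
rows = WideNestTable.objects.filter(query)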

Related

In Quicksight is it possible to define values in an array from a split function as dimensions?

I have a field `subspecialty` that can contain multiple values as a comma-separated string.
I am trying to get the sum of another field (`topic_id`) grouped by the values in `subspecialty`. In other words: the number of topics completed, grouped by subspecialty, but there are multiple subspecialties listed in the field in each row.
So far I've managed to use split to separate subspecialty values into different fields.
There are up to five values in the field at once, so I created 5 different custom fields.
split({subspecialty[flattened_content]},',',1)
split({subspecialty[flattened_content]},',',2)
split({subspecialty[flattened_content]},',',3)
split({subspecialty[flattened_content]},',',4)
split({subspecialty[flattened_content]},',',5)
so something like subspecialty[flattened_content] = {Neuro, Emergency, Head and Neck, Vascular, Physics}
becomes
subspecialty_split_1 = "Neuro"
subspecialty_split_2 = "Emergency"
subspecialty_split_3 = "Head and Neck"
subspecialty_split_4 = "Vascular"
subspecialty_split_5 = "Physics"
With these fields split, I can now create additional custom fields to count topic_id where any one of the subspecialty_split_n fields equals a particular value.
countIf
(
{topic_id},
{subspecialty_split_1} = "Neuro"
OR
{subspecialty_split_2} = " Neuro"
OR
{subspecialty_split_3} = " Neuro"
OR
{subspecialty_split_4} = " Neuro"
OR
{subspecialty_split_5} = " Neuro"
)
This has at least allowed me to create a table that counts topic IDs by individual subspecialty, but because these are custom aggregations, I can't do anything like sorting, using charts, etc.
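One way around the custom-aggregation limitation is to reshape the data upstream so that each subspecialty lands on its own row, making it an ordinary dimension in QuickSight. A minimal pandas sketch, assuming the dataset can be preprocessed before loading (the column names mirror the question):

import pandas as pd

df = pd.DataFrame({
    "topic_id": [101, 102],
    "subspecialty": ["Neuro, Emergency", "Neuro, Vascular"],
})

# One row per (topic, subspecialty) pair; strip the post-comma spaces.
exploded = df.assign(subspecialty=df["subspecialty"].str.split(",")).explode("subspecialty")
exploded["subspecialty"] = exploded["subspecialty"].str.strip()

# Now a plain group-by works, and QuickSight can sort and chart it natively.
print(exploded.groupby("subspecialty")["topic_id"].count())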

Dynamics CRM + plugin code to store a sum formula across an entity collection

I have the below requirement to be implemented in plugin code on an entity, say 'Entity A'.
Below is the data in 'Entity A':
Record 1 with field values
Price = 100
Quantity = 4
Record 2 with field values
Price = 200
Quantity = 2
I need to do 2 things
Add the values of the fields and update it in a new record
Store the Addition Formula in a different config entity
Example shown below -
Record 3
Price
Price Value = 300
Formula Value = 100 + 200
Quantity
Quantity Value = 6
Formula Value = 4 + 2
Entity A has a button named "Perform Addition" and once clicked this will trigger the plugin code.
Below is the code that I have tried. AttributeList is the list of fields I need to sum over; all fields are decimal.
var entityA = new Entity("entitya");
entityA.Id = new Guid("Guid String");
var sourceEntityDataList = service.RetrieveMultiple(new FetchExpression(fetchXml)).Entities;
foreach (var value in AttributeList)
{
    entityA[value] = sourceEntityDataList.Sum(e => e.Contains(value) ? e.GetAttributeValue<decimal>(value) : 0);
}
service.Update(entityA);
I would like to know whether there is a way, through LINQ, to store the formula without looping, and if not, how I can achieve this.
Any help would be appreciated.
Here are some thoughts:
It's interesting that you're calculating values from multiple records and populating the result onto a sibling record rather than a parent record. This is different than a typical "rollup" calculation.
Dynamics uses the SQL sequential GUID generator to generate its ids. If you're generating GUIDs outside of Dynamics, you might want to look into leveraging the same logic.
Here's an example of how you might refactor your code with LINQ:
var target = new Entity("entitya", new Guid("guid"));
var entities = service.RetrieveMultiple(new FetchExpression(fetchXml)).Entities.ToList();
attributes.ForEach(a => target[a] = entities.Sum(e => e.GetAttributeValue<Decimal>(a)));
service.Update(target);
The GetAttributeValue<Decimal>() method defaults to 0, so we can skip the Contains call.
As far as storing the formula in a config entity goes, if you're looking for the capability to store and use any formula, you'll need a full expression parser, along the lines of this calculator example.
Whether you'll be able to do the Reflection required in a sandboxed plugin is another question.
If, however, you have a few set formulas, you can code them all into the plugin and determine which to use at runtime based on the entities' properties and/or config data.
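For the narrower requirement in the question (sum the values and also keep a "100 + 200"-style formula string), a single pass over the records can build both at once. Here is a language-agnostic sketch, written in Python for brevity with made-up record data; the same shape translates directly to a loop or an Aggregate call in the plugin:

records = [
    {"price": 100, "quantity": 4},  # Record 1
    {"price": 200, "quantity": 2},  # Record 2
]

for field in ("price", "quantity"):
    values = [r.get(field, 0) for r in records]
    total = sum(values)                           # e.g. 300 for price
    formula = " + ".join(str(v) for v in values)  # e.g. "100 + 200"
    print(field, total, formula)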

How to design querying multiple tags on analytics database

I would like to store custom purchase tags on each transaction; for example, if a user bought shoes, the tags might be "SPORTS", "NIKE", "SHOES", "COLOUR_BLACK", "SIZE_12", and so on.
These tags are what the seller is interested in querying back to understand the sales.
My idea is that whenever a new tag comes in, I create a new code for it (something like a hash code, but sequential): codes start with the 26 letters "a"-"z", then continue "aa", "ab", "ac" ... "zz", and so on. All the tags given in one transaction are then kept in a single varchar column called tag, separated by "|".
Let us assume mapping is (at application level)
"SPORTS" = a
"TENNIS" = b
"CRICKET" = c
...
...
"NIKE" = z //Brands company
"ADIDAS" = aa
"WOODLAND" = ab
...
...
SHOES = ay
...
...
COLOUR_BLACK = bc
COLOUR_RED = bd
COLOUR_BLUE = be
...
SIZE_12 = cq
...
So, storing the above purchase transaction, the tag value will be tag="|a|z|ay|bc|cq|". The seller can then count the number of SHOES sold by adding the WHERE condition tag LIKE '%|ay|%'. Now the problem is that I cannot use an index (a sort key in Redshift) for a LIKE that starts with %. How do I solve this, since I might have 100 million records and don't want a full table scan?
Is there any solution to fix this?
Update_1:
I have not followed the bridge table concept (cross-reference table), since I want to perform a GROUP BY on the results after searching for the specified tags. My solution gives only one row when two tags match in a single transaction, but a bridge table would give me two rows, so my sum() would be doubled.
I got a suggestion like the one below, repeated in the WHERE clause once for each tag (note: it assumes tr is an alias to the transaction table in the surrounding query):
EXISTS (SELECT 1 FROM transaction_tag WHERE tag_id = 'zz' AND trans_id = tr.trans_id)
I have not followed this, since I have to perform AND and OR conditions on the tags, for example ("SPORTS" AND "ADIDAS"), or "SHOES" AND ("NIKE" OR "ADIDAS").
Update_2:
I have not followed the bitfield approach, since I don't know whether Redshift supports it. Also, I am assuming my system will have a minimum of 3,500 tags; allocating one bit for each results in about 437 bytes per transaction, even though at most 5 tags can be given for a transaction. Any optimisation here?
Solution_1:
I have thought of adding a min value (SMALLINT) and a max value (SMALLINT) alongside the tags column, and applying an index (sort key) to those.
So, something like this:
"SPORTS" = a = 1
"TENNIS" = b = 2
"CRICKET" = c = 3
...
...
"NIKE" = z = 26
"ADIDAS" = aa = 27
So my column values are
`tag="|a|z|ay|bc|cq|"` //sorted?
`minTag=1`
`maxTag=95` //for cq
And the query for searching SHOES (ay = 51) is
`minTag <= 51 AND maxTag >= 51 AND tag LIKE '%|ay|%'`
And the query for searching SHOES (ay = 51) AND SIZE_12 (cq = 95) is
`minTag <= 51 AND maxTag >= 95 AND tag LIKE '%|ay|%|cq|%'`
Will this give any benefit? Kindly suggest any alternatives.
You can implement auto-tagging while the files get loaded to S3. Tagging at the DB level is too late in the process; it is tedious and involves a lot of hard-coding.
1. While loading to S3, tag the object using the AWS s3api (capturing the tags dynamically by passing them as parameters), for example:
aws s3api put-object-tagging --bucket --key --tagging "TagSet=[{Key=Adidas,Value=AY}]"
2. Load the tags into DynamoDB as a metadata store.
3. Load the data into Redshift using the S3 COPY command.
You can store the tags column as a varchar bit mask, i.e. a strictly defined sequence of 1s and 0s, so that if a purchase is marked by a tag there is a 1 at that tag's position, and a 0 if not. For every row you will have a sequence of 0s and 1s with the same length as the number of tags you have. The sequence is sortable; you would still need a lookup into the middle of the string, but you will know exactly which position to look at, so you don't need LIKE, just SUBSTRING. For further optimization, you could convert this bit mask to integer values (it will be unique for each sequence) and match on those, but AFAIK Redshift doesn't support that out of the box yet, so you would have to define the rules yourself.
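To make the bit-mask idea concrete, here is a minimal Python sketch; the tag vocabulary and the tag_mask column named in the comment are made up for illustration:

# Hypothetical tag vocabulary; a tag's position in this list is its bit position.
TAGS = ["SPORTS", "TENNIS", "CRICKET", "NIKE", "ADIDAS", "SHOES"]

def encode(tags):
    """Return a fixed-length '0'/'1' string, one character per known tag."""
    present = set(tags)
    return "".join("1" if t in present else "0" for t in TAGS)

mask = encode(["SPORTS", "NIKE", "SHOES"])  # -> "100101"

# The positional check needs no LIKE; in SQL it would be something like
#   WHERE SUBSTRING(tag_mask, 4, 1) = '1'  -- 4 = 1-based position of NIKE
assert mask[TAGS.index("NIKE")] == "1"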
UPD: It looks like the best option here is to keep the tags in a separate table and create an ETL process that unwraps the tags into a tabular structure of (order_id, tag_id), distributed by order_id and sorted by tag_id. Optionally, you can create a view that joins this table with the order table. Lookups for orders with a particular tag, and further aggregations of those orders, should then be efficient. There is no silver bullet for optimizing this in a flat table; at least, I don't know of one that would not bring a lot of unnecessary complexity compared to the "relational" solution.

PowerBi DAX equivalent for SUMIFS with current row value as filter

In Excel, if I was in a table called 'Sales' that had four columns:
Sales
Month, CustomerId, ProductId, TotalQuantity
Jan, 1, CAR,
Feb, 1, CAR,
I could add a formula:
=SUMIFS(Sales[Quantity],Sales[CustomerId],[#[CustomerId]])
That would go to the Sales table and sum the Quantity column, filtered by the CustomerId of the current row where the formula has been entered.
I am attempting to replicate this in a Power BI calculated column, but I can't get the # working as a row reference. It comes across like:
TotalQuantity = CALCULATE(SUM(Sales[Quantity]), Sales[CustomerId] = Sales[CustomerId])
Any idea how to get the equivalent of # working?
I think the key function you are missing is EARLIER. That is not surprising because it has a misleading name - it really means "Current Row". You also need a FILTER function in the Filter parameter of CALCULATE, to reset the filter context to the entire table.
So your New Column function might look like this:
TotalQuantity = CALCULATE(SUM(Sales[Quantity]), FILTER(Sales, Sales[CustomerId] = EARLIER(Sales[CustomerId])))
Here's a neat example, from the most accessible source site for DAX formulas:
http://www.powerpivotpro.com/2013/07/writing-a-subtotal-calc-column-aka-the-simplest-use-of-the-earlier-function/
And FWIW here is the official doco on EARLIER:
https://msdn.microsoft.com/en-us/library/ee634551.aspx
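For intuition, here is the same "sum over all rows sharing the current row's CustomerId" operation sketched in pandas with made-up data (an analogy only, not part of the Power BI solution):

import pandas as pd

sales = pd.DataFrame({
    "Month": ["Jan", "Feb", "Jan"],
    "CustomerId": [1, 1, 2],
    "Quantity": [3, 5, 7],
})

# groupby().transform("sum") plays the role of CALCULATE + FILTER + EARLIER:
# every row receives the total quantity for its own CustomerId.
sales["TotalQuantity"] = sales.groupby("CustomerId")["Quantity"].transform("sum")
print(sales)  # customer 1 rows get 8, the customer 2 row gets 7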

Play Framework: How to render a table structure from plain SQL table

I would be happy to find a good way to render a "table" structure from a plain SQL table.
In my specific case, I need to render JSON structure used by Google Visualization API "datatable" object:
http://code.google.com/apis/chart/interactive/docs/reference.html#DataTable
However, having an example in HTML would help as well.
My "source" is a plain SQL table of "DailySales": its columns are "Day" (date), "Product" and "DailySaleTotal" (daily sale for that product). Please recall that my "model" reflects the 3-column table above.
The table columns should be "products" (suppose we have a very small number of them). Each row should represent a specific date, and the row data are the actual sales for that day.
Date Product1 Product2 Product3
01/01/2012 30 50 60
01/02/2012 35 3 15
I was trying to use nested #{list} tags in a template, but unfortunately I failed to find a natural way to provide a template with a "list" to represent the "row data".
Of course, I can build a "helper object" in Java that will build a list of the "sales data" items per date - but this looks very weird to me.
I would be thankful to anyone who can provide an elegant solution.
Max
When you load your model, order it by date and product name. Then, in your controller, build a map with the date as key and the list of model objects sharing that date as value.
Then, in your template, you have a first list iteration over the map keys for the rows, and a second list iteration over the list values for the columns.
Something like
[
#{list modelMap.keys, as: 'date'}
[${date},#{list modelMap.get(date), as: 'product'}${product.dailySaleTotal}#{ifnot product_isLast},#{/ifnot}#{/list}]#{ifnot date_isLast},#{/ifnot}
#{/list}
]
You can then adapt your JSON rendering to the exact structure you want to have; here it is an array of arrays.
Instead of building the JSON yourself, as Seb suggested, you can have it generated for you:
private static Result queryToJsonResult(String sql) {
SqlQuery sqlQuery = Ebean.createSqlQuery(sql);
return ok(Json.toJson(sqlQuery.findList()));
}
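The pivot itself (one row per date, one column per product) is easy to see in a small Python sketch with invented sales figures; a Java controller would do the same grouping before handing the structure to the template or to Json.toJson:

from collections import defaultdict

# Invented rows mirroring the DailySales table: (day, product, daily sale total).
rows = [
    ("01/01/2012", "Product1", 30), ("01/01/2012", "Product2", 50), ("01/01/2012", "Product3", 60),
    ("01/02/2012", "Product1", 35), ("01/02/2012", "Product2", 3), ("01/02/2012", "Product3", 15),
]

products = sorted({product for _, product, _ in rows})
by_day = defaultdict(dict)
for day, product, total in rows:
    by_day[day][product] = total

# Array of arrays, matching the template answer above: a header row, then one row per date.
table = [["Date"] + products]
for day in sorted(by_day):
    table.append([day] + [by_day[day].get(p, 0) for p in products])
print(table)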
