How to reorder struct columns? - data-structures

I'm trying to display my results from a CFQuery in a specific order. The order is to be maintained in the database so that it can be manipulated, and there are an unknown number of columns per table. The final row in the table is "ColumnOrder": each column has a number to specify its sort order, and 0 means "don't display". I'm trying to sort by looping, say, "y" from 1 to maxCols:
0) do y = 1 to maxCols
1) in the sortColumn result set, use y to lookup the corresponding KEY
2) in the products result set, find the value from the corresponding KEY
3) insert said value into tempStruct[y]
4) loop.
I'm running into a wall trying to use structFindKey(). Here's my code:
<CFQUERY name="qParts" datasource="Pascal">
SELECT * FROM Turbos WHERE PartNumber LIKE <cfqueryparam cfsqltype="cf_sql_char" maxlength="30" value="%#mfr#%"> ORDER BY #sort# ASC
</CFQUERY>
<cfquery name="qPartsOrder" datasource="Pascal">
SELECT * FROM Turbos WHERE PartNumber = 'ColumnOrder'
</cfquery>
<cfset tempStruct=structnew()>
<cfloop index="columnOrder2" from="1" to="#ListLen(qPartsOrder.ColumnList, ',')#">
<cfdump var="#StructFindKey(qPartsOrder, columnOrder2)#">
<cfset tempStruct[columnOrder2] = StructFindKey(#qPartsOrder#, "#columnOrder2#")>
<cfset currentCol = "#ListGetAt(qParts.columnList, columnOrder2, ',')#">
<cfoutput>#qParts[currentCol][qParts.currentrow]# <br/></cfoutput>
</cfloop>
<cfdump var="#tempstruct#">
The line
<cfdump var="#StructFindKey(qPartsOrder, columnOrder2)#">
is throwing a BLANK!! error message, so I can't debug it and I'm stuck.
Any and all help would be appreciated (and YES I have to use SELECT *, this is a generic product display page for displaying ALL information in the database except a few which are denoted by a zero in the order column, remember?).

I'm not 100% sure that I understand the problem you are trying to solve. This is exacerbated by a very unconventional way of setting up a database.
To begin with, if you are unlucky you may run into a documented error where using a cfqueryparam tag throws an error of "Value cannot be converted to requested type", although I don't know if this still happens with current versions of ColdFusion (8+).
In any case, you can always select all of the columns of the table manually even if you don't know how many of them will ultimately be used:
SELECT partNumber, secondColumn, thirdColumn, ... , nthColumn
FROM Turbos
This is generally preferable to just using SELECT * although it presents some problems if you are in the habit of frequently adding/removing columns to tables.
Unless you need to use a Struct for good reason, you should use an Array instead. Structs don't store ordering information while Arrays do. Here is one way to sort through the records in qParts:
<cfset RecordsArray=ArrayNew(2)>
<cfset ColumnIndex=StructNew()>
<cfloop list="#qPartsOrder.ColumnList#" index="order_column">
<cfset ColumnIndex[order_column]=val(qPartsOrder[order_column][1])>
</cfloop>
<cfloop query="qParts">
<cfloop list="#qPartsOrder.ColumnList#" index="order_column">
<cfif val(ColumnIndex[order_column])>
<cfset RecordsArray[ColumnIndex[order_column]][qParts.CurrentRow]=qParts[order_column][qParts.CurrentRow]>
</cfif>
</cfloop>
</cfloop>
The result of this code will be a 2D array, with the first index referring to the column position and the second index pointing to the record row.
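The same idea can be sketched outside ColdFusion. The Python snippet below is only an illustration under assumptions: the column names, the sample rows, and the helper name reorder_columns are hypothetical and do not come from the Turbos table. It drops columns whose order is 0 and arranges the rest by their order number.

```python
def reorder_columns(rows, column_order):
    """Return (header, values) with columns arranged by column_order.

    column_order maps a column name to its display position; 0 means hidden.
    """
    # Keep only columns with a positive position, sorted by that position.
    visible = sorted(
        (name for name, pos in column_order.items() if pos > 0),
        key=lambda name: column_order[name],
    )
    return visible, [[row[name] for name in visible] for row in rows]


# Hypothetical data standing in for the "ColumnOrder" row and the product rows.
column_order = {"PartNumber": 1, "Price": 3, "Notes": 0, "Flange": 2}
rows = [
    {"PartNumber": "T-100", "Price": 450, "Notes": "internal", "Flange": "T3"},
    {"PartNumber": "T-200", "Price": 610, "Notes": "", "Flange": "T4"},
]
header, ordered = reorder_columns(rows, column_order)
```

Sorting the visible column names once and reusing that list for every row avoids a per-cell lookup of the order number.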
All in all, I think that unless you have zero control over how the database is structured, there is a better way to implement this, starting with how you've set up your database. It would really help to see some fake sample data as well as having a clearer idea of what you are trying to accomplish -- what will you do with these ordered fields once you have them, for example?

Did you try using StructSort?

Related

Convert column value from null to value of similar row with similar values

Sorry for the slightly strange title; I couldn't think of a succinct way to describe my problem.
I have a set of data that is created by one person. The data is structured as follows:
ClientID ShortName WarehouseZone RevenueStream Name Budget Period
This data is entered manually, but as there are many clients and many revenue streams, only lines where budget != 0 have been included.
This needs to connect to another data set to generate revenue, and there are times when revenue exists but no budget exists.
For this reason I have gathered all customers, cross joined them to all codes, and appended these values to the main query. However, as WarehouseZone is entered manually, there are a lot of entries where WarehouseZone is null.
The WarehouseZone will always be the same for every instance of the customer.
Now, after my convoluted explanation, here's my question: how can I do the following?
Pseudocode that I hope makes sense:
SET WarehouseZone = WarehouseZone WHERE ClientID = ClientID AND
WarehouseZone != NULL
Are you sure that a client has only one WarehouseZone? Otherwise you need an aggregation.
Let's check: you can add a custom column that returns a record like this:
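let
    // Capture the current row's ClientID first; a nested
    // "each [ClientID] = _[ClientID]" would compare each row with itself.
    currentClient = [ClientID]
in
    Table.Max(
        Table.SelectColumns(
            Table.SelectRows(#"Last Step", each [ClientID] = currentClient),
            "Warehousezone"),
        "Warehousezone"
    )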
This creates a new column that holds, on every row, a record with the maximum Warehousezone for that row's ClientID. At the end you can expand the record to get the value.
P.S. This calculation does not perform well.
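For comparison, here is a minimal Python sketch of the same fill logic, assuming the rows are plain dictionaries with ClientID and WarehouseZone keys. The function name fill_missing_zones and the sample rows are hypothetical. It builds the per-client lookup once instead of re-filtering the table for every row, which is the part that makes the per-row approach slow.

```python
def fill_missing_zones(rows):
    """Fill null WarehouseZone values with the max known zone of the same client."""
    known = {}
    for row in rows:
        zone = row["WarehouseZone"]
        if zone is not None:
            cid = row["ClientID"]
            # Mirror the "max per client" aggregation in case zones differ.
            known[cid] = max(known.get(cid, zone), zone)
    return [
        {**row, "WarehouseZone": known.get(row["ClientID"])}
        if row["WarehouseZone"] is None
        else dict(row)
        for row in rows
    ]


# Hypothetical sample: client 1 has a known zone, client 2 has none at all.
sample = [
    {"ClientID": 1, "WarehouseZone": "North"},
    {"ClientID": 1, "WarehouseZone": None},
    {"ClientID": 2, "WarehouseZone": None},
]
filled = fill_missing_zones(sample)
```

A client with no known zone anywhere stays null, since there is nothing to copy from.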

How to design querying multiple tags on analytics database

I would like to store custom tags for each user purchase transaction. For example, if a user bought shoes, the tags are "SPORTS", "NIKE", SHOES, COLOUR_BLACK, SIZE_12, ...
These are tags that the seller is interested in querying back to understand the sales.
My idea is that whenever a new tag comes in, I create a new code for it (something like a hash code, but sequential). Codes start with the 26 letters "a-z" and then continue "aa, ab, ac, ... zz" and so on. All the tags given in one transaction are then kept in one column called tag (varchar), separated by "|".
Let us assume the mapping is (at application level):
"SPORTS" = a
"TENNIS" = b
"CRICKET" = c
...
...
"NIKE" = z //Brands company
"ADIDAS" = aa
"WOODLAND" = ab
...
...
SHOES = ay
...
...
COLOUR_BLACK = bc
COLOUR_RED = bd
COLOUR_BLUE = be
...
SIZE_12 = cq
...
So when storing the above purchase transaction, the tag will be tag="|a|z|ay|bc|cq|". The seller can then search for the number of SHOES sold by adding the WHERE condition tag LIKE '%|ay|%'. The problem is that I cannot use an index (sort key in Redshift) for a LIKE pattern that starts with %. How can I solve this, since I might have 100 million records and don't want a full table scan?
Is there any solution to fix this?
Update_1:
I have not followed the bridge table (cross-reference table) concept, since I want to perform a GROUP BY on the results after searching for the specified tags. My solution gives only one row when two tags match in a single transaction, but wouldn't a bridge table give me two rows? Then my sum() would be doubled.
I got a suggestion like the one below:
EXISTS (SELECT 1 FROM transaction_tag WHERE tag_id = 'zz' and trans_id = tr.trans_id)
in the WHERE clause, once for each tag (note: assumes tr is an alias to the transaction table in the surrounding query).
I have not followed this, since I have to perform AND and OR conditions on the tags, for example ("SPORTS" AND "ADIDAS"), or "SHOE" AND ("NIKE" OR "ADIDAS").
Update_2:
I have not followed the bitfield approach, since I don't know whether Redshift supports it. Also, assuming my system will have a minimum of 3500 tags and one bit is allocated for each, that results in about 437 bytes for each transaction, even though at most 5 tags can be given for a transaction. Is there any optimisation here?
Solution_1:
I have thought of adding a min value (SMALL_INT) and a max value (SMALL_INT) alongside the tags column, and applying an index on those.
So something like this:
"SPORTS" = a = 1
"TENNIS" = b = 2
"CRICKET" = c = 3
...
...
"NIKE" = z = 26
"ADIDAS" = aa = 27
So my column values are
`tag="|a|z|ay|bc|cq|"` //sorted?
`minTag=1`
`maxTag=95` //for cq
And the query for searching shoe (ay=51) is
minTag <= 51 AND maxTag >= 51 AND tag LIKE '%|ay|%'
And the query for searching shoe (ay=51) AND SIZE_12 (cq=95) is
minTag <= 51 AND maxTag >= 95 AND tag LIKE '%|ay|%|cq|%'
Will this give any benefit? Kindly suggest any alternatives.
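The letter codes and their numeric values in the mapping above (a = 1, z = 26, aa = 27, ay = 51, cq = 95) are consistent with a bijective base-26 numbering. As a rough sketch of how the application-level mapping could be generated, assuming lowercase letters only, the hypothetical helpers tag_code and tag_number below convert between the two forms.

```python
def tag_code(n):
    """Convert a 1-based sequence number to a letter code: 1 -> a, 27 -> aa."""
    if n < 1:
        raise ValueError("sequence numbers start at 1")
    letters = []
    while n > 0:
        # Subtract 1 so that 26 maps to "z" instead of rolling over.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def tag_number(code):
    """Convert a letter code back to its 1-based sequence number."""
    n = 0
    for ch in code:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n
```

With these, minTag and maxTag for a transaction are simply the smallest and largest tag_number among its codes.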
You can implement auto-tagging while the files are loaded to S3. Tagging at the DB level is too late in the process; it is tedious and involves a lot of hard-coding.
1. While loading to S3, tag each object using the AWS s3api, for example:
aws s3api put-object-tagging --bucket --key --tagging "TagSet=[{Key=Addidas,Value=AY}]"
Capture tags dynamically by sending and as a parameter.
2. Load the tags to DynamoDB as a metadata store.
3. Load the data to Redshift using the S3 COPY command.
You can store the tags column as a varchar bit mask, i.e. a strictly defined sequence of 1s and 0s: if a purchase is marked with a tag there is a 1 at that tag's position, and if not there is a 0. For every row you will have a sequence of 0s and 1s with the same length as the number of tags you have. This sequence is sortable. You would still need to look into the middle of it, but you know at which specific position to look, so you don't need LIKE, just SUBSTRING. For further optimization, you can convert this bit mask to integer values (unique for each sequence) and match on those, but AFAIK Redshift doesn't support that out of the box yet, so you would have to define the rules yourself.
UPD: It looks like the best option here is to keep tags in a separate table and create an ETL process that unwraps tags into a tabular structure of order_id, tag_id, distributed by order_id and sorted by tag_id. Optionally, you can create a view that joins this table with the order table. Lookups for orders with a particular tag, and further aggregations of those orders, should then be efficient. There is no silver bullet for optimizing this in a flat table; at least I don't know of one that would not bring a lot of unnecessary complexity compared with the "relational" solution.
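To illustrate the separate-table layout together with the AND/OR requirement from the question, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for Redshift. The table names orders and order_tag, the amounts, and the tag assignments are all hypothetical. Each tag condition is an EXISTS subquery, so an order that matches several tags still contributes a single row to the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE order_tag (order_id INTEGER, tag_id TEXT);
INSERT INTO orders VALUES (1, 100), (2, 80), (3, 50);
INSERT INTO order_tag VALUES
  (1, 'SHOES'), (1, 'NIKE'), (1, 'ADIDAS'),
  (2, 'SHOES'), (2, 'ADIDAS'),
  (3, 'SHOES'), (3, 'WOODLAND');
""")

# "SHOES" AND ("NIKE" OR "ADIDAS"): one EXISTS per AND term, IN for the OR group.
query = """
SELECT COUNT(*), SUM(o.amount)
FROM orders AS o
WHERE EXISTS (SELECT 1 FROM order_tag AS t
              WHERE t.order_id = o.order_id AND t.tag_id = 'SHOES')
  AND EXISTS (SELECT 1 FROM order_tag AS t
              WHERE t.order_id = o.order_id AND t.tag_id IN ('NIKE', 'ADIDAS'))
"""
count, total = conn.execute(query).fetchone()
```

Order 1 carries both NIKE and ADIDAS here, yet it is counted once, because EXISTS only tests for a match and does not join the tag rows into the result.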

Dynamic rowspan in a Table Coldfusion

Just to give some background. We have users in different parts of the region. Our application sends out reports in emails which can be accessed via a URL. This way we keep a track on who accessed the report and various other attributes.
Now, as part of the statistics, I am trying to display the attributes in a HTML table.
I have the query which contains the details about the "Region Name", "UserID", "ReportName", "AccessCount", "ViewDate" etc.
The requirement is to span the "Region Name" across all its rows.
For example, 10 different users from the Melbourne region have accessed a report XYZ.
My table should have Melbourne with rowspan = "10" and each row having each user's details.
I don't want Melbourne to repeat 10 times in the table.
I've tried using the <cfoutput group="RegionName"> tag along with the HTML table, but the table is not well-formed.
How can I achieve this?
You should be able to achieve this, and you were heading in the right direction by looking at the group attribute of cfoutput. The CFML won't look very pretty though (I prefer cfscript and would likely do the following with a few functions). Something like the following should render a well-formed table with the RegionName cell spanning multiple rows; just adjust the classes, formatting, etc. as you see fit!
<!---
Make sure that myQuery is ordered by
RegionName ASC before anything else to
ensure the group by works as intended
--->
<table>
<thead>
<tr>
<th>Region</th>
<th>User</th>
</tr>
</thead>
<tbody>
<cfoutput query="myQuery" group="RegionName">
<!--- set up an array to hold the users for this region --->
<cfset arrOfUsers = ArrayNew(1)>
<cfoutput>
<cfset ArrayAppend(arrOfUsers,'<td>'&myQuery.UserName&'</td>')>
</cfoutput>
<!--- render time, use the array just generated so we know how many users are in this group --->
<cfloop from="1" to="#ArrayLen(arrOfUsers)#" index="i">
<tr>
<cfif i EQ 1>
<td rowspan="#ArrayLen(arrOfUsers)#">#myQuery.RegionName#</td>
</cfif>
#arrOfUsers[i]#
</tr>
</cfloop>
</cfoutput>
</tbody>
</table>
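The grouping logic above (collect the users of a region first, then emit the region cell only on the first row) is language-independent. As a rough sketch, assuming the records are (region, user) tuples already sorted by region, the hypothetical Python function region_rows below builds the same row markup.

```python
from html import escape
from itertools import groupby


def region_rows(records):
    """Build <tr> strings; the region cell appears once per group with a rowspan."""
    lines = []
    # groupby only merges adjacent items, so records must be sorted by region.
    for region, group in groupby(records, key=lambda rec: rec[0]):
        users = [user for _, user in group]
        for i, user in enumerate(users):
            cells = ""
            if i == 0:
                cells += f'<td rowspan="{len(users)}">{escape(region)}</td>'
            cells += f"<td>{escape(user)}</td>"
            lines.append(f"<tr>{cells}</tr>")
    return lines


# Hypothetical sample data.
records = [("Melbourne", "alice"), ("Melbourne", "bob"), ("Sydney", "carol")]
table_rows = region_rows(records)
```

As with the cfoutput group attribute, unsorted input would split one region into several groups, each with its own rowspan cell.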

How do I use the Hive "test in(val1, val2)" built in function?

The Programming Hive book lists a test in built in function in Hive, but it is not obvious how to use it and I've been unable to find examples
Here is the information from Programming Hive:
Return type Signature Description
----------- --------- -----------
BOOLEAN test in(val1, val2, …) Return true if test equals one of the values in the list.
I want to know if it can be used to say whether a value is in a Hive array.
For example if I do the query:
hive > select id, mcn from patients limit 2;
id mcn
68900015 ["7382771"]
68900016 ["8847332","60015163","63605102","63251683"]
I'd like to be able to test whether one of those numbers, say "60015163" is in the mcn list for a given patient.
Not sure how to do it.
I've tried a number of variations, all of which fail to parse. Here are two examples that don't work:
select id, test in (mcn, "60015163") from patients where id = '68900016';
select id, mcn from patients where id = '68900016' and test mcn in('60015163');
The function is not test in but simply in. In table 6-5, test is a column name.
So in order to know whether a value is in a Hive array, you first need to use explode on your array.
Instead of exploding the array column, you can create a UDF, as explained here: http://souravgulati.webs.com/apps/forums/topics/show/8863080-hive-accessing-hive-array-custom-udf-
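To make the explode idea concrete, here is a small Python model of it (not Hive code) that uses the two patient rows from the question. The generator name explode is only borrowed for illustration: each array element becomes its own (id, mcn) row, after which an ordinary equality filter finds the patient.

```python
patients = [
    ("68900015", ["7382771"]),
    ("68900016", ["8847332", "60015163", "63605102", "63251683"]),
]


def explode(rows):
    """Yield one (id, mcn) pair per array element, like a lateral view would."""
    for patient_id, mcns in rows:
        for mcn in mcns:
            yield patient_id, mcn


# After exploding, membership becomes a plain equality test on each row.
matches = [pid for pid, mcn in explode(patients) if mcn == "60015163"]
```

Hive also ships an array_contains(array, value) built-in, which may make the explode step unnecessary for a simple membership test; check the documentation for your Hive version.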

Sorting values by property from cfoutput

This question is for ColdFusion programmers. I will probably ask it wrongly, because it is an open question and can't really be answered, since you and I both lack information about it :) But all I need is a hint or a guess, so that I can understand and move on to achieve my aim.
So here comes the question:
I have the ColdFusion output script
<cfquery datasource="#dsn2#">SELECT * FROM PRODUCT WHERE PRODUCT_ID = #PRODUCT_ID#</cfquery>
where some products are displayed, and all I need is to sort them by a property, for example is_purchase, whose values can be 0 or 1. I also have a checkbox:
<input type="checkbox" name="is_purchase_stock" value="1" <cfif isdefined("attributes.is_purchase_stock")>checked</cfif> onClick="sayfalama.submit();">
There are actually functions like this (is_saleable_stock); you can see them in the full script of the page with products:
http://vteam.net.ru/_fr/4/list_prices.cfm
Thank you everyone!
You want ORDER BY, something like this:
<cfquery datasource="#dsn2#">
SELECT * FROM PRODUCT
WHERE PRODUCT_ID = #PRODUCT_ID#
ORDER BY is_purchase <cfif StructKeyExists(attributes, "is_purchase_stock")>ASC<cfelse>DESC</cfif>
</cfquery>
EDIT. This is a reply to the question in a comment:
<cfquery datasource="#dsn2#">
SELECT * FROM PRODUCT
WHERE PRODUCT_ID = #PRODUCT_ID#
AND is_purchase = <cfif StructKeyExists(attributes, "is_purchase_stock")>1<cfelse>0</cfif>
</cfquery>
