Let's take the Excel Pivot data structure (or concept), where we have a hierarchy on the Rows (x-axis) and on the Cols (y-axis).
Would it be possible (or have any attempts been made) to address a location in the pivot table using XPath? I know there is MDX for a cube, which I'm familiar with (nominally n-dimensional, though in practice the display is almost always two-dimensional), but what about using XPath to do the same? For example, to address the Cat (subtotal) row, it seems like the following could be used:
Format: (Rows(XPath), Cols(XPath), Vals(List))
(
Rows: '//Animal[#value="Cat"]',
Cols: '//' (or empty, meaning everything),
Vals: '' (empty for all values, or a list of the specific values)
)
A few more examples:
Row for Dog named Sally
('//Animal[#value="Dog"]/Name[#value="Sally"],,)
Column for F(emale) dogs
(,'//Gender[#value="F"]',)
Value ("cell") for Booker, Male
('//Animal[#value="Cat"]/Name[#value="Booker"]', '//Gender[#value="M"]', )
Rows for Booker, Pebbles
('//Animal[#value="Cat"]/Name[#value="Booker" or #value="Tood",,)
Would this be a valid way to address a two-dimensional pivot? What challenges, if any, would this approach face? Note that the pivot table above probably isn't the best example, because an animal will be either M or F but not both, so that column is effectively redundant; even so, hopefully it's a good enough example to communicate my intent.
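For illustration, suppose the pivot's row axis were serialized as XML (a hypothetical serialization, not an existing Excel feature; note that standard XPath addresses such values as attributes with @value rather than #value):

<rows>
  <Animal value="Cat">
    <Name value="Booker"/>
    <Name value="Pebbles"/>
  </Animal>
  <Animal value="Dog">
    <Name value="Sally"/>
  </Animal>
</rows>

Then //Animal[@value="Cat"] selects the Cat subtotal row, and //Animal[@value="Cat"]/Name[@value="Booker" or @value="Pebbles"] selects the two detail rows beneath it.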
My first task is to add two new columns to a table: the first column stores the values of the M and X fields in a single column (as a single unit, with a pipe separator), and the second column stores the O and Z field values the same way.
The second task: after selecting agency and external letter rating (shown in the image) from a drop-down and saving the form, the values from fields M and X should move to N and Y, and those values should be stored in the table columns created in task one. If we save the form again, the values should move to the O and Z fields on the form, and so on.
Can anyone help me with how to proceed? I also don't know how to split a column value into pieces and display them on the form.
Even better if you can propose a different method that does the same work.
Adding columns:
That's a bad idea. Concatenating values is easy, and so is storing them in a column. But then, in the next step, you have to split those values apart again (into columns? rows?) to join them to other values and produce a result. Can you do it? Sure. Should you? No.
What to do? If you want to store 4 values, then add 4 columns to the table.
Alternatively, see if you can create a master-detail relationship between two tables, so you'd actually create a new table (with a foreign key to the existing table) with two additional columns (see the sketch after this list):
one that says whether the stored value is related to M or Y
the value itself
It looks like more work, but it should pay off in the future.
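For example, a minimal sketch of that detail table (all names are hypothetical; point the foreign key at your existing table's key):

create table rating_values (
    id            number primary key,
    main_id       number not null references main_table (id),
    source_field  varchar2(1) not null check (source_field in ('M', 'X', 'O', 'Z')),
    field_value   varchar2(100)
);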
Layout:
That really looks like a tabular form, which only supports what I said above. You can't "dynamically" add rows (and even if you could, that's something you should avoid, because you'd have to add (actually, display) separate items, not rows that share the same item name).
Context:
I have a data set of the weights of truck and trailer combinations coming into my site over the span of a few years. I have organized my data by season, as I am trying to prove that the truck:trailers in winter are noticeably heavier due to ice, snow, and mud. The theory is: if the tare weight (the weight of the truck after it empties its load) in winter is higher than the average tare weight (which I need to calculate from the data), it can be deduced that the truck:trailer combinations are coming in with extra weight that we pay for in part, since some snow/ice/mud falls off in the trailer-emptying process.
What I've done so far:
I've defined a custom date range for my seasons
I've grouped by Truck:Trailer with two aggregations: a count (to get a duplicates column) and all rows (to keep all my details)
I've filtered out every combination I've seen fewer than 50 times, as I want good representation for each truck:trailer combo so that I can better emphasize repeated patterns
I've added an index column to better keep track of the individuals before expanding the details
What I need to do:
I only want to work with truck:trailer combinations that have weighed in at least once in all four seasons
I need to find the average tare weight of each truck:trailer combination over summer and autumn (the dry time of the year), while preserving the raw tare data for all seasons, as I eventually need to compare the winter tare values to this average.
[Image: example of my data]
When I'm finished I'd like the data to look something like this:
[Image: pivot chart]
[Image: query data]
For your first question (all seasons), you can add a column that holds the distinct count of the values in [Season] for each [Driver:Trailer]. Then filter your table on that column, keeping only the 4's. To achieve this, add the following M code to your script in the Advanced Editor, and change the part after the final "in" to #"DistinctCount Season":
// Group by Driver:Trailer, count the distinct [Season] values per group,
// then join that count back onto the main table.
#"DistinctCount Season" = Table.Join(#"insert name previous step", "Driver:Trailer",
    Table.Group(#"insert name previous step", {"Driver:Trailer"},
        {{"DistinctCountSeasons", each Table.RowCount(Table.Distinct(_, "Season")),
        type number}}),
    "Driver:Trailer")
Insert the name of your previous step where indicated.
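For example, to then keep only the combinations seen in all four seasons, a follow-up step might look like this (step name illustrative):

#"Filtered All Seasons" = Table.SelectRows(#"DistinctCount Season", each [DistinctCountSeasons] = 4)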
For the second question:
You can use a matrix visual for that in your report. First create a measure:
AverageTare = AVERAGE('table'[Tare])
Then put [Season] on Rows and [AverageTare] on Values. You can create a group (right-click on [Season] in the FIELDS pane) called [DrySeason] to combine the values for Summer and Autumn.
If that doesn't work for you, explore the AVERAGEX function.
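Alternatively, a measure restricted to just the dry seasons can be written directly, for example (assuming the [Season] labels match your data):

DryAverageTare =
CALCULATE (
    AVERAGE ( 'table'[Tare] ),
    'table'[Season] IN { "Summer", "Autumn" }
)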
EDIT
In Excel you can use a PivotTable. Put [Season] on Rows and [Tare] on Values. Right-click a value in the PivotTable, select Value Field Settings, and choose Average. Then select the Seasons you want to group, right-click, and select Group.
EDIT 2
To add a column in the Power Query Editor that holds the average [Tare] for the [Season] in each row, add the following steps to your script in the Advanced Editor:
// Average [Tare] per [Season]
#"GroupedSeasonAvg" = Table.Group(#"Insert name previous step", {"Season"}, {{"AVG", each List.Average([Tare]), type number}}),
// Join the per-season averages back onto each row as a nested table
#"JoinOnSeason" = Table.NestedJoin(#"Insert name previous step", {"Season"}, #"GroupedSeasonAvg", {"Season"}, "AVGGrouped"),
// Expand the nested tables, keeping only [AVG], renamed to [SeasonAVG]
#"ExtractSeasonAVG" = Table.ExpandTableColumn(#"JoinOnSeason", "AVGGrouped", {"AVG"}, {"SeasonAVG"})
It works something like this:
#"GroupedSeasonAvg": creates a table with the averages for each [Season].
#"JoinOnSeason": creates a new column of nested tables, joining the [Season] value of each row to [Season] in the grouped table.
#"ExtractSeasonAVG": expands each nested table, keeping only [AVG] (renamed to [SeasonAVG]).
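From there, a hypothetical follow-up step could compute how far each row's [Tare] deviates from its season average:

#"Added TareDelta" = Table.AddColumn(#"ExtractSeasonAVG", "TareDelta", each [Tare] - [SeasonAVG], type number)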
The Problem
Given a database of 10,000 items, I would like to be able to:
search by any of the columns
match results on a variable number of letters at the beginning of each value
print out duplicate results
append the rest of the information for each matching entry to the results
Consider this table in MS Access (omitting the primary key):
Header1 | Header2 | Header3
apple   | rotten  | green
apple   | fresh   | yellow
pear    | fresh   | blue
orange  | rotten  | pink
Given the following search by Header1: apple, pear
I would receive the result:
apple, rotten, green
apple, fresh, yellow
pear, fresh, blue
Similarly, given the search by Header1: pear, orange, pear
I would receive the result:
pear, fresh, blue
orange, rotten, pink
pear, fresh, blue
What I'm doing
My approach is to store the header you are searching by and an array containing the elements you searched for. I retrieve the WHOLE database (it's large, so this isn't the preferred method), order it by the chosen header, and also sort the user's input (both lists in ascending order).
By using simple comparisons (StrComp = 0, -1, 1) I increment counter variables for the respective lists. This, however, does not account for the cases where the user inputs a duplicate AND the table has a duplicate result; it only accounts for one or the other.
My solution to that issue would be to "roll" up and down when we find a result, to check for nearby results as well, but that seems horrible, and it doesn't account for fuzzy string matching either.
Any recommendations? The solution should stay O(n) if possible, given that the user input can (and will) be > 100,000 items.
I suggest you construct a dynamic UNION ALL query, with one SELECT statement for each search term.
UNION ALL returns all rows, including duplicates.
e.g.
SELECT * FROM myTable WHERE Header1 LIKE 'apple*'
UNION ALL
SELECT * FROM myTable WHERE Header1 LIKE 'pear*'
UNION ALL
SELECT * FROM myTable WHERE Header1 LIKE 'apple*'
With indexes on the columns that are searched, this should be reasonably fast.
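If you're building that statement in VBA, a rough sketch could look like this (function and table names follow the example above; real code should also escape any single quotes in the search terms):

Function BuildSearchSql(terms As Variant) As String
    ' Builds one SELECT per search term; duplicates in the input
    ' produce duplicate SELECTs, and UNION ALL keeps their rows.
    Dim i As Long, sql As String
    For i = LBound(terms) To UBound(terms)
        If i > LBound(terms) Then sql = sql & " UNION ALL "
        sql = sql & "SELECT * FROM myTable WHERE Header1 LIKE '" & terms(i) & "*'"
    Next i
    BuildSearchSql = sql
End Function

' Example: Debug.Print BuildSearchSql(Array("apple", "pear", "apple"))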
My solution:
First: store the data (comma-delimited) from the database in a dictionary, with the value of the searched header as the key. If the key already exists, simply append the new data to the previous data with a bar delimiter.
Second: loop through the list of inputs and match them (with a simple first-N-characters comparison if necessary) against the items in the dictionary. When you find a match, get the value and split it by the delimiters accordingly.
I believe this stays an O(n) solution as long as the first-N-characters comparison is not used.
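A rough sketch of those two passes in VBA (names illustrative; assumes the example table above and a reference to DAO):

Dim dict As Object, rs As DAO.Recordset, term As Variant
Set dict = CreateObject("Scripting.Dictionary")

' Pass 1: key = the searched header's value; duplicate keys get their
' data appended with a bar delimiter.
Set rs = CurrentDb.OpenRecordset("SELECT Header1, Header2, Header3 FROM myTable")
Do Until rs.EOF
    If dict.Exists(rs!Header1.Value) Then
        dict(rs!Header1.Value) = dict(rs!Header1.Value) & "|" & rs!Header2 & "," & rs!Header3
    Else
        dict.Add rs!Header1.Value, rs!Header2 & "," & rs!Header3
    End If
    rs.MoveNext
Loop

' Pass 2: each lookup is O(1), so the whole run stays O(n).
For Each term In Array("pear", "orange", "pear")
    If dict.Exists(term) Then Debug.Print term & " -> " & dict(term)
Next term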
I'm new to Oracle; I'm using Oracle 11g. I'm storing postal codes of the UK. The values are like these:
N22 5HF
SW1 4JD
N14 8IT
N22 1JT
E1 5DP
e1 8DS
E3 8TU
I should be able to easily compare the first four characters of each postal code.
What is the best data type to store this data?
As a slight variation on Lalit's answer: since you want the outward code rather than a fixed substring of the first four characters (which could include a space and the start of the inward code), you can create a virtual column based on the first word of the value:
create table postcodes (
    postcode varchar2(8),
    outward_code generated always as
        (substr(postcode, 1, instr(postcode, ' ', 1, 1) - 1))
);
And optionally, but probably advisable if you're using this to search, an index on the virtual column.
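For example (index name illustrative):

create index postcodes_outward_ix on postcodes (outward_code);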
This assumes the postcodes are formatted properly in the first place; it won't work if you don't always have the space between the outward and inward codes. And to answer your original question: the actual postcode should be a varchar2(8) column, to hold alphanumeric values up to the maximum size in the standard format.
SQL Fiddle demo.
I should be able to easily compare the first four characters of each postal code.
Then keep these first four characters in a separate column, and index this column. You could keep the remaining characters in a different column. Since the codes are a mixture of alphanumeric characters, you are left with the VARCHAR2 data type.
Your query predicate would look like:
WHERE post_code_col = substr('N22 5HF', 1, 4)
The indexed column post_code_col would then perform efficiently.
On 11g you also have the option to create a virtual column. However, indexing it would be equivalent to a function-based index, so I would prefer the first way, as suggested above.
It is better to normalize the table during the design phase, else the issues would start creeping in later.
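For example, a minimal sketch of that design (names illustrative):

create table postal_codes (
    post_code_col varchar2(4) not null, -- first four characters, e.g. 'N22 '
    rest_col      varchar2(4)           -- remaining characters
);
create index postal_codes_pc_ix on postal_codes (post_code_col);

The predicate above would then hit the index directly.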
In my opinion you should use the VARCHAR2 data type, because this field is not going to be used in mathematical calculations (so it should not be a numeric type), and the values are not big enough to justify a large text type.
I'm writing a custom search function, and I have to filter through an association.
I have two ActiveRecord-backed models, Card and Color, with a has_and_belongs_to_many association, and colors have an attribute color_name.
As my DB has grown to around 10k cards, my search function has become exceptionally slow, because I have a select statement with a query inside it, so essentially I'm making thousands of queries.
I need to convert the Array#select method into an ActiveRecord query that yields the same results, and I'm having trouble coming up with a solution. The current (relevant) code is the following:
colors = [['Black'], ['Blue', 'Black']] # parameter retrieved from a form submission
if colors
  cards = colors.flat_map do |col|
    col.inject(Card.includes(:colors)) do |memo, color|
      temp = Card.joins(:colors).where(colors: { color_name: color })
      # one pluck query per card here is what makes this so slow
      memo + temp.select { |card| card.colors.pluck(:color_name).sort == col.sort }
    end
  end
end
The functionality I'm trying to mimic is that only cards with colors exactly matching the incoming array will be selected by the search (comparing two arrays). Because cards can be mono-red, red-blue, red-blue-green, etc., I need to be able to search for only red-blue cards, or only mono-red cards.
I initially started along this route, but I'm having trouble comparing arrays with an ActiveRecord query:
color_objects = Color.where(color_name: col)
Card.includes(:colors).where('colors = ?', color_objects)
returns the error
ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR: syntax error at or near "SELECT"
LINE 1: ...id" WHERE "cards"."id" IN (2, 3, 4) AND (colors = SELECT "co...
It looks to me like it's failing because it doesn't want to compare arrays, only column values. Is this functionality even possible?
One solution might be to convert the HABTM into a has_many :through relation and make join tables which contain keys for every permutation of colors, in order to access those directly.
I need to be able to search for only green-black cards, and not have mono-green, or green-black-red cards show up.
I've deleted my previous answer, because I didn't realize you were looking for an exact match.
I played with it a little, and I can't see any solution without using an aggregate function.
For Postgres that would be array_agg.
You need to generate an SQL query like:
SELECT cards.*,
       array_to_string(array_agg(colors.color_name ORDER BY colors.color_name), ',') AS color_names
FROM cards
JOIN cards_colors ON cards.id = cards_colors.card_id
JOIN colors ON colors.id = cards_colors.color_id
GROUP BY cards.id
HAVING array_to_string(array_agg(colors.color_name ORDER BY colors.color_name), ',') = 'Black,Green'
I've never used those aggregators, so perhaps array_to_string is the wrong formatter; in any case, you have to make sure the colors are aggregated in a deterministic (here alphabetical) order. As long as you don't have too many cards it will be fast enough, but it does scan every card in the table.
If you want this query to use an index, you should denormalize your data structure: store an array of color_names on the cards record, index that array field, and search on it. You can also keep your normalized structure and define an association callback which adds the color name to the card's color_names array every time a color is assigned to a card.
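For example, a sketch of the denormalized variant (all names illustrative; assumes a Postgres array column that is kept sorted on write):

# migration: add a sorted array of color names to cards
class AddColorNamesToCards < ActiveRecord::Migration
  def change
    add_column :cards, :color_names, :string, array: true, default: []
    # a plain btree index supports the exact-equality comparison below
    add_index :cards, :color_names
  end
end

# the exact-match search then becomes a single indexed query:
Card.where(color_names: ['Black', 'Blue'].sort)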
Try this:
color_ids = Color.where(color_name: col).pluck(:id)
Card.includes(:colors).where(colors: { id: color_ids })