I have two tables, the first table has a list of invoice numbers and the second table has a list of products associated with each invoice. I want to sum the total cost of the products for each invoice and include it in the first table using Excel Power Query.
Table 1
[InvoiceNumbers] [OtherData]
Table 2
[Product] [Amount] [InvoiceNumber]
List.Sum() seems to be the function I need to use, but I cannot filter Table 2 by invoice number using this function.
Table.SelectRows() can be used to select the second table, and filter it to a specific set of rows, but I cannot seem to filter the rows of Table 2 using a column from Table 1.
I have also looked into Grouping and joining the table, but because of other factors I have left out, this is not going to work.
The full query I'm working with looks like this:
List.Sum(Table.SelectRows(Table2, each [InvoiceNumber] = [InvoiceNumber])[Amount])
This just returns the sum of all the rows because [InvoiceNumber] is equal to itself.
How can I reference the Invoice Number of the row in Table 1 to use it as a condition in the Table.SelectRows() function? Or is there a better way to get the sum of the data from Table 2?
Table 1 Final
[InvoiceNumber] [OtherData] [SumOfAmounts]
If there is a restriction on grouping the invoice details table directly, you could reference it and group the reference instead:
1) Reference the table:
2) Group the referenced table:
3) Then merge the grouped reference into the first table and expand the total column.
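The group-then-merge idea in the steps above can be sketched outside Power Query as well. This is only an illustration in Python of the logic, not Power Query code; the sample rows and the variable names are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for Table 1 and Table 2.
invoices = [
    {"InvoiceNumber": "A1", "OtherData": "x"},
    {"InvoiceNumber": "A2", "OtherData": "y"},
    {"InvoiceNumber": "A3", "OtherData": "z"},
]
products = [
    {"Product": "pen", "Amount": 2.5, "InvoiceNumber": "A1"},
    {"Product": "ink", "Amount": 4.0, "InvoiceNumber": "A1"},
    {"Product": "pad", "Amount": 3.0, "InvoiceNumber": "A2"},
]

# Step 2: group the detail rows by invoice number and sum Amount.
totals = defaultdict(float)
for row in products:
    totals[row["InvoiceNumber"]] += row["Amount"]

# Step 3: left-merge the totals onto the invoice rows.
# Invoices without products get None, similar to a null after expanding a merge.
result = [
    {**inv, "SumOfAmounts": totals.get(inv["InvoiceNumber"])}
    for inv in invoices
]

for row in result:
    print(row)
```

Because the totals are computed once per invoice number and then looked up, there is no need to filter Table 2 again for every row of Table 1.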
If this helps please remember to mark the answer
There are a couple of tables that I need to use columns from in the select statement. The question is: Create a View to display the employee id, first name and surname. In your query include the coin price and a 10% commission for the sales made by the employees.
The difficult part for me is that an employee with a given employee number can make multiple coin sales, so in the view I need to be able to add together all the coin sales for each respective primary key (employee_id).
As you can see in this image, emp101 has sold two different coins, with the coin_ids "7116" and "7112". In the view I want to be able to somehow tally the value of every coin that each employee_id has sold, if that makes sense.
There are multiple other tables, but there are too many to send, so I am just trying to get a general idea of how to do this. I understand the logistics of the question; I just don't know how to implement the answer with the correct syntax and methods.
Since this appears to be a homework question, here is a discussion of how to solve it:
You want to CREATE a VIEW and give the view a name (something like employee_commissions) and include 5 columns (employee_id, first_name, surname, coin_price and commission - or maybe only 4 columns if they want the combined price plus commission).
To get the values for that view, you want to SELECT ... FROM existing tables; however, your image does not include first_name, surname or a value for a coin so I am assuming that you will have an employees table and a coins table that you will need to INNER JOIN to the invoice table on their respective primary keys.
To get the total value for the coins, you want to aggregate the values and this would be done with the SUM aggregation function and, so that you get the value for each employee, you would need to GROUP BY the employee_id primary key. You will either need to include the other columns you are not aggregating by in the GROUP BY clause or apply an aggregation function to those columns such as MAX(surname).
The syntax for CREATE VIEW is here.
The syntax for SELECT is here.
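To make the discussion above concrete, here is a minimal sketch run against an in-memory SQLite database from Python. The table layouts (employees, coins, invoices), the column names and the sample prices are all assumptions, since the real schema is not shown in the question, and the SQL dialect of your database may differ slightly from SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; the real schema is not shown in the question.
CREATE TABLE employees (employee_id TEXT PRIMARY KEY, first_name TEXT, surname TEXT);
CREATE TABLE coins (coin_id TEXT PRIMARY KEY, coin_price REAL);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, employee_id TEXT, coin_id TEXT);

INSERT INTO employees VALUES ('emp101', 'Ann', 'Lee'), ('emp102', 'Bob', 'Ray');
INSERT INTO coins VALUES ('7116', 100.0), ('7112', 50.0);
INSERT INTO invoices VALUES (1, 'emp101', '7116'), (2, 'emp101', '7112'), (3, 'emp102', '7116');

-- One row per employee: total coin value and a 10% commission on it.
CREATE VIEW employee_commissions AS
SELECT e.employee_id,
       e.first_name,
       e.surname,
       SUM(c.coin_price)        AS coin_price,
       SUM(c.coin_price) * 0.10 AS commission
FROM invoices i
INNER JOIN employees e ON e.employee_id = i.employee_id
INNER JOIN coins c ON c.coin_id = i.coin_id
GROUP BY e.employee_id, e.first_name, e.surname;
""")

rows = conn.execute(
    "SELECT * FROM employee_commissions ORDER BY employee_id"
).fetchall()
for row in rows:
    print(row)
```

The non-aggregated columns (first_name, surname) are listed in the GROUP BY here; wrapping them in MAX() instead would work too.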
I'm a new PBI user and would like help on the following:
I have 2 tables (Table 1 & 2). Table 1 is a bookings report showing sales orders, part numbers and order value. Table 2 is a margin report showing sales orders, part numbers with additional descriptive text and margin value.
I would like to copy margin values from Table 2 into a new column in Table 1 by looking up by sales order and part number.
Any help would be appreciated!
Extract text using the LEFT function
Use the COMBINEVALUES function to create a new column in each table. The new column will merge the two existing columns (sales order and part number) into a single key.
If the lookup table has unique values for that key, use the LOOKUPVALUE function; if not, use the approach described in "DAX lookup first non blank value in unrelated table".
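The logic of those three steps (trim the part number, build a combined key, look up the first non-blank margin) can be sketched in Python. This is not DAX, just an illustration of the idea; the sample rows, the field names and the assumption that the real part number is the first 6 characters are all made up for the example.

```python
PART_LEN = 6  # assumption: the part number is the first 6 characters

# Hypothetical rows standing in for Table 1 (bookings) and Table 2 (margins).
bookings = [
    {"sales_order": "SO1", "part": "AB1234", "value": 100},
    {"sales_order": "SO1", "part": "CD5678", "value": 250},
    {"sales_order": "SO2", "part": "AB1234", "value": 80},
]
margins = [
    {"sales_order": "SO1", "part": "AB1234 - blue widget", "margin": 30},
    {"sales_order": "SO1", "part": "CD5678 - red widget", "margin": None},
    {"sales_order": "SO1", "part": "CD5678 - red widget", "margin": 70},
]

# Build the lookup on the combined key (sales order, trimmed part number),
# keeping the first non-blank margin when a key appears more than once.
lookup = {}
for row in margins:
    key = (row["sales_order"], row["part"][:PART_LEN])
    if row["margin"] is not None and key not in lookup:
        lookup[key] = row["margin"]

# Add the margin as a new column in the bookings rows (None when not found).
for row in bookings:
    row["margin"] = lookup.get((row["sales_order"], row["part"]))

for row in bookings:
    print(row)
```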
There are 62000 records in my fact table, which is not correct because I only have six records in my time dim, 240 records in my student dim and 140 records in my placement dim. Does it have something to do with my WHERE clause? Any help would be much appreciated.
INSERT INTO fact_placements (
report_id,
no_of_placements,
no_of_students,
fk1_time_id,
fk2_placement_id,
fk3_student_id )
SELECT
fact_seq.nextval,
no_of_placements,
no_of_students,
time_id,
placement_id,
student_id
FROM
time_dim,
placement_dim,
student_dim
WHERE
placement_dim.year = time_dim.year AND
student_dim.year = time_dim.year;
Unless you do a Cartesian join (i.e. one without any WHERE clause), you will get fewer than 140 (placement) * 240 (student) * 6 (time) = 201600 fact records. Your current SQL joins the 3 tables on the year column, which filters the records down to the 62000 you are getting.
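A small sketch of the same effect, using an in-memory SQLite database from Python. The row counts and years are made up and much smaller than yours; the point is only to show how joining on year multiplies the rows within each year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tiny stand-ins for the three dimensions, keeping only the key and year columns.
CREATE TABLE time_dim (time_id INTEGER, year INTEGER);
CREATE TABLE placement_dim (placement_id INTEGER, year INTEGER);
CREATE TABLE student_dim (student_id INTEGER, year INTEGER);

INSERT INTO time_dim VALUES (1, 2020), (2, 2021);
INSERT INTO placement_dim VALUES (1, 2020), (2, 2020), (3, 2021);
INSERT INTO student_dim VALUES (1, 2020), (2, 2020), (3, 2020), (4, 2021);
""")

# No WHERE clause: every combination of rows from the three tables.
cartesian = conn.execute(
    "SELECT COUNT(*) FROM time_dim, placement_dim, student_dim"
).fetchone()[0]

# Joined on year: for each year, (time rows) * (placement rows) * (student rows).
joined = conn.execute("""
    SELECT COUNT(*)
    FROM time_dim, placement_dim, student_dim
    WHERE placement_dim.year = time_dim.year
      AND student_dim.year = time_dim.year
""").fetchone()[0]

print("cartesian:", cartesian)
print("joined on year:", joined)
```

With this data, each year still contributes every placement paired with every student of that year, which is why the count stays far above the size of any single dimension.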
Your question title says that even this is "too many". If that is the case, then you would need to understand the granularity of your dimensions and the fact before joining these on any criteria. Are these all at the "year" level? If so, do you have 1 record per year in each of these tables and no duplicates based on year?
If not, you might need to re-think the fact table's granularity, or alternatively join unique records based on year in each dimension to get the smaller number of records you are expecting, which can also be done by summarizing these tables by year.
Ideally the fact table contains the combinations of the dimension keys plus additional columns, i.e. the factual metrics (in this case no_of_placements and no_of_students). But depending on the available data, not all combinations will be present in the fact table.
Also, you might want to change the SQL syntax to use the INNER JOIN clause instead of the implied joins using the commas between table names in the FROM clause, as shown below:
FROM time_dim
INNER
JOIN placement_dim
ON placement_dim.year = time_dim.year
INNER
JOIN student_dim
ON placement_dim.year = student_dim.year
There's no relationship between placement and student, which is why you have so many records.
Your query is saying: Give me all the students and all the placements where year is the same.
I'm not sure that's what you want. What is really strange here is that you are loading a fact table with dimensions tables.
Cognos by default suppresses duplicate/identical records. Duplicate rows do not appear in the report, but summaries are performed on all rows - including the duplicates that were eliminated.
To perform summaries on only the distinct rows, you must add the distinct key word when creating the summary definition. For example, the following summary:
Total(MyColumn)
Would become...
Total (distinct MyColumn)
But I would like Total of Column1 based on Distinct values of Column2. How to do this?
I assume your report is built on top of relational model.
The short answer to your question is to use the FOR clause:
Using the AT and FOR Options with Relational Summary Functions
So you can do something like this:
Total(distinct MyColumn for Column2)
My question is: why would you think distinct on one column is different from distinct on another column?
Cognos "eliminates" duplicate rows only if two or more rows are completely identical.
If any one of the columns differs, the rows are not duplicates.
You can use grouping instead, which groups together identical values in a single column.
I recently was reading about Oracle Index Organized Tables (IOTs) but am not sure I quite understand WHEN to use them. So I have a small table:
create table categories
(
id VARCHAR2(36),
group VARCHAR2(100),
category VARCHAR2(100)
)
create unique index (group, category, id) COMPRESS 2;
The id column is a foreign key from another table entries and my common query is:
select e.id, e.time, e.title from entries e, categories c where e.id=c.id AND e.group=? AND c.category=? ORDER by e.time
The entries table is indexed properly.
Both of these tables have millions (16M currently) of rows and currently this query really stinks (note: I have it wrapped in a pagination query also so I only get back the first 20, but for simplicity I omitted that).
Since I am basically indexing the entire table, does it make sense to create this table as an IOT?
EDIT by popular demand:
create table entries
(
id VARCHAR2(36),
time TIMESTAMP,
group VARCHAR2(100),
title VARCHAR2(500),
....
)
create index (group, time) compress 1;
I don't think my real question depends on this, though. Basically, if you have a table with few columns (3 in this example) and you are planning on putting a composite index on all three columns, is there any reason not to use an IOT?
IOTs are great for a number of purposes, including this case where you're going to have an index on all (or most) of the columns anyway. The benefit only materialises if you don't have the extra index, though. The idea is that the table itself is an index, so put the columns in the order that you want the index to be in. In your case, you're accessing category by id, so it makes sense for that to be the first column. So effectively you've got an index on (id, group, category). I don't know why you'd want an additional index on (group, category, id).
Your query:
SELECT e.id, e.time, e.title
FROM entries e, categories c
WHERE e.id=c.id AND e.group=? AND c.category=?
ORDER by e.time
You're joining the tables by ID, but you have no index on entries.id - so the query is probably doing a hash or sort merge join. I wouldn't mind seeing a plan for what your system is doing now to confirm.
If you're doing a pagination query (i.e. only interested in a small number of rows) you want to get the first rows back as quick as possible; for this to happen you'll probably want a nested loop on entries, e.g.:
NESTED LOOPS
ACCESS TABLE BY ROWID - ENTRIES
INDEX RANGE SCAN - (index on ENTRIES.group,time)
ACCESS TABLE BY ROWID - CATEGORIES
INDEX RANGE SCAN - (index on CATEGORIES.ID)
Since the join to CATEGORIES is on ID, you'll want an index on ID; if you make it an IOT, and make ID the leading column, that might be sufficient.
The performance of the plan I've shown above will be dependent on how many rows match the given "group" - i.e. how selective an average "group" is.
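As a rough, runnable illustration of "the table itself is the index with ID as the leading column", here is a sketch using SQLite from Python. SQLite's WITHOUT ROWID tables are stored as a B-tree on the primary key, which is only an analogue of an Oracle IOT, not the same thing, so treat the plan it prints as a hint rather than a prediction of what Oracle will do. The column group is renamed to grp here because GROUP is a reserved word, the primary key on entries.id is an assumption, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (id TEXT PRIMARY KEY, time TEXT, grp TEXT, title TEXT);
CREATE INDEX entries_grp_time ON entries (grp, time);

-- Index-organized analogue: rows are stored in primary-key order, id first.
CREATE TABLE categories (
    id TEXT,
    grp TEXT,
    category TEXT,
    PRIMARY KEY (id, grp, category)
) WITHOUT ROWID;

INSERT INTO entries VALUES
    ('e1', '2024-01-02', 'news', 'B'),
    ('e2', '2024-01-01', 'news', 'A'),
    ('e3', '2024-01-03', 'sport', 'C');
INSERT INTO categories VALUES
    ('e1', 'news', 'tech'),
    ('e2', 'news', 'tech'),
    ('e2', 'news', 'misc'),
    ('e3', 'sport', 'tech');
""")

query = """
    SELECT e.id, e.time, e.title
    FROM entries e
    JOIN categories c ON c.id = e.id
    WHERE e.grp = ? AND c.category = ?
    ORDER BY e.time
    LIMIT 20
"""
rows = conn.execute(query, ("news", "tech")).fetchall()
print(rows)

# Show how SQLite chooses to run it; the exact wording varies by SQLite version.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, ("news", "tech")):
    print(step)
```

Prototyping the equivalent in Oracle and comparing EXPLAIN PLAN output is still the only way to know whether the IOT helps with your data.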
Have you looked at dba-oracle.com, asktom.com, IOUG, another asktom.com?
There are penalties to pay for IOTs, e.g. poorer insert performance.
Can you prototype it and compare performance?
Also, perhaps you might want to consider a hash cluster.
IOTs are a trade-off. You are getting better access performance in exchange for decreased insert/update performance. We typically use them for reference data that is batch loaded daily and not updated during the day. This is not to say it's the only way to use them, just how we use them.
Few things here:
You mention pagination - have you considered the first_rows hint?
Is that the order your index is in, with group as the first field? If so I'd consider moving ID to be the first column since that index will not be used.
Foreign keys should have an index on the column. Consider adding an index on the foreign key (id column).
Are you sure it's not the ORDER BY causing slowness?
What version of Oracle are you using?
I ASSUME there is a primary key on table entries for field id, correct?
Why does the WHERE condition not include "c.group = e.group"?
Try to:
Remove the order by condition
Change the index definition from "create unique index (group, category, id)" to "create unique index (id, group, category)"
Reorganise table categories as an IOT on (group, category, id)
Reorganise table categories as an IOT on (id, group, category)
In each of the above cases, use EXPLAIN PLAN to review the cost.