I noticed that LIMIT queries will return more than the expected number of rows when they are executed against tables that contain nested or repeated data. For example, the following query run against the persons sample data set from the developer guide produces the following results:
% bq query 'SELECT fullName, children.name FROM [persons.person] LIMIT 1'
+----------+---------------+
| fullName | children_name |
+----------+---------------+
| John Doe | Jane |
| John Doe | John |
+----------+---------------+
It looks like BQL is applying the LIMIT operator before flattening the results as opposed to the other way around (which I think would make more sense).
Is this a bug in the BQL implementation or is this the expected behavior? If this is the expected behavior can someone please provide an explanation for why this makes sense?
This is expected given the way BigQuery flattens query results. When you run the query, the LIMIT 1 applies to the repeated record. Then the results get flattened in the output, and you get two rows. A workaround is to use an explicit flatten operation. For example:
SELECT fullName, children.name
FROM (FLATTEN([persons.person], children.name))
LIMIT 1
This will return only a single row.
I'm building a table to manage some articles:
Table
| Company | Store | Sku | ..OtherColumns.. |
|---------|-------|-----|------------------|
| 1       | 1     | 123 | ..               |
| 1       | 2     | 345 | ..               |
| 3       | 1     | 123 | ..               |
Scenario
Most of the time, company, store and sku will be used to SELECT rows:
SELECT * FROM stock s WHERE s.company = 1 AND s.store = 1 AND s.sku = 123;
..but sometimes the company will not be available when accessing the table.
SELECT * FROM stock s WHERE s.store = 1 AND s.sku = 123;
..Sometimes all articles will be selected for a store.
SELECT * FROM stock s WHERE s.company = 1 AND s.store = 1;
The Question
How to properly index the table?
I could add three indexes - one for each select, but I think Oracle should be smart enough to re-use other indexes.
Would an Index "Store, Sku, Company" be used if the WHERE-condition has no company?
Would an Index "Company, Store, Sku" be used if the WHERE-condition has no company?
You can think of the index key as conceptually being the 'concatenation' of all of the columns, and generally you need to have a leading element of that key in order to get benefit from the index. So for an index on (company,store,sku) then
WHERE s.company = 1 AND s.store = 1 AND s.sku = 123;
can potentially benefit from the index
WHERE s.store = 1 AND s.sku = 123;
is unlikely to benefit (but see footnote below)
WHERE s.company = 1 AND s.store = 1;
can potentially benefit from the index.
In all cases, I say "potentially" etc, because it is a costing decision by the optimizer. For example, if I only have (say) 2 companies and 2 stores, then a query on company and store, whilst it could use the index, is perhaps better served by not doing so, because the volume of data to be queried is still a large percentage of the size of the table.
In your example, it might be the case that an index on (store,sku,company) would be "good enough" to satisfy all three, but that depends on the distribution of data. But you're thinking the right way, ie, get as much value from as few indexes as possible.
Footnote: There is a thing called a "skip scan" where we can get value from an index even if you do not specify the leading column(s), but you will typically only see that if the number of distinct values in those leading columns is low.
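As a rough sketch of how you might verify this yourself in Oracle (the index name here is just an assumption for illustration), you could create the single composite index suggested above and inspect the plan for each of the three predicates:

CREATE INDEX stock_store_sku_co_ix ON stock (store, sku, company);

EXPLAIN PLAN FOR
  SELECT * FROM stock s WHERE s.company = 1 AND s.store = 1 AND s.sku = 123;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR
  SELECT * FROM stock s WHERE s.store = 1 AND s.sku = 123;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR
  SELECT * FROM stock s WHERE s.company = 1 AND s.store = 1;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Whether each plan actually shows an index range scan will depend on the data volumes and statistics, as described above.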
First - do you need an index at all? Indexes are not free. If your table is small enough, perhaps you don't need an index at all.
Second - what does the data look like? You have the store column in every scenario - I can imagine a situation in which filtering on store alone narrows the data down enough to be good enough for you.
However, if you want the maximum reasonable performance benefit, you need two indexes:
(store, sku, company)
(store, company)
or
(store, company, sku)
(store, sku)
Would an Index "Store, Sku, Company" be used if the WHERE-condition has no company?
Yes
Would an Index "Company, Store, Sku" be used if the WHERE-condition has no company?
Probably not, but I can imagine scenarios in which it might happen (not for an index seek operation, which is really the primary purpose of indexes).
An index dissects the data in column order: rows are grouped and sorted by the first column, then within each of those groups they are grouped the same way by the second column, and so on.
So when your filter does not use the first column of the index, the DB would have to visit all of those "subgroups" anyway.
I recommend reading about indexes in general. Start with https://en.wikipedia.org/wiki/B-tree and try to draw how it behaves on paper, or write a simple program to manage a simplified version. Then read about indexes in databases - any DB would be good enough.
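If you do want to sketch that simplified version, here is a toy illustration in plain Python (nothing Oracle-specific) of why the leading column matters: a composite index behaves roughly like a sorted list of key tuples.

from bisect import bisect_left, bisect_right

# Toy "index" on (store, sku, company): just the key tuples, kept sorted.
keys = sorted([(1, 123, 1), (2, 345, 1), (1, 123, 3)])

# Filtering on the leading column (store = 1) is a cheap range scan:
lo = bisect_left(keys, (1,))
hi = bisect_right(keys, (1, float("inf"), float("inf")))
print(keys[lo:hi])            # -> [(1, 123, 1), (1, 123, 3)]

# Filtering on sku alone cannot use the ordering; every key must be checked:
print([k for k in keys if k[1] == 123])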
I am trying to get the count of the number of groups in my report. I know I could do it in the SQL, but I am trying to avoid adding redundant data to my dataset if I can.
I have a MainDataSet that could have multiple entries per distinct group item. All I want is the number of groups, not the count of items within each group.
For example, words starting with alphabet letters: let's say I have 2 groups, A and B only (NB: the number of groups can change dynamically as I filter the MainDataSet based on user parameter selection):
Group | Data
------|-----
A | Apple
A | Ant
B | Balloon
B | Book
B | Bowl
Final Result:
Group | Index | NGroups
------|-------|--------
A     | 1     | 2
B     | 2     | 2
I know I can get the Index using an aggregate function as follows:
RunningValue(Fields!Group.Value, CountDistinct, "TablixName")
But how do I get the NGroups value?
I guess I could also create another dataset based on the MainDataSet (making use of a SQL function) and do:
SELECT 'X' AS GroupCount, COUNT(DISTINCT [Group]) AS NGroups
FROM dbo.udf_MainDataSet()
WHERE FieldX = @Parameter1
Then use a LookUp:
Lookup("X", Fields!GroupCount.Value, Fields!NGroups.Value, "NewDataSet")
But is there a simple solution that I am not seeing?
CountDistinct(Fields!Group.Value, "TablixName")
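As a minimal sketch of where these might sit, assuming the tablix is named "TablixName" as in the question, both expressions can go in textboxes on the group header row:

Index:   =RunningValue(Fields!Group.Value, CountDistinct, "TablixName")
NGroups: =CountDistinct(Fields!Group.Value, "TablixName")

For the sample data above, the NGroups expression should return 2 on every row.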
Input:
TABLE NAME: SEARCH_RECORD
Column A | Column B | Column C | Column D
ID       | CODE     | WORD     | CODE/WORD
---------|----------|----------|-----------
123      | 666Ani   | RAT      | 666Ani/RAT
124      | 777Cae   | CAT      | 777Cae/CAT
I need a query that checks this as a LIKE case.
If I search with column B like '%6A' or column C like '%A%', it will give a result.
Suppose I want to do the LIKE based on the column D search.
The user will search like '%6A%'/'%AT%' (the / will always be given by the user).
Expected output:
666Ani/RAT
So, I need a query for the above that gets the ID as output (a CASE query is preferable).
I need your valuable suggestions.
It can't be done with a simple LIKE.
It should work if the pattern looks like '%6A%/%AT%'; that is a valid pattern.
So, you can write columnD like '%6A%/%AT%', or columnD like first_pattern || '/' || second_pattern if the two parts come in as separate variables.
Another approach, if you know for sure that there is only one / (you can check how many there are), is to use two LIKEs, using substr to get the first and then the second part of the search string:
where
    columnB like substr(match_string, 1, instr(match_string, '/') - 1)
and
    columnC like substr(match_string, instr(match_string, '/') + 1)
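Put together against the table from the question, a sketch might look like this (column and table names as given above; the bind variable :match_string holding the user's input, e.g. '%6A%/%AT%', is an assumption for illustration):

SELECT id
FROM   search_record
WHERE  code LIKE SUBSTR(:match_string, 1, INSTR(:match_string, '/') - 1)
AND    word LIKE SUBSTR(:match_string, INSTR(:match_string, '/') + 1);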
I have a list of elements (database tables) that may have relations with each other, including themselves.
Let's assume I have table1, table2, table3 and table4. What I would like to create and use in a loop would be a structure like this:
       | table1 | table2 | table3 | table4 |
table1 | 1:1    |        |        |        |
table2 |        | 1:1    |        |        |
table3 |        |        | 1:1    |        |
table4 |        |        |        | 1:1    |
Note that while it is not necessary to have the 1:1 relations pre-filled, it is important for me to be able to easily address any of these cells in a loop.
There will be a while True loop that iterates over and over again until an iteration occurs with no change.
What I am currently struggling with is finding an efficient and Pythonic way of doing this with basic Python structures, while keeping decent performance and easy addressing.
Example of what I tried
tableNames = d.items()  # returns this format: [(tableName, ['column1', 'column2']), ...]
for tName in tableNames:
    # some sort of comparison and conditions resulting in whatever decision,
    # for example, that I need to insert information about the
    # relation at [actualtable][table2]
    fancyStructure[actualtable][table2] = "1:N"
Is it a good idea to use a dictionary? Or rather, a dictionary of dictionaries? What is the best way?
And is it okay to pre-initialize that dictionary with the values that I will change over time? Note that this script will work with millions or even billions of records.
EDIT:
Input:
dict([('tableName', ['columnName', 'columnName2']), ('tableName2', ['columnName', 'columnName2'])])
Expected output:
tableName : tableName [relation],
tableName : tableName2[relation],
tableName2 : tableName[relation],
tableName2 : tableName2[relation]
The relation will be linked to both of the mentioned tables, separated by a colon, and at the beginning there will be no value (empty).
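A minimal sketch of the kind of structure described above, assuming a plain dict of dicts keyed by table name (the relation values here are just placeholders):

from itertools import product

# Input in the format from the question
d = dict([('tableName', ['columnName', 'columnName2']),
          ('tableName2', ['columnName', 'columnName2'])])

# Pre-initialize every (table, table) cell with an empty relation
relations = {a: {b: "" for b in d} for a in d}

# Cells can then be addressed directly inside the loop
relations['tableName']['tableName2'] = "1:N"

for a, b in product(d, d):
    print(f"{a} : {b} [{relations[a][b]}]")

For millions of tables a dict of dicts of this shape grows quadratically, so whether pre-initializing every cell is acceptable depends on how sparse the relations really are.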
In Propel there is this doUpdate function, which returns the number of rows affected by the query.
The question is: if there is no need to update a row (because the value being set is already the same as the field value), will those rows be counted as affected rows?
Take for example, I have the following table:
ID | Name  | Books
---|-------|------
1  | S1oon | Me
2  | S1oon | Me
Let's assume that I write an ORM function equivalent to the following query:
update `new table` set
Books='Me'
where Name='S1oon';
What will the doUpdate result return? Will it return 0 (because the Books column is already 'Me' in every row, hence there is no need to update), or will it be 2 (because there are 2 rows that fulfil the WHERE condition)?
Under the hood, Propel is using PDO's PDOStatement::rowCount() method to return the number of affected rows. So, the short answer is that you'll get "2" as you expect here, but the longer answer is that it may depend slightly on how PDO implements that function for your specific database. (I think if you did not get 2, it would be a bug in PDO, however.)
See the description of rowCount() in the PHP manual for more info.
One other thing to bear in mind is that when Propel calls methods (like save() or delete()) which are expected to return number-of-rows-modified and which may result in more than one row being modified (e.g. if you add a Book and its Author and then cause both to be INSERTed by calling book->save()), you will get the total number of rows modified.
It will return 2.