MongoDb Java with query : Select ....From ... Where field in (Select ......) - mongodb-java

My database has 2 collections : Cell(_id, segment_id, cell_id) and SegmentSpeed(_id, segment_id, speed)
And I want to perform the query below (sql format) in Java :
"Select * From SegmentSpeed Where segment_id in (Select segment_id From Cell Where cell_id>5)"
That is Mysql query, and the problem is I want to execute this query with MongoDb format in JAVA.
Thanks !

MongoDB does not support joins.
You may have to resort to some clunky and inefficient map-reduce operations if you are unable to change your schema.
What you can do is embed the whole SegmentSpeed inside a document of Cell.

Related

How does Oracle implement an SQL query over a VIEW

We have a VIEW defined over a base table using an SQL Queries ( SQL query1 )
Im trying to understand how Querying ( SQL query2 ) over a view work
When I run an SQL over a VIEW, does Oracle first execute query1 to create a temp table and then run query2 over the temp table ?
Or does it create a single composite query by combining query1 and query2 in order to give the result
( my query2 has high selectivity if run directly over the base table and a composite query should run much faster than executing query1 first )
Or does it create a single composite query by combining query1 and query2 in order to give the result
Yes, CBO (oracle cost-based optimizer) expands final query and transforms it and build an execution plan and you can check final query after transformation in trace 10053(optimizer trace) or using DBMS_UTILITY.EXPAND_SQL_TEXT
NB. DBMS_UTILITY.EXPAND_SQL_TEXT has appeared in 12.1, but you tagged Oracle 11g, so you need to use dbms_sql2.expand_sql_text, an example: https://github.com/xtender/xt_scripts/blob/master/expand_11.sql

Nifi executeSql with 30 threads very slow

We are using HDF to fetch large data from oracle. We have a generateTableFetch to create partition of 8000 records which create query like below :
Select * from ( Select a.*, ROWNUM rnum FROM (SELECT * FROM OPUSER.DEPENDENCY_TYPES WHERE (1=1))a WHERE ROWNUM <= 368000) WHERE rnum > 361000
Now this query is taking almost 20-25min to return from oracle.
Is there anything wrong that we are doing wrong or any configuration changes we can do.
Nifi uses jdbc connection so is there any oracle side configuration for that.
Also if we somehow add parallelism hint to the query example /parallel(c,2)/. WIll this help?
I'm guessing you're using Oracle 11 (or less) and have selected Oracle as the database type. Since LIMIT/OFFSET wasn't introduced until Oracle 12, NiFi uses the nested SELECT with ROWNUM approach to ensure each "page" of data contains unique values. If you are using Oracle 12+, make sure to use the Oracle 12+ database adapter instead, as it can leverage the LIMIT/OFFSET capabilities resulting in a faster query. Also make sure you have the appropriate index(es) in place to help with query execution.
As of NiFi 1.7.0, you might also consider setting the Column for Value Partitioning property. If you have a column (perhaps your DEPENDENCY_TYPES column) that is fairly uniformly distributed, and is not "too sparse" in relation to your Partition Size property value, GenerateTableFetch can use the column's values rather than the ROWNUM approach, resulting in faster queries. See NIFI-5143 and the GenerateTableFetch documentation for more details.
If you need to add hints to the JDBC session, then as of NiFi 1.9.0 (see NIFI-5780 for more details) you can add pre- and post-query statements to ExecuteSQL.

How to execute select query on oracle database using pi spark?

I have written a program using pyspark to connect to oracle database and fetch data. Below command works fine and returns the contents of the table:
sqlContext.read.format("jdbc")
.option("url","jdbc:oracle:thin:user/password#dbserver:port/dbname")
.option("dbtable","SCHEMA.TABLE")
.option("driver","oracle.jdbc.driver.OracleDriver")
.load().show()
Now I do not want to load the entire table data. I want to load selected records. Can I specify select query as part of this command? If yes how?
Note: I can use dataframe and execute select query on the top of it but I do not want to do it. Please help!!
You can use subquery in dbtable option
.option("dbtable", "(SELECT * FROM tableName) AS tmp where x = 1")
Here is similar question, but about MySQL
In general, the optimizer SHOULD be able to push down any relevant select and where elements so if you now do df.select("a","b","c").where("d<10") then in general this should be pushed down to oracle. You can check it by doing df.explain(true) on the final dataframe.

Write a nested select statement with a where clause in Hive

I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows;
select *
from TableA
where TA_timestamp > (select timestmp from TableB where id="hourDim")
Is this possible or am I doing something wrong here, because I am getting an error while running the above script ?!
To further elaborate on what I am trying to do, there is a cassandra keyspace that I publish statistics with a timestamp. Periodically (hourly for example) this stats will be summarized using hive, once summarized that data will be stored separately with the corresponding hour. So when the query runs for the second time (and consecutive runs) the query should only run on the new data (i.e. - timestamp > previous_execution_timestamp). I am trying to do that by storing the latest executed timestamp in a separate hive table, and then use that value to filter out the raw stats.
Can this be achieved this using hive ?!
Subqueries inside a WHERE clause are not supported in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
However, often you can use a JOIN statement instead to get to the same result:
https://karmasphere.com/hive-queries-on-table-data#join_syntax
For example, this query:
SELECT a.KEY, a.value
FROM a
WHERE a.KEY IN
(SELECT b.KEY FROM B);
can be rewritten to:
SELECT a.KEY, a.val
FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY)
Looking at the business requirements underlying your question, it occurs that you might get more efficient results by partitioning your Hive table using hour. If the data can be written to use this factor as the partition key, then your query to update the summary will be much faster and require fewer resources.
Partitions can get out of hand when they reach the scale of millions, but this seems like a case that will not tease that limitation.
It will work if you put in :
select *
from TableA
where TA_timestamp in (select timestmp from TableB where id="hourDim")
EXPLANATION : As > , < , = need one exact figure in the right side, while here we are getting multiple values which can be taken only with 'IN' clause.

Why the Select * FROM Table where ID NOT IN ( list of int ids) query is slow in sql server ce?

well this problem is general in sql server ce
i have indexes on all the the fields.
also the same query but with ID IN ( list of int ids) is pretty fast.
i tried to change the query to OUTER Join but this just make it worse.
so any hints on why this happen and how to fix this problem?
That's because the index is not really helpful for that kind of query, so the database has to do a full table scan. If the query is (for some reason) slower than a simple "SELECT * FROM TABLE", do that instead and filter the unwanted IDs in the program.
EDIT: by your comment, I recognize you use a subquery instead of a list. Because of that, there are three possible ways to do the same (hopefully one of them is faster):
Original statement:
select * from mytable where id not in (select id from othertable);
Alternative 1:
select * from mytable where not exists
(select 1 from othertable where mytable.id=othertable.id);
Alternative 2:
select * from mytable
minus
select mytable.* from mytable in join othertable on mytable.id=othertable.id;
Alternative 3: (ugly and hard to understand, but if everything else fails...)
select * from mytable
left outer join othertable on (mytable.id=othertable.id)
where othertable.id is null;
This is not a problem in SQL Server CE, but overall database.
The OPERATION IN is sargable and NOT IN is nonsargable.
What this mean ?
Search ARGument Able, thies mean that DBMS engine can take advantage of using index, for Non Search ARGument Ablee the index can't be used.
The solution might be using filter statement to remove those IDs
More in SQL Performance Tuning by Peter Gulutzan.
ammoQ is right, index does not help much with your query. Depending on distribution of values in your ID column you could optimise the query by specifying which IDs to select rather than not to select. If you end up requesting say more than ~25% of the table index will not be used anyway though because for nonclustered indexed (which is the only type of indexes which SQL CE supports if memory serves) it would be cheaper to scan the table. Otherwise (if the query is actually selective) you could re-write query with ID ranges to select ('union all' may work better than 'or' to combine ranges if SQL CE supports 'union all', not sure)

Resources