How can I set a limit on Cassandra, and set a range rather than just a limit? I need to fetch records 10 to 100 from Cassandra.
I have one table, user.
I want to fetch only 100 users, then the next 100 users; in other words, I want to paginate. Please tell me how this can be implemented in Cassandra (CQL).
You can use the DataStax Java driver for this; it is very easy to use.
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/quick_start/qsSimpleClientCreate_t.html
And please refer to the following:
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
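For illustration, here is a minimal sketch of driver-side paging with the DataStax Java driver 2.0. The contact point, keyspace, and table name are assumptions based on the question; the driver asks for 100 rows per page and transparently requests the next page as you iterate (the Cassandra 2.0 automatic paging described in the second link).

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagedUserReader {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("mykeyspace");   // assumed keyspace name

        // Ask the server for 100 rows per page.
        Statement stmt = new SimpleStatement("SELECT * FROM user").setFetchSize(100);
        ResultSet rs = session.execute(stmt);

        for (Row row : rs) {
            // The driver fetches the next 100 rows behind the scenes when needed.
            System.out.println(row);
        }
        cluster.close();
    }
}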
As a production support team member, I investigate issues with various Impala queries. While researching an issue, I saw a team submit an Impala query with LIMIT 0, which obviously does not return any rows, and then again without LIMIT 0, which gives them results. I guess they submit these queries from IBM DataStage. Before I question them about why they do so, I wanted to check what could be a reason for someone to run a query with LIMIT 0. Is it just to check the syntax or the connection with Impala? I see a similar question discussed here in the context of SQL, but thought to ask anyway from an Impala perspective. Thanks, Neel
I think you are partially correct.
Please note that LIMIT processes all the data and only then applies the limit clause.
LIMIT 0 is mostly used:
To check whether the syntax of the SQL is correct (see the sketch below). Impala does fetch all the records before applying the limit, so the SQL is completely validated. Some systems may use this to check the SQL they generated automatically before actually applying it on the server.
To limit fetching lots of rows from a huge table or data set every time you run a SQL statement.
To create an empty table using the structure of another table, when you do not want to copy the stored data, format, configuration, etc.
To avoid burdening Hue, or any other interface interacting with Impala. All the data will be processed, but none will be returned.
As a performance test: this gives you some idea of the run time of the SQL. I say "some idea" because it is not the actual time to complete, but an estimated time to complete the SQL.
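As an illustration of the syntax-check use case, here is a minimal sketch of how a tool might validate generated SQL over JDBC before running it for real. The JDBC URL, table, and column names are hypothetical; only the standard java.sql API is assumed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class SqlValidator {
    public static void main(String[] args) throws SQLException {
        // Hypothetical JDBC URL for an Impala endpoint; adjust for your driver.
        String url = "jdbc:impala://impala-host:21050/default";
        String generatedSql = "SELECT col_a, col_b FROM some_table WHERE col_a > 10";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Appending LIMIT 0 validates the statement end to end
            // without returning any rows to the client.
            stmt.executeQuery(generatedSql + " LIMIT 0");
            System.out.println("SQL validated successfully");
        }
    }
}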
It seems that r2dbc-oracle does not have a proper back pressure implementation. If I select a larger number of rows (say 10k), it is much slower than a regular JDBC/JPA query. If I manually set the fetch size to 1000, the query is approximately 8 times(!) faster.
So:
Can you confirm whether back pressure is or is not implemented? If not, is it planned?
Is there an easier way to set the fetch size (maybe even globally) than using manual databaseClient.sql() queries?
Thanks for sharing these findings.
I can confirm that request signals from a Subscriber do not affect the fetch size of Oracle R2DBC's Row Publisher. Currently, the only supported way to configure the fetch size is by calling io.r2dbc.spi.Statement.fetchSize(int).
This behavior can be attributed to Oracle JDBC's implementation of oracle.jdbc.OracleResultSet.publisherOracle(Function). The Oracle R2DBC Driver is using Oracle JDBC's Publisher to fetch rows from the database.
I can also confirm that the Oracle JDBC Team is aware of this issue, and is working on a fix. The fix will have the publisher use larger fetch sizes when demand from a subscriber exceeds the value configured with Statement.fetchSize(int).
Source: I wrote the code for Oracle R2DBC and Oracle JDBC's row publisher.
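To make the workaround concrete, here is a minimal sketch of setting the fetch size through the R2DBC SPI. The table and column names are made up; the only API assumed is io.r2dbc.spi.Statement.fetchSize(int), mentioned above. With Spring's DatabaseClient the same call can be applied through a statement filter, but as noted, there is currently no global setting.

import io.r2dbc.spi.Connection;
import io.r2dbc.spi.Result;
import io.r2dbc.spi.Statement;
import org.reactivestreams.Publisher;
import reactor.core.publisher.Flux;

public class FetchSizeExample {
    // Assumes an already-open io.r2dbc.spi.Connection to an Oracle database.
    static Flux<String> readPayloads(Connection connection) {
        Statement statement = connection
                .createStatement("SELECT payload FROM big_table")  // hypothetical table
                .fetchSize(1000);                                   // rows fetched per round trip
        Publisher<? extends Result> results = statement.execute();
        return Flux.from(results)
                .flatMap(result -> result.map((row, metadata) ->
                        row.get("payload", String.class)));
    }
}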
I am trying to test the performance of ClickHouse to get a sense of how much memory I need for a dedicated server.
Currently I'm using PostgreSQL in production and want to migrate to ClickHouse, so I inserted some of the production data into a local ClickHouse server and am executing the most-used production queries against it.
But I do not know how much memory ClickHouse uses to execute these queries.
After some research I found the answer; hope it helps others.
ClickHouse has a table called system.query_log that stores statistics for each executed query, such as duration and memory usage.
There is also a table called system.processes that has information about currently running queries.
I'm using the following query to inspect recent queries. It returns memory use, query duration, number of read rows, used functions, and more:
SELECT * FROM system.query_log
WHERE type != 'QueryStart' AND NOT has(databases, 'system')
ORDER BY event_time_microseconds DESC
LIMIT 20;
I am using Cassandra 1.2.12. I want to load data from Cassandra using Java code, but I am forced to use a limit in the query.
I am using the DataStax API to fetch data from Cassandra.
Let's assume the keyspace is 'k' and the column family is 'c'. I read data from c on some condition that results in 10 million records. Since I was getting a time-out exception, I limited it to 10000, and I know that I can't specify a limit like 10001 to 20000... I want to load the full 10 million records. How can I solve this problem?
What you're asking about is called pagination, and you'll have to write queries with WHERE key > [some_value] to set your starting boundary for each slice you want to return. To get the correct value to use, you'll need to look at the last row returned by the previous slice.
If you're not dealing with numbers, you can use the token() function to do a range check, for example:
SELECT * FROM c WHERE token(name) > token('bob')
token() may also be required if you're paging by your partition key, which usually disallows slicing queries. For example (adapted from the DataStax documentation):
CREATE TABLE c (
k int PRIMARY KEY,
v1 int,
v2 int
);
SELECT * FROM c WHERE token(k) > token(42);
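To load all 10 million rows from Java without one huge query, you can apply the same token() trick in a loop, fetching one slice at a time. Below is a rough sketch against the table above using the DataStax Java driver (2.0 here); the contact point and the 10000-row slice size are assumptions, not recommendations.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TokenPager {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("k");

        int pageSize = 10000;   // assumed slice size, small enough to avoid time-outs
        ResultSet rs = session.execute("SELECT k, v1, v2 FROM c LIMIT " + pageSize);

        while (true) {
            Integer lastKey = null;
            int rowsInPage = 0;
            for (Row row : rs) {
                lastKey = row.getInt("k");   // remember the last partition key seen
                rowsInPage++;
                // ... process the row here ...
            }
            if (lastKey == null || rowsInPage < pageSize) {
                break;   // last slice reached
            }
            // The next slice starts after the last token returned by the previous one.
            rs = session.execute("SELECT k, v1, v2 FROM c WHERE token(k) > token("
                    + lastKey + ") LIMIT " + pageSize);
        }
        cluster.close();
    }
}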
Loading all of the data from Cassandra is not a good option. With Kundera (which supports the DataStax Java driver), I know you can set maxResults to Integer.MAX_VALUE, which excludes the LIMIT keyword while retrieving the data.
As Daniel said, what you are probably looking for is "pagination": use the token() function for this and handle the number of records per page programmatically. IMHO, high-level APIs should take care of such things, like applying token() implicitly when pagination is required.
HTH,
-Vivek
How do I specify the number of records to delete in the TIBCO JDBC Update activity in batch update mode?
I need to delete 25 million records from the database, so I wrote TIBCO code to do this, but it is taking a lot of time. I am planning to use batch mode in the delete query, but I don't know how to specify the number of records in the JDBC Update activity.
Help me if anyone has any idea. Thanks.
From the docs for the Batch Update checkbox:
This field is only meaningful if there are prepared parameters in the SQL statement (see Prepared Parameters).
In that case the input will be an array of records, and it will execute the statement once for each record.
To avoid running out of memory, you will still need to iterate over the 25mil, but you can iterate in groups of 1000 or 10000.
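Outside of BW, the equivalent chunked delete with plain JDBC looks roughly like the sketch below. The connection details, table, cutoff criterion, and batch size are hypothetical, and the Oracle-style ROWNUM predicate is used only for illustration; the idea of deleting in bounded groups until nothing is left carries over to the BW subset/iteration approach.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedDelete {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@//db-host:1521/ORCL";   // hypothetical connection details
        int batchSize = 10000;

        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            conn.setAutoCommit(false);
            // Hypothetical criterion: purge rows older than a cutoff date, batchSize rows at a time.
            String sql = "DELETE FROM audit_log WHERE created < DATE '2014-01-01' AND ROWNUM <= ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int deleted;
                do {
                    ps.setInt(1, batchSize);
                    deleted = ps.executeUpdate();   // rows removed in this chunk
                    conn.commit();                  // keep each transaction small
                } while (deleted == batchSize);
            }
        }
    }
}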
If this is not something you would do often (deleting 25M rows sounds pretty one-off), an alternative is to use BW to create a file containing the delete statements and then give the file to a DBA to execute.
Please use the subset feature of the JDBC palette. Let me know if you face any issues.
I would suggest two points:
If this is a one-time activity, it is not advisable to use TIBCO BW code for it; a SQL script would be the better alternative.
When you say 25 million records, what criteria is that based on? It can be achieved through subset iteration, but there should be proper load testing in the pre-prod environment to check that the process does not cause any memory/DB issues.
You can also try using a SQL procedure and invoking it through BW.