Finding the total number of rows for all tables in CockroachDB

I’m curious how many total rows I have across all of the tables in my deployment. Does CockroachDB have a command to count the total number of rows in all of my tables?

We don't currently have anything better than running a SELECT COUNT(*) query against every table in your database, which will be very slow. Instead, we recommend using the data size shown in the admin UI as an approximation.
If an exact count of all rows is still desired, you can use a shell script to gather the table names from information_schema and issue a COUNT(*) query for each of them.
For example, the following snippet will print out the row counts for every table in the database cats:
# Gather the table names (sed strips the two header lines of the CLI output),
# then run COUNT(*) against each table.
tables=$(cockroach sql -e "SELECT table_name FROM information_schema.tables WHERE table_schema='cats'" | sed 1,2d)
for table in $tables; do
  cockroach sql -e "SELECT '$table', COUNT(*) FROM cats.$table"
done
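If a grand total across every table is what you're after, the per-table counts from the loop above can be summed. A minimal sketch of just the totaling step, using hard-coded sample output in place of a live cluster (the table names and counts here are invented):

```shell
# Sample per-table counts, standing in for what the loop above would
# print against a live cluster (table name, then row count):
counts='toys 120
owners 45
vets 7'

# Sum the second column to get a grand total across all tables.
total=$(printf '%s\n' "$counts" | awk '{sum += $2} END {print sum}')
echo "Total rows: $total"
```

In a real script, the `counts` variable would be filled by capturing the loop's output instead.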

Related

Oracle - discrepancy counting tables between TABLES view and TAB_COLUMNS view

I'm running some top-line stats for our Oracle database, to report a count of the total number of Tables. I'm using some very basic SQL queries against the db views: TABLES and TAB_COLUMNS
When comparing a count of the records in the DBA_TABLES view with a count of the distinct Owner/Table_Name combinations in DBA_TAB_COLUMNS, I've found significantly more tables listed in the TAB_COLUMNS view (12,508 in my case) than in the TABLES view (6,630).
Looking at a data-level sample of the disparities, the 'extra' tables that appear in TAB_COLUMNS all seem to contain no rows of data.
Clearly, a chat with my DBA to understand why there are so many empty tables is my next port of call (there could be a number of reasons, I'm sure), but my question is: how come the TABLES view apparently excludes these tables when the TAB_COLUMNS view includes them?
There's that useful DICTIONARY view, which contains interesting info. For example:
SQL> select * from dictionary where table_name in ('USER_TAB_COLUMNS', 'USER_TABLES');

TABLE_NAME                     COMMENTS
------------------------------ --------------------------------------------------
USER_TABLES                    Description of the user's own relational tables
USER_TAB_COLUMNS               Columns of user's tables, views and clusters

SQL>
Since USER_TAB_COLUMNS contains columns of tables, views, and clusters, I'd guess that's why your queries returned different results. Try removing what you don't need.
How? One option is to JOIN these two views. Another is to switch to USER_OBJECTS, whose object_type column says which object is which.
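For instance, one way (a sketch, not tested against your schema) is to keep only the columns whose owning object really is a table, by joining to USER_OBJECTS:

```sql
-- Count columns per object, restricted to real tables only.
select c.table_name, count(*) as column_count
from user_tab_columns c
join user_objects o
  on o.object_name = c.table_name
where o.object_type = 'TABLE'
group by c.table_name
order by c.table_name;
```

A count of the rows this returns should line up much more closely with DBA_TABLES than the raw TAB_COLUMNS numbers did.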

Query to find the count of columns for all tables in impala/hive on Hue

I am trying to fetch a count of total columns for a list of individual tables/views in Impala, all from the same schema.
However, I want to scan through all the tables in that schema and capture the column counts in a single query.
I have already performed a similar exercise on Oracle Exadata, but since I am new to Impala, is there a way to do this?
Oracle Exadata query i used
select owner, table_name as view_name, count(*) as counts
from dba_tab_cols /*DBA_TABLES_COLUMNS*/
where (owner, table_name) in
(
    select owner, view_name
    from dba_views /*DBA_VIEWS*/
    where owner = 'DESIRED_SCHEMA_NAME'
)
group by owner, table_name
order by counts desc;
Impala
In Hive 3.0 and up, there is an INFORMATION_SCHEMA database that can be queried from Hue to get the column info you need.
Impala is still behind: the JIRAs IMPALA-554 (Implement INFORMATION_SCHEMA in Impala) and IMPALA-1761 remain unresolved.
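Until those land, one workaround is to script it: loop over the SHOW TABLES output with impala-shell and count the lines DESCRIBE prints per table. The counting step can be sketched with canned DESCRIBE output (the table and column names here are invented, and the impala-shell invocation is only described in the comments):

```shell
# Canned `DESCRIBE my_table` output (name / type), standing in for what
# `impala-shell -B -q "DESCRIBE my_table"` would return for a live table:
describe_output='id int
name string
created timestamp'

# Each data row of DESCRIBE corresponds to one column of the table.
ncols=$(printf '%s\n' "$describe_output" | wc -l)
echo "my_table has $ncols columns"
```

Wrapping this in a loop over `SHOW TABLES IN schema_name` output would give per-table column counts for the whole schema.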

How to select the last table from a list of hive tables?

I have a list of hive tables and want to select the last table for performing some query.
Here is what I use to get the list of similar hive tables.
show tables 'test_temp_table*';
It displays the below result
test_temp_table_1
test_temp_table_2
test_temp_table_3
test_temp_table_4
test_temp_table_5
test_temp_table_6
I need to run a query on test_temp_table_6. I can do this with a shell script by writing the output to a temp file and reading the last value from it, but is there a simpler way, using a Hive query, to get the last table, i.e. the one with the maximum number at the end?
Using shell:
last_table=$(hive -e "show tables 'test_temp_table*';" | sort -r | head -n1)
You can actually run a "select query" on the Hive metastore based on tablenames (and then use regular sql sorting using ORDER BY DESC and LIMIT 1) instead of using "SHOW TABLES", by following the approach mentioned here: Query Hive Metadata Store for table metadata
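One caveat with the sort -r approach: it sorts lexically, so test_temp_table_10 would sort before test_temp_table_9. Comparing the trailing suffix numerically avoids that; a sketch using simulated SHOW TABLES output (the names are illustrative):

```shell
# Simulated `show tables` output, including a two-digit suffix:
tables='test_temp_table_1
test_temp_table_9
test_temp_table_10'

# Pick the table with the largest numeric suffix (last _-separated field).
last_table=$(printf '%s\n' "$tables" \
  | awk -F_ '$NF + 0 > max { max = $NF + 0; best = $0 } END { print best }')
echo "$last_table"
```

In a real script, the `tables` variable would be filled from `hive -e "show tables 'test_temp_table*';"` instead.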

Hive count(*) query is not invoking mapreduce

I have external tables in Hive. When I run a select count(*) from table_name query, it returns instantaneously with a result that I think is already stored, and that result is not correct. Is there a way to force a MapReduce job and make the query execute each time?
Note: this behavior occurs for only some of the external tables, not all of them.
Versions used : Hive 0.14.0.2.2.6.0-2800, Hadoop 2.6.0.2.2.6.0-2800 (Hortonworks)
After some digging, I found a method that kicks off MapReduce to count the number of records in an ORC table.
ANALYZE TABLE table_name PARTITION(partition_columns) COMPUTE STATISTICS;
-- or, for an unpartitioned table:
ANALYZE TABLE table_name COMPUTE STATISTICS;
This is not a direct alternative to count(*), but it provides an up-to-date count of the records in the table.
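Once the statistics are computed, the stored count can be read back without scanning the data; a sketch (my_table is a placeholder name):

```sql
-- After ANALYZE, read the stored count back from the table metadata;
-- numRows appears under "Table Parameters" in the output.
DESCRIBE FORMATTED my_table;
```

This is only as fresh as the last ANALYZE run, so it pairs with the statement above rather than replacing count(*).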
Doing a wc -l on ORC data won't give you an accurate result, since the data is encoded. This would work if the data was stored in a simple text file format with one row per line.
Hive does not need to launch a MapReduce for count(*) of an ORC file since it can use the ORC metadata to determine the total count.
Use the orcfiledump command to analyse ORC data from the command line
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility
From personal experience, COUNT(*) on an ORC table usually returns wrong figures -- i.e. it returns the number of rows on the first data file only. If the table was fed by multiple INSERTs then you are stuck.
With V0.13 you could fool the optimizer into running a dummy M/R job by adding a dummy "where 1=1" clause -- takes much longer, but actually counts the rows.
With 0.14 the optimizer got smarter, you must add a non-deterministic clause e.g. "where MYKEY is null". Assuming that MYKEY is a String, otherwise the "is null" clause may crash your query -- another ugly ORC bug.
By the way, a SELECT DISTINCT on partition key(s) will also return wrong results -- all existing partitions will be shown, even the empty ones. Not specific to ORC this time.
Please try the following: run set hive.fetch.task.conversion=none; in your Hive session, then trigger the select count(*) operation in the same session to force MapReduce.
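Put together, the session might look like this (a sketch; hive.compute.query.using.stats is a related setting worth checking on newer Hive versions, since it lets the optimizer answer COUNT(*) from stored statistics):

```sql
-- Disable the fetch-task shortcut so COUNT(*) runs a real job:
SET hive.fetch.task.conversion=none;
-- On newer Hive versions, also stop the optimizer answering from stats:
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM table_name;
```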

"Insert Into" clause in sybase 15.5

I am inserting some records (~10k) into a temporary table using the insert into clause.
I prepared a select which picks the indexes and performs better.
But when I use the same select with the insert into clause, it results in a table scan.
my query looks like this
Insert into tmpio..table
select top 10000 Column_names
from Table
where <criteria>
If I check the query plan for the SELECT query alone, I can see that it picks the index, but for the entire query including the INSERT INTO it doesn't pick any index.
Is this behaviour normal?
Are indexes of no use when you are selecting data from one table and inserting it directly into another?
These are my assumptions prior to writing the query.
Target table should not have any indexes to improve performance.
The source table can have indexes and we can use them.
It looks like Sybase doesn't match the index in some way.
It could be the "top 10000" that messes things up.
If you know what index to use you could force sybase to use it by entering:
select top 10000 Column_names from Table (index index_name)
Aside from that solution, you might also consider the "select into" approach instead. It's faster and more efficient since it doesn't use the transaction log.
select top 10000 Column_names into tmpio..table
from Table
where <criteria>
P.S.
Note: when using select top N, the query is still executed fully; the data page reads just stop after the specified number of rows is affected.
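To confirm whether the hint actually changes the access path, Sybase ASE's showplan output can be compared with and without it; a sketch reusing the placeholders above:

```sql
-- Show the access path chosen for each statement in this session.
set showplan on
go
select top 10000 Column_names into tmpio..table
from Table (index index_name)
where <criteria>
go
set showplan off
go
```

Running the plain version and the hinted version back to back makes the table scan vs. index access difference easy to spot.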
