Count mismatch between Hive table and ES index in Hive-Elasticsearch integration

I created an Elasticsearch index using Hive.
I have one temp table into which I load all the raw data.
From that table I select some data based on some criteria and insert it into a table that is integrated with the Elasticsearch index.
After the index was created, I compared the counts in the Hive table (the main table, using the same criteria), in the table integrated with ES, and in the Elasticsearch index.
I found that the counts are not the same.
In ES index it is: 4663296
On table integrated with ES: 4663296 (same as ES)
but in Hive it is: 4611296 (main table, same criteria), which is less than in ES
So could someone please tell me why the count is higher in ES? It should be the same, am I right?
Thanks,
Rackto

It turned out that there were some duplicate records in ES.
So I now set the document id explicitly (using a key in the data that is always unique), and the counts match.
You just need to add one table property when creating the Hive table:
TBLPROPERTIES('......., 'es.mapping.id' = 'field_name_of_the_unique_id');
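As a rough illustration of why an explicit id helps, here is a minimal Python sketch. It models the index as a plain dict rather than calling Elasticsearch, and the `index_rows` helper, the `order_id` field, and the repeated write are all hypothetical, made up for the example. The idea is that a row written twice under auto-generated ids becomes two documents, while with a fixed id the second write simply overwrites the first.

```python
import uuid


def index_rows(rows, id_field=None):
    """Model an index as a dict mapping document id -> document."""
    index = {}
    for row in rows:
        if id_field is None:
            # Auto-generated id: every write creates a new document.
            doc_id = uuid.uuid4().hex
        else:
            # Explicit id taken from the data: a repeated write overwrites.
            doc_id = str(row[id_field])
        index[doc_id] = row
    return index


rows = [{"order_id": 1, "amount": 10}, {"order_id": 2, "amount": 20}]
# Hypothetical scenario: the first row is written twice.
writes = rows + [rows[0]]

auto_ids = index_rows(writes)
explicit_ids = index_rows(writes, id_field="order_id")
print(len(auto_ids), len(explicit_ids))
```

With auto-generated ids the count ends up higher than the number of source rows, which matches the kind of mismatch described in the question.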
Thanks

Related

Elastic Logstash script for both insert and update

I'm a newbie with the Elastic Stack.
I have an Oracle database table where rows get inserted daily at 12 AM, and later the column values get updated if any of them change.
I tried doc_as_upsert, but the values aren't updated; instead, new rows get inserted, creating duplicates of that data.
There is no unique id for that table.
Can I use the Elasticsearch id as a reference for updating the data?
Can anyone suggest a solution for this problem?
Thanks in advance

Does Elasticsearch perform a full table scan on my Oracle table every time Logstash runs?

I wanted to know whether Elasticsearch performs a full table scan on my Oracle table when I ingest that table's delta data using Logstash.
Elasticsearch doesn't interact with your database; it's Logstash that runs queries on your database. Whether Logstash scans the entire table depends on the query itself and on the indexes of the table being queried. Most queries run from Logstash look similar to this:
SELECT * FROM TABLE WHERE FIELD_FOR_DELTA > :sql_last_value;
If FIELD_FOR_DELTA doesn't have an index, Oracle will search through all records to find entries satisfying the condition. But when FIELD_FOR_DELTA has an index, Oracle will either search through a small portion of the table or will only check the record with the highest value and finish the query if that value is equal to or smaller than :sql_last_value. If you don't have an index on this field in your table, you should consider adding one, because it can improve query performance and lower the impact of Logstash on the database.
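The effect of the index can be seen in a small, self-contained sketch. Note the assumptions: it uses Python's built-in SQLite rather than Oracle, so the plan text and optimizer behavior differ from what Oracle would show, and the `events` table, `updated_at` column, and index name are made up for the example. It only illustrates the general point that the same delta query goes from a full scan to an index search once the delta column is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (updated_at, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1000)],
)

# Same shape as the Logstash delta query; the parameter plays the role
# of :sql_last_value.
DELTA_QUERY = "SELECT * FROM events WHERE updated_at > ?"


def query_plan(connection, last_value):
    """Return SQLite's plan description for the delta query."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + DELTA_QUERY, (last_value,)
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_without_index = query_plan(conn, 990)

conn.execute("CREATE INDEX idx_events_updated_at ON events (updated_at)")
plan_with_index = query_plan(conn, 990)

delta_rows = conn.execute(DELTA_QUERY, (990,)).fetchall()

print(plan_without_index)
print(plan_with_index)
print(len(delta_rows))
```

On Oracle itself, the equivalent check would be to look at the execution plan of the actual Logstash query before and after creating the index.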

Compare field values of two document types under one index

I have two MySQL tables, each containing thousands of rows. I have indexed all records into Elasticsearch using the Logstash JDBC plugin.
The idea is to compare the two tables and find the differences quickly using Elasticsearch, faster than with SQL.
I have an index called elasticdata and two document types, table1 and table2.
I have to compare column1 of each table1 document with all rows of the table2 document type. If a difference is found, print the value.
How can this be achieved in Elasticsearch?

Clustered and non-clustered index equivalent in Oracle?

I am coming from MS SQL Server, where clustered and non-clustered indexes play an important role.
But it looks like there is nothing of this sort in Oracle. I am sure there must be some equivalent of clustered and non-clustered indexes in Oracle. Can somebody shed some light on this?
When we say create index in Oracle, is it equivalent to a non-clustered index?
An index-organized table (IOT) in Oracle stores the data of the whole table sorted by a key, namely the primary key. So this is the closest thing to a clustered index in Oracle. All other indexes are non-clustered; their keys are sorted too, and each index entry carries a rowid that points to the actual data.

Indexing an Oracle IOT by non-pk fields

I have a table in SQL Server that needs its normal PK index to be replaced by a clustered index on two fields. These fields are not part of the primary key.
I need to do something similar in Oracle. As far as I know, this can be done using index-organized tables, but my guess is that these indexes are only built on the primary key.
Is there any way I can get behavior similar to SQL Server's clustered index in Oracle?
Index-organized tables (IOTs) are the Oracle concept that comes closest to clustered indexes in SQL Server. I found a discussion about the topic in the Oracle forum and one on asktom.
My question is: why do you want to reproduce this behavior? What benefit do you want to obtain?
A clustered index in SQL Server is usually the primary key index. The row data is stored in the index nodes. The Oracle concept for storing row data in the index is the index-organized table.
In Oracle, the IOT is used to avoid a second lookup of the row data in the table after the index lookup.
The purpose of a clustered index in SQL Server is to store the row data. A table can have only one clustered index, and that index holds the row data. Any other index is a non-clustered index.
IMHO the concept of a clustered index is tied to SQL Server's data storage, and there is no need to rebuild this behavior in Oracle. Oracle has other concepts for storing data.
Answer:
A regular index in Oracle is all you need to solve your problem.
