Logstash bulk update documents in Elasticsearch with WHERE condition

I have two tables in MySQL, like this:

Table DEPARTMENT

Id  Name
--  ------------
1   Department 1
2   Department 2

Table STAFF

Id  Department_Id  Name
--  -------------  -------
1   1              Staff 1
2   1              Staff 2
3   2              Staff 3
4   1              Staff 4
The STAFF table has about 10 million records.
All of the STAFF data has been pushed to Elasticsearch by Logstash. Each document in Elasticsearch currently has only three fields: Staff_Id, Staff_Name, and Department_Name. Something like this:
{
  "Staff_Id": 1,
  "Staff_Name": "Staff 1",
  "Department_Name": "Department 1"
}
Because of practical needs, I need to add one more field called Department_Id to each document. Note that this field (Department_Id) does not exist on existing documents.
I am a newbie to both Logstash and Elasticsearch. How can I do this with Logstash? Expressed in SQL, it would be:
SELECT * FROM DEPARTMENT;
UPDATE STAFF SET Department_Id = XXX WHERE Department_Name = YYY
Note that the DEPARTMENT table has about 100,000 records and Elasticsearch holds about 10 million documents.
Can you take a look?
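A minimal sketch of one way to do this with Logstash, assuming the documents were originally indexed with Staff_Id as their Elasticsearch _id and that the index is named staff (both assumptions), with placeholder connection settings:

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"   # placeholder
    jdbc_user => "user"                                            # placeholder
    jdbc_password => "password"                                    # placeholder
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"     # placeholder
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Each event carries a staff id plus the department id to add
    statement => "SELECT Id AS staff_id, Department_Id FROM STAFF"
  }
}
filter {
  # The jdbc input lowercases column names by default; restore the target field name
  mutate { rename => { "department_id" => "Department_Id" } }
  # Move the id into @metadata so it is not written into the document body
  mutate { add_field => { "[@metadata][staff_id]" => "%{staff_id}" } }
  mutate { remove_field => ["staff_id", "@version", "@timestamp"] }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "staff"                          # assumed index name
    document_id => "%{[@metadata][staff_id]}"
    action => "update"                        # partial update; existing fields are kept
  }
}

If the documents were not indexed with Staff_Id as their _id, an alternative is Elasticsearch's _update_by_query API run once per department, which matches the UPDATE ... WHERE shape of the SQL directly.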

Related

How to create a lot of categories for a card in Laravel Voyager

(screenshots omitted)
Now I have it like this (screenshot), but I want a card to have more categories, like this. How can I select multiple categories?
Let's say that in your case we need to define categories for every feature. A feature can have many categories and, inversely, a category can have many features. Thus, it is a many-to-many relationship.
To accomplish this we need to create three tables: features, categories, and an intermediate table, feature_category. The feature_category table has feature_id and category_id columns, which connect the features and categories tables; this intermediate table is called the pivot table.
Here are the table structures:
features
id - integer
name - string
categories
id - integer
name - string
feature_category
feature_id - integer
category_id - integer
=> categories

id  name
--  ----------
1   Category 1
2   Category 2
3   Category 3
4   Category 4

=> features

id  name
--  ---------
1   Feature 1
2   Feature 2
3   Feature 3
4   Feature 4

=> feature_category

id  feature_id  category_id
--  ----------  -----------
1   1           1
2   2           2
3   3           2
4   3           3
5   3           4
===============================
Our feature_category table before sync() operation:
id  feature_id  category_id
--  ----------  -----------
1   1           1
2   2           2
3   2           3
4   2           4
5   3           2
6   4           3
Laravel sync() example:
<?php
use App\Models\Feature;

// Feature 2 currently has categories 2, 3, and 4 attached
$feature = Feature::find(2);

// Keep only "Category 2" (id 2); sync() detaches the rest
$feature->categories()->sync([2]);
After performing the above operation, our feature_category table will look like below:
id  feature_id  category_id
--  ----------  -----------
1   1           1
2   2           2
5   3           2
6   4           3
Checkboxes or dropdowns can be used on the frontend to select multiple categories for a feature, and the sync() method can then be used to update the feature's categories accordingly.
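For reference, a minimal sketch of the Feature model backing that call (assuming standard Eloquent conventions and an existing Category model; the pivot table name is passed explicitly because feature_category does not follow Laravel's alphabetical category_feature default):

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsToMany;

class Feature extends Model
{
    public function categories(): BelongsToMany
    {
        // Explicit pivot table name, since it is feature_category
        // rather than the conventional category_feature
        return $this->belongsToMany(Category::class, 'feature_category');
    }
}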
First you need to create a new table, feature_category, with two fields:
feature_id (same type as features.id)
category_id (same type as categories.id)
Second, create the belongsToMany relationship directly in Voyager.
Example: (screenshot omitted)
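For the first step, a minimal migration sketch (assuming a recent Laravel where foreignId()/constrained() and anonymous class migrations are available; table and column names as in the answer):

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('feature_category', function (Blueprint $table) {
            // Same types as features.id and categories.id
            $table->foreignId('feature_id')->constrained()->cascadeOnDelete();
            $table->foreignId('category_id')->constrained()->cascadeOnDelete();
            $table->primary(['feature_id', 'category_id']);
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('feature_category');
    }
};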

Calculate last year group by ID with DAX

I have a table in an SSAS Tabular model like this:

Table 1

ID  END DATE
--  ----------
1   06/24/2016
1   06/24/2017
1   06/24/2018
2   08/08/2017
2   08/08/2016
3   12/12/2015
I would like to create a measure in DAX, in another related table. The output should be this:

Table 2

ID  MAXYEAR
--  -------
1   2018
1   2018
1   2018
2   2017
2   2017
3   2015
Please, WITHOUT USING EARLIER! My model is very large and I can't use that function.
Create a relationship between the two tables, assuming that Table 2 contains unique values for ID.
Create a year column from the end date:
Year = YEAR('Table 1'[END DATE])
After that, in Table 2 create a calculated column with the following code:
MaxYear = CALCULATE(MAX('Table 1'[Year]))
Table 2 should then look like the expected output shown above.
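If you need the same per-ID maximum as a calculated column in Table 1 itself, still without EARLIER, an ALLEXCEPT pattern is a common alternative (a sketch, assuming the Year column from the previous step):

MaxYear =
CALCULATE (
    MAX ( 'Table 1'[Year] ),
    ALLEXCEPT ( 'Table 1', 'Table 1'[ID] )
)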

SAS Sorting within group

I would like to sort this data by descending number of events and then by latest date, grouped by ID.
I have tried PROC SQL:
proc sql;
create table new as
select *
from old
group by ID
order by events desc, date desc;
quit;
The result I currently get is:

ID  Date        Events
--  ----------  ------
1   09/10/2015  3
1   27/06/2014  3
1   03/01/2014  3
2   09/11/2015  2
3   01/01/2015  2
2   16/10/2014  2
3   08/12/2013  2
4   08/10/2015  1
5   09/11/2014  1
6   02/02/2013  1
Although the dates and events are sorted in descending order, IDs with multiple events are no longer grouped together.
Would it be possible to achieve the below in fewer steps?
ID  Date        Events
--  ----------  ------
1   09/10/2015  3
1   27/06/2014  3
1   03/01/2014  3
3   01/01/2015  2
3   08/12/2013  2
2   09/11/2015  2
2   16/10/2014  2
4   08/10/2015  1
5   09/11/2014  1
6   02/02/2013  1
Thanks
It looks to me like you're trying to sort by descending event, then by either the earliest or latest date (I can't tell which one from your explanation), also descending, and then by id. In your proc sql query, you could try calculating the min or max of the Date variable, grouped by event and id, and then sort the result by descending event, the descending min/max of the date, and id.
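A sketch of that suggestion in PROC SQL (using max; swap in min, or flip the sort direction on the date, for the other reading):

proc sql;
  /* max(date) is remerged onto every row of its ID group
     (SAS notes the remerge in the log), so each row can be
     sorted by its group's latest date */
  create table new as
    select *,
           max(date) as max_date
      from old
      group by id
      order by events desc, max_date desc, id, date desc;
quit;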

Run a simple sql group by query in kibana 4

I want to run a simple SQL GROUP BY query in the Kibana 4 "Discover" page.
Each record in my Elasticsearch index represents a log and has 3 fields: process_id (not a unique value), log_time, log_message.
Example:
process_id  log_time          log_message
----------  ----------------  -----------
1           2014/12/11 01:00  msg1
1           2014/12/11 01:10  msg2
1           2014/12/11 01:20  msg3
2           2014/12/11 11:00  msg4
2           2014/12/11 11:10  msg5
I want to generate a table in Kibana that looks like:

process_id  first log_time    last log_time
----------  ----------------  ----------------
1           2014/12/11 01:00  2014/12/11 01:20
2           2014/12/11 11:00  2014/12/11 11:10
In SQL the query is simple:
select process_id, max(log_time), min(log_time)
from logs_table
group by process_id
How can I run this query in Kibana? Is it possible to run it on the "Discover" page, or should I create a panel (Visualize page)?
Thanks.
I'm on Kibana 4.3, but this is possible on any version of Kibana. You need to create a Visualization panel of type Data Table.
Before that, you need to make sure that you've created an index pattern for your index, with the log_time date field as the timestamp for your index.
Then you can create your Data Table visualization: a split-rows Terms aggregation on the process_id field, and then two metric aggregations (one Min and one Max) on the log_time date field. (screenshots omitted)
Your results will then look as expected.
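Under the hood, that Data Table runs a terms aggregation with min/max sub-aggregations. The equivalent raw query, as a sketch (index name logs assumed):

POST /logs/_search
{
  "size": 0,
  "aggs": {
    "by_process": {
      "terms": { "field": "process_id" },
      "aggs": {
        "first_log_time": { "min": { "field": "log_time" } },
        "last_log_time":  { "max": { "field": "log_time" } }
      }
    }
  }
}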

Oracle Self-Join on multiple possible column matches - CONNECT BY?

I have a query requirement from ----. I'm trying to solve it with CONNECT BY, but can't seem to get the results I need.
Table (simplified):
create table CSS.USER_DESC (
USER_ID VARCHAR2(30) not null,
NEW_USER_ID VARCHAR2(30),
GLOBAL_HR_ID CHAR(8)
)
-- USER_ID is the primary key
-- NEW_USER_ID is a self-referencing key
-- GLOBAL_HR_ID is an ID field from another system
There are two sources of user data (datafeeds)... I have to watch for mistakes in either of them when updating information.
Scenarios:
A user is given a new User ID... The old record is set accordingly and deactivated (typically a rename for contractors who become full-time).
A user leaves and returns sometime later. HR fails to send us the old user ID so we can connect the accounts.
The system screwed up and didn't set the new User ID on the old record.
The data can be bad in a hundred other ways
I need to know the following are the same user, and I can't rely on name or other fields... they differ among matching records:
ROOTUSER  NUMROOTS  NODELEVEL  ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID  USERTYPE    LAST_NAME        FIRST_NAME
--------  --------  ---------  ------  --------  -----------  ------------  ----------  ---------------  ----------
EX0T1100  2         1          0       EX0T1100  EX000005                   CONTRACTOR  VON DER HAAVEN   VERONICA
EX0T1100  2         2          1       EX000005               00126121      EMPLOYEE    HAAVEN, VON DER  VERONICA
GL110456  1         1          1       GL110456               00126121      EMPLOYEE    VONDERHAAVEN     VERONICA
EX0T1100 and EX000005 are connected properly by the NEW_USER_ID field. The rename occurred before there were global HR IDs, so EX0T1100 doesn't have one. EX000005 was given a new user ID, 'GL110456', and the two are only connected by having the same global HR ID.
Cleaning up the data isn't an option.
The query so far:
select connect_by_root cud.user_id RootUser,
count(connect_by_root cud.user_id) over (partition by connect_by_root cud.user_id) NumRoots,
level NodeLevel, connect_by_isleaf IsLeaf, --connect_by_iscycle IsCycle,
cud.user_id, cud.new_user_id, cud.global_hr_id,
cud.user_type_code UserType, cud.last_name, cud.first_name
from css.user_desc cud
where cud.user_id in ('EX000005','EX0T1100','GL110456')
-- Using this so I don't get sub-users in my list of root users...
-- It complicates the matches with GLOBAL_HR_ID, however
start with cud.user_id not in (select cudsub.new_user_id
from css.user_desc cudsub
where cudsub.new_user_id is not null)
connect by nocycle (prior new_user_id = user_id);
I've tried various CONNECT BY clauses, but none of them are quite right:
-- As a multiple CONNECT BY
connect by nocycle (prior global_hr_id = global_hr_id)
connect by nocycle (prior new_user_id = user_id)
-- As a compound CONNECT BY
connect by nocycle ((prior new_user_id = user_id)
or (prior global_hr_id = global_hr_id
and user_id != prior user_Id))
UNIONing two CONNECT BY queries doesn't work... I don't get the leveling.
Here is what I would like to see... I'm okay with a resultset that I have to distinct and use as a subquery. I'm also okay with any of the three user IDs in the ROOTUSER column... I just need to know they're the same users.
ROOTUSER  NUMROOTS  NODELEVEL  ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID  USERTYPE    LAST_NAME        FIRST_NAME
--------  --------  ---------  ------  --------  -----------  ------------  ----------  ---------------  ----------
EX0T1100  3         1          0       EX0T1100  EX000005                   CONTRACTOR  VON DER HAAVEN   VERONICA
EX0T1100  3         2          1       EX000005               00126121      EMPLOYEE    HAAVEN, VON DER  VERONICA
EX0T1100  3         (2 or 3)   1       GL110456               00126121      EMPLOYEE    VONDERHAAVEN     VERONICA
Ideas?
Update
Nicholas, your code looks very much like the right track... at the moment, the lead(user_id) over (partition by global_hr_id) gets false hits when the global_hr_id is null. For example:
USER_ID   NEW_USER_ID  CHAINNEWUSER  GLOBAL_HR_ID  LAST_NAME  FIRST_NAME
--------  -----------  ------------  ------------  ---------  ----------
FP004468               FP004469                    AARON      TIMOTHY
FP004469                                           FOONG      KOK WAH
I've often wanted to treat nulls as separate records in a partition, but I've never found a way to make IGNORE NULLS work. This did what I wanted:
decode(global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by global_hr_id order by user_id))
... but there's got to be a better way. I haven't been able to get the query to finish yet on the full-blown user data (about 40,000 users). Both global_hr_id and new_user_id are indexed.
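(A slightly tidier equivalent of that DECODE guard, sketched with NVL2; within a non-null GLOBAL_HR_ID partition USER_ID is never null, so IGNORE NULLS is not needed:)

nvl2(global_hr_id,
     lead(cud.user_id) over (partition by global_hr_id order by user_id),
     null)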
Update
The query returns after about 750 seconds... long, but manageable. It returns 93k records, because I don't have a good way of filtering level-2 hits out of the roots. You have start with global_hr_id is null, but unfortunately that isn't always the case. I'll have to think some more about how to filter those out.
I've tried adding more complex start with clauses before, but I find that separately, they run < 1 second... together, they take 90 minutes >.<
Thanks again for your help... plodding away at this.
You have provided a sample of data for only one user; it would be better to have a little bit more. Anyway, let's look at something like this.
with user_desc(user_id, new_user_id, global_hr_id) as (
  select 'EX0T1100', 'EX000005', null     from dual union all
  select 'EX000005', null,       00126121 from dual union all
  select 'GL110456', null,       00126121 from dual
)
select connect_by_root(user_id) rootuser
     , count(connect_by_root(user_id)) over (partition by connect_by_root(user_id)) numroot
     , level nodlevel
     , connect_by_isleaf
     , user_id
     , new_user_id
     , global_hr_id
  from (select user_id
             , coalesce(new_user_id, usr) new_user_id1
             , new_user_id
             , global_hr_id
          from (select user_id
                     , new_user_id
                     , global_hr_id
                     , decode(global_hr_id, null, null,
                              lead(user_id) over (partition by global_hr_id order by user_id)) usr
                  from user_desc))
 start with global_hr_id is null
connect by prior new_user_id1 = user_id;
Result:
ROOTUSER  NUMROOT  NODLEVEL  CONNECT_BY_ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID
--------  -------  --------  -----------------  --------  -----------  ------------
EX0T1100  3        1         0                  EX0T1100  EX000005
EX0T1100  3        2         0                  EX000005               126121
EX0T1100  3        3         1                  GL110456               126121
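On the remaining root-filtering problem: a root is simply a user that is nobody's successor under the synthesized chain, so the START WITH can be generalized along the lines of the asker's original not in pattern. A sketch only (untested, and performance at 40,000 users is an open question):

start with user_id not in (
  select new_user_id1
    from (select coalesce(new_user_id,
                          decode(global_hr_id, null, null,
                                 lead(user_id) over (partition by global_hr_id
                                                     order by user_id))) new_user_id1
            from user_desc)
   where new_user_id1 is not null
)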
