Split a single row into multiple rows with grouping data check - Hive - hadoop

Now I'm using the query below in hive to split a row into multiple rows, but I also want to group a "Product" column based on "Category" column each group will match by the order of the group and have ";" to sperate each group and have "," separate item in the group.
SELECT id, customer, prodcut_split
FROM orders lateral view explode(split(product,';')) products AS prodcut_split
Here is my data look like now
| id | Customer| Category | Product |
+----+----------+---------------------------+-----------------------------------+
| 1 | John | Furniture; Technology | Bookcases, Chairs; Phones, Laptop |
| 2 | Bob | Office supplies; Furniture| Paper, Blinders; Tables |
| 3 | Dylan | Furniture | Tables, Chairs, Bookcases |
my desired result will look like:
| id | Customer| Category | Product |
+----+----------+----------------+-----------+
| 1 | John | Furniture | Bookcases |
| 1 | John | Furniture | Chairs |
| 1 | John | Technology | Phones |
| 1 | John | Technology | Laptop |
| 2 | Bob | Office supplies| Paper |
| 2 | Bob | Office supplies| Blinders |
| 2 | Bob | Furniture | Tables |
| 3 | Dylan | Furniture | Tables |
| 3 | Dylan | Furniture | Chairs |
| 3 | Dylan | Furniture | Bookcases |

I have tried this one and it's work well, all credit goes to this question: Hive - Split delimited columns over multiple rows, select based on position
select id,customer ,category, products
from
(
SELECT id, category, product
FROM tale_name
lateral VIEW posexplode(split(category,';')) category AS pos_category, category_split
lateral VIEW posexplode(split(product,';')) product AS pos_product, product_split
WHERE pos_category = pos_product) a
lateral view explode(split(product_split,',')) product_split AS products

Related

ArrayFormula - If cell contains match, combine other cells with TEXTJOIN

I have a Google Sheet that contains names of characters, together with corresponding values for the group name, "selected" and attack power. It looks like this:
Sheet1
| NAME | GROUP NAME | SELECTED | ATTACK POWER |
|:---------|:-----------|----------:|-------------:|
| guile | Team Red | 1 | 333 |
|----------|------------|-----------|--------------|
| blanka | Team Red | 1 | 50 |
|----------|------------|-----------|--------------|
| sagat | Team Red | | 500 |
|----------|------------|-----------|--------------|
| ruy | Team Blue | 1 | 450 |
|----------|------------|-----------|--------------|
| vega | Team Blue | 2 | 150 |
Sheet2
In my second sheet, I have two columns. Group name, which contains names of each team from Sheet1 and names, which contains my current ArrayFormula:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT('Sheet1'!A:A; 1*('Sheet1'!B:B=A2))))
Using this formula I can combine all characters into one cell (with textjoin, repeated with row breaks) based on the value in Group name. The result looks like the following:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
| | sagat |
|------------|---------------------------|
| Team Blue | ruy |
| | vega |
|------------|---------------------------|
The problem is that I only want to combine the characters with having a selected value of 1. End-result should instead look like this:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
|------------|---------------------------|
| Team Blue | ruy |
|------------|---------------------------|
I tried the following setup using a IF-statement, but it just returns a string of FALSE:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT(IF('Sheet1'!C:C="1";'Sheet1'!A:A); 1*('Sheet1'!B:B=A2))))
Can this be one?
paste in F2 cell:
=UNIQUE(FILTER(B:B, C:C=1))
paste in G2 cell and drag down:
=TEXTJOIN(CHAR(10), 1, FILTER(A:A, B:B=F2, C:C=1))
or G2 cell be like:
=ARRAYFORMULA(TEXTJOIN(CHAR(10), 1,
REPT(FILTER(Sheet1!A:A, Sheet1!C:C=1), 1*(FILTER(Sheet1!B:B, Sheet1!C:C=1)=F2))))

Use data gotten from one table to get data from another data

I have three tables
Job Model
+---------------------------+
| id | name |
+---------------------------+
| 1 | web design |
+---------------------------+
| 2 | desktop development |
+---------------------------+
Applicant Model
+----------------------------------------------------------+
| id | job_id | user_id | desc |
+----------------------------------------------------------+
| 1 | 2 | 1 | I am an expert developer |
+----------------------------------------------------------+
| 2 | 2 | 2 | I am good |
+----------------------------------------------------------+
User Model
+----------------+
| id | name |
+----------------+
| 1 | john |
+----------------+
| 2 | steve |
+----------------+
Using eloquent laravel,
How do I get applicants with job_id "2" and their user name together.
Tried a lot but unfortunately to no avail.
You would use a query something like this:
$jobs = Job::where('id', 2)
->join('applicants', 'jobs.id', '=', 'applicants.job_id')
->join('users', 'applicants.user_id', '=', 'users.id')
->select(['user.name', 'job.name', 'applicant.desc'])
->get();
You will have to change it to your requirements but should about what you need.
You can get more info here:
https://laravel.com/docs/5.7/queries
Hope that helps

Efficient way to join by levenshtein in Hive or Impala

I have two tables one includes about 17K (NLIST) records while the other 57K (FNAMES).
I would like to join the both by comparing the records using levenshtein formula.
Here is the example for the content of tables:
Table NLIST:
+------+-------------+
| ID | S_NAME |
+------+-------------+
| 1 | Avi |
| 2 | Moshe |
| 3 | David |
....
Table FNAMES:
+------+-------------+
| ID | NICKNAMES |
+------+-------------+
| 1 | Avile |
| 2 | Dudi |
| 3 | Moshiko |
| 4 | Avi |
| 5 | DAVE |
....
The above tables are just examples. In the real case the names column can include more than one word.
The required result should be:
+------+-------------+--------+
| ID | NICKNAMES | S_NAME |
+------+-------------+--------+
| 1 | Avile | Avi |
| 2 | Dudi | David |
| 3 | Moshiko | Moshe |
| 4 | Avi | Avi |
| 5 | DAVE | David |
...
Here is the code I use:
select FNAMES.NICKNAMES, NLIST.S_NAME
from NICKNAMES
LEFT OUTER JOIN NLIST
ON(true)
WHERE levenshtein (FNAMES.NICKNAMES, NLIST.S_NAME) <=4
The above code runs for a very long time and I stopped its running.
How can I make it run in a reasonable time?
In addition, I think the levenshtein distance depends on the length of the words. How can I find the optimal value for the distance (in this case I chose 4 arbitrarily)?
Hive Table performance is depends upon various point .
Query enginee
File format
use VECTORIZATION set hive.vectorized.execution.enabled = true;set hive.vectorized.execution.reduce.enabled = true;
If you have good server you can try with Impala and definitely it is faster than Hive.
You can do the fine tuning of impala which will give you an edge to execute this query faster .Tuning Impala for Performance

Elasticsearch index with jdbc driver

Sorry my english is bad
I am using elasticsearch and jdbc river. I have two table with many-to-many relations. For example:
product
+---+---------------+
| id| title |
+---+---------------+
| 1 | Product One |
| 2 | Product Two |
| 3 | Product Three |
| 4 | Product Four |
| 5 | Product Five |
+---+---------------+
product_category
+------------+-------------+
| product_id | category_id |
+------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 4 |
| 2 | 5 |
+------------+-------------+
category
+---+---------------+
| id| name |
+---+---------------+
| 1 | Category One |
| 2 | Category Two |
| 3 | Category Three|
| 4 | Category Four |
| 5 | Category Five |
+---+---------------+
I want to use array type.
{
"id": 1,
"name": "Product one",
"categories": {"Category One", "Category Two", "Category Three"}
},
How should I write a sql?
Use elasticsearch-jdbc structured objects with sql, no need to group_concat:
SELECT
product.id AS _id,
product.id,
title,
name AS categories
FROM product
LEFT JOIN (
SELECT *
FROM product_category
LEFT JOIN category
ON product_category.category_id = category.id
) t
ON product.id = t.product_id
Since river has been deprecated since ES v1.5, maybe run a standalone importer is better.

Display record count in listbox using multiple tables and fields

i need help with a query, can't get it to work correctly. What i'm trying to achieve is to have a select box displaying the number of records associated with a particular theme, for some theme it works well for some it displays (0) when infact there are 2 records, I'm wondering if someone could help me on this, your help would be greatly appreciated, please see below my actual query + table structure :
SELECT theme.id_theme, theme.theme, calender.start_date,
calender.id_theme1,calender.id_theme2, calender.id_theme3, COUNT(*) AS total
FROM theme, calender
WHERE (YEAR(calender.start_date) = YEAR(CURDATE())
AND MONTH(calender.start_date) > MONTH(CURDATE()) )
AND (theme.id_theme=calender.id_theme1)
OR (theme.id_theme=calender.id_theme2)
OR (theme.id_theme=calender.id_theme3)
GROUP BY theme.id_theme
ORDER BY theme.theme ASC
THEME table
|---------------------|
| id_theme | theme |
|----------|----------|
| 1 | Yoga |
| 2 | Music |
| 3 | Taichi |
| 4 | Dance |
| 5 | Coaching |
|---------------------|
CALENDAR table
|---------------------------------------------------------------------------|
| id_calender | id_theme1 | id_theme2 | id_theme3 | start_date | end_date |
|-------------|-----------|-----------|-----------|------------|------------|
| 1 | 2 | 4 | | 2015-07-24 | 2015-08-02 |
| 2 | 4 | 1 | 5 | 2015-08-06 | 2015-08-22 |
| 3 | 1 | 3 | 2 | 2014-10-11 | 2015-10-28 |
|---------------------------------------------------------------------------|
LISTBOX
|----------------|
| |
| Yoga (1) |
| Music (1) |
| Taichi (0) |
| Dance (2) |
| Coaching (1) |
|----------------|
Thanking you in advance
I think that themes conditions should be into brackets
((theme.id_theme=calender.id_theme1)
OR (theme.id_theme=calender.id_theme2)
OR (theme.id_theme=calender.id_theme3))
Hope this help

Resources