ArrayFormula - If cell contains match, combine other cells with TEXTJOIN - filter

I have a Google Sheet that contains names of characters, together with corresponding values for the group name, "selected" and attack power. It looks like this:
Sheet1
| NAME | GROUP NAME | SELECTED | ATTACK POWER |
|:---------|:-----------|----------:|-------------:|
| guile | Team Red | 1 | 333 |
|----------|------------|-----------|--------------|
| blanka | Team Red | 1 | 50 |
|----------|------------|-----------|--------------|
| sagat | Team Red | | 500 |
|----------|------------|-----------|--------------|
| ruy | Team Blue | 1 | 450 |
|----------|------------|-----------|--------------|
| vega | Team Blue | 2 | 150 |
Sheet2
In my second sheet, I have two columns. Group name, which contains names of each team from Sheet1 and names, which contains my current ArrayFormula:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT('Sheet1'!A:A; 1*('Sheet1'!B:B=A2))))
Using this formula I can combine all characters into one cell (with textjoin, repeated with row breaks) based on the value in Group name. The result looks like the following:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
| | sagat |
|------------|---------------------------|
| Team Blue | ruy |
| | vega |
|------------|---------------------------|
The problem is that I only want to combine the characters that have a selected value of 1. The end result should instead look like this:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
|------------|---------------------------|
| Team Blue | ruy |
|------------|---------------------------|
I tried the following setup using an IF statement, but it just returns a string of FALSE values:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT(IF('Sheet1'!C:C="1";'Sheet1'!A:A); 1*('Sheet1'!B:B=A2))))
Can this be done?

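Paste into cell F2 to list every group that has at least one selected character:
=UNIQUE(FILTER(B:B, C:C=1))
Paste into cell G2 and drag down:
=TEXTJOIN(CHAR(10), 1, FILTER(A:A, B:B=F2, C:C=1))
These two formulas assume they sit on the same sheet as the data; from another sheet, prefix the ranges with Sheet1!. If your locale uses semicolons as argument separators, as in your formulas, replace the commas with semicolons.
Alternatively, keep your REPT approach and use this in G2: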
=ARRAYFORMULA(TEXTJOIN(CHAR(10), 1,
REPT(FILTER(Sheet1!A:A, Sheet1!C:C=1), 1*(FILTER(Sheet1!B:B, Sheet1!C:C=1)=F2))))
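To check the logic outside the spreadsheet, here is a rough Python sketch of what the two formulas compute. The row tuples and the function names are made up for illustration; only the sample data comes from the question.

```python
# Rows mirror Sheet1: (name, group, selected, attack_power); None marks an empty cell.
SHEET1 = [
    ("guile", "Team Red", 1, 333),
    ("blanka", "Team Red", 1, 50),
    ("sagat", "Team Red", None, 500),
    ("ruy", "Team Blue", 1, 450),
    ("vega", "Team Blue", 2, 150),
]


def selected_groups(rows):
    """Roughly UNIQUE(FILTER(B:B, C:C=1)): groups with a selected row, in first-seen order."""
    return list(dict.fromkeys(group for _, group, selected, _ in rows if selected == 1))


def join_selected(rows, group):
    """Roughly TEXTJOIN(CHAR(10), 1, FILTER(A:A, B:B=group, C:C=1))."""
    return "\n".join(
        name for name, g, selected, _ in rows if g == group and selected == 1
    )


for group in selected_groups(SHEET1):
    print(group, "->", join_selected(SHEET1, group).split("\n"))
```

Note that the comparison is against the number 1, not the text "1", which is the same change the formulas above make to the original attempt.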


How to split a row where each cell holds two values separated by a carriage return?

Someone gives me a file that sometimes contains malformed data.
The data should look like this:
+---------+-----------+--------+
| Name | Initial | Age |
+---------+-----------+--------+
| Jack | J | 43 |
+---------+-----------+--------+
| Nicole | N | 12 |
+---------+-----------+--------+
| Mark | M | 22 |
+---------+-----------+--------+
| Karine | K | 25 |
+---------+-----------+--------+
Sometimes it comes like this, though:
+---------+-----------+--------+
| Name | Initial | Age |
+---------+-----------+--------+
| Jack | J | 43 |
+---------+-----------+--------+
| Nicole | N | 12 |
| Mark | M | 22 |
+---------+-----------+--------+
| Karine | K | 25 |
+---------+-----------+--------+
As you can see, Nicole and Mark are put in the same row, with their values separated by a carriage return.
I can split into rows, but that multiplies the data:
+---------+-----------+--------+
| Nicole | N | 12 |
| | M | 22 |
+---------+-----------+--------+
| Mark | N | 12 |
| | M | 22 |
+---------+-----------+--------+
This loses the fact that Mark is associated with the second line of data.
(The data here is purely an example)
One way to do this is to transform each cell into a list by doing a Text.Split on the line feed / carriage return symbol.
TextSplit = Table.TransformColumns(Source,
{
{"Name", each Text.Split(_,"#(lf)"), type text},
{"Initial", each Text.Split(_,"#(lf)"), type text},
{"Age", each Text.Split(_,"#(lf)"), type text}
}
)
Now each column is a list of lists, which you can flatten into one long list using List.Combine. You can then glue these columns together into a table with Table.FromColumns.
= Table.FromColumns(
{
List.Combine(TextSplit[Name]),
List.Combine(TextSplit[Initial]),
List.Combine(TextSplit[Age])
},
{"Name", "Initial", "Age"}
)
Putting this together, the whole query looks like this:
let
Source = <Your data source>,
TextSplit = Table.TransformColumns(Source,{{"Name", each Text.Split(_,"#(lf)"), type text},{"Initial", each Text.Split(_,"#(lf)"), type text},{"Age", each Text.Split(_,"#(lf)"), type text}}),
FromColumns = Table.FromColumns({List.Combine(TextSplit[Name]),List.Combine(TextSplit[Initial]),List.Combine(TextSplit[Age])},{"Name","Initial","Age"})
in
FromColumns
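For anyone who wants to see the same split-then-combine idea outside Power Query, here is a minimal Python sketch. The dict-of-columns layout and the function name are assumptions made for the example, not part of the query above.

```python
def split_columns(table, sep="\n"):
    """Split every cell on `sep` and flatten each column into one long list.

    `table` maps a column name to its list of cell strings, which plays the
    role of Text.Split followed by List.Combine for each column.
    """
    return {
        column: [part for cell in cells for part in cell.split(sep)]
        for column, cells in table.items()
    }


source = {
    "Name": ["Jack", "Nicole\nMark", "Karine"],
    "Initial": ["J", "N\nM", "K"],
    "Age": ["43", "12\n22", "25"],
}

result = split_columns(source)
for row in zip(result["Name"], result["Initial"], result["Age"]):
    print(row)
```

As with the query, this only lines up correctly if every cell in a given row contains the same number of lines; otherwise the columns drift out of step.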

<blockquote> tag inserted when using image in cell of RST table?

When I use the following code:
+----------------------+---------------+---------------------------------------------------------------------+
| A                    | B             | C                                                                   |
+======================+===============+=====================================================================+
| Merchant Rating      | Ad Extension  | Star ratings plus number of reviews for the advertiser/merchant.    |
|                      |               |                                                                     |
|                      |               |.. image:: /images/merchant-rating.png                               |
+----------------------+---------------+---------------------------------------------------------------------+
The text preceding the image in column C gets wrapped in <blockquote> tags in the HTML output. Is there any way to avoid this?
To avoid the blockquote tag in the first paragraph of the third column, you could try using this:
+----------------------+---------------+---------------------------------------------------------------------+
| A                    | B             | C                                                                   |
+======================+===============+=====================================================================+
| Merchant Rating      | Ad Extension  | Star ratings plus number of reviews for the advertiser/merchant.    |
|                      |               |                                                                     |
|                      |               | |img|                                                               |
+----------------------+---------------+---------------------------------------------------------------------+

.. |img| image:: /images/merchant-rating.png
With this markup you get two paragraphs instead of a blockquote.
Alternatively, use a substitution and also remove the separating blank line, so that Sphinx interprets the content as a single block of text.
+-----------------+--------------+------------------------------------------------------------------+
| A               | B            | C                                                                |
+=================+==============+==================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
|                 |              | |img|                                                            |
+-----------------+--------------+------------------------------------------------------------------+

.. |img| image:: /images/merchant-rating.png

Split a single row into multiple rows with grouping data check - Hive

I'm currently using the query below in Hive to split a row into multiple rows. I also want to pair the "Product" column with the "Category" column: groups are matched by position, with ";" separating the groups and "," separating the items within a group.
SELECT id, customer, prodcut_split
FROM orders lateral view explode(split(product,';')) products AS prodcut_split
Here is what my data looks like now:
| id | Customer| Category | Product |
+----+----------+---------------------------+-----------------------------------+
| 1 | John | Furniture; Technology | Bookcases, Chairs; Phones, Laptop |
| 2 | Bob | Office supplies; Furniture| Paper, Blinders; Tables |
| 3 | Dylan | Furniture | Tables, Chairs, Bookcases |
My desired result looks like this:
| id | Customer| Category | Product |
+----+----------+----------------+-----------+
| 1 | John | Furniture | Bookcases |
| 1 | John | Furniture | Chairs |
| 1 | John | Technology | Phones |
| 1 | John | Technology | Laptop |
| 2 | Bob | Office supplies| Paper |
| 2 | Bob | Office supplies| Blinders |
| 2 | Bob | Furniture | Tables |
| 3 | Dylan | Furniture | Tables |
| 3 | Dylan | Furniture | Chairs |
| 3 | Dylan | Furniture | Bookcases |
I tried the following and it works well. All credit goes to this question: Hive - Split delimited columns over multiple rows, select based on position
-- Pair the n-th category with the n-th product group, then split each group into items.
select id, customer, category_split AS category, trim(products) AS product
from
(
SELECT id, customer, trim(category_split) AS category_split, product_split
FROM orders
lateral VIEW posexplode(split(category,';')) c AS pos_category, category_split
lateral VIEW posexplode(split(product,';')) p AS pos_product, product_split
WHERE pos_category = pos_product) a
lateral view explode(split(product_split,',')) e AS products
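The positional matching is easier to follow in a small Python sketch. This is only an illustration of the idea, with a made-up function name and the sample rows from the question; it is not a replacement for the Hive query.

```python
def explode_orders(rows):
    """Pair the n-th category with the n-th product group, then split each group."""
    out = []
    for order_id, customer, categories, products in rows:
        # zip() matches groups by position, like WHERE pos_category = pos_product.
        for category, group in zip(categories.split(";"), products.split(";")):
            for product in group.split(","):
                out.append((order_id, customer, category.strip(), product.strip()))
    return out


orders = [
    (1, "John", "Furniture; Technology", "Bookcases, Chairs; Phones, Laptop"),
    (2, "Bob", "Office supplies; Furniture", "Paper, Blinders; Tables"),
    (3, "Dylan", "Furniture", "Tables, Chairs, Bookcases"),
]

for row in explode_orders(orders):
    print(row)
```

The strip() calls remove the spaces that follow each ";" and "," in the sample data, which is what trim() would do on the Hive side.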

Efficient way to join by levenshtein in Hive or Impala

I have two tables: one (NLIST) has about 17K records and the other (FNAMES) has about 57K.
I would like to join them by comparing the records using the Levenshtein distance.
Here is the example for the content of tables:
Table NLIST:
+------+-------------+
| ID | S_NAME |
+------+-------------+
| 1 | Avi |
| 2 | Moshe |
| 3 | David |
....
Table FNAMES:
+------+-------------+
| ID | NICKNAMES |
+------+-------------+
| 1 | Avile |
| 2 | Dudi |
| 3 | Moshiko |
| 4 | Avi |
| 5 | DAVE |
....
The above tables are just examples. In the real case the names column can include more than one word.
The required result should be:
+------+-------------+--------+
| ID | NICKNAMES | S_NAME |
+------+-------------+--------+
| 1 | Avile | Avi |
| 2 | Dudi | David |
| 3 | Moshiko | Moshe |
| 4 | Avi | Avi |
| 5 | DAVE | David |
...
Here is the code I use:
select FNAMES.NICKNAMES, NLIST.S_NAME
from FNAMES
LEFT OUTER JOIN NLIST
ON(true)
WHERE levenshtein (FNAMES.NICKNAMES, NLIST.S_NAME) <=4
The above code runs for a very long time, and I ended up stopping it.
How can I make it run in a reasonable time?
In addition, I think the levenshtein distance depends on the length of the words. How can I find the optimal value for the distance (in this case I chose 4 arbitrarily)?
Hive query performance depends on several factors:
Query engine
File format
Vectorization, which you can enable with:
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
If you have a capable server, you can also try Impala, which is often faster than Hive for this kind of query.
Tuning Impala can speed the query up further; see "Tuning Impala for Performance" in the Impala documentation.
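On the question of how the distance relates to word length, here is a hedged Python sketch, not Hive code. It scales the allowed distance with the longer of the two strings and skips pairs whose length difference already exceeds that limit, which is safe because the Levenshtein distance is never smaller than the difference in lengths. The 0.5 ratio and the function names are assumptions chosen for illustration, not recommended values.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_matches(nicknames, names, max_ratio=0.5):
    """Map each nickname to its closest name, or None if nothing is close enough."""
    result = {}
    for nick in nicknames:
        best = None
        for name in names:
            limit = int(max_ratio * max(len(nick), len(name)))
            if abs(len(nick) - len(name)) > limit:
                continue  # the distance is at least the length difference
            distance = levenshtein(nick.lower(), name.lower())
            if distance <= limit and (best is None or distance < best[0]):
                best = (distance, name)
        result[nick] = best[1] if best else None
    return result


print(best_matches(["Avile", "Avi", "Moshiko"], ["Avi", "Moshe", "David"]))
```

This still compares every pair, so it does not remove the cross join; the length check only avoids the expensive distance computation for pairs that cannot match. A similar length condition could probably be added to the WHERE clause of the query, but that has not been tested here.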

Display record count in listbox using multiple tables and fields

I need help with a query that I can't get to work correctly. I'm trying to build a select box that displays the number of records associated with each theme. For some themes it works, but for others it displays (0) when there are in fact 2 records. Any help would be greatly appreciated. Below are my current query and table structure:
SELECT theme.id_theme, theme.theme, calender.start_date,
calender.id_theme1,calender.id_theme2, calender.id_theme3, COUNT(*) AS total
FROM theme, calender
WHERE (YEAR(calender.start_date) = YEAR(CURDATE())
AND MONTH(calender.start_date) > MONTH(CURDATE()) )
AND (theme.id_theme=calender.id_theme1)
OR (theme.id_theme=calender.id_theme2)
OR (theme.id_theme=calender.id_theme3)
GROUP BY theme.id_theme
ORDER BY theme.theme ASC
THEME table
|---------------------|
| id_theme | theme |
|----------|----------|
| 1 | Yoga |
| 2 | Music |
| 3 | Taichi |
| 4 | Dance |
| 5 | Coaching |
|---------------------|
CALENDAR table
|---------------------------------------------------------------------------|
| id_calender | id_theme1 | id_theme2 | id_theme3 | start_date | end_date |
|-------------|-----------|-----------|-----------|------------|------------|
| 1 | 2 | 4 | | 2015-07-24 | 2015-08-02 |
| 2 | 4 | 1 | 5 | 2015-08-06 | 2015-08-22 |
| 3 | 1 | 3 | 2 | 2014-10-11 | 2015-10-28 |
|---------------------------------------------------------------------------|
LISTBOX
|----------------|
| |
| Yoga (1) |
| Music (1) |
| Taichi (0) |
| Dance (2) |
| Coaching (1) |
|----------------|
Thanking you in advance
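I think the theme conditions should be wrapped in parentheses. AND binds more tightly than OR, so without them the date filter applies only to the id_theme1 comparison: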
((theme.id_theme=calender.id_theme1)
OR (theme.id_theme=calender.id_theme2)
OR (theme.id_theme=calender.id_theme3))
Hope this helps.
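To see the effect of the parentheses on the sample data, here is a small sketch using Python's built-in sqlite3 module. It is a simplification under stated assumptions: the YEAR/MONTH/CURDATE filter is replaced by a plain comparison against a fixed date so the result is reproducible, and the function name is made up.

```python
import sqlite3


def theme_counts(parenthesized, today="2015-07-01"):
    """Count calendar rows per theme, with or without parentheses around the ORs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE theme (id_theme INTEGER, theme TEXT);
        CREATE TABLE calender (id_calender INTEGER, id_theme1 INTEGER,
                               id_theme2 INTEGER, id_theme3 INTEGER,
                               start_date TEXT, end_date TEXT);
        INSERT INTO theme VALUES (1, 'Yoga'), (2, 'Music'), (3, 'Taichi'),
                                 (4, 'Dance'), (5, 'Coaching');
        INSERT INTO calender VALUES
            (1, 2, 4, NULL, '2015-07-24', '2015-08-02'),
            (2, 4, 1, 5,    '2015-08-06', '2015-08-22'),
            (3, 1, 3, 2,    '2014-10-11', '2015-10-28');
    """)
    match = ("theme.id_theme = calender.id_theme1 "
             "OR theme.id_theme = calender.id_theme2 "
             "OR theme.id_theme = calender.id_theme3")
    if parenthesized:
        match = "(" + match + ")"
    # Assumption: a simple date comparison stands in for the original date filter.
    sql = f"""SELECT theme.theme, COUNT(*) FROM theme, calender
              WHERE calender.start_date > ? AND {match}
              GROUP BY theme.id_theme, theme.theme"""
    rows = con.execute(sql, (today,)).fetchall()
    con.close()
    return dict(rows)


print("with parentheses:   ", theme_counts(True))
print("without parentheses:", theme_counts(False))
```

Without the parentheses, the 2014 row is still counted for any theme that matches id_theme2 or id_theme3, because the date condition is only ANDed with the first comparison. Themes with no matching rows are simply absent from the result; showing them as (0) would need an outer join from theme.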
