Re: Transpose data using Linq - linq

I have a table, in the following format,
|BallotNo | City | CandidateNo | Votes
|Box1 | AA | Cand1 | 1200
|Box1 | AA | Cand2 | 1500
|Box2 | BB | Cand1 | 2500
|Box2 | BB | Cand2 | 3600
uing linq, I want to a get a result in the format
|Box1 |AA |Cand1 |1200 |Cand2 |1500
|Box2 |BB |Cand1 |2500 |Cand2 |3600
Thanks

You are looking for a grouping option.
As I have understood, you need to group by City row, it is pretty easy, see the http://msdn.microsoft.com/library/bb534492.aspx link on how to use the GroupBy extension method.

Related

How can i merge multiple columns from two different files in talend

Lets say i have multiple columns coming from two different files like that :
USERNAME | AGE | GENDER | CHILDREN
Joe | 23 | male | 2
Annie | 45 | female | 5
| | |
And another one like this :
USERNAME | AGE |
Jonathan | 33 |
Mike | 41 |
And i want to merge the data of the columns that have the same name into one like this while keeping the data of the columns that are unique at each field:
USERNAME | AGE | GENDER | CHILDREN
Joe | 23 | male | 2
Annie | 45 | female | 5
Jonathan | 33 | |
Mike | 41 | |
Sorry if the answer is obvious, im new to talend, thanks.
What tool is available toy you?
The Append function in SAS for example can do this for you.
You can use the append approach in Python, R or other language you intend using.
For Talen:
Copy the complete subjob1 – copy me sub job and paste it to create a second sub job.
Link the two sub jobs using an onSubjobOK link.
Open tFixedFlowInput, and change Records from first subjob to Records from second subjob.
Open tFileOutputDelimited on the new sub job, and tick Append, as shown in the following screenshot:
use a tUnite component to accomplish that
here is the link of the documentation : https://help.talend.com/r/fr-FR/8.0/orchestration/tunite
your flow would be
tFileInput1(excel or csv ) ----------------------------------------------
|
| ->tUnite -> tLogRow
tFileInput2(excel or csv )->tMap (add to empty fields GENDER & Children )|

ArrayFormula - If cell contains match, combine other cells with TEXTJOIN

I have a Google Sheet that contains names of characters, together with corresponding values for the group name, "selected" and attack power. It looks like this:
Sheet1
| NAME | GROUP NAME | SELECTED | ATTACK POWER |
|:---------|:-----------|----------:|-------------:|
| guile | Team Red | 1 | 333 |
|----------|------------|-----------|--------------|
| blanka | Team Red | 1 | 50 |
|----------|------------|-----------|--------------|
| sagat | Team Red | | 500 |
|----------|------------|-----------|--------------|
| ruy | Team Blue | 1 | 450 |
|----------|------------|-----------|--------------|
| vega | Team Blue | 2 | 150 |
Sheet2
In my second sheet, I have two columns. Group name, which contains names of each team from Sheet1 and names, which contains my current ArrayFormula:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT('Sheet1'!A:A; 1*('Sheet1'!B:B=A2))))
Using this formula I can combine all characters into one cell (with textjoin, repeated with row breaks) based on the value in Group name. The result looks like the following:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
| | sagat |
|------------|---------------------------|
| Team Blue | ruy |
| | vega |
|------------|---------------------------|
The problem is that I only want to combine the characters with having a selected value of 1. End-result should instead look like this:
| GROUP NAME | NAME |
|:-----------|:--------------------------|
| Team Red | guile |
| | blanka |
|------------|---------------------------|
| Team Blue | ruy |
|------------|---------------------------|
I tried the following setup using a IF-statement, but it just returns a string of FALSE:
=ARRAYFORMULA(TEXTJOIN(CHAR(10); 1;
REPT(IF('Sheet1'!C:C="1";'Sheet1'!A:A); 1*('Sheet1'!B:B=A2))))
Can this be one?
paste in F2 cell:
=UNIQUE(FILTER(B:B, C:C=1))
paste in G2 cell and drag down:
=TEXTJOIN(CHAR(10), 1, FILTER(A:A, B:B=F2, C:C=1))
or G2 cell be like:
=ARRAYFORMULA(TEXTJOIN(CHAR(10), 1,
REPT(FILTER(Sheet1!A:A, Sheet1!C:C=1), 1*(FILTER(Sheet1!B:B, Sheet1!C:C=1)=F2))))

<blockquote> tag inserted when using image in cell of RST table?

When I use the following code:
+----------------------+---------------+---------------------------------------------------------------------+
| A | B | C |
+======================+===============+=====================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
| | | |
| | |.. image:: /images/merchant-rating.png |
+----------------------+---------------+---------------------------------------------------------------------+
The text preceding the image in column C gets wrapped in <blockquote> tags in the HTML output. Is there any way to avoid this?
To avoid the blockquote tag in the first paragraph of the third column, you could try using this:
+----------------------+---------------+---------------------------------------------------------------------+
| A | B | C |
+======================+===============+=====================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
| | | |
| | | |img| |
+----------------------+---------------+---------------------------------------------------------------------+
.. |img| image:: /images/merchant-rating.png
Instead, you'll get two paragraphs.
Use a substitution and remove the separating line so that Sphinx interprets the content as a single block of text.
+-----------------+--------------+------------------------------------------------------------------+
| A | B | C |
+=================+==============+==================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
| | | |img| |
+-----------------+--------------+------------------------------------------------------------------+
.. |img| image:: /images/merchant-rating.png

Efficient way to join by levenshtein in Hive or Impala

I have two tables one includes about 17K (NLIST) records while the other 57K (FNAMES).
I would like to join the both by comparing the records using levenshtein formula.
Here is the example for the content of tables:
Table NLIST:
+------+-------------+
| ID | S_NAME |
+------+-------------+
| 1 | Avi |
| 2 | Moshe |
| 3 | David |
....
Table FNAMES:
+------+-------------+
| ID | NICKNAMES |
+------+-------------+
| 1 | Avile |
| 2 | Dudi |
| 3 | Moshiko |
| 4 | Avi |
| 5 | DAVE |
....
The above tables are just examples. In the real case the names column can include more than one word.
The required result should be:
+------+-------------+--------+
| ID | NICKNAMES | S_NAME |
+------+-------------+--------+
| 1 | Avile | Avi |
| 2 | Dudi | David |
| 3 | Moshiko | Moshe |
| 4 | Avi | Avi |
| 5 | DAVE | David |
...
Here is the code I use:
select FNAMES.NICKNAMES, NLIST.S_NAME
from NICKNAMES
LEFT OUTER JOIN NLIST
ON(true)
WHERE levenshtein (FNAMES.NICKNAMES, NLIST.S_NAME) <=4
The above code runs for a very long time and I stopped its running.
How can I make it run in a reasonable time?
In addition, I think the levenshtein distance depends on the length of the words. How can I find the optimal value for the distance (in this case I chose 4 arbitrarily)?
Hive Table performance is depends upon various point .
Query enginee
File format
use VECTORIZATION set hive.vectorized.execution.enabled = true;set hive.vectorized.execution.reduce.enabled = true;
If you have good server you can try with Impala and definitely it is faster than Hive.
You can do the fine tuning of impala which will give you an edge to execute this query faster .Tuning Impala for Performance

Get 1 value of each date SSRS

Ussing SSRS, I have data with duplicate values in Field1. I need to get only 1 value of each month.
Field1 | Date |
----------------------------------
30 | 01.01.1990 |
30 | 01.01.1990 |
30 | 01.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
40 | 03.01.1990 |
40 | 03.01.1990 |
40 | 03.01.1990 |
It should be ssrs expression with average value of each month or mb there are other solutions to get requested data by ssrs expression. Requested data in table:
30 | 01.01.1990 |
50 | 02.01.1990 |
40 | 03.01.1990 |
Hope for help.
There is no SumDistinct function in SSRS, and it is real lack of it (CountDistinct exist although). So you obviously can't achieve what you want easy way. You have two options:
Implement a new stored procedure with select distinct, returning reduced set of fields to avoid repeated data that you need. You then need to use this stored procedure to build new dataset and use in your table. But this way obviously may be not applicable in your case.
The other option is to implement your own function, which will save state of aggregation and perform distinct sum. Take a look at this page, it contains examples of code that you need.

Resources