How to display output like the example below in FreeMarker

Input
ListingName = abhay, hkm, Himanshu
ListingAddress = Delhi, Noida, Agra
These lists can have many values, not only three.
Output
abhay
Delhi
hkm
Noida
Himanshu
Agra
Please tell me how to do this in FreeMarker.

You need to loop over the sequence with <#list>. The loop below iterates through ListingName and uses each item's index (?index) to retrieve the corresponding address from the second sequence.
<#assign ListingName = ["abhay", "hkm", "Himanshu"]>
<#assign ListingAddress = ["Delhi", "Noida", "Agra"]>
<#list ListingName as listing>
${listing}
${ListingAddress[listing?index]}
</#list>
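The same index-based pairing can be sketched in plain Python for comparison (using the sample data from the question); `zip` walks both lists in lockstep, just as `?index` does in the template:

```python
# Pair each name with the address at the same position,
# mirroring what the FreeMarker template does with ?index.
names = ["abhay", "hkm", "Himanshu"]
addresses = ["Delhi", "Noida", "Agra"]
for name, address in zip(names, addresses):
    print(name)
    print(address)
```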

Related

PIG: Twitter Sentiment Analysis

I am trying to implement Twitter sentiment analysis. I need to get all positive and negative tweets and store them in separate text files.
sample.json
{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", "text": "google is a good company", "user_id": 450990391}{"id": 252479809098223616, "created_at": "Wed Apr 12 08:23:20 +0000 2016", "text": "facebook is a bad company","user_id": 450990391}
dictionary.text contains the list of all positive and negative words:
weaksubj 1 bad adj n negative
strongsubj 1 good adj n positive
Pig Script:-
tweets = load 'new.json' using JsonLoader('id:chararray,text:chararray,user_id:chararray,created_at:chararray');
dictionary = load 'dictionary.text' AS (type:chararray,length:chararray,word:chararray,pos:chararray,stemmed:chararray,polarity:chararray);
words = foreach tweets generate FLATTEN( TOKENIZE(text) ) AS word,id,text,user_id,created_at;
sentiment = join words by word left outer, dictionary by word;
senti2 = foreach sentiment generate words::id as id,words::created_at as created_at,words::text as text,words::user_id as user_id,dictionary::polarity as polarity;
res = FILTER senti2 BY polarity MATCHES '.*possitive.*';
describe res:-
res: {id: chararray,created_at: chararray,text: chararray,user_id: chararray,polarity: chararray}
But when I dump res I don't see any output, although it executes fine without any errors.
What is the mistake I am making here?
Please advise.
Mohan.V
I see two errors here.
First error (line 2): when you DUMP dictionary loaded as above, you will see all the records in the first column, with the rest of the columns empty:
dictionary = load 'dictionary.text' AS (type:chararray,length:chararray,word:chararray,pos:chararray,stemmed:chararray,polarity:chararray);
DUMP dictionary;
(weaksubj 1 bad adj n negative,,,,,)
(strongsubj 1 good adj n positive,,,,,)
Solution: specify an appropriate delimiter using PigStorage(), e.g. USING PigStorage(' ') if the file is space-delimited.
Second error (line 6): correct the spelling of positive! Use something like
res = FILTER senti2 BY UPPER(polarity) MATCHES '.*POSITIVE.*';
I see a spelling mistake in:
res = FILTER senti2 BY polarity MATCHES '.*possitive.*';
Shouldn't it be '.*positive.*'?
I recommend using custom UDFs to solve your problem. You can use elephant-bird-pig-4.1.jar and json-simple-1.1.1.jar.
If you want a worked example, see this Sentiment Analysis Tutorial.
If you want code, you can refer to the code below and adapt it according to the tutorial and your data:
REGISTER '/usr/local/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/usr/local/elephant-bird-pig-4.1.jar';
REGISTER '/usr/local/json-simple-1.1.1.jar';
load_tweets = LOAD '/user/new.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
extract_details = FOREACH load_tweets GENERATE myMap#'id' as id,myMap#'text' as text;
tokens = foreach extract_details generate id,text, FLATTEN(TOKENIZE(text)) As word;
-- the tutorial's dictionary (AFINN) provides a numeric rating per word, which AVG below needs
dictionary = load '/user/AFINN.txt' using PigStorage('\t') AS (word:chararray, rating:int);
word_rating = join tokens by word left outer, dictionary by word using 'replicated';
describe word_rating;
rating = foreach word_rating generate tokens::id as id, tokens::text as text, dictionary::rating as rate;
word_group = group rating by (id,text);
avg_rate = foreach word_group generate group, AVG(rating.rate) as tweet_rating;
positive_tweets = filter avg_rate by tweet_rating>=0;

Spark - How to count number of records by key

This is probably an easy problem, but basically I have a dataset where I need to count the number of females for each country. Ultimately I want to group each count by country, but I am unsure what to use as the value, since there is no count column in the dataset that I can use in a groupByKey or reduceByKey. I thought of using reduceByKey(), but that requires a key-value pair, and I only want to count the key and make a counter the value. How do I go about this?
val lines = sc.textFile("/home/cloudera/desktop/file.txt")
val split_lines = lines.map(_.split(","))
val femaleOnly = split_lines.filter(x => x._10 == "Female")
Here is where I am stuck. The country is index 13 in the dataset also.
The output should look something like this:
(Australia, 201000)
(America, 420000)
etc
Any help would be great.
Thanks
You're nearly there! All you need is a countByValue:
val countOfFemalesByCountry = femaleOnly.map(_(13)).countByValue()
// Prints (Australia, 230), (America, 23242), etc.
(In your example, I assume you meant x(10) rather than x._10)
All together:
sc.textFile("/home/cloudera/desktop/file.txt")
.map(_.split(","))
.filter(x => x(10) == "Female")
.map(_(13))
.countByValue()
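countByValue returns a map from each distinct value to its number of occurrences. In plain Python terms (with hypothetical (gender, country) rows, not the actual dataset), it behaves like building a Counter over the mapped values:

```python
from collections import Counter

# Hypothetical rows reduced to (gender, country) pairs.
rows = [("Female", "Australia"), ("Female", "America"), ("Female", "Australia")]

# Equivalent of .map(_(13)).countByValue(): count each country.
counts = Counter(country for _, country in rows)
print(counts)  # Counter({'Australia': 2, 'America': 1})
```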
Have you considered manipulating your RDD using the DataFrames API?
It looks like you're loading a CSV file, which you can do with spark-csv.
Then it's a simple matter (if your CSV has a header with the obvious column names) of:
import com.databricks.spark.csv._
val countryGender = sqlContext.csvFile("/home/cloudera/desktop/file.txt") // already splits the fields
  .filter($"gender" === "Female")
  .groupBy("country").count()
countryGender.show()
If you want to go deeper in this kind of manipulation, here's the guide:
https://spark.apache.org/docs/latest/sql-programming-guide.html
You can easily create a key; it doesn't have to be in the file/database. For example:
val countryGender = sc.textFile("/home/cloudera/desktop/file.txt")
  .map(_.split(","))
  .filter(x => x(10) == "Female")
  .map(x => (x(13), x(10))) // <<<< here you generate a new key
  .groupByKey() // then .mapValues(_.size) would give the count per country

How to split a column that has data in XML form into different rows of a new database as KEY/VALUE in Talend

In the old DB I have data in one column as:
<ADDRESS>
<CITY>ABC</CITY>
<STATE>PQR</STATE>
</ADDRESS>
In my new DB I want this data stored in KEY/VALUE fashion, like:
USER_ID  KEY    VALUE
1        CITY   ABC
1        STATE  PQR
Can someone please help me migrate this kind of data using the Talend tool?
Design the job like below:
tOracleInput---tExtractXMLField---output
In the tOracleInput component, select the XML column and set its datatype to String.
In the tExtractXMLField component, pass this XML column as the "XML Field" and set the Loop XPath Expression to "/ADDRESS".
Add two new columns, CITY and STATE, to the output schema of tExtractXMLField.
In the mapping, set the XPath query to "/ADDRESS/CITY" for CITY and "/ADDRESS/STATE" for STATE.
Now you have both values in the output.
See the image for more details.
As I explained in your previous post, you can follow the same approach to build the key/value pairs:
how-to-split-one-row-in-different-rows-in-talend
Or you can use the tUnpivot component as you did there.
Since you said the source data has a special character, use the expression below to replace it.
Steps: after the Oracle input, add a tMap and use this code to replace the special symbol:
row24.XMLField.replaceAll("&", "<![CDATA[" + "&" + "]]>")
Once that is done, execute the job and check the result; it should work.
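The replaceAll above wraps every bare ampersand in a CDATA section so the XML parser will accept it; since both pattern and replacement are literals here, it behaves like a plain string replace. A Python sketch with a hypothetical field value:

```python
# Hypothetical XML field containing a bare ampersand.
xml_field = "<ADDRESS><CITY>A&B</CITY></ADDRESS>"

# Same effect as the Java replaceAll with literal arguments.
escaped = xml_field.replace("&", "<![CDATA[" + "&" + "]]>")
print(escaped)  # <ADDRESS><CITY>A<![CDATA[&]]>B</CITY></ADDRESS>
```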
I'd use tJavaFlex.
Component Settings:
tJavaFlex schema:
In the Begin part, use:
String input = ((String)globalMap.get("row2.xmlField")); // get the xml Fields value
String firstTag = input.substring(input.indexOf("<")+1,input.indexOf(">"));
input = input.replace("<"+firstTag+">","").replace("</"+firstTag+">","");
int tagCount = input.length() - input.replace("</", "<").length();
int closeTagFinish = -1;
for (int i = 0; i<tagCount ; i++) {
In the Main part, parse the XML tag name and value, and have the output schema contain those two additional columns. The Main part will look like:
/*set up the output columns */
output.user_id = ((String)globalMap.get("row2.user_id"));
output.user_first_name = ((String)globalMap.get("row2.user_first_name"));
output.user_last_name = ((String)globalMap.get("row2.user_last_name"));
Then we can extract the key-value pairs from the XML without knowing the key names in advance.
/*calculate columns out of XML */
int openTagStart = input.indexOf("<",closeTagFinish+1);
int openTagFinish = input.indexOf(">",openTagStart);
int closeTagStart = input.indexOf("<",openTagFinish);
closeTagFinish = input.indexOf(">",closeTagStart);
output.xmlKey = input.substring(openTagStart+1,openTagFinish);
output.xmlValue = input.substring(openTagFinish+1,closeTagStart);
tJavaFlex End part:
}
Output looks like:
.-------+---------------+--------------+------+--------.
| tLogRow_2 |
|=------+---------------+--------------+------+-------=|
|user_id|user_first_name|user_last_name|xmlKey|xmlValue|
|=------+---------------+--------------+------+-------=|
|1 |foo |bar |CITY |ABC |
|1 |foo |bar |STATE |PQR |
'-------+---------------+--------------+------+--------'
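The Begin/Main/End parts above amount to one tag-scanning loop. The same algorithm can be sketched as self-contained Python (hypothetical input string, not tied to Talend or the globalMap variables):

```python
def extract_pairs(xml: str):
    """Scan a flat XML fragment and return (tag, value) pairs,
    without knowing the tag names in advance."""
    # Strip the outer wrapper tag, as the Begin part does.
    first_tag = xml[xml.index("<") + 1 : xml.index(">")]
    inner = xml.replace(f"<{first_tag}>", "").replace(f"</{first_tag}>", "")
    pairs = []
    pos = 0
    while True:
        open_start = inner.find("<", pos)
        if open_start == -1 or inner.startswith("</", open_start):
            break
        # Locate <TAG>value</TAG> boundaries, as the Main part does.
        open_end = inner.index(">", open_start)
        close_start = inner.index("<", open_end)
        close_end = inner.index(">", close_start)
        pairs.append((inner[open_start + 1 : open_end],
                      inner[open_end + 1 : close_start]))
        pos = close_end + 1
    return pairs

print(extract_pairs("<ADDRESS><CITY>ABC</CITY><STATE>PQR</STATE></ADDRESS>"))
# [('CITY', 'ABC'), ('STATE', 'PQR')]
```

Like the tJavaFlex version, this assumes a flat, well-formed fragment with no nested or self-closing tags.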

Linq to XML elements and descendants in the same search

Still learning LINQ, and I had a problem trying to retrieve an element as well as descendants of another element in the same select. I searched for a solution but could not find what I was looking for, so I came up with my own. But is it the right way to do it? Although it works, it somehow doesn't feel right.
I have the following XML structure:
<Tables>
<Table>
<SourceTable>WrittenRecordsTable</SourceTable>
<Researcher>Fred Blogs</Researcher>
<QuickRef>cwr</QuickRef>
<TableType>WrittenRecords</TableType>
<FieldMapping>
<RecordID>ID</RecordID>
<StartYear>StartYear</StartYear>
<EndYear>EndYear</EndYear>
<LastName>LastName</LastName>
<Title>Title</Title>
<Subject>Subject</Subject>
<Description>Reference</Description>
</FieldMapping>
</Table>
</Tables>
and the following Linq to XML:
var nodes = (from n in xml.Descendants("FieldMapping")
select new
{
SourceTable = (string)n.Parent.Element("SourceTable").Value,
RecordID = (string)n.Element("RecordID").Value,
StartYear = (string)n.Element("StartYear").Value,
EndYear = (string)n.Element("EndYear").Value,
LastName = (string)n.Element("LastName").Value,
Title = (string)n.Element("Title").Value,
Subject = (string)n.Element("Subject").Value
}).ToList();
It is the way I retrieve the SourceTable element that feels wrong. Am I worrying too much, or is there a better way? Also, is it better to work with C# method syntax rather than query syntax?
If your document structure is always guaranteed to contain those nodes, your query (and the way you access SourceTable) is fine. It's fairly obvious what's going on, given that the reader knows what the XML looks like.
However, if you want a more top-down approach (which might feel more natural and easier to grasp), you can always query the Table node first and store FieldMapping in a variable, though I wouldn't say it has any advantage over your approach:
var nodes = (from table in doc.Descendants("Table")
let fieldMapping = table.Element("FieldMapping")
select new
{
SourceTable = (string)table.Element("SourceTable").Value,
RecordID = (string)fieldMapping.Element("RecordID").Value,
StartYear = (string)fieldMapping.Element("StartYear").Value,
EndYear = (string)fieldMapping.Element("EndYear").Value,
LastName = (string)fieldMapping.Element("LastName").Value,
Title = (string)fieldMapping.Element("Title").Value,
Subject = (string)fieldMapping.Element("Subject").Value
}).ToList();

Sorting maps within maps by value

I'm trying to sort a map in Groovy that has maps as values. I want to iterate over the map and print the values sorted by lastName and then firstName. So in the following example:
def m =
[1:[firstName:'John', lastName:'Smith', email:'john#john.com'],
2:[firstName:'Amy', lastName:'Madigan', email:'amy#amy.com'],
3:[firstName:'Lucy', lastName:'B', email:'lucy#lucy.com'],
4:[firstName:'Ella', lastName:'B', email:'ella#ella.com'],
5:[firstName:'Pete', lastName:'Dog', email:'pete#dog.com']]
the desired results would be:
[firstName:'Ella', lastName:'B', email:'ella#ella.com']
[firstName:'Lucy', lastName:'B', email:'lucy#lucy.com']
[firstName:'Pete', lastName:'Dog', email:'pete#dog.com']
[firstName:'Amy', lastName:'Madigan', email:'amy#amy.com']
[firstName:'John', lastName:'Smith', email:'john#john.com']
I've tried m.sort{it.value.lastName && it.value.firstName} and m.sort{[it.value.lastName, it.value.firstName]}. Sorting with m.sort{it.value.lastName} works but does not sub-sort by firstName.
Can anybody help with this, much appreciated, thanks!
This should do it:
m.values().sort { a, b ->
a.lastName <=> b.lastName ?: a.firstName <=> b.firstName
}
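The spaceship/Elvis chain above compares by lastName first, falling back to firstName on ties. For comparison, the equivalent multi-key sort in Python uses a tuple key (with a trimmed-down version of the sample data):

```python
people = [
    {"firstName": "John", "lastName": "Smith"},
    {"firstName": "Lucy", "lastName": "B"},
    {"firstName": "Ella", "lastName": "B"},
]

# Tuples compare element by element: lastName first, then firstName.
ordered = sorted(people, key=lambda p: (p["lastName"], p["firstName"]))
print([p["firstName"] for p in ordered])  # ['Ella', 'Lucy', 'John']
```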
