How to split a column containing XML data into KEY/VALUE rows of a new database in Talend - oracle

In my old DB I have data in one column like this:
<ADDRESS>
<CITY>ABC</CITY>
<STATE>PQR</STATE>
</ADDRESS>
In my new DB I want this data to be stored in KEY/VALUE fashion, like:
USER_ID KEY VALUE
1 CITY ABC
1 STATE PQR
How can I migrate this kind of data using Talend?

Design the job like this:
tOracleInput---tExtractXMLField---output
In the tOracleInput component, select the XML column and set its data type to String.
In the tExtractXMLField component, pass this column as the "XML field" and set the Loop XPath expression to "/ADDRESS".
Add two new columns to the output schema of tExtractXMLField, one for CITY and one for STATE.
In the mapping, set the XPath query for CITY to "/ADDRESS/CITY" and for STATE to "/ADDRESS/STATE".
Both values are now available in the output.
See the image for more details.
As I explained in your previous post, you can follow the same approach to build the key/value pairs:
how-to-split-one-row-in-different-rows-in-talend
Or you can use the tUnpivot component, as you did here.
Since you said the source data contains special characters, replace them before the XML is parsed.
Steps: add a tMap after the Oracle input and use this expression, which wraps each ampersand in a CDATA section:
row24.XMLField.replaceAll("&", "<![CDATA["+"&"+"]]>")
Once that is done, run the job and check the result; it should work.
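If some rows already contain escaped entities, a narrower replacement may be safer. The following is only a sketch in plain Java, not a Talend component setting: the class and method names are made up for illustration, and the same expression would have to be adapted to your tMap column. It escapes only ampersands that do not already start an entity, so an existing &amp; is left alone.

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {
    // Matches '&' that is NOT already the start of an entity such as &amp; or &#38;
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9a-fA-F]+);)");

    // Hypothetical helper: returns the XML text with bare ampersands escaped
    public static String escape(String xml) {
        if (xml == null) {
            return null;
        }
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // Bare ampersand gets escaped
        System.out.println(escape("<CITY>A & B</CITY>"));
        // Already escaped input is returned unchanged
        System.out.println(escape("<CITY>A &amp; B</CITY>"));
    }
}
```

Both calls in main print the same line, `<CITY>A &amp; B</CITY>`.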

I'd use tJavaFlex.
Component Settings:
tJavaFlex schema:
In the begin part, use
String input = ((String)globalMap.get("row2.xmlField")); // get the xml Fields value
String firstTag = input.substring(input.indexOf("<")+1,input.indexOf(">"));
input = input.replace("<"+firstTag+">","").replace("</"+firstTag+">","");
int tagCount = input.length() - input.replace("</", "<").length();
int closeTagFinish = -1;
for (int i = 0; i<tagCount ; i++) {
In the main part, parse the XML tag name and value; the output schema must contain two additional columns for them. The main part looks like this:
/*set up the output columns */
output.user_id = ((String)globalMap.get("row2.user_id"));
output.user_first_name = ((String)globalMap.get("row2.user_first_name"));
output.user_last_name = ((String)globalMap.get("row2.user_last_name"));
Then we can extract the key-value pairs from the XML without knowing the key names in advance.
/*calculate columns out of XML */
int openTagStart = input.indexOf("<",closeTagFinish+1);
int openTagFinish = input.indexOf(">",openTagStart);
int closeTagStart = input.indexOf("<",openTagFinish);
closeTagFinish = input.indexOf(">",closeTagStart);
output.xmlKey = input.substring(openTagStart+1,openTagFinish);
output.xmlValue = input.substring(openTagFinish+1,closeTagStart);
tJavaFlex End part:
}
Output looks like:
.-------+---------------+--------------+------+--------.
|                      tLogRow_2                       |
|=------+---------------+--------------+------+-------=|
|user_id|user_first_name|user_last_name|xmlKey|xmlValue|
|=------+---------------+--------------+------+-------=|
|1      |foo            |bar           |CITY  |ABC     |
|1      |foo            |bar           |STATE |PQR     |
'-------+---------------+--------------+------+--------'
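The indexOf-based parsing above assumes flat XML with no attributes or nesting. As a hedged alternative, here is a sketch that gets the same key/value pairs with the JDK's built-in DOM parser. The class and method names are hypothetical, and wiring it into tJavaFlex or a routine is left as an assumption; it only shows the extraction step.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlKeyValue {
    // Hypothetical helper: one {key, value} pair per child element of the root
    public static List<String[]> extract(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            NodeList children = builder
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getChildNodes();
            List<String[]> pairs = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip the whitespace text nodes between tags
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    pairs.add(new String[] {
                        child.getNodeName(), child.getTextContent().trim()
                    });
                }
            }
            return pairs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ADDRESS>\n<CITY>ABC</CITY>\n<STATE>PQR</STATE>\n</ADDRESS>";
        for (String[] kv : extract(xml)) {
            // Prints "1 CITY ABC" and then "1 STATE PQR"
            System.out.println("1 " + kv[0] + " " + kv[1]);
        }
    }
}
```

A real parser also rejects malformed input, such as a mismatched closing tag, instead of silently producing wrong keys.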

Related

Laravel Save Multiple Data to 1 column

So I have 2 variables storing the user's selected times ('time_to' and 'time_from') with this sample data ('7:30', '8:00').
How can I save these two into 1 column ('c_time') so it looks like this ('7:30-8:00')?
If I understand you correctly, you can create a column of string (varchar) type and then build the value for your column like this:
$time_to = '7:30';
$time_from= '8:00';
$colValueToBeStored = "(".$time_to."-".$time_from.")";
Then just store $colValueToBeStored in your column.
To reverse it:
$colValueToBeStored = "(7:30-8:00)";
$res = explode("-",str_replace([")","("],"",$colValueToBeStored));
$time_to = $res[0];
$time_from = $res[1];
Define your c_time column type as JSON. That way you can store multiple values, and they are easier to retrieve as well. For example:
...
$cTime['time_to'] = "7:30";
$cTime['time_from'] = "8:00";
$cTimeJson = json_encode($cTime);
// save to db
...

Extracting a specific number from a string using regex function in Spark SQL

I have a table in mysql which has POST_ID and corresponding INTEREST:
I used following regular expression query to select interest containing 1,2,3.
SELECT * FROM INTEREST_POST where INTEREST REGEXP '(?=.*[[:<:]]1[[:>:]])(?=.*[[:<:]]3[[:>:]])(?=.*[[:<:]]2[[:>:]])';
I imported the table in HDFS. However, when I use the same query in SparkSQL, it shows null records.
How to use REGEXP function here in spark to select interest containing 1,2,3?
The regex you are using needs to be changed a bit. You could do something like the following.
scala> val myDf2 = spark.sql("SELECT * FROM INTEREST_POST where INTEREST REGEXP '^[1-3](,[1-3])*$'")
myDf2: org.apache.spark.sql.DataFrame = [INTEREST_POST_ID: int, USER_POST_ID: int ... 1 more field]
scala> myDf2.show
+----------------+------------+--------+
|INTEREST_POST_ID|USER_POST_ID|INTEREST|
+----------------+------------+--------+
|               1|           1|   1,2,3|
I got the solution. You can do something like this:
var result = hiveContext.sql("""SELECT USER_POST_ID
| FROM INTEREST_POST_TABLE
| WHERE INTEREST REGEXP '(?=.*0[1])(?=.*0[2])(?=.*0[3])' """)
result.show
Fetching Records from INTEREST_POST_TABLE
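Some background that may explain the empty result: Spark SQL's REGEXP/RLIKE takes a Java regular expression, and Java does not treat MySQL's [[:<:]] and [[:>:]] as word-boundary markers. The Java equivalent is \b. Here is a small plain-Java sketch (the class and method names are made up) for checking such a pattern outside Spark before putting it in a query. Inside a Spark SQL string literal the backslash itself may need escaping, depending on your version and settings.

```java
import java.util.regex.Pattern;

public class InterestMatcher {
    // Lookaheads require 1, 2 and 3 to each appear as a whole number, in any order
    private static final Pattern HAS_1_2_3 =
        Pattern.compile("(?=.*\\b1\\b)(?=.*\\b2\\b)(?=.*\\b3\\b)");

    // Hypothetical helper: true if the comma-separated list contains 1, 2 and 3
    public static boolean containsAll(String interest) {
        // find() matches anywhere in the string, like RLIKE does
        return interest != null && HAS_1_2_3.matcher(interest).find();
    }

    public static void main(String[] args) {
        System.out.println(containsAll("1,2,3"));   // true
        System.out.println(containsAll("3,5,1,2")); // true
        System.out.println(containsAll("11,2,3"));  // false: 11 is not 1
        System.out.println(containsAll("1,3"));     // false: 2 is missing
    }
}
```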

Using Rails Update to Append to a Text Column in Postgresql

Thanks in advance for any help on this one.
I have a model in rails that includes a postgresql text column.
I want to append (i.e. mycolumn = mycolumn || newdata) data to the existing column. The sql I want to generate would look like:
update MyObjs set mycolumn = mycolumn || newdata where id = 12;
I would rather not select the data, update the attribute and then write the new data back to the database. The text column could grow relatively large and I'd rather not read that data if I don't need to.
I DO NOT want to do this:
@myinstvar = MyObj.select(:mycolumn).find(12)
newdata = @myinstvar.mycolumn.to_s + newdata
@myinstvar.update_attribute(:mycolumn, newdata)
Do I need to do a raw sql transaction to accomplish this?
I think you could solve this by writing the query directly with the arel gem, which already ships with Rails.
Given that you have these values:
column_id = 12
newdata = "a custom string"
you can update the table this way:
# Initialize the Table and UpdateManager objects
table = MyObj.arel_table
update_manager = Arel::UpdateManager.new Arel::Table.engine
update_manager.table(table)
# Compose the concat() function
concat = Arel::Nodes::NamedFunction.new 'concat', [table[:mycolumn], newdata]
concat_sql = Arel::Nodes::SqlLiteral.new concat.to_sql
# Set up the update manager
update_manager.set(
[[table[:mycolumn], concat_sql]]
).where(
table[:id].eq(column_id)
)
# Execute the update
ActiveRecord::Base.connection.execute update_manager.to_sql
This will generate a SQL string like this one:
UPDATE "MyObjs" SET "mycolumn" = concat("MyObjs"."mycolumn", 'a custom string') WHERE "MyObjs"."id" = 12

Pig Latin issue

Please help me out. It's really urgent: the deadline is nearing and I have been stuck on this for two weeks without result. I am a newbie in Pig Latin.
I have a scenario where I have to filter data from a CSV file.
The CSV is on HDFS and has two columns.
grunt> f1 = load '/user/hduser/file.csv' USING PigStorage(',') AS (conv:chararray, clnt:chararray);
grunt> dump f1;
("first~584544fddf~dssfdf","2001")
("first~4332990~fgdfs4s","2001")
("second~232434334~fgvfd4","1000")
("second~786765~dgbhgdf","1000")
("second~345643~gfdgd43","1000")
I need to extract only the first word before the first '~' sign and concatenate it with the second column value of the CSV file. I also need to group the concatenated results, count the number of matching rows, and write a new CSV file as output, again with two columns: the first is the concatenated value and the second is the row count.
i.e
("first 2001","2")
("second 1000","3")
and so on.
I have written some code, but it is not working. I used STRSPLIT, which splits the values of the first column of the input CSV file, but I don't know how to extract the first split value.
code is given below:
convData = LOAD '/user/hduser/file.csv' USING PigStorage(',') AS (conv:chararray, clnt:chararray);
fil = FILTER convData BY conv != '"-1"'; --im using this to filter out the rows that has 1st column as "-1".
data = FOREACH fil GENERATE STRSPLIT($0, '~');
X = FOREACH data GENERATE CONCAT(data.$0,' ',convData.clnt);
Y = FOREACH X GROUP BY X;
Z = FOREACH Y GENERATE COUNT(Y);
var = FOREACH Z GENERATE CONCAT(Y,',',Z);
STORE var INTO '/user/hduser/output.csv' USING PigStorage(',');
STRSPLIT returns a tuple, the individual elements of which you can access using the numbered syntax. This is what you need:
data = FOREACH fil GENERATE STRSPLIT($0, '~') AS a, clnt;
X = FOREACH data GENERATE CONCAT(a.$0,' ', clnt);

How to iterate through table using selenium?

I have a table called UserManagement that contains information about users. This table is updated whenever a new user is created. If I create two users, I need to check whether both were actually created. The table contains ID, UserName, FirstName, LastName, Bdate, etc. The ID is generated automatically.
I am running a Selenium-TestNG script. Using Selenium, how can I get the UserName of the two users I have created? Do I have to iterate through the table? If so, how?
Use ISelenium.GetTable(string) to get the contents of the table cells you want. For example,
selenium.GetTable("UserManagement.0.1");
will return the contents of the table's first row and second column. You could then assert that the correct username or usernames appear in the table.
Get the row count using selenium.getXpathCount("//table[@id='fjsfj']//tr") and store it in a variable rowcount.
Store the column count in a variable.
Ex:
int colcount = 5;
Set the required value, i.e. the new user:
String user1 = "ABC";
for (int i = 0; i < rowcount; i++)
{
for (int j = 0; j < colcount; j++)
{
// getTable takes "tableLocator.row.column", both zero-based
if (user1.equals(selenium.getTable("fjsfj." + i + "." + j)))
{
System.out.println(user1 + " inserted");
break;
}
}
}
Get the number of rows using:
int noOfRowsInTable = selenium.getXpathCount("//table[@id='TableId']//tr").intValue();
If the UserName you want is at a fixed position, say the 2nd column, then read it for each row i (XPath indices start at 1) like this:
selenium.getText("xpath=//table[@id='TableId']//tr[" + i + "]//td[2]");
Note: the number of columns in the table can be found the same way, by counting the cells of one row:
int noOfColumnsInTable = selenium.getXpathCount("//table[@id='TableId']//tr[1]//td").intValue();
Generically, something like this?
table = @browser.table(:id, 'tableID')
table.rows.each do |row|
  # perform row operations here
  row.cells.each do |cell|
    # do cell operations here
  end
end

Resources