Any solutions on splitting output file? - etl

I'm a newbie here in Talend, and what I am trying to do is split an input into different output files. Here is an example of the file that I am working on:
(example input file screenshot)
Whenever I see a column with the value true, I need to produce a separate file for that group, including the row that holds the true value.
So the output should look like this:
(output example screenshots)
Thanks in advance guys, hope someone could help me.

I would try this:
tFileInputExcel -> main -> tReplicate -> main -> tFilterRow (quoteName equals "quote1") -> tLogRow
                                      -> main -> tFilterRow (quoteName equals "quote2") -> tLogRow
                                      -> main -> tFilterRow (quoteName equals "quote3") -> tLogRow
                                      -> main -> tFilterRow (quoteName equals "quote4") -> tLogRow
Alternatively, a tMap component would do the job, using the filter option on each output.

Here is a dynamic solution.
Your input file should be sorted on "quoteName".
tFileInput: read your file.
tFilterRow: filter on isLastItem, keeping only "true" values (so you get exactly one row per quoteName).
tFlowToIterate: convert your flow into an iteration; you'll get n iterations (n being the number of distinct quoteNames).
tFileInput: re-read your entire file in the current iteration.
tFilterRow: filter on quote=((String)globalMap.get("row2.quote")) (row2.quote being the global variable created by tFlowToIterate).
tFileOutput: the output file. Use a file name like "C:/Temp/"+((String)globalMap.get("row3.quote"))+".txt" in order to generate your distinct files.
Use the "Outline" view to get access to the global variables created by tFlowToIterate.
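Outside Talend, the grouping step this job performs can be sketched in plain Java — a minimal illustration only, assuming a CSV-like input whose first column is quoteName:

```java
import java.util.*;

public class SplitByQuote {
    // Group rows by their quoteName (first field), mimicking the
    // per-quote output files the job above generates.
    static Map<String, List<String>> splitRows(List<String> rows) {
        Map<String, List<String>> byQuote = new LinkedHashMap<>();
        for (String row : rows) {
            String quote = row.split(",", 2)[0];
            byQuote.computeIfAbsent(quote, k -> new ArrayList<>()).add(row);
        }
        return byQuote; // one entry per distinct quoteName -> its rows
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("quote1,10", "quote2,20", "quote1,30");
        Map<String, List<String>> files = splitRows(rows);
        System.out.println(files.keySet());      // [quote1, quote2]
        System.out.println(files.get("quote1")); // [quote1,10, quote1,30]
    }
}
```

Each map entry would then be written to its own file, just as tFileOutput does with the quoteName-based file name above.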

Related

How to take a complex matrix based input from the user and pass it onto a vbs function?

I am trying to automate a page where different forms get displayed to different users (based on credentials), and different input fields are displayed for which input is sought from the end user.
Here is an example:
User   | Forms accessible | Fields accessible
User 1 | Form 1           | Name, Address, Occupation
User 2 | Form 2           | Safety code, Access auth. code
       | Form 3           | Auditor access, Auditee access
User x | Form n           | Field 1...n
My Approach so far:
I have tried the dictionary approach, something like:
dictionary("Users") = "User1~User2~User3"
dictionary("Form") = "Form1#Form2~Form3#FormN"  '<< "#" is the separator between users
dictionary("Fields") = "field1$field2#field3$field4~field5#Field6"
Here is the mapping explained:
User1 → Form 1 → Field1 and field2
User 2 → form 2 → field3
User 2 → Form 3 → field4 and field 5
User 3 → Form N → Field 1...N
And then, based on these separators and inputs, I filter them out and perform the appropriate operations.
I am SURE that this is NOT an elegant solution. I would really appreciate it if someone could shed some light on this.
One proposed solution is to maintain an external XML file which holds the access matrix for users AND forms AND fields, and then pass that XML to the current function.
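For comparison, a nested map makes the same access matrix explicit without the separator gymnastics. A sketch in Java (the structure mirrors the mapping above; all names are illustrative, not from the original script):

```java
import java.util.*;

public class AccessMatrix {
    // user -> form -> fields, replacing the "~", "#", "$" separator encoding
    static Map<String, Map<String, List<String>>> buildMatrix() {
        Map<String, Map<String, List<String>>> matrix = new LinkedHashMap<>();
        matrix.put("User1", Map.of("Form1", List.of("field1", "field2")));
        Map<String, List<String>> user2 = new LinkedHashMap<>();
        user2.put("Form2", List.of("field3"));
        user2.put("Form3", List.of("field4", "field5"));
        matrix.put("User2", user2);
        return matrix;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> m = buildMatrix();
        // Look up which fields User2 sees on Form3 directly, no string splitting.
        System.out.println(m.get("User2").get("Form3")); // [field4, field5]
    }
}
```

The same shape translates directly to a VBScript Dictionary of Dictionaries, or to the proposed XML file with user/form/field elements.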

Collect to set with joining Java 8

Hi, I'm trying to build a string that represents the concatenation of a set of names for each teacher,
so I need to use both Collectors.toSet and Collectors.joining(", "). How can I use them in one combined line?
I can only make each of them work separately:
students.stream().collect(Collectors.groupingBy(student -> student.getTeacherName(), mapping(student -> student.getName(), toSet())));
students.stream().collect(Collectors.groupingBy(student -> student.getTeacherName(), mapping(student -> student.getName(), joining(", "))));
You should be able to use collectingAndThen():
students.stream()
        .collect(groupingBy(Student::getTeacherName,
                 mapping(Student::getName,
                 collectingAndThen(toSet(), set -> String.join(", ", set)))));
I'm assuming you already know how to produce the set. We'll call it teacherSet.
You want to re-stream after producing the set:
// create teacher set...
teacherSet.stream().collect(Collectors.joining(","));
You can also join after you're done producing the set using String.join. Here is an example:
String.join(",", Arrays.stream("1,2,3,4,3,2,1".split(",")).collect(Collectors.toSet()));
Or in your case:
// create teacher set...
String.join(",", teacherSet);
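Putting the collectingAndThen approach into a self-contained, runnable example (the Student record here is a minimal stand-in for the class in the question):

```java
import java.util.*;
import static java.util.stream.Collectors.*;

public class TeacherNames {
    record Student(String teacherName, String name) {
        String getTeacherName() { return teacherName; }
        String getName() { return name; }
    }

    static Map<String, String> namesByTeacher(List<Student> students) {
        // Group by teacher, collect distinct names into a set, then join them.
        return students.stream()
                .collect(groupingBy(Student::getTeacherName,
                         mapping(Student::getName,
                         collectingAndThen(toSet(),
                                 set -> String.join(", ", set)))));
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
                new Student("Smith", "Alice"),
                new Student("Smith", "Bob"),
                new Student("Smith", "Alice"),  // duplicate, collapses in the set
                new Student("Jones", "Carol"));
        System.out.println(namesByTeacher(students));
    }
}
```

Note that a set has no guaranteed order, so the joined string for "Smith" may come out as "Alice, Bob" or "Bob, Alice"; use a TreeSet via toCollection if you need deterministic order.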

How to use Fields in Elm 0.13

I have been trying to get the fields to work, but keep failing. I have also been trying to look for examples, but the only examples I could find are using Elm 0.14, which use the new Channel API which isn't available in Elm 0.13.
So I started from the example offered in the catalog
import Graphics.Input.Field (..)
import Graphics.Input (..)
name : Input Content
name = input noContent
nameField : Signal Element
nameField = field defaultStyle name.handle identity "Name" <~ name.signal
And in order to use the field I tried
main : Signal Element
main = Signal.lift2 display Window.dimensions gameState
display : (Int,Int) -> GameState -> Element
display (w,h) g =
  container w h middle <|
    collage gameWidth gameHeight
      (if | g.state == Menu ->
              [ rect gameWidth gameHeight
                  |> filled black
              , toForm nameField
              , plainText "*The name entered in the nameField*"
              ]
          | otherwise -> [])
But I keep getting the following error
Expected Type: Signal.Signal Graphics.Element.Element
Actual Type: Graphics.Element.Element
Why isn't the element a signal anymore? The function definition clearly states it should output a signal, right? And how would I be able to enter a name that I could then use inside a variable?
Elm 0.13 had some annoyingly confusing type error messages. Expected/Actual are usually swapped. In this case the problem comes from using nameField : Signal Element in display : (Int,Int) -> GameState -> Element. display is a pure (non-signal) function, but to be pure, you can't use a signal anywhere in there. To solve this, hoist the nameField signal up a level, to main. To use what is entered in the field, use the input signal:
main : Signal Element
main = Signal.lift4 display Window.dimensions gameState name.signal

nameField : Content -> Element
nameField = field defaultStyle name.handle identity "Name"

display : (Int,Int) -> GameState -> Content -> Element
display (w,h) g currentContent =
  container w h middle <|
    collage gameWidth gameHeight
      (if | g.state == Menu ->
              [ rect gameWidth gameHeight
                  |> filled black
              , toForm (nameField currentContent) -- use something other than `currentContent` here to influence the field content.
              , plainText currentContent.string
              ]
          | otherwise -> [])

Reverse the group data as a different record using Pig

Split a grouped record into separate records.
For example:
Input: (A,(3,2,3))
Output, on 3 new lines:
A,3
A,2
A,3
Can anyone let me know how to do this, please?
The problem is that once you convert the ArrayList output to a tuple, it becomes difficult to achieve what you want, so I recommend this approach; it makes the output easy to get.
In your UDF code, instead of creating an ArrayList, append the output to a comma-separated string and return that to the Pig script.
Your final output from the UDF should be a string like "3,2,3".
Then use the code below to get the result:
C = FOREACH B GENERATE $0, NewRollingCount(BagToString($1)) AS rollingCnt;
D = FOREACH C GENERATE $0, FLATTEN(TOKENIZE(rollingCnt));
DUMP D;
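For clarity, the FLATTEN(TOKENIZE(...)) step does roughly the equivalent of this Java sketch — splitting the joined counts back into one (key, value) record per element (a hand-rolled analogy, not Pig's actual implementation):

```java
import java.util.*;

public class FlattenCounts {
    // Analog of FLATTEN(TOKENIZE("3,2,3")) paired with the group key:
    // emit one "key,count" record per comma-separated element.
    static List<String> flatten(String key, String joinedCounts) {
        List<String> records = new ArrayList<>();
        for (String count : joinedCounts.split(",")) {
            records.add(key + "," + count);
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(flatten("A", "3,2,3")); // [A,3, A,2, A,3]
    }
}
```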

Debugging HXT performance problems

I'm trying to use HXT to read in some big XML data files (hundreds of MB.)
My code has a space-leak somewhere, but I can't seem to find it. I do have a little bit of a clue as to what is happening thanks to my very limited knowledge of the ghc profiling tool chain.
Basically, the document is parsed, but not evaluated.
Here's some code:
{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
import Text.XML.HXT.Core
import System.Environment (getArgs)
import Control.Monad (liftM)
main = do file <- (liftM head getArgs) >>= parseTuba
          case file of
            (Left m)  -> print "Failed."
            (Right _) -> print "Success."
data Sentence t = Sentence [Node t] deriving Show
data Node t = Word { wSurface :: !t } deriving Show
parseTuba :: FilePath -> IO (Either String ([Sentence String]))
parseTuba f = do r <- runX (readDocument [] f >>> process)
                 case r of
                   []   -> return $ Left "No parse result."
                   [pr] -> return $ Right pr
                   _    -> return $ Left "Ambiguous parse result!"
process :: (ArrowXml a) => a XmlTree ([Sentence String])
process = getChildren >>> listA (tag "sentence" >>> listA word >>> arr (\ns -> Sentence ns))
word :: (ArrowXml a) => a XmlTree (Node String)
word = tag "word" >>> getAttrValue "form" >>> arr (\s -> Word s)
-- | Gets the tag with the given name below the node.
tag :: (ArrowXml a) => String -> a XmlTree XmlTree
tag s = getChildren >>> isElem >>> hasName s
I'm trying to read a corpus file, and the structure is obviously something like <corpus><sentence><word form="Hello"/><word form="world"/></sentence></corpus>.
Even on the very small development corpus, the program takes ~15 secs to read it in, of which around 20% are GC time (that's way too much.)
In particular, a lot of data is spending way too much time in the DRAG state. This is the profile:
(profile screenshot: monitoring DRAG culprits)
You can see that decodeDocument gets called a lot, and its data is then stalled until the very end of the execution.
Now, I think this should be easily fixed by folding all this decodeDocument stuff into my data structures (Sentence and Word) and then the RT can forget about these thunks. The way it's currently happening though, is that the folding happens at the very end when I force evaluation by deconstruction of Either in the IO monad, where it could easily happen online. I see no reason for this, and my attempts to strictify the program have so far been in vain. I hope somebody can help me :-)
I just can't even figure out too many places to put seqs and $!s in…
One possible thing to try: the default hxt parser is strict, but there does exist a lazy parser based on tagsoup: http://hackage.haskell.org/package/hxt-tagsoup
I understand that expat can do lazy processing as well: http://hackage.haskell.org/package/hxt-expat
You may want to see if switching parsing backends, by itself, solves your issue.
