Talend: take out zeros before the comma - etl

I have a file with two columns: the first one holds a name and the second one a number.
The number column is 20 characters wide. The numbers are usually less than 2 characters long, and the rest of the column is padded with 0.
I need to take out all the zeros before the comma. I should use a tMap. How?

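The solution:
Using a tMap, add a Var between the input and the output.
In the Var use the expression below. It uses replaceAll("^0+", "") so that only the leading zeros are removed; replace("0", "") would also remove zeros inside the number (105 would become 15).
"0" + row1.numberField.split(",")[0].replaceAll("^0+", "") + "." + row1.numberField.split(",")[1]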
Example:
000000001,58
Result:
01.58
Solution 2:
Define your own routine:
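public static String calcImp(String theNumber) {
    // Parsing as a number drops the leading zeros; the comma is swapped for a dot first.
    // Note: a float keeps only about 7 significant digits.
    float theFNumber = Float.parseFloat(theNumber.replace(",", "."));
    return Float.toString(theFNumber).replace(".", ",");
}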
Example:
000000001,587
Result:
1,587
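Because Solution 2 goes through a float, long numbers can lose digits. As an alternative, here is a minimal sketch in Go that stays purely with strings; the same logic could be ported to a Talend routine. The function name StripLeadingZeros is hypothetical, and the sketch assumes the input uses a comma as the decimal separator and has no sign.

```go
package main

import "strings"

// StripLeadingZeros removes the zero padding from the integer part of a
// comma-decimal number such as "000000001,58". Hypothetical helper, not
// part of Talend.
func StripLeadingZeros(s string) string {
	intPart, frac, hasComma := strings.Cut(s, ",")
	intPart = strings.TrimLeft(intPart, "0")
	if intPart == "" {
		intPart = "0" // keep one zero for values below 1, e.g. "000,5"
	}
	if !hasComma {
		return intPart
	}
	return intPart + "," + frac
}
```

Zeros inside the number are left alone, so 000000105,587 becomes 105,587.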

Related

LINQ: select rows where any word of string start with a certain character

I want to extract from a table all rows where a (string) column contains at least one word that starts with a specified character.
Example:
Row 1: 'this is the first row'
Row 2: 'this is th second row'
Row 3: 'this is the third row'
If the specified character is T -> I would extract all 3 rows
If the specified character is S -> I would extract only the second row
...
Please help me
Assuming you mean "space delimited sequence of characters, or begin to space or space to end" by "word", then you can split on the delimiter and test them for matches:
var src = new[] {
"this is the first row",
"this is th second row",
"this is the third row"
};
var findChar = 'S';
// char has no instance ToLower(); use the static char.ToLower instead
var lowerFindChar = char.ToLower(findChar);
var matches = src.Where(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Any(w => char.ToLower(w[0]) == lowerFindChar));
The LINQ Enumerable.Any method tests a sequence to see if any element matches, so you can split each string into a sequence of words and see if any word begins with the desired letter, compensating for case.
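Try this:
rows.Where(r => Regex.IsMatch(r, @"\b[Tt]"))
The \b anchors the match at a word boundary, so a word at the very start of the string is found too (a pattern like " [Tt]" would only match words preceded by a space).
You can replace the Tt with Ss (both assuming you want either upper case or lower case).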
The problem of course is, what is a "word"?
Is the character sequence 'word' in the sentence above a word according to your definition? It doesn't start with a space, not even a white-space.
A definition of a word could be:
Define wordCharacter: something like A-Z, a-z.
Define word:
- the non-empty sequence of wordCharacters at the beginning of a string followed by a non-wordcharacter
- or the non-empty sequence of wordCharacters at the end of a string preceded by a non-wordcharacter
- any non-empty sequence of wordCharacters in the string preceded and followed by a non-wordcharacter
Define start of word: the first character of a word.
String: "Some strange characters: 'A', 9, äll, B9 C$ X?"
- Words: Some, strange, characters, A
- Not Words: 9, äll, B9, C$, X?
So you first have to specify precisely what you mean by word, then you can define functions.
I'll write it as an extension method of IEnumerable<string>. Usage will look similar to LINQ. See Extension Methods Demystified
// Both methods belong in a static class, which extension methods require.
static bool IsWordCharacter(char c)
{
    // TODO: implement your definition of word character; A-Z and a-z as an example
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
static IEnumerable<string> SplitIntoWords(this string text)
{
    if (text == null) throw new ArgumentNullException(nameof(text));
    if (text.Length == 0) yield break;
    int startIndex = 0;
    while (startIndex != text.Length)
    {   // not at end of string. Find the beginning of the next word:
        while (startIndex < text.Length && !IsWordCharacter(text[startIndex]))
        {
            ++startIndex;
        }
        // now startIndex points to the first character of the next word
        // or to the end of the text
        if (startIndex != text.Length)
        {   // found the beginning of a word.
            // the first character after the word is either the first non-word character,
            // or the end of the string
            int indexAfterWord = startIndex + 1;
            while (indexAfterWord < text.Length && IsWordCharacter(text[indexAfterWord]))
            {
                ++indexAfterWord;
            }
            // all characters from startIndex to indexAfterWord-1 are word characters,
            // so together they form a word
            int wordLength = indexAfterWord - startIndex;
            yield return text.Substring(startIndex, wordLength);
            // continue the search after this word (otherwise the loop never ends)
            startIndex = indexAfterWord;
        }
    }
}
Now that you've got a procedure to split any string into your definition of words, your query will be simple:
IEnumerable<string> texts = ...
char specifiedChar = 'T';
// keep only those texts that have at least one word that starts with specifiedChar:
var textsWithWordThatStartsWithSpecifiedChar = texts
// split the text into words
// keep only the words that start with specifiedChar
// if there is such a word: keep the text
.Where(text => text.SplitIntoWords()
.Where(word => word.Length > 0 && word[0] == specifiedChar)
.Any());
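The split-then-test idea above is not specific to LINQ. Here is a minimal sketch of it in Go, under two assumptions that are mine rather than the answer's: a word is any maximal run of ASCII letters (so "B9" yields the word "B"), and the comparison ignores case. The names isWordCharacter, HasWordStartingWith and FilterRows are hypothetical.

```go
package main

import (
	"strings"
	"unicode"
)

// isWordCharacter is one possible definition: ASCII letters only.
func isWordCharacter(r rune) bool {
	return (r >= 'A' && r <= 'Z') || (r >= 'a' && r <= 'z')
}

// HasWordStartingWith reports whether any word in text starts with c,
// ignoring case.
func HasWordStartingWith(text string, c rune) bool {
	// FieldsFunc splits on every rune that is not a word character and
	// never returns empty fields.
	words := strings.FieldsFunc(text, func(r rune) bool { return !isWordCharacter(r) })
	for _, w := range words {
		first := rune(w[0]) // safe: words are non-empty and ASCII-only
		if unicode.ToLower(first) == unicode.ToLower(c) {
			return true
		}
	}
	return false
}

// FilterRows keeps only the rows that contain such a word.
func FilterRows(rows []string, c rune) []string {
	var out []string
	for _, row := range rows {
		if HasWordStartingWith(row, c) {
			out = append(out, row)
		}
	}
	return out
}
```

With the three example rows, FilterRows(rows, 'T') keeps all of them and FilterRows(rows, 'S') keeps only the second one.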
var yourChar = "s";
var texts = new List<string> {
"this is the first row",
"this is th second row",
"this is the third row"
};
var result = texts.Where(p => p.StartsWith(yourChar) || p.Contains(" " + yourChar));
EDITED:
Alternative way (I'm not sure it works in linq query)
var result = texts.Where(p => (" " + p).Contains(" " + yourChar));
You can use .ToLower() on both strings if you want a case-insensitive check.

VBScript: How can I trim a number to 4 decimal places but not round it?

My code currently looks like this:
FormatNumber((CDbl(0.05935)),4)
The returned value is 0.0594 rather than 0.0593 which is what I need.
You can convert the number to a string, cut the string after the fourth decimal place, and convert it back to a number.
Example:
v = 100.0097
x = CStr(v) ' Gives "100.0097"
(Str$(v) is a VB/VBA function that is not available in VBScript; it would also add a leading space for positive numbers.)
Then cut the string just after the fourth decimal. This assumes the string contains a "." as the decimal separator:
finalstr = Left(x, InStr(x, ".") + 4)
Then convert it back to a Double:
finaltrimmed = CDbl(finalstr)
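The same "cut the text, don't round" idea can be sketched in Go. This is an illustration only, not VBScript; the function name TruncateDecimals is hypothetical, and it assumes places is zero or positive.

```go
package main

import (
	"strconv"
	"strings"
)

// TruncateDecimals returns v as text with at most `places` digits after
// the decimal point. Extra digits are cut off, not rounded.
func TruncateDecimals(v float64, places int) string {
	// 'f' with precision -1 gives the shortest text that round-trips,
	// without an exponent.
	s := strconv.FormatFloat(v, 'f', -1, 64)
	dot := strings.IndexByte(s, '.')
	if dot < 0 || len(s)-dot-1 <= places {
		return s // no fractional part, or already short enough
	}
	if places == 0 {
		return s[:dot]
	}
	return s[:dot+1+places]
}
```

For the value from the question, TruncateDecimals(0.05935, 4) gives "0.0593".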

Variable floating-point precision format string

I'm trying to print floating point numbers as percentages, and I'd like for the number of digits after the decimal place to vary as needed. Currently I have:
fmt.Printf("%.2f%%\n", 100*(value/total))
The problem is that if my percentage is, say, exactly 50, I will get the following output:
50.00%
While what I want is to get:
50%
Is there any way for the format string to indicate that a maximum of 2 digits of precision should be used, but only if needed?
There's no direct solution with the fmt package.
But you can remove the dot and zeros at end with a regular expression:
r := regexp.MustCompile(`\.?0*$`)
fmt.Printf("%s%%\n", r.ReplaceAllString(fmt.Sprintf("%.2f", 100*(value/total)),""))
Bonus: the same regex works for any number of trailing zeros.
Side note: You'll display 50.0041 the same way as 50, which might be a little misleading.
There's no way to do that inside fmt with e.g. another flag or what have you. You'll have to write out the logic yourself. You could do something like:
var final string
doubledecimal := fmt.Sprintf("%.2f", 100*value/total)
if doubledecimal[len(doubledecimal)-2:] == "00" {
    // drop the two zeros and the decimal point
    final = doubledecimal[:len(doubledecimal)-3]
} else {
    final = doubledecimal
}
fmt.Printf("%s%%\n", final)
You could similarly use strings.Split to split on the decimal point and work from there.
You could even adjust this to turn 50.10% into 50.1%.
doubledecimal := fmt.Sprintf("%.2f", 100*value/total)
// Strip trailing zeroes (indexing a string yields a byte, so compare with '0')
for doubledecimal[len(doubledecimal)-1] == '0' {
    doubledecimal = doubledecimal[:len(doubledecimal)-1]
}
// Strip the decimal point if it's trailing.
if doubledecimal[len(doubledecimal)-1] == '.' {
    doubledecimal = doubledecimal[:len(doubledecimal)-1]
}
fmt.Printf("%s%%\n", doubledecimal)
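The two stripping steps can also be written with the strings package. Here is a minimal sketch wrapped in a function; the name FormatPercent is hypothetical, and it assumes total is not zero.

```go
package main

import (
	"strconv"
	"strings"
)

// FormatPercent formats value/total as a percentage with at most two
// decimals, dropping trailing zeros and a trailing decimal point.
func FormatPercent(value, total float64) string {
	// 'f' with precision 2 always produces a decimal point, so trimming
	// zeros on the right can never eat into the integer part.
	s := strconv.FormatFloat(100*value/total, 'f', 2, 64)
	s = strings.TrimRight(s, "0")
	s = strings.TrimSuffix(s, ".")
	return s + "%"
}
```

FormatPercent(1, 2) returns "50%", while FormatPercent(1, 3) returns "33.33%".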
Another way is to use an if statement to control the output: if the result has no fractional part, print it with no decimal places (%.0f); otherwise print it with %.2f as you've done above. Note that Go's % operator is only defined for integers, so the check result%1 == 0 has to be written as math.Mod(result, 1) == 0 or result == math.Trunc(result). Not sure if there is a shorter way of doing this, but I think this should work.

Record Reader Split to convert Fixed Length to Delimited ASCII file

I have a 128 MB file, so it is split into 2 blocks (block size = 64 MB).
I am trying to convert a fixed-length file to a delimited ASCII file using a custom RecordReader class.
Problem:
When the first split of the file is processed, I get the records properly. When I check with a Hive table on top of the data, I can see that it also accesses data node 2 to fetch characters up to the end of the last record.
But the second split starts with a \n character, and the number of records is doubled.
Ex:
First Split: 456 2348324534 34953489543 349583534
Second Split:
456 23 48324534 34953489543 349583534
In the record reader, in order to skip the characters that were already read as part of the first input split, the following piece of code was added:
FixedAsciiRecordReader(FileSplit genericSplit, JobConf job) throws IOException {
    if ((start % recordByteLength) > 0) {
        // start is inside a record: move to the beginning of the next record
        pos = start - (start % recordByteLength) + recordByteLength;
    }
    else {
        pos = start;
    }
    fileIn.skip(pos);
}
The fixed-length input file has a \n character at the end of each record.
Should any value be set for the start variable as well?
I found the solution to this problem. My fixed-length input file has a variable-length header that was not being skipped, so the reader did not start exactly at the beginning of a record; it started at position (StartOfRecord - HeaderLength). This made each record read a few characters (as many as the header length) from the previous record.
Updated Code:
if ((start % recordByteLength) > 0) {
    pos = start - (start % recordByteLength) + recordByteLength + headerLength;
}
else {
    pos = start;
}
fileIn.skip(pos);
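One way to reason about the alignment is to measure positions relative to the end of the header before rounding up to a record boundary. The sketch below shows that arithmetic in Go. It is not the asker's code and not a Hadoop API; it assumes a single header at the very start of the file and records of exactly recordByteLength bytes (including the trailing \n). The name FirstRecordStart is hypothetical.

```go
package main

// FirstRecordStart returns the byte offset of the first record that begins
// at or after start, for a file laid out as
// [header][record][record]... with fixed-size records.
func FirstRecordStart(start, headerLength, recordByteLength int64) int64 {
	if start <= headerLength {
		return headerLength // the split begins inside or right after the header
	}
	offset := start - headerLength // position relative to the first record
	if rem := offset % recordByteLength; rem > 0 {
		offset += recordByteLength - rem // round up to the next record boundary
	}
	return headerLength + offset
}
```

For example, with a 10-byte header and 50-byte records, a split starting at byte 64 would begin reading at byte 110.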

Split a string by 2 various number of chars, skipping non alphanumerics

I have a string like:
hn$8m3kj4.23hs#8;
I need to split it as follows: the first entry should be one char long, the second entry 2 chars, the third entry one char, the fourth 2 chars, and so on.
Then each one-char entry should be joined to the following two-char entry with a colon (:).
If some chars at the end remain unpaired, they should be displayed as well.
It is important to skip all non-alphanumeric chars.
So the final string should be:
h:n8 m:3k j:42 3:hs 8:
See, 8 has no 2-char pair, but it is displayed anyway.
I have tried with a loop, but I end up with a huge amount of code.
I also tried regexes, but they split by the wrong number of chars.
you can try this:
s = "hn$8m3kj4.23hs#8;"
s.gsub(/\W/, '').scan(/(.)(..)?/).map { |i| i.join ':' }.join ' '
=> "h:n8 m:3k j:42 3:hs 8:"
this will not skip underscores though.
if you need to skip them as well, use this one:
s = "hn$8m3k_j4.23hs#8;_"
s.gsub(/\W|_/, '').scan(/(.)(..)?/).map { |i| i.join ':' }.join ' '
=> "h:n8 m:3k j:42 3:hs 8:"
See live demo here
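For comparison, the loop-based version does not have to be huge. Here is a minimal sketch in Go; the name PairChars is hypothetical. It keeps only letters and digits (so underscores are skipped), then walks the result three characters at a time. One difference from the Ruby regex, by my own choice: when two characters are left over at the end, this sketch emits them as "a:b" rather than "a: b:".

```go
package main

import (
	"strings"
	"unicode"
)

// PairChars drops every character that is not a letter or digit, then
// groups the rest as 1 char + ":" + up to 2 chars, separated by spaces.
func PairChars(s string) string {
	var kept []rune
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			kept = append(kept, r)
		}
	}
	var parts []string
	for i := 0; i < len(kept); i += 3 {
		end := i + 3
		if end > len(kept) {
			end = len(kept) // last group may be shorter
		}
		parts = append(parts, string(kept[i:i+1])+":"+string(kept[i+1:end]))
	}
	return strings.Join(parts, " ")
}
```

PairChars("hn$8m3kj4.23hs#8;") returns "h:n8 m:3k j:42 3:hs 8:".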
