How to format a number with a point as the thousands separator - hadoop

I'm looking for the best way to format a number with a point as the thousands separator. I am creating a view from another table, and some columns need this format.
Example: 10000 -> 10.000
Thank you in advance for any answers!

There is a way to do this without a UDF. It may be a performance issue on large data, but it works for small data sets.
You can use the following combination of built-in functions:
-- Template
select reverse(concat_ws(".",split(reverse(<column-to-format>),"(?<=\\G.{3})")));
-- sample 1 - format number 12345678
select reverse(concat_ws(".",split(reverse("12345678"),"(?<=\\G.{3})")));
-- sample 2 - format number 10000
select reverse(concat_ws(".",split(reverse("10000"),"(?<=\\G.{3})")));
Explanation for sample 1 with the number 12345678:
1. reverse("12345678") Result: "87654321"
2. split(reverse("12345678"),"(?<=\\G.{3})") Result: array ["876","543","21"]
3. concat_ws(".",split(reverse("12345678"),"(?<=\\G.{3})")) Result: "876.543.21"
4. Finally, reverse it back. Result: "12.345.678"
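The same reverse, split, join, reverse idea can be sketched outside Hive to check the expected output. This is a minimal Python illustration, not Hive code; the helper name format_thousands is hypothetical, and it assumes the input is a plain string of digits (no sign, no decimals).

```python
import re


def format_thousands(digits: str, sep: str = ".") -> str:
    """Group a string of digits in threes from the right, joined by sep."""
    reversed_digits = digits[::-1]                   # reverse the string
    groups = re.findall(r".{1,3}", reversed_digits)  # chunks of up to 3 characters
    return sep.join(groups)[::-1]                    # join, then reverse back


print(format_thousands("12345678"))
print(format_thousands("10000"))
```

The two calls correspond to sample 1 and sample 2 above.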

SQL Server Reporting: How to calculate a value based on the previously calculated value in the same column?

I'm trying to calculate a row value based on the previous row's value in the same column within a report expression. I can't precalculate this in the database, since the starting point of the calculation depends on input parameters, and the values in the table must be recalculated dynamically within the report itself.
In Excel, the equivalent data and formula look like this (the starting point is always 100):
    B       C               D           E
    Price   PreviousPrice   CalcValue   Formula
1   NULL    NULL            100
2   2.6     2.5             104         B2/C2*D1
3   2.55    2.6             102         B3/C3*D2
4   2.6     2.55            104         B4/C4*D3
5   2.625   2.6             105         B5/C5*D4
6   2.65    2.625           106         B6/C6*D5
7   2.675   2.65            107         B7/C7*D6
I tried to calculate the expected values ("CalcValue" is the name of the column where the expression is set) like this:
=Fields!Price.Value / Fields!PreviousPrice.Value * Previous(ReportItems("CalcValue").Value)
but got the error "Aggregate functions can be used only on report items contained in page headers and footers".
Can you please advise whether the expected result is achievable in my case and suggest a solution?
Thank you in advance!
Sadly, I'm still facing the issue: the calculated column does not take the previously calculated value into account. E.g., I added a CalcVal field with 100 as the default and tried to calculate using the above approach: =previous(runningValue(Fields!CalcVal.Value, sum, "DataSet1") ) * Fields!Price.Value/Fields!PreviousPrice.Value.
But in this case it always multiplies Fields!Price.Value/Fields!PreviousPrice.Value by 100.
For example, CalcVal computed on the fly always shows 200:
=previous(runningValue(Fields!CalcVal.Value, sum, "DataSet1")) * 2
https://imgur.com/Wtg3Wsg
I tried this with your sample data; here is how I achieved the results.
Formula to use (you might have to take care of null values):
=Fields!Price.Value/(Fields!PreviousPrice.Value*Previous(Fields!CalcValue.Value))
Edit: Updated the answer after the OP's comment.
CalcValue is calculated on the fly with the formula below:
=RunningValue(CountDistinct("Tablix6"),Count,"Tablix6")*100
and then the final value is as below:
=Fields!Price.Value/(Fields!PreviousPrice.Value*
Previous(RunningValue(CountDistinct("Tablix6"),Count,"Tablix6"))*100)
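For checking report output, the recurrence from the question's Excel sheet (CalcValue = Price / PreviousPrice * previous CalcValue, starting at 100) can be sketched outside SSRS. This is a plain Python illustration, not an SSRS expression; the function name calc_values is hypothetical, and it assumes that a row with NULL prices simply keeps the running value. Because every step multiplies the previous result, each CalcValue equals 100 times the running product of the Price / PreviousPrice ratios up to that row.

```python
from typing import Optional


def calc_values(
    rows: list[tuple[Optional[float], Optional[float]]],
    start: float = 100.0,
) -> list[float]:
    """Apply CalcValue[n] = Price[n] / PreviousPrice[n] * CalcValue[n-1]."""
    result = []
    current = start
    for price, previous_price in rows:
        # Assumption: rows with NULL prices (the first row) keep the running value.
        if price is not None and previous_price is not None:
            current = price / previous_price * current
        result.append(current)
    return result


# (Price, PreviousPrice) pairs taken from the question's sample data.
rows = [(None, None), (2.6, 2.5), (2.55, 2.6), (2.6, 2.55),
        (2.625, 2.6), (2.65, 2.625), (2.675, 2.65)]
print([round(value, 6) for value in calc_values(rows)])
```

Up to floating-point rounding, the values agree with the CalcValue column in the question.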

How to compare two dates with current date in Tibco BW?

For example, if you have a date coming from the Start activity, how can you compare that date with the current date?
There is actually a function for exactly that purpose. It is called compare-date, or compare-dateTime if the time is also relevant.
The signature of this function is:
compare-date('1980-08-05','1980-09-04')
returns -1 if the first date is earlier than the second
returns 1 if the second date is earlier than the first
returns 0 if both are equal.
In your case this could mean something like:
compare-date($Start/root/Date,current-date())
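To make the three-way result concrete, here is a small Python analogue. It is only an illustration of the return convention (-1 when the first date is earlier, 1 when it is later, 0 when equal), which is an assumption about the semantics; compare_date below is a hypothetical Python helper and not part of the TIBCO API.

```python
from datetime import date


def compare_date(first: date, second: date) -> int:
    """Three-way comparison: -1 if first is earlier, 1 if later, 0 if equal."""
    return (first > second) - (first < second)


print(compare_date(date(1980, 8, 5), date(1980, 9, 4)))  # first is earlier: -1
print(compare_date(date.today(), date.today()))          # equal dates: 0
```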
You can do it in the following way:
Take two mappers, for example. In the first, use two inputs and give them the following XPath expressions, in order:
1. For the current date:
tib:parse-date("yyyyMMdd",tib:format-date("yyyyMMdd",current-date()))
2. For the Start input:
tib:parse-date("yyyy-MM-dd", $Start/root/Date)
and compare them somewhere in your End activity's input, like:
if ($parseDate/root/currentdate= $parseDate/root/rDate) then substring(
$parseDate/root/currentdate - $parseDate/root/Date, 3,
string-length(($parseDate/root/currentdate - $parseDate/root/Date))-3) else
substring( $parseDate/root/currentdate - $parseDate/root/Date,2,
string-length(($parseDate/root/currentdate - $parseDate/root/Date))-2)

Highlighting minimum row value in Pander

I am trying to display a dataframe in an RMarkdown document using the Pander package.
I would like to highlight the minimum value in each row of values. Here's what I have tried:
df <- replicate(4, rnorm(5))
df <- as.data.frame(df)
df$min <- apply(df, 1, min)
emphasize.strong.cells(which(df == df$min, arr.ind = T))
pander(df[1:4])
When I do this I get the error:
Error in check.highlight.parameters(emphasize.strong.cells, nrow(t), ncol(t)) :
Too high number passed for column indexes that should be kept below 6
I can print out the whole table (with the min column) without any trouble or I can print out a partial table without emphasis, but neither of these is ideal. I want the highlighting, but I do not wish to include the 'min' column.
I imagine the fact that I am leaving some highlighted cells out of the pander command is causing the error.
Is there a way around this? Or a better way to do this?
Thanks.
Subquestion: What if I wanted to highlight the minimum in the first few rows and the maximum in the next few. Is that possible in a single table?
Instead of the which lookup, which can match a row's minimum against values in the wrong rows, you can construct those array indices from a simple sequence (1:N) and a call to which.min on each row, e.g. with apply:
> df <- replicate(4, rnorm(5))
> df <- as.data.frame(df)
> emphasize.strong.cells(cbind(1:nrow(df), apply(df, 1, which.min)))
> pander(df)
----------------------------------------------
    V1          V2          V3          V4
----------- ----------- ----------- ----------
  0.6802      0.1409    **-0.7992**   0.1997
  0.6797    **-0.2212**    1.016      0.6874
   2.031     -0.009855    0.3881    **-1.275**
   1.376      0.2619    **-2.337**   -0.1066
**-0.4541**    1.135      -0.1566     0.2912
----------------------------------------------
About your next question: you could of course do that in a single table, eg rbind two matrices created similarly as described above with which.min and which.max.
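The index matrix passed to emphasize.strong.cells is just a list of (row, column) pairs, so the mixed min/max case comes down to building two such lists and stacking them. The idea is sketched here in Python rather than R, purely to show the index logic; extreme_cells is a hypothetical helper, and the 1-based indices mirror R's convention.

```python
def extreme_cells(rows, pick, first_row=1):
    """Return 1-based (row, column) pairs locating pick(row) in each row."""
    return [
        (first_row + offset, row.index(pick(row)) + 1)
        for offset, row in enumerate(rows)
    ]


# First four rows of the table printed above.
table = [
    [0.6802, 0.1409, -0.7992, 0.1997],
    [0.6797, -0.2212, 1.016, 0.6874],
    [2.031, -0.009855, 0.3881, -1.275],
    [1.376, 0.2619, -2.337, -0.1066],
]

# Minimum in rows 1-2, maximum in rows 3-4, combined into one index list.
cells = extreme_cells(table[:2], min) + extreme_cells(table[2:], max, first_row=3)
print(cells)
```

In R, the equivalent would be the rbind of two index matrices mentioned above, one built with which.min and one with which.max.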

How to use equations in Apache Pig

I am trying to get a value from this equation
--counted gives the total row count in a file
samplecount = counted*(10/100);
How can I sample data according to this?
--Load data
examples = LOAD '/home/sreeveni/myfiles/PE/USCensus1990New.csv' ;
--Group data
groupedByUser = group examples all;
--count no of lines in the file
counted = FOREACH groupedByUser generate COUNT(examples) ;
--sampling
sampled = SAMPLE examples counted*(10/100);
store sampled into '/home/sreeveni/myfiles/OUT/samplesout';
An error is shown for the line above:
Invalid scalar projection: counted : A column needs to be projected
from a relation for it to be used as a scalar
Please advise.
Am I doing anything wrong?
I guess SAMPLE works with a number in the range [0,1]. In your case, the value exceeds that range. If you want just 10% of the data, pass 0.1 directly; to compute the value in code, work out this percentage in a FOREACH statement.
If you are trying to generate a sample of "examples" with 10% of the total number of rows, all you have to do is:
SAMPLE examples 0.1;
See the Pig documentation for the SAMPLE operator.
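As far as I understand it, SAMPLE with a fraction keeps each row with that probability, so the result is roughly 10% of the rows rather than an exact count. A minimal Python sketch of that idea follows; it is not Pig code, and the sample_rows helper and its seed parameter are hypothetical.

```python
import random


def sample_rows(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    return [row for row in rows if rng.random() < fraction]


rows = list(range(10_000))
sampled = sample_rows(rows, 0.1, seed=42)
print(len(sampled))  # close to 1000, but not guaranteed to be exactly 1000
```

This is also why a row count is not needed up front: the fraction alone determines the expected sample size.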

Sorting a group by numbers like 1.0, 1.2, ..., 1.14 and so on in SSRS

I have a table with sequence numbers like 1.0, 1.1, 1.2, ..., 1.13, 1.14 and so on.
I need to sort the table based on this sequence number.
The problem is that when I apply a sort, it gives me the following:
1.1 Data
1.10 Data2
1.11 Data3
1.12 Data4
1.2 Data5
which is wrong; I need an order like 1.1, 1.2, 1.3 ... 1.10.
Please suggest some ideas for achieving this.
An early response would be appreciated.
Thanks.
Pavithra
To do this you need to extract the relevant numbers from the sequence string, cast these substrings to int, then sort by these extracted int values.
Say I have data like yours:
I have added two Calculated Fields to the Dataset:
FirstGroup:
=CInt(Left(Fields!MyValue.Value, InStr(Fields!MyValue.Value, ".") - 1))
SecondGroup:
=CInt(Mid(Fields!MyValue.Value
, InStr(Fields!MyValue.Value, ".") + 1
, InStr(Fields!MyValue.Value, " ") - (InStr(Fields!MyValue.Value, ".") + 1)))
You can see I'm just using the built-in SSRS text functions.
Create a Table and sort by these values:
This works for your example; obviously you might need to tweak slightly for your larger Dataset.
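The two calculated fields amount to a two-part integer sort key. The same logic is sketched here in Python as a way to check the expected order; it is not SSRS code, sequence_key is a hypothetical helper, and like the expressions above it assumes each value looks like "1.10 Data2" (number, a space, then text).

```python
def sequence_key(value: str) -> tuple[int, int]:
    """Split '1.10 Data2' into (1, 10) so the parts compare as integers."""
    number = value.split(" ", 1)[0]       # text before the first space
    first, second = number.split(".", 1)  # parts before and after the dot
    return int(first), int(second)


rows = ["1.1 Data", "1.10 Data2", "1.11 Data3", "1.12 Data4", "1.2 Data5"]
print(sorted(rows, key=sequence_key))
```

Sorting on the pair puts 1.2 before 1.10, because 2 and 10 are compared as integers rather than as text.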
