Dynamic number system in Qlik Sense - dashboard

My data consists of large numbers. I have a column, say 'amount', and when I use it in charts (sum of amount on the Y axis) it shows something like 1.4G. I want billions shown as e.g. 2.8B, millions as 80M, and thousands (14,000) simply as 14k.
I have used if(sum(amount)/1000000000 > 1, Num(sum(amount)/1000000000, '#,###B'), Num(sum(amount)/1000000, '#,###M')), but it does not show the M or B at the end of the figure. Also, how do I include thousands in the same expression?

EDIT: Updated to include the dual() function.
This worked for me:
=dual(
    if(sum(amount) < 1, Num(sum(amount), '#,##0.00'),
    if(sum(amount) < 1000, Num(sum(amount), '#,##0'),
    if(sum(amount) < 1000000, Num(sum(amount)/1000, '#,##0k'),
    if(sum(amount) < 1000000000, Num(sum(amount)/1000000, '#,##0M'),
        Num(sum(amount)/1000000000, '#,##0B')
    ))))
, sum(amount)
)
Here are some example outputs using this script to format it:
=sum(amount)     Formatted
2,526,163,764    3B
79,342,364       79M
5,589,255        5M
947,470          947k
583              583
0.6434           0.64
To get more decimals for any of those, like 2.53B instead of 3B, you can format them like '#,##0.00B' by adding more zeroes at the end.
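For instance, the billions branch of the expression above would then read as follows (a small sketch, not re-tested; going by the sample values above, 2,526,163,764 should display as 2.53B):
Num(sum(amount)/1000000000, '#,##0.00B')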
Also make sure that the Number Formatting property is set to Auto or Measure expression.

Related

Invalid syntax loop in Stata

I'm trying to run a for loop to make a balance table in Stata (comparing the demographics of my dataset with national-level statistics).
For this, I'm prepping my dataset and attempting to calculate the percentages/averages for some key demographics.
preserve
rename unearnedinc_wins95 unearninc_wins95
foreach var of varlist fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019 { // continuous or binary; to put categorical vars use kwallis test
    dis "for variable `var':"
    tabstat `var'
    summ `var'
    local `var'_samplemean=r(mean)
}
clear
set obs 11
gen var=""
gen sample=.
gen F=.
gen pvalue=.
replace var="% Female" if _n==1
replace var="Age" if _n==2
replace var="% Non-white" if _n==3
replace var="HH size" if _n==4
replace var="% Parent" if _n==5
replace var="% Employed" if _n==6
replace var="Savings stock ($)" if _n==7
replace var="Debt stock ($)" if _n==8
replace var="Earned income last mo. ($)" if _n==9
replace var="Unearned income last mo. ($)" if _n==10
replace var="% Under FPL 2019" if _n==11
foreach col of varlist sample {
    replace `col'=100*round(`fem_`col'mean', 0.01) if _n==1
    replace `col'=round(`age_`col'mean') if _n==2
    replace `col'=100*round(`nonwhite_`col'mean', 0.01) if _n==3
    replace `col'=round(`hhsize_`col'mean', 0.1) if _n==4
    replace `col'=100*round(`parent_`col'mean', 0.01) if _n==5
    replace `col'=100*round(`employed_`col'mean', 0.01) if _n==6
    replace `col'=round(`savings_wins95_`col'mean') if _n==7
    replace `col'=round(`debt_wins95_`col'mean') if _n==8
    replace `col'=round(`earnedinc_wins95_`col'mean') if _n==9
    replace `col'=round(`unearninc_wins95_`col'mean') if _n==10
    replace `col'=100*round(`underfpl2019_`col'mean', 0.01) if _n==11
}
I'm trying to run this loop, but in the second half of it I keep getting an 'invalid syntax' error. For context, in the first half (before clearing the dataset), the code stores the average value of each variable in a macro (`var'_samplemean). Can someone help me out and mend this loop?
My sample data:
clear
input byte fem float(age nonwhite) byte(hhsize parent) float employed double(savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95) float underfpl2019
1 35 1 6 1 1 0 2500 0 0 0
0 40 0 4 1 1 0 10000 1043 0 0
0 40 0 4 1 1 0 20000 2400 0 0
0 40 0 4 1 1 .24 20000 2000 0 0
0 40 0 4 1 1 10 . 2600 0 0
Thanks!
Thanks for sharing the snippet of data. Apart from the fact that the variable unearninc_wins95 has already been renamed in your sample data, the code runs fine for me without returning an error.
That being said, the columns for your F-statistics and p-values are empty once the loop at the bottom of your code completes. As far as I can see there is no local/varlist called sample which you're attempting to call with the line foreach col of varlist sample{. This could be because you haven't included it in your code, in which case please do, or it could be because you haven't created the local/varlist sample, in which case this could well be the source of your error message.
Taking a step back, there are more efficient ways of achieving what I think you're after. For example, you can get (part of) what you want using the package stat2data (if you don't have it installed already, run ssc install stat2data from the command prompt). You can then run the following code:
stat2data fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019, saving("~/yourstats.dta") stat(count mean)
*which returns:
preserve
use "~/yourstats.dta", clear
. list, sep(11)
+----------------------------+
| _name sN smean |
|----------------------------|
1. | fem 5 .2 |
2. | age 5 39 |
3. | nonwhite 5 .2 |
4. | hhsize 5 4.4 |
5. | parent 5 1 |
6. | employed 5 1 |
7. | savings_wins 5 2.048 |
8. | debt_wins95 4 13125 |
9. | earnedinc_wi 5 1608.6 |
10. | unearninc_wi 5 0 |
11. | underfpl2019 5 0 |
+----------------------------+
restore
This is missing the empty F-statistic and p-value variables you created in your code above, but you can always add them in the same way you have with gen F=. and gen pvalue=.. The presence of these variables though indicates you want to run some tests at some point and then fill the cells with values from them. I'd offer advice on how to do this but it's not obvious to me from your code what you want to test. If you can clarify this I will try and edit this answer to include that.
This doesn't answer your question directly; as others gently point out the question is hard to answer without a reproducible example. But I have several small comments on your code which are better presented in this form.
Assuming that all the variables needed are indeed present in the dataset, I would recommend something more like this:
local myvarlist fem age nonwhite hhsize parent employed savings_wins95 debt_wins95 earnedinc_wins95 unearninc_wins95 underfpl2019
local desc `" "% Female" "Age" "% Non-white" "HH size" "% Parent" "% Employed" "Savings stock ($)" "Debt stock ($)" "Earned income last mo. ($)" "Unearned income last mo. ($)" "% Under FPL 2019" "'
gen variable = ""
gen mean = ""
local i = 1
foreach var of local myvarlist {
    summ `var', meanonly
    local this : word `i' of `desc'
    replace variable = "`this'" in `i'
    if inlist(`i', 1, 3, 5, 6, 11) {
        replace mean = strofreal(100 * r(mean), "%2.0f") in `i'
    }
    else if `i' == 4 {
        replace mean = strofreal(r(mean), "%2.1f") in `i'
    }
    else replace mean = strofreal(r(mean), "%2.0f") in `i'
    local ++i
}
This has not been tested.
Points arising include:
Using in is preferable for what you want over testing the observation number with if.
round() is treacherous for rounding to so many decimal places. Most of the time you will get what you want, but occasionally you will get bizarre results arising from the fact that Stata works in binary, like any equivalent program. It is safer to treat rounding as a problem in string manipulation and use display formats as offering precisely what you want.
If the text you want to show is just the variable label for each variable, this code could be simplified further; see the sketch after these points.
The code hints at intent to show other stuff, which is easily done compatibly with this design.
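As a rough sketch of that simplification (untested, and assuming each variable carries a suitable variable label; the percentage scaling from the loop above is omitted for brevity):
local i = 1
foreach var of local myvarlist {
    summ `var', meanonly
    local this : variable label `var'    // use the stored variable label as the row text
    replace variable = "`this'" in `i'
    replace mean = strofreal(r(mean), "%2.0f") in `i'
    local ++i
}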

Spotfire Custom Expression : Calculate (Num/Den) Percentages

I am trying to plot Num/Den type percentages using OVER, but my thoughts do not seem to translate into Spotfire custom expression syntax.
Sample Input:
RecordID CustomerID DOS Age Gender Marker
9621854 854693 09/22/15 37 M D
9732721 676557 09/18/15 65 M D
9732700 676557 11/18/15 65 M N
9777003 5514882 11/25/15 53 M D
9853242 1753256 09/30/15 62 F D
9826842 1260021 09/30/15 61 M D
9897642 3375185 09/10/15 74 M N
9949185 9076035 10/02/15 52 M D
10088610 3512390 09/16/15 33 M D
10120650 41598 10/11/15 67 F N
9949185 9076035 10/02/15 52 M D
10088610 3512390 09/16/15 33 M D
10120650 41598 09/11/15 67 F N
Expected Output:
Row Labels D Cumulative_D N Cumulative_N Percentage
Sep 6 6 2 2 33.33%
Oct 2 8 1 3 37.50%
Nov 1 9 1 4 44.44%
My counts are working.
I want to take the same Cumulative_N & Cumulative_D count and plot Percentage over [Axis.X] as a line chart.
Here's what I am using:
UniqueCount(If([Marker]="N",[CustomerID])) / UniqueCount(If([Marker]="D",[CustomerID])) THEN SUM([Value]) OVER (AllPrevious([Axis.X])) as [CumulativePercent]
I understand SUM([Value]) is not the way to go. But I don't know what to use instead.
I also tried the expression below, but it did not work either:
UniqueCount(If([Marker]="N",[CustomerID])) OVER (AllPrevious([Axis.X])) / UniqueCount(If([Marker]="D",[CustomerID])) OVER (AllPrevious([Axis.X])) as [CumulativePercent]
Can you have a look?
I found a way to make it work, but it may not fit your overall solution. I should mention I used Count() instead of UniqueCount() so that the results would mirror your desired output.
Add a transformation to your current data table
Insert a calculated column Month([DOS]) as [TheMonth]
Set Row Identifiers = [TheMonth]
Set value columns and aggregation methods to Count([CustomerID])
Set column titles to [Marker]
Leave the column name pattern as %M(%V) for %C
That will give you a new data table. Then, you can do your cumulative functions. I did them in a cross table to replicate your expected results. Insert a new cross table and set the values to:
Sum([D]), Sum([N]), Sum([D]) OVER (AllPrevious([Axis.Rows])) as [Cumulative_D],
Sum([N]) OVER (AllPrevious([Axis.Rows])) as [Cumulative_N],
Sum([N]) OVER (AllPrevious([Axis.Rows])) / Sum([D]) OVER (AllPrevious([Axis.Rows])) as [Percentage]
That should do it.
I don't know whether Spotfire released a fix or whether, based on everyone's input, I finally got the syntax right. But here is the solution that worked for me.
For Columns D & N,
COUNT([CustomerID])
For columns Cumulative_D & Cumulative_N,
Count([CustomerID]) OVER (AllPrevious([Axis.X])) where [Axis.X] is DOS(Month), Marker
For column Percentage,
Count(If([Marker]="N",[CustomerID])) OVER (AllPrevious([Axis.X])) / Count(If([Marker]="D",[CustomerID])) OVER (AllPrevious([Axis.X]))
where [Axis.X] is DOS(Month)

Confusing result of Matlab's size() function

Very simple issue here, I suspect. I am completely new to Matlab (first day) and have been unable to resolve this by consulting the documentation.
As part of a larger problem, I need to get the dimensions of an image into two variables (row_obj and col_obj for the image object).
According to the documentation:
[m,n] = size(obj)
m: The number of rows in obj.
n: The number of columns in obj.
So, following that, I wrote:
[row_obj, col_obj] = size(object);
disp(row_obj);
disp(col_obj);
disp(size(object));
Which produced the output:
>> call_i_spy
21
81
21 27 3
It appears row_obj is correct. If disp(size(object)) produces 21 27 3, then why is col_obj not 27 (the value I need)? What is [row_obj, col_obj] = size(object); actually doing?
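As an aside (not part of the original post): when size is asked for fewer outputs than the array has dimensions, MATLAB folds the remaining dimensions into the last output, so for a 21x27x3 truecolor image col_obj comes back as 27*3 = 81. A minimal sketch of two ways to get 27 instead:
% object is assumed to be the 21x27x3 image from the question
[row_obj, col_obj, n_channels] = size(object);   % ask for all three dimensions: 21, 27, 3
col_obj = size(object, 2);                       % or query the second dimension directly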

Ruby time subtraction

There is the following task: I need to get the number of minutes between one time and another, for example between "8:15" and "7:45". I have the following code:
(Time.parse("8:15") - Time.parse("7:45")).minute
But I get the result as "108000.0 seconds".
How can I fix it?
The result you get back is a float of the number of seconds, not a Time object. So to get the number of minutes and seconds between the two times:
require 'time'
t1 = Time.parse("8:15")
t2 = Time.parse("7:45")
total_seconds = (t1 - t2) # => 1800.0
minutes = (total_seconds / 60).floor # => 30
seconds = total_seconds.to_i % 60 # => 0
puts "difference is #{minutes} minute(s) and #{seconds} second(s)"
Using floor and modulus (%) lets you split the result into minutes and seconds so it's more human-readable, rather than having '6.57 minutes'.
You can avoid weird time parsing gotchas (Daylight Saving, running the code around midnight) by simply doing some math on the hours and minutes instead of parsing them into Time objects. Something along these lines (I'd verify the math with tests):
one = "8:15"
two = "7:45"
h1, m1 = one.split(":").map(&:to_i)
h2, m2 = two.split(":").map(&:to_i)
puts (h1 - h2) * 60 + m1 - m2
If you do want to take Daylight Saving into account (e.g. you sometimes want an extra hour added or subtracted depending on today's date) then you will need to involve Time, of course.
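A rough illustration of that caveat (the dates here are invented, purely for illustration): parsing full timestamps lets Time apply the local UTC offset, so a DST changeover between the two instants shows up in the subtraction instead of being ignored.
require 'time'

# hypothetical timestamps on a (locale-dependent) DST changeover day;
# subtraction still returns elapsed seconds, so the elapsed time and the
# wall-clock gap can differ by an hour on such days
t1 = Time.parse("2021-03-14 03:15")
t2 = Time.parse("2021-03-14 01:45")
minutes = ((t1 - t2) / 60).round
puts "difference is #{minutes} minute(s)"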
Time subtraction returns the value in seconds. So divide by 60 to get the answer in minutes:
=> (Time.parse("8:15") - Time.parse("7:45")) / 60
#> 30.0

Smaller variation between times of different days

I have been working on an algorithm that selects a set of date/time objects with a certain characteristic, but with no success.
The data to be used are in a list of lists of date/time objects,
e.g.:
lstDays[i][j], i <= day chooser, j <= time chooser
What is the problem? I need a set of the nearest date/time objects, where each time in the set comes from a different day.
For example: [2012-09-09 12:00, 2012-09-10 12:00, 2012-09-11 12:00]
This set is the best possible example because the time distance is minimized to zero.
Important
Trying to contextualize this: I want to observe whether a phenomenon occurs at the same time on different days. If not, I want to evaluate whether the distance between the times is reasonable for my study.
I would like a generic algorithm for any number of days and times. The algorithm should return every set of datetime objects together with its time distance:
[2012-09-09 12:00,2012-09-10 12:00, 2012-09-11 12:00], 0
[2012-09-09 13:00,2012-09-10 13:00, 2012-09-11 13:05], 5
and so on.
:: "0", because the diff between all times on the first line from datetime objects is zero seconds.
:: "5", because the diff between all times on the second line from datetime objects is five seconds.
Edit: Code here
for i in range(len(lstDays)):
    for j in range(len(lstDays[i])):
        print lstDays[i][j]
Output:
2013-07-18 11:16:00
2013-07-18 12:02:00
2013-07-18 12:39:00
2013-07-18 13:14:00
2013-07-18 13:50:00
2013-07-19 11:30:00
2013-07-19 12:00:00
2013-07-19 12:46:00
2013-07-19 13:19:00
2013-07-22 11:36:00
2013-07-22 12:21:00
2013-07-22 12:48:00
2013-07-22 13:26:00
2013-07-23 11:18:00
2013-07-23 11:48:00
2013-07-23 12:30:00
2013-07-23 13:12:00
2013-07-24 11:18:00
2013-07-24 11:42:00
2013-07-24 12:20:00
2013-07-24 12:52:00
2013-07-24 13:29:00
Note: lstDays[i][j] is a datetime object.
lstDays = [ [/*datetime objects from day i*/], [/*datetime objects from day i+1*/], [/*datetime objects from day i+2*/], ... ]
And I am not worried about performance, a priori.
Hope that you can help me! (:
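As an aside (not from the original thread), here is a minimal Python sketch of the brute-force approach described above: pick one datetime per day, measure the spread of the time-of-day values in that combination, and return every combination with its spread in seconds, tightest first. The helper names (time_of_day_seconds, closest_sets) are my own.
from itertools import product

def time_of_day_seconds(dt):
    # seconds since midnight, so times from different days are comparable
    return dt.hour * 3600 + dt.minute * 60 + dt.second

def closest_sets(lstDays):
    # lstDays: list of lists of datetime objects, one inner list per day
    results = []
    for combo in product(*lstDays):  # one datetime picked from each day
        secs = [time_of_day_seconds(dt) for dt in combo]
        results.append((list(combo), max(secs) - min(secs)))
    return sorted(results, key=lambda pair: pair[1])

Since the question says performance is not a concern, simply enumerating every day-by-day combination like this should be acceptable.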
Generate a histogram:
hours = [0] * 24
for object in objects:  # whatever your objects are
    # assuming object.date_time looks like '2013-07-18 10:55:00'
    hour = object.date_time[11:13]  # assuming the hour is in positions 11-12
    hours[int(hour)] += 1
for hour in xrange(24):
    print '%02d: %d' % (hour, hours[hour])
You can always resort to collecting the times into a list, then estimating the differences and grouping the objects whose differences fall below a given limit, all packed into a dictionary with the difference as the value and the timestamps as keys. If this is not exactly what you need, I'm pretty sure it should be easy to select whatever result you need from it.
import numpy
import datetime

times_list = [object1.time(), object2.time(), ..., objectN.time()]
limit = 5  # limit of five seconds

groups = {}
for time in times_list:
    # seconds between this timestamp and every other one in the list
    delta_times = numpy.asarray([(tt - time).total_seconds() for tt in times_list])
    whr = numpy.where(abs(delta_times) < limit)[0]
    similar = [str(times_list[ii]) for ii in whr]
    if len(similar) > 1:
        similar.sort()
        max_time = numpy.max(delta_times[whr])  # max? median? mean?
        groups[tuple(similar)] = max_time
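A small usage sketch for the result (my own addition, assuming the groups dictionary built above): print each group of near-coincident timestamps with its spread, tightest first.
for timestamps, spread in sorted(groups.items(), key=lambda kv: kv[1]):
    print timestamps, spread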
