What's the complexity of " has(arr,ele) " function in Clickhouse? - clickhouse

What's the complexity of the has(arr, ele) function in ClickHouse?

To evaluate has(arr, ele), ClickHouse reads all data from the column, decompresses it, and checks each element of each array, so the work grows with the total number of array elements in the column.
You can try a bloom_filter skip index: https://kb.altinity.com/altinity-kb-queries-and-syntax/skip-indexes/skip-index-bloom_filter-for-array-column/
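As a rough mental model of the two behaviours described above, here is a small Python sketch. It is not ClickHouse code: TinyBloom, the granule layout, and every function name are illustrative assumptions. It contrasts a full scan with a scan that first consults one Bloom filter per block of rows and skips blocks that cannot contain the element.

```python
import hashlib


class TinyBloom:
    """Minimal Bloom filter: may give false positives, never false negatives."""

    def __init__(self, bits=512, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.mask = 0  # the bit array, stored in a single Python int

    def _positions(self, value):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, value):
        for pos in self._positions(value):
            self.mask |= 1 << pos

    def might_contain(self, value):
        return all((self.mask >> pos) & 1 for pos in self._positions(value))


def full_scan(granules, needle):
    """Check every array in every granule; cost grows with the total element count."""
    return [arr for granule in granules for arr in granule if needle in arr]


def build_index(granules):
    """One filter per granule, fed with every element of every array in it."""
    index = []
    for granule in granules:
        bloom = TinyBloom()
        for arr in granule:
            for item in arr:
                bloom.add(item)
        index.append(bloom)
    return index


def indexed_scan(granules, index, needle):
    """Scan only the granules whose filter says the needle might be present."""
    matches, granules_read = [], 0
    for granule, bloom in zip(granules, index):
        if not bloom.might_contain(needle):
            continue  # the needle is definitely absent from this granule
        granules_read += 1
        matches.extend(arr for arr in granule if needle in arr)
    return matches, granules_read


# Hypothetical data: three granules, each holding a few array values.
granules = [
    [[1, 2, 3], [4, 5]],
    [[10, 11], [12]],
    [[20, 21, 22], [5, 23]],
]
index = build_index(granules)
matches, granules_read = indexed_scan(granules, index, 5)
print(matches, granules_read)
```

In this model the index only helps when the element is absent from most blocks. If nearly every block contains it, the indexed scan reads almost as much as the full scan.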

Related

Protractor - Unable to loop every item by using ElemArrayFinder each

I am using Protractor + Jasmine.
I have 2 elements (links) in a table, and I need to delete them one by one. After I delete the first item, the table refreshes and is re-populated with the remaining elements (or links).
My code below deletes only the first element and then exits. I am unable to loop over and delete all the elements.
The total count I get is correct.
element.all(by.xpath("//span[@class='abc']")).count().then(function (count)
{
    element.all(by.xpath("//span[@class='abc']")).each(function (elem, index)
    {
        elem.getText().then(function (name)
        {
            console.log("NAME IS " + name);
            var row = element(by.xpath('//span[contains(text(),"' + name + '")]/../../..'));
            row.click();
            var overFlow = element(by.xpath('//span[contains(text(),"' + name + '")]/../../..//*[@class="zzz"]'));
            helper.clickElemWithJavascript(overFlow);
            helper.scrollIntoView(deleteButton);
            helper.clickElemWithJavascript(deleteButton);
        })
    })
}); //count
Promise chaining is one solution for this kind of issue.
This is the approach I ended up with:
1. I wrapped the steps that delete a single element in a function.
2. I get the total count of elements using element.all().each(), which yields an array of elements (or of values).
3. Using a for loop over the length of that array, I call the function from step 1.
I relied on promise chaining, so the control flow does not execute the next step until the previous one has finished.
I am not familiar with async and await, so I followed the approach above.
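The shape of that approach can be sketched independently of Protractor. The Python below is only a simulation: FakeTable is a hypothetical stand-in for the page, and its find_rows and delete methods are invented for the example. The point it illustrates is that each deletion looks the row up again, instead of relying on element references captured before the table refreshed.

```python
class FakeTable:
    """Hypothetical stand-in for the page: deleting a row re-renders the rest."""

    def __init__(self, names):
        self._rows = list(names)

    def find_rows(self):
        # Always return a fresh snapshot, like re-running the locator.
        return list(self._rows)

    def delete(self, name):
        self._rows.remove(name)


def delete_first_row(table):
    """Step 1: the steps that delete a single element, looked up afresh."""
    rows = table.find_rows()
    name = rows[0]
    table.delete(name)
    return name


def delete_all_rows(table):
    count = len(table.find_rows())  # step 2: total number of elements
    deleted = []
    for _ in range(count):  # step 3: call the single-delete function once per element
        deleted.append(delete_first_row(table))
    return deleted


table = FakeTable(["link-a", "link-b"])
print(delete_all_rows(table))
```

In the real test each call would also have to wait for the previous deletion and table refresh to finish, which is what the promise chaining provides.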

How to sort and filter data in "table" and "tabulate" in Stata

The commands tab x and table x return summary statistics sorted by x.
Is there a way to sort and filter tables of summary statistics by the statistics themselves, such as means and frequencies?
For example, I would like to have a table of means sorted by the means.
Combining collapse with sort does that, but it replaces the dataset in memory.
Is the answer provided by Nick the only option: Stata: Summary stats with table. Order by N?
Nick solved your problem in his earlier answer. The crucial line was:
gsort -n
which sorted by descending values of the count variable. Change the "n" to any of the other statistics and you will sort by that statistic. Here's a sort by descending values of the mean.
sysuse auto.dta, clear
gen make2 = substr(make,1, strpos(make," ")-2)
replace make2 = make if missing(make2)
collapse (count)n=price (mean)mean=price (p50)median=price (sd)sd=price ///
(min)min=price (max)max=price, by(make2)
gsort -mean
format mean-max %9.2f
format n %9.0f
list make2 mean n median sd min max, sep(0) noobs

(PHP) Probability of two random strings be same

I have this code for generating random strings
public function random_string($length = 5)
{
    $chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890';
    return substr(str_shuffle($chars), 0, $length);
}
So, is it possible for two generated strings to be the same?
In my case, there can be at most 62P5 (counting permutations) distinct strings of 5 characters.
But what is the probability that the 10th and the 1000th generated strings are the same?
This is known as the birthday problem and can be solved with:
$chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890';
$length = 5;
$numChars = strlen($chars);
$numberOfStringsGenerated = 50000;
print "numStringsGenerated: " . $numberOfStringsGenerated . "\n";
print "numChars: " . $numChars . "\n";
print "lengthOfString: " . $length . "\n";
// Number of distinct strings: permutations of $length characters out of $numChars.
$totalPerms = 1;
for ($ii = 0; $ii < $length; $ii++) {
    $totalPerms *= $numChars - $ii;
}
print "totalPerms: " . $totalPerms . "\n";
// Probability that all generated strings are distinct.
$probabilityAllDistinct = 1;
for ($ii = 0; $ii < $numberOfStringsGenerated; $ii++) {
    $probabilityAllDistinct *= ($totalPerms - $ii) / $totalPerms;
}
print "Probability: ";
print (1 - $probabilityAllDistinct) . "\n";
Here is the codepad output
Here is the Wikipedia page
This calculation assumes that the PRNG for the str_shuffle is good enough for all permutations to be equally likely, which won't be exactly true, especially as the number of chars increases.
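For readers who want to check the numbers, here is a Python rendering of the same calculation, under the same assumption that every permutation is equally likely. It also covers the narrower question about one specific pair, such as the 10th and the 1000th string: under that assumption, two given strings match with probability 1 / 62P5. The variable names are chosen for this sketch only.

```python
import math

alphabet_size = 62   # letters and digits in $chars
length = 5           # characters per generated string
generated = 50_000   # how many strings are generated in total

# str_shuffle never repeats a character, so the count is a permutation: 62P5.
total = math.perm(alphabet_size, length)

# Birthday problem: probability that at least two of the strings coincide.
p_all_distinct = 1.0
for i in range(generated):
    p_all_distinct *= (total - i) / total
p_collision = 1.0 - p_all_distinct

# Common closed-form approximation of the same quantity.
p_collision_approx = 1.0 - math.exp(-generated * (generated - 1) / (2 * total))

# Probability that two specific strings (e.g. the 10th and the 1000th) are equal.
p_specific_pair = 1 / total

print("total permutations:", total)
print("P(any collision):", p_collision)
print("approximation:", p_collision_approx)
print("P(specific pair equal):", p_specific_pair)
```

The two probabilities differ by many orders of magnitude: a match between one particular pair is extremely unlikely, while a match somewhere among 50,000 strings is more likely than not.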
Obviously, that is possible.
The right way to handle this is to store the strings that are already in use in a database.
I use this in a system that generates a random session token and stores it in a database.
Whenever I generate a new random session token, I query the session table for it. If there are no results, the token is free to use; otherwise, I generate a new one.
The probability is very low, but not zero, and every time you insert a new token into the table, the probability of a collision grows.
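A minimal sketch of that check-and-retry loop, written in Python with an in-memory set standing in for the session table. The function name, the max_attempts limit, and the use of the secrets module are assumptions of this example, not part of the answer above. Unlike str_shuffle, this version may repeat a character within a token.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_unique_token(used, length=5, max_attempts=100):
    """Return a token that is not in `used`, and record it there.

    `used` is a set standing in for the database table of issued tokens.
    """
    for _ in range(max_attempts):
        token = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if token not in used:  # the "query the table" step
            used.add(token)    # the "insert the token" step
            return token
    raise RuntimeError("could not find an unused token")


used_tokens = set()
print(new_unique_token(used_tokens))
print(new_unique_token(used_tokens))
```

With a real database and concurrent writers, a separate check followed by an insert can race, so a unique constraint on the token column with a retry on conflict is the safer variant of the same idea.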

How to create missing records within date-time range in pig latin

I have input records of the form
2013-07-09T19:17Z,f1,f2
2013-07-09T03:17Z,f1,f2
2013-07-09T21:17Z,f1,f2
2013-07-09T16:17Z,f1,f2
2013-07-09T16:14Z,f1,f2
2013-07-09T16:16Z,f1,f2
2013-07-09T01:17Z,f1,f2
2013-07-09T16:18Z,f1,f2
These represent timestamps and events. I wrote these by hand; the actual data will be sorted by time.
I would like to generate a set of records to feed a graph plotting function that needs a continuous time series. I would like to fill in missing values, i.e. if there are entries for "2013-07-09T19:17Z" and "2013-07-09T19:19Z", I would like to generate an entry for "2013-07-09T19:18Z" with a predefined value.
My thoughts on doing this:
1. Use MIN and MAX to find the start and end date in the series
2. Write a UDF which takes the min and max and returns a relation with the missing timestamps
3. Join the above 2 relations
I cannot work out how to implement this in Pig, though. I would appreciate any help.
Thanks!
Generate another file using a script (outside Pig) with all timestamps between MIN and MAX, including MIN and MAX themselves. Load this as a second data set. Here is a sample that I built from your data set. Please note that I filled in only a few of the gaps, not all of them.
2013-07-09T01:17Z,d1,d2
2013-07-09T01:18Z,d1,d2
2013-07-09T03:17Z,d1,d2
2013-07-09T16:14Z,d1,d2
2013-07-09T16:15Z,d1,d2
2013-07-09T16:16Z,d1,d2
2013-07-09T16:17Z,d1,d2
2013-07-09T16:18Z,d1,d2
2013-07-09T19:17Z,d1,d2
2013-07-09T21:17Z,d1,d2
Do a COGROUP on the original dataset and the generated dataset above. Use a nested FOREACH ... GENERATE to write the output dataset: if the original dataset has no record for a timestamp, use the values from the generated dataset; otherwise use the values from the original dataset. Here is the piece of code I used on these two datasets.
Org_Set = LOAD 'pigMissingData/timeSeries' USING PigStorage(',') AS (timeStamp, fl1, fl2);
Default_set = LOAD 'pigMissingData/timeSeriesFull' USING PigStorage(',') AS (timeStamp, fl1, fl2);
coGrouped = COGROUP Org_Set BY timeStamp, Default_set BY timeStamp;
Filled_Data_set = FOREACH coGrouped {
    -- number of original records for this timestamp
    x = COUNT(Org_Set);
    y = (x == 0 ? (Default_set.fl1, Default_set.fl2) : (Org_Set.fl1, Org_Set.fl2));
    GENERATE FLATTEN(group), FLATTEN(y.$0), FLATTEN(y.$1);
};
If you need further clarification or help, let me know.
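The same fill logic can be prototyped outside Pig, which may also serve as the external script that generates the full timestamp file. The Python sketch below assumes one-minute spacing and the timestamp format shown in the question. The fill_gaps name and the ("d1", "d2") default values are placeholders for this example.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%MZ"  # timestamp format used in the sample records


def fill_gaps(records, default=("d1", "d2"), step=timedelta(minutes=1)):
    """Return one record per step between the earliest and latest timestamp.

    Existing records are kept; missing timestamps get the default field values.
    """
    by_time = {datetime.strptime(ts, FMT): (ts, f1, f2) for ts, f1, f2 in records}
    current, end = min(by_time), max(by_time)
    filled = []
    while current <= end:
        filled.append(by_time.get(current, (current.strftime(FMT), *default)))
        current += step
    return filled


records = [
    ("2013-07-09T16:17Z", "f1", "f2"),
    ("2013-07-09T16:14Z", "f1", "f2"),
    ("2013-07-09T16:16Z", "f1", "f2"),
    ("2013-07-09T16:18Z", "f1", "f2"),
]
for row in fill_gaps(records):
    print(row)
```

The dictionary lookup plays the role of the COGROUP: a timestamp found in the original data wins, and anything else falls back to the default record.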
In addition to @Rags's answer, you could use the STREAM x THROUGH command and a simple awk script (similar to this one) to generate the date range once you have the min and max dates. Something like the following (untested! You might need to put the awk script on a single line with semicolons as command delimiters or, better, ship it as a script file):
grunt> describe bounds;
(min:chararray, max:chararray)
grunt> dump bounds;
(2013/01/01,2013/01/04)
grunt> fullDateBounds = STREAM bounds THROUGH `gawk '{
split($1,s,"/")
split($2,e,"/")
st=mktime(s[1] " " s[2] " " s[3] " 0 0 0")
et=mktime(e[1] " " e[2] " " e[3] " 0 0 0")
for (i=st;i<=et;i+=60*60*24) print strftime("%Y/%m/%d",i)
}'`;

Sorting Ruby Arrays - failed with TypeError: can't convert Symbol into Integer

I am trying to sort an array that contains hashes. The array looks something like this:
[:amazon, [{:price=>" 396 ", :author=>"Motwani", :name=>"Randomized Algorithms ", :url=>"", :source=>"amazon"},
{:price=>" 255 ", :author=>"Thomas H. Cormen", :name=>"Introduction To Algorithms ", :url=>"", :source=>"amazon"}]]
I am trying to sort this array using:
source_array.sort_by { |p| p[1][:price] }
But I keep getting this error:
failed with TypeError: can't convert Symbol into Integer
I am not sure which indexing is going wrong here.
You're trying to sort an array of two elements:
the symbol :amazon,
the inner big array.
So any sort call on the top-level array will try to sort these two elements.
What you are trying to achieve can be done this way:
a[1] = a[1].sort_by {|f| f[:price].to_i}
Edit: for a more general approach:
# declare source array
a = [:amazon,
[{:price=>" 396 ", :author=>"Motwani", :name=>"Randomized Algorithms ", :url=>"", :source=>"amazon"},
{:price=>" 255 ", :author=>"Thomas H. Cormen", :name=>"Introduction To Algorithms ", :url=>"", :source=>"amazon"}]]
# convert to hash for easier processing
b = Hash[*a]
# now sort former inner table by price
b.merge!(b) {|k, v| v.sort_by {|p| p[:price].to_i}}
# return to old representation
b.to_a[0]
=> [:amazon, [{:price=>" 255 ", :author=>"Thomas H. Cormen", :name=>"Introduction To Algorithms ", :url=>"", :source=>"amazon"}, {:price=>" 396 ", :author=>"Motwani", :name=>"Randomized Algorithms ", :url=>"", :source=>"amazon"}]]
Your input is actually a pair (name, [book]), so make sure you only sort the second element of the pair (the books array):
[source_array[0], source_array[1].sort_by { |book| book[:price].to_i }]
