Consecutive Event Sequence Matching in ClickHouse

I am trying to do some funnel analysis using ClickHouse. I am aware of the sequenceMatch/windowFunnel functions, but they allow other events in between the steps of a sequence. I am trying to show how many users navigated to a certain path with different querystring params consecutively.
Given the following array [url, eventsequence]
['/someurl/page?a=1', 1]
['/someurl/page?a=2', 2]
['/someurl/page?a=3', 4]
['/someurl/page?a=4', 5]
['/someurl/page?a=4', 6]
I would like to determine that in the above sequence of events the user navigated directly from page to page 3 separate times: events 1->2, 4->5 and 5->6.

Worked this out: you can pass the event sequence number to sequenceCount in place of the timestamp and use a pattern that only allows a gap of at most 1 between the two events:
(?1)(?t<=1)(?2)
sequenceCount('(?1)(?t<=1)(?2)')(sequence,
ilike(page, '%a%'),
ilike(page, '%a%')) as sequences
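To make the expected result concrete, here is a minimal Python sketch that counts the direct page-to-page steps in the sample data from the question. It only illustrates the counting rule (adjacent events whose sequence numbers differ by exactly 1); it is not how ClickHouse evaluates the pattern, and sequenceCount may treat overlapping chains such as 4->5 and 5->6 differently. The helper name `count_consecutive_steps` and the predicate are assumptions made for illustration.

```python
# Events from the question as (url, sequence_number) pairs.
events = [
    ("/someurl/page?a=1", 1),
    ("/someurl/page?a=2", 2),
    ("/someurl/page?a=3", 4),
    ("/someurl/page?a=4", 5),
    ("/someurl/page?a=4", 6),
]


def count_consecutive_steps(events, matches):
    """Count adjacent event pairs whose sequence numbers differ by exactly 1
    and where both URLs satisfy the `matches` predicate."""
    ordered = sorted(events, key=lambda e: e[1])
    count = 0
    for (url_a, seq_a), (url_b, seq_b) in zip(ordered, ordered[1:]):
        if seq_b - seq_a == 1 and matches(url_a) and matches(url_b):
            count += 1
    return count


# Hypothetical predicate standing in for ilike(page, '%a%').
steps = count_consecutive_steps(events, lambda url: "a" in url.lower())
print(steps)
```

Comparing this number with what the query returns on the same data is a quick way to check whether the pattern behaves as intended.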


Mathematica removing columns cannot take positions through 2 to 3 error

I have a matrix consisting of 3 rows and 4 columns, of which I require the central two columns.
I have attempted extracting the central two columns as follows:
a = a[[2 ;; 3, All]];
According to the Mathematica function reference, the first entry in a[[2 ;; 3, All]] represents the rows and the second the columns. However, whenever I try a[[All, 2 ;; 3]] it removes the top row rather than the two columns. For some reason they seem inverted. I tried working around this by switching the entries, but when I use a[[2 ;; 3, All]] I get the error: Part: Cannot take positions 2 through 3 in a.
I cannot wrap my head around why this keeps happening. It also refuses to extract single columns from the matrix as well.
You show that you are assigning a variable to itself and then saying that things don't work for you. That makes me think you might have previously made assignments to variables and the results of that are lurking in the background and might be responsible for what you are seeing.
With a fresh start of Mathematica, before you do anything else, try
mat={{a,b,c,d},
{e,f,g,h},
{i,j,k,l}};
take23[row_]:=Take[row,{2,3}];
newmat = Map[take23, mat]
Map performs the function take23 on every row and returns a list containing all the results giving
{{b,c},
{f,g},
{j,k}}
If need be you can abbreviate that to
newmat = Map[Take[#,{2,3}]&, mat]
but that requires you understand # and & and it gives the same result.
If necessary you can further abbreviate that to
newmat = Take[#,{2,3}]& /@ mat
Map is widely used in Mathematica programming and can do many more things than just extract elements. Learning how to use that will increase your Mathematica skill greatly.
Or if you really need to use ;; then this
newmat = mat[[All, 2;;3]]
I interpret the documentation for that to mean you want to do something with All the rows and then within each row you want to extract from the second to the third item. That seems to work for me and instantly returns the same result.
If you instead wrote
newmat = mat[[1;;2, 2;;3]]
that would tell it that you wanted to work from row 1 down to row 2 and within those you want to work from column 2 to column 3 and that gives
{{b,c},
{f,g}}

Reporting Multiple Values & Sorting

Having a bit of an issue and unsure if it's actually possible to do.
I'm working on a file where I enter target progression against the actual target, and it reports the outcome as a percentage.
PAGE 1
¦NAME ¦TAR 1 %¦TAR 2 %¦TAR 3 %¦TAR 4 %¦OVERALL¦SUB 1¦SUB 2¦SUB 3¦
¦NAME1¦   114%¦   121%¦   100%¦   250%¦   146%¦    2¦    0¦   0%¦
¦NAME2¦    88%¦   100%¦    90%¦    50%¦    82%¦    0¦    1¦   0%¦
¦NAME3¦    82%¦    54%¦    64%¦   100%¦    75%¦    6¦    6¦  15%¦
¦NAME4¦   103%¦    64%¦    56%¦    43%¦    67%¦    4¦    4¦  24%¦
¦NAME5¦    87%¦    63%¦    89%¦     0%¦    60%¦    3¦    2¦  16%¦
Now I already have it sorting all rows by the Overall % column so I can see the ranking at a glance, but I am creating a second page that needs to reference specific data points.
So on the second page I would like to somehow sort and reference different columns for example
PAGE 2
TOP TAR 1¦Name of top %¦Top %¦
TOP TAR 2¦Name of top %¦Top %¦
Is something like this possible to do?
Essentially I'm creating an Employee of the Month form that automatically works out who has topped what.
I'm willing to drop a paypal donation for whoever can figure this out for me as I've been doing it manually every month and would appreciate the time saved
I don't think a complicated array formula is necessary for this - I am suggesting a fairly standard Index/Match approach.
First set up the row titles - you can just copy and transpose them from Page 1, or use a formula in A2 of Page 2 like
=transpose('Page 1'!B1:E1)
Then use them in an index/match to get the data in the corresponding column of the main sheet and find its maximum (in C2)
=max(index('Page 1'!A:E,0,match(A2,'Page 1'!A$1:E$1,0)))
Finally look up the maximum in the main sheet to find the corresponding name:
=index('Page 1'!A:A,match(C2,index('Page 1'!A:E,0,match(A2,'Page 1'!A$1:E$1,0)),0))
If you think there could be a tie for first place with two or more people getting the same score, you could use a filter to get the different names:
So if the max score is in B8 this time (same formula)
=max(index('Page 1'!A:E,0,match(A8,'Page 1'!A$1:E$1,0)))
the different names could be spread across the corresponding row using transpose (in C8)
=ArrayFormula(TRANSPOSE(filter('Page 1'!A:A,index('Page 1'!A:E,0,match(A8,'Page 1'!A$1:E$1,0))=B8)))
I have changed the test data slightly to show these different scenarios
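For readers who want to check the logic outside the spreadsheet, the same lookup can be sketched in Python: for each target column, find the maximum, then collect every name that reached it (which covers the tie case). The figures are the Page 1 values from the question; the function name `top_per_target` is hypothetical, and this is only an illustration of the Index/Match idea, not a replacement for the formulas.

```python
# Page 1 data from the question: name -> percentages for TAR 1 .. TAR 4.
scores = {
    "NAME1": [114, 121, 100, 250],
    "NAME2": [88, 100, 90, 50],
    "NAME3": [82, 54, 64, 100],
    "NAME4": [103, 64, 56, 43],
    "NAME5": [87, 63, 89, 0],
}
targets = ["TAR 1 %", "TAR 2 %", "TAR 3 %", "TAR 4 %"]


def top_per_target(scores, targets):
    """Return {target: (names_with_top_score, top_score)} for each column."""
    result = {}
    for col, target in enumerate(targets):
        # Equivalent of the MAX(INDEX(...)) step: best value in this column.
        best = max(row[col] for row in scores.values())
        # Equivalent of the FILTER step: every name that achieved the maximum.
        names = [name for name, row in scores.items() if row[col] == best]
        result[target] = (names, best)
    return result


for target, (names, best) in top_per_target(scores, targets).items():
    print(target, names, best)
```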

Rails Active Record Query Finding Item Out Of Scope

I'm not sure I fully understand active record querying, but I am running into a very peculiar issue where my active record conditionals seem to be pulling in items outside my current scope. Here is an example of what I am seeing:
In Rails 2.3 console
>> Transaction.single_card.find(:all).map(&:id)
=> [0, 1, 2, 3, 4, 5]
>> Transaction.single_card.find(:all, :conditions => "cards.number = '1234'").map(&:id)
=> [9]
How is this happening? Why, when I add extra conditions to my query, do I pull a record that should not be there at all? From my understanding, the extra conditional should check Transactions 0..5 (the transactions with a single card) and see if the card number is 1234. But the query pulls Transaction 9, which has 2 cards associated with it, which is why it did not appear in the initial query. What is going on?
Extra note: The single_card named scope :includes the cards reference

Sorting by counting the intersection of two lists in MongoDB

We have a requirement to analyze posts: for a specific post, we need to return a list of the posts most closely related to it, where relatedness is the number of tags the posts have in common. For example:
postA = {"author":"abc",
"title":"blah blah",
"tags":["japan","japanese style","england"],
}
There may be other posts with tags like:
postB:["japan", "england"]
postC:["japan"]
postD:["joke"]
So basically, postB gets a count of 2 and postC gets a count of 1 when compared with the tags in postA. postD gets 0 and will not be included in the result.
My current understanding is that I should use map/reduce to produce the result. I understand the basic usage of map/reduce, but I can't figure out a solution for this specific purpose.
Any help? Or is there a better way, like a custom sorting function, to work it out? I'm currently using pymongo as I'm a Python developer.
You should create an index on tags:
db.posts.create_index([('tags', 1)])
and search for posts that share at least one tag with postA:
posts = list(db.posts.find({'_id': {'$ne': postA['_id']}, 'tags': {'$in': postA['tags']}}))
and finally, sort by intersection in Python:
key = lambda post: len([tag for tag in post['tags'] if tag in postA['tags']])
posts.sort(key=key, reverse=True)
Note that if postA shares at least one tag with a large number of other posts this won't perform well, because you'll send so much data from Mongo to your application; unfortunately a plain find() query cannot sort and limit by the size of the intersection.
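The ranking step can be tried without a database. The sketch below uses in-memory dictionaries in place of the documents a query would return; the `_id` values and the function name `rank_related` are made up for illustration, while the tags are the ones from the question.

```python
# Sample posts from the question; the _id values are hypothetical.
post_a = {"_id": "A", "tags": ["japan", "japanese style", "england"]}
candidates = [
    {"_id": "B", "tags": ["japan", "england"]},
    {"_id": "C", "tags": ["japan"]},
    {"_id": "D", "tags": ["joke"]},
]


def rank_related(post, others):
    """Return (post_id, shared_tag_count) pairs, most shared tags first.
    Posts with no tag in common are dropped."""
    wanted = set(post["tags"])
    scored = []
    for other in others:
        if other["_id"] == post["_id"]:
            continue  # never compare a post with itself
        shared = len(wanted & set(other["tags"]))
        if shared > 0:
            scored.append((other["_id"], shared))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


ranked = rank_related(post_a, candidates)
print(ranked)
```

Using a set for the reference tags keeps each comparison cheap when posts carry many tags.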

recursive nested loops

Example scenario (note that this can be as deep or as shallow as the website requires):
The spider scans the first page for links and stores them as array1.
The spider enters the first link and is now on the second page. It sees links and stores them as array2.
The spider enters the first link on the second page and is now on the third page.
It sees links there and stores them as array3.
Please note that this is a generic scenario. I want to highlight the need to do many loops within loops.
rootArray[array1,array2,array3....]
How can I do recursive nested loops? array2 holds the children of each VALUE of array1 (we assume the structure is very uniform: each VALUE of array1 has similar links in array2). array3 holds the children of each VALUE of array2, and so on.
module Scratch
  def self.recur(arr, depth, &fn)
    arr.each do |a|
      a.is_a?(Array) ? recur(a, depth+1, &fn) : fn.call(a, depth)
    end
  end
  arr = [[1, 2, 3], 4, 5, [6, 7, [8, 9]]]
  recur(arr, 0) { |x,d| puts "#{d}: #{x}" }
end
You'll want to store these results in a tree, not a collection of arrays. Page1 would have child nodes for each link. Each of those has child nodes for its links, etc. An alternate approach would be to just store all of the links in one array, recursing through the site to find the links in question. Do you really need them in a structure analogous to that of the site?
You'll also want to check for duplicate links when adding any new link to the list/tree/whatever that you've already got. Otherwise, loops like page_1 -> page_2 -> page_1... will break your app.
What's your real goal here? Page crawlers aren't exactly new technology.
It all depends on what you are trying to do.
If you are harvesting links then a hash or set will work well. An array can be used too but can lead to some gotchas.
If you need to show the structure of the site you'll want a tree or arrays of arrays along with some way of flagging which urls you've visited.
In any case you need to avoid redundant links to keep from getting into a loop. It's also real common to put some sort of limitation on how deep you'll descend and whether you'll remember and/or follow links outside of the site.
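The points about a tree, duplicate checks and a depth limit can be combined in one short sketch. This version is in Python and uses a hypothetical in-memory link graph (`LINKS`) instead of fetching real pages; the `crawl` function and its parameters are assumptions for illustration, not an existing crawler API.

```python
# Hypothetical link graph standing in for real pages and their links.
LINKS = {
    "page_1": ["page_2", "page_3"],
    "page_2": ["page_1", "page_4"],  # links back to page_1: a cycle
    "page_3": [],
    "page_4": ["page_5"],
    "page_5": [],
}


def crawl(url, get_links, max_depth, depth=0, visited=None):
    """Build a tree of pages reachable from `url`.
    Skips pages already visited and stops below `max_depth`."""
    if visited is None:
        visited = set()
    if url in visited or depth > max_depth:
        return None
    visited.add(url)
    children = []
    for link in get_links(url):
        child = crawl(link, get_links, max_depth, depth + 1, visited)
        if child is not None:
            children.append(child)
    return {"url": url, "children": children}


tree = crawl("page_1", lambda u: LINKS.get(u, []), max_depth=2)
print(tree)
```

Because the visited set is shared across the whole recursion, the page_1 -> page_2 -> page_1 cycle ends after one pass, and page_5 is left out because it sits below the depth limit.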
Gweg, I just answered this on your other post.
How do I create nested FOR loops with varying depths, for a varying number of arrays?
