How do I include nulls in my get_path extraction? - data-extraction

I'm using a regex / get_path syntax to extract utm_campaign names from my event property data. Some sessions have a utm_campaign parameter, while others don't. I'd like to return all sessions regardless, and pull out the utm_campaign name wherever it's present. With my current syntax, I only get the sessions that have a utm_campaign in their event properties. Does anyone know how to fix this?
Here's the query:
select
user_id,
regexp_substr(get_path(event_properties, 'queryString')::string, 'utm_campaign=([^&]+)', 1, 1, 'e', 1) as campaign_name
from e
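In Snowflake, REGEXP_SUBSTR returns NULL when there is no match, so a plain SELECT like the one above should keep every row and leave campaign_name NULL where the parameter is missing. If rows still disappear, the filtering is probably coming from a WHERE clause or join elsewhere in the full query. The per-row logic can be sketched in Python as follows, assuming queryString holds an ordinary URL query string. The names `campaign_name` and `sessions` and the sample data are illustrative only.

```python
import re
from typing import Optional

# Capture everything after "utm_campaign=" up to the next "&" (or end of string).
CAMPAIGN_RE = re.compile(r"(?:^|[?&])utm_campaign=([^&]*)")


def campaign_name(query_string: Optional[str]) -> Optional[str]:
    """Return the utm_campaign value, or None (think SQL NULL) when it is absent."""
    if query_string is None:
        return None
    match = CAMPAIGN_RE.search(query_string)
    return match.group(1) if match else None


# Hypothetical sample sessions: (user_id, queryString)
sessions = [
    ("u1", "utm_source=google&utm_campaign=spring_sale&x=1"),
    ("u2", "utm_source=newsletter"),
    ("u3", None),
]

# Every session is kept; only the extracted column is None for non-matches.
rows = [(user_id, campaign_name(qs)) for user_id, qs in sessions]
print(rows)  # [('u1', 'spring_sale'), ('u2', None), ('u3', None)]
```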

Related

Using HLL_COUNT.MERGE outside of SQL

I can use the following query to generate all the HLL sketches of the distinct counts:
SELECT category, count(distinct city), HLL_COUNT.INIT(city) FROM `table`
GROUP BY category
And I get something like this:
While I would normally use the HLL_COUNT.MERGE(...) function to get the total count, for example:
select 'all -- hll', HLL_COUNT.MERGE(x), null from (select category, count(distinct city), HLL_COUNT.INIT(city) x from `datadocs-163219.010ff92f6a62438aa47c10005fe98fc9.inv` group by category) _
For various reasons, I need to do the MERGE outside of SQL/BigQuery. Is there some sort of library/open source library where I could do something like the following:
>>> hll_set
>>> {'CHAQMBgCIAuCBz8QFBgPIBQyN8hxlqEBvMMBnLMBgWnD5gTB3AH+ROgD/YMEpM8Jr70C6Q2LwwfZlQ3QMNu8AYDSBKf7AbOSqgE=', 'CHAQDhgCIAuCBxwQBxgPIBQyFP3PBMBtibMR3sgC77oViasKwfMF', 'CHAQJxgCIAuCBzIQEBgPIBQyKshxlqEBvMMBzfECh6gJxJABoNwF/rEGwf0PgYYFvOoFmzjJPZwg2y3nbw==', 'CHAQBBgCIAuCBw4QAhgPIBQyBpSJAfapKA==', 'CHAQBRgCIAuCBxEQAxgPIBQyCbaJBfqsH57tBw==', 'CHAQGBgCIAuCBykQDRgPIBQyId6SAtNvwJ0XgO8Ct/EFlvUOskG1E87ZA7/OApwg2y3nbw==', 'CHAQZhgCIAuCB2MQIxgPIBQyW5SJAcqJAbzDAcvcAoIV2xSMFsTyA42IAYkl+Wvj/AHqdJxRlEGbywG/WNjoAqS9BP3CAuPrBNSFAfdDt+YEoeIBr+ICmIYF6CL/MaLNAqKdA8k9rxntBrPVrAE=', 'CHAQEBgCIAuCByQQChgPIBQyHN6SAqjtArAJ/esCj9wSg+8KiVKNygHrpgXIogU=', 'CHAQpgkYAiALggfZAhChARgPIBQyzwKPBMwRkAzxP+wPogyqC8qJAeBo8BHsSOypAbAJriL+MYYR/1jnKqIyzR3wJIkI/QXkecNH7WCzQZgMuDvxFLh+xkboA7QB12akDhu5E+4+3KgBjAZ4nxLBRMw0xRWvIPZYszt+v1gnz2a0BZoF4wzQggHqOewsJeAxgguGErUCjGG3KuhKgUyfCtItkjOMZZwCpi3phgHlA+wRknEhwiq1Os4slgmhELEWl1f1rgH+B6e4AdCtAdkE4R7fK/gihHSRFqipAbYY9BmqP5oBgqsBvhrvEKGRAcpj7XHEVaAUrY8BylLRDgWn1wGpT6IS6irPHewb/AbKHqgQjQPyAeU82zuSHpgQ04UBzwqkFIADiBD4X6ABjBihFsIy6wmovgHNKssPsQOvGcADrQOQevMQvxKMBtANizqbP7l21+kB0UDxY92rVYCBMcD5H8CiEA=='}
>>> hll_merge_method(hll_set)
>>> 193
Is it possible to do this in any way using a library outside of BQ with the hash generated from it?
That's a feature request you can already find in the issue tracker: the current hash is Google-proprietary, but one day BigQuery could use an open one. Vote that request up:
https://issuetracker.google.com/issues/62153424
There might be news soon, and subscribing to the issue will keep you updated.
2019 Update: Find the open source version of BigQuery's HyperLogLog++ at:
https://github.com/google/zetasketch
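For intuition about why per-category sketches can be combined at all, here is a toy HyperLogLog in Python. It is only an illustration under stated assumptions: the SHA-256-based hash, the precision P=10 and the names init, merge and estimate are hypothetical choices, and this code cannot read the bytes produced by HLL_COUNT.INIT, which use BigQuery's own HLL++ format (zetasketch is the tool for those). The point it shows is that merging HLL sketches is a register-wise maximum.

```python
import hashlib
import math

P = 10        # precision: 2**P registers (toy choice, not BigQuery's setting)
M = 1 << P


def _hash64(value: str) -> int:
    # 64-bit hash taken from SHA-256; BigQuery uses a different hash,
    # so these sketches are NOT interchangeable with HLL_COUNT output.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def init(values):
    registers = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                  # top P bits choose a register
        rest = h & ((1 << (64 - P)) - 1)     # remaining 64-P bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - P) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(*sketches):
    # Union of the underlying sets == register-wise max of the sketches.
    return [max(regs) for regs in zip(*sketches)]


def estimate(registers):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)       # linear counting for small cardinalities
    return raw


# Two overlapping "categories" whose union has 193 distinct cities.
a = init(f"city{i}" for i in range(0, 150))
b = init(f"city{i}" for i in range(100, 193))
merged = merge(a, b)
print(round(estimate(merged)))  # an approximation of 193, not an exact count
```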

SSRS Dropdown 'Any' Value not working

I'm new to this world, so I'm looking for help with what I think would be a simple thing to fix; however, neither I nor the person training me in SQL and SSRS can figure it out.
I have an SSRS report backed by a SQL query, and it works perfectly except for one thing.
I have a drop-down list parameter containing all our customer names, and the report shows the volumes that customer has obtained so far, etc. For an individual customer this works perfectly. However, when I try to see the total volumes by choosing 'Any' from the drop-down list, it returns no data rather than returning everything.
Can anyone please advise what I could be missing here, or what I need to show you to help resolve this issue?
Cheers
Liam
Assuming your stored procedure parameter is a varchar and holds either a customer name or the value 'Any', the following should work:
SELECT myField1, myField2 -- etc
FROM myTable t
WHERE (t.ClientName = @myParameterName OR @myParameterName = 'Any')
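To see why the OR works, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server; the table contents are made up. The parameter is bound by name so the same value appears twice, once compared with the column and once tested against the 'Any' sentinel. When the value is 'Any', the second condition is true for every row, so nothing is filtered out.

```python
import sqlite3

# Hypothetical stand-in for myTable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ClientName TEXT, Volume INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [("Acme", 10), ("Globex", 5), ("Initech", 7)],
)


def volumes(client_name: str):
    # Same shape as the WHERE clause above, with :p playing the parameter's role.
    sql = (
        "SELECT ClientName, Volume FROM myTable t "
        "WHERE (t.ClientName = :p OR :p = 'Any') "
        "ORDER BY ClientName"
    )
    return conn.execute(sql, {"p": client_name}).fetchall()


print(volumes("Acme"))  # [('Acme', 10)]
print(volumes("Any"))   # [('Acme', 10), ('Globex', 5), ('Initech', 7)]
```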
Optional note: personally, I don't use stored procedures; I usually put the query that fetches the data directly in the dataset. Some companies don't allow this, but if you can do it, I think it makes life easier.
If you can put the stored procedure code directly in your dataset query, you can make the report more flexible. You can change your parameter to be multi-value, and then you don't need an 'Any' value in your parameter list at all; you can simply do something like:
SELECT myField1, myField2 -- etc
FROM myTable t
WHERE t.ClientName IN (@myParameterName)
SSRS will take all the selected parameter values and inject them into the dataset query correctly, so there is nothing else you need to do. SSRS will also add a 'Select All' option to your parameter in case you want to grab data for everything. The report will work for 1, 2, 10 or all client names.
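When you build such a query yourself outside SSRS, a common equivalent is to generate one placeholder per selected value. The sketch below, using Python's sqlite3 with hypothetical data, only roughly mimics what SSRS does with a multi-value parameter inside IN (...); it is not SSRS's actual mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ClientName TEXT, Volume INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [("Acme", 10), ("Globex", 5), ("Initech", 7)],
)


def volumes_for(client_names):
    # Many databases reject an empty IN () list, so handle "nothing selected" up front.
    if not client_names:
        return []
    # One "?" per selected value; the values themselves are still bound safely.
    placeholders = ", ".join("?" for _ in client_names)
    sql = (
        f"SELECT ClientName, Volume FROM myTable "
        f"WHERE ClientName IN ({placeholders}) ORDER BY ClientName"
    )
    return conn.execute(sql, list(client_names)).fetchall()


print(volumes_for(["Acme", "Initech"]))  # [('Acme', 10), ('Initech', 7)]
```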

How to create a single query with multiple WHERE columns whose values are optional

For example:
select * from a where id=@id and date between @date1 and @date2
When I pass @id=1, @date1='09/29/17' and @date2='09/30/17', the output is correct.
When I pass @id=0 (meaning no particular id) with the same dates, I want the output filtered by the date range only, ignoring the id column.
In other words, I want optional filters, like on a shopping website.
You can accomplish this with an OR on the id column, so your SQL looks like:
SELECT *
FROM a
WHERE (@id=0 OR id=@id)
AND date BETWEEN @date1 AND @date2
Be aware that, depending on your DBMS, this approach can leave you with an execution plan optimized for whichever path the first execution took (in SQL Server this is known as parameter sniffing). On a very big table, that plan can perform badly for other parameter values.
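The optional-filter pattern can be checked with a small sketch in Python's sqlite3. The table contents are hypothetical, and the dates are stored as ISO-8601 text (an assumption) so that BETWEEN compares them correctly as strings; a format like '09/29/17' would not sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "a"; ISO-8601 date strings sort in chronological order.
conn.execute("CREATE TABLE a (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(1, "2017-09-29"), (2, "2017-09-30"), (1, "2017-10-05")],
)


def search(id_filter: int, date1: str, date2: str):
    # When :id is 0, "(:id = 0 OR id = :id)" is true for every row,
    # so only the date range restricts the result.
    sql = (
        "SELECT id, date FROM a "
        "WHERE (:id = 0 OR id = :id) AND date BETWEEN :d1 AND :d2 "
        "ORDER BY id, date"
    )
    return conn.execute(sql, {"id": id_filter, "d1": date1, "d2": date2}).fetchall()


print(search(1, "2017-09-29", "2017-09-30"))  # [(1, '2017-09-29')]
print(search(0, "2017-09-29", "2017-09-30"))  # [(1, '2017-09-29'), (2, '2017-09-30')]
```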

Retrieving number of distinct values in Parse?

Is there any Parse equivalent to the SQL statement, SELECT DISTINCT?
I have the tables: User, Group, Tag, and Post. Each user has posts and can create tags and groups. The Tag table contains the fields:
"name" String
"post" Pointer<Post>
"creator" Pointer<_User>
I am trying to make a cloud function to return the most popular tags to my Android app for an AutoCompleteTextView. I was thinking I should just do something like a SELECT DISTINCT name query in Parse, ordered by popularity descending, but you can't do that in Parse. Is there any workaround for this?
I saw that this post did not get an answer and that this post says to work out distinct values client-side.
Thanks
It's not implemented yet; see this answer on Parse.com, which you mentioned too:
https://parse.com/questions/retrieving-unique-values
The answer in that post essentially means: read all the rows and select the unique ones yourself :)
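The "read all rows, pick the unique ones" workaround can also produce the popularity ranking in one pass. A Cloud Code function would be JavaScript; the Python below only sketches the counting logic, assuming the Tag rows have already been fetched (Parse returns a limited number of results per request, so this may take several paged queries) as dicts with a "name" key. The function name and sample data are hypothetical.

```python
from collections import Counter


def most_popular_tags(tag_rows, limit=5):
    # Count how often each tag name occurs; most_common() sorts by count, descending.
    counts = Counter(row["name"] for row in tag_rows)
    return [name for name, _ in counts.most_common(limit)]


tags = [
    {"name": "sql"}, {"name": "parse"}, {"name": "sql"},
    {"name": "android"}, {"name": "sql"}, {"name": "parse"},
]
print(most_popular_tags(tags, limit=2))  # ['sql', 'parse']
```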

MySQL get rows, but for any matching get the latest version

I'm developing a CMS, and implementing versioning by adding a new entry to the database with the current timestamp.
My table is set up as follows:
id | page | section | timestamp | content
"Page" is the page being accessed, which is either the path to the page ($page_name below), or '/' (to indicate 'global' fields).
"Section" is the section of the page being edited.
I want to be able to select all rows for a given page, but each section should only be selected once, the one with the latest timestamp being selected.
I've tried using the following CodeIgniter Active Record code:
$this->db->select('DISTINCT(section), content');
$this->db->where_in('page', array('/', $page_name));
$this->db->order_by('timestamp', 'desc');
$query = $this->db->get('cms_content');
Which is producing the following SQL:
SELECT DISTINCT(section), `content`
FROM (`cms_content`)
WHERE `page` IN ('/', 'index.html')
AND `enabled` = 1
ORDER BY `timestamp` desc
This returns both test rows (the rows have identical fields except id, timestamp and content).
Any ideas as to where I'm going wrong?
Thanks!
Your mistake is thinking that DISTINCT applies only to section - an easy mistake to make as the parentheses are misleading here. In fact the DISTINCT applies to the entire row whether or not you have parentheses. It is therefore best to omit the parentheses to avoid confusion.
Your problem is a classic 'max per group' problem. There are many, many ways to write this query and it is probably one of the most popular SQL questions on this site so you can search Stack Overflow to find ways to solve it. One way to get you started is to only select rows which hold the maximum timestamp for that section:
SELECT section, content
FROM cms_content T1
WHERE page IN ('/', 'index.html')
AND enabled = 1
AND timestamp = (
SELECT MAX(timestamp)
FROM cms_content T2
WHERE page IN ('/', 'index.html')
AND enabled = 1
AND T1.section = T2.section
)
I'm sorry but I do not know how to convert this SQL code into CodeIgniter Active Record. If another user more familiar with Active Record wishes to use this as a starting point for their own answer, they are welcome.
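To check the correlated-subquery version before porting it, here is a runnable sketch using Python's sqlite3 with made-up rows. It keeps a row only when its timestamp equals the newest timestamp for that section. Note that if two rows in a section share the same maximum timestamp, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cms_content (
        id INTEGER PRIMARY KEY, page TEXT, section TEXT,
        timestamp INTEGER, content TEXT, enabled INTEGER)"""
)
conn.executemany(
    "INSERT INTO cms_content (page, section, timestamp, content, enabled) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("index.html", "header", 100, "old header", 1),
        ("index.html", "header", 200, "new header", 1),
        ("/", "footer", 150, "global footer", 1),
    ],
)

# Keep a row only if its timestamp is the newest one for its section.
LATEST_PER_SECTION = """
SELECT section, content
FROM cms_content T1
WHERE page IN ('/', ?) AND enabled = 1
  AND timestamp = (
    SELECT MAX(timestamp)
    FROM cms_content T2
    WHERE page IN ('/', ?) AND enabled = 1 AND T1.section = T2.section)
ORDER BY section
"""
rows = conn.execute(LATEST_PER_SECTION, ("index.html", "index.html")).fetchall()
print(rows)  # [('footer', 'global footer'), ('header', 'new header')]
```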
DISTINCT is for all columns selected, and because "content" differs you will get two different rows.
If you only need the single latest row, order by timestamp descending and LIMIT 1; note that this returns one row in total, not one per section.
But may I suggest that you keep a cross-reference to the "active" page? That way, you can revert to a previous revision without discarding the newer ones.
Meaning:
page
----
id
info
active_page_id
page_revisions
--------------
id
page_id
content
timestamp
...
That is, you have a one-to-many relationship between page and page_revisions, as well as a one-to-one link from page to page_revisions (via active_page_id) to keep track of the "current" revision. With this approach you can simply join in the active revision.
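A minimal sketch of this design in Python's sqlite3, assuming active_page_id stores the id of a row in page_revisions (the column names follow the layout above; the data and the helper active_content are hypothetical). Reading the current revision becomes a plain join, and reverting only moves the pointer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (id INTEGER PRIMARY KEY, info TEXT, active_page_id INTEGER);
    CREATE TABLE page_revisions (
        id INTEGER PRIMARY KEY,
        page_id INTEGER REFERENCES page(id),
        content TEXT,
        timestamp INTEGER);
    INSERT INTO page (id, info, active_page_id) VALUES (1, 'index.html', NULL);
    INSERT INTO page_revisions (id, page_id, content, timestamp) VALUES
        (10, 1, 'first draft', 100),
        (11, 1, 'second draft', 200);
    UPDATE page SET active_page_id = 11 WHERE id = 1;
    """
)


def active_content(page_id):
    # Join through the pointer instead of searching for the latest timestamp.
    row = conn.execute(
        "SELECT r.content FROM page p "
        "JOIN page_revisions r ON r.id = p.active_page_id "
        "WHERE p.id = ?",
        (page_id,),
    ).fetchone()
    return row[0] if row else None


print(active_content(1))  # second draft
# Reverting just moves the pointer; the newer revision stays in the table.
conn.execute("UPDATE page SET active_page_id = 10 WHERE id = 1")
print(active_content(1))  # first draft
```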
This will do the job in CodeIgniter, without temporary tables (select only c1's columns, and qualify page, which exists in both aliases and would otherwise be ambiguous):
$this->db->query( "SELECT c1.*
FROM cms_content AS c1
LEFT JOIN cms_content AS c2
ON c1.page=c2.page
AND c1.section=c2.section
AND c1.timestamp < c2.timestamp
WHERE c2.timestamp IS NULL AND c1.page=?", array($page) );
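The LEFT JOIN variant works as an anti-join: each row is paired with any newer row for the same page and section, and only rows with no such partner (so c2's columns are NULL) survive. A runnable check in Python's sqlite3 with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cms_content (id INTEGER PRIMARY KEY, page TEXT, "
    "section TEXT, timestamp INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO cms_content (page, section, timestamp, content) VALUES (?, ?, ?, ?)",
    [
        ("index.html", "header", 100, "old header"),
        ("index.html", "header", 200, "new header"),
        ("index.html", "body", 150, "body text"),
    ],
)

# A row survives only if no newer row exists for the same page and section.
LATEST_SQL = """
SELECT c1.section, c1.content
FROM cms_content AS c1
LEFT JOIN cms_content AS c2
  ON c1.page = c2.page
 AND c1.section = c2.section
 AND c1.timestamp < c2.timestamp
WHERE c2.timestamp IS NULL AND c1.page = ?
ORDER BY c1.section
"""
rows = conn.execute(LATEST_SQL, ("index.html",)).fetchall()
print(rows)  # [('body', 'body text'), ('header', 'new header')]
```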
