ClickHouse results are showing null rows sometimes

I'm trying to process some IP data that we store in our ClickHouse database. Some users have IPv6 addresses logged, and some have multiple IP addresses logged, so what I am trying to achieve is to get only the IPv4 addresses and, if there are multiple IP addresses listed, to take the first one logged.
Here is the query that I made to filter them:
SELECT IF(ip LIKE '%,%', arrayElement(splitByChar(',', assumeNotNull(ip)), 1), ip) AS ip
FROM usage_analytics.users
WHERE ip NOT LIKE '%:%'
The results are not consistent. Sometimes it works fine and returns all the IPv4 addresses, but sometimes it returns null rows, always at around 70 rows into the results. This happens about 4 out of 5 times the query is run.
What's going on? Is this a ClickHouse issue, a logic issue, or something else I'm not considering?
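One thing worth checking, assuming the ip column is Nullable(String) (the question doesn't say): filter the NULLs out explicitly, so the string functions only ever see real values. A minimal sketch of that variant:
SELECT IF(ip LIKE '%,%', arrayElement(splitByChar(',', assumeNotNull(ip)), 1), ip) AS ip
FROM usage_analytics.users
WHERE ip IS NOT NULL        -- rule out NULL source rows explicitly
  AND ip NOT LIKE '%:%'     -- drop anything containing a colon (IPv6)
If the null rows still appear with that filter in place, that would point away from NULLs in the source column.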

Related

What is 'Most recent discovery' in the cmdb_ci table

I want to understand the role of 'Most recent discovery' in the cmdb_ci table. In what scenarios does this field get updated?
This field is updated by integrations, either ones you build yourself or ones that come from ServiceNow.
The general intent of this field is to indicate the last time the item was known to exist on your network. This allows you to do things such as retire a CMDB record for a computer, printer, server, etc. after it has not been seen for a period of time.
This is what the Discovery Dashboard uses for Unrefreshed Devices (Beyond Last 30 Days) and so on.
An example from my side: we had been populating our CMDB from Lansweeper as a custom integration, and we populated the Most recent discovery field with tblAssets.LastSeen from Lansweeper, which was the last time Lansweeper saw the device on our network.
It is generally up to you to determine what you want to do with CMDB records that have not been seen for a period of time.

DNS Resolution with 2 A records

I've been a Windows/network admin for 2 years, but today I ran into a question I didn't really know the answer to.
Say I do an nslookup and the query returns 2 A records.
Which A record does, say, a browser use?
If we do an nslookup for google.com, we get many responses. Is there a preferred address that Windows uses? Are there any deciding factors?
If example.com has three A records pointing to addresses a, b, and c, the DNS server returns all three on every query but rotates their order: the first query lists a first, the second lists b first, the third lists c first, and the next lists a first again. Most clients simply take the first address in the response, so the load spreads across the three.
This is known as round-robin DNS.
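For concreteness, round-robin is nothing more than several A records on the same name; a minimal zone-file sketch (the addresses are placeholder documentation IPs, not from the question):
; three A records for one name - the server rotates the order it returns them in
example.com.   300   IN   A   192.0.2.1
example.com.   300   IN   A   192.0.2.2
example.com.   300   IN   A   192.0.2.3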

REST API for infinite-scrolled query results

I'm building an internal server which contains a database of customer events. The webpage which allows access to the events is going to use an infinite scroll/dynamic loading scheme, both for display of live events and for browsing the results of queries to the database. So you might query the database and get maybe 200k results. The webpage would display the 'first' 50 and let you keep scrolling to see more and more results (loading perhaps 50 more at a time).
I'm supposed to be using a REST API for the database access (a C# server). I'm unsure what the API should look like so that it remains RESTful. I've come up with 3 options. The question is: are any of them RESTful, and which is most RESTful (is there such a thing? If not, I'll pick one of the RESTful ones).
Option 1:
GET /events?query=asdfasdf&first=1&last=50
This simply does the query and specifies the range of results to return. The server, unable to keep state, would have to requery the database each time the infinite scroll occurs (though perhaps using the first/last hints to stop early). That seems bad, and there isn't any feedback about how many results are forthcoming.
Option 2:
GET /events/?query=asdfasdf
GET /events/details?id1=asdf&id2=qwer&id3=zxcv&id4=tyui&...&id50=vbnm
This option first does a query which returns the list of event ids but no further details. The webpage simply holds the list of all the ids (at least it knows the count), and as the infinite scroll/dynamic load needs more, makes another query for the event details of the specified ids. Each id would nominally be a GUID, so about 36 characters per id (plus &id##= for 41 characters). At 50 ids per request, the URL would be quite long, 2000+ characters. The URL limit mentioned elsewhere on SO is around 2k. Maybe if I limit it to 40 ids per query this would be fine. It'd be nice to simply have a comma-separated list instead of all the query parameters. Can you make a query parameter like ?ids=qwer,asdf,zxcv,wert,sdfg,rtyu,gfhj, ... ,vbnm ?
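For what it's worth, a comma is a legal character in a query string, so nothing stops the server from accepting one parameter and splitting it itself; a hypothetical shape of that request (ids shortened for illustration):
GET /events/details?ids=qwer,asdf,zxcv,tyui,vbnm
That trims the per-id overhead from the &id##= prefix down to a single comma, roughly 37 characters per GUID.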
Option 3:
POST /events/?query=asdfasdf
GET /events/results/{id}?first=1&last=50
This would post the query to the server and cause it to create a results resource. The ID of the results resource would be returned and would then be used to get blocks of the query results, which in turn contain the event details needed for the webpage. The XML returned from the POST could contain the number of records and other useful information besides the ID. Either the webpage would have to delete the resource later, when the query page is closed, or the server would have to clean such resources up once they expire (days or weeks later).
I am concerned that Option 1, while RESTful, is horrible for the server. I'm not sure requesting so many simultaneous resources, like the second GET in Option 2, is really RESTful or practical (it seems like there has to be a better way). I'm not sure Option 3 is RESTful at all, or if it is, whether it's sort of cheating the REST thing by creating state via a POST (or should that be a PUT?).
Option 3 worked out fine. It required the server to maintain the query results, and there was a bit of debate about how many queries (from various users) should be kept at once, as there would be no way to know when a user was actually done with a query.
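As a sketch of how the Option 3 exchange ends up looking (the results id 1234 and the DELETE for cleanup are illustrative, not the exact API):
POST /events?query=asdfasdf
GET /events/results/1234?first=1&last=50
GET /events/results/1234?first=51&last=100
DELETE /events/results/1234
The POST response carries the new results id (1234 here) and the total record count, each GET pulls the next block, and the DELETE is the optional explicit cleanup, with server-side expiry as the fallback.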

Report generated that shows the IP addresses of customers/visitors

Is it possible to detect a visitor's IP address and store it in the Magento admin so that the admin can view who visited their store?
This is already possible. Take a look at Customers->Online Customers. This log is cleared after 15 minutes by default, but you can increase the value from System->Configuration->Customer Configuration->Online Customers Options. Don't make the value too big, because it may affect performance.
EDIT (correction)
All access to your website is stored in the log_visitor_info table, including the IP address. This table is not cleaned up after X minutes.
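If you want to pull those addresses straight out of the database, a rough query along these lines should do it; note that Magento stores the address as an integer, and the remote_addr column name here is from memory rather than from the question, so check it against your schema:
-- convert the integer-encoded address back to dotted notation
SELECT visitor_id, INET_NTOA(remote_addr) AS visitor_ip
FROM log_visitor_info;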

Oracle: performance of filtering results from a remote view

I have a remote database A which has a view v_myview. I am working on a local database, which has a dblink to access v_myview on database A. If I query the view like this:
select * from v_myview@dblink;
it returns half a million rows. I just want to get some specific rows from the view, e.g., rows with id=123, so my query is
select * from v_myview@dblink where id=123;
This works as expected. Here's my question: when I run this query, will the remote database generate the half million rows first and then look for the rows with id=123 among them, or will my filter be applied remotely so the half million rows are never retrieved? How do I know? Thank you!
Oracle is free to do either. You'd need to look at the query plan to see whether the filtering is being done locally or remotely.
Presumably, in a case as simple as the one you present, the optimizer would expect it to be more efficient to send the filter to the remote server rather than pulling half a million rows over the network only to filter them locally. That calculation may be different if the optimizer expects the unfiltered query to return a single row rather than half a million rows and it may be different if the query gets more complicated doing something like joining to a local table or calling a function on the local server.
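A minimal way to see which side does the filtering, assuming a SQL*Plus-style session and a link actually named dblink:
EXPLAIN PLAN FOR
  SELECT * FROM v_myview@dblink WHERE id = 123;
-- show the plan, including the Remote SQL Information section for distributed steps
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
If the plan shows the predicate inside the remote SQL, the filter is being shipped to database A; if not, a /*+ DRIVING_SITE */ hint is one way to nudge the optimizer toward executing the query on the remote side.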
