Get raw LowCardinality values in ClickHouse

Is there a way to retrieve the underlying values of LowCardinality types in ClickHouse? I would also need to retrieve (in a separate query) a mapping of the underlying values to the logical values. I've tried using lowCardinalityIndices and lowCardinalityKeys, but the indices -> keys mapping returned by those functions appears to be many-to-many.
Thank you!

Your question does not make sense.
A LowCardinality column does not have a single dictionary. Each data part has its own dictionaries (possibly several) for the same LowCardinality column, so an index is only meaningful within its part. That's why you observe this lowCardinalityIndices/lowCardinalityKeys behaviour.
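A minimal sketch of why the mapping looks many-to-many (the table and values here are hypothetical, and it assumes the two inserts land in separate parts, which they do until a background merge):

CREATE TABLE lc_demo (s LowCardinality(String)) ENGINE = MergeTree ORDER BY tuple();

INSERT INTO lc_demo VALUES ('b'), ('a');  -- part 1: part-local dictionary ['b', 'a']
INSERT INTO lc_demo VALUES ('a'), ('c');  -- part 2: part-local dictionary ['a', 'c']

-- The same value 'a' gets index 1 in part 1 but index 0 in part 2,
-- so indices -> values only holds within a single part:
SELECT s, lowCardinalityIndices(s) FROM lc_demo;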

Related

How to handle saving logs whose structure is very diverse to Elasticsearch?

My log POCO has several fixed properties, like user id and timestamp, plus a flexible data bag property, which is a JSON representation of any kind of extra information I'd like to add to the log. This means the property names within this data bag could be anything, which brings me two questions:
How can I configure the mapping so that the data bag property, which is of type string, is mapped to a JSON object during indexing instead of being treated as a plain string?
With the data bag having arbitrary property names, meaning the overall document type could contain a huge number of properties, would this hurt search performance?
For the string-to-JSON translation you can use an ingest pipeline with the JSON processor:
https://www.elastic.co/guide/en/elasticsearch/reference/master/json-processor.html
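A minimal sketch of such a pipeline (the pipeline name and the databag field name are assumptions):

# Parse the string data bag into a JSON object at index time
PUT _ingest/pipeline/parse-databag
{
  "description": "Expand the string data bag into an object",
  "processors": [
    {
      "json": {
        "field": "databag",
        "target_field": "databag_parsed"
      }
    }
  ]
}

Index your documents with ?pipeline=parse-databag (or set index.default_pipeline on the index) and the string is expanded into an object during ingestion.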
It depends on your queries. If you use "free text search", then yes, a huge number of fields will slow the query down. If you use queries like "field":"value", then no, the number of fields is not a problem for searches. Additional information about query optimization you can find here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/tune-for-search-speed.html#search-as-few-fields-as-possible
And the question is: what do you mean by "huge number"? 1,000? 10,000? 100,000? As part of the optimization I recommend using dynamic templates with a definition that automatically maps each string field as "keyword" rather than text + keyword. This setting cuts the number of indexed fields in half.
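A sketch of such a dynamic template (the index and template names are assumptions):

# Map every dynamically-added string field as keyword only
PUT my-logs
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}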

Record sorting in RethinkDB when using between()

If I filter records in a query using RethinkDB's between() operator on the primary key/index, do I need to explicitly sort the resulting record set by primary key, or is the resulting record set guaranteed to be sorted?
You need to order it explicitly. (Because the data is sharded, it's faster to return rows in an unspecified order, so that's the default when you don't request an ordering.)
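A hedged sketch with the Ruby driver (table name, key range, and connection details are assumptions):

require 'rethinkdb'
include RethinkDB::Shortcuts

conn = r.connect(host: 'localhost', port: 28015)

# between() selects by primary key but returns rows in no guaranteed order;
# chaining order_by on the same index sorts efficiently, without an in-memory sort.
r.table('events')
 .between(100, 200)          # primary key in [100, 200)
 .order_by(index: 'id')      # explicit ordering by the primary index
 .run(conn)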

Is there a way to efficiently find all unique values for a field in MongoDB?

Consider a collection of Users:
{ name: 'Jeff' }
{ name: 'Joel' }
Is there a way to efficiently get all the unique values for name? I could use
User.pluck(:name).uniq
to return
[ 'Jeff', 'Joel' ]
but I think this would fetch the whole collection, so it would be inefficient.
However, if there is an index on name, is there a way to get all the unique values without getting all the documents?
Or is there another way to efficiently get the unique names?
As indicated in the comments, you can efficiently get the unique values of a field over all docs in a collection using distinct.
The documentation specifically mentions that indexes are used when possible, and that they can cover the distinct query. This means that only the supporting index needs to be loaded into memory to get the results.
When possible, db.collection.distinct() operations can use indexes. Indexes can also cover db.collection.distinct() operations; see Covered Query for more information on queries covered by indexes.
In Ruby, you would perform your distinct query as:
User.distinct(:name)
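For the query to be covered, an index on name just needs to exist; a minimal Mongoid sketch (the model definition is an assumption):

class User
  include Mongoid::Document
  field :name, type: String
  index({ name: 1 })           # supports and covers distinct(:name)
end

User.create_indexes            # build the declared indexes (run once)
User.distinct(:name)           # => ["Jeff", "Joel"], answered from the index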

Lucene equivalent of SQL Server's ORDER BY [duplicate]

I have a Lucene index with a field that needs to be sorted on.
I have my query and I can make my Sort object.
If I understand the javadoc correctly, I should be able to call query.SetSort(), but there seems to be no such method...
Surely I'm missing something vital.
Any suggestions?
There are actually two important points. First, the field must be indexed. Second, pass the Sort object into the overloaded search method.
Last time I looked, the docs didn't do a very good job of pointing out the indexing part, and certainly didn't explain why this is so. It took some digging to find out why.
When a field is sortable, the searcher creates an array with one element for each document in the index. It uses information from the term index to populate this array so that it can perform sorting very quickly. If you have a lot of documents, it can use a lot of memory, so don't make a field sortable unless there is a need.
One more caveat: a sortable field must hold at most one value per document. If a document has multiple values in the field, Lucene doesn't know which one to use as the sort key.
The method you actually want is the overloaded Searcher.search(Query query, Filter filter, int n, Sort sort); setSort is a method of Sort, not of Query.
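A minimal sketch against the classic (3.x-era) Lucene API, where that overload exists; the index path and field name are assumptions:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/tmp/index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        // The sort field must be indexed (and not tokenized) so Lucene can
        // build its per-document value array from the term index.
        Sort sort = new Sort(new SortField("price", SortField.INT));

        // The Sort is passed to search(), not attached to the Query.
        Query query = new MatchAllDocsQuery();
        TopDocs hits = searcher.search(query, null, 10, sort);

        for (ScoreDoc sd : hits.scoreDocs) {
            System.out.println(searcher.doc(sd.doc).get("price"));
        }
        searcher.close();
        reader.close();
    }
}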

Having a string array field as an index, good or bad?

My gut feeling is that setting a string field (with array elements) as an index on a table will be bad for performance, given that the bulk of the operations on the table are inserts and updates (the table holds transactional data and its current size is approximately 20 million records).
The string extends a type with 4 array elements, not all of which are always populated. I need to justify why this field should not be one of the indexes. I've tried searching for answers, reading Kimberly Tripp's blog, and going through index best practices on MSDN (which only mentions that indexes work best on numeric fields first, then string fields), etc. But none of these cover indexing a table on a field of an array type. What reasons can I give to justify not indexing on the string-array field? And if my gut feeling is totally wrong and indexes work well on array fields, why so?
A Memo or Container field cannot be part of an index in AX.
Furthermore, columns consisting of the ntext, text, or image data types cannot be specified as columns for an index in SQL Server.
Let's say you have an extended data type ArrElement with 3 additional array elements: ArrElement2, ArrElement3, and ArrElement4. Creating an index with a field of the ArrElement type in AX will effectively create an index with 4 fields (ArrElement, ArrElement2, ArrElement3, and ArrElement4, in that order) in SQL Server. You cannot change the order of the array elements in the index, but in my opinion there's really nothing wrong with having such an index if it serves your purpose. Hope that answers your question.
As #10p noted, adding, say, Dimension as the only index field will create an index on all the array elements: Dimension, Dimension2_, Dimension3_ (which are the names of the SQL table fields).
The value of such an index depends on the queries performed. If only Dimension[3] is queried, the index is of no value, because Dimension[1] and Dimension[2] are not known.
This could be solved by creating an index for each of the array elements, for example:
Dim1Idx: Dimension[1] (maybe append more fields)
Dim2Idx: Dimension[2] (maybe append more fields)
Dim3Idx: Dimension[3] (maybe append more fields)
Individual array elements can be selected by using the combo-box on the index field.
The value of such indexes should be weighed against the added cost on insertion (and on update, if the array values change).
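Roughly what this looks like on the SQL Server side (the table and index names are illustrative):

-- A single AX index on Dimension becomes one composite index:
CREATE INDEX DimIdx ON LedgerTrans (Dimension, Dimension2_, Dimension3_);

-- A query on Dimension[3] alone cannot seek on DimIdx, because the
-- leading columns (Dimension, Dimension2_) are unconstrained:
SELECT * FROM LedgerTrans WHERE Dimension3_ = 'X';

-- A per-element index (Dim3Idx above) gives that query something to seek on:
CREATE INDEX Dim3Idx ON LedgerTrans (Dimension3_);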
