Is Log a compressed table engine in ClickHouse?

I have a Log table and also a MergeTree table. The system.columns table has a column, data_compressed_bytes, showing the compressed bytes for each column of each table. I can see that the MergeTree table shows values under that column, but for the Log table the column shows all zeros.
Log
┌─database─┬─table─┬─name───────┬─type─────┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─marks_bytes─┐
│ default  │ logs  │ log_time   │ DateTime │                     0 │                       0 │           0 │
│ default  │ logs  │ start_time │ DateTime │                     0 │                       0 │           0 │
MergeTree
┌─database─┬─table──┬─name─────┬─type─────┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─marks_bytes─┐
│ default  │ logs_m │ log_date │ Date     │               1221802 │                20000000 │       19536 │
│ default  │ logs_m │ log_time │ DateTime │              25181624 │                40000000 │       19536 │
So I am wondering whether columns in a Log table are actually compressed or not.
The ClickHouse documentation states that TinyLog is compressed, but I am not sure about Log, and I don't see it reflected in the system.columns table.

The Log engine compresses column data, just like TinyLog does.
Quotes from the doc:
TinyLog: The simplest table engine, which stores data on a disk. Each
column is stored in a separate compressed file.
Log differs from TinyLog in that a small file of “marks” resides with
the column files.
The compressed and uncompressed sizes of a column are not reflected in the system.columns table because Log is quite a simple engine (unlike MergeTree) and doesn't store much metainformation about its own column files (it only maintains a sizes.json file with the compressed column sizes).
So it would be possible to populate system.columns.data_compressed_bytes for Log's columns, but system.columns.data_uncompressed_bytes would still be zero, which might look questionable.
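If you want to verify the compression yourself, you can look at the table's directory on disk. A minimal sketch, assuming the default ClickHouse data directory and the logs table from the question (adjust the paths for your installation):

# Each column is stored in its own compressed file; sizes.json holds the
# compressed sizes the Log engine tracks.
ls -l /var/lib/clickhouse/data/default/logs/
cat /var/lib/clickhouse/data/default/logs/sizes.json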

Related

Export Variable in Linux Terminal with Multiple Strings Terraform

I have a variable in Terraform called "example". The variable is of type list(string), and I need to export it in my terminal when running terraform plan (I cannot put the value in the code, so I have to export it):
export VAR_example=""xyz", "abc", "123""
but when I run this, I get an error:
│ Error: Invalid number literal
│
│ on <value for var.example> line 1:
│ (source code not available)
│
│ Failed to recognize the value of this number literal.
╵
╷
│ Error: Extra characters after expression
│
│ on <value for var.example> line 1:
│ (source code not available)
│
│ An expression was successfully parsed, but extra characters were found after it.
Then, I tried exporting it like this:
export VAR_example='"xyz", "abc", "123"'
Then, I got the error:
╷
│ Error: Extra characters after expression
│
│ on <value for var.example> line 1:
│ (source code not available)
│
│ An expression was successfully parsed, but extra characters were found after it.
Now, I am not sure how to export it at all. Any solution would be helpful.
Since it is a list of strings, you could provide it this way as well [1]:
terraform apply -var='examples=["xyz", "abc", "123"]'
However, since an environment variable needs to be used, you could set it like this:
export TF_VAR_examples='["xyz", "abc", "123"]'
Note that you need to prefix the environment variable name with TF_VAR_ in order for Terraform to pick it up [2].
[1] https://www.terraform.io/language/values/variables#variables-on-the-command-line
[2] https://www.terraform.io/cli/config/environment-variables#tf_var_name
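For completeness, this assumes the variable is declared with a list type on the Terraform side, roughly like this (a sketch; the actual declaration lives in the questioner's configuration):

variable "examples" {
  type = list(string)
}

With that declaration and TF_VAR_examples exported as above, terraform plan parses the environment variable's value as an HCL list of strings.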

Golang cannot write file into directory

Situation:
I'm trying to write a file into a directory, as shown below:
func (p *Page) Save() error {
    filepath := DerivePath(p.Title)
    fmt.Println(filepath)
    content, _ := json.MarshalIndent(p, "", " ")
    err := ioutil.WriteFile(filepath, content, 0600)
    return err
}
Problem:
The following error occurs in line 5:
open data/Testpage.json: The system cannot find the path specified.
I already tried to create the file before writing with os.Create, but it doesn't work either.
Loading from the data directory works just fine. Only writing new files into the directory fails.
Additional information:
My project structure is as follows:
│   .gitignore
│   .project
│
├───bin
│       main.exe
│
├───data
│       Welcome.json
│
├───pkg
│   └───windows_amd64
│           page.a
│
├───src
│   ├───main
│   │       main.go
│   │
│   └───page
│           page.go
│           page_test.go
│
└───templates
        view.html
As mentioned above, reading data/Welcome.json works just fine (using ioutil.ReadFile).
The source is available on https://gitlab.com/thyaris/Wiki.
Executing go test -v page from D:\GitWorkspaces\Wiki\wiki writes the following output:
=== RUN TestSave
data/Testpage.json
--- FAIL: TestSave (0.00s)
page_test.go:15: open data/Testpage.json: The system cannot find the path specified.
page_test.go:19: 'Testpage.json' was not created
=== RUN TestLoadPage
--- FAIL: TestLoadPage (0.00s)
page_test.go:26: Error while loading
page_test.go:32: File content did not match
=== RUN TestDelete
--- PASS: TestDelete (0.00s)
FAIL
exit status 1
FAIL page 0.094s
Your problem here is that the test engine is not running your executable with the working directory you expect. Instead of using the working directory defined by your shell or IDE, it is setting it to the source directory of the code being tested. (I had this bite me too once, long ago :) I had almost forgotten about that...)
The simple solution is to change DerivePath so that you can set the prefix externally, then set it to the path you need at the beginning of your tests. There are other (possibly better?) solutions, of course.
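A minimal sketch of that idea, assuming DerivePath currently hard-codes the data directory (the DataDir variable below is made up for illustration; page.go needs "path/filepath" and page_test.go needs "testing" in their imports):

// page.go
// DataDir is the directory Save and Load use; tests can override it.
var DataDir = "data"

// DerivePath builds the file path for a page title under DataDir.
func DerivePath(title string) string {
    return filepath.Join(DataDir, title+".json")
}

// page_test.go
func TestSave(t *testing.T) {
    // Point the package at a directory that is guaranteed to exist,
    // independent of the test runner's working directory.
    DataDir = t.TempDir()
    p := &Page{Title: "Testpage"}
    if err := p.Save(); err != nil {
        t.Fatalf("Save failed: %v", err)
    }
}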

Is there a guide to using USBccgp.sys?

After days of combing through the Windows USBccgp documentation, I can't figure out for the life of me how to write the appropriate .INF file for my device.
This diagram shows what I'm trying to configure. This is a simplified version of what's at https://msdn.microsoft.com/en-us/library/windows/hardware/ff540207.aspx
┌─────────────┐    ┌─────────────┐
│ Application │    │ Application │
├─────────────┤    ├─────────────┤
│ WinUSB.dll  │    │ WinUSB.dll  │
└──────┬──────┘    └──────┬──────┘
User   │                  │
───────┼──────────────────┼─────────
Kernel │                  │
┌──────┴──────┐    ┌──────┴──────┐
│ WinUSB.sys  │    │ WinUSB.sys  │
└──────┬──────┘    └──────┬──────┘
┌──────┴──────┐    ┌──────┴──────┐
│ interface 0 │    │ interface 1 │
└──────┬──────┘    └──────┬──────┘
       └─────────┬────────┘
          ┌──────┴──────┐
          │ USBccgp.sys │
          └──────┬──────┘
           ┌─────┴─────┐
           │ My Device │
           └───────────┘
Basically, given my device's vendor and product IDs, the desired configuration (which unfortunately is not configuration 1), the interface numbers, and the endpoint numbers (which are all bulk transfer endpoints), it ought to be trivial to write the INF files for USBccgp.sys and WinUSB.sys to make all of this work without my having to write a byte of driver code.
Are there any recipes for doing this, and/or some sample INF files? Google and Microsoft's own documentation have failed me here.

How do I get the schema / column names from a Parquet file?

I have a file stored in HDFS as part-m-00000.gz.parquet
I've tried to run hdfs dfs -text dir/part-m-00000.gz.parquet but it's compressed, so I ran gunzip part-m-00000.gz.parquet but it doesn't uncompress the file since it doesn't recognise the .parquet extension.
How do I get the schema / column names for this file?
You won't be able to "open" the file using hdfs dfs -text because it's not a text file. Parquet files are written to disk very differently from text files.
For that reason, the Parquet project provides parquet-tools for tasks like the one you are trying to do: opening and viewing the schema, data, metadata, etc.
Check out the parquet-tools project.
Cloudera, which supports and contributes heavily to Parquet, also has a nice page with examples of parquet-tools usage. An example from that page for your use case is:
parquet-tools schema part-m-00000.parquet
Check out the Cloudera page: Using the Parquet File Format with Impala, Hive, Pig, HBase, and MapReduce.
Parquet CLI:
parquet-cli is a lightweight alternative to parquet-tools.
pip install parquet-cli             # install via pip
parq filename.parquet               # view metadata
parq filename.parquet --schema      # view the schema
parq filename.parquet --head 10     # view the top n rows
This tool will provide basic info about the parquet file.
UPDATE (Alternatives):
If you wish to do this using a GUI tool then checkout this answer - View Parquet data and metadata using DBeaver
DuckDB CLI
DuckDB has a CLI tool (prebuilt binaries for Linux, Windows, and macOS) that can be used to query Parquet data from the command line.
PS C:\Users\nsuser\dev\standalone_executable_binaries> ./duckdb
Connected to a transient in-memory database.
Read Parquet Schema.
D DESCRIBE SELECT * FROM READ_PARQUET('C:\Users\nsuser\dev\sample_files\userdata1.parquet');
OR
D SELECT * FROM PARQUET_SCHEMA('C:\Users\nsuser\dev\sample_files\userdata1.parquet');
┌───────────────────┬─────────────┬──────┬─────┬─────────┬───────┐
│ column_name       │ column_type │ null │ key │ default │ extra │
├───────────────────┼─────────────┼──────┼─────┼─────────┼───────┤
│ registration_dttm │ TIMESTAMP   │ YES  │     │         │       │
│ id                │ INTEGER     │ YES  │     │         │       │
│ first_name        │ VARCHAR     │ YES  │     │         │       │
│ salary            │ DOUBLE      │ YES  │     │         │       │
└───────────────────┴─────────────┴──────┴─────┴─────────┴───────┘
More on DuckDB is described here.
If your Parquet files are located in HDFS or S3, as mine are, you can try something like the following:
HDFS
parquet-tools schema hdfs://<YOUR_NAME_NODE_IP>:8020/<YOUR_FILE_PATH>/<YOUR_FILE>.parquet
S3
parquet-tools schema s3://<YOUR_BUCKET_PATH>/<YOUR_FILE>.parquet
Hope it helps.
If you use Docker you can also run parquet-tools in a container:
docker run -ti -v C:\file.parquet:/tmp/file.parquet nathanhowell/parquet-tools schema /tmp/file.parquet
Apache Arrow makes it easy to get the Parquet metadata with a lot of different languages including C, C++, Rust, Go, Java, JavaScript, etc.
Here's how to get the schema with PyArrow (the Python Apache Arrow API):
import pyarrow.parquet as pq
table = pq.read_table(path)
table.schema # pa.schema([pa.field("movie", "string", False), pa.field("release_year", "int64", True)])
See here for more details about how to read metadata information from Parquet files with PyArrow.
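If you only need the schema, PyArrow can also read just the footer metadata without loading the data; a small sketch (the file name is taken from the question):
import pyarrow.parquet as pq

# Reads only the Parquet footer, not the column data.
schema = pq.read_schema("part-m-00000.gz.parquet")
print(schema.names)  # column names
print(schema)        # names plus types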
You can also grab the schema of a Parquet file with Spark.
val df = spark.read.parquet("some_dir/")
df.schema // returns a StructType
StructType objects look like this:
StructType(
StructField(number,IntegerType,true),
StructField(word,StringType,true)
)
From the StructType object, you can infer the column names, data types, and nullable properties that are in the Parquet metadata. The Spark approach isn't as clean as the Arrow approach.
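If all you want are the column names or a readable tree, the DataFrame API has shortcuts for that; a quick sketch (the directory path is the same hypothetical one as above):
val df = spark.read.parquet("some_dir/")
df.columns        // Array[String] of column names
df.printSchema()  // prints the schema as an indented tree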
You can also use a desktop application to view Parquet data, as well as other binary formats like ORC and Avro. Bigdata File Viewer is a pure Java application, so it runs on Linux, Mac, and Windows; see its page for details.
It supports complex data types like array, map, etc.
If you are using R, the following wrapper around functions in the arrow library will work for you:
read_parquet_schema <- function(file, col_select = NULL, as_data_frame = TRUE,
                                props = ParquetArrowReaderProperties$create(), ...) {
  require(arrow)
  reader <- ParquetFileReader$create(file, props = props, ...)
  schema <- reader$GetSchema()
  names <- names(schema)
  return(names)
}
Example:
arrow::write_parquet(iris,"iris.parquet")
read_parquet_schema("iris.parquet")
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
Since it is not a text file, you cannot do a "-text" on it.
You can read it easily through Hive, even if you do not have parquet-tools installed, if you can load that file into a Hive table.
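Once the data is behind a Hive table, the schema is one statement away; a sketch, assuming a table name of your choosing:

-- Assuming the Parquet file has been loaded into (or an external table
-- has been created over) a Hive table named my_parquet_table:
DESCRIBE my_parquet_table;
-- or, for storage and SerDe details as well:
DESCRIBE FORMATTED my_parquet_table;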

Test output using curses

I'm trying to write a test with RSpec for my simple Othello app that uses curses. I need to make sure that the terminal prints the expected board.
To make the output more interactive, I use curses. When I check the output, it gives me a combination of ANSI escape sequences (I guess):
[?1049h[1;23r(B[m[4l[?7h[?1h=[39;49m[?1h=[39;49m(B[m[H[2J[4;43H(B[0;1m[32m
┌───┬───┬───┬───┬───┬───┬───┬───┐[5;43H
│ │ │ │ │ │ │ │ │[6;43H
├───┼───┼───┼───┼───┼───┼───┼───┤[7;43H
│ │ │ │ │ │ │ │ │[8;43H
├───┼───┼───┼───┼───┼───┼───┼───┤[9;43H
│ │ │ │ │ │ │ │ │[10;43H
├───┼───┼───┼───┼───┼───┼───┼───┤[11;43H
│ │ │ │ │ │ │ │ │[12;43H
├───┼───┼───┼───┼───┼───┼───┼───┤[13;43H
│ │ │ │ │ │ │ │ │[14;43H
├───┼───┼───┼───┼───┼───┼───┼───┤[15;43H
│ │ │ │ │ │ │ │ │[16;43H
├───┼───┼───┼───┼───┼───┼───┼───┤[17;43H
│ │ │ │ │ │ │ │ │[18;43H
├───┼───┼───┼───┼───┼───┼───┼───┤[19;43H
│ │ │ │ │ │ │ │ │[20;43H
└───┴───┴───┴───┴───┴───┴───┴───┘[39m(B[m[20;43H[?[23;1H[?1049l[?1l>
How can I test whether the result looks like this?
You can redirect the output of a curses application to the standard output because that is the default for initscr (it helps if the application does not ask for input). If you capture the output into a file, you can do whatever analysis you need.
For the simple example given, curses is painting the screen once, using ANSI escape sequences for cursor movement (the parts ending with "H"), color (the ones with "m"), and a few others.
I said simple - if there were large blank areas on the screen, then curses would jump over those, and the residue without the escape sequences would look less like the actual screen. But since your lines are close together, it is less costly to just write the text with a few blanks as needed.
If you limit yourself to simple examples like that, then all you need to do is to strip the escape sequences out and compare the text. For a more general approach, you might consider making a screen-dump utility using winnstr (so that your application could dump the text from the screen as curses shows it).
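A minimal sketch of the first approach in RSpec, assuming the captured terminal output has been written to a file by your test setup; the regular expression only covers the escape sequences visible in the capture (CSI sequences, charset selection, keypad mode), not every possible sequence:

# Strip the escapes seen above: "\e[...letter" (cursor movement, color),
# "\e(B" (charset selection), and "\e=" / "\e>" (keypad mode).
def strip_escapes(raw)
  raw.gsub(/\e\[[0-9;?]*[A-Za-z]/, '').gsub(/\e\(.|\e[=>]/, '')
end

RSpec.describe 'board rendering' do
  it 'draws the 8x8 grid' do
    output = strip_escapes(File.read('captured_output.txt'))
    expect(output).to include('┌───┬───┬───┬───┬───┬───┬───┬───┐')
  end
end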
