I am trying to convert a YAML file into a TOML file in Python 3.
My plan is to use toml.dumps(), which expects a dictionary, and then write the result to an output file.
I need to match the input TOML format required by a tool that I need to plug into.
This tool expects inline tables in certain places, like this:
[states.Started]
outputs = { on = true, running = false }
[[states.Started.transitions]]
inputs = { go = true }
target = "Running"
I understand how to generate the tables ([]) and arrays of tables ([[]]), but I am having a hard time figuring out how to create the inline tables.
The TOML documentation says that inline tables are the same as regular tables. So, for example, with the array of tables above (states.Started.transitions), I figured inputs would just be a table within the overall list; however, the encoder breaks it out into a separate table in the output.
Can anyone help me figure out how to configure my dictionary to output the inline table?
EDIT***
I am not sure I fully understand what I am doing wrong. Here is my code:
table = {'a':5,'b':3}
inline_table = toml.TomlDecoder().get_empty_inline_table()
inline_table['Values'] = table
encoder = toml.TomlPreserveInlineDictEncoder()
toml_config = toml.dumps(inline_table,encoder = encoder)
However, this does not create an inline table, but a regular table in the output:
[Values]
a = 5
b = 3
This tool expects inline tables in certain places, like this
Then it is not a TOML tool. As you discovered yourself, inline tables are semantically equivalent to normal tables, so if the tool conforms to the TOML specification, it must not care whether the tables are inline or not. If it does, this is not a TOML question.
That being said, you can create the dicts that should become inline tables via
TomlDecoder().get_empty_inline_table()
The returned object is a dict subclass and can be used as such.
When you have finished creating your structure, use TomlPreserveInlineDictEncoder for dumping, and don't ask why this is suddenly called InlineDict instead of InlineTable.
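For example, here is a minimal sketch of that approach (the key point, and the bug in the code above, is that the inner dict is what must be the inline-table object, not the outer container):

import toml

# The inner dict is the inline table; the outer container is a plain dict.
inline = toml.TomlDecoder().get_empty_inline_table()
inline['a'] = 5
inline['b'] = 3

config = {'Values': inline}

print(toml.dumps(config, encoder=toml.TomlPreserveInlineDictEncoder()))
# Expected output (roughly): Values = { a = 5, b = 3 }

The same pattern should apply to the outputs and inputs dicts in your states example: build those with get_empty_inline_table() and nest them inside ordinary dicts and lists.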
I used the parquet module of pyarrow to read the metadata of a Parquet file with this code:
from pyarrow import parquet
p_file = parquet.ParquetFile("v-c000.gz.parquet")
for rg_idx in range(p_file.metadata.num_row_groups):
    rg = p_file.metadata.row_group(rg_idx)
    for col_idx in range(rg.num_columns):
        col = rg.column(col_idx)
        print(col)
and got has_dictionary_page: False in the output (for all row groups),
but according to my checks, all the column chunks in all row groups are PLAIN_DICTIONARY encoded. Furthermore, I checked the statistics of the dictionary and saw all of its keys and values. Attaching part of it:
How is that possible that there is no dictionary page?
My best guess is that you are running into PARQUET-1547 which is described a bit more in this question.
In summary, some Parquet writers did not write the dictionary_page_offset field correctly. Several Parquet readers have workarounds in place to recognize the invalid value. However, parquet-cpp (which is used by pyarrow) does not have such a workaround.
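If you want to see exactly what the metadata reports, here is a small sketch using attributes of pyarrow.parquet.ColumnChunkMetaData (the file name is the one from the question):

from pyarrow import parquet

p_file = parquet.ParquetFile("v-c000.gz.parquet")
for rg_idx in range(p_file.metadata.num_row_groups):
    rg = p_file.metadata.row_group(rg_idx)
    for col_idx in range(rg.num_columns):
        col = rg.column(col_idx)
        # Even when encodings include PLAIN_DICTIONARY, has_dictionary_page can be
        # False if the writer recorded dictionary_page_offset incorrectly (PARQUET-1547).
        print(col.path_in_schema, col.encodings,
              col.has_dictionary_page, col.dictionary_page_offset)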
I am using a piece of software, pc/mrp, which appears to have a built-in Visual FoxPro report editor for FRX files. It also makes external use of an EF file. Based on some Googling, the report designer seems to be the standard one, not custom; the EF file usage may be a custom thing. Now, I need to find a way to get access to a value from a SQL statement inside the report. The statement needs to run per line in the report.
EF:
This file has sections:
~in~
~out~
In these sections, I can run code, but if there is a ~perline~ type section, I don't know how to access it. I can use the ~in~ to try to create a relationship between the databases, as shown in the following example:
~IN~
THISAREA = SELECT()
USE PARTMAST ORDER BYPARTNO IN 0
SELECT (THISAREA)
SET RELATION TO PARTNO INTO PARTMAST ADDITIVE
GO TOP
~OUT~
USE IN SELECT("SALES")
But for this, I don't know how to join the tables. I have two tables (A, B) that I need to connect based on two fields (pono, line): if A.pono = B.pono and A.line = B.line, then they would be linked. Is this possible?
Report Designer:
The other way I see this working is to do the query inside the report designer. Inside the report properties is a Variables tab. I can use this to assign values to variables using expressions. I need:
SELECT field from B where B.pono = pono and B.line = line; INTO ARRAY varArray;
But it gives me an error, likely because this is trying to create a new variable as opposed to actually assigning to the variable in the report. I tried editing a field inside the designer to use the preceding code as well, but that also failed.
Is there a way using the report designer or the ef file to grab the data I need per line?
The sample code you show is doing something like a join with the SET RELATION command. To use SET RELATION, there has to be an index on the relevant field (expression) in the child table. So, if your table B has an index on PONO + LINE (or, if those are numeric, STR(PONO, length) + STR(LINE, length)), you can SET RELATION TO PONO + LINE INTO B, again, using the more complicated expression if necessary.
I am trying to combine many tables whose names match a pattern.
So far, I have extracted the table names from #shared and have them in a list.
What I haven't been able to do is loop over this list and transform it into a list of tables that can be combined.
e.g. Name is the list with the table names:
Source = Table.Combine( { List.Transform(Name, each #shared[_] )} )
The error is:
Expression.Error: We cannot convert a value of type List to type Text.
Details:
Value=[List]
Type=[Type]
I have tried many ways but I am missing some kind of type transformation.
I was able to transform this list of table names into a list of tables with:
T1 = List.Transform(Name, each Expression.Evaluate(_, #shared))
However, the Expression.Evaluate feels like an ugly hack. Is there a better way for this transformation?
With this list of tables, I tried to combine them with:
Source = Table.Combine(T1)
But I got the error:
Expression.Error: A cyclic reference was encountered during evaluation.
If I extract a table from the list by its index (e.g. T1{2}), it works. So along this line of thinking, I would need some kind of loop to append them.
Steps illustrating the problem.
The objective is to append (Table.Combine) every table named T_\d+_Mov:
After filtering the matching table names in a table:
Converted to a List:
Converted the names in the list to the real tables:
Now I just need to combine them, and this is where I am stuck.
It is important to note that I don't want to use VBA for this.
It would be easier to recreate the query from VBA by scanning ThisWorkbook.Queries(), but it would not be a clean reload when adding or removing tables.
The final solution, as suggested by Michal Palko, was:
CT1 = Table.FromList(T1, Splitter.SplitByNothing(), {"Name"}, null, ExtraValues.Ignore),
EC1 = Table.ExpandTableColumn(CT1, "Name", Table.ColumnNames(CT1{0}[Name]) )
where T1 was the previous step.
The only caveat is that the first table must have all the columns, or the missing ones will be skipped.
I think there might be an easier way, but given your approach, try to convert your list to a table (column) and then expand that column:
Alternatively, use Table.Combine(YourList).
I have two different pipe-delimited data files. One is larger than the other. I'm trying to selectively remove data from the large file (we'll call it file A), based on the data contained in the small file (file B). File A contains all of the data, and file B contains only a portion of the data from file A.
I want a function or existing program that removes all of the data contained within file B from file A. I had in mind a function like this:
Pseudo-code:
while !eof(fileB) {
criteria = readLine(fileB);
lineToRemove = searchForLine(criteria, fileA);
deleteLine(lineToRemove, fileA);
}
However, that solution seems very inefficient to me. File A has 23,000 lines in it, and file B has 17,000. And the data contained within file B is literally scattered throughout file A.
If there is a program that can do this, I'd prefer it over code. I'm not picky about the code either. C++ is my strong language, but this data file is going to get converted into a SQL database in the near future so I'm good with SQL/PHP code as well.
Load the two tables into SQL, whatever the database. Doing this sort of manipulation is what databases are designed for. Then you can execute the command:
delete from A
where A.criteria in (select B.criteria from B)
However, I would put the data into Staging tables, and then create and populate the data that I want in SQL. Something like:
create table A ( . . . )
insert into A
select *
from StagingA
where StagingA.criteria not in (select StagingB.criteria from StagingB)
(Here I've used "*" and an insert without a column list. In practice, you should have the list of columns.)
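If the files aren't loaded anywhere yet, here is a minimal sketch of the staging approach using Python's sqlite3 and csv modules. The file names, the two-column layout, and the choice of the first field as the match criteria are all assumptions to adapt to your real data:

import csv
import sqlite3

conn = sqlite3.connect("staging.db")
conn.execute("CREATE TABLE StagingA (criteria TEXT, rest TEXT)")
conn.execute("CREATE TABLE StagingB (criteria TEXT, rest TEXT)")

def load(path, table):
    # Pipe-delimited input; the first field is treated as the match criteria
    # and the rest of the line is kept as-is.
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="|"):
            conn.execute("INSERT INTO %s VALUES (?, ?)" % table,
                         (row[0], "|".join(row[1:])))

load("fileA.txt", "StagingA")
load("fileB.txt", "StagingB")

# Keep only the rows of A whose criteria do not appear in B.
conn.execute("""
    CREATE TABLE A AS
    SELECT * FROM StagingA
    WHERE criteria NOT IN (SELECT criteria FROM StagingB)
""")
conn.commit()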
I am trying to extract some parts of a string and store them in HBase columns.
File contents:
msgType1 Person xyz has opened Internet:www.google.com from IP:192.123.123.123 for duration 00:15:00
msgType2 Person xyz denied for opening Internet:202.x.x.x from IP:192.123.123.123 reason:unautheticated
msgType1 Person xyz has opened Internet:202.x.x.x from IP:192.123.123.123 for duration 00:15:00
The pattern of the messages corresponding to each msgType is fixed. Now I am trying to store the person name, destination, source, duration, etc. in HBase.
I am trying to write a script in Pig to do this task.
But I am stuck at the extraction part (extracting the IP or website name from the 'Internet:202.x.x.x' token inside the string).
I tried regular expressions, but they are not working for me. The regex always throws this error:
ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.REGEX_EXTRACT as multiple or none of them fit. Please use an explicit cast.
Is there any other way to extract these values and store them in HBase, either in Pig or outside of it?
How do you use the REGEX_EXTRACT function? Have you seen the REGEX_EXTRACT_ALL function? According to the documentation (http://pig.apache.org/docs/r0.9.2/func.html#regex-extract-all), it should be like this:
test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS (key:chararray, value:chararray);
test = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL (value, '(\\S+):(\\S+)')) as (match1:chararray, match2:chararray);
DUMP test;
My file is like this:
1,a:b
2,c:d
3,
I know it's easy to be lazy and not take the step, but you really should use a user-defined function here. Pig is good as a data flow language and not much else, so in order to get the full power out of it, you are going to need to use a lot of UDFs to go through text and do more complicated operations.
The UDF will take a single string as a parameter, then return a tuple that represents (person, destination, source, duration). To use it, you'll do:
A = LOAD ...
...
B = FOREACH A GENERATE MyParseUDF(logline);
...
STORE B INTO ...
You didn't mention what your HBase row key was, but be sure that's the first element in the relation before storing it.
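As a concrete illustration, here is a minimal Jython UDF sketch. The function name matches the Pig snippet above, but the file name, the regex, and the output field names are assumptions based on the sample messages, not anything from your actual tool:

# myparseudf.py -- register in the Pig script with:
#   REGISTER 'myparseudf.py' USING jython AS myudfs;
# and call it as myudfs.MyParseUDF(logline).
# The outputSchema decorator is provided by Pig's Jython engine at registration time.
import re

PATTERN = re.compile(
    r'Person (\S+) .*Internet:(\S+) from IP:(\S+)(?: for duration (\S+))?')

@outputSchema("t:(person:chararray,destination:chararray,source:chararray,duration:chararray)")
def MyParseUDF(logline):
    if logline is None:
        return None
    m = PATTERN.search(logline)
    if m is None:
        return None
    # duration is null for messages (like msgType2) that don't carry one
    return (m.group(1), m.group(2), m.group(3), m.group(4))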