I am trying to process a .csv file that came to me via FTPS. It arrives in a binary format (I think), and I want to convert it to a format where I can edit the columns and rows.
I use Anypoint Studio version: 7.12.1, FTPS connector 1.7.1, Runtime version 4.4.0 EE
The XML of the Read operation:
<ftps:read doc:name="Read" doc:id="34e4139b-606e-4cd0-8650-4e3b2bab01d7" config-ref="FTPS_Config_OneHub" path="test_Dec.csv"/>
I'm converting it with a Transform Message component:
<ee:transform doc:name="Transform Message" doc:id="165835a6-da27-442a-82bd-a3a28552611f" >
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.0
output application/csv
---
read(payload, "application/string")]]></ee:set-payload>
</ee:message>
</ee:transform>
But I get the following error:
""You called the function 'AnonymousFunction' with these arguments:
1: Array ([{"PK\u0003\u0004\u0014\u0000\b\b\b\u0000": "q",column_1: "^",column_2: ""}, ...)
2: String ("application/string")
But it expects arguments of these types:
1: String | Binary
2: String
3: Object
4| read(payload, "application/string")
^^^^
Trace:
at anonymous::main (line: 4, column: 1)" evaluating expression: "%dw 2.0
output application/csv
---
read(payload, "application/string")"."
The payload with the file looks like this:
PK (\!T [Content_Types].xmlµSËnÂ0ü•È×*6ôPUEƒMCÇ©ô\{“Xø%¯¡ð÷]8”R‰
ƒM†õ9Ç!PEƒMEƒMõà$òEƒMÁEƒEƒMMÒ†äd¦cêD”j);·£ÑPÁgð¹ÎEƒMEƒM'OÐÊ•ÍÕEƒMãî¾H7LÆh’™R‰µ×G¢EƒMEƒMõ^'EƒMEƒM°
Thank you very much!
The input file is not a CSV. It is a zip file. You should first decompress it and then see if it has a CSV file inside. You can use the Compression Module in Mule to decompress it.
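A rough sketch of that step, assuming the archive holds a single compressed entry (the Compression Module dependency and namespace must be added to the project; the flow details are illustrative):

<compression:decompress doc:name="Decompress">
    <compression:decompressor>
        <compression:zip-decompressor/>
    </compression:decompressor>
</compression:decompress>

If the archive turns out to contain several entries, the module's Extract operation would be the one to use instead, and the resulting entry can then be read as CSV in a Transform Message.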
I'm trying to inspect a CSV file and there are no findings being returned (I'm using the EMAIL_ADDRESS info type and the addresses I'm using are coming up with positive hits here: https://cloud.google.com/dlp/demo/#!/). I'm sending the CSV file into inspect_content with a byte_item as follows:
byte_item: {
type: :CSV,
data: File.open('/xxxxx/dlptest.csv', 'r').read
}
In looking at the supported file types, it looks like CSV/TSV files are inspected via Structured Parsing.
For CSV/TSV, does that mean one can't just send in the file, and needs to use the table attribute instead of byte_item as per https://cloud.google.com/dlp/docs/inspecting-structured-text?
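For instance, would the request item have to look something like this (untested sketch; the header name and value are made up for illustration)?

item: {
  table: {
    headers: [{ name: "contact_email" }],
    rows: [
      { values: [{ string_value: "someone@example.com" }] }
    ]
  }
}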
What about XLSX files, for example? They're an unspecified file type, so I tried with a configuration like so, but it still returned no findings:
byte_item: {
type: :BYTES_TYPE_UNSPECIFIED,
data: File.open('/xxxxx/dlptest.xlsx', 'rb').read
}
I'm able to do inspection and redaction with images and text fine, but having a bit of a problem with other file types. Any ideas/suggestions welcome! Thanks!
Edit: The contents of the CSV in question:
$ cat ~/Downloads/dlptest.csv
dylans#gmail.com,anotehu,steve#example.com
blah blah,anoteuh,
aonteuh,
$ file ~/Downloads/dlptest.csv
~/Downloads/dlptest.csv: ASCII text, with CRLF line terminators
The full request:
parent = "projects/xxxxxxxx/global"
inspect_config = {
info_types: [{name: "EMAIL_ADDRESS"}],
min_likelihood: :POSSIBLE,
limits: { max_findings_per_request: 0 },
include_quote: true
}
request = {
parent: parent,
inspect_config: inspect_config,
item: {
byte_item: {
type: :CSV,
data: File.open('/xxxxx/dlptest.csv', 'r').read
}
}
}
dlp = Google::Cloud::Dlp.dlp_service
response = dlp.inspect_content(request)
The CSV file I was testing with was something I created using Google Sheets and exported as a CSV; however, the file showed locally as "text/plain; charset=us-ascii". I downloaded a CSV off the internet and it had a mime type of "text/csv; charset=utf-8". This is the one that worked. So it looks like my issue was specifically due to the file having an incorrect mime type.
xlsx is not yet supported. Coming soon. (Maybe that part of the question should be split out from the CSV debugging issue.)
I am trying to deserialize a file saved as a protobuf through the CLI (seems like the easiest thing to do). I would prefer not to use protoc to compile, import it into a programming language and then read the result.
My use case: A TensorFlow lite tool has output some data in a protobuf format. I've found the protobuf message definition in the TensorFlow repo too. I just want to read the output quickly. Specifically, I am getting back a tflite::evaluation::EvaluationStageMetrics message from the inference_diff tool.
I assume that the tool outputs a protobuf message in binary format.
protoc can decode the message and output in text format. See this option:
--decode=MESSAGE_TYPE Read a binary message of the given type from
standard input and write it in text format
to standard output. The message type must
be defined in PROTO_FILES or their imports.
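For instance, assuming the message definition lives in evaluation_config.proto inside the TensorFlow repo and the tool's binary output was saved to a file (output.bin is just a placeholder name), the invocation looks roughly like this, with --proto_path pointing at the repo root so imports resolve:

protoc --proto_path=/path/to/tensorflow --decode=tflite.evaluation.EvaluationStageMetrics path/to/evaluation_config.proto < output.bin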
While Timo Stamm's answer was instrumental, I still struggled with the paths to get protoc to work in a complex repo (e.g. TensorFlow).
In the end, this worked for me:
cat inference_diff.txt | \
protoc --proto_path="/Users/ben/butter/repos/tensorflow/" \
--decode tflite.evaluation.EvaluationStageMetrics \
$(pwd)/evaluation_config.proto
Here I pipe in the binary contents of the file containing the protobuf (inference_diff.txt in my case, generated by following this guide) and specify the fully qualified protobuf message (which I got by combining the package, tflite.evaluation, with the message name, EvaluationStageMetrics), the absolute path of the project for the proto_path (the project root / TensorFlow repo), and the absolute path of the file that actually contains the message. proto_path is just used for resolving imports, whereas the PROTO_FILE (in this case, evaluation_config.proto) is used to decode the file.
Example Output
num_runs: 50
process_metrics {
inference_profiler_metrics {
reference_latency {
last_us: 455818
max_us: 577312
min_us: 453121
sum_us: 72573828
avg_us: 483825.52
std_deviation_us: 37940
}
test_latency {
last_us: 59503
max_us: 66746
min_us: 57828
sum_us: 8992747
avg_us: 59951.646666666667
std_deviation_us: 1284
}
output_errors {
max_value: 122.371696
min_value: 83.0335922
avg_value: 100.17548828125
std_deviation: 8.16124535
}
}
}
If you just want to get the numbers in a rush and can't be bothered to fix the paths, you can do
cat inference_diff.txt | protoc --decode_raw
Example output
1: 50
2 {
5 {
1 {
1: 455818
2: 577312
3: 453121
4: 72573828
5: 0x411d87c6147ae148
6: 37940
}
2 {
1: 59503
2: 66746
3: 57828
4: 8992747
5: 0x40ed45f4b17e4b18
6: 1284
}
3 {
1: 0x42f4be4f
2: 0x42a61133
3: 0x40590b3b33333333
4: 0x41029476
}
}
}
I am trying to pass a simple configuration for running the jmeter gradle plugin (https://github.com/jmeter-gradle-plugin/jmeter-gradle-plugin/wiki/Getting-Started). My build.gradle.kts file looks like below:
jmeter {
    jmTestFiles = [file("src/test/jmeter/test2.jmx")] // if jmx file is not in the default location
    jmSystemPropertiesFiles= [file("src/test/jmeter/jmeter.properties")] // to add additional system properties
    enableExtendedReports = true // produce Graphical and CSV reports
}
I am encountering the following error:
Script compilation errors:
Line 7: jmTestFiles = [file("src/test/jmeter/test2.jmx")]
^ Type mismatch: inferred type is Array but (Mutable)List! was expected
Line 7: jmTestFiles = [file("src/test/jmeter/test2.jmx")]
^ Unsupported [Collection literals outside of annotations]
Line 8: jmSystemPropertiesFiles= [file("src/test/jmeter/jmeter.properties")]
^ Type mismatch: inferred type is Array but (Mutable)List! was expected
Line 8: jmSystemPropertiesFiles= [file("src/test/jmeter/jmeter.properties")]
^ Unsupported [Collection literals outside of annotations]
4 errors
I figured out that the syntax had to be adapted to the Kotlin format below:
jmSystemPropertiesFiles = mutableListOf(file("src/test/jmeter/jmeter.properties"))
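For completeness, the whole block converted to Kotlin syntax would look roughly like this (untested sketch; both list properties expect a MutableList, per the compiler errors above):

jmeter {
    jmTestFiles = mutableListOf(file("src/test/jmeter/test2.jmx")) // jmx file is not in the default location
    jmSystemPropertiesFiles = mutableListOf(file("src/test/jmeter/jmeter.properties")) // additional system properties
    enableExtendedReports = true // produce Graphical and CSV reports
}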
I am using Telegraf to get log information from Apache NiFi; for this task I am using this config:
[[inputs.tail]]
## files to tail.
files = ["/var/log/nifi/nifi-app.log"]
## Read file from beginning.
from_beginning = true
#name_override = "nifi_app"
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "grok"
grok_patterns = [ "%{DATE:date} %{TIME:time} %{WORD:EventType} \[%{GREEDYDATA:NifiTask} %{NOTSPACE:Thread}\] %{NOTSPACE:NifiEventType} %{GREEDYDATA:EventText} %{NUMBER:EventDuration} %{WORD:EventDurationUnits}" ]
When I try to start Telegraf it gives me this error:
Error parsing /etc/telegraf/telegraf.conf, toml: line 10: parse error
The pattern I wrote was tested in a Grok debugger with this text:
2018-08-02 10:53:16,976 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 1 heartbeats in 11863 nanos
These are the results of some testing:
grok_patterns = ["\[%{GREEDYDATA:NifiTask}\]"] ==> toml: line 10: parse error
grok_patterns = ["[%{GREEDYDATA:NifiTask}]"] ==> Invalid data format: grok
grok_patterns = ['\[%{GREEDYDATA:NifiTask}\]'] ==> Invalid data format: grok
grok_patterns = ["\\[%{GREEDYDATA:NifiTask}\\]"] ==> Invalid data format: grok
grok_patterns = ['[%{GREEDYDATA:NifiTask}]'] ==> Invalid data format: grok
The first option seems like the right one to me, but it doesn't work, and the problem appears to be the way the bracket is being escaped.
How can this issue be solved?
To fix the bracket-escaping issue, a "partial" solution is to change the double quotes to single quotes; this way, in my case (Telegraf version 1.13.4), the bracket is correctly escaped by \.
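For example, with single-quoted (literal) TOML strings the backslashes reach the grok parser unchanged; this is just the pattern from the question rewritten that way:

grok_patterns = ['%{DATE:date} %{TIME:time} %{WORD:EventType} \[%{GREEDYDATA:NifiTask} %{NOTSPACE:Thread}\] %{NOTSPACE:NifiEventType} %{GREEDYDATA:EventText} %{NUMBER:EventDuration} %{WORD:EventDurationUnits}']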
There was more than one problem:
First problem: the grok data format was added to Telegraf in the 1.8 release (ref), so I had to use a nightly build until that version was released.
Second problem: how to escape the brackets. There were problems doing it the regular way, so what I finally did was put this part in a custom pattern file, and this way it works perfectly.
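A sketch of that approach (the pattern file location and pattern name are made up; grok_custom_pattern_files is the Telegraf option for loading extra grok patterns):

In /etc/telegraf/nifi_patterns (a custom pattern file):
NIFI_APP_LOG %{DATE:date} %{TIME:time} %{WORD:EventType} \[%{GREEDYDATA:NifiTask} %{NOTSPACE:Thread}\] %{NOTSPACE:NifiEventType} %{GREEDYDATA:EventText} %{NUMBER:EventDuration} %{WORD:EventDurationUnits}

In telegraf.conf:
data_format = "grok"
grok_custom_pattern_files = ["/etc/telegraf/nifi_patterns"]
grok_patterns = ["%{NIFI_APP_LOG}"]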
I am writing a script that will perform various tasks with DSV or positional files. These tasks vary, e.g. creating a DB table for the file or creating a shell script for parsing it.
As I have envisioned it, my script would receive a "descriptor" as input. It would then parse this descriptor and perform its tasks accordingly.
I came up with some ideas on how to specify the descriptor file, but didn't really manage to get something robust - probably due to my inexperience in Ruby.
It seems, though, that the best way to parse the descriptor would be to use the Ruby language itself and then somehow catch parsing exceptions and turn them into something more relevant to the context.
Example:
The file I will be reading looks like (myfile.dsv):
jhon,12343535,27/04/1984
dave,53245265,30/03/1977
...
Descriptor file myfile.des contains:
FILE_TYPE = "DSV"
DSV_SEPARATOR = ","
FIELDS = [
name => [:pos => 0, :type => "string"],
phone => [:pos => 1, :type => "number"],
birthdate => [:pos => 2, :type => "date", :mask => "dd/mm/yyyy"]
]
And the usage should be:
ruby script.rb myfile.des --task GenerateTable
So the program script.rb should load and parse the descriptor myfile.des and perform whatever tasks accordingly.
Any ideas on how to perform this?
Use YAML
Instead of rolling your own, use YAML from the standard library.
Sample YAML File
Name your file something like descriptor.yml, and fill it with:
---
:file_type: DSV
:dsv_separator: ","
:fields:
:name:
:pos: 0
:type: string
:phone:
:pos: 1
:type: number
:birthdate:
:pos: 2
:type: date
:mask: dd/mm/yyyy
Loading YAML
You can read your configuration back in with:
require 'yaml'
settings = YAML.load_file 'descriptor.yml'
This will return a settings Hash like:
{:file_type=>"DSV",
:dsv_separator=>",",
:fields=>
{:name=>{:pos=>0, :type=>"string"},
:phone=>{:pos=>1, :type=>"number"},
:birthdate=>{:pos=>2, :type=>"date", :mask=>"dd/mm/yyyy"}}}
which you can then access as needed to configure your application.
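From there, a minimal sketch of driving the parsing from those settings (hypothetical usage; it assumes the myfile.dsv sample from the question and Ruby's standard csv library):

require 'csv'
require 'yaml'

settings = YAML.load_file 'descriptor.yml'

# Build one record per line, mapping each configured field to its column position.
CSV.foreach('myfile.dsv', col_sep: settings[:dsv_separator]) do |row|
  record = settings[:fields].transform_values { |spec| row[spec[:pos]] }
  p record  # e.g. {:name=>"jhon", :phone=>"12343535", :birthdate=>"27/04/1984"}
end

Other tasks (generating a DDL statement or a parsing script) can branch on settings[:file_type] and iterate over the same :fields hash.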