Precision is lost for large numbers.
I am using the tail input plugin to read a file whose contents are in JSON format.
Below is the configuration:
[[inputs.tail]]
  files = ["E:/Telegraph/MSTCIVRRequestLog_*.json"]
  from_beginning = true
  name_override = "tcivrrequest"
  data_format = "json"
  json_strict = true

[[outputs.file]]
  files = ["E:/Telegraph/output.json"]
  data_format = "json"
Input file contains
Expected Output
Actual Output
The number 959011990586458245 is converted into 959011990586458200 (check the last few digits).
I have already tried the following settings, but none of them worked:
json_string_fields = ["RequestId"]
string = ["RequestId"]
precision = "1s"
json_int64_fields = ["RequestId"]
character_encoding = "utf-8"
json_strict = true

I was able to reproduce this with the json parser as well. My suggestion would be to move to the json_v2 parser with a config like the following:
[[inputs.file]]
  files = ["metrics.json"]
  data_format = "json_v2"
  [[inputs.file.json_v2]]
    [[inputs.file.json_v2.field]]
      path = "RequestId"
      type = "int"
I was able to get a result as follows:
file RequestId=959011990586458245i 1651181595000000000
The newer parser is generally more accurate and flexible for simple cases like the one you provided.
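A plausible explanation for the rounding, assuming the legacy json parser decodes every number as a 64-bit float (an assumption; the thread does not confirm it): a float64 has a 53-bit significand, so integers above 2^53 cannot all be represented exactly. The Python sketch below only illustrates that arithmetic. It is not Telegraf code, and the helper names are made up for the illustration.

```python
import json

REQUEST_ID = 959011990586458245

# float64 carries a 53-bit significand, so not every integer above 2**53 fits.
MAX_EXACT = 2 ** 53


def through_float(n: int) -> int:
    """Round-trip an integer through a 64-bit float, as a float-based decoder would."""
    return int(float(n))


def parse_as_int(doc: str) -> int:
    """Python's json module keeps integer literals as arbitrary-precision ints."""
    return json.loads(doc)["RequestId"]


def parse_as_float(doc: str) -> float:
    """Force integer literals through float to mimic a float-only decoder."""
    return json.loads(doc, parse_int=float)["RequestId"]


if __name__ == "__main__":
    doc = json.dumps({"RequestId": REQUEST_ID})
    print(REQUEST_ID > MAX_EXACT)                    # value is beyond the exact range
    print(through_float(REQUEST_ID))                 # nearest representable float64
    print(parse_as_int(doc) == REQUEST_ID)           # integer decoding keeps every digit
    print(int(parse_as_float(doc)) == REQUEST_ID)    # float decoding does not
```

In this range neighbouring float64 values are 128 apart, so the last digits of the ID cannot survive a float round trip. Declaring the field as an integer, as the json_v2 config does, avoids the float step altogether.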


Trying to write a custom rule

I'm trying to write a custom rule for gqlgen. The idea is to run it to generate Go code from a GraphQL schema.
My intended usage is:
gqlgen(
    name = "gql-gen-foo",
    schemas = ["schemas/schema.graphql"],
    visibility = ["//visibility:public"],
)
"name" is the name of the rule, on which I want other rules to depend; "schemas" is the set of input files.
So far I have:
load(
    ...,  # the .bzl label is elided in the original post
    _go_context = "go_context",
    _go_rule = "go_rule",
)

def _gqlgen_impl(ctx):
    go = _go_context(ctx)
    args = ["run --config"] + [ctx.attr.config]
    ctx.actions.run(
        inputs = ctx.attr.schemas,
        outputs = [ctx.actions.declare_file(...)],  # argument elided in the original post
        arguments = args,
        progress_message = "Generating GraphQL models and runtime from %s" % ctx.attr.config,
        executable = go.go,
    )

_gqlgen = _go_rule(
    implementation = _gqlgen_impl,
    attrs = {
        "config": attr.string(
            default = "gqlgen.yml",
            doc = "The gqlgen filename",
        ),
        "schemas": attr.label_list(
            allow_files = [".graphql"],
            doc = "The schema file location",
        ),
    },
    executable = True,
)

def gqlgen(**kwargs):
    tags = kwargs.get("tags", [])
    if "manual" not in tags:
        tags.append("manual")
        kwargs["tags"] = tags
    _gqlgen(**kwargs)
My immediate issue is that Bazel complains that the schemas are not Files:
expected type 'File' for 'inputs' element but got type 'Target' instead
What's the right approach to specify the input files?
Is this the right approach to generate a rule that executes a command?
Finally, is it okay to have the output file not exist in the filesystem, but rather be a label on which other rules can depend?
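Instead of:
inputs = ctx.attr.schemas,
use:
inputs = ctx.files.schemas,
ctx.attr.schemas is a list of Targets, while ctx.files.schemas is the flat list of Files those targets provide, which is what inputs expects.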
Is this the right approach to generate a rule that executes a command?
This looks right, as long as gqlgen creates the file with the correct output name (outputs = [ctx.actions.declare_file(...)]). For example:
generated_go_file = ctx.actions.declare_file(... + ".go")  # first operand elided in the original post
# ..
outputs = [generated_go_file],
args = ["run", "...", "--output", generated_go_file.path]  # .path is the exec-root-relative path the action sees
# ..
Finally, is it okay to have the output file not exist in the filesystem, but rather be a label on which other rules can depend?
The output file needs to be created, and as long as it's returned at the end of the rule implementation in a DefaultInfo provider, other rules will be able to depend on the file label (e.g. //my/package:foo-gqlgen.go).

Is it possible to read pdf/audio/video files(unstructured data) using Apache Spark?

For example, I have thousands of pdf invoices and I want to read data from those and perform some analytics on that. What steps must I do to process unstructured data?
Yes, it is. Use sparkContext.binaryFiles to load files in binary format and then use map to map value to some other format - for example, parse binary with Apache Tika or Apache POI.
val rawFile = sparkContext.binaryFiles(...)
val ready = rawFile.map(/* parse here with the other framework */)
Importantly, the parsing itself must be done with another framework, such as those mentioned above. In the Scala API, map receives a (path, PortableDataStream) pair for each file, and calling open() on the stream gives an InputStream.
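As a rough sketch of that pattern on the Python side: in PySpark, binaryFiles yields (path, bytes) pairs, so the function handed to map can be an ordinary Python function. The helper name summarize_file and the choice of computing a size and a SHA-256 digest are illustrative assumptions standing in for a real parser such as Tika. The Spark call itself appears only as a comment so the block runs without a cluster.

```python
import hashlib
from typing import Tuple


def summarize_file(path_and_bytes: Tuple[str, bytes]) -> Tuple[str, int, str]:
    """Hypothetical stand-in for a real parser: turn one (path, content) pair
    into a structured record. A real job would call a PDF/audio/video parser here."""
    path, content = path_and_bytes
    digest = hashlib.sha256(content).hexdigest()
    return (path, len(content), digest)


# With a live SparkContext `sc` (assumed, not created here), usage would look like:
#   records = sc.binaryFiles("hdfs:///invoices/*.pdf").map(summarize_file)
# The path above is a made-up example.

if __name__ == "__main__":
    # Local demonstration on in-memory data, no Spark required.
    sample = [("invoice-1.pdf", b"%PDF-1.4 fake"), ("invoice-2.pdf", b"")]
    for record in map(summarize_file, sample):
        print(record)
```

Keeping the per-file function free of Spark imports makes it easy to unit-test before running it across thousands of files.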
We had a scenario where we needed to use a custom decryption algorithm on the input files. We didn't want to rewrite that code in Scala or Python. Python-Spark code follows:
import socket
import subprocess

from pyspark import SparkContext, SparkConf, HiveContext, AccumulatorParam

def decryptUncompressAndParseFile(filePathAndContents):
    '''each line of the file becomes an RDD record'''
    global acc_errCount, acc_errLog
    proc = subprocess.Popen(['custom_decrypt_program', '--decrypt'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (unzippedData, err) = proc.communicate(input=filePathAndContents[1])
    if len(err) > 0:  # problem reading the file
        acc_errCount.add(1)
        acc_errLog.add('Error: ' + str(err) + ' in file: ' + filePathAndContents[0] +
                       ', on host: ' + socket.gethostname() + ' return code:' + str(proc.returncode))
        return []  # this is okay with flatMap
    records = list()
    # assumes the decrypted output is UTF-8 text
    for line in unzippedData.decode('utf-8').splitlines():
        #sys.stderr.write('Line: '+str(line)+'\n')
        values = [x.strip() for x in line.split('|')]
        records.append(tuple(values))  # extract data as appropriate from values into this tuple
    return records

class StringAccumulator(AccumulatorParam):
    '''custom accumulator to hold strings'''
    def zero(self, initValue=""):
        return initValue
    def addInPlace(self, str1, str2):
        return str1.strip() + '\n' + str2.strip()

def main():
    global acc_errCount, acc_errLog
    # sc, sqlContext, args and ourSchema() are created elsewhere (not shown in the original)
    acc_errCount = sc.accumulator(0)
    acc_errLog = sc.accumulator('', StringAccumulator())
    binaryFileTup = sc.binaryFiles(args.inputDir)
    # use flatMap instead of map, to handle corrupt files
    linesRdd = binaryFileTup.flatMap(decryptUncompressAndParseFile, True)
    df = sqlContext.createDataFrame(linesRdd, ourSchema())
The custom string accumulator was very useful in identifying corrupt input files.

lsyncd can't use dynamic backup suffix

I want to use lsyncd to create backups of modified files, using a date/time string as the suffix.
If I set the suffix option (in the lsyncd.conf file) using Lua, the date/time is computed once, when I start the daemon, and is not updated at each sync iteration.
As a result, only one backup file is created per modified file (always with the same suffix), whereas I want a new backup file for each modification of each file.
The config file I use is the following:
-- global settings
settings {
    delay = 5,
    maxProcesses = 5,
    statusFile = "<STATUS_FILE_PATH>",
    logfile = "<LOG_FILE_PATH>",
    insist = true
}

-- target nodes
nodes = {
    { source = "/home/<USER>/sync", target = "<TARGET_IP>:/home/<USER>/sync" },
}

-- execution
time = os.date("*t")
datetime = (time.year .. time.month .. time.day .. time.hour .. time.min .. time.sec)
for _, node in ipairs(nodes) do
    sync {
        source = node.source,
        target = node.target,
        rsync = {
            compress = true,
            checksum = true,
            perms = true,
            rsh = "/usr/bin/ssh -i /home/<USER>/.ssh/id_dsa -o StrictHostKeyChecking=no",
            times = true,
            verbose = true,
            _extra = { "--backup", "--suffix=" .. datetime },
        }
    }
end
If I try to pass bash's date command in the suffix option, like this:
_extra = { "--backup", "--suffix=_$(date +\"%Y%m%d%H%M%S\")" },
it is converted to a string without computing the value, leading to a backup file with a name like this:
testfile.txt_$(date +"%Y%m%d%H%M%S")
I am limited to using the 2.1.4 version of lsyncd.
Is it possible to create dynamic backup file suffixes?
Not tested, but try this:
--suffix=`date +"%F"`

ruby-nmap how to create hash output instead of xml

I want to use the ruby-nmap gem to do a port scan on a number of instances. Here's what I'm currently using:
require 'nmap/program'

Nmap::Program.scan do |nmap|
  nmap.syn_scan = true
  nmap.service_scan = true
  nmap.os_fingerprint = true
  nmap.xml = 'scan.xml'
  nmap.verbose = true
  # address[:public_ip] is my target
  nmap.targets = address[:public_ip]
end
It creates an XML file; however, I would prefer to get JSON or a hash as output without writing anything to a file. Is there an easy way to do this without just reading the XML file it creates?

Retrieve bibtex data from crossref by sending DOI from matlab: translation from ruby

I want to retrieve bibtex data (for building a bibliography) by sending a DOI (Digital Object Identifier) to Crossref from within MATLAB.
The crossref API suggests something like this:
curl -LH "Accept: text/bibliography; style=bibtex"
based on this source.
Another example from here suggests the following in ruby:
open("","Accept" => "text/bibliography; style=bibtex"){|f| f.each {|line| print line}}
Although I've heard ruby rocks I want to do this in matlab and have no clue how to translate the ruby message or interpret the crossref command.
The following is what I have so far to send a doi to crossref and retrieve data in xml (in variable retdat), but not bibtex, format:
doi = '10.1038/nrd842';
% URL_PATTERN (the query URL template) is not shown in the original post
fetchurl = sprintf(URL_PATTERN,doi);
numinputs = 1;
www = java.net.URL(fetchurl);
is = www.openStream;
%Read stream of data
isr = java.io.InputStreamReader(is);
br = java.io.BufferedReader(isr);
%Parse return data
retdat = [];
next_line = toCharArray(br.readLine)'; %First line contains headings, determine length
%Loop through data
while ischar(next_line)
    retdat = [retdat, 13, next_line];
    tmp = br.readLine;
    next_line = toCharArray(tmp)';
    if strcmp(next_line,'M END')
        next_line = [];
    end
end
%Cleanup java objects
br.close;
isr.close;
is.close;
Help translating the ruby statement to something matlab can send using a script such as that posted to establish the communication with crossref would be greatly appreciated.
Additional constraints include backward compatibility of the code (back at least to R14) :>(. Also, no use of ruby, since that solves the problem but is not a "matlab" solution, see here for how to invoke ruby from matlab via system('ruby script.rb').
You can easily edit urlread for what you need. I won't post my modified urlread function code due to copyright.
In urlread, (mine is at C:\Program Files\MATLAB\R2012a\toolbox\matlab\iofun\urlread.m), as the least elegant solution:
Right before "% Read the data from the connection." I added:
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
The answer from user2034006 lays the path to a solution.
The following script works when urlread is modified:
doi = '10.1038/nrd842';
fetchurl = sprintf(URL_PATTERN,doi);
method = 'post';
params= {};
[string,status] = urlread(fetchurl,method,params);
The modification in urlread is not identical to the suggestion of user2034006. Things worked when the line
in urlread was replaced with
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
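For readers who want to check the behaviour outside MATLAB, the effect of that setRequestProperty call can be sketched in Python: the request carries an Accept header asking for a BibTeX-formatted bibliography, just as the curl example does. The function name build_bibtex_request and the resolver URL in the usage lines are assumptions for illustration (the original post's URL pattern is not shown). The block only builds the request and does not contact any server.

```python
from urllib.request import Request

BIBTEX_ACCEPT = "text/bibliography; style=bibtex"


def build_bibtex_request(url: str) -> Request:
    """Build (but do not send) a GET request that asks for BibTeX output
    via content negotiation. `url` must already contain the DOI."""
    return Request(url, headers={"Accept": BIBTEX_ACCEPT})


if __name__ == "__main__":
    # Hypothetical resolver URL; substitute the one your service documents.
    req = build_bibtex_request("https://resolver.example/10.1038/nrd842")
    print(req.get_method(), req.full_url)
    print(req.get_header("Accept"))
    # To actually fetch: urllib.request.urlopen(req).read().decode("utf-8")
```

Whatever client is used, the header is what selects the response format, which is why patching urlread to set it is enough.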
