I want to use lsyncd to create backups of modified files, using a date/time string as a suffix.
If I set the suffix option (in the lsyncd.conf file) using Lua, the date/time is computed once, when I start the daemon, and is not updated at each sync iteration.
This leads to only one backup file per modified file (always with the same suffix), whereas I want a new backup file for each modification of each file.
The config file I use is the following:
-- global settings
settings {
    delay = 5,
    maxProcesses = 5,
    statusFile = "<STATUS_FILE_PATH>",
    logfile = "<LOG_FILE_PATH>",
    insist = true
}
-- target nodes
nodes = {
    { source = "/home/<USER>/sync", target = "<TARGET_IP>:/home/<USER>/sync" },
}
-- execution
time = os.date("*t")
datetime = (time.year .. time.month .. time.day .. time.hour .. time.min .. time.sec)
for _, node in ipairs(nodes) do
    sync {
        default.rsync,
        source = node.source,
        target = node.target,
        rsync = {
            compress = true,
            checksum = true,
            perms = true,
            rsh = "/usr/bin/ssh -i /home/<USER>/.ssh/id_dsa -o StrictHostKeyChecking=no",
            times = true,
            verbose = true,
            _extra = { "--backup", "--suffix=" .. datetime },
        }
    }
end
If I try to pass bash's date command in the suffix option instead, like this:
_extra = { "--backup", "--suffix=_$(date +\"%Y%m%d%H%M%S\")" },
it is treated as a literal string and never evaluated, leading to a backup file with a name like this:
testfile.txt_$(date +"%Y%m%d%H%M%S")
I am limited to using the 2.1.4 version of lsyncd.
Is it possible to create dynamic backup file suffixes?
Not tested, but try this:
--suffix=`date +"%F"`
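If that still comes through as a literal string (as far as I know, lsyncd spawns rsync directly rather than through a shell, so neither backticks nor $( ) get expanded), one untested workaround is to wrap rsync in a small script that computes the suffix at invocation time and point the rsync binary option of the config above at it. The path /usr/local/bin/rsync-dated-backup is only an example:
#!/bin/sh
# Hypothetical wrapper: recompute the backup suffix on every rsync invocation,
# then pass all of lsyncd's arguments through to the real rsync.
exec /usr/bin/rsync --backup --suffix="_$(date +%Y%m%d%H%M%S)" "$@"
In the Lua config, drop the _extra = { "--backup", ... } entry and set binary = "/usr/local/bin/rsync-dated-backup" inside the rsync table instead.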
Precision gets lost for big numbers.
I am using the tail input plugin to read a file whose data is in JSON format.
Below is the configuration:
[[inputs.tail]]
  files = ["E:/Telegraph/MSTCIVRRequestLog_*.json"]
  from_beginning = true
  name_override = "tcivrrequest"
  data_format = "json"
  json_strict = true
[[outputs.file]]
  files = ["E:/Telegraph/output.json"]
  data_format = "json"
Input file contains
{"RequestId":959011990586458245}
Expected Output
{"fields":{"RequestId":959011990586458245},"name":"tcivrrequest","tags":{},"timestamp":1632994599}
Actual Output
{"fields":{"RequestId":959011990586458200},"name":"tcivrrequest","tags":{},"timestamp":1632994599}
The number 959011990586458245 is converted into 959011990586458200 (note the last few digits).
I have already tried the following, but none of it worked:
json_string_fields = ["RequestId"]
[[processors.converter]]
  [processors.converter.fields]
    string = ["RequestId"]
precision = "1s"
json_int64_fields = ["RequestId"]
character_encoding = "utf-8"
json_strict = true
I was able to reproduce this with the json parser as well. That parser reads every number as a float64, which can only represent integers exactly up to 2^53, so the trailing digits of an 18-digit ID get rounded. My suggestion would be to move to the json_v2 parser, which lets you declare the field as an integer, with a config like the following:
[[inputs.file]]
  files = ["metrics.json"]
  data_format = "json_v2"
  [[inputs.file.json_v2]]
    [[inputs.file.json_v2.field]]
      path = "RequestId"
      type = "int"
I was able to get a result as follows:
file RequestId=959011990586458245i 1651181595000000000
The newer parser is generally more accurate and flexible for simple cases like the one you provided.
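To verify the parsed value quickly, you can run Telegraf once in test mode; it prints the metric in line protocol (which is where the result above comes from), assuming the snippet is saved as telegraf.conf:
telegraf --config telegraf.conf --test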
Thanks!
I'm trying to write a custom rule for gqlgen. The idea is to run it to generate Go code from a GraphQL schema.
My intended usage is:
gqlgen(
    name = "gql-gen-foo",
    schemas = ["schemas/schema.graphql"],
    visibility = ["//visibility:public"],
)
"name" is the name of the rule, on which I want other rules to depend; "schemas" is the set of input files.
So far I have:
load(
    "@io_bazel_rules_go//go:def.bzl",
    _go_context = "go_context",
    _go_rule = "go_rule",
)

def _gqlgen_impl(ctx):
    go = _go_context(ctx)
    args = ["run github.com/99designs/gqlgen --config"] + [ctx.attr.config]
    ctx.actions.run(
        inputs = ctx.attr.schemas,
        outputs = [ctx.actions.declare_file(ctx.attr.name)],
        arguments = args,
        progress_message = "Generating GraphQL models and runtime from %s" % ctx.attr.config,
        executable = go.go,
    )

_gqlgen = _go_rule(
    implementation = _gqlgen_impl,
    attrs = {
        "config": attr.string(
            default = "gqlgen.yml",
            doc = "The gqlgen filename",
        ),
        "schemas": attr.label_list(
            allow_files = [".graphql"],
            doc = "The schema file location",
        ),
    },
    executable = True,
)

def gqlgen(**kwargs):
    tags = kwargs.get("tags", [])
    if "manual" not in tags:
        tags.append("manual")
    kwargs["tags"] = tags
    _gqlgen(**kwargs)
My immediate issue is that Bazel complains that the schemas are not Files:
expected type 'File' for 'inputs' element but got type 'Target' instead
What's the right approach to specify the input files?
Is this the right approach to generate a rule that executes a command?
Finally, is it okay to have the output file not exist in the filesystem, but rather be a label on which other rules can depend?
Instead of:
ctx.actions.run(
    inputs = ctx.attr.schemas,
Use:
ctx.actions.run(
    inputs = ctx.files.schemas,
ctx.attr.schemas is a list of Target objects (one per label), whereas ctx.files.schemas is the flattened list of File objects those targets provide, which is what the inputs parameter of ctx.actions.run expects.
Is this the right approach to generate a rule that executes a command?
This looks right, as long as gqlgen creates the file with the correct output name (outputs = [ctx.actions.declare_file(ctx.attr.name)]). For example, declaring the output file and passing its path to the tool explicitly:
generated_go_file = ctx.actions.declare_file(ctx.attr.name + ".go")
# ..
ctx.actions.run(
    outputs = [generated_go_file],
    arguments = ["run", "...", "--output", generated_go_file.short_path],
    # ..
)
Finally, is it okay to have the output file not exist in the filesystem, but rather be a label on which other rules can depend?
The output file needs to be created, and as long as it's returned at the end of the rule implementation in a DefaultInfo provider, other rules will be able to depend on the file label (e.g. //my/package:foo-gqlgen.go).
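For example, a minimal sketch of how the end of the rule implementation could look, assuming the output is declared as <name>.go as in the snippet above (whether gqlgen actually writes to that exact path still depends on your gqlgen.yml):
def _gqlgen_impl(ctx):
    go = _go_context(ctx)
    generated_go_file = ctx.actions.declare_file(ctx.attr.name + ".go")
    ctx.actions.run(
        inputs = ctx.files.schemas,
        outputs = [generated_go_file],
        # ... arguments and executable as above ...
    )
    # Returning the file via DefaultInfo is what lets other rules depend on the
    # label (e.g. //my/package:gql-gen-foo) and consume the generated file.
    return [DefaultInfo(files = depset([generated_go_file]))]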
I can't get my Spark job to stream "old" files from HDFS.
If my Spark job is down for some reason (e.g. a demo or a deployment) while files keep being written/moved into the HDFS directory, those files might be skipped once I bring the Spark Streaming job back up.
val hdfsDStream = ssc.textFileStream("hdfs://sandbox.hortonworks.com/user/root/logs")
hdfsDStream.foreachRDD(
  rdd => logInfo("Number of records in this batch: " + rdd.count())
)
Output --> Number of records in this batch: 0
Is there a way for Spark Streaming to move the "read" files to a different folder, or do we have to program that ourselves, so that it avoids re-reading files it has already "read"?
Is Spark Streaming essentially the same as running a Spark job (sc.textFile) from cron?
As Dean mentioned, textFileStream defaults to processing only new files.
def textFileStream(directory: String): DStream[String] = {
  fileStream[LongWritable, Text, TextInputFormat](directory).map(_._2.toString)
}
So, all it is doing is calling this variant of fileStream
def fileStream[
  K: ClassTag,
  V: ClassTag,
  F <: NewInputFormat[K, V]: ClassTag
] (directory: String): InputDStream[(K, V)] = {
  new FileInputDStream[K, V, F](this, directory)
}
And, looking at the FileInputDStream class we will see that it indeed can look for existing files, but defaults to new only:
newFilesOnly: Boolean = true,
So, going back into the StreamingContext code, we can see that there is an overload we can use by calling the fileStream method directly:
def fileStream[
  K: ClassTag,
  V: ClassTag,
  F <: NewInputFormat[K, V]: ClassTag
] (directory: String, filter: Path => Boolean, newFilesOnly: Boolean): InputDStream[(K, V)] = {
  new FileInputDStream[K, V, F](this, directory, filter, newFilesOnly)
}
So, the TL;DR is:
ssc.fileStream[LongWritable, Text, TextInputFormat]
(directory, FileInputDStream.defaultFilter, false).map(_._2.toString)
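Adapted to the directory from the question, a sketch (the inline filter just skips hidden files, which mirrors what Spark's default filter does):
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// newFilesOnly = false means files already present in the directory are
// considered too, not only files that appear after the job starts.
val hdfsDStream = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "hdfs://sandbox.hortonworks.com/user/root/logs",
  (path: Path) => !path.getName.startsWith("."),
  newFilesOnly = false
).map(_._2.toString)

hdfsDStream.foreachRDD(rdd => println("Number of records in this batch: " + rdd.count()))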
Are you expecting Spark to read files already in the directory? If so, this is a common misconception, one that took me by surprise. textFileStream watches a directory for new files to appear, then reads them. It ignores files already in the directory when you start, as well as files it has already read.
The rationale is that you'll have some process writing files to HDFS, and then you'll want Spark to read them. Note that these files must appear atomically, e.g., they were slowly written somewhere else and then moved to the watched directory. This is because HDFS doesn't properly handle reading and writing a file simultaneously.
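For example, a sketch of that write-then-move pattern using the Hadoop FileSystem API (the staging and watched paths here are made up for illustration):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())

// Write the file completely in a staging directory that Spark is not watching...
val staged  = new Path("/user/root/logs_staging/events-0001.log")
val watched = new Path("/user/root/logs/events-0001.log")

// ...then move it into the watched directory with a single rename, so the
// streaming job only ever sees fully written files.
fs.rename(staged, watched)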
// `list` collects the paths that pass the filter.
val list = scala.collection.mutable.ListBuffer[String]()
val filterF = new Function[Path, Boolean] {
  def apply(x: Path): Boolean = {
    println("checking whether " + x + " should be considered or not")
    // File names are assumed to end in "_<timestamp in ms>"; only past timestamps pass.
    val flag = if (x.toString.split("/").last.split("_").last.toLong < System.currentTimeMillis) {
      println("considered " + x); list += x.toString; true
    } else {
      false
    }
    return flag
  }
}
This filter function decides whether each path is one you actually want, so the body of apply should be customized to your requirements.
val streamed_rdd = ssc.fileStream[LongWritable, Text, TextInputFormat]("/user/hdpprod/temp/spark_streaming_output", filterF, false).map { case (x, y) => y.toString }
Note that the third argument of fileStream is set to false here; this makes sure that not only new files but also old files already present in the streaming directory are considered.
I really am stumped - I've spent almost two hours searching for an answer to a ridiculously simple question: how can I continually keep two local files in sync on my Mac? I investigated various tricks involving rsync, then settled on using lsyncd.
But for the life of me, I can't figure out how I can get lsyncd to sync two specific files. Is this even supported in the API? It was not clear in the documentation whether or not I could use rsync in this manner; I assume that lsyncd is passing CLI options which are preventing this. My configuration is as follows:
sync = {
    default.rsync,
    source = "/Users/username/Downloads/test1.txt",
    target = "/Users/username/Downloads/test2.txt",
    rsync = {
        binary = "/usr/local/bin/rsync",
        archive = "true"
    }
}
It just says 'nothing to sync'. Help?
This worked for me:
sync {
    default.rsync,
    source = "/Users/username/Downloads/",
    target = "/Users/username/Downloads/",
    rsync = {
        binary = "/usr/bin/rsync",
        archive = "true",
        _extra = {
            "--include=test1.txt",
            "--exclude=*"
        }
    }
}
You have to use rsync's include/exclude filters, which lsyncd does not expose directly out of the box; you have to use the _extra field to pass them. Note that the order matters: the --include has to come before the catch-all --exclude.
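For reference, the same filter logic as a plain rsync command (the remote target is made up for illustration; the point is that the include precedes the exclude-everything rule):
rsync -a --include='test1.txt' --exclude='*' /Users/username/Downloads/ user@host:/backup/Downloads/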
In lsyncd you can do it like this:
settings {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log",
    statusInterval = 20,
    nodaemon = true
}
sync {
    default.rsync,
    source = "/srcdir/",
    target = "/dstdir/",
    rsync = {
        archive = true,
        compress = true,
        whole_file = false,
        _extra = { "--include=asterisk", "--exclude=*" },
        verbose = true
    },
    delay = 5,
    log = all,
}
After starting lsyncd I have the following:
root@localhost:/srcdir# ls
12 aster asterisk
root@localhost:/dstdir# ls
asterisk
As a test I wrote a .NET script which recursively looks in C:\$Recycle.Bin, and I'd like to delete files after they've been in there for X days.
I decided to check the access time, but the access time isn't updated on move. How do I check whether a file has been in there for X amount of time? (I'm using Windows 7.)
This C# version may help:
// Requires a COM reference to "Microsoft Shell Controls and Automation" (Shell32).
var Shl = new Shell();
Folder Recycler = Shl.NameSpace(10);                // 10 = the Recycle Bin folder
FolderItem FI = Recycler.Items().Item(0);           // first item in the Recycle Bin
string FileName = Recycler.GetDetailsOf(FI, 0);     // column 0: name
string FilePath = Recycler.GetDetailsOf(FI, 1);     // column 1: original location
string RecycleDate = Recycler.GetDetailsOf(FI, 2);  // column 2: date deleted
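Building on that, a rough sketch of checking every item's deletion date against an X-day threshold (the 30-day cutoff and the decision to only print matches are placeholders; GetDetailsOf returns the date as a string that can contain invisible formatting characters, so those are stripped before parsing):
using System;
using System.Text.RegularExpressions;
using Shell32; // COM reference: "Microsoft Shell Controls and Automation"

class RecycleBinSweep
{
    static void Main()
    {
        var shell = new Shell();
        Folder recycler = shell.NameSpace(10); // 10 = Recycle Bin

        foreach (FolderItem item in recycler.Items())
        {
            // Column 2 holds the "Date deleted" value as a string.
            string raw = recycler.GetDetailsOf(item, 2);
            string cleaned = Regex.Replace(raw, @"[^\u0020-\u007E]", "");

            DateTime deletedOn;
            if (DateTime.TryParse(cleaned, out deletedOn) &&
                (DateTime.Now - deletedOn).TotalDays > 30)
            {
                // Old enough: this is where the actual delete would go.
                Console.WriteLine("In the bin for more than 30 days: " + item.Name);
            }
        }
    }
}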