RSync Filter Rules - yaml

My problem is this:
I have a YAML file at this path: /srv/PvP/plugins/Essentials/userdata/USERNAME.yml
The file contains information such as this:
timestamps:
  login: 1379189230018
  lastteleport: 1379188566255
  logout: 1379188894740
ipAddress: *.*.*.*
lastlocation:
  world: skyworld
  x: 2.878462237122215
  y: 101.0
  z: 134.80091939768792
  yaw: 0.0
  pitch: 0.0
nickname: §bAmir
money: '101980.0'
logoutlocation:
  world: skyblock
  x: -305.81015336936576
  y: 187.50846552474954
  z: -446.69999998807907
  yaw: -222.72388
  pitch: 13.428226
I want to rsync the data in this file to a different directory:
/srv/SB/plugins/Essentials/userdata/USERNAME.yml
But I ONLY want to sync the money line. Is there a way to do this with rsync?
Also, if it helps, there are around 10K files in the userdata directory.

No, rsync is a file-based tool: its filter rules select whole files, not lines within a file.
You could use grep to extract the lines you need with a regex-based approach (a tool that understands YAML would be more robust here), save the result to a file, and then rsync that file.
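As a rough illustration of the extract-then-copy idea, here is a minimal Python sketch. It assumes that `money:` always appears as a top-level line, that the files are UTF-8, and that both userdata directories are reachable from the same host; the function names `merge_money` and `sync_money` are hypothetical and not part of rsync or Essentials.

```python
import re
from pathlib import Path

# Matches a top-level "money: ..." line (assumption: the key is never indented).
MONEY_LINE = re.compile(r"^money:.*$", re.MULTILINE)


def merge_money(src_text, dst_text):
    """Return dst_text with its money line replaced by the one from src_text."""
    match = MONEY_LINE.search(src_text)
    if match is None:
        return dst_text  # nothing to copy
    line = match.group(0)
    if MONEY_LINE.search(dst_text):
        # A callable replacement avoids backslash processing in the new line.
        return MONEY_LINE.sub(lambda _m: line, dst_text, count=1)
    # Destination has no money line yet: append one.
    if dst_text and not dst_text.endswith("\n"):
        dst_text += "\n"
    return dst_text + line + "\n"


def sync_money(src_dir, dst_dir):
    """Copy only the money line for every user file present in both directories."""
    updated = 0
    for src in Path(src_dir).glob("*.yml"):
        dst = Path(dst_dir) / src.name
        if not dst.is_file():
            continue
        old = dst.read_text(encoding="utf-8")
        new = merge_money(src.read_text(encoding="utf-8"), old)
        if new != old:
            dst.write_text(new, encoding="utf-8")
            updated += 1
    return updated
```

If the destination is on another machine, the same extraction step could write the money lines to a staging directory, which rsync can then transfer as ordinary files.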

Related

Untar specific files using lambda

I'm using a lambda function to untar files. The lambda is supposed to untar files and once it's done it moves the package to an archive folder.
The code is below:
def untar_file(zip_key,source_bucket,source_path,file):
    zip_obj = s3_resource.Object(bucket_name=source_bucket,key=zip_key)
    buffer = BytesIO(zip_obj.get()["Body"].read())
    with tarfile.open(fileobj=buffer, mode=('r:gz')) as z:
        for filename in z.getmembers():
            s3_resource.meta.client.upload_fileobj(
                z.extractfile(filename),
                Bucket=source_bucket,
                Key=source_path+f'/{d1}/{filename}.csv'
            )
    copy_objects (zip_key,source_bucket,source_path,file)
I only want to untar specific files in the package. Can I specify which files not to untar, so that the Lambda does not time out?

Figured it out with a simple if statement.
zip_obj = s3_resource.Object(bucket_name=source_bucket, key=zip_key)
buffer = BytesIO(zip_obj.get()["Body"].read())
with tarfile.open(fileobj=buffer, mode='r:gz') as z:
    for member in z.getmembers():
        # Skip any member whose name contains one of the excluded words
        if not any(word in member.name for word in ['text']):
            print(member.name)
            s3_resource.meta.client.upload_fileobj(
                z.extractfile(member),
                Bucket=source_bucket,
                # d1 is defined elsewhere in the original code
                Key=source_path + f'/{d1}/{member.name}.csv'
            )
print('uploaded')
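The filtering step can be tried out without S3. The sketch below is an assumption-laden illustration, not the poster's code: it builds a small gzip-compressed tar archive in memory and yields only the regular files whose names contain none of the excluded words. The helper names `build_demo_archive` and `iter_wanted_members` are hypothetical.

```python
import io
import tarfile


def build_demo_archive(files):
    """Create an in-memory .tar.gz from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def iter_wanted_members(tar_bytes, excluded_words):
    """Yield (name, content) for regular files not matching any excluded word."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # extractfile() returns None for directories
            if any(word in member.name for word in excluded_words):
                continue
            yield member.name, tar.extractfile(member).read()


archive = build_demo_archive({"data.csv": b"a,b\n", "text_notes.csv": b"skip\n"})
for name, content in iter_wanted_members(archive, ["text"]):
    print(name, len(content))
```

Note that this only skips the extraction and upload of unwanted members; the whole archive is still read into memory first, as in the original function.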

Trying to get all paths in a YAML file

I've got an input YAML file (test.yml) as follows:
# sample set of lines
foo:
  x: 12
  y: hello world
  ip_range['initial']: 1.2.3.4
  ip_range[]: tba
  array['first']: Cluster1
array2[]: bar
The source contains square brackets for some keys (possibly empty).
I'm trying to get a line by line list of all the paths in the file, ideally like:
foo.x: 12
foo.y: hello world
foo.ip_range['initial']: 1.2.3.4
foo.ip_range[]: tba
foo.array['first']: Cluster1
array2[]: bar
I've used the yamlpath library and the yaml-paths CLI, but can't get the desired output. Trying this:
yaml-paths -m -s =foo -K test.yml
outputs:
foo.x
foo.y
foo.ip_range\[\'initial\'\]
foo.ip_range\[\]
foo.array\[\'first\'\]
Each path is on one line, but the output has all the escape characters ( \ ). Modifying the call to remove the -m option ("expand matching parent nodes") fixes that problem but the output is then not one path per line:
yaml-paths -s =foo -K test.yml
gives:
foo: {"x": 12, "y": "hello world", "ip_range['initial']": "1.2.3.4", "ip_range[]": "tba", "array['first']": "Cluster1"}
Any ideas how I can get the one line per path entry but without the escape chars? I was wondering if there is anything for path querying in the ruamel modules?

Your "paths" are nothing more than the joined string representation of the keys (and probably indices) of the
mappings (and potentially sequences) in your YAML document.
That can be trivially generated from data loaded from YAML with a recursive function:
import ruamel.yaml

yaml_str = """\
# sample set of lines
foo:
  x: 12
  y: hello world
  ip_range['initial']: 1.2.3.4
  ip_range[]: tba
  array['first']: Cluster1
array2[]: bar
"""

def pathify(d, p=None, paths=None, joinchar='.'):
    # First call: create the result dict and start the recursion at the root
    if p is None:
        paths = {}
        pathify(d, "", paths, joinchar=joinchar)
        return paths
    # Prefix for child paths: parent path plus separator (empty at the root)
    pn = p
    if p != "":
        pn += joinchar
    if isinstance(d, dict):
        for k in d:
            v = d[k]
            pathify(v, pn + str(k), paths, joinchar=joinchar)
    elif isinstance(d, list):
        for idx, e in enumerate(d):
            pathify(e, pn + str(idx), paths, joinchar=joinchar)
    else:
        # Leaf value: record it under its full path
        paths[p] = d

yaml = ruamel.yaml.YAML(typ='safe')
paths = pathify(yaml.load(yaml_str))
for p, v in paths.items():
    print(f'{p} -> {v}')
which gives:
foo.x -> 12
foo.y -> hello world
foo.ip_range['initial'] -> 1.2.3.4
foo.ip_range[] -> tba
foo.array['first'] -> Cluster1
array2[] -> bar

While Anthon's answer certainly produces the output you were after, I think your question was specifically about how to get the yaml-paths command to produce the desired output. I'll address that original question.
As of version 3.5.0, the yamlpath project's yaml-paths command supports a --noescape option which removes the escape symbols from output. Using your input file and the new option, you may find this output more to your liking:
$ yaml-paths --nofile --expand --keynames --noescape --values --search='=~/.*/' test.yml
foo.x: 12
foo.y: hello world
foo.ip_range['initial']: 1.2.3.4
foo.ip_range[]: tba
foo.array['first']: Cluster1
array2[]: bar
Note:
Using the --values option includes the value with each YAML Path.
For interest, I changed the --search expression to match every node in the input file rather than only the "foo" data.
The default output (without setting --noescape) produces YAML Paths which can be used as direct input into other YAML Path parsers and processors; setting --noescape changes this to render human-friendly paths which may not work as downstream YAML Path input.
Disclaimer: I am the author of the yamlpath project. Should you ever run into issues or have questions about it, please visit the project's GitHub project site and engage me via Issues (bugs and feature requests) or Discussions (questions). Thank you!

Snakemake, how to change output filename when using wildcards

I think I have a simple problem, but I don't know how to solve it.
My input folder contains files like this:
AAAAA_S1_R1_001.fastq
AAAAA_S1_R2_001.fastq
BBBBB_S2_R1_001.fastq
BBBBB_S2_R2_001.fastq
My snakemake code:
import glob
import os

samples = [os.path.basename(x) for x in sorted(glob.glob("input/*.fastq"))]
name = []
for x in samples:
    if "_R1_" in x:
        name.append(x.split("_R1_")[0])
NAME = name

rule all:
    input:
        expand("output/{sp}_mapped.bam", sp=NAME),

rule bwa:
    input:
        R1 = "input/{sample}_R1_001.fastq",
        R2 = "input/{sample}_R2_001.fastq"
    output:
        mapped = "output/{sample}_mapped.bam"
    params:
        ref = "refs/AF086833.fa"
    run:
        shell("bwa mem {params.ref} {input.R1} {input.R2} | samtools sort > {output.mapped}")
The output file names are:
AAAAA_S1_mapped.bam
BBBBB_S2_mapped.bam
I want the output file to be:
AAAAA_mapped.bam
BBBBB_mapped.bam
How can I either change the output name or rename the files before or after the bwa rule?

Try this:
import pathlib

indir = pathlib.Path("input")
paths = indir.glob("*_S?_R?_001.fastq")
samples = set([x.stem.split("_")[0] for x in paths])

rule all:
    input:
        expand("output/{sample}_mapped.bam", sample=samples)

def find_fastqs(wildcards):
    fastqs = [str(x) for x in indir.glob(f"{wildcards.sample}_*.fastq")]
    return sorted(fastqs)

rule bwa:
    input:
        fastqs = find_fastqs
    output:
        mapped = "output/{sample}_mapped.bam"
    params:
        ref = "refs/AF086833.fa"
    shell:
        "bwa mem {params.ref} {input.fastqs} | samtools sort > {output.mapped}"
This uses an input function to find the FASTQ files that belong to each sample in rule bwa. There might be a more elegant solution, but I can't see it right now. I think this should work, though.
(Edited to reflect OP's edit.)
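The name handling in this approach can be checked outside Snakemake. The following plain-Python sketch, using the file names from the question, shows how taking the part before the first underscore yields the short sample names and how the R1/R2 files are then looked up per sample. The function names are hypothetical, and the sketch assumes sample names themselves contain no underscore.

```python
from pathlib import PurePath


def sample_names(filenames):
    """Return the sorted, de-duplicated prefixes before the first underscore."""
    return sorted({PurePath(f).name.split("_")[0] for f in filenames})


def fastqs_for_sample(sample, filenames):
    """Return the sorted files whose base name starts with '<sample>_'."""
    return sorted(f for f in filenames if PurePath(f).name.startswith(sample + "_"))


files = [
    "input/AAAAA_S1_R1_001.fastq",
    "input/AAAAA_S1_R2_001.fastq",
    "input/BBBBB_S2_R1_001.fastq",
    "input/BBBBB_S2_R2_001.fastq",
]
for sample in sample_names(files):
    print(sample, fastqs_for_sample(sample, files))
```

Sorting the matches keeps R1 ahead of R2, which matters because bwa mem receives the two files positionally.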

Unfortunately, I've also had this problem, with filenames that follow this pattern: {batch}/{seq_run}_{index}_{flowcell}_{lane}_{read_orientation}.fastq.gz.
I think the core problem is that none of the individual wildcards is unique. Also, not all values of all wildcards can be combined; seq_run1 was run on lane1, not lane2. Therefore, a plain expand(), which generates every combination, does not work.
After multiple attempts in Snakemake (see below), my solution was to standardize the input names with mv / sed / rename. Removing {batch}, {flowcell} and {lane} made it possible to use {sample}, a unique combination of {seq_run} and {index}.
What did not work (but might be worth trying for others in the same situation):
Adding the zip argument to expand()
Renaming output using the following syntax:
output: "_".join(re.split("[/_]", "{full_filename}")[1,2]+".fastq.gz"
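To make the combination problem concrete, here is a small plain-Python sketch with hypothetical run and lane names. By default, Snakemake's expand() combines wildcard values as a Cartesian product, while passing zip pairs them positionally; the two functions below mimic those two behaviours with itertools.product and the built-in zip.

```python
from itertools import product

# Hypothetical values: run1 was sequenced on lane1, run2 on lane2.
seq_runs = ["run1", "run2"]
lanes = ["lane1", "lane2"]


def all_combinations(runs, lane_names):
    """Every run paired with every lane, as a default expand() would do."""
    return [f"{r}_{l}" for r, l in product(runs, lane_names)]


def paired_combinations(runs, lane_names):
    """Runs and lanes paired by position, as expand() with zip would do."""
    return [f"{r}_{l}" for r, l in zip(runs, lane_names)]


print(all_combinations(seq_runs, lanes))
print(paired_combinations(seq_runs, lanes))
```

The product includes run1_lane2 and run2_lane1, which correspond to files that do not exist. Positional pairing avoids that only if the value lists are kept aligned, which may be why it did not help in the situation described above.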

Triggering Lambda on s3 video upload?

I am testing adding a watermark to a video once it is uploaded. I am running into an issue where Lambda wants me to specify which file to change on upload, but I want it to trigger when any file (really, any file that ends in .mov, .mp4, etc.) is uploaded.
To clarify, this all works when I create the pipeline and job manually.
Here's my code:
require 'json'
require 'aws-sdk-elastictranscoder'

def lambda_handler(event:, context:)
  client = Aws::ElasticTranscoder::Client.new(region: 'us-east-1')
  resp = client.create_job({
    pipeline_id: "15521341241243938210-qevnz1", # required
    input: {
      key: File, # this is where my issue is
    },
    output: {
      key: "CBtTw1XLWA6VSGV8nb62gkzY",
      # thumbnail_pattern: "ThumbnailPattern",
      # thumbnail_encryption: {
      #   mode: "EncryptionMode",
      #   key: "Base64EncodedString",
      #   key_md_5: "Base64EncodedString",
      #   initialization_vector: "ZeroTo255String",
      # },
      # rotate: "Rotate",
      preset_id: "1351620000001-000001",
      # segment_duration: "FloatString",
      watermarks: [
        {
          preset_watermark_id: "TopRight",
          input_key: "uploads/2354n.jpg",
          # encryption: {
          #   mode: "EncryptionMode",
          #   key: "zk89kg4qpFgypV2fr9rH61Ng",
          #   key_md_5: "Base64EncodedString",
          #   initialization_vector: "ZeroTo255String",
          # },
        },
      ],
    }
  })
end
How do I specify any uploaded file, or only files of a specific format, for the input key?
My issue now is that I am using Active Storage, so the key doesn't end in .jpg, .mov, etc.; it is just a randomly generated string (they have reasons for doing this). I am trying to find a reason to use Active Storage, and this is my final step to making it work like the alternatives I used before.

The extension field is optional. If you leave it empty, the Lambda is triggered no matter what file is uploaded. You can then check inside the function whether it is the type of file you want, and proceed only if it is.
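Once the trigger fires for every upload, the function still needs to learn which object was uploaded. An S3 notification event carries the bucket name and the URL-encoded object key in each record. The sketch below is written in Python rather than the Ruby of the question, and the function name `uploaded_objects` is hypothetical; it only shows how the key could be read from the event so that it can be passed on as the job's input key.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs for every S3 record in a notification event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if not s3_info:
            continue  # not an S3 record
        bucket = s3_info["bucket"]["name"]
        # Keys arrive URL-encoded, with spaces represented as '+'.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "uploads/my+video%281%29"}}}
    ]
}
print(uploaded_objects(sample_event))
```

Because Active Storage keys have no extension, a type check would have to rely on something other than the key, such as the object's stored content type.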

Bash printing unnecessary things on new session since MacOS update

The following line in /etc/bashrc_Apple_Terminal
shell_session_history_enable() {
    (umask 077; touch "$SHELL_SESSION_HISTFILE_NEW")  # <<< THIS LINE
    HISTFILE="$SHELL_SESSION_HISTFILE_NEW"
    SHELL_SESSION_HISTORY=1
}
is printing something like this on every new session.
/Users/me/.bash_sessions/717F6632-A946-44EE-8A27-2547EDDD09E9.historynew Stats {
dev: 16777220,
mode: 33152,
nlink: 1,
uid: 501,
gid: 20,
rdev: 0,
blksize: 4096,
ino: 1406878,
size: 0,
blocks: 0,
atimeMs: 1502801769000,
mtimeMs: 1502801769000,
ctimeMs: 1502801769000,
birthtimeMs: 1502801769000,
atime: 2017-08-15T12:56:09.000Z,
mtime: 2017-08-15T12:56:09.000Z,
ctime: 2017-08-15T12:56:09.000Z,
birthtime: 2017-08-15T12:56:09.000Z }
As far as I can tell, this started with the last macOS update.
What's an elegant way to solve this without editing this file, which I would rather not change?

This post answers my question: "How to deactivate bash_history stats print when opening a new terminal window on my mac?"
I hadn't considered the possibility that touch was aliased, but that was indeed the case.
