Combining local and remote ZFS snapshotting [zfs_autobackup]

I was searching for a simple way of managing my local and remote ZFS snapshots and decided to give zfs_autobackup a try.
My goals are to keep a local set of snapshots taken at specific times and send them to a remote machine.
zfs set autobackup:local=true tank/data
After selecting the source dataset, I created a cron file as follows
0 8-20 * * 1-5 /usr/local/bin/zfs-autobackup local --keep-source 12
5 20 * * 1-5 /usr/local/bin/zfs-autobackup local --keep-source 1d1w
10 20 * * 5 /usr/local/bin/zfs-autobackup local --keep-source 1w1m
0 0 1 * * /usr/local/bin/zfs-autobackup local --keep-source 1m1y
This doesn't behave the way I expected: it deletes older snapshots.
I also wonder what the best way to send the snapshots to the remote server would be; does it make any sense to define another backup group, like this?
zfs set autobackup:remote=true tank/data
Any ideas?

I'm the author of zfs-autobackup.
Ser's answer is correct: use one zfs-autobackup command instead of four, and use commas to separate the rules.
Also, zfs-autobackup already keeps both local and remote snapshots, so you can just send over the snapshots created by the cron job. (Maybe don't name them "local"; it's confusing in that case.)
So use the same command as in your cron job, but add the target dataset and --ssh-target.
(Also check out the documentation, it explains everything.)
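For illustration, a single combined cron entry that also sends to the remote machine could look roughly like this (the thinning schedule is copied from the cron lines above, while backuphost and remotepool/backups are placeholder names; check the documentation for the exact syntax):
0 8-20 * * 1-5 /usr/local/bin/zfs-autobackup --ssh-target backuphost --keep-source 12,1d1w,1w1m,1m1y --keep-target 12,1d1w,1w1m,1m1y local remotepool/backups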

Related

Prometheus: Prevent starting new time series on label change

Assuming the following metric:
cpu_count{machine="srv1", owner="Alice", department="ops"} 8
cpu_count{machine="srv1", owner="Bob", department="ops"} 8
I'd like to be able to prevent starting a new time series on owner change. It should still be considered the same instance, but I would like to be able to look up by owner.
I don't particularly care if it matches only on my_metric{owner=~"Bob"} or on both my_metric{owner=~"Bob"} and my_metric{owner=~"Alice"}; I just need to make sure it does not count twice on my_metric{machine=~"srv1"} or my_metric{department=~"ops"}.
I'm willing to accept that using labels to group instances in this manner is not the correct approach, but what is?
When you add the label "owner" to this kind of metric, I think you're trying to accomplish a kind of "asset management", which would be done better with some other tool developed specifically for that goal. Prometheus isn't a suitable tool to keep track of who is using each machine in your company.
That said, every time the owner of a machine changes you could work around the issue by deleting the old series through the REST API, executing something like this:
curl --silent --user USER:PASS --globoff --request POST "https://PROMETHEUS-SERVER/api/v1/admin/tsdb/delete_series?match[]={machine='srv1',owner='Bob'}"
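Note that delete_series is part of the TSDB admin API, which is disabled by default; the Prometheus server has to be started with the admin API enabled, for example:
prometheus --web.enable-admin-api --config.file=prometheus.yml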
If you can change the code, it would be better to have a metric dedicated to the ownership:
# all metrics are identified as usual
cpu_count{machine="srv1", department="ops"} 8
# use an info metric to give details about the owner
machine_info{machine="srv1", owner="Alice", department="ops"} 1
You can still join the owner onto the metric if you need it:
cpu_count * ON(machine, department) GROUP_LEFT(owner) machine_info
That way, the owner is not polluting all your metrics. Still, you will have issues when changing the owner of a machine while waiting for the old series to disappear (it lingers for 5 minutes before being marked stale).
I have not tried it, but a solution could be to use the time at which the ownership changed (if you can provide it) as the metric value, as epoch time in seconds.
# owner changed at Sun, 08 Mar 2020 22:05:53 GMT
machine_info{machine="srv1", owner="Alice", department="ops"} 1583705153
# previous owner, since Sat, 01 Feb 2020 00:00:00 GMT
machine_info{machine="srv1", owner="Bob", department="ops"} 1580515200
And then use the following expression to get the latest owner whenever you need the current one (only necessary while both series are still present, i.e. when the owner has changed within the last 5 minutes):
machine_info == ON(machine,department) BOOL (max(machine_info) BY(machine,department) )
Quite a mouthful but it would approach what you want.

Unable to update GeoJSON files in an application using APScheduler on Heroku

I have 2 GeoJSON files in my application. I have written a Python job using APScheduler to update the 2 GeoJSON files based on the changes in the database. The job is configured to run once every 24 hours. Currently, I get the confirmation message that the new GeoJSON file was created, but the job crashes immediately after printing this log statement. I am not sure if we can write to the Heroku container; is that the reason the job crashes?
What alternatives do I have to make it work? One of the things that I would be trying is to write the output of APScheduler to Amazon S3. Any suggestions in this regard would be of great help.
I have another job that updates a couple of fields in the DB, and it works fine.
Also, this works fine locally. It replaces the existing GeoJSON based on the changes in the database.
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.schedulers.background import BackgroundScheduler
import psycopg2
from UnoCPI import sqlfiles
import os
import Project_GEOJSON, Partner_GEOJSON

sched = BlockingScheduler()
sched1 = BackgroundScheduler()

# Initializing the sql files
sql = sqlfiles

# Schedules job_function to be run on the third Friday
# of June, July, August, November and December at 00:00, 01:00, 02:00 and 03:00
# sched.add_job(YOURRUNCTIONNAME, 'cron', month='6-8,11-12', day='3rd fri', hour='0-3')

# sched.scheduled_job('cron', day_of_week='mon-sun', hour=23)
# sched.scheduled_job('cron', month='1,6,8', day='1', hour='0')
# sched.scheduled_job('interval', minutes=5)
# sched1.add_job(generateGEOJSON, 'cron', day_of_week='mon-sun', hour=20)


def generateGEOJSON():
    os.system(Partner_GEOJSON)
    os.system(Project_GEOJSON)


def scheduled_job():
    print('This job is ran every day at 11pm.')
    # print('This job is ran every 1st day of the month of January, June and August at 12 AM.')
    # print('This job is ran every minute.')
    global connection
    global cursor
    try:
        # CAT STAGING
        connection = psycopg2.connect(user="heroku cred",
                                      password="postgres password from heroku",
                                      host="heroku host",
                                      port="5432",
                                      database="heroku db",
                                      sslmode="require")
        if connection:
            print("Postgres SQL Database successful connection")
            cursor = connection.cursor()
            # create a temp table with all projects start and end dates
            cursor.execute(sql.start_and_end_dates_temp_table_sql)
            # fetch all community partners to be set to inactive
            cursor.execute(sql.comm_partners_to_be_set_to_inactive)
            inactive_comm_partners = cursor.fetchall()
            print("Here is the list of all projects to be set to inactive", "\n")
            # loop to print all the data
            for i in inactive_comm_partners:
                print(i)
            # fetch all community partners to be set to active
            cursor.execute(sql.comm_partners_to_be_set_to_active)
            active_comm_partners = cursor.fetchall()
            print("Here is the list of all projects to be set to active", "\n")
            # loop to print all the data
            for i in active_comm_partners:
                print(i)
            # UPDATE PROJECT STATUS TO ACTIVE
            cursor.execute(sql.update_project_to_active_sql)
            # UPDATE PROJECT STATUS TO COMPLETED
            cursor.execute(sql.update_project_to_inactive_sql)
            # UPDATE COMMUNITY PARTNER WHEN TIED TO A INACTIVE PROJECTS ONLY TO FALSE(INACTIVE)
            cursor.execute(sql.update_comm_partner_to_inactive_sql)
            # UPDATE COMMUNITY PARTNER WHEN TIED TO A BOTH ACTIVE
            # and / or INACTIVE or JUST ACTIVE PROJECTS ONLY TO TRUE(ACTIVE)
            cursor.execute(sql.update_comm_partner_to_active_sql)
            # drop all_projects_start_and_end_date temp table
            cursor.execute(sql.drop_temp_table_all_projects_start_and_end_dates_sql)
    except (Exception, psycopg2.Error) as error:
        print("Error while connecting to Postgres SQL", error)
    finally:
        # closing database connection.
        if connection:
            connection.commit()
            cursor.close()
            connection.close()
            print("Postgres SQL connection is closed")


sched.start()
sched1.start()
I am not sure if we can write into the Heroku container
You can, but your changes will be periodically lost. Heroku's filesystem is dyno-local and ephemeral. Every time your dyno restarts, changes made to the filesystem will be lost. This happens frequently (at least once per day) and unpredictably.
One of the things that I would be trying is to write the output of APScheduler to Amazon S3
That's exactly what Heroku recommends doing with generated files and user uploads:
AWS Simple Storage Service, e.g. S3, is a “highly durable and available store” and can be used to reliably store application content such as media files, static assets and user uploads. It allows you to offload your entire storage infrastructure and offers better scalability, reliability, and speed than just storing files on the filesystem.
AWS S3, or similar storage services, are important when architecting applications for scale and are a perfect complement to Heroku's ephemeral filesystem.
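A minimal sketch of that approach, assuming the job writes the generated files to a temporary path and using a hypothetical bucket name (requires boto3 and AWS credentials configured on the dyno):

import boto3

# Hypothetical paths and bucket name; adjust to your app
GEOJSON_FILES = ["/tmp/Project.geojson", "/tmp/Partner.geojson"]
BUCKET = "my-app-geojson"

def upload_geojson():
    s3 = boto3.client("s3")
    for path in GEOJSON_FILES:
        # Store each generated file under its base name, overwriting the previous version
        key = path.split("/")[-1]
        s3.upload_file(path, BUCKET, key)
        print("Uploaded", path, "to s3://" + BUCKET + "/" + key)

The application would then read the GeoJSON back from S3 (or serve it via a URL) instead of relying on the dyno's filesystem.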

Limit size tinyproxy logfile

I am setting up tinyproxy on a CentOS 6.5 server in the cloud. I have installed it successfully. However, because of the limited disk size in the cloud, we want to limit the size of the logfile (/var/log/tinyproxy.log). I need to configure the log file so that it only keeps the last hour of logs. For example, if it is now 5:30 PM, the file must contain only data since 4:30 PM. I have read the tinyproxy documentation and couldn't find a logfile limit parameter. I'd be very thankful if somebody gave me a clue how to do that. Thanks.
I don't believe Tinyproxy has a feature for limiting log size, but it would be pretty simple to write a script for this separately.
An example script using Python, running automatically every hour using Linux crontab:
import os
import shutil

# Remove the previously stored logs
os.remove("/[DESTINATION]")
# Copy the current logs to storage
shutil.copyfile("/var/log/tinyproxy.log", "/[DESTINATION]")
# Remove the primary logs
os.remove("/var/log/tinyproxy.log")
(This is just an example. You may have to clear tinyproxy.log instead of deleting it. You may even want to set it up so you copy the old logs one more time, so that you don't end up with only 1-2 minutes of logs when you need them.)
And add this to crontab using crontab -e (make sure you have the right permissions to edit the log file!). This will run your script every hour, on the hour:
01 * * * * python /[Python Path]/logLimit.py
I found crontab very useful for this task.
30 * * * * /usr/sbin/logrotate /etc/logrotate.d/tinyproxy
It rotates my log file every hour.
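For reference, an /etc/logrotate.d/tinyproxy used this way might look like the following sketch (the size threshold is an arbitrary example; combined with the hourly cron entry above, the log is rotated as soon as it grows past that limit):

/var/log/tinyproxy.log {
    size 10M
    rotate 1
    missingok
    notifempty
    copytruncate
}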

Redis sync fails. Redis copy keys and values works

I have two redis instances both running on the same machine on win64. The version is the one from https://github.com/MSOpenTech/redis with no amendments and the binaries are running as per download from github (ie version 2.6.12).
I would like to create a slave and sync it to the master. I am doing this on the same machine to ensure it works before creating a slave on a WAN located machine which will take around an hour to transfer the data that exists in the primary.
However, I get the following error:
[4100] 15 May 18:54:04.620 * Connecting to MASTER...
[4100] 15 May 18:54:04.620 * MASTER <-> SLAVE sync started
[4100] 15 May 18:54:04.620 * Non blocking connect for SYNC fired the event.
[4100] 15 May 18:54:04.620 * Master replied to PING, replication can continue...
[4100] 15 May 18:54:28.364 * MASTER <-> SLAVE sync: receiving 2147483647 bytes from master
[4100] 15 May 18:55:05.772 * MASTER <-> SLAVE sync: Loading DB in memory
[4100] 15 May 18:55:14.508 # Short read or OOM loading DB. Unrecoverable error, aborting now.
The only way I can sync up is via a mini script, something along the lines of:
import orm.model

if __name__ == "__main__":
    src = orm.model.caching.Redis(**{"host": "source_host", "port": 6379})
    dest = orm.model.caching.Redis(**{"host": "source_host", "port": 7777})
    ks = src.handle.keys()
    for i, k in enumerate(ks):
        if i % 1000 == 0:
            print i, "%2.1f %%" % ((i * 100.0) / len(ks))
        dest.handle.set(k, src.handle.get(k))
where orm.model.caching.* are my middleware cache implementation bits (which for redis is just creating a self.handle instance variable).
Firstly, I am very suspicious of the number in the receiving bytes as that is 2^32-1 .. a very strange coincidence. Secondly, OOM can mean out of memory, yet I can fire up a 2nd process and sync that via the script but doing this via redis --slaveof fails with what appears to be out of memory. Surely this can't be right?
redis-check-dump does not run as this is the windows implementation.
Unfortunately there is sensitive data in the keys I am syncing so I can't offer it to anybody to investigate. Sorry about that.
I am definitely running the 64 bit version as it states this upon startup in the header.
I don't mind syncing via my mini script and then just enabling slave mode, but I don't think that is possible as the moment slaveof is executed, it drops all known data and resyncs from scratch (and then fails).
Any ideas ??
I have also seen this error earlier, but the latest bits from 2.8.4 seem to have resolved it: https://github.com/MSOpenTech/redis/tree/2.8.4_msopen

Neo4j 2.0.0 - Poor performance for dev/test in a virtual machine

I have Neo4j server running inside a virtual machine using Ubuntu 13.10 and I am accessing via REST using Cypher queries. The virtual machine has 4 GB of memory allocated to it.
I've changed the open file count to 40000, set the initial JVM heap to 1G and my neo4j.properties file is as follows:
neostore.nodestore.db.mapped_memory=250M
neostore.relationshipstore.db.mapped_memory=100M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=100M
keep_logical_logs=3 days
node_auto_indexing=true
node_keys_indexable=id
I've also updated sysctl based on the Neo4j Linux tuning guide:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Since I am testing queries, the basic routine is to run my suite of tests and then delete all of the nodes and run them all again. At the start of each test run, the database has 0 nodes in it. My suite of tests of about 100 queries is taking 22 seconds to run. Basic parameterized creates such as:
CREATE (x:user { email: {param0},
name: {param1},
displayname: {param2},
id: {param3},
href: {param4},
object: {param5} })
CREATE x-[:LOGIN]->(:login { password: {param6},
salt: {param7} } )
are currently taking over 170ms to execute (and that's the average, first query time is 700ms). During a test run, the CPU in the VM never exceeds 50% and memory usage is at a steady 1.4Gb.
Why would creating a single node in an empty database take 170ms? At this point unit testing is becoming almost impossible since it is so slow. This is my first time trying to tune Neo4j so I'm not really sure how to figure out where the problem is or what changes should be made.
Additional Details
I'm using Go 1.2 to make REST calls to the cypher endpoint (http://localhost:7474/db/data/cypher) of a locally installed Neo4j instance. I'm setting the request headers for content-type to "application/json", accept to "application/json" and "X-Stream" to true. I always return either an array of maps or nothing depending on the query.
It seems like the creates are the problem and are taking forever. For example:
2014/01/15 11:35:51 NewUser took 123.314938ms
2014/01/15 11:35:51 NewUser took 156.101784ms
2014/01/15 11:35:52 NewUser took 167.439442ms
2014/01/15 11:35:52 ValidatePassword took 4.287416ms
NewUser creates two new nodes and one relationship and is taking 167ms, while ValidatePassword is a read-only operation and it completes in 4ms. Also note that the three calls to NewUser are identical parameterized queries. While the creates are the big problem, I'm also a little concerned that Neo4j is taking 4ms to just find a labeled node when there are only 100 nodes in the database.
I do not restart the server in between test runs or delete the database. I issue a single delete all nodes query MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r at the end of the test run. Running the same test suite multiple times back to back does not improve the query times.
Are your 100 queries all the same only with different parameters, or actually 100 different queries?
What you see is actually setup work. The parser has to load the parsing rules initially, which takes a few ms. Also, new queries that have not been seen before are compiled, planned and put in the query cache.
So the first query always takes a bit longer. But as you parameterize, all subsequent ones should be fast.
Can you confirm that?
I think you see the transactional overhead of flushing the transaction to disk.
Did you try to batch more requests into one? I.e. with the transactional endpoint? Or the /db/data/batch (but I'd rather use the new tx-endpoint /db/data/transaction).
Did you create an index for your lookup property for your validate query?
Can you do me a favor and test your create query without a label? I found some perf issues when testing that myself earlier this week.
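For the batching suggestion, a request to the transactional endpoint could look roughly like this (a sketch with made-up parameter values; multiple statements can go into a single commit):
curl -i -H content-type:application/json -H accept:application/json -X POST \
  -d '{"statements": [
        {"statement": "CREATE (x:user {email: {param0}, name: {param1}})",
         "parameters": {"param0": "alice@example.com", "param1": "Alice"}},
        {"statement": "CREATE (:login {password: {param6}, salt: {param7}})",
         "parameters": {"param6": "hash", "param7": "salt"}}
      ]}' \
  http://localhost:7474/db/data/transaction/commit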
Just ran a test with curl
for i in `seq 1 10`; do time curl -i -H content-type:application/json -H accept:application/json -H X-Stream:true -d @perf_test.json http://localhost:7474/db/data/cypher; done
I'm getting between 16 and 30ms per request externally including starting curl
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8; stream=true
Access-Control-Allow-Origin: *
Transfer-Encoding: chunked
Server: Jetty(9.0.5.v20130815)
{"columns":[],"data":[]}
real 0m0.016s
user 0m0.005s
sys 0m0.005s
Perhaps it is rather the VM (disk or network) or the cross-vm communication?
Did another test with ab and 1000 requests for both endpoints, got a mean of about 5 ms both times.
https://gist.github.com/jexp/8452037
