Realm file size is too large - swift2

I'm trying to integrate Realm into my project and noticed an issue. I've seen other posts on this, but they are a little over a year old and have been marked as resolved.
When adding objects to Realm, things are fine. But when removing objects, they get removed from the DB, yet the file size stays large. If I open the Realm file in TextEdit, I can see the raw text of the old records. Why aren't they getting fully deleted?
Take a look at this screenshot: zero objects in the Realm DB, but the file size is 23 MB.
Thanks.

As bcamur said,
the Realm file will maintain its size on disk to efficiently reuse
that space for future objects
but the documentation also says:
The extra space will eventually be reused by future writes, or may be
compacted — for example by calling
Realm().writeCopyToPath(_:encryptionKey:).
and:
call invalidate to tell Realm that you no longer need any of the
objects that you’ve read from the Realm so far, which frees us from
tracking intermediate versions of those objects. The Realm will update
to the latest version the next time it is accessed.
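As a rough sketch of how those two calls fit together (the function name and the compacted-copy path are my own illustrative assumptions, and I'm using the newer RealmSwift spelling writeCopy(toFile:encryptionKey:) rather than the Swift 2 name quoted above):
import Foundation
import RealmSwift

func shrinkDefaultRealm() throws {
    let realm = try Realm()

    // Release the accessors read so far, so Realm no longer has to keep
    // intermediate versions of those objects alive.
    realm.invalidate()

    // Write a compacted copy of the file next to the original.
    let compactedURL = realm.configuration.fileURL!
        .deletingLastPathComponent()
        .appendingPathComponent("default-compacted.realm")
    try realm.writeCopy(toFile: compactedURL)
}
A later answer below shows the full copy-and-replace version of this approach.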

I've also realised that my Realm file size grows too large (and never decreases). The solution for me was to initialise my Realm database in the following way:
import RealmSwift

class RealmManager {
    static let shared = RealmManager()
    private var realm: Realm?

    private init() {
        let config = Realm.Configuration(schemaVersion: 1, shouldCompactOnLaunch: { totalBytes, usedBytes in
            // totalBytes refers to the size of the file on disk in bytes (data + free space)
            // usedBytes refers to the number of bytes used by data in the file
            // Compact if the file is over 100MB in size and less than 50% 'used'
            let oneHundredMB = 100 * 1024 * 1024
            return (totalBytes > oneHundredMB) && (Double(usedBytes) / Double(totalBytes)) < 0.5
        })
        do {
            // Realm is compacted on the first open if the configuration block conditions were met.
            realm = try Realm(configuration: config)
        } catch let error {
            // handle error compacting or opening Realm
            print(error)
        }
    }
}
The key is to add the shouldCompactOnLaunch block to the Configuration. Please note that the compaction will not be performed if another process is accessing your Realm database (e.g. the database is open in Realm Studio).
For more information, you can check the following link:
https://realm.io/docs/swift/latest/#compacting-realms

Realm holds on to that space to use later on for new objects:
You can also delete all objects stored in a Realm. Note the Realm file will maintain its size on disk to efficiently reuse that space for future objects.
See this part of the documentation.
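To make that concrete, here is a minimal sketch of clearing a Realm with the standard RealmSwift deleteAll() call (the surrounding helper function is my own illustrative assumption):
import RealmSwift

func clearRealm() throws {
    let realm = try Realm()
    try realm.write {
        // Removes every object from the Realm; the file keeps its size on
        // disk so the freed space can be reused by future writes.
        realm.deleteAll()
    }
}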

Swift version 3.0.1.
To compact your DB:
func compactRealm() {
    let defaultURL = Realm.Configuration.defaultConfiguration.fileURL!
    let defaultParentURL = defaultURL.deletingLastPathComponent()
    let compactedURL = defaultParentURL.appendingPathComponent("default-compact.realm")

    autoreleasepool {
        let realm = try! Realm()
        try! realm.writeCopy(toFile: compactedURL)
        try! FileManager.default.removeItem(at: defaultURL)
        try! FileManager.default.moveItem(at: compactedURL, to: defaultURL)
    }
}
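A hedged usage note: since this swaps the default Realm file on disk, it is safest to run it before anything else in the app has opened that Realm. One place it could be called (an illustrative assumption on my part, not part of the original answer) is application launch, Swift 3 style:
import UIKit

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
        // Compact the default Realm before any other code opens it.
        compactRealm() // the helper defined in the answer above
        return true
    }
}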

Related

Laravel tagging overhead leaving behind significantly large reference sets using redis

I am using Laravel 9 with the Redis cache driver. However, I have an issue where the internal standard_ref and forever_ref map that Laravel uses to manage tagged cache exceeds 10MB.
This map consists of numerous keys, 95% of which have already expired/decayed and no longer exist; the map keeps growing in size and has a TTL of -1 (it never expires).
Other than "not using tags", has anyone else encountered and overcome this? I found this in the slow log of Redis Enterprise, which led me to realize this is happening:
I checked the key/s via SCAN and can confirm it's a massive set of cache misses. It seems highly inefficient and expensive to constantly transmit 10MB back and forth to find one key within the map.
This quickly and efficiently removes expired keys from the SET data type that Laravel uses to manage tagged cache.
use Illuminate\Support\Facades\Cache;

function flushExpiredKeysFromSet(string $referenceKey): void
{
    /** @var \Illuminate\Cache\RedisStore $store */
    $store = Cache::store()->getStore();

    $lua = <<<LUA
    local keys = redis.call('SMEMBERS', '%s')
    local expired = {}
    for i, key in ipairs(keys) do
        local ttl = redis.call('ttl', key)
        if ttl == -2 or ttl == -1 then
            table.insert(expired, key)
        end
    end
    if #expired > 0 then
        redis.call('SREM', '%s', unpack(expired))
    end
    LUA;

    $store->connection()->eval(sprintf($lua, $referenceKey, $referenceKey), 1);
}
To show the calls that this LUA script generates, from the sample above:
10:32:19.392 [0 lua] "SMEMBERS" "63c0176959499233797039:standard_ref{0}"
10:32:19.392 [0 lua] "ttl" "i-dont-expire-for-an-hour"
10:32:19.392 [0 lua] "ttl" "aa9465100adaf4d7d0a1d12c8e4a5b255364442d:i-have-expired{1}"
10:32:19.392 [0 lua] "SREM" "63c0176959499233797039:standard_ref{0}" "aa9465100adaf4d7d0a1d12c8e4a5b255364442d:i-have-expired{1}"
I use a custom cache driver that wraps the RedisTaggedCache class; when cache is added to a tag, I dispatch a job that runs the above PHP script, using a 24-hour cache lock so that it only runs once within that period.
Here is how I obtain the reference key that is later passed into the cleanup script.
public function dispatchTidyEvent(mixed $ttl)
{
    $referenceKeyType = $ttl === null ? self::REFERENCE_KEY_FOREVER : self::REFERENCE_KEY_STANDARD;
    $lock = Cache::lock('tidy:'.$referenceKeyType, 60 * 60 * 24);

    // if we were able to get a lock, then dispatch the event
    if ($lock->get()) {
        foreach (explode('|', $this->tags->getNamespace()) as $segment) {
            dispatch(new \App\Events\CacheTidyEvent($this->referenceKey($segment, $referenceKeyType)));
        }
    }

    // otherwise, we'll just let the lock live out its life to prevent repeating this numerous times per day
    return true;
}
Remembering that a "cache lock" is just a SET/GET, and that Laravel already issues many of those on every request to manage its tags, adding a lock to achieve this "once per day" behaviour only adds negligible overhead.

Why does RBD snap id start from 4?

I'm a newbie Ceph developer, and I've recently been reading the snapshot code. From pg_pool_t::add_unmanaged_snap, it's obvious that the first RBD snapshot id should start from 2, but in reality it starts from 4. I wonder whether there is some mechanism in RBD snapshots that increments snap_seq; could anyone help me?
Thanks in advance!
Below is the code of pg_pool_t::add_unmanaged_snap.
void pg_pool_t::add_unmanaged_snap(uint64_t& snapid)
{
    ceph_assert(!is_pool_snaps_mode());
    if (snap_seq == 0) {
        // kludge for pre-mimic tracking of pool vs selfmanaged snaps. after
        // mimic this field is not decoded but our flag is set; pre-mimic, we
        // have a non-empty removed_snaps to signifiy a non-pool-snaps pool.
        removed_snaps.insert(snapid_t(1));
        snap_seq = 1;
    }
    flags |= FLAG_SELFMANAGED_SNAPS;
    snapid = snap_seq = snap_seq + 1;
}
The following screenshot shows the process of RBD snapshot creation on a totally new RBD pool. Obviously, the snapshot id here starts from 4:
rbd snapshot creation on a totally new rbd pool

C# NEST Bulk api failing with System.IO.IOException [duplicate]

This question already has an answer here:
Elasticsearch bulk insert with NEST returns es_rejected_execution_exception
(1 answer)
Closed 5 years ago.
I am trying to bulk insert data from SQL into an Elasticsearch index. Below is the code I am using, and the total number of records is around 1.5 million. I think it has something to do with the connection settings, but I am not able to figure it out. Can someone please help with this code or suggest a better way to do it?
public void InsertReceipts()
{
    IEnumerable<Receipts> receipts = GetFromDB(); // get receipts from SQL DB
    const string index = "receipts";
    var config = ConfigurationManager.AppSettings["ElasticSearchUri"];
    var node = new Uri(config);
    var settings = new ConnectionSettings(node).RequestTimeout(TimeSpan.FromMinutes(30));
    var client = new ElasticClient(settings);

    var bulkIndexer = new BulkDescriptor();
    foreach (var receiptBatch in receipts.Batch(20000)) // using MoreLinq for Batch
    {
        Parallel.ForEach(receiptBatch, (receipt) =>
        {
            bulkIndexer.Index<OfficeReceipt>(i => i
                .Document(receipt)
                .Id(receipt.TransactionGuid)
                .Index(index));
        });

        var response = client.Bulk(bulkIndexer);
        if (!response.IsValid)
        {
            _logger.LogError(response.ServerError.ToString());
        }

        bulkIndexer = new BulkDescriptor();
    }
}
The code works fine but takes around 10 minutes to complete. When I try to increase the batch size, it fails with the error below:
Invalid NEST response built from a unsuccessful low level call on
POST: /_bulk
Invalid Bulk items: OriginalException: System.Net.WebException: The
underlying connection was closed: An unexpected error occurred on a
send. ---> System.IO.IOException: Unable to write data to the
transport connection: An existing connection was forcibly closed by
the remote host. ---> System.Net.Sockets.SocketException: An existing
connection was forcibly closed by the remote host
A good place to start is with batches of 1,000 to 5,000 documents or, if your documents are very large, with even smaller batches.
It is often useful to keep an eye on the physical size of your bulk requests. One thousand 1KB documents is very different from one thousand 1MB documents. A good bulk size to start playing with is around 5-15MB in size.
I had a similar problem. My problem was solved by adding the following code before the ElasticClient connection is established:
System.Net.ServicePointManager.Expect100Continue = false;

AWS multipart upload from inputStream has bad offset

I am using the Java Amazon AWS SDK to perform some multipart uploads from HDFS to S3. My code is the following:
for (int i = startingPart; currentFilePosition < contentLength; i++)
{
    FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));

    // Last part can be less than 5 MB. Adjust part size.
    partSize = Math.min(partSize, (contentLength - currentFilePosition));

    // Create request to upload a part.
    UploadPartRequest uploadRequest = new UploadPartRequest()
            .withBucketName(bucket).withKey(s3Name)
            .withUploadId(currentUploadId)
            .withPartNumber(i)
            .withFileOffset(currentFilePosition)
            .withInputStream(inputStream)
            .withPartSize(partSize);

    // Upload part and add response to our list.
    partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());

    currentFilePosition += partSize;
    inputStream.close();
    lastFilePosition = currentFilePosition;
}
However, the uploaded file is not the same as the original one. More specifically, I am testing with a file of about 20 MB, uploading parts of 5 MB each. At the end of each 5 MB part, I see some extra text, which is always 96 characters long.
Even stranger, if I add something stupid to .withFileOffset(), for example,
.withFileOffset(currentFilePosition-34)
the error stays the same. I was expecting to get other characters, but I am getting the EXACT same 96 extra characters, as if I hadn't modified the line.
Any ideas what might be wrong?
Thanks,
Serban
I figured it out. This came from a stupid assumption on my part. It turns out the file offset in .withFileOffset(...) tells you the offset at which to write in the destination file; it doesn't say anything about the source. By opening and closing the stream repeatedly, I was always uploading from the beginning of the source file, just to a different destination offset. The solution is to add a seek after opening the stream:
FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));
inputStream.seek(currentFilePosition);

ContainerLaunchContext.setResource() missing of hadoop yarn

http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
I am trying to make the example from the above link work, but I can't compile the code below:
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(512);
amContainer.setResource(capability);
// Set the container launch content into the
// ApplicationSubmissionContext
appContext.setAMContainerSpec(amContainer);
amContainer is a ContainerLaunchContext and my Hadoop version is 2.1.0-beta.
I did some investigation and found there's no method "setResource" in ContainerLaunchContext.
I have 3 questions about this:
1) Has the method been removed or something?
2) If the method has been removed, what should I do instead?
3) Is there any documentation about YARN? The docs on the website are very basic; I'd like a manual or something. For example,
capability.setMemory(512);
I don't know whether that's 512 KB or 512 MB according to the comments in the code.
This is actually the proper solution to the question. The previous answer might cause incorrect execution!
@Dyin I couldn't fit it in a comment ;) Validated for 2.2.0 and 2.3.0.
Driver setting up resources for AppMaster:
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
ApplicationId appId = appContext.getApplicationId();
appContext.setApplicationName(this.appName);
// Set up the container launch context for the application master
ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);
appContext.setResource(capability);
appContext.setAMContainerSpec(amContainer);
Priority pri = Records.newRecord(Priority.class);
pri.setPriority(amPriority);
appContext.setPriority(pri);
appContext.setQueue(amQueue);
// Submit the application to the applications manager
yarnClient.submitApplication(appContext); // this.yarnClient = YarnClient.createYarnClient();
In ApplicationMaster this is how you should specify resources for containers (workers).
private AMRMClient.ContainerRequest setupContainerAskForRM() {
    // setup requirements for hosts
    // using * as any host will do for the distributed shell app

    // set the priority for the request
    Priority pri = Records.newRecord(Priority.class);
    pri.setPriority(requestPriority);

    // Set up resource type requirements
    // For now, only memory is supported so we set memory requirements
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(containerMemory);

    AMRMClient.ContainerRequest request = new AMRMClient.ContainerRequest(capability, null, null, pri);
    return request;
}
In some run() or main() method in your AppMaster:
AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
resourceManager = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
resourceManager.init(conf);
resourceManager.start();

for (int i = 0; i < numTotalContainers; ++i) {
    AMRMClient.ContainerRequest containerAsk = setupContainerAskForRM();
    resourceManager.addContainerRequest(containerAsk);
}
Launching containers
You can use the original answer's solution (the java command), but it's just a cherry on top. It should work anyway.
You can set the memory available to the ApplicationMaster via the command, as such:
// Set the necessary command to execute the application master
Vector<CharSequence> vargs = new Vector<CharSequence>(30);
...
vargs.add("-Xmx" + amMemory + "m"); // notice "m" indicating megabytes, you can use also -Xms combined with -Xmx
... // transform vargs to String commands
amContainer.setCommands(commands);
This should solve your problem. As for the 3 questions: YARN is rapidly evolving software. My advice: forget the documentation, get the source code and read it. That will answer a lot of your questions.
