Implementation of Unsaved Changes Detection - algorithm

It seems like three ways to approach detecting unsaved changes in a text/image/data file might be to:
Update a boolean flag every time the user makes a change or saves, which would result in a lot of unnecessary updates.
Keep a cached copy of the original file and diff the two every time a save operation needs to be checked.
Keep a stack of all past operations and push/pop operations as needed, resulting in a lot of extra memory usage.
In general, how do commercial applications detect whether unsaved changes exist and what are the advantages/disadvantages of each approach? I ran into this issue while writing a custom application that has special saving behavior and would like to know if there is a known best practice.

As long as you need an undo/redo system, you need that stack of past operations. To detect in wich state the document is, an item of the stack is set to be the 'saved state'. Current stack node is not that item, the document is changed.
You can see an example of this in Qt QUndoStack( http://doc.qt.nokia.com/stable/qundostack.html ) and its isClean() and setClean()
For proposition 1, updating a boolean is not something problematic and take little time.

This depends on the features you want and the size/format of the files, I guess.
The first option is the simplest, and it gives you just what you want with minial overhead.
The second option has the advantage that you can detect when changes have been manually reverted, so that there is no real change after all (although that probably doesn't happen all too often). On the other hand it is much more costly to make a diff just to check if anything was modified. You probably don't want to do that everytime the user presses a key.
The third option gives the ability to provide an undo-history. You could limit the number of items in that history by grouping changes together that were made consecutively (without moving the cursor in between), or something like that.

Related

Which one is the acceptable convention for undo / redo behavior?

Suppose this is a mathematical application that manipulates waveforms. User opens a waveform file, and edits it.
Now user amplifies waveform using the application toolbox. Amplification may take a long time. Then they undo it. And then they redo it again.
For redo, how should the application behave?
Replace result of amplification that was performed before and was held in memory internally by the application.
Re-run the time-consuming amplification procedure again?
This problem expands to every functionality.
Thanks :-)
Undo-redo actions are typically implemented using a stack of commands and the position of the last executed command. After an undo, you move that position one step backward without removing the last command of the stack (now one past the last executed command).
Whether you store data along with your commands is up to you. However, in the situation you describe, for the undo to work, you need to either reverse the last command perfectly (and you may not be able to), or keep the previous state in memory. So keeping the next state, after the amplification, should not cost you anything in this particular case. However, you do need to properly discard it if the user does not redo that command.
As #eddiewould pointed out in a comment, a command can be inherently destructive (so unreversible) and allowing to undo it (by keeping the previous state) can be unfeasable, for example because of memory restrictions. In such a case, you should inform the user beforehand that going back will be impossible.

How the UNDO and REDO feature in any TEXT EDITOR is implemented? [duplicate]

Part of my project is to write a text editor that is used for typing some rules, compiling my application and running it. Writing compiler was end and release beta version. In the final version we must add undo and redo to the text editor. I use a file and save it periodically for the text editor. How to design undo and redo to my text editor? What is changed in the structure of persistent of file?
You can model your actions as commands, that you keep in two stacks. One for undo, another for redo. You can compose your commands to create more high-level commands, like when you want to undo the actions of a macro, for example; or if you want to group individual keystrokes of a single word, or phrase, in one action.
Each action in your editor (or a redo action) generates a new undo command that goes into the undo stack (and also clears the redo stack). Each undo action generates the corresponding redo command that goes into the redo stack.
You can also, as mentioned in the comments by derekerdmann, combine both undo and redo commands into one type of command, that knows how to undo and redo its action.
There are basically two good ways to go about it:
the "Command" design pattern
using only OO over immutable objects, where everything is just immutable objects made of immutable objects made themselves of immutable objects (this is less common but wonderfully elegant when done correctly)
The advantage of using OO over immutable objects over the naive command or the naive undo/redo is that you don't need to think much about it: no need to "undo" the effect of an action and no need to "replay" all the commands. All you need is a pointer to a huge list of immutable objects.
Because objects are immutable all the "states" can be incredibly lightweight because you can cache/reuse most objects in any state.
"OO over immutable objects" is a pure jewel. Probably not gonna become mainstream before another 10 years that said ; )
P.S: doing OO over immutable objects also amazingly simplifies concurrent programming.
If you don't want anything fancy, you can just add an UndoManager. Your Document will fire an UndoableEdit every time you add or remove text. To undo and redo each change, simply call those methods in UndoManager.
The downside of this is UndoManager adds a new edit each time the user types something in, so typing "apple" will leave you with 5 edits, undoable one at a time. For my text editor, I wrote a wrapper for edits that stores the time it was made in addition to text change and offset, as well as an UndoableEditListener that concatenates new edits to previous ones if there is only a short period of time between them (0.5 seconds works well for me).
This works well for general editting, but causes problems when a massive replace is done. If you had a document with 5000 instances of "apple" and you wanted to replace this with "orange", you'd end up with 5000 edits all storing "apple", "orange" and an offset. To lower the amount of memory used, I've treated this as a separate case to ordinary edits and am instead storing "apple", "orange" and an array of 5000 offsets. I haven't gotten around to applying this yet, but I know that it'll cause some headaches when multiple strings match the search condition (eg. case insensitive search, regex search).
Wow, what a conicidence - I have literally in the last hour implemented undo/redo in my WYSIWYG text editor:
The basic idea is to either save the entire contents of the text editor in an array, or the difference between the last edit.
Update this array at significant points, i.e. every few character (check the length of the content each keypress, if its more than say 20 characters different then make a save point). Also at changes in styling (if rich text), adding images (if it allows this), pasting text, etc. You also need a pointer(just an int variable) to point at which item in the array is the current state of the editor)
Make the array have a set length. Each time you add a save point, add it to the start of the array, and move all of the other data points down by one. (the last item in the array will be forgotten once you have so many save points)
When the user presses the undo button, check to see if the current contents of the editor are the same as the latest save (if they are not, then the user has made changes since the last save point, so save the current contents of the editor (so it can be redo-ed), make the editor equal to the last save point, and make the pointer variable = 1 (2nd item in array ). If they are they same, then no changes have been made since the last save point, so you need to undo to the point before that. To do this, increment the pointer value + 1, and make the contents of the editor = the value of pointer.
To redo simply decrease the pointer value by 1 and load the contents of the array (make sure to check if you have reached the end of the array).
If the user makes edits after undoing, then move the pointed value array cell up to cell 0, and move the rest up by the same amount (you dont want to redo to other stuff once they've made different edits).
One other major catch point - make sure you only add a save point if the contents of the text editor have actually changed (otherwise you get duplicate save points and it will seem like undo is not doing anything to the user.
I can't help you with java specifics, but I'm happy to answer any other questions you have,
Nico
You can do it in two ways:
keep a list of editor states and a pointer in the list; undo moves the pointer back and restores the state there, redo moves forward instead, doing something throws away everything beyond the pointer and inserts the state as the new top element;
do not keep states, but actions, which requires that for every action you have a counteraction to undo the effects of that action
In my (diagram) editor, there are four levels of state changes:
action fragments: these are part of a larger action and not separately undoable or redoable
(e.g. moving the mouse)
actions: one or more action fragments that form a meaningful change which can be undone or redone,
but which are not reflected in the edited document as changed on disk
(e.g. selecting elements)
document changes: one or more actions that change the edited document as it would be saved to disk
(e.g. changing, adding or deleting elements)
document saves: the present state of the document is explicitly saved to disk - at this point my editor throws away the undo history, so you can't undo past a save
This is a job for the command pattern.
Here is a snippet that shows how SWT supports Undo/Redo operations. Take it as practical example (or use it directly, if your editor is based on SWT):
SWT Undo Redo
Read a book Design Patterns: Elements of Reusable Object-Oriented Software. As far as I remember, there is a pretty good example.

Design a share, re-share functionality for a website, avoiding duplication

This is an interesting interview question that I found somewhere. To elaborate more:
You are expected to design classes and data structures for some website such as facebook or linkedin where your activity can be shared and re-shared. Design should be such that it avoids redundancy and duplication.
While thinking of this problem I was stuck on "link vs copy" problem as discussed here
But since the problem states that duplication should be avoided I decided to go "link" way. This makes sharing/re-sharing easier but deleting very difficult. i.e. if the original user deletes their post all the shares should be deleted. (programmatically speaking all the objects on the pointing to the particular activity should be made null. And this is the difficult part here, i.e. to find all the pointing objects)
Wouldn't it be better to keep the shares? The original user deletes
their post, fine, it's gone. But everyone who has linked to it should
not suddenly have it disappear on them.
This could be done the way Unix handles hard links. "Deleting" just
means removing one link to an object -- an inode, in Unix terms. You
don't remove the object itself until the link count is zero.
It's not obvious from the original specification that deletion should work as you describe. It might be desired that when the original user deletes the item, it is not deleted elsewhere; in that case you don't necessarily need to track all references, just keep a reference count on each post, and remove it from the database only when the count hits zero.
If you do want the behavior you describe, it may be achievable by simply removing broken links as and when you encounter them, again relieving you of the need to track each reference. The cost of tracking and updating every reference to every post is replaced with the comparable cost of one failed lookup for each referring page. The latter case is simpler to implement, though, and the cost doesn't hit your server all at once.
In real life, I would implement all references as bidirectional anyway, because it's likely to be needed sooner or later as you add features. For example, a "like" counter seems pretty simple, but to prevent duplicate votes you need to keep track of who has liked each item, and then if you want to remove their "like" when they delete their profile, you need to keep a list of each user's outbound "likes" too.
It takes a lot of database activity to implement something like Facebook...

Why does loading cached objects increase the memory consumption drastically when computing them will not?

Relevant background info
I've built a little software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db = an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates etc.)
I've built routines that can handle those nested environments (e.g. look up value "db/user" or "regex/date", change it etc.). The thing is that the initial parsing of the config files takes a long time and results in quite a big of an object (actually three to four, between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but "loading" cached objects makes my Rterm process go through the roof with respect to RAM consumption (over 1 GB!!) and I still don't really understand why (this doesn't happen when I "compute" the object all anew, but that's exactly what I'm trying to avoid since it takes too long).
I already thought about maybe serializing it, but I haven't tested it as I would need to refactor my code a bit. Plus I'm not sure if it would affect the "loading back into R" part in just the same way as loading .Rdata files.
Question
Can anyone tell me why loading a previously computed object has such effects on memory consumption of my Rterm process (compared to computing it in every new process I start) and how best to avoid this?
If desired, I will also try to come up with an example, but it's a bit tricky to reproduce my exact scenario. Yet I'll try.
Its likely because the environments you are creating are carrying around their ancestors. If you don't need the ancestor information then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them).
Also note that formulas (and, of course, functions) have environments so watch out for those too.
If it's not reproducible by others, it will be hard to answer. However, I do something quite similar to what you're doing, yet I use JSON files to store all of my values. Rather than parse the text, I use RJSONIO to convert everything to a list, and getting stuff from a list is very easy. (You could, if you want, convert to a hash, but it's nice to have layers of nested parameters.)
See this answer for an example of how I've done this kind of thing. If that works out for you, then you can forego the expensive translation step and the memory ballooning.
(Taking a stab at the original question...) I wonder if your issue is that you are using an environment rather than a list. Saving environments might be tricky in some contexts. Saving lists is no problem. Try using a list or try converting to/from an environment. You can use the as.list() and as.environment() functions for this.

Is RegNotifyChangeKeyValue as coarse as it seems?

I've been using ReadDirectoryChangesW to monitor a particular portion of the file system. It rather nicely provides a partial pathname to the file or directory which changed along with a clue about the nature of the change. This may have spoiled me.
I also need to monitor a particular portion of the registry, but it looks as if RegNotifyChangeKeyValue is very coarse. It will tell me that something under the given key changed, but it doesn't seem to want to tell me what that something might have been. Bummer!
The portion of the registry in question is arbitrarily deep, so enumerating all the sub-keys and calling RegNotifyChangeKeyValue for each probably isn't a hot idea because I'll eventually end up having to overcome MAXIMUM_WAIT_OBJECTS. Plus I'd have to adjust the set of keys I'd passed to RegNotifyChangeKeyValue, which would be a fair amount of effort to do without enumerating the sub-keys every time, which would defeat a fair amount of the purpose.
Any ideas?
Unfortunately, yes. You probably have to cache all the values of interest to your code, and update this cache yourself whenever you get a change trigger, or else set up multiple watchers, one on each of the individual data items of interest. As you noted the second solution gets unwieldy very quickly.
If you can implement the required code in .Net you can get the same effect more elegantly via RegistryEvent and its subclasses.

Resources