Howdy,
I was wondering whether I can alter a single element in an XML file saved on a Windows Phone 7 device without having to serialize the whole file all over again.
As mentioned in previous answers, you can't save a fragment of XML on its own without saving the whole file. If file size was a big issue, then you could split the data into separate files (perhaps alphabetically; depends on the data), so that you're only saving a smaller dataset if you make a change.
As Matt mentioned, XML serialization does not provide the best performance on WP7 devices. Kevin Marshall has a great blog post detailing different serialization approaches and the performance of each. The fastest method is binary serialization, though there's nothing stopping you serializing the XML using the binary serialization approach.
In general, no - and this has little to do with Windows Phone 7 (although I don't know whether IsolatedStorageFileStream on WP7 even supports seeking).
I don't know of any mainstream filesystems with high level abstractions (such as those used by Java and C#) which allow you to delete or insert data in the middle of a file.
I suppose theoretically if you were happy to pad with whitespace, or never change the length of the data you're using, you could just overwrite the relevant bytes - but I don't think it would be a good idea at all. Very brittle and hard to work with.
Just go for overwriting the whole file.
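To make that concrete, here is a minimal sketch of the rewrite-the-whole-file approach on WP7. The Item class, the items list, and the "data.xml" name are placeholders I'm using for illustration, not anything from the question:

// Minimal sketch: rewrite the entire XML file in isolated storage on every save.
// Item, items and "data.xml" are illustrative placeholders.
using System.Collections.Generic;
using System.IO;
using System.IO.IsolatedStorage;
using System.Xml.Serialization;

public class Item
{
    public string Name { get; set; }
    public int Value { get; set; }
}

public static class Storage
{
    public static void SaveAll(List<Item> items)
    {
        var serializer = new XmlSerializer(typeof(List<Item>));
        using (var store = IsolatedStorageFile.GetUserStoreForApplication())
        // FileMode.Create truncates any existing content, so the file is rewritten as a whole.
        using (var stream = store.OpenFile("data.xml", FileMode.Create))
        {
            serializer.Serialize(stream, items);
        }
    }
}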
Related
I know basic SQL, and SQL is all I know when it comes to storing and retrieving data. I want to create 1 .exe that contains all ~100,000 key-value pairs (I have the data in .txt files) and maybe an extra attribute for a description (this I would add myself, like a note to myself).
I would also like to write it in a new language I don't know yet, like Python or C# (I have made desktop apps in Java and VB.NET, all with SQL databases). So language will not be an issue, and I would appreciate suggestions.
These key-value pairs might not need to be updated, and I'm willing to re-compile/repackage the code to make one change in the data. The key is 6 letters long with 2 numbers at the end, like hxnaaa01. Each of these letters represents or describes something about the entry, so I would also need to search for a specific letter at a specific position to get exactly what I need.
I know that a regex would work well for what I need, but what I mentioned above is all I know. I don't know enough, and I don't know what keywords to Google.
I have read about XML and CSV. I don't really know what they are, and I'm not sure how all of this would fit in 1 executable.
To summarize, I need:
1 executable (Windows Desktop App)
Search function over ~100k KVPs + 1 more attribute (using regex?)
no database
with GUI
ability to add a "note" to each KVP
should be fast and lightweight
1 executable (Windows Desktop App), no database
Data persistence will require either additional files or a database; it's pretty much unavoidable. You can keep data in memory, but it only persists for as long as it stays there.
You have another requirement: "fast and lightweight".
To achieve this requirement, you'll need to really think about your solution, what technology you use and how you can improve it in future.
Although searching through data is pretty trivial, an efficient solution is not. It requires upfront research into algorithms, data structures, and general practices (which is a rabbit hole in itself).
In the case of JSON [1], you'll need to create an additional file to contain all your key/value pairs; you can use C# to create that extra file (on first launch, for example).
JSON promises to be lightweight; I tend to agree, though some may not. When it comes to dealing with the filesystem, however, I think we can agree it is often far from a lightweight solution.
JSON is very readable though:
{
    "key": "value",
    "comment": "oh this is cool"
}
There are a lot of factors that play into something being fast and lightweight, so there's a need for some research on your part.
Honestly, depending on your experience, I wouldn't focus so much on the "fast"; I'd focus on getting it working first, then refactor it into something fast if it turns out to be too slow. [2]
And again, depending on your experience, I'd stick to opening the file, using a loop to find my key, doing something with the data found, and then rewarding myself for having something that works.
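If you do end up in C#, a minimal sketch of that approach could look like the following. The Entry class, the data.json file name, and the example regex are all illustrative stand-ins, not something from your data:

// Load the bundled JSON once, then filter keys with a regex on specific positions.
// Entry, "data.json" and the pattern below are placeholders for illustration only.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Text.RegularExpressions;

public class Entry
{
    public string Key { get; set; }
    public string Value { get; set; }
    public string Comment { get; set; }
}

public static class Lookup
{
    public static void Main()
    {
        // ~100k entries fit comfortably in memory, so load the file once at startup.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        var entries = JsonSerializer.Deserialize<List<Entry>>(
            File.ReadAllText("data.json"), options);

        // Example: third letter must be 'n' and the key ends in two digits (e.g. hxnaaa01).
        var pattern = new Regex(@"^..n...\d\d$");
        foreach (var e in entries.Where(x => pattern.IsMatch(x.Key)))
            Console.WriteLine($"{e.Key} -> {e.Value} ({e.Comment})");
    }
}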
TL;DR: you need either a file or a database for truly persistent storage; JSON or a remotely hosted MySQL would work. Try not to focus too much on speed before you have something that works.
[1] https://www.json.org/json-en.html
https://stackoverflow.com/a/5581595/2932298
[2] https://stackify.com/premature-optimization-evil/
Could anybody give me pointers on how to process the Switchboard dataset for training with RETURNN? I did see the BlissDataset class, which seems to be designed for Switchboard, but it's not clear to me what I should include in the paths given in the example:
Example:
./tools/dump-dataset.py "
{'class':'BlissDataset',
'path': '/u/tuske/work/ASR/switchboard/corpus/xml/train.corpus.gz',
'bpe_file': '/u/zeyer/setups/switchboard/subwords/swb-bpe-codes',
'vocab_file': '/u/zeyer/setups/switchboard/subwords/swb-vocab'}"
The Switchboard dataset has several folders with audio files, e.g. swb1_d2/data/*.sph, and transcripts under swb1_LDC97S62/swb_ms98_transcriptions/**/*.
I'm not quite sure how to proceed with this to get a dataset that can be used to train RETURNN.
At our group (RWTH Aachen University), we use the config as it was published on GitHub. As you can see, this one uses ExternSprintDataset.
That dataset uses Sprint (publicly called RWTH ASR (RASR), see here) as an external tool (run in a subprocess) to handle the data (feature extraction, etc.). Sprint gets a Bliss XML file which describes all the segments, with the paths to the audio, audio offsets, and transcriptions, and it also gets further configs for the feature extraction and maybe other things. There is an open source version of RASR which should work, but it might be a bit involved to get it running.
The BlissDataset was planned to be a simpler replacement for that. However, the implementation is incomplete. Also, you would still need to generate the Bliss XML yourself in some way (we used our own internal scripts to prepare that based on the official LDC data).
So, unfortunately, there is no simple way yet. Actually, I think the easiest way would be to come up with yet another custom format, which might be similar to the LibriSpeechDataset implementation, or maybe just the same, so that you could reuse LibriSpeechDataset, or at least parts of it. That dataset implementation takes the data in a zip format which contains the transcripts in txt files and the audio in ogg or wav files. It uses librosa to do MFCC feature extraction (or other feature types). I planned to implement that for Switchboard and then reproduce the results, however I did not have time yet and am not sure when I will get to it. But if you want to try that on your own, I will be happy to help you however I can. The starting point would be to look at LibriSpeechDataset and understand what its format looks like.
I have almost zero experience coding in Visual Studio, MFC, etc. But I've got several data files that were created in a now-defunct MFC application, which I need to migrate to another format.
Unfortunately there's really no good way, within the application itself, to extract the data (short of copy-pasting hundreds or even thousands of records individually). And viewing the files themselves, e.g. in a hex editor, has proven fruitless; even though the raw data stored by the app is text-based, the database files are encoded in some cryptic binary format.
So far I've been able to determine that the app was written using MFC and that it uses the CDocument class (or a simple derivative thereof) to store the files. I understand that CDocument-based data files have something to do with serializing the data, but I'm not sure how to make sense of the encoding.
Does anyone know enough about MFC to explain to me how CDocument actually works?
Does anyone have any ideas on how I might be able to decode these files to extract the text?
I once faced an almost identical scenario. I eventually worked out the code to deserialize the data, but it wasn't easy.
Write a small MFC application to do the work; that way you can leverage the same serialization code that the original app used. The topic of reverse engineering a data format is way too complex to answer here. It's probably not encrypted; more likely compressed.
If you're an experienced programmer, you should be able to read the MFC source code and then apply that knowledge to the raw data. Not everything can be determined heuristically just by observing the raw data, but if you have an independent way of determining the actual content, it's certainly possible with sufficient work.
When I thought about resizing images and saving the resized copies alongside the originals on the server, I came to the following question:
// Original size
DSC_18342.jpg
// New size: Use an "x" for "times"
DSC_18342_640x480px.jpg
// New size: Use the real "×" for "times"
DSC_18342_640×480px.jpg
The point is that it's slightly easier to read with a real × instead of an x in the file name, since the unit px already contains an x and the double x makes the name a little harder to read.
Question: What problems could I run into when using this HTML-entity character (×) in the filename?
Side note: I'm writing an open source, publicly available script, so the target server can be anything. Therefore I'm also interested in (and will upvote) edge cases that I'm not aware of.
Thank you all!
You may have noticed that I'm aware I could simply avoid it (which I'll do anyway), but I'm interested in this issue and in learning about it, so please just take the above example as a possible case.
There are file systems that simply don't support Unicode. This may be less of a problem if you make Unicode support a requirement of your application.
Some considerations about different Unicode file systems are given in File Systems, Unicode, and Normalization.
A concluding remark (from the viewpoint of Solaris file systems) is:
Complete compatibility and seamless interoperability with all other existing Unicode file systems appears not 100% possible due to inherent differences.
I can imagine that there will be problems especially when migrating the application. Just storing files is probably no problem but if their names are stored in a database there might be a mismatch after migration.
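To make that concrete with a small C# sketch (nothing below comes from the question; it just probes how the character behaves): Windows itself accepts the character in a file name, but anything downstream that assumes ASCII, such as URLs, has to encode it.

using System;
using System.IO;

class TimesCharDemo
{
    static void Main()
    {
        // The proposed file name with the real multiplication sign (U+00D7).
        string name = "DSC_18342_640\u00D7480px.jpg";

        // Windows does not list it among the invalid file name characters:
        Console.WriteLine(name.IndexOfAny(Path.GetInvalidFileNameChars()) == -1); // True

        // But it is not ASCII, so it must be percent-encoded when it appears in a URL:
        Console.WriteLine(Uri.EscapeDataString("\u00D7")); // %C3%97
    }
}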
I'm storing a List with around 3,000 objects in IsolatedStorage using XML serialization.
It takes too long to deserialize this and I was wondering if you have any recommendations to speed it up.
The time is tolerable to deserialize up to 500 objects, but takes forever to deserialize 3000.
Does it just take longer on the emulator, and will it be faster on the phone?
I did a whole bunch of searching, and one article said to use a binary stream reader, but I can't find it. Whether I store in binary or XML doesn't matter; I just want to persist the List.
I don't want to look at asynchronous loading just yet...
Firstly, some good info here already, so +1 there.
I would recommend reviewing these articles, which give some good perspective on what performance you can expect from a variety of out-of-the-box serialization techniques.
Windows Phone 7 Serialization: Comparison | eugenedotnet blog
WP7 Serialization Comparison
You might also consider using multiple files if you don't need to load and write everything in one hit all the time.
I would reiterate Jeff's advice that it would really be a good idea to get any substantial work you find remaining after this onto a background thread so as not to degrade the user interaction experience.
It's fairly straightforward. Here is a walkthrough I often recommend; people find it concise and helpful.
Phạm Tiểu Giao - Threads in WP7
And also this, recently by Shawn Wildermuth which looks quite good too.
Shawn Wildermuth - Architecting WP7 - Part 9 of 10: Threading
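If you haven't moved work off the UI thread before, the pattern itself is short. A minimal sketch, where LoadData() and MyList are hypothetical placeholders for your own deserialization method and list control:

// In your page's code-behind (needs System.Threading and System.Windows for Deployment).
// LoadData() and MyList are hypothetical placeholders for your own code.
private void LoadInBackground()
{
    // Push the slow deserialization onto a thread-pool thread...
    ThreadPool.QueueUserWorkItem(state =>
    {
        var items = LoadData();   // your slow deserialization goes here

        // ...then marshal the result back to the UI thread before touching any UI.
        Deployment.Current.Dispatcher.BeginInvoke(() =>
        {
            MyList.ItemsSource = items;
        });
    });
}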
Check out the binary serializer that is a part of sharpSerializer:
http://www.sharpserializer.com/en/index.html
It's very easy and works quite well.
Here's a blog that talks about using it in WP7:
http://www.eugenedotnet.com/2010/12/windows-phone-7-serialization-sharpserializer/
I am using it like this (consider it pseudocode, using the functions listed on eugenedotnet):
in App.xaml.cs:
private void Application_Deactivated(object sender, DeactivatedEventArgs e)
{
    using (var store = IsolatedStorageFile.GetUserStoreForApplication())
    using (IsolatedStorageFileStream stream = store.OpenFile(fileName, FileMode.Create))
    {
        // Serialize is the helper method from the eugenedotnet post
        Serialize(stream, myHugeList);
    }
}

private void Application_Activated(object sender, ActivatedEventArgs e)
{
    using (var store = IsolatedStorageFile.GetUserStoreForApplication())
    using (IsolatedStorageFileStream stream = store.OpenFile(fileName, FileMode.Open))
    {
        // Deserialize is the helper method from the eugenedotnet post
        myHugeList = (MyHugeList<MyObject>)Deserialize(stream);
    }
}
For that many items, you need to build your optimized serialization story. I see many people using simple CSV and text formats to do this.
The built-in serializers just aren't going to be fast enough.
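If you do go down that road, a hand-rolled binary format is only a few lines. A rough sketch (MyObject and its fields are stand-ins for whatever your real list holds):

// Hand-rolled binary persistence for a simple type; far less overhead than XmlSerializer.
// MyObject, Name and Score are placeholders for your real data.
using System.Collections.Generic;
using System.IO;

public class MyObject
{
    public string Name { get; set; }
    public int Score { get; set; }
}

public static class BinaryStore
{
    public static void Save(Stream stream, List<MyObject> items)
    {
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(items.Count);               // item count first
            foreach (var item in items)
            {
                writer.Write(item.Name ?? string.Empty);
                writer.Write(item.Score);
            }
        }
    }

    public static List<MyObject> Load(Stream stream)
    {
        using (var reader = new BinaryReader(stream))
        {
            int count = reader.ReadInt32();
            var items = new List<MyObject>(count);
            for (int i = 0; i < count; i++)
            {
                items.Add(new MyObject
                {
                    Name = reader.ReadString(),
                    Score = reader.ReadInt32()
                });
            }
            return items;
        }
    }
}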
You should really consider doing this all on a background thread, for a lot of reasons, though yes, you have indicated you don't want to look at that just yet.