Why is the EntryID Changing in VSTO? The MailItem is not moving folders - outlook

I'm writing some code in C# that matches a pattern in the subject and then ingests the email. To initialize my datastore, I iterate over the current Microsoft.Office.Interop.Outlook.Table:
while (!table.EndOfTable)
{
    Row row = table.GetNextRow();
    string entryId = row["EntryID"].ToString();
    this.SaveInXML(entryId, row);
}
It seems pretty simple. Well, I also have an event (Application.ItemLoad) that I'm watching, too. I notice that in the event the MailItem's EntryID is completely different from the Table's EntryID. In fact, the string lengths are not even the same (see example below). Why is this? Shouldn't they be the same? The item has not moved folders, so I'd assume it's the same. Thank you, all.
Example code:
NameSpace ns = this.Folder.Application.GetNamespace("MAPI");
var mi = ns.GetItemFromID("EF0000003E65593F1D361C44AFBFA24E6F365D6E04782F00") as MailItem;
string entryId = mi.EntryID;
System.Diagnostics.Debug.WriteLine("EF0000003E65593F1D361C44AFBFA24E6F365D6E04782F00");
System.Diagnostics.Debug.WriteLine(entryId);
// Output Produced:
// EF0000003E65593F1D361C44AFBFA24E6F365D6E04782F00
// 000000003E65593F1D361C44AFBFA24E6F365D6E0700CC348F1AD97A224B9898503750437E4700000000010C0000CC348F1AD97A224B9898503750437E470000F59160590000
//
// Notice that the second WriteLine isn't even remotely close to the EntryID that I requested.

Entry identifiers come in two types: short-term and long-term.
Short-term entry identifiers are faster to construct, but their uniqueness is guaranteed only over the life of the current session on the current workstation.
Long-term entry identifiers have a more prolonged lifespan. Short-term entry identifiers are used primarily for rows in tables and entries in dialog boxes, whereas long-term entry identifiers are used for many objects such as messages, folders, and distribution lists.
Use the MailItem.EntryID property if you need to get a long-term entry identifier.
Entry identifiers cannot be compared directly because one object can be represented by two different binary values. Use the NameSpace.CompareEntryIDs method to determine whether two entry identifiers represent the same object.

As Eugene noted, there are two kinds of entry ids - long term and short term. Even long-term entry ids can differ depending on how the item was opened. Long-term entry ids always start with "00000000". Short-term entry ids can only be used in the current MAPI session and therefore should not be persisted for use across different sessions.
You must treat entry ids as black boxes and never compare them directly - always use NameSpace.CompareEntryIDs.
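For example, a minimal sketch of checking two ids with CompareEntryIDs, reusing the ids from the question (in real code they would come from your Table row and your saved store):

NameSpace ns = this.Folder.Application.GetNamespace("MAPI");

// The id originally read from the Table row (the short-term-looking form from the question).
string tableEntryId = "EF0000003E65593F1D361C44AFBFA24E6F365D6E04782F00";

// Opening the item yields the long-term form.
var mi = ns.GetItemFromID(tableEntryId) as MailItem;
string longTermEntryId = mi.EntryID;

// Never compare the strings directly - ask MAPI whether both ids
// refer to the same underlying item.
bool sameItem = ns.CompareEntryIDs(tableEntryId, longTermEntryId);
System.Diagnostics.Debug.WriteLine(sameItem); // True when both ids resolve to the same item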

Related

SNMP OID with non-unique node names

I am writing an extension to my company's existing SNMP MIB. I have a whole list of objects, each with the same properties. I want to be able to get and set these through SNMP.
So for example, consider my object has name, desc, arg0, arg1. What I want is to be able to refer to these as:
fullpath.objects.ObjectA.name
fullpath.objects.ObjectA.desc
fullpath.objects.ObjectA.arg0
fullpath.objects.ObjectB.name
fullpath.objects.ObjectB.desc
fullpath.objects.ObjectB.arg0
However, leaf nodes apparently must have unique names, so I am unable to define this.
I can use a SNMP table to produce:
fullpath.objects.table.name.1
fullpath.objects.table.desc.1
fullpath.objects.table.arg0.1
fullpath.objects.table.name.2
fullpath.objects.table.desc.2
fullpath.objects.table.arg0.2
But there is nowhere to look up that 2 means ObjectB. This leaves it open to user error: looking up the wrong index and setting the wrong thing.
At the moment the best solution I can see is:
fullpath.objects.ObjectAName
fullpath.objects.ObjectADesc
fullpath.objects.ObjectAArg0
fullpath.objects.ObjectBName
fullpath.objects.ObjectBDesc
fullpath.objects.ObjectBArg0
which involves defining a name for every object (there are 20 or so of them). The set of objects is fixed, so this is OK... just not very tidy.
Is there some way to define names for index in the table?
Is there some way of defining a container type?
Is there some way of allowing leaf nodes to be non-unique?
Any other ideas?
You should definitely use SNMP tables to accomplish what is required; this is the only way.
MIB object names must be unique within the entire MIB file.
You can use an object of OCTET STRING type as the table index, so each byte/character of the OCTET STRING value is translated into the corresponding numeric ASCII code in the OID (see the sketch below).
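A rough sketch of what that encoding looks like (the base OID below is a made-up placeholder, and depending on how the index is declared a length sub-identifier may or may not precede the characters):

using System;
using System.Linq;

class OctetStringIndexDemo
{
    static void Main()
    {
        // Hypothetical column OID standing in for fullpath.objects.table.desc
        string columnOid = "1.3.6.1.4.1.99999.1.1.2";
        string index = "ObjectB";

        // Each byte of the OCTET STRING index becomes one numeric sub-identifier,
        // here preceded by its length (7 for "ObjectB").
        string instanceOid = columnOid + "." + index.Length + "." +
                             string.Join(".", index.Select(c => ((int)c).ToString()));

        Console.WriteLine(instanceOid);
        // 1.3.6.1.4.1.99999.1.1.2.7.79.98.106.101.99.116.66
    }
}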
I ended up just using a naming convention and adding each of the settings directly into the MIB.
Not really the answer I wanted, but it means that all of the settings show up in the MIB, and that reduces the chance of users setting the wrong setting.

What kind of data structure is best for storing a key-value pair where the value is a String for some keys and a List<String> for others?

For example, key 1 will have values "A","B","C" but key 2 will have value "D". If I use
Map<String, List<String>>
I need to populate the List<String> even when I have only a single String value.
What data structure should be used in this case?
Map<String, List<String>> would be the standard way to do it (using a size-1 list when there is only a single item); a C# sketch of the same idea follows below.
You could also have something like Map<String, Object> (which should work in either Java or, presumably, C#), where the value is either a List<String> or a String, but this would be fairly bad practice: there are readability issues (you can't tell what Object represents right off the bat from the type), casting happens at runtime, which isn't ideal, among other things.
It does, however, depend on what type of queries you plan to run. Map<String, Set<String>> might be a good idea if you plan on doing existence checks in the list and it can be large. Set<StringPair> (where StringPair is a class with two String members) is another consideration if there are plenty of keys with only one mapped value. There are plenty of solutions that would be more appropriate under various circumstances - it basically comes down to looking at the type of queries you want to perform and picking an appropriate structure for them.
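For illustration, a minimal C# sketch of the standard approach, using the keys and values from the question (Dictionary<string, List<string>> is the C# counterpart of Map<String, List<String>>):

using System;
using System.Collections.Generic;

class MultiValueMapDemo
{
    static void Main()
    {
        var map = new Dictionary<string, List<string>>
        {
            ["key1"] = new List<string> { "A", "B", "C" },
            ["key2"] = new List<string> { "D" }  // a single value still goes in a size-1 list
        };

        foreach (var pair in map)
            Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
    }
}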

plone.scale annotation bloated with (useless?) scales

While investigating a ConflictError (see this previous question) I saw a lot of persistent.mapping.PersistentMapping conflicts.
Looking at a specific one it turned out to be a PersistentMapping for plone.scale.
Turns out that a random object with just one image has 562 keys on it; no wonder it gets a conflict error...
Some context on the object that holds this plone.scale annotation:
- dexterity content type
- one of its behaviors has an image field (plone.namedfile.field.NamedBlobImage)
The code to see it is as follows:
Start a debugging instance: ./bin/instance debug
from ZODB.utils import p64
OID = 0x568428 # got from zeo client logs
mapping = app._p_jar[p64(OID)]
len(mapping) # that returns 562
The mysterious part is that only 4 keys in that persistent mapping are tuples, while the other 558 are just hashes.
A brief look at the plone.scale.storage.AnnotationStorage.scale method seems to imply that there should be a one-to-one relation between tuple keys and hash keys in the persistent mapping.
Further investigation of the elements reveals that, indeed, if you look at the width and height properties of all elements there are only 4 different combinations (the ones from the tuples themselves).
As a new scale is generated whenever the modified time is bigger (see the scale method pointed to above), and plone.namedfile.scaling.ImageScaling.modified uses the context as the source for modified, does that mean a new scale is generated on every single update of the object?
So two questions arise from the above:
Is my assumption correct that only 4 scales are really used and the other 558 are old and useless?
If so, shouldn't they be cleaned up?
You may be right, but surely the correct place to report this is https://dev.plone.org/newticket

PROPERTYKEY::pid Meaning

Can someone explain what the pid field in a PROPERTYKEY structure is? Microsoft says just don't use 0 or 1 and you're fine, but this doesn't help when I need to implement IPropertyStore in my code. Is the pid supposed to be part of the key, so multiple values with the same fmtid but different pid may be present? Or should it be ignored, so GetValue should return any value with a matching fmtid, ignoring pid?
This fmtid+pid combination is historically related to OLE (yes, that's pretty old).
At that time, the fmtid (format id) was like a category, and the pid (property id) was the property identifier within the category. For example, you have here the first FMTIDs defined: Predefined Property Set Format Identifiers. These properties are still used for Office documents (Author, Keywords, etc.). So you had many properties per category (few fmtids for a lot of properties), but the combination of fmtid and pid always makes the property unique across space and galaxies.
Other FMTIDs have appeared since; you can have a look at them in the Windows SDK's propkey.h: FMTID_AudioSummaryInformation, FMTID_Volume, FMTID_ShellDetails, etc.
Today, for some new properties, the FMTID does not mean anything anymore. For example, System.Contact.Birthday has an fmtid of 176DC63C-2688-4E89-8143-A347800F25E9 and a pid of 47, but the fmtid has no special meaning and is not defined separately, so in fact it could serve as the key on its own.
So, for a given property, you must consider the key to be the combination of both (hence the structure name: PROPERTYKEY), but you can define your own properties with a pid greater than or equal to 2 (as the official doc specifies) and a new GUID as the fmtid if you prefer. I personally still prefer to define one common FMTID for a group of properties.
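To make the "both parts form the key" point concrete, here is a minimal C# interop sketch (the GUID is a freshly made-up placeholder; an IPropertyStore implementation would compare both fields when matching a key):

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct PropertyKey
{
    public Guid fmtid;  // identifies the property set (the historical "category")
    public uint pid;    // identifies the property within that set; 0 and 1 are reserved

    public PropertyKey(Guid fmtid, uint pid)
    {
        this.fmtid = fmtid;
        this.pid = pid;
    }

    public bool Matches(PropertyKey other)
    {
        // The whole struct is the key: fmtid alone is not enough.
        return fmtid == other.fmtid && pid == other.pid;
    }
}

class Demo
{
    // A custom property: new GUID for the set, pid >= 2 as the docs require.
    static readonly PropertyKey MyCustomProperty =
        new PropertyKey(new Guid("a1b2c3d4-0000-4000-8000-123456789abc"), 2);
}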

Creating an id from name and address data. Hash/Digest

My problem:
I'm looking for a way to represent a person's name and address as an encoded id. The id should contain only alpha-numeric characters, be collision-proof, and be represented in the smallest number of characters possible. My first thought was to simply use a cryptographic hash function like MD5 or SHA1, but this seems like overkill (security isn't important - it doesn't need to be one-way) and I'd prefer to find something that would produce a shorter id. Does anyone know of an existing algorithm that fits this problem?
In other words, what is the best way to implement the following function so that the return value is the same consistently for the same input, collisions are unlikely, and ids are less than 20 characters?
>>> make_fake_id(fname = 'Oscar', lname = 'Grouch', stnum = '1', stname = 'Sesame', zip = '12345')
N1743123734
Application Context (for those that are interested):
This will be used for a record linkage app. Given an input name and address, we search a very large database for the best match and return the database id and other data (how we do this is not important here). If there isn't a match, I need to generate this pseudo/generated/derived id from the search input (the entity's name and address data). Every search record should result in an output record with either a real id (the actual database id resulting from a match/link) or this generated pseudo/derived id. The pseudo id will be prefixed with a character (e.g. N) to differentiate it from a real id.
I know you said no to MD5 and SHA1, but I think you should consider them anyway. As well as being well-studied hashing algorithms, the length gives you more protection against possible collisions. No hash is collision-proof, but the cryptographic ones are generally less collision-prone than something you could come up with yourself.
Use a cryptographic hash for its collision resistance, not its other qualities.
Use as many bytes from the hash as you want (truncate).
Convert to alpha-numeric characters.
You can also truncate the alpha-numeric string instead of the hash.
An easy way to do this: hash the data, encode in base64, remove all non-alpha-numeric characters, truncate.
N_HASH_CHARS = 11
import base64, hashlib, re
def digest(name, address):
    # MD5 is used here only to spread the input evenly, not for security
    raw = hashlib.md5((name + "|" + address).encode("utf-8")).digest()
    b64 = base64.b64encode(raw).decode("ascii")
    alnum_hash = re.sub(r'[^a-zA-Z0-9]', '', b64)
    return alnum_hash[:N_HASH_CHARS]
How many alpha-numeric characters should you keep? Each character gives you around 5.95 bits of entropy (log2(62)). Eleven characters give you about 65.5 bits of entropy, which, by the birthday bound, should be enough to avoid a collision among the first roughly 2**32.7 inputs (about 7 billion).
A good solution is somewhat dependent on your application. Do you know how many users and what the set of all users is? If you provide more details you would get better help.
I agree with the other poster suggesting serial numbers. OTOH, if you really, really really want to do something else:
Create a SHA1 hash from the data, and store it in a table with a serial number field.
Then, when you get the data, calculate the hash, look it up on the table, get the serial, and that's your id. If it's not on the table, insert it.
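A rough sketch of that lookup-or-insert idea, with an in-memory dictionary standing in for the database table (all names here are made up for illustration):

using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

class SerialIdStore
{
    // Stand-in for the table: SHA1 hex digest -> serial number.
    private readonly Dictionary<string, int> table = new Dictionary<string, int>();
    private int nextSerial = 1;

    public int GetOrCreateId(string nameAndAddress)
    {
        string hash;
        using (var sha1 = SHA1.Create())
        {
            byte[] hashBytes = sha1.ComputeHash(Encoding.UTF8.GetBytes(nameAndAddress));
            hash = BitConverter.ToString(hashBytes).Replace("-", "");
        }

        if (!table.TryGetValue(hash, out int serial))
        {
            serial = nextSerial++;   // not in the table yet, so insert it
            table[hash] = serial;
        }
        return serial;
    }
}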
I wonder whether you intend to "assign" these ids to the users? If so, I would expect your users to hate anything that you propose; who would want a user id of "AAAAA01"?
So, if these ids are visible to the user, then you should just let them pick what they like and check them for uniqueness (easy). If they are not visible to the user (e.g., internal primary key), then just generate them sequentially using an appropriate technique such as an Oracle Sequence or SQL Server AutoNumber (also easy).
If these ids are an attempt to detect a user that is registering more than once, then I would agree that you should consider a cryptographic hash followed by a full comparison of the registration data (name, address, etc.). However, to be usable, you will need to translate the data into a canonical form (standardized letter case, whitespace, canonical street address, etc.) before computing the hash or making the comparison. Otherwise, you will mismatch based on trivial differences.
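For example, a minimal sketch of that canonicalization step (the exact rules would depend on your data; real street-address normalization needs much more than this):

using System.Text.RegularExpressions;

static class Canonicalizer
{
    // Lower-case, trim, and collapse internal whitespace before hashing or comparing.
    public static string Canonicalize(string value)
    {
        return Regex.Replace(value.Trim().ToLowerInvariant(), @"\s+", " ");
    }
}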
EDIT: Now that I understand the problem space better based on your edits, I think that it is highly unlikely that your algorithm (so far) will catch most matches. Beyond my suggestion to canonicalize the inputs, I recommend that you consider an approach that results in a ranked list of a handful of possible matches (to be resolved by a human if possible) rather than an all-or-nothing attempt at a single match. In other words, I recommend a search approach rather than a lookup approach.
Is that feasible in your situation?
Well, if there's more than one person at the same address with the same name, you're toast here (without adding code to detect this and add a discriminator of some kind).
But assuming that is not an issue, the street address and zip code portion of the full address is sufficient to guarantee uniqueness there, so adding enough data from the name should take care of the issue...
Do you have access to a database, or other persistence mechanism, where you could generate and maintain key values for each address? Then keep the addresses and individual entities in two keyed dictionary structures, where a key is autogenerated for each new distinct address or person encountered... and then use the autogenerated alpha-numeric key...
You could use AAAAA01 for first person at first address,
AAAAA02 for second person at first address,
AAAAB07 for the seventh resident at the second adresss, etc.
If you don't have any way to generate and maintain these entity-key mappings, then you need to use the full street address/zip and full name, or a hash value of the same, although the hash value approach has a small chance of generating duplicates...
