Power User Tips: The CRM 4.0 Metadata Cache
CRM MVP Michael Höhne is our guest blogger today.
I don’t know about you, but I use the CRM metadata in almost all of my applications. The need may be as simple as looking up an object type code or as complex as implementing IntelliSense-like features. Apart from some really basic programs, I always find myself using the metadata service.
Of course I have created a bunch of wrapper classes for CRM, and some of them implement a metadata cache. This article, or probably the next one, will include the entire project implementing the metadata cache. Before that I want to write about why a metadata cache is useful and which features it needs to be reusable.
The most basic implementation of a metadata cache is this:
static CrmMetadata[] RetrieveAllMetadata(MetadataService service, MetadataItems itemsToRetrieve) {
    // Retrieve the metadata for all entities in a single request.
    RetrieveAllEntitiesRequest request = new RetrieveAllEntitiesRequest()
    {
        MetadataItems = itemsToRetrieve,
        RetrieveAsIfPublished = false
    };

    RetrieveAllEntitiesResponse response = (RetrieveAllEntitiesResponse) service.Execute(request);
    return response.CrmMetadata;
}
It just retrieves the entire metadata at once, which can then be used to build internal dictionaries for fast access. This is easy, and if your application doesn’t have to check for changes in the metadata, you can use it without problems.
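For illustration, here is a minimal sketch of such a dictionary, keyed by the entity logical name. The helper name is mine, and it assumes that the items returned by RetrieveAllMetadata are EntityMetadata instances:

static Dictionary<string, EntityMetadata> BuildEntityDictionary(CrmMetadata[] metadata) {
    Dictionary<string, EntityMetadata> cache = new Dictionary<string, EntityMetadata>(metadata.Length);
    foreach (CrmMetadata item in metadata) {
        // The metadata service returns EntityMetadata items for entity requests;
        // skip anything else defensively.
        EntityMetadata entity = item as EntityMetadata;
        if (entity != null) {
            cache[entity.LogicalName] = entity;
        }
    }
    return cache;
}

Retrieving the entire metadata takes a long time though, and can dramatically slow down the startup sequence of your application. To find out how long it takes, I created a simple test method calling RetrieveAllMetadata with different parameters: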
MetadataItems[] itemList =
{
    MetadataItems.All,
    MetadataItems.EntitiesOnly,
    MetadataItems.IncludeAttributes,
    MetadataItems.IncludePrivileges,
    MetadataItems.IncludeRelationships,
    MetadataItems.IncludeAttributes | MetadataItems.IncludePrivileges,
    MetadataItems.IncludeAttributes | MetadataItems.IncludeRelationships,
    MetadataItems.IncludePrivileges | MetadataItems.IncludeRelationships,
    MetadataItems.All
};
XmlSerializer s = new XmlSerializer(typeof(EntityMetadata[]));

foreach (MetadataItems itemsToRetrieve in itemList) {
    Console.Write(itemsToRetrieve + ": ");

    // Measure the time needed to execute the request.
    DateTime startTime = DateTime.Now;
    CrmMetadata[] allEntities = RetrieveAllMetadata(service, itemsToRetrieve);
    TimeSpan time = DateTime.Now.Subtract(startTime);

    int bytes;
    EntityMetadata[] entities = new EntityMetadata[allEntities.Length];
    for (int i = 0; i < allEntities.Length; i++) {
        entities[i] = (EntityMetadata) allEntities[i];
    }

    // Serialize the result to get a rough measure of the transferred data size.
    StringBuilder sb = new StringBuilder();
    using (StringWriter w = new StringWriter(sb)) {
        s.Serialize(w, entities);
        bytes = sb.Length;
    }

    Console.WriteLine("{0}ms – {1}KB", (int) (time.Ticks / 10000), bytes >> 10);
}
All possible combinations of MetadataItems are used, and the time needed to return the data is written to a console window. I’m also serializing the returned data to get an idea of the size of the transferred data. The measure may not be totally accurate, but it helps to compare the results. The reason for adding MetadataItems.All at both the top and the end of the list is to ensure that the entire CRM metadata is cached at the CRM server and the reported times are not influenced by any initializations that had to be performed to fulfill the request. The output is quite interesting:
Performing SimpleCache benchmark.
All: 23393ms – 20918KB
EntitiesOnly: 1566ms – 499KB
IncludeAttributes: 5577ms – 16190KB
IncludePrivileges: 2034ms – 746KB
IncludeRelationships: 16582ms – 4980KB
IncludeAttributes, IncludePrivileges: 6074ms – 16437KB
IncludeAttributes, IncludeRelationships: 21584ms – 20671KB
IncludePrivileges, IncludeRelationships: 19269ms – 5227KB
All: 22042ms – 20918KB
The interesting thing is not the reported time itself, which is much higher than what you would get on a well-dimensioned system; I ran this sample on my notebook, with CRM 4 installed natively. The interesting thing is that retrieving relationship information is a real bottleneck. There are certainly more attributes in the system than relationships, so one could expect that retrieving attributes would take longer than retrieving relationships. Instead, relationships are slower by a factor of three.
The following table shows the same data in a more accurate way and adds some calculated columns:
The time column lists the time required to execute the request. The next column (-Entity) shows the time required for a call minus the time required to retrieve entities only. I added this column because the entity information is always returned, which is why the MetadataItems enumeration defines IncludeAttributes, IncludeRelationships and IncludePrivileges: the “Include” means that the entity is returned together with its attributes, relationships or privileges. The entity itself is always returned. To measure how much the retrieval of attributes adds, I simply subtract the time of EntitiesOnly. The same is done for the size; for example, the attribute data in the run above amounts to 16190KB - 499KB = 15691KB.
As expected, the size of the attributes (15691KB) is much larger than the size of the relationships (4481KB). Dividing the relationships’ time per KB from the table (14768ms / 4481KB) by the attributes’ time per KB (3443ms / 15691KB) gives 15.02. It means that, relative to the amount of data, returning relationships is slower than returning attributes by a factor of 15. It also means that you should really think about which metadata items you need. Requesting all entities with MetadataItems.All seems a good choice, but if you don’t need some items, especially relationships, then don’t request them. As you can see in the table, the metadata was returned as fast as 1.54 seconds (EntitiesOnly) and as slow as 20.73 seconds (All). Interestingly, the combination of attributes and relationships was even slower than All, but if the test is repeated often enough, this small measurement error should disappear.
The last two columns show the time required to retrieve the data compared to the amount of data returned. The last column uses the same values as the Time/Size column, but normalizes them so that the lowest value is 1. You find the 15.02 from above there, and the other values show that attributes really are processed much faster than anything else. Even privileges are more than 5 times slower, but they hardly matter because of the small data size (only 302KB on a freshly created organization). Entities and relationships are the slowest parts, and as you always retrieve the entity information anyway, be extremely careful with including the IncludeRelationships flag.
Of course the time needed to retrieve all entities doesn’t say too much without knowing how much time it takes to retrieve entities one by one. So I did the same test again, but this time using the account, contact and opportunity entities only:
static EntityMetadata[] RetrieveEntities(MetadataService service, string[] entityNames, EntityItems itemsToRetrieve) {
    EntityMetadata[] entities = new EntityMetadata[entityNames.Length];

    // One RetrieveEntityRequest is issued per entity.
    for (int i = 0; i < entityNames.Length; i++) {
        RetrieveEntityRequest request = new RetrieveEntityRequest()
        {
            LogicalName = entityNames[i],
            EntityItems = itemsToRetrieve,
            RetrieveAsIfPublished = false
        };

        RetrieveEntityResponse response = (RetrieveEntityResponse) service.Execute(request);
        entities[i] = response.EntityMetadata;
    }

    return entities;
}
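For illustration, the method can be called like this (the entity list matches the test below):

EntityMetadata[] entities = RetrieveEntities(
    service,
    new string[] { "account", "contact", "opportunity" },
    EntityItems.All);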
And again I used all possible combinations of metadata items. Oops, did I say metadata items? When requesting an entity with a RetrieveEntityRequest, you use EntityItems instead of MetadataItems. Both enumerations define the same members, except for EntitiesOnly versus EntityOnly, but there is a major difference:
namespace Microsoft.Crm.Sdk.Metadata {
    [Flags]
    public enum MetadataItems {
        EntitiesOnly = 1,
        IncludeAttributes = 2,
        IncludePrivileges = 4,
        IncludeRelationships = 16,
        All = 23,
    }
}

namespace Microsoft.Crm.Sdk.Metadata {
    [Flags]
    public enum EntityItems {
        EntityOnly = 1,
        IncludeAttributes = 2,
        IncludePrivileges = 4,
        IncludeRelationships = 8,
        All = 15,
    }
}
IncludeRelationships has a value of 16 in the MetadataItems enumeration, but 8 in EntityItems. Just a side note.
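It does matter, though, as soon as you translate one enumeration into the other, for example in a wrapper that accepts the same flags for both request types. A plain cast would get IncludeRelationships wrong, so a small conversion sketch (the helper name is mine) could look like this:

static EntityItems ToEntityItems(MetadataItems items) {
    // Entity information is always returned, so start with EntityOnly.
    EntityItems result = EntityItems.EntityOnly;
    if ((items & MetadataItems.IncludeAttributes) != 0) {
        result |= EntityItems.IncludeAttributes;
    }
    if ((items & MetadataItems.IncludePrivileges) != 0) {
        result |= EntityItems.IncludePrivileges;
    }
    // IncludeRelationships is 16 in MetadataItems, but 8 in EntityItems.
    if ((items & MetadataItems.IncludeRelationships) != 0) {
        result |= EntityItems.IncludeRelationships;
    }
    return result;
}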
So let’s look at the results of the second test:
Performing SimpleCache benchmark.
EntityOnly: 3121ms – 8KB
IncludeAttributes: 3411ms – 1320KB
IncludePrivileges: 1623ms – 17KB
IncludeRelationships: 3263ms – 304KB
IncludeAttributes, IncludePrivileges: 3721ms – 1329KB
IncludeAttributes, IncludeRelationships: 5567ms – 1615KB
IncludePrivileges, IncludeRelationships: 4175ms – 313KB
All: 5392ms – 1625KB
I did request all entities with all entity items before running this test, just as before. These are the times to retrieve the account, contact and opportunity metadata, meaning that three RetrieveEntityRequest objects are used. Comparing these times with the time needed to request all entities at once shows that you’re better off with a RetrieveAllEntitiesRequest when you need many entities. But again there are some weird results: although I repeated the test under different conditions, a request with IncludePrivileges turned out to be significantly faster than a request with EntityOnly. That doesn’t make any sense to me, and maybe my system is broken, so I will repeat the tests on multiple machines after returning from my vacation, which starts this Friday. Right now I don’t understand why retrieving more data from the server takes almost half the time (1623ms compared to 3121ms). The same effect doesn’t happen when adding IncludeAttributes or IncludeRelationships.
And something else is different: adding relationships to the request is not the performance killer it was before.
I haven’t included the Adjusted column here, because I didn’t know what to adjust to. Having a negative value in the IncludePrivileges row is strange, so I leave it up to you to run similar tests and either confirm or refute my results; before making further statements I want to do some deeper research.
That was a warm-up.
Now it’s time to talk about what to expect from a metadata cache. I have created a lot of them in the past, and I can tell you that it can get really complex. A standalone application running on a client machine can of course request the metadata of a single organization and is happy. But when it comes to server-side applications that are multi-tenant aware and have to support multiple languages, it’s not so easy. Here is what I expect from a cache:
- It must provide a per-organization cache.
- It must be capable of dealing with multiple organizations to support multi-tenancy.
- It should offer a choice between a read-all-at-once and a lazy-loading approach.
- It should never request more data than needed. For instance, if you request an entity with attributes and relationships and the attributes have already been retrieved, then only the relationships are loaded and the result is merged into the existing cache.
- It must be thread-safe. This is very important for server-side applications, where multiple users will call your code in parallel. The cache will be defined as a static member in your implementation, but static members are not thread-safe by default. Intelligent synchronization of requests is important for good performance (see the first sketch after this list).
- It must have methods to persist itself. Saving the cache to disk and loading it when the application starts again is much faster than requesting the data from CRM again (see the second sketch below).
- It must have the ability to check for updated metadata and to clear the cache if the metadata has changed. As this is not required by all applications, appropriate settings have to be available.
- Helper classes must exist for easy language handling, though this can be implemented in various places. A short explanation: when reading the metadata, you are running under a certain user account. This user account (systemuser) is used by CRM to determine the user’s language settings, and all UserLocLabel properties in the returned metadata contain the labels localized for that user. If you cache the metadata and simply use UserLocLabel to access display names, then all users working in a different language will be somewhat disappointed. Instead of using UserLocLabel you have to extract the appropriate label from the LocLabels collection. For that to work you need to retrieve the user’s settings first and use their language code to look up the right entry in the LocLabels array. It isn’t too complex, but it’s worth putting into a helper class (the third sketch below).
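To make the thread-safety and lazy-loading points more concrete, here is a minimal sketch of a per-entity cache. It is not the final implementation: all names are mine, only attributes are loaded, and the merging and multi-organization handling mentioned above are left out:

static class SimpleEntityCache {
    private static readonly object syncRoot = new object();
    private static readonly Dictionary<string, EntityMetadata> entities =
        new Dictionary<string, EntityMetadata>();

    public static EntityMetadata GetEntity(MetadataService service, string logicalName) {
        lock (syncRoot) {
            EntityMetadata cached;
            if (entities.TryGetValue(logicalName, out cached)) {
                return cached;
            }
        }

        // Execute the request outside the lock, so a slow retrieval
        // doesn't block readers of other, already cached entities.
        RetrieveEntityRequest request = new RetrieveEntityRequest()
        {
            LogicalName = logicalName,
            EntityItems = EntityItems.IncludeAttributes,
            RetrieveAsIfPublished = false
        };
        RetrieveEntityResponse response = (RetrieveEntityResponse) service.Execute(request);

        lock (syncRoot) {
            // Another thread may have retrieved the same entity in the meantime;
            // overwriting is harmless because both results are equivalent.
            entities[logicalName] = response.EntityMetadata;
        }
        return response.EntityMetadata;
    }
}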
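Persisting the cache can reuse the same XmlSerializer approach as the size measurement above. A sketch, with the file name supplied by the caller:

static void SaveCache(string fileName, EntityMetadata[] entities) {
    XmlSerializer serializer = new XmlSerializer(typeof(EntityMetadata[]));
    using (FileStream stream = File.Create(fileName)) {
        serializer.Serialize(stream, entities);
    }
}

static EntityMetadata[] LoadCache(string fileName) {
    XmlSerializer serializer = new XmlSerializer(typeof(EntityMetadata[]));
    using (FileStream stream = File.OpenRead(fileName)) {
        return (EntityMetadata[]) serializer.Deserialize(stream);
    }
}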
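And for the language handling, a helper could look like this sketch. It assumes the CRM 4 SDK label types (CrmLabel with a LocLabels array and a UserLocLabel, LocLabel with a Label string and a LanguageCode of type CrmNumber) and falls back to UserLocLabel when the requested language is not available:

static string GetLabel(CrmLabel label, int languageCode) {
    if (label == null) {
        return null;
    }
    if (label.LocLabels != null) {
        // Pick the label matching the requested language code (LCID).
        foreach (LocLabel locLabel in label.LocLabels) {
            if (locLabel.LanguageCode != null && locLabel.LanguageCode.Value == languageCode) {
                return locLabel.Label;
            }
        }
    }
    // Fall back to the label localized for the user who retrieved the metadata.
    return label.UserLocLabel != null ? label.UserLocLabel.Label : null;
}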
I’m going to implement all of the above and make it available in a follow-up article. To be honest, it’s already done, at least 90% of it. But I want to get to the bottom of the weird results of the above tests and do further testing before releasing it. And of course the accompanying article has to be written as well. As my vacation starts this weekend, the next article will have to wait until I return.
Cheers,
Michael Höhne