ApexApplication: My Pen Pal

March 11th, 2011 | Posted by kbromer in Salesforce

(How the Nonprofit Starter Pack handles unhandled Apex error messages)


If you’re like me, somewhere in your phone’s contact list is a trusty old friend, ‘ApexApplication’. And if you’re even MORE like me, this friend emails you a lot, perhaps overwhelmingly often. Sure, the message syntax varies slightly, “Dear Kevin, UnexpectedException” or “Dear Kevin, FIELD_CUSTOM_VALIDATION_EXCEPTION”, but the meaning is always the same: your code failed to the point of no recourse.

When I was a consultant, we might get a couple of these a day for a current or former client, and the remedy was easy. You’d look at the message, try to distill the behavior or function that caused the error, and pick up the phone and call the client. Maybe we would patch the code, maybe we would ask the client to further explain what they were doing, or maybe we’d throw our hands up and write it off as a one-time issue.

When I came to the Salesforce.com Foundation, I quickly realized personally calling each and every client with an error message was probably not going to work. We have over 53,000 License records in our License Management Application (LMA) on seven different packages (the five core NPSP packages plus the old template and the template converter), and 150 different package versions in the wild.

While a sobering reminder of your own fallibility, these messages can also serve as an early warning system, and we needed a way to view errors in aggregate. Instead of spotting individual messages, we wanted to look for trends that pointed to a client in serious trouble (maybe some code or customizations on top of the NPSP was conflicting with our package), or a package with a serious flaw.

The Model

How to do this, though? The ‘ah-ha!’ moment was the realization that error messages are just emails, and Salesforce knows how to handle emails! Our basic architecture looks like this:

(1. User instance generates an Apex Application Error. 2. Message is sent to the email listed in the package license. 3. Package license email automatically forwards to an Apex Email Service address in the LMO. 4. LMO Apex Class picks up the error email, parses it, and creates an Error Message custom object, attaching it to the appropriate License record based on Org ID.)

When a user generates an error in any of the five Nonprofit Starter Pack packages, that error is automatically forwarded to the email address associated with the DOT creator. That email address runs a rule on its incoming emails, and forwards Apex Application errors to the email address associated with our email service. (More info on this can be found here, here, here and here.) When our email service receives a valid Apex Application email, it calls the associated class to process that email.

The Code

The NPSPErrorProcessor class that lives in our License Management Organization (LMO) is the meat of this design. The original version of this had delightfully long and inefficient looping statements to manually parse the email for as much information as possible to record to our new Error Message record. This worked, but was verbose, inefficient and pretty ugly. Fortunately, our Foundation fellow at the time, Akhilesh Gupta, plays around with regex for fun (seriously), and replaced my hideous FOR loops with this:

'005[A-Za-z0-9]{12}+/(00D[A-Za-z0-9]{12}+)[\\r\\n]+(.+?)[\\r\\n]+caused by:[\\s]*System\\.([^:]*)[\\s]*:(.+)[\\r\\n]+(?:(?:Class)|(?:Trigger))\\.([\\w]*)\\.([\\w]*)\\.?([\\w]*):'
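
If you want to poke at the pattern outside of Salesforce, here is a minimal sketch in plain Java. It assumes the pattern behaves the same way there, which should hold since Apex's Pattern and Matcher use Java's regular expression syntax. The class name ErrorEmailRegexDemo, the parse helper and the sample body are made up for illustration; the sample is modelled on the test email later in this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ErrorEmailRegexDemo {
    // The pattern from the post, unchanged, written as a Java string literal.
    static final Pattern ERROR_PATTERN = Pattern.compile(
        "005[A-Za-z0-9]{12}+/(00D[A-Za-z0-9]{12}+)[\\r\\n]+(.+?)[\\r\\n]+"
        + "caused by:[\\s]*System\\.([^:]*)[\\s]*:(.+)[\\r\\n]+"
        + "(?:(?:Class)|(?:Trigger))\\.([\\w]*)\\.([\\w]*)\\.?([\\w]*):");

    // Hypothetical helper: returns
    // {orgId, exceptionType, shortMessage, namespace, className, methodName},
    // or null when the body does not match the pattern.
    static String[] parse(String body) {
        Matcher m = ERROR_PATTERN.matcher(body);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            m.group(1).trim(), m.group(3).trim(), m.group(4).trim(),
            m.group(5).trim(), m.group(6).trim(), m.group(7).trim()
        };
    }

    public static void main(String[] args) {
        // Made-up sample body, not a real error email.
        String body =
            "Apex script unhandled exception by user/organization: 005400000017nGe/00D123456789987\n"
            + "Scheduled job Nightly Opportunity Rollup threw unhandled exception.\n"
            + "caused by: System.TestException: Attempted to schedule too many concurrent batch jobs in this org (limit is 5).\n"
            + "Class.test1.OpportunityRollups.rollupAllContacts: line 839, column 23\n";
        String[] parts = parse(body);
        String[] labels = {"Org ID", "Exception type", "Short message", "Namespace", "Class", "Method"};
        for (int i = 0; i < labels.length; i++) {
            System.out.println(labels[i] + ": " + parts[i]);
        }
    }
}
```

Returning null for a non-match mirrors the handler's behavior of simply skipping emails the pattern can't parse.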

Breaking Down the Regex

Loosely translated, this regex breaks out into some easily parsed groups. Let’s check it out:

  • Start: “005[A-Za-z0-9]{12}+/” Start by finding 005, the prefix of a user record ID, then the next 12 alphanumeric characters and a slash. We won’t capture this in a group, since the UserID doesn’t help us. Anchoring here lets us ignore any header info in the email.
  • Group 1: “(00D[A-Za-z0-9]{12}+)” Starting with 00D, the prefix of an org ID, grab the next 12 alphanumerics and make that our first group. In other words, the 15-character OrgID.
  • Group 2: “(.+?)” Lazily matches the text between the OrgID line and the ‘caused by:’ line. This is a throwaway group to make sure we’re grabbing the right stuff.
  • Group 3: “([^:]*)” After finding the ‘System\\.’ text with a period, we want to grab the text right after that. In other words, the exception type, excluding the colon. In the error message above, that would mean: “DMLException”.
  • Group 4: “(.+)” We want to grab everything between the first colon after the exception type and the end of the line. In other words, the ‘short message’. In the example above we’d start with ‘Update Failed’ and grab everything up to and including ‘[npe01__PreferredPhone__c]’.
  • Group 5: “([\\w]*)” The non-capturing “(?:(?:Class)|(?:Trigger))” matches either ‘Class’ or ‘Trigger’, and this group grabs the word between the period after it and the next period. In this case, our namespace, or ‘npo02’ from the above example.
  • Group 6: “([\\w]*)” Now get everything after the period, but before the next one. In other words, our class name: ‘OpportunityRollups’ from above.
  • Group 7: “([\\w]*)” Same as above, after the next period. In other words, our method name: ‘rollupContacts’.
  • Finally, note that the line information (‘line 667, column 42’) is not captured. The pattern ends at the colon after the method name, so there is no group 8. If you want that information in your implementation, you could add one more group after that colon.

Let’s see it in context with the rest of the class:


    global class NPSPErrorProcessor implements Messaging.InboundEmailHandler {

        global Messaging.InboundEmailResult handleInboundEmail(Messaging.InboundEmail email, Messaging.InboundEnvelope envelope) {
            Messaging.InboundEmailResult result = new Messaging.InboundEmailResult();

            Error_Message__c em = new Error_Message__c(Message__c = email.plainTextBody);

            //convert email body to lowercase to avoid case mismatches
            string lcEmailBody = email.plainTextBody.toLowerCase().trim();

            //First, determine the error type - we'll search the
            //body for some generic clues to determine the type
            string emErrorContext = '';
            if (lcEmailBody.contains('custom_validation'))
                emErrorContext = emErrorContext + 'Validation Error;';
            if (lcEmailBody.contains('apex script unhandled trigger exception'))
                emErrorContext = emErrorContext + 'Apex Trigger;';
            if (lcEmailBody.contains('batch'))
                emErrorContext = emErrorContext + 'Batch Apex;';
            if (lcEmailBody.contains('visualforce'))
                emErrorContext = emErrorContext + 'Visualforce;';
            if (lcEmailBody.contains('apex script unhandled exception'))
                emErrorContext = emErrorContext + 'Apex Class;';
            //Finally, if we haven't found anything, dump it in 'other'
            if (emErrorContext.length() < 2)
                emErrorContext = 'Other;';
            em.Error_Context__c = emErrorContext;

            string regex = '005[A-Za-z0-9]{12}+/(00D[A-Za-z0-9]{12}+)[\\r\\n]+(.+?)[\\r\\n]+caused by:[\\s]*System\\.([^:]*)[\\s]*:(.+)[\\r\\n]+(?:(?:Class)|(?:Trigger))\\.([\\w]*)\\.([\\w]*)\\.?([\\w]*):';
            Pattern emailPattern = Pattern.compile(regex);
            Matcher emailMatcher = emailPattern.matcher(email.plainTextBody);

            if (emailMatcher.find()){
                //These will fail if no ID is found - however, both are parents and required for em
                sfLma__License__c errorLicense = [select id, sfLma__Package_Version__r.sfLma__Package__r.id, sfLma__Package_Version__r.id
                    from sflma__License__c
                    where sfLma__Subscriber_Org_ID__c = :emailMatcher.group(1).trim()
                    and sflma__Status__c = 'Active'
                    and sfLma__Package_Version__r.sfLma__Package__r.Namespace__c = :emailMatcher.group(5).trim()];
                em.License__c = errorLicense.id;
                em.Package__c = errorLicense.sfLma__Package_Version__r.sfLma__Package__r.id; // [select id from sfLma__Package__c where Namespace__c = :emailMatcher.group(5).trim()].id;
                em.Package_Version__c = errorLicense.sfLma__Package_Version__r.id;
                em.Account__c = [select id from Account where Organization_ID__c = :emailMatcher.group(1).trim()].id;
                em.Exception_Type__c = emailMatcher.group(3).trim();
                em.Short_Message__c = emailMatcher.group(4).trim();
                em.Class_Name__c = emailMatcher.group(6).trim();
                em.Method_Name__c = emailMatcher.group(7).trim();
                insert em;
            }
            return result;
        }//close handler method

And, our test method:

        static testMethod void NPSPErrorProcessorTEST(){
            // Create a new email and envelope object
            Messaging.InboundEmail email = new Messaging.InboundEmail();
            Messaging.InboundEnvelope env = new Messaging.InboundEnvelope();

            //Create a dummy account
            string idstring = '00D123456789987';
            Account a = new Account(Name = 'TestAccount', Organization_ID__c = idstring);
            insert a;

            //Create new LMA package instance so we can check namespace matching
            sfLma__Package__c p = new sfLma__Package__c(Namespace__c = 'test1', Name = 'TestPackage');
            insert p;

            //Create a new Version
            sfLma__Package_Version__c v = new sfLma__Package_Version__c(Name = 'VersionTest', sfLma__Package__c = p.id);
            insert v;

            //Create a new License
            sfLma__License__c l = new sfLma__License__c(sfLma__Seats__c = 1, sfLma__Status__c = 'Active', sfLma__Subscriber_Org_ID__c = idstring, sfLma__Package_Version__c = v.id);
            insert l;

            //Create a dummy error message using the test org and package
            email.plainTextBody = 'Apex script unhandled exception by user/organization: 005400000017nGe/' + idstring + '\n' +
                'Scheduled job Nightly Opportunity Rollup threw unhandled exception.' +
                'apex script unhandled trigger exception ' +
                'visualforce ' +
                'custom_validation\n' +
                'caused by: System.TestException: Attempted to schedule too many concurrent batch jobs in this org (limit is 5).\n' +
                'Class.test1.OpportunityRollups.rollupAllContacts: line 839, column 23\n' +
                'Class.test1.OpportunityRollups.rollupAll: line 794, column 3\n' +
                'Class.test1.SCHED_OppRollup.execute: line 5, column 9\n' +
                'External entry point';
            email.fromAddress = 'test@test.com';
            email.subject = 'Fwd: Developer script exception from TestAccount: Nightly Opportunity Rollup : Attempted to schedule too many concurrent batch jobs in this org (limit is 5).';

            NPSPErrorProcessor npspep = new NPSPErrorProcessor();

            Test.startTest();
            Messaging.InboundEmailResult result = npspep.handleInboundEmail(email, env);
            Test.stopTest();

            Error_Message__c em = [select e.Short_Message__c, e.Package_Version__c, e.Class_Name__c, e.Method_Name__c,
                e.Package__c, e.Org_Name__c, e.Message__c, e.Exception_Type__c, e.Error_Context__c, e.Account__c
                From Error_Message__c e where e.Exception_Type__c = 'TestException'];
            system.assertEquals(em.Package__c, p.id);
            system.assertEquals(em.Exception_Type__c, 'TestException');
            system.assertEquals(em.Account__c, a.id);
            system.assert(em.Error_Context__c.contains('Batch Apex'));
            system.assert(em.Error_Context__c.contains('Apex Class'));
            system.assert(em.Error_Context__c.contains('Visualforce'));
            system.assert(em.Error_Context__c.contains('Validation Error'));
            system.assertEquals(em.Class_Name__c, 'OpportunityRollups');
            system.assertEquals(em.Method_Name__c, 'rollupAllContacts');
            system.assertEquals(em.Package_Version__c, v.id);
        }
    }//Close class

Breaking Down the Class Code
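
The first thing the handler does is tag each email with an error context using contains(). Here is a rough sketch of that step in plain Java rather than Apex, so you can run it anywhere. The class name ErrorContextDemo and the classify method are hypothetical; the keywords and labels are the ones from the class above.

```java
class ErrorContextDemo {
    // Builds the semicolon-separated context string the same way the handler
    // does: every keyword found in the lowercased body adds its own label.
    static String classify(String body) {
        String lc = body.toLowerCase().trim();
        StringBuilder context = new StringBuilder();
        if (lc.contains("custom_validation")) context.append("Validation Error;");
        if (lc.contains("apex script unhandled trigger exception")) context.append("Apex Trigger;");
        if (lc.contains("batch")) context.append("Batch Apex;");
        if (lc.contains("visualforce")) context.append("Visualforce;");
        if (lc.contains("apex script unhandled exception")) context.append("Apex Class;");
        // Nothing matched: fall back to the catch-all label.
        return context.length() == 0 ? "Other;" : context.toString();
    }

    public static void main(String[] args) {
        // Made-up bodies for illustration only.
        System.out.println(classify("Apex script unhandled trigger exception by user/organization"));
        System.out.println(classify("Your weekly newsletter"));
    }
}
```

Note that the labels are not mutually exclusive, and the checks are plain substring matches: any email that mentions ‘batch’ anywhere in its body gets the Batch Apex label, whether or not batch Apex was actually involved.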

We use the string instance method contains() to look for the error context (Validation, Batch Apex, VF, etc.), and then use our regex pattern to parse the rest of the email. We grab the license ID by looking up the org ID and package namespace on our active License records. The Package and Package Version come from the License record, while the Account is looked up separately by the same org ID. An exception type is parsed to provide metadata about the Error Message, as well as the short message, and the class and method causing the error. Our newly generated Error Message looks like this (note, this error message is not from the email above, but rather from the email listed in its ‘Message’ field, which contains the original full-text email):

Analyzing Our Data

These Error Message objects now allow us to generate useful dashboards for near-real-time monitoring of the health of our package eco-system. Each morning, I check for spikes in Error Messages from the day/night before, and if I see one, attempt to address the situation as appropriate. Our Error Message dashboard provides me with all the information I need to quickly assess our package eco-system health.

If we look at Error Messages by Date we see a few interesting things. The initial spike on 1/21 was the load of all existing Apex Application emails that had been piling up in the inbox of our DOT user. The real win came with the spike on 2/10. We noticed a sudden surge in errors in our opportunityRollup class (a particularly complex piece of code that aggregates opportunity/donation totals to various non-parent objects, like contacts and households). We were able to reach out to the two organizations that were generating 90%+ of these errors, help them identify the root cause, and provide solutions where needed. (Custom validations + bulk data import = Lots of Errors).
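
We get these views from standard Salesforce dashboards, but the underlying idea is just grouping and counting. As a rough illustration only, here is the same idea in plain Java. Everything in it is hypothetical: the ErrorRecord class is a stand-in for an Error Message record, countBy is a made-up helper, and the sample rows are invented, not real data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ErrorTrendDemo {
    // Hypothetical stand-in for an Error Message record.
    static final class ErrorRecord {
        final String date;
        final String orgId;
        final String className;

        ErrorRecord(String date, String orgId, String className) {
            this.date = date;
            this.orgId = orgId;
            this.className = className;
        }
    }

    // Counts records per key, largest count first (ties broken by key).
    static Map<String, Long> countBy(List<ErrorRecord> errors, Function<ErrorRecord, String> key) {
        Map<String, Long> counts = errors.stream()
            .collect(Collectors.groupingBy(key, Collectors.counting()));
        Map<String, Long> sorted = new LinkedHashMap<>();
        counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .forEach(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }

    public static void main(String[] args) {
        // Invented sample rows, for illustration only.
        List<ErrorRecord> errors = new ArrayList<>();
        errors.add(new ErrorRecord("2011-02-09", "orgA", "ContactMerge"));
        errors.add(new ErrorRecord("2011-02-10", "orgA", "OpportunityRollups"));
        errors.add(new ErrorRecord("2011-02-10", "orgA", "OpportunityRollups"));
        errors.add(new ErrorRecord("2011-02-10", "orgB", "OpportunityRollups"));

        System.out.println("By date:  " + countBy(errors, e -> e.date));
        System.out.println("By org:   " + countBy(errors, e -> e.orgId));
        System.out.println("By class: " + countBy(errors, e -> e.className));
    }
}
```

Grouping by date surfaces the spike, and grouping the same records by org or class points at who and what is behind it.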

We can also look at the most frequently offending classes and packages to identify areas of our code that could use a little love, better documentation, or better usage instructions. It’s also easy to see the most common problems our users have. Some interesting facts:

  • 35% of our errors are caused by validation rules on objects we’re trying to update, where existing records don’t conform to the validation rules in the organization instance.
  • The Households package is our worst offender, accounting for 65% of all NPSP errors. It’s a complex package that is particularly susceptible to validation errors. Version 2.0.2 was also subject to an internal SFDC error that’s in the process of being fixed now.
  • 21% of errors are caused by old versions of our Contacts & Organizations Packages that have not yet been upgraded.
  • The vast majority of errors occur in our OpportunityRollups class, followed in a very distant second by our custom ContactMerge class functionality.

Conclusion

As you can see, there’s lots of interesting information to discern from the data we now have using Inbound Email Services. Being able to automatically accept and parse this information in near real-time and have Salesforce drive our visualizations using simple dashboards provides a level of visibility into our eco-system that was previously near-impossible without serious custom charting and coding. Email handlers offer an incredible amount of flexibility, providing a common endpoint into your system for a variety of automated information from any standardized source (not to mention the SOAP and REST APIs!).

In addition, being able to bind the results into an existing data structure, in this case our LMA, gives us the opportunity to cross-reference data we wouldn’t otherwise have. The power of being able to quickly spot organizations that may be in trouble and identify a fix proactively, without them filing a support case or picking up the phone, is a testament to the power of the Cloud. Imagine calling your users and saying: “It seems like you might be having a problem, I can help you fix that right now.”

There’s an array of email you receive on a daily basis that would be useful to view in aggregate. I’d love to hear about your creative uses for the Inbound Email Services, or other interesting Force.com hacks to help you manage your own instance, LMA/LMO, or entire eco-system.
