2010/01/10

Asynchronous fetch in Core Data

Which of you Cocoa devs like Core Data? Hands up... Whoa, I can see many hands up there. I share the sentiment. If you don't know what Core Data is, the rest of this post probably isn't for you, but here is a short introduction taken from the docs:

The Core Data framework provides generalized and automated solutions to common tasks associated with object life-cycle and object graph management, including persistence. Its features include:
- Built-in management of undo and redo beyond basic text editing
- Automatic validation of property values to ensure that individual values lie within acceptable ranges and that combinations of values make sense
- Change propagation, including maintaining the consistency of relationships among objects
- Grouping, filtering, and organizing data in memory and in the user interface
- Automatic support for storing objects in external data repositories
- Optional integration with Cocoa bindings to support automatic user interface synchronization

Fetching objects from persistent stores! That sounds nice. What does a typical fetch look like? Again, taken from the docs:

NSManagedObjectContext *context = <#Get the context#>;

NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription entityForName:@"<#Entity name#>" inManagedObjectContext:context];
[fetchRequest setEntity:entity];

NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"<#Sort key#>" ascending:YES];
NSArray *sortDescriptors = [[NSArray alloc] initWithObjects:sortDescriptor, nil];
[fetchRequest setSortDescriptors:sortDescriptors];

NSPredicate *predicate = [NSPredicate predicateWithFormat:@"<#Predicate string#>",
<#Predicate arguments#>];
[fetchRequest setPredicate:predicate];

NSError *error = nil;
NSArray *fetchedObjects = [context executeFetchRequest:fetchRequest error:&error];
if (fetchedObjects == nil) {
// Handle error
}

[fetchRequest release];
[sortDescriptor release];
[sortDescriptors release];

Unfortunately, Core Data currently doesn't have asynchronous fetch support built in. If you have a large data set and/or a complicated fetch request, it takes time to fetch your objects from the persistent stores. Most importantly, the current thread blocks while the context executes the fetch request. If that thread is the main thread, your UI will hang and give the user the impression that the app has frozen. We don't want that, do we?

I've created IZManagedObjectContext, a subclass of NSManagedObjectContext that adds an asynchronous fetch feature. Let me modify the above snippet and show you how to use it:

IZManagedObjectContext *context = <#Get the context#>;

NSFetchRequest *fetchRequest = [[NSFetchRequest alloc] init];
NSEntityDescription *entity = [NSEntityDescription entityForName:@"<#Entity name#>" inManagedObjectContext:context];
[fetchRequest setEntity:entity];

NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"<#Sort key#>" ascending:YES];
NSArray *sortDescriptors = [[NSArray alloc] initWithObjects:sortDescriptor, nil];
[fetchRequest setSortDescriptors:sortDescriptors];

NSPredicate *predicate = [NSPredicate predicateWithFormat:@"<#Predicate string#>",
<#Predicate arguments#>];
[fetchRequest setPredicate:predicate];

[context executeFetchRequestAsynchronously:fetchRequest delegate:delegate];

[fetchRequest release];
[sortDescriptor release];
[sortDescriptors release];

When the fetch is complete, the delegate will be notified with:
- (void)managedObjectContext:(IZManagedObjectContext *)context fetchCompletedForRequest:(NSFetchRequest *)request withResults:(NSArray *)results error:(NSError *)error;
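
A delegate might implement that callback roughly as follows. This is only a minimal sketch: the SearchController class, its matches property, and the header file name are hypothetical, and it assumes that results is nil when the fetch fails, mirroring executeFetchRequest:error:.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>
#import "IZManagedObjectContext.h" // assumed header name

// Hypothetical controller acting as the fetch delegate.
@interface SearchController : NSObject {
    NSArray *matches;
}
@property (nonatomic, retain) NSArray *matches;
@end

@implementation SearchController
@synthesize matches;

- (void)managedObjectContext:(IZManagedObjectContext *)context
    fetchCompletedForRequest:(NSFetchRequest *)request
                 withResults:(NSArray *)results
                       error:(NSError *)error
{
    if (results == nil) {
        // Assumption: nil results signal a failed fetch.
        NSLog(@"Asynchronous fetch failed: %@", error);
        return;
    }
    // Keep the fetched objects and refresh the UI from here.
    self.matches = results;
}

- (void)dealloc
{
    [matches release];
    [super dealloc];
}
@end
```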

Note that the difference between the two fetch methods is that the asynchronous one does NOT block the thread so your app looks snappy. That makes us happy.

If you are anxious to try it out and don't care about the juicy details of how it gets the job done, then just head over here. I have licensed it under the BSD license.

How does it work, you ask?

It follows these guidelines. On a separate thread, an NSOperation uses its own NSManagedObjectContext instance, attached to the same persistentStoreCoordinator, to fetch the objectIDs (which are immutable and safe to pass across thread boundaries) of the objects that satisfy the fetch request's predicate. It then passes those objectIDs back to the main thread, which issues a new fetch request whose predicate asks for exactly the objects with those objectIDs. But didn't I say it doesn't block the main thread? Well, I wasn't telling the whole truth: the second fetch does run on the main thread. In my defense, that second fetch should have only O(n) complexity if executeFetchRequest:error: is optimized for fetching by objectID, and in most cases it will execute faster than the original fetch request, which had a more "complicated" predicate. And you really should be using a fetch limit when dealing with large data sets anyway.
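
The two steps can be sketched as a pair of plain functions. This is an illustration of the technique rather than the actual IZManagedObjectContext source: the function names are hypothetical, the threading glue (the NSOperation and the hand-off to the main thread) is left out, and the code uses manual retain/release like the snippets above.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Step 1: call on a background thread, inside an autorelease pool.
// Returns an array of NSManagedObjectID instances, or nil on error.
static NSArray *IZSketchFetchObjectIDs(NSPersistentStoreCoordinator *coordinator,
                                       NSFetchRequest *request,
                                       NSError **error)
{
    // This thread gets a context of its own that shares the coordinator.
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
    [context setPersistentStoreCoordinator:coordinator];

    // Copy the request so the caller's instance is left untouched,
    // then ask for object IDs instead of managed objects.
    NSFetchRequest *idRequest = [[request copy] autorelease];
    [idRequest setResultType:NSManagedObjectIDResultType];

    NSArray *objectIDs = [context executeFetchRequest:idRequest error:error];
    [context release];
    return objectIDs;
}

// Step 2: call on the main thread with the IDs produced by step 1.
// Returns managed objects registered with the main context, or nil on error.
static NSArray *IZSketchFetchObjects(NSManagedObjectContext *mainContext,
                                     NSArray *objectIDs,
                                     NSFetchRequest *originalRequest,
                                     NSError **error)
{
    NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
    [request setEntity:[originalRequest entity]];
    // Reuse the original sort descriptors so the ordering is preserved.
    [request setSortDescriptors:[originalRequest sortDescriptors]];
    // Match only the objects whose IDs were found in the background.
    [request setPredicate:[NSPredicate predicateWithFormat:@"self IN %@", objectIDs]];
    return [mainContext executeFetchRequest:request error:error];
}
```

Only the array of objectIDs crosses the thread boundary; no managed object is ever shared between the two contexts.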

I haven't included any example project that shows IZManagedObjectContext in action. You just have to take my word for it that it works. I use it in the latest alpha build of OpenMaps and haven't encountered any problems, yet. If you find bugs in it then please do notify me.

6 comments:

d said...

This couldn't be better timed. I'm amazed that Google caught your post so quickly!

I just finished reading through Apple's Core Data multi-threaded guidelines, looking at NSURLConnection's initWithRequest:delegate:, and thinking to myself that this was how it should be done!

I'm looking to implement real time search as you type. Looking at your code I was first thinking of cancelling any previous fetches if a new key is pressed, but doing that would just discard, not interrupt, outstanding fetches.

Any thoughts on how to avoid congestion from multiple, simultaneous fetches? Seems like the best approach would be to set a timer for, say, a 1/2 second before firing the async fetch. The timer is invalidated if another key is pressed before the 1/2 second is up.

Zsombor Szabo said...

I suggest you cancel any pending fetch when the user modifies the search string, and issue a new fetch once there has been no editing activity for, e.g., 0.5 seconds.

d said...

I think this problem is a little subtle. I wonder what those AJAX smarties do for their asynchronous network queries?

You could easily imagine a situation that puts a heavy load on the database, which is already a problem for the iPhone. E.g. I have a database of about 60000 rows and my queries on the ARM can take several sluggish seconds. Thus, if the cut-off time is 0.5s and the query time averages 5s, then a user could trigger up to 10 concurrent queries. (Note, [NSOperation cancel] does nothing unless the operation regularly checks isCancelled.)

I think this may also be challenging on the iPhone with the conventional table search paradigm using the NSFetchedResultsController. Early typing of one or a few keys will typically return larger result sets, which means passing around large arrays of object IDs. Using the conventional executeFetchRequest, almost all the objects would remain faulted.

An array of, say, 60000 object IDs might require more memory than the handful of in-memory objects from an array of 60000 NSManagedObjects that are mostly faulted?? Putting a limit on the query, though, is a reasonable option - although unfortunately requiring special handling to notify the user of truncated result set.

d said...

Hi Zsombor, David here again. I hope you don't mind all the attention with your module. It's great work!

I was able to use your code quite easily. However, I have some interesting restrictions. I want to perform asynchronous search for an iPhone app using UITableView, UISearchDisplayController, their data source delegates, and NSFetchedResultsController. In theory, it should be straightforward to overlay a searchable interface on Core Data objects using these classes.

In my case, I have a "main" entity with about 8400 instances. My goal is to write a full text search for the text associated with each of these. Therefore I've created an indexed searchWord entity that contains associations between a word and the main entity that contains that word. There are about 57000 unique words linked to these 8400 main instances. As you type, I retrieve all the main instances that have searchWords beginning with the prefix typed in the searchBar. (E.g. the mail app provides this kind of text search.)

Anyway, I've been surprised despite much model tweaking, messing around with predicates, and discussions elsewhere (http://stackoverflow.com/questions/1774369/how-to-optimize-core-data-query-for-full-text-search) that Core Data's performance isn't cutting it. Thus, my desire to implement an async fetch.

Because I'm using the NSFetchedResultsController I can't call executeFetchRequestAsynchronously on the moc. But that didn't hurt me too badly. I modified your code to pass the objectIds back. I then gave a fetchRequest to the fetchedResultsController to retrieve the objects from the objectIds.

The good news is that it works! The bad news is that the async code is significantly slower. Of course some overhead is expected because the fetch must be performed a second time to retrieve the objects. But that extra hit was worse than I was hoping. Typically on the iPhone 3GS the async search time was twice the time for blocking code. Although the user interface didn't block -- which is worth something!

You can find a spreadsheet at http://drop.io/asyncCoreData . The increase in running time versus blocked fetches and the increase in running time as a function of number of rows returned are both linear... which we would expect if the query was properly optimized and retrieving objects by objectId is O(N). Unfortunately, in practice it's still slow.

An interesting example of this sluggish performance for searching large Core Data stores is Apple's iTunes search interface on the iPhone. (There's a search bar hidden above the first row.) I have over 6000 songs and the search is asynchronous, but appears snappy, through some tricks. Even if you've chosen the "Songs" tab, the interface will search through artists, albums, and songs. The implementation must be complicated, but the upshot is that when typing the first few characters you can immediately see the matches for artist or album and the song matches display after some delay further down the table. Essentially Apple tricks you into thinking you're getting a snappy result by showing easy matches first (artist and album) and filling in the matches for song titles later.

Ultimately I'm just going to have to implement a limit on my query, but I'm really not happy about that. From a UI experience, it's lame.

Andy The Geek said...

This is a great idea. Any tips on extending the NSFetchedResultsController to make use of the async fetchrequests?

Zsombor Szabo said...

I can't think of a way how to do this with NSFetchedResultsController. It handles fetching internally.
