August 13th, 2010

SQL Server Compact + Full-Text Search?

SQL Server Compact is a great in-process database engine: it’s free, lightweight, and xcopy deployable. However, since one of its selling points is its small footprint, Microsoft excludes from it many features that the full version of SQL Server has. One such feature is full-text search (FTS), which helps you build search engine capabilities into your applications.

So if SQL Compact doesn’t support full-text search, how do you implement it in your SQL Compact application? More specifically, how did we do it in Glo?

Plus, Glo holds textual information not only in relational tables, but also in lots of files external to the database. How does Glo retrieve search results from all of those files?

Glo’s search engine uses Lucene.NET, which is a C# port of the open-source Lucene search engine written by Doug Cutting. Lucene is an information-retrieval API that helps you build and read FTS indexes programmatically. Essentially, it can make any text searchable within your application.

For simple search functionality there’s nothing wrong with executing LIKE ‘%term%’ queries on database tables, but they don’t scale well. In the same way that searching the index in the back of a book is faster than searching every page in the book, searching FTS indexes is faster than searching your entire database. A LIKE ‘%term%’ pattern that starts with a wildcard defeats any ordinary index on the column, usually forcing a table scan (or at best an index scan), which takes much longer than an index seek.

A few years ago, a man asked me, “How can Google search the whole Internet so fast?” I explained to him that executing a search doesn’t search the Internet at all; it instead reads a set of pre-built index files, which, like the index at the back of a book, hold the answers all in one place.

Lucene uses the same principle to get search results. In fact, retrieving search results from FTS indexes is straightforward; getting the proper and latest data into the indexes is the part that takes creativity. I’ll expound on how Glo accomplishes that in a future post.
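The principle above is usually implemented as an inverted index: a map from each term to the documents that contain it, so a search is a dictionary lookup rather than a scan. Here’s a minimal sketch of the idea in Python (this is just the concept, not Lucene.NET’s actual API, which also handles tokenization, stemming, ranking, and on-disk index files):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    1: "In the beginning God created the heavens",
    2: "the Word became flesh",
    3: "God is love",
}

index = build_inverted_index(docs)

# Searching is now a single lookup instead of scanning every document:
print(sorted(index["god"]))  # → [1, 3]
```

The expensive work (building the index) happens once, up front; every query after that is cheap. That asymmetry is exactly why keeping the index current is the hard part.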

Props to my co-worker friend Adam Hill, who encouraged me to check Lucene out.

May 4th, 2010

No Clustered Indexes in SQL Server Compact

Contrary to what my copy of “Microsoft SQL Server 2005 Compact Edition” (Dhingra & Swanson, ©2008 SAMS Publishing) says twice on page 513, SQL Server Compact Edition only supports NONCLUSTERED indexes.

Clustered indexes aren’t always strictly necessary, but for some queries they increase retrieval speed significantly. Having spent several weeks tuning Glo’s content database for performance, I was a little discouraged to read about this lack of support.

That’s per the 3.5 SP1 documentation, and the SP2 docs don’t allude to a change in this regard. Fortunately most (but not all) of Glo’s queries run extremely fast, sub-millisecond in places.

May 4th, 2010

Glo: Bible of the Year

Glo won the 2010 Christian Book Award in the Bible category.

As someone who has been on the Glo team from the beginning, it’s gratifying to hear that Glo is out there making a difference for people. It’s a testimony that technology continues to be a vehicle for spreading God’s word.

March 1st, 2010

Lesson Learned: WCF vs. NOT

For the first ~18 months of Glo’s development, it self-hosted a fairly large set of WCF services through which we pumped the data coming from SQL Compact over named pipes. Data access & entity-mapping layers sat behind five WCF endpoints used by the Presenters for data retrieval.

The reasoning behind this architecture was simple: it would be nice to leverage the same code for both the desktop and Silverlight apps, so let’s build it as a WCF service, then just point to a different URI to connect to the service based on the platform.

This worked fine until we started hitting WCF’s limits. Adjusting the MaxReceivedMessageSize, MaxItemsInObjectGraph, MaxArrayLength, etc. defaults is simple enough, but we also found we needed to code around performance issues when returning big chunks of data. I tweaked limits for a while, but after failing to fix WCF errors saying I had exceeded MaxNameTableCharCount, I started thinking about bailing on WCF altogether. The services had grown too big and had caused me one too many headaches.

I ran tests to gauge the potential benefit of eliminating WCF, and was somewhat amazed to see how much overhead it had imposed. Serialization and transport issues aside, I had attended Juval Lowy’s lecture on “Every Class a WCF Service” and came away thinking performance would not be terribly impacted.

So I spent a couple days replacing the services with a layer that loads data entities directly. This boosted backend performance by over 30%, and naturally eliminated associated headaches.
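The swap was manageable because the Presenters only depended on an interface, not on WCF itself. Here’s a rough sketch of the pattern in Python (the names here are hypothetical, and Glo’s real layer is C#/WCF, not Python): the same repository interface can be backed by a service proxy or by an in-process loader, so callers don’t change when the service layer goes away.

```python
from abc import ABC, abstractmethod

class VerseRepository(ABC):
    """What the presentation layer codes against (hypothetical name)."""
    @abstractmethod
    def get_verse(self, ref: str) -> str: ...

class ServiceProxyRepository(VerseRepository):
    """Old path: serialize the call, send it over a named pipe, deserialize
    the response. Stubbed out here; the point is the indirection cost."""
    def get_verse(self, ref: str) -> str:
        raise NotImplementedError("marshal request over service boundary")

class DirectRepository(VerseRepository):
    """New path: load the entity in-process, with no serialization
    or transport overhead at all."""
    def __init__(self, store: dict):
        self._store = store

    def get_verse(self, ref: str) -> str:
        return self._store[ref]

# Swapping implementations requires no change to calling code:
repo: VerseRepository = DirectRepository(
    {"John 3:16": "For God so loved the world..."})
print(repo.get_verse("John 3:16"))
```

Because the interface stays put, the WCF-backed implementation could still be revived for an online version without touching the desktop app’s callers.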

Moral of the story: even if I have to replicate Glo’s data layer as online WCF services for a web version, removing them from the desktop version was definitely worth it. LESSON LEARNED.

February 3rd, 2010

Retaining Your Glo Data and Settings When Re-formatting

Glo stores your journal data, bookmarks, notes, reading plans, user tags and personal settings in a variety of formats (XML, binary, etc.), but all in a single folder. To retain that data through a reformat or reinstall, just copy that folder somewhere safe, such as onto a memory stick, before reformatting or uninstalling. After your machine is set up again, and before running Glo, put the folder back where Glo can find it.

Find the folder here on Vista or Windows 7:

C:\Users\[current user]\AppData\Local\glo\

Find it here on XP:

C:\Documents and Settings\[current user]\Local Settings\Application Data\glo\

Of course, substitute your own user account name for ‘[current user]’.

This ‘glo’ folder contains a few sub-folders and loose files. Save the ‘glo’ folder onto a USB stick or wherever is convenient to archive it. After reformatting and reinstalling, drop the folder back where it was, or in whichever path the new [current user] is using.
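If you’d rather script the backup than drag folders around, something along these lines works (a sketch using Python’s standard library; the demo below runs against a throwaway directory, and the commented-out lines show what the real paths would look like on your machine):

```python
import os
import shutil
import tempfile

def backup_glo(src, dest):
    """Recursively copy the glo folder (sub-folders and loose files) to dest."""
    shutil.copytree(src, dest)

# For real use, the source would be something like:
#   os.path.expandvars(r"%LOCALAPPDATA%\glo")          # Vista / Windows 7
# and dest a folder on your USB stick, e.g. r"E:\glo-backup\glo".

# Demo on a temporary directory so this sketch is runnable anywhere:
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "glo")
    os.makedirs(os.path.join(src, "journal"))
    open(os.path.join(src, "settings.xml"), "w").close()
    backup_glo(src, os.path.join(tmp, "backup", "glo"))
    print(os.listdir(os.path.join(tmp, "backup", "glo")))  # same contents as src
```

Restoring is the same call in reverse: copy the saved ‘glo’ folder back into the AppData path for whichever user account the reinstalled machine uses.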