Postings tagged with Openismus
Direct Read Access for Evolution Addressbooks in LibreOffice
With Tristan having just landed his work on direct read access (DRA) for EDS 3.8, we at Openismus started looking for use cases of DRA outside our customer work. The first project that came to mind was LibreOffice: it provides a connectivity driver for Evolution's address books. This component mainly exists to support form letters (mail merge). Mass-printing letters isn't exactly dominated by the address book's read performance, but to the user the address book appears like any other database, so it can surely be used for more interesting tasks. Also, the EDS function LibreOffice uses to access address books was deprecated in EDS 3.8, so we gave it a try.
Downloading and building LibreOffice was easy thanks to LibreOffice's great build instructions. Being somewhat interested in build systems, I was pleasantly surprised to see LibreOffice's sophisticated build system, which is based entirely on Autoconf and pure GNU make. The relevant code was easy to find and understand, so I patched it quickly. Getting the environment variables and D-Bus configuration right for testing took the most effort. Once I was sufficiently confident about the code, I pushed it to a new Gerrit topic branch. Shortly afterwards the code was reviewed and merged. Nice experience.
So what do you get from this patch: Accessing Evolution address books from LibreOffice should be more efficient now, as we eliminated a few layers of complexity for reading. This also should reduce worries about GLib main loops or threads interacting badly with LibreOffice, as direct-read-access skips D-Bus and just runs the backend drivers.
Still, the database wizard doesn't list all available Evolution address books: there are local, LDAP and GroupWise address books, but Google address books, for instance, are missing. Additionally, only the first address book of each backend is available. I'm now looking at the database wizard to fix this, and getting a first impression of the older code's complexity.
The Movie Database, Grilo and mock testing
This year at Openismus had a strong focus on media files and related metadata, so it isn't surprising that we also got in touch with Grilo. Grilo is a framework for discovering media files and decorating them with relevant metadata. One missing piece was background information about movies, such as story summaries, artwork and information about artists. With the support of our customers we decided to change that.
Now an obvious choice for getting information on movies is Amazon's incredibly comprehensive Internet Movie Database. Sadly there doesn't seem to be any official web API that could be used by open source software. If I missed something, please tell me in the comments.
The Movie Database plugin for Grilo
Anyway, with IMDb out of reach we had to look for an alternative, and we found a great one: The Movie Database (TMDb). This database is a community-driven project, built with the needs of media players in mind. It comes with a great, easy-to-use API. Like IMDb, it seems to know every relevant and irrelevant movie ever made. Some entertaining content like movie trivia, goofs and quotes is missing, but for most movies the database knows the IMDb id, so all the entertainment is just one click away.
Having found a good movie database, Jens started implementing a TMDb plugin for Grilo, and a first version of the plugin was released with grilo-plugins 0.2.2. I later finished it while integrating it into our customer's project. Murray took care of examples and documentation.
The plugin implements a metadata resolver: it takes a media object, checks for a title or the unique movie id, and then fills the media object with the information it found on TMDb. The biggest obstacle when using the plugin is the need for an API key, though one should be easy to get. The database operators hope these keys will help them deal with misbehaving clients.
Another issue is related to Grilo's architecture: Grilo doesn't permit its sources to overwrite metadata that is already set. So if you derived a movie title from the filename and pass a media object with that title to the TMDb plugin, the plugin is not permitted to replace your usually ugly title with the pretty, official title it found in the database. To work around that issue, don't resolve the TMDb data in a single step. Instead, do a first resolve operation that only retrieves TMDb's movie id. In a second step, delete your own title but keep the resolved movie id, and let the plugin work with that data:
```c
/* Create dummy media from filename */
media = grl_media_new ();
grl_media_set_title (media, "the_last_unicorn_720p");

/* Ask grilo to resolve only the TMDb id */
metadata_key_tmdb_id = grl_registry_lookup_metadata_key (registry, "tmdb-id");
keys = grl_metadata_key_list_new (metadata_key_tmdb_id, GRL_METADATA_KEY_INVALID);
grl_source_resolve_sync (source, media, keys, options, &error);
/* Exercise: Release memory and handle possible errors */

/* Use the resolved id to get full metadata */
movie_id = grl_data_get_string (GRL_DATA (media), metadata_key_tmdb_id);
if (movie_id) {
  /* Drop our guessed title, so the plugin may set the official one */
  grl_data_remove (GRL_DATA (media), GRL_METADATA_KEY_TITLE);
  /* This time request the keys you are actually interested in */
  full_keys = grl_metadata_key_list_new (GRL_METADATA_KEY_TITLE,
                                         GRL_METADATA_KEY_DESCRIPTION,
                                         GRL_METADATA_KEY_INVALID);
  grl_source_resolve_sync (source, media, full_keys, options, &error);
  /* ... */
}
```
Behind the scenes, both approaches need the same number of network requests.
Testing the Plugin
At Openismus we have a mantra we strongly believe in:
No code is done before it has proper tests.
Or more radically:
It doesn't work if it isn't tested.
In the case of the TMDb plugin, this commitment caused us a bit of work: Grilo didn't have any infrastructure for testing network-based plugins yet. Also, we wanted those tests to run offline, because:
- it works without API keys
- it saves resources and is much quicker
- it permits testing of obscure errors
- it is more reliable by removing countless points of failure
So we took on the challenge and also implemented a mock testing framework for Grilo. It came in handy that Grilo already routes all its network access through a dedicated abstraction, so that it can implement request throttling. We just hooked into that layer and introduced a few environment variables that influence its behavior: first you set GRL_NET_CAPTURE_DIR to some folder name and let your plugin perform a few interesting operations. Next you edit the recorded files and rename them to suit your needs. Essential in that work is a generated key-value file named grl-net-mock-data-$PID.ini, which maps URLs to captured responses. Once done, you can set GRL_NET_MOCKED to point to that file. Grilo will then stop doing real network requests and answer all requests with the information this file provides.
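The capture/replay idea can be illustrated with a small C++ sketch of the replay side: a table that maps request URLs to captured response files and refuses to answer anything that wasn't recorded. The class name `MockNetwork` and the simplified one-entry-per-line `URL=file` format are hypothetical; the real layout of grl-net-mock-data-$PID.ini and Grilo's lookup rules are described in the reference manual and may differ.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical replay table for recorded network traffic: maps a request URL
// to the file holding the captured response. Simplified format, one entry
// per line: "URL=response-file".
class MockNetwork {
public:
    explicit MockNetwork(std::istream &in)
    {
        std::string line;
        while (std::getline(in, line)) {
            // Skip blank lines, comments and INI-style [section] headers.
            if (line.empty() || line[0] == '#' || line[0] == ';' || line[0] == '[')
                continue;
            // URLs often contain '=' in their query string, so split at the
            // *last* '='. This assumes response file names never contain '='.
            const auto pos = line.rfind('=');
            if (pos == std::string::npos || pos == 0)
                continue;
            responses_[line.substr(0, pos)] = line.substr(pos + 1);
        }
    }

    // Returns the captured response file for a URL, or std::nullopt if the
    // request was never recorded. A mock layer must not fall back to the real
    // network here, otherwise tests would silently go online again.
    std::optional<std::string> lookup(const std::string &url) const
    {
        const auto it = responses_.find(url);
        if (it == responses_.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> responses_;
};
```

Feeding it a `std::istringstream` containing the line `http://example.org/search?query=unicorn=search.json` makes `lookup("http://example.org/search?query=unicorn")` return `"search.json"`, while any unrecorded URL yields no answer at all.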
Details about testing with the Grilo mock framework are in the reference manual. A few examples for using the framework can be found in Grilo's test folder.
Media Discovery with QtGStreamer
Earlier this year we at Openismus proposed a Qt-based project that would use GStreamer for handling media files. In particular, we were interested in the GstDiscoverer class, which provides a really nice, easy-to-use API for discovering properties of media files, such as the container format and the audio and video formats, but also more interesting things like EXIF information for photos.
Now, combining code from different worlds with different paradigms isn't exactly fun. The resulting code often is a disgusting Frankenstein monster that fits nowhere, unless you wrap one of the libraries to match the project's preferred code style. Luckily, in the case of Qt and GStreamer, Collabora's George Kiagiadakis created QtGStreamer and therefore already did most of the hard work. Still, that library didn't support our beloved GstDiscoverer class yet. So we had a choice: use something different, or wrap that class. We love doing free software, we already use GstDiscoverer with great success in the Rygel UPnP AV/DLNA Media Server, and in the end the media files would get played via GStreamer anyway. So we decided to just wrap that class for QtGStreamer.
Doing that work was actually surprisingly easy: a few loose ends here (#680235), a bit of nitpicking there (#680233, #GB680237). The biggest effort was writing the regression tests. These tests also demonstrate how easy the wrapped GstDiscoverer is to use. Synchronous media discovery works like this:
```cpp
QGst::DiscovererPtr discoverer = QGst::Discoverer::create(QGst::ClockTime::fromSeconds(1));
QGst::DiscovererInfoPtr info;
try {
    info = discoverer->discoverUri("file:///home/mathias/blockbuster.ogv");
} catch (const QGlib::Error &error) {
    qWarning("Discovery failed: %s", qPrintable(error.message()));
    // ...maybe also check error.domain() and .code()
}
```
You can also try asynchronous discovery if you have a Qt build that integrates the GLib main loop (GMainLoop):
```cpp
QGst::DiscovererPtr discoverer = QGst::Discoverer::create(QGst::ClockTime::fromSeconds(1));
// Connect C++ member methods to the signals
QGlib::connect(discoverer, "starting", this, &DiscovererTest::onStartingDiscovery);
QGlib::connect(discoverer, "discovered", this, &DiscovererTest::onUriDiscovered);
QGlib::connect(discoverer, "finished", this, &DiscovererTest::onDiscoveryFinished, QGlib::PassSender);
discoverer->start();
// ...queue the URIs to discover here
QEventLoop loop;
loop.exec(); // runs until loop.quit() is called, e.g. from onDiscoveryFinished()
```
Usually only X11 builds match that requirement, but it should be possible to just hook QEventDispatcherGlib into your own application if needed.
The discovered data is accessible through the various attributes and methods of QGst::DiscovererInfo:
```cpp
QGst::DiscovererInfoPtr info = ...;
qDebug() << info->uri();
qDebug() << info->tags();
qDebug() << info->duration();
// ...
// Use a distinct name for the loop variable to avoid shadowing `info`
Q_FOREACH(const QGst::DiscovererVideoInfoPtr &videoInfo, info->videoStreams()) {
    // ...
}
```
Sadly, our customer wasn't as much of a Qt fan as we thought, so we haven't had much use of our own for this work yet. This situation also delayed finishing the last few bits of those patches. Luckily, Murray recently took the time to do that remaining work and to get the patches merged. The code is now in the git repository and should be released with QtGStreamer 0.10.3. So whenever your Qt application needs to discover media file properties, you can now use QtGStreamer as well.
Performance and Memory Usage of Evolution Data Server
Openismus asked me to perform some benchmarks on Evolution Data Server. We wanted to track the progress of recent performance improvements and identify possible improvements. Therefore, I tested these versions of EDS:
- EDS 3.5.2 from June 4th 2012 - the latest development release.
- EDS 3.4.2 from May 14th 2012 - the latest stable release.
- EDS 3.2.3 from January 9th 2012 - shipped with Ubuntu 12.04 LTS.
- EDS 2.32.3 from April 21st 2011 - the last stable release of the GNOME 2 series.
- Maemo Fremantle's EDS fork from June 23rd 2010, which was carefully optimized for the Nokia N900.
The code is in a phonebooks-benchmarks repository on Gitorious, with a full autotools build and a script to build and test all these versions of EDS. So you can try this too, and help to correct or improve the benchmarks. See below for details.
EDS offers various APIs to access the address book. The traditional interface was EBook, which served us well in GNOME 2 and Maemo 5. However, some APIs, such as batch saving, are missing in the upstream version. Also, its asynchronous API doesn't follow the conventions later established by GIO. To overcome these EBook shortcomings, and to make efficient use of GDBus, the EBookClient API was added in EDS 3.2. We can even use the backend's EDataBook API directly, which lets us measure the overhead imposed by D-Bus.
I tested the backends with different numbers of contacts. For each benchmark and each contact count, we create an entirely new private database. D-Bus communication was moved to a private D-Bus session. To avoid swapping, the maximum virtual memory size was limited to 2 GiB with the ulimit command. This limit probably caused the crash of Maemo 5's EDS in the test with 12,800 contacts, but I have not verified that yet.
Contact Saving
These benchmarks simply store a list of parsed contacts in the address book. This simulates use cases such as the initial import of contacts upon migration or synchronization.
To avoid possible side effects from lazy parsing, we retrieve the attribute list for each contact before starting the benchmark. With EBook from Maemo 5 and EBookClient since EDS 3.4, contacts are saved in batches of 3,200 contacts. This partitioning was needed to deal with resource limitations in the file backend. All other EDS variants must save their contacts one by one.
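The partitioning itself is simple. A minimal C++ sketch, assuming a hypothetical `saveInBatches()` helper and a caller-supplied callback standing in for a real batch API such as EBookClient's batch saving, splits a contact list into chunks of at most 3,200 entries:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits `items` into consecutive chunks of at most
// `batchSize` elements and passes each chunk to `saveBatch`, a stand-in for a
// real batch API. Returns the number of chunks handed over.
template <typename T, typename SaveBatch>
std::size_t saveInBatches(const std::vector<T> &items, std::size_t batchSize,
                          SaveBatch saveBatch)
{
    assert(batchSize > 0);
    std::size_t batches = 0;
    for (std::size_t begin = 0; begin < items.size(); begin += batchSize) {
        const std::size_t end = std::min(items.size(), begin + batchSize);
        // Copy the chunk; simple, though not the cheapest option.
        saveBatch(std::vector<T>(items.begin() + static_cast<std::ptrdiff_t>(begin),
                                 items.begin() + static_cast<std::ptrdiff_t>(end)));
        ++batches;
    }
    return batches;
}
```

Saving 7,000 contacts this way results in three calls with 3,200, 3,200 and 600 contacts. Copying each chunk keeps the sketch short; real code would rather pass iterator ranges or directly build the GSList that the C API of e_book_client_add_contacts() expects.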

EBook, EBookClient, EDataBook implementation
As expected, the effort for contact saving grows quickly when not using a batch API. This is because a new database transaction must be created and committed for each contact. Things look much better when using the batch saving API which was available in Maemo 5 already, and was recently added to EBookClient:

Batch saving performance of EDS 3.4+ is just excellent: although it grows slowly with the number of contacts, it remains below 3 ms per contact even for 12,800 contacts. That growing effort can be attributed to growing attribute indices. The initial peak (up to 50 contacts for Maemo 5, and up to 400 contacts for EDS 3.4+) can be attributed to database setup cost.
In terms of performance there is no difference between using EBookClient or EDataBook (which avoids D-Bus).
Contact Fetching
A very basic, but essential, benchmark is fetching all contacts. To get a first impression I just fetched all contacts without any constraints.

EBook, EBookClient, EDataBook implementation
Contact fetching performance decreased significantly during the EDS 3 series and then improved again: fetching all contacts with 3.4 takes about 133% of the time that EDS 2.32 needs, and even 225% of Maemo 5's time. With EDS 3.5, contact loading time is improving again, making the EBook version of EDS 3.5 comparable in performance to EDS 2.32. Git archeology and quick testing identify Christophe Dumez's bugfix as the relevant change. Apparently the file backend didn't make optimal use of Berkeley DB transactions.
Still there is significant room for improvement, because:
- simple contact fetching with EBook 3.5 still takes 175% of the time Maemo 5 needs.
- EBookClient 3.5 is still 20% slower than EBook 3.5, and 64% slower than EDataBook.
This basic test already shows that the chosen client-server architecture of EDS causes non-negligible overhead.
It would be absolutely worth investigating how Maemo 5 reaches its far superior performance: after all, it even beats EDataBook. I remember spending significant time on avoiding vCard parsing and string copying. I also remember replacing the rather inefficient GList with GPtrArray in several places. Some of these ideas were ported upstream during Openismus' work for Intel. Apparently there are more gems to recover.
Fetching by UID
Fetching contacts by UID simulates lazy loading of contact lists: initially we only fetch contact IDs, and we fetch the full contacts only when the device becomes idle, or when contacts become visible on screen. This approach is needed because even the fastest implementation (Maemo 5) needs several seconds to fetch a contact list of realistic size on mobile hardware. Another useful optimization we implemented on the Nokia N9 is fetching partial contacts that contain only the relevant information, such as the person's name. EDS doesn't support this optimization.
As a first test we fetch contacts one-by-one, without using any kind of batch API:

EBook, EBookClient, EDataBook implementation
The good news is that this chart shows quite constant performance for each client.
The bad news is that contact fetching is pretty slow: 3.9 ms per contact, as seen with EDS 3.5, translates to roughly 390 ms to fetch only 100 contacts on this hardware. Considering that typical mobile devices are roughly 10 times slower than my notebook, these numbers are disappointing, especially if you consider that EDS 2.32 was about 4 times, and Maemo 5 even about 13 times, faster. These are entirely different worlds. It should be investigated what causes this significant performance regression from EDS 2.32 to EDS 3.2+. One should also try to port the performance fixes of Maemo 5.
The performance reachable under ideal circumstances is shown by the EDataBook client. It needs only about 50 µs (0.05 ms) to fetch one contact by its id, so directly accessing the address book via EDataBook is almost two orders of magnitude (about 78 times) faster than the current EBookClient. That's the goal that EDS can, and should, aim for. Apparently a significant amount of time is spent on D-Bus communication, whereas the backend can perform the requested task in almost no time.
However, this data was acquired by fetching the contacts one by one. We can surely do better with a batch API, which should drastically reduce the overhead caused, for instance, by D-Bus. But neither EBook nor EBookClient provides an API to fetch contacts by a list of contact IDs. The closest thing is filtering the contacts by a disjunction of UID field tests:
```
(or (is "id" "uid1") (is "id" "uid2") ...)
```
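Such a query is easy to generate. The following C++ sketch uses a hypothetical `uidDisjunction()` helper to build the S-expression from a batch of UIDs. It does no escaping, so it assumes UIDs without quotes or backslashes; composing the query with e_book_query_field_test() and e_book_query_or() from libebook avoids that problem in real code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds (or (is "id" "uid1") (is "id" "uid2") ...) for
// a batch of UIDs. "id" is the query field name EDS uses for the contact UID.
// No escaping is done, so UIDs must not contain '"' or '\'.
std::string uidDisjunction(const std::vector<std::string> &uids)
{
    std::string query = "(or";
    for (const auto &uid : uids)
        query += " (is \"id\" \"" + uid + "\")";
    query += ")";
    return query;
}
```

For example, `uidDisjunction({"uid1", "uid2"})` returns `(or (is "id" "uid1") (is "id" "uid2"))`. An empty UID list would produce `(or)`, which callers should avoid sending.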
So I tried that. The results for such queries, using batches of UIDs, look like this:

EBook, EBookClient, EDataBook implementation
This chart speaks for itself. To remain responsive and appear fluid while scrolling, applications should render at 60 frames per second. To reach that frame rate, newly visible contacts must be fetched and rendered in less than about 16 ms (1/60 s ≈ 16.7 ms). EDS apparently cannot meet that performance goal even on desktop computers. Considering the huge performance difference between client-server access and direct access seen when fetching contacts one by one, it seems very worthwhile to add dedicated support for fetching multiple contacts by UID. The most obvious approach would be adding a batch API in the spirit of e_book_client_add_contacts(). Another solution would be adding more fast paths to the query processing code.
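To put the 16 ms budget into perspective, here is a back-of-the-envelope calculation in C++ using the one-by-one fetch times measured above (about 3.9 ms per contact with EDS 3.5 and about 0.05 ms with EDataBook). It ignores rendering time, so the results are optimistic upper bounds; the function names are just for illustration.

```cpp
#include <cassert>

// Time available per frame at a given refresh rate, in milliseconds.
constexpr double frameBudgetMs(double framesPerSecond)
{
    return 1000.0 / framesPerSecond;
}

// Number of items that can be fetched within one frame at the given cost per
// item. Rendering time is ignored, so this is an optimistic upper bound.
constexpr int itemsPerFrame(double framesPerSecond, double msPerItem)
{
    return static_cast<int>(frameBudgetMs(framesPerSecond) / msPerItem);
}
```

At 60 fps that is about 4 contacts per frame with EDS 3.5 versus about 333 with EDataBook. On hardware roughly 10 times slower (about 39 ms per contact), not even a single contact fits into one frame.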
Filtering
EBook, EBookClient, EDataBook implementation, the queries used
Contact filtering is relatively efficient when using fields such as last-name, for which indices are hit. Still, the D-Bus overhead is noticeable: EDataBook needs less than 60% of EBook's or EBookClient's time.

The times to match long prefixes and suffixes look quite similar when hitting indices.
The behavior of EBook for short name prefixes is a bit strange. The EBook API is now deprecated, but it could still be worthwhile to identify the issue causing this strange behavior, so that it can be avoided in the future:

Interestingly, there seem to be no functional database indices for email addresses or phone numbers in more recent versions of EDS:

The behavior of Maemo 5's EDS is a bit surprising, as I know that Rob spent significant amounts of time on adding Berkeley DB based indices to that EDS version.
It might be worth optimizing index usage in EDS, because prefix and suffix searches are commonly used in mobile applications. Prefix searches need to be fast, for quick feedback during auto completion. Suffix searches need to be fast, for instance to provide instant caller identification.
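Why indices matter for these searches can be shown with a toy C++ sketch: in a sorted index, all keys sharing a prefix form one contiguous range that a binary search finds quickly, and a suffix search (e.g. matching the last digits of an incoming phone number) becomes a prefix search on a second index of reversed keys. This is a generic technique chosen for illustration, not a description of how EDS, Berkeley DB or SQLite implement their indices; the class `PrefixSuffixIndex` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy index: a sorted copy of the keys for prefix searches, plus a sorted
// copy of the reversed keys so suffix searches become prefix searches.
class PrefixSuffixIndex {
public:
    explicit PrefixSuffixIndex(std::vector<std::string> keys)
    {
        for (const auto &k : keys)
            reversed_.emplace_back(k.rbegin(), k.rend());
        forward_ = std::move(keys);
        std::sort(forward_.begin(), forward_.end());
        std::sort(reversed_.begin(), reversed_.end());
    }

    std::vector<std::string> withPrefix(const std::string &prefix) const
    {
        return range(forward_, prefix, false);
    }

    std::vector<std::string> withSuffix(const std::string &suffix) const
    {
        return range(reversed_, std::string(suffix.rbegin(), suffix.rend()), true);
    }

private:
    static std::vector<std::string> range(const std::vector<std::string> &sorted,
                                          const std::string &prefix, bool unreverse)
    {
        std::vector<std::string> result;
        // Binary search for the first candidate; all matches follow it
        // contiguously, so the scan stops at the first non-matching key.
        for (auto it = std::lower_bound(sorted.begin(), sorted.end(), prefix);
             it != sorted.end() && it->compare(0, prefix.size(), prefix) == 0; ++it)
            result.push_back(unreverse ? std::string(it->rbegin(), it->rend()) : *it);
        return result;
    }

    std::vector<std::string> forward_;
    std::vector<std::string> reversed_;
};
```

The price is a second index that must be updated on every save, which is exactly the trade-off between query speed and contact saving speed discussed in the conclusions.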
Memory Usage
Memory is cheap these days. Still, especially on embedded devices, you should keep a close eye on the memory consumption of your software. Obviously, memory consumption grows with the number of contacts used:

It's nice to see how memory consumption has reduced from release to release. It's also good to see that EBookClient seems to use slightly less memory than EBook.
You might miss the graphs for Maemo 5 and EDS 2.32. I had to drop them for this chart as they show serious memory leaks, preventing any useful examination. Apparently the leak really is in those versions of EDS: The EBook benchmarks for EDS 3.2+ are using exactly the same code but don't show this bad behavior.
Notice that I've accumulated the client's and the backend's memory consumption. This allows us to estimate the cost of EDS's client-server architecture. Apparently this architecture increases memory consumption by about 40% in these benchmarks.
While the RSS usage tells us about the efficiency and correctness of the code, it's also interesting to check the virtual memory size needed by the benchmarks. Please take the following numbers with a grain of salt: I got them by simply adding together the virtual memory sizes of the client and the backend process, as reported by each process's status file (/proc/<pid>/status). A proper measurement would process the maps file to correctly account for shared memory regions.
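For reference, here is a minimal C++ sketch of the simple measurement described above: it reads VmSize (or VmRSS) from /proc/<pid>/status on Linux and sums it over several processes, such as client and backend. The helper names are hypothetical, and, like the measurement itself, the sum counts shared pages once per process.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads an entry such as "VmSize:  123456 kB" from /proc/<pid>/status
// (Linux only; pass "self" for the calling process). Returns the value in
// KiB, or -1 if the file or the key is missing.
long statusFieldKiB(const std::string &pid, const std::string &key)
{
    std::ifstream status("/proc/" + pid + "/status");
    const std::string wanted = key + ":";
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, wanted.size(), wanted) != 0)
            continue;
        std::istringstream fields(line.substr(wanted.size()));
        long kib = -1;
        fields >> kib; // the trailing "kB" unit is left unread
        return kib;
    }
    return -1;
}

// Sums one field over several processes, e.g. benchmark client and EDS
// backend. Shared pages are counted once per process, so this overestimates
// the real footprint.
long summedFieldKiB(const std::vector<std::string> &pids, const std::string &key)
{
    long total = 0;
    for (const auto &pid : pids) {
        const long kib = statusFieldKiB(pid, key);
        if (kib < 0)
            return -1;
        total += kib;
    }
    return total;
}
```

Avoiding the double counting would require walking /proc/<pid>/smaps instead, which reports proportional set sizes per mapping.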

The first issue we notice is the massively increased memory usage of EBookClient 3.2. It's almost 40% more than the others. Fortunately, the issue seems to have been fixed already in EDS 3.4.
At first glance, the very low virtual memory usage of the EDataBook benchmark is impressive. It seems to consume only 40% of the client-server based benchmarks. Still, there is a fairly high chance that this huge delta must be attributed to my poor measurement here: Assuming perfect code segment sharing there only remains a delta of about 20 MiB, which would be nothing but the RSS delta of EDataBook and EBookClient. It would be nice to investigate this in more detail.

This chart shows the memory per contact after the contact saving benchmark. The overall memory usage per contact has grown dramatically by almost 40% in EDS 3+. The most efficient approach is apparently to directly access EDataBook, which consumes only 55% of the RSS per contact, compared to the client-server approaches.

This high memory usage per contact is a bit surprising since, after subtracting effects from library and database initialization, the memory usage per contact remained constant between EDS 2.32 and EDS 3.5. The parallel usage of both Berkeley DB and SQLite in the file backend might be to blame, but this is currently pure speculation from me.
The temporary regression in EDS 3.2 was apparently fixed. The increased memory usage of EBookClient and EDataBook over EBook is caused by the EBookClient and EDataBook benchmarks storing both the EContact and the vCard string for each contact, a slightly unrealistic bias toward performance.
Conclusions
The developers of Evolution Data Server have paid good attention to performance and have successfully implemented significant improvements. However, EDS releases regularly suffer performance regressions, and the performance of EDS still isn't good enough for usage in mobile devices. Fortunately the performance problems are solvable. Some fixes will be straightforward, such as adding more batch API (or fast paths) for query processing. Others will need careful performance monitoring: For instance when activating more database indices, to speed up queries, we must be careful not to slow down contact saving.
A less trivial improvement would be adding a direct access API for reading the local database. The speed and memory usage measurements show the value of such an API: direct access is significantly faster than access via D-Bus in most use cases, and it seems to significantly reduce memory usage.
Another significant improvement should be finishing the file backend's transition to SQLite: Using two database backends in parallel significantly increases code complexity and has measurably bad impact on memory consumption.
Usage Instructions
The full source code of this project is in our phonebooks-benchmarks repository on Gitorious. You'll need a fairly recent C++ compiler because I also used this project to get more familiar with the new features of C++11. I've successfully tested g++ 4.6.3 and g++ 4.7.0. Clang 3.0 definitely won't work because of incomplete lambda support.
Other than that, you'll need Boost 1.48 to compile the benchmarks. The optional chart-drawing module uses ImageMagick++ and MathGL 2.0.2.
There is a simple script for building and installing the tested EDS versions. The configure script will give instructions.
To finally run the benchmarks just call src/phonebook-benchmarks, and to draw the charts run src/phonebook-benchmarks-mkcharts.
If your own tests need a non-trivial vCard generator, take a look at src/phonebook-benchmarks-mkvcards.
Outlook
It would be interesting to take a more detailed look at the virtual memory usage.
Also it would be educational to compare these results with other address book implementations. The first candidates I have in mind are QtContacts on Tracker and Tizen's native contacts API.
We didn't cover EBookView and EBookClientView yet. These views take a query and notify the application when the contact set matching the query has changed. Typically, every user interface needs to use them.
We also didn't talk about the calendar API yet.
Well, and most importantly we at Openismus would enjoy fixing the identified performance problems.
Using Full Text Search Engines as Datastore
It's a common design to use full text search engines only for free text searches, but to store the actual structured data in a separate database. Such designs come at a cost. Therefore Openismus asked me to build upon my previous post, where I analyzed several FTS engines. This time I'll research whether we could use the full text search index itself as our primary data store.
Relations
A first obvious limitation is the lack of joins. So to use the FTS index as a data store, you must denormalize your data. That is, instead of storing your movie database in distinct entity tables like Movie and Artist, linked by relationship tables like isLeadActor or isDirector, you must find a way to put everything into one single flat table. This isn't entirely nice in terms of redundancy and consistency. On the other hand, joining tables is what makes relational databases slow and hinders distributing them across servers. Is someone whispering "NoSQL"? Well, yes. While I absolutely dislike their loud marketing, they are on to something, and with our journey today we enter their land.
Seems I got lost in chatting, so back on topic. To store data in an FTS index we must denormalize our data. Luckily, FTS engines make this easier than it sounds. Unlike in the relational model, there is no need to create complex relationships just to assign more than one actor or director to a movie: when adding artists to your movie, you just tag each name with the proper field prefix before adding it to the index, and you are done. FTS engines natively support multi-value fields!
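A toy C++ sketch of that idea: an in-memory inverted index where each actor, director or title is added as one unanalyzed term tagged with a field prefix, so multi-value fields and exact term queries come for free. The struct, class and prefixes (`A`, `D`, `T`) are made up for illustration; real engines such as Xapian or Lucene++ offer their own APIs for prefixed or per-field terms.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A denormalized movie document: every field may hold several values.
struct Movie {
    int id;
    std::string title;
    std::vector<std::string> actors;
    std::vector<std::string> directors;
};

// Toy inverted index: maps prefixed terms such as "A:Harrison Ford" to the
// ids of all movies carrying that term. Prefixes are made up for this sketch.
class MovieIndex {
public:
    void add(const Movie &m)
    {
        postings_["T:" + m.title].insert(m.id);
        for (const auto &a : m.actors)   // multi-value field: one term per value
            postings_["A:" + a].insert(m.id);
        for (const auto &d : m.directors)
            postings_["D:" + d].insert(m.id);
    }

    // Exact term query on one field, no join needed.
    std::set<int> find(const std::string &prefix, const std::string &value) const
    {
        const auto it = postings_.find(prefix + ":" + value);
        return it == postings_.end() ? std::set<int>{} : it->second;
    }

private:
    std::map<std::string, std::set<int>> postings_;
};
```

Adding "Blade Runner" (id 1) and "Witness" (id 2), both starring Harrison Ford, makes `find("A", "Harrison Ford")` return both ids, while the prefix keeps an actor name from matching the director field.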
With some additional effort it should also be possible to store more structured data in those multi-value fields, things like (release-date, country) or (actor, role): you'd add more prefixes and use the positional information stored for phrase searches to reliably identify those fields. Sadly my time is too limited to research this in more detail, but the Internet surely has documents about it. Or, for additional fun, you can try to figure it out yourself.
Exact Matches
You can just add unanalyzed fields and use term queries on them, as kamstrup pointed out.
Data Types
So we've learned that lack of relations isn't much of a problem for many useful datasets, but structured data is not only about relationships, it also is about data types. Full Text Search engines only support lexicographical order, so they surely fail for dates and numbers. You surely cannot use them to find documents within a given range!
I am sorry to disappoint you. The people researching FTS are smarter than that. Actually, properly sorting and ranging dates while only using lexicographic order is trivial, and most probably you have done it yourself already: simply store your dates in ISO format, that is YYYY-MM-DDThh:mm:ss.SSSNNN or any prefix of it, and you are done. Omit the separators if you prefer. ISO 8601 is explicitly designed for lexicographic sorting.
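A minimal C++ sketch of such a range query on ISO 8601 dates, relying only on string comparison. It assumes all values share one format (here YYYY-MM-DD) and one time zone; the `inRange()` helper is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns all dates within [from, to] (inclusive), using nothing but string
// comparison. Assumes every value uses the same ISO 8601 format (YYYY-MM-DD)
// and the same time zone, so lexicographic order equals chronological order.
std::vector<std::string> inRange(std::vector<std::string> dates,
                                 const std::string &from, const std::string &to)
{
    if (to < from)
        return {};
    std::sort(dates.begin(), dates.end());
    const auto first = std::lower_bound(dates.begin(), dates.end(), from);
    const auto last = std::upper_bound(dates.begin(), dates.end(), to);
    return std::vector<std::string>(first, last);
}
```

Querying `{"2001-12-19", "1999-03-31", "1999-10-15", "1982-06-25", "1999-01-01"}` for 1999-01-01 to 1999-09-30 yields 1999-01-01 and 1999-03-31; an FTS engine does the same on its sorted term dictionary.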
So how do you do this with numbers? You could pad them, for instance with leading zeros, to a fixed width. This works reasonably well if you know your number ranges, and in most cases you do. Sometimes you know the range from your application's context, e.g. the first known celluloid film was recorded in 1888. More simply, you just use your technical limits, like $[-2^{63}, 2^{63}-1]$ for long integers. While first experiments really followed that approach, padding numbers with up to 18 zeros isn't exactly efficient or pretty. Also, we haven't talked about floating point numbers yet. Therefore FTS engines like Lucene or Xapian provide more efficient mechanisms for turning numbers into sortable strings. First they write a prefix indicating number precision (64 bit, 32 bit, 10 bit, ...). Then they convert the numbers to some unsigned format and apply some kind of base-128 encoding to the resulting bytes, storing the most significant bits first. For floating point numbers they shuffle some bits of the number's IEEE-754 representation; the resulting sortable 64-bit integer is then encoded like any other number. You can consult Lucene's documentation and the source code of Lucene::NumericUtils, or Xapian::sortable_serialise, for details.
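The core trick can be sketched in a few lines of C++. This is a simplified illustration, not the exact byte format of Lucene's NumericUtils or Xapian::sortable_serialise: it flips the sign bit of signed integers, inverts negative IEEE-754 doubles, and writes the most significant byte first, but leaves out the precision prefix and base-128 step described above. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit IEEE-754 doubles");

namespace {
const std::uint64_t kSignBit = std::uint64_t{1} << 63;

// Writes the 64 bits most significant byte first (big-endian), so that
// byte-wise comparison matches unsigned integer comparison.
std::string bigEndianBytes(std::uint64_t bits)
{
    std::string out(8, '\0');
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<char>((bits >> (56 - 8 * i)) & 0xFFu);
    return out;
}
} // namespace

// Signed integer: flipping the sign bit maps INT64_MIN..INT64_MAX onto
// 0..UINT64_MAX while preserving order.
std::string sortableInt64(std::int64_t value)
{
    return bigEndianBytes(static_cast<std::uint64_t>(value) ^ kSignBit);
}

// IEEE-754 double: set the sign bit of non-negative values; invert all bits
// of negative values so that larger magnitudes sort first. NaN not handled.
std::string sortableDouble(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bigEndianBytes((bits & kSignBit) ? ~bits : (bits | kSignBit));
}
```

Because `std::string` compares characters as `unsigned char`, the encoded strings sort in numeric order, e.g. `sortableInt64(-5) < sortableInt64(3)` and `sortableDouble(-2.5) < sortableDouble(0.5)`, using only 8 bytes instead of up to 19 zero-padded digits.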
Benchmarks
Hope I didn't lose you with all this theory, now it is benchmark time!
To test how useful FTS engines are for storing arbitrary data, I've extended my previous benchmark to better support range searches and to support exact matching of fields. I've also added Michal Hruby's patch for supporting prefix searches. Since prefix searches produce countless hits, query results are now consistently limited to 10,000 rows. I've dropped QtCLucene for now, since it doesn't seem to support numeric range searches and the like: it was forked from Java Lucene a long time ago. For SQLite I ran two sets of tests: bm_sqlite doesn't create indices for fields like movie title or artist names. Since such a setup is unfair when comparing with FTS engines, the second set, bm_sqlite_index, creates indices for all fields we perform lookups on. For Tracker we again test the Nepomuk media ontology (bm_tracker) and an optimized ontology (bm_tracker_flat) that attaches all properties to the same RDF class. I had to disable prefix searches for bm_tracker: the query ran for more than 2 hours on the dataset with 17k movies. I seriously wish I'd get sponsored to improve Tracker's data model!
The source code is still in the fts-benchmark repository, tagged as release/0.3.
Results and Discussion
Each query was run 7 times on 5 different data sets. This time I didn't take the mean of the query execution times. The individual results for each dataset are grouped together and labeled qxx_t1 to qxx_t7. Data and result sets grow with each group.
Also be careful when reading the charts, as time is scaled logarithmically. You might want to consult the raw data tables below for details. Please keep in mind that the basic goal of these benchmarks is to test scalability, not raw performance. Therefore I don't mind much if an engine is 10 times slower than another for small data sets. Constant performance is the ideal result.
You'll also notice that some charts have gaps for bm_tracker. As explained above, I had to skip bm_tracker for a few data sets, as Tracker took way too long to perform those benchmarks.
![rating:[90 TO 99]](http://taschenorakel.de/files/fts-benchmark/ftsds1.png)
Lucene++ appears significantly slower than its competitors for small data sets, but gives comparable results for data sets with more than 3,000 movies. Still, I would not overrate this finding: we are talking about lookup times in the range of 10 ms. That's still pretty fast and close to measurement limits, as the spikes in the other engines' results show.
![release:[1999/01/01 TO 1999/09/30]](http://taschenorakel.de/files/fts-benchmark/ftsds2.png)
These results are similar to those of the rating:[90 TO 99] query.

For this query you can see the importance of having an index for your lookup keys: the performance of bm_lucene++ and bm_sqlite_index remains almost constant, while the effort of the other engines grows dramatically with the data size.
Xapian's bad performance comes as a surprise, but actually I am to blame here: for stupid reasons I've implemented this very search as a range search in Lucene++ and Xapian (release:[1999/03/31 TO 1999/03/31]). As the results indicate, Lucene++ seems to put more effort into optimizing range searches, and compensates for my mistake.
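
Both queries select exactly the same rows, so the difference has to come from how they are executed. One plausible explanation, stated here as an assumption and not verified against Xapian's or Lucene++'s internals, is that an exact term is looked up in a sorted index while a range is evaluated as a filter over every document's stored value. The sketch below contrasts those two strategies in plain C++; Movie, exactMatch() and rangeByScan() are hypothetical names, not either engine's API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical movie record; "YYYY/MM/DD" strings sort in date order.
struct Movie {
    int id;
    std::string release;
};

// Term-style exact match: one O(log n) probe into a sorted index,
// then a walk over the k matching entries.
std::vector<int> exactMatch(const std::multimap<std::string, int>& index,
                            const std::string& date) {
    std::vector<int> ids;
    const auto matches = index.equal_range(date);
    for (auto it = matches.first; it != matches.second; ++it)
        ids.push_back(it->second);
    return ids;
}

// Filter-style range match: checks every document's stored value, so the
// cost grows with the collection even when the range covers a single day.
std::vector<int> rangeByScan(const std::vector<Movie>& movies,
                             const std::string& from, const std::string& to) {
    std::vector<int> ids;
    for (const Movie& m : movies)
        if (from <= m.release && m.release <= to)
            ids.push_back(m.id);
    return ids;
}
```

Both functions return the same ids for a single-day range; only the scan's cost depends on the total number of movies, which would match the growth seen for Xapian in the release=1999/03/31 tables.
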

The results are similar to those for release=1999/03/31, except that Xapian now behaves as expected: when given a proper query, it also shows constant lookup time for exact phrase searches.

With this query you see the advantage of using denormalized tables: Lucene++ and Xapian are just as efficient as in the previous tests, but, not much of a surprise, Tracker with the flat ontology now beats all remaining engines, including bm_sqlite_index.
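
A toy model shows why the layout matters. This minimal sketch assumes, purely for illustration, that in the normalized case directors live in their own table connected to movies by a link table (roughly how a relational schema or a Nepomuk-style ontology relates them), while the flat layout stores the director directly on the movie and indexes it. All names are hypothetical; this is not code from the benchmark.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Normalized layout: people live in their own table and a link table
// (like a SQL join table) connects them to movies.
struct DirectorCredit {
    int movieId;
    int personId;
};

std::vector<int> moviesByDirectorNormalized(
        const std::unordered_map<std::string, int>& personIdByName,
        const std::vector<DirectorCredit>& credits, const std::string& name) {
    std::vector<int> ids;
    const auto person = personIdByName.find(name);
    if (person == personIdByName.end())
        return ids;
    // Second step of the "join": without an index on personId this scans
    // the whole link table.
    for (const DirectorCredit& c : credits)
        if (c.personId == person->second)
            ids.push_back(c.movieId);
    std::sort(ids.begin(), ids.end());  // stable output order
    return ids;
}

// Denormalized layout: the director's name is a property of the movie
// itself and indexed directly, so one hash lookup yields the matches.
std::vector<int> moviesByDirectorFlat(
        const std::unordered_multimap<std::string, int>& movieIdsByDirector,
        const std::string& name) {
    std::vector<int> ids;
    const auto matches = movieIdsByDirector.equal_range(name);
    for (auto it = matches.first; it != matches.second; ++it)
        ids.push_back(it->second);
    std::sort(ids.begin(), ids.end());  // stable output order
    return ids;
}
```

The flat variant trades extra storage (the name is repeated per movie) for a single lookup per query.
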

The different engines perform similarly when running prefix searches.
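
A prefix query such as T* expands to every indexed term starting with the prefix, which is why a one-letter prefix can match nearly every movie. A minimal sketch of that expansion over a sorted term dictionary, with a hit cap like the 10,000-row limit used in these benchmarks; TermIndex and prefixSearch() are hypothetical, plain C++ rather than any engine's API.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Sorted term dictionary: term -> posting list of document ids.
using TermIndex = std::map<std::string, std::vector<int>>;

// Expands `prefix` to every term that starts with it and collects the
// matching documents, stopping after `limit` distinct hits.
std::vector<int> prefixSearch(const TermIndex& index, const std::string& prefix,
                              std::size_t limit) {
    std::vector<int> hits;
    std::unordered_set<int> seen;  // a document may match several terms
    // All terms sharing the prefix form one contiguous run starting at
    // lower_bound(prefix), because the dictionary is sorted.
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        for (int doc : it->second) {
            if (hits.size() >= limit)
                return hits;
            if (seen.insert(doc).second)
                hits.push_back(doc);
        }
    }
    return hits;
}
```

Finding the start of the run is cheap; the cost is dominated by walking and merging the posting lists, so it grows with the number of hits until the cap kicks in.
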
Raw Result Data

rating:[90 TO 99] - 9 movies, 3 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.333 ms | 10.409 ms | 9.885 ms | 9.821 ms | 10.221 ms | 9.840 ms | 9.986 ms |
| bm_sqlite | 0.196 ms | 0.169 ms | 0.169 ms | 0.173 ms | 0.166 ms | 0.167 ms | 0.167 ms |
| bm_sqlite_index | 0.207 ms | 0.183 ms | 0.172 ms | 0.192 ms | 0.193 ms | 0.173 ms | 0.172 ms |
| bm_tracker | 0.992 ms | 0.655 ms | 0.582 ms | 0.589 ms | 0.554 ms | 0.549 ms | 0.525 ms |
| bm_tracker_flat | 0.693 ms | 0.463 ms | 0.437 ms | 0.461 ms | 0.450 ms | 0.443 ms | 0.436 ms |
| bm_xapian | 0.242 ms | 0.201 ms | 0.200 ms | 0.198 ms | 0.200 ms | 0.199 ms | 0.197 ms |

rating:[90 TO 99] - 1,099 movies, 17 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.949 ms | 13.057 ms | 12.981 ms | 13.018 ms | 13.150 ms | 12.840 ms | 12.644 ms |
| bm_sqlite | 0.696 ms | 0.546 ms | 0.516 ms | 0.530 ms | 0.515 ms | 0.518 ms | 0.522 ms |
| bm_sqlite_index | 0.448 ms | 0.234 ms | 0.231 ms | 0.237 ms | 0.236 ms | 0.231 ms | 0.231 ms |
| bm_tracker | 5.051 ms | 4.485 ms | 4.441 ms | 4.486 ms | 4.425 ms | 4.831 ms | 4.828 ms |
| bm_tracker_flat | 1.465 ms | 1.133 ms | 1.110 ms | 1.104 ms | 1.108 ms | 1.108 ms | 1.108 ms |
| bm_xapian | 1.445 ms | 1.285 ms | 1.159 ms | 7.824 ms | 1.878 ms | 1.669 ms | 1.393 ms |

rating:[90 TO 99] - 3,216 movies, 35 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 14.287 ms | 13.596 ms | 13.453 ms | 13.912 ms | 13.875 ms | 14.559 ms | 13.981 ms |
| bm_sqlite | 3.524 ms | 4.110 ms | 4.129 ms | 1.916 ms | 1.732 ms | 2.300 ms | 9.584 ms |
| bm_sqlite_index | 0.423 ms | 2.036 ms | 4.617 ms | 4.577 ms | 0.388 ms | 1.957 ms | 7.981 ms |
| bm_tracker | 12.776 ms | 11.816 ms | 12.449 ms | 11.755 ms | 11.762 ms | 11.983 ms | 11.764 ms |
| bm_tracker_flat | 2.935 ms | 2.517 ms | 2.374 ms | 2.264 ms | 2.250 ms | 2.261 ms | 2.258 ms |
| bm_xapian | 9.292 ms | 2.702 ms | 10.573 ms | 6.773 ms | 3.098 ms | 11.438 ms | 3.035 ms |

rating:[90 TO 99] - 17,251 movies, 260 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 58.996 ms | 56.894 ms | 62.172 ms | 57.028 ms | 57.255 ms | 57.540 ms | 57.259 ms |
| bm_sqlite | 36.682 ms | 28.260 ms | 34.116 ms | 34.786 ms | 35.195 ms | 35.813 ms | 35.221 ms |
| bm_sqlite_index | 45.802 ms | 62.460 ms | 31.603 ms | 32.982 ms | 33.302 ms | 31.904 ms | 31.656 ms |
| bm_tracker | 67.022 ms | 64.609 ms | 64.649 ms | 65.243 ms | 64.183 ms | 64.887 ms | 64.283 ms |
| bm_tracker_flat | 14.730 ms | 14.179 ms | 14.132 ms | 14.221 ms | 14.248 ms | 20.225 ms | 35.888 ms |
| bm_xapian | 94.872 ms | 47.067 ms | 85.202 ms | 28.575 ms | 142.854 ms | 48.562 ms | 52.567 ms |

rating:[90 TO 99] - 121,587 movies, 1,510 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 283.122 ms | 392.801 ms | 382.164 ms | 403.929 ms | 384.512 ms | 408.292 ms | 361.548 ms |
| bm_sqlite | 293.488 ms | 236.636 ms | 249.677 ms | 232.674 ms | 270.198 ms | 282.806 ms | 218.726 ms |
| bm_sqlite_index | 231.638 ms | 311.523 ms | 198.781 ms | 279.063 ms | 219.294 ms | 192.589 ms | 276.822 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 181.478 ms | 272.453 ms | 251.730 ms | 256.744 ms | 293.067 ms | 230.615 ms | 245.113 ms |
| bm_xapian | 376.176 ms | 417.637 ms | 411.263 ms | 366.596 ms | 393.168 ms | 372.888 ms | 412.411 ms |

release:[1999/01/01 TO 1999/09/30] - 9 movies, 2 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 18.768 ms | 10.167 ms | 10.799 ms | 10.215 ms | 10.443 ms | 10.917 ms | 10.210 ms |
| bm_sqlite | 0.165 ms | 0.166 ms | 0.164 ms | 0.164 ms | 0.168 ms | 0.164 ms | 0.164 ms |
| bm_sqlite_index | 0.175 ms | 0.175 ms | 0.170 ms | 0.169 ms | 0.169 ms | 0.169 ms | 0.170 ms |
| bm_tracker | 1.074 ms | 0.569 ms | 0.546 ms | 0.561 ms | 0.544 ms | 0.549 ms | 0.546 ms |
| bm_tracker_flat | 0.877 ms | 0.480 ms | 0.460 ms | 0.458 ms | 0.461 ms | 0.458 ms | 0.456 ms |
| bm_xapian | 0.183 ms | 0.175 ms | 0.175 ms | 0.178 ms | 0.178 ms | 0.180 ms | 0.175 ms |

release:[1999/01/01 TO 1999/09/30] - 1,099 movies, 34 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 19.154 ms | 19.449 ms | 18.811 ms | 19.419 ms | 19.692 ms | 19.315 ms | 18.862 ms |
| bm_sqlite | 0.691 ms | 0.686 ms | 0.684 ms | 0.687 ms | 0.690 ms | 0.702 ms | 0.698 ms |
| bm_sqlite_index | 0.365 ms | 0.311 ms | 0.317 ms | 0.312 ms | 0.311 ms | 0.312 ms | 0.313 ms |
| bm_tracker | 6.231 ms | 5.543 ms | 5.734 ms | 5.522 ms | 5.663 ms | 5.538 ms | 5.465 ms |
| bm_tracker_flat | 1.998 ms | 1.494 ms | 1.466 ms | 1.469 ms | 1.470 ms | 1.454 ms | 1.469 ms |
| bm_xapian | 5.336 ms | 1.590 ms | 7.241 ms | 1.977 ms | 2.651 ms | 4.013 ms | 2.544 ms |

release:[1999/01/01 TO 1999/09/30] - 3,216 movies, 84 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 32.202 ms | 31.513 ms | 31.362 ms | 30.894 ms | 31.345 ms | 31.741 ms | 31.518 ms |
| bm_sqlite | 6.169 ms | 2.645 ms | 7.560 ms | 20.764 ms | 10.385 ms | 13.278 ms | 10.206 ms |
| bm_sqlite_index | 19.176 ms | 4.358 ms | 12.576 ms | 15.448 ms | 15.745 ms | 5.572 ms | 5.770 ms |
| bm_tracker | 15.507 ms | 14.803 ms | 13.629 ms | 15.465 ms | 13.930 ms | 14.515 ms | 13.652 ms |
| bm_tracker_flat | 3.956 ms | 3.488 ms | 3.183 ms | 3.176 ms | 3.213 ms | 3.193 ms | 3.157 ms |
| bm_xapian | 18.414 ms | 5.874 ms | 11.902 ms | 12.932 ms | 19.995 ms | 21.098 ms | 13.009 ms |

release:[1999/01/01 TO 1999/09/30] - 17,251 movies, 374 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 93.892 ms | 93.900 ms | 93.549 ms | 93.555 ms | 93.924 ms | 94.396 ms | 93.795 ms |
| bm_sqlite | 37.831 ms | 44.905 ms | 47.617 ms | 45.894 ms | 43.796 ms | 45.752 ms | 47.048 ms |
| bm_sqlite_index | 48.475 ms | 47.805 ms | 43.046 ms | 47.393 ms | 44.689 ms | 47.842 ms | 54.208 ms |
| bm_tracker | 72.507 ms | 72.667 ms | 72.233 ms | 73.570 ms | 72.997 ms | 72.991 ms | 72.527 ms |
| bm_tracker_flat | 29.351 ms | 48.892 ms | 55.351 ms | 49.793 ms | 88.375 ms | 55.393 ms | 45.917 ms |
| bm_xapian | 59.522 ms | 168.591 ms | 55.750 ms | 83.424 ms | 113.679 ms | 62.803 ms | 127.895 ms |

release:[1999/01/01 TO 1999/09/30] - 121,587 movies, 2,265 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 543.495 ms | 564.582 ms | 609.045 ms | 519.248 ms | 561.844 ms | 663.549 ms | 590.518 ms |
| bm_sqlite | 165.617 ms | 387.256 ms | 293.285 ms | 335.219 ms | 324.528 ms | 324.022 ms | 371.839 ms |
| bm_sqlite_index | 375.504 ms | 315.671 ms | 321.115 ms | 371.228 ms | 300.951 ms | 344.073 ms | 356.366 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 241.569 ms | 316.626 ms | 398.308 ms | 349.426 ms | 398.289 ms | 318.078 ms | 363.809 ms |
| bm_xapian | 529.377 ms | 556.989 ms | 577.643 ms | 576.194 ms | 626.388 ms | 545.251 ms | 570.695 ms |

release=1999/03/31 - 9 movies, 1 match

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.065 ms | 10.068 ms | 9.702 ms | 9.974 ms | 9.837 ms | 9.751 ms | 10.356 ms |
| bm_sqlite | 0.164 ms | 0.165 ms | 0.171 ms | 0.168 ms | 0.167 ms | 0.164 ms | 0.162 ms |
| bm_sqlite_index | 0.171 ms | 0.169 ms | 0.171 ms | 0.172 ms | 0.175 ms | 0.165 ms | 0.164 ms |
| bm_tracker | 0.659 ms | 0.476 ms | 0.473 ms | 0.469 ms | 0.464 ms | 0.468 ms | 0.468 ms |
| bm_tracker_flat | 0.510 ms | 0.395 ms | 0.385 ms | 0.384 ms | 0.389 ms | 0.383 ms | 0.389 ms |
| bm_xapian | 0.154 ms | 0.152 ms | 0.151 ms | 0.153 ms | 0.152 ms | 0.156 ms | 0.152 ms |

release=1999/03/31 - 1,099 movies, 2 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.853 ms | 10.545 ms | 10.718 ms | 10.390 ms | 10.521 ms | 10.754 ms | 10.661 ms |
| bm_sqlite | 0.515 ms | 0.528 ms | 0.505 ms | 0.512 ms | 0.502 ms | 0.507 ms | 0.505 ms |
| bm_sqlite_index | 3.139 ms | 0.184 ms | 0.175 ms | 3.440 ms | 0.183 ms | 0.212 ms | 0.205 ms |
| bm_tracker | 4.559 ms | 4.229 ms | 4.177 ms | 4.220 ms | 4.383 ms | 4.532 ms | 4.464 ms |
| bm_tracker_flat | 0.977 ms | 0.830 ms | 0.800 ms | 0.808 ms | 0.802 ms | 0.811 ms | 0.802 ms |
| bm_xapian | 0.672 ms | 0.685 ms | 0.774 ms | 0.752 ms | 0.916 ms | 1.285 ms | 0.663 ms |

release=1999/03/31 - 3,216 movies, 2 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.799 ms | 10.762 ms | 11.399 ms | 10.676 ms | 10.704 ms | 10.169 ms | 10.325 ms |
| bm_sqlite | 1.912 ms | 1.462 ms | 1.453 ms | 1.163 ms | 1.151 ms | 1.157 ms | 4.858 ms |
| bm_sqlite_index | 0.366 ms | 0.350 ms | 0.355 ms | 1.883 ms | 0.364 ms | 0.345 ms | 0.371 ms |
| bm_tracker | 11.707 ms | 11.548 ms | 11.433 ms | 11.425 ms | 11.465 ms | 11.450 ms | 11.912 ms |
| bm_tracker_flat | 1.661 ms | 1.511 ms | 1.513 ms | 1.714 ms | 1.507 ms | 1.612 ms | 1.510 ms |
| bm_xapian | 1.278 ms | 1.364 ms | 1.359 ms | 1.821 ms | 1.994 ms | 1.429 ms | 3.192 ms |

release=1999/03/31 - 17,251 movies, 3 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.485 ms | 12.281 ms | 12.323 ms | 11.981 ms | 12.137 ms | 11.808 ms | 12.552 ms |
| bm_sqlite | 8.247 ms | 6.259 ms | 6.007 ms | 6.300 ms | 6.125 ms | 5.958 ms | 5.921 ms |
| bm_sqlite_index | 0.379 ms | 0.297 ms | 0.285 ms | 0.284 ms | 0.252 ms | 0.254 ms | 0.251 ms |
| bm_tracker | 61.537 ms | 60.815 ms | 61.014 ms | 60.821 ms | 61.013 ms | 60.850 ms | 60.820 ms |
| bm_tracker_flat | 11.063 ms | 8.021 ms | 8.414 ms | 8.690 ms | 7.798 ms | 7.811 ms | 8.313 ms |
| bm_xapian | 5.545 ms | 4.561 ms | 4.956 ms | 4.388 ms | 4.321 ms | 4.687 ms | 4.396 ms |

release=1999/03/31 - 121,587 movies, 12 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 14.005 ms | 14.031 ms | 12.792 ms | 14.354 ms | 12.736 ms | 13.862 ms | 13.374 ms |
| bm_sqlite | 64.517 ms | 61.783 ms | 61.669 ms | 62.418 ms | 61.377 ms | 61.326 ms | 62.036 ms |
| bm_sqlite_index | 9.994 ms | 0.403 ms | 0.358 ms | 0.351 ms | 0.368 ms | 0.363 ms | 3.368 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 62.160 ms | 62.760 ms | 56.630 ms | 60.929 ms | 54.310 ms | 53.189 ms | 58.016 ms |
| bm_xapian | 29.180 ms | 28.239 ms | 28.080 ms | 28.054 ms | 27.777 ms | 27.615 ms | 27.505 ms |

title=The Matrix - 9 movies, 1 match

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 9.248 ms | 8.929 ms | 9.139 ms | 9.455 ms | 9.609 ms | 9.128 ms | 9.110 ms |
| bm_sqlite | 0.163 ms | 0.163 ms | 0.163 ms | 0.161 ms | 0.160 ms | 0.163 ms | 0.164 ms |
| bm_sqlite_index | 0.167 ms | 0.165 ms | 0.178 ms | 0.164 ms | 0.164 ms | 0.163 ms | 0.165 ms |
| bm_tracker | 0.733 ms | 0.484 ms | 0.475 ms | 0.478 ms | 0.481 ms | 0.475 ms | 0.476 ms |
| bm_tracker_flat | 0.575 ms | 0.400 ms | 0.380 ms | 0.382 ms | 0.379 ms | 0.387 ms | 0.379 ms |
| bm_xapian | 0.226 ms | 0.197 ms | 0.194 ms | 0.191 ms | 0.191 ms | 0.194 ms | 0.190 ms |

title=The Matrix - 1,099 movies, 1 match

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.758 ms | 10.578 ms | 10.083 ms | 10.230 ms | 10.555 ms | 10.630 ms | 10.831 ms |
| bm_sqlite | 0.728 ms | 0.524 ms | 0.504 ms | 0.501 ms | 0.506 ms | 0.500 ms | 0.501 ms |
| bm_sqlite_index | 0.218 ms | 0.203 ms | 0.201 ms | 0.198 ms | 0.199 ms | 0.277 ms | 0.233 ms |
| bm_tracker | 5.906 ms | 5.409 ms | 5.426 ms | 5.453 ms | 5.420 ms | 5.410 ms | 5.344 ms |
| bm_tracker_flat | 1.685 ms | 1.471 ms | 1.455 ms | 1.455 ms | 1.440 ms | 1.448 ms | 1.439 ms |
| bm_xapian | 0.445 ms | 0.385 ms | 0.398 ms | 0.373 ms | 0.836 ms | 0.451 ms | 0.374 ms |

title=The Matrix - 3,216 movies, 1 match

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 10.138 ms | 10.144 ms | 10.652 ms | 10.124 ms | 10.169 ms | 10.070 ms | 10.547 ms |
| bm_sqlite | 2.587 ms | 1.180 ms | 1.198 ms | 2.202 ms | 1.411 ms | 1.422 ms | 1.288 ms |
| bm_sqlite_index | 0.323 ms | 0.300 ms | 0.306 ms | 0.298 ms | 0.493 ms | 0.304 ms | 0.304 ms |
| bm_tracker | 15.097 ms | 14.727 ms | 14.692 ms | 14.759 ms | 14.840 ms | 14.888 ms | 14.791 ms |
| bm_tracker_flat | 3.727 ms | 3.529 ms | 3.558 ms | 3.545 ms | 3.504 ms | 3.504 ms | 3.520 ms |
| bm_xapian | 0.432 ms | 0.353 ms | 0.345 ms | 0.349 ms | 0.348 ms | 0.342 ms | 0.692 ms |

title=The Matrix - 17,251 movies, 1 match

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.462 ms | 11.871 ms | 12.020 ms | 11.603 ms | 12.469 ms | 11.850 ms | 11.823 ms |
| bm_sqlite | 6.093 ms | 6.096 ms | 6.130 ms | 5.941 ms | 5.882 ms | 5.959 ms | 6.789 ms |
| bm_sqlite_index | 1.431 ms | 0.304 ms | 0.201 ms | 0.200 ms | 0.201 ms | 0.199 ms | 0.199 ms |
| bm_tracker | 79.019 ms | 78.831 ms | 78.514 ms | 78.491 ms | 79.423 ms | 78.506 ms | 78.759 ms |
| bm_tracker_flat | 19.173 ms | 20.160 ms | 19.373 ms | 19.043 ms | 18.992 ms | 18.961 ms | 19.207 ms |
| bm_xapian | 0.422 ms | 0.344 ms | 0.339 ms | 0.335 ms | 0.336 ms | 0.339 ms | 0.345 ms |

title=The Matrix - 121,587 movies, 1 match

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 13.367 ms | 13.395 ms | 12.906 ms | 13.164 ms | 12.856 ms | 13.348 ms | 12.862 ms |
| bm_sqlite | 62.625 ms | 61.341 ms | 61.296 ms | 61.361 ms | 61.248 ms | 61.195 ms | 61.607 ms |
| bm_sqlite_index | 0.328 ms | 0.312 ms | 0.300 ms | 0.303 ms | 0.301 ms | 7.473 ms | 0.330 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 138.148 ms | 131.762 ms | 130.937 ms | 131.431 ms | 131.471 ms | 130.975 ms | 130.770 ms |
| bm_xapian | 0.833 ms | 0.681 ms | 0.674 ms | 0.687 ms | 0.665 ms | 0.667 ms | 0.665 ms |

director=Quentin Tarantino - 9 movies, 1 match

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 9.112 ms | 9.540 ms | 9.671 ms | 9.258 ms | 9.510 ms | 9.597 ms | 9.126 ms |
| bm_sqlite | 0.273 ms | 0.243 ms | 0.243 ms | 0.241 ms | 0.239 ms | 0.239 ms | 0.239 ms |
| bm_sqlite_index | 0.282 ms | 0.243 ms | 0.257 ms | 0.244 ms | 0.245 ms | 0.243 ms | 0.337 ms |
| bm_tracker | 0.810 ms | 0.547 ms | 0.542 ms | 0.544 ms | 0.541 ms | 0.554 ms | 0.567 ms |
| bm_tracker_flat | 0.606 ms | 0.410 ms | 0.398 ms | 0.403 ms | 0.383 ms | 0.459 ms | 0.392 ms |
| bm_xapian | 0.215 ms | 0.204 ms | 0.195 ms | 0.197 ms | 0.195 ms | 0.208 ms | 0.194 ms |

director=Quentin Tarantino - 1,099 movies, 9 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 11.574 ms | 12.063 ms | 11.780 ms | 12.169 ms | 12.253 ms | 11.801 ms | 11.939 ms |
| bm_sqlite | 13.775 ms | 8.831 ms | 9.583 ms | 9.506 ms | 9.193 ms | 9.154 ms | 9.452 ms |
| bm_sqlite_index | 13.332 ms | 8.963 ms | 10.201 ms | 9.064 ms | 8.925 ms | 10.095 ms | 8.756 ms |
| bm_tracker | 5.173 ms | 4.644 ms | 4.546 ms | 4.473 ms | 4.552 ms | 4.472 ms | 4.455 ms |
| bm_tracker_flat | 1.137 ms | 0.857 ms | 0.851 ms | 0.855 ms | 0.844 ms | 0.842 ms | 0.844 ms |
| bm_xapian | 0.898 ms | 0.878 ms | 0.893 ms | 0.873 ms | 1.000 ms | 0.882 ms | 0.843 ms |

director=Quentin Tarantino - 3,216 movies, 10 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 12.343 ms | 12.175 ms | 12.307 ms | 12.004 ms | 12.235 ms | 12.947 ms | 12.194 ms |
| bm_sqlite | 40.967 ms | 37.867 ms | 38.607 ms | 37.618 ms | 37.487 ms | 37.124 ms | 38.147 ms |
| bm_sqlite_index | 43.470 ms | 36.820 ms | 37.027 ms | 36.779 ms | 36.957 ms | 36.585 ms | 36.782 ms |
| bm_tracker | 13.707 ms | 13.074 ms | 12.763 ms | 12.740 ms | 12.848 ms | 12.779 ms | 12.855 ms |
| bm_tracker_flat | 2.015 ms | 1.559 ms | 1.531 ms | 1.525 ms | 1.530 ms | 1.545 ms | 1.511 ms |
| bm_xapian | 0.933 ms | 0.886 ms | 0.908 ms | 2.944 ms | 1.023 ms | 1.030 ms | 0.799 ms |

director=Quentin Tarantino - 17,251 movies, 13 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 13.704 ms | 14.413 ms | 14.331 ms | 15.096 ms | 14.026 ms | 14.492 ms | 14.205 ms |
| bm_sqlite | 307.961 ms | 308.146 ms | 308.565 ms | 307.942 ms | 308.342 ms | 308.387 ms | 308.991 ms |
| bm_sqlite_index | 308.011 ms | 305.433 ms | 305.347 ms | 304.567 ms | 304.920 ms | 305.567 ms | 304.404 ms |
| bm_tracker | 72.690 ms | 72.075 ms | 72.005 ms | 71.999 ms | 71.938 ms | 71.946 ms | 72.108 ms |
| bm_tracker_flat | 7.489 ms | 6.996 ms | 6.877 ms | 6.987 ms | 7.148 ms | 7.088 ms | 7.021 ms |
| bm_xapian | 1.087 ms | 0.963 ms | 1.010 ms | 1.151 ms | 1.088 ms | 0.965 ms | 0.959 ms |

director=Quentin Tarantino - 121,587 movies, 14 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 13.546 ms | 13.955 ms | 13.981 ms | 13.854 ms | 13.740 ms | 14.114 ms | 15.816 ms |
| bm_sqlite | 4,752.853 ms | 2,793.690 ms | 2,800.197 ms | 2,795.611 ms | 2,800.578 ms | 2,794.765 ms | 2,801.000 ms |
| bm_sqlite_index | 2,806.890 ms | 2,789.648 ms | 2,788.729 ms | 2,791.168 ms | 2,788.102 ms | 2,790.845 ms | 2,789.475 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 47.801 ms | 46.303 ms | 46.701 ms | 46.640 ms | 46.467 ms | 46.862 ms | 46.448 ms |
| bm_xapian | 20.098 ms | 1.260 ms | 1.176 ms | 1.162 ms | 1.156 ms | 1.149 ms | 1.148 ms |

T* - 9 movies, 9 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 17.303 ms | 17.072 ms | 16.927 ms | 16.539 ms | 16.816 ms | 16.758 ms | 16.797 ms |
| bm_sqlite | 0.547 ms | 0.544 ms | 0.547 ms | 0.541 ms | 0.541 ms | 0.546 ms | 0.544 ms |
| bm_sqlite_index | 0.553 ms | 0.549 ms | 0.554 ms | 0.553 ms | 0.658 ms | 0.547 ms | 0.544 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 2.525 ms | 2.302 ms | 2.423 ms | 2.415 ms | 2.372 ms | 2.356 ms | 2.305 ms |
| bm_xapian | 3.086 ms | 2.871 ms | 2.947 ms | 2.893 ms | 3.104 ms | 3.022 ms | 3.126 ms |

T* - 1,099 movies, 1,098 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 358.775 ms | 355.830 ms | 350.287 ms | 349.816 ms | 347.998 ms | 356.585 ms | 347.143 ms |
| bm_sqlite | 64.679 ms | 142.927 ms | 143.776 ms | 142.847 ms | 145.319 ms | 147.244 ms | 135.600 ms |
| bm_sqlite_index | 62.383 ms | 151.941 ms | 144.456 ms | 144.108 ms | 141.330 ms | 173.728 ms | 169.799 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 199.108 ms | 213.355 ms | 202.793 ms | 196.659 ms | 194.937 ms | 194.708 ms | 195.267 ms |
| bm_xapian | 419.323 ms | 516.929 ms | 677.357 ms | 591.280 ms | 599.091 ms | 643.124 ms | 497.649 ms |

T* - 3,216 movies, 3,204 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 842.413 ms | 968.828 ms | 958.367 ms | 1,002.383 ms | 932.222 ms | 946.388 ms | 1,004.821 ms |
| bm_sqlite | 327.669 ms | 415.921 ms | 440.198 ms | 408.543 ms | 432.575 ms | 537.572 ms | 412.061 ms |
| bm_sqlite_index | 310.218 ms | 432.201 ms | 413.221 ms | 404.165 ms | 479.691 ms | 431.758 ms | 436.533 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 727.867 ms | 711.970 ms | 722.046 ms | 717.685 ms | 719.927 ms | 713.077 ms | 713.843 ms |
| bm_xapian | 1,442.238 ms | 1,470.821 ms | 1,415.183 ms | 1,392.164 ms | 1,437.493 ms | 1,464.149 ms | 1,520.747 ms |

T* - 17,251 movies, ≥ 10,000 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 3,006.139 ms | 3,127.174 ms | 3,136.617 ms | 3,151.197 ms | 3,131.469 ms | 3,141.155 ms | 3,056.497 ms |
| bm_sqlite | 1,481.321 ms | 1,388.573 ms | 1,468.062 ms | 1,533.263 ms | 1,422.012 ms | 1,442.638 ms | 1,456.166 ms |
| bm_sqlite_index | 1,346.717 ms | 1,451.410 ms | 1,508.228 ms | 1,411.643 ms | 1,460.563 ms | 1,514.390 ms | 1,391.342 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 2,945.536 ms | 2,938.230 ms | 2,957.149 ms | 2,959.569 ms | 2,972.291 ms | 2,933.668 ms | 2,936.655 ms |
| bm_xapian | 3,391.825 ms | 3,490.307 ms | 3,474.203 ms | 3,483.310 ms | 3,560.886 ms | 3,505.060 ms | 3,398.937 ms |

T* - 121,587 movies, ≥ 10,000 matches

| Engine | t1 | t2 | t3 | t4 | t5 | t6 | t7 |
|---|---|---|---|---|---|---|---|
| bm_lucene++ | 3,627.408 ms | 3,625.588 ms | 3,546.610 ms | 3,508.233 ms | 3,599.160 ms | 4,597.857 ms | 4,101.686 ms |
| bm_sqlite | 2,182.548 ms | 2,109.730 ms | 2,109.812 ms | 2,121.573 ms | 2,104.320 ms | 2,117.912 ms | 2,145.342 ms |
| bm_sqlite_index | 2,108.863 ms | 2,103.648 ms | 2,131.009 ms | 2,132.823 ms | 2,109.655 ms | 2,137.286 ms | 2,106.779 ms |
| bm_tracker | - | - | - | - | - | - | - |
| bm_tracker_flat | 8,757.130 ms | 9,316.640 ms | 8,708.298 ms | 8,781.584 ms | 8,788.042 ms | 8,699.770 ms | 8,721.099 ms |
| bm_xapian | 4,805.474 ms | 4,528.004 ms | 4,692.763 ms | 4,640.065 ms | 4,618.215 ms | 4,647.170 ms | 4,674.588 ms |

Full Text Search Engines, Part I
Openismus asked me to research how best to index media files and provide full text searching. For the last two years, I have used Tracker for this kind of thing. I like Tracker, but I want to avoid being biased. Therefore, I decided to evaluate alternatives.
Performance is an obvious requirement. We also want to provide a library to permit other applications to access the data we collected. Therefore, SQLite and Lucene (in its C++ incarnations) are obvious contenders. Lucene++ is an emerging project that was suggested by Mikkel Kamstrup Erlandsen at Canonical. QtCLucene is a bit special: so far Qt doesn't provide official support for this library and doesn't install its header files. Still, it is used by Qt's help system, which makes QtCLucene a widely deployed and well-tested C++ implementation of Lucene.
Sadly, the big names like MySQL or PostgreSQL do not fit: MySQL's embedded server library is licensed under GPL (instead of LGPL, for instance), which greatly limits legal use cases. PostgreSQL doesn't provide any embedding at all. Because I enjoy RDF and SPARQL I also wondered about testing the Redland RDF libraries, but I found that they don't provide any full text search at all.
Contenders
- Tracker 0.14.0-2ubuntu1
- SQLite 3.7.9-2ubuntu1
- Lucene++ 3.0.3.4 (e28b15b02ff9de2208965e9af8eb80983380cdcd)
- QtCLucene as provided by libqtcore4 4.8.1-0ubuntu4
- Xapian 1.2.8-1
Test Platform
- Ubuntu 12.04
- Intel Core 2 Duo P8400 (2.26GHz), 4 GiB RAM
- HDD: WDC WD2500BEVT-2, encrypted (aes-cbc-essiv:sha256)
Test Scenario
To get somewhat realistic data, I fetched a copy of the Internet Movie Database from ftp.fu-berlin.de. Since it is quite a huge database (about 1 GiB when compressed with gzip), I extracted a few subsets of it: all movies with at least 500,000, 50,000, 15,000, 1,000 and 50 user votes. This data then got imported into fresh instances of Tracker, SQLite, Lucene++ and QtCLucene. After that, I ran a few trivial full text searches:
- "The Matrix"
- Fast Furious
- "Star Trek" OR "Star Wars"
- Lord Rings King
- Keanu Reeves
- "Brad Pitt" OR "Bruce Willis"
- Jackson Samuel
- Quentin Tarantino
- Wachowski
- Thomas Neo Anderson
- Neo
Each scenario was repeated five times. To avoid cache effects, the engines were tested one after another for each set of parameters. Tracker was tested in two different scenarios: first I tried the Nepomuk-based multimedia ontology shipped with Tracker (nmm); after that I also tried a flattened ontology (fmm), which is a much better fit for the data model of pure full text search indices like Lucene. All engines were used with default parameters. No magic configuration options or pragmas were applied. Feel free to repeat the tests with your own optimized settings, and report the results when doing so.
Source Code and Data
The source code of these benchmarks can be found at Gitorious, and can be built using autotools or qmake, whichever you prefer.
Run src/benchmark.sh to reproduce the tests. The log files can be turned into a CSV file by running src/report.sh.
The charts have been created with LibreOffice: it should be sufficient to copy the CSV data into the data sheet of logs/report.ods. Select "English (USA)" as the language in the import dialog to ensure that numbers are recognized properly. After that you might still have to sort the rows by the columns suite, num_movies and experiment. The data sorting dialog provides an option for marking the first row as column header.
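
If you would rather pre-sort the data before pasting it, the required order is a plain lexicographic sort on three keys, with num_movies compared as a number. A minimal sketch, assuming the rows have already been parsed from the CSV; ReportRow and sortReport() are hypothetical, and the other CSV columns are deliberately left out because their layout isn't described here.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// One row of the report CSV, reduced to the three sort columns
// suite, num_movies and experiment.
struct ReportRow {
    std::string suite;
    int numMovies;
    std::string experiment;
};

// Sorts by suite, then num_movies (numerically, so 1099 < 121587),
// then experiment.
void sortReport(std::vector<ReportRow>& rows) {
    std::sort(rows.begin(), rows.end(),
              [](const ReportRow& a, const ReportRow& b) {
                  return std::tie(a.suite, a.numMovies, a.experiment) <
                         std::tie(b.suite, b.numMovies, b.experiment);
              });
}
```
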
Update: I've pushed some more changes, so to exactly reproduce the results discussed in this post, check out the tag releases/0.1 for the initial results, and releases/0.2 to also include the Xapian tests.
Results

| Movies | Lucene++ | QtCLucene | SQLite | Tracker (Nepomuk) | Tracker (Flat) | Xapian |
|---|---|---|---|---|---|---|
| 9 | 6.84 ms | 3.46 ms | 43.2 ms¹⁾ | 36.2 ms | 7.13 ms | 52.561 ms¹⁾ |
| 1,099 | 2.93 ms | 5.72 ms | 3.63 ms | 26.4 ms | 3.32 ms | 5.94 ms |
| 3,216 | 2.32 ms | 5.37 ms | 2.87 ms | 21.2 ms | 2.89 ms | 4.97 ms |
| 17,251 | 1.98 ms | 5.10 ms | 2.50 ms | 14.2 ms | 2.19 ms | 3.58 ms |
| 121,587 | 1.21 ms | 5.21 ms | 3.96 ms²⁾ | 10.4 ms | 1.80 ms | 2.30 ms |

¹⁾ The dataset is tiny. I suspect that some startup overhead invalidates this result.

²⁾ We might see first signs of a memory barrier here.

| Movies | Lucene++ | QtCLucene | SQLite | Tracker (Nepomuk) | Tracker (Flat) | Xapian |
|---|---|---|---|---|---|---|
| 9 | 2.23 ms | 0.572 ms | 0.159 ms | 1.33 ms | 0.494 ms | 0.271 ms |
| 1,099 | 6.06 ms | 2.18 ms | 1.17 ms | 90.3 ms | 1.67 ms | 0.955 ms |
| 3,216 | 8.72 ms | 3.41 ms | 1.55 ms | 335 ms | 3.57 ms | 1.50 ms |
| 17,251 | 13.1 ms | 5.33 ms | 1.92 ms | 2,380 ms | 7.52 ms | 2.35 ms |
| 121,587 | 17.0 ms | 44.2 ms | 17.4 ms | 86,800 ms | 19.885 ms | 18.1 ms |
| Complexity | O(log(n)²) | O(log(n)²) | O(log(n)²) | O(n log(n)) | O(sqrt(n)) | O(log(n)²) |

QtCLucene, SQLite, Tracker (Nepomuk) and Xapian seem to hit a memory barrier at 121,587 movies.
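
The table doesn't state how its Complexity row was derived. One simple way to sanity-check such estimates, offered here as an illustration rather than the method used above, is the slope between two measurements on a log-log scale: if time grows like c·n^k, the slope is k. The function name loglogSlope() is hypothetical.

```cpp
#include <cmath>

// Estimates the exponent k in t ≈ c * n^k from two measurements (n1, t1)
// and (n2, t2): it is the slope of the straight line through both points
// on a log-log plot. k ≈ 0 means constant time, k ≈ 1 linear growth.
double loglogSlope(double n1, double t1, double n2, double t2) {
    return std::log(t2 / t1) / std::log(n2 / n1);
}
```

Applied to the Tracker (Nepomuk) lookup times above, the slope is roughly 1.17 between 9 and 121,587 movies, but about 1.84 between 17,251 and 121,587 movies, so two-point estimates depend heavily on which data sets you pick.
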

| Movies | Lucene++ | QtCLucene | SQLite | Tracker (Nepomuk) | Tracker (Flat) | Xapian | Raw Data |
|---|---|---|---|---|---|---|---|
| 9 | 80 KiB | 76 KiB | 368 KiB | 4.4 MiB | 2.3 MiB | 424 KiB | 104 KiB |
| 1,099 | 4.9 MiB | 4.8 MiB | 32 MiB | 59 MiB | 29 MiB | 21 MiB | 7.8 MiB |
| 3,216 | 12 MiB | 12 MiB | 75 MiB | 114 MiB | 53 MiB | 47 MiB | 18 MiB |
| 17,251 | 39 MiB | 39 MiB | 257 MiB | 305 MiB | 155 MiB | 170 MiB | 57 MiB |
| 121,587 | 154 MiB | 154 MiB | 1.0 GiB | 906 MiB | 521 MiB | 683 MiB | 198 MiB |

Discussion
The performance of Tracker is devastating. This is not at all the result you want to see for a project you actually like and enjoy using. You clearly see the bad impact of the many joins it must perform to map the ontologies and queries to SQL. This is surprising, since in my opinion Nepomuk's multimedia ontology is quite a typical ontology. Also, the datasets themselves are not that huge for something that initially started as a file indexer. The (sadly quite unrealistic) flat ontology might give a few hints on how to improve Tracker: the execution times with this ontology are comparable to those of the other engines. Still, the observed (and only estimated) complexity class for executing queries is worrying.
Lucene++ shines at writing data: it is just incredibly fast when building its index. In contrast to the other engines, it even spends less time per movie the bigger its index grows. It is noticeably slower than QtCLucene or SQLite when looking up terms. Still, I'd call an average time of 17 ms for finding matches within 122k documents quite a good achievement. Additionally, Lucene++ seems to be implemented efficiently enough not to hit any memory barrier yet at this scale.
QtCLucene is about two times slower than Lucene++ or SQLite when building its index, but the index size doesn't seem to impact insertion time per movie. It pays back with good lookup performance: it is about 2 to 3 times faster than Lucene++. It seems to hit a memory barrier at 122k documents.
When building the index, SQLite's performance lies right in the middle between Lucene++ and QtCLucene. When searching terms it even beats QtCLucene, again by a factor of 2 to 3.
Lucene++ and QtCLucene consume less disk space than the original files, most probably because the raw data stores movies and artists in separate files, and the records in these files must be linked with each other. Lucene just does this more efficiently. SQLite and Tracker consume significantly more disk space than Lucene or the original data. Partly this can be explained by fields being stored twice: once in their table and another time in the full text search index. Column indexes also play a role. Still, this doesn't fully explain why disk consumption is so much higher.
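
To put the 121,587-movie row of the disk usage table into numbers, here are a few ratios; 1.0 GiB is taken as roughly 1,024 MiB, so these are approximations:

$$
\frac{\text{SQLite}}{\text{raw}} \approx \frac{1024\ \text{MiB}}{198\ \text{MiB}} \approx 5.2,
\qquad
\frac{\text{Lucene++}}{\text{raw}} = \frac{154\ \text{MiB}}{198\ \text{MiB}} \approx 0.78,
\qquad
\frac{\text{Xapian}}{\text{SQLite}} \approx \frac{683\ \text{MiB}}{1024\ \text{MiB}} \approx 0.67
$$

The last ratio is the "about two thirds" relation between Xapian and SQLite.
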
Xapian's characteristics are quite similar to those of SQLite. It doesn't yet hit the memory barrier that affects SQLite's insert performance at 122k documents, maybe because it consumes only 2/3 of the disk space. I enjoyed its API, which is much closer to modern C++ than that of any other engine. It gives more low-level access to all the FTS mechanics: for instance, you have to attach values and feed the indexer yourself, and you have to deal with term prefixes. These are details that Lucene just hides behind a Field class and its attributes. I'm not sure yet which approach I prefer.
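
The idea behind term prefixes is easiest to see in a toy model: the field name becomes part of the term itself, so "tarantino" in a title and "tarantino" in a director name are different terms. The sketch below is a plain C++ inverted index, not Xapian's or Lucene's API; the prefixes "T:" and "D:" and the PrefixedIndex class are hypothetical.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Toy inverted index in which every term carries a field prefix, so that
// the same word in different fields becomes distinct terms
// ("T:tarantino" vs "D:tarantino").
class PrefixedIndex {
public:
    // Splits `text` on whitespace and indexes each lowercased word under `prefix`.
    void addField(int doc, const std::string& prefix, const std::string& text) {
        std::istringstream words(text);
        std::string word;
        while (words >> word)
            postings_[prefix + lower(word)].insert(doc);
    }

    // Documents containing `word` in the field identified by `prefix`.
    std::set<int> find(const std::string& prefix, const std::string& word) const {
        const auto it = postings_.find(prefix + lower(word));
        return it == postings_.end() ? std::set<int>{} : it->second;
    }

private:
    static std::string lower(std::string s) {
        for (char& c : s)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    }

    std::map<std::string, std::set<int>> postings_;
};
```

In Lucene the equivalent scoping comes from the field name attached to each term, which the application never has to spell out as a string prefix.
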
Conclusion
Tracker is out. Lucene++, QtCLucene and SQLite are quite comparable in terms of performance, with Lucene++ being the fastest engine when building the index, and with SQLite being the fastest when performing full text searches. There are some first signs that Lucene++ is more memory efficient than its competitors. This needs further investigation. Also we should investigate capabilities for doing point and range searches, instead of full text searches.
They call us crazy, but we store Contacts in Tracker
Visa authorities playing bad games with Chandni gave me the chance to talk about the QtContacts tracker plugin, which others and I have been working on for the past few months. In case you've missed that early talk, here are the slides.
It's interesting to watch George from KDE talk about similar things.

Painting Strategy in MeeGo's Virtual Keyboard
I've never been happy with the conclusion in the influential blog post "Qt Graphics and Performance — The Cost of Convenience" by Gunnar Sletta: If you want performance, downgrade QGraphicsView to a mere canvas with a single QGraphicsItem. It defeats the whole purpose of decomposing the problem into many small QGraphicsItems and is therefore entirely counter-intuitive. One might be quick to ask what Qt Graphics View is good for (and one might find the answer here), but instead I would like to present an alternate solution to Gunnar's which reaches the same performance but embraces the very nature of Qt Graphics View.
Flashback
Gunnar took the example of a virtual keyboard (which doesn't really do anything; it just sits there and paints itself). And as it happens, I've been working on a real virtual keyboard. At one point, our painting strategy followed Gunnar's advice exactly, but once new requirements piled up, it became increasingly hard to deal with that painting strategy.
The new requirements were custom key sizes, flexible layouts, custom font sizes for each label, custom font colours, custom key backgrounds and custom key labels. And all of that dynamically at runtime, please.
In MeeGo 1.3, applications will be able to make use of that. For example, when writing an email, the application can temporarily turn the enter key on the virtual keyboard into a "Send" button. Should the user enter an invalid email address, the application might choose to disable the "Send" button until the mistake has been fixed.
Reinventing QGraphicsItems - or not?
As the required logic for keys becomes more complex, so does the data structure representing keys. The paint method we used in the single keyboard item was about to become a real mess. Especially after I added geometry to keys, I knew that I was, in fact, reimplementing QGraphicsItems. But I could not reuse them because of performance reasons! Or was it possible that Gunnar's blog post pointed in the wrong direction?
So I thought about my dilemma and figured out that the actual constraint is rather the number of paint event calls per frame, and eventually, how much screen area needs updating. With the single QGraphicsItem approach, every change triggers a scheduled repaint for the whole keyboard, even if it's only the label position in one key.
I then realized that only pressed or customized keys would need to be actively painted, and there we had it: An alternate painting strategy that blits the idle keyboard view into a single QGraphicsItem but lets active keys paint themselves as overlays. The rest of our team was sceptical about the idea, so Viacheslav Sobolev - a colleague of mine during my 11 months in Helsinki - had to first improve the performance benchmarks before the team would accept the fact that indeed we managed to keep the same performance, and even improve it for Qt's raster engine.
The Strategy
Although the general idea of overlay items is very simple, I summarized the following steps that one needs to take care of:
- Aggressive caching - only update when absolutely needed. Analyse what changes and encapsulate change into QGraphicsItems.
- Keys keep changing. Therefore, model keys as QGraphicsItems. Qt Graphics View will not propagate paint events to hidden items, so we hide all keys by default.
- Allow the keyboard item to query a pixmap version from each key. Therefore, the keys act as painting delegates for the keyboard. However, they can also actively paint themselves when activated.
- Paint the idle state of the keyboard into a single QGraphicsItem and avoid calling update on it. Use QGraphicsItem::DeviceCoordinateCache as its caching strategy. The idle state itself can be described as the keyboard's background together with all keys in idle state (that is, neither pressed nor customized).
- Activate keys if pressed or customized, which lets them paint themselves as overlays over the static keyboard item.
- Caveat#1: Make sure your implementation can paint the idle state of the keyboard at all times. There's no need to waste more memory with manual caching here - just make sure you can query pixmaps from a key's idle state (even if the key is not idle at that particular moment). The documentation regarding QGraphicsItem caching modes is incorrect.
- Caveat#2: Handle mouse and touch events on the parent keyboard item and propagate state changes to the keys. Most of the keys will usually be inactive, so they cannot receive such events directly.
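To make the trade-off concrete, here is a toy model in plain C. It is not Qt code: `Key`, `key_is_active()` and the two `repaint_area_*()` functions are hypothetical names, and keys are assumed not to overlap. It compares how much screen area has to be repainted after a change under the single-item strategy and under the overlay strategy described above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified key: a rectangle plus the two flags
 * that make a key "active" in the overlay strategy. */
typedef struct {
    int x, y, w, h;   /* geometry in pixels */
    bool pressed;     /* currently touched */
    bool customized;  /* e.g. relabelled to "Send" by the application */
} Key;

/* Only active keys paint themselves as overlays; idle keys are
 * part of the cached keyboard item and are never repainted. */
static bool key_is_active(const Key *key)
{
    return key->pressed || key->customized;
}

/* Single-item strategy: any change repaints the whole keyboard. */
static long repaint_area_single_item(int kb_width, int kb_height)
{
    return (long) kb_width * kb_height;
}

/* Overlay strategy: the cached idle keyboard stays untouched and only
 * active keys are repainted on top of it (keys assumed not to overlap). */
static long repaint_area_overlay(const Key *keys, size_t count)
{
    long area = 0;
    for (size_t i = 0; i < count; ++i) {
        if (key_is_active(&keys[i]))
            area += (long) keys[i].w * keys[i].h;
    }
    return area;
}
```

With one pressed 50x60 key and one customized 100x60 key on an 800x240 keyboard, the overlay model repaints 9000 pixels instead of 192000.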
As a bonus, you can find the basics of the strategy in a modified benchmark of the original blog post. I added two new options "-overlayitems" and "-fullscreen". The latter is useful on the N900, where it helps to get rid of system compositor noise.
Is MeeGo Keyboard conquering GNOME3?
You'd probably like me more if I had spent my time testing the upcoming release of MeeGo Tablet UX on this shiny new WeTab (I ♥ Openismus), but Jon couldn't resist test-driving GNOME3 first. Well, and since we like our virtual keyboard so much, we just had to make it work of course.
Jon also wrote about the features shown in the video.
New features in MeeGo Keyboard
This week we published new features for the MeeGo Keyboard, including accent popups (activated through long-press on certain keys) and support for Chinese Input Methods. The latter still requires an IM engine (one that supports Cangjie, for example) to be really useful; such an engine is not provided by MeeGo Input Methods.
Ubuntu users can easily test it out, as I updated the packages in the MeeGo Input Methods PPA. I also enabled all language layouts by default for those packages, as we have no real UI for the desktop to control such settings yet. You will have to install the additional layouts package manually though (the package is called meego-keyboard-zh-layouts).
One can switch to another language by swiping to the left or right (on the keyboard itself), which is admittedly a bit stressful with a mouse (and nigh impossible with touchpads!). It works quite nicely though when using touch screens. I am sorry for the strange transition animation, it apparently needs some adjustments for the desktop.
We'll also have some GNOME3-related news soon, so stay tuned!
MeeGo Input Methods for your desktop
I've been working on the MeeGo Input Methods project (codename "Maliit") for nearly a year now. The project provides a pluggable framework for input methods. It comes with a reference plug-in for a multi-touch-capable virtual keyboard.
It had bothered me that, even though our source code was available at gitorious.org, there were almost no contributions from the outside. I attribute that to the difficulties of compiling all required components, but also to the lack of perceived openness.
We now offer packages for Ubuntu, through the Openismus PPA for MeeGo Input Methods, thanks to the packaging efforts of Jon. This is an offer for those interested in developing input methods for MeeGo. It's not targeting end-users (yet?) - a virtual keyboard might not be useful on a desktop, unless you have a touchscreen.
We also have a public wiki and if you wish you can visit us in #meego-inputmethods @ freenode.net.
The limits of "community driven"
So I decided to finally upgrade from Karmic to Lucid. Easy task, only should take some time: After all I did a plain default install. Well, ok - I've used the alternative ISO as my employer requires me to use full disk encryption. It's a good idea anyway, no idea why this feature isn't the default for all installs.
So after some time the install finishes, the computer reboots, GRUB starts the kernel, and the kernel tells me it cannot find the root fs.
Obviously nobody cared to check whether upgrading with encrypted disks works. Too boring. This still would not have happened with a professional OS vendor. Especially not four months after the initial release.
EPIC FAIL! Thank you. Not.
Update: In case you need proof for "too boring" check this rotting bug report.
About missing flights
Frequent fliers will know this, but to everyone else: If you should ever be stupid enough to miss a flight, call the airline with which you booked that missed flight as soon as possible. Otherwise they will cancel your inbound flight and bet on charging you an incredible amount of money for your previously reserved seat. With me, they lost their €700 bet yesterday. I just took another airline.
Using DBus as lock-daemon
Recently I found this comment in the source code I am working with:
// what if both processes read in the same time and write at the same time, no increment
Please! Don't do such things! Don't just leave such comments in the hope that someone else will come along and fix it later. Please take the time to apply a locking mechanism.
The obvious choice when dealing with files would be to create a lock file. Unfortunately creating a file-based lock isn't trivial, especially when you deal with portable software. Questions arise like: Is that system call really atomic in this context? Is the feature even available? Which characters can be used in the file name? Can it start with a dot? How much time does it take? Will it reduce the lifetime of my flash media?
Uh, my head is spinning! Stop!!!
It's somewhat understandable that my colleague just left a comment.
Well, fortunately there is a more trivial solution to that problem, if you have DBus available. More experienced DBus hackers already know it and will be bored now, but for everyone else:
DBus service names can be used as locking mechanism!
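In pseudocode, an implementation would look similar to this (wait_for_name stands for "block until we have acquired the name"):
bus.request_name('de.taschenorakel.locker.example')   # queue up for the name
bus.wait_for_name('de.taschenorakel.locker.example')  # block until we own it
# ... now do some work ...
bus.release_name('de.taschenorakel.locker.example')   # let the next waiter in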
Easy, isn't it? Doesn't hit the file system. Fully implemented. Ready to use. Tested daily on your desktop.
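Why does this work? The bus daemon serializes RequestName calls and, unless the caller asks it not to queue, keeps a queue of clients waiting for a name that is already owned, so at most one client owns the name at any time. The following plain C sketch models only that queueing behaviour. It is not the libdbus or GDBus API: `NameQueue`, `name_request()` and `name_release()` are hypothetical, and the reply constants merely mirror the values of the D-Bus RequestName replies PRIMARY_OWNER (1) and IN_QUEUE (2):

```c
#include <string.h>

#define MAX_WAITERS 8

/* Mirrors the D-Bus RequestName reply codes used in this sketch. */
enum { REPLY_PRIMARY_OWNER = 1, REPLY_IN_QUEUE = 2 };

/* Toy model of one well-known name: queue[0] is the current owner,
 * everybody behind it is waiting for the "lock". */
typedef struct {
    int queue[MAX_WAITERS];
    int length;
} NameQueue;

/* Like RequestName without DO_NOT_QUEUE: become owner if the name is
 * free, otherwise wait in line. Returns -1 if this toy queue is full. */
int name_request(NameQueue *q, int client)
{
    if (q->length == MAX_WAITERS)
        return -1;
    q->queue[q->length++] = client;
    return q->length == 1 ? REPLY_PRIMARY_OWNER : REPLY_IN_QUEUE;
}

/* Like ReleaseName: drop the client from the queue; if it was the
 * owner, the next waiter becomes owner. Returns the owner afterwards,
 * or -1 if nobody holds the name. */
int name_release(NameQueue *q, int client)
{
    for (int i = 0; i < q->length; ++i) {
        if (q->queue[i] == client) {
            memmove(&q->queue[i], &q->queue[i + 1],
                    (size_t) (q->length - i - 1) * sizeof q->queue[0]);
            q->length--;
            break;
        }
    }
    return q->length > 0 ? q->queue[0] : -1;
}
```

Client 1 gets the name, client 2 is queued, and releasing the name on behalf of client 1 hands it straight to client 2: exactly the wait/release pattern of the pseudocode above.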
One implementation of that concept can be found in qtcontacts-tracker.
Stylish New Business Cards
Our new business cards have arrived:

Stylish. Professional. No Nonsense. Hopefully the attributes associated with Openismus...
Well done Kat!
Application Theming Tricks
From time to time applications need custom theming rules. Especially when the project has professional UI designers involved. So how to achieve this with GTK+?
Trivial Theming
Easiest and very wrong:
GdkColor color;
if (gdk_color_parse ("pink", &color))
  gtk_widget_modify_bg (widget, GTK_STATE_NORMAL, &color);
This will break and look childish as soon as your users use a custom color scheme.
Better:
static void
style_set_cb (GtkWidget *widget,
              GtkStyle  *old_style,
              gpointer   user_data)
{
  GtkStyle *style = gtk_widget_get_style (widget);
  GdkColor color;

  /* Ask the current theme for its named colour instead of hard coding one. */
  if (gtk_style_lookup_color (style, "SecondaryTextColor", &color))
    gtk_widget_modify_bg (widget, GTK_STATE_NORMAL, &color);
}

static void
my_widget_init (MyWidget *widget)
{
  g_signal_connect (widget, "style-set", G_CALLBACK (style_set_cb), NULL);
}
This will allow theme designers to override your color choice. Notice that you'll have to update those color overrides when the theme changes. Btw, the "style-set" signal is emitted when the widget is realized, therefore you don't have to manually invoke the callback during widget construction.
It's also worth mentioning that Hildon provides convenience API for simple theming requirements.
Complex Problems
So for simple requirements GTK+ (and Hildon) have reasonable API support. Things become troublesome when your designers invent rules like "this widget has a rounded border and drop shadow, but only within buttons". Obviously border and drop shadow radius should be themeable and therefore are implemented as style properties, but how to impose this rule?
You could scan the widget hierarchy when choosing default values for your style properties:
GtkWidget *button;
gint border_radius;
button = gtk_widget_get_ancestor (widget, GTK_TYPE_BUTTON);
gtk_widget_style_get (widget, "border-radius", &border_radius, NULL);
/* Hard coded default for widgets placed inside buttons. */
if (button)
  border_radius = 12;
You'll quickly notice the flawed hard coded default value. Also such things are hard to override in theme files. So it's probably better to apply a custom theming rule via gtk_rc_parse_string():
static void
my_widget_class_init (MyWidgetClass *class)
{
  ...
  gtk_rc_parse_string
    ("style 'my-widget-style-clickable' {\n"
     "  MyWidget::border-radius = 2\n"
     "}\n"
     "widget_class '*.<GtkButton>.MyWidget' "
     "style 'my-widget-style-clickable'\n");
  ...
}
Application Theme Files
Looks like a perfect solution, until you realize that this rule is applied after all rules loaded from gtkrc files!
So how to inject this rule before the user's theming rules? This was a big question to me until I found gtk_rc_add_default_file(). Well, almost: This function only adds files to the end of the search path. Therefore it suffers from the same issues as gtk_rc_parse_string(). Fortunately the API author was smart enough to also provide gtk_rc_get_default_files() and gtk_rc_set_default_files(). Those functions can be used to apply application-specific theming rules, which can be overridden by the user - drum roll please:
static void
inject_rc_file (const char *filename)
{
  char **system_rc_files, **p;
  GPtrArray *custom_rc_files;

  /* This array belongs to GTK+, so we only copy its entries. */
  system_rc_files = gtk_rc_get_default_files ();

  /* Our application file goes first, so the system and user gtkrc
   * files parsed after it can override its rules. */
  custom_rc_files = g_ptr_array_new ();
  g_ptr_array_add (custom_rc_files, g_strdup (filename));

  for (p = system_rc_files; *p; ++p)
    g_ptr_array_add (custom_rc_files, g_strdup (*p));

  /* NULL-terminate, install the new list, then free our copies. */
  g_ptr_array_add (custom_rc_files, NULL);
  gtk_rc_set_default_files ((gpointer) custom_rc_files->pdata);
  g_strfreev ((gpointer) g_ptr_array_free (custom_rc_files, FALSE));
}
int
main (int argc,
char **argv)
{
...
inject_rc_file (PKGDATADIR "/gtkrc." PACKAGE);
gtk_init (&argc, &argv);
...
}
Update: Benjamin Berg just pointed out that priorities can be assigned to styles. So the following should work fine:
gtk_rc_parse_string
  ("style 'my-widget-style-clickable' {\n"
   "  MyWidget::border-radius = 2\n"
   "}\n"
   "widget_class '*.<GtkButton>.MyWidget' "
   "style : lowest 'my-widget-style-clickable'\n");
Awesome, little-known feature.
Well, and it is a phone!
So after months of denial it turns out that the/we Maemo folks really built a phone!

Week 1 or Comparing Disruptiveness of Communication Methods
So my first trainee week at Openismus is over, and it has been a pleasant surprise.
Day 1 started with a nice team-building exercise where David and I could show our understanding of real Office Origami: IKEA(TM) furniture assembly. Once that was done we were ready to set up our newly arrived PCs. I appreciate that we were offered a choice over the keyboards and such - those are the details that count! Since I could not figure out how to make it boot from a DVD first (press ENTER, d'oh) I lost 5 coolness points. I lost at least 7 more points for initially refusing to use a 64bit Linux Distro. But that's OK, really - I plan to make my personal coolness points wrap around at -2^31 some day.
The remaining 2 hours of the day I spent customizing my desktop on that box, and I think I see a pattern here: usually the first three things I do on Ubuntu are
- installing Epiphany (together with customizing its toolbars to only use one line, as it was in ancient versions),
- deleting GNOME's bottom panel (the workspace selector fits well in the top panel),
- reconfiguring the GNOME shortcuts. Honestly, who wants to SHIFT+CTRL+ALT+ARROW_KEY just for moving windows around? Once the Windows(TM) key is properly configured we can get from 4 to 3 keys here. Using the Windows(TM) key for managing windows - I'd like to see that becoming a standard in GNOME.
Days 2-4 were all about compiling GNOME from scratch. For me it felt like using Gentoo again, only that building from trunk versions breaks far more often. But the rest - missing dependencies, poorly written Makefiles, outdated version requirements for build tools - it's all the same (Disclaimer: I don't blame jhbuild, it does a good job, and jhbuild shell is awesome).
Over all these days I enjoyed the friendly atmosphere in the office itself. The common breakfast, the chit-chat at the coffee machine, and - most importantly - no phone calls. Which brings me to what I originally wanted to blog about:
Comparing Disruptiveness of Communication Methods
It goes without saying that among IRC, Instant Messaging, E-mail and phone calls, the latter is the most disruptive means of modern office communication. Internet Relay Chat (IRC), if used mostly in a passive way (e.g., only answering after being pinged, and only at defined intervals), could then be at the other end, being the least disruptive. If we were to include face-to-face communication here (which we don't, since I feel I have not much to say about it), it would probably top even phone calls. A study from an IT support organization (cited in this paper: http://jcmc.indiana.edu/vol13/issue1/garrett.html) "revealed that workers spent an average of just 11 minutes on a task before being interrupted or moving on to a new task, and more than half the interruptions (57%) were unrelated to the task at hand" [1].
So if the frequency of interruptions is given, is there at least a choice in disruptiveness? Let us define disruptiveness as the cost of switching back to the previous task after an interruption, and let's say the Loss of Productivity caused by interruptions equals
#Interruptions x (Usage_IRC x Costs_IRC + Usage_IM x Costs_IM + Usage_E-mail x Costs_E-mail + Usage_Phone x Costs_Phone)
It then becomes obvious that we want to reduce disruptiveness of each communication type but also the usage of the more disrupting types.
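A worked example of the formula, written as a small C function. The usage shares and switch-back costs in the usage note are made-up illustrative numbers, not measurements, and `productivity_loss()` is a hypothetical name:

```c
/* Communication channels from the formula above. */
enum { IRC, IM, EMAIL, PHONE, CHANNEL_COUNT };

/* Loss = #Interruptions x sum over channels (Usage_c x Costs_c).
 * usage[c]: share of interruptions arriving via channel c (shares add up to 1).
 * costs[c]: cost of switching back after such an interruption, e.g. in minutes. */
double productivity_loss(double interruptions,
                         const double usage[CHANNEL_COUNT],
                         const double costs[CHANNEL_COUNT])
{
    double weighted = 0.0;
    for (int c = 0; c < CHANNEL_COUNT; ++c)
        weighted += usage[c] * costs[c];
    return interruptions * weighted;
}
```

Assuming 20 interruptions split evenly across the four channels and costs of 1, 2, 4 and 8 minutes for IRC, IM, E-mail and phone, the loss is 20 x 3.75 = 75 minutes. Moving the phone share over to IM (usage 0.25, 0.5, 0.25, 0) drops it to 20 x 2.25 = 45 minutes, which is the "reduce usage of the more disrupting types" effect.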
Given the nature of phone calls there is little one could do. If the phone is an irremovable part of the workplace (helldesks, for example) one would probably want to choose a pleasant ring tone, or make it use visual alarms only. Another solution could be to filter every call with an answering machine, and then check for voice mails only once or twice per day. While this obviously defeats the purpose of phone calls (high priority, synced) it may still be effective under some conditions. Another not-so-obvious part is that the recipient of a phone call has no estimate of the length and the type of information (in-depth with note-taking, simple notification, ...) of the call when she decides to answer. One could postulate that the recipient is almost always unprepared, hence resulting in high disruptiveness.
E-mail immediately solves 2 problems, compared to phone calls: async by default and note-taking (and because this topic is so complex that some make a profession out of it, you can haz your Wikipedia page). However, reading E-mails can be costly. Once the mail comes with attachments it requires additional tasks to be performed by the recipient.
IRC can probably do everything E-mail could do for you, though you first need to find the right networks and channels. It has often been said (I am glad this is only my blog and not Wikipedia, else you would read the dreaded [C... n...] here) how useful this approach is for giving support, since those seeking support would naturally find those that can give support by themselves. Also, the matching that happens between the two groups is random, meaning that the support-seeker is usually not addressing one person directly. Support-givers can then decide whether to interrupt themselves (by answering) or not (by waiting for someone else to answer). In that regard, the disruptive costs for support-givers would be nearly zero, amortized.
Now how about IM? One could think IM is just as disruptive as phone calls: A new message arrives, a notification alert pops up in the center of your screen, perhaps even a sound is played. Imagine that happens for each of the gazillion conversations you are currently in. On top of that, you probably only chat with your friends and family anyway.
Luckily we can choose to only use sanely configured IMs, so notification alerts will all be combined in one icon in the systray, without any bubble boxes or sounds. What we get over IRC is improved tracking of the people we want to talk to. We can check their status before sending a message (online, busy, chatting), and that is an important part! If we care to not disrupt our co-workers overly we would naturally send messages only if their status indicates they are online and not busy. Also, the signal-to-noise ratio is dramatically improved, compared to IRC. The paper - "IM = Interruption Management" - which I already mentioned above - concludes that 1) IM users will report lower levels of disruptive interruption than non-users, 2) IM users will have the same overall level of work communication as non-users.
So if you see your sys-admin at lunch tomorrow ask her why IM (or IRC) is filtered at your company, but phone calls are not and whether she pays for the resulting productivity loss in your company from her own salary.
[1] González, V. M., & Mark, G. (2004). "Constant, constant, multi-tasking craziness:" Managing multiple working spheres. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 113-120). Vienna, Austria: ACM Press.
Missed Anniversary
Great, I've missed my anniversary! Hard to believe, but exactly 13 months ago I've started working for Openismus. Interesting, eventful, insightful months. Let's do even better for the next year.
LIRC Properties hosted on gnome.org now
Last week Murray took the necessary steps for moving GNOME LIRC Properties to GNOME infrastructure. Bugs are tracked in Bugzilla now, and the latest source code can be found here: http://svn.gnome.org/svn/gnome-lirc-properties/.
Thanks, infrastructure team, for doing this job quickly. Thanks, Fluendo, for allowing us to move the code.
Just wondering now when GNOME's awesome I18N team will find the module and submit the first translations.
GNOME LIRC Properties
Some time ago Murray announced this little GUI for configuring IR remote controls, which we Openismus guys did for Fluendo. Last Friday the program finally got uploaded to Ubuntu's package archive, and since yesterday its package has been in Hardy's Universe repository.
The program can detect your IR receiver and suggest a remote control. Success of that choice can be directly tested in the main dialog. There also is support for customizing remote control configuration files, and for learning key codes.
Unfortunately the D-Bus backend of the program only supports Ubuntu right now, well but as usual in such situations: Patches welcome, I guess.
Also we'd like to have a much cleaner interface for using the features of irrecord. Currently we have to simulate a pseudo-terminal (for stdio buffering reasons) and heuristically guess from irrecord's reactions what happens. A library for driving LIRC's hardware detection capabilities would be nice to have. Unfortunately we could not do this due to various time constraints (Hardy schedule, project schedule, ...).
I have to thank Stefan, Siggi, Mario and Scott for helping to get that package through the REVU process despite Hardy being in feature freeze already.
I have to thank Andreas for drawing the application icon.
GTK+ Hackfest is over
So the GTK+ Hackfest really ended yesterday. I absolutely enjoyed having all you guys around me. I am absolutely confident that all the ideas collected and exchanged here will lead to a great GTK+ 3.0 release. It cannot be said often enough, but even in our highly connected world face-to-face meetings are invaluable for our work. This can be understood as an invitation to order your tickets for Istanbul right now, if you are still uncertain.
The big BVG strike was a little bit unfortunate, as we could not use all the U-Bahn stations around us to visit other exciting places. Well, but considering how exhausted many were at the end of the week, this reduced mobility probably was a good thing. We could have risked partying too much.
I guess I've learned a lot about event organization during that week. Most importantly, and as a hint for any successor: Next time I'll sleep on-site instead of driving 50km per day to sleep at home.
I have to thank Behdad for asking me to organize this event and Murray for allowing me to organize this on Openismus time. Thank you dad for helping with the chairs. Thank you Michael for being there on Wednesday. Thank you Ulrike and Marc-André for your patience.
Launchpad's Personal Package Archives
Probably inspired by openSUSE's Build Service, Launchpad provides this excellent Personal Package Archive (PPA) service.
It's quite an exciting service: You just request your PPA, upload your source packages via the dput command, and get them built within freshly set up Ubuntu environments. When building succeeds, your packages are placed in an apt-get repository. No further maintenance steps required.
Unfortunately the service is not perfect yet: It seems to be driven by cron jobs and you only get feedback by mail. This makes the entire process very slow and inefficient. You have to find tiny tasks to fill up the 10 to 20 minute delays. Those tasks are rare, so I at least either end up idling/chatting or doing larger tasks. Very distracting. As a result it takes several hours until I know that some PPA package is ready for use. I absolutely do not like this working style; I prefer staying focused on my current task instead of wildly jumping around between tasks.
Well, but this is just a minor annoyance compared to the fact that PPA doesn't seem to allow you to delete/overwrite the upstream tarballs forming the base of your self-made Ubuntu packages. The idea of immutable upstream tarballs works nicely if you assume that you aren't the maintainer of the software uploaded to your PPA: In that scenario you take an upstream tarball and apply patches until it works properly in Ubuntu, but you have no serious chance to modify the upstream tarball.
Things are different when you use PPA to publish software you maintain. In that case it happens that you use PPA to package your release candidate and test it on your various Ubuntu VMs. During tests you spot minor show stoppers, commit fixes to your code repository, and run "make dist" to get another release candidate. After that you increase the PPA release number in debian/changelog and invoke "debuild -S -sa" to get the next iteration of your Ubuntu package. The next step would be uploading that fresh package to your PPA, but that fails since suddenly the MD5 hash of your upstream tarball has changed. Of course! The bugfixes! Duh!
Only solution I've found so far: Skip an upstream release to make PPA happy. Not very convincing. Maybe the LazyWeb has some suggestions to work around that limitation?
Update: Yes, I didn't update the tarball's release number. It's just a release candidate. I'd expect a personal package archive to support release candidates. I prefer PPA over "debuild -i", since the build services attached to PPA provide a clean build environment.
GtkCalendar Details
Hmm... Too much whining on this blog recently. So let's demonstrate some small but exciting code I've written for Openismus, which will show up in GTK+ 2.16:
/**
* gtk_calendar_set_detail_func:
* @calendar: a #GtkCalendar.
* @func: a function providing details for each day.
* @data: data to pass to @func invocations.
* @destroy: a function for releasing @data.
*
* Installs a function which provides Pango markup with detail information
* for each day. Examples for such details are holidays or appointments.
* That information is shown below each day when #GtkCalendar:show-details
* is set. A tooltip containing the full detail information is provided,
* if the entire text should not fit into the details area, or if
* #GtkCalendar:show-details is not set.
*
* The size of the details area can be restricted by setting the
* #GtkCalendar:detail-width-chars and #GtkCalendar:detail-height-rows
* properties.
*
* Since: 2.16
*/
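To illustrate what such a detail function returns, here is a plain C sketch that runs without GTK+. `example_holiday_markup()` is a hypothetical helper taking the same year/month/day inputs as a GtkCalendarDetailFunc (GtkCalendar counts months from 0). A real detail function would additionally receive the GtkCalendar and the user data, and return a newly allocated string (e.g. from g_strdup()), which GtkCalendar frees:

```c
#include <stdlib.h>
#include <string.h>

/* Returns newly allocated Pango markup for a day, or NULL when the day
 * has no details. Months are 0-based, as in GtkCalendar. */
char *example_holiday_markup(unsigned year, unsigned month, unsigned day)
{
    const char *markup = NULL;

    (void) year;  /* these example holidays don't depend on the year */

    if (month == 0 && day == 1)
        markup = "<b>New Year</b>";
    else if (month == 11 && day == 25)
        markup = "<span foreground=\"red\">Christmas</span>";

    if (markup == NULL)
        return NULL;  /* nothing to show below this day */

    char *copy = malloc(strlen(markup) + 1);
    if (copy != NULL)
        strcpy(copy, markup);
    return copy;
}
```

In a GTK+ 2.16 program a function of this shape would be installed with gtk_calendar_set_detail_func(), and the details become visible below each day once the show-details property is set.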

libepc 0.3 released
A new version of the Easy Publish and Consume Library has found its way on GNOME's FTP servers.
This release has seen many API changes to make the library more versatile. For instance streaming capabilities and support for ZeroConf bookmarks have been added. Also noteworthy is the new service monitor, which provides a very easy method for finding libepc (and DNS-SD) services.
For the ridiculously long list of changes, see the NEWS file.
To download, visit http://ftp.gnome.org/pub/GNOME/sources/libepc/0.3/.
Totem Playlist Sharing
To test versatility of libepc I've written a Totem plugin for sharing the current playlist.
Seems we've got another cool feature now, and libepc is in much better shape. The code is committed to Totem trunk, and also depends - surprise - on libepc trunk.
Disclaimer: Yes, demonstrating features of a media player without any audio track sucks, but Istanbul 0.2.2 hangs after recording with sound, and when running its pipeline with gst-launch I get a lot of warnings about missing audio data plus some unpleasant, stuttering video.
Use gtk-doc to find required package version
Usually I am quite unsettled when adding package requirements to README files and configure scripts. I never really know if the version I specified is correct.
This morning I had enough of this uncertainty and hacked a short Python script to extract that information from our fine gtk-doc manuals: gtkdoc-check-symbol-versions.
Libxml2 cannot parse the HTML gtk-doc generates, so the script reads the gtk-doc manuals line by line, throws regular expressions at them and puts the matches into a hash table. When finished, it reads the files it shall check, strips comments, splits the remaining text into words and looks up the minimum package version for every word.
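The lookup step can be sketched in C; this is not the Python script itself. A small array of symbol/version pairs, taken from the sample output in this post, stands in for the hash table built from the manuals; all versions are assumed to be 2.x minor versions; and `required_minor()` is a hypothetical name:

```c
#include <ctype.h>
#include <string.h>

/* Symbol -> minimum version, as the manuals would provide it. */
typedef struct {
    const char *symbol;
    const char *package;
    int minor;  /* minor version within the 2.x series */
} SymbolVersion;

static const SymbolVersion known_symbols[] = {
    { "g_markup_printf_escaped",  "glib", 4 },
    { "g_option_context_new",     "glib", 6 },
    { "g_slice_new0",             "glib", 10 },
    { "g_hash_table_remove_all",  "glib", 12 },
    { "g_timeout_add_seconds",    "glib", 14 },
    { "gtk_window_set_deletable", "gtk",  10 },
};

/* Returns the highest minor version of `package` needed by any
 * identifier in `source`, or -1 if none. C comments are skipped. */
int required_minor(const char *source, const char *package)
{
    int best = -1;
    const char *p = source;

    while (*p) {
        if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");  /* skip a comment */
            if (end == NULL)
                break;
            p = end + 2;
        } else if (isalpha((unsigned char) *p) || *p == '_') {
            const char *start = p;  /* read one identifier ("word") */
            while (isalnum((unsigned char) *p) || *p == '_')
                ++p;
            size_t len = (size_t) (p - start);
            for (size_t i = 0; i < sizeof known_symbols / sizeof known_symbols[0]; ++i) {
                const SymbolVersion *s = &known_symbols[i];
                if (strlen(s->symbol) == len &&
                    strncmp(s->symbol, start, len) == 0 &&
                    strcmp(s->package, package) == 0 &&
                    s->minor > best)
                    best = s->minor;
            }
        } else {
            ++p;
        }
    }
    return best;
}
```

A linear scan replaces the hash table only to keep the sketch short; for whole manuals a hash table is the right choice.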
Maybe the gtk-doc guys can bundle this (or a polished version of the script) with their package.
Usage is like this:
$ gtkdoc-check-symbol-versions \
-p glib -p gobject -p libsoup -p gnutls -p gtk */*.[ch]
gtk-2.6 required for gtk_combo_box_get_active_text in examples/consumer-ui.c
glib-2.4 required for g_markup_printf_escaped in examples/consumer-ui.c
glib-2.6 required for g_option_context_new in examples/list-resources.c
gobject-2.4 required for G_DEFINE_TYPE in libepc/consumer.c
glib-2.10 required for g_slice_new0 in libepc/dispatcher.c
glib-2.12 required for g_hash_table_remove_all in libepc/dispatcher.c
gtk-2.10 required for gtk_window_set_deletable in libepc-ui/entropy-window.c
glib-2.14 required for g_timeout_add_seconds in tests/framework.c
$
What!? I need glib-2.14 just for g_timeout_add_seconds in the regression tests? Crap, I'd better change this.
Simple description of libepc
Seems I failed to give a simple description of libepc. People still ask what it really is.
Well, it's nothing fancy. It's really just a public hash table.
- plus encryption
- plus authentication
- plus discovery
Sounds trivial to implement, but isn't.
At least the discovery part isn't: Current DNS-SD APIs provide a robust state monitoring API, and evilly also expect you to use it, where you just want a simple publish-and-forget API.
The easy-to-use hash table on top of HTTPS (server-side, client-side) really isn't that hard to do. It's mainly an API design challenge. HTTPS - instead of D-Bus or XMPP or plain TCP/IP or whatever holy grail - was chosen for purely pragmatic reasons: Libsoup is a reasonable platform library, providing many of the features we wanted (encryption and authentication).
You believe libepc should use another transport mechanism or have other ideas? Well, I am writing about libepc to get feedback. One interesting point came from Havoc Pennington: He seems to believe this public hash table service should be shared by the entire desktop session, instead of setting up hash tables for every application.
Easy Publish and Consume Library
As GNOME remembers the meaning of the 'N' in its name, more and more GNOME applications become network aware. GNOME provides excellent libraries for implementing network capabilities, nevertheless quite some effort is needed to glue them together to provide a smooth user experience.
When starting to add network support to Glom, we at Openismus decided it would be time to stop this frequent reinvention of the wheel, and put some of this glue code into an easy to use library. So the Easy Publish and Consume Library (libepc) was born.
The Easy Publish and Consume Library is intended to provide an easy method to
- publish data via HTTPS,
- announce that information via DNS-SD,
- find that information,
- and finally consume it.
When using that library you shouldn't have to bother with boring details like properly using Avahi or setting up SSL/TLS certificates. All you should need to do to let your application publish information to the network is write the following three commands:
publisher = epc_publisher_new ("Easy Publisher Example", NULL, NULL);
epc_publisher_add (publisher, "Hello", "World!", -1);
epc_publisher_run_async (publisher, NULL);
Once you have done this, your application runs an HTTPS server, announces itself to the local network via DNS-SD and publishes the information World! under the key Hello.
Retrieving values shouldn't be harder:
consumer = epc_consumer_new_for_name ("Easy Publisher Example");
value = epc_consumer_lookup (consumer, "Hello", &length, &error);
As those examples are working now, we Openismus guys would highly welcome some feedback - I even added e-mail notifications to my site to allow real discussion.
Well, maybe it should be noted that libepc isn't meant to replace desktop configuration daemons like GConf or Ryan's dconf initiative. This library is just a simple method for publishing byte arrays and maybe UTF-8 strings to your network. No support for integers or booleans, no schemas. It's fully up to you how to use the data published.
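The data model behind that is small enough to sketch in plain C: keys mapping to untyped byte arrays. This is a toy, in-process stand-in without HTTPS, DNS-SD or libepc; `PublicTable`, `table_add()`, `table_lookup()` and `table_free()` are hypothetical, and treating a negative length as "nul-terminated string" is my assumption about what the -1 in the epc_publisher_add() call above means:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 16

/* One published value: an untyped byte array under a key. */
typedef struct {
    char *key;
    unsigned char *data;
    size_t length;
} Entry;

typedef struct {
    Entry entries[MAX_ENTRIES];
    size_t count;
} PublicTable;

/* Stores a copy of `data`. A negative length means `data` is a
 * nul-terminated string. Duplicate keys are not replaced in this sketch. */
int table_add(PublicTable *t, const char *key, const void *data, long length)
{
    size_t n = length < 0 ? strlen(data) : (size_t) length;

    if (t->count == MAX_ENTRIES)
        return -1;

    Entry *e = &t->entries[t->count];
    e->key = malloc(strlen(key) + 1);
    e->data = malloc(n > 0 ? n : 1);
    if (e->key == NULL || e->data == NULL) {
        free(e->key);
        free(e->data);
        return -1;
    }
    strcpy(e->key, key);
    memcpy(e->data, data, n);
    e->length = n;
    t->count++;
    return 0;
}

/* Returns the stored bytes and their length, or NULL for unknown keys. */
const void *table_lookup(const PublicTable *t, const char *key, size_t *length)
{
    for (size_t i = 0; i < t->count; ++i) {
        if (strcmp(t->entries[i].key, key) == 0) {
            *length = t->entries[i].length;
            return t->entries[i].data;
        }
    }
    return NULL;
}

void table_free(PublicTable *t)
{
    for (size_t i = 0; i < t->count; ++i) {
        free(t->entries[i].key);
        free(t->entries[i].data);
    }
    t->count = 0;
}
```

Note that a looked-up value is just bytes with a length (here "World!" comes back as 6 bytes without a terminating nul), which is why interpreting the data is left to the application.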
- Source code is available: git clone http://taschenorakel.de/git/libepc
- Alternatively, you can get tarballs: libepc-0.2.tar.bz2 (Signature), git snapshots
- Perhaps more important at this stage: the Reference Manual
- And code examples: list-resources.c, lookup-resource.c, simple-publisher.c, consumer-ui.c
These locations at taschenorakel.de are only temporary. If nobody objects, I'd like to upload this library to the GNOME servers as soon as possible.
Canvas Review
As Murray already announced, I am working for Openismus now. The first task I was assigned was providing patches for some binding glitches in GooCanvas. It seems I complained too much about GooCanvas' code during that task, so Murray asked me to review GooCanvas, HippoCanvas and libccc to figure out which canvas would fit best for diagrams and reports in Glom.
As there already is a nice canvas overview on the GNOME Wiki, I added my findings to that wiki page. Thank you, Emmanuele, for updating the Clutter entries, btw.
HippoCanvas dropped out quickly because it lacks printing support and, more importantly, because of its boxed, HTML/CSS-like layout. This approach doesn't seem flexible enough to support things like table relationship diagrams.
GooCanvas seems to be the most complete canvas, but it has some issues:
By providing a function to pass a custom cairo context to its rendering routines, it does only the trivial part of printing and leaves pagination, the hard part, to its users. Pagination usually is not trivial, as you have to avoid splitting text lines and pictures across page boundaries.

Techniques to avoid such glitches include moving short paragraphs and pictures to the next page, splitting paragraphs and text at line and cell boundaries, rewrapping text and rescaling pictures. I could not find hooks for any of this in GooCanvas, so using GooCanvas for Glom would require adding at least the hooks for such features to that canvas.
It seems grid fitting is implemented only for tables. Without it, lines and edges fall between device pixels, which causes a blurry and unprofessional look. It also lowers rendering performance, as unnecessary antialiasing and resampling operations have to be applied.
GooCanvas currently also has problems in its code architecture. Its data model structures are all public, which puts quite a burden on maintainers who want to provide a stable API: any change to a structure's layout breaks applications that access its fields directly. This is easy to fix, as most (if not all) of the exposed fields are also accessible through getters and setters.
A more serious problem is the optional model/view split, which causes huge amounts of code duplication in GooCanvas. Currently GooCanvas gives you view items and model objects, which in fact are just containers for the very same model data structure. View and model objects provide the same set of properties and methods to manipulate that shared data structure. This is obviously wrong: when there are view and model objects, all the model data should be stored in the model objects. At the very least there shouldn't be separate model data structures, and the view objects should reference their model objects. But even that approach would be too complicated in my opinion, as the canvas items themselves are the model, while the canvas, cell renderers and so on are the views. When the items have to store view-specific data, this should be handled internally and not exposed to the user. I am pretty sure this change would reduce the lines of code GooCanvas needs by at least one third.
libccc is quite similar to GooCanvas, which is not very surprising considering that both aim to provide a cairo-based replacement for GNOME Canvas. libccc puts high emphasis on a clean API and modern programming techniques. Usually this is a good thing, but considering the sheer number of other projects Sven is working on, it also leads directly to the biggest problem of libccc: although it is much older than GooCanvas 1), libccc still lists many features as "work in progress".
It is ahead of GooCanvas with its clean model/view separation. Besides the mandatory canvas widget, the library provides additional view implementations like a GtkCellRenderer or a camera item (picture-in-picture). In addition to the features of GooCanvas, it provides inline text editing and moves the burden of grid fitting from the canvas items to the views.
Conclusion
If printing support and high-quality (pixel-aligned) rendering were not required, the choice would be simple: take GooCanvas, as it seems to be more complete. If its feature matrix fits your needs, it's the most reasonable canvas widget to use.
Things change when you figure out that you have to add features. In that case GooCanvas' unnecessary code complexity hurts. You have to choose between investing a lot of time in working with that complex and fragile code, or first convincing Damon (and the other people on the mailing list) to drop the optional model/view split, which would also take quite some time. Considering my weak mailing list discussion skills, I'd most probably choose finishing libccc instead of spending that time on discussing and refactoring GooCanvas, but that's my personal opinion. People with better mailing list skills and better contact with Damon will probably decide differently.
1) Contrary to common belief, libccc really was started before GooCanvas: Sven demonstrated libccc to me at 20C3, which was held in 2003, whereas GooCanvas was started in 2005.



