Tuesday, April 21, 2009

Webmail Clients = Desktop Mail Clients Soon

Sources
(1) UIST 2007 - Enabling Efficient Orienteering Behavior in Webmail Clients by Stefan Nusser, Julian Cerruti, Eric Wilcox, Steve Cousins, Jerald Schoudt, Sergio Sancho

This paper compares desktop email clients with browser-based email clients. The authors then describe their own browser-based client, which tries to match the advantages of desktop clients while keeping the advantages of browser clients.

Limitations of Browser Based Clients
  1. scrolling
  2. searching
  3. sorting
Desktop clients handle these functions better. Browser-based clients shift the computational load to the server: the browser client operates on only a small subset of the user's mailbox, so whenever the user moves to the next page of email, searches, or sorts the list, the client has to fetch new data from the server. How long that takes depends on the network speed.
_______________________________________________________

The authors implemented a client called BlueMail, whose goal was to "gain the same independence from network and server performance and ultimately similar performance characteristics as a desktop mail application while running inside a web browser" (1).

Purpose for Email

As many studies have shown, email is no longer used just for communication. In a corporate setting, people also manage personal information and tasks in their email clients. Users treat the inbox and folders as to-do lists and frequently scroll and sort through them. One study found that sort-by-headers (with pivoting) ranked first among the most useful email features, while instant search came in second.

They used the term orienteering, which "involves navigating to a search target by a series of small, local steps that leverage contextual knowledge and include both keyword search and browsing by meta-data" (1).

Got to be fast
In web-based clients, response time is very important because it can be affected by factors such as network delay. To keep the user's flow of thought uninterrupted, they determined that the response time for sorting, searching, or switching folders must be less than a second. The response time for scrolling through a list of emails must be less than half a second.
__________________________________________

For the purpose of the paper, they focused on mailbox sizes of 10,000 messages but also looked at the fringe case of 50,000.
__________________________________________

Implementation
BlueMail uses the traditional three-tier architecture of web applications: browser, mid-tier, backend. The research paper focused on the mid-tier and the client, and BlueMail uses existing IBM products to cover the backend tier.

The mid-tier service interacts with the backend datastores and provides an interface for the client components. It was implemented as a lightweight J2EE application. The mid-tier service performs "data-transformation and integration services, allowing the client to offload CPU-intensive operations to the mid-tier if they are not time critical" (1).

The client application follows the design of an "object-oriented model-view-controller application" (1).
___________________________________________
They also discussed some of the classes that were used and what they did. However, I will not discuss them here.
___________________________________________
Performance

Display and Scrolling
BlueMail was designed to maintain a local cache of all the message headers. To shorten the time needed to display the message list, they designed the Document Object Model (DOM) to contain only a subset of the headers: three times the number of headers that fit on a page. They padded this structure with two large elements, one on each side, so that the scroll bar is sized as if the full list were present.

To scroll, they capture the scroll event, calculate the rows that should be visible at the new position, render the new message headers in the DOM, and recalculate the size of the padding elements. This reduces the number of rows that have to be created. It can be accelerated further by reusing the existing DOM tree and modifying the attributes of the individual nodes. They also decided to suppress the page refresh until the scrollbar is still. To give some visual feedback while the user is scrolling, an overlay displays the value of the sorted column (so it shows the date or the sender, depending on which column the list is sorted by).
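
To make the windowing idea concrete, here is a minimal sketch of the calculation in Python rather than the browser JavaScript the paper uses. It assumes fixed-height rows and a window of one page above and one page below the visible page; the names compute_window, row_height and rows_per_page are my own and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Window:
    first_row: int       # index of the first rendered header
    last_row: int        # one past the last rendered header
    top_padding: int     # pixel height of the padding element above the rows
    bottom_padding: int  # pixel height of the padding element below the rows


def compute_window(scroll_top: int, row_height: int,
                   rows_per_page: int, total_rows: int) -> Window:
    """Pick which headers to render for a given scroll position.

    Assumption: every row has the same height, so a pixel offset maps
    directly to a row index.
    """
    first_visible = min(scroll_top // row_height, max(total_rows - 1, 0))
    # Render at most three pages: the visible one plus one on each side.
    first = max(first_visible - rows_per_page, 0)
    last = min(first_visible + 2 * rows_per_page, total_rows)
    return Window(
        first_row=first,
        last_row=last,
        top_padding=first * row_height,
        bottom_padding=(total_rows - last) * row_height,
    )


if __name__ == "__main__":
    print(compute_window(scroll_top=100_000, row_height=20,
                         rows_per_page=25, total_rows=10_000))
```

The two padding heights plus the height of the rendered rows always add up to the height of the full list, which is why the scrollbar behaves as if all 10,000 rows were in the DOM.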

Sorting
They had four choices for implementing the sort: a full JavaScript array sort, an incremental JavaScript array sort, an index merge sort, or a ranked AVL tree. The AVL tree was by far the fastest. However, the tradeoff was higher memory usage. A rough estimate of memory consumption for an AVL tree of 10,000 entries was about 1 MB, and this consumption would grow linearly with the number of entries in the index.
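
What makes a ranked index useful for a mail list is two operations: finding the position of a message in the sort order, and fetching the messages at a given range of positions. The sketch below shows only that interface. It is not the paper's AVL tree: a plain sorted list with the standard bisect module stands in for it, so inserts cost O(n) instead of O(log n). The class and method names are hypothetical.

```python
import bisect


class RankedIndex:
    """Sorted index with rank and select operations.

    Stand-in only: a sorted Python list replaces the ranked AVL tree
    described in the paper, so this shows the interface, not the
    performance characteristics.
    """

    def __init__(self):
        self._entries = []  # sorted list of (sort_key, message_id) pairs

    def insert(self, sort_key, message_id):
        # Keep the list ordered as entries arrive.
        bisect.insort(self._entries, (sort_key, message_id))

    def rank(self, sort_key, message_id):
        """Return the zero-based position of a message in the sort order."""
        entry = (sort_key, message_id)
        i = bisect.bisect_left(self._entries, entry)
        if i == len(self._entries) or self._entries[i] != entry:
            raise KeyError(message_id)
        return i

    def select(self, start, count):
        """Return the message ids at positions start .. start + count - 1."""
        return [mid for _, mid in self._entries[start:start + count]]

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    index = RankedIndex()
    for sender, message_id in [("carol", 3), ("alice", 1), ("bob", 2)]:
        index.insert(sender, message_id)
    print(index.rank("bob", 2), index.select(1, 2))
```

With select, the client can pull exactly the rows needed for the current scroll position without walking the whole list.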

They also implemented a pivot function that lets the user change the sort order while keeping the selected row in the viewport. The purpose is to keep the "context of the ongoing mental search operation on the new sort order" (1).
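
Here is a rough sketch of how a pivot could work, assuming each message header is a dict with an "id" field and that the client knows which row of the viewport the selected message currently occupies. The function name and parameters are my own, and a full re-sort is used for simplicity where the paper would use its index.

```python
def pivot(messages, selected_id, row_in_viewport, sort_field, rows_per_page):
    """Re-sort the list and keep the selected message in the viewport.

    Returns the newly ordered list and the index of the first row to
    display. Assumes selected_id is present in messages.
    """
    ordered = sorted(messages, key=lambda m: m[sort_field])
    # Position of the selected message in the new order.
    new_rank = next(i for i, m in enumerate(ordered) if m["id"] == selected_id)
    # Try to keep the selected row at the same offset inside the viewport,
    # clamped so the viewport never runs past either end of the list.
    max_first = max(len(ordered) - rows_per_page, 0)
    first_visible = min(max(new_rank - row_in_viewport, 0), max_first)
    return ordered, first_visible


if __name__ == "__main__":
    inbox = [
        {"id": 1, "sender": "dave", "date": 1},
        {"id": 2, "sender": "alice", "date": 2},
        {"id": 3, "sender": "eve", "date": 3},
        {"id": 4, "sender": "bob", "date": 4},
        {"id": 5, "sender": "carol", "date": 5},
    ]
    ordered, first = pivot(inbox, selected_id=1, row_in_viewport=1,
                           sort_field="sender", rows_per_page=2)
    print([m["sender"] for m in ordered[first:first + 2]])
```

The selected message stays on screen after the sort order changes, so the user can look at its new neighbors (for example, other mail from the same sender) without hunting for it again.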

Search
They implemented a search function that processes the primary index, searching through all the subjects and creating a temporary index with the matching results. The response time was about 400 ms for 10,000 messages, and rendering the index took an additional 200 ms, for a total of roughly 600 ms, which is within the one-second target.
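
The search step can be sketched as a single pass over the primary index that collects matching messages into a temporary index. The data layout (a list of message ids plus a dict of headers) and the case-insensitive substring match are my assumptions; the paper summary above does not specify either.

```python
def search_subjects(primary_index, headers, query):
    """Build a temporary index of messages whose subject contains query.

    primary_index: message ids in the current sort order.
    headers: dict mapping message id to a header dict with a "subject" key.
    The result keeps the order of the primary index, so the matches are
    already sorted the same way as the full list.
    """
    needle = query.casefold()  # assumption: matching ignores case
    return [
        message_id
        for message_id in primary_index
        if needle in headers[message_id]["subject"].casefold()
    ]


if __name__ == "__main__":
    headers = {
        10: {"subject": "Quarterly report draft"},
        11: {"subject": "Lunch on Friday?"},
        12: {"subject": "RE: quarterly REPORT draft"},
    }
    print(search_subjects([12, 10, 11], headers, "report"))
```

Because the temporary index has the same shape as the primary one, the same display and scrolling code can render it.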

Test Case

Conclusion and Opinion
I think this paper is slightly out of date, since some email clients already do similar things. However, most of them are not implemented to this extent. For example, the new Yahoo email system is similar to this, but if I remember correctly, it still has a limit on the number of emails per page, and the response time could be better. Gmail is similar, although I like the interface on Gmail better than Yahoo's. The one drawback of the BlueMail approach is that it would not work well with older computers or dialup. It may run slowly on computers with low memory, and it seems that dialup internet connections would take too long to retrieve new messages.