Tag Archives: Performance

About Kerika’s software performance.

Expanding our use of Lazy Loading

Our introduction of lazy loading, as part of our recent redesign, was originally limited to just three columns: Backlog (for Scrum Boards), Done, and Trash.

We figured that these columns were most likely to be very long, and would therefore benefit the most from implementing lazy loading.

This worked well; so well, in fact, that we have expanded our use of lazy loading to work with all columns, across all Task Boards and Scrum Boards.

The practical effect of this should be to reduce the time needed by the browser to load large boards, for all users, on all kinds of computers.

Enjoy.

A transition to “lazy loading”

With our big UI redesign, launched a couple of weeks ago, we have started using lazy loading of cards in an effort to improve performance, particularly with very large boards.

Background:

Most Kerika boards tend to be small, or moderate: up to 100-200 cards in size.  A few users, however, have very large boards: several thousand cards in size!

And this is not because we have users who are tracking thousands of work items simultaneously; it’s just that some users have been continuously using the same board for years to track all their work.

For people who use the same board over several years, the number of items in the Done or Trash columns can eventually number in the thousands.  Displaying such large boards was already difficult in our old architecture: we had underestimated how many cards some boards might contain, so our old design downloaded all the cards on a board every time it was opened, and then created a DOM for each card!

This meant that, for very large boards, the browser had to create thousands of DOMs before it could even display the board.  This was obviously not a sustainable model.

What we did:

With our redesign, we have laid the groundwork for a better architecture using two related concepts in lazy loading:

  • For columns that we anticipate being very large — the Done, Trash and Backlog columns for Task Boards and Scrum Boards — the browser now fetches only a small number of cards, say 10-20, from the server.  (With our old design the browser would fetch every card, for every column.)
  • Fetching fewer cards means the amount of traffic between the browser and the server decreased dramatically, but it didn’t solve the performance problem by itself. We also changed our browser code to reuse DOMs instead of creating new DOMs.  By reducing the total number of DOMs created and maintained within the browser by the Kerika app, we are able to reduce Kerika’s overall browser footprint while significantly improving performance.

Here’s an example of lazy loading of the Done column:

Lazy loading of Done column
Lazy loading of Done column

On this board, the Done column contains 163 cards, but when the board is opened only 10 are shown.  Since these are the 10 most recently done cards, this works great for most users, most of the time.

If the user really wanted to see something that was done a long time ago, they can simply scroll down the Done column, as they would have with our old design as well.

As a user scrolls down, more cards are fetched automatically from the server.  Slightly more cards are fetched from the server than are likely to be displayed, e.g. the browser may fetch 15 cards from the server even when it expects to display only 10.

This helps avoid the perception of delay when the browser needs to fetch more cards, since it will already have 5 more cards stored in memory to show as the user begins scrolling, giving it time to fetch another 15 before the user has finished scrolling.

We also decided to use lazy loading on the Home page: with our new design we display more information about the state of each board than we did previously, and the cards themselves are much larger than before.  This means we are unlikely to show the full set of boards to any user at any time, so lazy loading is a natural choice for this view.

Lazy loading of Home
Lazy loading of Home

Finally, with our most recent update (launched two days ago), we have extended our use of lazy loading to include the Not Scheduled column in the Planning Views, where you can pivot your view of a Task Board or Scrum Board to see all the cards organized in terms of due dates.

Here’s an example of a board where there are a very large number of unscheduled cards:

Lazy loading of Not Scheduled
Lazy loading of Not Scheduled

The Not Scheduled column only fetches and displays 10 cards at a time even though there are over 200 cards that are not scheduled.  Since the browser (on this laptop) can only show 3-4 cards at a time, there isn’t any point in fetching all 200 cards: just fetching and displaying 10-15 at a time does the trick!

Why, yes, Kerika did get faster

If you are wondering whether Kerika is faster than it used to be, yes, it is.

We made a change in our software architecture — the way each board would connect to the server and ask for all its cards and all the updates on these cards — that has reduced the total upload of data.

This was actually a significant improvement in upload speed: about 10x.

Since upload speeds are frequently much slower than download speeds, even on broadband connections, this should help load larger boards much faster.  And on mobile connections this should help reduce the amount of data consumed.

We should probably stop saying “unlimited cards”…

OK, so our website has long stressed the word “unlimited”, and maybe that’s not such a great idea…

Most people have boards with a few dozen cards.

Some folks have boards with a couple of hundred cards.

Very few people have boards with up to 1,000 cards.

It turned out that one of our users had a board with nearly 4,200 cards, of which over 4,000 were in the Trash.

And that’s not good! A board with several thousand cards on it is going to take a long time to load, because each card has many different attributes to it: details, chat, assignments, status, history, etc.

And when someone with a very large board uses Kerika, this can cause very unexpected loads on the servers, particularly in terms of network I/O and CPU utilization.

This is what it looks like when someone suddenly opens a board with thousands of cards on it:

CPU spikes
CPU spikes

We have talked about this before, and now we need to do something about it…

Most of the time, boards get very big because they are very old: stuff piles up in the Done column, or, as in this case, in the Trash column.

Having very large boards can impact performance: unlike, say, email which you may be accustomed to leaving piled up in your Inbox for years on end, Kerika’s cards are “live object”: even when a card is in the Done or Trash columns, it is constantly listening for updates from the server, because it could be moved out of Done or Trash at any time by someone on the team, or have some of it’s attributes changed.

For example, even though a card is “Done” or “Trashed”, it could have its details updated, or new chat added to it.

This is different from email messages, which are essentially “dead objects”: once you get an email and read it, it doesn’t update after that, so it can sit for years in your Inbox without trying to get fresh updates from the mail server.

So, when you have 4,200 cards on a single board, you have that many live objects trying to get updates from the server at the same time, and that’s not good for your browser or our server!

(Imagine your laptop trying to read and display 4,200 emails at the same time, and you will get an idea of the problem…)

Having very large boards is not a good idea from a workflow or process management perspective, and so perhaps Kerika needs to do something about it: we need to think about how we can help our users improve their workflow, while also avoiding these CPU spikes.

A couple of ideas that we are considering:

  • Start warning users when their boards are getting too big.
  • If boards hit some threshold values (which we have yet to figure out, maybe it’s 1,000 cards?) force the user to reconfigure their board so they don’t affect the quality of the service for everyone.

 

Why we are integrating with Box, Part 9: Final QA

(The ninth in a series of blog posts on why we are adding integration with Box, as an alternative to our old integration with Google Drive.)

We have been doing internal testing (“eating our own dogfood”) of Kerika+Box for the past three weeks, and the results have been much better than we expected!

We have found very few bugs so far, which is great — it’s feels like a huge vindication of our decision to invest several Sprints in improving our internal QA processes, clearing the backlog of old bugs, and generally improving our software development processes with code reviews across the board, for even the smallest changes.

In other words, we didn’t move fast and break things: we moved slowly and broke nothing. Which makes sense when you have paying customers who rely upon your product to run their businesses…

Since Kerika makes it really easy to have multiple backlogs in a single account, we put all the OAuth and infrastructure work in a separate backlog, allowing a part of the team to concentrate on that work somewhat independently of other, more routine work like bug fixes and minor usability updates.

And, as before, put every feature in a separate git branch, making it easy to merge code as individual features get done.

Here’s what our Box QA board looks like, right now:

Box QA board
Box QA board

The user interface for Kerika+Box is essentially the same as for Kerika+Google, with a few quirks:

Box requires more frequent logins: Google provided us with relatively long-lived refresh tokens, so a user could close a Kerika browser tab and reopen it a day later and log right back in.

With Box, you are going to see a login screen much more often, along with a screen asking you to re-authorize Kerika as a third-party app that can access your Box account.

This is kind of irritating, but apparently unavoidable: from what we have found on Stack Overflow, Bug views this as a feature rather than a bug.

The other, really big difference is that files are edited offline rather than in the browser itself: when you click on the Edit button, you will end up downloading a local copy of the file, using Microsoft Office for example, and then when you do a Save of that file, your latest changes are uploaded automatically to the cloud.

Here’s what you see when you open a file attached to a card on a Kerika board, when you use Kerika+Box:

Example of opening a file within Box
Example of opening a file within Box

This works great most of the time, except when two people are making changes simultaneously: in that situation, Google’s in-browser editing seems a lot more convenient.

On the other hand, downloading local copies of files means that you get the full power of Microsoft Office, and we know that’s very important for some of our users, e.g. consultants dealing with complicated RFPs or government users dealing with official documents.

Performance also seems a little less than Google Drive, although we would stress that this is highly variable: while Google Drive files generally open within 1-3 seconds in a new browser tab, they can take much longer if Google’s servers are slow.

Overall, we are very pleased with Kerika+Box: we are planning to do all of our new development with this new platform, to continue eating our delicious dogfood 😉

The full series:

Making it faster to create new projects, and to copy/paste them

Kerika is starting to get used in larger organizations more: big global companies, and state and local governments, and as we move into these types of user communities we are finding that people are more likely to create large numbers of projects.

(On average, people create 4-5 projects if they are relatively passive users of Kerika, while more active users — like Project Leaders — might create 30-40 projects over a couple of months.)

To make this smoother, we have been improving the performance of the creation of new projects, as well as the copy-paste function for projects.

It turns out both are closely related, so improving one should help with the other!

What a CPU spike looks like

We have been experiencing a CPU spike on one of our servers over the past week, thanks to a batch job that clearly needs some optimization.

The CPU spike happens at midnight, UTC (basically, Greenwich Mean Time), when the job was running, and it looks like this:

CPU spike
CPU spike

It’s pretty dramatic: our normal CPU utilization is very steady, at less than 30%, and the over a 10-minute period at midnight it shoots up to nearly 90%.

Well, that sucks. We have disabled the batch job and are going to take a closer look at the SQL involved to optimize the code.

Our apologies to users in Australia and New Zealand: this was hitting the middle of their morning use of Kerika and some folks experienced slowdowns as a result.