Tag Archives: AWS

About our use of Amazon Web Services.

No, not Shellshocked

The announcement by CERT yesterday that there is a vulnerability in the Bourne-Again Shell (more commonly known as “bash”) wasn’t great news for anyone running any variant of Unix, which includes Linux and Mac OS X.

Linux is very widely used for modern Web servers, particularly those running on Amazon Web Services, like Kerika’s.

There are a number of variants of Linux out there, which makes things a little harder whenever a vulnerability is announced: you have to make sure your particular variant of Linux is patched quickly.

Luckily, this problem was fixed as fast as the notorious Heartbleed bug: within a couple of hours of the report of Shellshock, Amazon and Google (and, most likely, every other cloud services provider out there) started installing patches, and so the Software-as-a-Service (SaaS) world got back into good shape very quickly.

In our own case, we use Ubuntu Linux, and Canonical was equally swift in issuing a patch for Shellshock, which we installed yesterday.
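
If you want to check a machine yourself, the widely circulated test for Shellshock is easy to wrap in a few lines of Python. This is just a quick sketch: the /bin/bash path is an assumption that holds on most Linux systems.

    import subprocess

    # The classic Shellshock probe: define an environment variable that looks
    # like a bash function definition with trailing commands. A vulnerable bash
    # executes the trailing "echo vulnerable" merely by importing the variable.
    probe_env = {"x": "() { :;}; echo vulnerable", "PATH": "/usr/bin:/bin"}

    result = subprocess.run(
        ["/bin/bash", "-c", "echo probe complete"],  # assumes bash at /bin/bash
        env=probe_env,
        capture_output=True,
        text=True,
    )

    print("VULNERABLE" if "vulnerable" in result.stdout else "looks patched")

On Ubuntu, the fix itself was just a package upgrade: something along the lines of apt-get update followed by apt-get install --only-upgrade bash.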

On a side note, we are less enthusiastic about Apple’s announcement that “the vast majority of users are not at risk”.

That’s true only in a literal sense: the vast majority of Mac users don’t ever use the Terminal program to access the shell, and a lot of permissions on Macs are locked down by default (and most users never bother exploring all their administrative privileges).

But, in a practical sense, this bland statement from Apple understates the actual risk faced by Mac users: a significant proportion of startups use Macs for their software development, which means a critical set of Mac users is still sitting exposed!

The sooner Apple fixes this bug, the better it will be for the startup world.

Why we are integrating with Box; Part 6: The Box Option

(The sixth in a series of blog posts on why we are adding integration with Box, as an alternative to our old integration with Google Drive.)

And this brings us to Box…

We first heard of Box through cloudPWR, a Kerika partner and long-time Box reseller: Shadrack White, Dennis Brooke, and Cullen Hower suggested to us (a year ago!) that we consider integrating with Box.

Shad and Dennis were both enthusiastic proponents of Box: they had done several implementations of Box, including a very interesting use-case for Washington State’s Liquor Control Board, which found itself in the business of regulating medical marijuana this year.

And Dennis in particular has been a great proponent of Kerika: he has introduced it to some of his clients.

As we looked at Dropbox and OneDrive as possible alternatives to Google Drive, Box came up repeatedly in the conversations we were having with enterprises.

It was clear that Box was treated very seriously by some folks that we considered very smart and knowledgeable — a senior director at Amazon, for example, told us (off the record) that he considered Box to be the most enterprise-ready cloud storage platform — and so we decided to take a closer look at the Box platform.

We attended Boxworks in San Francisco last summer, and were immediately struck by the differences in tone and substance between Boxworks and DBX.

While Dropbox is a consumer-oriented company with a newly developed interest in the enterprise, Box is a very enterprise-focused company (with little or no interest in consumers).

We took a close look at the Box API, and were very pleased with what we found: Box’s API was very close to what we were getting from Google Drive, which meant that a Kerika+Box integration could offer a really good user experience (a rough code sketch follows the list below):

  • If you add a file to a card or canvas on a Kerika project board, it will automatically get shared with everyone who is part of the project team.
  • People’s access to project files will be automatically controlled (and updated in real-time) based upon their roles: Team Members will get read+write access; Visitors will get read-only access.
  • There will be no need for users to manually adjust or manage permissions on any files: Kerika will provide a great contextual layer on top of the cloud storage platform.
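
To make that concrete, here is a rough sketch of how a Kerika role might translate into a Box “collaboration” using Box’s v2 REST API. The endpoint and payload shape follow Box’s published documentation, but the role mapping, folder ID, and token handling are our own illustrative assumptions, not our actual implementation:

    import requests

    # Illustrative mapping from Kerika project roles to Box collaboration roles.
    KERIKA_TO_BOX_ROLE = {
        "TEAM_MEMBER": "editor",  # read+write access to the project's files
        "VISITOR": "viewer",      # read-only access
    }

    def share_project_folder(access_token, folder_id, user_email, kerika_role):
        """Grant one project member access to the board's Box folder."""
        response = requests.post(
            "https://api.box.com/2.0/collaborations",
            headers={"Authorization": "Bearer " + access_token},
            json={
                "item": {"type": "folder", "id": folder_id},
                "accessible_by": {"type": "user", "login": user_email},
                "role": KERIKA_TO_BOX_ROLE[kerika_role],
            },
        )
        response.raise_for_status()
        return response.json()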

Box has another great advantage: it doesn’t impose its own proprietary format for storing Word, Excel, and other Microsoft Office files. This is a big issue for many enterprises that would like to use Kerika, but don’t want to move away from the Microsoft Office formats.

If we can combine the great Kerika user experience with an enterprise-class cloud service and the convenience of Microsoft Office, we think we will have a winner on our hands!

So, from a technology perspective a Kerika+Box integration makes a lot of sense. But what about from a market perspective?

When we polled some of our current and prospective customers, however, the reaction was somewhat mixed:

  • Folks who were knowledgeable about Box were very supportive. (Good!)
  • Folks who were already using Box were very enthusiastic. (Excellent!)
  • Unfortunately, too many people still haven’t heard about Box… (Not so good…)

Box’s biggest challenge at the moment is name-recognition: far too many folks we talked to confuse Box with Dropbox.

The name confusion vis-a-vis Dropbox is a pretty big issue, but we are betting that Box can ameliorate it on its own: Box’s pending IPO should certainly help it gain greater name recognition and a more distinctive personality in the marketplace.

We are also hoping to build good partnerships with Box resellers, like our friends at cloudPWR, who have long-standing relationships with enterprises that would be great candidates for Kerika’s work management tools.


Heartbleed: no heartache, but it did prompt a complete security review

So, here’s how we dealt with the Heartbleed bug…

We learned about the bug just like you did: through news reports early on April 7th. Heartbleed was a “zero-day” bug, and the OpenSSL team put out an updated (patched) version of the OpenSSL library the same day, which meant that everyone, everywhere, had to scramble to get their systems patched as quickly as possible.

(And the bad guys, meanwhile, scrambled to grab sensitive information, with the Canadian tax authorities being among the first to report that they had been hacked. Of course, “first to report” isn’t the same as “first to actually get hacked”. Most people who got hacked either never found out, or never said anything…)

Kerika uses OpenSSL too, and our immediate concern was updating the Elastic Load Balancer that we use to manage access to our main Amazon Web Services (AWS) servers: OpenSSL runs on the load balancers, not on the Web servers that sit behind them.

Using Amazon Web Services turned out to be a really smart decision in this respect: Amazon just went ahead and patched all their load balancers one by one, without waiting for their customers to take any action. In fact, they patched our load balancer faster than we expected!

Patching the load balancer provided critical immediate protection, and gave us the time to do a more leisurely security review of all our operations. This was long overdue, it turned out, and so we went into “housecleaning mode” for over a week.

One part of this, of course, was updating all our Ubuntu Linux machines: Canonical was also very prompt in releasing patched Ubuntu packages, which we loaded onto all of our development, test, and production servers. Even though the vulnerability had already been patched at the load balancer, and these servers couldn’t be directly accessed from the Internet, we patched them all anyway.

Next, we decided to clean up various online services that we weren’t actively using: like many other startups, we frequently try out various libraries and third-party services that look promising. We stick with some; others get abandoned. We had accumulated API keys for services that we weren’t using any more (e.g. we had a YouTube API key that no one could remember why we had gotten in the first place!), and we deactivated everything that wasn’t actively being used.

Closing unneeded online accounts helped reduce our “attack surface”, which adds to our overall security.

And, of course, we changed all our passwords, everywhere. All of our email passwords, all of our third-party passwords. All of our online passwords and all of our local desktop passwords. (On a personal level, our staff also took the opportunity to change all their banking and other online passwords, and to close unneeded online accounts, to reduce our personal attack surfaces as well.)

We got new SSL certificates: from Verisign for our production load balancer, and from GoDaddy for our test load balancer. Getting a new SSL certificate from Verisign took much longer than we would have liked; getting one from GoDaddy took just seconds, but on the other hand, Verisign does have a better reputation…
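
One detail worth spelling out: because Heartbleed could leak server memory, the old private keys had to be presumed compromised, so reissuing certificates meant generating brand-new keys, not just requesting new certificates. A minimal sketch of generating a fresh key and certificate signing request with Python’s cryptography package (a current version of the package is assumed, and the hostname is a placeholder):

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID

    # A brand-new key pair: after Heartbleed, the old key must be presumed leaked.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    # Build a CSR for the domain; the CA (Verisign, GoDaddy, etc.) signs this.
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([
            x509.NameAttribute(NameOID.COMMON_NAME, "app.example.com"),  # placeholder
        ]))
        .sign(key, hashes.SHA256())
    )

    print(csr.public_bytes(serialization.Encoding.PEM).decode())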

We reviewed our internal security policies and procedures, and found a few places where we could tighten things up. This mostly involved increased use of two-factor authentication and, most importantly, further tightening access to various services and servers within the Kerika team. Access to our production servers is highly restricted: we use AWS’s Identity & Access Management (IAM) service, with roles and permissions, to limit what even the small subset of people with production access can do.
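
For a flavor of what that looks like in practice, here is a sketch of creating a narrowly scoped IAM policy with the boto3 SDK; the policy name and the specific actions are illustrative placeholders, not our real configuration:

    import json
    import boto3

    iam = boto3.client("iam")

    # Illustrative least-privilege policy: allow reading instance and metric
    # data, and nothing else; attach it to a group or role, not individual users.
    read_only_ops = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances", "cloudwatch:GetMetricStatistics"],
            "Resource": "*",
        }],
    }

    iam.create_policy(
        PolicyName="OpsReadOnly",  # placeholder name
        PolicyDocument=json.dumps(read_only_ops),
    )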

Finally, we are adding more monitoring, looking out for malicious activity by any user, such as the use of automated scripts. We have seen a couple of isolated examples in the past: not malicious users, but compromised users who had malware on their machines. Fortunately these attempts were foiled by our robust access control mechanisms, which manage permissions at the individual object level in Kerika; but, like every other SaaS company, we need to be vigilant on this front.

All of this was good housekeeping. It disrupted our normal product development by over a week as we took an “all hands on deck” approach, but it was well worth it.

Our use of data stores and databases

A question we are asked fairly often: where exactly is my Kerika data stored?

The answer: some of it is in Amazon Web Services, some of it is in your Google Drive.

Here are the details: your Kerika world consists of a bunch of data, some of which relates to your account or identity, and some to specific projects and templates.

Your account information includes:

  • Your name and email address: these are stored in a MySQL database on an Amazon EC2 virtual server. (Note: this isn’t a private server; that’s something we are working on for the future! Instead, access is tightly controlled by a system of permissions, or Access Control Lists.)
  • Your photo and personalized account logo, if you provided these: these are stored in Amazon’s S3 cloud storage service. These are what we call “static data”: they don’t change very often. If you have a photo associated with your Google ID, we get that from Google – along with your name and email address – at the time you sign up as a Kerika user, but you can always change your photo by going to your Kerika preferences page.

Then, there’s all the information about which projects and templates you have in your account: this information is also stored in the MySQL database on EC2.

  • There are projects that you create in your own account;
  • There are projects that other people (whom you authorize) create in your account;
  • There are projects that you create in other people’s accounts;
  • And, similarly, there are templates that you create in your own account or in other people’s accounts.

Within each project or template you will always have a specified role: Account Owner, Project Leader, Team Member, or Visitor. Kerika tracks all of that, so we can make sure that you can view and/or modify only those items to which you have been given access.

Projects and templates can be Task Boards, Scrum Boards or Whiteboards:

  • In Task Boards and Scrum Boards, work is organized using cards, which in turn could contain attachments including canvases.
  • In Whiteboards, ideas and content are organized on flexible canvases, which can be nested inside each other.

With Whiteboards, and canvases attached to cards on Task Boards and Scrum Boards, all the data are stored in MySQL.

With cards on Task Boards and Scrum Boards:

  • Files attached to cards are stored in your Google Drive,
  • The title, description, URLs, tags, people and dates are stored in MySQL,
  • Card history is stored in DynamoDB on AWS.

So, our main database is MySQL, but we also use Dynamo as a NoSQL database to store history, and that’s because history data are different from all other data pertaining to cards and projects:

  • The volume of history is essentially unpredictable, and often very large for long-living or very active projects.
  • History data are a continuous stream of updates; they aren’t really attributes in the sense of relational data.
  • History data are accessed infrequently, and then viewed all at once (again, an aspect of being a continuous stream rather than attributes); the sketch below shows how this maps onto DynamoDB.
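
As a sketch of how that access pattern fits DynamoDB’s key design (using today’s boto3 SDK; the table and attribute names are illustrative, not our actual schema), a card’s entire history comes back in a single query:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    history = dynamodb.Table("CardHistory")  # hypothetical table name

    # Hash key = card ID, range key = event timestamp: the full stream for one
    # card is a single Query, returned in chronological order.
    def load_card_history(card_id):
        response = history.query(KeyConditionExpression=Key("card_id").eq(card_id))
        return response["Items"]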

It’s also important to note what we don’t store:

  • We never store your password; we never even see it, because of the way we interact with Google (using OAuth 2.0; see the sketch after this list).
  • We never store your credit card information; we don’t see that either because we hand you off to Google Wallet for payment.
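
For readers curious how that works: in the OAuth 2.0 authorization-code flow, you sign in on Google’s own pages, and all we ever receive is a short-lived code that we exchange for tokens. A minimal sketch, using Google’s currently documented token endpoint; the client credentials and redirect URI are placeholders:

    import requests

    CLIENT_ID = "your-app.apps.googleusercontent.com"  # placeholder credentials
    CLIENT_SECRET = "..."                              # placeholder

    def exchange_code_for_tokens(code):
        """Swap a one-time authorization code for tokens; the user's Google
        password never passes through our servers."""
        response = requests.post(
            "https://oauth2.googleapis.com/token",
            data={
                "code": code,
                "client_id": CLIENT_ID,
                "client_secret": CLIENT_SECRET,
                "redirect_uri": "https://example.com/oauth/callback",  # placeholder
                "grant_type": "authorization_code",
            },
        )
        response.raise_for_status()
        return response.json()  # access_token (and refresh_token, if requested)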

Along the way to a better search, a deeper dive into Amazon Web Services

We have been busy building a great new Search function: the old search worked only with whiteboards, but the new search indexes absolutely everything inside Kerika: cards, chat, attachments – the whole lot.

We will talk about Search in a separate blog post; this article is about the detour we made into Amazon Web Services (AWS) along the way…

Now, we have always used AWS: the Kerika server runs on an EC2 machine (with Linux, MySQL and Jetty as part of our core infrastructure), and we also use Amazon’s DynamoDB for storing card history – and our use of various databases, too, deserves its own blog post.

We also use Amazon’s S3 cloud storage, but in a limited way: today, only some static data, like account logos and user photos are stored there.

The new Search feature, like our old one, is built using the marvelous Solr platform, which is, in our view, one of the best enterprise search engines available. And, as is standard for all new features that we build, the first thing we did with our new Search function was use it extensively in-house as part of our final usability testing. We do this for absolutely every single thing we build: we use Kerika to build Kerika, and we function as a high-performing distributed, agile team!

Sometimes we build stuff that we don’t like, and we throw it away… That happens every so often: we hate it when it does, because it means a week or so of wasted effort, but we also really like the fact that we killed off a sub-standard feature instead of foisting crapware on our users. (Yes, that, too, deserves its own blog post…)

But our new Search is different: we absolutely loved it! And that got us worried about what might happen if others liked it as much: search can be a CPU- and memory-intensive operation, and if our Search was good enough that people started using it heavily, it could kill the performance of the main server.

So, we decided to put our Solr engine on a separate server, still within AWS. To make this secure, however, we needed to create a Virtual Private Cloud (VPC), so that all the communication between our Jetty server and our Solr server takes place on a private subnet, using local IP addresses like 10.0.0.1 that cannot be reached from outside the VPC. This makes it impossible for anyone outside the VPC to access the Solr server directly, adding an important layer of security.
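
For the curious, here is roughly what that setup looks like through the AWS API, sketched with today’s boto3 SDK; the address ranges and names are illustrative, and Solr’s default port of 8983 is assumed:

    import boto3

    ec2 = boto3.client("ec2")

    # Carve out a private address space, then a subnet inside it; 10.0.0.x
    # matches the local addressing mentioned above, but the exact ranges
    # here are illustrative.
    vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]
    subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.0.0/24")["Subnet"]

    # Security group: Solr's port is reachable only from addresses inside the
    # subnet, so nothing outside the VPC can talk to the search server directly.
    sg = ec2.create_security_group(
        GroupName="solr-internal",
        Description="Solr reachable from the private subnet only",
        VpcId=vpc["VpcId"],
    )
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 8983,  # Solr's default port
            "ToPort": 8983,
            "IpRanges": [{"CidrIp": "10.0.0.0/24"}],
        }],
    )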

To communicate between the Jetty server and the Solr server, we have started using Amazon’s Simple Queue Service (SQS).
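
A sketch of the pattern with today’s boto3 SDK (the queue name and message format are hypothetical, and the indexing logic is stubbed out):

    import boto3

    sqs = boto3.resource("sqs")
    queue = sqs.get_queue_by_name(QueueName="solr-index-requests")  # hypothetical

    def handle(body):
        print("would re-index:", body)  # stand-in for the real indexing logic

    # The Jetty side enqueues work ("re-index this card")...
    queue.send_message(MessageBody='{"action": "index", "card_id": "1234"}')

    # ...and the Solr side polls for it; long polling cuts down on empty receives.
    for message in queue.receive_messages(WaitTimeSeconds=20):
        handle(message.body)
        message.delete()  # acknowledge, so the message isn't redelivered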

OK, that means we add VPC to our suite of AWS services, but this triggered a wider review of whether we should use more AWS services than we currently do. One sore point of late had been our monitoring of the main server: our homemade monitoring software had failed to detect a brief outage (15 minutes total, which apparently no one except our CEO noticed :0), and it was clear that we needed something more robust.

That got us looking at Amazon’s CloudWatch, which can be used with Amazon’s Elastic Load Balancer (ELB) to get more reliable monitoring of CPU thresholds and other critical alerts. (And, along the way, we found and fixed the bug that caused the brief outage: our custom Jetty configuration files were buggy, so we dumped them in favor of a standard configuration, which immediately brought CPU utilization down from a stratospheric level to something more normal.)
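
Setting up the kind of alert our homemade monitor should have raised takes only a few lines with boto3; the instance ID, thresholds, and SNS topic here are placeholders:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Alarm if average CPU stays above 80% for two consecutive five-minute periods.
    cloudwatch.put_metric_alarm(
        AlarmName="main-server-high-cpu",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
        Statistic="Average",
        Period=300,
        EvaluationPeriods=2,
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder
    )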

We didn’t stop there: we also decided to use Amazon’s Route 53 DNS service, which provides greater flexibility for managing subdomains than our old DNS provider.

In summary, we have greatly expanded our Amazon footprint:

  • EC2 for our main Web servers, running Linux, Jetty, MySQL, with separate servers for Solr.
  • S3 for basic storage.
  • Dynamo for history storage.
  • VPC for creating a subnet.
  • SQS for messaging between the Web and Solr servers.
  • CloudWatch for monitoring.
  • Elastic Load Balancer for routing traffic to our servers.
  • Route 53 for DNS.

Something from Amazon that we did abandon: we had been using their version of Linux; we are switching in favor of Ubuntu since that matches our development environment. When we were trying to debug the outage caused by the high CPU utilization, one unknown factor was how Amazon’s Linux works, and we decided it was an unknown that we could live without:

  • First of all, why is there an Amazon Linux in the first place: why did Amazon feel they needed to make their own Linux distribution? Presumably, this dates back to the very early days of AWS. But is there any good reason to have a vendor-specific Linux distribution today? Not as far as we can tell…
  • It just adds unnecessary complexity: we are not Linux experts, and have no interest in examining the fine print to determine how exactly Amazon’s Linux might vary from Ubuntu.

Unless you have in-house Linux experts, you, too, would be better off going with a well-regarded, “industry-standard” (yes, we know there’s no such thing in absolute terms, but Ubuntu comes pretty close) version of Linux rather than dealing with any quirks that might exist within Amazon’s variant. When you are trying to chase down mysterious problems like high CPU utilization, the last thing you want is to have to examine the operating system itself!

What we continue to use from Google:

  • Google login and authorization, based upon OAuth 2.0,
  • Google Drive for storing users’ files.

All the pieces are falling into place now, and we should be able to release our new Search feature in a day or two!

Card history on project boards in Kerika

The next release of Kerika will include a bunch of bug fixes and usability improvements, as usual, but a big new feature that we hope you will find useful is Card History: every card will contain a succinct history of everything that’s happened to it, since it was created.

Here’s an example:

Card history

Our implementation of this new feature is actually kind of clever, under the covers (of course!): rather than log every action immediately, we wait a little while to see if the user changes her mind about the action.

So, for example, if a user moves a card to Done, and then moves it back to another column soon afterwards, the Card History doesn’t show the intermediate action since the user clearly changed her mind about whether that work item was actually done or not. In other words, the system is forgiving of user errors: an important design principle that we have tried to adopt elsewhere as well.

Because the Kerika user interface makes it so easy to make changes to your task board, a built-in delay in the history is necessary to avoid creating a “noisy” or “spammy” history.
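
In outline, the history recorder behaves like a debouncer: each new event for a card resets a short timer, and only the settled result is committed. Here is a minimal sketch of the idea; the delay value and event shape are illustrative, not our production code:

    import threading

    QUIET_PERIOD_SECONDS = 120  # illustrative "wait a little" window

    class CardHistoryRecorder:
        """Coalesce rapid back-and-forth changes into a single history entry."""

        def __init__(self, write_entry):
            self._write_entry = write_entry  # e.g. a write to the history store
            self._pending = {}               # card_id -> (latest event, timer)
            self._lock = threading.Lock()

        def record(self, card_id, event):
            with self._lock:
                # A newer event supersedes an uncommitted one: if a card is moved
                # to Done and back right away, the intermediate move never lands.
                if card_id in self._pending:
                    self._pending[card_id][1].cancel()
                timer = threading.Timer(QUIET_PERIOD_SECONDS, self._flush, [card_id])
                self._pending[card_id] = (event, timer)
                timer.start()

        def _flush(self, card_id):
            with self._lock:
                event, _timer = self._pending.pop(card_id)
            self._write_entry(card_id, event)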

From a technical perspective, the most interesting aspect of creating this new feature was that we expanded our infrastructure to include Amazon’s DynamoDB.

DynamoDB is a fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. This is our first foray into using NoSQL databases; up to now we had been exclusively using MySQL.
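
For a taste of what appending a history entry looks like in DynamoDB (sketched with today’s boto3 SDK; the table layout, with the card ID as hash key and a timestamp as range key, is our illustration rather than the literal production schema):

    import time
    import boto3

    dynamodb = boto3.resource("dynamodb")
    history = dynamodb.Table("CardHistory")  # hypothetical table name

    # Keyed by card and timestamp, each card's history is one contiguous,
    # chronologically ordered slice of the table.
    history.put_item(Item={
        "card_id": "1234",                     # hash key (illustrative value)
        "timestamp": int(time.time() * 1000),  # range key, in milliseconds
        "action": "moved",
        "detail": "In Progress -> Done",
    })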

 

Agile for large and distributed teams: conversations with Al Shalloway, Mike DeAngelo and the Wikispeed team

Three great conversations about Agile and Scrum in recent days, with Al Shalloway of the Lean Software and Systems Consortium in Seattle; Mike DeAngelo, Deputy CIO of the State of Washington; and Clay Osterman and Joe Justice from Team WIKISPEED in Lynnwood. Common threads in these conversations:

  • Scaling up Scrum to large projects (e.g. the global WIKISPEED team numbers close to 300 people), and
  • Adapting Scrum for distributed teams (where people are located in multiple offices).

Agile purists might well recoil at the prospect of Scrum teams that can’t be fed with a single large pizza (the traditional rule-of-thumb for the optimal team size, still followed at companies like Amazon), or of people in multiple locations who can’t have face-to-face contact, but these are real-world problems for many organizations, and simply saying “No” because the idea of very large or distributed teams offends one’s theology about Agile isn’t a useful stance to take.

Increasingly, large organizations are distributed across cities, timezones, and even continents, and complex systems require large delivery teams. A pragmatic approach is necessary, not a purist one: we need to consider how we can adapt the basic principles of Scrum to meet the real-world needs of large organizations. Here are some lessons learned over the years in how to adapt Scrum for large or distributed teams:

  • Let multiple project teams push/pull items from a single Backlog, so that many small teams can work in parallel on a single system, rather than having a single, large team take on the entire Backlog. This requires coordination among the various teams through a “Scrum of Scrums”: each individual team does its Daily Standup, and then the Scrum Masters of each team participate in a second meta-Standup where they report to each other on their teams’ progress and impediments.
    To succeed, you need project tools that make it very easy to have multiple teams push and pull items from a single Backlog. The project management system must make it easy for any member of any team to have real-time visibility into the progress of every other team, so that the task of managing dependencies can be pushed down to individual team members rather than concentrated within the Scrum Masters. (Leaving it up to the Scrum Masters alone to manage all the inter-dependencies leaves you with the same single point of failure that you have with traditional Waterfall approaches.)
  • Try to stay within the “1 large pizza” size for individual teams. There’s a simple, practical reason why you should avoid letting individual teams grow much beyond 8 people: the Daily Standup takes too long, and people start to either under-report, or tune out much of the discussion.

    If a team has 20 people, for example, and each person takes just 30 seconds to say what they have done, 30 seconds for what they plan to do next, and 30 seconds to describe impediments, that still adds up to a 30-minute-long Standup!

    When faced with a Daily Standup that has become something of an ordeal, people tend to under-report, as a coping mechanism, and, frequently, what they under-report (under-discuss?) are the impediments.

    This can be fatal to the team’s overall success: problems and worries are not discussed very well, and eventually accumulate to the point where they become fatally large.

  • Split up the work, not the team. If your people are distributed across multiple locations, it is far better to split up the work rather than the teams: in other words, give each location a different set of deliverables, rather than try to get people working in several locations to work on the same deliverables.
    Too many organizations, particularly when they first build onshore-offshore teams, cling to the myth of “following the sun”: the idea that a team in India, for example, could work on a deliverable during Indian working hours, and then hand that work off at the end of the day to a California-based team that is conveniently 12 hours away.

    This is the myth of continuous work: the notion that the same deliverable can effectively be worked on 24 hours a day, by having two shifts of people work on it in non-overlapping timezones. This simply doesn’t work for most knowledge-intensive professions, like software development or product design.

    A huge effort is needed to hand over work at the end of each workday, and invariably there is a significant impact upon the work-life balance of the people involved: either the India team or the California team, in our example, would have to sacrifice their evenings in order to accommodate regular phone calls with the other team. Eventually (sooner rather than later), people get burned out by having their workdays extend into their evenings on a regular basis, and you are faced with high turnover.
    Splitting up the work means you can have loosely-coupled teams, where there isn’t the same burden of keeping every person aligned on a daily basis. A project tool that makes it easy for everyone to have a real-time view of everyone else’s work is essential, of course, but you no longer have to have Standups that would otherwise easily take up an hour each day.

What do you think? Let us know your best practices!

We sell donuts. It would have been great if Google+ had let us advertise when they first launched.

Some numbers out today from comScore suggesting that Google+ users are spending just 3 minutes per month using the service have grabbed a lot of attention, mostly because of the direct comparison being made to how much time users spend on Facebook (6-7 hours per month). No word on how many hours people are spending on Lamebook.

Unfortunately, these numbers sound about right. Fellow entrepreneurs who were initially psyched about G+ seem to have cooled considerably on the service, and we suspect the reason is the same one that left us bewildered: businesses were banned from creating Google+ profiles when G+ launched.

To use a wonderfully succinct comparison of social media, startups sell donuts:

Source: douglaswray on Instagram

 

Google made a strategic error in actively prohibiting businesses from creating Google+ pages last summer when the service launched. (To quote a Google manager: “we are discouraging businesses from using regular profiles to connect with Google+ users. Our policy team will actively work with profile owners to shut down non-user profiles.” In other words: get lost.)

So, what lies ahead? Not necessarily doom and gloom, if Google sticks with its long-term strategy as described by Bradley Horowitz in the Wall Street Journal today:

Google+ acts as an auxiliary to Google services — such as Gmail and YouTube — by adding a “personal” social-networking layer on top of them.

This comment is consistent with what we have heard from the Google rank-and-file as well in recent months; to quote one local Googler: “Now that we have built Google+, we need to rebuild Google around Google+.”

As of now, the WSJ article has approximately 1,000 Facebook “likes”; 337 G+s; and 3,210 Tweets. Go figure.

A really obscure bug, all because Firefox on Linux doesn’t support “browser safe fonts”

We have a feature in Kerika that had been behaving more like a bug for one of our users, and the root cause of her dissatisfaction turned out to be a very obscure problem with the way Firefox runs on Linux.

First, some background: one very cool feature in Kerika is that you can see little thumbnails of pages so you can tell at a glance what each item contains. Here’s an example of a project, as it appears on a user’s “All Projects” page:

Example of a project thumbnail, as it appears on a user's My Project page

Kerika displays the project as a small thumbnail. You can see these thumbnails in several places in Kerika: for example, when you are viewing a project page where items represent sub-projects, or when you are using the Breadcrumbs to navigate back up the hierarchy of projects and pages.

To create these thumbnails, we have a “Render Server” that runs on a Linux virtual machine on Amazon’s cloud. The Render Server automatically produces a new thumbnail of your project about 3 minutes after you have finished making your changes to the page.

(Why not instantly? Because in our experience, people often make a bunch of relatively small changes within a few seconds: for example, they might move something on their page several times, fairly quickly, before they are completely satisfied with where they have placed the item. If we created a new image with every single action a user takes, we would flood people with updated images every few seconds; instead, we wait for a couple of minutes until after the user has finished making changes before updating the thumbnail of the page.)

Well, this Render Server runs Firefox 9. There is a class of fonts that are supposed to be “browser safe”: i.e. all browsers are supposed to support them. This class is commonly thought to include about a dozen fonts, including Trebuchet MS, but in reality the set of browser-safe fonts may be much smaller. And, it turns out, Firefox 9 on Linux doesn’t support Trebuchet MS after all!

Which brings us to our user and her very obscure bug… She had been using Trebuchet MS as a font for her project pages, and her pages included a number of diamond shapes. Kerika has a feature that resizes shapes automatically to fit all the text contained within the shape, so that, for example, if you increase the font size, the shape will resize automatically to accommodate the larger footprint of the text.

Whenever our user made a change to her page, the Render Server would automatically create a new thumbnail a few minutes later. But… since Firefox on Linux doesn’t support Trebuchet MS, Firefox would fall back to the closest font it had, which was Times New Roman. The footprint of Times New Roman is quite different from that of Trebuchet MS, so the diamond shapes needed to be slightly taller and narrower, and the Render Server would automatically adjust each shape when it created the thumbnail for the page.

As a result, our user found that her diamonds were changing size in some random manner, which was both puzzling and annoying. The bug was obscure indeed, because it ultimately had to do with Firefox not supporting Trebuchet MS on Linux even though that font is supported by Firefox on Windows and Macs.

Luckily, our team includes some unusually smart people! One of them spotted the quirk fairly quickly and the fix is going to be relatively easy: we are already providing our users with an extended set of fonts, and for these we provide Firefox with the necessary CSS to render the fonts. We hadn’t been providing the custom CSS for Trebuchet MS, thinking that it was already supported by Firefox, but now we will.

In case you are interested, here are the fonts we currently support:

The fonts we support for Kerika pages

 

 

The nature of “things”: Mukund Narasimhan’s talk at Facebook

Mukund Narasimhan, an alumnus of Microsoft and Amazon who is currently a software engineer with Facebook in Seattle, gave a very interesting presentation at Facebook’s Seattle offices this week. It was a right-sized crowd, in a right-sized office: just enough people, and the right sort of people to afford interesting conversations, and a very generous serving of snacks and beverages from Facebook in an open-plan office that had the look-and-feel of a well-funded startup.

Mukund isn’t posting his slides, and we didn’t take notes, so this isn’t an exhaustive report of the evening, but rather an overview of some aspects of the discussion.

Mukund’s talk was principally about data mining and the ways that Facebook is able to collate and curate vast amounts of data to create metadata about people, places, events and organizations. Facebook uses a variety of signals, the most important of which is user input, to deduce the “nature of things”: i.e. is this thing (entity) a place? Is this place a restaurant? Is this restaurant a vegetarian restaurant?

Some very powerful data mining techniques have been developed already, and this was illustrated most compellingly by a slide showing a satellite image of a stadium: quite close to the stadium, on adjacent blocks, were two place markers. These marked the “official location” of the stadium, as told to Facebook by two separate data providers. The stadium itself was pin-pricked with a large number of dots, each marking a spot where a Facebook user had “checked in” to Facebook’s Places.

The visual contrast was dramatic: the official data providers had each estimated the position of the stadium to within a hundred yards, but the checkins of the Facebook users had perfectly marked the actual contours of the stadium.

Mukund’s talk was repeatedly interrupted by questions from the audience, since each slide offered a gateway to an entire topic, and eventually he ran out of time and we missed some of the material.

During the Q&A, Nikhil George from Mobisante asked a question that seemed highly relevant: was Facebook trying to create a “semantic Web”? Mukund sidestepped the question adroitly by pointing out there is no established definition of the term – and that is certainly true; the Wikipedia article linked above is tagged as having “multiple issues” – but while Facebook may not be using the specific protocols and data formats that some might argue are indispensable to a semantic Web, one could certainly make the case that deducing the nature of things, particularly the nature of things that exist on the Internet, is the main point of creating a semantic Web.

While much of the Q&A was about technical matters, a more fundamental question occupied our own minds: at the outset, Mukund asserted that the more people interact with (and, in particular, contribute to) Facebook, the better the Facebook experience is for them and all of their friends.

This, clearly, is the underpinning of Facebook’s business model: people must continue to believe, on a rational basis, that contributing data to Facebook – so that Facebook can continue to deduce the nature of things and offer these things back to their users to consume – offers some direct, reasonably tangible rewards for them and their friends.

Presumably, then, Facebook must be taking very concrete steps to measure this sense of reward that their users experience; we didn’t hear much about that since it wasn’t Mukund’s area of focus, but it must surely be well understood within Facebook that the promise of continued reward for continued user interaction – which is essentially their brand promise – must be kept at all times. Has a lot of research been done in this area? (There is, of course, the outstanding research done by danah boyd on social networks in general.)

At a more practical level, a question that bedevils us is how we can improve the signal:noise ratio in our Facebook Wall! In our physical worlds, we all have some friends that are exceptionally chatty: some of them are also very witty, which makes their chatter enjoyable, but some are just chatty in a very mundane way. In our Facebook worlds, it is very easy for the chatty people (are they also exceptionally idle or under-employed?) to dominate the conversation.

In a physical world, if we find ourselves cornered by a boor at a party we would quickly, determinedly sidle away and find someone more interesting to talk to, but how does one do that in Facebook? One option, offered by Mukund, would be to turn off their posts, which seems rather like “unfriending” them altogether. But we don’t want to unfriend these people altogether, we just don’t want to hear every detail of every day.

Mukund suggested that by selectively hiding individual posts, as well as “liking” others more aggressively, we could send clearer indications of our preferences to Facebook that would help the system improve the signal:noise ratio, and that’s what we have been trying over the past few days.

It is an intriguing topic to consider, and undoubtedly a difficult problem to solve, because you need to weed out individual messages rather than block entire users. For example, one Facebook friend used an intermission of the movie “Thor” to report that he was enjoying the movie. It’s great that he is enjoying the movie, but this low-value update spawned a low-value thread all of its own. We don’t want this person blocked altogether; we need some way of telling Facebook that updates sent during movie intermissions are not very valuable. If Facebook misinterprets our signals and relegates him to the dustbin, we might miss a more useful notification in the future, such as a major life event or career move.

The problem seems analogous to developing a version of Google’s PageRank, but at the message level. In the example above, if a post sent during a movie intermission is of low value, it would affect the ranking of everyone who piled on to create its low-value thread.

People like us who are more technical would probably prefer a more direct way of manipulating (i.e. correcting) the signal:noise ratio. Presumably someone at Facebook is working, even as we write this blog, on some visual tools that provide sliders or other ways for people to rank their friends and notifications. One idea that comes to mind might be a sort of interactive tag cloud that shows you who posts the most, but which also lets you promote or demote individual friends.

Some email clients and collaboration tools assume that people who email you the most are the ones that matter the most, but with a social network, wouldn’t this bias have the opposite effect? Wouldn’t the most chatty people be the ones who have the least to say that’s worth hearing?

One piece of good news from Mukund is that Facebook is working on a translator: one of our closest friends is a Swede, and his posts are all in Swedish which makes them incomprehensible. Having a built-in translator will certainly make Facebook more useful for people with international networks, although it will be very interesting indeed to see how Facebook’s translation deals with the idiosyncrasies of slang and idiom.

Update: we got it wrong. Facebook isn’t working on a translator; Mukund was referring to a third-party application.

One topic that particularly intrigued us, but which we couldn’t raise with Mukund for lack of time, was Paul Adams’s monumental Slideshare presentation on the importance of groups within social networks. Paul argues that people tend to have between 4-6 groups, each of which tends to have 2-10 people. (This is based upon the research behind Dunbar’s Number, which posits that there is a limit to the number of friends with whom one can form lasting relationships, and that this number is around 150.)

Facebook still doesn’t have groups, which is surprising since Paul Adams decamped from Google to Facebook soon after making his presentation available online. It is a massive presentation, but fascinating material and surprisingly light reading: just fast-forward to slide 64 and the picture there sums up the entire presentation rather well.

Update: we got that wrong, too. Facebook does have a groups product, and it is becoming increasingly popular within the company itself.

All in all, one of the most enjoyable presentations we have attended in recent days. Mukund needs special commendation for his fortitude and confident good humor in standing up before a savvy crowd and braving any and all questions about Facebook’s past, present and future.

Seattle’s “evening tech scene” is really getting interesting these days: perhaps we are seeing the working of a virtuous cycle where meetups and other events start to “up their game”!