Monthly Archives: June 2012

How to tell if a file has been updated

The first version of Kerika was written as a peer-to-peer (p2p) application, so one challenge we faced was detecting when a user changed a file that was being shared as part of a project, so that we could send the latest version to everyone else on the team.

Our first attempt at a solution was to simply examine the Last Modified time for files. However, this proved to be very unreliable for a rather odd reason: whenever you open a spreadsheet using Microsoft Excel, it automatically updates the Last Modified time to the current time – even before you have made any changes.

And when you close Excel without having made any changes, it resets the Last Modified time to its original value. So whenever someone opened an Excel file just to view it, we would erroneously identify it as an updated file.

We then tried looking at the size of files, to see if it had changed since we last examined them. We knew, of course, that this would be error-prone in its own way: if you change some text within a file such that it contains the same number of characters as before, the overall size of that file would not change.
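To make the same-size problem concrete, here is a minimal Python sketch (not Kerika's actual code; the function names, file name and sample text are made up for illustration). It records a file's size, overwrites the file with different text of the same length, and then runs a size-based check:

```python
import os
import tempfile


def size_changed(path, recorded_size):
    """Report a change only if the file's byte count differs from the recorded one."""
    return os.path.getsize(path) != recorded_size


def same_size_edit_is_missed():
    """Overwrite a file with different text of equal length and run the size check."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as f:
            f.write("budget: 1000")
        recorded_size = os.path.getsize(path)

        # Same number of characters, different content.
        with open(path, "w", encoding="utf-8") as f:
            f.write("budget: 9999")

        # True means the size check failed to notice the edit.
        return not size_changed(path, recorded_size)
```

Calling same_size_edit_is_missed() returns True here: the content changed, but the size-based check reports no change.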

But this approach failed for another reason altogether: Microsoft Word allocates disk space a chunk at a time, rather than in exact amounts. This means that any edits to Word files that do not require Word to grab another chunk, or give up a chunk, would never be reflected in the reported size of the file.

Eventually, we decided to take the MD5 hash of files, which is a more reliable way of detecting if a file has been modified. We were concerned about how much CPU overhead this would take, but it proved not to be a problem after all.
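As a rough illustration of the hashing approach, here is a short Python sketch using the standard hashlib module. It is not Kerika's actual implementation: the function names and the chunk size are assumptions chosen for the example. The file is read in chunks so that large files do not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, last_known_digest):
    """Compare the file's current digest with the one recorded at the last check."""
    return md5_of_file(path) != last_known_digest
```

In use, you would store the result of md5_of_file() each time a file is synced, and later call has_changed() with that stored value. Unlike the timestamp and size checks, this depends only on the bytes in the file, so an edit that keeps the length the same is still detected.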

Kerika isn’t written in Indonesian. (And Google Docs isn’t in Vietnamese either…)

An odd problem that we cannot quite figure out: every once in a while Google’s Chrome browser will tell the user that the application is written in Indonesian, and then offer to do a translation.

This makes no obvious sense: all of the code is written in JavaScript and Scalable Vector Graphics (SVG), so why would Chrome consider it to be an Indonesian page?

The problem must lie with Google Chrome itself: we are now noticing that it will sometimes report that Google Docs is written in Vietnamese, as this screenshot shows!

When techies turn libertarian, neither Democrats nor Republicans will win

The Wall Street Journal’s editorial page contains a bemusing article called “Keeping the Spirit of Steve Jobs Alive” that concludes:

Thanks to Washington, the liberal politics of Silicon Valley may now be tilting toward the libertarian right.

The WSJ is, of course, a Republican mouthpiece, which probably explains the writer’s smug assumption that when techies turn libertarian, they will all start voting Republican, which is absurd on two counts:

  • It is based upon what is essentially “non-news” in the first place: many techies have long tilted libertarian in their political views because of the very nature of their profession. Information technology itself is based upon the free flow of information, so respect for the First Amendment runs deep within the tech world. And the business itself is relentlessly meritocratic and brutally disruptive, which supports neither the collective approach of the left nor the deference to authority of the right.
  • Libertarianism may be to the “right” of liberalism, but in its emphasis on personal liberty, it could just as easily be described as being to the “left” of conservatism. In any case, libertarianism forms a third point with respect to classic liberalism and conservatism, so any attempt to map this triangle of political views onto a single left-right spectrum will necessarily be misleading.

And because libertarianism confounds the simple soundbites and clear definition of the “enemy” that both Democrats and Republicans find convenient, neither party would really welcome libertarians in their midst.

If the WSJ had understood the difference between Republicans and libertarians (with a small “l”) in the first place, they would have noticed a long time ago that a lot of techies are social liberals and economic conservatives…