kivikakk.ee

Time zones in .NET

I’m a fairly new .NET developer. I worked with the framework a bit in 2005–6, but hadn’t really touched it since then. In the meantime, I’ve been sticking to the Linux ecosystem, and a little OS X, as mentioned in a previous article.

So, time zones. I know they’re a sore point for many environments, but most seem to dig themselves out of the hole and provide something that is, in the end, usable.

Ruby’s built-in Time is actually pretty darn good, and if you use Rails, ActiveSupport makes it even better. pytz seems … alright. Databases generally have their heads screwed on straight. A lot of the time you can get away with just storing seconds since the epoch and calling it a day, because there’s nothing more intrinsic built into the system.

Then I got my new job, and it was time to get back into .NET. A lot has changed since 2006; it had only hit 2.0 then, mind.

So I felt confident I was using the latest, modern stuff. We target 4.0 and 4.5 across our projects, and there’s plenty nice about it.

Then I had to work with System.DateTime. Oh. Oh, gosh.

I quote the manual at you.

DateTime.Kind Property

Gets a value that indicates whether the time represented by this instance is based on local time, Coordinated Universal Time (UTC), or neither.

Kind is the lone field on a DateTime which has anything to do with time zones. It can take the value Local, Utc, or Unspecified. What does that even MEAN. Note that Kind is ignored in comparisons, too, which can only mean more fun for application developers.
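
To make that concrete, here’s a minimal sketch (plain .NET, nothing beyond the framework assumed): two DateTimes with the same wall-clock value but different Kinds compare as equal, and only converting both to universal time reveals that they name different instants.

using System;

class KindDemo
{
    static void Main()
    {
        // Same wall-clock value, different Kind.
        var local = new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Local);
        var utc   = new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Utc);

        // Kind is ignored by equality and ordering: these are "equal".
        Console.WriteLine(local == utc);          // True
        Console.WriteLine(local.CompareTo(utc));  // 0

        // Normalise both to universal time and they differ, on any machine whose
        // local offset isn't +0.
        Console.WriteLine(local.ToUniversalTime() == utc.ToUniversalTime());  // False at +10
    }
}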


It would be remiss of me to fail to note the paragraph in the docs which states:

An alternative to the DateTime structure for working with date and time values in particular time zones is the DateTimeOffset structure. The DateTimeOffset structure stores date and time information in a private DateTime field and the number of minutes by which that date and time differs from UTC in a private Int16 field. This makes it possible for a DateTimeOffset value to reflect the time in a particular time zone, whereas a DateTime value can unambiguously reflect only UTC and the local time zone’s time. For a discussion about when to use the DateTime structure or the DateTimeOffset structure when working with date and time values, see Choosing Between DateTime, DateTimeOffset, and TimeZoneInfo.

The linked page states that “although the DateTimeOffset type includes most of the functionality of the DateTime type, it is not intended to replace the DateTime type in application development.”

Intention or not, it should ALWAYS be used. The page lists this as a suitable use for DateTimeOffset:

  • Uniquely and unambiguously identify a single point in time.

Because we don’t want that at any other time? When do you want a DateTime which non-specifically and ambiguously identifies several points in time?
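
For contrast, here’s a minimal sketch of DateTimeOffset doing exactly that: two values written with different offsets, naming the same instant, compare as equal with no reference to the machine’s own time zone.

using System;

class OffsetDemo
{
    static void Main()
    {
        // 06:47:03 at +01:00 and 15:47:03 at +10:00 are the same instant.
        var a = new DateTimeOffset(2013, 9, 11, 6, 47, 3, TimeSpan.FromHours(1));
        var b = new DateTimeOffset(2013, 9, 11, 15, 47, 3, TimeSpan.FromHours(10));

        // Comparison is on the underlying UTC instant, so these really are equal.
        Console.WriteLine(a == b);         // True
        Console.WriteLine(a.UtcDateTime);  // 2013-09-11 05:47:03 (formatting depends on culture)
    }
}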

On the other hand, listed as suitable for DateTime:

Retrieve date and time information from sources outside the .NET Framework, such as SQL databases. Typically, these sources store date and time information in a simple format that is compatible with the DateTime structure.

No fucking comment.

It continues:

Unless a particular DateTime value represents UTC, that date and time value is often ambiguous or limited in its portability. For example, if a DateTime value represents the local time, it is portable within that local time zone (that is, if the value is deserialized on another system in the same time zone, that value still unambiguously identifies a single point in time). Outside the local time zone, that DateTime value can have multiple interpretations. If the value’s Kind property is DateTimeKind.Unspecified, it is even less portable: it is now ambiguous within the same time zone and possibly even on the same system on which it was first serialized. Only if a DateTime value represents UTC does that value unambiguously identify a single point in time regardless of the system or time zone in which the value is used.

Completely useless. So, we’ll use DateTimeOffset in our application code, right?

Only the ecosystem hasn’t caught up.


Enter Npgsql, a Postgres driver for .NET with a frightening amount of code. It only works with DateTime objects when sending or receiving timestamps to or from Postgres.

Postgres has two column types: timestamp with time zone and timestamp without time zone (or timestamptz and timestamp, respectively). The latter is about as good as a DateTime, but without trying to be more than it can: it doesn’t have Kind, which improves its usability by an order of magnitude. You can make a policy decision like “we’ll always store UTC timestamps”, and you’ve solved time zones in your application. They mark a specific point in time unambiguously.

Or you can just use timestamptz and they still unambiguously mark a specific point in time. It’s magic!
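
To make that policy concrete, here’s a minimal sketch using the same Dapper-style helpers as the test code further down. The events(occurred_at timestamptz) table is hypothetical, and (as the rest of this post will show) the session time zone needs pinning to UTC for any of this to survive Npgsql.

// Hypothetical table: CREATE TABLE events (occurred_at timestamptz NOT NULL);
_connection.Execute("SET TIME ZONE 'UTC'");

// Normalise to universal time on the way in...
var when = DateTime.Now;  // Kind = Local
_connection.Execute("INSERT INTO events (occurred_at) VALUES (@At)",
                    new { At = when.ToUniversalTime() });

// ...and again on the way out, since timestamptz is read back as a Local-Kind
// DateTime (see the tests below).
var stored = _connection.Query<DateTime>("SELECT occurred_at FROM events")
                        .Single()
                        .ToUniversalTime();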

So how does Npgsql deal with this?

The genesis of this post was some strange behaviour we’d noticed: we had read a timestamptz out of the database, and then later SELECTed all rows where that column was strictly less than the value we read out. And yet that same row would be included in the result. It made no sense.

Turns out it really did make no sense.

The rest of this post is a sequence of test cases which demonstrate just how bad the situation is.

[SetUp]
public void SetUp()
{
    _connection = new NpgsqlConnection(
        "Host=localhost; Port=5432; Database=time_zones_in_dot_net; " +
        "User ID=time_zones_in_dot_net");
    _connection.Open();
}

[TearDown]
public void TearDown()
{
    _connection.Close();
}

[Test]
public void TimeZonesSane()
{
    // introduction
    // ------------

    // This test assumes the *local* machine (running NUnit) has local time of +10.
    // It's agnostic to the time zone setting on the database server.
    // In other words, Postgres +1, .NET -100000000.

    // Render UTC (+0), Postgres (+3) and .NET (+10) distinguishable.
    _connection.Execute("SET TIME ZONE '+3'");

    // In the tests below we assert that each query yields a .NET DateTime object
    // which, when .ToUniversalTime() is called on it, produces the given date in
    // "MM/dd/yyyy HH:mm:ss" format.

    // After that is the non-.ToUniversalTime() date in parentheses.  This is
    // *always* 10 hours ahead for a Local or Unspecified, and the same for Utc.
    // DateTime objects have no knowledge of offset, only Kind.

    // There's also a character appended to represent what time zone "kind" it came
    // back with; "L" for Local, "?" for Unspecified, "U" for Utc.

    // As noted below, ToUniversalTime() on a Local or Unspecified returns a new
    // DateTime with Kind set to Utc, and unilaterally subtracts the time zone
    // offset of the machine the code is running on.


    // tests using string literals
    // ---------------------------

    // Not useful in themselves, because we'll never use string literals, but help to
    // demonstrate some initial weirdness.

    // string timestamp with time zone, no offset given: assumed to be in database
    // local time.  Returns with Local Kind.
    QueryEqual("09/11/2013 03:47:03 (09/11/2013 13:47:03) L",
               "SELECT '2013-09-11 06:47:03'::timestamp with time zone");

    // string timestamp with time zone: should come back with the correct universal
    // value, with Local Kind.
    QueryEqual("09/11/2013 05:47:03 (09/11/2013 15:47:03) L",
               "SELECT '2013-09-11 06:47:03+1'::timestamp with time zone");


    // string timestamp without time zone: comes back 'unspecified' with the exact
    // datetime specified.  ToUniversalTime() assumes unspecified = local, so -10.
    // Returns with Unspecified Kind.
    QueryEqual("09/10/2013 20:47:03 (09/11/2013 06:47:03) ?",
               "SELECT '2013-09-11 06:47:03'::timestamp without time zone");

    // string timestamp with time zone, coerced to not have a time zone: as if the
    // time zone wasn't in the string.  Returns with Unspecified Kind.
    QueryEqual("09/10/2013 20:47:03 (09/11/2013 06:47:03) ?",
               "SELECT '2013-09-11 06:47:03+1'::timestamp without time zone");


    // tests using .NET values as parameters
    // -------------------------------------

    // These represent what we'll usually do.  They're also really messed up.

    // unadorned parameter: regardless of the DateTimeKind, the date is treated as
    // without time zone; the exact date given comes back, but with Unspecified Kind,
    // and so is as good as forced to local time.
    DateTimesEqual("09/10/2013 20:47:03 (09/11/2013 06:47:03) ?",
                   "SELECT @DateTime",
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Local),
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Unspecified),
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Utc));

    // parameter specified as with time zone: regardless of the DateTimeKind, the
    // date is treated as in the database local time.  It comes back with Local Kind.
    DateTimesEqual("09/11/2013 03:47:03 (09/11/2013 13:47:03) L",
                   "SELECT (@DateTime)::timestamp with time zone",
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Local),
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Unspecified),
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Utc));

    // parameter specified as without time zone: as for unadorned parameter.
    DateTimesEqual("09/10/2013 20:47:03 (09/11/2013 06:47:03) ?",
                   "SELECT (@DateTime)::timestamp without time zone",
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Local),
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Unspecified),
                   new DateTime(2013, 9, 11, 6, 47, 3, DateTimeKind.Utc));


    // discussion
    // -----------

    // DateTime parameters' kinds are ignored completely, as shown above, and are
    // rendered into SQL as a 'timestamp' (== 'timestamp without time zone').  When a
    // comparison between 'timestamp with time zone' and 'timestamp without time zone'
    // occurs, the one without is treated as being in database local time.

    // Accordingly, you should set your database time zone to UTC, to prevent
    // arbitrary adjustment of incoming DateTime objects.

    // The next thing to ensure is that all your DateTime objects are in universal
    // time when going to the database; their Kind will be ignored by Npgsql.  If
    // you send a local time, the local time will be treated as the universal one.

    // Note that, per the second group just above, 'timestamp with time zone' comes
    // back as a DateTime with Local Kind.  If you throw that right back into Npgsql,
    // as above, the Kind will be summarily ignored and the *local* rendering of that
    // time treated as UTC.  Ouch.


    // conclusions
    // -----------

    // 'timestamp with time zone' is read as a DateTime with Local Kind.  Note that
    // the actual value is correct, but it's invariably transposed to local time
    // (i.e. +10) with Local Kind, regardless of the stored time zone.  Calling
    // .ToUniversalTime() yields the correct DateTime in UTC.

    // DateTime's Kind property is ignored.  To work around, set the database or
    // session time zone to UTC and always call ToUniversalTime() on DateTime
    // parameters.

    // Don't use 'timestamp without time zone' in your schema.
}

private void DateTimesEqual(string expectedUtc, string query,
                            params DateTime[] dateTimes)
{
    foreach (var dateTime in dateTimes) {
        var cursor = _connection.Query<DateTime>(query, new {DateTime = dateTime});
        DatesEqual(expectedUtc, cursor.Single());
    }
}

private void QueryEqual(string expectedUtc, string query)
{
    DatesEqual(expectedUtc, _connection.Query<DateTime>(query).Single());
}

private static void DatesEqual(string expectedUtc, DateTime date)
{
    var code = "_";
    switch (date.Kind) {
        case DateTimeKind.Local:
            code = "L";
            break;
        case DateTimeKind.Unspecified:
            code = "?";
            break;
        case DateTimeKind.Utc:
            code = "U";
            break;
    }

    var uni = date.ToUniversalTime();
    Assert.AreEqual(expectedUtc,
                    string.Format("{0} ({1}) {2}",
                                  uni.ToString(CultureInfo.InvariantCulture),
                                  date.ToString(CultureInfo.InvariantCulture),
                                  code));
}

How and Why We Switched from Erlang to Python tells an intern’s tale, from whose perspective it runs like this: we used Erlang, “No one on our team is an Erlang expert” (despite “how crucial this service is to our product”!), and also would you please suspend brain activity while I make some performance claims.

Hold your horses. The good decision to rewrite critical services in a language they actually know aside, let’s look at their notes on perf:

Another thing to think about is the JSON library to use. Erlang is historically bad at string processing, and it turns out that string processing is very frequently the limiting factor in networked systems because you have to serialize data every time you want to transfer it. There’s not a lot of documentation online about mochijson’s performance, but switching to Python I knew that simplejson is written in C, and performs roughly 10x better than the default json library.


Let’s distill these claims:

  • Erlang is historically bad at string processing
  • string processing is very frequently the limiting factor in networked systems
  • simplejson is written in C
  • simplejson performs 10x better than the default json library

Further down:

I went into Mixpanel thinking Erlang was a really cool and fast language but after spending a significant amount of time … I understand how important code clarity and maintainability is.

Thus by implication?

  • Erlang is not a really cool and fast language
  • Erlang is somehow not conducive to code clarity and maintainability

This whole paragraph is just a mess, and I can’t excerpt it without losing some of its slimy essence.

Again, in full:

I’ve learned a lot about how to scale a real service in the couple of weeks I’ve been here. I went into Mixpanel thinking Erlang was a really cool and fast language, but after spending a significant amount of time sorting through a real implementation, I understand how important code clarity and maintainability is. Scalability is as much about being able to think through your code as it is about systems-level optimizations.

By your own admission, no-one on your team is an Erlang expert; you “have trouble debugging downtime and performance problems”. Plenty of Erlang users don’t, which suggests the problem with your team’s command of the environment is severe. Similarly, you earlier mention the “right way” to do something in Erlang, and immediately comment that your code didn’t do that at all – never mind that the “right way” mentioned was wrong.

Yikes.

So why does the word “Erlang” feature in the above-quoted paragraph at all?

There’s no reason to expect either code clarity or maintainability of a service developed over 2 years without an engineer skilled in the environment overseeing the architecture.

I didn’t say Erlang in that sentence, and yet it has greater explanatory power than the intern’s claim for the same phenomenon.

I suspect their explanation is more controversial, however; it’s easier to make these claims than to arrive at the conclusion that the team’s own shortcomings were responsible for the technical debt accrued – and it makes for a better article. I choose my explanation:

  • Erlang is somehow not conducive to code clarity and maintainability: there is not even anecdotal support in the article for this claim

That leaves 5 claims.

Let’s note an important confounding factor: the article is from August 2011. The state of Python and Erlang, and of the libraries for both, has changed since.


As an aside: it’s easy to think that the performance claims they do indirectly make are incidental (and not essential) to the article.

But remove them, and note there’s not really an article any more: a prologue about mapping some concepts from an old codebase to a new one, and … an epilogue espousing the virtues of code clarity and maintainability.

Ain’t nobody gonna argue with that, but, as noted above, just that alone does not a “How and Why We Switched from Erlang to Python” blog post make.


Let’s now dig into it – this won’t be much of an article without benchmarks either. Unlike their benchmarks, I’m actually comparing things in order to contrast; their decision to give benchmarks on the new system but not on the old is baffling at best.

I compared 4 Python JSON implementations and 3 Erlang ones:

  • json (built-in in Python 2.7.4)
  • simplejson 3.3.0 from PyPI, both with and without C extensions
  • ujson 1.30 from PyPI
  • mochijson and mochijson2 from mochiweb
  • jiffy

simplejson is what the intern picked for their rewrite. mochijson is what they were using before.

All versions are current at time of writing.

Testing method:

  • read 5.9MB of JSON from disk into memory
  • start the benchmark timer
  • parse the JSON from memory 10 times, each time doing some minimal verification that the parse was successful
  • force a garbage collection
  • stop the benchmark timer

The code is available on my GitHub account, and instructions to reproduce are found there.

Here are the results:

  • ujson: 1,160ms
  • jiffy: 1,271ms
  • simplejson (with C): 1,561ms
  • json: 2,378ms
  • mochijson2: 8,692ms
  • mochijson: 11,111ms
  • simplejson (no C): 16,805ms

ujson wins! jiffy a close second! simplejson a close third! These results are the average of three runs each, but I did many more runs in testing the benchmark code and can say the variance was quite low.

So:

  • simplejson performs 10x better than the default json library: this doesn't appear to be the case now. It may have been the case in 2011, depending on what the default json library was back then.
  • Erlang is not a really cool and fast language: in this particular example the best Erlang library is on par with both of the best Python libraries – all three C-boosted, of course – and the best pure Erlang library runs in half the time of the apparently-best pure Python one. (json is C-boosted, so doesn’t count as pure Python.)

That leaves us with these claims unrefuted:

  • Erlang is historically bad at string processing
  • string processing is very frequently the limiting factor in networked systems
  • simplejson is written in C

Erlang’s historical performance is somewhat irrelevant, but the claim stands nevertheless.

No evidence was advanced for the second claim: there was no way to determine whether faster string processing was responsible for any improvement in their benchmarks; we don’t even know whether the benchmarks improved, because we were only given one set (!). Of course, with the changes spanning the entire system and not just string processing, before-and-afters would prove nothing anyway, especially given the proficiency gap. Hence:

  • string processing is very frequently the limiting factor in networked systems: maybe, maybe not, but picking the right library makes a big difference!

I mean, jeez; they could reduce their string processing time (and thus the “limiting factor”?) by about a quarter if they switched from simplejson to ujson!

As for the third claim, if I don’t nitpick, it stands. Kinda.


Why did I feel the need to write this up?

I saw the article pop up on Hacker News today, 2 years after its publication. In fact, I’d seen the article not long after it was originally published, and I know it was on HN back then too. I don’t care about the fact that it was reposted; more that it was written at all.

It’s exactly the sort of useless bullshit that seems to fill a quarter of HN’s front page at any given stage: articles with titles like “Why I Don’t Use Vim”; “Why You Should Learn Vim”; “The Reason You Need To Understand System F If You Want To Ever Amount To Anything”; “Stop Thinking. Just Write.”; “How We Saved 95% Of Our Datacentre Costs By Rewriting Everything In Ada”; etc. etc. etc.

It’s edgy shit that grabs your attention and way oversells one point of view, at the expense of reason. This comment thread started out by acknowledging this trend, but was ruined by someone not catching the pattern being invoked. Usually there’s a nice novel point to be made somewhere if you can read the moderate subtext between the bold claims.

Unfortunately this article had no such point, and so turned out to be 100% garbage. But still, some people will read that article and go away thinking their prejudices about Erlang have been confirmed by someone who’s been in battle and seen it for themselves.

And really, this isn’t about Erlang. I don’t care what language or environment you use. Your coding philosophies are irrelevant, if you deliver – and Mixpanel are delivering, and clearly made the right choice to get rid of some technical debt there.

But don’t then try to pass off any part of the responsibility for that debt, or for the decision to pay it off, as your tools’ fault – and especially not with such flawed logic.

k6_bytea

I’ve started a project in Erlang recently, and I needed a big block of mutable memory – a massive byte array, buffer, whatever you want to call it. Erlang’s binary strings are immutable and so probably not suitable.

I figured the core distribution must’ve had something like this … nope. I spent 30 minutes Googling and checking the docs twice and thrice over, but there’s clearly no mutable byte array in Erlang itself.

Is there a hack that approximates this? Search … this StackOverflow answer almost seems hopeful at the end, referencing hipe_bifs:bytearray/2 and hipe_bifs:bytearray_update/3, but alas, they are so named because they are part of the HiPE native compiler, not Erlang itself. (I’m not currently using HiPE, and it would be nice not to be tied to it.)

Then, I thought, surely someone else has extended Erlang to do this. There are several modes of extending Erlang, but native implemented functions (NIFs) would be the obvious candidate for manipulating large chunks of memory.

Googling this yielded complete failure: people just don’t seem to be doing it. (If people are doing it and you know this, please, let me know.)

So I did it: k6_bytea.

The interface is simple and fairly pleasant:

1> Bytea = k6_bytea:from_binary(<<"Hello, world.">>).
<<>>
2> k6_bytea:set(Bytea, 7, <<"Thomas">>).
ok
3> k6_bytea:to_binary(Bytea).
<<"Hello, Thomas">>
4> Gigabyte = k6_bytea:new(1000000000).
<<>>
5> k6_bytea:delete(Gigabyte).
ok
6>

The gigabyte allocation caused a small notch on my memory performance graph:

Screenshot of a GNOME desktop menubar, with a memory performance widget showing a very abrupt increase, then decrease, in memory allocated.

perf

The obvious question remains: how does it perform, vis-à-vis binary strings?

Let’s make a contrived benchmark: initialise a large buffer, write all over it in a deterministic fashion, and copy data all over it. Let’s do it in parallel, too.

Here’s the benchmarking code for your perusal:

-module(k6_bytea_bench).
 
-export([run/0, binary_strings/1, k6_bytea/1]).
 
-define(COUNT, 1024).   % Parallel operations.
-define(SIZE, 102400).  % Size of buffer to work on.
 
run() ->
    measure(<<"Binary strings">>, ?MODULE, binary_strings),
    measure(<<"k6_bytea">>, ?MODULE, k6_bytea).
 
measure(Name, M, F) ->
    {SM, SS, SI} = erlang:now(),
    spawn_many(?COUNT, M, F, [self()]),
    receive_many(?COUNT, done),
    {EM, ES, EI} = erlang:now(),
    Elapsed = ((EM - SM) * 1000000 + (ES - SS)) * 1000 + ((EI - SI) div 1000),
    io:format("~s [~p ms]~n", [Name, Elapsed]),
    ok.
 
spawn_many(0, _, _, _) -> ok;
spawn_many(N, M, F, A) ->
    spawn(M, F, A),
    spawn_many(N - 1, M, F, A).
 
receive_many(0, _) -> ok;
receive_many(N, M) -> receive M -> receive_many(N - 1, M) end.
 
binary_strings(Done) ->
    B0 = <<0:(?SIZE*8)>>,
    B1 = binary_strings_set_bits(B0, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    _ = binary_strings_copy_bits(B1, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    Done ! done.
 
binary_strings_set_bits(B, []) -> B;
binary_strings_set_bits(B, [H|T]) ->
    <<LHS:H/binary, _:1/binary, RHS/binary>> = B,
    binary_strings_set_bits(<<LHS/binary, (H rem 255), RHS/binary>>, T).
 
binary_strings_copy_bits(B, []) -> B;
binary_strings_copy_bits(B, [H|T]) ->
    <<LHS:H/binary, Bit:1/binary, _:1/binary, RHS/binary>> = B,
    binary_strings_copy_bits(<<LHS/binary, Bit/binary, Bit/binary, RHS/binary>>, T).
 
k6_bytea(Done) ->
    B = k6_bytea:new(?SIZE),
    k6_bytea_set_bits(B, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    k6_bytea_copy_bits(B, lists:seq(0, ?SIZE - 1024, ?SIZE div 1024)),
    k6_bytea:delete(B),
    Done ! done.
 
k6_bytea_set_bits(B, []) -> B;
k6_bytea_set_bits(B, [H|T]) ->
    k6_bytea:set(B, H, <<(H rem 255)>>),
    k6_bytea_set_bits(B, T).
 
k6_bytea_copy_bits(B, []) -> B;
k6_bytea_copy_bits(B, [H|T]) ->
    Bit = k6_bytea:get(B, H, 1),
    k6_bytea:set(B, H + 1, Bit),
    k6_bytea_copy_bits(B, T).

Over 3 runs, binary_strings averaged 24,015ms, and k6_bytea 198ms (0.83% time, or 121x speed).

There’s nothing very surprising about this; it’s large, unwieldy immutable data structures vs. one mutable data structure. I’ll also admit I have no idea whether there are better ways to simulate a byte array in Erlang, with binary strings or without!

The binary-string-manipulating code above is ugly and error-prone, as this is clearly not the purpose binaries were built for. If it should turn out that this really hasn’t been done better by someone else, then I encourage you to look to k6_bytea and improve it for this purpose.

Lately I’ve been game programming. A few things have been coming to mind:

  • The Sapir–Whorf hypothesis is totally true when it comes to programming languages.
  • Dynamically typed languages encourage a whole lot of waste.
  • There’s some sweet spot of expressivity, low-level control and non-shoot-yourself-in-the-footness that’s missing from the languages we have.

I will attempt to expound.

The languages I end up using on a day-to-day basis tend to be higher level. Non-scientific demonstration by way of my GitHub account’s contents at time of writing:

15 Ruby
9 OCaml
6 Go
6 Javascript
4 C
3 C++
3 Perl
2 Clojure
2 Haskell
...

In my professional career, I’ve concentrated on JavaScript, PHP, Erlang, Python and C#. The lowest level of these is, by far, Erlang. Perhaps it’s fairer to say that Erlang keeps me in check, perf-wise, more than any of the others.

So I’ve been in a fairly high-level sort of mindset. I’ve made occasional trips to lower-level code, but there hasn’t been much of that lately, particularly as I’ve changed jobs and needed to concentrate solely on work stuff for a while.

Choosing C++ to write a game wasn’t too hard; it’s fairly standard in the industry, and I know it quite well. Bindings to libraries are plentiful, and getting the same codebase compiling on Windows, OS X and Linux is a well-understood problem that’s known to be solvable.

The thing is, C++ makes it abundantly clear when you’re doing something costly. This strikes me particularly now, as I’ve not done this lower-level stuff in a while.

You wrote the copy constructor yourself, so you know exactly how expensive pushing a bunch of objects into a vector is. You chose a vector, and not a list, so you know exactly why you don’t want to call push_front so many times. You’re creating an ostringstream to turn all this junk into a string: it has a cost. Are we putting this on the stack or on the heap? Shall we use reference counting?

You make hundreds of tiny decisions all the time you’re using it; ones which are usually being abstracted away from you in higher level languages. It’s why they’re higher level.

And that’s basically all I have to say on that: the language makes you feel the cost of what you choose to do. Going to use a pointer? Better be sure the object doesn’t get moved around. Maybe I’ll just store an ID to that thing and do lookups in a map. How costly is the hashing function on the map key? You add such a lookup table in Ruby without a moment’s thought; here, you’re forced to consider your decision for a moment. Every time you access the data, you’re reminded as well; it’s not something that ever goes out of mind.

Moreover, the ahead-of-time compilation means you can’t do costly runtime checks or casts unless you really want to (e.g. dynamic_cast), but again, the cost of doing so means you’ll never be caught unaware by slowing performance. In many (dynamic) higher level languages, basically every operation is laced with these.

So it’s well suited to games programming: performance is usually pretty important, and a language that keeps performance on your mind makes it not hard to achieve consistently high performance.

But C++’s deficiencies are also well-documented. It’s awful. It’s waiting to trip you up at every turn. After re-reading those talk slides, I figured I’d just port the code to C – until I remembered how much I used std::string, std::vector, std::list, and moreover, enjoyed the type- and memory-safety they all bring. I’m not particularly fond of giving that up and implementing a bunch of containers myself, or using generic containers and throwing away my type checks.

I think I’m after C with templates for structs (and associated functions), but I’m not sure yet. If you think I want C++, you probably need to re-read those notes.

The other solution is to only use as much of C++ as I like, and that’s basically what I do – but the language is still waiting to trip me up, no matter how much I try not to use it.

Time to think a bit about what the issues at hand really are.

I use Snapchat. It’s an app where you can take a photo or short (< 10 second) video and send it to your friends who use the service; they’ll then be able to see it, once, before it disappears forever.

Ostensibly, the app is for sexting, because there’s no fear that your photo will get spread around (no forwarding/etc.) or retained for longer than you’d like, but it seems like it’s not as much a sexter’s hangout as the media might want you to think.

My circle of friends use it basically as an extension of weird Twitter – most snaps I send and receive are strange angles of weird objects; the completely mundane but somehow therapeutic (7 seconds of the camera pointed outside the window of a tram, pointed at the ground moving below); or just closeups of Curtis Stone’s face, wherever we see him.

Of course, the promise that they won’t get retained is just that: a promise. Since your phone receives this image and shows it to you at some point, it must be downloaded by your phone. If it can be downloaded by the phone, it can be downloaded by something else. We decided to find out how.


Here’s Contrigraph, a “data visualisation” (?) created by generating commits to match the Contribution Graph Shirt from GitHub.

It was pretty much a hack; first, I read the colours straight off the shirt, producing a big block of data like

0002342
2223322
2323241
2224333
3322122
2242231
...

Then we read that in one digit at a time, work out what day to start on so everything aligns correctly, and how many commits on a given day produce each gradient of colour. The result isn’t pretty:

start = Time.new 2012, 4, 23, 12, 0, 0, 0

tbl = {"0" => 0, "1" => 0, "2" => 1, "3" => 9, "4" => 14, "5" => 23}

3.times do 
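  # dbd is assumed to be the digits from the block above, read into a flat
  # array of one-character strings; its definition isn't shown here.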
  dbd.each do |n|
    tbl[n].times do
      `echo 1 >> graafik`
      `git commit -a -m 'contrigraph' --date='#{start.to_s[0..9]}T12:00:00'`
    end
    start += 86400
  end
end

Three times so this thing will scroll on for the next few years. Other values for tbl would work too; I just didn’t bother to do anything better. I’ve written clearer code, but that wasn’t really the point either.

I actually screwed this up twice: first I didn’t remember to treat the 0 entries correctly (i.e. I should have skipped those days, whereas I ignored them entirely); second, it seemed like I was getting hit by timezone issues where everything was shifted up a block.

In retrospect, I should have first produced a mini contributions-graph generator (i.e. one that takes a Git repository and produces what GitHub would), validated that against an existing user/repo, and then used it to make sure the real thing would work the first time. I did a similar thing to ensure I had the data correct, by producing a graph directly from the data.

As programmers, we spend a lot of time just carting data from one place to another. Sometimes that’s the entire purpose of a program or library (data conversion whatevers), but more often it’s just something that needs to happen in the course of getting a certain task done. When we’re sending a request, using a library, executing templates or whatever, it’s important to be 100% clear on the format of the data, which is a fancy way of saying how the data is encoded.

Let’s do the tacky dictionary thing:

encoding (plural encodings)

  1. (computing) The way in which symbols are mapped onto bytes, e.g. in the rendering of a particular font, or in the mapping from keyboard input into visual text.

  2. A conversion of plain text into a code or cypher form (for decoding by the recipient).

I think these senses are a bit too specific – if your data is in a computer in any form, then it’s already encoded. The keyboard doesn’t even have to come into it.
