Code Encryption

A friend of mine just sent me a link to this discussion on indiegamer.com. He also commented that he shouldn’t read stuff on the internet any longer; the stupidity just makes his head explode. I concur.

But that’s really not the point of this little rant. The point of this little rant is that it’s astonishing how little developers understand encryption, and how it applies to code. I can summarize it quite simply for you: it doesn’t really work[1].

No, I know that won’t do as an explanation. But really, learn a little about encryption before you even think about encrypting code, or else the results won’t be what you had in mind. I can almost guarantee that.

So here’s a crash course, and it’s very, very devoid of details:

  1. There’s asymmetric or public key cryptography. You generate a private key that you keep secret[2] and a matching public key that you can show to everyone[3]. If you combine plaintext[4] and the public key in a special mathematical operation, the result is called cipher text. If you combine the cipher text with the private key in another mathematical operation, you get the plain text again. That way, anyone can encrypt data with your public key and be sure that only you can read it, because you kept your private key well hidden. For two-way communication, each participant needs a public and private key of their own. (There’s a sketch of this right after the list.)
  2. There’s symmetric or shared key encryption; the simplest — if insecure — example would be something like a rotation algorithm: shift all letters in the alphabet by one, so A becomes B, B becomes C, and so on. To recover the plain text, you reverse the operation used to encrypt it, applying the same shared secret, in this case “shift one letter to the right”. (Also sketched below.)
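To make the asymmetric case concrete, here is a minimal sketch. It assumes the Python `cryptography` package; the key size and padding are just illustrative choices, not recommendations.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Generate a key pair: keep private_key hidden, hand public_key to anyone.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Anyone can encrypt with the public key...
cipher_text = public_key.encrypt(b"attack at dawn", oaep)

# ...but only the holder of the private key gets the plain text back.
assert private_key.decrypt(cipher_text, oaep) == b"attack at dawn"
```

And the rotation algorithm from the second item, as a toy. It is deliberately breakable; it exists only to show that encrypting and decrypting apply the same shared secret.

```python
# Toy symmetric cipher: rotate upper-case letters by `shift` positions.
def rot(text: str, shift: int) -> str:
    return "".join(
        chr((ord(c) - ord("A") + shift) % 26 + ord("A")) if c.isupper() else c
        for c in text
    )

secret = 1                              # the shared secret: "shift one right"
cipher_text = rot("HELLO", secret)      # "IFMMP"
plain_text = rot(cipher_text, -secret)  # same secret, reversed: "HELLO"
```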

Symmetric encryption is typically fast, but has a weakness: given some plain text and the matching cipher text, the secret can often be derived. So secrets shouldn’t be re-used very often, and there’s a plethora of techniques to avoid exactly that.

Asymmetric encryption is typically slow, but doesn’t suffer from this issue. Most of the time — when using SSL connections on the internet, for example — you’ll be using a mixture of both: a secret key is generated randomly and exchanged via asymmetric encryption, and once that slow exchange is out of the way, the data is encrypted symmetrically.
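Here is a rough sketch of that hybrid dance. It is not actual SSL/TLS, just the shape of it, and it assumes the Python `cryptography` package, with Fernet standing in for the fast symmetric cipher.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Sender: generate a random session key and wrap it asymmetrically
# (slow, but the key is tiny)...
session_key = Fernet.generate_key()
wrapped_key = public_key.encrypt(session_key, oaep)
# ...then encrypt the bulk data symmetrically (fast).
bulk_cipher_text = Fernet(session_key).encrypt(b"lots of data " * 1000)

# Recipient: unwrap the session key, then decrypt the data.
recovered_key = private_key.decrypt(wrapped_key, oaep)
plain_text = Fernet(recovered_key).decrypt(bulk_cipher_text)
```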

Notice a pattern here?

You cannot decrypt anything without knowing some sort of secret. That means you cannot decrypt code without knowing some sort of secret.

You must decrypt code in order to run it, or your processor will think it’s gibberish and throw up.

Therefore, you must know some sort of secret in order to run encrypted code.

So, if you encrypt your code, you will encrypt it using some secret that is in the possession of whoever is supposed to run it. Which means they could decrypt it, and share the decrypted version around. Or they could decrypt it, and then decompile it. Or whatever else scares you so much that you think encryption is required.
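The dilemma in miniature, as a hypothetical sketch (Fernet from the Python `cryptography` package again, and all the names are made up): to be runnable, the “protected” program has to ship both the cipher text and the key, so whoever has the program has both.

```python
from cryptography.fernet import Fernet

# What ends up on the user's machine: the encrypted code AND the key.
key = Fernet.generate_key()
encrypted_code = Fernet(key).encrypt(b'print("hello from the secret code")')

def run_protected(cipher_text: bytes, shipped_key: bytes) -> None:
    # The program must decrypt itself before the processor can run it...
    source = Fernet(shipped_key).decrypt(cipher_text)
    exec(source)

run_protected(encrypted_code, key)

# ...which means the user can do the same decryption and keep the result.
stolen = Fernet(key).decrypt(encrypted_code)
print(stolen)  # the "protected" code, in the clear
```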

Here’s how hardware manufacturers sometimes try to work around this issue: they embed a secret in some part of the hardware’s memory that should only be accessible to a small, well-validated part of the operating system code. Any additional secrets are then encrypted with this embedded secret to — hopefully — achieve the effect that only that small, well-validated part of the operating system can decrypt them, and therefore decrypt your code, and therefore run your code.
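A sketch of that key hierarchy, with the obvious caveat: the “embedded hardware secret” below is an ordinary Python variable, which is exactly why this is an illustration and not a design. Fernet from the `cryptography` package once more; the names are made up.

```python
from cryptography.fernet import Fernet

device_key = Fernet.generate_key()  # pretend this is burned into the chip

# Distribution side: wrap each per-title secret under the device key.
title_key = Fernet.generate_key()
wrapped_title_key = Fernet(device_key).encrypt(title_key)

# The small, well-validated OS component (the only code meant to see
# device_key) unwraps the title key and can then decrypt and run the code.
unwrapped_title_key = Fernet(device_key).decrypt(wrapped_title_key)
assert unwrapped_title_key == title_key
```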

That’s working reasonably well… but you must understand that this is inherently insecure. It’s security by obscurity. If you can trick the processor at all into believing that your code is authorized to access this memory area, you’ve cracked the platform. Granted, that’s hard… but it is always possible.

Why is that the case? Well, if you somehow tie your hardware to the exact code that should be authorized to read this piece of secret memory, then you’ve lost the ability to fix crucial security issues in that part of code, because you’ve lost the ability to change it. And that can be a lot worse than leaving the possibility open that someone could trick the processor.

So… to summarize (again): code encryption is only “secure” if there’s hardware support for it. Even then, it’s possible — though hard — to get at your code. And once someone has gotten at your code that way, it can be freely copied around or analyzed, without you being able to stop it.

Now I don’t think that code encryption is entirely futile. It’ll protect you for a little while from those with a little criminal energy and no knowledge, or those with vast knowledge and no criminal energy. But — unfortunately — there’ll always be people with just enough criminal energy and/or just enough knowledge to get at your stuff, sooner or later.

If you know that — and I mean know that — then it’s a question of what you think a reasonable investment is to make the above unlikely for a while. Invest that much, sure, knock yourself out. You might even gain from it.

Most people I’ve met don’t really know that at all, though…

… and now I’ve ranted enough for today, I think.

  1. For a definition of “work”.
  2. Hence the name “private”.
  3. Hence the name “public”.
  4. The original data, the stuff you want to encrypt. Despite the name, it doesn’t have to be text; it can just as well be code.