What is ZRTP?

ZRTP is a cryptographic key-agreement protocol to negotiate the keys for encryption between two end points in a Voice over Internet Protocol (VoIP) phone telephony call based on the Real-time Transport Protocol. It uses Diffie–Hellman key exchange and the Secure Real-time Transport Protocol (SRTP) for encryption. ZRTP stands for “Zimmermann Real-time Transport Protocol” and was developed by Silent Circle’s own, Phil Zimmermann.

Why do we need secure VoIP?

As VoIP grows into a replacement for the PSTN, we will absolutely need to protect it, or organized crime will be attacking it as intensively as they attack the rest of the Internet today. VoIP is far more vulnerable to interception than the PSTN. A PC on your office network can unknowingly host spyware that can intercept your corporate VoIP calls and store and organize them on a hard disk for convenient browsing by criminals half a world away, giving them trade secrets and insider trading opportunities.

The Internet is not a safe medium to carry our phone calls. But Silent Circle solves these problems. This technology has social benefits. It has the power to change our lives, enabling us to have a private conversation any time we want with anyone, anywhere – without buying a plane ticket.

Why is the ZRTP protocol better?

The ZRTP protocol has some nice cryptographic features lacking in many other approaches to VoIP encryption. Although it uses a public key algorithm, it avoids the complexity of a public key infrastructure (PKI). In fact, it does not use persistent public keys at all. It uses ephemeral Diffie-Hellman with hash commitment, and allows the detection of man-in-the-middle (MiTM) attacks by displaying a short authentication string for the users to verbally compare over the phone. It has perfect forward secrecy, meaning the keys are destroyed at the end of the call, which precludes retroactively compromising the call by future disclosures of key material. But even if the users are too lazy to bother with short authentication strings, we still get fairly decent authentication against a MiTM attack, based on a form of key continuity. It does this by caching some key material to use in the next call, to be mixed in with the next call’s DH shared secret,giving it key continuity properties analogous to certificate authorities, All this is done without reliance on a PKI, key certification, trust models, or key management complexity that bedevils the email encryption world. It also does not rely on SIP signaling for the key management, and in fact does not rely on any servers at all. It performs its key agreements and key management in a purely peer-to-peer manner over the RTP packet stream. And it supports opportunistic encryption by auto-sensing if the other VoIP client supports ZRTP. ZRTP doesn’t need a PKI, and there are good reasons why it’s a mistake to require a PKI for secure VoIP. Plus, there have been a growing number of spectacular security failures of traditional PKIs. Nonetheless, ZRTP can use a PKI if you already have one up and running.

Is there an IETF RFC for the ZRTP protocol?

Yes, the IETF has published RFC 6189 that defines the ZRTP protocol to set up the cryptographic key agreement.

Also note that the XMPP Standards Foundation has a protocol extension for “Use of ZRTP in Jingle RTP Sessions”, XEP-0262.

Will the government attempt to stop VoIP encryption?

It’s a fair question to ask in a post-9/11 world. Just how likely would it be for the government to restrict the end user’s use of secure VoIP? The question of whether strong cryptography should be restricted by the government was debated all through the 1990s. This debate had the participation of the White House, the NSA, the FBI, the courts, the Congress, the computer industry, civilian academia, and the press. This debate fully took into account the question of terrorists using strong crypto, and in fact that was one of the core issues of the debate. Nonetheless, society’s collective decision (over the FBI’s objections) was that on the whole, we would be better off with strong crypto, unencumbered with government back doors. The export controls were lifted and no domestic controls were imposed. This was a good decision, because we took the time and had such broad expert participation. The 9/11 attacks did not change the wisdom of that collective decision, and although civil liberties on the whole have eroded since then, we haven’t lost our right to use strong crypto.

In the early 1990s, the government tried to control the end user’s use of crypto by introducing the Clipper chip. That didn’t go over too well politically, and had to be abandoned. The government will find it difficult to try again to stop end users from encrypting their traffic, regardless of whether that traffic is email, e-commerce web transactions, or VoIP calls.

Further, the government would have to force everyone to abandon peer-to-peer communication protocols in favor of centralized, old Eastern-Bloc-style, panoptic ways of doing things. That’s not the direction technology has been heading. Rather than a “war on terrorism”, the government would have to conduct a war on technology.

Can ZRTP use a PKI (public key infrastructure)?

Yes. The ZRTP protocol does have the optional capability to use a PKI if you already have a PKI up and running. But ZRTP does not actually require a PKI. It’s a mistake to make a secure VoIP protocol require a PKI. There are major problems and complexities with building, maintaining, and relying on PKI.

First let’s review some good reasons why it’s a mistake to make a secure VoIP protocol require a PKI. There are major problems and complexities with building, maintaining, and relying on PKI. That’s why, in the 1990s, a number of companies died trying to build and market PKI technology. See Ellison and Schneier’s paper Ten Risks of PKI: What You’re Not Being Told About Public Key Infrastructure and Ellison’s paper Improvements on Conventional PKI Wisdom. In the email encryption world, a PKI architecture was the kiss of death for PEM and MOSS, both of which were swept aside by PGP. This also led to S/MIME never reaching critical mass, despite its advantage of being bundled in Microsoft’s products. PGP became the dominant email encryption standard because it did not depend on a centrally managed PKI.

Plus, there have been a growing number of spectacular security failures of traditional PKIs, notably, the Comodo and DigiNotar debacles, and the stolen certificate-signing keys that enabled the Stuxnet worm attack.

Nonetheless, if you feel you must use a PKI and already have one, here’s how ZRTP can make use of it.

The ZRTP spec (RFC 6189) describes how ZRTP can use a PKI-backed digital signature key to sign the short authentication string (SAS) in the ZRTP CONFIRM packet, to reduce reliance on users verbally comparing them during the call. Organizations that feel comfortable with PKIs can still get what they want. Thus, ZRTP offers all of the advantages of a protocol that can use a PKI, without actually becoming dependent on a PKI for security.

There is another way for a ZRTP implementation to benefit from a PKI, without becoming dependent on one. The IETF plans to someday add integrity protection to the delivery of SIP information, and that integrity protection will rely on a PKI. If this ever happens, ZRTP has protocol features that can leverage an integrity-protected SIP layer to provide integrity protection for ZRTP’s Diffie-Hellman exchange in the media layer. Which thus confers protection against a man-in-the-middle (MiTM) attack, without requiring the users to verbally compare the SAS.

Exactly how does Silent Phone and ZRTP protect against a man-in-the-middle (MiTM) attack?

Two human beings verbally compare the Short Authentication String, drawing the human brain directly into the protocol. And this is a Good Thing.

The Diffie-Hellman key exchange by itself does not provide protection against man-in-the-middle (MiTM) attacks. To authenticate the key exchange, ZRTP uses a Short Authentication String (SAS), which is essentially a cryptographic hash of the two Diffie-Hellman values. The SAS value is rendered to both ZRTP endpoints. To carry out authentication, this SAS value is read aloud to the communication partner over the voice connection. If the values on both ends do not match, it indicates the presence of a man-in-middle attack. If they do match, there is a high probability that no man-in-the-middle is present. The use of hash commitment in the DH exchange constrains the attacker to only one guess to generate the correct SAS in his attack, which means the SAS can be quite short. A 16-bit SAS, for example, provides the attacker only one chance out of 65536 of not being detected.

ZRTP provides a second layer of authentication against a MiTM attack, based on a form of key continuity. It does this by caching some hashed shared key material to use in the next call, to be mixed in with the next call’s DH shared secret, giving it key continuity properties analogous to SSH. If the MiTM is not present in the first call, he is locked out of subsequent calls. Thus, even if the SAS is never used, most MiTM attacks are stopped, because they weren’t present in the first call. ZRTP’s key continuity features have some self-healing properties that are better than the SSH approach.

ZRTP provides yet a third layer of protection against a MiTM attack. The IETF plans to add integrity protection to the delivery of SIP information, and that integrity protection will rely on a PKI. When this eventually deploys, ZRTP can take advantage of this. See RFC 6189 on how ZRTP can leverage an integrity-protected SIP layer to provide integrity protection for ZRTP’s Diffie-Hellman exchange in the media layer. This protects against a MiTM attack, without requiring the users to verbally compare the SAS. However, no VoIP clients yet offer a fully implemented SIP stack that provides end-to-end integrity protection for the delivery of SIP information. Thus, real-world implementations of ZRTP endpoints will continue to depend on SAS authentication for quite some time. Even after there is widespread availability of SIP products that offer integrity protection, many users will still be faced with the fact that the signaling path may be controlled by institutions that do not have the best interests of the end user in mind. In those cases, ZRTP’s built-in SAS authentication will remain the gold standard for the prudent user. That, plus the key continuity features.