Your Video Call Metadata Is Plaintext
Every major video conferencing platform transmits and stores participant names, emails, and user IDs in plaintext on the relay server. The media might be encrypted. The metadata isn't. We fixed it.
What Your Video Platform Knows About Every Call
When you join a video call on any major platform, the relay server—the infrastructure that routes your video and audio packets between participants—knows exactly who's in the room. Not just IP addresses. Names. Email addresses. User IDs. Organizational affiliations. Join times. Leave times. Who spoke. For how long.
This is true for every platform that uses a Selective Forwarding Unit (SFU) architecture, which is all of them at scale. The SFU needs to know where to forward packets, which means it maintains a list of participants and their connection state. That participant list includes identity metadata in plaintext.
End-to-end encryption for media (the actual audio and video streams) is increasingly common. Zoom, Google Meet, Microsoft Teams, and others offer E2EE for the media layer. This means the relay server can't listen to your conversation or watch your video.
But the metadata—who's in the room, when they joined, their display names, their email addresses—is not encrypted. It can't be, under traditional architectures, because the server needs to read it to do its job.
Why Metadata Matters More Than Media
An attacker who compromises a video platform's relay infrastructure gains something far more valuable than recordings of individual calls. They gain a complete social graph.
Who met with whom. How often. When the meetings were scheduled. Which meetings were recurring. Which participants were from which organizations. Which meetings had legal counsel present. Which meetings had board members. Which meetings preceded major announcements, mergers, or regulatory actions.
This is the metadata. It's structured, searchable, and far more actionable than a raw audio stream. Surveillance agencies have stated publicly that metadata is more valuable than content. "We kill people based on metadata," said former NSA and CIA director Michael Hayden.
Every video platform stores this metadata in plaintext because the server needs it to function. Join a room? The server reads your name and email from your JWT to display them to other participants. It reads your user ID to enforce permissions. This data sits in server memory (a DashMap, a Redis instance, or similar structures) in cleartext, accessible to anyone who gains server access.
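A standard JWT is signed, not encrypted. Anyone who holds the token can read its claims, and that includes the relay. The sketch below uses only the Python standard library to show this. The claim names (`sub`, `name`, `email`) and the token itself are illustrative assumptions, and the signature segment is a placeholder because reading the claims never touches it.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    # JWT segments are unpadded base64url (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_claims(token: str) -> dict:
    """Return a JWT's payload claims WITHOUT verifying the signature.

    No key is needed: the payload is only encoded, not encrypted, which is
    why a server that sees the token also sees the identity inside it.
    """
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Hypothetical token with illustrative claims; the signature is a placeholder.
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps(
    {"sub": "user-4821", "name": "Dana Smith", "email": "dana@example.com"}
).encode())
token = f"{header}.{payload}.placeholder-signature"

claims = read_claims(token)
print(claims["name"], claims["email"])  # Dana Smith dana@example.com
```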
The Conventional Wisdom: "The Server Has to Know"
The standard argument is that the relay server must know participant identities to function. It needs to:
- Display participant names to other users in the room
- Route signaling messages to the correct peer
- Enforce room capacity limits
- Apply access control (who's allowed in this room)
- Generate billing records (which tenant, how long)
If the server can't read the participant metadata, how does it do any of this?
The answer requires splitting the problem. The server doesn't need to know who you are. It needs to know that you're authorized and where to send your packets. These are different questions with different answers.
Encrypted Participant Identity
We built a system where participant metadata—display name, user ID, email—is encrypted on the client device before being sent to the relay server. The server stores and forwards the encrypted metadata without ever seeing the plaintext.
The encryption uses AES-256-GCM under a per-room key. Each room has a unique key, which is distributed to authorized participants through the signaling channel, wrapped using Kyber (ML-KEM) post-quantum key encapsulation. The relay server facilitates the key exchange but never possesses the room key.
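The paragraph above leaves open how a Kyber (ML-KEM) shared secret becomes symmetric key material. A common pattern is to run the KEM output through HKDF (RFC 5869), with a context string that binds the result to a specific room and recipient. The sketch below implements HKDF-SHA-256 with the standard library only. The ML-KEM encapsulation and the AES key wrap are out of scope. `derive_wrapping_key` and its `info` layout are hypothetical choices, not the product's documented scheme.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869 section 2.2: PRK = HMAC-Hash(salt, IKM); empty salt -> HashLen zeros.
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # RFC 5869 section 2.3: T(i) = HMAC(PRK, T(i-1) | info | i), i = 1..N.
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length too large for HKDF-SHA-256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


def derive_wrapping_key(kem_shared_secret: bytes, room_id: str, recipient_peer_id: str) -> bytes:
    """Hypothetical: turn a 32-byte ML-KEM shared secret into a 256-bit
    key-wrapping key bound to one room and one recipient."""
    info = b"room-key-wrap/" + room_id.encode() + b"/" + recipient_peer_id.encode()
    return hkdf_sha256(kem_shared_secret, salt=b"", info=info, length=32)


# Stand-in bytes for a decapsulated shared secret (NOT a real ML-KEM output).
k_a = derive_wrapping_key(bytes(32), "room-42", "peer-a")
k_b = derive_wrapping_key(bytes(32), "room-42", "peer-b")
```

Binding the room and recipient into `info` means the same shared secret can never produce the same wrapping key for two different contexts.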
When a participant joins a room:
- The client encrypts their display name, user ID, and email with the room's AES-256-GCM key.
- The encrypted metadata is sent to the server as an opaque blob.
- The server stores the blob alongside the participant's connection state (peer ID, relay address, track subscriptions).
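- When another participant joins, the server forwards the existing participants' encrypted blobs to the newcomer and the newcomer's blob to everyone already in the room. Each participant decrypts them using the shared room key.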
- The server never sees any plaintext identity.
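On the server side, the steps above reduce to a room table. Each participant's encrypted blob is stored next to its routing state, and the blobs are handed out without ever being parsed. This is a minimal sketch. The names `Room`, `Participant`, and `relay_addr` are hypothetical, and the blob stands for opaque bytes produced by client-side AES-256-GCM.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Participant:
    peer_id: str                 # random identifier, carries no identity
    relay_addr: str              # where to forward this peer's media packets
    identity_blob: bytes         # client-encrypted metadata; never parsed here
    subscriptions: set = field(default_factory=set)  # track subscriptions


@dataclass
class Room:
    participants: dict = field(default_factory=dict)  # peer_id -> Participant

    def join(self, relay_addr: str, identity_blob: bytes):
        """Register a peer; return (peer_id, blobs of peers already present).

        The server has no room key, so it can only store and forward the blob.
        Pushing the newcomer's blob to existing peers happens over signaling
        (not shown).
        """
        peer_id = secrets.token_hex(16)
        existing = {pid: p.identity_blob for pid, p in self.participants.items()}
        self.participants[peer_id] = Participant(peer_id, relay_addr, identity_blob)
        return peer_id, existing

    def blobs_for(self, peer_id: str) -> dict:
        # Everything a given peer receives about the others: opaque ciphertext.
        return {pid: p.identity_blob for pid, p in self.participants.items() if pid != peer_id}


room = Room()
alice, seen_by_alice = room.join("198.51.100.7:50000", b"<ciphertext-a>")
bob, seen_by_bob = room.join("203.0.113.9:50001", b"<ciphertext-b>")
```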
The server can still do its job. It routes packets by peer ID (a random identifier with no identity information). It enforces room capacity by counting connections, not by reading names. Access control is verified by checking a cryptographic token at join time—the token proves authorization without revealing identity to the server.
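The article doesn't specify the token format. One way to achieve "authorized but not identified" is a room-scoped capability token: the auth service computes a MAC over the room ID, an expiry, and a random nonce, and the relay checks the tag. The sketch below assumes that design. `issue_token`, `verify_token`, `RelayRoom`, and the token layout are all hypothetical. It also shows capacity being enforced by counting connections.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def issue_token(key: bytes, room_id: str, ttl_s: int = 300,
                now: Optional[float] = None) -> str:
    """Auth-service side (hypothetical): a bearer capability for one room.

    The token names a room and an expiry; it carries no user ID, name, or email.
    """
    now = time.time() if now is None else now
    body = f"{room_id}.{int(now) + ttl_s}.{secrets.token_hex(8)}"
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"


def verify_token(key: bytes, token: str, room_id: str,
                 now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    try:
        # rsplit so dots inside room_id don't break parsing.
        tok_room, expires, nonce, tag = token.rsplit(".", 3)
        expires_at = int(expires)
    except ValueError:
        return False
    expected = hmac.new(key, f"{tok_room}.{expires}.{nonce}".encode(),
                        hashlib.sha256).hexdigest()
    return (hmac.compare_digest(tag.encode(), expected.encode())
            and tok_room == room_id and now < expires_at)


class RelayRoom:
    def __init__(self, room_id: str, key: bytes, capacity: int):
        self.room_id = room_id
        self.capacity = capacity
        self._key = key
        self.connections: set = set()  # random peer IDs only

    def admit(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return a fresh peer ID, or None if the token fails or the room is full."""
        if not verify_token(self._key, token, self.room_id, now):
            return None
        if len(self.connections) >= self.capacity:  # count connections, read no names
            return None
        peer_id = secrets.token_hex(16)
        self.connections.add(peer_id)
        return peer_id
```

Two simplifications are worth flagging. With a shared HMAC key, the relay could also mint tokens; a signature from the auth service would avoid that. And a token here can be reused until it expires unless the relay also tracks nonces.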
The Commitment Problem
There's a subtle trust issue with encrypted metadata. If the server can't read the metadata, how does a verifier confirm that the encrypted blob actually contains what the participant claims? A malicious participant could encrypt a fake name and the server would forward it without question.
This is where the cryptographic commitment comes in. When the participant encrypts their metadata, they also compute a SHA3-256 commitment of the plaintext values. This commitment is a 32-byte hash that binds to the exact identity without revealing it. The commitment is signed with a post-quantum attestation covering three algorithm families.
The verification chain is:
- Encrypted metadata arrives with a commitment hash.
- The commitment is bound to a post-quantum attestation (74 bytes).
- The attestation proves that three independent signature families vouched for this specific identity at this specific time.
- When a peer decrypts the metadata, they recompute the commitment from the plaintext and verify it matches the attested commitment.
- If it matches: the identity is genuine and post-quantum attested.
- If it doesn't match: the metadata was tampered with.
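The server never sees the plaintext. The commitment is a hash, so the server can't reverse it, provided the hash mixes in a random salt that travels inside the encrypted blob; a bare hash of a name or email could be confirmed by hashing guessed candidates. The attestation is cryptographically bound to the specific identity. And the entire mechanism adds 74 bytes of overhead per participant.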
Encrypted Billing
Video conferencing billing depends on usage data: bytes relayed, session duration, participant count. Traditionally, the billing engine operates on plaintext usage records. This creates another metadata exposure point—billing records reveal who used the service, how much, and when.
We extended the same principle to billing computation. Usage data (bytes relayed, cost calculations, tier pricing) is processed through an attested computation pipeline. The billing result is bound to a post-quantum attestation proving the computation was performed correctly, without exposing the underlying usage data to any system that doesn't need it.
The tenant decrypts their own bill. The billing infrastructure proves the computation was honest. The raw usage numbers are not stored in plaintext anywhere in the pipeline.
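The article doesn't publish its pricing model or pipeline internals, so everything below is a hypothetical sketch. It uses an illustrative tier schedule, exact integer arithmetic for the bill, and a SHA3-256 digest over the usage record and result. A digest like this is the kind of value an attestation could sign. A tenant who decrypts their bill could then recompute it and check that the attested result matches the inputs.

```python
import hashlib
import json

GB = 10**9
# Hypothetical price schedule: (tier upper bound in bytes, or None; cents per GB).
TIERS = [(10 * GB, 50), (100 * GB, 30), (None, 20)]


def tiered_cost_cents(bytes_relayed: int) -> int:
    """Bill each byte at its tier's rate; integer math, floored to whole cents."""
    if bytes_relayed < 0:
        raise ValueError("bytes_relayed must be non-negative")
    remaining, lower, numerator = bytes_relayed, 0, 0
    for upper, cents_per_gb in TIERS:
        span = remaining if upper is None else min(remaining, upper - lower)
        numerator += span * cents_per_gb
        remaining -= span
        if remaining == 0 or upper is None:
            break
        lower = upper
    return numerator // GB


def billing_digest(tenant_id: str, period: str, bytes_relayed: int,
                   seconds: int, cost_cents: int) -> bytes:
    """SHA3-256 over a canonical JSON record of inputs and result.

    In this sketch's assumed flow, an attestation signs this digest (signing
    not shown) while the record itself stays encrypted to the tenant.
    """
    record = {"tenant": tenant_id, "period": period, "bytes": bytes_relayed,
              "seconds": seconds, "cost_cents": cost_cents}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha3_256(canonical).digest()


cost = tiered_cost_cents(15 * GB)  # 10 GB at 50 + 5 GB at 30 = 650 cents
digest = billing_digest("tenant-1", "2025-01", 15 * GB, 3600, cost)
```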
Meeting Artifact Attestation
After a meeting ends, the artifacts—recording, transcript, AI summary—need to be signed to prove they haven't been tampered with. Traditional approaches use a single signature algorithm, usually ECDSA or RSA.
We sign every meeting artifact with three post-quantum signature families through the same attestation pipeline used for participant metadata. The 74-byte attestation binds to the artifact hash, proving:
- This recording was produced by this meeting at this time
- It has not been modified since signing
- The attestation survives a quantum computing breakthrough in any single algorithm family
The signing happens automatically at the end of every meeting. No user action required. The attestation is returned alongside the artifact URL. Existing systems that don't understand post-quantum attestations can ignore it—the artifact is unchanged. Systems that do understand it get three-family PQ verification.
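Two parts of this flow can be sketched without knowing the attestation format: streaming a SHA3-256 digest of the artifact, and building the message the attestation binds to (meeting, time, artifact hash). The three-family property then becomes an AND rule. The attestation is accepted only if every family's verifier accepts, so forging it requires breaking all three. The message layout and the verifier interface are hypothetical. The article names SPHINCS+ as one family, but real post-quantum signature schemes aren't in Python's standard library. The usage below therefore plugs in HMAC stand-ins purely to exercise the logic.

```python
import hashlib
import hmac
import io
from typing import BinaryIO, Callable, Mapping

# (message, signature) -> does this family's signature verify?
Verifier = Callable[[bytes, bytes], bool]


def artifact_digest(stream: BinaryIO, chunk_size: int = 1 << 20) -> bytes:
    """SHA3-256 of a recording or transcript, read in chunks."""
    h = hashlib.sha3_256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


def attestation_message(meeting_id: str, ended_at_unix: int, digest: bytes) -> bytes:
    """Hypothetical layout binding meeting, time, and artifact hash.
    Length-prefixing keeps the encoding unambiguous."""
    parts = [b"artifact-v1", meeting_id.encode(), ended_at_unix.to_bytes(8, "big"), digest]
    return b"".join(len(p).to_bytes(4, "big") + p for p in parts)


def verify_all_families(message: bytes, signatures: Mapping[str, bytes],
                        verifiers: Mapping[str, Verifier]) -> bool:
    """AND rule: every family must verify; breaking one family is not enough."""
    if not verifiers or set(signatures) != set(verifiers):
        return False
    return all(verifiers[name](message, signatures[name]) for name in verifiers)


def make_stand_in(key: bytes):
    """TEST-ONLY stand-in. HMAC is not a signature scheme, let alone a PQ one."""
    def sign(message: bytes) -> bytes:
        return hmac.new(key, message, hashlib.sha3_256).digest()

    def verify(message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(sign(message), signature)

    return sign, verify


families = {name: make_stand_in(name.encode()) for name in ("family-a", "family-b", "family-c")}
msg = attestation_message("mtg-123", 1700000000, artifact_digest(io.BytesIO(b"recording bytes")))
signatures = {name: sign(msg) for name, (sign, _) in families.items()}
verifiers = {name: verify for name, (_, verify) in families.items()}
print(verify_all_families(msg, signatures, verifiers))  # True
```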
What This Changes for Enterprises
Board meetings
The relay server doesn't know which board members attended. The encrypted metadata is decryptable only by participants with the room key. An attacker who compromises server infrastructure cannot determine that the CEO, CFO, and lead counsel were in a meeting at 2 AM before a major acquisition announcement.
Legal proceedings
Attorney-client privilege requires confidentiality. When the relay server knows who's in a call between a client and their attorney, that metadata is potentially discoverable in litigation. Encrypted metadata eliminates this exposure point.
Healthcare
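Telehealth sessions involve protected health information (PHI). Under HIPAA, even the fact that a patient met with a specific specialist is PHI. Encrypted participant metadata means the video infrastructure provider never sees who met whom, so a compromise of that infrastructure cannot expose it. Encryption alone does not settle business-associate status, though: HHS guidance treats a cloud provider that maintains encrypted ePHI as a business associate even when it lacks the decryption key.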
Government
Classified briefings over video require that the participant list itself be protected. Encrypted metadata prevents the relay infrastructure from being a target for participant list extraction.
The Post-Quantum Dimension
All of this encryption uses Kyber (ML-KEM) for key exchange, which is designed to be post-quantum secure. The room keys are exchanged through lattice-based key encapsulation, for which no efficient quantum attack is known. The AES-256-GCM encryption of metadata uses symmetric cryptography, which is already quantum-resistant at 256-bit key lengths (Grover's algorithm roughly halves the effective key length in bits, so AES-256 provides about 128-bit post-quantum security).
The attestation layer adds three-family post-quantum signing. Even if the key exchange is somehow compromised in the future, the attestation still proves who was in the room and when, backed by three independent signatures that remain verifiable.
This matters for "harvest now, decrypt later" attacks, in which a nation-state adversary records encrypted meeting traffic today and waits for quantum computers to break the encryption tomorrow. Confidentiality against that attack rests on ML-KEM and AES-256, both chosen to withstand it. The attestation adds a separate guarantee: even if the transport encryption someday falls, the metadata commitments stay bound to three-family post-quantum signatures, so recorded identities cannot be silently forged or altered after the fact.
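The 128-bit figure comes from Grover's algorithm. For an unstructured search space of size N, it needs roughly (π/4)√N oracle queries. For a 256-bit key, that works out as:

```latex
N = 2^{256}, \qquad
Q_{\text{Grover}} \approx \frac{\pi}{4}\sqrt{N} = \frac{\pi}{4}\cdot 2^{128} \approx 2^{127.7}
```

That is on the order of 2^128 sequential AES evaluations, which is what "128-bit post-quantum security" refers to.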
Performance Impact
The natural concern with adding client-side encryption to real-time video is latency. Can you add encryption and attestation without making the join experience feel sluggish?
The answer is yes, because the operations are fast and they happen in the right places:
- Metadata encryption: AES-256-GCM encrypting a display name (20 bytes) takes about 0.5 microseconds. Imperceptible.
- Kyber key exchange: ML-KEM-768 encapsulation takes about 50 microseconds. This happens once at room join, not per-message.
- Attestation: The 74-byte post-quantum attestation takes about 16 milliseconds to produce (dominated by SPHINCS+ signing). This happens once per participant join, in the background, after the participant is already connected and seeing video.
- Verification: Checking an attestation is a sub-microsecond lookup. Other participants verify the joining peer's attestation without any perceptible delay.
Total added latency to the join experience: roughly 50 microseconds for the encryption and key exchange, well under 1 millisecond. The attestation runs asynchronously. Users don't wait for it.
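The "connect first, attest in the background" ordering can be sketched with `asyncio`. The join handler finishes the fast steps, schedules attestation as a separate task, and returns. `produce_attestation` and `JoinHandler` are hypothetical names. The placeholder only sleeps for the ~16 ms quoted above and performs no cryptography.

```python
import asyncio
import time


async def produce_attestation(commitment: bytes) -> bytes:
    """Hypothetical placeholder for the ~16 ms three-family signing step.
    It only sleeps; it performs no cryptography."""
    await asyncio.sleep(0.016)
    return b"attestation-for-" + commitment[:4]


class JoinHandler:
    def __init__(self) -> None:
        self.attestations: dict = {}   # peer_id -> attestation, filled in later
        self._tasks: set = set()       # hold references so tasks aren't collected

    async def join(self, peer_id: str, commitment: bytes) -> float:
        """Admit the peer now; return how long join() itself took, in seconds."""
        start = time.perf_counter()
        # Fast path (not shown): key decapsulation, storing the encrypted blob.
        task = asyncio.create_task(self._attest(peer_id, commitment))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return time.perf_counter() - start

    async def _attest(self, peer_id: str, commitment: bytes) -> None:
        self.attestations[peer_id] = await produce_attestation(commitment)

    async def drain(self) -> None:
        # Wait for outstanding background attestations (e.g. at shutdown).
        await asyncio.gather(*list(self._tasks))


async def main():
    handler = JoinHandler()
    join_time = await handler.join("peer-a", bytes(range(32)))
    attested_at_join = "peer-a" in handler.attestations  # not yet: runs later
    await handler.drain()
    return join_time, attested_at_join, handler.attestations["peer-a"]


join_time, attested_at_join, attestation = asyncio.run(main())
print(f"join took {join_time * 1e3:.3f} ms; attested at join time: {attested_at_join}")
```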
What We Still Can't Hide
To be clear about the limitations:
- IP addresses: The relay server needs to know where to send packets. Client IP addresses are visible to the server. Use a VPN or Tor if IP-level anonymity is required.
- Connection timing: The server knows when connections are established and terminated. It can't read who connected, but it can observe the connection pattern.
- Bandwidth usage: The server forwards packets and can observe aggregate bandwidth per connection. It can't attribute this to a named individual (because the name is encrypted), but it can see that "connection X used Y bytes."
- Room existence: The server knows rooms exist and how many connections are in each room. It doesn't know who's in them.
These are inherent limitations of any relay-based architecture. Full metadata anonymity would require onion routing or mix networks, which introduce latency incompatible with real-time video. Our approach encrypts identity metadata while accepting that network-level metadata (IPs, timing, bandwidth) remains visible to the relay.
The Standard Should Be Higher
Every video platform today treats participant metadata as non-sensitive. It's logged, stored, indexed, and queryable. It flows through multiple systems in plaintext. It's accessible to server operators, cloud providers, and anyone who gains access to the infrastructure.
This was understandable when encryption added too much overhead or complexity. It's no longer understandable. AES-256-GCM encryption of a display name takes half a microsecond. Kyber key exchange takes 50 microseconds. Post-quantum attestation takes 16 milliseconds in the background. These are not performance barriers. They're implementation decisions.
The question every enterprise should ask their video provider: "Does your relay server know who's in my meeting?" If the answer is yes, the metadata is plaintext, and every exposure we described above applies.
74 bytes of post-quantum attestation. Sub-microsecond identity encryption. No participant metadata in the clear. That's the standard we think video should meet.
V100: Post-Quantum Video Infrastructure
Encrypted participant metadata. Three-family PQ attestation on every artifact. Sub-microsecond metadata encryption.