The science

How Attune measures emotional compatibility

Attune is built on three layers of intelligence: profile compatibility, emotional response profiling, and interaction analysis. Each layer informs the next. This page explains the scientific foundation underneath all three — the measurement methodology, the data model, and what closed beta data showed.

The FACS framework

The Facial Action Coding System (FACS) is a comprehensive anatomical taxonomy of human facial expressions, originally developed by psychologists Paul Ekman and Wallace V. Friesen in 1978 and updated in 2002. FACS codes facial movements in terms of the underlying muscle groups that produce them, called Action Units (AUs).

FACS is the established standard for facial expression research across psychology, neuroscience, and human-computer interaction. It is peer-reviewed, cross-culturally validated, and widely used in academic and clinical settings. EchoDepth's emotion analysis is built on this foundation — not proprietary pseudoscience.

EchoDepth and Action Units

EchoDepth analyses 44 facial Action Units in real time using the device's front-facing camera. Each AU corresponds to a specific facial muscle movement: for example, AU1 (inner brow raiser), AU6 (cheek raiser), and AU12 (lip corner puller, the core movement of a smile, which together with AU6 characterises a genuine Duchenne smile).

Detection runs at under 200 milliseconds latency, capturing both voluntary expressions and involuntary micro-expressions — brief, automatic facial movements lasting under 200ms that are extremely difficult to consciously control. This is what makes emotional profiling resistant to deliberate manipulation: you can perform happiness, but the underlying micro-expression pattern of genuine joy is distinct and measurable.

All processing occurs on the user's device. No raw images, video frames, or biometric data are transmitted or stored at any point.

Temporal data — beyond snapshots

Earlier emotion AI systems treated facial data as a static measurement — a single score or vector at a point in time. Attune's data model is fundamentally different: it captures emotional responses as time-series data across the duration of each stimulus.

For each stimulus clip and each live interaction, the system records four temporal dimensions per AU:

The result is not a single emotional vector but an emotional trajectory — a shape across time, per stimulus, per person. This richer representation is what enables meaningful comparison between individuals and meaningful analysis of live interactions.

Storing time-series data rather than single values requires a different backend architecture. Attune's data model is built to support this from the outset — not retrofitted later.

Labelled stimulus data

Emotional response data is meaningless without context. An AU pattern that looks like surprise could indicate delight, disgust, shock, or disbelief — the facial muscle movements overlap significantly. Without a label, there is no ground truth.

Attune's onboarding solves this with a labelled stimulus design. After each calibration clip, users provide an explicit rating — like, neutral, or dislike — for the content they just watched. This label is stored alongside the AU time-series data as a matched pair.

The consequence is that Attune's emotional profiles are anchored in actual stated preferences, not inferred ones. The system learns to associate specific temporal AU patterns with specific emotional valences — per individual, calibrated against their personal baseline. This is the foundation on which the matching model is trained.
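The matched pair described above can be pictured as a small record type. The following is a minimal sketch, not Attune's actual schema: the names `Rating`, `LabelledResponse`, `stimulus_id`, and `au_series` are hypothetical, and AU intensities are assumed to be one float per captured frame.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Rating(Enum):
    """The explicit rating a user gives after each calibration clip."""
    LIKE = "like"
    NEUTRAL = "neutral"
    DISLIKE = "dislike"


@dataclass(frozen=True)
class LabelledResponse:
    """One calibration clip: AU time series stored with its rating."""
    stimulus_id: str                    # hypothetical clip identifier
    au_series: Dict[str, List[float]]   # AU name -> intensity per frame
    rating: Rating                      # the ground-truth label

    def __post_init__(self) -> None:
        # Every AU series must cover the same frames to stay time-aligned.
        lengths = {len(series) for series in self.au_series.values()}
        if len(lengths) > 1:
            raise ValueError("all AU series must have the same length")


sample = LabelledResponse(
    stimulus_id="clip-001",
    au_series={"AU6": [0.0, 0.2, 0.6], "AU12": [0.1, 0.4, 0.8]},
    rating=Rating.LIKE,
)
```

Keeping the label on the same record as the time series means a response cannot be stored or used for training without its ground truth.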

The calibration baseline — a 30-second neutral face capture before any stimulus is presented — provides the individual reference point against which all subsequent responses are measured. Without it, inter-person comparison is confounded by the natural variation in baseline facial muscle tone. With it, the deviation from baseline becomes the meaningful unit of analysis.
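As an illustration of deviation from baseline, the sketch below averages each AU over the neutral capture and subtracts that level from a later response. It assumes the baseline is summarised as a per-AU mean, which the text does not specify; `baseline_levels` and `deviation_from_baseline` are hypothetical names.

```python
from statistics import mean
from typing import Dict, List

AUSeries = Dict[str, List[float]]  # AU name -> intensity per frame


def baseline_levels(neutral_capture: AUSeries) -> Dict[str, float]:
    """Summarise the neutral face capture as one resting level per AU."""
    return {au: mean(values) for au, values in neutral_capture.items()}


def deviation_from_baseline(response: AUSeries,
                            baseline: Dict[str, float]) -> AUSeries:
    """Express a response as frame-by-frame deviation from the resting level."""
    return {
        au: [value - baseline[au] for value in values]
        for au, values in response.items()
    }


# Assumed example values: one AU, three frames each.
neutral = {"AU12": [0.25, 0.125, 0.375]}
response = {"AU12": [0.25, 0.5, 1.0]}
deviations = deviation_from_baseline(response, baseline_levels(neutral))
```

Two people with different resting muscle tone can then be compared on how far each moved from their own neutral face, not on raw intensity.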

The VAD emotional model

EchoDepth maps AU activations to a three-dimensional emotional space using the Valence-Arousal-Dominance (VAD) model, one of the most widely used dimensional models of emotion in affective computing research.

In Attune's model, VAD coordinates are calculated across time for each stimulus, producing an emotional trajectory rather than a static point. An emotional profile is the full set of these trajectories across the stimulus battery: anonymised, vectorised, and not usable to reconstruct the original video or identify the individual's face.
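To make the idea of a trajectory concrete, here is a sketch that turns per-frame AU deviations into one VAD point per frame. The linear mapping and the numbers in `ILLUSTRATIVE_WEIGHTS` are invented for illustration only; how EchoDepth actually maps AUs to VAD is not described here.

```python
from typing import Dict, List, Tuple

VAD = Tuple[float, float, float]  # (valence, arousal, dominance)

# Hypothetical weights, NOT EchoDepth's mapping: each AU contributes a
# fixed amount to valence, arousal, and dominance per unit of intensity.
ILLUSTRATIVE_WEIGHTS: Dict[str, VAD] = {
    "AU6": (0.5, 0.2, 0.1),    # cheek raiser
    "AU12": (0.5, 0.3, 0.1),   # lip corner puller
    "AU4": (-0.6, 0.3, 0.0),   # brow lowerer
}


def vad_trajectory(deviations: Dict[str, List[float]],
                   weights: Dict[str, VAD] = ILLUSTRATIVE_WEIGHTS) -> List[VAD]:
    """Return one VAD point per frame, so the result is a path through time."""
    if not deviations:
        return []
    n_frames = len(next(iter(deviations.values())))
    trajectory: List[VAD] = []
    for t in range(n_frames):
        v = a = d = 0.0
        for au, series in deviations.items():
            wv, wa, wd = weights.get(au, (0.0, 0.0, 0.0))
            v += wv * series[t]
            a += wa * series[t]
            d += wd * series[t]
        trajectory.append((v, a, d))
    return trajectory


trajectory = vad_trajectory({"AU12": [0.0, 1.0]})
```

A static system would keep only one of these points; the trajectory keeps all of them in order.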

How matching works

Attune's matching engine runs in strict sequence. Profile compatibility is resolved first. Emotional similarity ranking only applies within a compatible pool. Interaction outcomes refine the model over time.

Stage 1: Profile compatibility filter

Relationship intent, lifestyle preferences, identity, and dealbreakers are used as hard filters before any emotional comparison occurs. This is not optional weighting — it is a gate. Two people with near-identical emotional profiles but incompatible relationship goals are not shown to each other. Emotional resonance cannot overcome a fundamental incompatibility at the values level.
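The difference between a gate and a weighting can be shown in a few lines. This sketch assumes, purely for illustration, that relationship intent must match exactly and that a dealbreaker is any trait the other person lists; `Profile`, `is_compatible`, and `compatible_pool` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Profile:
    user_id: str
    intent: str                                   # e.g. "long-term"
    traits: Set[str] = field(default_factory=set)
    dealbreakers: Set[str] = field(default_factory=set)


def is_compatible(a: Profile, b: Profile) -> bool:
    """Hard gate: any failed check excludes the pair outright."""
    if a.intent != b.intent:
        return False
    if a.dealbreakers & b.traits or b.dealbreakers & a.traits:
        return False
    return True


def compatible_pool(user: Profile, candidates: List[Profile]) -> List[Profile]:
    """Only profiles returned here go on to emotional comparison."""
    return [c for c in candidates
            if c.user_id != user.user_id and is_compatible(user, c)]


user = Profile("u1", "long-term", {"non-smoker"}, {"smoker"})
candidates = [
    Profile("u2", "long-term", {"non-smoker"}),
    Profile("u3", "casual", {"non-smoker"}),
    Profile("u4", "long-term", {"smoker"}),
]
pool_ids = [p.user_id for p in compatible_pool(user, candidates)]
```

Because the filter returns a boolean and not a score, no degree of emotional similarity can bring an excluded candidate back into the pool.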

Stage 2: Emotional response similarity ranking

Within the compatible pool, users are ranked by emotional response profile similarity: VAD vector proximity across the stimulus battery, weighted by temporal features. The goal is not to find identical emotional profiles but complementary ones: people who respond to the world in ways that are likely to resonate with each other, not simply mirror each other.
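One simple reading of "VAD vector proximity across the stimulus battery" is sketched below: average the frame-by-frame Euclidean distance between two people's trajectories for each shared stimulus, then rank candidates by that distance. This is an assumption, not Attune's scoring function, and it deliberately leaves out the temporal weighting and the notion of complementarity, neither of which is specified here.

```python
import math
from typing import Dict, List, Tuple

VAD = Tuple[float, float, float]
EmotionalProfile = Dict[str, List[VAD]]  # stimulus id -> VAD trajectory


def trajectory_distance(a: List[VAD], b: List[VAD]) -> float:
    """Mean Euclidean distance over the frames both trajectories share."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("trajectories must not be empty")
    return sum(math.dist(a[t], b[t]) for t in range(n)) / n


def profile_distance(p: EmotionalProfile, q: EmotionalProfile) -> float:
    """Average trajectory distance over stimuli both users have seen."""
    shared = sorted(p.keys() & q.keys())
    if not shared:
        return math.inf  # nothing to compare, so rank last
    return sum(trajectory_distance(p[s], q[s]) for s in shared) / len(shared)


def rank_candidates(user: EmotionalProfile,
                    pool: Dict[str, EmotionalProfile]) -> List[str]:
    """Candidate ids ordered from closest to furthest emotional response."""
    return sorted(pool, key=lambda cid: profile_distance(user, pool[cid]))


me = {"clip-001": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]}
pool = {
    "far": {"clip-001": [(1.0, 1.0, 1.0), (0.0, 1.0, 1.0)]},
    "near": {"clip-001": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]},
}
ranking = rank_candidates(me, pool)
```

A model aiming for complementary, not identical, profiles would replace `profile_distance` with a learned score; the ranking step around it would stay the same.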

Stage 3: Interaction outcome validation

Every video date with a mutual yes/no outcome is a labelled training example. The matching model is updated as real interaction data accumulates — weighting features associated with mutual interest more heavily, and down-weighting features that do not predict real-world chemistry. Accuracy improves over time through data, not through algorithmic assumption at launch.
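The idea of re-weighting features as labelled outcomes arrive can be sketched with a single online logistic-regression step. Logistic regression is chosen here only because it is the simplest model that fits the description; the text does not say which model Attune uses, and the feature vector is a placeholder.

```python
import math
from typing import List


def predict(weights: List[float], features: List[float]) -> float:
    """Estimated probability of a mutual yes for one pair."""
    z = sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update(weights: List[float], features: List[float],
           mutual_yes: bool, learning_rate: float = 0.1) -> List[float]:
    """One gradient step: features present in a mutual yes gain weight,
    features present in a non-mutual outcome lose weight."""
    error = (1.0 if mutual_yes else 0.0) - predict(weights, features)
    return [w + learning_rate * error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]            # hypothetical: two pair-level features
pair_features = [1.0, 0.0]      # only the first feature was active
new_weights = update(weights, pair_features, mutual_yes=True)
```

Features that were inactive during the date (value 0) keep their weight unchanged, so each outcome only adjusts what was actually observed.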

Interaction analysis and learning

Attune's video date environment is built inside the platform. This is a deliberate product decision: external tools cannot provide the data access necessary for post-session analysis, and compliance with biometric data handling requirements demands a controlled, consent-explicit environment.

After each session — with explicit prior consent from both participants — the interaction is processed to extract a set of interaction features:

The post-date feedback mechanism is double-blind by design: each participant privately submits a yes or no for a second date, and neither response is revealed unless both say yes. This reduces social desirability bias in the outcome label, producing cleaner training data than a mechanism that shows each participant the other's response.
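The reveal rule can be stated precisely in code. In this sketch (class and method names are hypothetical), participants only ever learn about a mutual yes, while the training label records whether the outcome was mutual once both votes are in.

```python
from typing import Dict, Optional


class SecondDateBallot:
    """Holds two private votes and reveals nothing unless both are yes."""

    def __init__(self, user_a: str, user_b: str) -> None:
        if user_a == user_b:
            raise ValueError("a ballot needs two distinct participants")
        self._votes: Dict[str, Optional[bool]] = {user_a: None, user_b: None}

    def submit(self, user_id: str, wants_second_date: bool) -> None:
        if user_id not in self._votes:
            raise KeyError(f"{user_id} is not part of this date")
        if self._votes[user_id] is not None:
            raise ValueError("vote already submitted")
        self._votes[user_id] = wants_second_date

    def revealed_outcome(self) -> Optional[bool]:
        """What participants see: True on mutual yes, otherwise nothing."""
        if all(vote is True for vote in self._votes.values()):
            return True
        return None

    def training_label(self) -> Optional[bool]:
        """Label for the model: None until both votes are in."""
        if any(vote is None for vote in self._votes.values()):
            return None
        return all(self._votes.values())


ballot = SecondDateBallot("alice", "bob")
ballot.submit("alice", True)
ballot.submit("bob", False)
```

From a participant's side, a pending vote, a single no, and two noes all look identical, which is what keeps an individual answer private.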

The goal of the learning system is not to determine compatibility — it is to improve the probability of a mutual yes over time. The model does not claim to predict love. It claims to get better at identifying the conditions under which two people are likely to want to meet again.

Closed beta results

94% of matched users rated their first Attune conversation as genuinely interesting or better

In Attune's closed beta, participants were asked to rate their first conversation with each match on a simple scale: not interesting, somewhat interesting, genuinely interesting, or better than expected. 94% of matched users selected "genuinely interesting" or above after their first conversation.

This data is from Cavefish's internal beta programme. For questions about the methodology or to request further data, contact hello@attunechemistry.com.

Data and privacy

Emotional data is the most personal data Attune handles. The architecture is built to minimise what is stored and maximise user control:

For full details, read our Privacy Policy.

Key terms defined

Action Unit (AU)
A coded facial muscle movement in the FACS taxonomy. EchoDepth tracks 44 AUs in real time. Each AU corresponds to a specific anatomical muscle or muscle group — for example, AU6 (cheek raiser) and AU12 (lip corner puller) together characterise a genuine Duchenne smile.
Valence
The positive-to-negative dimension of the VAD emotional model. High valence = pleasurable; low valence = unpleasant. Measured continuously across a stimulus rather than as a single endpoint score.
Arousal
The intensity or activation dimension of the VAD model. High arousal = excited or agitated; low arousal = calm or bored. Paired with valence to distinguish, for example, contentment (positive valence, low arousal) from excitement (positive valence, high arousal).
Dominance
The control dimension of the VAD model. High dominance = feeling in control or powerful; low dominance = feeling overwhelmed or submissive. Less studied than valence and arousal but informative in social interaction contexts.
Micro-expression
An involuntary facial expression lasting under 200 milliseconds, too brief to be easily controlled or masked. First described by Haggard and Isaacs in 1966 and later studied extensively by Paul Ekman and Wallace Friesen. EchoDepth's sub-200ms detection window is specifically calibrated to capture these.
Emotional synchrony
The degree to which two people's emotional expressions align over the course of an interaction. A well-documented predictor of felt rapport and connection — couples with higher synchrony consistently rate interactions as more satisfying.
Labelled stimulus
A stimulus (video clip) paired with an explicit user rating — like, neutral, or dislike. The label provides the ground truth that anchors Attune's emotional profiles to actual stated preferences rather than inferred ones.
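Of the terms above, emotional synchrony is the one most naturally expressed as a calculation. A common way to quantify alignment between two time series is the Pearson correlation; the sketch below applies it to two people's valence over the same frames. Whether Attune measures synchrony this way is not stated, so treat the function as an assumption.

```python
import math
from typing import Sequence


def synchrony(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long valence series.

    Returns a value between -1 (opposed) and 1 (moving together);
    a flat series carries no signal and yields 0.0.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


together = synchrony([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])
opposed = synchrony([0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
```

Correlation ignores differences in scale, so a reserved person and an expressive one can still score as highly synchronised if their valence rises and falls at the same moments.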

Frequently asked questions

What is the FACS framework?

The Facial Action Coding System (FACS) is a comprehensive anatomical taxonomy of human facial expressions developed by psychologists Paul Ekman and Wallace V. Friesen in 1978. It is the established standard for facial expression research across psychology, neuroscience, and human-computer interaction.

EchoDepth analyses 44 facial Action Units in real time using the device's front-facing camera. It captures both voluntary expressions and involuntary micro-expressions lasting under 200 milliseconds, then maps them to emotional states using the Valence-Arousal-Dominance model. All processing occurs on-device.

The Valence-Arousal-Dominance model is one of the most widely used dimensional models of emotion in affective computing. Valence measures positive-to-negative feeling, arousal measures intensity, and dominance measures sense of control. Together they produce a three-dimensional emotional fingerprint.

In Attune's closed beta, 94% of matched users rated their first conversation as genuinely interesting or better. The matching system compares emotional vectors using a similarity model that finds complementary profiles — not identical ones.
