How Attune measures emotional compatibility
Attune is built on three layers of intelligence: profile compatibility, emotional response profiling, and interaction analysis. Each layer informs the next. This page explains the scientific foundation underneath all three — the measurement methodology, the data model, and what closed beta data showed.
The FACS framework
The Facial Action Coding System (FACS) is a comprehensive anatomical taxonomy of human facial expressions, originally developed by psychologists Paul Ekman and Wallace V. Friesen in 1978 and updated in 2002. FACS codes facial movements in terms of the underlying muscle groups that produce them, called Action Units (AUs).
FACS is the established standard for facial expression research across psychology, neuroscience, and human-computer interaction. It is peer-reviewed, cross-culturally validated, and widely used in academic and clinical settings. EchoDepth's emotion analysis is built on this foundation — not proprietary pseudoscience.
EchoDepth and Action Units
EchoDepth analyses 44 facial Action Units in real time using the device's front-facing camera. Each AU corresponds to a specific facial muscle movement — for example, AU1 (inner brow raise), AU6 (cheek raiser), AU12 (lip corner puller, commonly associated with genuine smiling).
Detection runs at under 200 milliseconds latency, capturing both voluntary expressions and involuntary micro-expressions — brief, automatic facial movements lasting under 200ms that are extremely difficult to consciously control. This is what makes emotional profiling resistant to deliberate manipulation: you can perform happiness, but the underlying micro-expression pattern of genuine joy is distinct and measurable.
All processing occurs on the user's device. No raw images, video frames, or biometric data are transmitted or stored at any point.
Temporal data — beyond snapshots
Earlier emotion AI systems treated facial data as a static measurement — a single score or vector at a point in time. Attune's data model is fundamentally different: it captures emotional responses as time-series data across the duration of each stimulus.
For each stimulus clip and each live interaction, the system records four temporal dimensions per AU:
- Reaction timing — onset latency from stimulus start. Fast involuntary responses carry different weight to deliberate, delayed expressions. Sub-200ms activations are treated as automatic; later activations are flagged as potentially managed.
- Intensity — the magnitude of activation at each point in time. A faint, fleeting smile and a full Duchenne smile involve the same AUs but represent meaningfully different emotional states.
- Duration — the sustained length of an expression. Emotional engagement is reflected in how long a response is held, not just whether it occurred.
- Recovery — the rate of return to baseline after peak activation. Slow recovery indicates emotional persistence; rapid recovery suggests a transient or suppressed response.
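The four dimensions above can be illustrated with a minimal sketch that summarises one AU intensity trace. Everything named here is an assumption made for illustration: the `temporal_features` function, the `TemporalFeatures` record, the 0.2 activation threshold, and the choice to report recovery as elapsed time rather than a rate. Only the 200 ms automatic/managed cutoff comes from the text; this is not EchoDepth's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TemporalFeatures:
    onset_latency_ms: Optional[float]  # None if the AU never activated
    peak_intensity: float
    duration_ms: float
    recovery_ms: Optional[float]       # None if it never dropped back below threshold
    automatic: bool                    # True if onset came before the cutoff


def temporal_features(
    samples: Sequence[float],
    sample_interval_ms: float,
    threshold: float = 0.2,            # hypothetical activation threshold
    automatic_cutoff_ms: float = 200.0,
) -> TemporalFeatures:
    """Summarise one AU intensity trace; index 0 is the stimulus start."""
    active = [i for i, value in enumerate(samples) if value >= threshold]
    if not active:
        return TemporalFeatures(None, max(samples, default=0.0), 0.0, None, False)

    # Reaction timing: time of the first sample at or above the threshold.
    onset = active[0] * sample_interval_ms
    # Intensity: the largest activation in the trace.
    peak_index = max(range(len(samples)), key=lambda i: samples[i])
    # Duration: total time spent at or above the threshold.
    duration = len(active) * sample_interval_ms
    # Recovery: time from the peak to the first later sample below threshold.
    recovery = None
    for i in range(peak_index + 1, len(samples)):
        if samples[i] < threshold:
            recovery = (i - peak_index) * sample_interval_ms
            break

    return TemporalFeatures(
        onset, samples[peak_index], duration, recovery, onset < automatic_cutoff_ms
    )
```

For a trace sampled every 50 ms, such as `[0.0, 0.1, 0.5, 0.9, 0.6, 0.3, 0.1, 0.0]`, this reports an onset at 100 ms (treated as automatic), a peak of 0.9, 200 ms above threshold, and 150 ms from peak back to baseline.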
The result is not a single emotional vector but an emotional trajectory — a shape across time, per stimulus, per person. This richer representation is what enables meaningful comparison between individuals and meaningful analysis of live interactions.
Storing time-series data rather than single values requires a different backend architecture. Attune's data model is built to support this from the outset — not retrofitted later.
Labelled stimulus data
Emotional response data is meaningless without context. An AU pattern that looks like surprise could indicate delight, disgust, shock, or disbelief — the facial muscle movements overlap significantly. Without a label, there is no ground truth.
Attune's onboarding solves this with a labelled stimulus design. After each calibration clip, users provide an explicit rating — like, neutral, or dislike — for the content they just watched. This label is stored alongside the AU time-series data as a matched pair.
The consequence is that Attune's emotional profiles are anchored in actual stated preferences, not inferred ones. The system learns to associate specific temporal AU patterns with specific emotional valences — per individual, calibrated against their personal baseline. This is the foundation on which the matching model is trained.
The calibration baseline — a 30-second neutral face capture before any stimulus is presented — provides the individual reference point against which all subsequent responses are measured. Without it, inter-person comparison is confounded by the natural variation in baseline facial muscle tone. With it, the deviation from baseline becomes the meaningful unit of analysis.
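As a rough illustration of the matched pair and the baseline correction described above, the sketch below stores a rating next to AU series expressed as deviation from a neutral-face average. The type and function names (`LabelledResponse`, `baseline_means`, `label_response`) and the use of a simple per-AU mean as the baseline are assumptions, not Attune's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal

Rating = Literal["like", "neutral", "dislike"]
AUSeries = Dict[str, List[float]]  # AU name -> intensity samples over time


@dataclass
class LabelledResponse:
    clip_id: str
    rating: Rating       # the explicit user rating (the label)
    deviation: AUSeries  # AU time series relative to the personal baseline


def baseline_means(neutral_capture: AUSeries) -> Dict[str, float]:
    """Average each AU over the neutral-face capture (assumed baseline)."""
    return {au: sum(values) / len(values) for au, values in neutral_capture.items()}


def label_response(
    clip_id: str, rating: Rating, raw: AUSeries, baseline: Dict[str, float]
) -> LabelledResponse:
    """Pair a rating with the baseline-relative AU series for one clip."""
    deviation = {
        au: [value - baseline.get(au, 0.0) for value in values]
        for au, values in raw.items()
    }
    return LabelledResponse(clip_id, rating, deviation)
```

Subtracting the baseline before storing the pair means two people with different resting muscle tone are compared on how far they moved, not on where they started.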
The VAD emotional model
EchoDepth maps AU activations to a three-dimensional emotional space using the Valence-Arousal-Dominance (VAD) model, the most widely used dimensional model of emotion in affective computing research.
- Valence — the positive-to-negative quality of an emotional response (pleasure vs. displeasure)
- Arousal — the intensity or activation level of an emotional response (excited vs. calm)
- Dominance — the sense of control or submissiveness in an emotional response (in control vs. overwhelmed)
In Attune's model, VAD coordinates are calculated across time for each stimulus, producing an emotional trajectory rather than a static point. An emotional profile is the full set of these trajectories across the stimulus battery — anonymised, vectorised, and incapable of reconstructing the original video or identifying the individual's face.
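To make the idea of a trajectory concrete, here is a minimal sketch that converts baseline-relative AU series into one VAD point per time step. The linear mapping and every weight in `AU_TO_VAD` are placeholders invented for this example; the source does not describe how EchoDepth actually maps AUs to VAD coordinates.

```python
from typing import Dict, List, Tuple

VAD = Tuple[float, float, float]  # (valence, arousal, dominance)

# Hypothetical per-AU contributions to (valence, arousal, dominance).
# Placeholder numbers for illustration only, not EchoDepth's mapping.
AU_TO_VAD: Dict[str, VAD] = {
    "AU6": (0.5, 0.25, 0.0),    # cheek raiser
    "AU12": (0.5, 0.25, 0.25),  # lip corner puller
    "AU4": (-0.5, 0.25, 0.0),   # brow lowerer
}


def vad_trajectory(deviation: Dict[str, List[float]]) -> List[VAD]:
    """Return one VAD point per time step from baseline-relative AU series."""
    steps = min((len(series) for series in deviation.values()), default=0)
    trajectory: List[VAD] = []
    for t in range(steps):
        valence = arousal = dominance = 0.0
        for au, series in deviation.items():
            w_v, w_a, w_d = AU_TO_VAD.get(au, (0.0, 0.0, 0.0))
            valence += w_v * series[t]
            arousal += w_a * series[t]
            dominance += w_d * series[t]
        trajectory.append((valence, arousal, dominance))
    return trajectory
```

The output is a list of points rather than a single vector, which is the distinction the paragraph above draws between a trajectory and a static score.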
How matching works
Attune's matching engine runs in strict sequence. Profile compatibility is resolved first. Emotional similarity ranking only applies within a compatible pool. Interaction outcomes refine the model over time.
Stage 1: Profile compatibility filter
Relationship intent, lifestyle preferences, identity, and dealbreakers are used as hard filters before any emotional comparison occurs. This is not optional weighting — it is a gate. Two people with near-identical emotional profiles but incompatible relationship goals are not shown to each other. Emotional resonance cannot overcome a fundamental incompatibility at the values level.
Stage 2: Emotional response similarity ranking
Within the compatible pool, users are ranked by emotional response profile similarity: VAD vector proximity across the stimulus battery, weighted by temporal features. The goal is not to find identical emotional profiles but complementary ones, meaning people who respond to the world in ways that are likely to resonate with each other rather than simply mirror each other.
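The gate-then-rank order of Stages 1 and 2 can be sketched as follows. The `Candidate` fields, the `compatible` rule, and the use of plain cosine similarity are all assumptions chosen to keep the example small; the source describes a temporally weighted proximity measure whose details are not given.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass
class Candidate:
    user_id: str
    intent: str                   # e.g. relationship intent
    dealbreakers: FrozenSet[str]  # traits this user will not accept
    traits: FrozenSet[str]        # traits this user has
    profile: Sequence[float]      # flattened emotional profile vector


def compatible(a: Candidate, b: Candidate) -> bool:
    """Stage 1 gate: same intent and no dealbreaker hit in either direction."""
    return (
        a.intent == b.intent
        and not (a.dealbreakers & b.traits)
        and not (b.dealbreakers & a.traits)
    )


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def rank_matches(user: Candidate, pool: List[Candidate]) -> List[Tuple[str, float]]:
    """Stage 2: score only candidates that passed the Stage 1 gate."""
    eligible = [c for c in pool if c.user_id != user.user_id and compatible(user, c)]
    scored = [(c.user_id, cosine_similarity(user.profile, c.profile)) for c in eligible]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because filtering happens before scoring, a candidate with a near-identical profile but a different intent never receives a similarity score at all, which matches the "gate, not weighting" behaviour described in Stage 1.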
Stage 3: Interaction outcome validation
Every video date with a mutual yes/no outcome is a labelled training example. The matching model is updated as real interaction data accumulates — weighting features associated with mutual interest more heavily, and down-weighting features that do not predict real-world chemistry. Accuracy improves over time through data, not through algorithmic assumption at launch.
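One simple way to picture "weighting features associated with mutual interest more heavily" is a single online logistic-regression step per labelled date. This is a swapped-in textbook technique used purely as an illustration; the source does not say which model Attune trains, and `update_weights`, its feature vector, and the learning rate are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def update_weights(
    weights: Sequence[float],
    features: Sequence[float],
    mutual_yes: bool,
    learning_rate: float = 0.1,  # hypothetical step size
) -> List[float]:
    """Apply one gradient step of logistic regression for one labelled date."""
    predicted = sigmoid(sum(w * x for w, x in zip(weights, features)))
    error = (1.0 if mutual_yes else 0.0) - predicted
    # Features present on a mutual yes are nudged up; on a no, nudged down.
    return [w + learning_rate * error * x for w, x in zip(weights, features)]
```

Starting from zero weights, a mutual yes raises the weight of every feature that was active on that date, and a no lowers it; features that were absent (value 0) are left unchanged.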
Interaction analysis and learning
Attune's video date environment is built inside the platform. This is a deliberate product decision: external tools cannot provide the data access necessary for post-session analysis, and compliance with biometric data handling requirements demands a controlled, consent-explicit environment.
After each session — with explicit prior consent from both participants — the interaction is processed to extract a set of interaction features:
- AU patterns during conversation — smiles, tension markers, negative affect indicators — tracked across the full arc of the date, not only at peak moments
- Emotional synchrony — the degree to which the two participants' expressions align over time. Interpersonal synchrony is a well-documented predictor of felt connection and rapport
- Turn-taking and conversational balance — who speaks, for how long, and when. Persistent imbalance in conversation dynamics reliably predicts low satisfaction ratings
- Emotional trajectory — whether engagement (measured through arousal and valence) increased or decreased as the conversation progressed
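Two of the interaction features above lend themselves to a short sketch: synchrony as the correlation between two participants' signals, and conversational balance as a ratio of speaking time. Using Pearson correlation for synchrony and the `speaking_balance` formula are assumptions for illustration; the source does not specify how either feature is computed.

```python
import math
from typing import Sequence


def synchrony(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally sampled series; 0.0 if undefined."""
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def speaking_balance(seconds_a: float, seconds_b: float) -> float:
    """1.0 when both speak equally, falling towards 0.0 as one dominates."""
    total = seconds_a + seconds_b
    return 0.0 if total == 0 else 1.0 - abs(seconds_a - seconds_b) / total
```

Applied to each participant's valence series, `synchrony` is close to 1.0 when their expressions rise and fall together and close to -1.0 when they move in opposition. A 90-second versus 30-second split gives a `speaking_balance` of 0.5.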
The post-date feedback mechanism is double-blind by design: each participant privately submits a yes or no for a second date; neither response is revealed unless both say yes. This eliminates social desirability bias from the outcome label, producing cleaner training data than any self-report mechanism that reveals responses mutually.
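The reveal rule described above is small enough to sketch directly. The `SecondDateBallot` class and its method names are hypothetical; the sketch only encodes the stated behaviour that votes are private and nothing is shown to participants unless both say yes.

```python
from typing import Dict, Optional


class SecondDateBallot:
    """Private yes/no votes for one date, revealed only on a mutual yes."""

    def __init__(self, participant_a: str, participant_b: str) -> None:
        self._votes: Dict[str, Optional[bool]] = {
            participant_a: None,
            participant_b: None,
        }

    def submit(self, participant: str, wants_second_date: bool) -> None:
        if participant not in self._votes:
            raise KeyError(f"{participant} was not on this date")
        self._votes[participant] = wants_second_date

    def outcome_label(self) -> Optional[bool]:
        """Training label: True only for a mutual yes; None until both vote."""
        if any(vote is None for vote in self._votes.values()):
            return None
        return all(self._votes.values())

    def visible_to_participants(self) -> bool:
        """Participants learn the result only when both said yes."""
        return self.outcome_label() is True
```

The training label and the participant-facing result are kept separate on purpose: the model can learn from a one-sided no, while neither participant ever sees the other's individual answer.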
The goal of the learning system is not to determine compatibility — it is to improve the probability of a mutual yes over time. The model does not claim to predict love. It claims to get better at identifying the conditions under which two people are likely to want to meet again.
Closed beta results
In Attune's closed beta, participants were asked to rate their first conversation with each match on a simple scale: not interesting, somewhat interesting, genuinely interesting, or better than expected. 94% of matched users selected "genuinely interesting" or above after their first conversation.
This data is from Cavefish's internal beta programme. For questions about the methodology or to request further data, contact hello@attunechemistry.com.
Data and privacy
Emotional data is the most personal data Attune handles. The architecture is built to minimise what is stored and maximise user control:
- Raw video and images are processed on-device and never transmitted
- What is stored is an anonymised emotional vector — a mathematical representation with no biometric identifiers
- Emotional vectors are encrypted at rest
- Users can permanently delete their entire profile, emotional vector, and all associated data in one tap
- Attune is fully compliant with UK GDPR, including the right to erasure
For full details, read our Privacy Policy.
Key terms defined
- Action Unit (AU): A coded facial muscle movement in the FACS taxonomy. EchoDepth tracks 44 AUs in real time. Each AU corresponds to a specific anatomical muscle or muscle group. For example, AU6 (cheek raiser) and AU12 (lip corner puller) together characterise a genuine Duchenne smile.
- Valence: The positive-to-negative dimension of the VAD emotional model. High valence = pleasurable; low valence = unpleasant. Measured continuously across a stimulus rather than as a single endpoint score.
- Arousal: The intensity or activation dimension of the VAD model. High arousal = excited or agitated; low arousal = calm or bored. Paired with valence to distinguish, for example, contentment (positive valence, low arousal) from excitement (positive valence, high arousal).
- Dominance: The control dimension of the VAD model. High dominance = feeling in control or powerful; low dominance = feeling overwhelmed or submissive. Less studied than valence and arousal but informative in social interaction contexts.
- Micro-expression: An involuntary facial expression lasting under 200 milliseconds, too brief to be consciously controlled or masked. Popularised by the work of Paul Ekman and Wallace Friesen. EchoDepth's sub-200ms detection window is specifically calibrated to capture these.
- Emotional synchrony: The degree to which two people's emotional expressions align over the course of an interaction. A well-documented predictor of felt rapport and connection: couples with higher synchrony consistently rate interactions as more satisfying.
- Labelled stimulus: A stimulus (video clip) paired with an explicit user rating of like, neutral, or dislike. The label provides the ground truth that anchors Attune's emotional profiles to actual stated preferences rather than inferred ones.
Primary sources and further reading
- Ekman, P. & Friesen, W.V. (1978) Facial Action Coding System. Consulting Psychologists Press.
- Mehrabian, A. (1996) Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4), 261–292.
- Dimberg, U., Thunberg, M. & Elmehed, K. (2000) Unconscious facial reactions to emotional facial expressions. Psychological Science, 11(1), 86–89.
- Levenson, R.W. & Gottman, J.M. (1983) Marital interaction: Physiological linkage and affective exchange. Journal of Personality and Social Psychology, 45(3), 587–597.
- Posner, J., Russell, J.A. & Peterson, B.S. (2005) The circumplex model of affect. Development and Psychopathology, 17(3), 715–734.
- Ekman, P. (2003) Emotions Revealed. Times Books — micro-expression detection and universality of facial expression.
- Russell, J.A. (1980) A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
Frequently asked questions
What is the FACS framework?
The Facial Action Coding System (FACS) is a comprehensive anatomical taxonomy of human facial expressions developed by psychologists Paul Ekman and Wallace V. Friesen in 1978. It is the established standard for facial expression research across psychology, neuroscience, and human-computer interaction.
EchoDepth analyses 44 facial Action Units in real time using the device's front-facing camera. It captures both voluntary expressions and involuntary micro-expressions lasting under 200 milliseconds, then maps them to emotional states using the Valence-Arousal-Dominance model. All processing occurs on-device.
The Valence-Arousal-Dominance model is the most widely used dimensional model of emotion in affective computing. Valence measures positive-to-negative feeling, arousal measures intensity, and dominance measures sense of control. Together they produce a three-dimensional emotional fingerprint.
In Attune's closed beta, 94% of matched users rated their first conversation as genuinely interesting or better. The matching system compares emotional vectors using a similarity model that finds complementary profiles — not identical ones.