This study adheres to all applicable ethical standards and received approval from the Institutional Review Board of Sungkyunkwan University (2018-05-005).
Participants
In total, twenty-four participants aged 19 to 25 (M = 22, SD = 2.18; 6 females) from the Sungkyunkwan University community were included in the analyses. Initially, 32 fluent Korean speakers with normal or corrected-to-normal vision and hearing were recruited and provided informed consent in accordance with the guidelines of the Institutional Review Board of Sungkyunkwan University. They were financially compensated for taking part in the fMRI sessions. Data from eight participants were excluded due to excessive head motion, defined as more than 5% of volumes exceeding a framewise displacement (FD) threshold of 0.5 in at least one fMRI run. Many of the excluded participants had little experience speaking aloud inside an MRI scanner, which likely caused the excessive head motion during the recall task. We did not perform analyses based on sex and gender because we had no specific hypotheses concerning the influence of these variables on memory processes.
Task structure
All tasks were performed inside the MRI scanner (Fig. 2A) and were administered and controlled using MATLAB with the Psychophysics Toolbox86. Participants completed six fMRI runs, comprising separate movie-viewing and narrative-recall runs for each of the three episodes.
Movie-viewing task
Participants viewed the first season of the Korean YouTube web series ‘Love Playlist’ (accessible at https://www.youtube.com/playlist?app=desktop&list=PLS–ClexQbQ1lg6TttQTcE3a60T_noSnh), concentrating on the social interactions depicted. None of the participants had prior exposure to the series, and no behavioral responses were required during the fMRI scan. The series depicts romantic and friendship dynamics among six university students, capturing evolving relationships, romantic connections, breakups, love triangles, and unreciprocated affection. The series was divided into three episodes of approximately 14 minutes each; each episode was preceded by a 10-second blank screen and followed by a 16-second blank screen. To ensure continuity, each episode opened with a replay of the final 30 seconds of the previous one, and data from these repeated segments were excluded from analysis. Audio was delivered through MR-compatible headphones (OptoACTIVE II, Optoacoustics Ltd.).
Narrative recall task
Prior to watching the movie, participants viewed an unrelated 90-second video clip for a recall practice session and received feedback such as “Clearly recount character names.” After each episode of the series, they were instructed to verbally recount the narrative in chronological order and in as much detail as possible, with particular emphasis on social relationships. The recall task began with a 10-second blank screen, followed by a fixation cue indicating the start of the recall period. Participants were required to recall each episode in detail for at least five minutes, with the option to return to earlier events if desired. We stressed that while chronological order was important, providing detailed recollections was even more essential. The recall task was manually ended 10 seconds after participants declared, “I’m done.” On average, participants recalled each episode for 422 seconds (s.d. across episodes and participants = 144 s, range = 180 to 866 s). Recollections were recorded using an MR-compatible microphone (FOMRI III, Optoacoustics Ltd.).
Movie annotation
Event segmentation
Four independent annotators, who were unfamiliar with the movie, segmented it into discrete events by marking the start and end times of each event as they watched. They identified new events upon noticing changes in topic, location, time, characters, or relationship dynamics, giving each segmented event a title and recording their rationale for the change25. Of the 61 initially identified events, only those lasting longer than 20 seconds whose boundaries were agreed upon by three or more annotators within a 5-second window were retained for further analyses. This selection yielded 44 events; the first event was excluded because no novelty measure could be computed for it, leaving 43 events (mean duration = 48.45 seconds). To further validate our event segmentation, we conducted an additional experiment with an independent group of fifteen participants using a spontaneous protocol, in which participants pressed a key whenever they perceived the onset or offset of an event28,29,87. All event boundaries identified by the annotators were also marked as boundaries by over 60% of the participants in the spontaneous protocol (Supplementary Fig. 2A). Moreover, consistent with earlier findings, we observed heightened hippocampal activation at these event boundaries28,87 (Supplementary Fig. 2B).
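To make the boundary-agreement rule concrete, the following Python sketch keeps only the boundaries that at least three of the four annotators marked within a 5-second window; it assumes each annotator's boundary times are stored as a list of onsets in seconds, and all names are hypothetical rather than taken from the authors' code.

```python
def consensus_boundaries(annotator_onsets, window=5.0, min_agree=3):
    """Retain candidate boundaries marked by >= min_agree annotators
    within a +/- window-second tolerance."""
    candidates = sorted(t for onsets in annotator_onsets for t in onsets)
    kept = []
    for t in candidates:
        # Count annotators with at least one boundary near time t
        n_agree = sum(any(abs(t - o) <= window for o in onsets)
                      for onsets in annotator_onsets)
        # Avoid keeping near-duplicates of an already accepted boundary
        if n_agree >= min_agree and all(abs(t - k) > window for k in kept):
            kept.append(t)
    return kept

# Hypothetical boundary onsets (seconds) from four annotators
onsets = [[30, 75, 120], [28, 77, 119], [33, 118], [31, 74, 122]]
print(consensus_boundaries(onsets))  # -> [28, 74, 118]
```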
Detailed annotation
A trained annotator, who was not involved in the fMRI study or the event segmentation, provided a detailed second-by-second annotation of the movie. This annotation comprised 1) identifying the characters in each social interaction as senders or receivers of information, 2) describing characters’ actions, 3) describing their emotions, and 4) providing scene descriptions. For instance, in a bar scene where character A converses with characters B and C, with character B laughing and character C appearing disinterested, the annotation included the following: character A was designated as the sender and engaged in conversation; characters B and C were designated as receivers; character B’s action was laughing, with a pleasant emotion; character C’s emotion was categorized as bored; and the scene was described as ‘characters A, B, and C are conversing in the pub’. For consistency, synonymous actions and emotions (e.g., cry and sob) were standardized (e.g., cry).
Recall transcript
Audio recordings from the recall task were transcribed into text files with millisecond-level word timings using NAVER CLOVA speech recognition88. These transcripts were then binned into words per second, aligned with the TR of the fMRI data. The first author segmented the recall transcripts into events corresponding to the movie’s event structure and logged the start and end times of the recall of each movie event.
Novelty and memorability of movie events
Measure of novelty
The principal narrative of the movie used in our study revolves around active social interactions among the characters. These social interactions comprise two components: the co-occurrence of characters, which reflects their connections, and the valence of their interactions, which indicates the context.
For example, character A frequently appearing together with character B would suggest a strengthened connection between them. Likewise, if character C regularly engages with character D in a positive way, the context of their interaction is considered favorable. Using these elements, we quantified the novelty of each movie event by evaluating co-occurrence and valence novelty.
$$\text{Co-occurrence}_{AB,t}=\begin{cases}1 & \text{if appeared together}\\ 0 & \text{if not}\end{cases}$$
(1)
where A and B signify distinct characters, and t represents each moment in time.
First, we computed directional co-occurrence and valence values between each character pair (co-occurrence and valence matrices) for all events. To obtain the co-occurrence value, we counted how often characters appeared together during each event. Specifically, when characters A (sender) and B (receiver) co-occurred in the annotation, a directional co-occurrence value of 1 was assigned from A to B at time t ($\text{Co-occurrence}_{AB,t}=1$). Conversely, directional co-occurrence values ($\text{Co-occurrence}_{BC,t}$ and $\text{Co-occurrence}_{CB,t}$) were set to 0 for characters that did not appear together.
$$C_{AB,ev}=\frac{\sum_{t=\text{start}}^{\text{end}}\text{Co-occurrence}_{AB,t}}{\text{Duration of the event}}$$
(2)
$$\text{Co-occurrence}_{ev}=\begin{bmatrix}0 & C_{AB,ev} & \cdots & C_{AF,ev}\\ C_{BA,ev} & 0 & \cdots & \cdots\\ \cdots & \cdots & 0 & C_{EF,ev}\\ C_{FA,ev} & \cdots & C_{FE,ev} & 0\end{bmatrix}$$
(3)
where $\text{Co-occurrence}_{ev}$ is a 6 × 6 matrix (for the six characters in the film), with ev denoting an individual event.
The event-level co-occurrence score (co-occurrence matrix) was then obtained by dividing the sum of directional co-occurrences during an event by the event’s duration. For instance, if characters A (sender) and B (receiver) appeared together for 16 seconds within a 42-second event, the co-occurrence score $C_{AB,ev}$ was 16/42 ≈ 0.38.
$$\text{Novelty}_{\text{Co-occurrence},ev}=1-\mathrm{corr}\left(\sum_{e=1}^{ev-1}\text{Co-occurrence}_{e},\;\text{Co-occurrence}_{ev}\right)$$
(4)
Co-occurrence novelty for each event was then measured as 1 minus the Pearson correlation between the cumulative co-occurrence matrices of all prior events and the co-occurrence matrix of the current event. This quantifies how much the co-occurrence pattern of characters in the current event diverges from the cumulative pattern of previous events.
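As an illustration of Eqs. (2)-(4), the sketch below computes an event-level co-occurrence matrix and its novelty score in Python; it assumes the second-by-second annotations are stored as a T × 6 × 6 binary array of directional co-occurrences (array names and shapes are our assumptions, not the authors' implementation).

```python
import numpy as np

def event_cooccurrence(cooc, start, end):
    """Eqs. (2)-(3): sum directional co-occurrences over an event and
    normalize by the event's duration; cooc has shape (T, 6, 6)."""
    return cooc[start:end].sum(axis=0) / (end - start)

def cooccurrence_novelty(event_matrices, ev):
    """Eq. (4): 1 minus the Pearson correlation between the cumulative
    co-occurrence matrix of events 1..ev-1 and that of event ev."""
    past = np.sum(event_matrices[:ev], axis=0).ravel()
    current = np.asarray(event_matrices[ev]).ravel()
    return 1 - np.corrcoef(past, current)[0, 1]

# Hypothetical usage: 43 event matrices stacked as (43, 6, 6)
# novelty_4 = cooccurrence_novelty(event_matrices, 4)
```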
We evaluated moment-to-moment valence between characters by computing sentiment values for the annotated action and emotion terms using the Korean Sentiment Word Dictionary89 (KSWD), whose scores range from −2 (highly negative) to +2 (highly positive).
$$\text{Valence}_{AB,t}=\sum \text{Sentiment scores of action and emotion words}$$
(5)
We summed these sentiment scores at each TR over all action and emotion terms. For instance, if character A irritated character B and character B felt annoyed at time t, we assigned a sentiment score of −1 to the interaction from A to B ($\text{Valence}_{AB,t}=-1$) and from B to A ($\text{Valence}_{BA,t}=-1$), based on the KSWD sentiment scores of the words ‘annoy’ and ‘be irritated’.
$$V_{AB,ev}=\frac{\sum_{t=\text{start}}^{\text{end}}\text{Valence}_{AB,t}}{\text{Duration of the event}}$$
(6)
$$\text{Valence}_{ev}=\begin{bmatrix}0 & V_{AB,ev} & \cdots & V_{AF,ev}\\ V_{BA,ev} & 0 & \cdots & \cdots\\ \cdots & \cdots & 0 & V_{EF,ev}\\ V_{FA,ev} & \cdots & V_{FE,ev} & 0\end{bmatrix}$$
(7)
where $\text{Valence}_{ev}$ is a 6 × 6 matrix whose entries are obtained by dividing the sum of sentiment scores over an event by the event’s duration.
$$\text{Novelty}_{\text{Valence},ev}=1-\mathrm{corr}\left(\sum_{e=1}^{ev-1}\text{Valence}_{e},\;\text{Valence}_{ev}\right)$$
(8)
Valence novelty was computed with the same procedure as co-occurrence novelty, applied to the valence matrices. We reasoned that processing of valence cues for a given relationship in the current event may be independent of established valence information about other relationships. We therefore restricted the valence novelty computation to the relationships featured in the current event. Because the novelty computation requires two consecutive events (i.e., past and present), the first event of each episode was excluded from the analysis.
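A minimal sketch of the restricted valence-novelty computation follows; it assumes the ‘relationships featured in the current event’ can be identified as the nonzero entries of the current event's valence matrix, which is our reading of the restriction rather than a stated implementation detail.

```python
import numpy as np

def valence_novelty(valence_matrices, ev):
    """Eq. (8), restricted to character pairs featured in event ev."""
    past = np.sum(valence_matrices[:ev], axis=0)
    current = np.asarray(valence_matrices[ev])
    mask = current != 0        # assumed definition of 'featured' pairs
    if mask.sum() < 2:
        return np.nan          # correlation undefined for < 2 pairs
    return 1 - np.corrcoef(past[mask], current[mask])[0, 1]
```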
Evaluation of memorability
We gauged the memorability of each movie event by quantifying the number of words in participants’ recall transcripts that matched words in the detailed scene descriptions from the annotation. First, words in the recall transcripts of all participants were aligned with the corresponding words in the scene descriptions. To ensure a uniform assessment of recall and to address semantic variability in participants’ responses, we standardized the word choices participants made during recall. This procedure involved manually replacing varied terms (e.g., Facebook, Instagram, SNS, etc.) with the corresponding common terms from the scene description (e.g., social media). On average, 19.73% of the words participants recalled were standardized (s.d. across participants = 2.67%; Supplementary Fig. 11A). To validate this approach, we used the fastText algorithm90 to derive word embeddings and assessed the embedding similarity between participants’ original words and the standardized terms (e.g., Facebook to social media). The standardized terms accurately preserved the intended meaning of participants’ original words (Supplementary Fig. 11B; mean embedding similarity = 0.33, p < 0.001, one-sided permutation test).
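The embedding check could be run with the fastText Python bindings roughly as sketched below; the pretrained Korean model file named here is illustrative, since the paper does not specify which embedding model was used.

```python
import numpy as np
import fasttext

# Illustrative pretrained Korean fastText model (not specified in the paper)
ft = fasttext.load_model("cc.ko.300.bin")

def embedding_similarity(original, standardized):
    """Cosine similarity between a participant's original word and its
    standardized replacement (e.g., 'Facebook' -> 'social media')."""
    a = ft.get_word_vector(original)
    b = ft.get_word_vector(standardized)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```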
$$\text{Memorability}_{ev}=\frac{N_{\text{correct words corresponding to the event}}}{N_{\text{total words in the event}}}$$
(9)
Next, for each event and participant, we counted the number of correctly recalled words and normalized it by the total number of words in the corresponding event description from the annotation, to account for differences in the amount of content across events. This yielded a quantitative measure of the memorability of each event.
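Eq. (9) reduces to a simple per-event ratio; a minimal sketch, assuming recall words have already been standardized and matched at the word level:

```python
def memorability(recalled_words, description_words):
    """Eq. (9): fraction of an event's scene-description words that a
    participant's standardized recall reproduces."""
    description = set(description_words)
    correct = {w for w in recalled_words if w in description}
    return len(correct) / len(description)

# Hypothetical example: 3 of 4 description words recalled -> 0.75
print(memorability(["A", "B", "pub", "laughing"],
                   ["A", "B", "conversing", "pub"]))
```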
Imaging methodology
Data acquisition
Neuroimaging data were collected at the Center for Neuroscience Imaging Research of the Institute for Basic Science utilizing a 3 Tesla Siemens MAGNETOM Prisma outfitted with a 64-channel head coil. Functional images were captured through a T2*-weighted echo-planar imaging (EPI) sequence (TR = 1000 ms, TE = 30 ms, FOV = 240 mm, multiband factor = 3, in-plane acceleration factor (iPAT) = 2, and 3 mm isotropic voxel dimension with 48 slices encompassing the entire brain). In addition, high-resolution anatomical images were gathered using a T1-weighted magnetization-prepared rapid gradient echo (MPRAGE) sequence (TR = 2200 ms, TE = 2.44 ms, FOV = 256 mm, and 1 mm isotropic voxel dimension).
Data preprocessing
The neuroimaging data were preprocessed with fMRIPrep91 (version 21.0.2) using default settings, including head-motion correction and spatial normalization of each participant’s brain to the MNI152NLin2009cAsym template. We then regressed out confounding variables92, including the six motion parameters and their derivatives, global signal, framewise displacement, six anatomical component correction (aCompCor) components from fMRIPrep, and polynomial regressors up to the second degree. Finally, the BOLD time series from the movie-viewing and narrative-recall sessions were scaled, spatially smoothed (FWHM = 5 mm), and z-scored within each session.
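A minimal numpy sketch of the confound-regression and z-scoring steps (smoothing omitted); the matrix orientation, with voxels in rows and TRs in columns, is our assumption.

```python
import numpy as np
from scipy.stats import zscore

def regress_out_confounds(bold, confounds):
    """Residualize a (voxels x TRs) BOLD matrix against a
    (TRs x regressors) confound matrix, then z-score each voxel."""
    X = np.column_stack([confounds, np.ones(confounds.shape[0])])
    beta, *_ = np.linalg.lstsq(X, bold.T, rcond=None)  # OLS fit
    resid = bold.T - X @ beta                          # remove confounds
    return zscore(resid, axis=0).T                     # z-score over time
```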
Regions of interest
Hippocampal voxels were defined using the Brainnetome atlas93. We performed the same analyses in additional cortical areas, including the parahippocampal cortex, entorhinal cortex, perirhinal cortex, superior temporal gyrus, superior temporal sulcus, temporoparietal junction, posterior medial cortex, dorsomedial prefrontal cortex, ventromedial prefrontal cortex, visual cortex, and auditory cortex. However, consistent results were evident only in the hippocampus (Supplementary Fig. 12); we therefore focus on the hippocampal results.
Identification of a canonical space and subspaces for memory functions
We identified a canonical space and three memory-related subspaces, corresponding to 1) novelty encoding, 2) memory formation, and 3) memory retrieval.
Canonical space
To define the canonical space, we first aggregated the neural data acquired during movie watching and narrative recall across 24 participants, 664 voxels, and t TRs (where t denotes the total duration of the movie-watching and narrative-recall data), concatenated across the three movie-viewing sessions and participants. This yielded a two-dimensional data matrix (M664,92119). We then applied principal component analysis (PCA) to this full time series of neural data, represented as Mn,92119, where n denotes the number of principal components. We focused on the first three principal components, which together accounted for 12% of the total variance in the hippocampus.
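In sketch form, the canonical-space PCA might look as follows with scikit-learn, treating TRs as samples and voxels as features; the random array is only a placeholder for the real 664 × 92,119 matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

M = np.random.randn(664, 5000)   # placeholder; real data are 664 x 92,119

pca = PCA(n_components=3)
scores = pca.fit_transform(M.T)  # (TRs, 3) projections onto the first 3 PCs
# On the real data, the first three PCs explain ~12% of hippocampal variance
print(pca.explained_variance_ratio_.sum())
```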
Subspace for novelty encoding
To identify the neural subspace associated with novelty encoding, we examined neural data around event boundaries, from −6 to +14 TRs (21 TRs), for the 43 events (M24,43,664,21). We grouped movie events into nine conditions defined by crossing three levels of co-occurrence novelty with three levels of valence novelty, split at one-third percentiles. We then fit a finite impulse response (FIR) model for each of the nine conditions, generating an FIR time series (M24,9,664,21, where 9 denotes the nine novelty conditions) for each participant. The FIR estimation averaged neural responses across events within each condition without assuming a specific hemodynamic response function.
Next, we applied multivariate linear regression to the FIR time series to separate the neural representations of the two novelty types. The regression model was formulated as follows:
$$R_{v,t}(k)=\beta_{v,t}(1)\cdot\text{co-occurrence}(k)+\beta_{v,t}(2)\cdot\text{valence}(k)+\text{intercept}$$
(10)
In this equation, Rv,t(k) represents the FIR-estimated response of voxel v at time t under condition k, where k corresponds to the co-occurrence and valence novelty levels of each condition, organized as [−1, −1, −1, 0, 0, 0, +1, +1, +1] and [−1, 0, +1, −1, 0, +1, −1, 0, +1], respectively. The regression coefficient βv,t(n) indicates the degree to which the FIR-estimated response of voxel v at time t is modulated by each novelty type n (i.e., co-occurrence: n = 1; valence: n = 2). To estimate βv,t(n), we constructed a regressor matrix Fv for voxel v, composed of three rows: co-occurrence novelty, valence novelty, and an intercept.
$$\beta_{v,t}={(F_{v}F_{v}^{T})}^{-1}F_{v}R_{v,t}$$
(11)
The resulting beta coefficient (β) matrix, M24,3,664,21, includes the two novelty types and an intercept. We averaged the β matrix across participants for each novelty type, obtaining M2,664,21. We then applied PCA to the matrix of each novelty type (M2,n,21, where n denotes the number of principal components). Finally, we defined the two PC-based novelty subspaces using the eigenvectors of each principal component. To evaluate the statistical significance of the identified subspaces, we generated a null distribution of 1000 random subspaces by permuting the condition labels, thereby shuffling the co-occurrence(k) and valence(k) regressors (e.g., co-occurrence(k): [0, 0, −1, +1, −1, 0, +1, +1, −1]; valence(k): [0, −1, −1, +1, 0, +1, −1, 0, +1]).
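The sketch below illustrates Eqs. (10)-(11), the per-novelty-type PCA, and one permuted null design for a single participant; shapes follow the matrix dimensions given above, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Condition design from Eq. (10): nine conditions crossing novelty levels
cooc = np.array([-1, -1, -1, 0, 0, 0, +1, +1, +1], float)
val  = np.array([-1,  0, +1, -1, 0, +1, -1,  0, +1], float)
F = np.vstack([cooc, val, np.ones(9)])       # 3 x 9 regressor matrix

def betas(fir):
    """Eq. (11): fir is (9 conditions, 664 voxels, 21 TRs);
    returns (3, 664, 21) coefficients (two novelty types + intercept)."""
    R = fir.reshape(9, -1)                   # conditions x (voxels * TRs)
    B = np.linalg.inv(F @ F.T) @ F @ R       # (F F^T)^-1 F R
    return B.reshape(3, *fir.shape[1:])

def subspace(beta_map, n_pcs=12):
    """PCA on a (664, 21) participant-averaged beta map; returns the
    (n_pcs, 664) eigenvectors spanning the novelty subspace."""
    return PCA(n_components=n_pcs).fit(beta_map.T).components_

# One null subspace: permute the nine condition labels and re-estimate
rng = np.random.default_rng(0)
perm = rng.permutation(9)
F_null = np.vstack([cooc[perm], val[perm], np.ones(9)])
```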
Subspace for memory formation
To identify the memorability subspace, we grouped the 43 movie events into nine memorability conditions based on the mean memorability scores of movie events across participants and applied an FIR model to the neural data across memorability conditions (M24,9,664,21, where 9 denotes the nine memorability conditions). The memorability(k) regressor reflects the memorability level of each condition, mean-centered by subtracting the average.
$$R_{v,t}(k)=\beta_{v,t}(3)\cdot\text{memorability}(k)+\text{intercept}$$
(12)
where Rv,t(k) denotes the FIR-estimated response of voxel v at time t under memorability condition k. We then applied a linear regression model and PCA to the FIR time series, following the same methodology as for the novelty subspaces.
Subspace for memory retrieval
Unlike movie events, which were identical for all participants, the recalled events differed across participants (Supplementary Fig. 1A). To accommodate this variability, we reclassified recalled events into six memorability categories, each defined by one-sixth percentiles of the memorability scores within each participant’s data. Because participants recalled fewer events than the total number of events in the movie, we reduced the nine categories used for movie events to six categories for recalled events.
First, we examined the neural data surrounding recall onset, from −6 to +14 TRs (21 TRs) (M24,i,664,21, where i denotes the number of events recalled by each participant). We then applied an FIR model to the neural data across memorability categories (M24,6,664,21, where 6 denotes the number of memorability categories), followed by a linear regression model on the FIR time series (M24,664,21).
$$R_{v,t}(k)=\beta_{v,t}(4)\cdot\text{memorability}_{\text{recall}}(k)+\text{intercept}$$
(13)
where Rv,t(k) represents the FIR-estimated response of voxel v at time t under memorability category k. The procedure for identifying the retrieval subspace was identical to that for the novelty subspaces, except for the variable of interest and the number of random subspaces. Whereas we generated a null distribution of 1000 random subspaces for the novelty and memorability subspaces, given the 9! = 362,880 possible permutations of nine conditions, only 100 random subspaces were generated for the memory retrieval subspace because of the smaller number of possible permutations (6! = 720). For comparison, we repeated the identical analysis for the period surrounding the end of recall, from −6 to +11 TRs. We measured the variance explained in the three hippocampal subspaces and found that 12 PCs accounted for 80% of the total variance in all subspaces (Supplementary Fig. 3A).
Encoding performance in subspaces
First, we projected the averaged FIR time series for each memory-related process (Mc,664,21, where c denotes the number of conditions) onto the corresponding hippocampal subspace, producing condition-level hippocampal trajectories (Mc,n,21) (Fig. 4A). When computing the β matrix to construct the memory retrieval subspace, we initially used six conditions; however, to keep the evaluation of encoding performance consistent across subspaces, we expanded to nine conditions when projecting neural trajectories onto the memory retrieval subspace.
Using the projected neural trajectories (Mc,n,21), we constructed a one-dimensional axis connecting the two most distant neural states, irrespective of condition, and projected the remaining neural states onto this axis at each TR. We reasoned that if an estimated subspace faithfully represents a given process, the neural states projected onto this coding axis should be systematically ordered from low to high levels of that process. The two most distant states were assigned values of 0 and 1, and we computed the projected values of the intermediate states. We then calculated R-squared values using an ordinary least squares regression model.
$$\hat{S}=\beta\cdot\text{coding regressor}+\text{intercept}$$
(14)
$$R^{2}=\frac{\sum_{i}{(\hat{S}_{i}-\bar{S})}^{2}}{\sum_{i}{(S_{i}-\bar{S})}^{2}}$$
(15)
where S denotes the actual projected values of the neural states on the axis and Ŝ denotes their estimated values. The R-squared values quantify how well the projected neural states are explained by the coding regressors for each novelty type (co-occurrence novelty: [−1, −1, −1, 0, 0, 0, +1, +1, +1]; valence novelty: [−1, 0, +1, −1, 0, +1, −1, 0, +1]) and for memorability ([−4, −3, −2, −1, 0, +1, +2, +3, +4]), each including a constant term.
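Putting Eqs. (14)-(15) together, the coding-axis projection and R-squared computation at one TR could be sketched as follows; function and variable names are ours, not the authors'.

```python
import numpy as np

def encoding_performance(states, coding_regressor):
    """states: (n_conditions, n_pcs) neural states at one TR.
    Projects states onto the axis joining the two most distant states
    (endpoints mapped to 0 and 1), then scores the projections against
    the coding regressor via OLS (Eqs. 14-15)."""
    d = np.linalg.norm(states[:, None] - states[None, :], axis=-1)
    i, j = np.unravel_index(d.argmax(), d.shape)   # two farthest states
    axis = states[j] - states[i]
    proj = (states - states[i]) @ axis / (axis @ axis)
    X = np.column_stack([coding_regressor, np.ones(len(proj))])
    beta, *_ = np.linalg.lstsq(X, proj, rcond=None)
    s_hat = X @ beta
    return np.sum((s_hat - proj.mean()) ** 2) / np.sum((proj - proj.mean()) ** 2)

# Hypothetical usage with the memorability coding regressor
reg = np.array([-4, -3, -2, -1, 0, +1, +2, +3, +4], float)
# r2 = encoding_performance(states, reg)
```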
To evaluate the statistical significance of encoding performance in an observed subspace, we compared the observed encoding performance with that of random subspaces and counted the number of null encoding performances exceeding the observed value. Encoding performance was computed at each TR (Fig. 3C left, 4C, 5C, 6B) and averaged over the 10 TRs before or after event boundaries for comparison (Fig. 3C right, 4D).
Alignment scores between hippocampal subspaces
We defined the alignment score as the cosine similarity between the eigenvectors of each principal dimension derived through PCA.
$$\text{cosine similarity}=\cos(\theta)=\frac{E_{a}\cdot E_{b}}{\lVert E_{a}\rVert\,\lVert E_{b}\rVert}$$
(16)
where E denotes an eigenvector of a principal dimension. To assess the statistical significance of subspace alignment, we compared the alignment scores between observed subspaces (e.g., the co-occurrence and valence novelty subspaces) with those obtained from randomly generated subspaces. We generated a null distribution of 1000 random subspaces for novelty encoding and memory formation, and 100 random subspaces for memory retrieval. The statistical significance of the alignment between the memory retrieval subspace and the other subspaces was determined by comparing their alignment scores with those between the observed memory retrieval subspace and the random subspaces generated for novelty and memorability.
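Eq. (16) is a standard cosine similarity between matched eigenvectors; a minimal sketch:

```python
import numpy as np

def alignment_score(e_a, e_b):
    """Eq. (16): cosine similarity between eigenvectors of matching
    principal dimensions from two hippocampal subspaces."""
    return float(e_a @ e_b / (np.linalg.norm(e_a) * np.linalg.norm(e_b)))
```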
Reporting summary
Additional information on research design is accessible in the Nature Portfolio Reporting Summary linked to this article.