Episodic memories are rich in sensory information and often integrate information from different sensory modalities. For instance, we can store a memory of a recent concert in which visual and auditory impressions are integrated into one episode. Theta oscillations have recently been implicated in playing a causal role in synchronizing, and thereby binding, the different modalities in memory. However, an open question is whether momentary fluctuations in theta synchronization predict the likelihood of associative memory formation for multisensory events. To address this question, we entrained visual and auditory cortex at theta frequency (4 Hz), either synchronously or asynchronously, by modulating the luminance of movies and the volume of sounds at 4 Hz with a phase offset of 0° or 180°. EEG activity from human subjects (both sexes) was recorded while they memorized the association between a movie and a sound. Associative memory performance was significantly enhanced in the 0° compared with the 180° condition. Source-level analysis demonstrated that the physical stimuli effectively entrained their respective cortical areas with a corresponding phase offset. These findings successfully replicate a previous study (Clouter et al., 2017). Importantly, the strength of entrainment during encoding correlated with the efficacy of associative memory, such that small phase differences between visual and auditory cortex predicted a high likelihood of correct retrieval in a later recall test. These findings suggest that theta oscillations serve a specific function in the episodic memory system: binding the contents of different modalities into coherent memory episodes.
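
To make the stimulus manipulation concrete, the following is a minimal sketch of 4 Hz sinusoidal gain envelopes with a 0° or 180° phase offset between the visual (luminance) and auditory (volume) streams. Only the 4 Hz frequency and the two offsets come from the text. The function names, the 1000 Hz sampling rate, the 1 s duration, and the full modulation depth (gain ranging from 0 to 1) are illustrative assumptions, not the parameters used in the study.

```python
import math

THETA_HZ = 4.0  # modulation frequency stated in the text


def modulation_envelope(duration_s, sample_rate_hz, phase_deg, freq_hz=THETA_HZ):
    """Return a sinusoidal gain envelope in [0, 1] (hypothetical helper).

    Full-depth modulation is an assumption made for illustration only.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    phase_rad = math.radians(phase_deg)
    return [
        0.5 * (1.0 + math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz + phase_rad))
        for i in range(n_samples)
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Assumed illustration values: 1 s of signal sampled at 1000 Hz.
luminance = modulation_envelope(1.0, 1000, phase_deg=0)
volume_sync = modulation_envelope(1.0, 1000, phase_deg=0)     # 0 degree condition
volume_async = modulation_envelope(1.0, 1000, phase_deg=180)  # 180 degree condition

print(pearson(luminance, volume_sync))   # envelopes rise and fall together
print(pearson(luminance, volume_async))  # envelopes are in antiphase
```

In the 0° condition the two envelopes peak at the same moments, whereas in the 180° condition the sound is loudest when the movie is darkest. This sketch describes only the physical stimuli; it says nothing about how the cortical phase differences were estimated from the EEG.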