How to learn

I did competitive gymnastics growing up, and the training was chaos…in a good way. You had to be comfortable flying through the air, twisting, flipping, and hanging upside down. Even one skill, like a giant (spinning in full circles around the bar), had a hundred little steps. You first learn it with your hands strapped to the bar, feeling the motion through your body. Then a coach talks you through it while spotting you, combining physical touch with verbal cues. The whole time, you are watching other gymnasts do it and trying to mimic them, visualizing the move as they do it. You feel how your muscles should move, hear corrections in real time, and see it all come together. It wasn’t just doing: it was watching, feeling, hearing, and imagining.

Simone Biles flying on the bars (source)

…That’s how you learn anything, I think. By coming at it from different angles. Nobody wakes up one day and can magically do a backflip.

Multisensory learning is the term for learning something with multiple senses at the same time (creative naming, I know). Some examples are learning music by hearing, singing, and playing at the same time, or learning giants in gymnastics by listening, looking, and feeling at the same time. Multisensory input doesn’t just help; it’s often how we naturally learn best.

Using multiple senses to synthesize and understand the world actually starts when we are babies. In a 2000 study by Bahrick and Lickliter, researchers showed 5-month-old babies a video of a hammer tapping out a rhythm. Some babies just watched it (visual only), some just heard it (audio only), and some saw and heard it at the same time (two senses at once). After training with the specified senses, all the groups were tested with visuals only. The scientists tracked how long the babies stared at the video when the rhythm changed, which indicated their engagement and understanding.

The visual-only and audio-only groups were not able to tell when the rhythm changed: the babies in those groups didn’t get more interested in the screen when it did. However, the combined group could tell. So the audio and visual combination wasn’t just helpful; it was actually necessary for the babies to even notice the pattern! I think that this concept applies to learning other things, too. It would be nearly impossible to learn gymnastics without watching someone else do it, for example.

This idea is interesting because it can improve your ability to learn something, and it also says something about how our brains work. I am super curious about the best way to learn a specific task. If there is an optimal way to learn something, and if you can prove that multisensory learning for that thing is faster than the alternative, you are at least getting closer to that optimal way. It’s also fascinating to think about how our brains process information and what it means to learn something. Our different senses are more interrelated than most people think.

A 2006 study by Seitz, Kim, & Shams shows that combining audio and visual inputs can help when learning a random task. In the study, adults practiced detecting faint motion on a screen. One group only saw the visuals, while another group saw the visuals and also heard corresponding moving white noise through a stereo system. The group with both sound and visuals learned the task faster and performed better. Even when the sound was removed later, they still did better. So the sound helped them learn the visual task more deeply.

Here’s a graph from the paper:

Multisensory learning graph from Seitz, Kim, & Shams 2006

This study is interesting because the task itself is nontrivial, so the participant has to learn it over time. Adding a second stimulus improved the rate of learning.

Using myself as a guinea pig

Although I understand the idea of multisensory learning intuitively, I was curious to see for myself how powerful the effect was. If it is really strong, I should be able to run an experiment on myself and show that multisensory learning actually helps. Right?

Disclaimer: this is some janky science. Most research on multisensory learning focuses on perception, like noticing motion or detecting a beat. But I wanted to see if it could help with something downstream: actually remembering things. So I designed and ran a multisensory experiment that tested my ability to recall Swahili word translations (I do not know any Swahili). In the end, I found that combining audio and visual made learning this foreign language vocab easier and more enjoyable.

For each test:

  • I selected 10 or 20 random pairs.
  • The app displayed each pair for 3 seconds.
  • In “Visual Only”, the word pair was shown silently.
  • In “Audio + Visual”, the pair was shown and read aloud using the browser’s built-in voice synthesis.
  • After each set, I took a recall test where the Swahili word was shown and I had to type the correct English translation.
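The steps above can be sketched in a few lines of Python. This is only a rough illustration of the procedure, not the code behind the app I made: the function names (`pick_pairs`, `present`, `score_recall`), the `show` and `speak` callbacks, and every vocabulary entry except simu are assumptions made up for the example.

```python
import random
import time
from typing import Callable, Optional

Pair = tuple[str, str]  # (Swahili word, English translation)

# Illustrative vocabulary; only "simu" appears in the post itself.
VOCAB: dict[str, str] = {
    "simu": "phone",
    "maji": "water",
    "rafiki": "friend",
    "kitabu": "book",
    "chakula": "food",
    "nyumba": "house",
}


def pick_pairs(vocab: dict[str, str], n: int, rng: random.Random) -> list[Pair]:
    """Choose n distinct random word pairs for one run."""
    return rng.sample(sorted(vocab.items()), n)


def present(
    pairs: list[Pair],
    show: Callable[[str], None],
    speak: Optional[Callable[[str], None]] = None,
    seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Show each pair for `seconds`; also read it aloud when `speak` is given."""
    for swahili, english in pairs:
        show(f"{swahili} = {english}")
        if speak is not None:  # "Audio + Visual" condition
            speak(f"{swahili}, {english}")
        sleep(seconds)


def score_recall(pairs: list[Pair], answers: dict[str, str]) -> tuple[int, int]:
    """Count typed answers that match the English translation, ignoring case."""
    correct = sum(
        1
        for swahili, english in pairs
        if answers.get(swahili, "").strip().lower() == english.lower()
    )
    return correct, len(pairs)


if __name__ == "__main__":
    pairs = pick_pairs(VOCAB, 4, random.Random())
    present(pairs, show=print, seconds=0)  # "Visual Only", no delay for the demo
```

Passing `speak=None` gives the “Visual Only” condition, and passing any text-to-speech callback gives “Audio + Visual”, so both conditions share the same timing and scoring code.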

I took all tests back to back to minimize variability in mood or energy. During each run, I made sure:

  • Not to say anything out loud.
  • Not to rehearse or repeat previous words in my head.
  • To focus only on the word pair currently on screen.

I also ran the “Audio + Visual” condition first, meaning the “Visual Only” runs came later, when I was more familiar with the task and the vocab. During the test, I found myself making associations between the words using any mental tricks I could come up with. There is probably some strategy that works better for memorizing, but I used anything I could think of, such as the sound of the word, some word it sounded like in English, or even the shape of the word.

Screenshot from multisensory experiment app I made

If you want to try the test for yourself, click here!

Results

Trial    Audio + Visual    Visual Only
1        6/10 (60%)        5/10 (50%)
2        8/10 (80%)        6/10 (60%)
3        5/10 (50%)        6/10 (60%)
4        13/20 (65%)       10/20 (50%)
Total    32/50 (64%)       27/50 (54%)

Overall, I did about 10 percentage points better with audio and visual (64%) than with visual only (54%). There was definitely a lot of randomness across the trials, so I don’t think this is a conclusive result, but it points in the right direction.
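As a quick sanity check on the totals, here is a small Python sketch that pools the per-trial scores. The counts are copied from the results above; the names `RESULTS` and `pooled` are just made up for this example. Pooling gives 32/50 and 27/50, a gap of 10 percentage points, which is roughly an 18.5% relative improvement over the visual-only score.

```python
# Per-trial (correct, total) counts copied from the results.
RESULTS = {
    "Audio + Visual": [(6, 10), (8, 10), (5, 10), (13, 20)],
    "Visual Only": [(5, 10), (6, 10), (6, 10), (10, 20)],
}


def pooled(trials):
    """Sum correct answers and attempts across all trials of one condition."""
    correct = sum(c for c, _ in trials)
    total = sum(n for _, n in trials)
    return correct, total


for condition, trials in RESULTS.items():
    correct, total = pooled(trials)
    print(f"{condition}: {correct}/{total} ({correct / total:.0%})")

av_correct, av_total = pooled(RESULTS["Audio + Visual"])
vo_correct, vo_total = pooled(RESULTS["Visual Only"])

# Absolute gap in percentage points, and the gain relative to "Visual Only".
gap_points = 100 * (av_correct / av_total - vo_correct / vo_total)
relative_gain = (av_correct / av_total) / (vo_correct / vo_total) - 1
print(f"Gap: {gap_points:.0f} percentage points ({relative_gain:.1%} relative)")
```

With only 50 items per condition and a single participant, a gap this size could easily be noise, so this is bookkeeping rather than a significance test.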

Some things I noticed while running the tests:

  • Audio helped me read faster. It wasn’t just about hearing the word; the voice made the pair “click” faster in my head. I felt like I could process the info sooner, which effectively gave me more usable time during the 3-second window.

  • “Visual Only” felt more effortful. I had to focus harder and I was more likely to stumble on unfamiliar words.

  • Not all word pairs are equal. Some Swahili words, such as simu (phone), are very close to their English counterparts, and those were way easier to remember. That introduced some randomness across trials.

  • I got slightly tired toward the end, but despite that, I did better on the longer test with audio than on the one without.

This was a lightweight, personal test, not a controlled study, but even a few quick trials showed that layering audio onto visual learning can help. This is also related to my work on Tonely (free app), where I’m exploring how multisensory input—audio, visuals, and haptics—can accelerate music learning like pitch and interval recognition.

Do you think multisensory learning is necessary? Why or why not? How do you best learn things?