psychopy.microphone - Capture and analyze sound

(Available as of version 1.74.00; Advanced features available as of 1.77.00)


AudioCapture() allows easy audio recording and saving of arbitrary sounds to a file (wav format). AudioCapture will likely be replaced entirely by AdvAudioCapture in the near future.

AdvAudioCapture() can do everything AudioCapture does, and also allows onset-marker sound insertion and detection, loudness computation (RMS audio “power”), and lossless file compression (flac). The Builder microphone component now uses AdvAudioCapture by default.

Audio Capture

psychopy.microphone.switchOn(sampleRate=48000, outputDevice=None, bufferSize=None)

You need to switch on the microphone before use, which can take several seconds. The only time you can specify the sample rate (in Hz) is during switchOn().

Considerations on the default sample rate 48kHz:

DVD or video = 48,000
CD-quality   = 44,100 / 16 bit
human hearing: ~15,000 (adult); children & young adult higher
human speech: 100-8,000 (useful for telephone: 100-3,300)
Google speech API: 16,000 or 8,000 only
Nyquist rate: sample at twice the highest frequency of interest; good to oversample a bit

pyo’s downsamp() function can reduce 48,000 to 16,000 in about 0.02s (uses integer step sizes). So recording at 48kHz will generate high-quality archival data, and permit easy downsampling.
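
The Nyquist consideration in the list above can be checked with a few lines of arithmetic. This is a minimal sketch and not part of psychopy; the helper names nyquist_limit_hz and captures are hypothetical, chosen only for illustration.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency (Hz) that can be represented at this sample rate."""
    return sample_rate_hz / 2.0


def captures(sample_rate_hz, highest_freq_hz):
    """True if the rate is strictly more than twice the highest frequency."""
    return sample_rate_hz > 2 * highest_freq_hz


# Sample rates mentioned above, checked against the telephone band
# (3,300 Hz) and the approximate adult hearing limit (15,000 Hz).
for rate in (48000, 44100, 16000, 8000):
    print(rate, nyquist_limit_hz(rate),
          captures(rate, 3300), captures(rate, 15000))
```

Under these assumptions, only the 48,000 and 44,100 Hz rates cover the full adult hearing range, while all four rates cover the telephone band.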

outputDevice, bufferSize: set these parameters on the pyoSndServer before booting; None means use pyo’s default values.

class psychopy.microphone.AdvAudioCapture(name='advMic', filename='', saveDir='', sampletype=0, buffering=16, chnl=0, stereo=True, autoLog=True)

Extends AudioCapture and plays a marker sound as a “start” indicator.

Has method for retrieving the marker onset time from the file, to allow calculation of vocal RT (or other sound-based RT).

See Coder demo > input >


Compress using FLAC (lossless compression).


Return the RMS loudness of the saved recording.


Returns (hz, duration, volume) of the marker sound. Custom markers always return 0 Hz (regardless of the sound).

getMarkerOnset(chunk=128, secs=0.5, filename='')

Return (onset, offset) time of the first marker within the first secs of the saved recording.

Has a resolution of about 1.33 ms at 48000 Hz with chunk=64. Larger chunks speed up processing at the cost of some resolution, e.g., to pre-process long recordings with multiple markers.

If given a filename, it will first set that file as the one to work with, and then try to detect the onset marker.
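
The resolution figure quoted for getMarkerOnset() is consistent with the duration of one chunk of samples. The sketch below assumes exactly that relationship (resolution = chunk / sample rate); the function name chunk_resolution_ms is hypothetical and is not part of psychopy.

```python
def chunk_resolution_ms(chunk, sample_rate_hz=48000):
    """Duration of one analysis chunk in milliseconds.

    Assumption: onset detection can be no finer than one chunk.
    """
    return 1000.0 * chunk / sample_rate_hz


# Compare the chunk size quoted in the text (64) with the default (128).
for chunk in (64, 128, 512):
    print(chunk, round(chunk_resolution_ms(chunk), 2))
```

Doubling the chunk size halves the number of chunks to process and doubles the smallest onset difference that can be distinguished.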


Plays the current marker sound. This is automatically called at the start of recording, but can be called anytime to insert a marker.

playback(block=True, loops=0, stop=False, log=True)

Plays the saved .wav file, as just recorded or resampled. Execution blocks by default, but can return immediately with block=False.

loops : number of extra repetitions; 0 = play once

stop : True = immediately stop ongoing playback (if there is one), and return

record(sec, filename='', block=False)

Starts recording and plays an onset marker tone just prior to returning. The idea is that the start of the tone in the recording indicates when this method returned, to enable you to sync a known recording onset with other events.
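
Because the marker tone marks the moment record() returned, a sound-based response time can be expressed relative to it. The following is a minimal sketch of that arithmetic only. The function vocal_rt is hypothetical, and the speech onset time is assumed to come from a separate analysis of the same file that is not described here.

```python
def vocal_rt(marker_onset_s, speech_onset_s):
    """Seconds from the marker onset to the speech onset.

    Both times are measured from the start of the same recording.
    """
    if speech_onset_s < marker_onset_s:
        raise ValueError("speech onset precedes the marker onset")
    return speech_onset_s - marker_onset_s


# Illustrative numbers only: marker found 12 ms into the file,
# speech judged to begin 512 ms into the file.
print(vocal_rt(0.012, 0.512))
```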

resample(newRate=16000, keep=True, log=True)

Re-sample the saved file to a new rate, return the full path.

Can take several visual frames to resample a 2s recording.

The default values for resample() are for Google-speech, keeping the original (presumably recorded at 48kHz) to archive. A warning is generated if the new rate is not an integer factor / multiple of the old rate.

To control anti-aliasing, use pyo.downsamp() or upsamp() directly.
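
The integer factor / multiple condition can be checked before calling resample(). This is a sketch under the assumption that both rates are whole numbers of Hz; rate_relation is a hypothetical helper, not a psychopy function, and it does not reproduce the library's own warning logic.

```python
def rate_relation(old_rate, new_rate):
    """Classify new_rate relative to old_rate.

    Returns ("down", step) or ("up", step) when the ratio is a whole
    number, otherwise ("non-integer", None).
    """
    if old_rate <= 0 or new_rate <= 0:
        raise ValueError("sample rates must be positive")
    if old_rate % new_rate == 0:
        return ("down", old_rate // new_rate)
    if new_rate % old_rate == 0:
        return ("up", new_rate // old_rate)
    return ("non-integer", None)


print(rate_relation(48000, 16000))  # integer downsampling step
print(rate_relation(16000, 48000))  # integer upsampling step
print(rate_relation(44100, 16000))  # would be expected to trigger the warning
```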


Restores to fresh state, ready to record again


Sets the name of the file to work with.

setMarker(tone=19000, secs=0.015, volume=0.03, log=True)

Sets the onset marker, where tone is either a frequency in Hz or a custom sound.

The default tone (19000 Hz) is recommended for auto-detection, as being easier to isolate from speech sounds (and so reliable to detect). The default duration and volume are appropriate for a quiet setting such as a lab testing room. A louder volume, longer duration, or both may give better results when recording loud sounds or in noisy environments, and will be auto-detected just fine (even more easily). If the hardware microphone in use is not physically near the speaker hardware, a louder volume is likely to be required.

Custom sounds cannot be auto-detected, but are supported anyway for presentation purposes. E.g., a recording of someone saying “go” or “stop” could be passed as the onset marker.
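
To give a sense of scale for the default marker, the sketch below builds a plain sine tone with the default tone, secs, and volume values at a 48000 Hz sample rate. It is an illustration in pure Python and makes no claim about how psychopy synthesizes the marker; marker_samples is a hypothetical name.

```python
import math


def marker_samples(tone_hz=19000, secs=0.015, volume=0.03,
                   sample_rate_hz=48000):
    """Return a list of sine-wave samples for a short marker tone."""
    n = int(round(secs * sample_rate_hz))
    return [volume * math.sin(2 * math.pi * tone_hz * i / sample_rate_hz)
            for i in range(n)]


samples = marker_samples()
print(len(samples))                   # samples in the 15 ms marker
print(max(abs(s) for s in samples))   # peak amplitude, bounded by volume
```

At these defaults the marker spans only a few hundred samples at a small fraction of full scale, which fits the advice above to raise the volume or duration in noisy settings.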


Interrupt a recording that is in progress; close & keep the file.

Ends the recording before the duration that was initially specified. The same file name is retained, with the same onset time but a shorter duration.

The same recording cannot be resumed after a stop (it is not a pause), but you can start a new one.


Uncompress from FLAC to .wav format.

Speech recognition

Google’s speech-to-text API is no longer available. AT&T, IBM, and others offer similar (paid) services.


Functions for file-oriented Discrete Fourier Transform and RMS computation are also provided.

psychopy.microphone.wav2flac(path, keep=True, level=5)

Lossless compression: convert .wav file (on disk) to .flac format.

If path is a directory name, convert all .wav files in the directory.

keep: whether to retain the original .wav file(s); default True.

level: compression level; 0 is fastest but produces larger files, 8 produces slightly smaller files but is much slower.

psychopy.microphone.flac2wav(path, keep=True)

Uncompress: convert .flac file (on disk) to .wav format (new file).

If path is a directory name, convert all .flac files in the directory.

keep: whether to retain the original .flac file(s); default True.
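
Both conversion functions accept either a single file or a directory. The sketch below shows one way such a path argument could be expanded into a list of files. It is an assumption about the behavior described, not psychopy's code, and files_to_convert is a hypothetical helper that performs no conversion itself.

```python
from pathlib import Path


def files_to_convert(path, suffix=".wav"):
    """Expand a path into the list of files a batch conversion would visit.

    A directory yields every file in it with the given suffix (sorted);
    anything else is treated as a single file path.
    """
    p = Path(path)
    if p.is_dir():
        return sorted(f for f in p.iterdir()
                      if f.is_file() and f.suffix == suffix)
    return [p]


# A single (possibly not yet existing) file path is returned unchanged.
print(files_to_convert("recording.wav"))
```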

psychopy.microphone.getDft(data, sampleRate=None, wantPhase=False)

Compute and return magnitudes of numpy.fft.fft() of the data.

If given a sample rate (samples/sec), will return (magn, freq). If wantPhase is True, phase in radians is also returned (magn, freq, phase). data should contain a power-of-2 number of samples; otherwise it will be truncated.
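
The relationship between sample count, truncation, and the returned frequencies can be illustrated without numpy. The sketch below uses a naive O(n²) DFT in pure Python as a stand-in for numpy.fft.fft(). Two assumptions are made here: truncation keeps the largest power-of-2 prefix of the data, and only the non-negative-frequency half of the spectrum is returned. Neither is confirmed by the text above, and the names truncate_pow2 and dft_magnitudes are hypothetical.

```python
import cmath
import math


def truncate_pow2(samples):
    """Keep the longest prefix whose length is a power of 2."""
    n = len(samples)
    if n == 0:
        return []
    return samples[:1 << (n.bit_length() - 1)]


def dft_magnitudes(samples, sample_rate):
    """Return (magn, freq) for the first half of a naive DFT."""
    data = truncate_pow2(list(samples))
    n = len(data)
    magn, freq = [], []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(data))
        magn.append(abs(acc))
        freq.append(k * sample_rate / n)  # bin k corresponds to k * rate / n Hz
    return magn, freq


# A 4 Hz sine sampled at 64 Hz; 70 samples are truncated to 64.
rate = 64
signal = [math.sin(2 * math.pi * 4 * i / rate) for i in range(70)]
magn, freq = dft_magnitudes(signal, rate)
print(freq[magn.index(max(magn))])  # frequency of the strongest bin
```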


Compute and return the audio power (“loudness”).

Uses numpy.std() as RMS. std() is same as RMS if the mean is 0, and .wav data should have a mean of 0. Returns an array if given stereo data (RMS computed within-channel).

data can be an array (1D, 2D) or filename; .wav format only. data from .wav files will be normalized to -1..+1 before RMS is computed.
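
The statement that std() equals RMS only when the mean is 0 can be verified directly. This sketch uses statistics.pstdev from the standard library in place of numpy.std(); both compute the population standard deviation by default. The rms helper and the sample values are illustrative only.

```python
import math
import statistics


def rms(samples):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


zero_mean = [0.5, -0.5, 0.25, -0.25]     # mean is exactly 0
offset = [x + 0.1 for x in zero_mean]    # same shape with a DC offset

print(rms(zero_mean), statistics.pstdev(zero_mean))  # these agree
print(rms(offset), statistics.pstdev(offset))        # RMS is larger here
```

With a DC offset, the standard deviation ignores the offset while the true RMS includes it, which is why the zero-mean property of .wav data matters for this shortcut.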