psychopy.voicekey - Real-time sound processing

(Available as of version 1.83.00)


Hardware voice-keys are used to detect and signal acoustic properties in real time, e.g., the onset of a spoken word in word-naming studies. PsychoPy® provides two virtual voice-keys, one for detecting vocal onsets and one for vocal offsets.

All PsychoPy® voice-keys can take their input from a file or from a microphone. Event detection is typically quite similar in both cases.

The base class is very general, and is best thought of as providing a toolkit for developing a wide range of custom voice-keys. For example, it would be possible to develop a set of voice-keys, each optimized for detecting a different initial phoneme. Band-pass filtered data and zero-crossing counts are computed in real time every 2 ms.
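To make the 2 ms analysis interval concrete, the following minimal sketch splits a run of samples into consecutive chunks. It is plain Python rather than the pyo-based machinery the module uses, and the helper name chunk_bounds and the 44100 Hz rate are assumptions made for illustration.

```python
def chunk_bounds(n_samples, rate=44100, ms_per_chunk=2):
    """Yield (start, stop) sample indices for consecutive analysis chunks.

    Hypothetical helper: a trailing partial chunk is simply dropped.
    """
    chunk_len = int(rate * ms_per_chunk / 1000)  # 88 samples at 44.1 kHz
    for start in range(0, n_samples - chunk_len + 1, chunk_len):
        yield start, start + chunk_len


# 441 samples is 10 ms of audio at 44.1 kHz
bounds = list(chunk_bounds(441))
```

Each (start, stop) pair selects the samples over which per-chunk statistics such as power or zero-crossing counts would be computed.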


class psychopy.voicekey.OnsetVoiceKey(sec=0, file_out='', file_in='', **config)[source]

Class for speech onset detection.

Uses a bandpass-filtered signal (100-3000 Hz). When the voice key trips, the best voice-onset RT estimate is saved as self.event_onset, in seconds.


sec: duration to record in seconds

file_out: name for output filename (for microphone input)

file_in: name of input file for sound source (not microphone)

config: kwargs dict of parameters for configuration. defaults are:

‘msPerChunk’: 2; duration of each real-time analysis chunk, in ms

‘signaler’: default None

‘autosave’: True; False means manual saving to a file is still possible (by calling .save()) but is not called automatically upon stopping

‘chnl_in’: microphone channel; see psychopy.sound.backend.get_input_devices()

‘chnl_out’: not implemented; output device to use

‘start’: 0, select section from a file based on (start, stop) time

‘stop’: -1, end of file (default)

‘vol’: 0.99, volume 0..1

‘low’: 100, Hz, low end of bandpass; can vary for M/F speakers

‘high’: 3000, Hz, high end of bandpass

‘threshold’: 10

‘baseline’: 0; 0 = auto-detect; give a non-zero value to use that

‘more_processing’: True; compute more stats per chunk including bandpass; try False if 32-bit python can’t keep up

‘zero_crossings’: True
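The way keyword arguments override these defaults can be sketched with an ordinary dictionary merge. The helper resolve_config below is hypothetical and is not part of psychopy.voicekey; the default values are copied from the list above.

```python
# Defaults as documented in the list above. 'chnl_in' and 'chnl_out' are left
# out because the list gives no default value for them.
DEFAULTS = {
    'msPerChunk': 2,
    'signaler': None,
    'autosave': True,
    'start': 0,
    'stop': -1,
    'vol': 0.99,
    'low': 100,
    'high': 3000,
    'threshold': 10,
    'baseline': 0,
    'more_processing': True,
    'zero_crossings': True,
}


def resolve_config(**config):
    """Hypothetical helper: overlay user-supplied keywords on the defaults."""
    resolved = dict(DEFAULTS)
    resolved.update(config)
    return resolved


# Raise the low end of the bandpass and make the key more sensitive
cfg = resolve_config(low=150, threshold=5)
```

Going by the constructor signature, the same keywords would presumably be passed straight to the class, for example OnsetVoiceKey(sec=2, low=150, threshold=5).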


Core function to handle a chunk (= a few ms) of input.

There can be small temporal gaps between or within chunks, i.e., slippage. Adjust several parameters until this is small: msPerChunk, and what processing is done within ._process().

A trigger (_chunktrig) signals that _chunktable has been filled and has set _do_chunk as the function to call upon triggering. .play() the trigger again to start recording the next chunk.


Calculate and store basic stats about the current chunk.

This gets called for every chunk, so keep it efficient, especially on 32-bit Python.


Set self.baseline = rms(silent period) using _baselinetable data.

Called automatically (via pyo trigger) when the baseline table is full. This is better than using chunks (which have gaps between them) or the whole table (which can be very large = slow to work with).


Set remaining defaults, initialize lists to hold summary stats


Set the signaler to be called by trip()


Data source: file_in, array, or microphone


Set up the pyo tables (allocate memory, etc).

One source -> three pyo tables: chunk=short, whole=all, baseline. Triggers fill the tables from self._source; the triggers themselves are made in .start().


Trip if recent audio power is greater than the baseline.
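One plausible reading of this rule is sketched below: the key trips once several consecutive chunk powers all exceed a multiple of the baseline. Treating threshold as a multiplier on the baseline, the five-chunk hold window, and the name first_trip are assumptions of this sketch, not a statement of how the class implements detection.

```python
def first_trip(power, baseline, threshold=10, window=5):
    """Return the index of the chunk at which the key would trip, or None.

    Assumed rule: trip when the last `window` chunk powers are all greater
    than threshold * baseline.
    """
    limit = threshold * baseline
    for i in range(window, len(power) + 1):
        if all(p > limit for p in power[i - window:i]):
            return i - 1
    return None


# Per-chunk power values: one isolated spike, then sustained speech
power = [1, 1, 2, 50, 1, 40, 45, 60, 55, 70, 65]
trip_index = first_trip(power, baseline=1.0)

# The onset estimate is the first chunk of the window that caused the trip
onset_chunk = trip_index - 5 + 1
```

Requiring a sustained run rather than a single loud chunk keeps brief clicks, such as the isolated spike in the example, from tripping the key.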


Sleep for sec or until end-of-input, and then call stop().

save(ftype='', dtype='int16')

Save new data to file, return the size of the saved file (or None).

The file format is inferred from the filename extension, e.g., flac. This will be overridden by the ftype if one is provided; defaults to wav if nothing else seems reasonable. The optional dtype (e.g., int16) can be any of the sample types supported by pyo.
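The precedence described here (explicit ftype first, then the filename extension, then wav) can be sketched as follows. The helper infer_format is hypothetical, and KNOWN_FORMATS is only an illustrative subset, not the list of formats that pyo supports.

```python
import os

# Illustrative subset only; the real set of formats is whatever pyo supports.
KNOWN_FORMATS = {'wav', 'flac'}


def infer_format(filename, ftype=''):
    """Hypothetical helper: pick a file format for saving."""
    if ftype:
        # An explicit ftype overrides the filename extension
        return ftype.lower().lstrip('.')
    ext = os.path.splitext(filename)[1].lower().lstrip('.')
    # Fall back to wav when the extension is missing or not recognised
    return ext if ext in KNOWN_FORMATS else 'wav'


fmt_from_ext = infer_format('trial1.flac')
fmt_override = infer_format('trial1.flac', ftype='wav')
fmt_default = infer_format('trial1')
```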

property slippage

Ratio of the actual (elapsed) time to the ideal time.

Ideal ratio = 1 = sample-perfect acquisition of msPerChunk, without any gaps between or within chunks. 1. / slippage is the proportion of samples contributing to chunk stats.
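As a worked example of this ratio, the sketch below computes slippage from an elapsed time and a chunk count. The function and argument names are assumptions for illustration; the class exposes the value as a property.

```python
def slippage_ratio(elapsed_sec, n_chunks, ms_per_chunk=2):
    """Ratio of actual elapsed time to the ideal, gap-free time."""
    ideal_sec = n_chunks * ms_per_chunk / 1000.0
    return elapsed_sec / ideal_sec


# 500 chunks of 2 ms should take 1.0 s; suppose they actually took 1.1 s
ratio = slippage_ratio(elapsed_sec=1.1, n_chunks=500)

# Proportion of samples that contributed to the chunk stats
coverage = 1.0 / ratio
```

Here the ratio is 1.1, so roughly 91% of the incoming samples contributed to the per-chunk statistics.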




Start reading and processing audio data from a file or microphone.

property started

Boolean property, whether .start() has been called.


Stop a voice-key in progress.

Ends and saves the recording if using microphone input.


Start, join, and wait until the voice-key trips, or it times out.

Optionally wait for some extra time (the plus argument) before calling stop().

class psychopy.voicekey.OffsetVoiceKey(sec=10, file_out='', file_in='', delay=0.3, **kwargs)[source]

Class to detect the offset of a single-word utterance.

Records, and ends the recording after speech offset. When the voice key trips, the best voice-offset RT estimate is saved as self.event_offset, in seconds.

sec: duration of recording in the absence of speech or other sounds.

delay: extra time to record after speech offset, default 0.3s.

The same methods are available as for class OnsetVoiceKey.

Signal-processing functions

Several utility functions are available for real-time sound analysis.

psychopy.voicekey.smooth(data, win=16, tile=True)[source]

Running smoothed average, via convolution over win window-size.

By default (tile=True), the start and end are tiled with the mean; otherwise they are replaced with 0.
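A running average of this kind can be sketched in plain Python as below. This is not the library's implementation: the edge handling (padding with the nearest smoothed value, or with 0) is one plausible reading of the description, and the name smooth_sketch is hypothetical.

```python
def smooth_sketch(data, win=16, tile=True):
    """Moving average over `win` samples, padded back to len(data)."""
    # Fully overlapping part of a convolution with a flat window of weight 1/win
    core = [sum(data[i:i + win]) / win for i in range(len(data) - win + 1)]
    half = win // 2
    # Assumed edge handling: repeat the nearest smoothed value, or use zeros
    pre = [core[0] if tile else 0.0] * half
    post = [core[-1] if tile else 0.0] * half
    return (pre + core + post)[:len(data)]


square = [0, 0, 4, 4, 0, 0, 4, 4]
smoothed = smooth_sketch(square, win=4)
smoothed_zero_edges = smooth_sketch(square, win=4, tile=False)
```

With a window that spans one full period of the square wave, every smoothed value equals the signal mean of 2.0.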

psychopy.voicekey.bandpass(data, low=80, high=1200, rate=44100, order=6)[source]

Return bandpass filtered data.


Basic audio-power measure: root-mean-square of data.

Identical to std when the mean is zero; faster to compute just rms.


Like rms, but also subtracts the mean (= slower).
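The relationship between the two measures can be checked with a small pure-Python sketch. The function names carry a _sketch suffix to make clear they are illustrations, not the module's own functions.

```python
import math


def rms_sketch(data):
    """Root-mean-square of the samples."""
    return math.sqrt(sum(x * x for x in data) / len(data))


def std_sketch(data):
    """Like rms, but subtract the mean first (one extra pass over the data)."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))


zero_mean = [1.0, -1.0, 2.0, -2.0]
with_offset = [x + 3.0 for x in zero_mean]  # same shape plus a DC offset

same = abs(rms_sketch(zero_mean) - std_sketch(zero_mean)) < 1e-12
differ = rms_sketch(with_offset) > std_sketch(with_offset)
```

For zero-mean data the two agree; a DC offset inflates rms but leaves std unchanged.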


Return a vector of length n-1 of zero-crossings within vector data.

1 if the adjacent values switched sign, or 0 if they stayed the same sign.
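A minimal sketch of such a zero-crossing vector is shown below. Treating an exact zero as non-negative is an assumption of this sketch, and the function name is hypothetical.

```python
def zero_crossings_sketch(data):
    """Return a list of length n-1: 1 where adjacent values differ in sign."""
    return [int((a < 0) != (b < 0)) for a, b in zip(data, data[1:])]


zc = zero_crossings_sketch([1, 2, -1, -3, 4, 5])
crossing_count = sum(zc)
```

Summing the vector gives the zero-crossing count for a chunk.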

psychopy.voicekey.tone(freq=440, sec=2, rate=44100, vol=0.99)[source]

Return a np.array suitable for use as a tone (pure sine wave).
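The samples of a pure sine tone can be sketched as follows. This version returns a plain Python list rather than a np.array, and the name tone_sketch is hypothetical; the parameters mirror the documented signature.

```python
import math


def tone_sketch(freq=440, sec=2, rate=44100, vol=0.99):
    """Return samples of a sine wave at `freq` Hz, scaled by `vol`."""
    n = int(round(sec * rate))
    return [vol * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# 10 ms of a 441 Hz tone: one period is exactly 100 samples at 44.1 kHz
samples = tone_sketch(freq=441, sec=0.01)
```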

psychopy.voicekey.apodize(data, ms=5, rate=44100)[source]

Apply a Hanning window (5ms) to reduce a sound’s ‘click’ onset / offset.
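The idea of fading a sound in and out with the two halves of a Hanning (Hann) window can be sketched as below. How the ramp length is derived and capped here is an assumption, as is the name apodize_sketch; the library function may size its window differently.

```python
import math


def apodize_sketch(data, ms=5, rate=44100):
    """Return a copy of data with a raised-cosine fade-in and fade-out."""
    # Ramp length in samples, capped so the two ramps never overlap
    n = min(int(rate * ms / 1000), len(data) // 2)
    out = list(data)
    for i in range(n):
        gain = 0.5 * (1 - math.cos(math.pi * i / n))  # rising half of a Hann window
        out[i] *= gain       # fade in
        out[-1 - i] *= gain  # fade out (mirror image)
    return out


faded = apodize_sketch([1.0] * 1000)
```

The first and last samples are scaled to zero, which removes the abrupt step that is heard as a click.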

Sound file I/O

Several helper functions are available for converting and saving sound data between data formats (numpy arrays, pyo tables) and file formats. All file formats that pyo supports are available, including wav and flac (for lossless compression). The mp3 format is not supported (but you can convert to .wav using another utility).

psychopy.voicekey.samples_from_table(table, start=0, stop=-1, rate=44100)[source]

Return samples as a np.array read from a pyo table.

A (start, stop) selection in seconds may require a non-default rate.

psychopy.voicekey.table_from_samples(samples, start=0, stop=-1, rate=44100)[source]

Return a pyo DataTable constructed from samples.

A (start, stop) selection in seconds may require a non-default rate.
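The reason the rate matters is that a selection in seconds has to be converted to sample indices. A minimal sketch of that conversion follows; the helper name is hypothetical, and reading a negative stop as "to the end of the data" is an assumption based on the 'stop': -1 default described for file input.

```python
def selection_to_indices(n_samples, start=0, stop=-1, rate=44100):
    """Convert a (start, stop) selection in seconds to sample indices."""
    first = int(start * rate)
    # Assumed convention: a negative stop means "to the end of the data"
    last = n_samples if stop < 0 else min(int(stop * rate), n_samples)
    return first, last


# The same 0.5 s to 1.0 s selection at two different sampling rates
at_44k = selection_to_indices(88200, start=0.5, stop=1.0)
at_22k = selection_to_indices(88200, start=0.5, stop=1.0, rate=22050)
```

Using the wrong rate selects a different stretch of samples, which is why data recorded at a non-default rate needs the rate passed explicitly.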

psychopy.voicekey.table_from_file(file_in, start=0, stop=-1)[source]

Read data from a file in any pyo format; returns (rate, pyo SndTable).

psychopy.voicekey.samples_from_file(file_in, start=0, stop=-1)[source]

Read data from files, returns tuple (rate, np.array(.float64))

psychopy.voicekey.samples_to_file(samples, rate, file_out, fmt='', dtype='int16')[source]

Write data to file, using requested format or infer from file .ext.

Only integer rate values are supported.
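For readers without pyo installed, the general idea of writing float samples as int16 data can be sketched with the standard-library wave module. This is a swapped-in technique, not how samples_to_file is implemented; the function name, the mono layout, and the clipping behaviour are all assumptions.

```python
import os
import struct
import tempfile
import wave


def write_wav_int16(samples, rate, file_out):
    """Write floats in [-1, 1] as a mono 16-bit wav; return audio bytes written."""
    if int(rate) != rate:
        raise ValueError('only integer rate values are supported')
    # Clip to the valid range, then scale to signed 16-bit integers
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b''.join(struct.pack('<h', int(round(s * 32767))) for s in clipped)
    with wave.open(file_out, 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = int16
        wav.setframerate(int(rate))
        wav.writeframes(frames)
    return len(frames)


demo_path = os.path.join(tempfile.mkdtemp(), 'demo.wav')
n_bytes = write_wav_int16([0.0, 0.5, -0.5, 1.0], 44100, demo_path)
```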


psychopy.voicekey.table_to_file(table, file_out, fmt='', dtype='int16')[source]

Write data to file, using requested format or infer from file .ext.
