pya
A collection of classes and functions for processing audio signals in Python and Jupyter notebooks, for synthesis, effects, analysis, and plotting.
Subpackages
Submodules
Package Contents
Classes
Audio spectrogram (STFT) class, attributes refers to scipy.signal.stft. With an addition |
|
Mel filtered Fourier spectrum (MFCC) class, |
|
pya audio recorder |
|
Unit generator for creating Asig objects with predefined signals |
|
Helper class that provides a standard way to create an ABC using |
|
Helper class that provides a standard way to create an ABC using |
Functions
|
Basic version of the plot for pya, this can be directly used |
|
Create a grid plot of pya objects which have plot() methods, |
|
|
|
Linear mapping |
|
Convert a MIDI note number to cycles per second (Hz) |
|
Convert cycles per second (Hz) to a MIDI note number |
|
Convert dB to amplitude |
|
Convert amplitude to dB |
|
Return the spectrum of a given signal. This method returns a spectrum matrix if the input signal is multichannel. |
|
Return the normalized input array |
|
Load an audio buffer using audioread. |
|
Convert an integer buffer to floating point values. |
Return a formatted string about available audio devices and their info |
|
|
|
|
Pad signal with certain width, support 1-3D tensors. |
|
Check whether the input is a power of 2 and return a bool. |
|
Find the closest power of 2 that is greater than or equal to x,
|
Round up if >= .5 |
|
|
|
Frame a signal into overlapping frames. |
|
Compute the magnitude spectrum of each frame in frames. |
|
Compute the power spectrum of each frame in frames, |
|
Convert a value in Hertz to Mels |
|
Convert a value in Mels to Hertz |
|
|
|
|
|
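The conversion helpers summarized above rely on standard audio formulas. As a minimal sketch in plain Python, assuming the usual conventions (MIDI note 69 = A4 = 440 Hz, and amplitude decibels defined as 20·log10), they could look as follows. The function names are hypothetical stand-ins and are not taken from pya.

```python
import math


def midi_to_cps(midi_note):
    """Convert a MIDI note number to a frequency in Hz (assumes A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def cps_to_midi(freq_hz):
    """Convert a frequency in Hz to a (possibly fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def db_to_amp(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def amp_to_db(amp):
    """Convert a linear amplitude factor (> 0) to a level in dB."""
    return 20.0 * math.log10(amp)
```

One octave up doubles the frequency (12 MIDI steps), and a change of -20 dB scales the amplitude by a factor of 0.1.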
- class pya.Asig(sig, sr=44100, label='', channels=1, cn=None)
Audio signal class. Asig enables manipulation of audio signals in the style of numpy and more. Asig offers functions for plotting (via matplotlib) and playing audio (using the pya.Aserver class).
- sig
Array for the audio signal. Can be mono or multichannel.
- Type:
numpy.array
- sr
Sampling rate
- Type:
int
- label
A string label to give the object a unique identifier.
- Type:
str
- channels
Number of channels
- Type:
int
- cn
cn, short for channel names, is a list of strings of size channels that gives each channel a unique name. Channel names can be used to subset signal channels in a more readable way, e.g. asig[:, ['left', 'front']] subsets the left and front channels of the signal.
- Type:
list of str, None
- mix_mode
Used to extend the numpy __setitem__() operation to frequent audio manipulations such as mixing, extending, bounding and replacing. Asig currently supports the mix_modes bound, extend and overwrite. mix_mode should not be set directly; it is set temporarily when using the .bound, .extend and .overwrite properties.
- Type:
str or None
- property channels
Return the number of channels
- property samples
Return the length of signal in samples
- property cn
Channel names getter
- property x
Extend mode: this mode allows the destination sig size to be extended during assignment through __setitem__.
- property b
Bound mode: this mode truncates the source signal during assignment so that it fits a limited destination in __setitem__.
- property o
Overwrite mode: this mode cuts the target selection and replaces it with the source signal on assignment via __setitem__.
- extend
- bound
- overwrite
- _load_audio_file(fname)
Load an audio file, and set self.sig to the signal and self.sr to the sampling rate. Two types of audio loaders are currently supported: 1) the standard library for .wav and .aiff, and 2) ffmpeg for other formats such as .mp3.
- Parameters:
fname (str) – Path to file.
- save_wavfile(fname='asig.wav', dtype='float32')
Save signal as .wav file, return self.
- Parameters:
fname (str) – name of the file with .wav (Default value = “asig.wav”)
dtype (str) – datatype (Default value = ‘float32’)
- _set_col_names()
- __getitem__(index)
- Accessing array elements through slicing.
int, get signal row asig[4];
slice, range and step slicing asig[4:40:2] # from 4 to 40 every 2 samples;
list, subset rows, asig[[2, 4, 6]] # pick out index 2, 4, 6 as a new asig
tuple, row and column specific slicing, asig[4:40, 3:5] # from 4 to 40, channel 3 and 4
Time slicing (unit in seconds) using a dict: asig[{1:2.5}, :] selects the range from 1 s to 2.5 s.
Channel name slicing: asig['l'] returns channel 'l' as a new mono asig; asig[['front', 'rear']] returns those two channels.
bool, subset channels: asig[:, [True, False]]
- Parameters:
index (Number or slice or list or tuple or dict) – Slicing argument.
- Returns:
a – A subset of self, selected by the slicing argument.
- Return type:
Asig
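To illustrate the time slicing described above, the following sketch maps a {start: stop} dict in seconds to a sample-index slice. It is only an assumption about how such a mapping could work (truncating to whole samples); the helper name time_slice_to_samples is hypothetical and not part of pya.

```python
def time_slice_to_samples(time_range, sr):
    """Map a single-entry {start_seconds: stop_seconds} dict to a sample slice."""
    (start_s, stop_s), = time_range.items()  # exactly one key/value pair expected
    return slice(int(start_s * sr), int(stop_s * sr))


sr = 44100
samples = list(range(sr * 3))             # stand-in for 3 seconds of mono audio
sl = time_slice_to_samples({1: 2.5}, sr)  # 1 s to 2.5 s
segment = samples[sl]
```

At 44100 Hz the dict {1: 2.5} selects samples 44100 up to (but not including) 110250, i.e. 1.5 seconds of audio.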
- __setitem__(index, value)
- setitem: asig[index] = value. This allows all the methods from getitem:
numpy style slicing
string/string_list slicing for subsetting channels based on channel name self.cn
time slicing (unit seconds) via dict.
bool slicing to filter out specific channels.
- In addition, there are 4 possible modes (referring to asig as 'dest' and value as 'src'):
- standard Pythonic way, where the src and dest dimensions need to match
asig[…] = value
- bound mode, where src is copied up to the bounds of dest
asig.b[…] = value
- extend mode, where dest is dynamically extended to make space for src
asig.x[…] = value
- overwrite mode, where the selected dest subset is replaced by the specified src regardless of its length
asig.o[…] = value
- row index:
list: e.g. [1,2,3,4,5,6,7,8] or [True, …, False] (modes b and x possible)
int: e.g. 0 (i.e. a single sample, so no need for extra modes)
slice: e.g. 100:5000:2 (can be used with all modes)
dict: e.g. {0.5: 2.5} (modes o, b possible, x only if step==1, or if step==None and stop=None)
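The three additional assignment modes can be pictured with plain Python lists. This is a sketch of the semantics as described above, not pya's implementation; the helper names are hypothetical and zero-filling in extend mode is an assumption.

```python
def assign_bound(dest, start, src):
    """Bound mode: copy src into dest from `start`, truncating src at the end of dest."""
    out = list(dest)
    n = max(0, min(len(src), len(out) - start))
    out[start:start + n] = src[:n]
    return out


def assign_extend(dest, start, src):
    """Extend mode: grow dest (zero-filled) so that all of src fits from `start`."""
    out = list(dest)
    needed = start + len(src)
    if needed > len(out):
        out.extend([0] * (needed - len(out)))
    out[start:start + len(src)] = src
    return out


def assign_overwrite(dest, start, stop, src):
    """Overwrite mode: replace dest[start:stop] with src, whatever its length."""
    return list(dest[:start]) + list(src) + list(dest[stop:])


dest = [0, 0, 0, 0, 0]
src = [1, 2, 3]
bound = assign_bound(dest, 3, src)               # src is cut to fit
extended = assign_extend(dest, 3, src)           # dest grows by one sample
overwritten = assign_overwrite(dest, 1, 4, [9])  # three samples replaced by one
```

Bound mode keeps the length of dest, extend mode can only make it longer, and overwrite mode can make it either shorter or longer.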
- resample(target_sr=44100, rate=1, kind='linear')
Resample signal based on interpolation, can process multichannel signals.
- Parameters:
target_sr (int) – Target sampling rate (Default value = 44100)
rate (float) – Rate to speed up or slow down the audio (Default value = 1)
kind (str) – Type of interpolation (Default value = ‘linear’)
- Returns:
_ – Asig with resampled signal.
- Return type:
Asig
- play(rate=1, **kwargs)
Play the Asig audio via Aserver, using Aserver.default if it exists. kwargs are propagated to Aserver.play(onset=0, out=0).
- Parameters:
rate (float) – Playback rate (Default value = 1)
**kwargs –
- 'server' (Aserver)
Set which server to play on, e.g. s = Aserver(); s.boot(); asig.play(server=s)
- Returns:
_ – return self
- Return type:
Asig
- shift_channel(shift=0)
- Shift signal to other channels. This is particularly useful for assigning a mono signal to a specific channel.
shift = 0: does nothing as the same signal is being routed to the same position
shift > 0: shift channels of self.sig ‘right’, i.e. from [0,..channels-1] to channels [shift,shift+1,…]
shift < 0: shift channels of self.sig ‘left’, i.e. the first shift channels will be discarded.
- Parameters:
shift (int) – shift channel amount (Default value = 0)
- Returns:
_ – Rerouted asig
- Return type:
Asig
- mono(blend=None)
Mix channels to mono signal. Perform sig = np.sum(self.sig_copy * blend, axis=1)
- Parameters:
blend (list) – list of gains, one multiplier per channel. If the signal is already mono, nothing is done and a warning is raised (Default value = None)
- Returns:
_ – A mono Asig object
- Return type:
Asig
- stereo(blend=None)
Blend all channels of the signal to stereo. Applicable to any single- or multichannel Asig.
- Parameters:
blend (list or None) – Usage: For a mono signal, blend=(g1, g2) broadcasts the mono channel to left and right with gains g1, g2. For a stereo signal, blend=(g1, g2) adjusts the gain of each channel by g1, g2. For a multichannel signal: blend = [[list of gains for left channel], [list of gains for right channel]]. Default value = None, resulting in equal distribution to the left and right channels.
Example
- asig[:, ['c1', 'c2', 'c3']].stereo([[1, 0.707, 0], [0, 0.707, 1]])
mixes channel 'c1' to the left, 'c2' to the center and 'c3' to the right channel of a new stereo asig. Note that for equal loudness, left**2 + right**2 = 1 should be used.
- Returns:
_ – A stereo Asig object
- Return type:
Asig
- rewire(dic)
Rewire channels to flexibly allow weighted channel permutations.
- Parameters:
dic (dict) – keys are tuples of (source channel, destination channel); values are amplitude gains
Example
{(0, 1): 0.2, (5, 0): 0.4}: rewire channel 0 to 1 with gain 0.2, and channel 5 to 0 with gain 0.4, leaving other channels unmodified
- Returns:
_ – Asig with rewired channels.
- Return type:
Asig
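The routing dictionary can be illustrated with channels stored as plain lists. This sketch assumes that a destination channel is replaced (not mixed) by the scaled source, and that missing destination channels are created as silence; both are assumptions for illustration, and the function below is not pya's code.

```python
def rewire(channels, routing):
    """channels: list of per-channel sample lists; routing: {(src, dst): gain}."""
    n_out = max(len(channels), max(dst for _, dst in routing) + 1)
    n_samples = len(channels[0])
    # Start from a copy of the input, adding silent channels where needed.
    out = [list(channels[i]) if i < len(channels) else [0.0] * n_samples
           for i in range(n_out)]
    for (src, dst), gain in routing.items():
        out[dst] = [gain * s for s in channels[src]]
    return out


left = [1.0, 1.0]
right = [0.5, 0.5]
# Route channel 0 to channel 1 with gain 0.2, and channel 1 to a new channel 2 with gain 2.
result = rewire([left, right], {(0, 1): 0.2, (1, 2): 2.0})
```

Channel 0 is not a destination in this routing, so it passes through unmodified.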
- pan2(pan=0.0)
Stereo panning of asig to a stereo output. Panning is based on constant power panning, see pan below. Behavior depends on the number of channels (self.channels):
multichannel signals (self.channels > 2) are cut back to stereo and treated as stereo signals
stereo signals (self.channels == 2) are attenuated channelwise using cos(angle), sin(angle)
mono signals (self.channels == 1) result in stereo output asigs.
- Parameters:
pan (float) – panning between -1. (left) to 1. (right) (Default value = 0.)
- Returns:
_ – Asig
- Return type:
Asig
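Constant power panning with cos(angle), sin(angle) gains can be sketched as follows. The linear mapping of pan from [-1, 1] to an angle in [0, π/2] is an assumption for illustration, and pan_gains is a hypothetical helper name.

```python
import math


def pan_gains(pan):
    """Return (left_gain, right_gain) for pan in [-1, 1] using constant power panning."""
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1 and 1")
    angle = (pan + 1.0) * math.pi / 4.0  # -1 -> 0, 0 -> pi/4, 1 -> pi/2
    return math.cos(angle), math.sin(angle)


left, right = pan_gains(0.0)  # center: both gains are about 0.707
```

Because cos² + sin² = 1, the summed power of both channels stays constant for every pan position, which is why the center position uses about 0.707 rather than 0.5 per channel.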
- remove_DC()
Remove the DC offset.
- Returns:
_ – channelwise DC-free Asig.
- Return type:
Asig
- norm(norm=1, in_db=False, dcflag=False)
Normalize signal
- Parameters:
norm (float) – normalize threshold (Default value = 1)
in_db (bool) – By default norm is an amplitude; if in_db is True, norm is given in dB. (Default value = False)
dcflag (bool) – If True, remove the DC offset (Default value = False)
- Returns:
_ – normalized Asig.
- Return type:
Asig
- gain(amp=None, db=None)
Apply gain in amplitude or dB; use only one of the two arguments. The argument can be either a scalar or a list (to apply an individual gain to each channel). The method returns a new asig with the gain applied.
- Parameters:
amp (float or None) – Amplitude (Default value = None)
db (float or int or None) – Decibel (Default value = None)
- Returns:
_ – Gain adjusted Asig.
- Return type:
Asig
- rms(axis=0)
Return signal’s RMS
- Parameters:
axis (int) – Axis to perform np.mean() on (Default value = 0)
- Returns:
_ – RMS value
- Return type:
float
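For reference, the root mean square of a mono signal can be computed as in this small sketch (plain Python, a single channel only; handling of the axis argument for multichannel data is not shown):

```python
import math


def rms(samples):
    """Root mean square: square every sample, average, then take the square root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# One full cycle of a unit-amplitude sine has an RMS of 1/sqrt(2).
sine = [math.sin(2.0 * math.pi * k / 100) for k in range(100)]
sine_rms = rms(sine)
```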
- plot(fn=None, offset=0, scale=1, x_as_time=True, ax=None, xlim=None, ylim=None, **kwargs)
Display signal graph
- Parameters:
fn (func or None) – Keyword or function (Default value = None)
offset (int or float) – Offset each channel to create a stacked view (Default value = 0)
scale (float) – Scale the y value (Default value = 1)
xlim (tuple or list) – x axis range (Default value = None)
ylim (tuple or list) – y axis range (Default value = None)
**kwargs – keyword arguments for matplotlib.pyplot.plot()
- Returns:
_ – self, you can use plt.show() to display the plot.
- Return type:
Asig
- get_duration()
Return the duration in seconds.
- get_times()
Get time stamps for left-edge of sample-and-hold-signal
- __eq__(other)
Check whether two asig objects have the same signal. The sampling rate and other attributes are not compared.
- __repr__()
Report key attributes
- __mul__(other)
Magic method for multiplication. You can multiply by either a scalar or an Asig object. When multiplying by an Asig, the arrays do not need to have the same size, as audio signals may differ in length. If mix_mode is set to 'bound', the result keeps the size of self. Otherwise, the result takes the size of the larger array.
- __rmul__(other)
- __truediv__(other)
Magic method for division. You can divide by either a scalar or an Asig object. Use division with caution: audio signals commonly reach 0 or values near it, so avoid division by zero or extremely large results.
When dividing by an Asig, the arrays do not need to have the same size, as audio signals may differ in length. If mix_mode is set to 'bound', the result keeps the size of self. Otherwise, the result takes the size of the larger array.
- __rtruediv__(other)
- __add__(other)
Magic method for addition. You can add either a scalar or an Asig object. When adding an Asig, the arrays do not need to have the same size, as audio signals may differ in length. If mix_mode is set to 'bound', the result keeps the size of self. Otherwise, the result takes the size of the larger array.
- __radd__(other)
- __sub__(other)
Magic method for subtraction. You can subtract either a scalar or an Asig object. When subtracting an Asig, the arrays do not need to have the same size, as audio signals may differ in length. If mix_mode is set to 'bound', the result keeps the size of self. Otherwise, the result takes the size of the larger array.
- __rsub__(other)
- find_events(step_dur=0.001, sil_thr=-20, evt_min_dur=0, sil_min_dur=0.1, sil_pad=[0.001, 0.1])
Locate meaningful ‘events’ in the signal and create event list. Onset detection.
- Parameters:
step_dur (float) – duration in seconds of each search step (Default value = 0.001)
sil_thr (int) – silent threshold in dB (Default value = -20)
evt_min_dur (float) – minimum duration to be counted as an event (Default value = 0)
sil_min_dur (float) – minimum duration to be counted as silent (Default value = 0.1)
sil_pad (list) – this allows you to add a small duration before and after the actual found event locations to the event ranges. If it is a list, you can set the padding (Default value = [0.001, 0.1])
- Returns:
_ – This method returns self. The list of events can be accessed through self._['events']
- Return type:
Asig
- select_event(index=None, onset=None)
This method can be called after find_events (aka onset detection).
- Parameters:
index (int or None) – Index of the event (Default value = None)
onset (int or None) – Onset of the event (Default value = None)
- Returns:
_ – self
- Return type:
Asig
- plot_events()
- fade_in(dur=0.1, curve=1)
Fade in the signal at the beginning
- Parameters:
dur (float) – Duration in seconds to fade in (Default value = 0.1)
curve (float) – Curvature of the fader, power of the linspace function. (Default value = 1)
- Returns:
_ – Asig, new asig with the fade in signal
- Return type:
Asig
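The effect of the curve parameter ("power of the linspace function") can be sketched with a linear ramp from 0 to 1 raised to the power curve. The code below is an illustration in plain Python for a mono signal, under the assumption that the ramp spans int(dur * sr) samples; it is not pya's implementation.

```python
def fade_in(samples, sr, dur=0.1, curve=1):
    """Multiply the first dur seconds by a 0..1 ramp raised to the power `curve`."""
    n = min(int(dur * sr), len(samples))
    out = list(samples)
    for i in range(n):
        ramp = i / (n - 1) if n > 1 else 1.0  # linear ramp from 0 to 1
        out[i] = samples[i] * ramp ** curve
    return out


# 10 samples at sr=10 Hz, 0.5 s fade with a quadratic curve.
faded = fade_in([1.0] * 10, sr=10, dur=0.5, curve=2)
```

With curve=1 the fade is linear; values above 1 keep the signal quieter for longer before it rises, and values below 1 make it rise faster at the start.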
- fade_out(dur=0.1, curve=1)
Fade out the signal at the end
- Parameters:
dur (float) – duration in seconds to fade out (Default value = 0.1)
curve (float) – Curvature of the fader, power of the linspace function. (Default value = 1)
- Returns:
_ – Asig, new asig with the fade out signal
- Return type:
Asig
- iirfilter(cutoff_freqs, btype='bandpass', ftype='butter', order=4, filter='lfilter', rp=None, rs=None)
IIR filter based on scipy.signal.iirfilter.
- Parameters:
cutoff_freqs (float or [float, float]) – Cutoff frequency or frequencies.
btype (str) – Filter type (Default value = ‘bandpass’)
ftype (str) – The type of IIR filter, e.g. 'butter', 'cheby1', 'cheby2', 'ellip', 'bessel' (Default value = 'butter')
order (int) – Filter order (Default value = 4)
filter (str) – The scipy.signal method to call when applying the filter coeffs to the signal. by default it is set to scipy.signal.lfilter (one-dimensional).
rp (float) – For Chebyshev and elliptic filters, provides the maximum ripple in the passband. (dB) (Default value = None)
rs (float) – For Chebyshev and elliptic filters, provides the minimum attenuation in the stop band. (dB) (Default value = None)
- Returns:
_ – new Asig with the filter applied. The b, a coefficients can be accessed via self._['b'] and self._['a']
- Return type:
Asig
- plot_freqz(worN, **kwargs)
Plot the frequency response of a digital filter. Perform scipy.signal.freqz then plot the response.
- Parameters:
worN – TODO
**kwargs – TODO
- envelope(amps, ts=None, curve=1, kind='linear')
Create an envelope and multiply it with the signal.
- Parameters:
amps (array) – Amplitude of each breakpoint
ts (array) – Indices of each breakpoint (Default value = None)
curve (int) – Affects the curvature of the ramp. (Default value = 1)
kind (str) – The type of interpolation (Default value = 'linear')
- Returns:
_ – Returns a new asig with the envelope applied to its signal array
- Return type:
Asig
- adsr(att=0, dec=0.1, sus=0.7, rel=0.1, curve=1, kind='linear')
Create and apply an ADSR envelope to the signal.
- Parameters:
att (float) – attack (Default value = 0)
dec (float) – decay (Default value = 0.1)
sus (float) – sustain (Default value = 0.7)
rel (float) – release. (Default value = 0.1)
curve (int) – affecting the curvature of the ramp. (Default value = 1)
kind (str) – The type of interpolation (Default value = ‘linear’)
- Returns:
_ – returns a new asig with the envelope applied to its signal array
- Return type:
Asig
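An ADSR envelope can be thought of as five breakpoints connected by ramps. The sketch below assumes that att, dec and rel are durations in seconds and that sus is a level relative to the peak of 1; this interpretation and the helper names are assumptions for illustration, with linear interpolation only.

```python
def adsr_breakpoints(dur, att=0.0, dec=0.1, sus=0.7, rel=0.1):
    """Return (times, amps) breakpoints of an ADSR envelope over `dur` seconds."""
    times = [0.0, att, att + dec, dur - rel, dur]
    amps = [0.0, 1.0, sus, sus, 0.0]
    return times, amps


def envelope_at(t, times, amps):
    """Piecewise-linear interpolation of the breakpoints at time t (seconds)."""
    if t <= times[0]:
        return amps[0]
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1 and t1 > t0:
            return amps[i] + (amps[i + 1] - amps[i]) * (t - t0) / (t1 - t0)
    return amps[-1]


times, amps = adsr_breakpoints(dur=1.0, att=0.1, dec=0.1, sus=0.7, rel=0.2)
peak = envelope_at(0.1, times, amps)  # end of the attack phase
hold = envelope_at(0.5, times, amps)  # on the sustain plateau
```

Multiplying each sample by the envelope value at its time stamp applies the envelope to the signal.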
- window(win='triang', **kwargs)
Apply windowing to self.sig
- Parameters:
win (str) – Type of window; check scipy.signal.get_window for available types. (Default value = 'triang')
**kwargs – keyword arguments for scipy.signal.get_window()
- Returns:
_ – new asig with window applied.
- Return type:
Asig
- window_op(nperseg=64, stride=32, win=None, fn='rms', pad='mirror')
TODO add docstring
- Parameters:
nperseg – (Default value = 64)
stride – (Default value = 32)
win – (Default value = None)
fn – (Default value = ‘rms’)
pad – (Default value = ‘mirror’)
- overlap_add(nperseg=64, stride_in=32, stride_out=32, jitter_in=None, jitter_out=None, win=None, pad='mirror')
TODO
- Parameters:
nperseg – (Default value = 64)
stride_in – (Default value = 32)
stride_out – (Default value = 32)
jitter_in – (Default value = None)
jitter_out – (Default value = None)
win – (Default value = None)
pad – (Default value = ‘mirror’)
- to_spec()
Return Aspec object which is the rfft of the signal.
- to_stft(**kwargs)
Return Astft object which is the stft of the signal. Keyword arguments are the arguments for scipy.signal.stft().
- to_mfcc(n_per_frame=None, hopsize=None, nfft=None, window='hann', nfilters=26, ncep=13, ceplifter=22, preemph=0.95, append_energy=True)
Return Amfcc object.
- plot_spectrum(offset=0, scale=1.0, xlim=None, **kwargs)
Plot spectrum of the signal
- Parameters:
offset (float) – If self.sig is multichannel, this offsets each channel to create a stacked view for better viewing (Default value = 0.)
scale (float) – scale the y_axis (Default value = 1.)
xlim (tuple) – range of x_axis (Default value = None)
**kwargs – keywords arguments for matplotlib.pyplot.plot()
- Returns:
_ – self
- Return type:
Asig
- spectrogram(*argv, **kvarg)
Perform scipy.signal.spectrogram and return: frequencies, array of times, spectrogram
- get_size()
Return signal array shape and duration in seconds.
- append(asig, amp=1)
Append another asig to this one. Conditions: the appended asig should have the same number of channels. If the appended asig has a different sampling rate, it is resampled to match the original.
- add(sig, pos=None, amp=1, onset=None)
Add a signal
- Parameters:
sig (asig) – Signal to add
pos (int, None) – Position to add at (Default value = None)
amp (float) – Amplitude (Default value = 1)
onset (float or None) – Similar to pos but in time rather than samples; if given, it overrides pos (Default value = None)
- Returns:
_ – Asig with the added signal.
- Return type:
Asig
- flatten()
Flatten a multidimensional array into a vector using np.ravel()
- pad(width, tail=True, constant_values=0)
Pads the signal
- Parameters:
width (int) – The number of samples to add to the tail or head of the array.
tail (bool) – If True (default), pad at the end; otherwise pad at the start.
- Returns:
_ – Asig of the padded signal.
- Return type:
Asig
- custom(func, **kwargs)
custom function method. TODO add example
- class pya.Astft(x, sr=None, label=None, window='hann', nperseg=256, noverlap=None, nfft=None, detrend=False, return_onesided=True, boundary='zeros', padded=True, cn=None)
Audio spectrogram (STFT) class; its attributes refer to scipy.signal.stft, with the additional attribute cn being the list of channel names and label being the name of the Asig.
- to_sig(**kwargs)
Create signal from stft, i.e. perform istft, kwargs overwrite Astft values for istft
- Parameters:
**kwargs –
- optional keyword arguments used in istft:
'sr', 'window', 'nperseg', 'noverlap', 'nfft', 'input_onesided', 'boundary'.
'sr' is also converted to 'fs', since scipy uses 'fs' for the sampling frequency.
- Returns:
_ – Asig
- Return type:
Asig
- plot(fn=lambda x: ..., ax=None, offset=0, scale=1.0, xlim=None, ylim=None, show_bar=True, **kwargs)
Plot spectrogram
- Parameters:
fn (func) – a function applied to the data before plotting; by default it is a bypass
ch (int or str or None) – By default it is None,
ax (matplotlib.axes) – you can assign your plot to specific axes (Default value = None)
xlim (tuple or list) – x_axis range (Default value = None)
ylim (tuple or list) – y_axis range (Default value = None)
**kwargs – keyword arguments of matplotlib's pcolormesh
- Returns:
_ – self
- Return type:
Astft
- __repr__()
Return repr(self).
- class pya.Aspec(x, sr=44100, label=None, cn=None)
Audio spectrum class using rfft
- get_duration()
Return the duration in seconds.
- to_sig()
Convert Aspec into Asig
- weight(weights, freqs=None, curve=1, kind='linear')
TODO
- Parameters:
weights –
freqs – (Default value = None)
curve – (Default value = 1)
kind – (Default value = ‘linear’)
- plot(fn=np.abs, ax=None, offset=0, scale=1, xlim=None, ylim=None, **kwargs)
Plot spectrum
- Parameters:
fn (func) – function for processing the rfft spectrum. (Default value = np.abs)
x_as_time (bool, optional) – By default the x axis displays time; if False, it displays samples
xlim (tuple or list or None) – Set x axis range (Default value = None)
ylim (tuple or list or None) – Set y axis range (Default value = None)
offset (int or float) – The absolute vertical shift between successive plots.
scale (float) – Scaling factor of the plot, used in multichannel plotting.
**kwargs – Keyword arguments for matplotlib.pyplot.plot()
- Returns:
_ – self
- Return type:
Aspec
- __repr__()
Return repr(self).
- class pya.Amfcc(x, sr=None, label='', n_per_frame=None, hopsize=None, nfft=None, window='hann', nfilters=26, ncep=13, ceplifter=22, preemph=0.95, append_energy=True, cn=None)
Mel filtered Fourier spectrum (MFCC) class. This class is inspired by jameslyons/python_speech_features, https://github.com/jameslyons/python_speech_features
Steps of MFCC:
Frame the signal into short frames.
For each frame, calculate the periodogram estimate of the power spectrum.
Apply the mel filterbank to the power spectra and sum the energy in each filter.
Take the logarithm of all filterbank energies.
Take the DCT of the log filterbank energies.
Keep DCT coefficients 2-13, discard the rest.
- x
x can take two forms; the most commonly used is an Asig object, e.g. when the Amfcc is acquired directly from an Asig object via Asig.to_mfcc().
- Type:
Asig or numpy.ndarray
- sr
sampling rate, this is only necessary if x is not Asig.
- Type:
int
- duration
Duration of the signal in seconds.
- Type:
float
- label
A string label as an identifier.
- Type:
str
- n_per_frame
Number of samples per frame
- Type:
int
- hopsize
Number of samples between the starts of successive frames.
- Type:
int
- nfft
FFT size; defaults to the next power of 2 greater than or equal to n_per_frame
- Type:
int
- window
Type of the window function (Default value=’hann’), use scipy.signal.get_window to return a numpy array. If None, no windowing will be applied.
- Type:
str
- nfilters
The number of mel filters. Default is 26
- Type:
int
- ncep
Number of cepstral coefficients. Default is 13
- Type:
int
- ceplifter
Lifter's cepstral coefficient. Default is 22
- Type:
int
- frames
The original signal reshaped into frames based on n_per_frame and hopsize.
- Type:
numpy.ndarray
- frame_energy
Total power spectrum energy of each frame.
- Type:
numpy.ndarray
- filter_banks
An array of mel filters
- Type:
numpy.ndarray
- cepstra
An array of the MFCC coefficients, size: nframes x ncep
- Type:
numpy.ndarray
- property nframes
- property timestamp
- property features
The features refer to the cepstra
- __repr__()
Return repr(self).
- static preemphasis(x, coeff=0.97)
Pre-emphasis filter to whiten the spectrum. Pre-emphasis is a way of compensating for the rapidly decaying spectrum of speech. This step can often be skipped, for example for music.
- Parameters:
x (numpy.ndarray) – Signal array
coeff (float) – Preemphasis coefficient. The larger the stronger smoothing and the slower response to change.
- Returns:
_ – The whitened signal.
- Return type:
numpy.ndarray
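A common form of the pre-emphasis filter is the first-order difference y[n] = x[n] - coeff * x[n-1], with the first sample passed through unchanged. The sketch below assumes this standard form (as used in python_speech_features, which this class is inspired by); whether pya implements it identically is an assumption.

```python
def preemphasis(x, coeff=0.97):
    """First-order pre-emphasis: y[0] = x[0], y[n] = x[n] - coeff * x[n-1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]


steady = preemphasis([1.0, 1.0, 1.0], coeff=0.5)        # slowly varying input
alternating = preemphasis([1.0, -1.0, 1.0], coeff=0.5)  # rapidly varying input
```

In this form a constant input is attenuated after the first sample, while a rapidly alternating input is amplified.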
- static mel_filterbanks(sr, nfilters=26, nfft=512, lowfreq=0, highfreq=None)
Compute a mel filterbank. The filters are stored in the rows; the columns correspond to FFT bins. The filters are returned as an array of size nfilters * (nfft/2 + 1).
- Parameters:
sr (int) – Sampling rate
nfilters (int) – The number of filters, default 26
nfft (int) – The size of FFT, default 512
lowfreq (int or float) – The lowest band edge of the mel filters, default 0 Hz
highfreq (int or float) – The highest band edge of the mel filters, default sr // 2
- Returns:
_ – A numpy array of size nfilters * (nfft/2 + 1) containing the filterbank. Each row holds 1 filter.
- Return type:
numpy.ndarray
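The band edges of a mel filterbank are typically spaced evenly on the mel scale between lowfreq and highfreq. The following sketch uses the widely used formula mel = 2595 * log10(1 + hz / 700); that pya uses exactly this formula is an assumption, and the function names are hypothetical. Building the triangular filters from these edges is not shown.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (common 2595 * log10 formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(sr, nfilters=26, lowfreq=0.0, highfreq=None):
    """Return nfilters + 2 band edges in Hz, equally spaced on the mel scale."""
    if highfreq is None:
        highfreq = sr / 2
    low_mel, high_mel = hz_to_mel(lowfreq), hz_to_mel(highfreq)
    step = (high_mel - low_mel) / (nfilters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(nfilters + 2)]


edges = mel_band_edges(sr=44100)
```

Each triangular filter spans three consecutive edges, which is why nfilters filters need nfilters + 2 edge frequencies.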
- static lifter(cepstra, L=22)
Apply a cepstral lifter to the matrix of cepstra. This has the effect of increasing the magnitude of the high-frequency DCT coefficients.
Liftering operation is similar to filtering operation in the frequency domain where a desired quefrency region for analysis is selected by multiplying the whole cepstrum by a rectangular window at the desired position. There are two types of liftering performed, low-time liftering and high-time liftering. Low-time liftering operation is performed to extract the vocal tract characteristics in the quefrency domain and high-time liftering is performed to get the excitation characteristics of the analysis speech frame.
- Parameters:
cepstra (numpy.ndarray) – The matrix of mel-cepstra
L (int) – The liftering coefficient to use. Default is 22. Since cepstra usually have 13 elements, L = 22 results in almost half a π of the sine lift. It essentially tries to emphasize lower cepstral coefficients while de-emphasizing higher cepstral coefficients, as they are less discriminative for speech content.
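A sinusoidal lifter multiplies the n-th cepstral coefficient by 1 + (L / 2) * sin(π n / L). The sketch below assumes this formula (the one used in python_speech_features) and operates on a list of frames in plain Python; it is an illustration, not pya's code.

```python
import math


def lifter_weights(ncep, L=22):
    """Sinusoidal lifter weights for cepstral indices 0..ncep-1."""
    return [1.0 + (L / 2.0) * math.sin(math.pi * n / L) for n in range(ncep)]


def lifter(cepstra, L=22):
    """Scale each row (frame) of cepstra; L <= 0 leaves the values unchanged."""
    if L <= 0:
        return [list(row) for row in cepstra]
    weights = lifter_weights(len(cepstra[0]), L)
    return [[c * w for c, w in zip(row, weights)] for row in cepstra]


weights = lifter_weights(13)  # weights for the default 13 coefficients
```

For L = 22 the weight of coefficient 0 is 1 and the weight peaks at 12 for coefficient 11, where the sine argument reaches π/2.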
- plot(cmap='inferno', show_bar=True, offset=0, scale=1.0, xlim=None, ylim=None, x_as_time=True, nxlabel=8, ax=None, **kwargs)
Plot Amfcc.features via matshow, x is frames/time, y is the MFCCs
- Parameters:
figsize ((float, float), optional, default: None) – Figure size, width, height in inches, Default = [6.4, 4.8]
cmap (str) – colormap for matplotlib. Default is ‘inferno’.
show_bar (bool, optional) – Default is True, show colorbar.
x_as_time (bool, optional) – Default is True, show x axis as time or sample index.
nxlabel (int, optional) – The number of labels on the x axis. Default is 8.
- class pya.Aserver(sr=44100, bs=None, device=None, channels=2, backend=None, **kwargs)
Pya audio server. Based on pyaudio, it works as a FIFO-style audio stream pipeline, allowing Asig.play() to send audio segments into the stream.
Examples:
>>> from pya import *
>>> ser = Aserver()
>>> ser.boot()
AServer: sr: 44100, blocksize: ..., Stream Active: True, Device: ...
>>> asine = Ugen().sine()
>>> asine.play(server=ser)
Asig('sine'): 1 x 44100 @ 44100Hz = 1.000s cn=['0']
- property device
- default
- static startup_default_server(**kwargs)
- static shutdown_default_server()
- __repr__()
Return repr(self).
- get_devices(verbose=False)
Return (and optionally print) the available input and output devices
- set_device(idx, reboot=True)
Set audio device
- Parameters:
idx (int) – Index of the device
reboot (bool) – If true the server will reboot. (Default value = True)
- boot()
Boot the Aserver: start the stream, setting its callback to _play_callback.
- quit()
Quit the Aserver: stop the stream and terminate PyAudio (pa).
- play(asig, onset=0, out=0, **kwargs)
Dispatch asigs or arrays for given onset.
- _play_callback(in_data, frame_count, time_info, flag)
callback function, called from pastream thread when data needed.
- stop()
- __del__()
- class pya.Arecorder(sr=44100, bs=256, device=None, channels=None, backend=None, **kwargs)
Bases:
pya.Aserver
pya audio recorder. Based on pyaudio, it uses callbacks to save audio data from pyaudio streams into Asigs.
Examples:
>>> from pya import Arecorder
>>> import time
>>> ar = Arecorder().boot()
>>> ar.record()
>>> time.sleep(1)
>>> ar.stop()
>>> print(ar.recordings)
[Asig(''): ... x ... @ 44100Hz = ...
- property device
- set_tracks(tracks, gains)
Define the tracks to be recorded and their gains.
- Parameters:
tracks (list or numpy.ndarray) – A list of input channel indices. By default None (record all channels)
gains (list or numpy.ndarray) – A list of gains in decibels. Needs to be the same length as tracks.
- reset()
- boot()
boot recorder
- _recorder_callback(in_data, frame_count, time_info, flag)
Callback function during streaming.
- record()
Activate recording
- pause()
Pause the recording, but the record_buffer remains
- stop()
Stop recording, then stores the data from record_buffer into recordings
- __repr__()
Return repr(self).
- class pya.Ugen
Bases:
pya.Asig.Asig
Unit generator for creating Asig objects with predefined signals
- sine(freq=440, amp=1.0, dur=None, n_rows=None, sr=44100, channels=1, cn=None, label='sine')
Generate Sine signal Asig object.
- Parameters:
freq (int, float) – signal frequency (Default value = 440)
amp (int, float) – signal amplitude (Default value = 1.0)
dur (int, float) – duration in seconds. Use only one of dur and n_rows. (Default value = None)
n_rows (int) – number of samples. Use only one of dur and n_rows. (Default value = None)
sr (int) – sampling rate (Default value = 44100)
channels (int) – number of channels (Default value = 1)
cn (list of string) – channel names as a list. The size needs to match the number of channels (Default value = None)
label (string) – identifier of the object (Default value = “sine”)
- Return type:
Asig object
- cos(freq=440, amp=1.0, dur=None, n_rows=None, sr=44100, channels=1, cn=None, label='cosine')
Generate Cosine signal Asig object.
- Parameters:
freq (int, float) – signal frequency (Default value = 440)
amp (int, float) – signal amplitude (Default value = 1.0)
dur (int, float) – duration in seconds. Use only one of dur and n_rows. (Default value = None)
n_rows (int) – number of samples. Use only one of dur and n_rows. (Default value = None)
sr (int) – sampling rate (Default value = 44100)
channels (int) – number of channels (Default value = 1)
cn (list of string) – channel names as a list. The size needs to match the number of channels (Default value = None)
label (string) – identifier of the object (Default value = “cosine”)
- Return type:
Asig object
- square(freq=440, amp=1.0, dur=None, n_rows=None, duty=0.5, sr=44100, sample_shift=0.5, channels=1, cn=None, label='square')
Generate square wave signal Asig object.
- Parameters:
freq (int, float) – signal frequency (Default value = 440)
amp (int, float) – signal amplitude (Default value = 1.0)
dur (int, float) – duration in seconds. Use only one of dur and n_rows. (Default value = None)
n_rows (int) – number of samples. Use only one of dur and n_rows. (Default value = None)
duty (float) – duty cycle (Default value = 0.5)
sr (int) – sampling rate (Default value = 44100)
channels (int) – number of channels (Default value = 1)
cn (list of string) – channel names as a list. The size needs to match the number of channels (Default value = None)
label (string) – identifier of the object (Default value = “square”)
- Return type:
Asig object
- sawtooth(freq=440, amp=1.0, dur=None, n_rows=None, width=1.0, sr=44100, channels=1, cn=None, label='sawtooth')
Generate sawtooth wave signal Asig object.
- Parameters:
freq (int, float) – signal frequency (Default value = 440)
amp (int, float) – signal amplitude (Default value = 1.0)
dur (int, float) – duration in seconds. Use only one of dur and n_rows. (Default value = None)
n_rows (int) – number of samples. Use only one of dur and n_rows. (Default value = None)
width (float) – tooth width (Default value = 1.0)
sr (int) – sampling rate (Default value = 44100)
channels (int) – number of channels (Default value = 1)
cn (list of string) – channel names as a list. The size needs to match the number of channels (Default value = None)
label (string) – identifier of the object (Default value = “sawtooth”)
- Return type:
Asig object
- noise(type='white', amp=1.0, dur=None, n_rows=None, sr=44100, channels=1, cn=None, label='noise')
Generate noise signal Asig object.
- Parameters:
type (string) – type of noise, currently available: ‘white’ and ‘pink’ (Default value = ‘white’)
amp (int, float) – signal amplitude (Default value = 1.0)
dur (int, float) – duration in seconds. Specify only one of dur and n_rows. (Default value = 1.0)
n_rows (int) – number of samples. Specify only one of dur and n_rows. (Default value = None)
sr (int) – sampling rate (Default value = 44100)
channels (int) – number of channels (Default value = 1)
cn (list of string) – channel names as a list. The size needs to match the number of channels (Default value = None)
label (string) – identifier of the object (Default value = “noise”)
- Return type:
Asig object
- pya.basicplot(data, ticks, channels, offset=0, scale=1, cn=None, ax=None, typ='plot', cmap='inferno', xlim=None, ylim=None, xlabel='', ylabel='', show_bar=False, **kwargs)
Basic plot function for pya. It can be used directly by Asig; Aspec, Astft and Amfcc use different extra settings and plot types.
- Parameters:
data (numpy.ndarray) – data array
channels (int) – number of channels
ax (matplotlib.axes, optional) – Plot on this matplotlib axis if given. Default is None, which uses plt.gca()
typ (str, optional) – Plot type.
- pya.gridplot(pya_objects, colwrap=1, cbar_ratio=0.04, figsize=None)
Create a grid plot of pya objects which have plot() methods, i.e. Asig, Aspec, Astft, Amfcc. It takes a list of pya_objects and plots each object into a grid. You can mix different types of plots together.
Examples
# plot all 4 different pya objects in 1 column, amfcc and astft use pcolormesh so colorbar will
# be displayed as well
gridplot([asig, amfcc, aspec, astft], colwrap=2, cbar_ratio=0.08, figsize=[10, 10]);
- Parameters:
pya_objects (iterable object) – A list of pya objects with the plot() method.
colwrap (int, optional) – Wrap column at position. Can be considered as the column size. Default is 1, meaning 1 column.
cbar_ratio (float, optional) – For each column create another column reserved for the colorbar. This is the ratio of the width relative to the plot. 0.04 means 4% of the width of the data plot.
figsize (tuple, optional) – width, height of the entire image in inches. Default size is (6.4, 4.8)
- Returns:
fig – The plt.figure() object
- Return type:
plt.figure()
- pya.audio_read(fp)
- exception pya._error
Bases:
Exception
Common base class for all non-exit exceptions.
- pya.linlin(x, smi, sma, dmi, dma)
Linear mapping
- Parameters:
x (float) – input value
smi (float) – input range’s minimum
sma (float) – input range’s maximum
dmi (float) – output range’s minimum
dma (float) – output range’s maximum
- Returns:
_ – mapped output
- Return type:
float
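A linear mapping of this kind can be sketched in one line. The function name linlin_sketch is hypothetical and this is not necessarily how the library computes it. In particular, the sketch does not clip, so inputs outside [smi, sma] are extrapolated.

```python
def linlin_sketch(x, smi, sma, dmi, dma):
    """Map x linearly from the source range [smi, sma] to the destination range [dmi, dma]."""
    # Normalise x to 0..1 within the source range, then rescale to the destination range.
    return (x - smi) / (sma - smi) * (dma - dmi) + dmi


# The midpoint of the source range lands on the midpoint of the destination range.
mapped = linlin_sketch(0.5, 0.0, 1.0, 100.0, 200.0)
```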
- pya.midicps(m)
Convert a MIDI note number to cycles per second (Hz)
- pya.cpsmidi(c)
Convert cycles per second (Hz) to a MIDI note number
- pya.dbamp(db)
Convert dB to amplitude
- pya.ampdb(amp)
Convert amplitude to dB
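The four conversions above follow standard formulas, sketched here with the standard library only. The _sketch names are hypothetical. The sketch assumes equal temperament with MIDI note 69 tuned to 440 Hz, and the amplitude convention of 20 dB per factor of ten; the library's exact conventions are not stated in this reference.

```python
import math


def midicps_sketch(m):
    """MIDI note number to frequency in Hz (assumes note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)


def cpsmidi_sketch(c):
    """Frequency in Hz to (possibly fractional) MIDI note number."""
    return 69 + 12 * math.log2(c / 440.0)


def dbamp_sketch(db):
    """Decibels to linear amplitude (assumes 20 dB per decade)."""
    return 10.0 ** (db / 20.0)


def ampdb_sketch(amp):
    """Linear amplitude to decibels; amp must be positive."""
    return 20.0 * math.log10(amp)
```

Each pair is an inverse of the other, so converting a value there and back returns the original up to floating-point error.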
- pya.spectrum(sig, samples, channels, sr)
Return the spectrum of a given signal. Returns a spectrum matrix if the input signal is multichannel.
- Parameters:
sig (numpy.ndarray) – signal array
samples (int) – total number of samples
channels (int) – signal channels
sr (int) – sampling rate
- Returns:
frq (numpy.ndarray) – frequencies
Y (numpy.ndarray) – FFT of the signal.
- pya.normalize(d)
Return the normalized input array
- pya.audio_from_file(path, dtype=np.float32)
Load an audio buffer using audioread. This loads one block at a time, and then concatenates the results.
- pya.buf_to_float(x, n_bytes=2, dtype=np.float32)
Convert an integer buffer to floating point values. This is primarily useful when loading integer-valued wav data into numpy arrays.
- Parameters:
x (np.ndarray [dtype=int]) – The integer-valued data buffer
n_bytes (int [1, 2, 4]) – The number of bytes per sample in x
dtype (numeric type) – The target output type (default: 32-bit float)
- Returns:
x_float – The input data buffer cast to floating point
- Return type:
np.ndarray [dtype=float]
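The conversion amounts to dividing each integer sample by the magnitude of the most negative value the sample width can hold. The following standard-library sketch illustrates this for little-endian signed samples. The name buf_to_float_sketch is hypothetical, it takes raw bytes and returns a list rather than a numpy array, and the scaling convention is an assumption rather than a statement about the library's code.

```python
import struct


def buf_to_float_sketch(raw_bytes, n_bytes=2):
    """Decode little-endian signed integer samples and scale them to [-1.0, 1.0)."""
    # struct format characters for 1-, 2- and 4-byte signed integers.
    fmt_char = {1: "b", 2: "h", 4: "i"}[n_bytes]
    count = len(raw_bytes) // n_bytes
    ints = struct.unpack("<%d%s" % (count, fmt_char), raw_bytes)
    # Assumed scale: 1 / 2**(bits - 1), e.g. 1 / 32768 for 16-bit samples.
    scale = 1.0 / float(1 << (8 * n_bytes - 1))
    return [scale * v for v in ints]


# Three 16-bit samples: silence, half scale, and negative full scale.
floats = buf_to_float_sketch(struct.pack("<3h", 0, 16384, -32768))
```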
- pya.device_info()
Return a formatted string about available audio devices and their info
- pya.find_device(min_input=0, min_output=0)
- pya.padding(x, width, tail=True, constant_values=0)
Pad a signal with a given width; supports 1-3D tensors. Use it to add silence to a signal.
- Parameters:
x (np.ndarray) – A numpy array
width (int) – The amount of padding.
tail (bool) – If true pad to the tail, else pad to the start.
constant_values (int or float or None) – The value to pad with. Passing None pads the array with NaN.
- Returns:
_ – Padded array
- Return type:
np.ndarray
- pya.is_pow2(val)
Check whether the input is a power of 2 and return a bool.
- pya.next_pow2(x)
Find the closest power of 2 that is greater than or equal to x, based on shift_bit_length
- Parameters:
x (int) – A positive number
- Returns:
_ – The closest power of 2 that is greater than or equal to the input x.
- Return type:
int
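Both power-of-two helpers can be expressed with integer bit operations. The sketch below uses hypothetical names and Python's int.bit_length(); it illustrates the technique for positive integers and is not taken from the library's source.

```python
def is_pow2_sketch(val):
    """Return True if val is a positive integer power of 2."""
    # A power of 2 has exactly one bit set, so clearing the lowest set bit leaves 0.
    return val > 0 and (val & (val - 1)) == 0


def next_pow2_sketch(x):
    """Return the smallest power of 2 that is >= x, for a positive integer x."""
    # (x - 1).bit_length() is the number of bits needed for x - 1;
    # shifting 1 left by that amount gives the next power of 2 at or above x.
    return 1 << (x - 1).bit_length()
```

A typical use is choosing an FFT length: a frame of 1000 samples would be padded to next_pow2_sketch(1000), which is 1024.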
- pya.round_half_up(number)
Round up if >= .5
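Python's built-in round() rounds halves to the nearest even integer, so round(2.5) gives 2. A half-up variant can be sketched with the decimal module as below. The function name is hypothetical and the library may implement it differently. Note that ROUND_HALF_UP rounds ties away from zero, which matters for negative inputs.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up_sketch(number):
    """Round to the nearest integer, with ties (x.5) rounded away from zero."""
    return int(Decimal(number).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


builtin_result = round(2.5)                  # ties go to the even neighbour
half_up_result = round_half_up_sketch(2.5)   # ties go up
```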
- pya.rolling_window(a, window, step=1)
- pya.signal_to_frame(sig, n_per_frame, frame_step, window=None, stride_trick=True)
Frame a signal into overlapping frames.
- Parameters:
sig (numpy.ndarray) – The audio signal
n_per_frame (int) – Number of samples in each frame
frame_step (int) – Number of samples after the start of the previous frame that the next frame should begin.
window (numpy.ndarray or None) – A window array, e.g,
stride_trick (bool) – Use stride trick to compute the rolling window and window multiplication faster
- Returns:
_ – an array of frames.
- Return type:
numpy.ndarray
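The relationship between n_per_frame and frame_step can be illustrated with a plain-list sketch. The name frame_signal_sketch is hypothetical. This sketch drops trailing samples that do not fill a whole frame and applies no window; whether the library pads the last frame instead is not stated here, so treat that as an assumption.

```python
def frame_signal_sketch(sig, n_per_frame, frame_step):
    """Split a sequence into overlapping frames (trailing partial frame is dropped)."""
    if len(sig) < n_per_frame:
        return []
    # Number of complete frames that fit when each frame starts frame_step after the last.
    n_frames = 1 + (len(sig) - n_per_frame) // frame_step
    return [sig[i * frame_step: i * frame_step + n_per_frame] for i in range(n_frames)]


# 10 samples, frames of 4 samples, hop of 2 samples: consecutive frames overlap by 2.
frames = frame_signal_sketch(list(range(10)), n_per_frame=4, frame_step=2)
```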
- pya.magspec(frames, NFFT)
Compute the magnitude spectrum of each frame in frames. If frames is an NxD matrix, output will be Nx(NFFT/2+1).
- Parameters:
frames (numpy.ndarray) – The framed array, each row is a frame, can be just a single frame.
NFFT (int) – FFT length. If NFFT > frame_len, the frames are zero-padded.
- Returns:
_ – If frames is an NxD matrix, output will be Nx(NFFT/2+1). Each row will be the magnitude spectrum of the corresponding frame.
- Return type:
numpy.ndarray
- pya.powspec(frames, NFFT)
Compute the power spectrum of each frame in frames by first computing the magnitude spectrum.
- Parameters:
frames (numpy.ndarray) – Framed signal, can be just a single frame.
NFFT (int) – The FFT length to use. If NFFT > frame_len, the frames are zero-padded.
- Returns:
_ – Power spectrum of the framed signal. Each row has the size of NFFT / 2 + 1 due to rfft.
- Return type:
numpy.ndarray
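To show where the NFFT / 2 + 1 output size comes from, here is a deliberately naive single-frame sketch using a direct DFT from the standard library. The function names are hypothetical, and the 1/NFFT scaling of the power spectrum is an assumption (a common periodogram convention), not something this reference confirms. A real implementation would use an FFT routine on all frames at once.

```python
import cmath


def magspec_sketch(frame, nfft):
    """Magnitude of the one-sided DFT of a single real-valued frame."""
    # Truncate or zero-pad the frame to exactly nfft samples.
    padded = list(frame[:nfft]) + [0.0] * max(0, nfft - len(frame))
    mags = []
    # A real signal has a symmetric spectrum, so only bins 0..nfft/2 are kept.
    for k in range(nfft // 2 + 1):
        acc = sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / nfft) for n in range(nfft))
        mags.append(abs(acc))
    return mags


def powspec_sketch(frame, nfft):
    """Power spectrum from the magnitude spectrum (assumed scaling: |X|^2 / nfft)."""
    return [m * m / nfft for m in magspec_sketch(frame, nfft)]


# A unit impulse has a flat magnitude spectrum.
impulse_mag = magspec_sketch([1.0, 0.0, 0.0, 0.0], nfft=8)
impulse_pow = powspec_sketch([1.0, 0.0, 0.0, 0.0], nfft=8)
```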
- pya.hz2mel(hz)
Convert a value in Hertz to Mels
- Parameters:
hz (number or array) – value in Hz, can be an array
- Returns:
_ – value in Mels, same type as the input.
- Return type:
number or array
- pya.mel2hz(mel)
Convert a value in Mels to Hertz
- Parameters:
mel (number or array) – value in Mels, can be an array
- Returns:
_ – value in Hz, same type as the input.
- Return type:
number or array
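The two Mel conversions are inverses of each other. The sketch below uses one widely used Mel formula, 2595 * log10(1 + hz / 700); that the library uses exactly these constants is an assumption, and the _sketch names are hypothetical. The sketch handles scalars only.

```python
import math


def hz2mel_sketch(hz):
    """Hertz to Mels using the assumed formula 2595 * log10(1 + hz / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel2hz_sketch(mel):
    """Mels to Hertz, the algebraic inverse of hz2mel_sketch."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Converting to Mels and back should return the original frequency.
round_trip = mel2hz_sketch(hz2mel_sketch(1000.0))
```

With these constants, 1000 Hz maps to approximately 1000 Mels.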
- class pya.DummyBackend(dummy_devices=None)
Bases:
pya.backend.base.BackendBase
Helper class that provides a standard way to create an ABC using inheritance.
- dtype = 'float32'
- range = 1
- bs = 256
- get_device_count()
- get_device_info_by_index(idx)
- get_default_input_device_info()
- get_default_output_device_info()
- open(*args, input_flag, output_flag, rate, frames_per_buffer, channels, stream_callback=None, **kwargs)
- process_buffer(buffer)
- terminate()
- class pya.PyAudioBackend(format=pyaudio.paFloat32)
Bases:
pya.backend.base.BackendBase
Helper class that provides a standard way to create an ABC using inheritance.
- _boot_delay = 0.5
- bs = 512
- get_device_count()
- get_device_info_by_index(idx)
- get_default_input_device_info()
- get_default_output_device_info()
- open(rate, channels, input_flag, output_flag, frames_per_buffer, input_device_index=None, output_device_index=None, start=True, input_host_api_specific_stream_info=None, output_host_api_specific_stream_info=None, stream_callback=None)
- process_buffer(buffer)
- terminate()
- pya.determine_backend(force_webaudio=False, port=8765)
- pya.startup(**kwargs)
- pya.shutdown(**kwargs)