Full API documentation: nodes
Filter the input data through the most significant of its principal components.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.v
- Transpose of the projection matrix (available after training).
- self.d
- Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
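A minimal usage sketch of PCANode (hypothetical random data; output_dim is given as a fraction of the total variance to keep, as described above):
>>> import mdp
>>> import numpy as np
>>> x = np.random.random((500, 20))            # hypothetical data: 500 observations, 20 variables
>>> pca = mdp.nodes.PCANode(output_dim=0.95)   # keep enough components for ~95% of the variance
>>> pca.train(x)
>>> pca.stop_training()
>>> y = pca.execute(x)                         # projected data
>>> print(pca.explained_variance)              # fraction of the total variance actually explained
>>> print(pca.d)                               # eigenvalues of the covariance matrix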
Full API documentation: PCANode
Whiten the input data by filtering it through the most significant of its principal components. All output signals have zero mean, unit variance, and are decorrelated.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.v
- Transpose of the projection matrix (available after training).
- self.d
- Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
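A minimal sketch of the whitening effect (hypothetical data with unequal variances; np.cov is used only to check the result):
>>> import mdp
>>> import numpy as np
>>> x = np.random.random((1000, 5)) * np.array([1., 2., 5., 0.1, 3.])  # unequal variances
>>> white = mdp.nodes.WhiteningNode()
>>> white.train(x)
>>> white.stop_training()
>>> y = white.execute(x)
>>> print(y.mean(axis=0))            # approximately zero for every output channel
>>> print(np.cov(y, rowvar=False))   # approximately the identity: unit variance, decorrelated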
Full API documentation: WhiteningNode
Perform Principal Component Analysis using the NIPALS algorithm. This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be unfeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case, setting output_dim to be a certain fraction of the total variance, say 90%, may be of some help.
Internal variables of interest
- self.avg
- Mean of the input data (available after training).
- self.d
- Variance corresponding to the PCA components.
- self.v
- Transpose of the projection matrix (available after training).
- self.explained_variance
- When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. In David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp. 411-444 (1966).
More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).
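A minimal sketch for the "wide data" case mentioned above (hypothetical data with far more variables than observations; output_dim is given as a fraction of the total variance):
>>> import mdp
>>> import numpy as np
>>> x = np.random.random((50, 2000))               # 50 observations, 2000 variables
>>> nipals = mdp.nodes.NIPALSNode(output_dim=0.9)  # stop once ~90% of the variance is explained
>>> nipals.train(x)
>>> nipals.stop_training()
>>> y = nipals.execute(x)
>>> print(y.shape, nipals.explained_variance)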
Full API documentation: NIPALSNode
Perform Independent Component Analysis using the FastICA algorithm. Note that FastICA is a batch algorithm: it needs all input data before it can start computing the ICs. The algorithm is presented as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
FastICA does not support the telescope mode (the convergence criterion is not robust in telescope mode).
Reference: Aapo Hyvarinen (1999). Fast and Robust Fixed-Point Algorithms for Independent Component Analysis. IEEE Transactions on Neural Networks, 10(3):626-634.
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
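A minimal sketch (hypothetical toy mixture of two sources; the default constructor arguments are assumed to be adequate):
>>> import mdp
>>> import numpy as np
>>> t = np.linspace(0, 100, 5000)
>>> sources = np.c_[np.sin(t), np.sign(np.sin(3 * t))]       # two independent sources
>>> mixed = sources.dot(np.array([[1.0, 0.5], [0.3, 2.0]]))  # linear mixture
>>> ica = mdp.nodes.FastICANode()
>>> ica.train(mixed)          # batch algorithm: all data is accumulated here
>>> ica.stop_training()
>>> estimated = ica.execute(mixed)   # estimated independent components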
Full API documentation: FastICANode
Perform Independent Component Analysis using the CuBICA algorithm. Note that CuBICA is a batch algorithm, which means that it needs all input data before it can start computing the ICs. The algorithm is presented as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
As an alternative to this batch mode you might consider the telescope mode (see the docs of the __init__ method).
Reference: Blaschke, T. and Wiskott, L. (2003). CuBICA: Independent Component Analysis by Simultaneous Third- and Fourth-Order Cumulant Diagonalization. IEEE Transactions on Signal Processing, 52(5), pp. 1250-1256.
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
Full API documentation: CuBICANode
Perform Independent Component Analysis using the TDSEP algorithm. Note that TDSEP, as implemented in this Node, is an online algorithm, i.e. it is suited to be trained on huge data sets, provided that the training is done by sending small chunks of data at a time.
Reference: Ziehe, Andreas and Muller, Klaus-Robert (1998). TDSEP - an efficient algorithm for blind separation using time structure. In Niklasson, L., Boden, M., and Ziemke, T. (Editors), Proc. 8th Int. Conf. Artificial Neural Networks (ICANN 1998).
Internal variables of interest
- self.white
- The whitening node used for preprocessing.
- self.filters
- The ICA filters matrix (this is the transpose of the projection matrix after whitening).
- self.convergence
- The value of the convergence threshold.
Full API documentation: TDSEPNode
Perform Independent Component Analysis using the JADE algorithm. Note that JADE is a batch algorithm: it needs all input data before it can start computing the ICs. The algorithm is presented as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
JADE does not support the telescope mode.
Main references:
- Cardoso, Jean-Francois and Souloumiac, Antoine (1993). Blind beamforming for non Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6): 362-370.
- Cardoso, Jean-Francois (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1): 157-192.
Original code contributed by: Gabriel Beckers (2008).
Full API documentation: JADENode
Extract the slowly varying components from the input data. More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Instance variables of interest
- self.avg
- Mean of the input data (available after training)
- self.sf
- Matrix of the SFA filters (available after training)
- self.d
- Delta values corresponding to the SFA components (generalized eigenvalues). [See the docs of the get_eta_values method for more information]
Special arguments for constructor
- include_last_sample
If False, the train method discards the last sample in every chunk during training when calculating the covariance matrix. The last sample is in this case only used for calculating the covariance matrix of the derivatives. The switch should be set to False if you plan to train with several small chunks (see the sketch after this list). For example, we can split a sequence (index is time):
x_1 x_2 x_3 x_4
in smaller parts like this:
x_1 x_2
x_2 x_3
x_3 x_4
The SFANode will see 3 derivatives for the temporal covariance matrix, and the first 3 points for the spatial covariance matrix. Of course you will need to use a generator that connects the small chunks (the last sample needs to be sent again in the next chunk). If include_last_sample was True, depending on the generator you use, you would either get:
x_1 x_2
x_2 x_3
x_3 x_4
in which case the last sample of every chunk would be used twice when calculating the covariance matrix, or:
x_1 x_2
x_3 x_4
in which case you lose the derivative between x_3 and x_2.
If you plan to train with a single big chunk leave include_last_sample to the default value, i.e. True.
You can even change this behaviour during training. Just set the corresponding switch in the train method.
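A minimal sketch of chunk-wise training with include_last_sample=False, using a hypothetical generator that resends the last sample of each chunk at the beginning of the next one, as described above:
>>> import mdp
>>> import numpy as np
>>> x = np.random.random((1000, 10))     # hypothetical time series (time along axis 0)
>>> def overlapping_chunks(data, size=100):
...     for start in range(0, len(data) - 1, size - 1):
...         yield data[start:start + size]   # consecutive chunks share one sample
>>> sfa = mdp.nodes.SFANode(output_dim=3, include_last_sample=False)
>>> for chunk in overlapping_chunks(x):
...     sfa.train(chunk)
>>> sfa.stop_training()
>>> slow = sfa.execute(x)                # slowly varying components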
Full API documentation: SFANode
Get an input signal, expand it in the space of inhomogeneous polynomials of degree 2 and extract its slowly varying components. The get_quadratic_form method returns the input-output function of one of the learned units as a QuadraticForm object. See the documentation of mdp.utils.QuadraticForm for additional information.
More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Full API documentation: SFA2Node
Perform Independent Slow Feature Analysis on the input data.
Internal variables of interest
- self.RP
- The global rotation-permutation matrix. This is the filter applied on input_data to get output_data
- self.RPC
- The complete global rotation-permutation matrix. This is a matrix of dimension input_dim x input_dim (the 'outer space' is retained).
- self.covs
- An mdp.utils.MultipleCovarianceMatrices instance containing the current time-delayed covariance matrices of the input data. After convergence the uppermost output_dim x output_dim submatrices should be almost diagonal. self.covs[n-1] is the covariance matrix relative to the n-th time-lag.
Note: they are not cleared after convergence. If you need to free some memory, you can safely delete them with:
>>> del self.covs
- self.initial_contrast
- A dictionary with the starting contrast and the SFA and ICA parts of it.
- self.final_contrast
- Like the above but after convergence.
Note: If you intend to use this node for large datasets please have a look at the stop_training method documentation for speeding things up.
Reference: Blaschke, T., Zito, T., and Wiskott, L. (2007). Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Computation, 19(4):994-1021. http://itb.biologie.hu-berlin.de/~wiskott/Publications/BlasZitoWisk2007-ISFA-NeurComp.pdf
Full API documentation: ISFANode
Perform Non-linear Blind Source Separation using Slow Feature Analysis.
This node is designed to iteratively extract statistically independent sources from (in principle) arbitrary invertible nonlinear mixtures. The method relies on temporal correlations in the sources and consists of a combination of nonlinear SFA and a projection algorithm. More details can be found in the reference given below (once it’s published).
The node has multiple training phases. The number of training phases depends on the number of sources that must be extracted. The recommended way of training this node is through a container flow:
>>> flow = mdp.Flow([XSFANode()])
>>> flow.train(x)
doing so will automatically train all training phases. The argument x to the Flow.train method can be an array or a list of iterables (see the section about Iterators in the MDP tutorial for more info).
If the number of training samples is large, you may run into memory problems: use data iterators and chunk training to reduce memory usage.
If you need to debug training and/or execution of this node, the suggested approach is to use the capabilities of BiMDP. For example:
>>> flow = mdp.Flow([XSFANode()])
>>> tr_filename = bimdp.show_training(flow=flow, data_iterators=x)
>>> ex_filename, out = bimdp.show_execution(flow, x=x)
this will run training and execution with bimdp inspection. Snapshots of the internal flow state for each training phase and execution step will be opened in a web browser and presented as a slideshow.
References: Sprekeler, H., Zito, T., and Wiskott, L. (2009). An Extension of Slow Feature Analysis for Nonlinear Blind Source Separation. Journal of Machine Learning Research. http://cogprints.org/7056/1/SprekelerZitoWiskott-Cogprints-2010.pdf
Full API documentation: XSFANode
Perform a (generalized) Fisher Discriminant Analysis of its input. It is a supervised node that implements FDA using a generalized eigenvalue approach.
FDANode has two training phases and is supervised, so pay particular attention to how the data and the class labels are supplied when you train it.
More information on Fisher Discriminant Analysis can be found for example in C. Bishop, Neural Networks for Pattern Recognition, Oxford Press, pp. 105-112.
Internal variables of interest
- self.avg
- Mean of the input data (available after training)
- self.v
- Transpose of the projection matrix, so that output = dot(input - self.avg, self.v) (available after training).
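A minimal sketch of supervised training (hypothetical two-class data; both training phases receive the same data and labels):
>>> import mdp
>>> import numpy as np
>>> x0 = np.random.randn(100, 4)
>>> x1 = np.random.randn(100, 4) + np.array([2., 2., 0., 0.])
>>> x = np.vstack([x0, x1])
>>> labels = [0] * 100 + [1] * 100
>>> fda = mdp.nodes.FDANode(output_dim=1)
>>> for _ in range(2):             # the node has two supervised training phases
...     fda.train(x, labels)
...     fda.stop_training()
>>> y = fda.execute(x)             # projection maximizing class separation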
Full API documentation: FDANode
Perform Factor Analysis.
The current implementation should be most efficient for long data sets: the sufficient statistics are collected in the training phase, and all EM-cycles are performed at its end.
The execute method returns the Maximum A Posteriori estimate of the latent variables. The generate_input method generates observations from the prior distribution.
Internal variables of interest
- self.mu
- Mean of the input data (available after training)
- self.A
- Generating weights (available after training)
- self.E_y_mtx
- Weights for Maximum A Posteriori inference
- self.sigma
- Vector of estimated variance of the noise for all input components
More information about Factor Analysis can be found in Max Welling’s classnotes: http://www.ics.uci.edu/~welling/classnotes/classnotes.html , in the chapter ‘Linear Models’.
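A minimal sketch (hypothetical data generated from two latent factors plus noise; generate_input is assumed to accept the number of observations to sample):
>>> import mdp
>>> import numpy as np
>>> latent = np.random.randn(1000, 2)
>>> A = np.random.randn(2, 10)
>>> x = latent.dot(A) + 0.1 * np.random.randn(1000, 10)
>>> fa = mdp.nodes.FANode(output_dim=2)
>>> fa.train(x)
>>> fa.stop_training()
>>> y = fa.execute(x)               # MAP estimate of the latent variables
>>> xgen = fa.generate_input(20)    # 20 observations sampled from the learned model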
Full API documentation: FANode
Restricted Boltzmann Machine node. An RBM is an undirected probabilistic network with binary variables. The graph is bipartite, with observed (visible) and hidden (latent) variables forming the two parts.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.
Use the sample_v method to sample from the observed variables given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800.
Internal variables of interest
- self.w
- Generative weights between hidden and observed variables
- self.bv
- bias vector of the observed variables
- self.bh
- bias vector of the hidden variables
For more information on RBMs, see Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668
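A minimal sketch (hypothetical binary data; the keyword names hidden_dim and visible_dim and the default Contrastive Divergence parameters are assumptions):
>>> import mdp
>>> import numpy as np
>>> v = (np.random.random((500, 20)) > 0.5).astype('d')   # hypothetical binary patterns
>>> rbm = mdp.nodes.RBMNode(hidden_dim=10, visible_dim=20)
>>> for _ in range(50):          # several Contrastive Divergence sweeps over the data
...     rbm.train(v)
>>> rbm.stop_training()
>>> p_h = rbm.execute(v)         # P(h_i = 1 | v) for each hidden unit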
Full API documentation: RBMNode
Restricted Boltzmann Machine with softmax labels. An RBM is an undirected probabilistic network with binary variables. In this case, the node is partitioned into a set of observed (visible) variables, a set of hidden (latent) variables, and a set of label variables (also observed), only one of which is active at any time. The node is able to learn associations between the visible variables and the labels.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input.
Use the sample_v method to sample from the observed variables (visible and labels) given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771-1800.
Internal variables of interest:
- self.w
- Generative weights between hidden and observed variables
- self.bv
- bias vector of the observed variables
- self.bh
- bias vector of the hidden variables
For more information on RBMs with labels, see
- Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.
- Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
Full API documentation: RBMWithLabelsNode
Learn the topological structure of the input data by building a corresponding graph approximation.
The algorithm expands on the original Neural Gas algorithm (see mdp.nodes.NeuralGasNode) in that new nodes are added to the graph as more data becomes available. In this way, if the growth rate is appropriate, one can avoid overfitting or underfitting the data.
More information about the Growing Neural Gas algorithm can be found in B. Fritzke, A Growing Neural Gas Network Learns Topologies, in G. Tesauro, D. S. Touretzky, and T. K. Leen (editors), Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge MA, 1995.
Attributes and methods of interest
Full API documentation: GrowingNeuralGasNode
Perform a Locally Linear Embedding analysis on the data.
Internal variables of interest
- self.training_projection
- The LLE projection of the training data (defined when training finishes).
- self.desired_variance
- variance limit used to compute intrinsic dimensionality.
Based on the algorithm outlined in An Introduction to Locally Linear Embedding by L. Saul and S. Roweis, using improvements suggested in Locally Linear Embedding for Classification by D. deRidder and R.P.W. Duin.
References: Roweis, S. and Saul, L., Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500), pp. 2323-2326, 2000.
Original code contributed by: Jake VanderPlas, University of Washington.
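A minimal sketch (hypothetical data lying close to a 2D subspace of a 3D space; k is assumed to be the number of neighbours used for the local reconstructions):
>>> import mdp
>>> import numpy as np
>>> t = np.random.random((500, 2))
>>> x = np.c_[t, t[:, 0:1] + 0.01 * np.random.randn(500, 1)]   # third coordinate ~ first one
>>> lle = mdp.nodes.LLENode(k=12, output_dim=2)
>>> lle.train(x)
>>> lle.stop_training()
>>> y = lle.training_projection        # LLE projection of the training data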
Full API documentation: LLENode
Perform a Hessian Locally Linear Embedding analysis on the data.
Internal variables of interest
- self.training_projection
- the HLLE projection of the training data (defined when training finishes)
- self.desired_variance
- variance limit used to compute intrinsic dimensionality.
Implementation based on algorithm outlined in Donoho, D. L., and Grimes, C., Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 100(10): 5591-5596, 2003.
Original code contributed by: Jake Vanderplas, University of Washington
Full API documentation: HLLENode
Compute least-square, multivariate linear regression on the input data, i.e., learn coefficients b_j so that:
y_i = b_0 + b_1 x_1 + ... + b_N x_N,
for i = 1 ... M, minimizes the square error given the training x's and y's.
This is a supervised learning node, and requires input data x and target data y to be supplied during training (see train docstring).
Internal variables of interest
- self.beta
- The coefficients of the linear regression
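A minimal sketch (hypothetical regression problem; the ordering of self.beta is assumed to put the bias term first, given the default bias setting):
>>> import mdp
>>> import numpy as np
>>> x = np.random.random((200, 2))
>>> y = 1. + x.dot(np.array([[2.], [-3.]])) + 0.01 * np.random.randn(200, 1)
>>> lr = mdp.nodes.LinearRegressionNode()
>>> lr.train(x, y)                 # supervised: input data and target data
>>> lr.stop_training()
>>> print(lr.beta)                 # approximately [[1.], [2.], [-3.]]
>>> y_hat = lr.execute(x)          # predictions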
Full API documentation: LinearRegressionNode
Perform expansion in the space formed by all linear and quadratic monomials. QuadraticExpansionNode() is equivalent to PolynomialExpansionNode(2).
Full API documentation: QuadraticExpansionNode
Perform expansion in a polynomial space.
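A minimal sketch (the exact ordering of the monomials in the output is determined by the node; here we only check the expanded dimensionality):
>>> import mdp
>>> import numpy as np
>>> x = np.array([[2., 3.]])
>>> expn = mdp.nodes.PolynomialExpansionNode(2)   # all monomials up to degree 2
>>> y = expn.execute(x)
>>> print(y.shape)   # (1, 5): x1, x2, x1^2, x1*x2, x2^2 (in the node's ordering)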
Full API documentation: PolynomialExpansionNode
Expand input space with Gaussian Radial Basis Functions (RBFs).
The input data is filtered through a set of unnormalized Gaussian filters, i.e.:
y_j = exp(-0.5/s_j * ||x - c_j||^2)
for isotropic RBFs, or more in general:
y_j = exp(-0.5 * (x-c_j)^T S^-1 (x-c_j))
for anisotropic RBFs.
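A minimal sketch for the isotropic case (hypothetical centers c_j and sizes s_j; the constructor is assumed to take the centers and sizes as its first two arguments):
>>> import mdp
>>> import numpy as np
>>> x = np.random.random((100, 2))
>>> centers = np.array([[0., 0.], [0.5, 0.5], [1., 1.]])   # c_j
>>> sizes = np.array([0.1, 0.2, 0.1])                      # s_j
>>> rbf = mdp.nodes.RBFExpansionNode(centers, sizes)
>>> y = rbf.execute(x)      # y[:, j] = exp(-0.5/s_j * ||x - c_j||^2)
>>> print(y.shape)          # (100, 3)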
Full API documentation: RBFExpansionNode
Expand the input samples by applying one or more provided functions to them.
The functions to be applied are specified by a list [f_0, ..., f_k], where f_i, for 0 <= i <= k, denotes a particular function. The input data given to these functions is a two-dimensional array and the output is another two-dimensional array. The dimensionality of the output should depend only on the dimensionality of the input. Given a two-dimensional input array x, the output of the node is then [f_0(x), ..., f_k(x)], that is, the concatenation of each one of the computed arrays f_i(x).
This node has been designed to facilitate nonlinear, fixed but arbitrary transformations of the data samples within MDP flows.
Example:
>>> import mdp
>>> from mdp import numx
>>> def identity(x): return x
>>> def u3(x): return numx.absolute(x)**3 #A simple nonlinear transformation
>>> def norm2(x):  # computes the norm of each sample, returning an Nx1 array
...     return ((x**2).sum(axis=1)**0.5).reshape((-1, 1))
>>> x = numx.array([[-2., 2.], [0.2, 0.3], [0.6, 1.2]])
>>> gen = mdp.nodes.GeneralExpansionNode(funcs=[identity, u3, norm2])
>>> print(gen.execute(x))
[[-2. 2. 8. 8. 2.82842712]
 [ 0.2 0.3 0.008 0.027 0.36055513]
 [ 0.6 1.2 0.216 1.728 1.34164079]]
Original code contributed by Alberto Escalante.
Full API documentation: GeneralExpansionNode
Perform a trainable radial basis expansion, where the centers and sizes of the basis functions are learned through a growing neural gas.
- positions of RBFs
- position of the nodes of the neural gas
- sizes of the RBFs
- mean distance to the neighbouring nodes.
Important: Adjust the maximum number of nodes to control the dimension of the expansion.
More information on this expansion type can be found in: B. Fritzke. Growing cell structures-a self-organizing network for unsupervised and supervised learning. Neural Networks 7, p. 1441–1460 (1994).
Full API documentation: GrowingNeuralGasExpansionNode
Learn the topological structure of the input data by building a corresponding graph approximation (original Neural Gas algorithm).
The Neural Gas algorithm was originally published in Martinetz, T. and Schulten, K.: A “Neural-Gas” Network Learns Topologies. In Kohonen, T., Maekisara, K., Simula, O., and Kangas, J. (eds.), Artificial Neural Networks. Elsevier, North-Holland., 1991.
Attributes and methods of interest
Full API documentation: NeuralGasNode
This classifier node classifies a data point as 1 if the sum of its components is positive and as -1 if it is negative.
Full API documentation: SignumClassifier
A simple perceptron with input_dim input nodes.
Full API documentation: PerceptronClassifier
A simple version of a Markov classifier. It can be trained on a vector of tuples, the label being the next element in the testing data.
Full API documentation: SimpleMarkovClassifier
Node for simulating a simple discrete Hopfield model.
Full API documentation: DiscreteHopfieldClassifier
Employs K-Means Clustering for a given number of centroids.
Full API documentation: KMeansClassifier
Make the input signal mean-free and give it unit variance.
Full API documentation: NormalizeNode
Perform a supervised Gaussian classification.
Given a set of labelled data, the node fits a Gaussian distribution to each class.
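A minimal sketch (hypothetical two-class data; label is the generic classifier method returning the most likely class for each sample):
>>> import mdp
>>> import numpy as np
>>> x0 = np.random.randn(100, 2)
>>> x1 = np.random.randn(100, 2) + 3.
>>> x = np.vstack([x0, x1])
>>> labels = [0] * 100 + [1] * 100
>>> gc = mdp.nodes.GaussianClassifier()
>>> gc.train(x, labels)
>>> gc.stop_training()
>>> print(gc.label(np.array([[0., 0.], [3., 3.]])))   # expected: [0, 1]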
Full API documentation: GaussianClassifier
Nearest-Mean classifier.
Full API documentation: NearestMeanClassifier
K-Nearest-Neighbour Classifier.
Full API documentation: KNNClassifier
Compute the eta values of the normalized training data.
The delta value of a signal is a measure of its temporal variation, and is defined as the mean of the derivative squared, i.e. delta(x) = mean(dx/dt(t)^2). delta(x) is zero if x is a constant signal, and increases if the temporal variation of the signal is bigger.
The eta value is a more intuitive measure of temporal variation, defined as:
eta(x) = T/(2*pi) * sqrt(delta(x))
If x is a signal of length T which consists of a sine function that accomplishes exactly N oscillations, then eta(x)=N.
EtaComputerNode normalizes the training data to have unit variance, such that it is possible to compare the temporal variation of two signals independently from their scaling.
Reference: Wiskott, L. and Sejnowski, T.J. (2002). Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770.
Important: if a data chunk is tlen data points long, this node is going to consider only the first tlen-1 points together with their derivatives. This means in particular that the variance of the signal is not computed on all data points. This behavior is compatible with that of SFANode.
This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the method get_eta to access them.
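A minimal sketch illustrating the eta value of a sine with exactly N oscillations (the t argument of get_eta is assumed to be the number of training samples, which recovers the definition above):
>>> import mdp
>>> import numpy as np
>>> T, N = 1000, 5
>>> x = np.sin(2 * np.pi * N * np.arange(T) / T).reshape(-1, 1)   # N full oscillations
>>> eta_node = mdp.nodes.EtaComputerNode()
>>> eta_node.train(x)
>>> eta_node.stop_training()
>>> print(eta_node.get_eta(t=T))   # approximately N = 5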
Full API documentation: EtaComputerNode
Collect the first n local maxima and minima of the training signal which are separated by a minimum gap d.
This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the get_maxima and get_minima methods to access them.
Full API documentation: HitParadeNode
Inject multiplicative or additive noise into the input data.
Original code contributed by Mathias Franzius.
Full API documentation: NoiseNode
Special version of NoiseNode for Gaussian additive noise.
Unlike NoiseNode it does not store a noise function reference but simply uses numx_rand.normal.
Full API documentation: NormalNoiseNode
Copy delayed versions of the input signal onto the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1) [ X(1) Y(1) X(3) Y(3) X(5) Y(5)
X(2) Y(2) X(2) Y(2) X(4) Y(4) X(6) Y(6)
X(3) Y(3) --> X(3) Y(3) X(5) Y(5) X(7) Y(7)
X(4) Y(4) X(4) Y(4) X(6) Y(6) X(8) Y(8)
X(5) Y(5) ... ... ... ... ... ... ]
X(6) Y(6)
X(7) Y(7)
X(8) Y(8)
... ... ]
It is not always possible to invert this transformation (the transformation is not surjective). However, the pseudo_inverse method does the correct thing when it is indeed possible.
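A minimal sketch reproducing the shape change of the example above (time_frames=3, gap=2 on a two-dimensional signal):
>>> import mdp
>>> import numpy as np
>>> x = np.arange(20.).reshape(10, 2)        # two channels, ten time steps
>>> tf = mdp.nodes.TimeFramesNode(time_frames=3, gap=2)
>>> y = tf.execute(x)
>>> print(x.shape, '->', y.shape)   # (10, 2) -> (6, 6): each row holds x(t), x(t+2), x(t+4)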
Full API documentation: TimeFramesNode
Copy delayed versions of the input signal onto the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1) [ X(1) Y(1) 0 0 0 0
X(2) Y(2) X(2) Y(2) 0 0 0 0
X(3) Y(3) --> X(3) Y(3) X(1) Y(1) 0 0
X(4) Y(4) X(4) Y(4) X(2) Y(2) 0 0
X(5) Y(5) X(5) Y(5) X(3) Y(3) X(1) Y(1)
X(6) Y(6) ... ... ... ... ... ... ]
X(7) Y(7)
X(8) Y(8)
... ... ]
This node provides similar functionality to the TimeFramesNode, except that it performs a time embedding into the past rather than into the future.
See TimeDelaySlidingWindowNode for a sliding window delay node for application in a non-batch manner.
Original code contributed by Sebastian Hoefer. Dec 31, 2010
Full API documentation: TimeDelayNode
TimeDelaySlidingWindowNode is an alternative to TimeDelayNode which should be used for online learning/execution. Whereas the TimeDelayNode works in a batch manner, for online application a sliding window is necessary which yields only one row per call.
Applied to the same data the collection of all returned rows of the TimeDelaySlidingWindowNode is equivalent to the result of the TimeDelayNode.
Original code contributed by Sebastian Hoefer. Dec 31, 2010
Full API documentation: TimeDelaySlidingWindowNode
Node to cut off values at specified bounds.
It works similarly to numpy.clip, but also when only a lower or an upper bound is specified.
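A minimal sketch (the keyword names lower_bound and upper_bound are assumptions):
>>> import mdp
>>> import numpy as np
>>> x = np.array([[-3., 0.5, 10.]])
>>> cut = mdp.nodes.CutoffNode(lower_bound=-1., upper_bound=1.)
>>> print(cut.execute(x))   # expected: [[-1.   0.5  1. ]]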
Full API documentation: CutoffNode
Node which uses the data history during training to learn cutoff values.
As opposed to the simple CutoffNode, a different cutoff value is learned for each data coordinate. For example if an upper cutoff fraction of 0.05 is specified, then the upper cutoff bound is set so that the upper 5% of the training data would have been clipped (in each dimension). The cutoff bounds are then applied during execution. This node also works as a HistogramNode, so the histogram data is stored.
When stop_training is called the cutoff values for each coordinate are calculated based on the collected histogram data.
Full API documentation: AdaptiveCutoffNode
Node which stores a history of the data during its training phase.
The data history is stored in self.data_hist and can also be deleted to free memory. Alternatively it can be automatically pickled to disk.
Note that data is only stored during training.
Full API documentation: HistogramNode
The execute method simply returns the input data, and the node is not trainable.
This node can be instantiated and is, for example, useful in complex network layouts.
Full API documentation: IdentityNode
Convolve input data with filter banks.
The filters argument specifies a set of 2D filters that are convolved with the input data during execution. Convolution can be selected to be executed by linear filtering of the data, or in the frequency domain using a Discrete Fourier Transform.
Input data can be given as 3D data, each row being a 2D array to be convolved with the filters, or as 2D data, in which case the input_shape argument must be specified.
This node depends on scipy.
Full API documentation: Convolution2DNode