Presenting our paper
I. Fehervari, A. Sobe, W. Elmenreich. Biologically Sound Neural Networks for Embedded Systems Using OpenCL. Proceedings of the International Conference on NETworked sYStems (NETYS 2013), Marrakech, Morocco, Springer 2013.
in the format of a short announcement was an interesting challenge. The task was to get other researchers to read our paper by talking about it for only 5 minutes. Furthermore, the audience was broad, coming from all areas of distributed systems. So I had to introduce spiking neural networks and the motivation for using them on a distributed embedded system before pointing to our approach of implementing them with OpenCL:
Neural networks are widely used in machine learning, and many implementations exist for tasks such as image and information processing. Biologically sound neural networks are more powerful than standard ANN models because information is encoded in spike trains, which also convey information in the time domain.
Spiking neural networks thus have attractive properties, but emulating them requires significant computing power.
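To make the idea of a spike train concrete, here is a minimal sketch of a single spiking neuron. It assumes a simple leaky integrate-and-fire model, which is a common textbook choice and not necessarily the neuron model used in the paper; the function name `simulate_lif` and all parameter values are made up for illustration.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Assumed model (not taken from the paper): the membrane potential decays
    by the factor `leak` each step, accumulates the input, and emits a spike
    when it reaches `threshold`.
    Returns the list of time steps at which the neuron spiked.
    """
    potential = 0.0
    spike_times = []
    for t, current in enumerate(input_current):
        # Leak part of the stored potential, then integrate the new input.
        potential = leak * potential + current
        if potential >= threshold:
            spike_times.append(t)
            potential = reset  # fire, then start again from the reset value
    return spike_times


# A stronger constant input produces a denser spike train.
weak = simulate_lif([0.3] * 10)
strong = simulate_lif([0.6] * 10)
print(weak)    # [3, 7]
print(strong)  # [1, 3, 5, 7, 9]
```

The output is a list of spike times rather than a single activation value, so the timing of the spikes carries information. The cost is also visible: every neuron has to be stepped through time, which is where the computing demand comes from.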

Example structure with 10x10x10 neurons. Typical structures are much larger, requiring a high number of parallel calculations.
For embedded systems, computation is a critical resource. We propose to use OpenCL for massive parallelization of the neural network model. OpenCL is a framework for writing programs that run on parallel hardware such as GPUs. But this alone is not enough: the most expensive part is updating the neurons and the state of the neighbors they influence. We therefore propose a connection model where each neuron is only connected to its neighbors, up to a given hop distance. Using this model we were able to simulate 1 million neurons instead of 100,000 (which is already large for typical networks). The performance gain is already excellent, but we went even further.
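The effect of the neighborhood restriction can be sketched in a few lines. The example below assumes that neurons sit on a 3D grid like the 10x10x10 structure in the figure, that "hop distance" means the maximum coordinate offset along any axis, and that the grid does not wrap around at its borders. These are assumptions for illustration; the paper may define the neighborhood differently, and the helper name `neighbors` is hypothetical.

```python
from itertools import product


def neighbors(index, shape, hops):
    """Return grid coordinates within `hops` of `index`, excluding `index`.

    Assumption: distance is the largest offset along any axis, and neurons
    at the border simply have fewer neighbors (no wrap-around).
    """
    x, y, z = index
    result = []
    for dx, dy, dz in product(range(-hops, hops + 1), repeat=3):
        if (dx, dy, dz) == (0, 0, 0):
            continue  # a neuron is not its own neighbor
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < shape[0] and 0 <= ny < shape[1] and 0 <= nz < shape[2]:
            result.append((nx, ny, nz))
    return result


shape = (10, 10, 10)
inner_count = len(neighbors((5, 5, 5), shape, hops=1))
corner_count = len(neighbors((0, 0, 0), shape, hops=1))

# Total directed connections: local model versus a fully connected network.
num_neurons = shape[0] * shape[1] * shape[2]
local_connections = sum(
    len(neighbors(idx, shape, hops=1))
    for idx in product(range(shape[0]), range(shape[1]), range(shape[2]))
)
full_connections = num_neurons * (num_neurons - 1)

print(inner_count, corner_count)            # 26 7
print(local_connections, full_connections)  # 20952 999000
```

Under these assumptions the number of connections grows linearly with the number of neurons instead of quadratically, and each neuron's update touches only a small, fixed-size region of the grid.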
OpenCL provides local memory that is shared within a work-group (the "task groups") and a second level of global memory shared by all work-items. Global memory is slower, so we redesigned the implementation in such a way that it only uses the local memory of OpenCL. This final measure reduces the latency enough to run our system with a high number of neurons on an embedded node such as a robot or a smart camera attached to a drone.
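How much of the network fits into local memory depends on the device and on the hop distance. The back-of-the-envelope sketch below assumes one common pattern, which is not confirmed as the paper's design: each work-group handles a cubic tile of neurons and also keeps a border ("halo") of `hops` neurons on every side so that all neighbors are available locally. The 4 bytes per neuron state and the 32 KiB local memory size are hypothetical figures, and both function names are made up for this example.

```python
def tile_bytes(tile_side, hops, bytes_per_neuron=4):
    """Bytes needed to hold a cubic tile plus its halo of neighbor states."""
    padded_side = tile_side + 2 * hops  # halo on both sides of each axis
    return padded_side ** 3 * bytes_per_neuron


def largest_tile(local_mem_bytes, hops, bytes_per_neuron=4):
    """Largest tile side whose padded tile still fits in local memory."""
    side = 0
    while tile_bytes(side + 1, hops, bytes_per_neuron) <= local_mem_bytes:
        side += 1
    return side


local_mem = 32 * 1024  # hypothetical device with 32 KiB of local memory

print(tile_bytes(8, hops=1))               # 4000
print(largest_tile(local_mem, hops=1))     # 18
print(largest_tile(local_mem, hops=2))     # 16
```

A larger hop distance shrinks the usable tile, which is one reason a bounded neighborhood and a local-memory-only design fit together well.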







