my story with Deep Learning and Neural Networks — part I
Note: Originally published on Blogger on November 25th, 2015.
I started working on Deep Learning in 2008, about this time of the year. I was lucky enough to have Yann LeCun as a mentor, together with one of his talented students, Clement Farabet. Together we set out to build convolutional neural network accelerators. It was just before the field was called “Deep Learning”. But let us zoom out a bit more:
During my PhD, in the years 1998–2004, I worked in the area of Neuromorphic Engineering. I was advised by some of the best in the field: Andreas Andreou, my super-talented PhD advisor; Ralph Etienne-Cummings, also a close advisor; Gert Cauwenberghs, a third close colleague; and, a bit more indirectly, Kwabena Boahen, one of the most famous of the “Neuromorphs”. I was basically standing on the shoulders of giants. I owe them a lifetime of interest in this field. And you are reading this memoir because of them!
Anyway, Neuromorphic Engineering is the field that takes inspiration from biology to create engineered systems that can execute the tasks biology is best at: understanding complex data such as images and videos, learning, and more.
Biology, or maybe just “neural networks”.
Zooming back even further: during my BS at the University of Trieste, Italy, I was studying models of the human visual system. I learned then how complex the human brain is, and I started caressing the idea of devoting my life to replicating the human brain in synthetic systems. It was a good life goal, coupled with the fact that in order to replicate something, one really has to understand how it all works! During my PhD studies (1998–2004), understanding neural networks became my life goal. At the time I was connected to various luminaries in electrical engineering, robotics, neuroscience, and psychology through the international Neuromorphic Engineering research group. It really widened my interests to be able to listen to the scientific problems in all these seemingly unconnected fields.
But later I learned, more and more, that everything is connected. It is like a giant brain. Like a giant neural network! More about this later.
Some of the goals of Neuromorphic Engineering were to replicate the human ability to understand the environment. Humans do this mostly visually; after all, vision extends your “sensing sphere” the furthest, further than touch or your body, and even further than the sounds your ears can pick up. But all of our senses together give us the incredible ability to survive in the environment. The world humans lived in was a lot less safe only a few thousand years ago… During my PhD, in the years 1998–2004, I worked on many artificial eyes: special image sensors capable of capturing the right visual information at the right time, possibly compressing the enormous amount of visual data.
At the time I was interested in working with industry, but it all seemed so far from my goals and from the devices I was creating. Industrial production and research on image sensors of the time were all about pushing better cell-phone cameras. Not unlike today. And it seemed so incremental to only think about adding more pixels and increasing speed. But, yes, it was and it is a revolution. I was glad to be connected to the best researchers in the world innovating in image sensors, such as Eric Fossum, Gunhee Han, and Jun Ohta, to name a few.
At the time the best neuromorphic image sensors and vision systems were created by some of my fellow students and advisors at Johns Hopkins: the talented Jacob Vogelstein, Viktor Gruev, and most notably Shih-Chii Liu and Tobi Delbruck. For example, the high-dynamic-range vision sensor and silicon retina by Tobi was one of the best-designed sensors.
But since about 2003, I became more and more interested in “what is in the picture?”, rather than just taking pretty photos, even with our fancy neuromorphic cameras! The reason was that it was hard to really squeeze more computation into image sensors. The flatness of 2D chip manufacturing was keeping us confined to a 2D world. Visionaries like my advisor Andreas knew then that we needed to go 3D, a recent trend in sensors and memories!
At the time, if you wanted to do more in vision, designing better image sensors was really not the way to go. I was also not impressed with Computer Vision in general. I did not like or appreciate hand-engineering solutions to every vision problem, breaking vision down into parts as was historically done by gestalt psychology and math-oriented computer scientists. It all seemed to me to be summarized in the (in)famous goal of the 1966 MIT summer project: to reproduce human vision in a single summer. That goal has, to date, proven a difficult task to tackle!
I was instead attracted to the way the human brain solves the problem.
The human visual system: the best vision system in the universe, to my eyes! At least the best to my human eyes, in its ability to perform so many tasks and to feed the intelligence of our brains. I have often thought that the reason our brains are so developed is mostly our visual system, or the co-evolution of the two. Here! Food for debate. The human visual system is what lets us move about the environment with ease. I really wanted to reproduce it, to use it in robots and machines, and to make computers see in a way similar to how humans see.
This is now the goal of my life: to delve into the problems of intelligence, of language, and of technical evolution. All these topics seem to fuse together now. I do not know the details, but some futuristic pictures are clear in my mind. More will be told.
Zooming back to 2006–7, one of my research goals was to create a neural system that could reproduce the ability of our vision to recognize objects. I was truly inspired by the work and thesis of Thomas Serre, advised by the famous Tomaso Poggio. Their MIT work was a true inspiration, and with Shoushun Chen we worked hard at reproducing some of their model in hardware.
Then, one day in early 2008, I was talking to our distinguished colleague Bernabe Linares-Barranco, who casually mentioned the work of Yann LeCun. Yann had just moved to NYU, I was then at Yale, and he kindly agreed to come and present a seminar. And what an inspiration he was! He had the simplest model of neurons a hardware developer could ask for, and ways to train large systems! It had “learning”, a concept that had escaped my PhD experience, although my close friends and colleagues Roman Genov and Shantanu Chakrabartty had been working on Support Vector Machines and neural networks back in 1998–2005. It all made sense to me right away.
Learning, simple neuron models, scalable systems, bio-inspired, models of the human visual system!
Starting in 2008, we worked on hardware based on programmable logic devices (Xilinx FPGAs) to implement deep neural networks. We were very lucky to work with Clement Farabet, who introduced us to Torch, and our lab later became one of the tool’s developers. Torch made it easy to create neural networks and train them in software. It was a big step forward from previous CNN libraries, which were less hackable and understandable.
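To give a flavor of that simplicity, here is a minimal sketch in modern PyTorch, Torch’s successor (our original code was Torch7 Lua and is not reproduced here); the architecture and hyper-parameters are illustrative assumptions, not the actual networks we trained:

```python
# A minimal sketch of defining and training a small convolutional network,
# in the spirit of what Torch made easy. Written in modern PyTorch (Torch's
# successor); the layer sizes and hyper-parameters are illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 5 * 5, 10),  # 10 output classes for 32x32 RGB inputs
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch of 32x32 RGB images.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```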
Over the next few years, from 2008 to 2011, we worked with Clement Farabet on custom hardware, until he left to work at his own software company, which was acquired by Twitter in 2014 (congratulations!). In 2011 our lab (e-Lab) moved to Purdue University and we started to train a new generation of machine learning experts: Jonghoon Jin, Aysegul Dundar, Alfredo Canziani, and more hardware experts, such as Vinayak Gokhale. Berin Martini continued to perfect the hardware system over the following years, until Vinayak Gokhale invented a completely new architecture in 2014–2015. Our machine learning experts and I worked on visual tracking, then on unsupervised learning with Clustering Learning and k-means clustering techniques (sketched below), and also on compilers for our custom hardware machine.

We wrote a lot of software for Torch7, and we maintained demonstration code and software well before Caffe and other tools came to life. In the summer of 2013, all of e-Lab worked together to port our hardware system to Xilinx Zynq devices, true embedded systems combining ARM cores and FPGA fabric. We developed low-level Linux device drivers to talk to the device, and then implemented an entire hardware system and software compiler. We called this system nn-X and presented it for the first time at NIPS 2014, where our hardware demonstration received a wave of attention and recognition in the media.
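For the curious, here is a rough sketch of the idea behind learning filters with k-means clustering; this is not our published Clustering Learning code, and the patch size, filter count, and normalization choices are assumptions for illustration:

```python
# A rough sketch of learning convolutional filter banks with k-means on
# image patches, in the spirit of the Clustering Learning work mentioned
# above. All sizes and preprocessing choices here are illustrative.
import numpy as np

def learn_filters_kmeans(images, k=16, patch=5, n_patches=10000, iters=20):
    """images: array of shape (N, H, W), grayscale. Returns (k, patch, patch) filters."""
    h, w = images.shape[1], images.shape[2]
    idx = np.random.randint(len(images), size=n_patches)
    ys = np.random.randint(h - patch + 1, size=n_patches)
    xs = np.random.randint(w - patch + 1, size=n_patches)
    # Sample random patches and flatten them into vectors.
    patches = np.stack([images[i, y:y + patch, x:x + patch].ravel()
                        for i, y, x in zip(idx, ys, xs)]).astype(np.float32)
    # Normalize each patch: zero mean, unit norm.
    patches -= patches.mean(axis=1, keepdims=True)
    patches /= np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8
    # Plain k-means: the k centroids become the filter bank.
    centroids = patches[np.random.choice(len(patches), k, replace=False)]
    for _ in range(iters):
        assign = ((patches[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            members = patches[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids.reshape(k, patch, patch)
```

The learned centroids can then serve as first-layer filters of a convolutional network, obtained without any labels.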
At the same time, it was clear we were experts in a key future technology, Deep Learning, and I decided to found the company TeraDeep, in order to commercialize our hardware in programmable devices, and also to transition it to custom microchips. The grand goal was to make all devices able to see and perceive the world just like humans do.
The goal was to create a specialized microchip to run neural networks on battery-powered devices, like cell phones, as well as on data-center servers and on systems for autonomous cars.
It was early days; most of industry did not know what Deep Learning was, and to them it all just seemed “another one of those algorithms”. The vision was well alive in our hearts, but it was hard to incept into the minds of many company officers. It would dawn on them later that they had missed the starting gun.

— END OF PART I —

FOOTNOTE:

1: This is my story. I hope it can help others gain perspective through someone else’s experience, if that is even possible.
2: Massive thanks also go to the Office of Naval Research (ONR) and Thomas McKenna, who funded a lot of our research activities.
For more recent commentary, see: https://medium.com/towards-data-science/can-we-copy-the-brain-9ddbff5e0dde
About the author
I have almost 20 years of experience in neural networks in both hardware and software (a rare combination). See about me here: Medium, webpage, Scholar, LinkedIn, and more…
Donations
If you found this article useful, please consider a donation to support more tutorials and blogs. Any contribution can make a difference!