The last 5 years have been a great awakening in machine learning and in the subset discipline that up to this point had been called AI. When Hinton et al. discovered the power of GPUs to speed up the training of their neural network models, they enabled radical improvements in experimentation with those models while also making it possible to train on vastly larger data sets than had been practical (read: cost effective) in the past.
The later innovation of dropout as a means of reducing overfitting made such networks significantly more flexible: it kept them from over-focusing on features of a given data set that would make the final trained network too specific to the original training set.
The extension of neural network models to incorporate GPU acceleration, dropout and multiple layers then enabled the exploration of neural network approaches to all manner of real-world training scenarios, approaches that model more closely than ever the systems our biological brains have evolved to learn over time.
A broad analysis of the landscape of problems and training solutions now presents itself. Neural networks have been built that include DBMs, convolutional networks and many other approaches to the learning process, such as LSTM recurrent neural networks, and these have returned state-of-the-art or better-than-human-level performance. Success on various types of data sets, be they visual, linguistic, auditory or other, has come from precisely creating architectures around neural networks that optimize for the type of data set and thus allow generalized partially supervised and unsupervised learning.
The fact that an architecture must be chosen shows that models (architectures) are still important beyond the model-free (and still mostly mysterious) approaches the networks themselves use to train and to find relationships between features in arbitrary data sets. The wide variety of architectures emerges as a means of solving the underlying feature extraction and categorization problems of a given data set, and as such it raises a question: are there problem types that are not amenable to solution by the "deep learning" approach now finding so much success?
The answer is immediately obvious when we look at what happens after we've trained a given neural network architecture on a targeted data set: a neural network trained to recognize objects in images is useless for identifying instruments in music. To be effective it would require retraining on an audio data set as a first step, and possibly a rearchitecture of the network itself as a second step. This lack of full generality in the architecture reveals an inflexibility. Further, for any trained network to make inferential connections beyond simple classification, the predictions of two networks need to be tied together. For example, a recent paper combined an image-recognizing neural network with a language model network to label objects in images and even to describe an image by describing the multiple objects that may be present. This is a first step toward the combining of neural networks that is necessary to enable reasoning.
The word "carrot", a sequence of 6 letters, evokes the image of a root vegetable with an orange appearance. We know this because those 6 letters (as a whole) are mapped to the images, tastes, sounds and smells associated with the object they refer to. Yet each of those attributes of what a carrot is lives in a disparate network, the connection between them being simply relational and not deep. "Carrot" doesn't describe any features of carrots, yet it *instantly* takes us to a host of attributes that do. "Carrot" is simply a label, but one that quickly allows the attributes to be gathered and the associated object to be recognized immediately. The attributes themselves are deeply complex in various ways: our images of carrots vary dramatically in memory, from long to short, from fat to thin, from more yellow to deep orange. Our built-up characterization of how a carrot looks, feels, tastes and smells is enabled by deep, multi-layered feature extraction in each of those dimensions of sensory import, whereas the fact that we are dealing with a "carrot" across those attribute sets is a shallow learned comparison.
Thus truly understanding what a carrot is involves both deep and shallow learning: deep learning to characterize the features in each sensory dimension through which we can experience the carrot object, and then a label that links all those sensory associations together. I believe this latter kind of learning is critical for high-level reasoning to be possible at all, and I call it Shallow and Broad Learning, SABL.
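The deep-then-shallow split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `LabelIndex`, its method names and the small feature vectors are hypothetical stand-ins, and in a real system each vector would be the output of a separately trained deep network for that sense.

```python
from dataclasses import dataclass, field


@dataclass
class LabelIndex:
    """Shallow and broad map: label -> {modality -> deep feature vector}.

    Hypothetical structure for illustration only. The vectors stand in for
    the outputs of separate deep networks, one per sensory dimension.
    """
    links: dict = field(default_factory=dict)

    def link(self, label, modality, features):
        # One shallow association: no feature extraction happens here.
        self.links.setdefault(label, {})[modality] = features

    def recall(self, label):
        # A single lookup gathers every modality tied to the label.
        return dict(self.links.get(label, {}))


index = LabelIndex()
index.link("carrot", "vision", [0.9, 0.1, 0.3])  # made-up vision features
index.link("carrot", "taste", [0.2, 0.7])        # made-up taste features

carrot = index.recall("carrot")
```

The point of the sketch is that the label layer is one level deep no matter how many layers each sensory model needed to produce its vector.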
SABL models are efficient where deep models are not. A SABL learning algorithm is an efficient hierarchical learner: it spans vast, disparate entity spaces and creates connections to a relatively feature-poor entity. For example, the 6-letter combination that names a carrot is feature poor. "carrot" is very different from "parrot" in terms of the entity it refers to: one is a root vegetable and the other a bird. Linguistically speaking, we distinguish them by changing only a single character. In the linguistic dimension one character makes the difference, while the difference in the associated subtrees of features that define each object is enormous.
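To make the carrot/parrot contrast concrete, here is a small Python sketch that compares the distance between the two labels with the overlap between the attribute sets they point to. The attribute sets are invented for illustration, and `hamming` and `jaccard` are ordinary textbook measures chosen for this sketch, not anything taken from a SABL implementation.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("labels must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


# Illustrative attribute sets (assumptions, not data from any study).
ATTRIBUTES = {
    "carrot": {"vegetable", "orange", "edible root", "crunchy"},
    "parrot": {"bird", "feathers", "beak", "mimics speech"},
}

label_distance = hamming("carrot", "parrot")
attribute_overlap = jaccard(ATTRIBUTES["carrot"], ATTRIBUTES["parrot"])
```

Under these assumed attribute sets the labels differ in a single position while the attribute sets share nothing, which is why learning the label-to-attributes link is a lookup problem rather than a feature extraction problem.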
Leveraging a deep process to characterize this is not only unnecessary, it would be inefficient. Several years ago neuroscientists revealed that objects in the world could be mapped to specific neurons. This revelation was stunning to many in the space: the idea that there was a "Marilyn Monroe" neuron that could, by being inhibited, prevent us from recalling who she was even after being shown pictures of her. When I read of this result, however, it only made sense, as entity tagging is a SABL problem and not a deep learning problem. I believed this because I had already theorized and implemented a SABL learning algorithm in the ADA (action delta assessment) algorithm that was built into the AgilEntity framework starting in 2011.
ADA spanning the shallow and broad action landscape
ADA was designed as an autonomous extension to the first explicit algorithm created for the Action Oriented Workflow (AOW) paradigm. In the original implementation, a set of 8 actions discretized all events that could happen to business objects. In the AgilEntity framework the different classes of these objects were generally called Entities. I realized in the original invention of the paradigm that the action landscape offered a relatively low-dimensional way to span all types of business object management requirements for any type of application. Leveraging this would allow businesses to discover and take advantage of a landscape of action, where business users, clients and executives could all interact with business objects, and the system could be designed to route business objects efficiently between users in order to optimize completion of the actions that needed to be taken on them. Read this article if you are unaware of the AOW paradigm.
ADA is a SABL model because it spans the disparate Entities that software engineers design to represent an application's needs, yet it is limited in depth to the Actions, of which there are only 8 variants. The narrow and deep models we've seen have been designed with far greater numbers of layers of feature extraction, from a handful to dozens or more depending on the type of data set the NN model is being designed to learn optimally. SABL models do not require this level of architectural specificity because they focus on large spans across entities crossed by shallow spans of some orthogonal feature space.
The original conception was to allow manual creation of workflows that would prescribe which users were deemed agents to help commit actions for given Entity types. The underlying fine-grained permissions would then allow objects to bounce around the workflow until they were committed. Incentives for completing actions were provided by social oversight: all action requests were made visible to all users permitted or subscribed to monitor the life cycle of specific objects or of entire Entities of a given type. This was an innovation; when it was completed in late 2004 it was the only such system that was fully web based and did not require workflow designers to know any programming language. At the time, business process applications relied heavily on XML to create workflows and define complex contingencies for requested actions on business objects, so AOW was a complete rejection of this approach in favor of action and social oversight. Still, even as I was completing this innovation I realized there was a way to make the system even more efficient. I wondered whether it was possible to make the process of learning about workflows and users autonomous, and this prompted the changes that would lead to the development and implementation of the ADA algorithm some 7 years later.
Where AOW in its explicit form required knowing in detail the users involved in each stage of a workflow, ADA was agnostic, up to the resolution of groups of user contacts. A contact list could contain an arbitrary number of users of various capabilities discovered via social interaction. ADA could leverage the wide-ranging permissions of these users to discover new action routing heuristics, whereas the original AOW was restricted to manually constructed stages. Beyond construction, the ADA algorithm calculated a vector called the action delta, which measured how strongly a given user was correlated with completion of a given action for a given business Entity or specific instance of that Entity. The ADA algorithm would thus build up shallow and broad knowledge across the action landscape of users in various workflows.
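As a rough illustration of the idea, the following Python sketch keeps a per-user, per-Entity vector with one slot per action. It is not the AgilEntity implementation: the text above does not give the formula for the action delta, so a simple completion rate (completed requests over routed requests) is assumed here as a stand-in. The class and method names are hypothetical, and because the 8 actions are not named in this post they appear as placeholders `action_0` to `action_7`.

```python
from collections import defaultdict

# Placeholder names: the post says there are 8 actions but does not list them.
ACTIONS = tuple(f"action_{i}" for i in range(8))


class ActionDeltaTracker:
    """Hypothetical stand-in for an action delta: per-action completion rates."""

    def __init__(self):
        # (user, entity) -> one [routed, completed] counter pair per action
        self._counts = defaultdict(lambda: [[0, 0] for _ in ACTIONS])

    def record(self, user, entity, action, completed):
        """Log that an action request was routed to a user, and its outcome."""
        slot = self._counts[(user, entity)][ACTIONS.index(action)]
        slot[0] += 1
        if completed:
            slot[1] += 1

    def delta(self, user, entity):
        """Return an 8-element vector of completion rates (0.0 if never routed)."""
        counts = self._counts.get((user, entity))
        if counts is None:
            return [0.0] * len(ACTIONS)
        return [done / routed if routed else 0.0 for routed, done in counts]

    def best_agent(self, contacts, entity, action):
        """Pick the contact with the highest assumed delta for this action."""
        idx = ACTIONS.index(action)
        return max(contacts, key=lambda user: self.delta(user, entity)[idx])


tracker = ActionDeltaTracker()
tracker.record("ana", "Invoice", "action_0", completed=True)
tracker.record("ana", "Invoice", "action_0", completed=False)
tracker.record("bo", "Invoice", "action_0", completed=True)
chosen = tracker.best_agent(["ana", "bo", "cy"], "Invoice", "action_0")
```

Note the shape of the stored knowledge: the table can grow to any number of users and Entities (broad), but every row is exactly 8 slots wide (shallow).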
Where deep learning models start with the formal construction of a neural architecture for a given dimension of sensory import, shallow and broad models like AOW/ADA start with the construction of workflows that opt users in to interaction for the purpose of performing actions on business objects. Architectural construction of deeper understanding happens via Entity relationship modeling, which in most business object design for applications is not deep and, when it is, tends to recurse to the same Entity rather than to different Entities. This is important because recursion to different types implies building deep feature relationships in the same way deep learning networks do, and these are more complicated to construct using a SABL approach. Thankfully a host of business workflow problems, the vast majority in fact, can be solved using SABL methods like ADA.
The big advantage of SABL models like AOW/ADA is revealed in the variety of business Entities and relationships that define the thousands of software solutions constantly being developed across thousands of business verticals. Each creates its own zoo of relationships between arbitrarily created entities. In a way each new application, depending on its architectural design, is a new data set generating actions unique to that vertical and application. A deep neural network would need to generalize across tens of thousands of entity relationships, some of them differing only by small degrees, which makes the risk of overfitting high. With a SABL approach, because Entities are related only to the degree necessary to route actions to optimal agents, there is a sparse spanning of a very large landscape instead of a deep spanning of only a small subset of it.
As deep learning continues to find traction in solving problems that until now had resisted efficient automated solution, the full landscape of learning models reveals SABL approaches as also critically important. They matter first as a more efficient way to solve business workflow problems that are not amenable to deep neural networks, and second because tying together multiple deep networks requires SABL subnetworks in order to produce the higher-order reasoning that will take the current generation of architectures away from data-set-specific neural networks and toward truly generalized AI.