The last few years have seen impressive performance from machine learning models that leverage deep architectures of multiple neural network layers. These models have emerged an ability to re-render a target image in the "style" of a given input image, producing an output that appears as if it were created artistically by the algorithm.
Apps and filters leveraging these neural networks (convolutional networks being the most effective at this proto-creative action) are quickly appearing.
However, for creating art, particularly novel art that is not just the result of a complex mathematical process applied to a single source and a single target image, such approaches are an utter failure. For example, as an illustrator I can be given two or three input images of a given character from different perspectives and, on the basis of that small input set, create a wide variety of new images of that same character with a high degree of verisimilitude.
How?
Where a convolutional neural network requires a direct transformation between the target image and the input "style" image in order to create an output that is essentially the target rendered through the style, creating an entirely new representation of the same character in a set of images requires a completely different approach.
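To make the contrast concrete, the style transfer approach described above typically works by matching channel correlation (Gram matrix) statistics between CNN feature maps of the style image and the generated image. Below is a minimal numpy sketch of that style statistic; the random arrays stand in for real CNN activations, and the function names are my own illustrative choices, not any particular library's API.

```python
import numpy as np

def gram_matrix(features):
    """Channel-wise correlation (Gram) matrix of a CNN feature map.

    features: array of shape (channels, height, width), e.g. one layer's
    activations for either the style image or the generated image.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # each row = one flattened channel
    return flat @ flat.T / (h * w)      # (c, c) matrix of channel correlations

def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g_gen = gram_matrix(gen_features)
    g_style = gram_matrix(style_features)
    return np.mean((g_gen - g_style) ** 2)

# Toy demo: random "activations" standing in for real CNN features.
rng = np.random.default_rng(0)
gen = rng.normal(size=(8, 16, 16))
style = rng.normal(size=(8, 16, 16))
print(style_loss(gen, style) > 0)        # different "textures": nonzero loss
print(style_loss(style, style) == 0.0)   # identical "textures": zero loss
```

Note what this statistic captures: texture correlations only, with no notion of the subject's 3D structure, which is exactly why it cannot generalize a character to a new viewpoint.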
The machine learning model would first have to extract predictions about the dimensional nature of the character in the images. If the images contain more than one character, it would further need to disambiguate them (trivial for us to do across a small set, but hard as sin for an untrained network).
Once the model has a rough understanding of the subject's dimensions, it can choose some arbitrary perspective and render a novel representation from it (using its own desired "style" would make the task even harder).
Such an improvement in image processing learning models will require the ability to take a short input set and create intermediate interpolations that obey the dimensional rules of perspective while keeping proportions correct across those perspectives; in other words, coordinate transformation. Just as our brains do, the model would have to emerge this capability without actually evolving algorithms for coordinate transformation in a mathematical sense; it would have to do it the way we do, via an intuitive sense that does not rely on active calculation. Further, the model would have to find some way of keeping a chosen perspective in "mind" long enough to render from it, without mixing it with other possible creative outcomes.
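For reference, the explicit mathematical version of the coordinate transformation described above (the thing the model would instead have to do intuitively) is just a rotation of the subject's 3D points followed by a pinhole perspective divide. A minimal numpy sketch, with a unit box standing in for a character's key points (all names here are illustrative assumptions):

```python
import numpy as np

def rotation_y(theta):
    """Rotation about the vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(points, theta, focal=1.0, depth=5.0):
    """Render 3D points from a new viewpoint via a pinhole projection.

    points: (n, 3) array of the subject's 3D coordinates.
    theta:  viewing angle around the subject, in radians.
    Returns (n, 2) image-plane coordinates.
    """
    rotated = points @ rotation_y(theta).T       # change of viewpoint
    z = rotated[:, 2] + depth                    # push subject in front of camera
    return focal * rotated[:, :2] / z[:, None]   # perspective divide

# A toy "character": the eight corners of a unit box as stand-in key points.
pts = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
               dtype=float)
front = project(pts, 0.0)            # the given view
quarter = project(pts, np.pi / 4)    # same subject, novel perspective
print(front.shape, quarter.shape)    # (8, 2) (8, 2)
```

The rotation preserves all pairwise 3D distances, which is what keeps proportions consistent across viewpoints; the claim in the text is that a creative model must achieve the same consistency without ever representing these matrices explicitly.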
I think this next level of creative expression in image processing neural networks will require a merging of visual processing and image processing networks, tied together by a shallow and broad learning super-model that can emerge a simple salience landscape, one that spans the options for perspective and style of rendering. At least, that is what it will take to get such creativity from a general purpose cognitive model rather than a custom architected one, like the many that have found success creating mixed (convolved) images. Thus I assert that to do this task the cognitive model MUST have a salience loop akin to the one below: a dynamic cognition cycle, for at least the image processing sub-cycle of cognition.
Outside of a general purpose solution that leverages a salience loop to solve this problem of novel creativity, there may be a way to achieve the same by architecting a complex interaction of networks. But I posit such architectures would be too unwieldy for machine learning researchers to discover the way they have discovered so much of the usefulness of their solutions: by trial and error. The complexity of a fixed architecture approach is inversely proportional to the generality of the solution produced; it may work, but it would be tightly coupled to the problem it was designed for.
And so, with this realization, I propose a fourth hypothesis as an extension to the Salience Theory of Dynamic Cognition that I posted in 2013.
Dynamic cognition of the kind that will emerge general creative intelligence MUST leverage SABL (shallow and broad learning) entity relations as well as deep learning relations, tied together via a salience driven process (leveraging autonomic and emotional modulation). AI that does not attempt to replicate efficient SABL cross connection of seemingly disparate deep networks, each focused on a specific sensory dimensional dataset, will emerge neither novel creativity nor self aware (conscious) intelligence.