How AI-based programming could work

Benedikt Jenik - July 29, 2016

Preface: this whole thing is (on purpose [1]) written before doing any kind of research into feasibility or whether parts of it already exist, so take everything with a grain of salt. While most of it will probably turn out differently, I expect it to be helpful for setting a direction for something that could be really interesting, and to give a few pointers for the way there. So, let's go:

I expect programming to become more declarative and a lot less exact. To elaborate on that, we first need to look at how programming works today: you usually have some input or a system state and want to get to a result or another system state. You do this by defining a series of exact steps that gets from the input to the result. Of course, this isn't done directly in system instructions anymore - there are multiple levels of abstraction in between, but all of these abstractions are exactly specified relative to the layer below, without any ambiguity. This means we still exactly specify a series of small steps that the computer then follows. To summarize: there is the underlying assumption that the only thing a computer should do is exactly follow a series of steps, and if there are none, it should do nothing.

There have been attempts to create a way of programming that works without giving exact steps, using a more declarative method involving logic statements and constraints and running deduction or a solver over them. The problem here is that if the program is underspecified, deduction may not go anywhere, and the solver will give us a lot more than just the "correct" / wanted result. If we pick one of those answers, it is most likely the wrong one - which brings us to the next underlying assumption: if the program leaves a choice / a degree of freedom, it is expected that the computer will not do the right thing.
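To make that concrete, here is a toy sketch (in Python, my own illustration, not tied to any real solver): a "program" given only as a constraint has many valid answers, and nothing in it says which one we actually meant.

# An underspecified "program": any pair of small integers summing to 10
# satisfies it - the solver has no way to know which one we wanted.
solutions = [(x, y) for x in range(11) for y in range(11) if x + y == 10]

print(len(solutions))  # 11 valid results
print(solutions[0])    # an arbitrary pick, most likely not the intended one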

Of course, these assumptions are perfectly valid and reasonable, at least for the computers we have today, the way we program, and the problems we use them for - and a lot of them are going to stay. But let's look beyond that and turn these assumptions around: what if programming didn't involve defining exact steps, but instead just roughly defining what we have and what we want, maybe giving a few hints, and having the computer generally do the right thing - wouldn't that be awesome?

Sounds impossible, right? Not necessarily. Let's first have a look at a few things that have been happening in recent years, and after that at some ideas that could be worth exploring to get there.

Observations and expectations

The first computers were used for a variety of tasks: as advanced calculators, for managing a company's finances, and even for controlling rockets and calculating a route to the moon. All these application areas have one thing in common: there already was one known correct solution to the problem - it just had to be automated because performing the task by hand was too tedious and neither accurate nor fast enough. In addition, the way these computers were used had one very interesting characteristic: the result wanted from these computers was defined as being the result of that exact calculation. To understand what I'm trying to say with that sentence, we need to jump back to today:

Today, a lot of these calculation tasks still exist, but most use cases have moved up one level: the focus is no longer on getting the result of following a specific set of instructions - today we just have a goal, without caring how to get there. It goes even further than that: for a lot of goals, like entertaining people, selling as much stuff as possible, or even smaller-scale ones like having a usable interface, we don't even know the right way to get there. Yet we still try to squeeze this into our exact, step-by-step "we know what to go for and how to get there" programming model.

When looking at chip development, we can also see some big changes coming up: transistors are getting so small that we are close to the physical limit of still being able to guarantee that they function correctly all the time. This is fundamentally at odds with a programming model that relies on exact results and on every step being executed correctly. What if we could instead use and build hardware on which calculations only had to be ballpark correct, and only most of the time? This could not only solve the scaling issue, it could also enable completely new ways of computing, for example by embedding biological elements.

In addition to changes in requirements and challenges in chip development, systems have to process more and more data that is noisier than ever.

The thing we are looking for has to handle any kind of noise very well - in the input data, and during processing in the form of only ballpark-accurate results and occasional errors. In addition, it should have a way to figure out how to do "the right thing" without being given the exact steps. One area that is quite suited for this is recent advances in machine learning, especially neural networks. What if, instead of just calling them in some parts of our code, we built a whole computing and programming model on top of them?

What to look at to get there

A little disclaimer: I don't expect neural networks to be the only solution for this kind of problem - they may not even be the right one - but at the moment they are the most promising one. To be able to follow the next parts, you should have some knowledge of neural networks.

I believe there are three major areas to work on: the first is programming interfaces, meaning what our code will look like in the new programming model. The second, neural network interfaces, is about internal data representation in neural networks, how to connect them to the outside world and make use of existing technology, and automatic connections between neural networks. And finally, I will describe a few things that should be changed in neural networks themselves to enable everything else.

Programming Interfaces

Imagining what code should look like is hard without having something to build on - especially because it greatly depends on the capabilities of the underlying computing model. And yet it is very important, because it gives the rest a target to go for, and defines what would be necessary to get there.

My first intuition would be to go for a combination of a declarative approach (to be able to define what we have, what we want, and a few constraints the net has to work around) and component-based thinking (with the assumption that there are existing neural network parts that have been trained to do or recognize certain things). Inspiration for the declarative part could come from taking a peek at the syntax of existing declarative languages like Prolog, along with some ideas from different probabilistic programming approaches.


This will be the part that is most likely wrong (and it definitely goes beyond the capabilities we currently have in the area of neural networks), but I will try to give a first attempt at how such code could look - in this case to make some money with a photo site:

Using Web { Domain = "FunnyClickbaitNMore.com" }
Using Content { Source = "./.../images.data" }
Using Ads { Source = "someadserver.com/ads.data" }

net = Net[requestID] {
    Using HTML
    IN Web.Requests[requestID]
    IN Content.AllAvailableElements
    IN Ads.AllAvailableElements
    OUT HTML.Page
}

Web { RequestHandler = net }

Goal = max(Ads.Revenue[requestID])

This code still contains a fair share of plumbing, as it makes use of a lot of existing infrastructure. Using loads a component; the part in {} sets the values of a few parameters in the scope of the component. A component would generally consist of two parts: normal code with conventional interfaces to existing outside infrastructure, and neurons a neural network could connect to. These neurons could either be part of some pre-trained neural network that can be reused, or part of a neural interface to the normal code - for example the first few layers of an image processing net, if we have images as input.

The Net scope is where the magic happens - it defines the neural network we want to train to solve our task. The scope is there to select the neurons our new neural network should connect to - every neuron group inside this scope is accessible to the neural network. The IN annotation defines the neural interfaces that provide the input data; OUT sets where the result will be read from. In addition, in this example we have a component that is loaded inside the scope of the new neural network, which means it has access to all neurons in there. In this case, HTML could for example be a pre-trained image-to-HTML-code network. The image side is obviously not very useful, but since this network is able to generate HTML code, our new net could tap in a few layers above the image input and reuse the rest.

Finally, there is a bit of glue code around it to wire everything up - [requestID] sets the scope for each invocation of the network. In this case, we want to call the network for every web request that happens and have it generate a site to display to the user. In Web.Requests[requestID], this parameter is reused to give the net access to just the relevant request - ideally, if the request contains the word "car", the net could learn that it would be useful to show pictures and ads of cars.

Goal then defines the reward or error used for updating the neural network.

I have to admit this is quite complicated. Because of this, I prepared a second example that is a bit closer to today's capabilities and therefore easier to understand. In this case, we want to train a neural network for binary image segmentation: we have an input image and a mask that colors all relevant pixels - for example, all pixels that are part of a car.

input = Using Images { Source = "./.../images.list" }
reference = Using Images { Source = "./.../masks.list" }

Net[id] {
    IN input[id]
    OUT result[id]
}

Goal = min(delta(reference[id], result[id]))

The component Images provides a way of loading a list of images, and a neural interface (which in this case could be similar to an image data layer as we know it from today's neural network libraries) that gives access to one of the images, specified using [id]. Here, id is an unbound variable, which means it just has to have the same value at all the places it is used at any given time. The training process will then go over all possible values, meaning all images.

As you might notice, we do not specify the actual layout of the neural network - this should happen automatically during training. We would still be able to do it in this case, but in the previous example it would be very hard to come up with anything useful by hand. Automating the layout of a neural network will be one of the more challenging tasks on the way there - a few ideas to get started with follow in the next sections.
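For contrast, here is roughly what the same task looks like with today's tools - a minimal sketch assuming PyTorch, with random tensors standing in for the image and mask files, and a tiny hand-picked layout. Note that the layout is exactly the part we still have to specify by hand:

import torch
import torch.nn as nn

# A hand-specified layout - the part the proposed model would find on its own.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),  # one channel: the mask
)

loss_fn = nn.BCEWithLogitsLoss()                  # plays the role of delta()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Stand-ins for the Images components; real code would load the files.
images = torch.rand(8, 3, 64, 64)                 # input[id]
masks = (torch.rand(8, 1, 64, 64) > 0.5).float()  # reference[id]

for step in range(100):                           # Goal = min(delta(...))
    opt.zero_grad()
    loss = loss_fn(net(images), masks)
    loss.backward()
    opt.step()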

Neural Network Interfaces

There are two major topics to work on in the area of neural network interfaces: training neural networks to interface with existing tools, and automatic training of interfaces between neural networks.

The interfaces to existing tools or data can be very easy, as in the case of images, or quite complicated, like the imaginary HTML example above. In general, there are two viewpoints from which this task can be approached.

The first, and currently more popular, one is the outside view: looking at how to feed data into neural networks, usually using a very simple encoding like pixel values or character series, and how to get the result out again. There has already been significant research on different tasks, including images, having neural networks generate and parse text, and even having them read code and compute the result.
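To illustrate how simple these encodings typically are, here is a small sketch (my own example, in Python/numpy) of feeding text in as a series of one-hot character vectors:

import numpy as np

# One vector per character - about as simple as input encodings get.
alphabet = "abcdefghijklmnopqrstuvwxyz "
index = {c: i for i, c in enumerate(alphabet)}

def one_hot(text):
    out = np.zeros((len(text), len(alphabet)))
    out[np.arange(len(text)), [index[c] for c in text]] = 1.0
    return out

print(one_hot("a cat").shape)  # (5, 27): five characters, 27 possible symbols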

The second, and in my opinion far more interesting, approach would be to take the internal viewpoint and start thinking about how neural networks could be trained to take advantage of existing tools and access data themselves instead of having it fed to them. For example, humans do not usually read a text by having it fed to them character by character; instead we take in larger chunks of multiple characters and jump back and forward - maybe this could also be helpful for neural networks. Another example would be writing code - nobody can just write down a series of characters and be done with it. We jump around there too, have auto-complete, and even get a feedback loop by just running the part we already have - giving a neural network access to these features could also be very helpful.

Automatic training of interfaces between neural networks will be one of the core challenges in getting everything to work. One very naive approach could be to just connect everything to everything and then gradually drop connections that only carry an insignificant weight. The downside is that it could be very slow, and in addition it does not create an advanced multilayer structure, which most of the time could be necessary for good results. Another approach could be to start with nothing and dynamically add neurons and connections where necessary - the difficulty here is finding out where and when.
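The first, naive approach is close to what is known today as magnitude-based pruning - a quick numpy sketch (the dense size and the threshold are arbitrary choices for illustration):

import numpy as np

# "Connect everything to everything" ...
rng = np.random.default_rng(0)
weights = rng.normal(size=(128, 128))   # dense all-to-all connections

# ... then drop connections that only carry an insignificant weight.
mask = np.abs(weights) > 0.5
weights = weights * mask

print(f"{mask.mean():.0%} of connections survive")

# In practice this would alternate with training, so the remaining
# connections can adapt to the cuts - and it is indeed slow.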

Neural Networks Themselves

There are a number of things that could be interesting to work on based on current neural network ideas, even when not working towards this new way of programming. I believe one of the biggest problems is the use of error backpropagation and gradient descent. This will become more and more of an issue as networks get deeper and more irregular, as in this use case. I expect the path forward to be in finding a way to create some kind of locally emergent behavior.

In the meantime, it could be useful to get an idea of what the error space of neural networks looks like. To do this, one could collect many different parameter weight configurations along with the corresponding error values and find a way to get information out of that. This could either be done by developing a way of mapping the high-dimensional parameter space down to three dimensions to inspect it visually, or at least by finding metrics that also work in higher dimensions. One idea (that will most likely not work, but gives an example of what to go for) is building a 3D map by doing something like a PCA over all weight parameters down to 2 values and using the loss or error as the third, the height value. With this, one could have a look at the resulting height profile and maybe find something that the optimal results have in common. The problem with using PCA will probably be that the optimal and all other values end up all over the place, leading to a huge mess. But maybe there is another transformation that would work in this case.
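A sketch of that idea (assuming numpy and scikit-learn, with a made-up toy loss standing in for real collected configurations):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for collected data: 500 weight configurations of a 10-parameter
# model and a toy loss with two valleys, one around +1 and one around -1.
weights = rng.normal(size=(500, 10))
loss = np.minimum(((weights - 1) ** 2).sum(axis=1),
                  ((weights + 1) ** 2).sum(axis=1))

# PCA all weight parameters down to 2 values; the loss is the height.
xy = PCA(n_components=2).fit_transform(weights)
height_map = np.column_stack([xy, loss])   # one (x, y, height) row per config

print("lowest point of the toy landscape:", height_map[loss.argmin()])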

As for metrics that work directly on the high-dimensional parameters, the shape of the valleys around local and global minima would be interesting. My wild guess would be that optima that come from overfitting sit in very narrow and steep valleys, while optima from real generalization sit in wider ones. Of course, this is probably completely wrong, but it gives you an idea why these kinds of metrics could be useful for building better optimization algorithms.
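One crude way to probe valley shape, sketched below (my own toy construction, not an established metric): perturb the weights at a minimum and see how fast the loss rises - a narrow, steep valley should react much more strongly than a wide one.

import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # stand-in for a trained network's loss as a function of its weights
    return ((w - 1) ** 2).sum()

def valley_sharpness(w, radius, samples=200):
    """Average loss increase under random perturbations of size `radius`.
    High values suggest a narrow, steep valley; low values a wide one."""
    base = loss(w)
    noise = rng.normal(size=(samples, w.size)) * radius
    return np.mean([loss(w + n) - base for n in noise])

w_opt = np.ones(10)                          # the minimum of the toy loss
print(valley_sharpness(w_opt, radius=0.1))   # small probe, small increase
print(valley_sharpness(w_opt, radius=0.5))   # wider probe, larger increase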

Another very important goal would be to find a way of training neural networks without defining the layout in advance, instead having the training process find a suitable one. As a starting point, there should be a metric for the usefulness of a single neuron and of a single connection. For a connection, the weight - meaning its distance from zero - could work, because a low value means the information going over this connection has little influence and therefore probably is not very relevant. The usefulness of a neuron could then be defined as the combined usefulness of all its connections, maybe minus or divided by a score for similarity to other neurons in the layer, because of two neurons with identical connections we probably only need one.
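A numpy sketch of these two metrics (the exact way of combining them - dividing by one plus the similarity score - is an arbitrary choice of mine):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))    # one layer: 32 neurons, 64 connections each

# Usefulness of a connection: its weight's distance from zero.
connection_usefulness = np.abs(W)

# Similarity to the other neurons in the layer, via pairwise cosine similarity.
unit = W / np.linalg.norm(W, axis=1, keepdims=True)
similarity = np.abs(unit @ unit.T).sum(axis=1) - 1   # minus the self-match

# Usefulness of a neuron: combined connection usefulness, marked down for
# being a near-duplicate of other neurons.
usefulness = connection_usefulness.sum(axis=1) / (1 + similarity)

print("candidate for removal:", usefulness.argmin())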

Training could then initially start with just one neuron between the input and the output, and dynamically add and remove neurons until getting close to some threshold of usefulness. A new layer - again with one neuron - could then be created using another metric, maybe something like one layer being a lot bigger than both of its neighbors. In the long term, I expect it could be better to move away from the layer structure entirely, especially when moving towards dedicated hardware instead of float-crunching on GPUs. In addition, some of the more complex layouts may not even be possible with a clean layer structure.
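To make the growing part concrete, here is a toy sketch (numpy, my own construction - the plateau threshold and growth schedule are arbitrary): a one-hidden-layer net that starts with a single neuron and adds one whenever training stalls.

import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn y = sin(x), starting with a single hidden neuron.
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)

hidden = 1
W1, b1 = rng.normal(size=(1, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)

prev_loss = np.inf
for step in range(20000):
    h = np.tanh(x @ W1 + b1)
    err = h @ W2 + b2 - y
    loss = (err ** 2).mean()

    if step % 1000 == 999:
        if prev_loss - loss < 1e-4 and hidden < 16:
            # Progress has stalled: grow by one fresh hidden neuron.
            W1 = np.hstack([W1, rng.normal(size=(1, 1))])
            b1 = np.append(b1, 0.0)
            W2 = np.vstack([W2, rng.normal(size=(1, 1)) * 0.01])
            hidden += 1
            prev_loss = np.inf
            continue   # skip this update, the shapes just changed
        prev_loss = loss

    # Plain gradient descent on the squared error.
    g_out = 2 * err / len(x)
    g_h = g_out @ W2.T * (1 - h ** 2)
    W2 -= 0.05 * (h.T @ g_out)
    b2 -= 0.05 * g_out.sum(axis=0)
    W1 -= 0.05 * (x.T @ g_h)
    b1 -= 0.05 * g_h.sum(axis=0)

print(f"final hidden size: {hidden}, loss: {loss:.4f}")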


This text is by no means complete - the reason for writing it is to give a few ideas on where to start and what to work on, to me and maybe even to you. If you are working on something similar, or on parts of it, or have a few ideas, suggestions or questions, feel free to contact me at [email protected]. (I even prepared a nice mailto: link that already pre-fills half of the email, so you only have to write the content. You should go for it - just click on the mail address.)

[1] I believe a lot of research today is limited by first looking at, and becoming an expert in, the status quo and then building small iterative improvements. It would be better to first find a goal and then look for a way of getting there - at least that's how we got to the moon.

©2016 Benedikt Jenik