Tuesday, April 10, 2012

Bucket-brigading neural networks

I've recently been playing around with some Python code to explore a hunch I've had for a couple of years: that you can train a feed-forward neural network by simply indicating whether an output in response to an input was "good" or "bad".

I'd always imagined that I would hook up a small robot with an embedded neural network, giving myself a remote control with a button like this:


The robot would rove around, and whenever it did something "bad" (e.g. ran into a wall that it should have registered on its sensors) I'd press the button and it would train itself using that "bad" input->output pairing - e.g. that "move forward" when the front sonar sensor is registering an obstruction is "bad". I could also have a "good" button if it did something like turn just before a wall, for instance, to reinforce the correct behaviours.

This appealed to me as it was also very similar to how I (attempt to) train our cat...

Yes, that is our cat. No, that was not a training session...
Anyway, I have migrated this hunch to the GitHub repository BadCat. It has taken a few twists and turns along the way, but I have been able to "train" some very elementary neural networks using a simple set of rules based on the original hunch. I ended up taking a few pointers out of genetic algorithms theory just for fun too.

The algorithm works in the following way:

  1. Read the "sensors"
  2. Apply sensor readings to a learning tool (neural network), get the output
  3. Try out the output "in the real world"
  4. If the result of trying out the output is "bad":
    1. Slightly mutate the output
    2. Goto 3 above
  5. Train the network with the resultant (original or mutated) output

The mutation amount increases the longer the output is "bad". This is based on the assumption that the original output will already be close to the desired one, while still allowing the output to change dramatically if the robot is stuck in a new situation. The "good" input->output pairs form part of a fixed-length queue of recent memory that is used for regular training.
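To make the steps above a bit more concrete, here is a minimal sketch of one pass through the loop. It is not the code from the BadCat repository: `TinyNet`, `mutate`, `step` and every parameter (`base_amount`, `growth`, `max_amount`, `max_tries`) are hypothetical names made up for illustration, and a single linear layer stands in for a proper feed-forward network. I have also assumed that each mutation is applied to the original output rather than to the previous attempt, and that the whole memory queue is replayed after every "good" result.

```python
import random
from collections import deque


class TinyNet:
    """Stand-in learner: one linear layer trained with the delta rule."""

    def __init__(self, n_in, n_out, lr=0.1, rng=None):
        rng = rng or random.Random(0)
        self.lr = lr
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]

    def predict(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def train(self, x, target):
        # Nudge each weight towards reproducing `target` for input `x`.
        out = self.predict(x)
        for row, o, t in zip(self.w, out, target):
            for i, xi in enumerate(x):
                row[i] += self.lr * (t - o) * xi


def mutate(output, amount, rng):
    """Add Gaussian noise with standard deviation `amount` to each value."""
    return [o + rng.gauss(0.0, amount) for o in output]


def step(net, memory, sensors, is_bad, rng,
         base_amount=0.05, growth=1.5, max_amount=2.0, max_tries=50):
    """Run steps 1-5 once; return the accepted output, or None on give-up."""
    original = net.predict(sensors)          # steps 1-2
    candidate = original
    amount = base_amount
    for _ in range(max_tries):
        if not is_bad(sensors, candidate):   # steps 3-4: try it out
            memory.append((sensors, candidate))
            for x, y in memory:              # step 5: replay recent memory
                net.train(x, y)
            return candidate
        # Assumption: mutate the original output, with a growing amount.
        candidate = mutate(original, amount, rng)
        amount = min(amount * growth, max_amount)
    return None
```

In use, `memory` would be something like `deque(maxlen=20)` so that old pairs fall off the end, and `is_bad(sensors, output)` plays the role of the button on the remote: it returns `True` whenever the attempted output should be rejected.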

This approach is similar to the "bucket brigade" rule-reinforcement technique that can be used to train expert systems. It is also not dissimilar to reinforcement learning, except that the observation-action-reward mechanism is implicit instead of explicit: the action is the output the network generates from the observation and its current weights, and the reward (or penalty) is externally sourced and applied to the network only when needed.*

I am looking forward to trying this out on a real mobile robot as soon as I can order my Pi, and I will keep you up-to-date on how it turns out.

* Oh, and just to be clear, I am not a robotics or AI PhD student and this is not part of a proper academic research paper. It is very likely that what I am doing here has been done before, so I make no claim to extraordinary originality or breakthrough genius - just consider this some musings and a pinch of free Python code :)
