Self-Driving Car Tutorial (Part N): Developing a Convolutional Neural Network

I have been talking to a friend about building a self-driving car and told him that I could show him how to build one in a few short sessions. He seemed interested, so I decided to write up some tutorials to show how it's done. I am starting from the end, because that is the part I am currently working on. This and a lot of other interesting material is taught in the Udacity Self Driving Car Engineer Nanodegree program. Check it out at Udacity.com.


This is how it looked when I used the network (with some max-pooling and dropout layers added) for a fully autonomous drive on the Udacity simulator:










In this tutorial I am using the convolutional neural network developed at NVIDIA research (this is a tutorial, so we should build on existing research work rather than create our own). The paper can be found at NVIDIA Self Driving Car.

Below is how their neural network looks.


I'll go step by step through how to build the network. The image shows the network from the bottom up. At the bottom we provide a 66x200 image that has 3 color channels (the NVIDIA paper feeds YUV planes; this tutorial uses RGB). The image is then normalized. We'll start from the normalization layer, so our input size is 66x200 with a depth of 3.

First, let's take a camera image.



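This image is 160x320 (height x width). We resize it to 100x200 and crop off the top 34 rows, which leaves 66x200. This can be done using OpenCV like below:

import cv2
import matplotlib.pyplot as plt

image = cv2.imread("./sample.jpg")   # loaded as a BGR array of shape (160, 320, 3)
img = cv2.resize(image, (200, 100))  # cv2.resize takes (width, height)
crp = img[34:, :]                    # drop the top 34 rows -> (66, 200, 3)
plt.imshow(cv2.cvtColor(crp, cv2.COLOR_BGR2RGB))  # matplotlib expects RGB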



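Cropping can also be done inside the model so that it runs on the GPU. Note that this layer sees the full 160x320 frame, so cropping 68 rows leaves 92x320, and the result still has to be resized to reach 66x200: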

model.add(Cropping2D(cropping=((68,0),(0,0))))
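
As a quick sanity check on the shape arithmetic of the two cropping approaches above, here is a minimal sketch in plain Python. The helper name `crop_top` is made up for illustration and is not part of OpenCV or Keras; it only tracks (height, width) pairs, not pixel data.

```python
def crop_top(shape, rows):
    """Return (height, width) after removing `rows` rows from the top."""
    height, width = shape
    if not 0 <= rows < height:
        raise ValueError("cannot crop %d rows from height %d" % (rows, height))
    return (height - rows, width)

# OpenCV path: the 160x320 frame is resized to 100x200 first, then 34 rows go.
opencv_result = crop_top((100, 200), 34)

# In-model path: Cropping2D(cropping=((68, 0), (0, 0))) sees the raw 160x320 frame.
in_model_result = crop_top((160, 320), 68)

print(opencv_result)    # (66, 200)
print(in_model_result)  # (92, 320)
```

Only the OpenCV path lands on the 66x200 input that the rest of this post assumes.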


And we get an image like this with shape 3@66x200:




Now we will use Keras to build the neural network. In my setup I am using TensorFlow as the Keras backend. Let's create a sequential network:

from keras.models import Sequential
from keras.layers import Lambda, Convolution2D, Flatten, Dense
input_shape = (66, 200, 3)
net = Sequential()

Now let's add the normalization layer, which scales pixel values from [0, 255] to [-0.5, 0.5]:

net.add(Lambda(lambda x: x / 255.0 - 0.5, input_shape=input_shape))
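
To see what the normalization expression does to individual pixel values, here is a small sketch in plain Python. The function name `normalize` is only for illustration; in the model the same arithmetic is applied to the whole image tensor by the Lambda layer.

```python
def normalize(pixel):
    """Map an 8-bit pixel value in [0, 255] to roughly [-0.5, 0.5]."""
    return pixel / 255.0 - 0.5

samples = [0, 51, 255]
normalized = [normalize(p) for p in samples]

# Black maps to -0.5, white maps to 0.5, everything else falls in between.
for pixel, value in zip(samples, normalized):
    print(pixel, value)
```

Centering the inputs around zero like this is a common choice because it tends to make training better behaved.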

From the network image above, the first layer is a 5x5 convolution with 24 filters. The paper uses a 2x2 stride for this layer, which is what shrinks 66x200 down to 31x98. We'll use ReLU activation, a function that sets all negative values to zero and leaves positive values unchanged:

layer1 = Convolution2D(24, 5, 5, subsample=(2, 2),
              input_shape=input_shape, border_mode='valid', activation='relu')
net.add(layer1)
#output size = 24@31x98
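
The ReLU activation mentioned above is simple enough to write out by hand. This is a plain-Python illustration only; Keras applies the same rule element-wise to every value in the layer output.

```python
def relu(x):
    """Rectified linear unit: negative inputs become zero, others pass through."""
    return max(0.0, x)

activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
rectified = [relu(a) for a in activations]

print(rectified)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```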

From the network we see that there are 4 more convolutional layers. The next two are 5x5 with a 2x2 stride, and the last two are 3x3 with no stride:

net.add(Convolution2D(36, 5, 5, subsample=(2, 2), border_mode='valid', activation='relu'))
#output size = 36@14x47

net.add(Convolution2D(48, 5, 5, subsample=(2, 2), border_mode='valid', activation='relu'))
#output size = 48@5x22

net.add(Convolution2D(64, 3, 3, border_mode='valid', activation='relu'))
#output size = 64@3x20

net.add(Convolution2D(64, 3, 3, border_mode='valid', activation='relu'))
#output size = 64@1x18
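
The output sizes of the five convolutional layers can be checked by hand. For a 'valid' convolution, each dimension shrinks to floor((size - kernel) / stride) + 1. The sketch below applies that formula to a 66x200 input, assuming a 2x2 stride on the three 5x5 layers and a stride of 1 on the two 3x3 layers, as in the NVIDIA paper. The function name `conv_output` is made up for this example.

```python
def conv_output(size, kernel, stride=1):
    """Output length of a 'valid' convolution along one dimension."""
    return (size - kernel) // stride + 1

# (filters, kernel size, stride) for the five convolutional layers
layers = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]

height, width = 66, 200
shapes = []
for filters, kernel, stride in layers:
    height = conv_output(height, kernel, stride)
    width = conv_output(width, kernel, stride)
    shapes.append((filters, height, width))
    print("%d@%dx%d" % (filters, height, width))

# The flatten layer simply multiplies out the last shape.
last_filters, last_height, last_width = shapes[-1]
flatten_size = last_filters * last_height * last_width
print(flatten_size)  # 1152
```

With a stride of 1 everywhere, the same formula gives 62x196 after the first layer, so the stride is what makes the feature maps shrink quickly.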

Now we add the flatten layer:

net.add(Flatten())

Simple, huh? Now we add a fully connected layer (Dense layer) of size 1156:

net.add(Dense(1156))

We then add the remaining 4 dense layers as shown in the network image:
net.add(Dense(100))
net.add(Dense(50))
net.add(Dense(10))
net.add(Dense(1))

That's it. We have built our network. Let's compile it and print its summary to make sure we have done it right:

net.compile(loss='mean_squared_error', optimizer='adam')
net.summary()
Here is the summary output:



____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
convolution2d_1 (Convolution2D)  (None, 62, 196, 24)   1824        convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 58, 192, 36)   21636       convolution2d_1[0][0]            
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 54, 188, 48)   43248       convolution2d_2[0][0]            
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 52, 186, 64)   27712       convolution2d_3[0][0]            
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D)  (None, 50, 184, 64)   36928       convolution2d_4[0][0]            
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 588800)        0           convolution2d_5[0][0]            
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1156)          680653956   flatten_1[0][0]                  
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 100)           115700      dense_1[0][0]                    
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 50)            5050        dense_2[0][0]                    
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 10)            510         dense_3[0][0]                    
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 1)             11          dense_4[0][0]                    
====================================================================================================
Total params: 680,906,575
Trainable params: 680,906,575
Non-trainable params: 0
________________________
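
The Param # column above can be cross-checked with a few lines of plain Python. A convolutional layer has kernel x kernel x input channels x filters weights plus one bias per filter, and a dense layer has inputs x units weights plus one bias per unit. The helper names below are made up for this sketch. The flatten size of 588800 (50 x 184 x 64) is taken from the summary, and 1152 (1 x 18 x 64) is what the flatten layer would be with a 2x2 stride on the three 5x5 layers.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel convolutional layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

conv = [conv_params(5, 3, 24), conv_params(5, 24, 36), conv_params(5, 36, 48),
        conv_params(3, 48, 64), conv_params(3, 64, 64)]

def total_params(flatten_size):
    """Total parameter count for a given flatten size and the dense stack used here."""
    sizes = [flatten_size, 1156, 100, 50, 10, 1]
    dense = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
    return sum(conv) + sum(dense)

print(conv)                  # [1824, 21636, 43248, 27712, 36928]
print(total_params(588800))  # 680906575
print(total_params(1152))    # 1585487
```

The convolutional parameter counts do not depend on the stride, so almost all of the difference comes from the first dense layer.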
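The layer types and parameter counts line up with the definitions above. Note, however, that the output shapes in this summary come from convolutions with the default 1x1 stride, which is why the flatten layer has 588,800 outputs and the network has about 680 million parameters. With a 2x2 stride on the three 5x5 layers, as in the NVIDIA architecture, the flatten layer has 1152 outputs and the network has about 1.6 million parameters. Now we can generate training data by driving our car with a camera attached and a way to measure the steering angle, and then train the network.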

The output of the network is the steering angle. So, given a new image, the network tells us what the car's steering angle should be. With the right training, the network should be able to steer a car, provided there are mechanical/electrical components to turn the wheel.

Now sit back and relax while the car is being driven by the network.


8 comments:

  1. You should add a normalizer, some max pooling, and dropout so it doesn't overfit. That would also lower your parameter count, so it would train faster.
    Also try ELU activations instead of ReLU; they perform a bit better.

    Replies
    1. Oh, I apologize then, it makes sense that way. I wanted to add some notes that give a performance boost to that model. Nice writeup by the way! Keep it up :)

    2. Thanks. Not sure where my other comment went :(.

    3. Thanks! And you are right. I have used 2x2 max pooling and dropout in my solution. But for this write-up I have matched the NVIDIA architecture exactly to avoid any confusion. I'll try ELU to see if it works better. [Copied back from an old comment that is not being shown]

  2. Thanks for that writeup! I'm using the NVIDIA model as well and wanted to double-check my parameters. My car is having some trouble at 3 specific points of the track, so I'll see what I can do. Where did you add your max pooling and dropout layers? I had dropout only after each convolutional layer, but I don't know if that would be correct.

    Replies
    1. I added only one max-pooling layer, after the first convolution layer, and dropout after the flatten layer. I used only the images from data.zip and trained the network for 12 hours on an AWS GPU instance (20 epochs). After that the network had basically memorized the entire track and ran overnight last night without any problem. Here is a recording of about 10 minutes: https://www.youtube.com/watch?v=SmGhT3ol9pg .

      David showed some techniques in the live video: removing the top and bottom portions of the image using cropping [model.add(Cropping2D(cropping=((70,25),(0,0))))] and using the left and right images as center images by adding and subtracting 0.2 from the steering angle. Instead of 1156 I used 256 for the first dense layer in the final model.
