I have been talking to a friend about building a self-driving car and told him that I could show him how to build one in a few short sessions. He seemed interested, so I decided to write up some tutorials showing how it's done. I am starting from the end, because that is the part I am currently working on. This, and a lot of other interesting material, is taught in the Udacity Self Driving Car Engineer Nanodegree program. Check that out at Udacity.com.
This is how it looks when I used the network (with some max-pooling and dropout layers added) for a fully autonomous drive in the Udacity simulator:
In this tutorial I am using the convolutional neural network developed at NVIDIA research (this is a tutorial, so we should build on existing research work rather than creating our own). The paper can be found at NVIDIA Self Driving Car.
Below is how their neural network looks.
I'll go step by step through building the network. The image shows the network bottom-up: at the bottom we feed in a 66x200 image with 3 color channels (RGB), which is then normalized. We'll start from the normalization layer, so our input size is 66x200 with a depth of 3.
First, let's take a camera image.
This image is of size 160x320. We resize it to 100x200 and crop off the top 34 pixels, leaving a 66x200 image. This can be done with OpenCV like below:
import cv2
import matplotlib.pyplot as plt

image = cv2.imread("./sample.jpg")   # note: OpenCV loads images as BGR, not RGB
img = cv2.resize(image, (200, 100))  # cv2.resize takes (width, height)
crp = img[34:, :]                    # drop the top 34 rows -> 66x200
plt.imshow(crp)
The cropping can also be done inside the model, so that it runs on the GPU. If we feed the resized 100x200 image to the model, the equivalent first layer in the model we build below would be:
net.add(Cropping2D(cropping=((34, 0), (0, 0))))  # crop 34 rows from the top, 0 from the bottom
And we get an image like this with shape 3@66x200:
Now we will use Keras to build the neural network. In my setup I am using TensorFlow as the Keras backend. Let's create a sequential network:
input_shape = (66, 200, 3)
net = Sequential()
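These snippets assume the Keras 1.x API (that is where the Convolution2D/border_mode syntax used below comes from), so the imports they rely on would look roughly like this:
from keras.models import Sequential
from keras.layers.core import Lambda, Flatten, Dense
from keras.layers.convolutional import Convolution2D, Cropping2D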
Now let's add the normalization layer:
net.add(Lambda(lambda x: x / 255.0 - 0.5, input_shape=input_shape))  # scale pixels to [-0.5, 0.5]
From the network image above, the first convolutional layer uses a 5x5 kernel with a 2x2 stride. We'll use ReLU activation, which is a function that simply sets all negative values to zero:
layer1 = Convolution2D(24, 5, 5, subsample=(2, 2), border_mode='valid', activation='relu')
net.add(layer1)
#output size = 24@31x98
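To see what ReLU does concretely, here is a quick check (NumPy is assumed here just for the demonstration; it is not otherwise needed for the model):
import numpy as np
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, x))  # ReLU keeps positives and zeroes out negatives: 0, 0, 0, 1.5, 3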
From the network image we see that there are 4 more convolutional layers: two more 5x5 layers with 2x2 strides, then two 3x3 layers with single strides:
net.add(Convolution2D(36, 5, 5, subsample=(2, 2), border_mode='valid', activation='relu'))
#output size = 36@14x47
net.add(Convolution2D(48, 5, 5, subsample=(2, 2), border_mode='valid', activation='relu'))
#output size = 48@5x22
net.add(Convolution2D(64, 3, 3, border_mode='valid', activation='relu'))
#output size = 64@3x20
net.add(Convolution2D(64, 3, 3, border_mode='valid', activation='relu'))
#output size = 64@1x18
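The output sizes in the comments above follow from the 'valid' convolution formula: out = floor((in - kernel) / stride) + 1. A small sanity check (plain Python, the helper name is mine):
def conv_out(size, kernel, stride):
    # 'valid' convolution: no padding
    return (size - kernel) // stride + 1

h, w = 66, 200
for kernel, stride in [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]:
    h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    print(h, w)  # 31 98, 14 47, 5 22, 3 20, 1 18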
Now we add the flatten layer, which turns the 64@1x18 output into a vector of 1152 values:
net.add(Flatten())
Simple, huh? Now we add a fully connected layer (Dense layer) of size 1156:
net.add(Dense(1156))
We then add the remaining 4 dense layers as shown in the network image:
net.add(Dense(100))
net.add(Dense(50))
net.add(Dense(10))
net.add(Dense(1))
That's it. We have built our network. Let's compile it and print a summary to make sure we have done it right:
net.compile(loss='mean_squared_error', optimizer='adam')
net.summary()
Here is the summary output:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
lambda_1 (Lambda)                (None, 66, 200, 3)    0           lambda_input_1[0][0]
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 31, 98, 24)    1824        lambda_1[0][0]
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 14, 47, 36)    21636       convolution2d_1[0][0]
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 5, 22, 48)     43248       convolution2d_2[0][0]
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 3, 20, 64)     27712       convolution2d_3[0][0]
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D)  (None, 1, 18, 64)     36928       convolution2d_4[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 1152)          0           convolution2d_5[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1156)          1332868     flatten_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 100)           115700      dense_1[0][0]
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 50)            5050        dense_2[0][0]
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 10)            510         dense_3[0][0]
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 1)             11          dense_4[0][0]
====================================================================================================
Total params: 1,585,487
Trainable params: 1,585,487
Non-trainable params: 0
____________________________________________________________________________________________________
Looks good. Now we can collect training data by driving the car with a camera attached and a way to record the steering angle, and then train the network.
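As a rough sketch of what that training step could look like (the driving_log.csv layout, its column order, and the preprocess helper are my assumptions, not something defined in this tutorial):
import csv
import cv2
import numpy as np

def preprocess(image):
    # resize to 100x200, crop the top 34 rows, convert BGR -> RGB
    img = cv2.resize(image, (200, 100))
    return cv2.cvtColor(img[34:, :], cv2.COLOR_BGR2RGB)

images, angles = [], []
with open("driving_log.csv") as f:  # hypothetical log: image path, steering angle
    for path, angle in csv.reader(f):
        images.append(preprocess(cv2.imread(path)))
        angles.append(float(angle))

X_train = np.array(images)  # shape: (samples, 66, 200, 3)
y_train = np.array(angles)  # one steering angle per image

# hold out 20% for validation; normalization happens inside the model (the Lambda layer)
net.fit(X_train, y_train, validation_split=0.2, shuffle=True, nb_epoch=5, batch_size=64)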
The output of the network is a steering angle. So, given a new image, the network will tell us what the car's steering angle should be. With the right training, the network should be able to steer a car, provided the mechanical/electrical components to turn the wheel are in place.
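That prediction step is just the same resize-and-crop preprocessing followed by a call to predict (again a sketch; the file name is hypothetical):
import cv2
import numpy as np

frame = cv2.imread("./new_frame.jpg")        # hypothetical new camera frame
img = cv2.resize(frame, (200, 100))[34:, :]  # same resize + crop as before
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# predict expects a batch, so add a leading dimension; normalization is inside the model
steering_angle = float(net.predict(img[np.newaxis, ...], batch_size=1))
print(steering_angle)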
Now sit back and relax while the car is being driven by the network.