Ford’s Voice-Driven Car May Be Coming Soon

Pretty soon, you might be able to tell your vehicle what to do, and it’ll listen! At least, that’s what a patent recently granted to Ford suggests.

Ford has been granted a patent for self-driving tech that will allow you to control your vehicle with just voice commands. No, not just change the radio station or turn on the seat heaters, but have the vehicle accelerate, change lanes, stop, etc…

In this article, we’ll be discussing Vehicle Language Processing by Ford, US patent 10,957,317. The publication date is March 23rd, 2021 and the filing date is Oct. 18th, 2020. This patent has been granted as novel by the USPTO.

Background

In its simplest form, Natural Language Processing (NLP) is a computing method where computers can communicate with humans via natural speech. A person can make a request to their phone, and the phone will respond with a phrase or action based on an algorithm with a confidence score. We do it all the time. The confidence score is a value assigned to an output of an NLP system where the computer says “…I’m 90% sure this is what you’re talking about, so here you go…”.

It’s an extremely complex system that has dramatically improved over the last 20 years. Anyone who remembers the mid-to-late 2000s remembers talking to a phone and having it do the exact opposite of what you requested. It wasn’t even frustrating; you just never used it again.

Well, those days are long gone. NLP has advanced to a point where we can converse with our electronics almost seamlessly. Now that NLP works as it does, it’s only logical to apply it to a car.

Intro

Ford, along with almost every other car manufacturer, already uses NLP in their cars to do things like adjust the radio, call people, or control the climate settings. These are pretty low-risk situations. If your car changes to the wrong radio station, no big deal. But what if you could control the movement of your car with your voice?

Ford is introducing a system to do just that. You’ll be able to control the brakes, acceleration, steering wheel, etc… with just your voice. Isn’t that a little scary? In addition to controlling your car, Ford is also heavily addressing ambient noise with this system, where the car can filter out the background noise of a driver’s command for a clearer understanding of speech.

Intended Novelty

The intended novelty here relates to the process of understanding natural language. The system uses a vehicle noise model (a filter), which is trained to recognize voices and any non-useful noise coming from the car. The system takes in a voice command mixed with noise, filters out the noise, and assigns the voice command a confidence score. The key point is that the confidence score is determined only after the vehicle noise model has been applied.

Why

Ford doesn’t have an explicit problem statement, but it’s pretty obvious that they’re working on improving voice commands, especially if the voice commands control the car. If I were a betting man, I’d say Ford wanted to use an already-existing speech-to-text system, but it didn’t work well enough to justify operating a car with it. So, they created this noise filter to improve the confidence scores of a command phrase, which may allow the system to actually operate a car.

What

First, the system will only work in an autonomous mode. Ford defines its position on what semi- and fully-autonomous vehicles are:

…an autonomous mode is defined as one in which each of vehicle propulsion (e.g., via a powertrain including an internal combustion engine and/or electric motor), braking, and steering are controlled by one or more vehicle computers; in a semi-autonomous mode the vehicle computer(s) control(s) one or two of vehicle propulsion, braking, and steering. In a non-autonomous vehicle, none of these are controlled by a computer.
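
Ford’s definition boils down to counting how many of the three functions a computer controls. Here is a minimal sketch of that rule in Python. The names DrivingMode and classify_mode are my own for illustration and do not come from the patent:

```python
from enum import Enum


class DrivingMode(Enum):
    NON_AUTONOMOUS = "non-autonomous"
    SEMI_AUTONOMOUS = "semi-autonomous"
    AUTONOMOUS = "autonomous"


def classify_mode(propulsion: bool, braking: bool, steering: bool) -> DrivingMode:
    """Each flag is True when a vehicle computer controls that function."""
    controlled = sum([propulsion, braking, steering])
    if controlled == 3:
        return DrivingMode.AUTONOMOUS       # all three are computer-controlled
    if controlled >= 1:
        return DrivingMode.SEMI_AUTONOMOUS  # one or two are computer-controlled
    return DrivingMode.NON_AUTONOMOUS       # none are computer-controlled
```

For example, a car with computer-controlled braking only (think automatic emergency braking) would come back as semi-autonomous under this reading.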

It’s no secret cars are going autonomous, and Ford states they already have systems that can control the path of a car based on ‘external environment’ data. The car is designed to detect road information, including other cars and pedestrians. Then, the car can calculate a path to a destination based on the detected information. They do this by creating a ‘path polynomial’. None of this is a surprise.
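
To make the ‘path polynomial’ idea a bit more concrete, here is a rough sketch. I’m assuming the polynomial gives the car’s lateral offset as a function of distance ahead. The exact form and the coefficients below are my assumptions, not something taken from the patent:

```python
def eval_path(coeffs, x):
    """Evaluate y = c0 + c1*x + c2*x**2 + ... with Horner's method.

    coeffs: [c0, c1, c2, ...] (hypothetical path polynomial coefficients)
    x:      distance ahead of the car, in meters
    returns the lateral offset y, in meters
    """
    y = 0.0
    for c in reversed(coeffs):
        y = y * x + c
    return y


# Made-up example paths, purely for illustration
straight_ahead = [0.0, 0.0, 0.0, 0.0]  # no lateral movement
gentle_drift = [0.0, 0.0, 0.01, 0.0]   # y = 0.01 * x**2
```

The appeal of a representation like this is that changing the car’s path means changing a handful of numbers, which is exactly the kind of thing a voice command could trigger.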

The reason for this document is that the driver can now control the path polynomial with their voice. The system will use a speech-to-text system, then apply an NLP algorithm to determine what you said. Then, the car will act on what you said. Ford offered a concise example:

…spoken language commands such as “turn left”, “speed up”, “slow down”, etc. can be spoken by an occupant. These spoken language commands can be acquired and processed to determine vehicle commands that can be received by a computing device and interpreted to provide information that can be used to direct the operation of the vehicle.
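
As a toy illustration of that last step, turning recognized text into a vehicle command, here is a sketch using simple phrase matching. A real NLP system does far more than this. The command names and the to_vehicle_command function are hypothetical; only the three spoken phrases come from Ford’s example:

```python
from typing import Optional

# Spoken phrases from the patent's example, mapped to made-up command names
COMMANDS = {
    "turn left": "TURN_LEFT",
    "speed up": "ACCELERATE",
    "slow down": "DECELERATE",
}


def to_vehicle_command(transcript: str) -> Optional[str]:
    """Return a vehicle command for the transcript, or None if nothing matches."""
    # Normalize case and collapse repeated whitespace before matching
    text = " ".join(transcript.lower().split())
    for phrase, command in COMMANDS.items():
        if phrase in text:
            return command
    return None
```

Anything the matcher doesn’t recognize returns None, so the car would simply do nothing rather than guess.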

Figure 1 (above) shows the entire system in question. In this figure, we’re not worried about the network, server, or mobile device. We only care about Vehicle 110 and everything in it. The V-to-I interface is a voice-to-interface system. This is the general term for the voice-to-text-to-NLP-to-command system. Notice how this interface is attached to the powertrain, brakes, steering, computer, and sensors. This means you’ll be able to control the engine, brakes, and steering with your voice. Crazy!

Ford also states that the NLP system can be Apple’s Siri or Amazon’s Alexa. Using one of these makes sense: given how much money these companies pour into NLP, they’re likely among the best available, though licensing is probably pricey.

The sensors around the car will affect how the car will react to your command. For example, if you say “car, accelerate to 200 mph” and the car knows there’s another car in front of it, your car won’t accelerate based on your stupidity. Ford states the sensors can be:

…altimeters, cameras, LIDAR, radar, ultrasonic sensors, infrared sensors, pressure sensors, accelerometers, gyroscopes, temperature sensors, pressure sensors, hall sensors, optical sensors, voltage sensors, current sensors, mechanical sensors such as switches, etc.

Very interestingly, along with normal sensing data on steering position, brake pressure, etc., the sensors can also sense weather information:

… [the] sensors can detect phenomena such as weather conditions (precipitation, external ambient temperature, etc.), the grade of a road, the location of a road (e.g., using road edges, lane markings, etc.), or locations of target objects such as neighboring vehicles.
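
Going back to the “accelerate to 200 mph” example, the sensor check might look something like the sketch below. The function name, the gap-based rule, and the 50-meter threshold are all assumptions on my part. The patent excerpts here don’t spell out the actual logic:

```python
from typing import Optional


def approve_acceleration(requested_mph: float,
                         current_mph: float,
                         lead_gap_m: Optional[float],
                         min_gap_m: float = 50.0) -> float:
    """Return the speed the car will actually target.

    lead_gap_m is the sensed distance to a vehicle ahead, or None if the
    sensors see nothing in front. min_gap_m is a hypothetical safety margin.
    """
    if lead_gap_m is not None and lead_gap_m < min_gap_m:
        return current_mph  # a car is too close ahead: ignore the request
    return requested_mph
```

So with a car 20 meters ahead, a request for 200 mph just leaves you at your current speed.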

So, let’s say you’re on the highway, and let’s say you’ve told the car to change lanes, and the sensors have all their environmental data — what’s next? In this case, the computer will assign a lane, assign a speed, and assign the adjacent lane with an ‘empty or full’ tag. If the lane is empty, the car will change lanes. If not, the car will stay where it is.

…in traffic scene, vehicle state can include “in lane 204”, “speed=target speed”, and “adjacent lane 206=empty”… the rule-based state machine can output a vehicle command equal to “perform left lane change” to computing device…

Not only can the car act on your command, but it can also talk back to you. In the example above, if the lane is open, the car will tell you it’s going to change lanes in a certain time frame.

…[the system] can also output a message to an occupant of a vehicle 110, to inform the occupant of a vehicle command…. [for example, the system may respond with] a message that reads “left lane change in t seconds”…
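
The lane-change rule and the spoken confirmation can be sketched together as a tiny rule-based check. The state fields mirror the patent’s example (“in lane 204”, “speed=target speed”, “adjacent lane 206=empty”), but the class, the function, and the fallback message are hypothetical:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VehicleState:
    lane: int                  # e.g. 204 in the patent's example
    at_target_speed: bool      # "speed=target speed"
    adjacent_lane_empty: bool  # "adjacent lane 206=empty"


def handle_left_lane_change(state: VehicleState, t_seconds: float) -> Tuple[str, str]:
    """Return (vehicle command, message for the occupant)."""
    if state.at_target_speed and state.adjacent_lane_empty:
        message = f"left lane change in {t_seconds:g} seconds"
        return "perform left lane change", message
    # Hypothetical fallback: the article only says the car stays where it is
    return "stay in lane", "lane change not possible right now"
```

A real state machine would have many more states and rules, but the shape is the same: sensed state plus a request goes in, a command and a message come out.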

In a deeper dive into the idea, Ford will apply a confidence score to the command. This is important because this is where the novelty is. Remember, a confidence score is just a value assigned to a command, where the value says “I’m pretty sure this is what you’re talking about, let’s hope I’m right or we’re both in trouble.”

The table below shows an example phrase with assigned confidence scores. The higher the score, the more confident the car is in acting on your command. And yes, Ford has named their car ‘Henry.’ How fitting!
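
Here is a small sketch of how a score like this might be used. The system keeps the best-scoring interpretation and only acts if it clears a threshold. The 0.8 threshold and the pick_command name are assumptions for illustration, not values from the patent:

```python
from typing import List, Optional, Tuple


def pick_command(candidates: List[Tuple[str, float]],
                 threshold: float = 0.8) -> Optional[str]:
    """Pick the highest-confidence command, or None if it is not confident enough.

    candidates: (command, confidence) pairs with confidence between 0 and 1.
    threshold:  hypothetical minimum confidence required to act.
    """
    if not candidates:
        return None
    command, score = max(candidates, key=lambda pair: pair[1])
    return command if score >= threshold else None
```

With made-up scores, pick_command([("turn left", 0.92), ("turn right", 0.05)]) would act on “turn left”, while a lone candidate at 0.55 would be ignored.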

The interesting part of this particular idea is that Ford is also applying a vehicle noise model (really, it’s a filter), where the car will recognize a voice and recognize every other noise that shouldn’t be used. It’ll filter out the noise, then provide the confidence score. I can almost guarantee you that this model improves confidence scores by a significant amount, so it’ll be a great component of this system.

In short, the vehicle noise model will detect ambient noise such as road noise, wind noise, other conversations, music, etc., along with the voice, to produce a ‘noisy spoken language command’. The system will then separate the noise and the command, and only supply the command to the NLP system. Pretty cool that they’ve got an idea to solve this issue.
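
The order of operations is the important bit: separate first, recognize second. The sketch below shows that flow with a deliberately crude stand-in for the noise model. It just subtracts a known noise estimate sample by sample, which is not Ford’s trained model. Every name here is hypothetical:

```python
from typing import Callable, List, Tuple

Audio = List[float]


def subtract_noise(noisy: Audio, noise_estimate: Audio) -> Audio:
    """Toy stand-in for the vehicle noise model: sample-wise subtraction."""
    if len(noisy) != len(noise_estimate):
        raise ValueError("signals must have the same length")
    return [s - n for s, n in zip(noisy, noise_estimate)]


def process_command(noisy: Audio,
                    noise_estimate: Audio,
                    recognizer: Callable[[Audio], Tuple[str, float]]) -> Tuple[str, float]:
    """Filter the noise first, then hand only the command audio to the NLP step."""
    command_audio = subtract_noise(noisy, noise_estimate)
    return recognizer(command_audio)  # returns (command, confidence score)
```

The recognizer is passed in as a plain function, so the same flow would work whether the NLP step is an in-house system or a third-party one.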

Conclusion

In the end, Ford is developing a system to control your car with your voice and improve voice recognition with a noise filter. It may seem mundane to an outside viewer, but this is an important step toward a usable voice control system for your car. Even more so, if this does end up working well, it’ll probably be applied to every voice command in a car, not just operating commands (if it hasn’t already).

If you’re nervous about this prospect, remember this is Ford we’re talking about. They’re not going to release anything like this unless they’re 100% certain it works 99.9% of the time.

But, let’s also not forget that your commands will probably be logged on Ford’s servers. Everything you say, just like Siri and Alexa, will be saved on their computers. I’m certain it’ll be used to train future systems to operate more accurately, but how do you feel about this?

Even more so, the big question is: would you use this in your car?