The use of multimodal (speech plus manual) control of the sensors on combinations of one, two, three or five simulated unmanned vehicles (UVs) is explored. Novice controllers of simulated UVs complete a series of target checking tasks. Two experiments compare speech and gamepad control for one, two, three or five UVs in a simulated environment. Increasing the number of UVs has an impact on subjective ratings of workload (measured by the NASA-Task Load Index), particularly when moving from one to three UVs. Objective measures of performance show that participants tended to issue fewer commands as the number of vehicles increased (when using gamepad control). While performance with a single UV was superior to that with multiple UVs, there was little difference across two, three or five UVs. Participants with low spatial ability (measured by the Object Perspectives Test) showed an increase in time to respond to warnings when controlling five UVs. Combining speech with gamepad control of sensors on UVs leads to superior performance on a secondary (respond-to-warnings) task (implying a reduction in demand) and use of fewer commands on primary (move-sensors and classify-target) tasks (implying more efficient operation). Statement of Relevance: Benefits of multimodal control for unmanned vehicles are demonstrated. When controlling sensors on multiple UVs, participants with low spatial orientation scores have problems. It is proposed that the findings of these studies have implications for the selection of UV operators and suggest that future UV workstations could benefit from multimodal control.