We are building a system for retrieving video clips via natural language queries. An automated video analysis system tracks the movements of people in surveillance video and stores the resulting tracks in a database. A natural language interpretation system converts queries such as "from the dining room to the sink" into semantic path filters, which are then used to find the tracks in the database that match the description.
The system can be used to search large video archives for specific human behaviors and can be adapted to other forms of geospatial data such as GPS logs.
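
To make the idea of a semantic path filter concrete, the following is a minimal sketch and not the system's actual implementation. It assumes that tracks are stored as sequences of (x, y) floor-plan coordinates, that named locations are axis-aligned rectangles, and that a query of the form "from the A to the B" matches any track that visits A and later visits B. The names `Region`, `REGIONS`, `TRACKS`, `parse_query`, `matches`, and `search`, as well as all coordinates, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A named, axis-aligned rectangle on the floor plan (assumed representation)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point):
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Hypothetical regions; a real deployment would take these from an annotated floor plan.
REGIONS = {
    "dining room": Region("dining room", 0, 0, 4, 4),
    "sink": Region("sink", 8, 0, 10, 2),
}

# Hypothetical track database: track id -> ordered list of (x, y) positions.
TRACKS = {
    "track-1": [(1, 1), (5, 1), (9, 1)],  # dining room, then sink
    "track-2": [(9, 1), (5, 1), (1, 1)],  # sink, then dining room
    "track-3": [(5, 5), (6, 6)],          # visits neither region
}


def parse_query(query):
    """Convert 'from the A to the B' into an (origin, destination) pair of regions."""
    pattern = r"from (?:the )?(.+?) to (?:the )?(.+)"
    match = re.fullmatch(pattern, query.strip().lower())
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    origin, destination = match.groups()
    return REGIONS[origin], REGIONS[destination]


def matches(track, origin, destination):
    """Return True if the track enters origin and reaches destination afterwards."""
    for i, point in enumerate(track):
        if origin.contains(point):
            # Checking from the first visit to origin covers every later visit too.
            return any(destination.contains(p) for p in track[i + 1:])
    return False


def search(query, tracks):
    """Return the ids of all tracks that satisfy the path filter derived from the query."""
    origin, destination = parse_query(query)
    return [tid for tid, track in tracks.items() if matches(track, origin, destination)]


if __name__ == "__main__":
    print(search("from the dining room to the sink", TRACKS))
```

Because the filter is direction-sensitive, reversing the query to "from the sink to the dining room" selects the track that moves the other way. A regular expression stands in here for the natural language interpretation step only to keep the sketch short; it handles a single phrasing.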