New search engines are improving the quality of results by delving deeper into the storehouse of materials available online, by sorting and presenting those results better, and by tracking your long-term interests so that they can refine their handling of new information requests. In the future, search engines will broaden content horizons as well, doing more than simply processing keyword queries typed into a text box. They will be able to take your location into account automatically, letting your wireless PDA, for instance, pinpoint the nearest restaurant when you are traveling. New systems will also find just the right picture faster by matching your sketches to similar shapes. They will even be able to name that half-remembered tune if you hum a few bars.

First, prospective content is identified and collected on an ongoing basis. Special software code called a crawler probes pages published on the Web, retrieves them along with the pages they link to, and aggregates the pages in a single location. In the second step, the system counts relevant words and establishes their importance using various statistical techniques. Third, a highly efficient data structure, or tree, is generated from the relevant terms, associating those terms with specific Web pages. When a user submits a query, it is the completed tree, also known as an index, that is searched, not individual Web pages. The search starts at the root of the index tree, and at every step a branch of the tree (representing many terms and related Web pages) is either followed or eliminated from consideration, so the set of candidates shrinks exponentially and lookups stay fast even for an enormous index.
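To make the three steps concrete, here is a minimal Python sketch, assuming a toy in-memory collection of pages rather than a real Web crawl; the `TrieIndex` class and the sample pages are invented for illustration, not taken from any actual engine. Each character of a query term selects a single branch of the tree, and every other branch, along with all the pages reachable only through it, drops out of consideration.

```python
class TrieIndex:
    """Prefix tree mapping index terms to the pages that contain them."""

    def __init__(self):
        self.children = {}   # one child branch per character
        self.pages = set()   # pages associated with the term ending here

    def add(self, term, page):
        node = self
        for ch in term:
            node = node.children.setdefault(ch, TrieIndex())
        node.pages.add(page)

    def lookup(self, term):
        # At each step a single branch is followed; every other branch
        # (and the pages reachable only through it) is eliminated.
        node = self
        for ch in term:
            node = node.children.get(ch)
            if node is None:
                return set()
        return node.pages


# Step 1: "crawled" content, gathered in one place (toy data).
pages = {
    "example.com/menu": "restaurant menu pasta pizza",
    "example.com/map": "map of nearby restaurant locations",
}

# Steps 2 and 3: extract terms from each page and build the index tree.
index = TrieIndex()
for url, text in pages.items():
    for term in text.lower().split():
        index.add(term, url)

# Query time: only the index is consulted, never the original pages.
print(index.lookup("restaurant"))
```

A production index would also record term frequencies and positions to support the statistical ranking mentioned above, but the branch-following search works the same way.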

Much of the digital content available today remains inaccessible because many of the systems hosting (holding and handling) that material do not store Web pages as users normally view them. Instead, these resources generate Web pages on demand as users interact with them. Typical crawlers are stumped by such resources and fail to retrieve any of their content. That keeps a huge amount of information concealed from users: approximately 500 times the size of the conventional Web, according to some estimates. Efforts are under way to make it as easy to search this "hidden Web" as the visible one.
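The difficulty can be illustrated with a toy example; the catalog, the static page and the function names below are invented for the sketch, not taken from any real site. A crawler that only follows links sees nothing behind a search form, because the result pages come into existence only after a query is submitted.

```python
import re

# A pretend database-backed site: its contents live in a catalog,
# not in stored Web pages.
CATALOG = {
    "kind of blue": "Kind of Blue - Miles Davis, 1959",
    "blue train": "Blue Train - John Coltrane, 1957",
}

# The only page a crawler can fetch by following links: a search form.
STATIC_PAGE = '<form action="/search"><input name="q"></form>'

def render_result_page(q):
    """Generated on demand, only when someone submits the form."""
    hit = CATALOG.get(q.lower(), "no matches")
    return f"<html><body>{hit}</body></html>"

def naive_crawler(page):
    """Follows href links only; forms are ignored, so nothing is found."""
    return re.findall(r'href="([^"]+)"', page)

print(naive_crawler(STATIC_PAGE))          # [] -- the catalog stays hidden
print(render_result_page("Kind of Blue"))  # visible only via a query
```

Searching the hidden Web therefore means teaching software to fill in forms or query databases directly, rather than simply harvesting stored pages.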

Good sources of information on personal interests are the records of a user's Web browsing behavior and of other interactions with common applications on his or her computer. As a person opens, reads, plays, views, prints or shares documents, engines could track those activities and use them to guide searches on particular subjects. This process resembles the implicit search function developed by Microsoft. PowerScout and Watson were the first systems introduced that could integrate searches with user-interest profiles generated from such indirect sources. PowerScout has remained an unreleased laboratory system, but Watson seems to be nearing commercialization. Programmers are now developing more sophisticated software that will collect interaction data over time and then generate and maintain a user profile to predict future interests.
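A rough sketch of how such a profile might work follows. The class, the term-frequency profile and the query-expansion rule are illustrative assumptions for this article, not the actual PowerScout or Watson algorithms.

```python
from collections import Counter

class InterestProfile:
    """Accumulates terms from documents the user interacts with."""

    def __init__(self):
        self.term_counts = Counter()

    def observe(self, document_text):
        # Called whenever the user opens, reads, prints or shares a document.
        self.term_counts.update(document_text.lower().split())

    def expand_query(self, query, extra_terms=2):
        # Append the user's most frequent interest terms to an explicit query.
        hints = [t for t, _ in self.term_counts.most_common()
                 if t not in query.lower().split()][:extra_terms]
        return query + " " + " ".join(hints)


profile = InterestProfile()
profile.observe("itinerary for jazz festival in montreal")
profile.observe("montreal restaurant reviews and jazz club listings")

print(profile.expand_query("hotels"))   # e.g. "hotels montreal jazz"
```

In practice such systems weigh recency and discard common words, but the principle is the same: interaction records quietly bias the search toward what the user has recently cared about.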