OpenStack 02/11/2014 (a.m.)
DARPA seeks the Holy Grail of search engines
DoD announces that they want to go beyond Google. Lots more detail in the proposal description linked from the article. Interesting tidbits: [i] the dark web is a specific target; [ii] they want the ability to crawl web pages blocked by robots.txt; [iii] they want to be able to search page source code and comments.
- - By Paul Merrell
The scientists at DARPA say the current methods of searching the Internet for all manner of information just won't cut it in the future.
Today the agency announced a program that would aim to totally revamp Internet search and "revolutionize the discovery, organization and presentation of search results."
Specifically, the goal of DARPA's Memex program is to develop software that will enable domain-specific indexing of public web content and domain-specific search capabilities. According to the agency the technologies developed in the program will also provide the mechanisms for content discovery, information extraction, information retrieval, user collaboration, and other areas needed to address distributed aggregation, analysis, and presentation of web content.
Memex also aims to produce search results that are more immediately useful to specific domains and tasks, and to improve the ability of military, government and commercial enterprises to find and organize mission-critical publically available information on the Internet.
"The current one-size-fits-all approach to indexing and search of web content limits use to the business case of web-scale commercial providers," the agency stated.
- Limited scope and richness of indexed content, which may not include relevant components of the deep web such as temporary pages, pages behind forms, etc.; an impoverished index, which may not include shared content across pages, normalized content, automatic annotations, content aggregation, analysis, etc.
- Basic search interfaces, where every session is independent, there is no collaboration or history beyond the search term, and nearly exact text input is required; standard practice for interacting with the majority of web content, which remains one-at-a-time manual queries that return federated lists of results.
The Memex program will address the need to move beyond a largely manual process of searching for exact text in a centralized index, including overcoming shortcomings such as:
Memex would ultimately apply to any public domain content; initially, DARPA said it intends to develop Memex to address a key Defense Department mission: fighting human trafficking. Human trafficking is a factor in many types of military, law enforcement and intelligence investigations and has a significant web presence to attract customers. The use of forums, chats, advertisements, job postings, hidden services, etc., continues to enable a growing industry of modern slavery. An index curated for the counter-trafficking domain, along with configurable interfaces for search and analysis, would enable new opportunities to uncover and defeat trafficking enterprises.
DARPA said the Memex program gets its name and inspiration from a hypothetical device described in "As We May Think," a 1945 article for The Atlantic Monthly written by Vannevar Bush, director of the U.S. Office of Scientific Research and Development (OSRD) during World War II. Envisioned as an analog computer to supplement human memory, the memex (a combination of "memory" and "index") would store and automatically cross-reference all of the user's books, records and other information.
This cross-referencing, which Bush called associative indexing, would enable users to quickly and flexibly search huge amounts of information and more efficiently gain insights from it. The memex presaged and encouraged scientists and engineers to create hypertext, the Internet, personal computers, online encyclopedias and other major IT advances of the last seven decades, DARPA stated.
Posted from Diigo. The rest of Open Web group favorite links are here.
No comments:
Post a Comment