By Earnest Cavalli

WIRED Blog Network

A new immersive web platform called Vivaty Scenes lets users create tiny virtual worlds and decorate them with content from around the Internet.

After adding Vivaty Scenes, which entered public beta Tuesday, to a Facebook or AOL Instant Messenger account, users can set up a customizable “room” where they can host chat sessions or small virtual gatherings within a web browser.

It’s a Friday afternoon, and I’d like to clean up my desktop with a list o’ links I’ve found interesting over the past few weeks:

The Personal Bee aggregator for VC & Startup news. Feels a bit like some of the concepts behind Newroo/Fox. Here are several stories I got to from there:

  • the MIT/Lemelson Prize for inventors goes to an LCD pioneer from Menlo Park.
  • Sequoia, via Kedrosky:

    E&Y: Why is it crazy that LPs are willing to invest so much in venture capital?

    Leone: The returns have been miserable. If you take away a couple of exits, such as Google and MySpace, there haven’t been meaningful returns generated. There are [venture] firms that have never generated a positive return or have not even returned capital in 10 years that are raising money successfully. And that surprises the heck out of me. People talk about the top quartile — it’s not about the top quartile, it’s barely about the top decile, or even a smaller subset than that.

  • Khosla Ventures actually does still invest in computer-related stuff, not just the cool new life- and green- sciences, from this BusinessWeek blogpost by Justin Hibbard:

    one of his inaugural portfolio companies was SkyBlue Technologies, Inc. The Redwood City (Calif.) startup was founded a year ago by Stanford U. computer science professor Monica S. Lam and her fellow researchers, who are developing open-source virtualization software that lets systems administrators remotely manage PCs. Traditionally, companies have used programs like CA’s Unicenter or HP OpenView for this task. Virtualization sacrifices some performance to keep the management program running independently from the PC operating system, which can become unstable. It’s a clever use of an under-exploited technology that has had a recent resurgence on server computers and has produced at least one recent hit startup, VMWare. SkyBlue calls its class of software ready-to-run (R2R) and has launched a portal site, itCasting, to promote collaboration on R2R software. William J. Raduchel, CEO of Ruckus Network and former CTO at AOL Time Warner, is on SkyBlue’s board. The company raised $1 million last August and $2.26 million from Khosla and others in March.

    [In other recent manageability news, Intel announced vPro, a desktop featureset hopefully-analogous to Centrino, raising the possibility of yet-more feature wars such as XML processing smarts on the server side.]

The New York Times recently had a piece on academics investigating the IBM-sponsored “services science” field.

ComputerWorld’s Gary Anthes, a dedicated reporter on the research-and-innovation beat, wrote a piece on the looming anniversaries of the oldest CS departments:

  • John Canny, chairman of the electrical engineering and computer science department, University of California, Berkeley: “Computers aren’t very valuable yet, because the applications they perform are still elementary and routine. It’s actually remarkable how much we spend on IT, considering how little it does. The most widespread applications are still e-mail and Microsoft Office. That should tell us something.

    What we really need to be thinking about is what people are doing with computers and how we could help them to do those things much better. Since most people are doing knowledge tasks, that means machines understanding their owners’ work processes much more deeply, finding semantically appropriate resources with or without being asked, critiquing choices and suggesting better ones, and tracking synergies with other groups within a large organization. Computers will leverage the human resources in the company more at a knowledge level. They will directly tie what they do to the creative processes of employees. The economic impact of that would be much bigger than anything we have seen so far. ”

  • Jaime Carbonell, director of the Language Technologies Institute, Carnegie Mellon University: “Artificial intelligence. Although those words may be somewhat out of fashion these days, much of the deep excitement and universally useful apps descend therefrom. For example: speech understanding and synthesis in handheld devices, in cars, in laptops; machine translation of text and spoken language; new search engines that find what you want, not just Web pages that contain query words; self-healing software, including adaptive networks that reconfigure for reliability; robotics for mine safety, planetary exploration; prosthetics for medical/nursing care and manufacturing; game theory for electronic commerce, auctions and their design to ensure fairness and market liquidity and maximize aggregate social wealth.”
  • Bernard Chazelle, professor of computer science, Princeton University: I roll my eyes when I hear students say, “CS is boring, so I’ll go into finance.” Do they know how dull it is to spend all-nighters running the numbers for a merger-and-acquisition deal? No.
  • Canny: We’re losing in quality — principally to bioengineering, which is now the best students’ top choice — and diversity. It’s a problem of social relevance. Minorities and women moved fastest into areas such as law and medicine that have obvious and compelling social impact. We’ve never cared much about social impact in CS.
  • Chazelle: Much of the curriculum is antiquated. Why are we still demanding fluency in assembly language today for our CS majors? Some curricula seem built almost entirely around the mastery of Java. This is criminal.

    The curriculum is changing to fulfill the true promise of CS, which is to provide a conceptual framework for other fields. Students need to understand there’s more, vastly more, to CS than writing the next version of Windows. For example, at Princeton, we have people who major in CS because they want to do life sciences or policy work related to security, or even high-tech music. In all three cases, we offer tracks that allow them to acquire the technical background to make them intellectually equipped to pursue these cross-disciplinary activities at the highest level.

  • Carbonell: CS needs a great communicator who lives the excitement, is deeply respected by his or her peers, and can reach out and communicate clearly with any educated person via his books. We have no such person in CS. Perhaps Raj Reddy [a Carnegie Mellon computer science professor] has the right kind of talents.

Finally, please don’t miss Bill Burnham’s excellent survey of opportunities to push ‘persistent search’ forward.

This afternoon, PubSub and Broadband Mechanics are announcing a “structured blogging initiative” at the Syndicate conference. The press release even includes a quote in support from us here at CommerceNet:

CommerceNet believes strongly in the vision of bootstrapping a more intelligent Web by embedding semi-structured information with easy-to-author techniques like microformats. Through our own research in developing tools for finding, sharing, indexing, and broadcasting microformatted data, we appreciate the challenges these companies have overcome to offer tools that will interoperate as widely as possible. We applaud their recent decision to support the community in all of the core areas where commonly accepted schemas already exist, such as calendar entries, contact information, and reviews.

Given that we’re strong supporters of, why did we take this stand? First and foremost, for the reasons stated above: because they’re committing to shipping tools that make it easier to produce microcontent using microformats. Even if they were supporting any number of other formats, we’d be glad to welcome any new implementations to the fold.

Of course, we’d prefer to minimize any confusion, too. Many other implementations exist for microformats and are copiously documented and discussed in public forums at Clearly, the (re-)launch of a public .org site titled StructuredBlogging with aspirations to non-profit status of its own could lead to perceptions that there’s some sort of “vs.” battle going on.

That might even have been true, a few months ago when the idea-of-structured-blogging was still conflated with a debatable proposal for structured-blogging-the-format that hid chunks of isolated XML within otherwise readable documents using a <SCRIPT> tag. The major news here today that we’d like to celebrate is that they’re in favor of using microformats for all of their core, commonly-used schemas like reviews, events, and lists.

Now, is the old format still in their code tree when you grab their alpha plugin? Sure, and there will always be room for developers who really, really want to cons up their own schema out of thin air. The microformats-rest mailing list is grappling with the same problem, focusing on XOXO as a solution for now.

The more intriguing implication of their work at is their microcontent description (MCD) format — even if it’s all hReview at the bottom, there’s room for custom UIs for reviewing movies that are different from reviewing restaurants, and we’ll see if that’s where these explorations lead to…
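To make the contrast with hidden XML concrete, here is a minimal sketch of how a tool might read a review straight out of visible, class-annotated HTML. The class names `hreview`, `item`, `fn`, `rating`, and `summary` come from the hReview draft; the `SAMPLE` markup, the `ClassTextExtractor` class, and the `extract_review` helper are hypothetical, written only for illustration. It assumes well-nested markup with no void tags such as `<br>`, and it is not the plugin's actual parser.

```python
from html.parser import HTMLParser

# Hypothetical hReview fragment: the data lives in ordinary, readable markup.
SAMPLE = """
<div class="hreview">
  <span class="item"><span class="fn">Cafe Example</span></span>
  <span class="rating">4</span>
  <span class="summary">Good coffee, slow service</span>
</div>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text found inside elements carrying selected class names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []   # one entry per open tag: the wanted classes it carries
        self.fields = {}  # class name -> accumulated text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append([c for c in classes if c in self.wanted])

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Credit the text to every enclosing element with a wanted class.
        for names in self.stack:
            for name in names:
                joined = (self.fields.get(name, "") + " " + text).strip()
                self.fields[name] = joined


def extract_review(html):
    parser = ClassTextExtractor(["fn", "rating", "summary"])
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    print(extract_review(SAMPLE))
```

The same fragment renders as a normal review in any browser, which is the point of the microformats approach: one copy of the data serves both people and tools.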

The dramatic saga of Commerce One’s fundamental Web Services patents has apparently concluded with a happy-enough ending. Based on pioneering work by Robert Glushko and Marty Tenenbaum, among many others, this patent portfolio began with work spun out of CommerceNet (the nonprofit consortium) as Veo Systems, Inc.

Last year, there was a court-ordered auction of the bankrupt company’s patents to a then-unidentified high bidder who some feared would begin to use the patents to shake down the nascent Web Services industry. In time, it appeared that Novell acquired those rights, and this week, a new consortium “of five technology and consumer electronics companies – I.B.M., Novell, Philips, Red Hat and Sony – who share an interest in promoting the spread and adoption of the free Linux operating system” named Open Invention Network made them available under royalty-free licenses.

Company to Start Offering Free Use of Patents It Holds
November 10, 2005

A new patent-holding company, the Open Invention Network, is expected to begin operations today with the unusual business plan of buying certain patents and licensing them without charge.

The company has the financial backing of five technology and consumer electronics companies – I.B.M., Novell, Philips, Red Hat and Sony – who share an interest in promoting the spread and adoption of the free Linux operating system.

The chief executive of the Open Invention Network, Gerald Rosenthal, is a lawyer and a former director of I.B.M.’s intellectual property licensing program.

At I.B.M., Mr. Rosenthal led the lucrative technology-licensing program, which has routinely earned $1 billion or more in recent years. He will be pursuing a different strategy at the new company.

“By itself, this is not a money-making enterprise,” he said. “Our goal is to enable the Linux ecosystem to grow.”

As users or distributors of the free operating system, the five corporate supporters of the Open Invention Network all have a vested interest in fending off threats to Linux.

Legal challenges to Linux users have already surfaced, but none have slowed the adoption of Linux, which is used everywhere from corporate data centers to consumer devices like digital music players. Yet the legal risk, analysts say, is an uncertainty in the outlook for Linux.

In March 2003, I.B.M. was sued by the SCO Group, a Utah company, which said that I.B.M. had illegally contributed code to Linux from the Unix operating system. SCO had obtained the licensing rights to the Unix software and contends that Linux, a variant of Unix, violates its rights. SCO is seeking $1 billion in damages. I.B.M. denies the charges in the case, which is pending.

Another worry for Linux users is the rise of specialized intellectual property firms that acquire software patents, and then make money by licensing the patents as widely as possible.

That concern arose when the patents of a bankrupt dot-com company, Commerce One, were auctioned in December for $15.6 million.

Computing specialists worried that the patents dealt with technology that was broadly used in Internet commerce, and, if aggressively enforced, could result in many companies having to pay license fees. The winning bidder, however, was later identified as a lawyer representing JGR, a subsidiary of Novell.

Novell, a Linux distributor, is placing those patents in the new portfolio of the Open Invention Network, based in Pound Ridge, N.Y.

Patents owned by the Open Invention Network will be available free to any company, organization or individual that agrees not to assert its patents against others who have signed a license with the new patent-holding company. The Open Invention Network will continue to identify and acquire patents related to Linux.

Mr. Rosenthal sees the new company as a guardian of innovation in an environment for technology development.

“If you look at the Internet and Linux,” he said, “a lot of it has been the result of collaborative work without anyone really owning the intellectual property.”

Z Lab

The Z Lab is the research centre of Z Productions. Since its move from Marseille to Cardiff (UK) in September 1995, the research and development programme has focused on the co-evolution of humans and machines.

The machines are presented in art exhibitions, live performances and videos.

Sexy Robots in Venice | Gridskipper

That’s right — these two robots are doing just what you think they’re doing. And I’m showing it! To the children! This is Venice, where some robots fuck and others simply weep. Didn’t somebody say that in Death in Venice? No, well, as the Venice Biennale proves, perhaps someone should have. Specifically, the Welsh pavilion at the biennale features an installation by artist Paul Granjon called “Robotarium.” In the exhibit, two ‘sexed’ robots wander around until they go into ‘heat,’ at which point they attempt to locate each other and engage in what passes for lusty intercourse among the Roomba set: The male then starts moving his penis while the female adjusts her position to facilitate the operation. The robots emit various sounds during the mating cycle.

Academia’s quest for the ultimate search tool | CNET

By Stefanie Olsen

Story last modified Mon Aug 15 04:00:00 PDT 2005

The University of California at Berkeley is creating an interdisciplinary center for advanced search technologies and is in talks with search giants including Google to join the project, CNET has learned.

The project is one of many efforts at U.S. universities designed to address the explosive growth of Internet search and the complex issues that have arisen in the field.

U.C. Berkeley, birthplace of early search highflier Inktomi and the school where Google CEO Eric Schmidt got his computer science doctoral degree, is bringing together roughly 20 faculty members from various departments to cross-pollinate work on search technology, said Robert Wilensky, the center’s director. The principal areas of focus: privacy, fraud, multimedia search and personalization.

What’s new:
Continuing a long tradition of academic exploration in Net technology, U.C. Berkeley will soon open an interdisciplinary center for developing new search technologies. The school is talking to a number of search companies, including Google, about participating.

Bottom line:
Drawing on the expertise of faculty from various departments, Berkeley’s center will focus on privacy, fraud, multimedia and personalization as these topics relate to the increasingly diverse and in-depth information available on the Internet.

“We want to solve the problems that have been engendered by the success of search,” Wilensky said in an interview. Wilensky is a professor of computer science and information management at Berkeley.

Plans are still being worked out for the center’s physical space, but Wilensky said he hopes designs will be completed within the next few months and the center opened early next year. He also said he’s talking to Google and other search players about membership.

“If you have 20 researchers interested in search, then getting them together where they are cross-fertilizing ideas, you make something bigger than its parts. You can create a nuclear reaction,” he said.

Google declined to comment. (Google representatives have instituted a policy of not talking with CNET reporters until July 2006 in response to privacy issues raised by a previous story.)

The success of the $5 billion-a-year search-advertising business is fueling Internet research and development in many ways. The business has not only bolstered the likes of Yahoo and Google with billion-dollar annual revenues to be spent in new areas but it’s also revived hundreds of smaller dot-coms and inspired leagues of upstarts to venture into areas of specialty search.

Looking for the next generation to be born? There’s no better place to visit than academia, where today’s most successful search companies got their start. “A big source of new ideas comes out of universities,” said Geoff Yang, a venture capitalist at Redpoint Ventures, which has backed such companies as AskJeeves and TiVo.

Google and Yahoo were practically hatched in the same dorm room at Stanford University by two pairs of graduate students roughly six years apart. Lycos, a one-time search leader, came out of Carnegie Mellon University. Newer projects include Vivisimo, a clustering search tool from CMU professor Raul Valdes-Perez.

The search problems of today are different from those of five years ago. With books, scholarly papers and television programs being digitized and put online, the technology necessary to search through the material needs to be that much better. People need a way to trust the information they find and to ask more-complex questions with search tools so they can extract knowledge or ideas.

Jaime Carbonell, director of CMU’s Language Technologies Institute, said his research team is perfecting a technology for personalized search that would solve some of the privacy concerns surrounding the wide-scale collection of sensitive data, such as names and query histories. CMU’s project takes an auxiliary approach to software already being tested by commercial players like Yahoo and Google, which are collecting and storing search histories on their own networks.

CMU developed an add-on application that people download to a PC. It allows users to maintain and modify personal information, such as query history, preferences and favored sites, within a search profile. A search engine would be able to query the profile, along with the user’s search term, to deliver a set of tailored results each time, thereby keeping personal information off the network and on the client’s desktop.
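As a rough illustration of the idea in the paragraph above, here is a minimal sketch in which the profile never leaves the client and is used to re-rank generic results locally. This is only one possible reading: the article describes the search engine querying the profile, and nothing here reflects CMU's actual design. The `LocalProfile` class, the `personalize` function, the fixed boost of 1.0, and the example URLs are all assumptions.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class LocalProfile:
    """Personal data kept on the user's machine (hypothetical structure)."""
    favored_sites: set = field(default_factory=set)
    query_history: list = field(default_factory=list)

    def boost(self, url):
        # Assumed scoring rule: a flat bonus for hosts the user favors.
        host = urlsplit(url).hostname or ""
        return 1.0 if host in self.favored_sites else 0.0


def personalize(results, profile, query):
    """Re-rank (url, score) pairs from a generic engine using the local profile."""
    profile.query_history.append(query)  # history is stored locally only
    scored = [(score + profile.boost(url), url) for url, score in results]
    scored.sort(key=lambda pair: -pair[0])  # highest combined score first
    return [url for _, url in scored]


if __name__ == "__main__":
    profile = LocalProfile(favored_sites={"b.example"})
    generic = [("http://a.example/x", 0.9), ("http://b.example/y", 0.5)]
    print(personalize(generic, profile, "coffee"))
```

In this sketch the server sees only the query term; the favored sites and the query history stay in `profile` on the desktop.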

Carbonell said the technology will be ready within a year, and CMU could either offer it as open-source software or license it to industry players.

CMU is also working under a government grant on a longer-term project called Javelin, focused on question-and-answer search technology. Google, MSN, Ask Jeeves and others already help people find quick answers for word definitions or encyclopedia facts like “What is the population of Los Angeles?” But for complex queries like “What is the cheapest flight from San Francisco to London?” or “Which university has the largest computer science department?” finding answers is still like doing long division.

“This is dynamic information,” Carbonell said. “You must parse the question, look for answers in multiple places and do a comparison. There are multiple steps, and we’re looking at how to do it in one step and provide a trace for the user.”

He said it will likely take another four or five years to build such functionality that can scale computationally for wide consumer usage and deliver the kind of efficiencies the government and Internet users expect. The universities of Texas and Pennsylvania are also exploring different approaches to the same problem.

Stanford continues in its role as a breeding ground for search projects. Since 2003, Google has purchased at least two projects hatched at Stanford–personalization search tool Kaltix and a project from Anna Patterson, a Stanford computer science research associate. Stanford associate professor Andrew Ng, among others, is working on artificial-intelligence techniques for extracting knowledge from text in a search index.

Other projects have turned into young businesses. SearchFox is a Web upstart co-founded in December by James Gibbons, a longtime Stanford professor and former dean of its School of Engineering. The privately held company has created a collaborative search engine that lets people share favorite links and create personalized search indices.

Stanford, the Massachusetts Institute of Technology and many other universities are working to solve problems presented by the library of tomorrow, which will be largely digitized. Sifting through and organizing billions of digital documents will require new search technology.

MIT, for example, has teamed with the World Wide Web Consortium to create next-generation search technology using the Semantic Web, in an overarching project called Simile.

Under that umbrella, an MIT graduate student has developed a tool called Piggybank, software that plugs in to the Mozilla Foundation’s Firefox Web browser. Piggybank lets people surf the Web, tag visited sites with keywords and build a local, annotated collection that can then be published to a site called the bank. Therefore, it turns into a “Semantic Web browser” so users can expand the scope of understanding around existing information on the Web.

“A generalized data archive lets you make data work together in ways you couldn’t before,” said MacKenzie Smith, associate director for technology in the MIT libraries.

In a demonstration of what the tool could do, Piggybank integrated data from, a movie site and Google maps to show where coffee shops are located relative to restaurants and movie theaters. The tool also lets users save such information to a “database” record (rather than a bookmark) so that it can later be searched by its attributes or designated keywords.

MIT hopes to deploy the technology and other advances from Simile for use by faculty and students.

At Berkeley’s center, Wilensky has ambitious plans to solve problems within a broader definition of search. That means analyzing and organizing diverse forms of information–anything from images and video to e-commerce–and helping people synthesize it and extract knowledge.

One major area of development will be in trust and privacy. For example, how believable is the content dug up on Google or how do you know an eBay seller is truly trustworthy?

Wilensky said his group has proved that on average, eBay seller ratings are skewed based on what’s called retaliatory ratings in which people slam those who slam them. Others with black marks will disappear only to re-emerge later with a clean slate. As a result, Wilensky said, his team has built an algorithm called “EM trust” (for expectation maximization) using a statistical model for rating how honest an online seller may or may not be. That development might be applied to Web sites as well.
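The article gives no details of the “EM trust” model, so the following is only a generic sketch of expectation maximization applied to seller feedback, not Wilensky's algorithm. It assumes each seller is either honest or dishonest, that every rating is an independent positive/negative draw, and that the two groups differ only in their rate of positive feedback. It does not model retaliatory ratings at all. The function name `em_trust`, the starting values, and the sample data are all made up.

```python
import math


def em_trust(ratings, iters=50, pi=0.5, p_h=0.9, p_d=0.5):
    """Two-group Bernoulli mixture fitted by EM (illustrative assumption).

    ratings: list of (positive_count, total_count) per seller.
    Returns (resp, pi, p_h, p_d), where resp[i] is the estimated
    probability that seller i belongs to the honest group.
    """
    eps = 1e-9

    def clamp(x):
        return min(max(x, eps), 1.0 - eps)

    resp = []
    for _ in range(iters):
        # E-step: posterior probability of "honest" for each seller,
        # computed in log space to avoid underflow on large counts.
        resp = []
        for k, n in ratings:
            log_h = math.log(pi) + k * math.log(p_h) + (n - k) * math.log(1 - p_h)
            log_d = math.log(1 - pi) + k * math.log(p_d) + (n - k) * math.log(1 - p_d)
            m = max(log_h, log_d)
            h, d = math.exp(log_h - m), math.exp(log_d - m)
            resp.append(h / (h + d))

        # M-step: re-estimate the mixing weight and both positive rates.
        pi = clamp(sum(resp) / len(resp))
        w_h = sum(r * n for r, (_, n) in zip(resp, ratings))
        w_d = sum((1 - r) * n for r, (_, n) in zip(resp, ratings))
        p_h = clamp(sum(r * k for r, (k, _) in zip(resp, ratings)) / max(w_h, eps))
        p_d = clamp(sum((1 - r) * k for r, (k, _) in zip(resp, ratings)) / max(w_d, eps))
    return resp, pi, p_h, p_d


if __name__ == "__main__":
    sellers = [(98, 100), (95, 100), (40, 100), (97, 100)]
    resp, pi, p_h, p_d = em_trust(sellers)
    for (k, n), r in zip(sellers, resp):
        print(f"{k}/{n} positive -> P(honest) = {r:.3f}")
```

A model that accounted for the retaliation effect described above would need the rater's identity and the order of the ratings, which this sketch ignores.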

The center will be modeled after Berkeley’s Wireless Research Center in downtown Berkeley, which enjoys the backing of big mobile companies. It will include such faculty as Jitendra Malik, professor and chair of U.C. Berkeley’s Department of Electrical Engineering, and David Forsyth, professor of computer sciences, who are both working on computer-vision research.

ACM News Service

“LinuxWorld SF: OSDL Announces Patent Commons Project”
IDG News Service (08/10/05); Nystedt, Dan
The Open Source Development Labs (OSDL) is concerned that software patents are having a detrimental effect on open-source collaboration, and mitigating that threat is the goal behind the Patent Commons initiative the organization announced on Aug. 9. The effort will involve the collection of software licenses and patents pledged to the open-source community within a single repository for developers. The Patent Commons will also serve to lower the threat of patent-related lawsuits and ease the administrative burden of approving individual licenses, thus encouraging more companies and individuals to contribute their intellectual property to the open-source community. Vendors who make such pledges are basically promising not to pursue litigation against developers or users. The Patent Commons also ensures patent holders that an organization committed to open-source software is looking after their patent enforcement rights. The project will initially concentrate on the development of a library and database to store software patents and patent licenses, in addition to patents pledged by companies. The OSDL said other legal items, such as indemnification programs offered by open-source software vendors, will also be aggregated.

LinuxWorld SF: OSDL announces Patent Commons project – Computerworld
News Story by Dan Nystedt
AUGUST 10, 2005

The Open Source Development Labs, a group dedicated to promoting Linux, announced a new initiative called Patent Commons, to collect the software licenses and patents pledged to the open source community into a central repository to make them easier to access by developers, and encourage more patent holders to pledge their intellectual property to the cause.

The move will increase the utility of the growing number of patent pledges and promises in the past year by providing a central location for open-source software developers, OSDL said yesterday. It will also reduce the threat of patent-related lawsuits, the group said.

The move will also encourage more companies and individuals to pledge their intellectual property to the open-source community by reducing the administrative headaches posed by granting individual licenses, which the OSDL said is a barrier to the formal licensing of patents.

The OSDL hopes the measure will help encourage more companies to pledge their IP to the open source community, a group that includes IBM Corp., Nokia Corp., Novell Inc., Red Hat Inc. and Sun Microsystems Inc.
Vendors like these that pledge their patents to the Patent Commons project are, in general, promising not to file lawsuits against developers or users. At the same time, patent holders will be assured that the right to enforce the patents is watched over by an organization dedicated to open-source software, OSDL said.

The group said software patents are a huge potential threat to the ability of people to work together on open source.

The project will initially work on a library and database to house software patent licenses and patents, as well as patent pledges made by companies. It will also collect other legal items, such as indemnification programs offered by vendors of open-source software, the group said.

The Patent Commons project is still in the planning stages, the OSDL said, adding it expects to announce more details in coming months.

Although Intel funds the labs, it doesn’t own the intellectual property, and the research is widely shared and published, Teixeria says. Intel won’t disclose how much it’s spending on its university research projects, but its overall R&D budget is expected to exceed $5 billion this year.

The real goal, Teixeria says, is to see if the labs can unearth something that Intel might then be able to take in-house and develop further.

“It’s this notion of both helping to grow the technology and seeing where there is a usage for it within Intel,” he says.

Does “accelerate” include pointing to technologies that have already become startups, like Xen and variants of sensors? :-)

The reference to parallel search of vast, unindexed data is intriguing — it reminds me of the scale of challenges the wayback machine is facing for the Internet Archive — how could P2P help a 40TB+ search problem, given that one is willing to trade off longer response times against much lower (centralized) costs/better-shared costs?

ACM News Service

“Intel Goes to School”
Computerworld (03/28/05) P. 40; Vijayan, Jaikumar

Intel Research is funding a quartet of university “lablets” to identify and investigate technologies that merit “acceleration and amplification,” according to company representative Kevin Teixeria. He says Intel has no claim on the intellectual property produced by the labs, because it is interested in “helping to grow the technology and seeing where there is a usage for it within Intel.” Intel’s UC Berkeley lablet is focusing on systems that employ wirelessly networked sensors to collect a wealth of information about the environment, and the TinyOS operating system and TinyDB query-processing technologies have been notable breakthroughs. Researchers are currently devising the Tiny Application Sensor Kit, a suite of tools that lab director Joseph Hellerstein says will simplify the deployment of applications that use sensor networks. A second Intel lablet at the University of Washington is combining radio frequency identification (RFID) technologies and data mining software into the System for Human Activity Recognition and Prediction, which is supposed to predict human behavior by monitoring the objects people touch and how they are used. A key tool of this research is the RFID-enabled iGlove that extracts data from objects with affixed RFID tags. Another lablet based at England’s University of Cambridge under the supervision of Derek McAuley is looking into highly distributed applications, examples of which include Xen, a “virtual-channel processing” technology that allows a single system to support multiple operating systems and users more efficiently than software-based virtualization. The Carnegie Mellon University lablet’s area of concentration is software for widely distributed storage systems, with emphasis on interactive searching of massive archives of non-indexed data, and the acceleration and enhanced accuracy of searches via embedded processors.

The Carnegie Mellon Intel lablet is investigating software for widely distributed storage systems. Researchers working with Seagate Technology LLC are trying to enable interactive searching of terabyte-size collections of nonindexed data.

As part of that effort, researchers are studying how to speed up searches and make them more accurate by embedding processors either close to or on storage devices so they can examine and discard irrelevant data close to the source.
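The idea of discarding data close to the source can be sketched in a few lines. This is a toy model under stated assumptions, not the lablet's software: each "device" is just a list of records, `scan_at_device` stands in for a processor embedded on or near the drive, and `host_search` counts how many records actually cross to the host. All names and data are hypothetical.

```python
def scan_at_device(blocks, predicate):
    """Runs 'on the device': yields only records that pass the filter."""
    for block in blocks:
        if predicate(block):
            yield block


def host_search(devices, predicate):
    """Runs on the host: gathers the survivors from every device.

    Returns (hits, shipped), where shipped is the number of records
    that had to be transferred from the devices to the host.
    """
    hits = []
    shipped = 0
    for blocks in devices:
        for block in scan_at_device(blocks, predicate):
            shipped += 1
            hits.append(block)
    return hits, shipped


if __name__ == "__main__":
    devices = [
        ["cat photo", "dog photo", "tax form"],
        ["cat video", "invoice"],
    ]
    hits, shipped = host_search(devices, lambda rec: "cat" in rec)
    total = sum(len(d) for d in devices)
    print(hits, f"shipped {shipped} of {total} records")
```

Without an index every record still has to be examined, but only the matches cross the link to the host, which is where the speed-up described above would come from.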

Yahoo’s new search master | Between the Lines
-Posted by Dan Farber @ 11:48 am

The arms race for scientists with expertise in various areas of search, data mining and data analysis is in full flower, as in the tug of war between Google and Microsoft over the services of Kai-Fu Lee. Meanwhile, Yahoo is making a major investment from its nearly $500 million annual engineering spend to build out its own world-class research group.
In fact, based on my conversation yesterday with Prabhakar Raghavan, the new head of Yahoo’s research group, Yahoo has its sights set on Nobel prizes and making breakthroughs to ensure the future of the company. I don’t think he was exaggerating. Search and creating more personalized user experiences that take advantage of underlying data and relationships is still in an infant phase. Yahoo, Google, Microsoft, Amazon and other major players understand that the spoils will go to those who provide answers, rather than links, and develop ways in which billions of consumers and creators of content can participate in an economic and social value chain.

Raghavan, who spent 14 years doing search and data mining-related research at IBM and was lured from his stint as CTO of enterprise search vendor Verity, told me that he intends “to go after the best in the world and to get them.” He said that Yahoo will be able to attract top talent because of its stable and profitable business and the opportunity to impact Yahoo’s audience, who account for 12 to 15 percent of all the Web activity worldwide (Yahoo’s numbers). “We have an amazing outreach,” Raghavan said. “Ten terabytes of data, which for a scientist is pretty appealing.” Raghavan is also well connected in the research community: he is editor in chief of the prestigious Journal of the ACM.
For scientists with expertise in information retrieval, computational linguistics, machine learning, matrix and graph algorithms, unsupervised clustering, data mining and related areas, it’s like the U.S. housing market. They’ll have multiple bidders and command a premium. Raghavan said that Yahoo would stay away from high-profile hires and focus more on university researchers, as well as hiring college student interns and grooming them for jobs. Yahoo also recently formed a research center in association with the University of California, Berkeley. That said, he was able to recruit a well-regarded colleague, Andrew Tomkins from IBM.
Raghavan noted that his group will be active and open to the research community. “We will publish our research and interact with peers–it’s critical to the success of a research organization. There is an obvious aspect of marketing, PR and being visible contributors of ideas to the community. That said, we will not take every trade secret and publish it. It’s a challenge other industry leaders have solved before. We will publish and be judicious about how we do it.”
Raghavan has been in the job just over a month, but he has been impressed by what he called the “thirst for ideas that flow from research to the business.” He acknowledged that moving research into products is a challenge. He listed improving search, building a better advertising platform, making better sense of social media, large-scale distributed computing, and developing incentive structures and tools as his goals.
Regarding search, Raghavan said, “We have two views of better search. Most people are not interested in search; they want to get things done. The future has to be more friendly to people getting tasks done. You don’t want to spend two weeks of evenings sitting at a keyboard and piecing together a vacation plan. You want a system to go out and find the answers, based on future technology that goes beyond crawling and indexing pages.”
That future technology, according to Raghavan, is diving into the “deep Web” and semi-structured queries. “I hesitate to use the buzzword of ‘Semantic Web’, but it is about entity extraction, XML queries, unstructured queries, semantic ambiguity. We have to build a view of the world. When you issue a query, it has a richer view than a text index. We’ll start to see manifestations of this in five years.”
On the back end, Raghavan wants to solve problems like spam and to “align the commercial incentives of a billion content providers with social good intent.” He pointed to the field of mechanism design, a sub-field of microeconomics and game theory, as key to creating economic models that encourage people to participate in a clean, well-lighted digital marketplace with billions of content creators and consumers.
“We want to inspire the audience to give more data and more. If someone creates a snippet of music and others remix it and it finally becomes a hit, how do you divvy up the proceeds amongst all the constituents? That [economic incentive network] has to be figured out. There is a lot of microeconomics that is not fully understood, and it’s one of the areas we want to understand. There will be a Nobel Prize in economics awarded for this stuff, and I wouldn’t be unhappy if it came from our group.”
Along those lines, Raghavan and Jon Kleinberg authored a paper recently entitled “Query Incentive Networks,” which looks at networks of interacting agents as economic systems, in which “users seeking information or services can pose queries, together with incentives for answering them, that are propagated along paths in the network.”
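
To make the idea of queries carrying incentives "along paths in the network" a little more concrete, here is a toy sketch. It is not the model from the Kleinberg and Raghavan paper: the network, the node names, the fixed per-hop `fee`, and the `propagate` function are all assumptions made up for illustration, and the nodes here do no strategic reasoning about how much reward to keep.

```python
from typing import Dict, List, Optional

# Hypothetical network: each node lists the neighbours it forwards queries to.
NETWORK: Dict[str, List[str]] = {
    "asker": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "c": [],
    "d": ["e"],
    "e": [],
}
HAS_ANSWER = {"e"}  # nodes able to answer the query (assumed)


def propagate(node: str, offer: int, fee: int = 1) -> Optional[List[str]]:
    """Forward a query with a shrinking reward; return the path to an answer.

    `offer` is the reward `node` can promise to its neighbours. A neighbour
    that cannot answer keeps `fee` for itself and promises the rest onward.
    Forwarding stops once a node has nothing left to promise.
    """
    if node in HAS_ANSWER:
        return [node]
    if offer <= 0:
        return None  # no incentive left for the next hop
    for neighbour in NETWORK[node]:
        path = propagate(neighbour, offer - fee, fee)
        if path is not None:
            return [node] + path
    return None


path_with_3 = propagate("asker", 3)  # enough reward to reach node "e"
path_with_2 = propagate("asker", 2)  # reward runs out before reaching "e"
```

In this sketch the answer sits three hops away, so a reward of 3 finds it while a reward of 2 is used up along the way. The general point, that the size of the incentive limits how far a query can travel, is the kind of question such economic models try to pin down.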
Yahoo wants to turn its fragmented set of services, content and marketplaces into a cohesive whole and to aggregate, distribute and monetize the creative output of its users. “We have a plethora of opportunities looking at different social networks, such as blogs, instant messaging, My Web, Yahoo 360, and other services, across Yahoo properties,” Raghavan said. Yahoo’s social search engine My Web 2.0, for example, allows Yahoo users to archive, tag and annotate search results and share them with other people using the service. Users can also search their contacts’ My Web and browse content that others on Yahoo’s network have shared.
But determining what data from the pools of Yahoo services and billions of inputs is useful to people and will create a breakthrough in the user experience is one of his team’s challenges. “It’s a classic problem in statistical machine learning: you might have 200 data points, but how do you zero in on the three that make a difference?”
As part of Yahoo’s Research initiative to harness the activity on its properties in ways that create new revenue streams and sticky user experiences, Raghavan’s team will be racing its competitors to come up with standards and methods for determining value, incentive systems, frictionless payments and rights management. “We will let the market determine what is interesting and those who contribute the interesting stuff will get rewarded,” Raghavan said.
However, without standards across user networks, every site will be a cul-de-sac. An incentive system on one site will not interoperate with another site. It’s like requiring users to have a different card for every kind of ATM. I asked Raghavan whether users should have access to and control over the data collected by Yahoo. “Users should have control of what data is collected or given up and knowledge of what is done with it,” he said. “Giving every person their clickstream doesn’t make a lot of sense, since most don’t want it, but they should have knowledge and control.”
However, Raghavan supported the concept of being able to exchange your data collection, such as your Amazon or Yahoo shopping clickstream and forms input, with another site. “The data belongs to the user because it’s about the user, but we are not at a point today where multiple shopping sites can exchange data. It’s a metadata challenge, but it’s more of a standards activity, not a research issue.”
In addition, his group is working on aspects of personalization. “Personalizing is a loaded word, and it sometimes gets trivialized. It’s not about customizing the colors on the MyYahoo page,” Raghavan said. “It’s more of a social phenomenon that takes into account what others are doing, especially people like yourself. Content, context and community coming together is a long-standing dream in our business; we are all going after it. But the catch is when the user is not only a consumer but also a creator of content. It leads to interesting possibilities in tandem with data mining and the user experience. You have to decide what content to show that users will find valuable, and not irritate users with too much content.”
Raghavan has also spent time looking at how to mine blogs for predicting the movement of products and developing new user experiences. “We are looking at sources of information: text, photos, podcasts, whatever we can mine from the back end. Then we look at what users want, and bring the two together to create an application from all the chatter going on,” Raghavan said. “We can dream up cool experiences, but they have to be grounded in product reality. As we develop technology, markets start to react, so mining begets a reaction from the market and begets more mining, so we are constantly working on more scenarios.”
Underpinning all of Yahoo’s–as well as every other megasite’s–dreams of growing to billions of active, transacting, content creating and consuming users is the ability to build an efficiency platform with millions of computers and data sets distributed around the globe. With 345 million unique users per month across 25 countries and in 13 languages, Yahoo, as well as its competitors–especially Google–has some experience in planetary scale computing.
While the progress over the last ten years of the Web has been significant, we are still in the Stone Age of search, social networks, incentive models and personalization. With the competitive juices flowing in research labs, and wide open commercial opportunities, the next ten years will be more about answers than links, but not without some serious flailing…

…not all of which are spinoffs of its own; some are strategic investments in external inventions they believe they need. This is another great sign of the realignment of expectations for corporate venture activities…

Technology Start-Ups Get Chance
To Grow With Siemens Backing
October 13, 2005; Page B4

Deep within Siemens AG, one of the world’s largest companies, lies a tiny technology incubator.

In Berkeley, Calif., eight small companies, each with the financial backing of Siemens, are in the start-up stage. Siemens backs the companies — most of which share space in the same building and some of which began with the proverbial “inventor-in-a-garage” status — through a program it started six years ago called Technology to Business.

The program provides companies with seed-stage financing of around $500,000 and helps with early commercialization. In return, Siemens gets a percentage of each company and access to new technologies that can aid the German engineering giant’s own businesses.

“TTB was created as a model to bring technology and innovation that are outside into Siemens,” said Stefan Heuser, president and chief executive of TTB since last year. “It’s an outside-in approach. We’re like an early-stage investor.”

The program looks for technologies that fit into Siemens’s businesses, but doesn’t prevent the small companies from eventually seeking outside venture financing and selling to other customers. The companies can move out on their own when they are ready. However, some entrepreneurs who hitch up with TTB do so knowing Siemens will make a solid customer for their products.

For instance, Amine Haoui, CEO of wireless-sensor company Sensys Networks, came aboard TTB two years ago, hoping Siemens would have a number of applications for his technology.

“I felt very strongly that, in our type of business, a partnership with a large corporation from the get-go would be very useful,” said Mr. Haoui, whose company makes traffic-monitoring systems used by government agencies. “With Siemens being the dominant traffic vendor in the world, it made a lot of sense to me. We’d know the market a lot better and get faster access to customers.”

Within two months of receiving financing from Siemens, Mr. Haoui and his team were getting a grand tour of Siemens’s industrial divisions in Germany and the U.S. “We got access to people it would have taken two years to [meet] on our own,” Mr. Haoui said.

Aleks Goellue, CEO of PINC Solutions, whose technology helps companies track products before they are shipped to customers, said it wasn’t a difficult choice to couple with Siemens’s TTB.

“Even though my previous start-up was funded with traditional venture capital from Day One, I preferred not to directly try to fund-raise” this time around, Mr. Goellue said. “Back in 1998, you could just present your idea. Now, VCs want a lot more traction,” he said, referring to venture-capital investors.

Another advantage is that at the TTB facilities, Mr. Goellue said he is able to exchange ideas with both Siemens’s employees and with people from other seed-stage firms whom he bumps into in the hallway.

In return for the seed-stage investment in Mr. Goellue’s company, Siemens obtained an ownership stake and also will have contracts under which PINC will build certain products for Siemens. Although PINC plans to seek additional financing from traditional venture-capital firms, “the relationship with Siemens will stay even when we graduate out of TTB,” Mr. Goellue said.

Could the relationship with Siemens become an impediment? Mr. Haoui doesn’t think so.

“We have an investment from Siemens and a partnership with them, but there are no strings attached,” he said. “There were instances where we talked with Siemens’s competitors in the market and I had to explain to them that we could work with them if we wanted to.”

Siemens’s Mr. Heuser compares the difference between classic corporate research-and-development efforts and TTB to that between farming and hunting. “You work on technologies for years and develop things, just like developing a crop and harvesting it again and again,” he said.

But with TTB, “The idea behind this was to go in a more hunting direction, looking for technologies outside of Siemens’s R&D, looking for ideas from the start-up community and universities,” Mr. Heuser said.

Siemens and a host of other companies have venture-capital arms, and 40 are corporate members of the National Venture Capital Association. But Siemens goes further than most with TTB and other programs, said David Spreng, managing partner with Crescendo Ventures and a board member of the NVCA.