Recently in Thinking Category

Cuil

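Cuil, a search engine named after an Old Irish word and claiming to index more pages than any other search engine in the world, has made its debut. I took a look at its features and found essentially no surprises: query suggestion and result classification are both techniques that already exist in other search engines.
Although it claims to index the most pages, that does not mean users will get the results they want. High recall with low precision is trivial to achieve, and users do not have the time or patience to click through pages one by one to see which of them meet their needs. Where Searchme beats Cuil is that it lowers the cost of inspecting pages. But the two share a common problem: neither has particularly high precision.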
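To make the "high recall but low precision is trivial" point concrete, here is a minimal sketch in Python. The corpus size, the document ids, and the two engines are made up purely for illustration; they are not measurements of Cuil or Searchme.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical corpus of 1000 pages, 5 of which are relevant to the query.
corpus = range(1000)
relevant = {3, 42, 77, 512, 900}

# Engine A simply returns its whole index: every relevant page is found,
# but almost everything the user has to look at is irrelevant.
everything = precision_recall(corpus, relevant)

# Engine B returns only 5 pages, 4 of which are relevant.
selective = precision_recall({3, 42, 77, 512, 640}, relevant)

print(everything)
print(selective)
```

Returning the entire index gives perfect recall at almost no precision, which is why the size of an index says little by itself about how useful the results are.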
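Trying to build a large, general-purpose search engine to compete with the likes of Google and Yahoo is now almost certain to fail. This is a field that requires innovation but also accumulation, and simple PageRank by no means solves everything. PageRank can filter out poor pages and give valuable pages a higher rank, but that addresses only one side of the problem. Keyword search is a query mode full of ambiguity, and a user's intention is hard to express clearly in a few keywords. So if a search engine cannot analyze the user's intention with reasonable accuracy, the precision of its results is hard to improve. Analyzing user intention, however, requires accumulating large amounts of data over a long period, analyzing semantics and people's search behavior, and refining continually; moreover, people's intentions evolve over time. At least for now, Cuil has not accumulated enough, and research on semantics is not yet very mature. In fact, a new search engine that focuses on a particular domain or user group might have a better chance of success.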
Finally, what interests me more about Cuil is this: how is the word "Cuil" supposed to be pronounced?

SearchMe:Stacks

SearchMe has released a new feature called Stacks, which offers a creative way of sharing interesting web pages.
  1. Users can push useful search results into a stack and save it. The saved stack can be shared by email or by embedding code in a web page, and others can then view it in slide mode. A slide-view stack is clearly more convenient and vivid than earlier ways of sharing.
  2. As we know, high recall with low precision is a major problem of existing search engines. With this feature, users can save a lot of time when they need to locate the useful results again. However, pages on the Internet are always changing, so it would be better if the system could recommend new useful pages similar to those in a stack. In addition, I think searching within stacks is a necessary feature.
  3. The search engine can adjust its ranking scores according to the saved stacks. In effect, users help the search engine find out which web pages are useful results for a keyword query. Once a large amount of such data has been collected, the search engine can use it to improve the precision of its results.
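The third point can be sketched as follows. This is only an illustration of the idea, not a description of how SearchMe works: the `rerank` function, the `weight` parameter, the logarithmic boost, and the example URLs and scores are all assumptions.

```python
import math

def rerank(results, stack_saves, weight=0.5):
    """Re-order (url, base_score) pairs using stack-save counts for one query.

    The boost grows logarithmically, so a few very popular pages do not
    completely drown out the engine's original score.
    """
    def adjusted(item):
        url, base_score = item
        return base_score + weight * math.log1p(stack_saves.get(url, 0))

    return [url for url, _ in sorted(results, key=adjusted, reverse=True)]

# Hypothetical results for one keyword query, with the engine's own scores.
results = [("a.example", 0.9), ("b.example", 0.8), ("c.example", 0.7)]

# Hypothetical number of times users saved each page to a stack for this query.
saves = {"c.example": 12, "b.example": 1}

ranked = rerank(results, saves)
print(ranked)
```

Here the page that users saved most often moves to the top even though its original score was the lowest. A real system would also have to guard against spam and against popular pages reinforcing their own popularity.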

Pretty Cool SearchMe

I found a cool search engine, SearchMe (a gold supporter of the conference), on the CIKM 2008 website. It presents search results in a totally different way. Compared with snippets, picture slides are much more intuitive and user-friendly. Users can easily make an initial judgment about whether a web page's content meets their needs.
To be honest, the feature that attracts me most is the classification of the returned web pages. As soon as we enter the keywords, a set of categories appears at the top of the slides. This feature is very useful for filtering out irrelevant results. Of course, the accuracy of the classification needs to improve (I think that mainly depends on web mining research), but this is a good direction.

Dapper, Semantify the Web Sites

Dapper can certainly help our web sites convey semantic information or metadata and make them more structured. But I doubt that it can help a search engine improve the precision of keyword search results. Consider keyword search over relational or XML databases: the data is structured, yet the results of a keyword search are still far from ideal. The precision of the results is therefore not determined by whether a web site is semantified. Semantifying is just the first step.
Users express their intention through a few keywords, but these keywords carry very limited semantics, so it is difficult to work out what information the users really want. If this problem cannot be properly solved, the semantic information conveyed by the web sites may not be very helpful. Therefore, once our web sites become semantified, we will also need a new, simple query pattern that makes the keywords themselves more semantic.
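As a rough illustration of both points, the sketch below runs a plain keyword query and a field-qualified query over a few structured records. The records, the `field:keyword` syntax, and both function names are hypothetical; the syntax is just one possible reading of a "more semantic" query pattern and has nothing to do with Dapper's actual interface.

```python
# Hypothetical structured records, e.g. rows in a relational table.
records = [
    {"title": "Java in Action", "author": "Smith", "topic": "programming"},
    {"title": "A History of Smith College", "author": "Jones", "topic": "education"},
    {"title": "Travel Guide to Java", "author": "Lee", "topic": "travel"},
]

def keyword_search(records, keyword):
    """Match the keyword against every field, as a plain keyword search would."""
    kw = keyword.lower()
    return [r for r in records if any(kw in str(v).lower() for v in r.values())]

def fielded_search(records, query):
    """Match each 'field:keyword' term against the named field only.

    Assumes every term in the query is well formed (contains a colon).
    """
    results = records
    for term in query.split():
        field, _, kw = term.partition(":")
        results = [r for r in results if kw.lower() in str(r.get(field, "")).lower()]
    return results

plain = keyword_search(records, "smith")           # an author and a college
scoped = fielded_search(records, "author:smith")   # the author only

print([r["title"] for r in plain])
print([r["title"] for r in scoped])
```

Even though the data is fully structured, the plain keyword "smith" still matches both an author and a college. Only when the query itself says which field is meant does the ambiguity go away.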
