2.1 Introduction

To many, the Internet is a valuable source of all sorts of information. Never before has a source been available through which such massive amounts of information, covering such a broad range of subjects, could be gathered. Moreover, this information can be obtained conveniently and at very low cost, which also seems unprecedented.
The same story can be told for those wanting to offer information or services through the Internet. The barriers, as well as the investments needed to do so, are very low (at least compared to other media). Literally anyone who wants to can get their fifteen minutes of fame in "Cyberspace".
And although this has been, and still is, one of the Internet's greatest strengths, it also has an important downside: there is very little supervision of the way in which information is offered, i.e. there are no rules stating in which form information should be offered, how it should be described or typified, and so on. Organisations such as the Internet Engineering Task Force (IETF) and the W3C have designed various guidelines and standards to which information or documents should adhere (and they continue to do research into this and related areas), but to seemingly little avail. For instance, standard HTML offers the possibility to add META tags to HTML documents. These tags can be used to convey information such as the author of the document, the date of creation, the document type, and one or more keywords that best describe the document's content. At this moment, only a small percentage of all documents available on the World Wide Web (WWW) part of the Internet uses these tags.
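As an illustration (a minimal sketch, not taken from the original text), the fragment below shows what such META tags look like and how a program might read them, assuming Python's standard html.parser module; the sample document and its tag values are invented for this example.

    # Sketch: reading META tags from an HTML document.
    # The sample document and its tag values are invented for this example.
    from html.parser import HTMLParser

    SAMPLE = """
    <html><head>
      <title>An example document</title>
      <meta name="author" content="J. Doe">
      <meta name="date" content="1998-01-15">
      <meta name="keywords" content="agents, search engines, push technology">
      <meta name="description" content="A short overview of Internet search trends.">
    </head><body>...</body></html>
    """

    class MetaReader(HTMLParser):
        def __init__(self):
            super().__init__()
            self.meta = {}

        def handle_starttag(self, tag, attrs):
            # Collect every <meta name="..." content="..."> pair we encounter.
            if tag == "meta":
                attrs = dict(attrs)
                if attrs.get("name"):
                    self.meta[attrs["name"].lower()] = attrs.get("content", "")

    reader = MetaReader()
    reader.feed(SAMPLE)
    print(reader.meta)   # {'author': 'J. Doe', 'date': '1998-01-15', ...}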
Search engines, currently the most popular way to search for information on the Internet [1], will use these tags (when available) to classify and typify a document. But as these tags are usually missing from a document, a set of heuristics has to be used instead to classify it; besides standard data, such as the date of creation and the URL, a document is usually classified by a list of the most frequently used terms in it, or by extracting the first 50-100 words of the document.
The advantage of using such heuristics is that the whole process of locating, classifying and indexing documents on the Internet can be carried out automatically and quickly [2] by small programs called crawlers or spiders.
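As a rough sketch of the heuristics just described (the stop-word list and cut-off values below are arbitrary assumptions, not taken from any particular search engine):

    # Sketch of the two classification heuristics mentioned above:
    # (a) the most frequently used terms, (b) the first N words of a document.
    import re
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that"}

    def most_frequent_terms(text, n=10):
        words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
        words = [w for w in words if w not in STOP_WORDS]
        return [term for term, _ in Counter(words).most_common(n)]

    def leading_words(text, n=75):
        # Roughly corresponds to "extracting the first 50-100 words".
        return " ".join(text.split()[:n])

    document = "Agents can help users to find relevant documents on the Internet ..."
    print(most_frequent_terms(document))
    print(leading_words(document))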

Today, this way of working is increasingly showing that the coin has two sides. While it makes it possible to index thousands of documents a day, a price has to be paid for it: loss of detail and the lack of a comprehensive summary of a document's contents. As a result, search engines return huge result lists in answer to a query, lists which also contain a lot of noise, such as links to irrelevant, duplicate or outdated documents.
Moreover, many users do not know exactly what they are looking for, let alone which terms are best used (and which are best avoided) to describe it:

"[...] The short, necessarily vague queries that most Internet search services encourage with their cramped entry forms exacerbate this problem. One way to help users describe what they want more precisely is to let them use logical operators such as AND, OR and NOT to specify which words must (or must not) be present in retrieved pages. But many users find such Boolean notation intimidating, confusing or simply unhelpful. And even experts' queries are only as good as the terms they choose.

When thousands of documents match a query, giving more weight to those containing more search terms or uncommon key words (which tend to be more important) still does not guarantee that the most relevant pages will appear near the top of the list. Consequently, the user of a search engine often has no choice but to sift through the retrieved entries one by one."

from [HEAR97]
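The weighting idea mentioned in this quotation, giving more weight to documents that contain more of the query terms and to uncommon terms, can be made concrete with a toy sketch; the documents, formula and numbers below are arbitrary illustrations, not the ranking method of any actual search engine.

    # Toy ranking: a document scores higher when it contains more of the
    # query terms, and rarer terms weigh more heavily (an IDF-like factor).
    import math
    from collections import Counter

    documents = {
        "doc1": "push technology delivers content to the user",
        "doc2": "search engines index documents with crawlers",
        "doc3": "crawlers and agents help the user search for content",
    }

    def rank(query, docs):
        tokenised = {name: Counter(text.split()) for name, text in docs.items()}
        scores = {}
        for name, counts in tokenised.items():
            score = 0.0
            for term in query.split():
                df = sum(1 for c in tokenised.values() if term in c)  # in how many docs?
                if df:
                    score += counts[term] * math.log(1 + len(docs) / df)
            scores[name] = score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    print(rank("user search", documents))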


Suppliers of information, as opposed to users, have problems of their own. One of the biggest problems facing suppliers is how to get their information to the right people. With the growing number of information sources, how do you stand out from the rest, and how do you make sure that the right people can find you? Making your information known by submitting a description of your service (an 'advertisement') to search engines is a method that is becoming less and less effective.

Attempts to provide more personalised and more up-to-date or even real-time information (e.g. by using databases to store and present a site's content) have amplified, and are still amplifying, this problem; at this moment there are Web services that consist of thousands of pages, many of which are created 'on-the-fly' (so they can be filled with up-to-date or real-time information). However, sites that use such dynamic documents cannot be properly indexed by the small indexing programs (called crawlers) that most search engines use to gather data about the information available on a site, as there are no complete or static documents that can be scanned and indexed. The information in those documents is hidden from, or unavailable to, the indexing programs (see [LYNC97]).
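One simplified illustration of why such pages stay out of reach: many crawlers of that era simply refused to follow links that looked dynamically generated (for instance, URLs containing a query string), to avoid losing themselves in an endless stream of generated pages. The URL patterns below are assumptions for the sake of the example.

    # Sketch: a crawler filtering out links that look dynamically generated.
    from urllib.parse import urlparse

    def looks_dynamic(url):
        parts = urlparse(url)
        return bool(parts.query) or parts.path.endswith((".cgi", ".asp"))

    candidate_links = [
        "http://example.com/about.html",
        "http://example.com/catalog.cgi?item=42",
        "http://example.com/news.asp?date=today",
    ]

    to_index = [url for url in candidate_links if not looks_dynamic(url)]
    print(to_index)   # only the static page survives the filter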


One of the latest attempts to tackle (or circumvent) the problem of how information supply and demand can be brought together is called Push Technology. This concept is not as new and novel as some would like you to believe: the technique of basic server push (which is essentially what Push Technology is about) has been around for a number of years. It is the specific way in which server push is used now, and the way in which it seems to meet certain (information/market) demands, that has turned it into a prominent development. Although Push Technology has already reached its peak, both in terms of media coverage and in terms of massive usage by end users and suppliers of content, it has been influential enough to justify its (rather prominent) inclusion in this chapter.
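To give an impression of what basic server push amounts to technically, here is a minimal sketch, assuming a plain socket-based server: the server keeps the HTTP connection open and periodically replaces the page it has sent, using the multipart/x-mixed-replace content type that Netscape introduced for this purpose. The port number, update interval and page contents are arbitrary choices for the illustration.

    # Sketch of basic server push: keep the connection open and keep
    # replacing the document that was sent.
    import socket, time

    BOUNDARY = b"PushBoundary"

    def serve_push(port=8080, updates=3):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        conn.recv(4096)                          # read (and ignore) the browser's request
        conn.sendall(b"HTTP/1.0 200 OK\r\n"
                     b"Content-Type: multipart/x-mixed-replace; boundary=" + BOUNDARY +
                     b"\r\n\r\n")
        for i in range(updates):
            body = ("<html><body>Update number %d</body></html>" % i).encode()
            conn.sendall(b"--" + BOUNDARY + b"\r\n"
                         b"Content-Type: text/html\r\n\r\n" + body + b"\r\n")
            time.sleep(2)                        # the server decides when new content is pushed
        conn.sendall(b"--" + BOUNDARY + b"--\r\n")
        conn.close()
        srv.close()

    # serve_push()   # then point a browser at http://localhost:8080/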

Apart from a prominent development such as Push Technology, this chapter will also look at precursors of important (near-)future concepts and applications in the context of the online marketplace (e.g. agent-like applications and intermediary services).
As an aside: this chapter is meant as a prelude and introduction to chapter three (and beyond); it is not meant to cover all currently important Internet trends and developments equally, but to show how tomorrow's concepts and techniques are emerging today.


[1] Usually, search engines are used to search for information on the WWW, which is why this section speaks of "documents". Yet most search engines can be used to search for other items as well, such as files or Usenet articles. To keep things simple, we will continue to talk about (Web) documents, but what is said is in most cases just as valid for these other types of information.
[2] The documents being scanned do not have to be interpreted or understood: applying the heuristics mentioned above is all that needs to be done.
Chapter 2 - The Internet (of) Today, from "Desperately Seeking: Helping Hands and Human Touch" by Björn Hermans