Search engines have their roots in information retrieval, a field with over half a century of research behind it. How did information retrieval evolve into the search engines we see today?
Most web search engines today index every single word on a page because they have abundant storage and processing power. At the same time, this lets them cater for the most unexpected queries (e.g., searching for exact phrases that contain stopwords). In general, however, it is not advisable to index every word; only the words with the highest value should be indexed. How is that value defined?
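To make the idea of selecting high-value words concrete, here is a minimal sketch that assumes tf-idf (term frequency times log inverse document frequency) as the measure of a word's value. This is only one common choice and not necessarily the definition developed in this text. The function name `tfidf_top_terms` and the whitespace tokenizer are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_top_terms(docs, k):
    """Return, for each document, its k terms with the highest tf-idf weight.

    Assumption: a term's "value" is raw term frequency * log(N / df),
    one common tf-idf variant among several.
    """
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    selected = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        selected.append(ranked[:k])
    return selected

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog"]
print(tfidf_top_terms(docs, 2))  # [['mat', 'cat'], ['log', 'dog'], ['chased', 'cat']]
```

Note that "the" appears in every document, so its weight is log(1) = 0 and it is never selected. This mirrors why stopwords are usually considered low-value index terms.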
It is well known that the Boolean model is too inflexible, requiring skilful use of Boolean operators to obtain good results. The vector space model, on the other hand, is flexible but not precise enough. Is there a middle ground?
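The contrast between the two models can be sketched in a few lines. This example assumes a strict AND query for the Boolean model and cosine similarity over raw term-frequency vectors for the vector space model. Both are simplified textbook forms, and the function names `boolean_and` and `cosine_rank` are hypothetical.

```python
import math
from collections import Counter

def boolean_and(docs, query_terms):
    """Boolean model: return indices of documents containing every query term."""
    return [i for i, doc in enumerate(docs)
            if all(t in doc.lower().split() for t in query_terms)]

def cosine_rank(docs, query_terms):
    """Vector space model: rank documents by cosine similarity of
    raw term-frequency vectors (no idf weighting, an assumption)."""
    q = Counter(query_terms)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for i, doc in enumerate(docs):
        d = Counter(doc.lower().split())
        dot = sum(q[t] * d[t] for t in q)
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        score = dot / (q_norm * d_norm) if dot else 0.0
        scores.append((score, i))
    # Highest score first; ties keep original document order.
    return [i for score, i in sorted(scores, key=lambda s: (-s[0], s[1]))]

docs = ["information retrieval models",
        "search engine architecture",
        "retrieval for web search engines"]
query = ["retrieval", "search", "engine"]
print(boolean_and(docs, query))  # []
print(cosine_rank(docs, query))  # [1, 2, 0]
```

The strict AND query returns nothing because no document contains all three terms, illustrating the Boolean model's inflexibility. The vector space model returns every partial match, but it ranks document 1, which never mentions "retrieval", first, illustrating its lack of precision.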