How does the search in smart.finder and map.apps Smart Search work?

Kristian Senkler August 26, 2015

Search is a central function of smart.finder and the map.apps Smart Search extension. This post explains how search works in both products and answers two key questions: What documents are in my index? How can I find them in a fast and targeted way?

Search in smart.finder

The search in smart.finder is divided into two steps: a suggest-search and a subsequent faceted search.

Suggest Search

A suggest-search means that matching terms are proposed on the fly, depending on the user input. They can:

  • have the same beginning;
  • be similar in terms of the letters used;
  • have a similar meaning.

The suggestion function of smart.finder uses a fuzzy module based on the Levenshtein distance. The proposed results are terms taken directly from the index. Their order depends on the number of documents in which each term occurs (the so-called term frequency).

The Levenshtein algorithm
The Levenshtein distance is the minimum number of edit operations needed to transform one character string into another. The fewer edit operations are needed, the more similar the two terms are. Edit operations are replacing, inserting, or deleting a single character. Semantic similarity is not considered.

For further details, see for example: https://en.wikipedia.org/wiki/Levenshtein_distance.
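
To make the definition concrete, here is a minimal Python sketch of the classic dynamic-programming computation of the Levenshtein distance. It is only an illustration of the measure itself, not the code used inside smart.finder; the function name is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    # previous[j] is the distance between the already processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace char_a (or keep it if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("gym", "gum"))         # one replacement
print(levenshtein("kitten", "sitting"))  # two replacements and one insertion
```

A small distance means two terms are close in spelling, which is what a fuzzy suggestion relies on; it says nothing about whether they are close in meaning.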

In the following example, the user enters "gym", and "gymnasium" is proposed first because it is the term with the highest frequency in the documents of the index. The terms that follow in the suggestion list are also similar to the input but occur with steadily decreasing frequency.

As an additional variant, phrases are proposed as well:

For the suggest-search, the server schema defines two fields from which terms are proposed based on the user input:

  • suggest_word: contains all terms that are more or less complete words
  • suggest_phase: contains phrases as so-called shingles (sequences of adjacent words)

From the contents of these fields, a dictionary is built (using HighFrequencyDictionaryFactory) that provides the proposed terms at runtime.
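
The idea behind the two fields and the frequency-ordered dictionary can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the server implementation: the function names and sample documents are made up for this example, and matching is done by prefix only, without the fuzzy component.

```python
from collections import Counter


def shingles(text: str, size: int = 2) -> list:
    """Split a text into overlapping word sequences ("shingles") of the given size."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def build_dictionary(documents: list) -> Counter:
    """Count for every word the number of documents that contain it."""
    frequencies = Counter()
    for document in documents:
        # A set ensures that each document is counted once per word.
        frequencies.update(set(document.lower().split()))
    return frequencies


def suggest(prefix: str, dictionary: Counter, limit: int = 5) -> list:
    """Return words starting with prefix, most frequent first."""
    matches = [(term, count) for term, count in dictionary.items()
               if term.startswith(prefix.lower())]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in matches[:limit]]


# Hypothetical sample documents, for illustration only.
documents = ["gymnasium sports hall", "gymnasium opening hours", "gymnastics club"]
print(suggest("gym", build_dictionary(documents)))
print(shingles("gymnasium opening hours"))
```

Here "gymnasium" is listed before "gymnastics" because it occurs in two of the three sample documents, which mirrors the ordering by frequency described above.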

Faceted Search

When the user selects a term from the list of suggestions, smart.finder starts the faceted search. An example request looks as follows.

Syntax of the faceted search

q=suggest_word%3Alandesvermessungsamt&wt=json&rows=0&start=0&facet=true&facet.field=publisher&facet.field=language&facet.field=type&facet.field=format&facet.field=accessRights

The query searches only the full terms (suggest_word). The facets depend on the configuration of the app (in this case publisher, language, type, format, and accessRights).
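
For readers who want to reproduce such a request, the following Python sketch assembles the same query string with the standard library. The helper name is an assumption for this example, and the server URL is deliberately left out because it depends on the installation.

```python
from urllib.parse import urlencode


def build_facet_query(term: str, facet_fields: list) -> str:
    """Build the query string for a faceted search on the suggest_word field."""
    params = [
        ("q", f"suggest_word:{term}"),  # search the full terms only
        ("wt", "json"),                 # ask for a JSON response
        ("rows", 0),                    # no documents, only facet counts
        ("start", 0),
        ("facet", "true"),
    ]
    # facet.field is repeated once per configured facet.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


print(build_facet_query(
    "landesvermessungsamt",
    ["publisher", "language", "type", "format", "accessRights"],
))
```

Using a list of pairs instead of a dictionary keeps the repeated facet.field parameter and the parameter order intact.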

The response from the server contains the facets together with their frequency:

Faceted response

[...]
facet_counts: {
  facet_queries: {},
  facet_fields: {
    publisher: ["Landesvermessungsamt Schleswig-Holstein", 6, "Bundesamt für Naturschutz", 0,
      "European Commission, DG Joint Research Centre - Institute for Environment and Sustainability - Digital Earth and Reference Data Unit",
      0, "GIS-Zentrum Kanton Zürich", 0
    ],
    language: ["ger", 6, "eng", 0],
    [...]
  },
  facet_dates: {},
  facet_ranges: {},
  facet_intervals: {}
}

The results show, for example, that the index contains six documents whose "language" field is "ger" and which include the keyword "Landesvermessungsamt". If you select further facets in the smart.finder client, the result set is restricted further by ANDing or ORing the additional facets.
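
In the response, each facet field is a flat list in which values and counts alternate. A small Python sketch shows one way a client could turn such a list into value/count pairs; the function name is chosen for this example and the data is taken from the response above.

```python
def facet_pairs(flat: list) -> dict:
    """Convert an alternating [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


language = ["ger", 6, "eng", 0]
counts = facet_pairs(language)
print(counts)

# Facet values without any matching document can be hidden in the client.
non_empty = {value: count for value, count in counts.items() if count > 0}
print(non_empty)
```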

Search in the map.apps Smart Search Extension

Searching with the map.apps Smart Search Extension differs from the search in smart.finder in that it is presented as a "normal"-looking search, with no explicit suggest-search and no faceted search.

However, the search algorithm is designed to be very flexible. It basically consists of three parts:

  1. Fuzzy-search
  2. Wildcard-search of the term
  3. Direct match of the term

For example, the user searches for "Hoch Musik". The relevant part of the query is then built as follows:

Hoch~ OR Musik~ OR Hoch*^98 OR Musik*^98 OR (Hoch* Musik*)^99 OR (Hoch Musik)^100

In prose, this means that a document matches if

  • it contains exactly the terms "Hoch" and "Musik": (Hoch Musik)^100
  • it contains both a term beginning with "Hoch" and a term beginning with "Musik": (Hoch* Musik*)^99
  • it contains either a term beginning with "Hoch" or a term beginning with "Musik": Hoch*^98 OR Musik*^98
  • it contains terms similar to "Hoch" or "Musik": Hoch~ OR Musik~
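
The structure of this query can be reproduced with a short Python sketch. It is an illustration only, assuming the input is split on whitespace; the function name is made up for this example, and escaping of special query characters is left out for brevity.

```python
def build_search_query(user_input: str) -> str:
    """Combine fuzzy, wildcard, and direct-match clauses for the entered terms."""
    terms = user_input.split()
    if not terms:
        return ""
    fuzzy = [f"{term}~" for term in terms]        # similar terms, no boost
    wildcard = [f"{term}*^98" for term in terms]  # any single prefix match
    all_prefixes = "(" + " ".join(f"{term}*" for term in terms) + ")^99"
    exact = "(" + " ".join(terms) + ")^100"       # direct match, highest boost
    return " OR ".join(fuzzy + wildcard + [all_prefixes, exact])


print(build_search_query("Hoch Musik"))
```

The boost factors grow with the precision of the clause, so documents that match the terms directly are ranked above those found only through wildcard or fuzzy matching.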

The relevance of the partial results is determined by the query boost factor (e.g. ^99). The higher the factor, the higher the matching documents appear in the results list. This part of the query is hard-coded and cannot be changed by configuration.

The situation is different for the degree of similarity, which is encoded by the tilde (~). This symbol indicates that the term should be searched in a "fuzzy" manner. The user can adjust this part of the search in the user interface or in the app.json file.

Allowed values are:

  • Do not use = no fuzzy-search
  • Server Standard = corresponds to an edit distance of 2
  • "0" = exact
  • "1" = similar
  • "2" = dissimilar
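
How these values could end up in the query can be sketched as follows. In Lucene query syntax, a number after the tilde limits the edit distance (for example Hoch~1). The mapping itself is an assumption made for illustration: the keys "none" and "default" are hypothetical stand-ins for "Do not use" and "Server Standard" and are not taken from the product configuration.

```python
from typing import Optional


def fuzzy_clause(term: str, setting: str) -> Optional[str]:
    """Return the fuzzy clause for a term, or None if fuzzy search is switched off."""
    if setting == "none":      # hypothetical key for "Do not use"
        return None
    if setting == "default":   # hypothetical key for "Server Standard"
        return f"{term}~"
    if setting in ("0", "1", "2"):
        return f"{term}~{setting}"  # explicit maximum edit distance
    raise ValueError(f"unsupported fuzzy setting: {setting!r}")


for setting in ("none", "default", "0", "1", "2"):
    print(setting, fuzzy_clause("Hoch", setting))
```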

Kristian is Product Owner for the smart.finder, smart.finder SDI, and map.apps Smart Search products.