Case-insensitive
IntraText search is case-insensitive: searching for "CAESAR", "Caesar" or "caesar" is equivalent and gives the same results.
Default boolean operator
The AND operator is the default conjunction operator in IntraText search.
This means that if there is no Boolean operator between two terms, the AND operator is used.
Example: the query iulius caesar is the same as iulius AND caesar.
See the description of the AND operator further down this page for more details.
Accent-sensitive / accent-insensitive
IntraText search can be accent-sensitive or accent-insensitive:
- in accent-sensitive search (default), ancora, àncora and ancóra are three different words,
i.e. accents and diacritics are relevant: searching for ancora DOES NOT FIND àncora;
- in accent-insensitive search (use the checkbox), ancora, àncora and ancóra are the same word,
i.e. accents and diacritics are ignored: searching for ancora FINDS àncora as well.
In analysis results, text extracts are shown with characters normalized, i.e. without accents and diacritics.
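The difference between the two modes can be illustrated with a minimal Python sketch. It is not IntraText's implementation: the function names strip_accents and matches are hypothetical, and the sketch assumes that ignoring accents amounts to removing Unicode combining marks.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Return text with combining marks (accents, diacritics) removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query: str, word: str, accent_sensitive: bool = True) -> bool:
    """Compare a query term with a word, ignoring case (hypothetical helper)."""
    # NFC makes sure the same accented letter is always encoded the same way.
    q = unicodedata.normalize("NFC", query.lower())
    w = unicodedata.normalize("NFC", word.lower())
    if not accent_sensitive:
        q, w = strip_accents(q), strip_accents(w)
    return q == w

print(matches("ancora", "àncora"))                          # False
print(matches("ancora", "àncora", accent_sensitive=False))  # True
```

In this sketch the accent-sensitive comparison is the default, mirroring the default described above.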
Terms
A query is broken up into terms and operators. There are two types of terms, Single Terms and Phrases:
- A Single Term is a single word such as Augustus or manus.
- A Phrase is a group of words surrounded by double quotes such as "Iulius Caesar".
- Multiple terms can be combined together with Boolean operators to form a more complex query.
Term Modifiers
IntraText search supports modifying query terms to provide a wide range of searching options.
- Wildcard Searches
IntraText search supports single and multiple character wildcard searches.
To perform a single character wildcard search use the "?" symbol.
To perform a multiple character wildcard search use the "*" symbol.
The single character wildcard search looks for terms that match the pattern with any single character in place of the "?".
For example, to search for manus, minus or munus you can use the search:
m?nus
Multiple character wildcard searches look for 0 or more characters. For example, to search for Caesar, Caesaris or Caesarianum, you can use the search:
caesar*
You can also use wildcards in the middle of a term:
c*sar
Note: You cannot use a * or ? symbol as the first character of a search.
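As an illustration only, the wildcard rules can be sketched in Python by translating a pattern into a regular expression. The helper names wildcard_to_regex and wildcard_match and the word list are hypothetical; the actual matching is done by the search engine against its index.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a case-insensitive regular expression."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")        # exactly one character
        elif ch == "*":
            parts.append(".*")       # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_match(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the whole pattern."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]

words = ["manus", "minus", "munus", "magnus",
         "Caesar", "Caesaris", "Caesarianum", "Cesar"]
print(wildcard_match("m?nus", words))    # ['manus', 'minus', 'munus']
print(wildcard_match("caesar*", words))  # ['Caesar', 'Caesaris', 'Caesarianum']
print(wildcard_match("c*sar", words))    # ['Caesar', 'Cesar']
```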
- Fuzzy Searches
IntraText search supports fuzzy searches based on the Levenshtein Distance, or Edit Distance, algorithm. To do a fuzzy search, use the tilde symbol, "~", at the end of a Single Term. For example, to search for a term similar in spelling to aqua, use the fuzzy search:
aqua~
This search will find terms like aquae and acqua.
An additional (optional) parameter can specify the required similarity. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. For example:
aqua~0.8
If the parameter is not given, the default value is 0.5.
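To make the notion of similarity concrete, here is a minimal Python sketch of the Levenshtein distance and of a similarity score derived from it. The formula used, 1 - distance / length of the shorter word, and the "greater than or equal" comparison are assumptions made for illustration; this page does not state how IntraText computes the score. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Assumed score: 1 minus the distance divided by the shorter length."""
    if a == b:
        return 1.0
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return 1.0 - levenshtein(a, b) / shortest

def fuzzy_match(term: str, words: list[str], min_similarity: float = 0.5) -> list[str]:
    """Return the words whose similarity to term reaches min_similarity."""
    return [w for w in words
            if similarity(term.lower(), w.lower()) >= min_similarity]

words = ["aqua", "aquae", "acqua", "terra"]
print(fuzzy_match("aqua", words))       # ['aqua', 'aquae', 'acqua']
print(fuzzy_match("aqua", words, 0.8))  # ['aqua']
```

Under this assumed formula, aquae and acqua are each one edit away from aqua and score 0.75, so they pass the default threshold of 0.5 but not a threshold of 0.8.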
- Proximity Searches (also known as NEAR operator)
IntraText search supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde symbol, "~", at the end of a Phrase. For example, to search for Caesar and Roma within 10 words of each other in a document, use the search:
"caesar roma"~10
Boolean Operators
- OR
The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol "||" can be used in place of the word OR.
To search for documents that contain either Iulius Caesar or Augustus or both, use the query:
"iulius caesar" OR augustus
- AND
The AND operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the AND operator is used. The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.
To search for documents that contain Iulius Caesar and Augustus use the query:
"iulius caesar" AND augustus
or
"iulius caesar" augustus
- NEAR: see Proximity Searches
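The idea behind a NEAR (proximity) search can be sketched in Python as follows. This is a simplification for illustration: it measures the distance as the difference between word positions, whereas the engine's exact way of counting may differ. The function name within_distance and the sample sentence are hypothetical.

```python
import re

def within_distance(text: str, first: str, second: str, max_distance: int) -> bool:
    """True if first and second occur at most max_distance word positions apart."""
    words = re.findall(r"\w+", text.lower())
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in pos_first for j in pos_second)

text = "Caesar, leaving Gallia with his legions, marched on Roma"
print(within_distance(text, "caesar", "roma", 10))  # True
print(within_distance(text, "caesar", "roma", 3))   # False
```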
- NOT
The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.
To search for documents that contain Iulius Caesar but not Augustus use the query:
"iulius caesar" NOT augustus
Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:
NOT augustus
- +
The "+" or required operator requires that the term after the "+" symbol exist somewhere in the document. In other words, the "+" operator limits the search to the documents containing the term.
To search for documents that must contain caesar and may contain iulius use the query:
iulius +caesar
- -
The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.
To search for documents that contain Iulius Caesar but not Caesar Augustus use the query:
"iulius caesar" -"caesar augustus"
Grouping
IntraText search supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.
To search for either Iulius or Caesar and Gallia use the query:
(iulius OR caesar) AND gallia
This eliminates any confusion and makes sure that Gallia must exist and that either term, Iulius or Caesar, may exist.
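The set analogy used for the Boolean operators (union, intersection, difference) also shows why grouping matters. The Python sketch below uses made-up document numbers for each term; it illustrates the logic only and says nothing about how IntraText evaluates a query written without parentheses.

```python
# Hypothetical sets of document numbers in which each term occurs.
iulius = {1, 4}
caesar = {1, 2}
gallia = {1, 3}

# The three basic operators expressed as set operations.
print(sorted(iulius | caesar))  # OR  (union):        [1, 2, 4]
print(sorted(iulius & caesar))  # AND (intersection): [1]
print(sorted(caesar - iulius))  # NOT (difference):   [2]

# (iulius OR caesar) AND gallia
grouped = (iulius | caesar) & gallia
# iulius OR (caesar AND gallia): same terms, different grouping
regrouped = iulius | (caesar & gallia)

print(sorted(grouped))    # [1]
print(sorted(regrouped))  # [1, 4]
```

The two groupings return different documents, which is why the parentheses are worth writing whenever OR and AND are mixed.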
Escaping Special Characters
IntraText search supports escaping special characters that are part of the query syntax. The current list of special characters is:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
To escape these characters, use the \ before the character. For example, to search for (1+1):2 use the query:
\(1\+1\)\:2
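When queries are built by a program, the escaping can be automated. The following Python sketch is a hypothetical helper, not part of IntraText: it puts a backslash before every character from the list above, and it assumes that escaping each single & and | is an acceptable way to neutralize && and ||.

```python
# Characters taken from the list of special characters above.
# Assumption: each & and | is escaped individually.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')

def escape_query(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query("(1+1):2"))  # \(1\+1\)\:2
```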
Technology
IntraText search is based on Èulogos Progressive Search and Apache Lucene.