IntraText Digital Library
Full text search
Help


Case-insensitive

IntraText search is case-insensitive: the queries "CAESAR", "Caesar" and "caesar" are equivalent and return the same results.


Default boolean operator

The AND operator is the default conjunction operator in IntraText search.
This means that if there is no Boolean operator between two terms, the AND operator is used.
Example: the query   iulius caesar   is the same as   iulius AND caesar.

See Boolean Operators further down this page for more details.


Accent-sensitive / accent-insensitive

IntraText search can be accent-sensitive or accent-insensitive:
  • in accent-sensitive search (default), ancora, àncora and ancóra are three different words,
    i.e. accents and diacritics are relevant: searching for ancora DOES NOT FIND àncora;

  • in accent-insensitive search (use the checkbox), ancora, àncora and ancóra are the same,
    i.e. accents and diacritics are ignored: searching for ancora FINDS àncora as well.
    In analysis results, text extracts are shown with accents and diacritics removed.
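
The two modes can be pictured as two ways of normalizing a word before comparing it. The sketch below is only an illustration in Python: the helper name normalize and its accent_sensitive parameter are hypothetical, and it assumes that accents are removed by Unicode decomposition, which this page does not state.

```python
import unicodedata


def normalize(term: str, accent_sensitive: bool = True) -> str:
    """Return the form of a word used for comparison (hypothetical helper)."""
    # Case is ignored in both modes.
    folded = unicodedata.normalize("NFC", term).casefold()
    if accent_sensitive:
        return folded
    # Split each letter into base letter + combining marks, then drop the marks.
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Accent-sensitive (default): ancora and àncora stay different.
print(normalize("àncora") == normalize("ancora"))  # False
# Accent-insensitive: both collapse to the same form.
print(normalize("àncora", accent_sensitive=False) == "ancora")  # True
```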


Terms

A query is broken up into terms and operators. There are two types of terms, Single Terms and Phrases:
  • A Single Term is a single word such as Augustus or manus.

  • A Phrase is a group of words surrounded by double quotes such as "Iulius Caesar".

  • Multiple terms can be combined with Boolean operators to form a more complex query.
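
As a rough illustration of how a query splits into Single Terms, Phrases and operators, here is a minimal Python sketch. The function split_terms is hypothetical, it only recognizes the word operators AND, OR and NOT, and it assumes they are written in upper case; none of this is confirmed as the way IntraText parses queries.

```python
import re

# Either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_terms(query: str) -> list[tuple[str, str]]:
    """Label each piece of a query as phrase, operator or term (sketch only)."""
    pieces = []
    for match in TOKEN.finditer(query):
        phrase, word = match.groups()
        if phrase is not None:
            pieces.append(("phrase", phrase))
        elif word in ("AND", "OR", "NOT"):
            pieces.append(("operator", word))
        else:
            pieces.append(("term", word))
    return pieces


print(split_terms('"Iulius Caesar" AND augustus'))
# [('phrase', 'Iulius Caesar'), ('operator', 'AND'), ('term', 'augustus')]
```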


Term Modifiers

IntraText search supports modifying query terms to provide a wide range of searching options.
  • Wildcard Searches
    IntraText search supports single and multiple character wildcard searches.
    To perform a single character wildcard search use the "?" symbol.
    To perform a multiple character wildcard search use the "*" symbol.
    The single character wildcard search looks for terms that match the pattern with the "?" replaced by any single character.
    For example, to search for manus, minus or munus you can use the search:

       m?nus

    A multiple character wildcard search matches 0 or more characters in place of the "*". For example, to search for Caesar, Caesaris or Caesarianum, you can use the search:

       caesar*

    You can also use wildcards in the middle of a term:

       c*sar

    Note: You cannot use a * or ? symbol as the first character of a search.

  • Fuzzy Searches
    IntraText search supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm. To do a fuzzy search, use the tilde, "~", symbol at the end of a Single Term. For example, to search for a term similar in spelling to aqua, use the fuzzy search:

       aqua~

    This search will find terms like aquae and acqua.
    An additional (optional) parameter specifies the required similarity. The value is between 0 and 1; the closer it is to 1, the more similar a term must be in order to match. For example:

       aqua~0.8

    If the parameter is omitted, the default is 0.5.

  • Proximity Searches (also known as NEAR operator)
    IntraText search supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde, "~", symbol at the end of a Phrase. For example, to search for Caesar and Roma within 10 words of each other in a document, use the search:

       "caesar roma"~10

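The three term modifiers above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's implementation: the function names are hypothetical, the similarity formula (1 minus the edit distance divided by the length of the shorter word) is one common normalization chosen here because the page does not give the formula, and proximity is simplified to the difference between word positions.

```python
import re


def wildcard_match(pattern: str, word: str) -> bool:
    """Match a word against a pattern where ? is one character and * is 0 or more."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a pattern cannot start with * or ?")
    regex = "".join(
        "." if ch == "?" else ".*" if ch == "*" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, word, flags=re.IGNORECASE) is not None


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the shorter word."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - edit_distance(a, b) / shortest


def fuzzy_match(term: str, word: str, minimum: float = 0.5) -> bool:
    """True when the similarity is above the required minimum (default 0.5)."""
    return similarity(term.casefold(), word.casefold()) > minimum


def within(text: str, first: str, second: str, max_distance: int) -> bool:
    """Simplified proximity: some pair of occurrences is at most max_distance words apart."""
    words = re.findall(r"\w+", text.casefold())
    pos_first = [i for i, w in enumerate(words) if w == first.casefold()]
    pos_second = [i for i, w in enumerate(words) if w == second.casefold()]
    return any(abs(i - j) <= max_distance for i in pos_first for j in pos_second)


print(wildcard_match("m?nus", "munus"))      # True
print(similarity("aqua", "aquae"))           # 0.75
print(within("Caesar venit in urbem Roma", "caesar", "roma", 10))  # True
```

Under the assumed formula, aquae and acqua both score 0.75 against aqua, which is above the default of 0.5 and agrees with the aqua~ example above.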

Boolean Operators

  • OR
    The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol "||" can be used in place of the word OR.
    To search for documents that contain either Iulius Caesar or Augustus or both, use the query:

       "iulius caesar" OR augustus

  • AND
    The AND operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the AND operator is used.
    The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.
    To search for documents that contain Iulius Caesar and Augustus use the query:

       "iulius caesar" AND augustus

    or

       "iulius caesar" augustus

  • NEAR: see Proximity Searches

  • NOT
    The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.
    To search for documents that contain Iulius Caesar but not Augustus use the query:

       "iulius caesar" NOT augustus

    Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:

       NOT augustus

  • +
    The "+" or required operator requires that the term after the "+" symbol exist somewhere in the document. In other words, the "+" operator limits the search to the documents containing the term.
    To search for documents that must contain caesar and may contain iulius use the query:

       iulius +caesar

  • -
    The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.
    To search for documents that contain Iulius Caesar but not Caesar Augustus use the query:

       "iulius caesar" -"caesar augustus"

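The set equivalences described above (union, intersection, difference) can be tried out on a toy collection. The sketch below is illustrative Python: the four documents are made up, and the helper containing is a hypothetical stand-in for an index lookup.

```python
# Toy collection: document id -> text (made-up data for illustration).
DOCS = {
    1: "iulius caesar gallia",
    2: "caesar augustus roma",
    3: "iulius caesar augustus",
    4: "cicero roma",
}


def containing(phrase: str) -> set[int]:
    """Ids of the documents that contain the word or phrase (hypothetical lookup)."""
    return {doc_id for doc_id, text in DOCS.items() if f" {phrase} " in f" {text} "}


# "iulius caesar" OR augustus   -> union
print(sorted(containing("iulius caesar") | containing("augustus")))  # [1, 2, 3]

# "iulius caesar" AND augustus  -> intersection
print(sorted(containing("iulius caesar") & containing("augustus")))  # [3]

# "iulius caesar" NOT augustus  -> difference
print(sorted(containing("iulius caesar") - containing("augustus")))  # [1]

# "iulius caesar" -"caesar augustus"  -> difference, with a phrase on the right
print(sorted(containing("iulius caesar") - containing("caesar augustus")))  # [1]
```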

Grouping

IntraText search supports using parentheses to group clauses into subqueries. This is useful when you want to control the Boolean logic of a query.
To search for documents that contain Gallia and either Iulius or Caesar, use the query:

   (iulius OR caesar) AND gallia

This removes any ambiguity: Gallia must be present, together with at least one of Iulius and Caesar.
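
To see why the parentheses matter, compare the two possible groupings on sets of document ids. This is a small Python sketch with made-up ids; it does not say which grouping IntraText would pick for the query written without parentheses, only that the two readings can give different results.

```python
# Made-up sets of document ids containing each word.
iulius = {1, 3}
caesar = {1, 2, 3}
gallia = {1, 5}

# (iulius OR caesar) AND gallia
grouped = (iulius | caesar) & gallia

# iulius OR (caesar AND gallia)
other_reading = iulius | (caesar & gallia)

print(sorted(grouped))        # [1]
print(sorted(other_reading))  # [1, 3]
```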



Escaping Special Characters

IntraText search supports escaping special characters that are part of the query syntax. The current list of special characters is:

   + - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape one of these characters, put a \ before it. For example, to search for (1+1):2 use the query:

   \(1\+1\)\:2
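
When queries are built by a program, the escaping can be automated. The Python sketch below uses a hypothetical helper, escape_query, and makes one simplifying assumption: every & and | is escaped, although the list above only names the doubled forms && and ||.

```python
# Characters from the list above; & and | are escaped individually (assumption).
SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_query(text: str) -> str:
    """Put a backslash before every special character (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_query("(1+1):2"))  # \(1\+1\)\:2
```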




Technology

IntraText search is based on Èulogos Progressive Search and Apache Lucene.

The IntraText® Digital Library - Some rights reserved by EuloTech SRL - 1996-2012. Content in this page is licensed under a Creative Commons License
Last updated: 2012.01.03