Reveal Help Center

Search & Indexing Module

The Search & Indexing module differs from other modules in that it has two ribbon-level tabs, Search and Indexing, that separate its functionality. The Search tab creates, manages, and deletes search terms, whereas the Indexing tab creates, manages, and deletes indexes within the project. Once a Search Group has been created, search terms can be organized within it. The Search Group is represented in the Search Module Navigation with the 60391ae70b6cc.png icon.

Enabling Search or Indexing Tab
  1. Search – To create, manage, and delete Keyword and/or Concept searches, click the Search tab.

    60391aea3b1aa.png
  2. Indexing – To create, manage, and delete Indexes, click the Indexing tab.

    60391aed28c47.png

All counts displayed within the Search tab are total counts, meaning that both original and duplicate files are included. There are two types of counts within the Search Module: Doc Count and Family Count.

  • Doc Count is the number of individual files responsive to a term.

  • Family Count is the total number of files in every family that has at least one member responsive to a term.

For example, the Search term ‘document’ may be responsive to the attachment of an email or an embedded object of an efile, but not the parent email or efile itself. In this example, the Doc Count will be 1 and the Family Count will be 2.
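
The difference between the two counts can be illustrated with a small sketch. This is only an illustration of the counting rule described above, not how Reveal computes its counts: the FileRecord type, its field names, and the simple substring match are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    file_id: int     # hypothetical file identifier
    family_id: int   # hypothetical identifier shared by a parent and its children
    text: str        # extracted text of the file


def count_hits(files, term):
    """Return (doc_count, family_count) for a case-insensitive substring match."""
    needle = term.lower()
    hit_files = [f for f in files if needle in f.text.lower()]
    hit_families = {f.family_id for f in hit_files}
    # Family Count includes every member of a family with at least one hit.
    family_count = sum(1 for f in files if f.family_id in hit_families)
    return len(hit_files), family_count


files = [
    FileRecord(1, family_id=100, text="Please see attached."),            # parent email
    FileRecord(2, family_id=100, text="This document is confidential."),  # attachment
    FileRecord(3, family_id=200, text="Lunch on Friday?"),                # unrelated file
]

doc_count, family_count = count_hits(files, "document")
print(doc_count, family_count)
```

Only the attachment contains the term, so it alone contributes to the Doc Count, while both the parent email and the attachment contribute to the Family Count.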

Only original (unique) files are added to an Index. All counts displayed within the Indexing tab are unique counts, meaning that only original files are included.

Running Keyword Terms

There are two types of searches that can be executed in the Search Module: Keyword and Concept searches. Keyword Terms use dtSearch to return files via search types like Boolean, proximity, wildcard, stemming, etc. Keyword Terms automatically update at the end of the Import process, which updates the counts in the Search Module. Run the Keyword Term Hits Report on the new imports to see the effect on the Keyword Terms. In the illustration below, the Search & Indexing Module ribbon has been collapsed to increase work space.

60391af07f5ff.png
  1. New Search Group – In the Search tab, choose New Search Group to collect Keyword Search Terms into a Keyword Search Group. Terms entered using Add Keyword Terms (Item 3 below) are added to the selected group if Add to Group is checked.

  2. Keyword – Choose the Keyword tab to add and run Keyword Terms.

  3. Add Keyword Terms – Search allows for literal, wildcard, proximity and fuzzy searches. All Keyword Terms must be written using the appropriate syntax, and each term must be entered on a separate line in the Add Keyword Search box. To view the Search Syntax guide, click the 60391af2c6f25.png link or see APPENDIX F - dtSearch Syntax Guide. There are two ways to add a Keyword Term to the project:

    • Typing a Keyword – Type a freeform Keyword Term, and click the Add Keyword Terms button. To add multiple Keyword Terms, type each term on its own line, pressing the Enter key after each one, and then click the Add Keyword Terms button.

    • Dragging and Dropping a List of Terms – Create a list of Keyword Terms in a text file, one term per line as described above. Drag and drop the text file into the Keyword tab, and click the Add Keyword Terms button.

      • Add Term To Group – To add the Keyword Term(s) to a Keyword Group, check 60391af55fec6.png and choose the Keyword Group. The terms will be automatically added to the target Keyword Group as well as to the ALL TERMS Keyword Group.

    Note

    By default, Reveal Discovery Platform indexes and searches the fields FULLTEXT, SENDER, RECIPIENTS, TO, FROM, CC, BCC. The sender and recipient email address fields contain both the display name and the fully qualified email address. Because of this, a Keyword Search Term may hit on one of the email address fields even though the fully qualified email address is not visible in the extracted text (FULLTEXT). To search only the extracted text, use the syntax //text contains (<Term>). This is the only fielded search that requires the // syntax. Alternatively, within the Project Settings, the sender and recipient fields can be excluded from the dtSearch Index, leaving only the FULLTEXT.

    Search syntax guidance in this module applies only to dtSearch. Different indexing engines may require different specification syntax for field searches.

  4. Keyword Search Terms Table – After the Add Keyword Terms button is clicked, the Keyword Term(s) are displayed in the Keyword Search Terms table. The Keyword Search Terms table has six columns in addition to a sequentially assigned ID:

    • Term – This is the Keyword search term that was added to the Keyword Terms table.

      • Term Derivatives – All Keyword Terms are displayed with a tree view. Once expanded, the tree view shows all derivatives for the parent term, for example, counts for individual connected terms or expansions of a wildcard term. The Doc Count for the parent term is the result of combining the derivatives' Doc Hits using the given operation. The parent term's Doc Count will likely not equal the sum of the derivatives' Doc Hits, as several derivatives may occur within one file.

    • Doc Count – This is the number of files responsive to the Keyword Term.

    • Family Count – This is the total number of files within a family one or more of whose members are responsive to a Keyword Term. For email, Doc Count and Family Count may be different depending on the situation. For example, the Keyword Term ‘document’ may be responsive to the attachment of an email but not the email itself. In this example, the Doc Count will be 1 and the Family Count will be 2.

    • Uniqueness – This is the number of files that hit only on this particular Keyword Term, with no other Keyword Term responsive to the file. If this Keyword Term were deleted from the case, these unique files would be removed from the responsive set. This is calculated on the document level.

    • Inclusiveness – This is the percentage of Doc Hits to Indexed Files. If the percentage is high for a particular Keyword Term, that term may be overinclusive and need to be revised. This is calculated on the document level.

    • Group Membership – This is the Keyword Group(s) to which the term has been assigned.

  5. Keyword Search Groups – This table lists the totals for all terms within all defined Keyword Search Groups. The table displays three columns:

    • Group Name – The name given to the Keyword Search Group when it was created, before Keyword Search Terms were added.

      • Group Terms – All Keyword Search Groups are displayed with a tree view. Once expanded, the Keyword Search Terms for the Group are displayed in this sub-table with Doc Count and Family Count.

    • Doc Count – This is the total number of files responsive to the Keyword Terms in the Keyword Search Group.

    • Family Count – This is the total number of files within a family one or more of whose members are responsive to a Keyword Term in the Keyword Search Group. As noted above, Doc Count and Family Count may be different depending on members of a family having or not having one or more of the terms in the group.
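
The Uniqueness and Inclusiveness columns described above can be sketched as simple set arithmetic. The following is a minimal illustration under stated assumptions, not Reveal's implementation: the function name, the hits_by_term mapping, the sample terms, and the file IDs are all hypothetical.

```python
def term_statistics(hits_by_term, indexed_file_count):
    """Compute per-term Doc Count, Uniqueness, and Inclusiveness.

    hits_by_term maps each Keyword Term to the set of file IDs it hits
    (hypothetical input; in practice the search engine supplies the hits).
    """
    if indexed_file_count <= 0:
        raise ValueError("indexed_file_count must be positive")
    stats = {}
    for term, hits in hits_by_term.items():
        # Union of the hits of every other term.
        other_hits = set()
        for other_term, other in hits_by_term.items():
            if other_term != term:
                other_hits |= other
        stats[term] = {
            "doc_count": len(hits),
            # Files that would leave the responsive set if this term were deleted.
            "uniqueness": len(hits - other_hits),
            # Doc hits as a percentage of all indexed files.
            "inclusiveness_pct": round(100.0 * len(hits) / indexed_file_count, 1),
        }
    return stats


# Illustrative terms and file IDs only.
hits_by_term = {
    "contract*": {1, 2, 3, 4},
    "invoice": {3, 4, 5},
    "merger w/5 agreement": {6},
}
stats = term_statistics(hits_by_term, indexed_file_count=10)
for term, row in stats.items():
    print(term, row)
```

In this sample, files 3 and 4 hit on two terms, so they do not count toward the Uniqueness of either term.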

Running Concept Searches

Concept Terms use the Conceptual Index to return conceptually similar files. A concept is a theme or idea expressed throughout a set of documents. An entire file can be used as the basis for a Conceptual Search. The Conceptual Search seeks to return all files that are within a 0.5 minimum threshold of one or more training files within the Conceptual Index to which the file is mapped (to see more about training files, please see the Analytics Module, as well as Appendix B).

As seen in the image below, a Concept Search is like a dart board. The center, or bullseye, is a training file within the Conceptual Index. The higher the similarity score, the closer the file is to the center. A file with a 0.9 score is more similar to the training file it is mapped to, but a search at that level returns a smaller population of files. A search at 0.5 similarity returns more files, because it also admits files that are less similar or relevant to the training file. Even though Conceptual Terms can be created in the Discovery Manager, it is more likely that this functionality would be leveraged in the Decision Engine ECA. Unlike the automatic workflow of Keyword Terms, Concept Terms only update when the Conceptual Index is rebuilt in the Analytics Module through a user-initiated Analytics Job.

60391af84a290.png
60391afcd369d.png
  1. Search – Choose the Search tab to search the project.

    Note

    The Search & Indexing Module ribbon has been collapsed to show greater screen detail.

    Use the New Search Group button here to add a New Concept Search as discussed below.

  2. Concept – Choose the Concept tab to add a Concept Search.

  3. Add Concept Search – The following options can be applied when running a Concept Search term(s). Choose the appropriate options and click the Add Concept Search button to add the term(s) to the Concept Search table.

    • New Concept – The default option is New Concept, which allows users to provide a name for a new Concept Search, as the Concept Search term is based on one or more files instead of a single search term. Each file that is applied to the new Concept Search is an example file within the search.

      • Examples – There are two ways to add examples to a new or previous Concept Search term within the Project, which can be changed by clicking the Examples Search Type drop-down:

        • Selective Set – This default option uses all files within a Selective Set as example files for the new or existing Concept Search. To run this type of search choose the Selective Set within the table or grid, and click the Add Concept Search button. See Selective Set Module for more information.

        • FileID List – To run this type of search, copy and paste a list of FileIDs (one FileID per line) into the Search window, and click the Add Concept Search button.

    • Add To Concept – Since files are used as the search criteria and a name is applied to the Conceptual Search, it may be necessary to add more files to the Concept Search at a later point in time. To facilitate this, click the Add To Concept option, and choose the previously created Concept Search.

      Note

      Each file that is added to a new or existing Concept Search will become an example file. These files are individually run as Concept Searches with a minimum threshold of 0.50. As a result, the files returned by a Concept Search will have an associated threshold or similarity score that helps in determining how closely related the file is to a given search. A file within a concept result can have a concept score ranging anywhere from 0.50 to 1.0. A file with a score of 1.0 would be an exact match, while a file with a score of 0.60 would be somewhat conceptually related. The Closest Concept is the Concept Search that is the most conceptually similar to the file in the Concept Search result. This value is helpful in quickly seeing the concept that hits with the strongest score.

    • Add to Group – To add the Concept Search terms to a Concept Search Group, click this checkbox and choose the target group to which the term(s) will be applied. You may create a group using the New Search Group button on the Search & Indexing Ribbon.

  4. Concept Search Table – After the Add Concept Search button is clicked, the term(s) are displayed in the Concept Search table. The Concept Search table has 6 columns:

    • ID – The Concept Search term’s ID.

      • Examples – All Concept Search terms are displayed with a tree view. Once expanded, the tree view shows the individual example files for the Concept Search. These values can be very helpful for identifying additional search terms of interest.

    • Concept Name – The name of the Concept Search.

    • Doc Count – This is the number of files responsive to the Concept Search.

    • Family Count – This is the total number of files within a family where one or more of its members is responsive to the Concept Search. For email, Doc Count and Family Count can be different depending on the situation. For example, the Concept Search term ‘ConceptSearch01’ may return the attachment of an email but not the email itself. In this example, the Doc Count will be 1 and the Family Count will be 2.

    • Example Count – The total number of individual example files for the Concept Search.

    • Group Membership – This is the Search Group(s) the term belongs to.

  5. Concept Groups – This table lists the total documents within all defined Concept Groups. The table displays three columns:

    • Group Name – The name given to the Concept Search Group.

      • Group Terms – All Concept Search Groups are displayed with a tree view. Once expanded, the Concepts for the Group are displayed in this sub-table with Doc Count and Family Count.

    • Doc Count – This is the total number of files contained in the Concept Search Group.

    • Family Count – This is the total number of files within a family one or more of whose members are contained in a Concept in the Concept Search Group. As noted above, Doc Count and Family Count may differ depending on which members of a family are directly returned by the Concepts within the group.
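
The effect of the similarity threshold and the idea of a Closest Concept can be sketched as follows. The scores here are supplied by hand and the function and variable names are hypothetical; how the Conceptual Index actually computes similarity is not covered in this section, so this is only an illustration of the filtering step.

```python
MIN_THRESHOLD = 0.50  # minimum similarity stated in this section


def concept_results(scores, threshold=MIN_THRESHOLD):
    """Return {file_id: (closest_concept, score)} for files meeting the threshold.

    scores maps file_id -> {concept_name: similarity score} (hypothetical input).
    """
    results = {}
    for file_id, by_concept in scores.items():
        qualifying = {c: s for c, s in by_concept.items() if s >= threshold}
        if qualifying:
            # The Closest Concept is the concept with the strongest score.
            closest = max(qualifying, key=qualifying.get)
            results[file_id] = (closest, qualifying[closest])
    return results


scores = {
    101: {"ConceptSearch01": 0.92, "ConceptSearch02": 0.55},
    102: {"ConceptSearch01": 0.61},
    103: {"ConceptSearch01": 0.42, "ConceptSearch02": 0.48},
}

broad = concept_results(scores)                  # default 0.50 threshold
narrow = concept_results(scores, threshold=0.9)  # closer to the bullseye
print(len(broad), len(narrow))
```

Raising the threshold moves toward the center of the dart board: the files returned are more similar to the example files, but there are fewer of them.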

After Running Searches
60391aff4f5f2.png
  1. Refresh – Click the Refresh button to refresh the Search tab to show new Search Terms and Search Groups created/removed on different machines in a distributed environment, as well as to update the Search Group’s statistics.

  2. Delete Selected Terms – Select the Search Term(s) from the Keyword/Concept Search table that need to be deleted and click the Delete Selected Terms button. Optionally this can also be done via a right click menu after selecting the term(s).

  3. New Search Group – A Search Group is a simple way to combine Keyword or Concept Searches. To create a Search Group, click the New Search Group button, fill out the New Group form, and click OK. The 60391b00eb97a.png icon will then appear in the Search Module and in the Keyword Search Groups section within the Module Form.

  4. Assign Terms to Group – To add one or more terms to a Search Group, click the checkbox(es) next to the term(s), click the Assign Terms to Group button, and choose the target Search Group. The Doc Count for the Search Group is the result of combining the terms' hits with the OR operator, so a file responsive to several terms in the group is counted once.

    Note

    By default, all Search Terms added to the project will be added to the ALL TERMS KEYWORD/CONCEPT GROUPS.

  5. Search Module Navigation – The Search Module Navigation displays the various Keyword/Concept Search Groups. Each Search Group has an icon and has a tree view which displays the following counts:

    • Term Count – The total number of Search Terms assigned to the Search Group.

    • Doc Count – The total individual files responsive to the Search Terms within the Search Group.

    • Family Count – This is the total number of files within a family when one or more of its members is responsive to a Search Term. For email, Doc Count and Family Count can be different depending on the situation. For example, the Keyword Term ‘document’ may be responsive to the attachment of an email but not the email itself. In this example, the Doc Count will be 1 and the Family Count will be 2.

  6. Delete Search Group – To delete a Search Group, first click on the Keyword/Concept Group in the Search Module Navigation, and then click the Delete Search Group button in the Search Ribbon.

    Note

    This will only delete the Keyword/Concept Group from the project, but will not delete the Search Terms from the project.

  7. Launch to Preview – A Preview allows a user to see the files that are responsive to the chosen Search Term(s). To preview the results, select one or more Search Terms from the Keyword/Concept Search table, click the Launch To Preview button, and choose either Document or Family Level. To see more information about using Previews, please see Appendix G.
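
The Search Group counts shown in the Search Module Navigation can be sketched as a union of the member terms' hits. This is an illustrative sketch only; the function name, the family_of mapping, and the sample data are assumptions and do not reflect Reveal's internal data model.

```python
def group_counts(group_terms, hits_by_term, family_of):
    """Return (term_count, doc_count, family_count) for a Search Group.

    hits_by_term: term -> set of responsive file IDs (hypothetical input)
    family_of:    file ID -> family ID for every file in the project (hypothetical)
    """
    # OR the terms together: a file hit by several terms is counted once.
    group_docs = set()
    for term in group_terms:
        group_docs |= hits_by_term.get(term, set())
    hit_families = {family_of[file_id] for file_id in group_docs}
    # Count every member of each family that has at least one responsive file.
    family_count = sum(1 for fam in family_of.values() if fam in hit_families)
    return len(group_terms), len(group_docs), family_count


family_of = {1: 100, 2: 100, 3: 200, 4: 300, 5: 300}
hits_by_term = {"document": {2, 3}, "invoice": {3, 4}}

term_count, doc_count, family_count = group_counts(
    ["document", "invoice"], hits_by_term, family_of
)
print(term_count, doc_count, family_count)
```

File 3 is responsive to both terms, which is why the group's Doc Count is lower than the sum of the two terms' individual Doc Counts.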

Creating Indexes

Every time an Indexing Job is created with a scope, Discovery Manager pulls back only the original files within the scope that have not been indexed in any prior Indexing Job. To create an Indexing Job, the user chooses the scope of files for the Index and clicks the Launch Indexing Job button; an Indexing Job is then created and sent out to the Discovery Agents. The Indexing Job can be monitored in the Search & Indexing Module and the Environment Module.
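
The selection rule described above (original files, within the scope, not yet indexed) can be sketched as a simple filter. The ProjectFile type and its fields are hypothetical names introduced for this illustration, and only an Imports scope and a whole-project scope are modeled.

```python
from dataclasses import dataclass


@dataclass
class ProjectFile:
    file_id: int
    import_id: int     # hypothetical: the Import the file arrived in
    is_original: bool  # False for duplicates
    indexed: bool      # True if a prior Indexing Job already indexed the file


def files_to_index(files, import_ids=None):
    """Return IDs of files an Indexing Job would pick up.

    import_ids=None stands for the Project scope (no checkbox selected).
    """
    return [
        f.file_id
        for f in files
        if f.is_original
        and not f.indexed
        and (import_ids is None or f.import_id in import_ids)
    ]


files = [
    ProjectFile(1, import_id=1, is_original=True, indexed=True),    # already indexed
    ProjectFile(2, import_id=1, is_original=True, indexed=False),
    ProjectFile(3, import_id=1, is_original=False, indexed=False),  # duplicate
    ProjectFile(4, import_id=2, is_original=True, indexed=False),
]

print(files_to_index(files))                  # Project scope
print(files_to_index(files, import_ids={1}))  # Imports scope, Import 1 only
```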

60391b0316417.png
  1. Indexing – Choose the Indexing tab to create an Index.

  2. Index Scope – There are three scopes that can be used to create an Index for the project:

    • Project – If no checkbox is selected in Imports or Selective Sets and the Launch Indexing Job button is clicked, the system will look across the entire project to see if there are any files available for indexing.

    • Imports – To create an Index from one or more Imports, select the checkbox next to the applicable Import(s), and click the Launch Indexing Job button.

    • Selective Sets – To create an Index from one or more Selective Sets, select the checkbox next to the applicable Selective Set(s), and click the Launch Indexing Job button.

  3. Launch Indexing Job – To launch an Indexing Job to the Discovery Agents, select the Index Scope, and click the Launch Indexing Job button.

    Note

    The Reveal Discovery Manager uses only accent-insensitive indexes. This is done so that the same keyword term does not need to be added both with and without accents to be a search hit. For example, the Keyword Search of ‘uber’ would return both ‘uber’ and ‘über’.

  4. Monitoring Indexing Jobs – Indexing Jobs can be monitored in the Indexing tab by clicking the Refresh button, or within the Environment Module.
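
The accent-insensitive behavior described in the note above can be illustrated with a short sketch. This uses Unicode normalization from the Python standard library as a stand-in technique; it is an assumption made for illustration and is not a description of how dtSearch or Reveal folds accents internally.

```python
import unicodedata


def fold_accents(text):
    """Strip combining accent marks and normalize case for comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_match(term, word):
    """True when term and word are equal after accent folding."""
    return fold_accents(term) == fold_accents(word)


print(accent_insensitive_match("uber", "über"))
print(accent_insensitive_match("uber", "under"))
```

With this kind of folding, a single term such as ‘uber’ matches both the accented and unaccented spellings, so only one Keyword Term is needed.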

After Creating an Index
60391b06d3999.png
  1. Refresh – Click the Refresh button to refresh the Indexing tab to show the current counts.

  2. Project Indexes – To delete or update an Index, select the checkbox next to the Index(es) and choose the applicable button in the Indexing Ribbon. The Project Indexes table contains the following values:

    • Index ID – The ID of the Index within the project.

    • Index Status – The status of the Index. If the Index Status is ERRORED, the Index should be deleted and a new Index should be built for the applicable scope(s).

    • Index Scope – The content selected in generating the Index, either PROJECT, IMPORTS or SELECTIVE SETS.

    • Index Type – There are two Index Types: EXTRACTED TEXT and OCR. While both extracted text and OCR text can be added to the dtSearch Index and made searchable, this is done through different processes, so they are separated into different Index Types.

    • Actual Count – This is the actual number of items added to the dtSearch Index. If Actual Count does not equal Expected Count, the Index should be deleted, and a new Index should be built for the applicable scope(s).

    • Expected Count – This is the expected number of items that should be added to the dtSearch Index. If Actual Count does not equal Expected Count, the Index should be deleted, and a new Index should be built for the applicable scope(s).

    • Fragmentation – Fragmentation of an Index increases the size of the Index and slows searching, but the effect is generally not noticeable unless the fragmentation is severe. If the fragmentation of an Index is high and search results are taking a long time to complete, the Index should be deleted and a new Index should be built for the applicable scope(s).

    • Job ID – The Distributed Job ID of the Index.

  3. Delete Index – To delete an Index from the project select the checkbox next to the Index, and click the Delete Index button. Any file(s) deleted from the Index may be part of a future Index Scope and will be available for indexing. Note that null indexes from the prior illustration (having 0 documents to index) have been deleted here.

  4. Update Index Properties – To update Actual Count and Fragmentation to the most current states for one or more Indexes, select the Index(es) and click the Update Index Properties button.
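
The rebuild guidance in the Project Indexes table (an ERRORED status, or an Actual Count that does not equal the Expected Count) can be expressed as a simple check. The IndexRow type is a hypothetical stand-in for a row of that table, and the COMPLETED status value is assumed for illustration; only ERRORED is named in this section.

```python
from dataclasses import dataclass


@dataclass
class IndexRow:
    index_id: int
    status: str          # only "ERRORED" comes from this section; others are assumed
    actual_count: int
    expected_count: int


def needs_rebuild(row):
    """True when the Index should be deleted and rebuilt for its scope."""
    return row.status == "ERRORED" or row.actual_count != row.expected_count


rows = [
    IndexRow(1, "COMPLETED", actual_count=500, expected_count=500),
    IndexRow(2, "COMPLETED", actual_count=480, expected_count=500),
    IndexRow(3, "ERRORED", actual_count=0, expected_count=250),
]

to_rebuild = [row.index_id for row in rows if needs_rebuild(row)]
print(to_rebuild)
```

Fragmentation is left out of the check because this section gives no numeric cutoff for when fragmentation counts as high.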