Reveal Help Center

Datasets and Connectors

Datasets
Concept Topics

Datasets Screen

When you click Administration in the user menu, the Datasets management screen opens by default. The Datasets management screen includes the following features:

Datasets_screen_map.png

Callout 1: Manage datasets, users and groups, connectors, services, and portable models.

Callout 2: Search for a dataset name.

Callout 3: Add a new dataset to Brainspace.

Callout 4: Download a dataset management report.

Callout 5: View the dataset’s name, activity status, and identification number.

Callout 6: Manage dataset settings, download dataset reports, disable the dataset, manage tags, and open the dataset in the Analytics Dashboard.

Callout 7: View the dataset’s connector type, name, data source, number of documents in the dataset, dataset groups, the percentage of documents ingested incrementally, and build status.

Callout 8: View the datasets list by activity status.

Datasets Display Screen

When you log in to Brainspace, the Datasets display screen opens. The Datasets display screen includes the following features:

Datasets_display_screen.png

Callout 1: Click the Brainspace logo from anywhere in Brainspace to open the Datasets display screen.

Callout 2: Type a dataset name in the text field to locate a specific dataset.

Callout 3: Click the Hide icon to view only unpinned datasets.

Callout 4: View the number of pinned datasets in Brainspace.

Callout 5: Click a pinned dataset card to open the dataset in Analytics.

Callout 6: View the number of unpinned datasets in Brainspace.

Callout 7: Click the Hide icon to view only pinned datasets.

Callout 8: Click an unpinned dataset card to open the dataset in Analytics.

Callout 9: Hide datasets with no new documents.

Callout 10: Click the Sort by... dropdown menu to sort pinned datasets by name, by the number of new documents, or by total document count. Click the Filter by... dropdown menu to filter datasets by dataset activity status.

Callout 11: View pinned datasets in the card view or list view.

Callout 12: Click the Sort by... dropdown menu to sort unpinned datasets by name, by the number of new documents, or by total document count. Click the Filter by... dropdown menu to filter datasets by dataset activity status.

Callout 13: View unpinned datasets in the card view or list view.

Unpinned Dataset Card

After you create a new dataset, a dataset card will be added to the unpinned Datasets pane. A dataset card includes the following features:

Unpinned_Dataset_Card.png

Callout 1: Identify a dataset by name.

Callout 2: View a dataset’s status.

Callout 3: View the total number of documents in a dataset.

Callout 4: View the number of documents that have been added to a dataset through incremental builds. This number is reset when a full build is completed.

Callout 5: Open a dataset in the Analytics Dashboard.

Callout 6: Move an unpinned dataset to the Pinned Datasets pane. After moving a dataset to the Pinned Datasets pane, the Unpin icon will display:

Pinned_Dataset.png

Dataset Settings Dialog

Available to Group Admin and Super Admin users.

Dataset_settings_dialog.png

Callout 1: Edit the dataset’s name.

Callout 2: Add the dataset to or remove it from existing Brainspace groups.

Callout 3: View the Dataset Info dialog.

Callout 4: Delete the dataset. This permanently removes the dataset and all of its work product, such as saved searches, classifiers, and notebooks, from the system.

Callout 5: View the status of the data source connector associated with this dataset. The status is empty if no data source is assigned, “Prepared” if the data source is fully configured (the field mapping is complete and all associated questions are answered), and “Incomplete” if a data source connector has been chosen but not fully configured (for example, the field mapping has not been done yet).

Callout 6: Reconfigure the data source.

Callout 7: Remove the data source from the dataset.

Callout 8: View the last build date and time, scheduled builds, total deployed documents, incremental documents, and the dataset creation date and time.

Callout 9: Modify advanced configuration options.

Callout 10: Enable or disable automatic builds.

Callout 11: Enable or disable automatic overlays to the data source after every build.

Dataset Build Steps

Ingest

As documents are ingested, Brainspace handles the interface to third-party products and streams the data into batchtools in JSON Lines (json-line) format. The ingestion process takes the raw text of all fields as provided and produces the document archive directory.

stream-json-line

Intermediate files are written to the working directory, for example:

<buildFolder>/archive/working/archive001.gz

At the end of the ingestion process, the archive directory contains the raw text and all transferred metadata in *.gz files. The <buildFolder>/archive/output/archive001.gz output is populated only after ingestion completes successfully.
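The ingestion output is described only as json-line records stored in *.gz files. A minimal sketch, assuming each archive file is gzip-compressed JSON Lines with one document object per line (the `id` and `bodyText` keys and the `write_archive`/`read_archive` helpers are hypothetical, not a documented Brainspace format), shows how such a file can be written and then streamed back one record at a time:

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def write_archive(path: Path, docs: Iterable[dict]) -> int:
    """Write documents as gzip-compressed JSON Lines; return the record count."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for doc in docs:
            # One JSON object per line; keep non-ASCII text readable.
            fh.write(json.dumps(doc, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_archive(path: Path) -> Iterator[dict]:
    """Stream documents back one line at a time instead of loading the whole file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # ignore blank lines
                yield json.loads(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive001.gz"
        # Hypothetical records; the real field layout comes from the field mapping.
        write_archive(archive, [{"id": "DOC-0001", "bodyText": "Hello"},
                                {"id": "DOC-0002", "bodyText": "Café"}])
        for doc in read_archive(archive):
            print(doc["id"], doc["bodyText"])
```

Streaming line by line keeps memory use flat regardless of archive size, which is the main reason line-oriented formats suit large ingests.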

Document ingestion errors will be captured in the following folder: <buildFolder>/importErrorLog.

Analysis

Analysis includes the following high-level steps:

  1. Create Filtered Archive

  2. Boilerplate

  3. Exact Duplicate Detection and Email Threading

  4. Processed Archive

  5. Near Duplicate Detection

  6. Archive to TDM (Term Document Matrix)

  7. Build TDM Output

  8. De Dup TDM

  9. TDM Build

  10. Clustering

  11. Build TDM Output and Clusters

  12. Build doc-index

  13. Graph Index

  14. Generate Reports

Create Filtered Archive

Create Filtered Archive includes one step: filter archive.

The filter archive step applies the schema, removes filter strings from the filtered text, and removes Bates numbers.

Note

Text or fields not set to analyzed="true" are not included in the filtered archive.

filter archive

By default, filter archive removes soft hyphens and zero-width non-breaking spaces, removes HTML markup, and removes all email headers. Bates numbers can be removed from all text through configuration files at the command line interface (CLI). This step also decodes Quoted-Printable encodings (https://en.wikipedia.org/wiki/Quoted-printable).

This step removes filter strings. By default, this is mostly partial HTML markup. Custom filter strings can be set in the Brainspace user interface.
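Several of the filter archive operations correspond to ordinary text clean-up. The following is a minimal sketch, not Brainspace's implementation: it removes soft hyphens (U+00AD) and zero-width no-break spaces (U+FEFF), strips HTML tags with a deliberately naive regular expression, removes a list of filter strings, and decodes Quoted-Printable input with Python's standard `quopri` module. Email header and Bates number removal are omitted, and `filter_text` and `decode_quoted_printable` are hypothetical names:

```python
import quopri
import re
from typing import Iterable

SOFT_HYPHEN = "\u00ad"
ZERO_WIDTH_NO_BREAK_SPACE = "\ufeff"
# Deliberately naive tag matcher; production HTML stripping needs a real parser.
TAG_RE = re.compile(r"<[^>]+>")


def decode_quoted_printable(raw: bytes, charset: str = "utf-8") -> str:
    """Decode Quoted-Printable bytes such as b'caf=C3=A9' into text."""
    return quopri.decodestring(raw).decode(charset, errors="replace")


def filter_text(text: str, filter_strings: Iterable[str] = ()) -> str:
    """Apply a few filter-archive-style clean-up steps to already-decoded text."""
    # Remove invisible characters that would otherwise split or join words.
    text = text.replace(SOFT_HYPHEN, "").replace(ZERO_WIDTH_NO_BREAK_SPACE, "")
    # Strip HTML tags.
    text = TAG_RE.sub("", text)
    # Remove filter strings (for example, leftover fragments of HTML markup).
    for s in filter_strings:
        text = text.replace(s, "")
    return text


print(filter_text("<p>co\u00adoper\ufeffate</p>&nbsp;", ["&nbsp;"]))  # cooperate
print(decode_quoted_printable(b"caf=C3=A9"))  # café
```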

Boilerplate

Boilerplate includes the following steps:

  1. boilerplate hashing

  2. boilerplate counting

boilerplate hashing

For speed and efficiency, every line of the bodyText field is analyzed and assigned a hash value. Lines whose hashes occur frequently are considered boilerplate candidates.

boilerplate counting

Lines of text identified as boilerplate candidates are given a second pass to determine if the text matches all requirements of boilerplate.
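The two boilerplate steps can be illustrated with a minimal two-pass sketch. It assumes a line becomes a candidate when its hash appears in at least `min_docs` documents, and is confirmed when the candidate text is unambiguous and at least `min_words` words long. Both thresholds, the SHA-1 hash, and the whitespace and case normalization are illustrative assumptions; the actual boilerplate requirements are not described here:

```python
import hashlib
from collections import Counter, defaultdict


def normalize(line: str) -> str:
    # Collapse whitespace and ignore case so trivially different copies match.
    return " ".join(line.split()).lower()


def line_hash(line: str) -> str:
    return hashlib.sha1(normalize(line).encode("utf-8")).hexdigest()


def find_boilerplate(bodies: list[str], min_docs: int = 3, min_words: int = 4) -> set[str]:
    # Pass 1 (hashing): count how many documents contain each line hash.
    doc_freq = Counter()
    for body in bodies:
        doc_freq.update({line_hash(line) for line in body.splitlines() if line.strip()})
    candidates = {h for h, n in doc_freq.items() if n >= min_docs}

    # Pass 2 (counting): revisit only candidate lines and check the actual text.
    texts_by_hash = defaultdict(set)
    for body in bodies:
        for line in body.splitlines():
            if not line.strip():
                continue
            h = line_hash(line)
            if h in candidates:
                texts_by_hash[h].add(normalize(line))
    return {
        next(iter(texts))
        for texts in texts_by_hash.values()
        # One distinct text per hash rules out collisions; very short lines are excluded.
        if len(texts) == 1 and len(next(iter(texts)).split()) >= min_words
    }
```

Hashing first means the expensive text comparison in the second pass only runs on the small set of lines that are already known to repeat.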

Full Boilerplate

Full boilerplate reports the duration of the boilerplate hashing and counting steps.

Exact Duplicate Detection and Email Threading

This step identifies exact duplicates and email threads.

The email threading step identifies near duplicates and email threads. Email threading works from documents in the filtered archive. A valid Conversation Index supersedes email threading from the document text. If Conversation Index is not present or not valid (see Data Visualizations), email threading attempts to construct threads based upon document content and overlap (see Shingling). If the dataset contains the parentid, attachment, and/or familyid field, the parent-child relationships of attachments are determined by those fields.

During this step, documents are shingled, and the system determines whether any document is a superset of another document (that is, whether it contains all of the other document’s shingles).
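The superset test can be sketched by treating each document as the set of its word 3-shingles. The shingle size, lowercasing, and whitespace tokenization are assumptions, not documented Brainspace settings, and the function names are hypothetical. A later reply that quotes an earlier message in full contains all of the earlier message's shingles:

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles in text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def is_superset(doc_a: str, doc_b: str, k: int = 3) -> bool:
    """True if doc_a contains every shingle of doc_b."""
    b = shingles(doc_b, k)
    # A document too short to form a shingle is not treated as contained.
    return bool(b) and b <= shingles(doc_a, k)


original = "Can we meet on Friday at noon"
reply = "Works for me. Can we meet on Friday at noon"
print(is_superset(reply, original))  # True
print(is_superset(original, reply))  # False
```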

Exact duplicate detection occurs here, using the filtered archive. Subjects are normalized; for example, Re: and Fwd: prefixes are stripped.
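The text names only Re: and Fwd: as examples of stripped prefixes. A small sketch of subject normalization, assuming FW: is also stripped and that prefixes can repeat in any letter case, might look like this (`normalize_subject` is a hypothetical name):

```python
import re

# One or more leading "Re:", "FW:", or "Fwd:" prefixes, in any case.
PREFIX_RE = re.compile(r"^\s*(?:(?:re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes and collapse whitespace."""
    return " ".join(PREFIX_RE.sub("", subject).split())


print(normalize_subject("RE: Fwd: re:  Q3  budget"))  # Q3 budget
```

The colon is required after each prefix, so a subject such as "Regarding: budget" is left unchanged.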

Processed Archive

Processed archive includes one step—create processed archive.

The create processed archive step uses the outputs of the boilerplate step to remove boilerplate from the filtered archive. The system constructs internal data to track which boilerplate was removed from each document. Words and phrases are counted and truncated or stemmed. If enabled, entity extraction occurs in this step. The processed documents go into the processed archive.

Near Duplicate Detection

Near duplicate detection includes one step—ndf.

Near duplicate detection uses the processed archive to determine how many shingles two documents have in common and to identify them as near duplicates if they have enough of the same shingles. By default, 80 percent of shingles in common will identify two documents as near duplicates.
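The documentation does not say exactly how "80 percent of shingles in common" is computed. This sketch assumes Jaccard similarity (shared shingles divided by the union of both documents' shingles) over lowercased 3-word shingles. The function names, shingle size, and formula are assumptions chosen only to illustrate the 80 percent threshold:

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def shingle_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.8, k: int = 3) -> bool:
    return shingle_similarity(a, b, k) >= threshold


memo_v1 = "the quick brown fox jumps over the lazy dog near the river bank today"
memo_v2 = memo_v1 + " again"
print(round(shingle_similarity(memo_v1, memo_v2), 3))  # 0.923 (12 shared of 13)
print(is_near_duplicate(memo_v1, memo_v2))             # True
```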

Archive to TDM

Archive to TDM includes one step—arc-to-tdm.

arc-to-tdm

During the archive to TDM (Term Document Matrix) step, stop words are applied to the processed archive, the likely features (terms, words, and phrases) are determined, and that vocabulary is used to build the token TDM. Parts of speech are determined and, together with NLP for the detected language, are used to assemble phrases that are meaningful and useful for the supported languages.
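As a simplified illustration only, the sketch below builds a sparse term-document matrix from single lowercase words after removing stop words. A hypothetical minimum document-frequency threshold stands in for the per-TDM content thresholds. Phrase detection, part-of-speech tagging, and language-specific NLP are omitted, and `build_tdm` is not a Brainspace function:

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def build_tdm(docs: dict[str, str], stop_words: set[str],
              min_doc_freq: int = 1) -> dict[str, dict[str, int]]:
    """Return a sparse term-document matrix: term -> {doc_id: count}."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for token in TOKEN_RE.findall(text.lower()):
            if token not in stop_words:  # apply stop words before counting
                counts[token][doc_id] += 1
    # Different TDMs can use different thresholds; here, a minimum document frequency.
    return {term: dict(per_doc) for term, per_doc in counts.items()
            if len(per_doc) >= min_doc_freq}


docs = {"d1": "The merger closed.", "d2": "The merger was delayed; merger talks resumed."}
tdm = build_tdm(docs, {"the", "was"})
print(tdm["merger"])  # {'d1': 1, 'd2': 2}
```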

Various TDMs are generated for different purposes. For example, the Cluster TDM has a different threshold for content than the threshold for Brain TDM.

Brains and clusters will only use analyzed body text.

The Predictive Coding TDM uses any metadata fields that have the setting analyzed="true".

Build TDM Output

Build TDM Output includes one step: build-tdm tdm.

build-tdm tdm

In this step, build-tdm tdm creates a super-matrix of all terms by all documents (a term may contain more than one word).

De Dup TDM

De dup TDM includes one step—create-deduped-tdm. This TDM is used for the Cluster Wheel visualization.

create-deduped-tdm

In this step a TDM is built from documents identified as “Uniques” and “Pivots” (collectively called “Originals”).

TDM Build

TDM build includes the following steps:

  1. build-tdm tdm-deduped

  2. check-deduped-tdm-size

  3. build-tdm tdm-extended

build-tdm tdm-deduped

This step builds the full TDM (Term Document Matrix) without Near Duplicates.

check-deduped-tdm-size

This step does a simple sanity check on the size of the TDMs created at this point in the process.

build-tdm tdm-extended

This step creates a full TDM with metadata.

Clustering

Clustering includes the following steps:

  1. cluster tdm-deduped

  2. cluster-ndf

cluster tdm-deduped

Uniques, Near Duplicate Pivots, and any Exact Duplicate Pivot that is not a Near Duplicate are clustered using the deduped TDM.

cluster-ndf

This step adds Near Duplicates and Exact Duplicates to the Cluster Wheel.

Build TDM Output and Clusters

split tdm

During this step, the system determines whether more than one Brain is needed.

build-tdm Root

The system will have one build-tdm step for each Brain. If there is only one Brain, it will be named Root. If there are multiple Brains, each Brain will be assigned a name that describes the terms it contains. Brains will be in alphabetical order.

build-brains

In the build-brains step, the system computes the singular value decomposition (SVD) for the Brains.
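The documentation does not describe how Brainspace parameterizes the SVD (term weighting, number of dimensions). As a generic illustration of the technique only, a truncated SVD of a small term-by-document count matrix can be computed with NumPy's `numpy.linalg.svd`. Keeping the top k singular values gives a rank-k "concept space" in which documents can be compared. The matrix values and `truncated_svd` are hypothetical, and the sketch requires NumPy:

```python
import numpy as np


def truncated_svd(tdm: np.ndarray, k: int):
    """Return the top-k singular vectors and values of a term-by-document matrix."""
    u, s, vt = np.linalg.svd(tdm, full_matrices=False)  # s is sorted descending
    return u[:, :k], s[:k], vt[:k, :]


# Rows are terms, columns are documents (raw counts, no weighting applied).
tdm = np.array([
    [2.0, 1.0, 0.0],  # "merger"
    [1.0, 1.0, 0.0],  # "acquisition"
    [0.0, 0.0, 3.0],  # "lunch"
    [0.0, 1.0, 1.0],  # "meeting"
])
u_k, s_k, vt_k = truncated_svd(tdm, k=2)
approx = (u_k * s_k) @ vt_k             # rank-2 reconstruction of the matrix
doc_vectors = (s_k[:, None] * vt_k).T   # each row: one document in the 2-D concept space
print(doc_vectors.shape)  # (3, 2)
```

By the Eckart-Young theorem, the rank-k reconstruction is the closest rank-k matrix to the original in the Frobenius norm, which is why truncation keeps the dominant co-occurrence structure.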

Build doc-index

Build doc-index includes the following steps:

  1. Index documents

  2. index exclude docs

  3. all-indexing

Index documents

During this step, the documents in the processed archive are indexed to make the content keyword-searchable.
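Keyword search is typically backed by an inverted index that maps each term to the documents containing it. The sketch below illustrates that general idea with lowercase word tokens and an AND query. It does not reflect Brainspace's actual index format, stemming, or query syntax, and `build_index` and `search_all` are hypothetical names:

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased token to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in TOKEN_RE.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """AND search: IDs of documents containing every keyword in the query."""
    terms = TOKEN_RE.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


idx = build_index({"A": "Quarterly budget review", "B": "Budget approved", "C": "Team lunch"})
print(sorted(search_all(idx, "budget")))  # ['A', 'B']
```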

index exclude docs

During this step, the documents excluded from the processed archive are indexed to make the content keyword-searchable.

all-indexing

This step summarizes the duration of the index documents and index exclude docs steps.

Graph Index

Graph index includes one step—graph-data.

graph-data

The graph-data process builds the data used for the Communications analysis visualization.

Generate Reports

This step generates the final report.

Post-build Output Directory

At the end of the build process, the following files are copied to the output directory:

  • <buildFolder>/config/schema.xml

  • <buildFolder>/config/fieldMap.xml

  • <buildFolder>/config/language.xml

  • <buildFolder>/status.csv

  • <buildFolder>/process.csv

  • <buildFolder>/archive.csv

  • <buildFolder>/reports/*

The following file is moved to the output directory: <buildFolder>/doc-index.

Stop Words

Brainspace contains a list of standard stop words for each language supported by Brainspace (see Brainspace Language Support). Brainspace Administrators have the ability to upload custom stop-word lists for any of the Brainspace-supported languages or to download the current stop-word list for each language in the Language and Stop Words list.

Note

Brainspace identifies the languages of a document and then applies the language-specific stop words to it. The Common stop-word list is empty by default. If you want certain stop words to be applied to all languages, you can create a custom stop-word list and upload it to Common. For example, Brainspace does not provide a stop-word list for Estonian. If you have a large population of Estonian documents, it might be useful to upload an Estonian stop-word list to Common; however, any tokens that overlap with other languages will be stopped in those languages as well. If the word “face” were a stop word in Estonian, for instance, that word would also be stopped in English documents.
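The note implies that the stop words applied to a document are its detected language's list combined with the Common list. A minimal sketch of that union, with hypothetical function names and tiny example lists (the word "face" is reused from the note's hypothetical Estonian example), shows why a Common entry also affects English documents:

```python
def effective_stop_words(language: str, per_language: dict[str, set[str]],
                         common: set[str]) -> set[str]:
    """Stop words for one language: that language's list plus the Common list."""
    return per_language.get(language, set()) | common


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    return [t for t in tokens if t.lower() not in stop_words]


per_language = {"en": {"the", "a", "of"}}  # tiny illustrative English list
common = {"face"}  # hypothetical Estonian stop word uploaded to Common
english = effective_stop_words("en", per_language, common)
print(remove_stop_words("The face of a building".split(), english))  # ['building']
```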

Shingling

Shingling is a common data mining technique that reduces a document to a set of strings in order to determine whether the document is a Near Duplicate of another document in the dataset. The x-shingles of a document are all of the possible consecutive sub-strings of length x (here, x words) found within it. For example, if x=3, the text string “A rose is a rose is a rose” has the following shingles: “A rose is,” “rose is a,” “is a rose,” “a rose is,” “rose is a,” “is a rose.” After duplicate shingles are eliminated (ignoring case), three shingles remain: “A rose is,” “rose is a,” “is a rose.”
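The rose example can be reproduced with a short sketch. It assumes word-level shingles with punctuation dropped and case-insensitive comparison, which matches how the example treats “A rose is” and “a rose is” as the same shingle; the function names are hypothetical:

```python
import re


def word_shingles(text: str, x: int) -> list[str]:
    """All consecutive x-word sub-strings of text, in order (duplicates kept)."""
    words = re.findall(r"\w+", text)  # drops punctuation such as the trailing comma
    return [" ".join(words[i:i + x]) for i in range(len(words) - x + 1)]


def unique_shingles(text: str, x: int) -> list[str]:
    """Shingles with duplicates removed, compared case-insensitively, first-seen order."""
    return list(dict.fromkeys(s.lower() for s in word_shingles(text, x)))


text = "A rose is a rose is a rose,"
print(len(word_shingles(text, 3)))  # 6
print(unique_shingles(text, 3))     # ['a rose is', 'rose is a', 'is a rose']
```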

Task Topics

Add Custom Stop Words

Brainspace Administrators have the ability to upload custom stop-word lists for any of the Brainspace-supported languages or to download the current stop-word list for each language in the Language and Stop Words list.

To add custom stop words:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Settings icon:

    Select_Settings_for_Dataset.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane, click the Advanced Configuration icon:

    Select_Advanced_Configuration.png

    The Advanced Configuration dialog will open.

  4. Click the Upload icon associated with the language:

    Advanced_Configuration-Upload.png
  5. Navigate to the *.txt file, and then click the Open button.

  6. Click the Apply button.

    The Advanced Configuration dialog will close.

  7. In the Dataset Settings dialog, click the Save button.

  8. When the Dataset Settings dialog refreshes, click the Build button.

After the build completes, the new stop words will be included in the dataset.

Download Stop-Word Text Files


The *.txt files associated with each language can be downloaded directly from the Brainspace user interface.

To download the stop word *.txt file for a language:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Settings icon:

    Select_Settings_for_Dataset.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane, click the Advanced Configuration icon:

    Select_Advanced_Configuration.png

    The Advanced Configuration dialog will open.

  4. Click the Download icon associated with the language:

    Advanced_Configuration-Download.png

The stop word *.txt file will download to your local machine.

Modify Dataset Settings and Advanced Configuration Options

When you create a dataset, or at any time afterward, you can upload and download dataset-wide filter words, set email threading and boilerplate properties, select optional analytics, and manage languages and stop words.

Note

You must have Group Admin or Super Admin credentials to modify dataset settings.

To modify a dataset’s settings and advanced configuration properties:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Do one of the following:

    • For an existing dataset, locate the dataset, and then click the Settings icon:

      Select_Settings_for_Dataset.png
    • For a new dataset:

      1. Click the Add Dataset button. The New Dataset dialog will open.

      2. In the New Dataset dialog, type a dataset name, and then toggle switches in the Dataset Groups pane to add the new dataset to one or more groups.

      3. Click the Create button.

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane, click the Advanced Configuration icon:

    Select_Advanced_Configuration.png

    The Advanced Configuration dialog will open.

  4. After setting a dataset’s advanced configuration options, click the Apply button.

For information on the different options available in the Advanced Configuration dialog, click the help (?) icon associated with each option.

Download Dataset Reports

Brainspace provides a number of different reports for each dataset (see Dataset Reports). To download Brainspace reports:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Download Reports icon:

    Dataset_Download_Reports.png

    The Report menu will open.

  3. Choose a report in the list, and then click the Download button.

The report will download to your local machine.

Download a Brainspace Datasets Report

To download a dataset management report:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Download button:

    Dataset_Download_button.png

The dataset management report *.csv file will download to your computer.

Create a Dataset with a Relativity Plus Connector

After configuring a Relativity OAuth client and creating a Relativity Plus connector, you are ready to create a dataset.

Note

This topic describes how to create and manage a Relativity Plus connector for Relativity v9.7 and newer versions of Relativity.

Note

After creating a dataset with a Relativity Plus connector, you cannot change the dataset’s connector to a legacy Relativity connector.

To create a dataset with a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Add Dataset button:

    Add_Dataset.png

    The New Dataset dialog will open.

  3. In the New Dataset dialog, type a dataset name, and then toggle switches in the Dataset Groups pane to add the new dataset to one or more groups:

    Enter_Dataset_Name.png
  4. Click the Create button.

    The Dataset Settings dialog will open.

  5. Click the Choose Connector button:

    Choose_Connector.png

    The Choose Connector dialog will open.

  6. In the Choose Connector dialog, click the appropriate Relativity Plus connector in the list.

    Note

    If you are not logged in to Relativity, you will be prompted to enter your Relativity username and password before you can add the connector. If you are already logged in to Relativity, the Select a Source dialog will open.

  7. In the Select a Source dialog, click the appropriate Relativity Workspace folder, and then click the Save and Proceed button.

    The Relativity Saved Search dialog will open.

  8. To create the dataset using all documents in the Relativity Workspace, click the Proceed button.

    Note

    To create the dataset using a subset of documents, click a Saved Search in the list before clicking the Proceed button. After clicking the Proceed button, the License Checks dialog will open and run a check on your license document limits.

  9. After Brainspace verifies the license document limits, click the Proceed button. The Field Mapping dialog will open.

  10. If necessary, unmap any of the default fields and map any additional fields, and then click the Continue button.

    The Dataset Settings dialog will refresh.

    Note

    For information about mapping fields, see Field Mapping Categories and Definitions.

  11. Verify the settings, click the Save button, and then click the Build button. The Dataset Build Options dialog will open.

  12. Choose a build option, and then click the Run This Build Type button. The Schedule Build dialog will open.

  13. Click the Build as soon as possible button.

    Note

    If you choose to build the dataset in the future, click the Schedule Build Time field, select a date and time, and then click the Save button. The Datasets screen will refresh and show the build in progress in the Dataset Queue. While the build is in progress, you can click the View Status button to view the build steps in progress. For information on each step in the build process, see Dataset Build Steps.

After the build completes successfully, the new dataset will move from the Dataset Queue to the list of active datasets in the Datasets screen, and you are ready to create work products with it.

Reference Topics

Dataset Reports

Aliases Report

Provides a list of all the email address aliases within the dataset. (This report is used mainly by Brainspace and is not particularly useful for users. Brainspace recommends using the Person Report for alias listings.)

Archive Report

Detailed report of the most recent import or transfer of data.

Batch Tools Version Report

Contains detailed information about which Batch Tools version was used to create the dataset, including hostname, MAC address, and PID information, as well as the history of each incremental or full build.

Boilerplate Report

Provides a list and occurrence count of all the unique boilerplate text identified during ingestion.

Build Error Log

Provides a detailed log of all the build errors encountered during ingestion.

Build Log

Provides a complete detailed log of all the ingestion steps during the build process.

Clusters Content

Lists all of the document IDs (for example, Control Numbers) for the ingested documents and maps them to a leaf cluster ID.

Clusters File

Contains the following cluster tree information: Cluster ID, Parent Cluster ID, Count of Documents in Cluster, Intra-cluster Metric, Cluster Type, and Folder Name.

Document Counts

Provides summary document count statistics for the dataset including how many documents were fed into Brainspace for ingestion, how many were ingested, how many were skipped, number of originals, exact duplicates, near duplicates, etc.

Extended Full Report

Includes all of the overlay fields and values from the Full Report and additional language detection fields BRS Primary Language and BRS Languages.

Full Report

Includes all of the overlay fields and values that can be overlaid into a third-party system such as Relativity, either manually through the Relativity Desktop Client or automatically by enabling Overlay in the Configuration screen within the Dataset Settings tab.

Import Error Archive

Compressed file that contains one or more of the files that failed to import.

Ingest Error Details

Text report containing more details about the errors in the Ingest Errors report.

Ingest Errors

*.csv report containing errors that occurred during ingestion with the location of the documents that caused the error.

Person Report

Lists all of the “Persons” created automatically or manually (via People Manager), along with the email addresses (aliases) associated with each person.

Process Report

Summary of the most recent dataset analysis.

Schema XML

The field mapping done through the interface is stored in this file and used to ingest all of the mapped metadata and text.

Status Report

Summary of the most recent dataset analysis.

Vocabulary File

List of all the unique terms and phrases identified within the set of data during ingestion.

Common Options for Field Mapping

Use for Exact Duplicate

Ticking this checkbox makes this field part of the definition of an exact duplicate. Two documents are considered exact duplicates only if their analyzed text fields, this field, and all other fields with this option selected are the same. Examples include “Sent Date,” “From,” and “Subject.”
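Conceptually, each document gets a key built from its analyzed text plus every field with this option selected, and documents with equal keys are exact duplicates. A minimal sketch, assuming a SHA-256 hash over the raw field values with no extra normalization (Brainspace's actual comparison is not documented here; the `bodyText` key and `exact_duplicate_key` are illustrative):

```python
import hashlib


def exact_duplicate_key(doc: dict[str, str], analyzed_fields: list[str],
                        exact_dup_fields: list[str]) -> str:
    """Hash the analyzed text fields plus every field flagged 'Use for Exact Duplicate'."""
    h = hashlib.sha256()
    for name in analyzed_fields + exact_dup_fields:
        value = doc.get(name, "").encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(value).to_bytes(8, "big"))
        h.update(value)
    return h.hexdigest()


d1 = {"bodyText": "Please review.", "Sent Date": "2020-01-01", "Subject": "Draft"}
d2 = {"bodyText": "Please review.", "Sent Date": "2020-01-02", "Subject": "Draft"}
# Same text, no extra fields selected: exact duplicates.
print(exact_duplicate_key(d1, ["bodyText"], []) == exact_duplicate_key(d2, ["bodyText"], []))
# Adding "Sent Date" to the definition separates them.
print(exact_duplicate_key(d1, ["bodyText"], ["Sent Date"])
      == exact_duplicate_key(d2, ["bodyText"], ["Sent Date"]))
```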

Faceted

Ticking this checkbox will make this field available for display and search in the faceted field column of the Dashboard. If this field is a Date field, then ticking the checkbox will make it available in the Timeline display of the Dashboard.

Add Exact Text

Ticking this checkbox creates a sibling field with an “-exact” extension to the name. Searches on that field are not stemmed.

For example, consider a field called “Highlights.” Searching for “indices” in that field returns documents containing “indices,” “indicates,” and all other forms of that root.

If “Add Exact Text” is checked, then during a build both a field called “Highlights” and a field called “Highlights-exact” are created. Searching the latter returns only documents that match the exact term.

Multi-value Separator

Used to provide a non-default delimiter to Brainspace to be used to divide a metadata field into separate values. For example, if a field has the value “Burger|Pizza|Tofu” then putting | in the Multi-value Separator will turn this into a field with three values of “Burger” and “Pizza” and “Tofu” rather than just one value of all three together.
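The splitting behavior in the example can be sketched as follows. The function name is hypothetical, Brainspace's default separator is not stated here so the separator is a required argument, and trimming whitespace and dropping empty pieces are assumptions:

```python
def split_multi_value(raw: str, separator: str) -> list[str]:
    """Split one metadata value into several using the configured separator."""
    # Trimming whitespace and dropping empty pieces are assumptions for this sketch.
    return [part.strip() for part in raw.split(separator) if part.strip()]


print(split_multi_value("Burger|Pizza|Tofu", "|"))  # ['Burger', 'Pizza', 'Tofu']
print(split_multi_value("Burger|Pizza|Tofu", ";"))  # ['Burger|Pizza|Tofu']
```

The second call shows the failure mode the setting prevents: with the wrong separator, the three values stay fused into one.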

Field Mapping Categories and Definitions

Attachment

The ID or IDs of an email’s attachments. Typically not used in conjunction with datasets that use Family ID or Parent ID.

BCC

Contents of the BCC field of an email; use with full email addresses or names.

Body Text

The primary text field used for analysis. Example: Extracted Text

CC

Contents of the CC field of an email; use with full email addresses or names.

Conversation Index

Contents of the conversation index field. If the conversation index is valid for a document, it becomes the method used to provide Email Threading for that document. It is also examined to see whether any documents in the email chain are missing from the dataset. If so, their absence is flagged in the EMT_ThreadHasMissingMessage field.
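The documentation does not define what makes a conversation index valid. Assuming the field carries Microsoft's conversation index structure (a 22-byte header block followed by 5-byte child blocks, one per reply or forward), a structural validity check and a missing-ancestor test could be sketched as follows. The function names and byte values are hypothetical, and Brainspace's real rules may differ:

```python
HEADER_LEN = 22  # header block: reserved byte, 5 timestamp bytes, 16-byte GUID
CHILD_LEN = 5    # each reply or forward appends one 5-byte child block


def is_valid_index(index: bytes) -> bool:
    """Structural check only: a header followed by whole child blocks."""
    return len(index) >= HEADER_LEN and (len(index) - HEADER_LEN) % CHILD_LEN == 0


def thread_has_missing_message(indexes: dict[str, bytes]) -> dict[str, bool]:
    """For each document with a valid index, report whether an ancestor is absent."""
    present = {ix for ix in indexes.values() if is_valid_index(ix)}
    flags = {}
    for doc_id, ix in indexes.items():
        if not is_valid_index(ix):
            continue  # an invalid index would fall back to text-based threading
        ancestor, missing = ix, False
        while len(ancestor) > HEADER_LEN and not missing:
            ancestor = ancestor[:-CHILD_LEN]  # drop the last child block to get the parent
            missing = ancestor not in present
        flags[doc_id] = missing
    return flags


root = bytes([1]) + bytes(21)            # hypothetical 22-byte header
reply = root + bytes([0, 0, 0, 0, 1])    # first reply
reply2 = reply + bytes([0, 0, 0, 0, 2])  # reply to the reply
print(thread_has_missing_message({"A": root, "C": reply2}))  # {'A': False, 'C': True}
```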

Custodian

Contents of the Custodian field. It is surfaced in Advanced Search as a unique field.

Date

Contents of any other date field relevant to the document. In this category, faceted means that the data is broken down in a manner that the system can use the date field in the timeline view of dashboard (see Supported Date Formats).

Date Sent

Contents of the Date Sent field of an email. Used by Email Threading.

Enumeration

When a field has the Enumeration category, the whole field value is put into the index as a single token. You get results only when you search for the whole value in quotes. The GUI presents a drop-down list for selection when you search an enumeration field.

Exact

Used to provide a metadata field when you do not want stemming involved in a search.

Family ID

A unique ID that represents the entire family of documents. This ID should be the ID of the family’s parent document (see Parent ID). If it is not the parent document ID, Brainspace analytics also requires the Parent ID field to be configured for all documents in the family to properly determine the relationships between parent documents and their attachments. Family ID is not required, but it should always be specified if available, because it is used to populate the family ID field used for indexing and the EMT_FamilyId field in the Full Report.

File Size

Used to provide special handling and search for documents based upon their size in advanced search.

File Type

Used to provide special handling and search for documents based upon their type in advanced search.

From

Contents of the From field of an email; use with full email addresses or names.

ID

The unique document identifier within the document population. (Examples include “Control Number,” “DocNo,” “DocID,” and “BegBates.”)

NATIVE_PATH

Points to the native file on disk.

Numeric Bytes

Used to provide special handling and search for documents based upon their size in advanced search.

Numeric Float

Used to provide special handling and search for documents based upon your custom numeric metadata in advanced search.

Parent ID

The ID of a document’s parent. For example, a Word document attachment would have the ID of the email it was sent with. To identify attachments, the Parent ID field, Attachments field, or Family field must be used. Only one of these is required, but it is best to specify two of them: either Parent and Family, or Attachments and Family. All three can be specified, but this is not recommended because Parent and Attachments can conflict. If Parent is available, use it instead of Attachments. If only Family is available, it will work to identify attachments, but only if the Family ID values correspond to the Key of the parent document. After all processing, if the provided Family field was blank, Brainspace analytics populates the metadata field family_id with the key of the parent document.
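The last sentence says that a blank Family field is populated with the key of the parent document. Assuming nested attachments roll up to the top-level parent (an assumption; nesting is not covered here), that can be sketched as walking the Parent ID chain. The function name and dictionary input are hypothetical:

```python
from typing import Dict, Optional


def resolve_family_ids(parent_of: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each document key to the key of its top-level parent document."""
    family = {}
    for key in parent_of:
        current, seen = key, {key}
        # Follow Parent ID links upward until a document has no (known) parent.
        while parent_of.get(current):
            current = parent_of[current]
            if current in seen:
                raise ValueError(f"cycle in Parent ID values starting at {key!r}")
            seen.add(current)
        family[key] = current
    return family


parents = {"EMAIL-1": None, "ATT-1": "EMAIL-1", "ATT-1a": "ATT-1", "EMAIL-2": ""}
print(resolve_family_ids(parents))
```

Both `None` and an empty string are treated as "no parent", since blank values are common in load files.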

Reference

Deprecated; do not use.

String

When a field has a category of string, each word in the value is a separate token. One can search for individual words, phrases, or the whole value (if you know what it is).
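The difference between the Enumeration and String categories comes down to how a value is tokenized. A sketch, assuming simple word tokenization and ignoring case folding and stemming (`index_tokens` is a hypothetical name), shows why a search for "Attorney" can match a String field but not an Enumeration field holding the same value:

```python
import re


def index_tokens(value: str, category: str) -> list[str]:
    """Tokens a field value would contribute to the index, by field category."""
    if category == "enumeration":
        return [value]  # the whole value is a single token
    if category == "string":
        return re.findall(r"\w+", value)  # each word is a separate token
    raise ValueError(f"unknown category: {category}")


value = "Privileged - Attorney Client"
print(index_tokens(value, "enumeration"))  # ['Privileged - Attorney Client']
print(index_tokens(value, "string"))       # ['Privileged', 'Attorney', 'Client']
```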

Subject

The subject line of an email or the title of a document. Used for Email Threading.

Text

An additional Text field, typically metadata such as comments, that can be part of your search but that you don’t want analyzed. Example: “Lawyer Notes”

Text Path

Used when the DAT file does not contain the body_text of the document being imported. This field will have the path, as known by the tool that exported the data. Options include the ability to trim the beginning of the field value, and to point to an absolute disk address.

To

Contents of the To field of an email; use with full email addresses or names.

Unfiltered Text

Retains filter words as defined in the Filter Words text files and in boilerplate content.

Total Documents

The total number of documents in a dataset.

New Documents

The total number of new documents added to a dataset. This number is cumulative and is reset to zero after a new build.

Pinned Dataset

A dataset card that has been moved from the unpinned Datasets pane to the Pinned Datasets pane.

Unpinned Dataset

A dataset card that is located in the default Datasets pane.

Activity Status

The status of the dataset:

  • Active: Indicates that the dataset is available for use in Brainspace.

  • Inactive: Indicates that the dataset remains in Brainspace but is not available for use (see Disable a Dataset).

Connectors

Relativity and Relativity Plus

Concept Topics

Relativity Overlay

When using a Relativity connector for a dataset, you can overlay a group of analytics fields from Brainspace into Relativity after creating a new dataset or after rebuilding an existing dataset. These fields can be used to organize and to accelerate linear document review in Relativity.

You can choose to run overlay to Relativity automatically every time you build a dataset, or you can choose to run overlay to Relativity manually as needed.

Multiple Relativity Overlays

When overlaying multiple datasets or classifiers to a single Relativity Workspace, Brainspace will display duplicate fields appended with additional characters to identify that a particular field in Relativity has more than one corresponding field in Brainspace. This also applies to multiple Brainspace datasets that use the Relativity Plus connector.

Relativity Plus Connector

Brainspace’s Relativity Plus connector is compatible with Relativity v9.7 and newer versions of Relativity. Relativity v9.7 and v10 work with the legacy Relativity connector and the Relativity Plus connector in Brainspace.

Note

The Relativity Plus connector only works with Relativity v9.7 and v10.x (including RelativityOne). Brainspace strongly recommends that customers upgrade to Brainspace v6.2 or newer to use the most recent API.

Brainspace-Relativity Document Links

By default, documents are linked between Relativity and Brainspace. Clicking the document link in Brainspace opens the source document in Relativity if network access (HTTP or HTTPS) is permitted and the user is logged in to Relativity. Document links can be disabled using the Advanced Settings feature in Relativity (see third-party Relativity documentation for more information).

Multiple Relativity Web Servers

Relativity 9.7.229.5 does not support database-backed authorization codes with load-balanced web servers. Using multiple web servers will result in the Relativity Plus connector failing to authenticate. This can be resolved by configuring the Relativity Plus connector to explicitly communicate with a single Relativity web server.

Overlay Process

The Relativity Plus connector overlays Analytics field data in batches after a build, whereas the legacy Relativity connector overlays data as a single action. The Relativity Plus connector no longer causes the Relativity Workspace to hold a full-table lock on the documents table while overlay is occurring. If an overlay fails, the Analytics fields will be only partially populated.
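To illustrate why a failed batched overlay can leave fields partially populated, here is a minimal sketch of batch-by-batch writing. `write_batch`, `OverlayError`, and the batch size are hypothetical stand-ins, not Brainspace or Relativity APIs, and the sketch assumes batches that were already written are not rolled back.

```python
from typing import Callable, Dict, List


class OverlayError(Exception):
    """Hypothetical error raised when one batch cannot be written."""


def overlay_in_batches(
    doc_values: Dict[str, Dict[str, str]],
    write_batch: Callable[[Dict[str, Dict[str, str]]], None],
    batch_size: int = 500,
) -> List[str]:
    """Write Analytics field values batch by batch and return the IDs written.

    In this sketch, earlier batches stay written when a later batch fails,
    so the overlay ends up only partially complete.
    """
    items = list(doc_values.items())
    written: List[str] = []
    for start in range(0, len(items), batch_size):
        batch = dict(items[start:start + batch_size])
        try:
            write_batch(batch)
        except OverlayError:
            # Stop at the first failed batch; callers can compare `written`
            # with the full document list to see what still needs overlay.
            break
        written.extend(batch)  # extends with the document IDs (dict keys)
    return written


if __name__ == "__main__":
    attempts = []

    def flaky_writer(batch):
        # Simulate a failure on the third batch.
        if len(attempts) == 2:
            raise OverlayError("simulated failure")
        attempts.append(batch)

    docs = {f"DOC{i}": {"BD Summary": "..."} for i in range(5)}
    print(overlay_in_batches(docs, flaky_writer, batch_size=2))
    # prints ['DOC0', 'DOC1', 'DOC2', 'DOC3']
```

A batched design trades atomicity for shorter locks: each batch holds resources briefly, but a failure midway means some documents have values and others do not.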

Pause and Resume

The Relativity Plus connector does not support pause and resume. Because the connector ingests documents concurrently, Brainspace cannot guarantee that no document would be missed when ingest resumes. The Pause button works in the UI, but when ingest is resumed, it starts over from the beginning of the entire saved search or Workspace.

Predictive Coding

The Brainspace Addons Relativity (*.rap) application is still required for predictive coding (PC). It creates the choice fields (BDPC Is Responsive) that Brainspace cannot create through the API, as well as the views and saved searches that are useful for the PC workflow.

Note

CMML with ACS provides a control set solution, so PC is no longer required. The CMML solution does not require the *.rap file.

Ingest and Overlay Performance

Ingest performance with the Relativity Plus connector should be significantly faster than with the legacy Relativity connector. However, Relativity Plus performance depends heavily on the values chosen for the connector configuration, the number of CPUs on the Brainspace servers, and the network bandwidth between the Brainspace host and the Relativity host.

Relativity Server Maintenance

Testing and interaction with the kCura team indicate that Relativity creates temporary server-side resources for each export initiated during the dataset ingestion process. The Relativity services run a weekly cron job to clean up these temporary resources. Because the resources can consume large amounts of disk space, monitor disk space closely in environments where many or large ingestion processes run. If more frequent clean-up jobs are required, contact the kCura team for assistance.
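One simple way to watch for this disk-space growth is a scheduled free-space check on the Relativity host. This sketch uses the standard-library `shutil.disk_usage`; the 15 percent threshold and the monitored path are illustrative assumptions, not vendor guidance.

```python
import shutil


def low_disk_space(path: str, min_free_fraction: float = 0.15) -> bool:
    """Return True when free space on the volume holding `path` is below the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total < min_free_fraction


if __name__ == "__main__":
    # Hypothetical path: point this at the volume that holds the export resources.
    monitored = "."
    if low_disk_space(monitored):
        print(f"Warning: free space on {monitored!r} is below 15%")
```

Run from a scheduler more often than the weekly clean-up, a check like this gives early warning before large ingests fill the disk.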

Task Topics

Run Overlay Automatically after a Dataset Build

The overlay to Relativity feature can be activated to run automatically each time you build an existing dataset with a Relativity connector.

Note

To run overlay automatically when creating a new dataset with a Relativity connector, see Create a Dataset with a Relativity Connector.

To run overlay automatically after a dataset build:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset with the Relativity connector, and then click the Settings icon:

    Select_Settings_for_Dataset_with_Connector.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane of the Dataset Settings dialog, toggle the Overlay switch to the On position:

    Select_Connector_Overlay.png

    The Overlay switch will become green.

  4. Click the Save button. Do one of the following:

    • To close the Dataset Settings dialog without running overlay to Relativity now, click the Close icon.

    • To overlay to Relativity now, click the Build button.

      Select_Relativity_Build.png

If you close the Dataset Settings dialog without overlaying to Relativity, overlay to Relativity will run automatically every time you build the dataset in the future. If you run overlay to Relativity immediately without enabling the automatic overlay feature, overlay to Relativity will run only when you initiate it manually.

Run Overlay Manually on an Existing Dataset

After creating a dataset with a Relativity connector, you can run overlay to Relativity at any time, whether or not the automatic overlay feature has been enabled.

To run overlay manually on an existing dataset:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset with the Relativity connector, and then click the Settings icon:

    Select_Settings_for_Dataset_with_Connector.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane of the Dataset Settings dialog, click Run Now:

    Run_Overlay_Now.png

    Note

    If the Run Now option is not visible or is greyed out, confirm that the dataset uses a Relativity connector and that fields have been selected for overlay.

  4. Click the Save button.

  5. Click the Close (X) icon.

    Close_Dataset_Configuration.png

After running the overlay to Relativity, you can set up automatic overlays or run the overlay manually at any time.

Enable Multiple Relativity Overlays on an Existing Relativity Plus Connector

After or while creating a Relativity Plus connector, you can enable the multiple Relativity overlay feature to overlay Relativity field sets in multiple Brainspace datasets to a single Relativity Workspace.

Note

This feature is only available for the Relativity Plus connector.

Note

When a dataset build completes with this feature enabled on the Relativity Plus connector, Brainspace creates a unique field in Relativity to map each BD field to its dataset in Brainspace. For more information on Brainspace fields, see Relativity Overlay Fields.

To enable multiple Relativity overlay field sets:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Connectors button.

  3. Locate the Relativity Plus connector, and then click the Update Connector icon.

    The Relativity Plus connector configuration dialog will open.

  4. In the Overlay pane, toggle the Enable Multiple Overlay Field Sets switch to the On position:

    Overlay_Pane_ON.png

  5. Click the Test Connector button.

  6. After verifying that the connector configuration is valid, click the Update Connector button.

    The connector configuration dialog will close automatically.

Every dataset in Brainspace that employs this connector will produce unique fields in the Relativity Workspace.

Configure a Relativity OAuth2 Client for a Relativity Plus Connector

Configuring a Relativity OAuth2 client is the first step in creating a Brainspace dataset with a Brainspace Relativity Plus connector for Relativity v9.7 and newer versions of Relativity.

Note

Relativity 9.7.229.5 does not support database-backed authorization codes with load-balanced web servers. Using multiple web servers will result in the Relativity Plus connector failing to authenticate. This can be resolved by configuring the Relativity Plus connector to communicate with a single Relativity web server.

To configure a Relativity OAuth2 client:

  1. Open a Relativity instance in a web browser, type your username, and then click the Continue button:

    Relativity9_Username_prompt.png

    The Relativity password dialog will open.

  2. Type your password, and then click the Login button:

    Relativity9_Password_prompt.png

    The Relativity Workspaces window will open.

  3. Click the Authentication menu dropdown arrow, and then click the OAuth2 Client option:

    Relativity_authentication_option_select.png

  4. Click the New OAuth2 Client button:

    Relativity_choose_New_OAuth2.png

    The OAuth2 Client Information dialog will open.

  5. Type a name for the client.

  6. Set OAuth2 Flow to Code.

  7. Type the redirect URL (fully-qualified domain name) with /oauth as the URL endpoint:

    Relativity_OAuth2_Client_information.png

  8. In the Access Token Lifetime field, type a session timeout value:

    Relativity_access_token_timeout_setting.png

    Note

    Relativity does not issue refresh tokens. If your OAuth2 session exceeds the session timeout value, you must clear your credentials and create a new OAuth2 token. The OAuth2 session timeout can be set to a low value to be more secure, or it can be set to a maximum of one year.

  9. Click the Save button.

    The OAuth2 Client Information screen will refresh.

    Relativity_Save_OAuth2_config.png

  10. Make note of the Client ID and Client Secret codes.

    You will need this information when creating the Relativity Plus connector in Brainspace.
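Because Relativity does not issue refresh tokens, a client can only check whether its access token is still within the configured lifetime. The following is a minimal sketch, assuming the token's issue time is known and approximating the one-year maximum as 365 days; `check_lifetime` and `token_expired` are hypothetical helpers, not part of any Relativity SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption for this sketch: "one year" approximated as 365 days.
MAX_TOKEN_LIFETIME = timedelta(days=365)


def check_lifetime(lifetime: timedelta) -> timedelta:
    """Hypothetical helper: reject non-positive lifetimes or ones above the maximum."""
    if lifetime <= timedelta(0) or lifetime > MAX_TOKEN_LIFETIME:
        raise ValueError("access token lifetime must be positive and at most one year")
    return lifetime


def token_expired(issued_at: datetime, lifetime: timedelta,
                  now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: True once the token has outlived its lifetime.

    With no refresh token to fall back on, an expired token means clearing
    the stored credentials and authorizing again.
    """
    now = now or datetime.now(timezone.utc)
    return now >= issued_at + check_lifetime(lifetime)
```

A short lifetime limits exposure if a token leaks, at the cost of re-authorizing more often.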

After configuring a Relativity OAuth2 client, you are ready to create a Relativity Plus connector in Brainspace (see Create a Relativity Plus Connector).

Create a Relativity Plus Connector

After configuring a Relativity OAuth2 client, you are ready to create a Relativity Plus connector in Brainspace. You will need the Client ID and Client Secret that you created for the Relativity OAuth2 client.

To create a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Click the Connectors button:

    Brainspace_Connectors_button.png

    The Connectors screen will open.

  3. Click the Add Connector button:

    Add_Connector_button.png

    The Connector menu will open.

  4. Click the Relativity Plus option in the menu.

    The Relativity Plus connector dialog will open.

  5. In the Connector Name field, type a name for the connector.

  6. In the Relativity Base Document URL field, type the Relativity base URL that points to the Relativity domain user-interface (fully-qualified domain name) with /Relativity as the URL endpoint.

  7. In the API field, toggle the switch to the On position to allow self-signed certificates.

  8. In the Brainspace Analytics Fields To Overlay After A Full Build field, click any option in the list, and then press Ctrl+A to select all options.

  9. In the Relativity API Host Machine Name (FQDN) field, type the fully-qualified domain name for the Relativity REST / ObjectManager API.

  10. In the HTTPS field, toggle the switch to the On position to enable HTTPS.

  11. In the Client ID and Client Secret fields, paste or type the codes that were generated by Relativity when you created the OAuth2 client (see Configure a Relativity OAuth2 Client for a Relativity Plus Connector).

  12. In the Concurrency field, type a value (minimum of 2) for the number of threads to use for ingest and mass operations.

  13. Click the Test Connector button.

  14. After the connector test is successful, click the Create Connector button.

    The new Relativity Plus connector will be added to the Connectors screen.
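Several connector fields in the steps above have simple constraints: a base URL ending in /Relativity, a fully qualified API host name, both OAuth2 codes, and a Concurrency of at least 2. A hypothetical pre-flight check like the one below can catch typos before you click Test Connector. The class and field names are illustrative and are not a Brainspace API, and the FQDN test is only a rough heuristic.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class RelativityPlusSettings:
    """Hypothetical mirror of a few Relativity Plus connector dialog fields."""
    connector_name: str
    base_document_url: str  # Relativity Base Document URL
    api_host_fqdn: str      # Relativity API Host Machine Name (FQDN)
    client_id: str
    client_secret: str
    concurrency: int = 2

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the checks passed."""
        issues: List[str] = []
        if not self.connector_name.strip():
            issues.append("Connector Name is required")
        url = urlparse(self.base_document_url)
        if not url.scheme or not url.netloc:
            issues.append("Base Document URL must include a scheme and host")
        elif not url.path.rstrip("/").endswith("/Relativity"):
            issues.append("Base Document URL should end with /Relativity")
        # Rough heuristic: a fully qualified domain name contains at least one dot.
        if "." not in self.api_host_fqdn.strip("."):
            issues.append("API host should be a fully qualified domain name")
        if not self.client_id or not self.client_secret:
            issues.append("Client ID and Client Secret are required")
        if self.concurrency < 2:
            issues.append("Concurrency must be at least 2")
        return issues
```

For example, `RelativityPlusSettings("Rel Plus", "https://rel.example.com/Relativity", "relapi.example.com", "id", "secret", 4).problems()` returns an empty list.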

After creating a Relativity Plus connector, you are ready to create a dataset. You can also manage the connector settings or permanently delete the connector from Brainspace at any time. To configure the advanced settings for a Relativity Plus connector, click the Advanced link in the Relativity Plus connector dialog.

Update a Relativity Plus Connector

After creating a Relativity Plus connector, you can change the connector’s settings at any time.

To update a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Click the Connectors button:

    Brainspace_Connectors_button.png

    The Connectors screen will open.

  3. Locate the Relativity Plus connector, and then click the Update Connector icon:

    Update_Connector.png

    The connector configuration dialog will open.

  4. Modify any of the connector options, and then click the Test Connector button.

  5. After the connector test is successful, click the Update Connector button. The updated Relativity Plus connector will refresh on the Connectors screen.

After updating a Relativity Plus connector, you are ready to create a dataset or continue using it for Brainspace datasets. You can also manage the connector settings or permanently delete the connector from Brainspace at any time. To configure the advanced settings for a Relativity Plus connector, click the Advanced link in the Relativity Plus connector dialog.

Delete a Relativity Plus Connector

After creating a Relativity Plus connector, you can permanently delete it from Brainspace at any time. To delete a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Click the Connectors button:

    Brainspace_Connectors_button.png

    The Connectors screen will open.

  3. Locate the Relativity Plus connector, and then click the Delete Connector icon, as shown in the following image:

    Delete_connector.png

    A confirmation dialog will open.

  4. Click the Delete button.

The confirmation dialog will close, and the Relativity Plus connector will be permanently deleted from Brainspace.

Reference Topics

Brainspace Supported Connectors

Beginning with Brainspace v6.3, all new features will be developed for the new Relativity Plus connector. The classic Relativity connector has been deprecated, and Relativity will discontinue direct SQL access with their version 11.x release.

Brainspace supports the following connectors:

Discovery v5.5

Brainspace v6.0

Brainspace v6.1

Brainspace v6.2

Brainspace v6.3

Relativity 10.3*

X

X

X

x

Relativity 10.1*

X

X

X

x

Relativity 9.7*

X

X

X

x

Relativity 9.6

X

X

X

x

Nuix 8.0

x

x

Nuix 7.8

X

X

x

Nuix 7.4

X

X

x

Nuix 7.2

X

X

X

X

x

Nuix 7.0

X

X

X

X

x

Nuix 6.2

X

X

X

X

x

Legend: X – Full Support

*Relativity v9.7 and v10.x work with both the legacy Relativity connector and the Relativity Plus connector. The Relativity Plus connector only works with Relativity v9.7 and v10.x (including RelativityOne). Brainspace strongly recommends that customers upgrade to Brainspace v6.3 or newer to use the most recent APIs.

Relativity Overlay Fields

When configuring a Relativity or Relativity Plus connector, you choose which fields to overlay (see Create a Relativity Connector and Create a Relativity Plus Connector).

brs_strict_dup_set_id

If the document is in an SDG, this is the SDG ID of the SDG. Otherwise it is NULL.

brs_strict_dup_pivot is

If the document is in an SDG, this is the document ID of the pivot member of the SDG. Otherwise it is NULL.

BD EMT Duplicate ID

The document identifier of the duplicate email message or attachment. Unique group identifier used to group all documents within each of the exact text duplicate sets.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing which control number or document IDs an email message or attachment is a duplicate of within an email thread.

BD EMT EmailAction

Identifies the specific action for each message within an email thread (send, forward, or reply).

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing whether an email was the send (the original message in an email thread), a forward, or a reply within an email thread.

BD EMT FamilyID

The control number or document identifier of the parent email message within a document family within an email thread.

This field can be removed from overlay. If this field is removed from the overlay, Relativity users will not know the Relativity control number or document identifier of the parent email message when reviewing a document family (message and attachments) within an email thread.

BD EMT

Intelligent sort field that allows you to sort email threads hierarchically in descending order so that the most inclusive messages for each branch within an email thread are sorted to the top along with any attachments to those inclusive messages.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from sorting the Brainspace email threads hierarchically in Relativity.

BD EMT IsDuplicate

Identifies whether an email message is a duplicate within the email thread.

This field can be removed from overlay. This field is “Yes” if the email message or attachment is a duplicate of another message or attachment within the email thread. Removing this field from the overlay will prevent Relativity users from knowing which email messages or attachments are duplicates within an email thread.

BD EMT IsMessage

Identifies which documents within an email thread are actual email messages. Documents are considered email messages if they have a populated From field and are not identified as attachments.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing which documents within an email thread are actual email messages.

BD EMT IsUnique

Identifies which messages within the email threads are the inclusive message.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing which email messages are inclusive within an email thread. Relativity users need to review only the inclusive messages within an email thread, because the inclusive messages contain the content of all the non-inclusive messages in the thread.

BD EMT MessageCt

The total number of messages within an email thread.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing how many email messages are within an email thread.

BD EMT ThreadID

Unique identifier assigned to a group of messages within a single email thread.

This field can be removed from overlay. The same BD EMT ThreadID is assigned to all emails and attachments that belong to the same email thread. If this field is not overlaid into Relativity, users will not be able to take advantage of Brainspace’s email threading for batching and review in Relativity. Several custom Relativity Views created by Brainspace require this field to be populated in order to function properly.

BD EMT ThreadIndent

This field is used for displaying the Email Thread view in Relativity where messages are properly indented in the view based on the order in which the messages were created within the email thread. For example, a reply to a message will have one greater thread indent than the message it replies to.

This field can be removed from overlay. Relativity users will not be able to use the custom Brainspace Email Thread Views if this field is not included in the overlay.
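The ThreadIndent rule (a reply sits one level deeper than the message it replies to) can be sketched as a walk up each message's parent chain. This example assumes each message's parent is already known and that the original message starts at indent 0; Brainspace's actual starting value and how it derives parent links may differ, and `thread_indents` is a hypothetical helper.

```python
from typing import Dict, Optional


def thread_indents(parents: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Hypothetical helper: compute an indent level per message.

    `parents` maps each message ID to the ID of the message it replies to
    or forwards, or None for the original message (assumed indent 0).
    """
    indents: Dict[str, int] = {}

    def indent_of(msg_id: str) -> int:
        if msg_id not in indents:
            parent = parents[msg_id]
            # A reply is one level deeper than its parent.
            indents[msg_id] = 0 if parent is None else indent_of(parent) + 1
        return indents[msg_id]

    for msg_id in parents:
        indent_of(msg_id)
    return indents
```

For a thread where B replies to A, C replies to B, and D replies to A, the result is A at 0, B and D at 1, and C at 2.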

BD EMT ThreadPath Full

Contains a semicolon delimited list of the document IDs (control numbers) for all the messages that are included within each inclusive message.

This field can be removed from overlay. Relativity users will not know which non-inclusive email messages are contained in each inclusive message in the email thread if this field is not included in the overlay. This will make inclusive-only reviews in Relativity difficult to manage.
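Because ThreadPath Full is a semicolon-delimited list, a reviewer-side script can look up which inclusive messages cover a given document. This is a minimal sketch; the control numbers are made-up sample values and the function names are hypothetical.

```python
from typing import Dict, List


def parse_thread_path(value: str) -> List[str]:
    """Split a semicolon-delimited ThreadPath Full value into control numbers."""
    return [part.strip() for part in value.split(";") if part.strip()]


def inclusives_containing(thread_paths: Dict[str, str], doc_id: str) -> List[str]:
    """Return the inclusive messages whose ThreadPath Full lists `doc_id`."""
    return [inclusive for inclusive, path in thread_paths.items()
            if doc_id in parse_thread_path(path)]


# Made-up sample: inclusive message ID -> its ThreadPath Full value.
paths = {
    "DOC0003": "DOC0001;DOC0002;DOC0003",
    "DOC0005": "DOC0001;DOC0004;DOC0005",
}
print(inclusives_containing(paths, "DOC0002"))  # prints ['DOC0003']
```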

BD EMT ThreadSort

Field that sorts email threads by ThreadIndent first and then by chronology (the order in which the messages were generated within each email thread).

This field can be removed from overlay. Relativity users will not be able to sort the Brainspace email threads in Relativity chronologically if this field is not included in the overlay.
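The ThreadSort ordering (ThreadIndent first, then chronology) amounts to a composite sort key. This sketch additionally assumes that messages are grouped by thread ID so each thread stays together; the `Message` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Message:
    """Hypothetical minimal view of one email in a thread."""
    doc_id: str
    thread_id: str
    thread_indent: int
    sent: datetime


def thread_sort(messages: List[Message]) -> List[Message]:
    """Group by thread, then order by ThreadIndent, then chronologically."""
    return sorted(messages, key=lambda m: (m.thread_id, m.thread_indent, m.sent))
```

Two replies at the same indent level are ordered by send time, so the earlier reply appears first.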

BD EMT UniqueReason

Indicates why the message is inclusive. “Attach” means the message had an attachment that is not present in the previous messages or is different from the attachment in the previous messages within an email thread. “Message” means the content of the message is not included in another email in the same email thread. “Message” and “Attach” both contain unique information.

This field can be removed from overlay. Relativity users will not know why a message has been marked IsUnique if this field is not included in the overlay.

BD EMT ThreadHasMissingMessage

Indicates that parsing the ConversationIndex has revealed that a document in the thread has not been included in the Brainspace dataset.

This field can be removed from overlay. Users will not be able to see that the document was missing from the thread if this field is not included in the overlay.

BD EMT WasUnique

Indicates that this document was considered to contain unique content. However, a new document introduced in a subsequent build has all of this document’s content and more. This status will be preserved across all subsequent builds.

This field can be removed from overlay. Users will not be able to see that this document was previously considered to have unique content if this field is not included in the overlay.

BD EMT WasUniqueReason

Indicates why the message was unique. “Attach” means the message had an attachment that was not present in the previous message or was different from the attachment in the previous message in an email thread. “Message” means the content of the message was not included in another email within the same email thread. “Message” and “Attach” both contain unique information.

This field can be removed from overlay. Relativity users will not know why a message has been marked WasUnique if this field is not included in the overlay.

BD EMT Intelligent Sort

Alternative sorting algorithm that presents the most complete document in an email thread first.

This field can be removed from overlay. Users will not be able to see the most complete version of the email thread if this field is not included in the overlay.

BD EMT AttachmentCount

The number of attachments included with this email.

This field can be removed from overlay. Users will not be able to see how many attachments are included with this email if this field is not included in the overlay.

BD StrictDupStatus

This field identifies the status of a document with regard to its strict exact-duplicate state. With the option to include metadata in CMML classifiers, it becomes necessary to consider that two documents may be textual exact duplicates but have differences in metadata; therefore, this field represents the strict exact-duplicate state (see note below).

It will be populated with one of three values:

  • unique: This document is not considered a strict exact-duplicate of any other document.

  • duplicate: This document is considered a strict exact-duplicate of another document.

  • pivot: This document is the original document of which other documents are listed as duplicates.

This field can be removed from overlay. Relativity users will not know whether this document is considered a strict exact duplicate if this field is not included in the overlay.

Note

Two documents are considered to be strict exact-duplicates if the analyzed text fields are identical (except for normalized whitespace), all fields flagged as usedForExactDup in the schema.xml are identical, and all fields flagged as “analyzed = true” in the schema.xml are identical.

Brainspace supplies a default schema that makes certain choices for which fields are marked as usedForExactDup and/or analyzed. The user can override those choices.
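The strict exact-duplicate test in the note above can be expressed as a field-by-field comparison. In this sketch, the field names passed in (for example `body_text`, `md5`, `subject`) are hypothetical, whitespace normalization is assumed to mean collapsing runs of whitespace, and the analyzed = true check is applied to metadata fields only. It illustrates the rule; it is not Brainspace's implementation.

```python
from typing import Mapping, Sequence


def normalize_ws(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return " ".join(text.split())


def is_exact_dup(a: Mapping[str, str], b: Mapping[str, str],
                 text_fields: Sequence[str], exact_dup_fields: Sequence[str]) -> bool:
    """Analyzed text equal up to whitespace, and every usedForExactDup field equal."""
    same_text = all(normalize_ws(a.get(f, "")) == normalize_ws(b.get(f, ""))
                    for f in text_fields)
    same_flagged = all(a.get(f) == b.get(f) for f in exact_dup_fields)
    return same_text and same_flagged


def is_strict_exact_dup(a: Mapping[str, str], b: Mapping[str, str],
                        text_fields: Sequence[str], exact_dup_fields: Sequence[str],
                        analyzed_metadata_fields: Sequence[str]) -> bool:
    """Exact duplicate, and additionally every analyzed = true metadata field equal."""
    return (is_exact_dup(a, b, text_fields, exact_dup_fields)
            and all(a.get(f) == b.get(f) for f in analyzed_metadata_fields))
```

Two documents with the same body text but different analyzed subjects pass the exact-duplicate test and fail the strict test, which is why the strict status exists alongside the exact one.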

BD ExactDupSetID

Unique identifier for each Exact Duplicate group. Documents that are exact duplicates of one another are grouped together using this ID.

This field can be removed from overlay. This group identifier is used in Relativity to understand which documents are part of the same exact text duplicate grouping. Documents that are exact text duplicates of one another all get the same BD ExactDupSetID.

BD ExactDupStatus

This field identifies the status of a document with regard to its exact-duplicate state.

It will be populated with one of three values:

  • unique - This document is not considered an exact duplicate of any other document.

  • duplicate - This document is considered an exact duplicate of another document.

  • pivot - This document is the original document of which other documents are listed as duplicates.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing whether this document is considered an exact duplicate.

Note

Two documents are considered exact duplicates if the analyzed text fields are identical (except for normalized whitespace) and all fields flagged as usedForExactDup in the schema.xml are identical.
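Once documents are grouped, the three status values can be derived per group. This sketch assumes each document already has a duplicate key that matches across exact duplicates, and it treats the first document of each group (in input order) as the pivot; the actual pivot-selection rule is not described here, and `assign_dup_status` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def assign_dup_status(dup_keys: Dict[str, Hashable]) -> Dict[str, str]:
    """Hypothetical helper: label documents unique, pivot, or duplicate.

    `dup_keys` maps document ID to a key that is equal for exact duplicates.
    Assumption: the first document seen in each group is the pivot.
    """
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for doc_id, key in dup_keys.items():
        groups[key].append(doc_id)

    status: Dict[str, str] = {}
    for members in groups.values():
        if len(members) == 1:
            status[members[0]] = "unique"
        else:
            status[members[0]] = "pivot"
            for doc_id in members[1:]:
                status[doc_id] = "duplicate"
    return status
```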

BD IsExactPivot

Identifies the original document against which all exact duplicates were compared.

This field can be removed from overlay. Relativity users will not know which document is considered to be the original against which all documents are compared to identify exact text duplicates if this field is removed from the overlay.

BD IsNearDupPivot

Identifies the original document against which all near duplicates were compared.

This field can be removed from overlay. Relativity users will not know which document is considered to be the original against which all documents are compared to identify near duplicates if this field is removed from the overlay.

BD NearDupSimilarityScore

Contains the near duplicate similarity score for near-duplicate documents.

The score is a number between the near duplicate threshold (0.8 by default) and 1.0. It is calculated based on all fields in the schema marked as `analyzed=true`. The analyzed true/false setting is not controlled through the UI and should not be altered without consulting Reveal/Brainspace support.

This field can be removed from overlay to your third-party review platform. If it is removed, users will not know how similar a near-duplicate document is to its original document.

BD Languages

A semicolon-delimited list of the languages potentially within a document.

This field can be removed from overlay. Relativity users will not know what mix of languages are contained within a document if this field is removed from the overlay.

BD NearDupSetID

Unique identifier for each near-duplicate group. Documents that are near duplicates of one another are grouped together using this ID.

This field can be removed from overlay. Relativity users will not know which documents belong to the same near duplicate set if this field is removed from the overlay. Users will also not be able to propagate coding decisions to near-duplicate documents in Relativity.

BD NearDupStatus

This field identifies the status of a document with regard to its near-duplicate state.

It will be populated with one of three values:

  • unique - This document is not considered a near duplicate of any other document.

  • duplicate - This document is considered a near duplicate of another document.

  • pivot - This document is the original document of which other documents are listed as duplicates.

This field can be removed from overlay. Relativity users will not know whether this document is considered to be a near duplicate if this field is removed from overlay.

Note

By default, two documents are considered to be near duplicates if they share 80 percent of their text shingles in common.
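The 80 percent shingle rule can be illustrated with word shingles and set overlap. This sketch assumes three-word shingles and Jaccard similarity (shared shingles divided by all distinct shingles); Brainspace's actual shingle size and similarity formula may differ, and the function names are hypothetical.

```python
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Word k-grams of the lower-cased, whitespace-tokenized text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def shingle_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of two documents' shingle sets (an assumed measure)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_dup(a: str, b: str, threshold: float = 0.8) -> bool:
    """Near duplicates share at least `threshold` of their shingles (default 0.8)."""
    return shingle_similarity(a, b) >= threshold
```

Changing only the last word of "the quick brown fox jumps over the lazy dog" to "cat" leaves 6 of 8 distinct shingles shared, a similarity of 0.75, which falls just below the default threshold.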

BD Primary Language

The primary (or dominant) language identified within a document.

This field can be removed from overlay. Relativity users will not know the primary language identified within a document if this field is removed from the overlay.

BD RelatedSetID

Identifies the first parent cluster that is normal (not an exact duplicate or near duplicate). Directly correlates to ClusterID in Brainspace. This field identifies which documents are highly similar in terms of content but not similar enough to be considered near duplicates. Documents that are highly similar but not quite near duplicates are assigned the same BD RelatedSetID.

This field can be removed from overlay. Relativity users will not be able to organize batches and perform review on documents that are highly similar if this field is removed from overlay.

BD Summary

A summary of the document using six words or phrases. For near duplicates, this field will have the six terms or phrases that best distinguish this document from the pivot. For pivots, this field will have the six terms or phrases that best represent this document.

This field can be removed from overlay. Relativity users will not have a high-level summary of every document if this field is removed from overlay.

BDID

Brainspace unique identifier for every document ingested. Brainspace assigns this ID to every document; when the IDs are read in sequence, they show an evolution of documents. Every BDID is adjacent to its most similar document (e.g., BD_000000001, BD_000000002, with enough zeros for 999 million documents). The zeros are needed to maintain string sort order. Sorting documents by BDID places highly related documents next to each other, which expedites the review process.

This field allows Relativity users to sort documents when batching for review so that the documents within Relativity review batches are highly similar to one another in terms of content and vocabulary. Sorting by this field when creating batches will force “like” documents to be included in the same review batch. This has been proven to accelerate document review by as much as 90 percent.

This field can be removed from overlay. Relativity users will not be able to take advantage of this field and sorting feature if this field is removed from overlay.
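The zero padding matters because BDIDs are compared as strings. This short sketch assumes nine-digit padding, as in the BD_000000001 example; `bdid` is a hypothetical helper.

```python
def bdid(n: int) -> str:
    """Hypothetical helper: format a sequence number as a nine-digit, zero-padded BDID."""
    if not 1 <= n <= 999_999_999:
        raise ValueError("BDID sequence numbers must fit in nine digits")
    return f"BD_{n:09d}"


print(sorted(bdid(n) for n in (2, 10, 1)))    # padded: numeric order is preserved
print(sorted(f"BD_{n}" for n in (2, 10, 1)))  # unpadded: BD_10 sorts before BD_2
```

Without padding, a string sort puts BD_10 between BD_1 and BD_2, which would separate similar neighboring documents in a review batch.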

Predictive Coding Overlay Fields

BDPC Auto Code

For a predictive coding (PC) classifier (model), this field contains the recommended coding decision for every document. This Relativity field is populated only when the user closes the active PC session in Brainspace by clicking Close Session.

This field cannot be removed from overlay. Users may choose not to close out the PC session, which will leave this field blank or null in Relativity.

BDPC Control Set

This field identifies all the documents that are included in the control set (model). This field is only populated when using Brainspace’s predictive coding (PC) workflow.

This field cannot be removed from overlay. This Relativity field will only be populated if the user creates a control set for a PC session in Brainspace.

BDPC Is Responsive

This is the coding field used in Relativity to code the documents that will train a Brainspace classifier. This field is only populated when using Brainspace’s predictive coding workflow.

This field cannot be removed from overlay. The field will only be populated if the user applies this field in Relativity to code documents for a Brainspace PC session.

BDPC Needs Review

This field identifies all the documents in Relativity that need to be reviewed for a given Brainspace training round. This field is only populated when using Brainspace’s predictive coding (PC) workflow.

This field cannot be removed from overlay. This Relativity field will only be populated if the user creates a PC session in Brainspace and creates a control set or training round.

BDPC Predictive Rank

This field contains the most recent predictive rank. It is populated only when using Brainspace’s predictive coding (PC) workflow, and only if the user creates a Brainspace PC session.

This field cannot be removed from overlay. It is populated when the user runs a PC training round in Brainspace and is updated with each subsequent training round.

BDPC Use for Training

This field identifies which documents will be used for training the model.

This field cannot be removed from overlay. It is populated when the user runs a PC training round in Brainspace and is updated with each subsequent training round.

CMML Overlay Fields

BD CMML ## Score Relativity Field Name

This field is only populated when using Brainspace’s CMML workflow, and only if the user creates a Brainspace CMML classifier that was trained using a “Connect Tag” (a Relativity coding field). For each such classifier, a “BD CMML ## Score Relativity Field Name” field is created in Relativity to store that classifier's predictive rank, where ## is the corresponding CMML classifier ID in Brainspace and “Relativity Field Name” is the name of the Relativity field connected to the classifier.
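The naming pattern can be illustrated with a short Python sketch. The helper is hypothetical: only the “BD CMML ## Score Relativity Field Name” pattern comes from this page, while the exact spacing and whether the classifier ID is zero-padded are assumptions. The example field name "Privileged" is also illustrative.

```python
def cmml_score_field_name(classifier_id: int, relativity_field: str) -> str:
    """Build the Relativity score field name for a CMML classifier.

    Hypothetical helper: follows the documented pattern
    "BD CMML ## Score <Relativity Field Name>". Single-space separators
    and an unpadded classifier ID are assumptions.
    """
    return f"BD CMML {classifier_id} Score {relativity_field}"


print(cmml_score_field_name(12, "Privileged"))  # BD CMML 12 Score Privileged
```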

This field cannot be removed from overlay. It is populated when the user runs a training round for a CMML classifier in Brainspace and is updated with each subsequent training round. Multiple CMML classifiers can be created and run concurrently if more than one issue must be investigated.

Relativity Plus Configuration Options

Ingest Batch Size

The number of documents to be retrieved by each HTTP export request from Relativity.

Analytics Overlay Batch Size

The number of documents to be sent by each HTTP overlay request to Relativity.
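Both batch sizes control how many documents travel in a single request, so they also determine how many requests an ingest or overlay job needs. A minimal Python sketch, assuming documents are identified by a simple list of IDs; the helper names are hypothetical and no Relativity API is called.

```python
from typing import Iterator, List, Sequence


def batches(doc_ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size document IDs.

    Each slice stands for the payload of one HTTP request (an export
    during ingest, or an overlay back to Relativity).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(doc_ids), batch_size):
        yield list(doc_ids[start:start + batch_size])


def request_count(total_docs: int, batch_size: int) -> int:
    """Requests needed for total_docs; the last batch may be partial."""
    return (total_docs + batch_size - 1) // batch_size  # ceiling division


ids = [f"DOC{i}" for i in range(7)]
print([len(batch) for batch in batches(ids, 3)])  # [3, 3, 1]
print(request_count(250_000, 1_000))              # 250
```

A larger batch size means fewer requests, but each request carries more data and takes longer for Relativity to answer.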

Embed Native Viewer URL

Whether a Relativity Document URL should be generated for each document.

HTTP(s) Request Timeout

The maximum number of milliseconds that any given HTTP request will wait for Relativity to respond.

Maximum HTTP(s) Requests Per Second

The maximum number of HTTP requests that the Brainspace application will send to Relativity, per second.
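Together with the batch size, this cap sets a floor on how long a transfer can take: the number of requests divided by the maximum requests per second. A small Python sketch of that arithmetic; the function name and example values are illustrative, not Brainspace defaults.

```python
def min_transfer_seconds(total_docs: int, batch_size: int,
                         max_requests_per_second: float) -> float:
    """Lower bound on transfer time imposed by batch size and rate limit.

    Ignores how long Relativity actually takes to respond, so real jobs
    take at least this long and usually longer.
    """
    requests = (total_docs + batch_size - 1) // batch_size  # ceiling division
    return requests / max_requests_per_second


# 1,000,000 documents in batches of 1,000 at 10 requests per second
print(min_transfer_seconds(1_000_000, 1_000, 10))  # 100.0
```

Halving the batch size to 500 doubles the request count, so the same job needs at least 200 seconds at the same rate cap.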

Validate User Facing URLs

Whether or not the Brainspace application should verify the base document URL and OAuth URLs.

API Query Page Size

The number of objects (documents, Relativity Workspaces, saved searches, fields) that should be retrieved when querying the Relativity API, per request.
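Page size governs how a large result set is split across API calls. The Python sketch below shows a generic paging loop; `fetch_page(start, length)` is a hypothetical stand-in for one Relativity API query, and stopping when a page comes back shorter than the page size is an assumed termination rule.

```python
from typing import Callable, Iterator, List


def fetch_all(fetch_page: Callable[[int, int], List[str]],
              page_size: int) -> Iterator[str]:
    """Retrieve every object by requesting one page of page_size at a time.

    fetch_page(start, length) is a hypothetical stand-in for one API query;
    paging stops when a page comes back shorter than page_size.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        yield from page
        if len(page) < page_size:
            break
        start += page_size


workspaces = [f"Workspace {i}" for i in range(1, 8)]
calls = []


def fake_fetch(start: int, length: int) -> List[str]:
    """In-memory stand-in for an API query; records each start offset."""
    calls.append(start)
    return workspaces[start:start + length]


print(len(list(fetch_all(fake_fetch, 3))))  # 7
print(calls)                                # [0, 3, 6]
```

With seven objects and a page size of 3, three requests are made; a larger page size means fewer, heavier requests.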

Document Condition Size Limit

The number of documents that will be used for the optimized incremental ingest query.