Reveal Help Center

Brainspace User & Admin Guide

Dashboard
Concept Topics

Dashboard Screen

After logging in to Brainspace and clicking a Dataset card on the Datasets screen, the Analytics Dashboard will open.

The Analytics Dashboard includes the following features:

image1.png

Callout 1: Dropdown menus to open a different dataset, an existing focus, or an existing CMML classifier.

Callout 2: Links to open the Analytics (default view), Notebooks, and Supervised Learning pages or the User dropdown menu to access other user and system information. The Administration option in the User dropdown menu is only available for Group Admins and Super Admins.

Callout 3: Text field to search for concepts in a dataset, icons to manage search results, and links for advanced searches, search history, and saved searches.

Callout 4: Links to open the Cluster Wheel, Communications, Conversations, or Thread Analysis pages.

Callout 5: Bar graph displays the relative volume of original documents, near-duplicate documents, exact-duplicate documents, and documents that have not been analyzed in a dataset.

Callout 6: Timeline chart displays relative document volumes by year, month, day, hour, or minute in logarithmic scale (default) or linear scale. Timeline chart will display a chevron icon if no documents are present for two or more consecutive time periods. For example, the following image shows that no documents exist in the dataset for years 1981 through 1997, inclusive:

image2.png

If no documents are present for only one time period (e.g., one year or one month), the Timeline chart will display a missing bar, not a chevron.

Callout 7: Heatmap displays a chart for term volumes by top terms over time and a chart for anomaly detection term volumes by terms over time. The Heatmap will display a chevron icon if no terms are present for two or more consecutive time periods. For example, the following image shows that no terms exist in the dataset for years 1981 through 1997, inclusive:

image3.png

If no terms are present for only one time period (e.g., one year or one month), the Heatmap will display blank cells in the column, not a chevron.

Callout 8: Three panes: one pane displays the top terms in the dataset, and two faceted data panes allow viewing and searching by metadata.

Callout 9: Results pane includes the total number of documents in the dataset or search, a sort-by relevancy or metadata menu, relevancy distribution graph for documents in the dataset or search, and list of document cards in the dataset or search. The Results pane is only visible for active searches.

Document Bar

The Document Bar displays the total number of documents and a breakdown of that total by original, near-duplicate, exact-duplicate, and not-analyzed documents. This can be the complete dataset if no filter is active or the result of any search or other filter.

image4.png

Click the Document Bar to view only original, near-duplicate, exact-duplicate, or not-analyzed documents in the Analytics Dashboard.

Heatmaps

The Heatmap is disabled by default to increase the speed of search results in Analytics. The Heatmap includes two views—Document Volume and Anomaly Detection.

Document Volume Heatmap

The Document Volume Heatmap displays the co-occurrence of the top five terms over time.

image5.png

Callout 1: View the default top five terms in a search.

Callout 2: Create and manage custom top-terms lists.

Callout 3: Show more terms (up to ten).

Callout 4: View document volume for a node in the Heatmap or view the documents associated with a node in the Dashboard. Click on a node to execute a search for that term on that date interval.

Callout 5: Switch to the Anomaly Detection view.

Callout 6: Select multiple nodes to add multiple terms to a search.

Callout 7: Update the Heatmap automatically after a search. If not enabled, the Heatmap will automatically disable after each search to provide a more responsive update to the Dashboard.

Anomaly Detection Heatmap

The Anomaly Detection Heatmap displays the frequency of the terms over time. Anomaly detection uses a standard score to determine when a term’s frequency is higher than average. Brighter colors indicate higher-than-average usage.
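The standard-score idea behind anomaly detection can be sketched in a few lines. This is an illustrative model only, not Brainspace's implementation: it flags a time period as anomalous when a term's count sits well above that term's own average across all periods.

```python
# Sketch of the standard score (z-score) used for anomaly detection:
# how far each period's count sits from the term's mean, in units of
# standard deviation. Higher scores correspond to brighter cells.

def z_scores(counts):
    """Return the standard score of each period's count."""
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    std = variance ** 0.5
    if std == 0:
        return [0.0] * len(counts)
    return [(c - mean) / std for c in counts]

# Hypothetical monthly counts for one term; the spike in the fourth
# month stands out with a score above 2.
monthly = [10, 12, 11, 60, 9, 13]
scores = z_scores(monthly)
print([round(s, 2) for s in scores])
```

A cutoff such as a score above 2 is a common convention for "higher than average"; the threshold Brainspace uses is not documented here.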

image6.png

Callout 1: View the default top five terms in a search.

Callout 2: Create and manage custom top-terms lists.

Callout 3: Show more terms (up to ten).

Callout 4: View term frequency for a node in the Heatmap or view the documents associated with a node in the Dashboard. Click on a node to execute a search for that term on that date interval.

Callout 5: Switch to the Document Volume view.

Callout 6: Select multiple nodes to add multiple terms to a search.

Callout 7: Update the Heatmap automatically after a search.

Top Terms Pane

Top term values are document counts based on the frequency of their occurrence in de-duplicated documents with boilerplate and email headers removed. The Top Terms list is sorted by UsageScore, which is calculated using both the document count for the specific term and its weight. Clicking on a term will execute that term as a keyword search, which will reveal the actual number of documents containing that term.

Note

De-duplicated document counts can differ considerably from total document counts.
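Since the exact UsageScore formula is not documented here, the ranking can only be sketched under an assumption. The snippet below simply combines each term's de-duplicated document count with its weight, as the text describes; the actual combination Brainspace uses may differ, and the terms and numbers are hypothetical.

```python
# Hypothetical sketch of sorting top terms by a usage score. The real
# UsageScore formula is not specified; multiplication is an assumed
# combination of document count and weight for illustration only.

def usage_score(doc_count, weight):
    # Assumed combination; Brainspace's actual formula may differ.
    return doc_count * weight

terms = [
    {"term": "merger",   "doc_count": 1200, "weight": 0.4},
    {"term": "contract", "doc_count": 800,  "weight": 0.9},
    {"term": "meeting",  "doc_count": 1500, "weight": 0.2},
]
ranked = sorted(terms, key=lambda t: usage_score(t["doc_count"], t["weight"]),
                reverse=True)
print([t["term"] for t in ranked])
```

Note how a lower-count term ("contract") can outrank a higher-count one once the weight is factored in, which is why the list order differs from a raw count sort.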

The Top Terms pane displays the top-ten terms by default:

image7.png

Callout 1: Filter the list of terms.

Callout 2: Download a *.csv file of top terms, which includes the DocumentCount and the UsageScore for each term.

Callout 3: Rather than clicking a term to run it as a single-term search, tick its checkbox to combine it with other checked terms in a Boolean OR search, and then click Add to Search.

Callout 4: View minimum document counts for top terms. To see the actual counts, click on the term to run it as a keyword search.

Note

Minimum document count does not include duplicates or other excluded documents.

Callout 5: Add the next ten top terms to the Top Terms pane.
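The downloaded top-terms file can be post-processed with any CSV tooling. The text states the file includes a DocumentCount and a UsageScore per term; the exact header names and column order below are assumptions for illustration.

```python
import csv
import io

# Parse a downloaded top-terms *.csv and re-sort by UsageScore.
# Header names and column order are assumed, not documented.
sample = io.StringIO(
    "Term,DocumentCount,UsageScore\n"
    "merger,1200,480.0\n"
    "contract,800,720.0\n"
)
rows = list(csv.DictReader(sample))
by_score = sorted(rows, key=lambda r: float(r["UsageScore"]), reverse=True)
print([r["Term"] for r in by_score])
```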

Faceted Data Panes

Brainspace provides two faceted metadata panes so that you can compare two different sets of metadata values side-by-side.

image8.png

Callout 1: View the first ten metadata values for the selected metadata field. Click to run a faceted search for that metadata field and value. In the above example, clicking on “choate” will return every document that has a value of “choate” in the BCC field.

Callout 2: Select a different metadata field dropdown menu.

Callout 3: Filter the metadata values to look for a more specific value.

Callout 4: Export the selected metadata values to a *.csv file.

Callout 5: Add the next ten metadata values to the list.

Callout 6: Execute a faceted search or search for all selected metadata values.

Timeline Chart Overview

After opening a dataset in the Analytics Dashboard, the full Timeline chart for the dataset will display by default.

The Timeline chart includes the following features:

image9.png

Callout 1: Dropdown menu provides options for viewing the Timeline chart for sent, created, or received documents.

Callout 2: Clickable bars for viewing the Timeline chart by year (default view), month, week, day, hour, or minute.

Callout 3: Switch the Timeline chart from logarithmic scale (default) to linear scale.

Callout 4: Identify missing documents in the dataset.

User Dropdown Menu

The User dropdown menu, accessible from every page in Brainspace, includes the following features:

image10.png

Callout 1: Navigate to the Administration screen to add new datasets; manage existing datasets; download system-wide and individual dataset reports; manage users, groups, connectors, services, and portable models; and troubleshoot errors.

Callout 2: View notifications from Brainspace.

Callout 3: Edit your personal information and manage your password.

Callout 4: View the Brainspace GUI and API versions, batch tools jars, and connector versions available for your Brainspace instance.

Callout 5: Log in to Brainspace Support Station online help.

Callout 6: Log out of Brainspace.

Task Topics

View the Document Volume Chart

After opening a dataset in the Analytics Dashboard, the Document Volume chart will be disabled and hidden from view by default.

To view the Document Volume chart, click the DISPLAY button:

image11.png

The Document Volume chart will display.

After creating a search to view a subset of documents in the dataset, the Heatmap will be disabled and hidden from view by default, and the DISPLAY button name will change to UPDATE.

To enable and display the Document Volume Heatmap automatically after a search, see Update the Heatmap Automatically on Search.

View Top Terms Document Volume Chart

After opening a dataset or creating a search, click the DISPLAY or UPDATE button, and then hover over a node in the Document Volume chart:

image12.png

The document volume popup will display the top term, time period, and volume of documents associated with the node. Click to execute a search for that term in that time period.

View the Anomaly Detection Heatmap

After opening a dataset or creating a search, click the Display or Update button, and then click the Anomaly Detection button:

image13.png

The Anomaly Detection Heatmap will display.

Add Multiple Terms to a Search

After opening a dataset in the Analytics Dashboard, the Document Volume chart will be disabled and hidden from view by default. After opening the Document Volume chart or the Anomaly Detection chart, you can add a single term or multiple terms to a search.

Note

This procedure applies to the Document Volume chart and Anomaly Detection chart.

To add multiple terms to search:

  1. Click the Display or Update button.

  2. Toggle the Select Multiple switch to the On position:

    image14.png

    The Add to Search button will appear.

  3. Click multiple nodes in the chart, and then click the Add to Search button. The Analytics Dashboard will refresh to display the search results.

  4. Click the Update button.

The added terms will display in the chart.

Update the Heatmap Automatically on Search

After opening a dataset in the Analytics Dashboard, the Document Volume chart will be disabled and hidden from view by default.

To enable and display the Document Volume Heatmap automatically after a search, click the Display or Update button, and then toggle the Always Update on Search switch to the On position:

image15.png

The Heatmap will update and display automatically after every search.

View the Timeline Chart in Linear Scale

After opening a dataset in the Analytics Dashboard, the full Timeline chart will display document volumes by year for the dataset or search results in logarithmic scale.

To view the Timeline chart in linear scale, toggle the Logarithmic Scale switch to the Off position:

image16.png

View Document Volumes by Year

By default, the full Timeline chart displays document volumes across the full time span of the documents, usually at the year level, for a dataset or for a subset of documents in search results.

To view the full Timeline chart, open the Analytics Dashboard.

The full Timeline chart will display by default. You can click a year bar in the Timeline chart to view document volumes by month for a specific year, or you can click-and-drag multiple year bars in the chart to view documents for multiple years.

View Document Volumes by Month

To view the Timeline chart by month, open the Analytics Dashboard, and then click a year bar in the full Timeline chart.

The Timeline chart will refresh to display document volumes by month for the selected year. You can click a month bar in the Timeline chart to view document volumes by week for a specific month, or you can click-and-drag to select multiple month bars in the chart to view documents for multiple months.

View Document Volumes by Week

To view the Timeline chart by week, open the Analytics Dashboard, click a year bar in the full Timeline chart, and then click a month bar.

The Timeline chart will refresh to display document volumes by week. You can click a week bar in the Timeline chart to view document volumes by day for a specific week, or you can click-and-drag to select multiple week bars in the chart to view documents for multiple weeks.

View Document Volumes by Day

To view the Timeline chart by day, open the Analytics Dashboard, click a year bar in the full Timeline chart, click a month bar, and then click a week bar.

The Timeline chart will refresh to display document volumes by day. You can click a day bar in the Timeline chart to view document volumes by hour for a specific day, or you can click-and-drag to select multiple day bars in the chart to view documents for multiple days.

View Document Volumes by Hour

To view the Timeline chart by hour, open the Analytics Dashboard, click a year bar in the full Timeline chart, click a month bar, click a week bar, and then click a day bar.

The Timeline chart will refresh to display document volumes by hour. You can click an hour bar in the Timeline chart to view document volumes by minute for a specific hour, or you can click-and-drag to select multiple hour bars in the chart to view documents for multiple hours.

View Document Volumes by Minute

To view the Timeline chart by minute, open the Analytics Dashboard, click a year bar in the full Timeline chart, click a month bar, click a week bar, click a day bar, and then click an hour bar.

The Timeline chart will refresh to display document volumes by minute. You can click a minute bar in the Timeline chart to view documents for a specific minute, or you can click-and-drag to select multiple minute bars in the chart to view documents for multiple minutes.

Concept Search

Concept Topics

Concept Search Drawer

Using a word, phrase, paragraph, or even an entire document, Brainspace’s Concept Search automatically expands queries to reveal related concepts and retrieves conceptually related documents ranked by relevance or contextual distance. These inferences are a result of Brainspace’s understanding of the entire dataset and can quickly introduce key concepts previously unknown to the user.

To create a concept search, type a word, phrase, or sentence, or copy and paste an entire document, into the Concept Search text field on the Analytics tab, and then press Enter on your keyboard. The Concept Search drawer will open.

The Concept Search drawer includes the following features:

image1.png

Callout 1: Search for a concept in the dataset.

Callout 2: Review the top-ten concepts and relative weights found in the dataset for the concept search.

Callout 3: Add a specific concept to the Top Concepts pane.

Callout 4: See Dashboard callout 3.

Callout 5: Use the terms in the Top Concepts pane to find additional concepts.

Callout 6: Add additional concepts to the Top Concepts pane.

Callout 7: Refresh the Additional Concepts pane after adding concepts to the Top Concepts pane.

Callout 8: Open the Concept Query dialog.

Callout 9: Open the Concept Terms and Weights dialog.

Callout 10: Close the Concept Search drawer.

Rediscover Concepts

Clicking the Rediscover Concepts icon Rediscover_Concepts_Icon.png in the Additional Concepts pane (callout 5) pulls additional terms from the Brain based on the current selection of Top Concepts, allowing you to pursue a particular conceptual usage. You can rediscover additional terms by adjusting the weight of terms or by deleting terms in the Top Concepts pane.

For example, if you search for “party” and get “fun party” terms as well as “contractual party” terms in the results, you can delete “contractual party” and then click Rediscover Concepts to return only more “fun party” terms.

Top Concepts

The top-ten concepts related to the search term will display in the Top Concepts pane, with the most closely related term identified with the largest dot and the term most distantly related to the search term identified with the smallest dot.

Weights

After searching for a concept, you can manage the weight of each term in the Top Concepts pane. By default, each term is automatically assigned a weight that identifies how closely or distantly the concept term is related to the search term. The following options are available in the Weights dialog:

image2.png

Callout 1: Exclude the term from the search results.

Callout 2: Change the term’s weight to low.

Callout 3: Change the term’s weight to medium.

Callout 4: Change the term’s weight to high.

Callout 5: Require the term in the concept search results.

Callout 6: Remove the term from the Top Concepts pane.

Show Concept Weights Dialog

After searching for a concept, you can view the numerical values associated with the concept weights in the Top Concepts pane and download a *.csv file to archive a record of the weights associated with each term in the Top Terms pane. The following options are available in the Concept Terms and Weights dialog:

image3.png

Callout 1: View the numerical weights associated with each term in the Top Concepts pane.

Callout 2: Download a *.csv file of the concept terms and weights.

Concept Query Dialog

After searching for a concept, you can view the Boolean logic for the search results in the Top Concepts pane.

image4.png
Conversations
Concept Topics

Conversations Screen

After opening a dataset in Brainspace, you can open the Conversations screen from any Analytics screen in Brainspace.

The Conversations screen includes the following features:

image1.png

Callout 1: Add one or more people to the Conversations chart.

Callout 2: Open the People Manager dialog.

Callout 3: Show only direct recipients (To), copied recipients (CC), or blind-copied recipients (BCC) in the Conversations chart.

Callout 4: Zoom in to focus on a specific date range in the Conversations chart.

Callout 5: View the legend icons.

Callout 6: View the list of people added to the Person list.

Callout 7: View threads and weights for people who have been added to the Conversations chart.

Callout 8: View a timeline graph that displays the volume of documents for the people added to the Person list.

Callout 9: View the number of documents in the search results. The first number is how many documents Brainspace can display on the graph. The second number is the number of documents in the current search.

Callout 10: View document cards and the Relevancy Distribution graph in the Results pane.

Conversations Chart

The Conversations chart includes the following features:

image2.png

Callout 1: Click an email dot to view the email chain (callout 2) and list of associated emails (callout 3).

Callout 2: Highlight the email chain associated with an email dot (callout 1).

Callout 3: View the list of emails associated with an email dot (callout 1) and click an email in the list to open it in the Document Viewer.

Callout 4: Click the numeral associated with an email dot to add excluded communications to the Conversations chart.

Person List

The Person list includes the following features:

image3.png

Callout 1: Expand the entry in the list to view aliases.

Callout 2: Remove the entry from the Person list.

Task Topics

Populate the Conversations Chart

The Conversations feature provides a way to analyze the who, what, and when in email conversations between people. You can use the Conversations feature with all documents in a dataset or with a subset of documents in a dataset. The following procedure describes how to use the Conversations feature using all of the documents in a dataset.

To populate the Conversations chart:

  1. In the main navigation ribbon, click the Conversations option:

    image4.png

    The Conversations screen will open.

  2. Click the Add (+) Person button.

    The Add Person dialog will open. This dialog contains a list of all people and aliases associated with documents in the dataset. You can either add one person at a time or automatically add the top 5, 10, 15, 20, or 25 people involved in conversations.

  3. Click a person in the list.

    The Conversations screen will refresh.

  4. Click the Add (+) Person button. The Add Person dialog will open.

  5. Click a person in the list.

    The Conversations screen will refresh to display the two people in the Person pane and the conversation visualization in the Conversations chart. You can either continue adding people to the Conversations chart or begin analyzing conversations in the Conversations chart between the two people that you just added. The dots on the horizontal lines and vertical connections between the dots represent emails from the original sender to the recipient or recipients (To, CC, and BCC) and forwarded emails (FWD).

After you are finished adding people to the Conversations chart, you can save the search results for future reference, tag the emails, create a Focus, create a Notebook, or download the search results in a Metadata Report *.csv file.

Add Excluded People to the Conversations Chart

After populating the Conversations chart, you can add people who are involved in a specific email thread but are not shown in the People pane.

To add excluded people to the Conversations chart:

  1. Populate the Conversations chart.

  2. In the Conversations chart, click the +number below a node circle:

    image5.png

    The Excluded Recipients dialog will open.

  3. Click one of the following options:

    • Add One recipient to the Conversations chart.

    • Add Five recipients to the Conversations chart.

    • Add All recipients to the Conversations chart.

    The Conversations screen will refresh.

In addition to adding the new people to the People pane and associated Conversation node circles to the Conversations chart, the document count in the Showing Docs field will increment to include all of the new documents associated with the people added to the Conversations chart.

Highlight a Conversation Thread

After populating the Conversations chart, you can highlight a conversation thread to view emails that have the same thread ID.

To highlight a conversation thread:

  1. Populate the Conversations chart (see Populate the Conversations Chart).

  2. In the Conversations chart, hover your cursor over a conversation dot.

    The Conversations chart will refresh to highlight conversations included in the thread and dim conversations that are not, and the document list associated with the email thread will open.

After highlighting a conversation thread, you can add people who are included in the thread but are not included in the Conversations chart, and you can view messages and recipients included in the thread.

Add a Recipient to a Conversation Thread

After populating the Conversations chart, you can add an email recipient who is included in a conversation thread but is missing from the Conversations chart.

To add a person to a conversation thread:

  1. Populate the Conversations chart (see Populate the Conversations Chart).

  2. In the Conversations chart, hover your cursor over a conversation dot.

    The Conversations chart will refresh to highlight conversations included in the thread and dim conversations that are not, and the Document List dialog will open.

  3. Hover your cursor over an email in the Document List dialog.

    The Document Information dialog will open to display the Message information by default.

  4. Click Recipients, and then click the (+) Add Recipient icon:

    image6.png

    The Conversations pane will refresh to display the added recipient in the Conversations chart.

After adding the recipient to the Conversations chart, you can either continue adding people to the Conversations chart or begin analyzing the conversation.

Notebooks
Concept Topics

Notebooks Screen

You can create a Brainspace notebook to collect documents for a specific subject or for search results. You can also make notebooks public or private, include existing tags, and include trained classifiers in a notebook. Notebooks are not shared across datasets, but public notebooks are shared to all users of the dataset.

Note

When creating a new notebook, an option is provided to add documents to a notebook from a list of document IDs included in a *.csv file.
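Preparing such a file can be sketched with the standard csv module. The expected column layout is not specified in this guide, so the sketch below writes one document ID per row, and the IDs themselves are hypothetical; adjust the layout to match your Brainspace instance.

```python
import csv

# Sketch: write a *.csv of document IDs to upload when creating a
# notebook. One ID per row is an assumed layout; the ID values below
# are placeholders, not real Brainspace document IDs.
doc_ids = ["DOC-000123", "DOC-000456", "DOC-000789"]

with open("notebook_ids.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for doc_id in doc_ids:
        writer.writerow([doc_id])
```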

After opening a dataset in Brainspace, you can open the Notebooks screen from any screen in Brainspace. The Notebooks screen includes the following features:

image1.png

Callout 1: View the current dataset name or click the dropdown menu to open a different dataset.

Callout 2: View all private and public notebooks in the dataset. All Brainspace users can view and modify the contents of public notebooks in a dataset. Private notebooks can only be viewed and managed by the notebook creator and by Admins. Although you can change a private notebook to a public notebook at any time, you cannot change a public notebook to a private notebook.

Callout 3: View all public and private notebooks created by the current Brainspace user. The visibility rules described in callout 2 apply.

Callout 4: Click a notebook card to manage its settings and documents.

Callout 5: Search for a notebook name in the dataset.

Callout 6: Create a new notebook.

Notebook Card Information

After you create a notebook, Brainspace adds a notebook card to the Notebooks screen. A notebook card includes the following features:

image2.png

Callout 1: View the notebook name.

Callout 2: View the notebook creator, description, and Lucene query information. The Lucene query information is generated automatically in some cases and will be removed if any documents are added to or removed from the notebook.

Callout 3: View the number of documents in the notebook and the date it was last modified.

Callout 4: View the notebook’s public or private status. All Brainspace users can view and modify public notebooks; private notebooks are visible only to their creator and to Admins.

Callout 5: Edit the notebook’s settings.

Callout 6: Delete the notebook from Brainspace. This will remove the notebook from the dataset, but the documents will remain in the dataset.

Create a New Notebook Dialog

After opening the Notebooks screen and clicking the New Notebook button, the Create a New Notebook dialog will open. The Create a New Notebook dialog includes the following features:

image3.png

Callout 1: Enter the notebook name.

Callout 2: Describe the purpose of and any additional important details about the notebook.

Callout 3: Upload IDs from a *.csv file to add specific documents to the notebook.

Callout 4: Change notebook status from private (default) to public. Although you can change a private notebook to a public notebook at any time, you cannot change a public notebook to a private notebook.

Callout 5: Optionally select existing dataset tags to be highlighted in the notebook.

Callout 6: Optionally select existing classifiers to sort the documents in the notebook.

Task Topics

Create a Notebook

After searching for documents in a dataset, you can add the documents to an existing notebook or create a new notebook for the documents.

To create a new notebook:

  1. Open a dataset in the Analytics Dashboard.

  2. Click the Notebooks tab.

    image4.png

    The All Notebooks screen will open by default.

  3. Click the New Notebook button.

    The Create a New Notebook dialog will open.

  4. Type a notebook name.

  5. Type a notebook description.

    After typing a description, you can make the notebook public, include existing tags used in the dataset, and use the IDs from a *.csv file or an existing classifier to add documents to the notebook.

  6. Click the Create Notebook button.

    The Notebooks screen will refresh to display the new notebook card.

After creating the notebook, you can add documents to the notebook while using any of the Analytics features in Brainspace and after creating new classifiers. You can edit the notebook settings at any time, and you can remove documents from the notebook in the Notebook dialog.

View Documents in a Notebook

After creating a notebook, you can open it to view and manage its documents. To view documents in a notebook:

  1. Open a dataset in the Analytics Dashboard.

  2. Click the Notebooks tab.

    image5.png

    The All Notebooks screen will open by default.

  3. Click a Notebook card.

    The notebook’s documents will display in the card viewer, with the Overview chart displaying the number of documents included over time.

After opening the notebook to view its documents, you can edit the tags highlighted in the notebook, use any selected classifiers to sort the documents in the notebook, and click on any document to view it in the Document Viewer.

View Notebook List View

After creating a notebook, you can view its documents and sort them by specific fields, including metadata such as sent date.

Note

The format of displayed date fields depends on the date format in the original data.

To view notebook cards in the list view:

  1. Open a dataset in the Analytics Dashboard.

  2. Click the Notebooks tab.

    image6.png

    The All Notebooks screen will open by default.

  3. Click a Notebook card.

    The notebook’s documents will display in the card view.

  4. Click the List View icon.

    image7.png

    The notebook’s documents will display in the list view.

You can change the order of the metadata fields displayed, change the column sizes, add metadata fields to or remove metadata fields from the display, and sort by any metadata field. These changes are not preserved when you exit the list view.

Document Viewer
Concept Topics

Metadata Features

Metadata features are described by two-element JSON arrays. JSON is a format for transmitting data between software applications.

Note

Metadata feature descriptions have an indirect relationship to the field names of the original metadata fields in the source dataset. Brainspace strongly encourages using an exported portable model file as a starting point when creating a new portable model file that will use metadata features. Begin by using CMML to train a portable model from some examples on the source dataset (or one with the same metadata structure and import configuration), and then export the model. You can then modify the feature descriptions in that exported file to produce a model that will be useful for your classification task.

Below are some example feature descriptions for metadata match features:

  • "[""cc"",""fred jimes <jimes@foo.org (jimes@foo.org)>""]"

  • "[""created"",""20011204""]"

  • "[""emailclient"",""outlook""]"

  • "[""sent-hour-of-day"",""12""]"

  • "[""received-day-of-week"",""thursday""]"

  • "[""created-year-month-day-hour"",""2001120412""]"

The first element of the description specifies the field name of a metadata field. Because of mapping and feature extraction operations, the field names in metadata features are not necessarily the same as the field names in the source data.

In the examples above, the field names are:

  • cc

  • created

  • emailclient

  • sent-hour-of-day

  • received-day-of-week

  • created-year-month-day-hour

The second element is the field value to be matched. The metadata feature will contribute to the score of a document when the field value matches the contents of the specified field for the document.

In the examples above, the field values are:

  • “fred jimes <jimes@foo.org>”

  • “20011204”

  • “outlook”

  • “12”

  • “thursday”

  • “2001120412”

The process of determining whether a field value matches the contents of the corresponding field depends on the nature of the particular field.

Two factors play into this:

  • The values for some metadata fields (e.g., emailclient) correspond directly to raw values in the source field in the original data. Other metadata fields have values that are the output of some computation. For example, received-day-of-week has a value that is derived mathematically from the date.

  • Some metadata fields always take on a single value (which might be missing) for a document (e.g., emailclient above), while others (e.g., cc) can take on zero, one, or several values.
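The matching behavior described above can be sketched in a few lines. This is an illustrative model only, not Brainspace's actual implementation; the `feature_matches` helper and the document structure are assumptions, while the field names and values come from the examples above.

```python
# Illustrative sketch of metadata-feature matching (not Brainspace's
# actual implementation). A feature is a two-element array:
# [field_name, value_to_match]. A document's metadata maps each field
# name to zero, one, or several values.

def feature_matches(feature, doc_metadata):
    """Return True if the document's field contains the feature's value."""
    field_name, value = feature
    doc_values = doc_metadata.get(field_name, [])  # fields may be multi-valued
    return value in doc_values

doc = {
    "emailclient": ["outlook"],            # always at most one value
    "cc": ["fred jimes <jimes@foo.org>",   # can hold several values
           "ann lee <lee@foo.org>"],
    "received-day-of-week": ["thursday"],  # derived from the date field
}

assert feature_matches(["emailclient", "outlook"], doc)
assert feature_matches(["cc", "ann lee <lee@foo.org>"], doc)
assert not feature_matches(["created", "20011204"], doc)  # field missing
```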

Document Viewer Features

After doing a search for documents in Analytics, you can view each document and its metadata in the Document Viewer. The Document Viewer includes the following features:

image1.png

Callout 1: View the Results pane.

Callout 2: Open a specific document in the Document Viewer.

Callout 3: View the Hit Relevance pane to display a document’s metadata and manage tags.

Hit Relevance Pane (Metadata Tab)

When reviewing documents in the Document Viewer, you can view metadata and view and manage tags associated with a document. The Metadata tab includes the following features:

image2.png

Callout 1: View the search elements that are true for the selected document. In this example, the user searched for near-duplicate documents, and the example document was included in the search results because it is a near-original (pivot of the near-duplicate set) document.

Callout 2: View document metadata.

Callout 3: View and manage document tags.

Callout 4: Filter to find specific fields associated with the document.

Callout 5: Manage pinned metadata fields.

Callout 6: View metadata fields associated with the document.

Callout 7: Unpin the metadata field.

Callout 8: Pin the metadata field.

Hit Relevance Pane (Tags Tab)

When reviewing documents in the Document Viewer, you can view and manage tags associated with a document. The Tags tab includes the following features:

image3.png

Callout 1: View the search elements that are true for the selected document. In this example, the user searched for near-duplicate documents, and the example document was included in the search results because it is a near-original (pivot of the near-duplicate set) document.

Callout 2: View and manage document tags.

Callout 3: Add an existing tag to the document.

Callout 4: View and delete tags associated with the document.

Focus

To include a specific subset of documents in a dataset when training a CMML classifier, you can create a focus and then train against it. All training examples must come from this subset of documents, and predictive ranks will only be applied to documents from the same subset.

For example, a dataset called English Wikipedia contains 4.3 million documents, but only 21,000 of those documents are related to the concept “Apollo.” Given that the rest of the documents in the dataset are unrelated to Apollo, there is no need to consider them for training or assign predictive ranks to them. When using a focus with the Relativity integration, scores for the 21,000 documents will be overlaid in Relativity each time the model is updated instead of the entire corpus of 4.3 million documents.

Task Topics

Sort Documents by Metadata Fields

After creating a search, the documents will display in the Results pane sorted by relevance to the search terms. You can also choose to sort documents by specific metadata fields.

To sort documents by specific metadata fields:

  1. With a search active and the Results pane visible, hover over the SORT BY Relevancy field, and then click a metadata field:

    image4.png

    The Results pane will refresh to display the highest ranking document for the specified metadata field at the top of the document card list and the lowest scoring document at the bottom of the card list. You can click the Sort arrow to view the results in ascending order:

    image5.png

After sorting document cards by a metadata field, you can use the results to create a focus or a notebook. You can also tag the documents and download a report for the results.

Pin and Unpin Metadata Fields

After opening a document in the Document Viewer, you can pin a document’s metadata fields to view specific metadata fields while reviewing documents.

To pin and unpin metadata fields:

  1. In the Results pane, click a Document Card:

    image6.png

    The Document Viewer will open.

  2. In the fields list, find the metadata field of interest (perhaps by filtering field names) by hovering over a field, and then click the Pin icon:

    image7.png

    The field will be pinned to the top of the metadata fields list, and the Pin icon will change to an Unpin icon.

Fields will remain pinned from session to session even after you log out of Brainspace. After pinning the field, you can pin additional fields, clear all pins, show only pinned fields in the metadata fields list, or continue reviewing documents.

Show Only Pinned Metadata Fields

After pinning metadata fields, you can hide unpinned metadata fields to view only pinned fields in the Metadata tab.

To show only pinned metadata fields:

  1. In the Results pane, click a document card:

    image6.png

    The Document Viewer will open.

    Note

    If you have pinned metadata fields, the pinned fields will display at the top of the Fields pane followed by unpinned metadata fields.

  2. Toggle the Show Only Pinned Fields switch to the On position:

    image8.png

    The Fields pane will refresh and display only pinned fields.

Your decision to show only pinned metadata fields is preserved from session to session even after you log out of Brainspace. After showing only pinned fields, you can clear all pins or continue reviewing documents.

Clear All Pinned Metadata Fields

You can clear all pins to view all metadata fields in the search results. To clear all pinned metadata fields:

  1. Click the Supervised Learning tab.

    image9.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Results pane, click a Document Card:

    image10.png

    The Document Viewer will open with pinned fields at the top of the Fields pane.

  4. Click the Clear Pins link:

    image11.png

    The Fields pane will clear all pinned fields and refresh to display all metadata fields.

After clearing the pins, you can pin additional fields or continue reviewing documents.

Create a Focus

After searching for documents in a dataset, you can create a focus from the search results to use when creating a CMML classifier. To create a focus:

  1. Open a dataset in the Analytics Dashboard.

  2. Search for documents.

    The Document Viewer will open.

  3. Click the Focus icon:

    image12.png

    The Manage Focus dialog will open.

  4. In the Name text field, type a name for the focus.

  5. Click the Private/Public switch to make the focus public.

    Note

    If you choose to make the focus public, you will not be able to make it private in the future.

  6. Click the Threads, Families, Related Documents, or Exact Duplicates check boxes to include additional documents in the focus.

  7. Click the Save Focus button.

    The Manage Focus dialog will close.

After creating a focus, you can use it to create a CMML classifier.

Manage a Focus

After creating a focus, you can make a private focus public or delete a focus. To manage a focus:

  1. Open a dataset in the Analytics Dashboard.

  2. Click the Select a Focus dropdown menu, and then click the Edit Focus icon:

    image13.png

    The Manage Focus dialog will open.

  3. Edit the settings and then click Save Focus, or click the Delete icon.

Manage Documents in a Focus

After creating a focus, you can open the focus in the Analytics Dashboard to view and tag its documents.

To manage documents in a focus:

  1. Open a dataset in the Analytics Dashboard.

  2. Click the Select a Focus dropdown menu, and then click the Focus entry in the list:

    image14.png

    The focus will open in the Analytics Dashboard, and the Select a Focus dropdown menu will display the name of the focus.

All further searches with this focus active will be constrained to the documents in this focus.

Tags

Task Topics

Create a New Tag

After a dataset is created, Admin-level users can add new tags at any time. To create a new tag:

  1. In the user drop-down menu, click Administration:

    image1.png

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Tag Management icon:

    image2.png

    The Manage Tags dialog will open.

  3. Click the +New Tag button:

    image3.png
  4. In the Create a New Choice Tag dialog, type a tag name and one or more tag choices.

    To add more choices, click the Add (+) icon.

    Note

    For example, you could create a tag named “Responsive” with three choices for the tag: a “Yes” choice, a “No” choice, and a “Maybe” choice.

  5. Click the Save button.

  6. Add additional tags to the dataset as required.

  7. After you are finished adding tags to the dataset, click the Close button.

    Your new tags will now appear in the Manage Tags dialog. The Counts column contains the value zero for all new tags until a document in the dataset has been coded with the new tag.

After adding a new tag to a dataset, you can edit or delete it at any time.

Add Tags to a Dataset

After creating tags, Brainspace users with Admin credentials can add more tags, edit existing tags, and delete tags from the dataset. To add additional tags to a dataset, see Create a New Tag.

Edit an Existing Tag in a Dataset

After adding a tag to a dataset, Admin-level users can edit the tag name and tag choices. To edit an existing tag:

  1. In the user drop-down menu, click Administration:

    image4.png

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Tag Management icon:

    image5.png

    The Manage Tags dialog will open.

  3. In the Manage Tags dialog, click the Edit icon:

    image6.png

    The Edit Tag dialog will open.

  4. In the Edit Tag dialog, edit the existing tag name and tag choices or add additional tag choices.

  5. Click the Save button.

    The Edit Tag dialog will close, and the Manage Tags dialog will show the edited tag name and tag choices.

  6. Click the Close button.

    The tag is ready for use in Analytics.

After adding a tag, you can modify or delete it at any time.

Connect Brainspace Tags to Relativity Tags

After creating tags in Brainspace, you can connect them to tags in Relativity.

Note

Connecting a tag that already exists in Brainspace will cause all choices and document tagging to be overwritten.

To connect Brainspace tags to Relativity tags:

  1. In the user drop-down menu, click Administration:

    image7.png

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Tag Management icon:

    image8.png

    The Manage Tags dialog will open.

  3. In the Manage Tags dialog, click the Connect Tags icon:

    image9.png

    After Relativity verifies your stored credentials, the Connect Tags dialog will open.

    Note

    If you have not stored your Relativity credentials, you will be prompted to verify your Relativity user name and password.

  4. In the Connect Tags dialog, click the check boxes for the tags to connect.

  5. Click the Connect button.

    The Manage Tags dialog will refresh, and the tag will display the Connect Tag icon connect_tags_icon.png.

  6. Click the Close button.

    The Manage Tags dialog will close.

After connecting a tag, you can disconnect it, push the tag, pull the tag, modify the tag, or delete the tag at any time.

Delete an Existing Tag

After adding tags to a dataset, Admin-level users can delete one or more of the tags associated with the dataset.

Note

Deleting a tag will permanently untag all documents with the tag, will permanently remove all choices for the tag, and will permanently delete any classifiers that use the tag. This cannot be undone.

To delete an existing tag:

  1. In the user drop-down menu, click Administration:

    image10.png

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Tag Management icon:

    image11.png

    The Manage Tags dialog will open.

  3. In the Manage Tags dialog, click the Delete icon:

    image12.png

    The Manage Tag dialog will refresh with the tag deleted.

  4. Click the Close button.

    The Manage Tags dialog will refresh.

View Tag Usage

After tagging documents in Brainspace, you can view the total number of documents tagged and the number of documents tagged as positive and negative.

To view tag usage:

  1. In the user drop-down menu, click Administration:

    image13.png

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Tag Management icon:

    image14.png

    The Manage Tags dialog will open to display the number of tagged documents in the Count column.

Supervised Learning

Concept Topics

Supervised Learning

A type of machine learning in which human coding decisions are used to classify documents as either likely relevant or not relevant. Through rounds of training, these decisions teach the machine to find other documents that are likely relevant or not relevant.

Features

Elements of a document such as terms, phrases, and, optionally, metadata. When a document is scored positive and incorporated into a model, the individual features receive incremental gains. When a document is scored negative, the individual features receive a decrement. Over time, these increments and decrements help to produce a model that can be used to evaluate unreviewed documents. When an unreviewed document is evaluated, the cumulative score of each feature is used to determine its likely relevance.

  • Example text phrase feature: “tangible personal property”

  • Example metadata feature: ["email sent date-hour-of-day","00"]
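The increment/decrement behavior described above can be illustrated with a deliberately simplified sketch. This additive update rule is an illustration only; the real model is produced by logistic regression, and the function names here are assumptions.

```python
# Highly simplified sketch of how positive and negative examples nudge
# feature weights (illustration only; Brainspace's actual model is
# produced by logistic regression, not this additive rule).
from collections import defaultdict

weights = defaultdict(float)

def train(doc_features, is_positive, step=1.0):
    """Increment each feature on a positive example, decrement on a negative."""
    for f in doc_features:
        weights[f] += step if is_positive else -step

def score(doc_features):
    """Cumulative score of a document's features."""
    return sum(weights[f] for f in doc_features)

train({"tangible personal property", "lease"}, is_positive=True)
train({"lease", "cafeteria menu"}, is_positive=False)

assert score({"tangible personal property"}) > 0   # gained on a positive
assert score({"cafeteria menu"}) < 0               # decremented on a negative
assert score({"lease"}) == 0                       # one gain, one decrement
```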

Model

A compilation of features such as terms, phrases and, optionally, metadata that represent the decisions made on positive (likely relevant) and negative (likely not relevant) documents. Each term is assigned a weight of relevance that influences the score of each document.

Classifier

A model that is in the process of being refined with positive and negative example documents chosen by either algorithmic or manual selection methods.

Predictive Rank

Documents are scored using predictive rank, the output of training the classifier: a normalized score between zero and 1.0 for each document in the dataset, where higher-ranking documents are likely to be relevant and lower-ranking documents are likely to be not relevant.

Training

The incremental process of improving a model by providing additional training examples.

Predictive Model

The result of training a classifier using tagged positive and tagged negative examples. A predictive model includes a list of features and their associated relevance weights.

Portable Model

A predictive model that has been saved so that it can be reused for another case.

Auto-coding

Throughout the machine training process, validation statistics are reviewed to determine if goals are being met. After the goals are met, coding decisions can be used to auto-code documents that have not been reviewed by humans.

Note

Auto-coding is only available with Predictive Coding at this time, not with Continuous Multimodal Learning (CMML).

Training Round

A set of additional training examples used to improve the model.

Precision

The number of relevant documents selected by the training model compared to the total number of documents it selected and analyzed.
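A small worked example, using made-up counts, may make the definition concrete. Recall is included for contrast because it appears alongside precision in the control set and Recall/Review sections below.

```python
# Worked example of precision (and recall, for contrast), using
# made-up counts. Precision asks: of the documents the model selected,
# how many are actually relevant?
selected = 200           # documents the model selected as likely relevant
relevant_selected = 150  # of those, the number that are truly relevant
relevant_total = 300     # truly relevant documents in the whole set

precision = relevant_selected / selected     # 150/200 = 0.75
recall = relevant_selected / relevant_total  # 150/300 = 0.50

assert precision == 0.75
assert recall == 0.5
```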

Responsive

In legal document review, a document is considered to be responsive when the information it contains is responsive (likely relevant) to a discovery request by the opposition.

For example, when opposing counsel requests all documents related to comedy, a document is responsive if it is mostly about comedy. This can be an article about a comedy show, comedy book, comedy TV show, or a comedian.

Non-responsive

In legal document review, a document is considered to be non-responsive when the information it contains is not responsive (likely not relevant) to a discovery request by the opposition.

For example, when opposing counsel requests all documents related to comedy, a document is non-responsive if it is not mostly about comedy. In other words, a document about a person is non-responsive if that person is not a comedian and has not taken comic roles.

Seed Set (Seed Round)

The first training round of documents tagged by human reviewers for a Predictive Coding or Continuous Multimodal Learning session, used to teach a classifier how to distinguish between responsive and non-responsive documents.

Target Population

The full set of documents within a document corpus that are being sought (i.e., the theoretically complete set of responsive documents).

Control Set

A control set is a random sample of documents in a dataset that can be used to measure the classifier. When creating a control set, the validation settings for Minimum Recall, Margin of Error, and Confidence Level are selected by the user. After a randomly selected sample has been fully reviewed, you can use it to estimate richness, recall, precision, and depth for recall. These estimates depend on the size of the control set and the values of the validation settings chosen.

Creating control sets can be confusing because of the “chicken-and-the-egg” problem of needing a control set to decide how big of a control set you need.

Brainspace draws the first control set to get an initial estimate of richness. If that estimate shows the control set is insufficiently large, Brainspace will guide the user through adding documents to the control set until a sufficient number of responsive documents are tagged. Brainspace can estimate the richness of your document set and use that value to estimate how many documents need to be added to the control set for your desired level of recall, certainty, and margin of error. Keep in mind that this is an estimate. After more documents have been randomly selected, the richness estimate will have improved, and the number of documents you need in your control set might increase. It can take a few rounds of adding more randomly selected documents until you finally have enough.

In low-richness document sets, you may have to review a large number of documents to find enough responsive documents for your control set.
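The relationship between richness, margin of error, confidence level, and control set size can be sketched with the standard normal-approximation formula for sizing a sample of a proportion. This is a textbook rule of thumb, not Brainspace's documented sizing logic; the function name and defaults are assumptions.

```python
# Standard normal-approximation sample-size estimate for a proportion
# (a common statistical rule of thumb; Brainspace's own sizing logic
# may differ). Lower richness or a tighter margin of error both change
# the required control-set size.
import math

def control_set_size(richness, margin_of_error, z=1.96):
    """z = 1.96 corresponds to a 95% confidence level."""
    return math.ceil(z**2 * richness * (1 - richness) / margin_of_error**2)

# 10% estimated richness, +/-5% margin of error, 95% confidence:
assert control_set_size(0.10, 0.05) == 139
# Richness near 50% maximizes the required sample for the same settings:
assert control_set_size(0.50, 0.05) == 385
```

Note that this sizes the overall sample; as the section above explains, in low-richness populations the practical constraint is usually finding enough responsive documents, which can require reviewing far more than this minimum.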

Control Sets and Rolling Productions

Many e-discovery projects have rolling productions. While you might begin predictive coding when you have 100,000 documents, in the end you might have 1,000,000 documents because of rolling productions. It’s important for the final control set used to evaluate the model to be valid.

Your control set is valid if you cannot differentiate how you selected your control set from randomly selecting a QA set at the end of your predictive coding process. Randomly selecting a QA set at the end of your predictive coding process means that all rolling productions have happened. The full candidate set of documents is now known, and you pick a random set of documents out of that full set.

Note

Brainspace does not support clawbacks.

For rolling productions, this means the initial control set may no longer be valid.

Let’s say that when you started with 100,000 documents you picked a random control set containing 1,000 documents. Now, because you received additional documents along the way, you have a document set with 1,000,000 documents. If you drew a fresh control set of 1,000 documents out of a set of 1,000,000, the odds are vanishingly small that it would contain only documents from the first 100,000, yet that is exactly what your existing control set contains.

Each time you randomly pick a document, there is a 1:10 chance that the document came from the first 100,000 documents. When you pick the second document, there is only a 1:100 chance that both documents were in the first 100,000 (1/10 * 1/10 = 1/100). By the time you have picked nine documents randomly, there is only a 1:1,000,000,000 chance that all nine documents were found in the first 100,000 documents. By the time you pick document 1,000, the chance that all of those documents came from the first 100,000 is effectively zero.

Let’s say that you started with 100,000 documents and, at the end of your rolling productions, you have 200,000 documents. Now for each document selected there is a 1:2 chance that it was picked from the first 100,000 documents. In this case, by the time you randomly pick document 30, there is less than a 1:1,000,000,000 chance that all 30 documents came from the first 100,000 documents.

In other words, as soon as you have a rolling production, your existing control set is biased and can no longer be used for making predictions about richness, recall, precision, or anything else about your dataset.
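The arithmetic behind these odds can be checked directly:

```python
# Checking the arithmetic in the rolling-production examples above:
# the chance that every document in a random sample comes from the
# original (pre-rolling-production) subset.
import math

# 100,000 of 1,000,000: each random pick has a 1-in-10 chance.
assert math.isclose(0.1 ** 9, 1e-09)  # nine picks: about one in a billion

# 100,000 of 200,000: each random pick has a 1-in-2 chance.
assert 0.5 ** 30 < 1e-09              # thirty picks: under one in a billion
```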

Repairing and Replacing a Control Set

Adding more documents to your document population after you have drawn your control set breaks control-set validity. If you add more documents to a control set after drawing a control set, you can either repair or replace the control set.

The biggest difference between repairing and replacing is the total number of documents that must be reviewed for the control set, and the total cost of reviewing those documents.

One other difference is that if you repair, you can choose to repair your control set after each rolling production. This means you always have an up-to-date view of the richness of your document population, and you always have a control set that can accurately estimate recall and precision for your entire document population.

If each rolling production is the same size as the original production of documents, then to save on the total number of documents reviewed for the control set you will want to replace your control set less frequently than you receive rolling productions. If you don’t generate a new control set each time there is a rolling production, then for some amount of time you will be working with a control set that does not accurately estimate the richness of your document set and can’t accurately predict the recall and precision of a classifier’s performance on that document set.

In practice, you will usually decide how to handle rolling productions based on how much you expect your document population to grow. If you expect the document population to significantly more than double in size, then replacing the control set one or more times will probably cost less than repairing the control set each time.

If you do not know how much your document population will grow, then we recommend you simply replace the control set each time the document population doubles. This way you keep your control set from getting too far out of sync with your document population to be useful, while at the same time avoiding an unacceptable increase in review costs.

Repair a Control Set by Same-Rate Sampling

To repair the control set, we need to pick additional control set documents in a way that, if we were picking a completely new control set now, would be likely to have as many documents coming from the original document population as the current control set has selected from it.

Replace a Control Set

In a case where the document population will more than double, you are usually better off replacing the control set. Because the certainty and margin of error of a control set depend primarily on the number of positive examples it contains, you can review fewer documents by replacing the control set with a similarly sized control set selected over the entire final population.

Rolling productions can add document sets with different richnesses than the original set. Selecting a replacement control set that is the same size as the one it is replacing does not guarantee that the resulting set will have the same number of responsive documents and the same statistical properties as the original control set. In the real world the control sets are larger, and the results won’t vary as much unless the richness of the document population was significantly changed by the rolling productions.

Control Sets and Training Documents

Brainspace uses a control set as a way to evaluate the classifier training. To keep the control set effective, it is important to avoid using knowledge of the control set to guide your selection of training documents.

Validation Set

A random sample of documents in a dataset that have not been tagged for classification, used to estimate the proportion of untagged documents that would be tagged positive if they were reviewed by people.

A validation set is a simple random sample from all documents that are available for supervised learning but have not already been tagged with the positive or negative tag for the classification. It is used to estimate the proportion of untagged documents that would be tagged positive if they were reviewed. This proportion is sometimes called “elusion,” though in this case it is the elusion of the review to date, not the elusion of the latest predictive model.

Continuous Multimodal Learning
Concept Topics

Continuous Multimodal Learning (CMML) Document Scoring

CMML (and, for that matter, Predictive Coding) uses a machine learning algorithm called logistic regression to produce its predictive models. The resulting predictive model produces two scores, which you can see in the prediction report. The raw score (labeled “score” in the prediction report) is between minus infinity and plus infinity. The normalized probability score (labeled “chance_responsive” in the prediction report) is between 0 and 1.

The probability score can be interpreted as the current predictive model’s best guess at the probability that the document is responsive. In other words, if you took a large number of documents to which the predictive model gave a probability score of 0.8, you would expect roughly 80 percent of them to be responsive. If that held for all probability scores output by the model, we would say the model is “well-calibrated.”

A big caveat is that the predictive model will be poorly calibrated until the training set has a lot of training data that is representative of the dataset. In particular, it tends to take many rounds of active learning before the scores are decently calibrated, and calibration will never be perfect.

Since probability scores have a nice interpretation, why do we also have raw scores? The main reason is that, particularly when scores are poorly calibrated, you can get a lot of documents that all have probability scores of 1.0 but where some are really better than others. The raw score allows breaking those ties, and the CMML interface sorts on the raw score for that reason.
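The two scores are linked by the logistic (sigmoid) function, which is how logistic regression maps a raw score to a probability. The sketch below illustrates the tie-breaking point: at extreme raw scores the probability saturates near 1.0, but the raw scores still rank the documents.

```python
# Logistic regression maps a raw score in (-inf, +inf) to a probability
# in (0, 1) via the logistic (sigmoid) function. At extreme raw scores
# the probability saturates, so two documents can display the same
# probability while their raw scores still break the tie.
import math

def probability(raw_score):
    return 1.0 / (1.0 + math.exp(-raw_score))

a, b = probability(20.0), probability(25.0)
assert round(a, 6) == round(b, 6) == 1.0  # both display as ~1.0 ...
assert 25.0 > 20.0                        # ... but raw scores still rank them
assert probability(0.0) == 0.5            # raw score 0 maps to a coin flip
```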

Create Control Set Dialog

The following settings are available when creating a CMML control set:

image1.png

Callout 1: Estimate the cost-to-review per document for a specific currency type.

Callout 2: Provide the minimum threshold for recall.

Callout 3: Provide the margin of error.

Callout 4: Provide the confidence level.

Callout 5: Estimate preliminary richness. Brainspace will iterate until sufficient documents are added to the control set.

Callout 6: Choose the number of documents to include in the initial control set. A value will be suggested based upon the settings chosen and the estimated preliminary richness. Brainspace may suggest adding more documents in a later cycle if the control set contains an insufficient number of documents.

CMML Control Set Progress Tab

The following features are available on the Progress tab:

image2.png

Callout 1: View the classifier training progress graph to compare the number of positive documents (y-axis) as a function of the total number of coded documents (x-axis) in each training round. View each training round’s status and the number of positive (responsive) and negative (non-responsive) documents in each training round. An upward slope suggests responsive documents are being added to the model. A flatter line suggests few responsive documents are being added to the model.

Callout 2: View only coded, positive, or negative documents in Analytics.

CMML Control Set Recall/Review Tab

The following features are available on the Recall/Review tab:

image3.png

Note

“Documents to review” is used on this graph to refer to all documents that would be recommended as responsive and assumes a further step of review of those documents before production.

Callout 1: View the classifier recall/review graph to compare the recall percentage (y-axis) as a function of the percentage of documents to review (x-axis) in the classifier. Drag the %Review node on the horizontal line to view the recall percentage and total cost to review the suggested documents at any stage of classifier training.

Callout 2: Change the currency type or monetary value for the classifier. This will change the estimated cost to review the documents recommended as responsive by the model.

Callout 3: View the percentage of documents suggested as responsive (for further review) and the associated recall, precision, and F-score percentages for the selected position of the %Review node on the x-axis. These values change according to the %Review value selected on the x-axis (see callout 2 and callout 6).

Callout 4: View the number of documents suggested as responsive (for further review) in the classifier and the number of documents after including families. These values change according to the %Review value selected on the x-axis (see callout 2 and callout 6).

Callout 5: Estimate the number of documents that must be reviewed before finding the next document in the classifier that is likely to be relevant.

Callout 6: Micro-adjust the position of the %Review node on the x-axis to view the recall percentage number and total cost to review the suggested responsive documents at any stage of classifier training.

CMML Control Set Depth for Recall Tab

The following features are available on the Depth for Recall Graph tab:

image4.png

Callout 1: View the depth for recall graph to compare the percentage of documents of the classifier that would be considered responsive (y-axis) over the number of training rounds (x-axis) in the control set.

CMML Control Set Training Statistics Tab

The following features are available on the Training Statistics tab:

image5.png

Callout 1: View and modify the current control set settings. If more documents are recommended to bring the control set into alignment with the requested validation settings, a message will display here.

Callout 2: View the number and proportion of responsive documents and non-responsive documents in the control set.

Callout 3: Click to view control set documents as search results in the Analytics Dashboard.

Callout 4: View the total number of documents in the control set.

Callout 5: Convert the control set to a training round. This is usually done when the control set is no longer desirable or when the intent is to draw a brand-new control set. This way, the effort of manually reviewing the documents currently in the control set is turned into training for the model.

Callout 6: Download reports for a training round.

Callout 7: View the responsiveness probability value for a training round: a chart of all the documents in the classifier sorted by how likely they are to be responsive. Documents with a score of 0 are highly likely to be non-responsive; documents with a score of 1.00 are highly likely to be responsive. Note that the graph displays from 0 to 100, but document scores range from 0 to 1.00.

Callout 8: View the number and proportion of responsive and non-responsive documents in a training round.

Callout 9: View statistics for a training round.

CMML Classifier with a Connect Tag

You can create a CMML classifier with or without using a Connect Tag. If you do not use a Connect Tag, control set review and all coding decisions take place in Brainspace. When using a Connect Tag, however, you can pull tag values from Relativity manually as needed, or you can create a schedule to pull them automatically at regular intervals.

If adding a control set and using a Connect Tag in Brainspace, control-set review must take place in Relativity. The name of the flagged field in Relativity is different from the name of the Connect Tag in Brainspace. Brainspace automatically generates the tag name in Relativity according to the following convention: BD CMML [First Eight Characters of the Dataset Name in Brainspace][First Seven Characters of the Classifier Name in Brainspace] xxxxxxxx CtrlSet Pos, where xxxxxxxx is replaced with a randomly generated unique tag value.
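The naming convention can be sketched as a small function. Only the overall pattern comes from the convention above; the exact spacing between the segments and the format of the random value are assumptions, and the dataset and classifier names are hypothetical:

```python
import secrets

def relativity_ctrl_set_tag_name(dataset_name: str, classifier_name: str) -> str:
    """Sketch of the documented convention: 'BD CMML ' + first 8 characters of
    the dataset name + first 7 characters of the classifier name + a randomly
    generated unique value + ' CtrlSet Pos'."""
    unique = secrets.token_hex(4)  # assumption: an 8-character random value
    return f"BD CMML {dataset_name[:8]}{classifier_name[:7]} {unique} CtrlSet Pos"

# Hypothetical names, for illustration only.
print(relativity_ctrl_set_tag_name("Enron Investigation", "Privilege Review"))
```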

To create a classifier with a Connect Tag, see x-ref. For information on creating a field in a Relativity Workspace, visit the Relativity website for third-party product information.

CMML Validation Set

A validation set is a simple random sample drawn from a set of documents available for CMML classifier training that have not been tagged positive or negative in previous training rounds. A validation set is used to estimate the proportion of untagged documents that would be tagged positive if they were reviewed. This proportion is sometimes called “elusion” though in this case it is the elusion of the review to date, not the elusion of the latest predictive model.

Note

“Defensible” is a legal term, not a technical one. The elusion estimate is one of several pieces of technical information that could be relevant to an attorney arguing that the result of a particular TAR project is defensible.

You can create a validation set of any size. A validation set is tagged by the Brainspace user. To get elusion statistics, all documents in the validation set must be tagged positive or negative without skipping any documents. When the validation set has been completely tagged, Brainspace displays an estimate of elusion, which is the proportion of untagged documents that belong to the positive (responsive) class of documents. The estimate is in the form of a 95 percent binomial confidence interval on elusion. The upper bound of this confidence interval is typically the value of interest, since it means we have 95 percent confidence that the true elusion is no higher than this bound.

For example, suppose you have 103,000 documents, and you have tagged 2,000 of them. There are now 101,000 untagged documents. You draw a validation set of 1,000 documents and tag them, finding that two of them are positive. The 95 percent binomial confidence interval on elusion is [0.0002, 0.0072], so you have 95 percent confidence that 0.0072 (0.72 percent) or fewer of the untagged documents are positive.

You can also translate that to actual document counts. Because you tagged the validation set, you now have 100,000 untagged documents. You can multiply the elusion confidence interval by that number to get a 95 percent confidence interval on number of untagged positive documents: [20, 720]. In other words, you are 95 percent confident that there are now between 20 and 720 untagged positive documents, and, in particular, you are 95 percent confident there are 720 or fewer positive untagged documents.
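The arithmetic in this example can be reproduced with a short script. The sketch below computes an exact (Clopper-Pearson) binomial confidence interval by bisection; the guide does not specify which binomial interval Brainspace uses, so treat the method as an assumption even though it reproduces the [0.0002, 0.0072] interval above:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binom_confint(k: int, n: int, conf: float = 0.95):
    """Exact (Clopper-Pearson) two-sided confidence interval on a proportion,
    found by bisection. A standard construction, assumed here."""
    alpha = 1.0 - conf

    def solve(p_is_too_small):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisect until the bracket is negligible
            mid = (lo + hi) / 2
            if p_is_too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) >= alpha / 2)
    return lower, upper

# Worked example from the text: 2 positives in a 1,000-document validation set.
low, high = binom_confint(2, 1000)
print(f"elusion interval: [{low:.4f}, {high:.4f}]")  # about [0.0002, 0.0072]
untagged = 101_000 - 1_000  # untagged documents after reviewing the validation set
print(f"untagged positives: {low * untagged:.0f} to {high * untagged:.0f}")
```

Multiplying the unrounded bounds by 100,000 gives approximately 24 and 722 documents; the [20, 720] counts in the text come from multiplying the interval after rounding it to [0.0002, 0.0072].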

Frequently the validation set is used as a final round to provide this estimate of elusion, but if more training is desired, you can convert the validation set to a training round and continue training the model with more rounds. This doesn’t preclude you from creating another validation set later.

Continuous Multimodal Learning (CMML)

The CMML workflow can be carried out entirely in Brainspace, and integrates supervised learning with Brainspace’s tagging system. Predictive models can be trained simultaneously for as many binary classifications as desired. Training can be done using batches as in Predictive Coding or in a more flexible fashion by tagging documents anywhere they are viewed. Predictive models can be used to rank documents within Brainspace, and top-ranked documents can be selected for training. All of the training data selection methods provided in the Predictive Coding workflow may also be used. Predictive scores can be exported to third-party review platforms. If desired, a random sample may be drawn from unreviewed documents after a CMML review to estimate the fraction of target documents in the unreviewed material. The CMML approach supports workflows referred to as CAL (TM), TAR 2.0, and other e-discovery buzzwords, but CMML goes beyond them in ease of use, effectiveness, and the ability to leverage Brainspace’s wide range of analytics to find training data.

CMML with Adaptive Control Sets

Using an Adaptive Control Set with a CMML classifier serves the same purpose as a control set with a Predictive Coding classifier (see x-ref Connectors). A control set is a random sample of your dataset. After the random sample has been fully reviewed, it can be used to estimate richness, recall, precision, and depth for recall (DFR). The quality of these estimates depends on the size of the control set.

Note

  • Using an Adaptive Control Set with a CMML classifier is optional and can be used with a CMML classifier at any point during training.

  • Datasets must use a Relativity Plus connector to use the Adaptive Control Set feature, whether review takes place in Relativity or in Brainspace only.

CMML with Automode

When used with a CMML classifier or control set, Automode automatically batches documents in training rounds for review in a Relativity Workspace. When a training round review is complete and the training round is pulled back into the CMML classifier or control set, Automode automatically batches another training round for review in Relativity. This process continues until Automode is disabled or all of the documents in the classifier or control set have been reviewed. Automode is disabled by default, so Brainspace users must batch documents manually to create new training rounds until Automode is enabled.

Note

The Automode feature is only available when using CMML in Brainspace with a Relativity Plus connector and a Connect Tag.

Control Set Statistics

These statistics compare the scores assigned to documents by predictive models with specified cutoffs. Two types of statistics are available:

  • Statistics on the predictive model from the most recent training round can be seen to the right of the Recall / Review graph. The cutoff may be chosen by adjusting the slider on that graph. The vertical line corresponds to the current recall goal.

  • Statistics on predictive models from previous training rounds can be seen on the round cards. The cutoff corresponds to the recall goal at the time of that training round.

Control set statistics are based on omitting error documents and including only one document from each duplicate group. The duplicate groups are based on exact duplicates if the classifier uses text only and strict duplicates if the classifier uses text and metadata.

Recall and Review

To understand and adjust the score cutoff for auto-coding, move the slider on the Recall/Review graph to vary the score cutoff. Recall, review %, and other metrics are also included with the graph.

To understand progress at training the predictive model:

  • The Depth for Recall graph shows how training is reducing the review effort to hit the recall goal.

  • The round cards show metrics for each training round based on the recall goal:

    • Effectiveness: Estimated recall, precision, F1, review %, and docs to review—all based on the cutoff that hits the recall goal.

    • Consistency: How well the predictive model agreed with the training data.

CMML Training and Review

Documents discovered using any Brainspace Analytics feature can be tagged at any time. If a positive or negative value is assigned for a classifier tag, the document becomes an ad hoc document in the current open training round for that classifier. It will be added to the training set the next time a training round is closed for that classifier. Documents can be removed from the training set at any time either by untagging them or by tagging them with a value that is neither positive nor negative (e.g., a tag named “Skipped” to identify a document that is deliberately tagged as neither positive nor negative).

Each training round can also include a batch using any of the available training round types. Some, all, or none of the batch documents can be reviewed and tagged before a training round is closed. Batches are selected from documents that have not been tagged.

CMML Prediction Report

The Prediction Report contains one row for each document and provides the following details about the scoring of CMML classifier documents by training round.

Each row in the Prediction Report contains the following fields:

key

The unique internal ID of the document.

responsive

This is “yes” if the document has been assigned the positive tag for the classification, “no” if assigned the negative tag, and blank if the document is untagged or given a tag that is neither positive nor negative.

predicted_responsive

This is “yes” if chance_responsive is 0.50 or more and “no” otherwise. A cutoff of 0.50 used with the predictive model creates a classifier that attempts to minimize error rate. Note that 0.50 is typically not the cutoff you would use for other purposes.

matches

This is “yes” if responsive is non-null and the same as predicted_responsive. This is “no” if responsive is non-null and different from predicted_responsive. Blank if responsive is null (e.g., the document is not a training document).

score

This is the raw score the predictive model produces for the document. It can be an arbitrary positive or negative number. The main use of score is for making finer-grained distinctions among documents whose chance_responsive values are 0.0 or 1.0.

chance_responsive

For predictive models that have been extensively trained, particularly by many rounds of active learning, chance_responsive is an approximation of the probability that a document is a positive example. Values are between 0.0 and 1.0, inclusive. This value is produced by rescaling the score value.

uncertainty

The uncertainty value is a measure of how uncertain the predictive model is about the correct label for the document. It ranges from 0.0 (most uncertain) to 1.0 (least uncertain). The uncertainty value can be used to identify documents that might be good training documents. Uncertainty is one factor used by Fast Active and Diverse Active training batch selection methods. This value is produced by rescaling the chance_responsive value.

term1, score1, ..., term8, score8

The terms that make the largest magnitude contribution (positive or negative) to the score of the document, along with that contribution.
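Several of these fields are derivable from the others. The sketch below restates the predicted_responsive and matches logic directly from the field descriptions above; the two rescaling formulas are assumptions, since the guide says only that chance_responsive is produced by rescaling score and uncertainty by rescaling chance_responsive:

```python
import math
from typing import Optional

def chance_responsive(score: float) -> float:
    # Assumption: a logistic rescaling of the raw score into [0.0, 1.0].
    return 1.0 / (1.0 + math.exp(-score))

def uncertainty(chance: float) -> float:
    # Assumption: rescaled distance from the 0.5 decision point, so 0.0 is
    # most uncertain (chance near 0.5) and 1.0 is least uncertain.
    return 2.0 * abs(chance - 0.5)

def predicted_responsive(chance: float) -> str:
    # The documented 0.50 cutoff gives a minimum-error-rate classifier.
    return "yes" if chance >= 0.50 else "no"

def matches(responsive: Optional[str], chance: float) -> str:
    # Blank when responsive is null (e.g., not a training document).
    if not responsive:
        return ""
    return "yes" if responsive == predicted_responsive(chance) else "no"

chance = chance_responsive(1.6)            # a raw score of 1.6 rescales to about 0.83
print(predicted_responsive(chance))        # "yes": above the 0.50 cutoff
print(matches("no", chance))               # "no": the tag disagrees with the prediction
print(uncertainty(chance_responsive(0.0))) # 0.0: a zero score is maximally uncertain
```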

CMML Consistency Report

The Consistency Report includes rows from the Prediction Report that correspond to training examples for a CMML classifier. It is useful for understanding how the predictive model acts on the documents used to train it. There are two main purposes for the Consistency Report:

  • To understand which training examples contributed particular terms to the predictive model.

  • To find training examples that might have been mistagged (see x-ref; search the “matches” field for “no”).

CMML Round Report

The Round Report for training round K contains one row for each training round from 1 to K. The row for a training round summarizes information about the training set used on that round.

Note

  • Tags can have more than two values, but only one tag value is associated with the positive category for a classification, and one value is associated with the negative category for a classification. Documents with other tag values are not included in the training set.

  • Documents can be untagged and retagged using a different tag choice. For that reason, it is possible that some documents that were in a training set on an earlier round are no longer in the training set in a subsequent training round.

Each row in the CMML Round Report contains the following fields:

Training Rounds

The training round number in the series of classifier training rounds.

Number of Docs

The number of new documents added to the training set for the training round. It includes new documents from both the training batch, if any, and the ad hoc tagged documents, if any. This count does not include documents that were in the training batch on the previous round. The count includes documents with changed labels. The count is also not reduced if some document tags were removed from the previous round’s training batch or converted to non-classification tag values on the current round. For that reason, the Number of Docs count is never negative.

Net Manual Docs

This is the net number of new documents contributed by ad hoc tagging to the training set on the current round. It takes into account documents that have new positive or negative tag values, as well as documents that had a positive or negative tag value at the end of the previous training round but no longer have that value. For that reason, the Net Manual Docs value can be positive, negative, or zero.

Classification Model

This column specifies the method that was used to create the training batch, if a batch creation method was used on a training round. If only ad hoc documents were tagged in a training round, the value REVIEW is used.

Cumulative Coded

The total number of documents in the training set at the end of a training round.

Cumulative Positive

The total number of documents with positive tag values at the end of a training round.

Cumulative Negative

The total number of documents with a negative tag value at the end of a training round.

Round Positive

The net change in the number of positive training documents since the previous training round.

Round Negative

The net change in the number of negative training documents since the previous training round.

Stability

This is an experimental measure of how much recent training has changed the predictive model. While presented for informative purposes, Brainspace does not recommend its use for making process decisions.

Consistency

This is the proportion of training set examples where the manually assigned tag disagrees with the prediction of a minimum error rate classifier (see CMML Prediction Report).

Unreviewed Docs

Number of documents that are not tagged either positive or negative. Populated only for validation rounds.

Estimated Positive (lower bound)

Lower bound on a 95 percent binomial confidence interval on the proportion of positive documents among the documents not yet tagged as either positive or negative. Populated only for validation rounds.

Estimated Positive (upper bound)

Upper bound on a 95 percent binomial confidence interval on the proportion of positive documents among the documents not yet tagged as either positive or negative. Populated only for validation rounds.
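The Net Manual Docs bookkeeping described above can be illustrated with set arithmetic. This is a restatement of the field description using hypothetical document IDs, not Brainspace's implementation:

```python
def net_manual_docs(prev_tagged: set, curr_tagged: set) -> int:
    """Net new ad hoc training documents this round: documents newly holding a
    positive or negative tag, minus documents that lost such a tag since the
    end of the previous round."""
    return len(curr_tagged - prev_tagged) - len(prev_tagged - curr_tagged)

# Two documents newly tagged, three untagged since the previous round: net is -1.
print(net_manual_docs({"d1", "d2", "d3", "d4"}, {"d2", "d5", "d6"}))  # -1
```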

Task Topics

Create a CMML Classifier

You can create a CMML classifier using all of the documents or a subset of documents in a dataset. To create a CMML classifier using a subset of documents in a dataset, see Create a Focus. You can also create a CMML classifier before or after tagging documents.

The following procedure describes how to create a CMML classifier using all of the documents in a dataset with no existing tag choices and no pre-tagged documents.

To create a CMML classifier:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click the New Classifier button.

    image6.png
  3. In the dropdown list, click the CMML option.

    The New CMML Classifier dialog will open.

    1. Type a name for the CMML classifier.

    2. Click Create Tag, and then type a name for the tag.

      Note

      When you name the tag, Brainspace automatically creates a positive and a negative choice for the tag.

    3. Click the Save button.

    The Classifier screen will open.

  4. Click the New Training Round button.

    The Create a Training Round dialog will open.

    1. Click the Round Types dropdown menu, and then click Influential. Note: You must choose the Influential training round type for the first training round.

    2. In the Size of Training Round (Max 5000) text field, type the number of documents that you want to include in the first training round.

      Note

      As a general rule, 200 documents will provide good classifier training and will help when choosing documents to include in subsequent training rounds.

    3. Click the Continue button. The training round will begin.

    4. After the training round documents have been selected, click the Refresh button.

      The classifier Progress graph and Training pane will refresh to show the results of the first training round.

  5. Review the documents in the training round:

    1. Click the Review button.

A document will open in the Document Viewer.

    2. Click the [Tag Name]: Positive or [Tag Name]: Negative button for each document.

      Note

      You can click the X icon at any point during the review session to return to the Classifier screen before you finish reviewing all of the documents. If you review all documents in the training round, the Classifier screen will open automatically.

    3. Click the Refresh button.

  6. Click the Train Now button.

    When the Classifier screen refreshes, the first training round has completed, a predictive model has been built, and every document in the classifier has been scored and assigned a predictive rank. By default, the document list displays all of the untagged documents in the CMML classifier, with the highest scoring document on the top of the list and the lowest scoring document on the bottom of the list.

  7. Repeat steps 4, 5, and 6 to create all subsequent training rounds.

    Note

    For the second and all subsequent training rounds, you can choose any of the available training round types.

At any point during CMML classifier training, you can open the classifier in Analytics, add additional training documents, add tags, compare training rounds, or download training round reports. After you have finished training the CMML classifier, you can use it to create a validation set or a control set.

Create a CMML Classifier Using a Connect Tag

Before creating a CMML classifier using a Connect Tag, a field must be created in a Relativity Workspace, and a Connect Tag must be created in Brainspace.

The following procedure describes how to create a CMML classifier using all of the documents in a dataset with an existing Connect Tag choice and no pre-tagged documents.

To create a CMML classifier using a Connect Tag:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click the New Classifier button.

    image7.png
  3. In the dropdown list, click the CMML option.

    The New CMML Classifier dialog will open.

    1. Type a name for the CMML classifier.

    2. With the Assign Tag option selected, click Select a Positive Choice or Select a Negative Choice.

      The New CMML Classifier dialog will expand to display the Connect Tag choices.

    3. Click a Connect Tag connect_tags_icon.png in the list of existing tags.

      The New CMML Classifier dialog will refresh.

      Note

      You have the option of tagging documents before or after creating the first training round. For this procedure, we will tag documents after creating the first training round.

    4. Click the Save button.

      The Relativity login credentials dialog will open.

    5. Type your Relativity User Name and Relativity Password, and then click the Use these credentials button.

      After your Relativity credentials are verified, the Classifier screen will open.

  4. Click the Refresh button.

  5. Click the New Training Round button. The Create a Training Round dialog will open.

    1. Click the Round Types dropdown menu, and then click Influential. Note: You must choose the Influential training round type for the first training round.

    2. In the Size of Training Round (Max 5000) text field, type the number of documents that you want to include in the first training round.

      Note

      As a general rule, 200 documents will provide good classifier training and will help when choosing documents to include in subsequent training rounds.

    3. Click the Continue button. The training round will begin.

    4. After the training round documents have been selected, click the Refresh button.

      The classifier Progress graph and Training pane will refresh to show the results of the first training round.

      Note

      After creating a CMML classifier with a Connect Tag, the Connect Tag connect_tags_icon.png icon and Relativity field name will display in the classifier information field:

      image8.png

      In this example, the Relativity field name is Traveler Connect Gas. The information on the Classifier screen is identical to the information on the Classifier card (see x-ref).

  6. Using the field chosen for the Connect Tag, review the documents in Relativity.

    Note

    The Review button will be greyed-out in Brainspace when using Connect Tags from Relativity. Any tagging done in Relativity will overwrite the tagging in Brainspace, so the ability to review documents in Brainspace is disabled. Note that the definition of a Connect Tag implies that Relativity is the database of record.

  7. Click the Refresh button.

  8. Click the Train Now button.

    When the Classifier screen refreshes, the first training round has completed, a predictive model has been built, and every document in the classifier has been scored and assigned a predictive rank. By default, the document list displays all of the untagged documents in the CMML classifier, with the highest scoring document on the top of the list and the lowest scoring document on the bottom of the list.

  9. Repeat steps 4, 5, 6, 7, and 8 to create all subsequent training rounds.

    Note

    For the second and all subsequent training rounds, you can choose any of the available training round types.

After creating a Connect Tag, you can update it manually or create a schedule to update the Connect Tag automatically.

Schedule an Automatic Update to a Connect Tag

After creating a CMML classifier using a Connect Tag, you can update it manually or create a one-time or recurring schedule for future Connect Tag updates.

To schedule an automatic update to a Connect Tag:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. Click the Schedule Update icon:

    image9.png

    The Scheduler dialog will open.

  4. To pull the reviewed field from Relativity into the Connect Tag in Brainspace and update the predictive model, do one of the following:

    • To update the tag immediately, click the Update Now button.

    • To schedule a tag update for a specific date, click the Schedule Next Update text field, click a date in the calendar, and then click the Apply button.

    • To schedule a recurring update, toggle the Recurring switch to the On position, choose Daily, Twice Daily, or Weekly, and then select times and days as prompted.

  5. Click the Save button.

When the scheduled time arrives, Brainspace automatically pulls the Connect Tag from the Relativity or third-party database and rebuilds the predictive model. You can also export the scores from Brainspace to the Relativity or third-party database.

Pull a Connect Tag

After creating a CMML classifier using a Connect Tag, you can pull the Connect Tag into Brainspace from a Relativity Workspace manually as needed.

To pull a Connect Tag into Brainspace from a Relativity Workspace:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. Click the Pull Tag icon:

    image10.png

    Brainspace will begin pulling the Connect Tag.

    Note

    You will be prompted to enter your Relativity credentials if you have not saved your username and password for the Relativity Plus connector.

After pulling the Connect Tag into Brainspace, you can schedule automatic Connect Tag updates or continue training the CMML classifier.

Set the Automode Polling Interval for a CMML Training Round with a Connect Tag

At any time after creating a CMML classifier with a Connect Tag, you can set the Automode polling interval to control how frequently Brainspace requests review scores.

To set the Automode polling interval:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Training Round Settings icon:

    image11.png

    The Training Round Settings dialog will open.

  4. In the Polling Interval text fields, type a value for hours and a value for minutes.

  5. Click the Apply button.

    Note

    If prompted, enter your Relativity credentials.

After setting the polling interval, Automode will pull all reviewed documents from Relativity into the CMML classifier as scheduled.

Change the Automode Training Round Type

After creating a CMML classifier with a Connect Tag and running the first CMML classifier training round, you can change the current Automode training round type at any time.

Note

For the first training round, you must use the Influential training round type.

To change the Automode training round type:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Training Round Settings icon:

    image12.png

    The Training Round Settings dialog will open.

  4. Under Document Selection, click the Diverse Active Learning or Top Scores radio button.

  5. Click the Apply button.

    Note

    If prompted, enter your Relativity credentials.

After changing the training round type, Automode will pull all reviewed documents from Relativity into the CMML classifier as scheduled.

Create an Adaptive Control Set for an Existing CMML Classifier

After creating a CMML classifier or any time during the classifier training process, you can create a CMML control set to calculate precision and recall statistics.

Note

At any time during training, you can recycle the control set, view control set documents in Analytics, convert the control set to a training round, or modify the control set’s settings.

To create a control set for an existing CMML classifier:

  1. Click the Supervised Learning tab.

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. Click the Use Control Set for Recall and Precision Statistics button. The Create Control Set dialog will open.

  4. In the Create Control Set dialog, do the following:

    1. In the Cost per Document text field, change the monetary symbol if necessary, and then type a monetary value.

    2. Set the Minimum Recall.

    3. Set the Margin of Error.

    4. Set the Confidence Level.

    5. Set the Preliminary Richness.

    6. In the Control Set to Review text field, type the number of documents to include in the control set.

    7. Click the Submit button.

      The Classifier screen will refresh.

  5. Click the Tag Control Set to Set Statistics button. The Document Viewer will open.

  6. Tag all of the documents in the control set.

    After you have tagged the last document, the Document Viewer will close automatically, and the CMML classifier screen will refresh.

After creating the CMML control set, you can review the control set’s training progress graph, recall and review graph, depth for recall graph, and training statistics for each training round.

Modify a CMML Adaptive Control Set

After creating a CMML control set and performing the initial review of the precision and recall statistics, you can modify the control set’s settings at any time during training.

To modify a CMML control set:

  1. Click the Supervised Learning tab.

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. Click the Training Statistics tab.

  4. In the Control Set pane, click the Modify button.

    image13.png

    The Modify Control Set dialog will open.

  5. Do the following:

    1. Set the Minimum Recall.

    2. Set the Margin of Error.

    3. Set the Confidence Level.

    4. In the Control Set to Review text field, type the number of documents to include in the control set.

    5. Click the Submit button. The Classifier screen will refresh.

  6. Click the Continue Review button. The Document Viewer will open.

  7. Tag all of the new documents in the control set.

    After you tag the last document, the Document Viewer will close and the Classifier screen will refresh.

After modifying a control set, you can change the control set settings again at any point during training.

Convert a CMML Adaptive Control Set to a Training Round

After creating and using a control set to measure the health of a predictive model, you can convert the control set to a training round to retain reviews on the control set’s documents when the control set is no longer needed or when a new control set is required.

Note

Before converting a control set to a training round, close any active training rounds by disabling Automode, if enabled, and clicking the Train Now button.

Note

After recycling a control set, you will not be able to view training statistics until you create a new CMML control set and tag documents.

To convert a CMML control set to a training round:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. Click the Training Statistics tab.

  4. In the Control Set pane, click the Convert control set to training round button.

    image14.png

    A confirmation dialog will open.

  5. Click the Recycle button.

    The confirmation dialog will close.

  6. Click the Refresh button. The Classifier screen will refresh.

After converting a CMML control set to a training round, you can continue training the CMML classifier and create a new CMML control set.

View CMML Adaptive Control Set Documents in Analytics

After creating a CMML control set, you can view its documents in Analytics. To view CMML control set documents in Analytics:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. Click the Training Statistics tab.

  4. In the Control Set pane, click the View Control Set Documents in Analytics button.

    image15.png

    The Analytics Dashboard will open.

After opening the control set’s documents in Analytics, you can search and analyze the documents using any of the Analytics features in Brainspace (see Cluster Wheel, Communications, Conversations, Thread Analysis, Results pane, Document Viewer).

Pull Reviewed Document Tags into a CMML Adaptive Control Set

After documents have been tagged in a Relativity Workspace, you can pull the reviewed document tags into a CMML control set.

To pull reviewed document tags into a CMML control set:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. Log in to Relativity, if prompted, and then click the Get Review from RelativityPlus button.

    After Brainspace pulls the tags, a control set training round will initiate automatically.

After pulling document tags from Relativity into Brainspace, you can continue training the CMML control set, convert the CMML control set to a training round, or view control set documents in Analytics.

Enable Automode for an Existing CMML Classifier

Automode can be enabled when creating a CMML classifier or after creating a CMML classifier. The procedure below describes how to enable Automode for an existing classifier.

To enable Automode for an existing CMML classifier:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Training Round Settings Training_Round_Settings_Icon.png icon. The Training Round Settings dialog will open.

  4. Toggle the Enable automatic training switch to the On position.

  5. In the Round Size text field, type the number of documents to include in the training rounds.

  6. Select either the Diverse Active Learning or Top Scores radio button.

    Note

    For the first training round, you must choose the Influential training round type. After the first training round, your selection will apply to all subsequent training rounds.

  7. Click the Apply button.

    The Training Round Settings icon will change to indicate that Automode is enabled Automode_Enabled_icon.png, and the first automatic training round will initiate automatically.

After enabling Automode, new training rounds will initiate automatically after manually tagging documents in the CMML classifier. For subsequent training rounds, you can modify the Automode training round size or disable Automode at any time.

Disable Automode for a CMML Classifier

After enabling Automode Automode_Enabled_icon.png for a CMML classifier, you can disable the automatic training feature at any time to create subsequent training rounds manually.

To disable automatic training for an existing CMML classifier:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Training Round Settings icon:

    image16.png

    The Training Round Settings dialog will open.

  4. Toggle the Enable automatic training switch to the Off position.

  5. Click the Apply button.

    The Classifier screen will refresh, and the Training Round Settings icon will change to indicate that Automode is disabled Training_Round_Settings_Icon.png.

After disabling Automode, you must create subsequent training rounds manually.

Modify Automode Round Size

After enabling Automode Automode_Enabled_icon.png and setting the round size for automatic training rounds, you can increase or decrease the number of documents in subsequent automatic training rounds at any time.

To increase or decrease the number of documents in automatic training rounds:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Training Round Settings icon:

    image11.png

    The Training Round Settings dialog will open.

  4. In the Round Size text field, type a new number to increase or decrease the number of documents in automatic training rounds.

  5. Click the Apply button.

    The Training Round Settings dialog will close, and the Classifier screen will refresh.

The next training round will include the revised number of documents.

View the Progress Graph for a CMML Classifier

After creating a CMML classifier and running the first training round, the Progress graph will show the total number of positive documents (y-axis) relative to the total number of coded documents (x-axis).

To view a CMML classifier Progress graph:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

You can also view all of the CMML classifier’s coded documents, all positive documents, or all negative documents in Analytics.

If a control set has been created and reviewed for the classifier, you can review the control set’s training progress graph, recall and review graph, depth for recall graph, and training statistics for each training round.

View All Coded CMML Classifier Documents in Analytics

After creating a CMML classifier and running the first training round, the Progress graph will display the total number of positive documents (y-axis) relative to the total number of coded documents (x-axis).

To view a CMML classifier’s coded documents in Analytics:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Coded button:

    image17.png

    The Analytics Dashboard will open.

After viewing a CMML classifier’s coded documents in Analytics, you can also view only positive documents or only negative documents in Analytics.

View All Positive CMML Classifier Documents in Analytics

After creating a CMML classifier and running the first training round, the Progress graph will display the total number of positive documents (y-axis) relative to the total number of coded documents (x-axis).

To view a classifier’s positive documents in Analytics:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Positive Docs button:

    image18.png

    The Analytics Dashboard will open.

After viewing a CMML classifier’s positive documents in Analytics, you can also view all coded documents or only negative documents in Analytics.

View All Negative CMML Classifier Documents in Analytics

After creating a CMML classifier and running the first training round, the Progress graph will display the total number of positive documents (y-axis) relative to the total number of coded documents (x-axis).

To view a CMML classifier’s negative documents in Analytics:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Negative Docs button:

    image19.png

    The Analytics Dashboard will open.

After viewing a CMML classifier’s negative documents in Analytics, you can also view all coded documents or only positive documents in Analytics.

View the Recall/Review Graph for a CMML Control Set

After creating a CMML classifier, creating and reviewing a control set, and running the first training round, the Progress graph will display the total number of positive documents (y-axis) relative to the total number of coded documents (x-axis).

To view a CMML control set Recall/Review graph:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Recall/Review tab.

    The Recall/Review graph will display the recall percentage relative to the review percentage.

After viewing the Recall/Review graph and associated statistics, you can update cost and currency information and analyze the total cost to review the control set.
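The relationship the Recall/Review graph plots can be illustrated with a small sketch. Everything here is hypothetical (Brainspace's internal computation is not published): a control set is modeled as a list of reviewed tags ordered by descending classifier score, and recall is the fraction of all positives found within the reviewed portion.

```python
def recall_at_review(ranked_tags, review_pct):
    """Recall achieved after reviewing the top review_pct of ranked docs.

    ranked_tags: list of bools (True = positive), ordered by descending
    classifier score. review_pct: fraction of the set reviewed, 0..1.
    """
    total_pos = sum(ranked_tags)
    if total_pos == 0:
        return 0.0
    depth = round(len(ranked_tags) * review_pct)
    found = sum(ranked_tags[:depth])
    return found / total_pos

# Example: 10 control-set docs ranked by score; positives cluster near the top.
tags = [True, True, False, True, False, False, True, False, False, False]
print(recall_at_review(tags, 0.4))  # reviewing the top 40% finds 3 of 4 positives -> 0.75
```

A well-trained classifier pushes positives toward the top of the ranking, so the curve rises steeply at low review percentages.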

Update Cost and Currency for a CMML Control Set

After creating a CMML control set, you can update the currency type and cost per document value at any time to analyze the cost to review a control set.

To update cost and currency for a CMML control set:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Recall/Review tab.

  4. Click the Update Cost and Currency icon:

    image20.png

    The Update [classifier name] dialog will open.

  5. Select a new currency symbol from the dropdown list, if required, and then type a new monetary value in the Cost per Document text field.

  6. Click the Save Changes button.

    The Recall/Review tab will refresh with new values in the Total Cost to Review field.

After updating the cost and currency for a CMML control set, you can adjust the number of documents in a control set by converting the control set to a training round and then creating a new CMML classifier, and you can analyze the cost to review documents at different intervals on the Recall/Review graph.

Analyze a CMML Control Set’s Total Cost to Review

After running a training round, you can analyze the total cost to review documents in a CMML classifier at different points in the review process by changing the % Review value on the x-axis of the Recall/Review graph.

To analyze a CMML control set’s total cost to review:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Recall/Review tab.

  4. Click the Plus or Minus icon:

    image21.png

    The vertical % Review bar will move on the x-axis.

After moving the vertical % Review bar, the Total Cost to Review, Documents to Review, Documents to Review with Families, and Review fields will refresh.
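The arithmetic behind the Total Cost to Review field can be sketched as follows. This is a minimal illustration of the calculation, with hypothetical names and numbers, assuming cost scales linearly with the number of documents at the chosen % Review point:

```python
def review_cost(total_docs, review_pct, cost_per_doc):
    """Documents to review and total cost at the chosen review fraction."""
    docs_to_review = round(total_docs * review_pct)
    return docs_to_review, docs_to_review * cost_per_doc

# Reviewing 30% of a 50,000-document set at $1.50 per document.
docs, cost = review_cost(total_docs=50_000, review_pct=0.30, cost_per_doc=1.50)
print(docs, cost)  # 15000 documents, 22500.0
```

Moving the % Review bar on the graph changes `review_pct`, which is why the cost fields refresh as the bar moves.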

View the Depth for Recall Graph for a CMML Control Set

After running a training round, you can view the depth for recall percentage for each training round displayed in the Depth for Recall graph.

To view the Depth for Recall percentage for a training round:

  1. Click the Supervised Learning tab.

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Depth for Recall tab.

  4. Hover over a node in the graph:

    image22.png

The percent of documents value will display for the selected round.
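The depth-for-recall quantity can be sketched as the smallest fraction of the ranked documents that must be reviewed to reach a target recall. This is an illustrative reconstruction with hypothetical data, not Brainspace's actual computation:

```python
def depth_for_recall(ranked_tags, target_recall):
    """Smallest fraction of ranked docs to review to reach target_recall.

    ranked_tags: bools (True = positive), ordered by descending classifier
    score for one training round. Returns a fraction in (0, 1], or None
    if the target recall cannot be reached.
    """
    total_pos = sum(ranked_tags)
    needed = target_recall * total_pos
    found = 0
    for depth, tag in enumerate(ranked_tags, start=1):
        found += tag
        if found >= needed:
            return depth / len(ranked_tags)
    return None

tags = [True, True, False, True, False, False, True, False, False, False]
print(depth_for_recall(tags, 0.75))  # 3 of 4 positives appear by rank 4 -> 0.4
```

As training improves the ranking, this depth shrinks from round to round, which is the trend the Depth for Recall graph visualizes.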

View Control Set Information and Training Statistics

After running a training round, you can view and manage control set information and training statistics for each training round.

To view control set information and training statistics:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open, and the Progress graph will display by default.

  3. Click the Training Statistics tab.

    The Control Set pane and Round # panes will display.

In the Control Set pane, you can modify the control set settings, view control set documents in Analytics, or convert the control set to a training round. In the Round # pane, you can review training statistics and download CMML reports for each training round in the control set.

Create a CMML Classifier from a Portable Model

After creating a portable model, you can use it to create a new CMML classifier. To create a CMML classifier using a portable model:

  1. Click the Supervised Learning tab.

    The Supervised Learning screen will open.

  2. Click the New Classifier button.

    image23.png
  3. In the dropdown list, click the CMML option.

    The New CMML Classifier dialog will open.

    1. Click Import portable model...:

      image24.png
    2. Do one of the following:

      • To upload a portable model that is not in Brainspace, click the Upload Portable Model button, and then navigate to the *.csv file in your directory.

      • To use an existing portable model in Brainspace, click Choose an Existing Portable Model, and then click a portable model in the pop-up dialog. Only portable models that have been provided to your group from the Brainspace Portable Model library will be available. The Portable Model library is managed in the Administration panel.

Create a Top Scores Training Round for an Existing CMML Classifier

Any time after creating a CMML classifier and running the first training round, you can choose the Top Scores training round type to train the classifier, even if using Automode.

Note

When creating a CMML classifier, you must use either a manual round (picking and tagging documents), an ad hoc round, or the Influential training round type for the first training round if you are not using a portable model to create the CMML classifier.

To enable Automode to select Top Scores training rounds for an existing CMML classifier:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Training Round Settings icon:

    image25.png

    The Training Round Settings dialog will open.

  4. Toggle the Enable Automatic Training switch to the On position.

  5. Click the Top Scores radio button.

  6. Click the Apply button.

The Training pane will refresh to display the Top Scores training round.

Choose a Top Scores Training Round for an Existing CMML Classifier

To choose a Top Scores training round for an existing CMML classifier:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. If no training round is currently active, click the New Training Round button.

    image26.png

    The Create a Training Round dialog will open.

  4. Select Top Scores from the dropdown menu.

  5. Provide the number of documents to be chosen, and then click the Continue button.

    The Training pane will refresh to display the Top Scores training round.

The number of documents requested will be chosen from the top-scoring documents for this classifier. Review will continue as normal.

After Brainspace runs the Top Scores training round, you can run additional training rounds using any of the available training round types, view training data and graphs, view Model Insights, create a portable model, and view training round documents in Analytics.

Create a CMML Classifier Using a Focus

It is possible to build a CMML classifier against a Focus so that all training and scoring occur only on the documents in the Focus. This allows for a more concentrated effort, especially if a large number of documents in the dataset are clearly not of interest.

For example, a dataset called English Wikipedia contains 4.3 million documents, but only 21,000 of those documents are related to the concept “Apollo.” Given that the rest of the documents in the dataset are unrelated to Apollo, there is no need to consider them for training or assign predictive ranks to them. When using a Focus with the Relativity integration, scores for only the 21,000 documents will be overlaid in Relativity each time the model is updated instead of the entire corpus of 4.3 million documents, a dramatic improvement in processing time, Relativity interaction and bandwidth.

After creating a public Focus, you can use it to create a CMML classifier.

Note

The public Focus must be successfully built before you can use it to create a CMML classifier.

To create a CMML classifier using a public Focus:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click the New Classifier button.

    image27.png
  3. In the dropdown list, click the CMML option.

    The New CMML Classifier dialog will open.

    1. Click Choose a Focus:

      image28.png

      The Select Focus dialog will open.

    2. Click a Focus in the list of public Focuses.

Continue with the classifier as normal. All training documents will be drawn only from the documents in the focus, and only the documents in the focus will be assigned scores.

Create a Validation Set

After evaluating a CMML classifier, you can create a validation set to estimate the proportion of untagged documents that would be tagged positive if they were reviewed.

To create a validation set:

  1. Click the Supervised Learning tab. The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Create Validation Set button:

    image29.png

    The Create a Validation Set dialog will open.

  4. In the text field, type the number of documents to include in the validation set, and then click the Continue button.

    The Classifier screen will refresh, and the validation build will begin.

    Note

    A validation set can include an unlimited number of documents.

After creating the validation set, you will review the validation set in the same way you would review a training round. After completing the review, you will receive statistics. You can also convert the validation set to a training round to continue training the classifier.
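The estimate a validation set produces can be sketched as a simple sample proportion with a confidence interval. This is an illustrative statistical sketch, not Brainspace's published method; the sample counts are hypothetical:

```python
import math

def estimate_positive_rate(sample_pos, sample_size, z=1.96):
    """Estimate the positive rate among untagged documents from a reviewed
    validation sample, with a normal-approximation 95% confidence interval."""
    p = sample_pos / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# 12 positives found in a 400-document validation sample.
p, lo, hi = estimate_positive_rate(sample_pos=12, sample_size=400)
print(f"{p:.3f} ({lo:.3f}-{hi:.3f})")
```

A larger validation set narrows the interval, which is the usual reason to size it generously when few positives are expected.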

End CMML Classifier Training

Determining when to end training a CMML classifier is not an obvious decision. Setting aside questions of cost and time and simply looking at the model itself to decide when to end training a CMML classifier, the following are a few criteria or steps to consider:

  • The purpose has been served. Sometimes, it is useful just to know if certain documents exist. Finding them completes this task, and there is no further reason to invest more training. Other times, you just want to find enough documents to tell the story of what happened, and having found enough, the task is done.

  • Exhaust ways of finding good documents. Do not stop training until you’ve used up your good ideas. If you still have good ideas about how to find new kinds of training examples manually (conceptual search, metadata search, looking at particular custodians, etc.) then use them to create manual training batches. If active learning is not coming up with many responsive documents, consider training on top-ranked documents using the Top Scores training round.

  • Confirm declining precision of top-ranked unreviewed documents. If reviewing the top-ranked documents is producing few to no responsive documents, then the model may be reaching the end of its ability to suggest new, likely responsive documents. Using a round or two of Diverse Active (if none have been used previously) may help to fill in gaps or hedge against human bias.

  • Confirm declining score of top-ranked unreviewed documents. As the score of the remaining top-ranked documents drops, the likelihood that they are responsive diminishes.

  • Build a focus on top-ranked documents (with a cutoff chosen to be below the vast majority of positive documents). Use the Cluster Wheel to look for clusters that appear to contain new types of information on the topic of interest. Tagging both positive and negative documents from those clusters may feed useful information into the model.

  • Create a validation set. If the validation set suggests there are few remaining responsive documents to be found, then training may be complete.

  • Create a control set. Keep in mind that control sets can be comparatively expensive, but if a stronger measure of the model is needed, then they are a powerful tool to deploy. With the provided measurements of recall, precision and other metrics, it can be easier to see if the model is progressing over rounds, or appears to be approaching completion.
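One way to monitor the declining-precision criterion above is to track the precision of each successive batch of top-ranked documents sent to review. A minimal sketch, with illustrative names and counts:

```python
def batch_precision(batches):
    """Precision of each reviewed batch of top-ranked documents.

    batches: list of (positives_found, batch_size) per round, oldest first.
    A sustained decline toward zero suggests the model has little left to find.
    """
    return [pos / size for pos, size in batches]

# Five successive 50-document Top Scores batches, oldest first.
rounds = [(45, 50), (38, 50), (21, 50), (6, 50), (2, 50)]
print(batch_precision(rounds))  # [0.9, 0.76, 0.42, 0.12, 0.04]
```

Read alongside the declining-score criterion, a curve like this is evidence (not proof) that further rounds will surface few new responsive documents.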

Create a CMML Training Round Using a Public Notebook

After creating a CMML classifier, you can use untagged documents in a public notebook to create a training round for a CMML classifier.

Note

A public notebook can only be used once for classifier training. If documents in a notebook have been tagged in a previous training round, the round in the Training pane will display “closed with no documents coded” if you attempt to use the notebook after its documents have been tagged in training.

To create a training round using a public notebook:

  1. Click the Supervised Learning tab.

    image30.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the New Training Round button.

    The Create Training Round dialog will open with the Notebook round type selected by default.

  4. Click the Notebook Name field. A list of public notebooks will open.

  5. Click a notebook in the list, and then click the Continue button.

    The Create Training Round dialog will close, and the notebook documents will be loaded into the CMML classifier.

  6. In the Training pane, click the Train Now button. The training round will begin.

After the training round completes, the Training pane will refresh to show the notebook training round and associated statistics. At any point during or after CMML classifier training, you can open the classifier in Analytics, add additional training documents, add tags, compare training rounds, or download training round reports. After you have finished training the CMML classifier, you can use it to create a validation set or a control set.

Create a Random Training Round for an Existing CMML Classifier

After creating a CMML classifier, you can use a Random training round to train the classifier. When you use Random training, Brainspace chooses a random sample of documents in the dataset that have not been used in previous training rounds.
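The selection described above can be sketched as a simple random sample drawn from the documents not yet used in training. All names here are hypothetical; this only illustrates the sampling idea, not Brainspace's implementation:

```python
import random

def random_round(all_doc_ids, previously_trained, size, seed=None):
    """Pick a random training round from docs not used in earlier rounds."""
    pool = [d for d in all_doc_ids if d not in previously_trained]
    rng = random.Random(seed)
    return rng.sample(pool, min(size, len(pool)))

# 100 documents, 3 already used in earlier rounds, 10 requested.
round_docs = random_round(range(1, 101), previously_trained={1, 2, 3}, size=10, seed=42)
print(len(round_docs))  # 10
```

Excluding prior training documents keeps each round's labels independent of earlier rounds.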

To create a Random training round:

  1. Click the Supervised Learning tab.

    image32.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the New Training Round button.

    The Create Training Round dialog will open.

  4. Click the Round Types dropdown menu, and then click Random.

  5. In the Size of Training Round (Max 5000) text field, type the number of documents to include in the training round.

  6. Click the Continue button.

    The Create Training Round dialog will close.

  7. In the Training pane, click the Train Now button.

    The training round will begin.

After the training round completes, the Training pane will refresh to show the Random training round and associated statistics. At any point during CMML classifier training, you can open the classifier in Analytics, add additional training documents, add tags, compare training rounds, or download training round reports. After you have finished training the CMML classifier, you can use it to create a validation set or a control set.

Create a Fast Active Training Round for an Existing CMML Classifier

After creating a CMML classifier, you can use a Fast Active training round to accelerate training for large batches of documents. When you use Fast Active training, Brainspace favors documents that appear in clusters distant from each other and from those of previous training documents, documents that are similar to many other dataset documents, and documents that have a predictive score near 0.5 under the current predictive model.
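The score-near-0.5 preference described above is a form of uncertainty sampling. Brainspace's exact selection logic is not published, so the following is only a sketch of that one component, with hypothetical document ids and scores:

```python
def uncertainty_rank(scores):
    """Rank untagged docs by closeness of predictive score to 0.5.

    scores: {doc_id: predictive score in [0, 1]} for untagged documents.
    Returns doc ids, most uncertain (closest to 0.5) first.
    """
    return sorted(scores, key=lambda d: abs(scores[d] - 0.5))

scores = {"a": 0.97, "b": 0.51, "c": 0.08, "d": 0.46}
print(uncertainty_rank(scores)[:2])  # ['b', 'd'] sit closest to 0.5
```

Documents near 0.5 are the ones the current model is least sure about, so labeling them tends to move the decision boundary the most.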

To create a Fast Active training round:

  1. Click the Supervised Learning tab.

    image33.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the New Training Round button.

    The Create Training Round dialog will open.

  4. Click the Round Types dropdown menu, and then click Fast Active.

  5. In the Size of Training Round (Max 5000) text field, type the number of documents to include in the training round.

  6. Click the Continue button.

    The Create Training Round dialog will close.

  7. In the Training pane, click the Train Now button.

    The training round will begin.

After the training round completes, the Training pane will refresh to show the Fast Active training round and associated statistics. At any point during CMML classifier training, you can open the classifier in Analytics, add additional training documents, add tags, compare training rounds, or download training round reports. After you have finished training the CMML classifier, you can use it to create a validation set or a control set.

Create a Diverse Active Training Round for an Existing CMML Classifier

After creating a CMML classifier, you can use a Diverse Active training round to train the classifier. When you use Diverse Active training, Brainspace favors documents that are different from each other and from previous training documents, documents that are similar to many other dataset documents, and documents that have a score near 0.5 under the current predictive model.
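The "different from each other" preference is a diversity criterion. As a rough illustration only (Brainspace's actual algorithm and document representation are not published), a max-min greedy heuristic over some document distance captures the idea:

```python
def diverse_pick(candidates, k, distance):
    """Greedily pick k candidates, each maximizing its minimum distance
    to the picks so far (a max-min diversity heuristic)."""
    picks = [candidates[0]]
    while len(picks) < k:
        best = max(
            (c for c in candidates if c not in picks),
            key=lambda c: min(distance(c, p) for p in picks),
        )
        picks.append(best)
    return picks

# 1-D toy "documents"; distance is absolute difference.
docs = [0.0, 0.1, 0.5, 0.9, 1.0]
print(diverse_pick(docs, 3, lambda a, b: abs(a - b)))  # [0.0, 1.0, 0.5]
```

Spreading picks apart reduces redundant labels: two near-duplicate documents teach the model little more than one of them would.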

Note

When Automode is enabled, you can choose Diverse Active to train the classifier automatically for every training round.

To create a Diverse Active training round:

  1. Click the Supervised Learning tab.

    image34.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the New Training Round button.

    The Create Training Round dialog will open.

  4. Click the Round Types dropdown menu, and then click Diverse Active.

  5. In the Size of Training Round (Max 2000) text field, type the number of documents to include in the training round.

  6. Click the Continue button.

    The Create Training Round dialog will close.

  7. In the Training pane, click the Train Now button.

    The training round will begin.

After the training round completes, the Training pane will refresh to show the Diverse Active training round and associated statistics. At any point during CMML classifier training, you can open the classifier in Analytics, add additional training documents, add tags, compare training rounds, or download training round reports. After you have finished training the CMML classifier, you can use it to create a validation set or a control set.

Create an Influential Training Round for an Existing CMML Classifier

After creating a CMML classifier using all of the documents or a subset of documents in a dataset, you can use an Influential training round to train the classifier. When you use Influential training, Brainspace favors documents that are different from each other, different from previous training documents, and similar to many other dataset documents.

Note

For the first training round, if you want to automatically select documents to be used for training, you must use an Influential training round.

To create an Influential training round:

  1. Click the Supervised Learning tab.

    image35.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the New Training Round button.

    The Create Training Round dialog will open.

  4. Click the Round Types dropdown menu, and then click Influential.

  5. In the Size of Training Round (Max 5000) text field, type the number of documents to include in the training round.

  6. Click the Continue button.

    The Create Training Round dialog will close.

  7. In the Training pane, click the Train Now button.

    The training round will begin.

After the training round completes, the Training pane will refresh to show the Influential training round and associated statistics. At any point during CMML classifier training, you can open the classifier in Analytics, add additional training documents, add tags, compare training rounds, or download training round reports. After you have finished training the CMML classifier, you can use it to create a validation set or a control set.

Create an Ad-Hoc Training Round for an Existing CMML Classifier

After creating a CMML classifier using all of the documents or a subset of documents in a dataset, you can use an Ad-Hoc training round to train the classifier. When you use Ad-Hoc training, you select documents manually to create the training round.

To create an Ad-Hoc training round:

  1. Click the Supervised Learning tab.

    image32.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Train Now button.

    The classifier will rebuild the model with any documents that have been manually tagged.

After the training round completes, the Training pane will refresh to show the Ad-Hoc training round and associated statistics. At any point during CMML classifier training, you can open the classifier in Analytics, add additional training documents, add tags, compare training rounds, or download training round reports. After you have finished training the CMML classifier, you can use it to create a validation set or a control set.

Filter Classifiers

After creating multiple classifiers in Brainspace, you can filter classifier cards to find a specific classifier. To filter classifiers:

  1. Click the Supervised Learning tab.

    image32.png

    The Supervised Learning screen will open.

  2. Click the Filter Classifiers text field, and then type a classifier name. The Classifier cards will begin filtering as you type the classifier name.

After finding a specific classifier, you can click the classifier card to view training statistics or to continue classifier training.

Edit a Classifier’s Name

At any time after creating a CMML classifier, you can edit or completely change the classifier’s name. To edit a classifier’s name:

  1. Click the Supervised Learning tab.

    image36.png

    The Supervised Learning screen will open.

  2. In the classifier card, click the Edit Classifier icon:

    image37.png

    The Edit Classifier dialog will open.

  3. Edit the classifier’s name, and then click the Submit button.

After editing the classifier’s name, you can click the classifier card to view training statistics or to continue classifier training.

Delete a Classifier

At any time after creating a classifier, you can delete it from Brainspace.

Note

Deleting a classifier permanently removes all training statistics and training documents from Brainspace.

To delete a classifier:

  1. Click the Supervised Learning tab.

    image38.png

    The Supervised Learning screen will open.

  2. In the classifier card, click the Delete Classifier icon:

    image39.png

    The Delete Classifier dialog will open.

  3. Click the Delete button.

The Delete Classifier dialog will close, and the Supervised Learning screen will refresh.

View Classifier Documents in Analytics

At any time after creating a classifier and running the first training round, you can view the classifier’s documents in the Analytics Dashboard.

To view a classifier’s documents in the Analytics Dashboard:

  1. Click the Supervised Learning tab.

    image40.png

    The Supervised Learning screen will open.

  2. In the classifier card, click the View Analytics icon:

    image41.png

    The Analytics Dashboard will open.

After opening the classifier’s documents in the Analytics Dashboard, you can search and analyze the documents using any of the Analytics features in Brainspace.

Switch a Brainspace v6.2.x Dataset’s Relativity Connector to Relativity Plus

To use an Adaptive Control Set (ACS) or Automode with a CMML classifier, you must use a Relativity Plus connector. If you have an existing dataset that uses the legacy Relativity connector, you can switch the dataset’s connectors to Relativity Plus in Brainspace v6.3.x.

This process involves creating two dedicated Relativity Plus connectors in Brainspace v6.3.x: one for switching existing Brainspace v6.2.x datasets from the legacy Relativity connector to Relativity Plus, and a second for creating new Brainspace v6.3 datasets going forward that will use CMML classifiers with ACS or Automode.

Note

If you will be performing an incremental build on a dataset built with the legacy Relativity connector that now has the Relativity Plus connector, return to the Relativity Plus connector dialog and deselect the following overlay fields:

  • brs_near_dup_status

  • brs_exact_dup_status

  • brs_strict_dup_status

Note

If working with a dataset that was originally built with the legacy Relativity connector that now has the Relativity Plus connector, you will not be able to perform an incremental build until a full build is completed.

To switch an existing Brainspace v6.2.x dataset from Relativity (legacy) to a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    image42.png
  2. In the Connectors screen, create a new Relativity Plus connector with the settings appropriate to your environment for Relativity Base Document URL, API Host Port Number, API Host Machine Name, Authorization Endpoint URL, Token Endpoint URL, Client ID, Client Secret, and Concurrency, and then, in the Advanced Settings dialog, enable the Use Legacy Field Names option.

  3. In the Datasets screen, locate the dataset with the legacy Relativity connector, and then click the Settings icon:

    image43.png

    The Dataset Settings dialog will open.

    Note

    If you do not have a saved field map for the dataset, toggle the switches to reconfigure the dataset in the Dataset Settings dialog, navigate to the Field Mapping dialog, map the fields, save the field map, and then return to the Dataset Settings dialog.

  4. In the Data pane, click the Delete icon:

    image44.png

    The Remove Data Source from [Dataset Name] dialog will open.

  5. Click the Yes, Remove button. The Dataset Settings dialog will refresh.

  6. Click the Choose Connector button:

    image45.png

    The Choose Connector dialog will open.

  7. In the Choose Connector dialog, locate the Relativity Plus connector that you created.

  8. Click the new Relativity Plus connector in the list.

    Note

    If you are not logged in to Relativity, you will be prompted to enter your Relativity username and password before you will be able to add the connector. If you are already logged in to Relativity, the Select a Source dialog will open.

  9. In the Select a Source dialog, click the appropriate Relativity Workspace folder, and then click the Save and Proceed button.

    The Relativity Saved Search dialog will open.

  10. If the dataset was created using all documents in the Relativity Workspace, click the Proceed button.

    Note

    If the dataset was created using a subset of documents, click the appropriate Saved Search in the list before clicking the Proceed button. After clicking the Proceed button, the License Checks dialog will open and run a check on your license document limits.

  11. After Brainspace verifies the license document limits, click the Proceed button. The Field Mapping dialog will open.

  12. Unmap any of the default fields, if necessary, map any additional fields, and then click the Continue button.

    The Dataset Settings dialog will refresh.

  13. Verify the settings, click the Save button, and then click the Build button. The Dataset Build Options dialog will open.

  14. Choose a build option, and then click the Run This Build Type button. The Schedule Build dialog will open.

  15. Click the Build as soon as possible button.

    Note

    If you choose to build the dataset in the future, click the Schedule Build Time field, select a date and time, and then click the Save button. The Datasets page will refresh and show the build in progress in the Dataset Queue. While the build is in progress, you can click the View Status button to view the build steps in progress. For information on each step in the build process, see Dataset Build Steps. After the build completes successfully, the new dataset will move from the Dataset Queue to the list of active datasets in the Datasets page, and you are ready to create work products with it.

After changing the dataset’s connector to Relativity Plus, you are ready to search for documents and to train CMML classifiers using the dataset.

Portable Models
Concept Topics

Portable Model Overview

Brainspace’s Predictive Coding and Continuous Multimodal Learning (CMML) classifiers support generating a predictive model on a dataset after coding example documents. The predictive model can then be used by later classifiers (usually in different datasets) to score all of their documents for prioritization and classification.

A portable model is a *.csv (comma-separated values) file containing a simplified version of a CMML predictive model. Each line in the *.csv file is a feature description followed by a portable weight between -100 and 100.

You can create a portable model in three ways:

  • Exporting a predictive model from a classifier within an active dataset

  • Starting with an existing portable model and editing it

  • Manually creating a *.csv file in the proper format

When exporting an existing predictive model as a portable model, only the most influential features are retained and the coefficients are scaled and rounded between -100 and 100. An exported portable model can be either saved outside of Brainspace as a *.csv file or within Brainspace in the portable model library, managed in the Administration panel.

An existing portable model in *.csv file form can be edited, or curated, before importing it into a new dataset. Editing can drop features, change the weight of features, and add features. Features can be words or phrases, as well as metadata values described by feature descriptions.

Since a portable model is simply a *.csv file with a particular format, it is also possible to create a portable model file manually, without beginning from an existing exported portable model.

An imported portable model will be converted to a predictive model. The predictive model can then be used just as if it was trained on that dataset. It can also be updated with training data from the dataset to further improve it.

Portable Model File Formatting and Editing

A portable model is a *.csv (comma-separated values) file. The first row of the file is a header with the column headings “term” and “weight.” The remaining lines each contain a single pair consisting of a feature description in the first column and a portable weight in the second column. Below are examples of feature-weight pairs:

  • purple, +5

  • augmented intelligence, +7

  • échantillon aléatoire, +2

  • 机器学习, +3

  • "[""cc"",""fred jimes <jimes@foo.org>""]", +1

  • "[""emailclient"",""outlook""]", +2

  • "[""created-year-month-day-hour"",""2001120412""]", -3

The first four examples are textual features, while the last three are metadata features. As shown above, the feature description for a term (a textual feature) is just the term (word or phrase) itself. You can enter terms manually in a *.csv file or convert an existing list of keywords to *.csv format.

Metadata feature descriptions require more care. It is important both to know what derived features are present in your dataset of interest, and to observe the proper formatting of the JSON array used for metadata feature descriptions. When using metadata features in a portable model, we recommend starting with an exported predictive model from the dataset of interest, or one with the same fields and import configuration.

Portable weights must be integers between -100 and 100, including 0. There should be no decimal point, and scientific notation should not be used (so write 100, not 1E+2).
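To make the quoting rules concrete, the following Python sketch builds a small portable model file with the standard csv module, which produces the doubled-quote escaping shown in the metadata examples above. The helper name and sample rows are illustrative, not part of Brainspace:

```python
import csv
import io
import json

def write_portable_model(rows):
    """Return portable model CSV text from (feature, weight) pairs.

    Weights must be integers between -100 and 100; the csv module
    handles the double-quote escaping needed for metadata features.
    """
    for _, weight in rows:
        if not isinstance(weight, int) or not -100 <= weight <= 100:
            raise ValueError(f"portable weight out of range: {weight}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["term", "weight"])  # required header row
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    ("purple", 5),
    ("augmented intelligence", 7),
    # A metadata feature description is a JSON array of [field, value]:
    (json.dumps(["emailclient", "outlook"]), 2),
]
print(write_portable_model(rows))
```

Reading the file back with csv.reader reverses the escaping, so a curated model can be validated the same way before import.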

Portable Model Character Encoding

Portable model files use the UTF-8 character encoding. The usual 7-bit ASCII encoding is equivalent to UTF-8 for the characters typically used in English, so you can usually ignore character encoding issues when editing portable model files that contain only English.

For portable model files that contain non-ASCII UTF-8 characters, care needs to be taken to both import the portable model *.csv file using UTF-8 encoding and to export the *.csv file preserving the UTF-8 encoding. This may be challenging in some versions of Microsoft Excel, for instance. You may want to consider using a text editor or *.csv file editor with explicit UTF-8 support.
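For scripted edits, passing an explicit encoding on both sides of the round trip avoids these problems. A minimal Python sketch, in which the rescaling operation and file names are hypothetical:

```python
import csv

def rescale_weights(in_path, out_path, factor):
    """Copy a portable model CSV, scaling each weight and clamping
    the result to the allowed integer range of -100..100."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    header, body = rows[0], rows[1:]
    scaled = [
        [term, str(max(-100, min(100, round(int(weight) * factor))))]
        for term, weight in body
    ]
    # newline="" lets the csv module control line endings; utf-8 keeps
    # non-ASCII terms such as 机器学习 intact on the way back out.
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows([header] + scaled)
```

The same explicit `encoding="utf-8"` argument applies to any tool or script that touches the file; relying on a platform default encoding is what usually corrupts non-ASCII terms.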

New Portable Model

When creating a new portable model, provide a name, select a *.csv file to upload, and optionally select which user groups can access the portable model.

Task Topics

Create a Portable Model from a CMML Classifier

After creating a CMML classifier and running training rounds, you can convert any completed training round to a portable model.

To create a portable model from a CMML classifier training round:

  1. Click the Supervised Learning tab.

    image1.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Portable Model Actions icon:

    image2.png

    The Portable Model Actions dialog will open.

  4. To save the portable model to the Brainspace library, click the Save to Portable Models button.

    The Save Portable Model dialog will open.

  5. Type a name for the portable model.

  6. Toggle one or more Portable Model Groups switches to the On position.

  7. Click the Save button.

After creating the portable model, you can use it to create a new CMML classifier, or you can download the Portable Model as a *.csv file.

Download a Portable Model as a *.csv File

After running CMML classifier training rounds, you can download a *.csv file for any of the training rounds.

To download a Portable Model as a *.csv file:

  1. Click the Supervised Learning tab.

    image3.png

    The Supervised Learning screen will open.

  2. Click a Classifier card.

    The Classifier screen will open.

  3. In the Training pane, click the Portable Model Actions icon:

    image4.png

    The Portable Model Actions dialog will open.

  4. To save the portable model to the Brainspace library, click the Save to Portable Models button.

    The Save Portable Model dialog will open.

  5. Type a name for the portable model.

  6. Toggle one or more Portable Model Groups switches to the On position.

  7. Click the Download button.

Model Insights
Concept Topics

Model Insights Overview

Model Insights provides details about the terms and phrases that are influencing a classifier during training. You must create at least two training rounds for a classifier before using Model Insights.

Insights compares the features and weights of a selected training round to those of a prior training round or, optionally, to a portable model.

Note

Rank comparisons are based on the sort order of the features.

Insights Dialog

The Insights dialog includes the following features:

image1.png

Callout 1: Select a model from an earlier round to compare with the selected training round. In this example, we are comparing round 4 to round 5.

Callout 2: Toggle the switch to add portable models to the To dropdown menu (callout 1).

Callout 3: Filter the terms list for a specific term or text-string in the training round selected for comparison (callout 1).

Callout 4: Download the Insights comparison report for the rounds selected to compare (callout 1).

Callout 5: Click the filter buttons to view terms that were added (relatively more predictive of target category), terms that were removed (relatively less predictive of target category), terms that increased or decreased in rank, and terms that have not changed in rank.

Callout 6: To view impactful terms or text-strings, click the column header to sort the columns by increasing or decreasing values:

  • Prev. Rank and Prev. Wt: Impact of the term or text-string in the model selected in the To dropdown menu (callout 1)

  • Rank and Weight: Current impact of the term or text-string in the selected model

  • Rank Diff. and Wt Diff: Change in impact of the term or text-string between the two models

    Note

    Rank comparisons are based on the sort order in the portable model file.
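The round-over-round comparison behaves like a diff over two ordered feature lists. As a rough sketch of the idea (not Brainspace’s implementation), ranks can be derived from each list’s sort order and then compared:

```python
def insights_diff(prev_rows, curr_rows):
    """Compare two portable-model feature lists.

    Rank is taken from each list's order (1-based), mirroring the
    note that rank comparisons follow the file's sort order.
    Returns added terms, removed terms, and per-term rank changes.
    """
    prev = {term: i + 1 for i, (term, _) in enumerate(prev_rows)}
    curr = {term: i + 1 for i, (term, _) in enumerate(curr_rows)}
    added = sorted(set(curr) - set(prev))
    removed = sorted(set(prev) - set(curr))
    moved = {t: curr[t] - prev[t]
             for t in curr if t in prev and curr[t] != prev[t]}
    return added, removed, moved

prev_rows = [("alpha", 9), ("beta", 7), ("gamma", 2)]
curr_rows = [("beta", 8), ("alpha", 6), ("delta", 3)]
added, removed, moved = insights_diff(prev_rows, curr_rows)
print(added, removed, moved)  # ['delta'] ['gamma'] {'beta': -1, 'alpha': 1}
```

A negative rank change means the term moved up the list (became relatively more predictive), matching the Added/Removed/Rank filters described in callout 5.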

Task Topics

View Model Insights for Portable Models

After creating two or more portable models in Brainspace, you can compare rank and weight values between two classifiers. You can also use Model Insights to compare two training rounds in a single CMML classifier.

To view Model Insights information for classifiers:

  1. Create at least two portable models.

  2. Click the Supervised Learning tab.

    image2.png

    The Supervised Learning screen will open.

  3. Click the Insights icon:

    image3.png

    The Insights dialog will open.

  4. Click a choice in the Compare and To dropdown menus.

    The Insights dialog will refresh to display terms and associated ranks and weights for the Insights comparison, with the Added Terms selected by default in the Filters field.

You can compare results for one or more of the filters by clicking and unclicking additional Filter buttons. You can also search for terms and text-strings to refine filter results and download an Insights comparison report to record and preserve the comparison results.

View Model Insights for Training Rounds

After creating at least two CMML classifier or control set training rounds, you can compare training data for a specific range of training rounds.

To view Model Insights information:

  1. Create a classifier, and then run at least two training rounds.

  2. Click the Supervised Learning tab.

    image4.png

    The Supervised Learning screen will open.

  3. Click a Classifier card.

    The Classifier screen will open.

  4. After at least two training rounds complete, click the Show Insights button. The Insights dialog will open.

Terms with the plus (+) icon in the Change column have been added to the model. Terms with the minus (-) icon in the Change column have been removed from the model.

Compare Training Rounds

To compare the current training round with an earlier training round, navigate to the Insights dialog, click the To dropdown arrow, and then click a training round in the list:

image5.png

Compare a Training Round with a Portable Model

To compare the current training round with a Portable Model, navigate to the Insights dialog, and then toggle the Compare with Portable Models switch to the On (green) position:

image6.png

Download Model Insights Comparison

To download an Insights comparison to a *.csv file, navigate to the Insights dialog, and then click the Download Insights Comparison icon:

image7.png
Datasets and Connectors
Datasets
Concept Topics

Datasets Screen

After clicking the Administration option in the user menu, the Datasets management screen will open by default. The Datasets management screen includes the following features:

Datasets_screen_map.png

Callout 1: Manage datasets, users and groups, connectors, services, and portable models.

Callout 2: Search for a dataset name.

Callout 3: Add a new dataset to Brainspace.

Callout 4: Download a dataset management report.

Callout 5: View the dataset’s name, activity status, and identification number.

Callout 6: Manage dataset settings, download dataset reports, disable the dataset, manage tags, and open the dataset in the Analytics Dashboard.

Callout 7: View the dataset’s connector type, name, data source, number of documents in the dataset, dataset groups, the percentage of documents ingested incrementally, and build status.

Callout 8: View the datasets list by activity status.

Datasets Display Screen

When you log in to Brainspace, the Datasets display screen will open. The Datasets display screen includes the following features:

Datasets_display_screen.png

Callout 1: Click the Brainspace logo anywhere in Brainspace to open the Datasets display screen.

Callout 2: Type a dataset name in the text field to locate a specific dataset.

Callout 3: Click the Hide icon to view only unpinned datasets.

Callout 4: View the number of pinned datasets in Brainspace.

Callout 5: Click a pinned dataset card to open the dataset in Analytics.

Callout 6: View the number of unpinned datasets in Brainspace.

Callout 7: Click the Hide icon to view only pinned datasets.

Callout 8: Click the unpinned dataset card to open the dataset in Analytics.

Callout 9: Hide datasets with no new documents.

Callout 10: Click the Sort by... dropdown menu to sort pinned datasets by name, by the number of new documents, or by total document count. Click the Filter by... dropdown menu to filter datasets by dataset activity status.

Callout 11: View pinned datasets in the card view or list view.

Callout 12: Click the Sort by... dropdown menu to sort unpinned datasets by name, by the number of new documents, or by total document count. Click the Filter by... dropdown menu to filter datasets by dataset activity status.

Callout 13: View unpinned datasets in the card view or list view.

Unpinned Dataset Card

After you create a new dataset, a dataset card will be added to the unpinned Datasets pane. A dataset card includes the following features:

Unpinned_Dataset_Card.png

Callout 1: Identify a dataset by name.

Callout 2: View a dataset’s status.

Callout 3: View the total number of documents in a dataset.

Callout 4: View the number of documents that have been added to a dataset via incremental builds. This number is reset when a full build is completed.

Callout 5: Open a dataset in the Analytics Dashboard.

Callout 6: Move an unpinned dataset to the Pinned Datasets pane. After moving a dataset to the Pinned Datasets pane, the Unpin icon will display:

Pinned_Dataset.png

Dataset Settings Dialog

Group Admin and Super Admin

Dataset_settings_dialog.png

Callout 1: Edit the dataset’s name.

Callout 2: Add the dataset to or remove it from existing Brainspace groups.

Callout 3: View the Dataset Info dialog.

Callout 4: Delete the dataset. This permanently removes the dataset and all of its work product, such as saved searches, classifiers, and notebooks, from the system.

Callout 5: View the status of the data source connector associated with this dataset. Status may be empty for no assigned data source, “Prepared” for a fully configured data source with a completed field map and all associated questions answered, or “Incomplete” for a data source connector that has been chosen but not fully configured (for example, the field mapping has not been done yet).

Callout 6: Reconfigure the data source.

Callout 7: Remove the data source from the dataset.

Callout 8: View last build date and time, scheduled builds, total deployed documents, incremental documents, and dataset creation date and time.

Callout 9: Modify advanced configuration options.

Callout 10: Enable or disable automatic builds.

Callout 11: Enable or disable automatic overlays to the data source after every build.

Dataset Build Steps

Ingest

As documents are ingested, Brainspace handles the interface to third-party products and streams the data to batchtools in json-line format. The ingestion process takes the raw text as provided for all fields and produces the document archive directory.

stream-json-line

Intermediate files exist in the following working directory:

<buildFolder>/archive/working/archive001.gz.

At the end of the ingestion process, the archive directory contains raw text and all metadata transferred in *.gz files, and the <buildFolder>/archive/output/archive001.gz subdirectory is only populated at the end of successful ingestion.

Document ingestion errors will be captured in the following folder: <buildFolder>/importErrorLog.
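An archive of this general shape can be sketched with gzip and JSON lines: one JSON object per line, compressed. The record fields below are illustrative, not Brainspace’s actual schema:

```python
import gzip
import json

def stream_to_archive(records, path):
    """Write document records as gzipped JSON lines, one per line,
    mimicking the json-line archive format described above.
    (Field names in the records are illustrative.)"""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for rec in records:
            out.write(json.dumps(rec, ensure_ascii=False) + "\n")

def read_archive(path):
    """Read a gzipped JSON-lines archive back into a list of records."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        return [json.loads(line) for line in src]
```

Because each record is a self-contained line, an archive like this can be streamed and processed document by document without loading the whole file.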

Analysis

Analysis includes the following high-level steps:

  1. Create Filtered Archive

  2. Boilerplate

  3. Exact Duplicate Detection and Email Threading

  4. Processed Archive

  5. Near Duplicate Detection

  6. Archive to TDM (Term Document Matrix)

  7. Build TDM Output

  8. De Dup TDM

  9. TDM Build

  10. Clustering

  11. Build TDM Output and Clusters

  12. Build doc-index

  13. Graph Index

  14. Generate Reports

Create Filtered Archive

Create filtered archive includes one step—filter archive.

The filter archive step applies the schema, removes filter strings from the text, and removes Bates numbers.

Note

Text or fields not set to analyzed="true" do not go into the filtered archive.

filter archive

By default, filter archive removes soft hyphens and zero-width non-breaking spaces, removes HTML markup, and removes all email headers. Bates numbers may be removed from all text via configuration files at the command line interface (CLI). This step will decode Quoted-Printable encodings (https://en.wikipedia.org/wiki/Quoted-printable).

This step removes filter strings. By default, this is mostly partial HTML markup. Custom filter strings can be set in the Brainspace user interface.

Boilerplate

Boilerplate includes the following steps:

  1. boilerplate hashing

  2. boilerplate counting

boilerplate hashing

For speed and efficiency, all lines of the bodyText field are analyzed and assigned a mathematical hash value. Common hashes are considered as candidates for boilerplate.

boilerplate counting

Lines of text identified as boilerplate candidates are given a second pass to determine if the text matches all requirements of boilerplate.

Full Boilerplate

Full boilerplate reports the duration of the boilerplate hashing and counting steps.
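The two-pass hash-then-count approach can be sketched as follows; the document threshold and the use of Python’s built-in hash are assumptions for illustration:

```python
from collections import Counter

def boilerplate_candidates(documents, min_docs=3):
    """Two-pass boilerplate sketch: hash every line of body text,
    then flag lines whose hash recurs in at least min_docs documents.
    (The min_docs threshold is illustrative, not Brainspace's rule.)"""
    counts = Counter()
    for body in documents:
        # First pass: count each distinct line once per document.
        for line in {ln.strip() for ln in body.splitlines() if ln.strip()}:
            counts[hash(line)] += 1
    flagged = set()
    for body in documents:
        # Second pass: keep the actual text of sufficiently common lines.
        for line in {ln.strip() for ln in body.splitlines() if ln.strip()}:
            if counts[hash(line)] >= min_docs:
                flagged.add(line)
    return flagged
```

Hashing lines first keeps the initial pass cheap; only the small set of common hashes needs a second look at the underlying text.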

Exact Duplicate Detection and Email Threading

This step identifies exact duplicates and email threads.

The email threading step works from documents in the filtered archive. Conversation Index (if valid) will supersede email threading from the document text. If Conversation Index is not present or not valid (see Data Visualizations), email threading will attempt to construct threads based upon document content and overlap (see Shingling). If the dataset contains the parentid, attachment, and/or familyid field, the attachment’s parent-child relationships will be determined by those fields.

During this step, documents are shingled, and a determination is made about any one document being a superset of another document (containing all of the other document’s shingles).

Exact duplicate detection occurs here, utilizing the filtered archive. Subjects are normalized; for example, Re: and Fwd: prefixes are stripped.
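Subject normalization of this kind can be sketched with a small regular expression; the prefix list here is illustrative, and Brainspace’s actual rules may differ:

```python
import re

# Strip one or more leading reply/forward prefixes (Re:, Fw:, Fwd:).
_PREFIX = re.compile(r"^(?:(?:re|fwd?|fw)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    """Strip reply/forward prefixes so duplicate detection can
    match 'Re: Q3 numbers' with 'Q3 numbers'."""
    return _PREFIX.sub("", subject.strip())

print(normalize_subject("Re: Fwd: Q3 numbers"))  # Q3 numbers
```

Normalizing subjects before comparison lets a reply and its original message hash to the same value during exact duplicate detection.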

Processed Archive

Processed archive includes one step—create processed archive.

The create processed archive step uses the outputs from the boilerplate step to remove boilerplate from the filtered archive. The system constructs internal data to track what boilerplate was removed from each document. Words and phrases are counted and truncated/stemmed. If enabled, entity extraction occurs in this step. The processed documents go into the processed archive.

Near Duplicate Detection

Near duplicate detection includes one step—ndf.

Near duplicate detection uses the processed archive to determine how many shingles two documents have in common and to identify them as near duplicates if they have enough of the same shingles. By default, 80 percent of shingles in common will identify two documents as near duplicates.
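As a simplified sketch of that threshold check (word-level 3-shingles and the containment-style ratio are assumptions, not Brainspace’s exact metric):

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def is_near_duplicate(a, b, threshold=0.8, k=3):
    """Flag two texts as near duplicates when the smaller shingle
    set shares at least `threshold` of its shingles with the other."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return False
    overlap = len(sa & sb) / min(len(sa), len(sb))
    return overlap >= threshold
```

With this sketch, two nine-word sentences differing only in their final word share six of seven shingles (about 86 percent) and so clear the default 80 percent bar.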

Archive to TDM

Archive to TDM includes one step—arc-to-tdm.

arc-to-tdm

During the archive to TDM (Term Document Matrix) step, stop words are applied to the processed archive, the likely features (terms, words, and phrases) are determined, and that vocabulary is used to build the token TDM. Parts of speech are determined and, together with NLP against the detected language, are used to assemble phrases that are meaningful and useful for the supported languages.

Various TDMs are generated for different purposes. For example, the Cluster TDM has a different threshold for content than the threshold for Brain TDM.

Brains and clusters will only use analyzed body text.

The Predictive Coding TDM will use any metadata fields that have the setting analyzed="true".

Build TDM Output

Build TDM output includes one step—build-tdm tdm.

build-tdm tdm

In this step, build-tdm tdm creates a super-matrix of all terms by all documents (may be more than one word per term).

De Dup TDM

De dup TDM includes one step—create-deduped-tdm. This TDM is used for the Cluster Wheel visualization.

create-deduped-tdm

In this step, a TDM is built from documents identified as “Uniques” and “Pivots” (collectively called “Originals”).

TDM Build

TDM build includes the following steps:

  1. build-tdm tdm-deduped

  2. check-deduped-tdm-size

  3. build-tdm tdm-extended

build-tdm tdm-deduped

This step builds the full TDM (Term Document Matrix) without Near Duplicates.

check-deduped-tdm-size

This step does a simple sanity check on the size of the TDMs created at this point in the process.

build-tdm tdm-extended

This step creates a full TDM with metadata.

Clustering

Clustering includes the following steps:

  1. cluster tdm-deduped

  2. cluster-ndf

cluster tdm-deduped

Clustering of Uniques, Near Duplicates Pivots, and any Exact Duplicates Pivot that is not a Near Duplicate is performed around the deduped tdm.

cluster-ndf

This step adds Near Duplicates and Exact Duplicates to the Cluster Wheel.

Building TDM Output and Clusters

split tdm

During this step, the system will determine if we need more than one Brain.

build-tdm Root

The system will have one build-tdm step for each Brain. If there is only one Brain, it will be named Root. If there are multiple Brains, each Brain will be assigned a name that describes the terms it contains. Brains will be in alphabetical order.

build-brains

The build-brains step is where we build singular value decomposition.

Build doc-index

Build doc-index includes the following steps:

  1. Index documents

  2. index exclude docs

  3. all-indexing

Index documents

During this step, the documents in the processed archive are indexed to make the content keyword-searchable.

index exclude docs

During this step, the documents excluded from the processed archive are indexed to make the content keyword-searchable.

all-indexing

During this step, a summary is created of the duration for the index documents and index excluded documents steps.

Graph Index

Graph index includes one step—graph-data.

graph-data

The graph-data process builds the data used for the Communications analysis visualization.

Generate Reports

This step generates the final report.

Post-build Output Directory

At the end of the build process, the following files are copied to the output directory:

  • <buildFolder>/config/schema.xml

  • <buildFolder>/config/fieldMap.xml

  • <buildFolder>/config/language.xml

  • <buildFolder>/status.csv

  • <buildFolder>/process.csv

  • <buildFolder>/archive.csv

  • <buildFolder>/reports/*

The following file is moved to the output directory: <buildFolder>/doc-index.

Stop Words

Brainspace contains a list of standard stop words for each language supported by Brainspace (see Brainspace Language Support). Brainspace Administrators have the ability to upload custom stop-word lists for any of the Brainspace-supported languages or to download the current stop-word list for each language in the Language and Stop Words list.

Note

Brainspace identifies the languages in a document and then applies language-specific stop words to it. The Common stop-words list is empty by default. You can create a custom stop-word list and upload it to Common if you want certain stop words to be applied to all languages. For example, Brainspace does not provide a stop-word list for Estonian. If you have a large Estonian document population, it might be useful to upload an Estonian stop-word list to Common; however, any tokens that overlap with other languages will be applied to those languages as well. For example, if the word “face” is a stop word in Estonian, that word will be stopped in English documents as well.

Shingling

Shingling is a common data mining technique used to reduce a document to a set of strings in order to determine whether the document is a near duplicate of another document in a dataset. A document’s x-shingles are all of the possible consecutive substrings of length x found within it. For example, if x=3 for the text string "A rose is a rose is a rose," the text will have the following shingles: “A rose is,” “rose is a,” “is a rose,” “a rose is,” “rose is a,” “is a rose.” After eliminating duplicate shingles, three shingles remain: “A rose is,” “rose is a,” “is a rose.”
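The example above can be reproduced directly, lowercasing so that “A rose is” and “a rose is” merge, as the final count of three implies:

```python
def word_shingles(text, x=3):
    """Return the list of x-word shingles in order, duplicates kept."""
    words = text.lower().split()
    return [" ".join(words[i:i + x]) for i in range(len(words) - x + 1)]

all_shingles = word_shingles("A rose is a rose is a rose")
print(len(all_shingles), len(set(all_shingles)))  # 6 3
```

The eight-word string yields six overlapping shingles, and deduplicating them leaves the three distinct shingles listed above.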

Task Topics

Add Custom Stop Words

Brainspace Administrators have the ability to upload custom stop-word lists for any of the Brainspace-supported languages or to download the current stop-word list for each language in the Language and Stop Words list.

To add custom stop words:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Settings icon:

    Select_Settings_for_Dataset.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane, click the Advanced Configuration icon:

    Select_Advanced_Configuration.png

    The Advanced Configuration dialog will open.

  4. Click the Upload icon associated with the language:

    Advanced_Configuration-Upload.png
  5. Navigate to the *.txt file, and then click the Open button.

  6. Click the Apply button.

    The Advanced Configuration dialog will close.

  7. In the Dataset Settings dialog, click the Save button.

  8. When the Dataset Settings dialog refreshes, click the Build button.

After the build completes, the new stop words will be included in the dataset.

Download Stop-Word Text Files

Brainspace Administrators have the ability to upload custom stop-word lists for any of the Brainspace-supported languages or to download the current stop-word list for each language in the Language and Stop Words list.

The *.txt files associated with each language can be downloaded directly from the Brainspace user interface.

To download the stop word *.txt file for a language:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset, and then click the Settings icon:

    Select_Settings_for_Dataset.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane, click the Advanced Configuration icon:

    Select_Advanced_Configuration.png

    The Advanced Configuration dialog will open.

  4. Click the Download icon associated with the language:

    Advanced_Configuration-Download.png

The stop word *.txt file will download to your local machine.

Modify Dataset Settings and Advanced Configuration Options

When creating a dataset or any time after creating a dataset, you can upload and download dataset-wide filter words, set email threading and boilerplate properties, select optional analytics, and manage languages and stop words.

Note

You must have Group Admin or Super Admin credentials to modify dataset settings.

To modify a dataset’s settings and advanced configuration properties:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Do one of the following:

    • For an existing dataset, locate the dataset, and then click the Settings icon:

      Select_Settings_for_Dataset.png
    • For a new dataset:

      1. Click the Add Dataset button. The New Dataset dialog will open.

      2. In the New Dataset dialog, type a dataset name, and then toggle switches in the Dataset Groups pane to add the new dataset to one or more groups.

      3. Click the Create button.

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane, click the Advanced Configuration icon:

    Select_Advanced_Configuration.png

    The Advanced Configuration dialog will open.

  4. After setting a dataset’s advanced configuration options, click the Apply button.

For information on the different options available in the Advanced Configuration dialog, click the help (?) icon associated with each option.

Download Dataset Reports

Brainspace provides a number of different reports for each dataset (see Dataset Reports). To download Brainspace reports:

  1. In the user drop-down menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Download Reports icon:

    Dataset_Download_Reports.png

    The Report menu will open.

  3. Choose a report in the list, and then click the Download button.

The report will download to your local machine.

Download a Brainspace Datasets Report

To download a dataset management report:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Download button:

    Dataset_Download_button.png

The dataset management report *.csv file will download to your computer.

Create a Dataset with a Relativity Plus Connector

After configuring a Relativity OAuth client and creating a Relativity Plus connector, you are ready to create a dataset.

Note

This topic describes how to create and manage a Relativity Plus connector for Relativity v9.7 and newer versions of Relativity.

Note

After creating a dataset with a Relativity Plus connector, you cannot change the dataset’s connector to a legacy Relativity connector.

To create a dataset with a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Add Dataset button:

    Add_Dataset.png

    The New Dataset dialog will open.

  3. In the New Dataset dialog, type a dataset name, and then toggle switches in the Dataset Groups pane to add the new dataset to one or more groups:

    Enter_Dataset_Name.png
  4. Click the Create button.

    The Dataset Settings dialog will open.

  5. Click the Choose Connector button:

    Choose_Connector.png

    The Choose Connector dialog will open.

  6. In the Choose Connector dialog, click the appropriate Relativity Plus connector in the list.

    Note

    If you are not logged in to Relativity, you will be prompted to enter your Relativity username and password before you will be able to add the connector. If you are already logged in to Relativity, the Select a Source dialog will open.

  7. In the Select a Source dialog, click the appropriate Relativity Workspace folder, and then click the Save and Proceed button.

    The Relativity Saved Search dialog will open.

  8. To create the dataset using all documents in the Relativity Workspace, click the Proceed button.

    Note

    To create the dataset using a subset of documents, click a Saved Search in the list before clicking the Proceed button. After clicking the Proceed button, the License Checks dialog will open and run a check on your license document limits.

  9. After Brainspace verifies the license document limits, click the Proceed button. The Field Mapping dialog will open.

  10. Unmap any of the default fields if necessary, map any additional fields, and then click the Continue button.

    The Dataset Settings dialog will refresh.

    Note

    For information about mapping fields, see Field Mapping Categories and Definitions.

  11. Verify the settings, click the Save button, and then click the Build button. The Dataset Build Options dialog will open.

  12. Choose a build option, and then click the Run This Build Type button. The Schedule Build dialog will open.

  13. Click the Build as soon as possible button.

    Note

    If you choose to build the dataset in the future, click the Schedule Build Time field, select a date and time, and then click the Save button. The Datasets screen will refresh and show the Dataset Queue build in progress. While the build is in progress, you can click the View Status button to view the build steps in progress. For information on each step in the build process, see Build Steps.

After the build completes successfully, the new dataset will move from the Dataset Queue to the list of active datasets in the Datasets screen, and you are ready to create work products with it.

Reference Topics

Dataset Reports

Aliases Report

Provides a list of all the email address aliases within the dataset. (This is generally used by Brainspace, and isn’t a particularly useful report for users. Brainspace recommends using the Person report for alias listings.)

Archive Report

Detailed report of the most recent import or transfer of data.

Batch Tools Version Report

Contains detailed information regarding which Batch Tools version was used to create the dataset, including hostname, MAC address, and PID information, as well as a history of each incremental or full build.

Boilerplate Report

Provides a list and occurrence count of all the unique boilerplate text identified during ingestion.

Build Error Log

Provides a detailed log of all the build errors encountered during ingestion.

Build Log

Provides a complete detailed log of all the ingestion steps during the build process.

Clusters Content

Lists all of the document IDs (for example, Control Numbers) for the ingested documents and maps them to a leaf cluster ID.

Clusters File

Contains the following cluster tree information: Cluster ID, Parent Cluster ID, Count of Documents in Cluster, Intra-cluster Metric, Cluster Type, and Folder Name.

Document Counts

Provides summary document count statistics for the dataset including how many documents were fed into Brainspace for ingestion, how many were ingested, how many were skipped, number of originals, exact duplicates, near duplicates, etc.

Extended Full Report

Includes all of the overlay fields and values from the Full Report and additional language detection fields BRS Primary Language and BRS Languages.

Full Report

Includes all of the overlay fields and values, which can be overlaid into a third-party system such as Relativity, either manually through the Relativity Desktop Client or automatically by enabling Overlay in the Configuration screen of the Dataset Settings tab.

Import Error Archive

Compressed file that contains one or more of the files that failed to import.

Ingest Error Details

Text report containing more details about the errors in the Ingest Errors report.

Ingest Errors

*.csv report containing errors that occurred during ingestion with the location of the documents that caused the error.

Person Report

Lists all of the “Persons” automatically or manually created (via People Manager), along with the email addresses (aliases) associated with each person.

Process Report

Summary of the most recent dataset analysis.

Schema XML

The field mapping done via the interface is stored in this file and used to ingest all of the mapped metadata and text.

Status Report

Summary of the most recent dataset analysis.

Vocabulary File

List of all the unique terms and phrases identified within the set of data during ingestion.

Common Options for Field Mapping

Use for Exact Duplicate

Ticking this checkbox makes this field part of the definition of an exact duplicate. Two documents are considered exact duplicates only if the analyzed text fields, this field, and all other fields with this option selected are the same. Examples would be “Sent Date,” “From,” and “Subject.”
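
As a minimal sketch of this behavior (not Brainspace's actual implementation), an exact-duplicate key can be thought of as a fingerprint over the analyzed text plus every field with this option selected; all names below are illustrative:

```python
import hashlib

# Hypothetical illustration: two documents are exact duplicates only when the
# analyzed text AND every field marked "Use for Exact Duplicate" match.
DUP_FIELDS = ("Sent Date", "From", "Subject")  # example selections

def dup_key(doc):
    """Build a fingerprint from the body text plus the selected fields."""
    parts = [doc.get("body_text", "")] + [doc.get(f, "") for f in DUP_FIELDS]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

a = {"body_text": "Q3 numbers attached.", "Sent Date": "2020-01-02",
     "From": "pat@example.com", "Subject": "Q3"}
b = dict(a)                      # identical text and metadata
c = dict(a, Subject="Q3 FINAL")  # same text, different Subject

assert dup_key(a) == dup_key(b)  # exact duplicates
assert dup_key(a) != dup_key(c)  # Subject differs, so not exact duplicates
```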

Faceted

Ticking this checkbox will make this field available for display and search in the faceted field column of the Dashboard. If this field is a Date field, then ticking the checkbox will make it available in the Timeline display of the Dashboard.

Add Exact Text

Ticking this checkbox creates a sibling field with an “-exact” suffix appended to the name; searches against that sibling field are not stemmed.

For example, consider a field called “Highlights.” A search for “indices” in that field returns documents containing “indices” as well as “indicates” and all other forms of that root.

If Add Exact Text is checked, then during a build a field called “Highlights” and a field called “Highlights-exact” are both created. Searching the latter returns only documents that match the exact term.
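
The stemmed-versus-exact distinction can be sketched as follows; the toy stemmer below is purely illustrative and is not the stemmer Brainspace uses:

```python
# Naive illustration: a stemmed field matches any form sharing the root,
# while the "-exact" sibling matches the literal term only.
def naive_stem(word):
    # Toy rule: strip a few common suffixes down to a shared root.
    for suffix in ("ices", "icates", "ication", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

docs = {1: "market indices", 2: "the report indicates growth", 3: "stock index"}

def search(field_terms, query, exact=False):
    match = (lambda t: t == query) if exact else (
        lambda t: naive_stem(t) == naive_stem(query))
    return sorted(doc_id for doc_id, text in field_terms.items()
                  if any(match(t) for t in text.split()))

# "Highlights"-style stemmed field: "indices" and "indicates" share a root here.
assert search(docs, "indices") == [1, 2]
# "Highlights-exact"-style field: only the literal term matches.
assert search(docs, "indices", exact=True) == [1]
```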

Multi-value Separator

Used to provide a non-default delimiter that Brainspace uses to divide a metadata field into separate values. For example, if a field has the value “Burger|Pizza|Tofu,” entering | as the Multi-value Separator turns this into a field with the three values “Burger,” “Pizza,” and “Tofu” rather than a single value containing all three together.
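
The effect of the separator can be sketched in a few lines; the function name is hypothetical:

```python
# Minimal sketch of what a multi-value separator does during ingestion;
# split_multivalue is an illustrative name, not a Brainspace API.
def split_multivalue(raw, separator):
    """Split a metadata field into separate values on the given delimiter."""
    if not separator:
        return [raw]             # no separator configured: one combined value
    return raw.split(separator)

assert split_multivalue("Burger|Pizza|Tofu", "|") == ["Burger", "Pizza", "Tofu"]
assert split_multivalue("Burger|Pizza|Tofu", None) == ["Burger|Pizza|Tofu"]
```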

Field Mapping Categories and Definitions

Attachment

The ID or IDs of an email’s attachments. Typically not used in conjunction with datasets using Family ID or Parent ID.

BCC

Contents of the BCC field of an email. Should be used with full email addresses or names.

Body Text

The primary text field used for analysis. Example: Extracted Text

CC

Contents of the CC field of an email. Should be used with full email addresses or names.

Conversation Index

Contents of the conversation index field. If valid for a document, this becomes the method used to provide email threading for that document. It is also examined to determine whether any documents in the email chain are missing from the dataset; if so, their absence is flagged in the field EMT_ThreadHasMissingMessage.
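
As a hedged sketch of why a conversation index can reveal missing messages: in Outlook-style indexes, each reply appends a short block to its parent's index, so every ancestor's index is a strict prefix of its descendants'. The lengths below (44 hex characters for the header, 10 per reply) are assumptions for illustration:

```python
# Assumed index shape for illustration: fixed-length header plus one
# fixed-length block per reply.
HEADER, CHILD = 44, 10

def missing_ancestors(indexes):
    """Return ancestor index prefixes that no document in the set carries."""
    present = set(indexes)
    missing = set()
    for idx in indexes:
        length = len(idx) - CHILD
        while length >= HEADER:          # walk every ancestor prefix
            prefix = idx[:length]
            if prefix not in present:
                missing.add(prefix)
            length -= CHILD
    return missing

root = "A" * 44
reply = root + "B" * 10
reply_to_reply = reply + "C" * 10

# The intermediate reply is absent from the dataset, so its prefix is flagged,
# which is the situation EMT_ThreadHasMissingMessage records.
assert missing_ancestors([root, reply_to_reply]) == {reply}
assert missing_ancestors([root, reply]) == set()
```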

Custodian

Contents of the Custodian field. It is surfaced in the Advanced Search as a unique field.

Date

Contents of any other date field relevant to the document. In this category, faceted means that the data is broken down in a manner that allows the system to use the date field in the timeline view of the Dashboard (see Supported Date Formats).

Date Sent

Contents of the Date Sent field of an email. Used by Email Threading.

Enumeration

When a field has a category of enumeration, the whole field is put into the index as a single token. One can only get results when searching for the whole value in quotes. The GUI will present a drop-down for selection when searching an enumeration.

Exact

Used to provide a metadata field when you do not want to have stemming involved in a search.

Family ID

A unique ID that is used to represent the entire family of documents. This ID should be the Parent ID (see Parent ID) of the family of documents. If it is not the parent document ID, Brainspace analytics will also require the Parent ID field to be configured for all documents in the family to properly determine the relationships between parent documents and their attachments. Family ID is not required, but it should be specified whenever available, since it is used to populate the family id field used for indexing and EMT_FamilyId in the Full Report.

File Size

Used to provide special handling and search for documents based upon their size in advanced search.

File Type

Used to provide special handling and search for documents based upon their type in advanced search.

From

Contents of the From Field of an email should be used with full email addresses or names.

ID

The unique document identifier within the document population. (Examples include “Control Number,” “DocNo,” “DocID,” and “BegBates.”)

NATIVE_PATH

Points to the native file on disk.

Numeric Bytes

Used to provide special handling and search for documents based upon their size in advanced search.

Numeric Float

Used to provide special handling and search for documents based upon your custom numeric metadata in advanced search.

Parent ID

The ID associated with the parent of a document; for example, a Word document attachment would have the ID of the email it was sent with. To identify attachments, the Parent ID, Attachments, or Family field must be used. Only one of these is required, but it is best to specify two: either Parent and Family, or Attachments and Family. All three can be specified, but that is not recommended because Parent and Attachments can conflict. If Parent is available, it should be used instead of Attachments. If only Family is available, it will work to identify attachments, but only if the Family ID values correspond to the key of the parent document. After all processing, if the provided Family field was blank, Brainspace analytics will populate the metadata field family_id with the key of the parent document.
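
The fallback described above can be sketched as follows; this is assumed behavior for illustration, not Brainspace internals:

```python
# Assumed behavior for illustration: when no Family ID is provided, a
# document's family id falls back to the key of its topmost parent;
# parents and standalone documents are their own family pivot.
def resolve_family_id(doc_id, parent_of, family_of=None):
    family_of = family_of or {}
    if doc_id in family_of:       # an explicit Family ID wins
        return family_of[doc_id]
    top = doc_id
    while top in parent_of:       # walk up to the parent document
        top = parent_of[top]
    return top

parent_of = {"DOC-002": "DOC-001", "DOC-003": "DOC-001"}  # two attachments

assert resolve_family_id("DOC-002", parent_of) == "DOC-001"
assert resolve_family_id("DOC-001", parent_of) == "DOC-001"
assert resolve_family_id("DOC-002", parent_of, {"DOC-002": "FAM-9"}) == "FAM-9"
```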

Reference

Deprecated; do not use.

String

When a field has a category of string, each word in the value is a separate token. One can search for individual words, phrases, or the whole value (if you know what it is).

Subject

The subject line of an email or the title of a document. Used for Email Threading.

Text

An additional Text field, typically metadata such as comments, that can be part of your search, but you don’t want analyzed. Example “Lawyer Notes”

Text Path

Used when the DAT file does not contain the body_text of the document being imported. This field will have the path, as known by the tool that exported the data. Options include the ability to trim the beginning of the field value, and to point to an absolute disk address.

To

Contents of the To field of an email. Should be used with full email addresses or names.

Unfiltered Text

Retains filter words as defined in the Filter Words text files and in boilerplate content.

Total Documents

The total number of documents in a dataset.

New Documents

The total number of new documents added to a dataset. This number is cumulative until a new build resets the new document count to zero.

Pinned Dataset

A dataset card that has been moved from the unpinned Datasets pane to the Pinned Datasets pane.

Unpinned Dataset

A dataset card that is located in the default Datasets pane.

Activity Status

The status of the dataset:

  • Active: Indicates that the dataset is available for use in Brainspace.

  • Inactive: Indicates that the dataset remains in Brainspace but is not available for use (see Disable a Dataset).

Connectors

Relativity and Relativity Plus

Concept Topics

Relativity Overlay

When using a Relativity connector for a dataset, you can overlay a group of analytics fields from Brainspace into Relativity after creating a new dataset or after rebuilding an existing dataset. These fields can be used to organize and to accelerate linear document review in Relativity.

You can choose to run overlay to Relativity automatically every time you build a dataset, or you can choose to run overlay to Relativity manually as needed.

Multiple Relativity Overlays

When overlaying multiple datasets or classifiers to a single Relativity Workspace, Brainspace will display duplicate fields appended with additional characters to identify that a particular field in Relativity has more than one corresponding field in Brainspace. This also applies to multiple Brainspace datasets that use the Relativity Plus connector.

Relativity Plus Connector

Brainspace’s Relativity Plus connector is compatible with Relativity v9.7 and newer versions of Relativity. Relativity v9.7 and v10 work with the legacy Relativity connector and the Relativity Plus connector in Brainspace.

Note

The Relativity Plus connector only works with Relativity v9.7 and v10.x (including RelativityOne). Brainspace strongly recommends that customers upgrade to Brainspace v6.2 or newer to use the most recent API.

Brainspace-Relativity Document Links

By default, documents are linked between Relativity and Brainspace. Clicking the document link in Brainspace opens the source document in Relativity if network access (http or https) is permitted and the user is logged in to Relativity. Document links can be disabled using the Advanced Settings feature in Relativity (see third-party Relativity documentation for more information).

Multiple Relativity Web Servers

Relativity 9.7.229.5 does not support database-backed authorization codes with load-balanced web servers. Using multiple web servers will result in the Relativity Plus connector failing to authenticate. This can be resolved by configuring the Relativity Plus connector to explicitly communicate with a single Relativity web server.

Overlay Process

The Relativity Plus connector overlays Analytics field data in batches after a build. The Relativity connector overlays data as a single action. The Relativity Plus connector no longer causes the Relativity Workspace to hold a full-table lock on the documents table while overlay is occurring. In the case of an overlay failure, the documents will have field values partially written to the Analytics field.

Pause and Resume

The Relativity Plus connector does not support pause and resume. Because of the concurrent nature of the implementation, Brainspace could not guarantee that a document would not be missed during the resume. The pause button works from the UI, but when resumed, the ingest will start from the beginning of the entire saved search or Workspace.

Predictive Coding

The Brainspace Addons Relativity (*.rap) application is still required for predictive coding (PC). This creates the choice fields (BDPC Is Responsive) that Brainspace is not able to create via the API. It also creates the views and saved searches that are useful for the PC workflow.

Note

CMML with ACS provides a control set solution, so PC is no longer required. The CMML solution does not require the *.rap file.

Ingest and Overlay Performance

Ingest performance with Relativity Plus should be significantly faster than the Relativity connector; however, the Relativity Plus connector is highly dependent on the values chosen for the connector configuration and the number of CPUs on the Brainspace servers, as well as the network bandwidth between the Brainspace host and the Relativity host.

Relativity Server Maintenance

Based on testing results and interaction with the kCura team, temporary resources are created on the Relativity server side that correspond to each export initiated during the dataset ingestion process. The Relativity services run a weekly cron job to clean up temporary resources. These resources consume large amounts of disk space, so it is important to monitor disk space in environments where many or large ingestion processes are being run. If more frequent clean-up jobs are required, contact the kCura team for assistance.

Task Topics

Run Overlay Automatically after a Dataset Build

The overlay to Relativity feature can be activated to run automatically each time you build an existing dataset with a Relativity connector.

Note

To run overlay automatically when creating a new dataset with a Relativity connector, see Create a Dataset with a Relativity Connector.

To run overlay automatically after a dataset build:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset with the Relativity connector, and then click the Settings icon:

    Select_Settings_for_Dataset_with_Connector.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane of the Dataset Settings dialog, toggle the Overlay switch to the On position:

    Select_Connector_Overlay.png

    The Overlay switch will become green.

  4. Click the Save button. Do one of the following:

    • To close the Dataset Setting dialog without running overlay to Relativity now, click the Close icon.

    • To overlay to Relativity now, click the Build button.

      Select_Relativity_Build.png

If you choose to close the Dataset Settings dialog without overlaying to Relativity, overlay to Relativity will run automatically every time you run a dataset build in the future. If you choose to run overlay to Relativity immediately and without the auto-overlay feature, overlay to Relativity will only run when manually initiated.

Run Overlay Manually on an Existing Dataset

After creating a dataset with a Relativity connector, you can use the overlay to Relativity feature at any time, whether or not the automatic overlay to Relativity feature has been enabled.

To run overlay manually on an existing dataset:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, locate the dataset with the Relativity connector, and then click the Settings icon:

    Select_Settings_for_Dataset_with_Connector.png

    The Dataset Settings dialog will open.

  3. In the Dataset Configuration pane of the Dataset Settings dialog, click Run Now:

    Run_Overlay_Now.png

    Note

If the Run Now option is not visible or is greyed out, confirm that the dataset has a connector to Relativity and that you have fields selected for overlaying.

  4. Click the Save button.

  5. Click the Close (X) icon.

    Close_Dataset_Configuration.png

After running the overlay to Relativity, you can set up automatic overlays or manually run the overlay feature at any time.

Enable Multiple Relativity Overlays on an Existing Relativity Plus Connector

After or while creating a Relativity Plus connector, you can enable the multiple Relativity overlay feature to overlay Relativity field sets in multiple Brainspace datasets to a single Relativity Workspace.

Note

This feature is only available for the Relativity Plus connector.

Note

When a dataset build completes with this feature enabled on the Relativity Plus connector, Brainspace creates a unique field in Relativity to map each of the BD fields with the datasets in Brainspace. For more information on Brainspace fields, see Relativity Overlay Fields.

To enable multiple Relativity overlay field sets:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. In the Datasets screen, click the Connectors button.

  3. Locate the Relativity Plus connector, and then click the Update Connector icon.

    The Relativity Plus connector configuration dialog will open.

  4. In the Overlay pane, toggle the Enable Multiple Overlay Field Sets switch to the On position:

    Overlay_Pane_ON.png
  5. Click the Test connector button.

  6. After verifying that the connector configuration is valid, click the Update Connector button.

    The connector configuration dialog will close automatically.

Every dataset in Brainspace that employs this connector will produce unique fields in the Relativity Workspace.
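
The field-naming behavior can be sketched like this; the suffix format shown is hypothetical, since the exact characters Brainspace appends are not documented here:

```python
# Hypothetical illustration: when two datasets overlay the same field into one
# Workspace, later fields get extra characters appended so each dataset keeps
# a distinct target field in Relativity.
def unique_field_names(existing, wanted, dataset_tag):
    taken = set(existing)
    out = {}
    for name in wanted:
        candidate = name if name not in taken else f"{name} ({dataset_tag})"
        taken.add(candidate)
        out[name] = candidate      # Brainspace field -> Relativity field
    return out

existing = ["BD EMT ThreadID"]     # already created by an earlier dataset
mapping = unique_field_names(existing,
                             ["BD EMT ThreadID", "BD EMT MessageCt"],
                             dataset_tag="ds2")

assert mapping["BD EMT ThreadID"] == "BD EMT ThreadID (ds2)"
assert mapping["BD EMT MessageCt"] == "BD EMT MessageCt"
```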

Configure a Relativity OAuth2 Client for a Relativity Plus Connector

Configuring a Relativity OAuth2 client is the first step in creating a Brainspace dataset with a Brainspace Relativity Plus connector for Relativity v9.7 and newer versions of Relativity.

Note

Relativity 9.7.229.5 does not support database-backed authorization codes with load-balanced web servers. Using multiple web servers will result in the Relativity Plus connector failing to authenticate. This can be resolved by configuring the Relativity Plus connector to communicate with a single Relativity web server.

To configure a Relativity OAuth2 client:

  1. Open a Relativity instance in a web browser, type your username, and then click the Continue button:

    Relativity9_Username_prompt.png

    The Relativity password dialog will open.

  2. Type your password, and then click the Login button:

    Relativity9_Password_prompt.png

    The Relativity Workspaces window will open.

  3. Click the Authentication menu dropdown arrow, and then click the OAuth2 Client option:

    Relativity_authentication_option_select.png
  4. Click the New OAuth2 Client button:

    Relativity_choose_New_OAuth2.png

    The OAuth2 Client Information dialog will open.

  5. Type a name for the client.

  6. Set OAuth2 Flow to Code.

  7. Type the redirect URL (fully-qualified domain name) with /oauth as the URL endpoint:

    Relativity_OAuth2_Client_information.png
  8. In the Access Token Lifetime field, type a session timeout value:

    Relativity_access_token_timeout_setting.png

    Note

    Relativity does not issue refresh tokens. If your OAuth2 session exceeds the session timeout value, you must clear your credentials and create a new OAuth2 token. The OAuth2 session timeout can be set to a low value to be more secure, or it can be set to a maximum of one year.

  9. Click the Save button.

    The OAuth2 Client Information screen will refresh.

    Relativity_Save_OAuth2_config.png
  10. Make note of the Client ID and Client Secret codes.

    You will need this information when creating the Relativity Plus connector in Brainspace.

After configuring a Relativity OAuth2 client, you are ready to create a Relativity Plus connector in Brainspace (see Create a Relativity Plus Connector).
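
The code flow configured above ends with a standard OAuth2 token exchange using the Client ID and Client Secret. The sketch below only builds the request parameters; the token endpoint path is an assumption, so confirm it against your Relativity instance:

```python
from urllib.parse import urlencode

def build_token_request(base_url, client_id, client_secret, code, redirect_uri):
    """Assemble a standard OAuth2 authorization-code token request."""
    return {
        # Assumed endpoint path; verify against your Relativity instance.
        "url": f"{base_url}/Relativity/Identity/connect/token",
        "body": urlencode({
            "grant_type": "authorization_code",  # matches OAuth2 Flow = Code
            "code": code,                        # returned to the /oauth redirect
            "redirect_uri": redirect_uri,
            "client_id": client_id,              # from the OAuth2 client you created
            "client_secret": client_secret,
        }),
    }

req = build_token_request("https://relativity.example.com", "my-client-id",
                          "my-secret", "abc123",
                          "https://brains.example.com/oauth")
assert "grant_type=authorization_code" in req["body"]
```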

Create a Relativity Plus Connector

After configuring a Relativity OAuth2 client, you are ready to create a Relativity Plus connector in Brainspace. You will need the Client ID and Client Secret that you created for the Relativity OAuth2 client.

To create a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Click the Connectors button:

    Brainspace_Connectors_button.png

    The Connectors screen will open.

  3. Click the Add Connector button:

    Add_Connector_button.png

    The Connector menu will open.

  4. Click the Relativity Plus option in the menu.

    The Relativity Plus connector dialog will open.

  5. In the Connector Name field, type a name for the connector.

  6. In the Relativity Base Document URL field, type the Relativity base URL that points to the Relativity domain user-interface (fully-qualified domain name) with /Relativity as the URL endpoint.

  7. In the API field, toggle the switch to the On position to allow self-signed certificates.

  8. In the Brainspace Analytics Fields To Overlay After A Full Build field, click an option in the list, and then select all options using the keyboard command Ctrl-A.

  9. In the Relativity API Host Machine Name (FQDN) field, type the fully-qualified domain name for the Relativity REST / ObjectManager API.

  10. In the HTTPS field, toggle the switch to the On position to enable HTTPS.

  11. In the Client ID and Client Secret fields, paste or type the codes that were generated by Relativity when you created the OAuth2 client (see Configure a Relativity OAuth2 Client for a Relativity Plus Connector).

  12. In the Concurrency field, type a value (minimum of 2) for the number of threads to use for ingest and mass operations.

  13. Click the Test Connector button.

  14. After the connector test is successful, click the Create Connector button.

    The new Relativity Plus connector will be added to the Connectors screen.

After creating a Relativity Plus connector, you are ready to create a dataset. You can also manage the connector settings or permanently delete the connector from Brainspace at any time. To configure the advanced settings for a Relativity Plus connector, click the Advanced link in the Relativity Plus connector dialog.

Update a Relativity Plus Connector

After creating a Relativity Plus connector, you can change the connector’s settings at any time.

To update a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Click the Connectors button:

    Brainspace_Connectors_button.png

    The Connectors screen will open.

  3. Click the Update Connector icon:

    Update_Connector.png

    The connector configuration dialog will open.

  4. Modify any of the connector options, and then click the Test connector button.

  5. After the connector test is successful, click the Update Connector button. The updated Relativity Plus connector will refresh on the Connectors screen.

After updating a Relativity Plus connector, you are ready to create a dataset or continue using it for Brainspace datasets. You can also manage the connector settings or permanently delete the connector from Brainspace at any time. To configure the advanced settings for a Relativity Plus connector, click the Advanced link in the Relativity Plus connector dialog.

Delete a Relativity Plus Connector

After creating a Relativity Plus connector, you can permanently delete it from Brainspace at any time. To delete a Relativity Plus connector:

  1. In the user dropdown menu, click Administration:

    Administration_Menu.jpg

    The Datasets screen will open.

  2. Click the Connectors button:

    Brainspace_Connectors_button.png

    The Connectors screen will open.

  3. Click the Delete Connector icon as shown in the following image:

    Delete_connector.png

    A confirmation dialog will open.

  4. Click the Delete button.

The confirmation dialog will close, and the Relativity Plus connector will be permanently deleted from Brainspace.

Reference Topics

Brainspace Supported Connectors

Beginning with Brainspace v6.3, all new features will be developed for the new Relativity Plus connector. The classic Relativity connector has been deprecated, and Relativity will discontinue direct SQL access with their version 11.x release.

Brainspace supports the following connectors:

Connector          Discovery v5.5   Brainspace v6.0   Brainspace v6.1   Brainspace v6.2   Brainspace v6.3
================   ==============   ===============   ===============   ===============   ===============
Relativity 10.3*                    X                 X                 X                 x
Relativity 10.1*                    X                 X                 X                 x
Relativity 9.7*                     X                 X                 X                 x
Relativity 9.6                      X                 X                 X                 x
Nuix 8.0                                                                x                 x
Nuix 7.8                                              X                 X                 x
Nuix 7.4                                              X                 X                 x
Nuix 7.2           X                X                 X                 X                 x
Nuix 7.0           X                X                 X                 X                 x
Nuix 6.2           X                X                 X                 X                 x

Legend: X – Full Support

*Relativity v9.7 and v10.x work with the legacy Relativity connector and the Relativity Plus connector. Relativity Plus connector only works with Relativity v9.7 and v10.x (including RelativityOne). Brainspace strongly recommends that customers upgrade to Brainspace v6.3 or newer to use the most recent APIs.

Relativity Overlay Fields

When configuring a Relativity or Relativity Plus connector, you will decide which fields to overlay (see Create a Relativity Connector and Create a Relativity Plus Connector).

brs_strict_dup_set_id

If the document is in a strict duplicate group (SDG), this is the ID of that SDG. Otherwise it is NULL.

brs_strict_dup_pivot

If the document is in an SDG, this is the document ID of the pivot member of the SDG. Otherwise it is NULL.

BD EMT Duplicate ID

The document identifier of the duplicate email message or attachment. Unique group identifier used to group all documents within each of the exact text duplicate sets.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing which control number or document IDs an email message or attachment is a duplicate of within an email thread.

BD EMT EmailAction

Identifies the specific action for each message within an email thread (send, forward, or reply).

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing whether an email was the send (the original message in an email thread), a forward, or a reply within an email thread.

BD EMT FamilyID

The control number or document identifier of the parent email message within a document family within an email thread.

This field can be removed from overlay. If this field is removed from the overlay, Relativity users will not know the Relativity control number or document identifier of the parent email message when reviewing a document family (message and attachments) within an email thread.

BD EMT

Intelligent sort field that allows you to sort email threads hierarchically in descending order so that the most inclusive messages for each branch within an email thread are sorted to the top along with any attachments to those inclusive messages.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from sorting the Brainspace email threads hierarchically in Relativity.

BD EMT IsDuplicate

Identifies whether an email message is a duplicate within the email thread.

This field can be removed from overlay. This field is “Yes” if the email message or attachment is a duplicate of another message or attachment within the email thread. Removing this field from the overlay will prevent Relativity users from knowing which email messages or attachments are duplicates within an email thread.

BD EMT IsMessage

Identifies which documents within an email thread are actual email messages. Documents are considered emails if they have a populated From field and are not identified as attachments.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing which documents within an email thread are actual email messages.

BD EMT IsUnique

Identifies which messages within the email threads are the inclusive messages.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing which email messages are inclusive within an email thread. Relativity users are only required to review the inclusive messages within an email thread as they contain the content of all the non-inclusive messages within the email thread.

BD EMT MessageCt

The total number of messages within an email thread.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing how many email messages are within an email thread.

BD EMT ThreadID

Unique identifier assigned to a group of messages within a single email thread.

This field can be removed from overlay. The same BD EMT ThreadID is assigned to all emails and attachments that belong to the same email thread. If this field is not overlaid into Relativity, users will not be able to take advantage of Brainspace’s email threading for batching and review in Relativity. Several custom Relativity Views created by Brainspace require this field to be populated in order for the Views feature to function properly.

BD EMT ThreadIndent

This field is used for displaying the Email Thread view in Relativity where messages are properly indented in the view based on the order in which the messages were created within the email thread. For example, a reply to a message will have one greater thread indent than the message it replies to.

This field can be removed from overlay. Relativity users will not be able to use the custom Brainspace Email Thread Views if this field is not included in the overlay.

BD EMT ThreadPath Full

Contains a semicolon delimited list of the document IDs (control numbers) for all the messages that are included within each inclusive message.

This field can be removed from overlay. Relativity users will not know which non-inclusive email messages are contained in each inclusive message in the email thread if this field is not included in the overlay. This will make inclusive-only reviews in Relativity difficult to manage.
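Because the field is a semicolon-delimited string, downstream scripts can split it back into individual document IDs. A minimal sketch (the function name and sample control numbers below are hypothetical, not part of the Brainspace API):

```python
# Split a BD EMT ThreadPath Full value (a semicolon-delimited list of
# control numbers) into a list of document IDs. Sample IDs are illustrative.
def thread_path_members(thread_path_full):
    """Return the document IDs contained in an inclusive message."""
    return [doc_id.strip() for doc_id in thread_path_full.split(";") if doc_id.strip()]

print(thread_path_members("DOC000123; DOC000124; DOC000125"))
# ['DOC000123', 'DOC000124', 'DOC000125']
```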

BD EMT ThreadSort

Field that sorts email threads by ThreadIndent first and then by chronology (the order in which the messages were generated within each email thread).

This field can be removed from overlay. Relativity users will not be able to sort the Brainspace email threads in Relativity chronologically if this field is not included in the overlay.
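The sort order this field encodes can be sketched as a two-level sort: ThreadIndent first, then chronology within each indent level. Brainspace precomputes the field; the document dictionaries and field names below are hypothetical, for illustration only:

```python
from datetime import datetime

# Hypothetical documents carrying a thread indent and a sent date.
docs = [
    {"id": "DOC2", "thread_indent": 2, "sent": datetime(2020, 1, 3)},
    {"id": "DOC1", "thread_indent": 1, "sent": datetime(2020, 1, 1)},
    {"id": "DOC3", "thread_indent": 2, "sent": datetime(2020, 1, 2)},
]
# ThreadIndent first, then chronological order within each indent level.
ordered = sorted(docs, key=lambda d: (d["thread_indent"], d["sent"]))
print([d["id"] for d in ordered])  # ['DOC1', 'DOC3', 'DOC2']
```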

BD EMT UniqueReason

Indicates why the message is inclusive. “Attach” means the message had an attachment that is not present in, or is different from, the attachments in the previous messages within the email thread. “Message” means the content of the message is not contained in another email in the same email thread. A value containing both “Message” and “Attach” means the message and its attachment both contain unique information.

This field can be removed from overlay. Relativity users will not know why a message has been marked IsUnique if this field is not included in the overlay.

BD EMT ThreadHasMissingMessage

Indicates that parsing the ConversationIndex has revealed that a document in the thread has not been included in the Brainspace dataset.

This field can be removed from overlay. Users will not be able to see that the document was missing from the thread if this field is not included in the overlay.

BD EMT WasUnique

Indicates that this document was considered to contain unique content. However, a new document introduced in a subsequent build has all of this document’s content and more. This status will be preserved across all subsequent builds.

This field can be removed from overlay. Users will not be able to see that this document was previously considered having unique content if this field is not included in the overlay.

BD EMT WasUniqueReason

Indicates why the message was unique. “Attach” means the message had an attachment that was not present in, or was different from, the attachment in the previous message in the email thread. “Message” means the content of the message was not contained in another email within the same email thread. A value containing both “Message” and “Attach” means the message and its attachment both contained unique information.

This field can be removed from overlay. Relativity users will not know why a message has been marked WasUnique if this field is not included in the overlay.

BD EMT Intelligent Sort

Alternative sorting algorithm that presents the most complete document in an email thread first.

This field can be removed from overlay. Users will not be able to see the most complete version of the email thread if this field is not included in the overlay.

BD EMT AttachmentCount

The number of attachments included with this email.

This field can be removed from overlay. Users will not be able to see how many attachments are included with this email if this field is not included in the overlay.

BD StrictDupStatus

This field identifies the status of a document with regard to its strict exact-duplicate state. With the option to include metadata in CMML classifiers, it becomes necessary to consider that two documents may be textual exact duplicates but have differences in metadata; therefore, this field represents the strict exact-duplicate state (see note below).

It will be populated with one of three values:

  • unique: This document is not considered a strict exact duplicate of any other document.

  • duplicate: This document is considered a strict exact duplicate of another document.

  • pivot: This document is the original document of which other documents are listed as duplicates.

This field can be removed from overlay. Relativity users will not know whether this document is considered a strict exact duplicate if this field is not included in the overlay.

Note

Two documents are considered to be strict exact duplicates if the analyzed text fields are identical (except for normalized whitespace), all fields flagged as usedForExactDup in the schema.xml are identical, and all fields flagged as “analyzed = true” in the schema.xml are identical.

Brainspace supplies a default schema that makes certain choices for which fields are marked as usedForExactDup and/or analyzed. The user can override those choices.

BD ExactDupSetID

Unique identifier for each Exact Duplicate group. Documents that are exact duplicates of one another are grouped together using this ID.

This field can be removed from overlay. This group identifier is used in Relativity to understand which documents are part of the same exact text duplicate grouping. Documents that are exact text duplicates of one another will all get the same BD ExactDupSetID.

BD ExactDupStatus

This field identifies the status of a document with regard to its exact-duplicate state.

It will be populated with one of three values:

  • unique: This document is not considered an exact duplicate of any other document.

  • duplicate: This document is considered an exact duplicate of another document.

  • pivot: This document is the original document of which other documents are listed as duplicates.

This field can be removed from overlay. Removing this field from the overlay will prevent Relativity users from knowing whether this document is considered an exact duplicate.

Note

Two documents are considered exact duplicates if the analyzed text fields are identical (except for normalized whitespace) and all fields flagged as usedForExactDup in the schema.xml are identical.
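Once overlaid, the set ID and status values can drive simple grouping logic in review tooling. A sketch using hypothetical overlay values (the dictionary keys and set IDs below are illustrative, not actual Relativity field names):

```python
from collections import defaultdict

# Hypothetical documents with BD ExactDupSetID / BD ExactDupStatus values.
docs = [
    {"id": "DOC1", "set_id": "E1", "status": "pivot"},
    {"id": "DOC2", "set_id": "E1", "status": "duplicate"},
    {"id": "DOC3", "set_id": None, "status": "unique"},  # unique docs have no set
]

# Group documents into their exact-duplicate sets by shared set ID.
groups = defaultdict(list)
for d in docs:
    if d["set_id"] is not None:
        groups[d["set_id"]].append(d["id"])
print(dict(groups))  # {'E1': ['DOC1', 'DOC2']}
```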

BD IsExactPivot

Identifies the original document against which all exact duplicates were compared.

This field can be removed from overlay. Relativity users will not know which document is considered to be the original against which all documents are compared to identify exact text duplicates if this field is removed from the overlay.

BD IsNearDupPivot

Identifies the original document against which all near duplicates were compared.

This field can be removed from overlay. Relativity users will not know which document is considered to be the original against which all documents are compared to identify near duplicates if this field is removed from the overlay.

BD NearDupSimilarityScore

Contains the near duplicate similarity score for near-duplicate documents.

The score is a number between the near-duplicate threshold (0.8 by default) and 1.0. It is calculated based upon all fields in the schema marked as `analyzed=true`. Note that this true/false setting is not controlled through the UI and should not be altered without consulting Reveal/Brainspace support.

This field can be removed from the overlay to your third-party review platform. If it is removed, users will not know how similar a near-duplicate document is to its original document.
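In review workflows, this score is often used to triage how aggressively near duplicates can be batch-coded. A hypothetical filter keeping only documents very close to their pivot (the field key and cutoff below are illustrative):

```python
# Hypothetical near-duplicate records with a BD NearDupSimilarityScore value
# in the default range [0.8, 1.0].
near_dups = [
    {"id": "DOC1", "bd_near_dup_similarity_score": 0.82},
    {"id": "DOC2", "bd_near_dup_similarity_score": 0.97},
]

# Keep only documents at or above a stricter cutoff than the 0.8 default.
very_close = [d for d in near_dups if d["bd_near_dup_similarity_score"] >= 0.95]
print([d["id"] for d in very_close])  # ['DOC2']
```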

BD Languages

A semi-colon delimited list of the languages potentially within a document.

This field can be removed from overlay. Relativity users will not know what mix of languages are contained within a document if this field is removed from the overlay.

BD NearDupSetID

Unique identifier for each near-duplicate group. Documents that are near duplicates of one another are grouped together using this ID.

This field can be removed from overlay. Relativity users will not know which documents belong to the same near duplicate set if this field is removed from the overlay. Users will also not be able to propagate coding decisions to near-duplicate documents in Relativity.

BD NearDupStatus

This field identifies the status of a document with regard to its near-duplicate state.

It will be populated with one of three values:

  • unique: This document is not considered a near duplicate of any other document.

  • duplicate: This document is considered a near duplicate of another document.

  • pivot: This document is the original document of which other documents are listed as duplicates.

This field can be removed from overlay. Relativity users will not know whether this document is considered to be a near duplicate if this field is removed from the overlay.

Note

By default, two documents are considered to be near duplicates if they share 80 percent of their text shingles in common.
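The shingle comparison can be pictured as the overlap between each document's set of word n-grams. This sketch uses Jaccard similarity over word trigrams; Brainspace's actual shingling parameters and similarity measure are not specified in this guide, so treat this only as an intuition for the 80 percent threshold:

```python
def shingles(text, k=3):
    """Word k-shingles of a text (illustrative; k=3 is an assumption)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(a, b, k=3):
    """Fraction of shingles two texts share (Jaccard similarity)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0
```

Two identical texts score 1.0; texts with no shingles in common score 0.0. Under the default Brainspace threshold, a pair scoring at or above roughly 0.8 would be treated as near duplicates.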

BD Primary Language

The primary (or dominant) language identified within a document.

This field can be removed from overlay. Relativity users will not know the primary language identified within a document if this field is removed from the overlay.

BD RelatedSetID

Identifies the first parent cluster that is normal (not an exact duplicate or near duplicate). Directly correlates to ClusterID in Brainspace. This field identifies which documents are highly similar in terms of content but not similar enough to be considered near duplicates. Documents that are highly similar but not quite near duplicates are assigned the same BD RelatedSetID.

This field can be removed from overlay. Relativity users will not be able to organize batches and perform review on documents that are highly similar if this field is removed from overlay.

BD Summary

A summary of the document using six words or phrases. For near duplicates, this field will have the six terms or phrases that best distinguish this document from the pivot. For pivots, this field will have the six terms or phrases that best represent this document.

This field can be removed from overlay. Relativity users will not have a high-level summary of every document if this field is removed from overlay.

BDID

Brainspace unique identifier for every document ingested. Brainspace assigns BDIDs so that every BDID is adjacent to its most similar document (e.g., BD_000000001, BD_000000002, with enough zeros for 999 million documents; the zeros maintain string sort order). Sorting documents by BDID therefore places highly related documents next to each other, which expedites the review process.

This field allows Relativity users to sort documents when batching for review so that the documents within Relativity review batches are highly similar to one another in terms of content and vocabulary. Sorting by this field when creating batches will force “like” documents to be included in the same review batch. This has been proven to accelerate document review by as much as 90 percent.

This field can be removed from overlay. Relativity users will not be able to take advantage of this field and sorting feature if this field is removed from overlay.
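The zero padding is what keeps BDIDs in string sort order: nine digits covers the stated 999 million documents. A small sketch (the helper function is illustrative, not part of Brainspace):

```python
def bdid(n):
    """Format a sequential number as a zero-padded BDID string."""
    return f"BD_{n:09d}"

# Zero padding makes lexicographic order match numeric order.
ids = [bdid(2), bdid(10), bdid(1)]
print(sorted(ids))  # ['BD_000000001', 'BD_000000002', 'BD_000000010']
```

Without the padding, "BD_10" would sort before "BD_2" as a string, breaking the adjacency property that makes BDID useful for batching.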

Predictive Coding Overlay Fields

BDPC Auto Code

For a predictive coding (PC) classifier (model), this field contains the recommended coding decision for every document. This Relativity field is only populated when the user closes the active PC session in Brainspace by clicking “Close Session.”

This field cannot be removed from overlay. If the user chooses not to close out the PC session, this field will remain blank or null in Relativity.

BDPC Control Set

This field identifies all the documents that are included in the control set (model). This field is only populated when using Brainspace’s predictive coding (PC) workflow.

This field cannot be removed from overlay. This Relativity field will only be populated if the user creates a control set for a PC session in Brainspace.

BDPC Is Responsive

This is the coding field used to code documents in Relativity that will be used to train a Brainspace classifier. This field is only populated when using Brainspace’s predictive coding workflow.

This field cannot be removed from overlay. The field will only be populated if the user applies this field in Relativity to code documents for a Brainspace PC session.

BDPC Needs Review

This field identifies all the documents in Relativity that need to be reviewed for a given Brainspace training round. This field is only populated when using Brainspace’s predictive coding (PC) workflow.

This field cannot be removed from overlay. This Relativity field will only be populated if the user creates a PC session in Brainspace and creates a control set or training round.

BDPC Predictive Rank

This field contains the most recent predictive rank. This field is only populated when using Brainspace’s predictive coding (PC) workflow. This Relativity field will only be populated if the user creates a Brainspace PC session.

This field cannot be removed from overlay. This field is populated and then updates each time the user runs a PC training round in Brainspace.

BDPC Use for Training

This field identifies which documents will be used for training the model.

This field cannot be removed from overlay. This field is populated and then updates each time the user runs a PC training round in Brainspace.

CMML Overlay Fields

BD CMML ## Score Relativity Field Name

This field is only populated when using Brainspace’s CMML workflow. This Relativity field will only get populated if the user creates a Brainspace CMML classifier where a “Connect Tag” (Relativity coding field) was used to train the classifier. A “BD CMML ## Score Relativity Field Name” field will be created in Relativity to store the predictive rank for that classifier where ## is the corresponding CMML classifier ID in Brainspace and “Relativity Field Name” is the name of the Relativity field connected to the classifier in Brainspace.

This field cannot be removed from overlay. This field is populated and then updated each time the user runs a training round in Brainspace for a CMML classifier. Multiple CMML classifiers can be created and run concurrently if more than one issue must be investigated.
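The naming pattern described above can be sketched as simple string composition. Whether Brainspace zero-pads the classifier ID is not stated here, so the format below is an assumption for illustration only:

```python
def cmml_score_field(classifier_id, relativity_field):
    """Compose the overlay field name per the pattern
    'BD CMML ## Score Relativity Field Name' (padding is an assumption)."""
    return f"BD CMML {classifier_id} Score {relativity_field}"

print(cmml_score_field(7, "Responsive"))  # 'BD CMML 7 Score Responsive'
```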

Relativity Plus Configuration Options

Ingest Batch Size

The number of documents to be retrieved by each HTTP export request from Relativity.

Analytics Overlay Batch Size

The number of documents to be sent by each HTTP overlay request to Relativity.

Embed Native Viewer URL

Whether or not a Relativity Document URL should be generated, per document.

HTTP(s) Request Timeout

The maximum number of milliseconds that any given HTTP request will wait for Relativity to respond.

Maximum HTTP(s) Requests Per Second

The maximum number of HTTP requests that the Brainspace application will send to Relativity, per second.

Validate User Facing URLs

Whether or not the Brainspace application should verify the base document URL and OAuth URLs.

API Query Page Size

The number of objects (documents, Relativity Workspaces, saved searches, fields) that should be retrieved when querying the Relativity API, per request.

Document Condition Size Limit

The number of documents that will be used for the optimized incremental ingest query.
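The batch-size and rate-limit settings above interact: together they bound how quickly a dataset can be overlaid. A rough sizing sketch with illustrative values (none of these numbers are defaults from this guide):

```python
import math

# Illustrative values only; substitute your own connector settings.
doc_count = 1_000_000
overlay_batch_size = 1_000        # Analytics Overlay Batch Size
max_requests_per_second = 4       # Maximum HTTP(s) Requests Per Second

# Each overlay request carries one batch, so the request count and the
# rate cap give a lower bound on total overlay time.
requests = math.ceil(doc_count / overlay_batch_size)
min_seconds = requests / max_requests_per_second
print(requests, min_seconds)  # 1000 250.0
```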

Brainspace Lifecycle Version Support

Beginning with Brainspace v6.3, all new features will be developed for the new Relativity Plus connector.

The classic Relativity connector has been deprecated, and Relativity will discontinue direct SQL access with its version 11.x release.

| Version | Release date | End of Standard Support | End of Critical Updates |
|---------|--------------|-------------------------|-------------------------|
| 6.5     | Apr 2 2021   | Next release + 12 months | Next release + 6 months |
| 6.4     | Dec 23 2020  | Apr 2 2022              | Oct 2 2021              |
| 6.3     | June 8 2020  | Dec 23 2021             | June 23 2021            |
| 6.2     | Apr 18 2019  | June 8 2021             | Dec 8 2020              |

Brainspace Language Support

Brainspace’s patented algorithms work with all tokenized languages. The analytics experience is dramatically improved by adding stop words for most common business languages, including the ability to automatically detect terms and phrases, group documents and cluster terms on the Cluster Wheel, and execute concept searches for each supported language. Brainspace also automatically detects primary and secondary languages within each document and provides a set of fields that store the language-detection information.

Note

Brainspace identifies languages for a document and then applies language-specific stop words to documents. The Common stop-word list is empty by default. You can create a custom stop-word list and upload it to Common if you want certain stop words to be applied to all languages. For example, Brainspace does not provide a stop-word list for Estonian. If you have a large Estonian document population, it might be useful to upload an Estonian stop-word list to Common; however, any tokens that overlap with other languages will be applied to those languages as well. For example, if the word “face” is a stop word in Estonian, that word will also be stopped in English documents.

The following is a support summary of all Brainspace 6 languages. For languages that only have identification support, Brainspace still provides the following analysis:

  • Tokenize a document using space as the separator between terms (English-based).

  • Use n-gram phrase detection.

  • Index original token along with an English-based normalization token.

    Note

    This can at times lead to inconsistent results.

Phrase detection using parts of speech is generally more meaningful than n-gram detection because Brainspace has tailored detection to the specific language by leveraging parts of speech. Phrase detection using n-grams is statistically based and does not incorporate language-specific customization.
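To make the contrast concrete, n-gram phrase detection amounts to extracting contiguous word sequences and keeping the statistically frequent ones. A minimal sketch of the extraction step (Brainspace's actual implementation is more sophisticated; this only illustrates the idea):

```python
def bigrams(text):
    """Extract contiguous two-word sequences (2-grams) from a text."""
    words = text.split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

print(bigrams("supply chain risk report"))
# ['supply chain', 'chain risk', 'risk report']
```

A parts-of-speech approach would instead keep only sequences matching grammatical patterns (e.g., adjective + noun), which requires a language-specific tagger.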

The following table describes the level of support that Brainspace 6 provides for different languages. In addition to default stop words, you can upload custom stop words to any language included in the Languages and Stop Words pane (see Manage Stop Words).

Note

Of the supported languages listed below, Chinese and Icelandic are the only two that do not get stemming, lemmatization, or other language-specific handling when indexing.

Table 1. Feature Support

Language

Language Identification

Stop Words

Phrase Detection (Parts of Speech)

Phrase Detection (n-gram)

Entity Extraction

Albanian

x

Arabic

x

x

x

x

x

Bengali

x

Bulgarian

x

Catalan

x

Chinese

x

x

x

x

x

Croatian

x

Czech

x

x

x

Danish

x

x

Dutch

x

x

x

x

English

x

x

x

x

Estonian

x

Finnish

x

x

French

x

x

x

x

German

x

x

x

x

Greek

x

x

x

Gujarati

x

Hebrew

x

x

x

x

Hindi

x

Hungarian

x

x

x

Icelandic

x

x

Indonesian

x

x

Italian

x

x

x

x

Japanese

x

x

x

x

Kannada

x

Korean

x

x

x

x

Kurdish

x

Latvian

x

Lithuanian

x

Macedonian

x

Malay

x

x

Malayalam

x

Norwegian

x

x

Pashto

x

x

Persian

x

x

x

x

Polish

x

x

x

Portuguese

x

x

x

x

Romanian

x

x

x

Russian

x

x

x

x

Serbian

x

Slovak

x

Slovenian

x

Somali

x

Spanish

x

x

x

x

Swedish

x

x

Tagalog

x

Tamil

x

Telugu

x

Thai

x

Turkish

x

Ukrainian

x

Urdu

x

x

x

x

Uzbek

x

Vietnamese

x

x