Reveal Help Center

Getting Started: Supervised Learning

Supervised Learning

Supervised Learning is a type of machine learning whereby the user manually identifies positive and negative examples and the machine uses those training examples to train a predictive model. After training, the machine will assign predictive ranks to each document in the Dataset. These ranks can be used to determine which documents are “more like” what the user is looking for (positive examples).

Predictive Model

A Predictive Model is a series of terms and weights that the machine uses to identify positive and negative document examples.
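
To make "terms and weights" concrete, here is a minimal sketch of how a weighted-term model could turn a document into a score between 0 and 1. It is an illustration only: the terms, the weights, the `predictive_rank` function, and the use of a logistic function are all assumptions made for this example, not a description of how Brainspace computes ranks.

```python
import math

# Hypothetical term weights (not taken from Brainspace).
# Positive weights push a document toward "positive",
# negative weights push it toward "negative".
MODEL_WEIGHTS = {
    "merger": 1.8,
    "confidential": 1.2,
    "lunch": -1.5,
    "newsletter": -2.0,
}
BIAS = 0.0  # assumed starting point when no known term appears


def predictive_rank(text: str) -> float:
    """Return an illustrative score in (0, 1) for a document."""
    tokens = text.lower().split()
    # Sum the weight of every known term; unknown terms contribute 0.
    raw = BIAS + sum(MODEL_WEIGHTS.get(token, 0.0) for token in tokens)
    # Logistic function squashes the raw sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-raw))


if __name__ == "__main__":
    for doc in ("confidential merger", "lunch newsletter", "unrelated words"):
        print(doc, round(predictive_rank(doc), 3))
```

In this sketch, a document containing only positively weighted terms scores close to 1, one containing only negatively weighted terms scores close to 0, and a document with no known terms lands at the midpoint.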

There are two specific Supervised Learning workflows supported by Brainspace.

  • Continuous Multimodal Learning (CMML) is the most common of the two workflows and provides users with the ability to create a Predictive Model with no extra effort.

  • The other workflow, Predictive Coding, is not covered in this document because it has a very specific use case in the Legal industry.

Continuous Multimodal Learning (CMML)

Continuous Multimodal Learning is a Supervised Learning workflow that allows you to pursue lines of inquiry using a lightweight set of features. You can leverage information discovered across all visualizations to drive training of predictive models. The result is a fast-moving workflow where machine learning accelerates discovery rather than getting in the way.

Brainspace uses the positive (relevant or important) and negative (not relevant or not important) examples identified by the user to train the machine to automatically find additional positive and negative documents. Each time you train the machine to find positive examples (the information you are most interested in), you are creating a Training Round. Typically, you will want to run a Training Round each time you find additional positive or negative examples.

To identify which documents are positive and which are negative, the system assigns a Predictive Rank to each document after each Training Round. The higher the rank, the more likely the document is Relevant or important to your investigation. The lower the rank, the more likely the document is Not Relevant or not important. Documents with Predictive Ranks above 0.90 are considered the most relevant or important to your investigation or analysis. See diagram below.

Diagram – Predictive Ranks

image1.png
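
The rank bands described in this article can also be written as a small lookup. The sketch below assumes ranks fall between 0 and 1 and uses the two cut-offs this article mentions (0.90 for the most relevant documents and 0.50 as the split between high and low ranks). The function name `describe_rank`, the label strings, and the handling of a rank of exactly 0.50 are assumptions made for illustration.

```python
def describe_rank(rank: float) -> str:
    """Map a Predictive Rank to a plain-language band (illustrative labels)."""
    if not 0.0 <= rank <= 1.0:
        # Assumption: ranks are expressed on a 0-to-1 scale.
        raise ValueError("rank must be between 0 and 1")
    if rank > 0.90:
        return "most relevant"
    if rank > 0.50:
        return "likely relevant"
    if rank < 0.50:
        return "likely not relevant"
    # Exactly 0.50 is not addressed in the article; this label is assumed.
    return "neutral"


if __name__ == "__main__":
    for rank in (0.95, 0.70, 0.50, 0.10):
        print(rank, describe_rank(rank))
```
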
  • After you have tagged your positive and negative document examples, click on the Supervised Learning tab in the upper right-hand corner to begin building your predictive model.

    Note

    The system requires ONLY a single positive document example to begin training a predictive model. It does NOT require any negative examples. In the course of analyzing and reviewing your data, you are likely to come across additional positive and negative examples that you can elect to include in additional Training Rounds.

    image2.png
  • To create a new predictive model, click on New Classifier in the upper right-hand corner.

    image3.png
  • Enter a name for your predictive model. Typically, you would want to name your predictive model so that it matches the Tag you used to identify the positive and negative examples (if you have any negative examples).

    image4.png
  • Select the Positive choice that is associated with the Tag you’re using to train the Classifier by clicking on “Select a Positive Choice”.

  • The Negative choice will automatically be associated after you select the Positive choice.

  • Now click on Save to save your new CMML predictive model.

  • You should now see the CMML Dashboard screen (see example below).

    image5.png
  • The CMML Dashboard is where you train your model and evaluate the results of each Training Round. You will see that in this example we have reviewed and tagged just 1 document. There is 1 positive and 0 negative document examples.

  • You will also notice that the total number of documents in the Dataset minus the number of documents tagged appears in the upper right-hand corner of the dashboard. Since we tagged 1 document for our first Training Round, there are 968,113 documents in the Dataset that have not yet been tagged.

  • The graph in the middle of the screen is a guide to help you assess the progress of training the predictive model. Each time you run a Training Round, the system will plot the number of positive examples out of the total number of documents tagged. In the example above, you can see that 1 document was tagged in this first Training Round. With each Training Round, a new plot point will appear in the chart. When the line remains “flat” across multiple Training Rounds (see example below), you can consider training of your predictive model complete.

  • To run the first Training Round, click on the Train Now button in the lower left-hand corner of the dashboard.

    image6.png
  • Each Training Round will run for several minutes, depending upon the number of positive and negative examples contained within the round. While the Training Round is running, a Round Status will appear where the Train Now button was located.

    image7.png
  • When the Training Round is complete, the dashboard will automatically update and appear like the screen below.

    image8.png
  • How to interpret the Predictive Ranks – To reiterate from earlier, the higher the rank, the more likely the document is to be Relevant or important to your investigation. The system applies high ranks (above 0.50) to the documents that are most similar in content to the positive examples you tagged as “Relevant”. The lower-ranked documents (below 0.50) are most similar in content to the examples you tagged as Not Relevant, if you tagged any.

  • The Document List shown in the dashboard lists ALL of the untagged documents (43,042,324 documents) in Predictive Rank order, with the highest-ranked document listed first. This is the most critical feature to understand within the CMML Dashboard and the biggest benefit of CMML. The system has automatically sorted your remaining untagged documents in Predictive Rank order, which means the most Relevant documents have risen to the top of this very large pile of documents.

    image9.png
  • You should notice immediately that the documents that have risen to the top of the document pile in the Document List are the highest ranked.

  • At this point, you can begin reviewing the high-ranked documents by clicking on the first document in the Document List and using the Tag Layout in the upper right-hand corner to tag the document as either Relevant or Not Relevant. Remember, the system has assigned a Predictive Rank to help quickly guide you to the most Relevant documents. It’s up to you to confirm whether those high-ranking documents are actually Relevant or not.

    image10.png
  • As you tag a document as Relevant (positive) or Not Relevant (negative), the system will automatically navigate you to the next highest-ranking document in the Document List. You can continue to review the high-ranking documents until you either come across several consecutive negative (Not Relevant) examples or find whatever you’re looking for. If you come across consecutive Not Relevant documents but haven’t yet found the evidence or facts you were looking for, you can run another Training Round with the newly tagged documents and continue training the Predictive Model.

  • To run another Training Round, click on the X in the upper right-hand corner of the full document viewer to close the document and return to the CMML Dashboard.

  • You should see the Train Now button appear now that you have some additional tagged documents that can be used to train the predictive model. Click Train Now to run another training round.

    image11.png
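
The bookkeeping behind the steps above can be sketched in a few lines: the untagged count shown on the dashboard, the Document List ordered by Predictive Rank, and the "flat line" check on the training graph. This is a rough illustration, not Brainspace code. The function names, the document IDs, and the rank values are hypothetical, and the three-round window used to decide that the line is "flat" is an assumption (the steps above only say "multiple Training Rounds"). The total of 968,114 documents is derived from the dashboard example (968,113 untagged plus 1 tagged).

```python
from typing import Dict, List, Set


def untagged_count(total_documents: int, tagged: Set[str]) -> int:
    """Documents in the Dataset that have not yet been tagged."""
    return total_documents - len(tagged)


def review_queue(ranks: Dict[str, float], tagged: Set[str]) -> List[str]:
    """Untagged document IDs, highest Predictive Rank first."""
    untagged = [doc_id for doc_id in ranks if doc_id not in tagged]
    return sorted(untagged, key=lambda doc_id: ranks[doc_id], reverse=True)


def is_flat(positives_per_round: List[int], window: int = 3) -> bool:
    """True when the last `window` plotted values are identical.

    The window size is an assumed stand-in for "multiple Training Rounds".
    """
    if len(positives_per_round) < window:
        return False
    recent = positives_per_round[-window:]
    return max(recent) == min(recent)


if __name__ == "__main__":
    # Hypothetical ranks for four documents after one Training Round.
    ranks = {"doc-a": 0.97, "doc-b": 0.12, "doc-c": 0.64, "doc-d": 0.91}
    tagged = {"doc-a"}  # the single positive example tagged so far

    print(untagged_count(968_114, tagged))
    print(review_queue(ranks, tagged))
    print(is_flat([1, 4, 9, 9, 9]))
```

With these sample values, the review queue starts with doc-d, followed by doc-c and doc-b, which mirrors how the Document List puts the highest-ranked untagged documents first.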