Extracting mathematical constraints from Credit Agreements — Can GPT4 do it?

8 min readFeb 14, 2023

Understanding Credit Agreements?

A credit agreement is a crucial document that outlines all the details of a loan agreement between a borrower and a lender. It serves as a legally binding contract that captures the lending terms such as the annual interest rate, interest calculation method, associated fees, loan duration, payment terms, and consequences for late payments. The credit agreement must be written in a manner that leaves no room for interpretation, which contributes to the convoluted nature of the clauses used to define these terms. The complexity of the language used in credit agreements arises from the need for a precise and unambiguous definition of loan terms. The precise definition of loan terms ensures that both the borrower and the lender are aware of their respective obligations and rights. Let’s have a look at one of these complex clauses in detail and see how we can convert them into a machine-readable format.

Pre-LLM Era!

Identifying Elements

The first step in this process is to recognize the individual components that represent the structure of the constraint. This includes the Relation, such as “not to exceed”, the Operator, “the greater of” and “the lesser of”, the Governors, such as “Consolidated Total Assets” and “Consolidated Adjusted EBITDA” and the Objects, such as “$100,000,000”, “11.5%”, “45%” and “$200m”. All these elements can be extracted by simple methods like dictionary spotting or by the utilization of more complex sequence taggers that have been trained for this purpose. We can train multiple individual sequence taggers, or jointly train a sequence tagger which models all the above elements together. We can use language models which are pretrained on public legal text or we can pre-train a language model specifically on credit agreements from scratch to further improve the accuracy of our sequence tagger. There are multiple ways to proceed further, which one should we choose? When faced with this decision, the recommended course of action is to quickly establish a baseline solution and iteratively enhance it.

Conducting a preliminary analysis of the data can yield valuable insights for the modelling of our solutions. For instance, when an object is identified, it is known that there must also be a governor present. And if the object is a percentage, the recognition can benefit from joint inference by a sequence tagger. As a baseline solution, implementing a straightforward dictionary spotting method for the governing factors and utilizing regular expressions for the objects can prove to be effective. However, determining temporality through dictionary spotting is not a straightforward task, and thus, it may be necessary to train a dedicated sequence tagger for this purpose.

Identifying Relations between them

Once the individual elements are successfully extracted, it is crucial to determine the relationship or interdependencies between them. The objects must be assigned to the operators and the operators must be linked to a relation. This task can be accomplished through the training of a relation extraction model, either by creating our features and utilizing a traditional random forest-based classifier, or by implementing a more complex relation extraction model based on a BERT base architecture. A crucial aspect to consider is the level at which the objects reside, whether they are nested or exist at the same level. In some instances, the text may contain markers such as “(x)”, “(y)”, or “(z)” that provide valuable signals for the algorithm. However, not all cases may be explicitly marked. For example, the contents of node “z” may not be indicated, and there may be two elements within that node that are not segregated by the markers. To address this, a classifier can be trained to make binary decisions for each of these questions.

Once the structure of the constraint has been detected, it is crucial to normalize it and link it to a relevant knowledge base. This involves normalizing the Provision and Type, which have a limited number of possible values. This information can be obtained from the business desks. In addition, the title stack, or the section hierarchy of the document, can be utilized to add additional context for the Provision and Type classifier.

Behold the power of LLMs!

Large language models like GPT-4 and PALM-2, trained on a diverse range of internet text, are incredibly adept at understanding and generating human-like text. These models can handle tasks ranging from drafting emails to writing code to answering trivia questions. And importantly for our context, they can also help parse and interpret the complexities of credit agreements.

Parsing Credit Agreements with AI

GPT-4 can be effectively used to understand the constraints and dependencies of negative covenants in credit agreements. It can identify and categorize the individual elements and their hierarchical structure.

For the example in consideration, it can accurately identify ‘x’, ‘y’ and ‘z’ as three separate conditions, it identifies that ‘z’ is made up of 2 sub-conditions, understand their relationship with the ‘greater of’ and ‘lesser of’ operators, and categorize the ‘Consolidated Adjusted EBITDA’ and ‘Consolidated Total Assets’ terms correctly in a zero-shot setting.

Prompt: An example of a clause present in the Negative Covenant Section of Credit Agreements — “Indebtedness of the borrower in an aggregate outstanding principal amount at any time outstanding not to exceed the greater of x) $100,000,000 y) 11.5% of Consolidated Total Assets (determined at the time of incurrence) and z) the lesser of 45% of Consolidated Adjusted EBITDA (for the most recent fourth fiscal quarter period) and $200 million.”. Pay attention to the details around the use of starts of enumerated lists for representing information hierarchically. Can you tell me the elements that are part of the above constraint and represent them in JSON LD?

{
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "constraintType": "http://example.com/constraintType",
    "elements": "http://example.com/elements",
    "FixedLimit": "http://example.com/FixedLimit",
    "limit": "http://example.com/limit",
    "label": "http://example.com/label",
    "PercentageLimit": "http://example.com/PercentageLimit",
    "percentage": "http://example.com/percentage",
    "of": "http://example.com/of",
    "determinationTime": "http://example.com/determinationTime",
    "LesserOf": "http://example.com/LesserOf",
    "subElements": "http://example.com/subElements",
    "determinationPeriod": "http://example.com/determinationPeriod"
  },
  "constraintType": "greatestValue",
  "elements": [
    {
      "@type": "FixedLimit",
      "limit": "100000000",
      "label": "x"
    },
    {
      "@type": "PercentageLimit",
      "percentage": "11.5",
      "of": "consolidatedTotalAssets",
      "determinationTime": "At the time of incurrence",
      "label": "y"
    },
    {
      "@type": "LesserOf",
      "label": "z",
      "subElements": [
        {
          "@type": "PercentageLimit",
          "percentage": "45",
          "of": "consolidatedAdjustedEBITDA",
          "determinationPeriod": "For the most recent fourth fiscal quarter period",
          "label": "a"
        },
        {
          "@type": "FixedLimit",
          "limit": "200000000",
          "label": "b"
        }
      ]
    }
  ]
}

Can we have named elements?

After iterating for a couple of rounds with feedback on the named elements and asking it to incorporate them into the main relation, instead of adding them separately at the end, here is what we get.

Prompt1:The individual components that represent the structure of the constraint are the Relation, such as “not to exceed”, the Operator, “the greater of” and “the lesser of”, the Governors, such as “Consolidated Total Assets” and “Consolidated Adjusted EBIDTA” and the Objects, such as “$100,000,000”, “11.5%”, “45%” and “$200m”. Can you now regenerate your answer by taking this into account ?

Prompt2: I don’t want to see the list of governors and objects separately. Devise a way to incorporate it into the main relation.

{
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "relation": "http://example.com/relation",
    "operator": "http://example.com/operator",
    "governor": "http://example.com/governor",
    "object": "http://example.com/object"
  },
  "relation": "not to exceed",
  "object": {
    "operator": "greatestValue",
    "elements": [
      {
        "@type": "FixedLimit",
        "limit": "100000000",
        "label": "x"
      },
      {
        "@type": "PercentageLimit",
        "percentage": "11.5",
        "governor": "consolidatedTotalAssets",
        "determinationTime": "At the time of incurrence",
        "label": "y"
      },
      {
        "@type": "operator",
        "operator": "LesserOf",
        "label": "z",
        "elements": [
          {
            "@type": "PercentageLimit",
            "percentage": "45",
            "governor": "consolidatedAdjustedEBITDA",
            "determinationPeriod": "For the most recent fourth fiscal quarter period",
            "label": "a"
          },
          {
            "@type": "FixedLimit",
            "limit": "200000000",
            "label": "b"
          }
        ]
      }
    ]
  }
}

Can we create an Ontology for our extractions?

Prompt: Can you now generate an ontology which supports the extractions that you made and is extensible to incorporate the negative covenant constraints/clauses present in Credit agreements ?

The screenshot by the author — Screenshot from ChatGPT’s response

How about a new negative covenant itself?

Prompt: Can you now generate a negative covenant constraint using the Ontology, knowledge about extractions, and the feedback given to you on Assets for a Credit agreement that could be signed between Goldman Sachs and Stripe in text?

All this is good, but how can the business benefit from this?

To enhance accessibility and improve efficiency, the extracted terms can be displayed in a user-friendly web interface. This can present the terms in a clear tabular format, saving analysts time and resources. Furthermore, powerful semantic search functionality can be applied to multiple credit agreements, enabling users to perform advanced queries such as finding all terms with a capital lease exceeding $50 million or locating tech companies with a leverage maintenance covenant. The system can also provide the capability to analyze a set of multiple companies and determine if they contain a cap on cost savings.

Extensibility and Future Possibilities

The ontology-based approach we’ve adopted is extensible, allowing us to define additional classes, properties, and relationships as needed to capture the diversity of constraints in credit agreements. As the AI’s understanding evolves, it can handle increasingly complex clauses and covenants.

Moreover, the approach is not limited to credit agreements. Any legal document with similarly structured constraints could potentially be parsed and understood by the AI in the same way.

Conclusion

While AI technology will never replace the expertise and judgement of financial analysts and legal professionals, it can significantly aid in their work, automating the parsing of complex legal documents and flagging potential issues. This not only saves time and reduces the likelihood of human error, but also allows professionals to focus on more strategic tasks.

As the field of AI continues to evolve, we can only anticipate the emergence of even more powerful tools for legal and financial analysis, reshaping the landscape of these industries.