While helping its customers deal with the new variety and volatility of data sources, CRIF has identified several questions that commonly arise when planning a new ML classification project. Here are some of them:

HOW MUCH DATA IS NEEDED TO CREATE A CATEGORISATION ENGINE?

This is the most frequently asked question, and the answer is not straightforward because it depends on many factors. The highly regulated nature of the data used by the categorisation process makes things even more difficult.

In general, to create a categorisation engine, the data sample must be “representative”:

  • It should contain all the types of cases that need to be detected. For example, transaction data at the end of the year is completely different from a mid-year sample: pensions/salaries/mortgages/taxes have a natural internal periodicity that the model needs to capture.

  • Transaction proportions and amounts must be properly sampled over the whole period you want to cover.

  • It must cover the different types of customers that the bank has, so that the algorithm is not biased towards either high-spending or low-spending customers.

An initial statistical check is recommended to confirm that the data meets these requirements, so that the engine performs as expected in each possible scenario.
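The kind of initial check described above can be sketched as follows. This is a minimal illustration, not CRIF's procedure: the field names (`date`, `amount`, `segment`), the helper names and the `min_share` threshold are all assumptions made for the example.

```python
from collections import Counter
from datetime import date


def shares(transactions, key):
    """Return the fraction of transactions for each value of key(t)."""
    counts = Counter(key(t) for t in transactions)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def thin_months(transactions, min_share=0.02):
    """List calendar months whose share of transactions is below min_share.

    min_share is a hypothetical threshold chosen for illustration only.
    """
    by_month = shares(transactions, lambda t: t["date"].month)
    return [m for m in range(1, 13) if by_month.get(m, 0.0) < min_share]


# Hypothetical toy sample: only January, June and December are present.
sample = [
    {"date": date(2023, 1, 27), "amount": 1800.0, "segment": "low_spending"},
    {"date": date(2023, 6, 16), "amount": -950.0, "segment": "high_spending"},
    {"date": date(2023, 12, 22), "amount": -60.0, "segment": "low_spending"},
]

print(thin_months(sample))                     # months missing from the sample
print(shares(sample, lambda t: t["segment"]))  # balance across customer types
```

The same `shares` helper can be applied to amount buckets or any other field, which makes it easy to compare the sample against the distribution expected in production.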

HOW MANY TRANSACTIONS IS A CATEGORISATION ENGINE ABLE TO ENRICH? WHAT PERCENTAGE OF THE RESULTS ARE CORRECT?

Evaluating performance is essential for the continuous improvement of the categorisation engine. CRIF has therefore put a lot of effort into studying and defining state-of-the-art metrics to inspect every corner of the system’s algorithm. It has presented a summary of the most important metrics for multiclass classification problems to the scientific community (for more information, see the CRIF paper Metrics for Multiclass Classification: an Overview) and developed accountability tools to study the algorithm.

Among all the metrics, the two most important KPIs from a business perspective are Coverage and Accuracy:

  • Model Coverage is the percentage of transactions that can be classified by the model. The Coverage level is normally higher than 95% and is measured using the most recent production data. The transactions that cannot be categorised are mostly those for which the description and other fields leveraged by the categorisation engine are empty or filled with random strings or series of numbers.

  • Model Accuracy is the percentage of transactions classified in the most appropriate category included in the Taxonomy. The CRIF Categorisation Engine Accuracy level is higher than 90% and is also measured using the most recent production data. It’s important to remember that a realistic top performance value is around 93%-94% due to the ambiguous nature of the data used by the model.
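The two KPIs above can be computed from a labelled sample as in the sketch below. Two assumptions are made for illustration: an unclassified transaction is represented by `None`, and Accuracy is measured only over the transactions the model was able to classify (the text does not state which denominator is used).

```python
def coverage_and_accuracy(predictions, labels):
    """Compute (coverage, accuracy) for a labelled sample.

    predictions: predicted category per transaction, or None if unclassified.
    labels: the reference category per transaction.
    Assumption: accuracy is computed over classified transactions only.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    covered = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(covered) / len(predictions) if predictions else 0.0
    accuracy = sum(p == y for p, y in covered) / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical toy data: one transaction is unclassified, one is misclassified.
predicted = ["groceries", "transportation", None, "taxes", "groceries"]
expected = ["groceries", "transportation", "salary", "transportation", "groceries"]

coverage, accuracy = coverage_and_accuracy(predicted, expected)
print(f"coverage={coverage:.0%} accuracy={accuracy:.0%}")  # coverage=80% accuracy=75%
```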

DOES A CATEGORISATION ENGINE REQUIRE MAINTENANCE?

Transaction data, by its very nature, is constantly evolving: new merchants enter the market every day, and spending habits can change dramatically (think of the impact of the pandemic on food deliveries and, more generally, online shopping). For this reason, the categorisation engine should not be thought of as a static model, but as a product that needs to be constantly tuned and maintained to keep a high level of performance. CRIF models are frequently monitored and fine-tuned: this constant evolution allows the algorithms used by the categorisation engine to be kept at the cutting edge of technology.
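One simple signal that a model may need retuning is how much of the recent data it has never seen before. The sketch below is an illustrative monitoring check, not a description of CRIF's tooling: it measures the share of tokens in recent transaction descriptions that did not appear in the training descriptions. The function name and the whitespace tokenisation are assumptions.

```python
def unseen_token_rate(training_descriptions, recent_descriptions):
    """Fraction of tokens in recent descriptions absent from the training data."""
    known = {tok for d in training_descriptions for tok in d.lower().split()}
    recent = [tok for d in recent_descriptions for tok in d.lower().split()]
    if not recent:
        return 0.0
    return sum(tok not in known for tok in recent) / len(recent)


# Hypothetical descriptions: a new delivery merchant appears after training.
training = ["ACME MARKET", "CITY TAXI"]
recent = ["ACME MARKET", "QUICKBITE DELIVERY"]

print(unseen_token_rate(training, recent))  # 0.5
```

A rising rate over successive periods suggests that new merchants or new description formats are entering the data and that retraining is worth considering.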

WHAT IS THE BEST ANALYTICS APPROACH TO CLASSIFYING A BANKING TRANSACTION USING A CATEGORISATION ENGINE?

At first glance, rule-based classification systems appear more effective: they offer certainty about the results and full explainability. In practice, defining these rules and their hierarchy is not an easy task: a rule that maps the keyword “tax” to the “taxes” category could incorrectly categorise “taxi” as a tax instead of transportation. A rule-based system also raises performance issues, since rules must be processed one by one until a match is found, so the more rules there are, the greater the computation time.
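The “tax” versus “taxi” pitfall is easy to reproduce. The sketch below contrasts a naive substring rule with an ordered list of word-boundary rules; the rule set and category names are hypothetical, and the loop also shows why evaluation cost grows with the number of rules.

```python
import re


def naive_rule(description):
    """Substring rule: 'tax' also matches inside 'taxi'."""
    return "taxes" if "tax" in description.lower() else None


# Ordered rules, checked one by one until the first match (hypothetical set).
RULES = [
    (re.compile(r"\btaxis?\b", re.IGNORECASE), "transportation"),
    (re.compile(r"\btax(es)?\b", re.IGNORECASE), "taxes"),
]


def ordered_rules(description):
    """Return the category of the first matching rule, or None."""
    for pattern, category in RULES:  # cost grows linearly with len(RULES)
        if pattern.search(description):
            return category
    return None


print(naive_rule("CITY TAXI 0042"))         # taxes (wrong)
print(ordered_rules("CITY TAXI 0042"))      # transportation
print(ordered_rules("Income tax payment"))  # taxes
```

Word boundaries fix this particular case, but every new ambiguity requires another hand-written rule and a decision about where it sits in the hierarchy.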

A machine learning model can differentiate between ambiguous cases by using the other elements of the transaction, such as the description and the amount. Therefore, better classification results can be achieved when the available structured data is limited. In addition, artificial intelligence allows automation and scaling of the solution, with continuous learning over hundreds of millions of transactions, which would be impossible with human-defined rules alone.

Finally, CRIF’s experience over the past few years suggests that the most effective approach is a hybrid one: rules are more effective when rich metadata is available and can be used to uniquely associate a category with a specific value of a variable, while machine learning excels when only limited, unstructured information is available.
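A hybrid pipeline of this kind can be sketched as a deterministic lookup followed by a model fallback. Everything in the example is an assumption made for illustration: the use of a merchant category code (`mcc`) as the rich metadata field, the `toy_model` stand-in for a trained classifier, and the `min_confidence` threshold. It does not describe the internal design of the CRIF engine.

```python
# Deterministic rules: one metadata value maps to exactly one category.
MCC_RULES = {"5411": "groceries", "4121": "transportation"}


def toy_model(description):
    """Stand-in for a trained classifier: returns (category, confidence)."""
    if "pizza" in description.lower():
        return "restaurants", 0.9
    return "other", 0.2


def categorise(txn, model=toy_model, min_confidence=0.5):
    """Return (category, source): try rules first, then the ML model."""
    mcc = txn.get("mcc")
    if mcc in MCC_RULES:
        return MCC_RULES[mcc], "rule"
    category, confidence = model(txn.get("description", ""))
    if confidence >= min_confidence:
        return category, "ml"
    return None, "uncovered"  # left unclassified rather than guessed


print(categorise({"mcc": "4121", "description": "RIDE 0042"}))  # rule
print(categorise({"description": "Pizza place downtown"}))      # ml
print(categorise({"description": "0000123"}))                   # uncovered
```

Returning the source alongside the category keeps the rule-based decisions fully explainable while making it easy to measure how much of the traffic each component handles.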

HOW IS ARTIFICIAL INTELLIGENCE USED IN THE CRIF CATEGORISATION ENGINE?

  • The CRIF Categorisation Engine uses a hybrid combination of machine learning (ML) and a rules engine (RE) to understand and interpret the information contained in a banking transaction. The ML core does most of the work, leaving the rules to deal with specific and deterministic cases.

  • The ML algorithms used during the training phase are based on supervised learning techniques, which learn to produce predictions from a series of examples that are provided to the algorithm initially and then progressively. The learning process requires a training phase involving a user whose task is to read, understand and manually assign a category to a transaction. The user is guided through the process using an active learning approach that minimises the human effort required.

  • After this training phase, the algorithm can interpret the transactions and predict the category to which they should be assigned, i.e., classify them.
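The text does not say which active learning strategy is used. One common choice is least-confidence sampling, sketched below: the transactions on which the current model is least sure are sent to the human labeller first, so that each manual label is as informative as possible. The function name and the data layout are assumptions made for the example.

```python
def least_confident(probabilities, k):
    """Return the k transaction ids with the lowest top-class probability.

    probabilities: dict mapping transaction id -> {category: probability}.
    """
    ranked = sorted(probabilities, key=lambda i: max(probabilities[i].values()))
    return ranked[:k]


# Hypothetical model outputs for three unlabelled transactions.
model_output = {
    "t1": {"groceries": 0.95, "transportation": 0.05},
    "t2": {"groceries": 0.55, "transportation": 0.45},
    "t3": {"groceries": 0.70, "transportation": 0.30},
}

print(least_confident(model_output, 2))  # ['t2', 't3']
```

After the user labels the selected transactions, the model is retrained and the selection is repeated, which is how the examples end up being provided progressively.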

The CRIF Categorisation Engine is made up of two separate components:

  • Categorisation Trainer: the component responsible for the machine learning training

  • Categorisation Classifier: the component responsible for making the model available for the production environment