
A Clinician's Guide to Machine Learning

John Persson, DDS

Co-Founder

Introduction

Artificial intelligence is already changing our world more rapidly than anyone would have predicted 10 years ago. While some industries may have the luxury of "moving quickly and breaking things," the pace of innovation poses a problem for clinicians and the developers of clinical software. After all, our patients' health outcomes are literally in our hands. It falls on the clinician to balance embracing modern techniques with continuing to use time-tested methods. However, I can write with confidence that we are now at the point where AI can be easily and safely implemented in a dental practice. Today's AI can enhance the skill and knowledge clinicians possess while also reducing the impact of their deficiencies. AI is not here to replace anyone. Rather, it is here to make treatment more consistent for clinicians and patients alike.

SugarBot is an AI application developed by Bench7 that helps dentists identify decay on bitewing images; we have many other modules in the works. SugarBot was trained by having a team of expert, US-based dentists laboriously hand-label a large collection of images. Every time you use the application, you are taking advantage of a crystallized form of that knowledge, delivered in real time. This article provides a general foundation for understanding clinical AI, and we will often use SugarBot as an example case.

The purpose of this article is not to convince the reader to subscribe to SugarBot. Rather, it is to provide a clinician-focused guide to how today's computer vision / machine learning applications work. By the time you're done reading, you will have a high-quality foundation for educated discussion on the topic. We hope you implement AI of some kind into your practice, even if it isn't SugarBot. However, we also hope our commitment to patient privacy, dedication to open knowledge, ease of use, and price offer a compelling option.

Now, let's get geeky.

Defining AI, machine learning, and deep learning

Modern techniques allow us to teach computers to perform tasks classically reserved for humans. Traditional computer programming involves a developer writing explicit instructions using a pre-defined programming language. This works well for tasks like managing a database or defining the rules of a video game. However, things get fuzzy with tasks like radiographic interpretation. AI allows us to program computers using data instead of explicit instructions. Essentially, we provide lots of example input and lots of example expected output, and the computer develops a model that matches the expected output as closely as it can. This is conceptually similar to plotting data from an experiment on a graph and using a spreadsheet program to find a "best fit" line through the imperfect data points. However, rather than plotting a straight line on a 2-dimensional graph, the computer finds a "best fit" model for abstract computable functions like "identify decay on a bitewing" or "produce a new image from a text prompt."
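As a concrete (and purely illustrative) version of the "best fit" analogy, here is a short sketch that fits a line to a handful of made-up data points. The numbers and the two-parameter "model" are hypothetical; real clinical models learn millions of parameters, but through the same basic loop of matching example output.

```python
# A minimal sketch of "programming with data": fitting a best-fit line
# to noisy points, the 2-dimensional analogue of model training.

# Hypothetical experiment data: inputs and noisy outputs, roughly y = 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

# "Training": ordinary least squares gives the slope and intercept
# that best match the example outputs.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# "Inference": apply the fitted model to a new input.
def predict(x):
    return slope * x + intercept

print(f"learned model: y = {slope:.2f}x + {intercept:.2f}")
print(f"prediction at x=5: {predict(5.0):.2f}")
```

Notice that the slow, data-hungry step (finding the parameters) is separate from the cheap step (using them on a new input), which is exactly the training/inference split discussed below.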

You will see these concepts discussed as "artificial intelligence" ("AI"), "machine learning," and "deep learning." These terms have distinct definitions in academia: deep learning is a subset of machine learning, which is a subset of AI, which is itself a subset of data science. In everyday usage, however, the terms are used more or less interchangeably.

You will also hear people discuss model "training" and model "inference." Training is the process outlined above. It is relatively slow and takes powerful computers to complete. Inference is using that trained model for practical applications. It is generally much less computationally expensive. Whenever a user interacts with an AI model, they are using it for inference.

Generative vs. deterministic AI models

Broadly speaking, AI models can be classified as either generative or deterministic. Generative models include large language models and diffusion models, which have deservedly received tremendous buzz in recent years. Large language models (LLMs) power human-like chatbots such as ChatGPT. Diffusion models can generate unique images or videos from short text prompts. The output of these models is rapidly improving, but they have an inherent flaw from the clinician's perspective: their output is unpredictable. This isn't necessarily a bad thing, depending on your use case. If you ask ChatGPT to write you a poem about a butterfly 10 times, it will produce 10 unique poems. However, we don't want such randomness and unpredictability in a clinical setting.

Deterministic models, on the other hand, are designed to produce the same output given the same input. While these models lack the perceived creativity of generative models, they are far more useful in domains where predictability and reliability are essential. Deterministic models are used in tasks like image classification, object detection, and segmentation. These models have the added benefit of generally being smaller and less computationally expensive than generative models. They can often be run on a laptop or desktop computer without the need for a powerful GPU.
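To make the distinction concrete, here is a toy sketch contrasting the two behaviors. Every name, threshold, and value is invented for illustration: the deterministic function returns the same answer for the same input every time, while the generative function samples randomly and can differ between calls.

```python
import random

def deterministic_model(pixel_values):
    # Toy "classifier": flags possible decay when mean intensity drops
    # below a fixed threshold. Same input -> same output, every time.
    return sum(pixel_values) / len(pixel_values) < 100

def generative_model(prompt):
    # Toy "generator": output depends on random sampling, so the same
    # prompt can yield a different result on each call.
    return prompt + ": " + random.choice(["poem A", "poem B", "poem C"])

patch = [90, 110, 95, 88]  # hypothetical pixel intensities
print(deterministic_model(patch) == deterministic_model(patch))  # True
```

Run the deterministic model a thousand times on the same radiograph and you get the same finding a thousand times, which is the property clinical software needs.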

Assembling data and undergoing the training process

Perhaps the most important step in developing a high-quality model is assembling a good dataset. In the case of computer vision tasks, the dataset is composed of two portions: the images and the labels. In order to have confidence that the resulting model will generalize and perform well in the real world, it is important that the images are diverse and representative. Often, the easiest and most accessible sources of data do not meet these criteria. In the case of SugarBot, we needed to assemble a dataset that was representative in terms of age, ethnicity, gender, and sensor type used for acquisition. Furthermore, we needed lots of images showing diverse scenarios where decay occurs. This included recurrent, occlusal, class V, and incipient class II lesions.

Once the images are assembled, they need to be labeled. In the case of SugarBot, this was done by a team of expert dentists. Just as the images needed to be diverse, the annotators needed to be diverse in terms of training and education. SugarBot was trained using a supervised learning approach: for every image in the dataset, an expert dentist laboriously labeled individual pixels. While more time-efficient techniques are available, we believe that this approach is still the most accurate and reliable.
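For intuition, here is what a pixel-level label looks like in miniature. Both the tiny "radiograph" and its mask are invented for illustration; real bitewings and their masks are full-resolution images, but the pairing is the same.

```python
# A toy 4x4 "radiograph" of pixel intensities (hypothetical values).
image = [
    [12, 40, 38, 15],
    [44, 200, 210, 20],
    [41, 205, 198, 18],
    [10, 35, 30, 12],
]

# The expert's label for that image: 1 = decay pixel, 0 = not decay.
# In supervised segmentation, every image ships with a mask like this.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

labeled_pixels = sum(sum(row) for row in mask)
print(labeled_pixels)  # 4 pixels marked as decay in this toy example
```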

Once the dataset is prepared, it is time to train the model. The team has a variety of parameters to define prior to training, including the model architecture, number of epochs, loss function, augmentation techniques, and more. While a discussion of each of these is beyond the scope of this article, suffice it to say that these are knobs and levers that can be nudged to create the best model possible.
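These knobs are often collected into a configuration like the sketch below. Every name and value here is hypothetical, chosen only to show what the levers look like; none of it reflects SugarBot's actual settings.

```python
# A hypothetical training configuration illustrating the "knobs and levers."
training_config = {
    "architecture": "u-net",       # model architecture for segmentation
    "epochs": 50,                  # passes over the full training set
    "loss_function": "dice",       # how prediction error is scored
    "learning_rate": 1e-4,         # how aggressively weights are updated
    "augmentations": [             # synthetic variety added to the images
        "horizontal_flip",
        "brightness_jitter",
        "slight_rotation",
    ],
}

print(training_config["architecture"], training_config["epochs"])
```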

At training time, the dataset is split into training, validation, and test sets. The training set comprises the images and labels shown to the model. The validation set is used during training as an unbiased look at the model's performance. The test set is used once training is finished to assess the final model's performance. It is critical that these sets are high quality and do not share similar images.
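Here is a minimal sketch of such a split, using made-up image IDs and a common (but by no means universal) 80/10/10 ratio. The shuffle is seeded so the split is reproducible.

```python
import random

image_ids = [f"bitewing_{i:03d}" for i in range(100)]  # hypothetical IDs

random.Random(42).shuffle(image_ids)  # seeded, so the split is repeatable

train = image_ids[:80]           # shown to the model during training
validation = image_ids[80:90]    # unbiased check while training runs
test = image_ids[90:]            # held out for final evaluation

# Critically, no image may appear in more than one set.
assert len(set(train) | set(validation) | set(test)) == 100

print(len(train), len(validation), len(test))  # 80 10 10
```

Note that this sketch only prevents exact duplicates across sets; in practice, near-duplicates (e.g., two bitewings of the same patient) must also be kept in the same set, or the validation and test scores will be optimistically biased.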

The sales pitch

If you've made it this far, congratulations. You are now equipped to understand and evaluate AI products for yourself and decide which, if any, suit the needs of your practice. I hope you will consider SugarBot. It uses on-device processing, which ensures that your patients' data won't be warehoused in the cloud. It can be implemented in less than 15 minutes without a consultant to set it up. To my knowledge, it is the most affordable solution on the market today.

End sales pitch. Thanks for reading.