Mar 12, 2024
In this tutorial, I will show you step by step how to display Optical Character Recognition (OCR) results in React. This article only covers the rendering aspect of OCR; we will use Azure AI for the text extraction itself. The resources used in this article can be downloaded from these links:
Before we start, I want to introduce a concept called the bounding box. A bounding box is a rectangle that surrounds an object at a certain position. As an example, you can see bounding boxes that mark the extracted text on a fake credit card number below. We will display these bounding boxes from an extracted OCR result.
Source: OCR Demo from Azure AI
For simplicity, I will use an example OCR result from Azure AI. You can try their demo page to get familiar with their OCR service. The structure of an OCR result may vary depending on the service or library you are using, but the main concept still applies. An OCR result usually consists of an array of lines. Each line object contains words, each with its bounding box or boundingPolygon position. We will extract the bounding box from each word object and ignore the other fields for now.
Here is the corresponding TypeScript type:
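A minimal sketch of the types (only the fields used in this tutorial are modeled; the real Azure AI response contains more):

```typescript
// Minimal shape of the OCR result used in this tutorial.
type Point = {
  x: number;
  y: number;
};

type Word = {
  content: string;
  boundingPolygon: Point[]; // corner points of the word's bounding box
};

type Line = {
  content: string;
  words: Word[];
};

type OcrResult = {
  lines: Line[];
};
```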
For the first step, let's create an image component to display the credit card image and add some styling to it. Import the credit card image file and set it as the image's src attribute.
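A sketch of such a component (the file path and class names are placeholders, not from the original project):

```jsx
import creditCard from './assets/credit-card.png'; // placeholder path

export default function App() {
  return (
    <div className="wrapper">
      {/* The credit card image we will draw bounding boxes over */}
      <img src={creditCard} alt="Credit card" className="image" />
    </div>
  );
}
```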
The result will be like this:
Add a canvas element and style it with absolute positioning. We also need relative positioning on the canvas's parent element (the wrapper class). This ensures the canvas sits at the same position as the image. Make sure the canvas element is the same size as the image.
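One way to sketch this layout with inline styles (the width and height values below are placeholders; they should match your actual image size):

```jsx
<div className="wrapper" style={{ position: 'relative' }}>
  <img src={creditCard} alt="Credit card" />
  {/* The canvas overlays the image; top/left pin it to the same origin */}
  <canvas
    style={{ position: 'absolute', top: 0, left: 0 }}
    width={1000}  // placeholder: use your image's width
    height={600}  // placeholder: use your image's height
  />
</div>
```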
To manipulate the canvas element, we need the useRef hook. useRef will be used to access the canvas element's context and draw the bounding box rectangles. Assign useRef to a variable named canvasRef and initialize its value with null. After that, set the canvas element's ref attribute to the canvasRef variable we created.
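A sketch of the ref wiring (the image import is a placeholder, as before):

```jsx
import { useRef } from 'react';
import creditCard from './assets/credit-card.png'; // placeholder path

export default function App() {
  // canvasRef.current will point to the <canvas> DOM node after mount
  const canvasRef = useRef(null);

  return (
    <div className="wrapper" style={{ position: 'relative' }}>
      <img src={creditCard} alt="Credit card" />
      <canvas ref={canvasRef} style={{ position: 'absolute', top: 0, left: 0 }} />
    </div>
  );
}
```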
We will use the useEffect hook for the drawing mechanism. Add useEffect and declare a variable named context to store the canvas context. The getContext() method returns an object that provides methods for drawing.
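Sketched inside the component body, that looks like:

```jsx
import { useEffect, useRef } from 'react';

// Inside the component:
const canvasRef = useRef(null);

useEffect(() => {
  // getContext('2d') returns the 2D drawing API for the canvas
  const context = canvasRef.current.getContext('2d');
  // drawing calls will go here
}, []);
```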
Let's step back and recall the structure of our OCR result. Store the OCR result in a separate file named ocrResult.js and import it into our main component. To get the bounding box of the first word, we can access result.lines[0].words[0].boundingPolygon. A line can consist of multiple words, so to draw the bounding box for each word, we need to iterate over both the lines and the words. Our useEffect will look like this:
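Here the drawing loop is factored into a plain helper (the name drawBoundingBoxes is mine) so it can be exercised without a browser; useEffect then calls it with the canvas context and the imported OCR result:

```typescript
type Point = { x: number; y: number };
type OcrWord = { boundingPolygon: Point[] };
type OcrLine = { words: OcrWord[] };

// Structural subset of CanvasRenderingContext2D, so the helper also
// accepts a stand-in object outside the browser.
interface StrokeContext {
  strokeStyle: unknown;
  lineWidth: number;
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  closePath(): void;
  stroke(): void;
}

// Trace one closed path per word, iterating over lines and words.
function drawBoundingBoxes(context: StrokeContext, lines: OcrLine[]): void {
  context.strokeStyle = 'LawnGreen';
  context.lineWidth = 2;

  for (const line of lines) {
    for (const word of line.words) {
      const [first, ...rest] = word.boundingPolygon;
      context.beginPath();
      context.moveTo(first.x, first.y);
      for (const point of rest) {
        context.lineTo(point.x, point.y);
      }
      context.closePath(); // close the path back to the first point
      context.stroke();    // paint the path
    }
  }
}

// In the component:
//   useEffect(() => {
//     const context = canvasRef.current.getContext('2d');
//     drawBoundingBoxes(context, ocrResult.lines);
//   }, []);
```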
context.strokeStyle = 'LawnGreen' is used to set the stroke color.
context.lineWidth = 2 is used to set the line width.
The canvas drawing process starts by calling the beginPath() method to begin a drawing path. The moveTo() method moves to the first bounding box point. The lineTo() method draws along the path, and we call this method multiple times to draw all the points. Don't forget to call the closePath() method, because we need to close the current path before starting to draw another bounding box for a different word. After that, we call the stroke() method to paint all the paths. We will end up with the result below.
That's it, we have drawn the OCR result in React.
There are many ways to improve how you present the OCR result. As you can see below, a demo from the Google Cloud OCR service displays the result in four different ways:
Fields
Some documents, like identity cards or invoices, have a defined structure. With the fields view, you can see that the result follows the document structure. For example, we have currency filled with $ and due_date filled with March 17, 2024.
Key Value Pairs
This displays the result based on detected key-value pairs.
Tables
You can also display the result in a table format. This is helpful if the document contains many tables.
Detected sentences/words
This option detects the sentences in your document. Usually, each sentence also contains words and their corresponding bounding boxes.
Looking at these examples from Google Cloud OCR, you can see that we can also add interactivity. For example, when the user's mouse hovers over a bounding box, we can display a label containing the corresponding text, ready to be copied. The opposite also applies: when the user selects content in the fields view, we can highlight the corresponding bounding box on the OCR result.