Using AWS Textract with React

Mohit Kumar Srivastava
4 min read · Feb 28, 2023


This article is a guide to extracting text from an image using AWS Textract in a React application.

Recently I had an opportunity to work on AWS Textract.

Amazon Textract is a machine learning service provided by Amazon Web Services (AWS) that extracts text and data from documents and images. This is often called OCR (optical character recognition).

In this article, we will try to implement a simple React Application in which we will be able to upload an image and get all the text in that image.

Since we are using an AWS service, you will need an AWS account. Once you have one, generate an accessKeyId and a secretAccessKey from the AWS console.

To set up our React app, we will be using Vite.

You can set up your app using Vite by following this guide or by using create-react-app.

We will use React as the framework and TypeScript as the variant.

We will need to clear the file src/App.tsx, which has some template code.

We will be using the @aws-sdk/client-textract package, so we run the command:

npm i @aws-sdk/client-textract

We start by adding input for uploading the picture.

<div>
  <input
    className="inputfile"
    id="file"
    type="file"
    name="file"
    onChange={onSelectFile}
  />
</div>

Next we add a function that reads the selected image. It is called from the onChange handler of our input. We also need a state for storing the source URL (a data URL) of the uploaded image.

// Data URL of the selected image
const [src, setSrc] = useState("");

const onSelectFile = (e: React.ChangeEvent<HTMLInputElement>) => {
  if (!e.target.files || e.target.files.length === 0) {
    return;
  }

  const reader = new FileReader();
  const file = e.target.files[0];

  // readAsDataURL yields a string of the form "data:<mime type>;base64,<payload>"
  reader.onload = function (upload: ProgressEvent<FileReader>) {
    setSrc(upload?.target?.result as string);
  };
  reader.readAsDataURL(file);
};
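Before reading the file, it can help to reject files that Textract is unlikely to accept. The following is a minimal sketch, not part of the original app: `checkImageFile`, `ACCEPTED_TYPES` and `MAX_BYTES` are hypothetical names, and the 5 MB limit is an assumption that should be checked against the current Textract quotas.

```typescript
// Minimal shape of what we need from a browser File object.
interface SelectedFile {
  type: string; // MIME type, e.g. "image/png"
  size: number; // size in bytes
}

// Assumption: we only allow JPEG and PNG images in this app.
const ACCEPTED_TYPES = ["image/jpeg", "image/png"];
// Assumption: 5 MB limit for documents sent as bytes; verify against AWS quotas.
const MAX_BYTES = 5 * 1024 * 1024;

// Returns an error message, or null when the file looks acceptable.
function checkImageFile(file: SelectedFile): string | null {
  if (ACCEPTED_TYPES.indexOf(file.type) === -1) {
    return "Unsupported file type: " + (file.type || "unknown");
  }
  if (file.size > MAX_BYTES) {
    return "File is too large: " + file.size + " bytes";
  }
  return null;
}
```

Inside onSelectFile, this could be called as `checkImageFile(file)` right before `reader.readAsDataURL(file)`, returning early when it reports an error.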

Now that we have our file, we will write a function to process this image.

We need the following imports:

import { useState } from "react";
import { DetectDocumentTextCommand, TextractClient } from "@aws-sdk/client-textract";

We add a data state to store the result returned by the function. The response contains a Blocks list, and each block that carries text exposes it in its Text property:

const [data, setData] = useState([]);

const onRunOCR = async () => {
  // Nothing to process until an image has been selected
  if (!src) {
    return;
  }

  const client = new TextractClient({
    region: 'YOUR_AWS_REGION',
    credentials: {
      accessKeyId: 'YOUR_ACCESS_KEY_ID',
      secretAccessKey: 'YOUR_SECRET_ACCESS_KEY',
    },
  });

  // Drop the "data:...;base64," prefix and decode the payload into bytes
  const bytes = Buffer.from(src.split(',')[1], 'base64');

  // DetectDocumentText only takes the document itself;
  // FeatureTypes (TABLES, FORMS) belongs to AnalyzeDocument
  const params = {
    Document: {
      Bytes: bytes,
    },
  };

  const command = new DetectDocumentTextCommand(params);
  try {
    const response = await client.send(command);
    // Keep the returned blocks so we can render their text
    if (response?.Blocks) {
      setData(response.Blocks as []);
    }
  } catch (error) {
    console.error('Textract request failed', error);
  }
};

Note: we use Buffer here, which is a Node.js API that is not available in the browser and is not polyfilled by Vite, so we need to install the buffer package:

npm i buffer

and import it as:

import { Buffer } from "buffer";
globalThis.Buffer = Buffer;
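If you would rather not add the polyfill, the same decoding can be sketched with the built-in atob function. This is an alternative to the approach above, not what the original app does, and `dataUrlToBytes` is a hypothetical helper name.

```typescript
// Decode the base64 payload of a data URL into raw bytes without Buffer.
function dataUrlToBytes(dataUrl: string): Uint8Array {
  const commaIndex = dataUrl.indexOf(",");
  if (commaIndex === -1) {
    throw new Error("Not a data URL: missing comma separator");
  }
  // Everything after the first comma is the base64 payload
  const binary = atob(dataUrl.slice(commaIndex + 1));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

With this helper, the request could pass `Bytes: dataUrlToBytes(src)` instead of the Buffer value.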

Finally, we add a button to call the function above and map over data to show the text found in the image.

<div>
  <button onClick={onRunOCR} style={{ margin: "10px" }}>
    Run OCR
  </button>
</div>

<div>
  {data?.map(
    (
      item: {
        // Not every block carries text (a PAGE block does not)
        Text?: string;
      },
      index
    ) => {
      return (
        <span
          key={index}
          style={{ margin: "2px", padding: "2px" }}
        >
          {item.Text}
        </span>
      );
    }
  )}
</div>
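One thing to keep in mind: DetectDocumentText returns PAGE, LINE and WORD blocks, so rendering every block shows each piece of text twice (once as part of a line and once as a word). A minimal sketch of keeping only the lines is shown here. `TextBlock` and `extractLines` are hypothetical names, and the interface is a reduced stand-in for the SDK's Block type.

```typescript
// Reduced stand-in for the fields of a Textract block that we use here.
interface TextBlock {
  BlockType?: string;
  Text?: string;
}

// Keep only LINE blocks and return their text, in the order Textract returned them.
function extractLines(blocks: TextBlock[]): string[] {
  return blocks
    .filter((block) => block.BlockType === "LINE" && typeof block.Text === "string")
    .map((block) => block.Text as string);
}
```

The result is a plain string array, so it can be rendered with a simple map or joined with newlines.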

Our application is now ready for action.

You can see the final code here: GitHub repo

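It is considered bad practice to keep secrets in client-side code. Moving them to a .env file keeps them out of version control, but any variable that Vite exposes to the client still ends up in the bundle, where anyone can read it. It would be more secure to call Textract from a backend (where something like AWS Secrets Manager can hold the keys) or to give the browser temporary credentials, for example through Amazon Cognito.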

You can always make this app better by adding stricter type checking, moving the configuration to an env file, showing a preview of the selected file, displaying a loading indicator while the image is being processed, and so on.

We can now upload an image and list the text inside the image.

We can further add more functionalities to this, like the ability to copy the text generated or have a CSV of all the texts.
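As a rough sketch of the CSV idea, the function below turns a list of extracted text lines into a two-column CSV string. `linesToCsv` and `escapeCsvField` are hypothetical helpers, and the "line,text" header is just one possible layout.

```typescript
// Quote a field when it contains a comma, a quote or a line break,
// doubling any quotes inside it (the usual CSV escaping rule).
function escapeCsvField(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return '"' + value.replace(/"/g, '""') + '"';
  }
  return value;
}

// Build a CSV with a 1-based line number and the text of each line.
function linesToCsv(lines: string[]): string {
  const rows = lines.map((line, index) => (index + 1) + "," + escapeCsvField(line));
  return ["line,text"].concat(rows).join("\r\n");
}
```

For the copy feature, the browser's `navigator.clipboard.writeText` can take the same lines joined with newlines.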

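We can also use the relationships between the blocks to present the data as tables, or as forms in which each field is shown with its value. For that we would call AnalyzeDocumentCommand, which accepts FeatureTypes such as TABLES and FORMS, instead of DetectDocumentTextCommand.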
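To illustrate what following those relationships could look like, here is a sketch that pairs form keys with their values. It assumes the blocks come from an AnalyzeDocument call with the FORMS feature, where KEY_VALUE_SET blocks link to their value through a VALUE relationship and to their words through CHILD relationships. `FormBlock`, `BlockRelationship` and `extractKeyValues` are hypothetical names with reduced stand-in types, not the SDK's own.

```typescript
interface BlockRelationship {
  Type?: string;
  Ids?: string[];
}

// Reduced stand-in for the fields of a Textract block used below.
interface FormBlock {
  Id?: string;
  BlockType?: string;
  EntityTypes?: string[];
  Text?: string;
  Relationships?: BlockRelationship[];
}

// Resolve the blocks referenced by relationships of the given type.
function relatedBlocks(block: FormBlock, type: string, byId: Map<string, FormBlock>): FormBlock[] {
  const result: FormBlock[] = [];
  for (const rel of block.Relationships ?? []) {
    if (rel.Type !== type) continue;
    for (const id of rel.Ids ?? []) {
      const target = byId.get(id);
      if (target) result.push(target);
    }
  }
  return result;
}

// Join the text of the WORD children of a block.
function childText(block: FormBlock, byId: Map<string, FormBlock>): string {
  return relatedBlocks(block, "CHILD", byId)
    .filter((child) => child.BlockType === "WORD")
    .map((child) => child.Text ?? "")
    .join(" ");
}

// Pair every KEY block with the text of the VALUE block(s) it points to.
function extractKeyValues(blocks: FormBlock[]): { key: string; value: string }[] {
  const byId = new Map<string, FormBlock>();
  for (const block of blocks) {
    if (block.Id) byId.set(block.Id, block);
  }
  const pairs: { key: string; value: string }[] = [];
  for (const block of blocks) {
    const isKey =
      block.BlockType === "KEY_VALUE_SET" &&
      (block.EntityTypes ?? []).indexOf("KEY") !== -1;
    if (!isKey) continue;
    const value = relatedBlocks(block, "VALUE", byId)
      .map((valueBlock) => childText(valueBlock, byId))
      .join(" ");
    pairs.push({ key: childText(block, byId), value });
  }
  return pairs;
}
```

Selection elements such as checkboxes are ignored in this sketch; only WORD children contribute to the text.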

Further reading

We can scan invoices, bills, documents, medical prescriptions, and ingredients on the back of a packaged food item. The possibilities are endless.

What application would you create with AWS Textract?
