Smart Receiving: A Warehouse Automation Solution
In This White Paper
Warehouse automation is the process of using sensors, robots and AI technologies to automate tasks of moving inventory into, within and out of warehouses. Among the many benefits of warehouse automation are increased efficiency and accuracy, better resource utilization, and improved employee satisfaction.
World Wide Technology (WWT) receives over 6 million components of technology equipment across our global warehouses each year. WWT's Data Science team and Supply Chain Process Improvement team worked together on a Smart Receiving solution that uses computer vision to help operators validate material at receipt. The current receiving process is largely manual: operators check the items and quantities on the packing slip attached to the package and verify them against the purchase order (PO) stored in the database.
The new solution uses Optical Character Recognition (OCR) technology to extract important shipment information from the packing slips and a matching algorithm to automatically validate that information against the PO database. The operators can then quickly go through a list of pre-matched items and quantities upon receiving a package. In this paper, we lay out the key steps of our application, from image pre-processing and text recognition to fuzzy matching.
In recent years, a newer receiving process has been used in WWT's warehouses: the Advanced Shipment Notice (ASN) receipt. An ASN is the digital form of a packing slip provided by the vendor, which allows us to validate shipments in advance. However, not all vendors provide ASNs, and those that do may use a format that does not comply with our system. In fact, only 58 percent of the packages WWT receives have an ASN, and among those that do, about 30 percent fail the matching process against the PO data.
The goal of this project is to apply OCR technology to quickly digitize packing slips upon receiving them. Applying the OCR method is like generating our own "ASN" data; it is independent of vendors' compliance status. Thus, it reduces receiving time and cost on average, as the ASN receipt has proven to be four times faster than manual receipts and saves $78 on average per order.
Methodology and experimental setup
The idea of OCR dates back to the early 1900s, with efforts to help visually impaired people read. Advancements also brought text-to-tactile sensations in 1921 and text-to-Morse in 1951. Eventually, OCR made its way into airports and stores as passport and price tag scanners in the 1980s. Today, OCR is incorporated into many applications with use cases across various industries. With the help of machine learning and computer vision, OCR models can be trained to recognize over 200 languages and can provide over 99 percent accuracy on typed characters from high-quality images. That said, the performance of an OCR model depends on the quality of the images, the fonts and the languages involved. To improve the outcome of an OCR model, image pre-processing and language post-processing are often suggested.
A typical process flow with OCR technology involves image/document acquisition, image processing, text extraction and conversion of free text to structured data (see Figure 2).
On the front end of the smart receiving process, packing slips attached to the packages are scanned or photographed. These images are then processed by our algorithm, which applies several image processing steps before the OCR engine and a few text processing steps afterwards to extract important shipment information, including PO numbers, item serial numbers and quantities.
On the back end, shipment information in PO data is pulled from the database using the PO numbers. A fuzzy matching algorithm is applied to match items from both ends.
In our experiment, we tested sample packing slips from five different vendors. The template of packing slips varies for each vendor. All the packing slips were scanned in as black and white images. The images were in PNG or PDF format.
Image processing was done using OpenCV and Pillow. For the OCR engine, we used pytesseract, the Python wrapper for Google's Tesseract-OCR Engine. Regular expressions (the "re" Python module) were used to extract useful information from the recognized text. The FuzzyWuzzy package was used for fuzzy matching between the text extracted from the images and the PO data from the database. See appendix for the full list of Python packages used.
Key steps and results
These are the three high-level steps in our algorithm (see Figure 3):
- Image preprocessing - To avoid output inaccuracy from Tesseract we need to preprocess images. This includes binarization, noise removal, rescaling and removing skewness.
- Text extraction - For our use case, we used the pretrained Tesseract algorithm to extract text from the image, then used regex to extract relevant information.
- Text matching - Extracted text is matched with purchase order data to verify the shipment.
Image preprocessing is the first step in the process after image acquisition. This step is crucial because the accuracy of the OCR engine depends on it. The goal of preprocessing is to increase the contrast between text characters and the background. We can break down the steps for image pre-processing into the following categories.
Rescaling and noise removal
We need to scale the images to a larger size so that smaller characters can be recognized. To resize the image, we used the resize function from OpenCV with the INTER_CUBIC interpolation method. To remove noise, i.e., objects with a small number of pixels, we performed morphological operations: image dilation followed by erosion. After that, we applied a Gaussian blur to smooth the image (see Figure 4).
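The effect of dilation followed by erosion can be illustrated without OpenCV. The following is a minimal pure-Python sketch, assuming a grayscale page stored as a list of rows with dark text (0) on a light background (255). The helper names `_window_filter`, `dilate`, `erode` and `remove_specks` are hypothetical and stand in for `cv2.dilate` and `cv2.erode`; resizing and blurring are not shown.

```python
def _window_filter(img, pick, k=1):
    """Apply `pick` (max or min) over a (2k+1) x (2k+1) window around each pixel."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - k), min(h, y + k + 1))
                for xx in range(max(0, x - k), min(w, x + k + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out

def dilate(img, k=1):
    # Grows bright regions, so small dark specks are swallowed by the background.
    return _window_filter(img, max, k)

def erode(img, k=1):
    # Shrinks bright regions back, restoring the thickness of surviving strokes.
    return _window_filter(img, min, k)

def remove_specks(img, k=1):
    """Dilation followed by erosion, as described in the text."""
    return erode(dilate(img, k), k)

# Synthetic example: a thick dark stroke plus one isolated dark pixel of noise.
page = [[255] * 10 for _ in range(8)]
for y in range(2, 6):
    for x in range(4, 9):
        page[y][x] = 0
page[1][1] = 0
cleaned = remove_specks(page)
```

In this sketch, a dark shape survives only if it is at least as large as the window, which suggests one reason to upscale the image first: thin strokes in small text would otherwise be removed along with the noise.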
Removing skewness from the images was crucial, as the Tesseract engine fails to extract information when a skewed image is passed as input. We used the projection profile method to remove skewness. The steps are as follows:
1. Convert the image to binary
2. Calculate the sums of pixels across rows of the image
3. Retrieve the count of rows where the sum is zero (all black pixels)
4. Rotate the image at various angles from -5 degrees to 5 degrees with a step size of 0.5 degrees and repeat steps 2 and 3
5. Retrieve the rotation angle where the count of rows is maximum (see Figure 5)
6. Rotate the image by the rotation angle determined in step 5 to remove skewness (see Figure 6)
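The projection profile search above can be sketched in a few lines. This is a minimal pure-Python illustration, not the production code: it assumes the image is already binary (text pixels 1, background 0) and stored as a list of rows, and it uses a hypothetical nearest-neighbor `rotate` helper in place of an OpenCV or SciPy rotation. The function names are illustrative only.

```python
import math

def rotate(img, degrees):
    """Rotate a binary image about its center (nearest neighbor, black fill)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel that lands on (x, y).
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out

def blank_rows(img):
    """Count rows whose pixel sum is zero (entirely black)."""
    return sum(1 for row in img if sum(row) == 0)

def estimate_skew(img, limit=5.0, step=0.5):
    """Try each angle in [-limit, limit] and keep the one with the most blank rows."""
    n_steps = int(round(2 * limit / step))
    angles = [-limit + i * step for i in range(n_steps + 1)]
    return max(angles, key=lambda a: blank_rows(rotate(img, a)))

def deskew(img):
    """Rotate the image by the estimated correction angle."""
    return rotate(img, estimate_skew(img))

# Synthetic page: two white "text lines" (1) on a black background (0).
page = [[0] * 300 for _ in range(40)]
for top in (10, 25):
    for y in range(top, top + 4):
        for x in range(20, 280):
            page[y][x] = 1
skewed = rotate(page, 3.0)          # simulate a 3-degree scanning skew
correction = estimate_skew(skewed)  # expected to land close to -3.0
```

When the text lines are horizontal, the gaps between them form the largest number of completely black rows, which is why the blank-row count peaks at the correcting angle.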
Removing background colors
We also removed darker background colors from areas of the image such as table headers. These background colors make it harder for the Tesseract engine to read the lines just below them. To remove background colors from images, we applied median filtering using SciPy. This significantly improved the accuracy of text extraction (see Figure 7).
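The text above does not spell out how the median filter output is used. One common approach, assumed here, is to treat the median over a window wider than a text stroke as an estimate of the local background, then keep only the pixels that are clearly darker than that estimate. The sketch below is pure Python; `median_filter` is a hypothetical stand-in for a library routine such as `scipy.ndimage.median_filter`, and the `margin` value is an arbitrary illustration.

```python
from statistics import median

def median_filter(img, k=3):
    """Median over a (2k+1) x (2k+1) window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - k), min(h, y + k + 1))
                for xx in range(max(0, x - k), min(w, x + k + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out

def remove_background(img, k=3, margin=40):
    """Keep pixels much darker than the local background; whiten the rest."""
    background = median_filter(img, k)
    return [
        [0 if bg - px > margin else 255 for px, bg in zip(img_row, bg_row)]
        for img_row, bg_row in zip(img, background)
    ]

# Synthetic slip: a shaded table header (gray 120) above a white body (255),
# with a one-pixel-wide dark stroke (0) in each area.
slip = [[120] * 20 for _ in range(6)] + [[255] * 20 for _ in range(6)]
for y in range(1, 5):
    slip[y][5] = 0
for y in range(7, 11):
    slip[y][12] = 0
flattened = remove_background(slip)
```

In this toy example the gray header shading is whitened while both strokes remain black, so the header text and the body text end up with the same contrast.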
The OCR engine tends to be more accurate in identifying characters when we pass a smaller amount of information to it. For this reason, we process the image in portions. We set vendor-specific rules for cropping the image and then pass each cropped portion to the OCR engine. For example, if a vendor always puts PO numbers in the upper left area of their packing slips, the "region of interest" (ROI) for finding PO numbers will be the upper left quarter of the image. Figure 8 is an example showing how text extraction is more accurate when we define a region of interest while extracting the PO number.
In this example (Figure 8), the PO number's clue words "Customer PO Reference" and the actual PO number "38661XX" are far enough apart that the number is hard to identify and extract from the full page. By defining a smaller region of interest and zooming in to that region (see Figure 9), the PO number can be quickly extracted.
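A vendor-specific cropping rule of this kind can be expressed as a small lookup. The sketch below is illustrative only: the vendor names, the `VENDOR_PO_REGION` table and the helper functions are hypothetical, and the page size is an arbitrary example. The returned tuple follows the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
def roi_box(width, height, region):
    """Return a (left, upper, right, lower) box for a named quarter of the page."""
    half_w, half_h = width // 2, height // 2
    boxes = {
        "upper_left": (0, 0, half_w, half_h),
        "upper_right": (half_w, 0, width, half_h),
        "lower_left": (0, half_h, half_w, height),
        "lower_right": (half_w, half_h, width, height),
    }
    return boxes[region]

# Hypothetical rules: where each vendor prints the PO number.
VENDOR_PO_REGION = {
    "vendor_a": "upper_left",
    "vendor_b": "upper_right",
}

def po_roi(vendor, width, height):
    """Look up the vendor's rule and convert it to a pixel box."""
    return roi_box(width, height, VENDOR_PO_REGION[vendor])

box = po_roi("vendor_a", 2480, 3508)  # (0, 0, 1240, 1754)
```

Keeping the rules in a table rather than in code means a new vendor template would only need a new entry, under the assumption that a quarter-page granularity is fine enough.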
Following the image processing steps, the pretrained OCR model in pytesseract can already recognize text with high accuracy. However, a few more steps are required to turn the free text into structured, useful information. We use regular expressions to locate the purchase order number, serial numbers, quantities, etc. In this section we show an example of how the algorithm works. Figure 10 is a scanned packing slip with the important shipment information highlighted in red boxes.
Step 1: Extracting PO#
To extract the PO number, we first pass the image cropped to the region of interest (Fig 11-a) to the OCR engine, which extracts the free text (Fig 11-b). A regular expression then matches a pattern to get the purchase order number (Fig 11-c).
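The pattern-matching part of this step might look like the following minimal sketch. The regular expression, the function name `extract_po_number` and the sample text are all assumptions made for illustration; real patterns would be tuned to each vendor's wording and PO number format.

```python
import re

# Clue words followed by an optional separator and the PO number itself.
# The accepted clue words and the 5-12 character length are assumptions.
PO_PATTERN = re.compile(
    r"(?:Customer\s+PO\s+Reference|PO\s*(?:#|No\.?|Number))"
    r"\s*[:\-]?\s*([A-Z0-9]{5,12})",
    re.IGNORECASE,
)

def extract_po_number(ocr_text):
    """Return the first PO number found in the OCR text, or None."""
    match = PO_PATTERN.search(ocr_text)
    return match.group(1) if match else None

# Invented OCR output for a cropped region of interest.
sample = (
    "Ship To: Example Warehouse\n"
    "Customer PO Reference:  1234567\n"
    "Carrier: Example Freight"
)
po_number = extract_po_number(sample)
```

Returning `None` when nothing matches lets the caller fall back to a larger region of interest or to manual entry.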
Step 2: Extracting shipment detail
To extract the shipment detail, we first pass the image cropped to the region of interest (Fig 12-top) to the OCR engine, which extracts the free text (Fig 12-middle). Regular expressions are then applied to match a pattern and produce the structured data (Fig 12-bottom).
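As a rough illustration of turning OCR output into structured rows, the sketch below parses a line-item table with one regular expression per line. The column layout (line number, serial number, description, quantity), the pattern, and the sample text are assumptions; actual packing slip templates vary by vendor.

```python
import re

# One table row: line number, serial number, free-text description, quantity.
LINE_PATTERN = re.compile(
    r"^[ \t]*(?P<line>\d+)[ \t]+(?P<serial>[A-Z0-9][A-Z0-9-]{4,})"
    r"[ \t]+(?P<description>.+?)[ \t]+(?P<qty>\d+)[ \t]*$",
    re.MULTILINE,
)

def parse_shipment_lines(ocr_text):
    """Return one dict per table row that matches the expected layout."""
    return [
        {
            "serial": m.group("serial"),
            "description": m.group("description"),
            "quantity": int(m.group("qty")),
        }
        for m in LINE_PATTERN.finditer(ocr_text)
    ]

# Invented OCR output for the shipment detail region.
sample = "\n".join([
    "Line  Item No.     Description        Qty",
    "1     ABC-10023    Network switch     2",
    "2     XYZ-20417    Optical module     12",
    "Total items: 14",
])
items = parse_shipment_lines(sample)
```

Lines that do not fit the layout, such as the header and the total, are simply skipped rather than raising an error.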
The last step is to match the extracted data to the PO data. We use the FuzzyWuzzy Python package to calculate string similarity scores between serial numbers from the OCR engine and those from the PO data. By accepting similarity scores as low as 80 percent, we can tolerate some OCR engine errors, such as recognizing "o" as "0" or "4" as "A". For every matched serial number, the quantity is also verified. A side-by-side comparison of items and quantities is displayed with any discrepancies highlighted. Operators can make manual corrections before accepting the matches. The package is then received or rejected based on the final matches.
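The matching logic can be sketched as follows. To keep the example free of third-party dependencies, it uses `difflib.SequenceMatcher` from the standard library as a stand-in for FuzzyWuzzy's similarity score; the two are not guaranteed to give identical numbers. The function names, the dictionary layout and the sample serial numbers are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity score on a 0-100 scale (difflib stand-in for FuzzyWuzzy)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def match_items(ocr_items, po_items, threshold=80):
    """Match each OCR serial to its closest PO serial and compare quantities.

    Both arguments map serial number -> quantity. Each result is a tuple:
    (ocr_serial, matched_po_serial or None, score, quantities_agree).
    """
    results = []
    for ocr_serial, ocr_qty in ocr_items.items():
        best = max(po_items, key=lambda s: similarity(ocr_serial, s))
        score = similarity(ocr_serial, best)
        if score >= threshold:
            results.append((ocr_serial, best, score, ocr_qty == po_items[best]))
        else:
            results.append((ocr_serial, None, score, False))
    return results

# Invented data: the first serial contains an OCR error ("O" read for "0"),
# the second has a quantity discrepancy, the third is not on the PO.
po_items = {"ABC-10023": 2, "XYZ-20417": 12}
ocr_items = {"ABC-1O023": 2, "XYZ-20417": 10, "QRS-55555": 1}
matches = match_items(ocr_items, po_items)
```

Rows with no match or with disagreeing quantities are the ones an operator would be asked to review in the side-by-side comparison.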
We have outlined the concept and the basic methodology of using the OCR technique to aid the warehouse receiving process. The new process will help workers verify items in the package quickly and with less repeated effort. The algorithm is promising in terms of being robust, efficient and accurate; we tested it on packing slips from five different vendors.
The current application was developed and tested in a static test environment. We are planning to host the smart receiving application on AWS cloud infrastructure using Flask, which eases integration with purchase order data and ASN data and provides HTTPS access for end users. As next steps, we will install hardware on the receiving lines in the warehouse so that high-quality images of packing slips can be captured quickly. The algorithm will be optimized with direct database connections, higher throughput, and more robust receiving process logic.
Leveraging innovative technology and approaches to automate labor-intensive and time-consuming tasks in our warehouses will yield best-in-class efficiency and quality. By combining data science techniques with OCR technology, we can enable our business to easily process packing slips from vendors regardless of their integration capabilities.