Python - OpenCV pytesseract not extracting string from cropped image

Question

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I have an image (attached) and want to extract certain fields from the form, for example the name 'Sarah', her email address, and so on. For each field I define a region of interest, highlight it, and then crop it. However, pytesseract.image_to_string returns an empty string for every cropped region.

I expected the text of each field to be extracted. Could someone point me in the right direction? For context, I am following this tutorial: https://www.youtube.com/watch?v=cUOcY9ZpKxw

Actual output:

['', '', '', '', '', '']

Code below:


import cv2
import numpy as np
import pytesseract
import os
pytesseract.pytesseract.tesseract_cmd = r'Tesseract-OCR\tesseract.exe'

imgQ = cv2.imread('sarah.png')

#cv2.imshow('output',imgQ)
#cv2.waitKey(0)

roi = [[(98, 984), (680, 1074), 'text', 'Name'],
       [(740, 980), (1320, 1078), 'text', 'Phone'],
       [(100, 1418), (686, 1518), 'text', 'Email'],
       [(740, 1416), (1318, 1512), 'text', 'ID'],
       [(110, 1598), (676, 1680), 'text', 'City'],
       [(748, 1592), (1328, 1686), 'text', 'Country']]

myData=[]
for x,r in enumerate(roi):
        #highlighted the regions
        cv2.rectangle(imgQ, (r[0][0],r[0][1]),(r[1][0],r[1][1]),(0,255,0),cv2.FILLED)
        imgShow = cv2.addWeighted(imgQ,0.99,imgQ,0.1,0)
        #crop regions
        imgCrop = imgShow[r[0][1]:r[1][1], r[0][0]:r[1][0]]
        cv2.imshow(str(x),imgCrop)
        if r[2] == 'text':

            print('{} :{}'.format(r[3],pytesseract.image_to_string(imgCrop)))
            myData.append(pytesseract.image_to_string(imgCrop))
print(myData)

Question from: https://stackoverflow.com/questions/65873670/python-opencv-pytesseract-not-extracting-string-from-cropped-image

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:23:20+0000

The problem in your code is this line:

cv2.rectangle(img, (r[0][0], r[0][1]), (r[1][0], r[1][1]), (0, 255, 0), cv2.FILLED)

What does this line do?

It fills the region of interest in the image with solid green, overwriting the text that was there.

You then crop that green rectangle and pass it to OCR, once for every entry in roi, so there is nothing left to read.
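To make the effect concrete, here is a minimal sketch that uses a synthetic NumPy array instead of the real form. It assumes a plain slice assignment as a stand-in for cv2.rectangle(..., cv2.FILLED); the array size, the region, and the names page, top_left and bottom_right are made up for illustration.

```python
import numpy as np

# Synthetic 100x200 BGR "page": white background with a dark stripe standing in for text.
page = np.full((100, 200, 3), 255, dtype=np.uint8)
page[40:60, 50:150] = 0

# Hypothetical region of interest, in the same ((x1, y1), (x2, y2)) layout as roi.
top_left, bottom_right = (40, 30), (160, 70)
(x1, y1), (x2, y2) = top_left, bottom_right

# Copy of the region while the "text" is still there.
before = page[y1:y2, x1:x2].copy()

# Stand-in for cv2.rectangle(page, top_left, bottom_right, (0, 255, 0), cv2.FILLED):
# every pixel inside the box (both corners included) is overwritten in place.
page[y1:y2 + 1, x1:x2 + 1] = (0, 255, 0)

# The same crop taken after the fill, as the question's loop does.
after = page[y1:y2, x1:x2]

colours_before = len(np.unique(before.reshape(-1, 3), axis=0))
colours_after = len(np.unique(after.reshape(-1, 3), axis=0))
print("distinct colours before fill:", colours_before)
print("distinct colours after fill: ", colours_after)
```

After the fill the crop holds a single colour, so an OCR engine has no glyphs to recognise.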

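Second, why imgShow = cv2.addWeighted(img, 0.99, img, 0.1, 0)? It blends the already-filled image with itself, so the overwritten text does not come back.
Third, imgCrop = imgShow[r[0][1]:r[1][1], r[0][0]:r[1][0]] crops from that blended image.

How about we crop from img instead?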
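If the green highlight is still wanted for display, one option is to draw it on a copy and keep img untouched for OCR. The sketch below is only an illustration under stated assumptions: blend is a hypothetical NumPy stand-in for cv2.addWeighted (which computes src1*alpha + src2*beta + gamma with saturation), and the synthetic image and region are made up.

```python
import numpy as np

def blend(src1, alpha, src2, beta, gamma=0.0):
    """Rough NumPy stand-in for cv2.addWeighted: weighted sum, rounded and clipped to uint8."""
    mixed = src1.astype(np.float64) * alpha + src2.astype(np.float64) * beta + gamma
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

# Synthetic grey image standing in for the scanned form.
img = np.full((60, 80, 3), 100, dtype=np.uint8)
(x1, y1), (x2, y2) = (10, 10), (50, 40)  # hypothetical region

# Blending an image with itself only rescales brightness (0.99 + 0.1 = 1.09);
# it does not add an overlay and cannot restore overwritten pixels.
same = blend(img, 0.99, img, 0.1)

# Draw the filled box on a copy, then mix the copy with the untouched original for display.
overlay = img.copy()
overlay[y1:y2 + 1, x1:x2 + 1] = (0, 255, 0)
img_show = blend(overlay, 0.5, img, 0.5)

# The OCR input is cropped from the original, which still holds the real pixels.
img_crop = img[y1:y2, x1:x2]
```

Here img_show is only for viewing, while img_crop is what would be passed to image_to_string.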

Cropping from img directly gives this output:

Name :Sarah

Phone :+ (00) 765-43-21

Email :[email protected]

ID :1356856

City :London

Country :United Kingdom

Code:

import cv2
from pytesseract import image_to_string

img = cv2.imread("hzt5U.png")
# gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 21)
# txt = image_to_string(thr, config="--psm 6")
# print(txt)

roi = [[(98, 984), (680, 1074), 'text', 'Name'],
       [(740, 980), (1320, 1078), 'text', 'Phone'],
       [(100, 1418), (686, 1518), 'text', 'Email'],
       [(740, 1416), (1318, 1512), 'text', 'ID'],
       [(110, 1598), (676, 1680), 'text', 'City'],
       [(748, 1592), (1328, 1686), 'text', 'Country']]

my_data = []

for x, r in enumerate(roi):
    # highlighted the regions
    # cv2.rectangle(img, (r[0][0], r[0][1]), (r[1][0], r[1][1]), (0, 255, 0), cv2.FILLED)
    # imgShow = cv2.addWeighted(img, 0.99, img, 0.1, 0)

    # crop regions
    # imgCrop = imgShow[r[0][1]:r[1][1], r[0][0]:r[1][0]]
    imgCrop = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]
    cv2.imwrite("/Users/ahx/Desktop/res{}.png".format(x), imgCrop)
    cv2.imshow(str(x), imgCrop)
    cv2.waitKey(0)

    if r[2] == 'text':
        # run OCR once per region and reuse the result
        text = image_to_string(imgCrop)
        print('{} :{}'.format(r[3], text))
        my_data.append(text)

# print(my_data)
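The loop above could also be wrapped in a small function, which makes the cropping logic testable without Tesseract installed. This is only a sketch: extract_fields and its ocr parameter are hypothetical names that do not come from the original answer, and the demo uses a fake OCR callable on a synthetic array.

```python
import numpy as np

def extract_fields(img, roi, ocr):
    """Crop each 'text' region from the untouched image and OCR it.

    img : array indexed as img[y, x], e.g. the result of cv2.imread
    roi : list of [(x1, y1), (x2, y2), kind, label] entries
    ocr : callable taking a cropped image and returning a string
    """
    fields = {}
    for (x1, y1), (x2, y2), kind, label in roi:
        if kind != 'text':
            continue
        crop = img[y1:y2, x1:x2]            # rows are y, columns are x
        fields[label] = ocr(crop).strip()   # drop surrounding whitespace from the OCR result
    return fields

# Demonstration with a fake OCR function that just reports the crop size.
demo_img = np.zeros((200, 300, 3), dtype=np.uint8)
demo_roi = [[(10, 20), (110, 60), 'text', 'Name'],
            [(10, 80), (210, 120), 'box', 'Ticked']]

def fake_ocr(crop):
    return "{}x{}\n".format(crop.shape[1], crop.shape[0])

result = extract_fields(demo_img, demo_roi, fake_ocr)
print(result)
```

With the real form, the call would look like extract_fields(img, roi, image_to_string), using the img and roi defined in the code above.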
