Image text extractor in Python

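The extreme approach, keeping only white pixels (img is assumed to be an already opened PIL Image in RGB mode):

    pixels = img.load()  # create the pixel map
    for i in range(img.size[0]):  # for every column
        for j in range(img.size[1]):  # for every row
            # if the pixel isn't white, set it to black
            if pixels[i, j] != (255, 255, 255):
                pixels[i, j] = (0, 0, 0)

Again, this is an extreme approach.

A separate OpenCV-based fragment reads the input file, keeps copies for comparison, and binarizes the image before calling Tesseract (convertimg2bin is a helper that is not shown here):

    import cv2

    # If the image source file is passed as a parameter
    if inputfile:
        # Read the image using OpenCV
        img = cv2.imread(inputfile)
        # Preserve a copy of this image for comparison purposes
        initialimg = img.copy()
        highlightedimg = img.copy()
        # Convert the image to binary
        binimg = convertimg2bin(img)
        # Calling Tesseract
        # Tesseract configuration parameters.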

There are many different filters that you could apply with PIL. If none of them helps, the most extreme approach would be to iterate over all the pixels in the image, test whether each one is white, and set it to black if it isn't. This would work since your text is always white, although any purely white areas of the background would be kept as well, so hopefully pytesseract can handle that.
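The pixel-by-pixel idea can also be written with PIL's own band operations, which avoids the nested Python loops. This is only a sketch: the function name keep_only_white and its threshold parameter are made up for illustration, and it returns a new grayscale mask instead of modifying the image in place.

```python
from PIL import Image, ImageChops


def keep_only_white(img, threshold=255):
    """Return an "L" image: 255 where every RGB channel >= threshold, else 0.

    Hypothetical helper, not part of PIL. With the default threshold only
    pure white pixels survive, as in the extreme approach.
    """
    red, green, blue = img.convert("RGB").split()
    # Lookup table applied to each channel: bright enough -> 255, otherwise 0.
    lut = [255 if value >= threshold else 0 for value in range(256)]
    masks = [band.point(lut) for band in (red, green, blue)]
    # A pixel stays white only if it passed in all three channels
    # (ImageChops.darker takes the pixel-wise minimum).
    return ImageChops.darker(ImageChops.darker(masks[0], masks[1]), masks[2])
```

Lowering the threshold a little (for example keep_only_white(img, threshold=240)) might help if the text has slightly off-white, anti-aliased edges, at the cost of keeping more of the bright background.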

Text/Number extractor from image

    positional arguments:
      images                path(s) to input image(s)

    optional arguments:
      -h, --help            show this help message and exit
      -east EAST            path to input EAST text detector
      -c CONFIDENCE, --confidence CONFIDENCE
                            minimum probability required to inspect a region
      -w WIDTH, --width WIDTH
                            resized image width (should be.

As pointed out, some further image manipulation might be necessary. Using PIL, you can simply call img.convert("1") to convert a PIL Image to a black and white version. Since you pointed out that this approach doesn't work, I would try to use PIL to get the image as close to pure black and white as possible.
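One possible reason img.convert("1") gives poor results is that, by default, PIL dithers when converting to mode "1", which can speckle text. A hedged sketch of an alternative is to threshold the grayscale image yourself; the function name to_black_and_white and the threshold parameter are assumptions for illustration, and a suitable threshold depends on the image.

```python
from PIL import Image


def to_black_and_white(img, threshold=None):
    """Convert a PIL Image to mode "1" (pure black and white).

    Hypothetical helper. With threshold=None this is the plain
    img.convert("1") route, which uses PIL's default dithering.
    Otherwise every grayscale value >= threshold becomes white and
    everything else becomes black, with no dithering involved.
    """
    gray = img.convert("L")
    if threshold is None:
        return gray.convert("1")
    return gray.point(lambda value: 255 if value >= threshold else 0, mode="1")
```

For white text on a darker background, a fairly high threshold (for instance to_black_and_white(img, threshold=200)) is a reasonable first guess before handing the result to pytesseract.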