This app uses OCR to remove text from an image. You might use it as a stand-alone microservice, or as a pre-processing step in a computer vision pipeline. This tutorial is an adaptation of this post.
Define the environment
You’ll start by creating a Beam app definition. In this file, you’re defining a few things:
The libraries you want installed in the environment
The compute settings (some of the CV operations are heavy, so 16Gi of memory is a safe choice)
```python
from beam import App, Runtime, Image, Output

app = App(
    name="rmtext",
    runtime=Runtime(
        cpu=1,
        memory="16Gi",
        image=Image(
            python_packages=[
                "numpy",
                "matplotlib",
                "opencv-python",
                "keras_ocr",
                "tensorflow",
            ],
            commands=["apt-get update && apt-get install -y libgl1"],
        ),
    ),
)
```
Removing the text from an image
You’ll use the code below to accomplish the following:
Identify text in the base-64 encoded image and create bounding boxes around each block of text
Add a mask around each box of text
Paint over each text-mask to remove the text
We’ve added an `app.run()` decorator to `remove_text`. This decorator lets you run the code on Beam, instead of on your laptop.
```python
from beam import App, Runtime, Image, Output
import base64
import math

import cv2
import keras_ocr
import matplotlib.pyplot as plt
import numpy as np

app = App(
    name="rmtext",
    runtime=Runtime(
        cpu=1,
        memory="16Gi",
        image=Image(
            python_packages=[
                "numpy",
                "matplotlib",
                "opencv-python",
                "keras_ocr",
                "tensorflow",
            ],
            commands=["apt-get update && apt-get install -y libgl1"],
        ),
    ),
)


def midpoint(x1, y1, x2, y2):
    x_mid = int((x1 + x2) / 2)
    y_mid = int((y1 + y2) / 2)
    return (x_mid, y_mid)


@app.run()
def remove_text(**inputs):
    # Grab the base64 image from the kwargs
    encoded_image = inputs["image"]

    # Convert the base64-encoded input image to a buffer
    image_buffer = base64.b64decode(encoded_image)

    pipeline = keras_ocr.pipeline.Pipeline()

    # Read the image
    img = keras_ocr.tools.read(image_buffer)

    # Generate (word, box) tuples
    prediction_groups = pipeline.recognize([img])
    mask = np.zeros(img.shape[:2], dtype="uint8")
    for box in prediction_groups[0]:
        x0, y0 = box[1][0]
        x1, y1 = box[1][1]
        x2, y2 = box[1][2]
        x3, y3 = box[1][3]
        x_mid0, y_mid0 = midpoint(x1, y1, x2, y2)
        x_mid1, y_mid1 = midpoint(x0, y0, x3, y3)

        thickness = int(math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2))
        cv2.line(mask, (x_mid0, y_mid0), (x_mid1, y_mid1), 255, thickness)

    img = cv2.inpaint(img, mask, 7, cv2.INPAINT_NS)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Save the generated image to the Beam Output path
    cv2.imwrite("output.png", img_rgb)


if __name__ == "__main__":
    input_image = "./coffee.jpeg"
    with open(input_image, "rb") as image_file:
        encoded_image = base64.b64encode(image_file.read())
    remove_text(image=encoded_image)
```
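To make the masking geometry concrete, here is the same arithmetic on a hypothetical box (the corner order top-left, top-right, bottom-right, bottom-left is an assumption about the detector's output). The mask line runs between the midpoints of the two vertical edges, with a thickness equal to the text height:

```python
import math

def midpoint(x1, y1, x2, y2):
    return (int((x1 + x2) / 2), int((y1 + y2) / 2))

# Hypothetical box: top-left, top-right, bottom-right, bottom-left
box = [(10, 20), (110, 20), (110, 50), (10, 50)]
(x0, y0), (x1, y1), (x2, y2), (x3, y3) = box

# Midpoint of the right edge (top-right -> bottom-right) ...
p0 = midpoint(x1, y1, x2, y2)
# ... and of the left edge (top-left -> bottom-left)
p1 = midpoint(x0, y0, x3, y3)

# Thickness = length of the right edge, i.e. the height of the text
thickness = int(math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2))
print(p0, p1, thickness)  # (110, 35) (10, 35) 30
```

A 30-pixel-thick line drawn from (10, 35) to (110, 35) therefore covers the whole box, which is exactly the region `cv2.inpaint` is asked to fill.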
You can run this code on Beam with the `beam run` command:

```
beam run app.py:remove_text
```
Make sure to include a sample image in your working directory, and update the script with the path. In this example, I’m using this image as a sample:
Deployment
If you’re satisfied with this function and want to deploy it as an API, you can do so by updating the decorator:
Just replace `@app.run()` with `@app.task_queue()`:

```python
# This function will be exposed as a web API when deployed!
@app.task_queue()
def remove_text(**inputs):
    ...
```
You can deploy this app by running:

```
beam deploy app.py
```
You’ll call the API by copying the task queue URL from the dashboard.
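The request body mirrors the `inputs` dictionary the function receives: a JSON object with a base64-encoded `image` field. A minimal sketch of building that payload (the byte string below stands in for a real image file):

```python
import base64
import json

# Stand-in bytes; in practice: open("coffee.jpeg", "rb").read()
image_bytes = b"fake-image-bytes"
encoded = base64.b64encode(image_bytes).decode("utf-8")

# This JSON body is what you POST to the task queue URL
payload = json.dumps({"image": encoded})
print(payload)
```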
Since this task runs asynchronously, you’ll use the `/v1/task/{task_id}/status/` API to retrieve the task status and a link to download the image output.
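A sketch of polling that endpoint from Python using only the standard library. The base URL and the Basic-auth scheme are assumptions; copy the exact call shown in your Beam dashboard:

```python
import base64
import json
from urllib.request import Request, urlopen

BASE_URL = "https://api.beam.cloud"  # assumption -- check your dashboard

def status_url(task_id: str) -> str:
    # Build the task-status endpoint for a given task ID
    return f"{BASE_URL}/v1/task/{task_id}/status/"

def get_task_status(task_id: str, client_id: str, client_secret: str) -> dict:
    # Basic auth with your Beam credentials is an assumption
    token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    req = Request(status_url(task_id), headers={"Authorization": f"Basic {token}"})
    with urlopen(req) as resp:
        return json.load(resp)

print(status_url("edbcf7ff-e8ce-4199-8661-8e15ed880481"))
```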
This will return a response, which contains:
Task ID
The start and end time
A dictionary with pre-signed URLs to download the outputs
```json
{
  "task_id": "edbcf7ff-e8ce-4199-8661-8e15ed880481",
  "started_at": "2023-04-24T22:44:06.911920Z",
  "ended_at": "2023-04-24T22:44:07.184763Z",
  "outputs": {
    "my-output-1": {
      "path": "output_path",
      "name": "my-output-1",
      "url": "http://data.beam.cloud/outputs/6446df99cf455a04e0335d9b/hw6hx/hw6hx-0001/edbcf7ff-e8ce-4199-8661-8e15ed880481/my-output-1.zip?..."
    }
  },
  "status": "COMPLETE"
}
```
Enter the output `url` in your browser to download the image. You’ll see that the text has been removed: