Define the environment

First, we’ll define our environment:

app.py
from beam import App, Runtime, Image


app = App(
    name="web-scraper",
    runtime=Runtime(
        cpu=1,
        memory="8Gi",
        image=Image(
            python_version="python3.8",
            python_packages=["bs4", "transformers", "torch"],
        ),
    ),
)

Write scraping logic

Now, we’ll write logic to scrape the headlines from The New York Times.

In order to run this on Beam, we add an @app.run() decorator to the function:

@app.run()
def scrape_nyt():
    ...

Running the scraper

Now, we’re ready to run our code using Beam. In your terminal, run:

beam run your_file.py:scrape_nyt

You should see the headlines and the detected sentiment of each:

(.venv) beta9@MacBook-Air-2 web-scraping % beam-stage run app.py:scrape_nyt
 i  Using cached image.
 ✓  App initialized.
 ✓  Container scheduled, logs will appear below.
Starting app...
Loading handler in 'app.py:scrape_nyt'...
Running task: c021040d-aea7-4406-9b5e-79d898f7592a
This Hummus Holds Up After 800 Years
{'POSITIVE': 0.9985199570655823, 'NEGATIVE': 0.0014800893841311336}
Task complete: c021040d-aea7-4406-9b5e-79d898f7592a, duration: 177.36207103729248s