Running a Web Scraper
Let's build a simple web scraper that extracts headlines from The New York Times and uses a BERT model from Hugging Face to detect the sentiment of each one.
Define the environment
First, we’ll define our environment:
app.py
from beam import App, Runtime, Image

app = App(
    name="web-scraper",
    runtime=Runtime(
        cpu=1,
        memory="8Gi",
        image=Image(
            python_version="python3.8",
            python_packages=["bs4", "transformers", "torch"],
        ),
    ),
)
Write scraping logic
Now, we’ll write logic to scrape the headlines from The New York Times.
To run this on Beam, we add an @app.run() decorator to the function:
@app.run()
def scrape_nyt():
    ...
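The function body is elided above. As a rough illustration only, here is one way the scraping and sentiment steps could look. Everything in it is an assumption rather than the original implementation: the target URL, the choice of <h2>/<h3> tags as headline markers, the helper names extract_headlines and scores_to_dict, and the distilbert-base-uncased-finetuned-sst-2-english checkpoint (a distilled BERT model that emits POSITIVE/NEGATIVE labels). The third-party imports are placed inside the functions because those packages are installed in the Beam image and may not exist locally.

```python
import urllib.request

NYT_URL = "https://www.nytimes.com"  # assumed target page


def scores_to_dict(results):
    """Turn [{'label': ..., 'score': ...}, ...] into {label: score}."""
    return {item["label"]: item["score"] for item in results}


def extract_headlines(html):
    """Return the unique, non-empty heading texts found in an HTML string."""
    # Imported lazily: bs4 is installed in the remote image, not necessarily locally.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    seen, headlines = set(), []
    # Assumption: headlines are rendered as <h2>/<h3>; the real markup may differ.
    for tag in soup.find_all(["h2", "h3"]):
        text = tag.get_text(" ", strip=True)
        if text and text not in seen:
            seen.add(text)
            headlines.append(text)
    return headlines


def scrape_nyt():  # in app.py this function would carry the @app.run() decorator
    from transformers import pipeline

    # Fetch the page with the standard library; a browser-like User-Agent is
    # set because some sites reject the default urllib one.
    request = urllib.request.Request(NYT_URL, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")

    # Assumed checkpoint; any sentiment model with POSITIVE/NEGATIVE labels works.
    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )
    for headline in extract_headlines(html):
        print(headline)
        # top_k=None asks the pipeline for the score of every label.
        print(scores_to_dict(classifier(headline, top_k=None)))
```

Keeping the parsing and score-formatting steps in small helpers means they can be checked locally on a saved HTML snippet before the full function is sent to Beam.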
Running the scraper
Now, we’re ready to run our code using Beam. In your terminal, run:
beam run app.py:scrape_nyt
You should see the headlines and the detected sentiment of each:
(.venv) beta9@MacBook-Air-2 web-scraping % beam-stage run app.py:scrape_nyt
i Using cached image.
✓ App initialized.
✓ Container scheduled, logs will appear below.
Starting app...
Loading handler in 'app.py:scrape_nyt'...
Running task: c021040d-aea7-4406-9b5e-79d898f7592a
This Hummus Holds Up After 800 Years
{'POSITIVE': 0.9985199570655823, 'NEGATIVE': 0.0014800893841311336}
Task complete: c021040d-aea7-4406-9b5e-79d898f7592a, duration: 177.36207103729248s