Extract Metadata with Screenshots

Jonathan Markwell

Knowing the URL and the time you took a screenshot isn't enough information when you have hundreds to organise each month, especially when your URLs aren't very descriptive:

https://www.bbc.co.uk/news/technology-63539246

Making another request to that URL to find out the title of the page and gather other data is one option. Sadly, some pages change between requests and others are short-lived. Either way, you're doubling the amount of work that website has to do for you, and potentially your costs too.

What if you could get all this metadata (and much, much more) for free with every one of your screenshots - all in one request?

"publisher": "BBC News",
"title": "Ten days of Twitter chaos",
"date": "2022-11-07T13:22:12.000Z",
"author": "James Clayton",
"description": "Elon Musk’s first week-and-a-half at Twitter has been a rollercoaster of big changes.",
"favicon": "https://static.files.bbci.co.uk/core/website/assets/static/icons/touch/news/touch-icon-36.413a37b22764b74a2793.png",
"image": "https://ichef.bbci.co.uk/news/1024/branded_news/833A/production/_127549533_twitter_cracked_bird_getty.jpg"

How about by simply adding this to your request?

"metadata": true

You can now do just that with Urlbox. You'll need to make an API request that returns a JSON response. Here's an example curl request:

curl -X POST \
https://api.urlbox.io/v1/render/sync \
-H 'Authorization: Bearer YOUR_URLBOX_API_SECRET' \
-H 'Content-Type: application/json' \
-d '{"url":"https://www.bbc.co.uk/news/technology-63539246", "metadata": true}'
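If you'd rather make the same request from code, here is a minimal Python sketch using only the standard library. The endpoint, headers and body mirror the curl example above; the function names `build_render_request` and `render_with_metadata` are made up for illustration and aren't part of any Urlbox client library.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_ENDPOINT = "https://api.urlbox.io/v1/render/sync"


def build_render_request(url: str, api_secret: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a screenshot plus metadata."""
    payload = {"url": url, "metadata": True}
    return urllib.request.Request(
        API_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def render_with_metadata(url: str, api_secret: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_render_request(url, api_secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `render_with_metadata("https://www.bbc.co.uk/news/technology-63539246", "YOUR_URLBOX_API_SECRET")` with your own secret should give you the same JSON response as the curl command.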

That's it. I won't include the full response payload here, but you'll get a response structured as follows:

{
  "meta": {
    "endTime": "2022-11-08T12:34:17.033Z",
    "startTime": "2022-11-08T12:34:09.627Z"
  },
  "event": "render.succeeded",
  "result": {
    "size": #,
    "metadata": {"publisher": "BBC News",
                 "title": "Ten days of Twitter chaos", ...},
    "renderUrl": "https://renders.urlbox.io/urlbox1/renders/...png"
  },
  "renderId": "..."
}
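To show how this helps with organising screenshots, here is a rough Python sketch that pulls a few fields out of a response shaped like the one above and turns them into a descriptive filename. It assumes `title` and `date` sit inside `result.metadata`, as the sample fields earlier suggest; `describe_render` and `descriptive_filename` are hypothetical helpers, and any missing field falls back to a placeholder rather than raising an error.

```python
import re


def describe_render(response: dict) -> dict:
    """Pull the fields useful for organising a screenshot out of a render response."""
    result = response.get("result", {})
    metadata = result.get("metadata") or {}
    return {
        "render_url": result.get("renderUrl"),
        "title": metadata.get("title"),
        "publisher": metadata.get("publisher"),
        "date": metadata.get("date"),
    }


def descriptive_filename(info: dict, extension: str = "png") -> str:
    """Combine the date (YYYY-MM-DD) and a slug of the title into a filename."""
    day = (info.get("date") or "undated")[:10]
    title = (info.get("title") or "untitled").lower()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title).strip("-")
    return f"{day}-{slug}.{extension}"
```

With the BBC example, a title of "Ten days of Twitter chaos" and a date of "2022-11-07T13:22:12.000Z" would produce `2022-11-07-ten-days-of-twitter-chaos.png`, which is rather easier to find later than `technology-63539246`.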

You can also use this with our Webhook and S3 features.