Feature

How to extract custom metadata

Jonathan Markwell · 1 min read

The bog-standard metadata for a screenshot isn't enough for everyone.

Saving the HTML and parsing it server-side is one option, but that takes time and can be a memory-intensive pain at scale.

What if you could get any data you want from a page at the same moment your screenshot is being taken?

We've supported executing custom JavaScript with Urlbox for years. Now that JavaScript can return data to you: assign a value to the special variable window.customUrlboxData and it comes back with the rest of the metadata.

Here's an example of a simple request you could make including some JavaScript you'd like us to execute:

curl -X POST \
  https://api.urlbox.io/v1/render/sync \
  -H 'Authorization: Bearer YOUR_URLBOX_API_SECRET' \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://www.bbc.co.uk/news/technology-63635380", "metadata": true, "js": "window.customUrlboxData = {h1: document.querySelectorAll(\"h1\")[0].textContent}"}'
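Hand-escaping quotes inside a JSON string is easy to get wrong, so here is a minimal sketch of building the same request from Python using only the standard library. The endpoint, headers, and field names come from the curl example above; the helper name build_render_request is hypothetical and not part of any Urlbox client.

```python
import json
import urllib.request

# Endpoint and field names are taken from the curl example above.
RENDER_SYNC_URL = "https://api.urlbox.io/v1/render/sync"


def build_render_request(api_secret: str, url: str, js: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST request."""
    # json.dumps escapes the quotes inside the JavaScript string for us.
    body = json.dumps({"url": url, "metadata": True, "js": js}).encode("utf-8")
    return urllib.request.Request(
        RENDER_SYNC_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


js = 'window.customUrlboxData = {h1: document.querySelectorAll("h1")[0].textContent}'
request = build_render_request(
    "YOUR_URLBOX_API_SECRET",
    "https://www.bbc.co.uk/news/technology-63635380",
    js,
)
# To send it: urllib.request.urlopen(request).read()
```

Building the body with json.dumps means the JavaScript can contain double quotes without any manual backslashes.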

The curl request above returns a response like the following:

{
  "renderUrl":"http://renders.urlbox.io/urlbox1/renders/...png",
  "size":608845,
  "metadata":{
    "author":"BBC News",
    ...
    "custom":{
      "h1":"Google to pay record $391m privacy settlement"
    }
  }
}
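Reading your values back out is a matter of looking under metadata.custom. A minimal sketch in Python, assuming the response has the shape shown above; the helper name extract_custom is hypothetical, and the sample string is a trimmed copy of that response with the elided fields left out.

```python
import json


def extract_custom(response_body: str) -> dict:
    """Hypothetical helper: return metadata.custom, or {} if it is missing."""
    data = json.loads(response_body)
    return (data.get("metadata") or {}).get("custom") or {}


# Trimmed copy of the response above (elided fields omitted).
sample = (
    '{"size": 608845, "metadata": {"author": "BBC News", '
    '"custom": {"h1": "Google to pay record $391m privacy settlement"}}}'
)

custom = extract_custom(sample)
print(custom["h1"])
```

Returning an empty dict when the key is absent keeps callers from crashing on pages where the script assigned nothing.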

You can view the full JSON response here.