Product Data Scraping

How Bambuser Live Shopping product data scraping works

This document describes in more detail how Bambuser Live Shopping product data scraping works.


When you add a product to a show in the Bambuser Live Shopping Dashboard, some basic product details are scraped from the content of the given product URL (e.g. https://yourcompany.com/products/pink-shirt).

The following properties will be extracted from the page:

  • a product name or title
  • an image URL (product thumbnail)
  • a brand name
  • a product reference (often called SKU)

All fields can also be entered or edited manually by the admin when setting up the show. The scraper simply reduces this manual work by filling in the product data automatically.

The scraper looks for different kinds of structured product data and metadata, using the following priority order:

  1. Schema.org markup
    1. JSON-LD
    2. Microdata
  2. OpenGraph meta-tags (og:)
  3. Generic HTML tags

Note: If the scraper cannot find a product reference (SKU), it uses the provided product URL as the reference.
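The priority order and the SKU fallback can be pictured with the small Python sketch below. It is an illustration under assumptions, not Bambuser's implementation: it assumes each field is resolved independently from the highest-priority source that provides it (the documentation does not say whether fields can be mixed across sources), and the function name `merge_product_fields` and the source keys are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical source names, from highest to lowest priority (see the list above).
PRIORITY = ["jsonld", "microdata", "opengraph", "generic_html"]
FIELDS = ["name", "image", "brand", "sku"]


def merge_product_fields(
    url: str, candidates: Dict[str, Dict[str, Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Resolve each field from the highest-priority source that has a value.

    `candidates` maps a source name to the fields extracted from that source.
    Assumption: fields are resolved independently, so a page whose JSON-LD
    lacks an image could still get its image from OpenGraph.
    """
    product: Dict[str, Optional[str]] = {}
    for field in FIELDS:
        product[field] = next(
            (
                candidates[source][field]
                for source in PRIORITY
                if candidates.get(source, {}).get(field)
            ),
            None,
        )
    # Documented behavior: without a SKU, the product URL becomes the reference.
    if not product["sku"]:
        product["sku"] = url
    return product
```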

1. Schema.org markup

Specification: https://schema.org/Product

You can use Google's testing tool to check whether your product pages contain valid Schema.org markup: https://search.google.com/structured-data/testing-tool/u/0/

JSON-LD

Example:

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "My Product Name",
  "description": "My Description",
  "brand": { "@type": "Brand", "name": "My Brand Name" },
  "image": "https://yoursite.com/path-to-image.jpg",
  "sku": "product-sku-12345"
}
</script>
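To see which values a JSON-LD block like the one above yields, the following Python sketch (standard library only) collects `application/ld+json` scripts and reads `name`, `image`, `brand`, and `sku` from the first `Product` object. The helper names are hypothetical and this is not the scraper's actual code: it does not handle `@graph` containers or a list-valued `@type`, and it simply skips blocks that fail to parse as JSON.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _first_text(value: Any, key: str) -> Optional[str]:
    """Reduce a Schema.org value (string, list, or nested object) to one string."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        value = value.get(key)
    return value if isinstance(value, str) else None


def extract_jsonld_product(html: str) -> Optional[Dict[str, Optional[str]]]:
    """Return name, image, brand and sku from the first JSON-LD Product found."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # this sketch skips blocks that are not valid JSON
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return {
                    "name": _first_text(item.get("name"), "name"),
                    "image": _first_text(item.get("image"), "url"),
                    "brand": _first_text(item.get("brand"), "name"),
                    "sku": _first_text(item.get("sku"), "name"),
                }
    return None
```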

Microdata

Example:

<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">My Product Name</span> -
  <span itemprop="brand">My Brand Name</span><br>
  <img itemprop="image" src="https://yoursite.com/path-to-image.jpg"><br>
  <span itemprop="description">Some optional description</span><br>
  Product number: <span itemprop="sku">product-sku-12345</span><br>
</div>
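The Python sketch below shows how microdata values are read: most elements contribute their text content, while `img`, `meta`, `a`, and `link` contribute an attribute (`src`, `content`, `href`), following a subset of the HTML microdata value rules. It is a hypothetical helper, not the scraper's code, and it assumes well-formed markup; it keeps the first value of each property and ignores properties nested inside another property (such as a `name` inside a nested `brand` scope).

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Elements whose microdata value is an attribute rather than the text content.
VALUE_ATTRIBUTE = {"img": "src", "meta": "content", "a": "href", "link": "href"}
# Void elements never get an end tag, so they must not change the nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ProductMicrodataParser(HTMLParser):
    """Collects itemprop values inside an itemtype=".../Product" element."""

    def __init__(self) -> None:
        super().__init__()
        self.props: Dict[str, str] = {}
        self._depth = 0  # open elements inside the Product scope; 0 means outside
        self._open_prop: Optional[str] = None
        self._open_tag: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self._depth == 0:
            itemtype = (a.get("itemtype") or "").rstrip("/")
            if "itemscope" in a and itemtype.endswith("schema.org/Product"):
                self._depth = 1
            return
        if tag not in VOID_ELEMENTS:
            self._depth += 1
        prop = a.get("itemprop")
        if not prop or prop in self.props or self._open_prop:
            return  # keep the first value; ignore props nested in another prop
        if tag in VALUE_ATTRIBUTE:
            self.props[prop] = (a.get(VALUE_ATTRIBUTE[tag]) or "").strip()
        else:
            self._open_prop, self._open_tag, self._text = prop, tag, []

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if self._open_prop and tag == self._open_tag:
            # Text-valued property: collapse whitespace in the collected text.
            self.props[self._open_prop] = " ".join("".join(self._text).split())
            self._open_prop = None
        if tag not in VOID_ELEMENTS:
            self._depth -= 1

    def handle_data(self, data):
        if self._open_prop:
            self._text.append(data)


def extract_microdata_product(html: str) -> Dict[str, str]:
    parser = ProductMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.props
```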

2. OpenGraph meta-tags (og:)

Specification: https://developers.facebook.com/docs/payments/product/

Example:

<meta property="og:type" content="og:product" />
<meta property="og:title" content="My Product Name" />
<meta property="product:brand" content="My Brand Name" />
<meta property="og:image" content="https://yoursite.com/path-to-image.jpg" />
<meta property="og:description" content="Some optional description" />
<meta property="product:retailer_item_id" content="product-sku-12345" />
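As an illustration, the Python sketch below reads `og:` and `product:` meta properties and maps them to the four scraped fields, using the properties shown in the example above. The function names are hypothetical; the sketch keeps the first occurrence of each property and does not check `og:type`, and the scraper's own handling may differ.

```python
from html.parser import HTMLParser
from typing import Dict


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> and <meta property="product:..."> values."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property") or ""
        content = a.get("content")
        if prop.startswith(("og:", "product:")) and content is not None:
            # Keep the first occurrence, e.g. the first og:image on the page.
            self.properties.setdefault(prop, content)


def opengraph_product_fields(html: str) -> Dict[str, str]:
    """Map the OpenGraph properties from the example to the scraped fields."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    og = parser.properties
    mapping = {
        "name": "og:title",
        "image": "og:image",
        "brand": "product:brand",
        "sku": "product:retailer_item_id",
    }
    return {field: og[prop] for field, prop in mapping.items() if prop in og}
```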

3. Generic HTML tags

If none of the structured product data above is found, the scraper falls back to generic information present on most web pages, such as the title element and images.

<head>
  <title> My Product Page Name </title>
</head>
<body>  
  ...  
  <img src="https://yoursite.com/path-to-image.jpg">
  ...
</body>
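A minimal Python sketch of this fallback takes the `<title>` text as the product name and the first `<img>` with a `src` as the thumbnail. Both choices are assumptions (the scraper may pick an image differently), as is resolving relative image paths against the product URL; the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class GenericPageParser(HTMLParser):
    """Records the <title> text and the src of the first <img> on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.first_image: Optional[str] = None
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "img" and self.first_image is None:
            # An <img> without src leaves first_image unset for the next one.
            self.first_image = dict(attrs).get("src") or None

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = " ".join("".join(self._title_parts).split())

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def generic_product_fields(html: str, page_url: str) -> Dict[str, Optional[str]]:
    parser = GenericPageParser()
    parser.feed(html)
    parser.close()
    # Assumption: relative image paths are resolved against the product URL.
    image = urljoin(page_url, parser.first_image) if parser.first_image else None
    return {"name": parser.title, "image": image}
```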

Whitelist the scraper


You need to whitelist the scraper when, for example, you want to add products from a staging or test environment that is not publicly accessible. You can either make an exception for the scraper's user-agent or whitelist its static IP address.

User-agent:

The scraper identifies itself with the following user-agent: BambuserLiveShopping/1.0. You can make an exception for requests sent with this user-agent.

Static IP address:

You can also whitelist the scraper by its static IP address: `35.224.84.15`. Besides whitelisting this IP address on your side, you also need to ask Bambuser staff to enable the static IP for your organization.
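If your staging server sits behind authentication, the exemption could look like the Python sketch below. `should_bypass_staging_auth` is a hypothetical helper you would call from your own server or middleware; it lets a request through when the `User-Agent` header contains `BambuserLiveShopping/1.0` or when the client address is `35.224.84.15`. Keep in mind that any client can send an arbitrary `User-Agent` header, so the IP check is the stricter of the two, and `remote_addr` must be the real client address (behind a proxy or load balancer, take it from a header your own infrastructure sets).

```python
import ipaddress
from typing import Optional

# Values documented for the Bambuser Live Shopping scraper.
SCRAPER_USER_AGENT = "BambuserLiveShopping/1.0"
SCRAPER_IP = ipaddress.ip_address("35.224.84.15")


def should_bypass_staging_auth(user_agent: Optional[str], remote_addr: str) -> bool:
    """Hypothetical staging check: allow the scraper by user-agent or by IP."""
    if user_agent and SCRAPER_USER_AGENT in user_agent:
        return True
    try:
        return ipaddress.ip_address(remote_addr) == SCRAPER_IP
    except ValueError:
        return False  # malformed client address: do not exempt
```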


FAQ/Troubleshooting

We highly recommend using the JSON-LD format for your Schema.org/Product markup.

Can I add product URLs if our pages do not have structured data?

Absolutely! You can then enter product details such as the title and thumbnail manually. The product scraper only automates data entry to make your life easier.

The URL I entered is publicly accessible, but I still get an error.

- Make sure your URL is also accessible from the United States, since our scraper runs there. If you are outside the US, you can test this using a VPN.

- You can inspect the forwarded error response in the Network tab of your browser's developer tools.

When I add a product, some fields are empty or have incorrect values.

For the scraper to fill in the product template fields with correct values, your product page must contain valid and accessible Schema.org/Product structured data.

We have structured product markup on our product page, but the product is initialized with incorrect data.

- Use Google's testing tool to make sure the structured data is valid and has no critical errors.

- Make sure the structured data is present in the initial HTML response to the page request, and is not loaded or rendered afterwards (for example, by client-side JavaScript).

How can I check that?
Open your product page (the same URL you are trying to add), right-click, and select "View Page Source" to see the raw response to the product URL request. Then check in the source code that the structured data exists, is valid, and has the correct values.
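The same check can be scripted. The Python sketch below fetches the product URL without executing JavaScript, which is roughly what "View Page Source" shows, and runs rough string checks (not a real parser) for each kind of markup the scraper looks for. It sends the scraper's `User-Agent` so that a user-agent exemption applies, but it does not reproduce the scraper's US location or IP address. The function names are hypothetical.

```python
import urllib.request
from typing import Dict


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the page source as served, without running any JavaScript."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "BambuserLiveShopping/1.0"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def structured_data_markers(html: str) -> Dict[str, bool]:
    """Rough text checks for each kind of markup the scraper looks for."""
    lowered = html.lower()
    return {
        "json-ld": "application/ld+json" in lowered,
        "microdata": "schema.org/product" in lowered and "itemprop=" in lowered,
        "opengraph": 'property="og:' in lowered,
    }
```

For example, `structured_data_markers(fetch_raw_html("https://yourcompany.com/products/pink-shirt"))` reports which kinds of markup appear in the raw response. A `False` for markup you can see in the browser's element inspector suggests it is added after page load.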

Google's structured data tool recognizes our JSON-LD data, but the Bambuser product scraper does not.

Your JSON-LD data may contain errors that Google's structured data tool does not report. Use https://json-ld.org/playground/ to check the validity of your JSON-LD data.
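One class of error worth ruling out locally is plain JSON syntax, such as a trailing comma or an unescaped quote, because a strict JSON parser rejects the whole block. Whether the scraper parses strictly is an assumption here. The Python sketch below reports such errors per JSON-LD block with line and column; it checks only JSON syntax, not JSON-LD semantics, so the playground remains the more thorough check. The regex is a rough, hypothetical way to locate the blocks.

```python
import json
import re
from typing import List

# Rough pattern for JSON-LD script blocks; an HTML parser is more robust.
JSONLD_PATTERN = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def jsonld_syntax_errors(html: str) -> List[str]:
    """Return one message for every JSON-LD block that is not valid JSON."""
    errors = []
    for index, match in enumerate(JSONLD_PATTERN.finditer(html), start=1):
        try:
            json.loads(match.group(1))
        except json.JSONDecodeError as exc:
            errors.append(
                f"block {index}: {exc.msg} (line {exc.lineno}, column {exc.colno})"
            )
    return errors
```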

If everything still looks fine on your side, contact Bambuser support at support@bambuser.com.