
How to Scrape E-Commerce Websites Using Zyte API

Published: at 06:54 AM

Web scraping has grown popular over the past decade. It lets you extract data from websites programmatically, for example to collect product details and save them in a database. You can write a scraper in almost any programming language, but this time we’re going to use Zyte API, a platform that makes web scraping easier.


1. Setup

I will use Next.js for this tutorial. If you don’t know how to set up a Next.js project, I have made a post about it. If you already have a basic Next.js setup, let’s create a new file in the root of the project called .env.


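Inside the .env file, add the variable name followed by an equals sign: NEXT_PUBLIC_ZYTE_API_KEY=. The value comes from the official Zyte website. First, register using your email, Google, or GitHub account. After you’ve signed in, head over to the Zyte API tab. Keep in mind that Next.js inlines every variable prefixed with NEXT_PUBLIC_ into the browser bundle, so a key stored under this name is visible to anyone who opens a page that uses it. That is fine for a local demo, but for a deployed app, store the key without the prefix and call the API only from server code.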


Hit API Access, copy the value, and then paste it into your .env file. For example: NEXT_PUBLIC_ZYTE_API_KEY=83pe98r3kqwljkdf98
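
The code in the next sections sends this key as HTTP Basic credentials, with the key as the username and an empty password. Here is a minimal sketch of that step on its own, assuming a helper named buildBasicAuthHeader (a hypothetical name of mine, not part of any Zyte SDK), so a missing key fails early with a readable message:

```typescript
// Hypothetical helper: builds the Authorization header value for a key that is
// used as the Basic auth username with an empty password.
function buildBasicAuthHeader(apiKey: string): string {
  if (apiKey.trim() === "") {
    throw new Error("Zyte API key is missing. Check your .env file.");
  }
  // btoa is a global in browsers and in Node.js 16+.
  return `Basic ${btoa(`${apiKey}:`)}`;
}
```

You could call it as buildBasicAuthHeader(process.env.NEXT_PUBLIC_ZYTE_API_KEY ?? "") and pass the result as the Authorization header.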


2. Server-Side Way

Back to our project. Create a new folder called extract under the app folder; this is where our route will live. To extract the data on the server side, create two files in it: page.tsx and actions.ts. If you only need the client-side version (a text input and a button, covered in section 3), page.tsx alone is enough. In actions.ts, write the following code:

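// Sends a product-extraction request to Zyte API and returns the parsed JSON.
export const getExtractedData = async ({ url }: { url: string }) => {
  const response = await fetch("https://api.zyte.com/v1/extract", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // HTTP Basic auth: the API key is the username, the password is empty.
      Authorization: `Basic ${btoa(
        process.env.NEXT_PUBLIC_ZYTE_API_KEY! + ":"
      )}`,
    },
    body: JSON.stringify({
      url,
      product: true,
      productOptions: { extractFrom: "httpResponseBody" },
    }),
  });
  const data = await response.json();

  // fetch does not reject on 4xx/5xx responses, so check the status explicitly.
  if (!response.ok) {
    throw new Error(
      data?.detail ?? `Request failed with status ${response.status}.`
    );
  }

  return data;
};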

Inside page.tsx, create the following code:

import { getExtractedData } from "./actions";

const Page = async () => {
  const data = await getExtractedData({
    url: "https://www.therealreal.com/products/women/handbags/totes/chanel-patent-puzzle-tote-lpztj",
  });

  return (
    <div className="h-screen w-full flex items-center justify-center">
      {data && (
        <div>
          <img src={data.product?.mainImage?.url} alt="main-product" />
          <h1 className="text-3xl mt-4">
            <span className="font-bold">Name:</span> {data.product?.name}
          </h1>
          <h2 className="text-xl">
            <span className="font-bold">Price:</span> {data.product?.price}
          </h2>
          <h2 className="text-xl">
            <span className="font-bold">Brand:</span>{" "}
            {data.product?.brand?.name}
          </h2>
          <p className="max-w-sm">
            <span className="font-bold">Description:</span>{" "}
            {data?.product?.description}
          </p>
        </div>
      )}
    </div>
  );
};

export default Page;

You can replace the URL with any product page you like. You can also let users enter the link themselves through a text input; the next section shows how.
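
Once the URL comes from a user, it is worth checking it before spending an API request on it. Below is a small sketch of such a check, assuming a hypothetical helper called isHttpUrl that accepts only well-formed http and https links:

```typescript
// Hypothetical helper: returns true only for well-formed http(s) URLs.
function isHttpUrl(input: string): boolean {
  try {
    // The URL constructor throws a TypeError on malformed input.
    const parsed = new URL(input.trim());
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}
```

This only validates the shape of the link. It does not tell you whether the page exists or whether it is actually a product page.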

3. Client-Side Way

If you want to do it on the client side, just write all of this code in page.tsx:

"use client";

import { useState } from "react";

type Product = {
  product: {
    mainImage: {
      url: string;
    };
    name: string;
    price: string;
    brand: {
      name: string;
    };
    description: string;
  };
} | null;

const Page = () => {
  const [extractedData, setExtractedData] = useState<Product>(null);
  const [message, setMessage] = useState<string>("");
  const [url, setUrl] = useState<string>("");

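  const getExtractedData = async () => {
    // Clear any message left over from a previous attempt.
    setMessage("");

    const response = await fetch("https://api.zyte.com/v1/extract", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // HTTP Basic auth: the API key is the username, the password is empty.
        Authorization: `Basic ${btoa(
          process.env.NEXT_PUBLIC_ZYTE_API_KEY! + ":"
        )}`,
      },
      body: JSON.stringify({
        url,
        product: true,
        productOptions: { extractFrom: "httpResponseBody" },
      }),
    });
    const data = await response.json();

    // On an error response, show the error detail instead of a product.
    if (!response.ok) {
      setExtractedData(null);
      setMessage(data?.detail ?? "Extraction failed.");
      return;
    }

    setExtractedData(data);
  };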

  return (
    <div className="h-screen w-full flex flex-col items-center justify-center">
      <input
        type="text"
        className="text-black placeholder:text-black px-4 py-2"
        placeholder="Enter URL"
        onChange={(e) => setUrl(e.target.value)}
      />
      <button
        className="bg-blue-500 hover:bg-blue-700 text-white font-bold py-2 px-4 rounded mt-4"
        onClick={getExtractedData}
      >
        Extract
      </button>
      <p>{message}</p>
      {extractedData && (
        <div>
          <img src={extractedData.product?.mainImage?.url} alt="main-product" />
          <h1 className="text-3xl mt-4">
            <span className="font-bold">Name:</span>{" "}
            {extractedData.product?.name}
          </h1>
          <h2 className="text-xl">
            <span className="font-bold">Price:</span>{" "}
            {extractedData.product?.price}
          </h2>
          <h2 className="text-xl">
            <span className="font-bold">Brand:</span>{" "}
            {extractedData.product?.brand?.name}
          </h2>
          <p className="max-w-sm">
            <span className="font-bold">Description:</span>{" "}
            {extractedData?.product?.description}
          </p>
        </div>
      )}
    </div>
  );
};

export default Page;
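
One caveat with the client-side version: Next.js inlines NEXT_PUBLIC_ variables into the browser bundle, so the API key travels to every visitor. A common alternative is to keep the Zyte call on the server and let the page call your own endpoint instead. The following is only a sketch under my own assumptions: extractProduct, FetchLike, and ExtractResult are hypothetical names, and the fetch function is passed in as a parameter so the logic can be tried out without touching the network.

```typescript
// Minimal shape of the fetch function this sketch needs. The global fetch
// satisfies it, and a fake can be passed in for testing.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type ExtractResult =
  | { ok: true; data: unknown }
  | { ok: false; status: number; message: string };

// Hypothetical server-side helper: the API key is an argument supplied by
// server code, so it never has to be exposed to the browser.
async function extractProduct(
  url: string,
  apiKey: string,
  fetchImpl: FetchLike
): Promise<ExtractResult> {
  const response = await fetchImpl("https://api.zyte.com/v1/extract", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Basic ${btoa(`${apiKey}:`)}`,
    },
    body: JSON.stringify({
      url,
      product: true,
      productOptions: { extractFrom: "httpResponseBody" },
    }),
  });
  const data = (await response.json()) as { detail?: string };

  // Turn error responses into a value the caller can render as a message.
  if (!response.ok) {
    return {
      ok: false,
      status: response.status,
      message: data.detail ?? "Extraction failed.",
    };
  }
  return { ok: true, data };
}
```

On the server you would pass the global fetch as the third argument and read the key from a variable without the NEXT_PUBLIC_ prefix (the exact variable name is up to you). The client component then only needs to send the URL to your own route.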

This is the result:

web-scrape-demo

In case you want to see the raw result from the Zyte API response, here is the parsed object (shown in console output form rather than strict JSON):

{
  url:
    'https://www.therealreal.com/products/women/handbags/totes/chanel-patent-puzzle-tote-lpztj',
  statusCode: 200,
  product: {
    name: 'Patent Puzzle Tote',
    price: '1600.0',
    currency: 'USD',
    currencyRaw: '$',
    availability: 'OutOfStock',
    sku: 'CHA1073989',
    brand: { name: 'Chanel' },
    breadcrumbs: [
      { name: 'All', url: 'https://www.therealreal.com/designers/chanel' },
      {
        name: 'Women',
        url: 'https://www.therealreal.com/designers/chanel/women'
      },
      {
        name: 'Handbags',
        url: 'https://www.therealreal.com/designers/chanel/women/handbags'
      },
      {
        name: 'Totes',
        url: 'https://www.therealreal.com/designers/chanel/women/handbags/totes'
      }
    ],
    mainImage: {
      url:
        'https://product-images.therealreal.com/CHA1073989_1_enlarged.jpg?width=335'
    },
    images: [
      {
        url:
          'https://product-images.therealreal.com/CHA1073989_1_enlarged.jpg?width=335'
      }
    ],
    description: 'Chanel Tote\n' +
      'From the 2008-2009 Collection by Karl Lagerfeld\n' +
      'Vintage\n' +
      'Black Patent Leather\n' +
      'Interlocking CC Logo & Chain-Link Accent\n' +
      'Gold-Tone Hardware\n' +
      'Chain-Link Shoulder Straps\n' +
      'Canvas Lining & Three Interior Pockets\n' +
      'Snap Closure at Top\n' +
      'Protective Feet at Base',
    descriptionHtml: '<article>\n' +
      '\n' +
      '<ul><li>Chanel Tote</li><li>From the 2008-2009 Collection by Karl Lagerfeld</li><li>Vintage</li><li>Black Patent Leather</li><li>Interlocking CC Logo &amp; Chain-Link Accent</li><li>Gold-Tone Hardware</li><li>Chain-Link Shoulder Straps</li><li>Canvas Lining &amp; Three Interior Pockets</li><li>Snap Closure at Top</li><li>Protective Feet at Base</li></ul>\n' +
      '\n' +
      '</article>',
    additionalProperties: [
      { name: 'shoulder strap length', value: '7.25"' },
      { name: 'height', value: '10.25"' },
      { name: 'width', value: '12.5"' },
      { name: 'depth', value: '4"' },
      { name: 'item #', value: 'CHA1073989' }
    ],
    features: Array(10) [
      'Chanel Tote', 'From the 2008-2009 Collection by Karl Lagerfeld', 'Vintage',
      'Black Patent Leather', 'Interlocking CC Logo & Chain-Link Accent', 'Gold-Tone Hardware',
      'Chain-Link Shoulder Straps', 'Canvas Lining & Three Interior Pockets', 'Snap Closure at Top',
      'Protective Feet at Base'
    ],
    url:
      'https://www.therealreal.com/products/women/handbags/totes/chanel-patent-puzzle-tote-lpztj',
    canonicalUrl:
      'https://www.therealreal.com/products/women/handbags/totes/chanel-patent-puzzle-tote-lpztj',
    metadata: {
      probability: 0.96113121509552,
      dateDownloaded: '2024-07-08T09:05:09Z'
    }
  }
}
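
Notice that price arrives as a string ('1600.0') with the currency code in a separate field, and that metadata.probability appears to be a confidence score for the extraction. Here is a small sketch of how you might use those fields before rendering. ZyteProduct, formatPrice, and isConfidentResult are hypothetical names of mine, the type covers only the fields used here, and the 0.5 threshold is an arbitrary choice:

```typescript
// Hypothetical type modelled on the sample response above; the real response
// carries many more fields.
type ZyteProduct = {
  price?: string;
  currency?: string;
  metadata?: { probability?: number };
};

// Formats the string price as a localized currency amount.
function formatPrice(product: ZyteProduct, locale = "en-US"): string {
  const amount = Number(product.price);
  if (!product.price || Number.isNaN(amount)) return "N/A";
  // Without a currency code, fall back to a plain two-decimal number.
  if (!product.currency) return amount.toFixed(2);
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency: product.currency,
  }).format(amount);
}

// Treats low-probability results as unreliable; 0.5 is an arbitrary cutoff.
function isConfidentResult(product: ZyteProduct, threshold = 0.5): boolean {
  return (product.metadata?.probability ?? 0) >= threshold;
}
```

For the sample above, formatPrice returns "$1,600.00" and isConfidentResult returns true, since the probability is about 0.96.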

That’s it. Hope this helps, and see you in the next post!