Universal Retrieval API

4 endpoints · Fetch, extract, and analyze any web page

GET /api/v1/platforms/universal/retrieve

Fetch any public URL and extract structured data. Automatically handles bot protection and JavaScript rendering. Returns metadata, links, images, headings, tables, JSON-LD, and content in your chosen format.

1 credit

Parameters

Field	Type	Required	Default
`url`	string	required	—
`format`	string	optional	full
`render`	string	optional	—
`selector`	string	optional	—
`headers`	string	optional	—

Format Options

Format	Default	Description
`full`	yes	HTML + Markdown + text + all extracted data
`html`	no	Raw HTML + all extracted data
`markdown`	no	Markdown + all extracted data; ideal for LLM/AI consumption
`text`	no	Plain text + all extracted data; ideal for NLP
`metadata`	no	Extracted data only, no content body; fastest
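
As a rough sketch of how the parameters and format values above combine into a request, the Python below percent-encodes the target `url` and rejects format values that are not in the table. The helper name `build_retrieve_url` is hypothetical, and the host is taken from the example request in this document; the code only builds a URL and makes no network call.

```python
from urllib.parse import urlencode

# Host and path as shown in this document's example request.
BASE = "https://api.nodesnack.com/api/v1/platforms/universal/retrieve"
# Format values listed in the Format Options table.
FORMATS = {"full", "html", "markdown", "text", "metadata"}


def build_retrieve_url(url, fmt="full", render=None, selector=None):
    """Hypothetical helper: build a retrieve URL with encoded query parameters."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    params = {"url": url, "format": fmt}
    # Optional parameters are sent only when the caller sets them.
    if render is not None:
        params["render"] = render
    if selector is not None:
        params["selector"] = selector
    return BASE + "?" + urlencode(params)


print(build_retrieve_url("https://example.com/a?b=1", fmt="markdown"))
```

Encoding the target matters when it carries its own query string, as in `https://example.com/a?b=1`; otherwise its `?` and `&` characters would be read as part of the API request.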

Response Fields

Field	Type	Description
`meta`	object	Title, description, OG tags, Twitter card, canonical URL, language
`json_ld`	array	All JSON-LD structured data (Product, Article, etc.)
`headings`	array	All h1-h6 headings with level and text (max 50)
`links`	array	All links with URL and anchor text (max 200)
`images`	array	All images with URL, alt text, dimensions (max 100)
`tables`	array	Tables auto-converted to JSON with headers (max 10)
`forms`	array	Form actions, methods, and fields (max 5)
`selector_results`	array	CSS selector matches (max 50, when selector param used)
`html`	string	Raw HTML (format: full or html)
`markdown`	string	Clean Markdown (format: full or markdown)
`text`	string	Plain text content (format: full or text)

Example Request

cURL

curl "https://api.nodesnack.com/api/v1/platforms/universal/retrieve?url=https://example.com&format=full" \
  -H "Authorization: Bearer YOUR_API_KEY"
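
For callers working from Python rather than cURL, a minimal standard-library sketch might look like the following. The names `make_request` and `fetch` are hypothetical, and the code assumes only what the cURL example shows: a GET request with a Bearer token in the `Authorization` header. `fetch` is defined but not called here, since calling it would hit the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Common prefix of the four endpoint paths in this document.
API_ROOT = "https://api.nodesnack.com/api/v1/platforms/universal"


def make_request(endpoint, api_key, **params):
    """Hypothetical helper: build an authenticated GET request for one endpoint."""
    # Drop unset parameters, then percent-encode the rest.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return Request(
        f"{API_ROOT}/{endpoint}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch(endpoint, api_key, **params):
    """Send the request and decode the JSON body (performs a network call)."""
    with urlopen(make_request(endpoint, api_key, **params), timeout=30) as resp:
        return json.load(resp)


req = make_request("retrieve", "YOUR_API_KEY", url="https://example.com", format="markdown")
print(req.full_url)
```

The same helper should work for `personas`, `sitemap`, and `locations` by changing the first argument, since all four endpoints take `url` as a query parameter.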

Example Response

response.json

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "status_code": 200,
    "content_length": 45230,
    "meta": {
      "title": "Example",
      "description": "...",
      "language": "en",
      "open_graph": {
        "title": "Example"
      }
    },
    "json_ld": [
      {
        "@type": "WebPage",
        "name": "Example"
      }
    ],
    "headings": [
      {
        "level": 1,
        "text": "Example Domain"
      }
    ],
    "links": [
      {
        "url": "https://example.com/about",
        "text": "About"
      }
    ],
    "link_count": 12,
    "images": [
      {
        "url": "https://example.com/logo.png",
        "alt": "Logo"
      }
    ],
    "image_count": 3,
    "tables": [
      {
        "headers": [
          "Col1",
          "Col2"
        ],
        "rows": [
          {
            "Col1": "A",
            "Col2": "B"
          }
        ]
      }
    ],
    "html": "<!DOCTYPE html>...",
    "markdown": "# Example\n...",
    "text": "Example Domain\n..."
  }
}
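
A response shaped like the one above could be consumed along these lines. This is a sketch only: `summarize` is a hypothetical helper, and the input is a trimmed copy of the example response. The document does not define `link_count`; the code reports it next to the length of `links` on the assumption that the two can differ when the array is capped or abbreviated.

```python
import json

# Trimmed copy of the example response above; only the fields read below are kept.
RAW = """
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "meta": {"title": "Example", "language": "en"},
    "headings": [{"level": 1, "text": "Example Domain"}],
    "links": [{"url": "https://example.com/about", "text": "About"}],
    "link_count": 12
  }
}
"""


def summarize(payload):
    """Hypothetical helper: pull a few fields out of a retrieve response."""
    if not payload.get("success"):
        raise RuntimeError("request did not succeed")
    data = payload["data"]
    return {
        "title": data.get("meta", {}).get("title"),
        # Keep only top-level headings.
        "h1": [h["text"] for h in data.get("headings", []) if h.get("level") == 1],
        "links_returned": len(data.get("links", [])),
        "links_total": data.get("link_count"),
    }


summary = summarize(json.loads(RAW))
print(summary)
```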


GET /api/v1/platforms/universal/personas

Extract people and leadership information from any page. Finds names, titles, social media profiles, images, and contact information from team pages, about pages, and leadership directories.

2 credits

Parameters

Field	Type	Required	Default
`url`	string	required	—
`render`	string	optional	—

Response Fields

Field	Type	Description
`persona_count`	integer	Number of people found
`personas`	array	List of extracted people
`personas[].name`	string	Full name
`personas[].title`	string	Job title (CEO, CTO, VP Engineering, etc.)
`personas[].image`	string	Profile image URL
`personas[].email`	string	Email address (if found)
`personas[].social_profiles`	object	LinkedIn, Twitter, GitHub, Instagram, etc.
`page_contacts`	object	Organization-level emails, phones, and social links

Example Request

cURL

curl "https://api.nodesnack.com/api/v1/platforms/universal/personas?url=https://www.zoominfo.com/about/leadership" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example Response

response.json

{
  "success": true,
  "data": {
    "url": "https://example.com/about/leadership",
    "persona_count": 2,
    "personas": [
      {
        "name": "Jane Smith",
        "title": "CEO",
        "image": "https://example.com/jane.jpg",
        "social_profiles": {
          "linkedin": "https://linkedin.com/in/janesmith",
          "twitter": "https://twitter.com/janesmith"
        }
      },
      {
        "name": "John Doe",
        "title": "CTO",
        "image": "https://example.com/john.jpg",
        "social_profiles": {
          "linkedin": "https://linkedin.com/in/johndoe",
          "github": "https://github.com/johndoe"
        }
      }
    ],
    "page_contacts": {
      "emails": [
        "hello@example.com"
      ],
      "social_profiles": {
        "twitter": "https://twitter.com/example"
      }
    }
  }
}
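
Because `email` and individual social profiles appear only when found, code reading this response should treat them as optional. The sketch below flattens `personas` into uniform rows; `persona_rows` is a hypothetical helper and the sample data is abbreviated from the example response above.

```python
# Abbreviated from the example response above.
data = {
    "personas": [
        {
            "name": "Jane Smith",
            "title": "CEO",
            "social_profiles": {"linkedin": "https://linkedin.com/in/janesmith"},
        },
        {
            "name": "John Doe",
            "title": "CTO",
            "social_profiles": {"github": "https://github.com/johndoe"},
        },
    ]
}


def persona_rows(data):
    """Hypothetical helper: flatten personas into rows with a fixed set of keys."""
    rows = []
    for person in data.get("personas", []):
        profiles = person.get("social_profiles") or {}
        rows.append({
            "name": person.get("name"),
            "title": person.get("title"),
            # None when the page exposed no email or LinkedIn profile.
            "email": person.get("email"),
            "linkedin": profiles.get("linkedin"),
        })
    return rows


for row in persona_rows(data):
    print(row)
```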


GET /api/v1/platforms/universal/sitemap

Discover and classify all pages from a website's sitemap. Fetches sitemap.xml (handles indexes, nested sitemaps, robots.txt fallback), extracts all URLs with metadata, and classifies them into categories like careers, leadership, blog, products, pricing, and more.

1 credit

Parameters

Field	Type	Required	Default
`url`	string	required	—

Response Fields

Field	Type	Description
`total_pages`	integer	Total URLs found in sitemap
`categorized_pages`	integer	Pages assigned to at least one category
`uncategorized_pages`	integer	Pages with no category match
`categories`	object	Dict of category label → list of URLs
`category_counts`	object	Dict of category label → count
`pages`	array	All URLs with metadata (last_modified, priority, categories)

Example Request

cURL

curl "https://api.nodesnack.com/api/v1/platforms/universal/sitemap?url=https://www.zoominfo.com" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example Response

response.json

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "total_pages": 156,
    "categorized_pages": 98,
    "uncategorized_pages": 58,
    "categories": {
      "blog": [
        "https://example.com/blog/post-1",
        "https://example.com/blog/post-2"
      ],
      "careers": [
        "https://example.com/careers",
        "https://example.com/careers/engineering"
      ],
      "products": [
        "https://example.com/products",
        "https://example.com/products/api"
      ],
      "leadership": [
        "https://example.com/about/leadership"
      ],
      "pricing": [
        "https://example.com/pricing"
      ]
    },
    "category_counts": {
      "blog": 45,
      "careers": 12,
      "products": 18,
      "leadership": 1,
      "pricing": 2
    },
    "pages": [
      {
        "url": "https://example.com/",
        "categories": [
          "homepage"
        ],
        "last_modified": "2026-04-10",
        "priority": 1
      }
    ]
  }
}
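
One possible use of the `categories` map is to pick pages worth sending to another endpoint, for example passing `leadership` URLs to the personas endpoint. The sketch below collects URLs for the requested labels and removes duplicates; `urls_for` is a hypothetical helper, the sample is abbreviated from the example response above, and the chaining workflow is a suggestion rather than documented behavior.

```python
# Abbreviated from the example response above.
data = {
    "categories": {
        "careers": ["https://example.com/careers"],
        "leadership": ["https://example.com/about/leadership"],
        "pricing": ["https://example.com/pricing"],
    }
}


def urls_for(data, *labels):
    """Hypothetical helper: URLs for the given category labels, in order, deduplicated."""
    seen, out = set(), []
    for label in labels:
        # Labels missing from the response contribute nothing.
        for url in data.get("categories", {}).get(label, []):
            if url not in seen:
                seen.add(url)
                out.append(url)
    return out


print(urls_for(data, "leadership", "careers", "team"))
```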


GET /api/v1/platforms/universal/locations

Extract office, store, or branch locations from any page. Finds addresses, phone numbers, GPS coordinates, business hours, and directions links from location/office/store pages.

2 credits

Parameters

Field	Type	Required	Default
`url`	string	required	—
`render`	string	optional	—

Response Fields

Field	Type	Description
`location_count`	integer	Number of locations found
`locations`	array	List of extracted locations
`locations[].name`	string	Location name (e.g., 'Headquarters', 'NYC Office')
`locations[].address`	string	Full formatted address
`locations[].street`	string	Street address
`locations[].city`	string	City
`locations[].state`	string	State/region
`locations[].zip`	string	Postal code
`locations[].country`	string	Country
`locations[].phone`	string	Phone number
`locations[].email`	string	Email address
`locations[].latitude`	number	GPS latitude
`locations[].longitude`	number	GPS longitude
`locations[].hours`	array	Business hours by day
`locations[].directions_url`	string	Google/Apple Maps link
`locations[].image`	string	Location image URL

Example Request

cURL

curl "https://api.nodesnack.com/api/v1/platforms/universal/locations?url=https://about.google/locations/" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example Response

response.json

{
  "success": true,
  "data": {
    "url": "https://example.com/locations",
    "location_count": 3,
    "locations": [
      {
        "name": "Headquarters",
        "address": "100 Main St, San Francisco, CA 94105",
        "street": "100 Main St",
        "city": "San Francisco",
        "state": "CA",
        "zip": "94105",
        "country": "US",
        "phone": "+1 (415) 555-0100",
        "latitude": 37.7749,
        "longitude": -122.4194
      },
      {
        "name": "New York Office",
        "address": "200 Broadway, New York, NY 10001",
        "phone": "+1 (212) 555-0200",
        "directions_url": "https://maps.google.com/..."
      },
      {
        "name": "London Office",
        "address": "10 Downing St, London, UK"
      }
    ]
  }
}
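
As the example shows, not every location carries every field: only the first entry has coordinates. A sketch of filtering for mappable locations follows; `with_coordinates` is a hypothetical helper and the sample is abbreviated from the example response above.

```python
# Abbreviated from the example response above.
data = {
    "locations": [
        {"name": "Headquarters", "latitude": 37.7749, "longitude": -122.4194},
        {"name": "New York Office", "phone": "+1 (212) 555-0200"},
        {"name": "London Office"},
    ]
}


def with_coordinates(data):
    """Hypothetical helper: (name, latitude, longitude) for locations that have both."""
    return [
        (loc.get("name"), loc["latitude"], loc["longitude"])
        for loc in data.get("locations", [])
        # Compare against None so that a coordinate of 0.0 is still kept.
        if loc.get("latitude") is not None and loc.get("longitude") is not None
    ]


print(with_coordinates(data))
```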


Notes

Credits are charged only when a request returns data successfully.
For JS-rendered pages (SPAs, React), use render=html.
The personas and locations endpoints auto-retry with JS rendering when the initial fetch finds no data.
Content is capped at 50KB for text and markdown formats.
The retrieve endpoint works with any publicly accessible URL.
Social profile detection covers LinkedIn, Twitter/X, GitHub, Instagram, Facebook, TikTok, YouTube, Bluesky, Threads, Mastodon, Medium, and Substack.
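
To estimate the cost of a batch of calls, the per-endpoint prices listed above can be summed over successful requests. This is a sketch under the stated rule that unsuccessful requests are not charged; `estimate_credits` is a hypothetical helper, and the document does not say whether an automatic JS-rendering retry changes the price, so the code assumes it does not.

```python
# Credit cost per endpoint, as listed in each endpoint section of this document.
CREDITS = {"retrieve": 1, "personas": 2, "sitemap": 1, "locations": 2}


def estimate_credits(calls):
    """Hypothetical helper: total credits for (endpoint, succeeded) pairs."""
    # Only successful calls are counted, per the first note above.
    return sum(CREDITS[endpoint] for endpoint, succeeded in calls if succeeded)


calls = [
    ("sitemap", True),
    ("personas", True),
    ("personas", False),  # returned no data, so assumed not charged
    ("locations", True),
]
print(estimate_credits(calls))
```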
