
Universal Retrieval API

4 endpoints · Fetch, extract, and analyze any web page

GET /api/v1/platforms/universal/retrieve

Fetch any public URL and extract structured data. Uses a cost-optimized proxy waterfall (direct → residential → Web Scraper API → Web Unblocker). Returns metadata, links, images, headings, tables, JSON-LD, and content in your chosen format.

1 credit

Parameters

Field     Type    Required  Default
url       string  required
format    string  optional  full
render    string  optional
selector  string  optional
headers   string  optional

Format Options

Value     Description
full      HTML + Markdown + text + all extracted data (default)
html      Raw HTML + all extracted data
markdown  Markdown + all extracted data; ideal for LLM/AI consumption
text      Plain text + all extracted data; ideal for NLP
metadata  Extracted data only, no content body; fastest

Proxy Waterfall

Tiers are tried in the order listed, starting with the cheapest (direct). The tier_used response field reports which tier succeeded.

Tier                 Approx. cost  Used for
1. Direct            free          Simple public pages with no bot protection
2. Residential       ~$0.0001      Pages blocking datacenter IPs
3. Web Scraper API   ~$0.01        Pages with bot detection or CAPTCHAs
4. Web Unblocker     ~$0.005       JS-heavy pages with strong anti-bot measures
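The waterfall runs on the server, but its control flow is simple to picture: try each tier in order, stop at the first one that gets the page, and report that tier's name the way tier_used does. The sketch below illustrates only that reading. The fetcher callables, the TierResult type, and the "residential" label are hypothetical; "direct" is the only tier_used value shown in the examples on this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fetcher: returns page HTML on success, or None when the tier is blocked.
Fetcher = Callable[[str], Optional[str]]


@dataclass
class TierResult:
    tier_used: str  # mirrors the tier_used response field
    html: str


def fetch_with_waterfall(url: str, tiers: Sequence[Tuple[str, Fetcher]]) -> TierResult:
    """Try each (tier name, fetcher) pair in order and stop at the first success."""
    for name, fetch in tiers:
        html = fetch(url)
        if html is not None:
            return TierResult(tier_used=name, html=html)
    raise RuntimeError(f"all tiers failed for {url}")


if __name__ == "__main__":
    blocked = lambda url: None                # simulates a tier that gets blocked
    succeeds = lambda url: "<html>ok</html>"  # simulates a tier that gets through
    result = fetch_with_waterfall(
        "https://example.com",
        [("direct", blocked), ("residential", succeeds)],
    )
    print(result.tier_used)  # residential
```

Ordering the list by cost is what makes the strategy cost-optimized: a more expensive tier is only reached when every cheaper one before it has failed.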

Response Fields

Field             Type    Description
meta              object  Title, description, OG tags, Twitter card, canonical URL, language
json_ld           array   All JSON-LD structured data (Product, Article, etc.)
headings          array   h1-h6 headings with level and text (up to 50)
links             array   Links with URL and anchor text (up to 200)
images            array   Images with URL, alt text, and dimensions (up to 100)
tables            array   Tables converted to JSON: headers plus rows keyed by header (up to 10)
forms             array   Form actions, methods, and fields (up to 5)
selector_results  array   CSS selector matches (up to 50; only when the selector parameter is set)
html              string  Raw HTML (only when format is full or html)
markdown          string  Clean Markdown (only when format is full or markdown)
text              string  Plain text content (only when format is full or text)

Example Request

cURL
curl "https://api.nodesnack.com/api/v1/platforms/universal/retrieve?url=https://example.com&format=full" \
  -H "Authorization: Bearer YOUR_API_KEY"
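The same request from Python, as a minimal standard-library sketch. It percent-encodes the target URL (which matters when the target has its own query string) and checks format against the values listed under Format Options. The helper names build_retrieve_request and retrieve are hypothetical; only the endpoint, the query parameters, and the Bearer header come from this page. The shape of an error body is not documented here, so retrieve simply raises on anything without "success": true.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.nodesnack.com/api/v1/platforms/universal"
FORMATS = {"full", "html", "markdown", "text", "metadata"}  # from Format Options


def build_retrieve_request(api_key: str, target_url: str, fmt: str = "full",
                           render: Optional[str] = None,
                           selector: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a GET request for the retrieve endpoint."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    params = {"url": target_url, "format": fmt}
    if render is not None:    # e.g. render="html" for JS-rendered pages
        params["render"] = render
    if selector is not None:  # CSS selector; matches come back in selector_results
        params["selector"] = selector
    # urlencode percent-encodes the target URL, including any '?' or '&' it contains.
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{BASE_URL}/retrieve?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def retrieve(api_key: str, target_url: str, **kwargs) -> dict:
    """Send the request and return the "data" object of a successful response."""
    req = build_retrieve_request(api_key, target_url, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:  # raises HTTPError on 4xx/5xx
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(f"retrieve failed: {body}")
    return body["data"]
```

Usage would look like retrieve(api_key, "https://example.com", fmt="markdown"), with the key read from wherever you store secrets. Separating request construction from sending keeps the URL-building logic testable without network access.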

Example Response (illustrative, abridged)

response.json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "status_code": 200,
    "content_length": 45230,
    "tier_used": "direct",
    "meta": {
      "title": "Example",
      "description": "...",
      "language": "en",
      "open_graph": {
        "title": "Example"
      }
    },
    "json_ld": [
      {
        "@type": "WebPage",
        "name": "Example"
      }
    ],
    "headings": [
      {
        "level": 1,
        "text": "Example Domain"
      }
    ],
    "links": [
      {
        "url": "https://example.com/about",
        "text": "About"
      }
    ],
    "link_count": 12,
    "images": [
      {
        "url": "https://example.com/logo.png",
        "alt": "Logo"
      }
    ],
    "image_count": 3,
    "tables": [
      {
        "headers": [
          "Col1",
          "Col2"
        ],
        "rows": [
          {
            "Col1": "A",
            "Col2": "B"
          }
        ]
      }
    ],
    "html": "<!DOCTYPE html>...",
    "markdown": "# Example\n...",
    "text": "Example Domain\n..."
  }
}
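Each entry in tables carries its headers plus rows keyed by header name, so it maps directly onto csv.DictWriter. A minimal sketch, assuming data is the parsed "data" object from a response like the one above; table_to_csv is a hypothetical helper name.

```python
import csv
import io


def table_to_csv(table: dict) -> str:
    """Serialize one extracted table (headers + dict rows) to CSV text."""
    buf = io.StringIO()
    # extrasaction="ignore" drops row keys not listed in headers;
    # keys missing from a row are written as empty cells (restval defaults to "").
    writer = csv.DictWriter(buf, fieldnames=table["headers"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(table["rows"])
    return buf.getvalue()


# Sample taken from the example response above.
data = {
    "tables": [
        {"headers": ["Col1", "Col2"], "rows": [{"Col1": "A", "Col2": "B"}]}
    ]
}
for i, table in enumerate(data["tables"]):
    print(f"table {i}:")
    print(table_to_csv(table), end="")
```

The csv module uses "\r\n" line endings by default, which spreadsheet tools accept.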

GET /api/v1/platforms/universal/personas

Extract people and leadership information from any page. Finds names, titles, social media profiles, images, and contact information from team pages, about pages, and leadership directories.

2 credits

Parameters

Field   Type    Required  Default
url     string  required
render  string  optional

Response Fields

Field                       Type     Description
persona_count               integer  Number of people found
personas                    array    List of extracted people
personas[].name             string   Full name
personas[].title            string   Job title (CEO, CTO, VP Engineering, etc.)
personas[].image            string   Profile image URL
personas[].email            string   Email address (if found)
personas[].social_profiles  object   Profile URLs keyed by platform (LinkedIn, Twitter, GitHub, Instagram, etc.)
page_contacts               object   Organization-level emails, phones, and social links

Example Request

cURL
curl "https://api.nodesnack.com/api/v1/platforms/universal/personas?url=https://www.zoominfo.com/about/leadership" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example Response (illustrative, abridged)

response.json
{
  "success": true,
  "data": {
    "url": "https://example.com/about/leadership",
    "tier_used": "direct",
    "persona_count": 3,
    "personas": [
      {
        "name": "Jane Smith",
        "title": "CEO",
        "image": "https://example.com/jane.jpg",
        "social_profiles": {
          "linkedin": "https://linkedin.com/in/janesmith",
          "twitter": "https://twitter.com/janesmith"
        }
      },
      {
        "name": "John Doe",
        "title": "CTO",
        "image": "https://example.com/john.jpg",
        "social_profiles": {
          "linkedin": "https://linkedin.com/in/johndoe",
          "github": "https://github.com/johndoe"
        }
      }
    ],
    "page_contacts": {
      "emails": [
        "hello@example.com"
      ],
      "social_profiles": {
        "twitter": "https://twitter.com/example"
      }
    }
  }
}
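Fields such as email and the individual social_profiles keys appear only when the page provides them, so it is safest to read them defensively. A minimal sketch that flattens personas into one row per person, assuming data is the parsed "data" object; persona_rows is a hypothetical helper, and the lowercase platform keys follow the example response above.

```python
from typing import Dict, List, Optional


def persona_rows(data: dict) -> List[Dict[str, Optional[str]]]:
    """Flatten personas into one row per person; absent fields become None."""
    rows = []
    for person in data.get("personas", []):
        profiles = person.get("social_profiles") or {}
        rows.append({
            "name": person.get("name"),
            "title": person.get("title"),
            "email": person.get("email"),          # only present when found on the page
            "linkedin": profiles.get("linkedin"),  # platform keys are lowercase in the example
        })
    return rows


# Trimmed sample based on the example response above.
data = {
    "personas": [
        {"name": "Jane Smith", "title": "CEO",
         "social_profiles": {"linkedin": "https://linkedin.com/in/janesmith"}},
        {"name": "John Doe", "title": "CTO",
         "social_profiles": {"github": "https://github.com/johndoe"}},
    ]
}
for row in persona_rows(data):
    print(row)
```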

GET /api/v1/platforms/universal/sitemap

Discover and classify the pages listed in a website's sitemap. Fetches sitemap.xml (including sitemap index files and nested sitemaps, with a robots.txt fallback), extracts every URL with its metadata, and classifies each URL into categories such as careers, leadership, blog, products, and pricing.

1 credit

Parameters

Field  Type    Required  Default
url    string  required

Categories

Pages are classified into the following categories based on URL path patterns:

homepage, leadership, careers, blog, products, pricing, about, contact, locations, investors, partners, documentation, legal, case_studies, events, login, signup
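Path-based classification can be pictured as a set of patterns matched against each URL's path, where a page may land in several categories (pages[].categories is a list). The sketch below covers only a few of the labels, and its regular expressions are hypothetical stand-ins; the service's actual rules are not published on this page.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical path patterns for a handful of the categories; the service's own rules may differ.
PATTERNS = {
    "leadership": re.compile(r"/(leadership|team|management)(/|$)"),
    "careers": re.compile(r"/(careers|jobs)(/|$)"),
    "blog": re.compile(r"/(blog|news)(/|$)"),
    "pricing": re.compile(r"/pricing(/|$)"),
    "about": re.compile(r"/about(/|$)"),
}


def classify(url: str) -> List[str]:
    """Return every category whose pattern matches the URL's path (possibly none)."""
    path = urlparse(url).path.lower() or "/"
    if path == "/":
        return ["homepage"]
    return [label for label, pattern in PATTERNS.items() if pattern.search(path)]


for u in ["https://example.com/", "https://example.com/careers/engineering",
          "https://example.com/about/leadership", "https://example.com/blogging"]:
    print(u, classify(u))
```

Anchoring each keyword to a path-segment boundary keeps /blogging out of blog, and a URL that matches nothing would count toward uncategorized_pages.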

Response Fields

Field                Type     Description
total_pages          integer  Total URLs found in the sitemap
categorized_pages    integer  Pages assigned to at least one category
uncategorized_pages  integer  Pages with no category match
categories           object   Map of category label → list of URLs
category_counts      object   Map of category label → number of URLs
pages                array    All URLs with metadata (last_modified, priority, categories)

Example Request

cURL
curl "https://api.nodesnack.com/api/v1/platforms/universal/sitemap?url=https://www.zoominfo.com" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example Response (illustrative, abridged)

response.json
{
  "success": true,
  "data": {
    "url": "https://example.com",
    "tier_used": "direct",
    "total_pages": 156,
    "categorized_pages": 98,
    "uncategorized_pages": 58,
    "categories": {
      "blog": [
        "https://example.com/blog/post-1",
        "https://example.com/blog/post-2"
      ],
      "careers": [
        "https://example.com/careers",
        "https://example.com/careers/engineering"
      ],
      "products": [
        "https://example.com/products",
        "https://example.com/products/api"
      ],
      "leadership": [
        "https://example.com/about/leadership"
      ],
      "pricing": [
        "https://example.com/pricing"
      ]
    },
    "category_counts": {
      "blog": 45,
      "careers": 12,
      "products": 18,
      "leadership": 1,
      "pricing": 2
    },
    "pages": [
      {
        "url": "https://example.com/",
        "categories": [
          "homepage"
        ],
        "last_modified": "2026-04-10",
        "priority": 1
      }
    ]
  }
}
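The categories object makes it easy to route discovered pages to the other endpoints, for example leadership pages to personas and locations pages to locations. A minimal sketch that only builds the follow-up request URLs from a parsed sitemap "data" object; the ROUTES mapping is an illustrative choice, not something the API prescribes. Each personas or locations call costs 2 credits.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://api.nodesnack.com/api/v1/platforms/universal"

# Illustrative routing: which sitemap category feeds which endpoint.
ROUTES = {"leadership": "personas", "locations": "locations"}


def follow_up_requests(sitemap_data: dict) -> List[str]:
    """Build personas/locations request URLs from a sitemap response's categories."""
    urls = []
    categories = sitemap_data.get("categories", {})
    for category, endpoint in ROUTES.items():
        for page in categories.get(category, []):
            urls.append(f"{BASE_URL}/{endpoint}?{urlencode({'url': page})}")
    return urls


# Trimmed sample based on the example response above.
sitemap_data = {
    "categories": {
        "leadership": ["https://example.com/about/leadership"],
        "pricing": ["https://example.com/pricing"],
    }
}
for u in follow_up_requests(sitemap_data):
    print(u)
```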

GET /api/v1/platforms/universal/locations

Extract office, store, or branch locations from any page. Finds addresses, phone numbers, GPS coordinates, business hours, and directions links from location/office/store pages.

2 credits

Parameters

Field   Type    Required  Default
url     string  required
render  string  optional

Response Fields

Field                       Type     Description
location_count              integer  Number of locations found
locations                   array    List of extracted locations
locations[].name            string   Location name (e.g., 'Headquarters', 'NYC Office')
locations[].address         string   Full formatted address
locations[].street          string   Street address
locations[].city            string   City
locations[].state           string   State/region
locations[].zip             string   Postal code
locations[].country         string   Country
locations[].phone           string   Phone number
locations[].email           string   Email address
locations[].latitude        number   GPS latitude
locations[].longitude       number   GPS longitude
locations[].hours           array    Business hours by day
locations[].directions_url  string   Google/Apple Maps link
locations[].image           string   Location image URL
locations[].source          string   How the entry was extracted (e.g., json-ld, html-card, address-tag)

Example Request

cURL
curl "https://api.nodesnack.com/api/v1/platforms/universal/locations?url=https://about.google/locations/" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example Response (illustrative, abridged)

response.json
{
  "success": true,
  "data": {
    "url": "https://example.com/locations",
    "tier_used": "direct",
    "location_count": 3,
    "locations": [
      {
        "name": "Headquarters",
        "address": "100 Main St, San Francisco, CA 94105",
        "street": "100 Main St",
        "city": "San Francisco",
        "state": "CA",
        "zip": "94105",
        "country": "US",
        "phone": "+1 (415) 555-0100",
        "latitude": 37.7749,
        "longitude": -122.4194,
        "source": "json-ld"
      },
      {
        "name": "New York Office",
        "address": "200 Broadway, New York, NY 10001",
        "phone": "+1 (212) 555-0200",
        "directions_url": "https://maps.google.com/...",
        "source": "html-card"
      },
      {
        "name": "London Office",
        "address": "10 Downing St, London, UK",
        "source": "address-tag"
      }
    ]
  }
}
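Only some entries carry latitude and longitude (in the example, only the json-ld one does), so a mapping step needs to skip the rest. A minimal sketch that converts a parsed locations "data" object into a GeoJSON FeatureCollection; locations_to_geojson is a hypothetical helper, and note that GeoJSON (RFC 7946) orders positions as longitude, then latitude.

```python
import json
from typing import Any, Dict


def locations_to_geojson(data: dict) -> Dict[str, Any]:
    """Convert locations that have coordinates into a GeoJSON FeatureCollection."""
    features = []
    for loc in data.get("locations", []):
        lat, lon = loc.get("latitude"), loc.get("longitude")
        if lat is None or lon is None:
            continue  # e.g. address-only entries without GPS data
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {k: loc.get(k) for k in ("name", "address", "phone")},
        })
    return {"type": "FeatureCollection", "features": features}


# Trimmed sample based on the example response above.
data = {
    "locations": [
        {"name": "Headquarters", "address": "100 Main St, San Francisco, CA 94105",
         "phone": "+1 (415) 555-0100", "latitude": 37.7749, "longitude": -122.4194},
        {"name": "New York Office", "address": "200 Broadway, New York, NY 10001"},
    ]
}
print(json.dumps(locations_to_geojson(data), indent=2))
```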

Notes

  • Credits are charged only when a request succeeds and returns data.
  • For JS-rendered pages (SPAs, React), use render=html.
  • The personas and locations endpoints auto-retry with JS rendering when the initial fetch finds no data.
  • Content is capped at 50KB for text and markdown formats.
  • The retrieve endpoint works with any publicly accessible URL.
  • Social profile detection covers LinkedIn, Twitter/X, GitHub, Instagram, Facebook, TikTok, YouTube, Bluesky, Threads, Mastodon, Medium, and Substack.
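Because each endpoint has a fixed credit price and failed requests are not charged, a batch's cost has a simple upper bound. A minimal sketch with the per-endpoint rates copied from this page; max_credits is a hypothetical helper, and it assumes the automatic JS-rendering retries on personas and locations are not billed as separate calls.

```python
from typing import Mapping

# Credits per successful call, as listed for each endpoint on this page.
CREDITS = {"retrieve": 1, "personas": 2, "sitemap": 1, "locations": 2}


def max_credits(calls: Mapping[str, int]) -> int:
    """Upper bound on credits for a batch; failed calls are not charged."""
    return sum(CREDITS[endpoint] * count for endpoint, count in calls.items())


# One sitemap crawl, then personas on 3 pages and locations on 2 pages.
print(max_credits({"sitemap": 1, "personas": 3, "locations": 2}))  # 11
```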