Universal Retrieval API
4 endpoints · Fetch, extract, and analyze any web page
/api/v1/platforms/universal/retrieve
Fetch any public URL and extract structured data. Uses a cost-optimized proxy waterfall (direct → residential → Web Scraper API → Web Unblocker). Returns metadata, links, images, headings, tables, JSON-LD, and content in your chosen format.
Parameters
| Field | Type | Required | Default |
|---|---|---|---|
url | string | required | — |
format | string | optional | full |
render | string | optional | — |
selector | string | optional | — |
headers | string | optional | — |
Format Options
| Field | Type | Description |
|---|---|---|
full | default | HTML + Markdown + text + all extracted data |
html | — | Raw HTML + all extracted data |
markdown | — | Markdown + all extracted data — ideal for LLM/AI consumption |
text | — | Plain text + all extracted data — ideal for NLP |
metadata | — | Extracted data only, no content body — fastest |
Proxy Waterfall
Cheapest tier tried first. tier_used tells you which succeeded.
| Field | Type | Description |
|---|---|---|
1. Direct | free | Simple public pages with no bot protection |
2. Residential | ~$0.0001 | Pages blocking datacenter IPs |
3. Web Scraper API | ~$0.01 | Pages with bot detection or CAPTCHAs |
4. Web Unblocker | ~$0.005 | JS-heavy pages with strong anti-bot measures |
Response Fields
| Field | Type | Description |
|---|---|---|
meta | object | Title, description, OG tags, Twitter card, canonical URL, language |
json_ld | array | All JSON-LD structured data (Product, Article, etc.) |
headings | array | All h1-h6 headings with level and text (max 50) |
links | array | All links with URL and anchor text (max 200) |
images | array | All images with URL, alt text, dimensions (max 100) |
tables | array | Tables auto-converted to JSON with headers (max 10) |
forms | array | Form actions, methods, and fields (max 5) |
selector_results | array | CSS selector matches (max 50, when selector param used) |
html | string | Raw HTML (format: full or html) |
markdown | string | Clean Markdown (format: full or markdown) |
text | string | Plain text content (format: full or text) |
Example Request
curl "https://api.nodesnack.com/api/v1/platforms/universal/retrieve?url=https://example.com&format=metadata" \
-H "Authorization: Bearer YOUR_API_KEY"Example Response
{
"success": true,
"data": {
"url": "https://example.com",
"status_code": 200,
"content_length": 45230,
"tier_used": "direct",
"meta": {
"title": "Example",
"description": "...",
"language": "en",
"open_graph": {
"title": "Example"
}
},
"json_ld": [
{
"@type": "WebPage",
"name": "Example"
}
],
"headings": [
{
"level": 1,
"text": "Example Domain"
}
],
"links": [
{
"url": "https://example.com/about",
"text": "About"
}
],
"link_count": 12,
"images": [
{
"url": "https://example.com/logo.png",
"alt": "Logo"
}
],
"image_count": 3,
"tables": [
{
"headers": [
"Col1",
"Col2"
],
"rows": [
{
"Col1": "A",
"Col2": "B"
}
]
}
],
"html": "<!DOCTYPE html>...",
"markdown": "# Example\n...",
"text": "Example Domain\n..."
}
}/api/v1/platforms/universal/personas
Extract people and leadership information from any page. Finds names, titles, social media profiles, images, and contact information from team pages, about pages, and leadership directories.
Parameters
| Field | Type | Required | Default |
|---|---|---|---|
url | string | required | — |
render | string | optional | — |
Response Fields
| Field | Type | Description |
|---|---|---|
persona_count | integer | Number of people found |
personas | array | List of extracted people |
personas[].name | string | Full name |
personas[].title | string | Job title (CEO, CTO, VP Engineering, etc.) |
personas[].image | string | Profile image URL |
personas[].email | string | Email address (if found) |
personas[].social_profiles | object | LinkedIn, Twitter, GitHub, Instagram, etc. |
page_contacts | object | Organization-level emails, phones, and social links |
Example Request
curl "https://api.nodesnack.com/api/v1/platforms/universal/personas?url=https://www.zoominfo.com/about/leadership" \
-H "Authorization: Bearer YOUR_API_KEY"Example Response
{
"success": true,
"data": {
"url": "https://example.com/about/leadership",
"tier_used": "direct",
"persona_count": 3,
"personas": [
{
"name": "Jane Smith",
"title": "CEO",
"image": "https://example.com/jane.jpg",
"social_profiles": {
"linkedin": "https://linkedin.com/in/janesmith",
"twitter": "https://twitter.com/janesmith"
}
},
{
"name": "John Doe",
"title": "CTO",
"image": "https://example.com/john.jpg",
"social_profiles": {
"linkedin": "https://linkedin.com/in/johndoe",
"github": "https://github.com/johndoe"
}
}
],
"page_contacts": {
"emails": [
"hello@example.com"
],
"social_profiles": {
"twitter": "https://twitter.com/example"
}
}
}
}/api/v1/platforms/universal/sitemap
Discover and classify all pages from a website's sitemap. Fetches sitemap.xml (handles indexes, nested sitemaps, robots.txt fallback), extracts all URLs with metadata, and classifies them into categories like careers, leadership, blog, products, pricing, and more.
Parameters
| Field | Type | Required | Default |
|---|---|---|---|
url | string | required | — |
Categories
Pages are classified into 16 categories based on URL path patterns:
Response Fields
| Field | Type | Description |
|---|---|---|
total_pages | integer | Total URLs found in sitemap |
categorized_pages | integer | Pages assigned to at least one category |
uncategorized_pages | integer | Pages with no category match |
categories | object | Dict of category label → list of URLs |
category_counts | object | Dict of category label → count |
pages | array | All URLs with metadata (last_modified, priority, categories) |
Example Request
curl "https://api.nodesnack.com/api/v1/platforms/universal/sitemap?url=https://www.zoominfo.com" \
-H "Authorization: Bearer YOUR_API_KEY"Example Response
{
"success": true,
"data": {
"url": "https://example.com",
"tier_used": "direct",
"total_pages": 156,
"categorized_pages": 98,
"uncategorized_pages": 58,
"categories": {
"blog": [
"https://example.com/blog/post-1",
"https://example.com/blog/post-2"
],
"careers": [
"https://example.com/careers",
"https://example.com/careers/engineering"
],
"products": [
"https://example.com/products",
"https://example.com/products/api"
],
"leadership": [
"https://example.com/about/leadership"
],
"pricing": [
"https://example.com/pricing"
]
},
"category_counts": {
"blog": 45,
"careers": 12,
"products": 18,
"leadership": 1,
"pricing": 2
},
"pages": [
{
"url": "https://example.com/",
"categories": [
"homepage"
],
"last_modified": "2026-04-10",
"priority": 1
}
]
}
}/api/v1/platforms/universal/locations
Extract office, store, or branch locations from any page. Finds addresses, phone numbers, GPS coordinates, business hours, and directions links from location/office/store pages.
Parameters
| Field | Type | Required | Default |
|---|---|---|---|
url | string | required | — |
render | string | optional | — |
Response Fields
| Field | Type | Description |
|---|---|---|
location_count | integer | Number of locations found |
locations | array | List of extracted locations |
locations[].name | string | Location name (e.g., 'Headquarters', 'NYC Office') |
locations[].address | string | Full formatted address |
locations[].street | string | Street address |
locations[].city | string | City |
locations[].state | string | State/region |
locations[].zip | string | Postal code |
locations[].country | string | Country |
locations[].phone | string | Phone number |
locations[].email | string | Email address |
locations[].latitude | number | GPS latitude |
locations[].longitude | number | GPS longitude |
locations[].hours | array | Business hours by day |
locations[].directions_url | string | Google/Apple Maps link |
locations[].image | string | Location image URL |
Example Request
curl "https://api.nodesnack.com/api/v1/platforms/universal/locations?url=https://about.google/locations/" \
-H "Authorization: Bearer YOUR_API_KEY"Example Response
{
"success": true,
"data": {
"url": "https://example.com/locations",
"tier_used": "direct",
"location_count": 3,
"locations": [
{
"name": "Headquarters",
"address": "100 Main St, San Francisco, CA 94105",
"street": "100 Main St",
"city": "San Francisco",
"state": "CA",
"zip": "94105",
"country": "US",
"phone": "+1 (415) 555-0100",
"latitude": 37.7749,
"longitude": -122.4194,
"source": "json-ld"
},
{
"name": "New York Office",
"address": "200 Broadway, New York, NY 10001",
"phone": "+1 (212) 555-0200",
"directions_url": "https://maps.google.com/...",
"source": "html-card"
},
{
"name": "London Office",
"address": "10 Downing St, London, UK",
"source": "address-tag"
}
]
}
}Notes
- Credits are only used when the request returns data successfully.
- For JS-rendered pages (SPAs, React), use
render=html. - The
personasandlocationsendpoints auto-retry with JS rendering when the initial fetch finds no data. - Content is capped at 50KB for
textandmarkdownformats. - The
retrieveendpoint works with any publicly accessible URL. - Social profile detection covers LinkedIn, Twitter/X, GitHub, Instagram, Facebook, TikTok, YouTube, Bluesky, Threads, Mastodon, Medium, and Substack.
