Executive diagnostic

Hotelbeds sandbox returns prices that are decoupled from production market prices. This is consistent with HB's documented anti-scrape posture and with our empirical −52% to +341% variance vs public Booking.com rates. LiteAPI sandbox shows no such protection: its data looks real and tracks the 21–28% B2B distributor discount band (n = 3). The question for Nuitée: should LiteAPI's sandbox add the same protection, or does developer experience justify the competitive-intel exposure?

  • Universe (Overture + OSM, bbox-filtered): canonical lodging entities
  • In LiteAPI catalog
  • In Hotelbeds sandbox: sandbox subset, not production
  • High-importance entities missing from LiteAPI: score ≥ 0.65, not in LA catalog
  • LA & HB vs Booking.com (3 clean-parity refs): −21% to −28%. Consistent with B2B distributor discount band; n = 3, not generalizable
  • HB sandbox price variance vs Booking.com: −52% to +341%. Two-sided extremes, consistent with deliberate anti-scrape decoupling
  • Median LA API latency: 7–11s, vs Hotelbeds sandbox ~500ms

Three-way price cross-check (LA vs HB sandbox vs Booking.com)

  • 7 audit pairs — Booking.com public rate fetched via persistent-profile browser, compared to both distributor quotes
  • 3 near-parity references (WestCord, Van der Valk, Conscious): LA ≈ HB, both 21–28% below Booking — consistent with typical B2B-distributor discount band
  • HB sandbox outliers, both directions: Hotel van Gelder +95% (inflated), Joy Hotel −52% (deflated)
  • Hypothesis: HB sandbox decouples prices deliberately (docs warn against real-time extraction; test DB separate from prod; 50/day quota)
Hotel | Date | Room | LiteAPI | HB sandbox | Booking.com | LA vs Booking | HB vs Booking
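A minimal sketch of how the "LA vs Booking" and "HB vs Booking" columns could be computed and bucketed, assuming all three prices are already tax-normalized to the same consumer-total basis. The 21–28% band comes from the near-parity references above; the ±50% outlier cutoff and all names are hypothetical, and the example prices are illustrative, not audit data.

```python
from dataclasses import dataclass

# Near-parity band from the clean references: both quotes 21-28% below Booking.com.
PARITY_BAND = (-0.28, -0.21)
# Hypothetical cutoff for calling an HB sandbox quote an extreme outlier.
OUTLIER_ABS = 0.50


@dataclass
class AuditPair:
    hotel: str
    la_price: float       # LiteAPI quote, consumer-total basis (assumed normalized)
    hb_price: float       # Hotelbeds sandbox quote, same basis
    booking_price: float  # Booking.com public rate, same basis


def delta_vs_booking(price: float, booking: float) -> float:
    """Relative difference vs Booking.com; negative means cheaper than Booking."""
    return (price - booking) / booking


def classify(pair: AuditPair) -> str:
    la = delta_vs_booking(pair.la_price, pair.booking_price)
    hb = delta_vs_booking(pair.hb_price, pair.booking_price)
    lo, hi = PARITY_BAND
    if lo <= la <= hi and lo <= hb <= hi:
        return "near-parity"  # both distributors inside the B2B discount band
    if abs(hb) >= OUTLIER_ABS:
        return "hb-outlier"   # HB sandbox far from market, in either direction
    return "other"


# Illustrative prices only, not audit data.
print(classify(AuditPair("example-a", 75.0, 76.0, 100.0)))   # near-parity
print(classify(AuditPair("example-b", 80.0, 195.0, 100.0)))  # hb-outlier
```

Bucketing on HB's absolute delta is what makes the outlier test two-sided: an inflated quote and a deflated one are flagged the same way.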

The price-delta number that didn't survive cross-check

  • Started as apples-to-apples: median (LA − HB) / HB across 24 combos × 146 matched hotel pairs, exact room-name match
  • Negative deltas = LA cheaper. Headline read: "LA cheaper than HB by 13%"
  • Cross-check killed it: the numbers were too good to be true. The dispersion most likely reflects HB sandbox's price-decoupling behavior (see Booking cross-check above)
  • Reads as structural divergence between two sandbox APIs, not a real LA undercut
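The headline was a single median. A minimal sketch of that computation, with the interquartile range reported alongside, shows how a median delta can look decisive while the spread underneath is dominated by two-sided outliers. Function names are hypothetical and the pair data is illustrative.

```python
import statistics


def relative_delta(la: float, hb: float) -> float:
    """(LA - HB) / HB; negative means LiteAPI quoted below the HB sandbox."""
    return (la - hb) / hb


def summarize(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Median and interquartile range of (LA - HB) / HB over matched (la, hb) prices.

    Needs at least two pairs (statistics.quantiles requirement on older Pythons).
    """
    deltas = [relative_delta(la, hb) for la, hb in pairs]
    q1, _, q3 = statistics.quantiles(deltas, n=4)
    return {"median": statistics.median(deltas), "q1": q1, "q3": q3, "iqr": q3 - q1}


# Illustrative: HB inflated on one pair, deflated on another.
print(summarize([(100.0, 100.0), (90.0, 100.0), (100.0, 200.0), (100.0, 50.0)]))
```

With these illustrative pairs the median is −5% while the IQR exceeds 100 percentage points, which is the shape that should have triggered the cross-check before the headline was read.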

High-importance properties missing from LiteAPI

  • High-importance = shows up in 2+ open datasets (Overture + OSM) AND is a known chain or commercial lodging — i.e., real properties LA should already have, not OSM noise
  • Action: flag this list to LA supply ops for direct outreach. Each property is a concrete acquisition target.
  • Several major chains surface here (Holiday Inn, Hilton-adjacent, ibis variants) plus prominent local brands.
Name | Category | Score | Brand | Postcode | In HB sandbox?
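The importance score is defined under Pipeline as a weighted composite with a 0.65 cutoff. A minimal sketch of that scoring and of the "missing from LiteAPI" filter, assuming each component is a signal in [0, 1]; the per-component encoding isn't stated, so the signal names and binary values are hypothetical.

```python
from dataclasses import dataclass, field

# Component weights as listed under Pipeline; they sum to 1.0.
WEIGHTS = {
    "corroboration": 0.30,  # seen in both Overture and OSM
    "category": 0.25,       # commercial lodging category
    "chain": 0.10,          # known chain / brand
    "postcode": 0.10,       # postcode present
    "distributor": 0.15,    # present at a distributor
    "rooms_known": 0.10,    # room count known
}
HIGH_IMPORTANCE = 0.65


@dataclass
class Entity:
    name: str
    in_liteapi: bool
    # Hypothetical encoding: each signal in [0, 1]; missing signals count as 0.
    signals: dict[str, float] = field(default_factory=dict)


def importance(e: Entity) -> float:
    # Round so float drift cannot push an exact 0.65 below the threshold.
    return round(sum(w * e.signals.get(k, 0.0) for k, w in WEIGHTS.items()), 6)


def missing_high_importance(entities: list[Entity]) -> list[Entity]:
    """High-importance entities (score >= 0.65) absent from the LiteAPI catalog."""
    return [e for e in entities if not e.in_liteapi and importance(e) >= HIGH_IMPORTANCE]
```

With binary signals, corroboration + category + chain lands exactly on 0.65, so a corroborated chain hotel qualifies even with no distributor presence or known room count.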

LiteAPI vs Hotelbeds sandbox

Head-to-head on the things that matter for downstream products: data quality, surface fidelity, integration cost. Cross-functional read of a demand-side API surface that supply-side products have to live with.

Where LiteAPI looks more mature

  • Content-hash rateIds — same room+board+price returns same rateId. HB rotates per call.
  • Nationality-respecting pricing — LA varies ~2% for NL/US/GB/DE (real rate-fence). HB sandbox: identical.
  • Broader catalog — 1,215 Amsterdam hotels vs HB sandbox 532. Long-tail B&Bs surface on LA.
  • Structured cancellation deadlines: cancelPolicyInfos[] with ISO timestamps per step.
  • Per-rate commission visibility: commission[] on each rate. HB's bedbank net hides markup.
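How LiteAPI actually derives its rateIds is not documented here. The sketch below only illustrates the two behaviors being compared: a content-hashed ID that stays stable for the same room, board and price, versus a per-call token like HB's rotating rateKey. The field set, function names and the SHA-256 choice are assumptions.

```python
import hashlib
import json
import uuid


def content_hash_rate_id(hotel_id: str, room: str, board: str, price: str, currency: str) -> str:
    """Stable ID: identical (hotel, room, board, price, currency) always hashes the same."""
    # Canonical JSON (sorted keys, fixed separators) so serialization order can't change the hash.
    # Price is passed as a decimal string to avoid float formatting drift.
    payload = json.dumps(
        {"hotel": hotel_id, "room": room, "board": board, "price": price, "currency": currency},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


def rotating_rate_key() -> str:
    """Per-call token: a fresh random value on every response, even for an unchanged rate."""
    return uuid.uuid4().hex
```

The practical difference for downstream products: a stable ID lets a client cache, dedupe and diff rates across calls, while a rotating key has to be treated as opaque, call-scoped state.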

Where LiteAPI looks less mature

  • 30× slower — 7–11s per city call vs HB 100–200ms. Punishes agentic AI chains.
  • Mixed tax convention: 56% of rates are inc-VAT, 44% exc-VAT. HB is consistent: 0% inc-VAT.
  • Catalog dedup gap — 16 name-confirmed duplicates in 1,215 (Rembrandtplein variants pass through).
  • Non-deterministic cheapest-rate — Conscious Westerpark: 10 calls in 60s → 3 different "best" prices.
  • No anti-scrape: realistic rates plus a free dev key mean contracted prices can be scraped at scale.
  • Freeform room names — no shared vocabulary across hotels.
  • 60% catalog-vs-live gap — only 460–535 of 1,215 return live rates.
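Mixed tax conventions have to be collapsed onto one basis before any price comparison. A minimal sketch, assuming each rate carries a flag saying whether VAT is included; that `vat_included` flag and the function names are hypothetical, since how each rate signals its convention isn't shown here. The VAT rate is a parameter (9% below is only an illustrative value), and city tourist tax is not modeled.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_inc_vat(amount: Decimal, vat_included: bool, vat_rate: Decimal) -> Decimal:
    """Return the rate on an inc-VAT basis, rounded to cents."""
    total = amount if vat_included else amount * (1 + vat_rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def inc_vat_share(flags: list[bool]) -> float:
    """Fraction of rates that arrive VAT-inclusive (the 56% / 44% split above)."""
    return sum(flags) / len(flags)


vat = Decimal("0.09")  # illustrative rate; use whatever applies to the stay date
print(to_inc_vat(Decimal("100.00"), False, vat))  # 109.00
print(to_inc_vat(Decimal("109.00"), True, vat))   # 109.00
```

Decimal keeps the cent rounding exact, so an exc-VAT quote and its inc-VAT equivalent compare equal instead of differing by float noise.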

Map drilldown

Each marker opens the canonical record: every source ID, every catalog field, per-distributor rate-matrix aggregates, the room-level delta (if matched), and DQ flags. Color coding below.

in LA AND HB sandbox • LA-only • HB-sandbox-only • in NEITHER distributor (supply moat candidate) • low importance (<0.5)
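The legend implies a simple precedence rule for coloring. One plausible reading is sketched below; checking low importance first is an assumption, since the legend does not say which class wins when an entity is both low-importance and covered by a distributor.

```python
def marker_class(in_la: bool, in_hb: bool, importance: float) -> str:
    """Map a canonical entity to one of the legend's color classes."""
    # Assumption: low importance (< 0.5) overrides distributor coverage.
    if importance < 0.5:
        return "low importance"
    if in_la and in_hb:
        return "in LA AND HB sandbox"
    if in_la:
        return "LA-only"
    if in_hb:
        return "HB-sandbox-only"
    return "in NEITHER distributor"  # supply moat candidate
```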

Methodology & caveats

How the numbers were built + what they don't claim.

Pipeline

  • Universe (1,582) — Overture Maps Places + OSM Overpass cross-validated, 1:1 matched on haversine ≤ 150 m AND (name token-set ≥ 0.65 OR brand-wikidata OR postcode + name)
  • LiteAPI catalog (1,215) — paginated /data/hotels?countryCode=NL&cityName=Amsterdam + lat/lng/radius union, best-matched to universe
  • Hotelbeds (532, sandbox subset): /hotel-content-api/hotels?destinationCode=AMS, bbox-filtered; production catalog is materially larger
  • Designed rate matrix (24 combos) — T+0..T+7 daily, T+14/30/60/90 anchors, weekend/weekday, 1-7 nights, solo/couple/family/group, source markets NL/US/GB/DE
  • Cross-check (n=6) — Booking.com public rates fetched via persistent-profile browser, tax-normalized to consumer-total
  • Property importance score — composite of universe corroboration (0.30) + category (0.25) + chain (0.10) + postcode (0.10) + distributor presence (0.15) + rooms-known (0.10); 0.65 threshold for "high-importance"
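The pairwise matching rule can be sketched as below, reading it as distance ≤ 150 m AND (name similarity ≥ 0.65 OR same brand Wikidata ID OR postcode + name). Two assumptions to flag: the overlap-coefficient name score is a stand-in for whatever token-set ratio the pipeline used, and "postcode + name" is read here as same normalized postcode plus at least one shared name token. The 1:1 assignment step (resolving a place that matches several candidates) is not shown.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation
MAX_DISTANCE_M = 150.0
MIN_NAME_SIM = 0.65


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    brand_wikidata: Optional[str] = None  # brand's Wikidata ID, if tagged
    postcode: Optional[str] = None


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tokens(name: str) -> set[str]:
    # ASCII-only tokenization; accented names would need Unicode normalization first.
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def name_similarity(a: str, b: str) -> float:
    """Overlap coefficient on word tokens: 1.0 when one name's tokens contain the other's."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def norm_postcode(pc: Optional[str]) -> Optional[str]:
    return pc.replace(" ", "").upper() if pc else None


def is_candidate_match(a: Place, b: Place) -> bool:
    if haversine_m(a.lat, a.lon, b.lat, b.lon) > MAX_DISTANCE_M:
        return False
    if name_similarity(a.name, b.name) >= MIN_NAME_SIM:
        return True
    if a.brand_wikidata and a.brand_wikidata == b.brand_wikidata:
        return True
    # Hypothetical reading of "postcode + name": same postcode and a shared name token.
    pa, pb = norm_postcode(a.postcode), norm_postcode(b.postcode)
    return pa is not None and pa == pb and bool(tokens(a.name) & tokens(b.name))
```

The distance gate runs first, so the looser name and postcode fallbacks can only ever link places that are already within 150 m of each other.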

What this doesn't claim

  • Hotelbeds = sandbox only. Production HB catalog probably 5-10× larger. LA-vs-universe is the more defensible claim.
  • Cross-check is n=6. Hotel van Gelder LA −36% is 8-15pp outside the clean parity band — possibly an LA promo/private rate.
  • OSM is volunteer-mapped. Stars, room count, brand, operating status not guaranteed. Used as evidence layer; Overture is primary.
  • Phase 2: Wikidata + Foursquare enrichment, n=20 cross-check with hotel-direct + another OTA, end-to-end booking integrity.

Caveats & data quality flags

Confidence calibration per claim (graded against Codex round-2 review)

Claim | Confidence | Why
"Clean LA/HB parity sits 21–28% below Booking.com" | 0.70 | Consistent across 3 references but n = 3. Not generalizable without n = 20+ cross-check including hotel-direct and other OTAs.
"Hotelbeds sandbox has data-quality variance in outliers" | 0.75 | n = 7 shows two extreme outliers in both directions. Pattern is suggestive but small-sample.
"HB sandbox is decoupled from market prices as anti-scrape" (hypothesis) | 0.55 | HB does NOT randomize prices per call (stable within session). Decoupling lives at the sandbox-DB / rate-mix level. Plausible but not provable without an HB PM confirming.
"LiteAPI covers 51% of the Amsterdam universe vs HB sandbox 32%" | 0.70 | Universe (1,582 entities) is defensible. HB number is sandbox-bounded; production HB is materially larger.
"Should LiteAPI sandbox add the same protection?" (open product question for Med) | tradeoff | Not a claim: a deliberate decision Nuitée would make on developer experience vs competitive-intel exposure.

Anti-scrape hypothesis — supporting evidence and what weakens it

  • HB Content API docs warn against real-time scraping verbatim: "By no means should the ContentAPI be used to retrieve the static information in real-time. This could result in the blocking of credentials." (source )
  • HB test environment runs a separate database from production: "LIVE provides all the real information from HBX Group Databank" — implying TEST does not. (source )
  • Sandbox quota deliberately tight (50 requests/day) — consistent with anti-bulk-extract design.
  • Empirical: −52% to +341% bidirectional price variance vs Booking across 7 audit hotels. Two-sided extreme variance is harder to explain as innocent data staleness.
  • Weakener (repeat-pull test, 3 hotels × 10 calls): Within a single 60-second window, HB rotates rateKeys per call (10 unique IDs) but keeps the cheapest price stable. So the anti-scrape pattern is NOT at the per-call level — HB does not randomize prices between repeat calls. The pattern lives at the sandbox-DB level (HB's TEST DB is decoupled from prod) and at the rate-mix level (which rate plans surface). Per-call rotation of rateKey is standard booking-flow session-token behavior, not anti-scrape pricing.
  • Honest framing: HB sandbox catalog and rate-mix are decoupled from production in a way that produces extreme outliers vs market. Whether that's deliberate anti-scrape or just neglected sandbox data quality is not provable without an HB PM confirming. The competitive-intel exposure tradeoff for LiteAPI is real either way: LA sandbox returns real-looking prices that a scraper can use; HB sandbox does not.
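The repeat-pull test reduces to two counts per hotel: unique rateKeys and distinct cheapest prices across N back-to-back calls. A minimal sketch of that tally, assuming the responses have already been fetched and reduced to (rateKey, cheapest price) pairs; the record type and labels are hypothetical, and no HB or LiteAPI client call is shown.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PullResult:
    rate_key: str            # booking token returned for the cheapest rate
    cheapest_price: Decimal  # cheapest price in that response


def summarize_pulls(pulls: list[PullResult]) -> dict[str, int]:
    return {
        "calls": len(pulls),
        "unique_rate_keys": len({p.rate_key for p in pulls}),
        "distinct_cheapest_prices": len({p.cheapest_price for p in pulls}),
    }


def interpret(s: dict[str, int]) -> str:
    if s["distinct_cheapest_prices"] > 1:
        return "price varies between repeat calls"
    if s["calls"] > 1 and s["unique_rate_keys"] == s["calls"]:
        return "keys rotate, price stable (session-token behavior)"
    return "keys and price stable"


# Illustrative: 10 calls, a fresh key each time, same cheapest price.
hb_like = [PullResult(f"key-{i}", Decimal("120.00")) for i in range(10)]
print(interpret(summarize_pulls(hb_like)))
```

The same tally separates the two behaviors reported above: HB's 10 unique keys with one stable price, and LiteAPI's Conscious Westerpark result of 3 different "best" prices in 10 calls.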