Extract and list all URLs from XML sitemaps instantly. Paste sitemap XML or enter a URL to pull every loc value for SEO audits and crawls.
Paste XML directly into the editor, or enter a sitemap URL and click Fetch to load it automatically.
Enable sorting, deduplication, or filter by keyword to narrow your URL list.
Click Extract URLs, then copy to clipboard or download as TXT or CSV for your SEO workflow.
The Sitemap URL Extractor parses XML sitemaps and pulls out every <loc> value (the actual page URLs) into a clean, usable list. It works with standard sitemaps, sitemap index files, and news or image sitemaps.
An XML sitemap is a file that lists all important URLs on a website, helping search engines like Google discover and index pages. It follows the Sitemap Protocol and uses <loc> tags to specify each URL.
Yes: enable the "Include sitemap index URLs" option to also extract <loc> entries from <sitemapindex> files. These typically point to sub-sitemaps rather than individual pages.
Yes. Enter a full URL (e.g. https://example.com/sitemap.xml) in the Fetch bar and click Fetch. The server will retrieve the XML and load it into the editor. Note: some servers block external requests.
You can copy all URLs to your clipboard, download as a plain .txt file (one URL per line), or export as a .csv file with a header row, which is useful for spreadsheets and analysis tools.
Yes. The extractor targets all <loc> tags regardless of namespace. This means it works for standard sitemaps, news sitemaps, image sitemaps, and video sitemaps equally.
XML parsing happens entirely in your browser using JavaScript; no data is uploaded. The fetch feature uses a lightweight server-side proxy only to retrieve remote sitemaps that would otherwise be blocked by CORS.
Use the Filter input to show only URLs containing a specific string. For example, type /blog/ to see only blog posts, or .pdf to find document URLs. The filter is applied before counting and export.
The tool runs entirely in your browser so the practical limit depends on your device memory. Most sitemaps (under 50,000 URLs, the Google-recommended max) parse in under a second with no issues.
A sitemap URL extractor is a tool that reads XML sitemaps and pulls out all the <loc> values (the actual page URLs your site tells search engines about). Instead of manually reading through verbose XML, you get a clean, flat list of every URL in seconds.
Whether you're running an SEO audit, building a content inventory, feeding URLs into a crawler, or comparing two versions of a sitemap, this tool eliminates the tedious step of parsing XML by hand.
💡 Looking for SEO-optimized themes to pair with your sitemap work? MonsterONE offers unlimited downloads of templates, UI kits, and assets, so it's worth checking out.
XML sitemaps follow the Sitemap Protocol, a standard supported by Google, Bing, and other major search engines. A standard sitemap looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
Each <url> block contains a <loc> tag, the canonical URL of the page. The other tags (lastmod, changefreq, priority) are optional hints for crawlers. This tool focuses on the <loc> values since those are what actually matter for indexing and auditing.
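The extraction step itself is simple enough to sketch in a few lines of JavaScript. This regex-based version is illustrative only: a browser tool would more likely use a real XML parser such as DOMParser, but the idea is the same, and the namespace-optional pattern mirrors the "reads every <loc> tag regardless of namespace" behavior described elsewhere on this page.

```javascript
// Illustrative sketch: pull every <loc> value out of sitemap XML.
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>`;

// Match <loc> with or without a namespace prefix (e.g. <image:loc>).
const locPattern = /<(?:\w+:)?loc>\s*([^<]+?)\s*<\/(?:\w+:)?loc>/g;
const urls = [...xml.matchAll(locPattern)].map(m => m[1]);

console.log(urls); // ["https://example.com/", "https://example.com/about"]
```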
Not all sitemaps are the same. Here are the main variants you'll encounter:
- Standard sitemap: a <urlset> containing <url> and <loc> elements.
- Sitemap index: a <sitemapindex> that points to individual sub-sitemaps. Commonly used by large sites with thousands of pages.
- Image sitemap: image: namespace tags for image-specific metadata.
- News sitemap: news: namespace tags for publication date and name.
- Video sitemap: video: namespace entries for video content metadata.

This extractor handles all of them: it reads every <loc> tag in the document regardless of namespace, giving you a complete URL list every time.
There are two ways to get your sitemap into the tool. The easiest is the URL fetch bar at the top: enter your sitemap URL (e.g. https://yoursite.com/sitemap.xml) and click Fetch. The tool will retrieve the XML and load it into the editor automatically.
Alternatively, paste your XML directly into the input area. This works great if you've already downloaded a sitemap locally, have it stored in version control, or need to inspect a sitemap that requires authentication to access.
Once your XML is loaded, you can optionally sort URLs alphabetically, remove duplicates (useful if your sitemap was generated with overlapping rules), and filter by keyword to focus on a specific section of your site. Click Extract URLs to see your results.
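Under the hood, these options map to simple list operations. The sketch below shows one plausible pipeline; the exact order the tool applies the options in is an assumption here (the page only states that the filter runs before counting and export), and the function name `processUrls` is invented for illustration.

```javascript
// Sketch of the post-processing options: filter, dedupe, sort.
// The order (filter -> dedupe -> sort) is an assumed implementation choice.
function processUrls(urls, { filter = "", dedupe = false, sort = false } = {}) {
  let out = filter ? urls.filter(u => u.includes(filter)) : [...urls];
  if (dedupe) out = [...new Set(out)];
  if (sort) out = out.sort();
  return out;
}

const raw = [
  "https://example.com/blog/b",
  "https://example.com/blog/a",
  "https://example.com/blog/a",
  "https://example.com/products/x",
];

console.log(processUrls(raw, { filter: "/blog/", dedupe: true, sort: true }));
// ["https://example.com/blog/a", "https://example.com/blog/b"]
```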
The most common use case is a baseline SEO audit: pull all URLs from your sitemap and compare them against your analytics to find pages that are indexed but get no traffic, or pages getting traffic that aren't in your sitemap.
For site migrations, extract URLs from both the old and new sitemaps, then use a diff tool to identify missing pages that need redirects. This catches coverage gaps before they affect your rankings.
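If you'd rather script the comparison than use a diff tool, a set difference does the job. This is a minimal sketch; `missingFrom` is a hypothetical helper name, and in practice you would load the two extracted .txt exports from disk.

```javascript
// Migration check sketch: URLs present in the old sitemap but
// missing from the new one are redirect candidates.
function missingFrom(oldUrls, newUrls) {
  const next = new Set(newUrls);
  return oldUrls.filter(u => !next.has(u));
}

const oldList = ["https://example.com/a", "https://example.com/b", "https://example.com/c"];
const newList = ["https://example.com/a", "https://example.com/c-renamed"];

console.log(missingFrom(oldList, newList));
// ["https://example.com/b", "https://example.com/c"]
```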
Content teams use URL extraction to build inventories. Paste the list into a spreadsheet, add columns for page type, target keyword, and last-updated date, and you have a working content audit framework.
Developers use extracted URL lists to seed integration tests, warm caches after deployment, or verify that their CMS is generating the correct URLs. Feed the list to a tool like wget or a custom crawler to check status codes at scale.
Google recommends keeping each sitemap file under 50,000 URLs and 50MB uncompressed. If your site exceeds that, use a sitemap index file to point to multiple smaller sitemaps; this tool handles both formats.
Only include canonical URLs in your sitemap. Paginated pages, faceted navigation, and filtered views with noindex directives should be excluded. If you extract your sitemap and find URLs you didn't expect (especially with query strings like ?sort= or ?page=2), that's a sign your sitemap generator needs tuning.
Lastmod values should reflect genuine content changes, not template updates or trivial edits. Crawlers use this signal to prioritize re-crawling. If every URL in your sitemap shows today's date, that's a red flag that your CMS is auto-updating timestamps inappropriately.
After extracting your URLs, scan them for common problems: URLs with trailing query strings, protocol mismatches (http vs https), missing trailing slashes where your server expects them, or paths that include development subdirectories. Any of these can cause indexing issues even if the page itself is technically accessible.
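These scans are easy to automate once you have the flat list. The sketch below flags the first two categories plus development leftovers; the list of "suspect" path fragments is an assumption for illustration, and `flagSuspectUrls` is an invented helper name.

```javascript
// Post-extraction sanity checks: flag URLs with query strings,
// plain-http URLs, and paths that look like dev/staging leftovers.
function flagSuspectUrls(urls) {
  const suspectPaths = ["/staging/", "/dev/", "/test/"]; // assumed list
  return urls.filter(u =>
    u.includes("?") ||
    u.startsWith("http://") ||
    suspectPaths.some(p => u.includes(p))
  );
}

const extracted = [
  "https://example.com/blog/post",
  "https://example.com/products?sort=price",
  "http://example.com/about",
  "https://example.com/staging/new-page",
];

console.log(flagSuspectUrls(extracted)); // flags the last three URLs
```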
Run the extracted URLs through a bulk status-code checker to identify 301s, 404s, or 500s hiding in your sitemap. Search engines that encounter repeated errors on sitemap URLs may reduce their crawl budget for your site.
For large sites, filter your extracted list by section (e.g. /blog/, /products/, /docs/) to audit each area independently. This makes it much easier to spot patterns in how URLs are structured across different parts of your site.
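Bucketing by top-level path section can also be scripted instead of re-running the filter per section. A minimal sketch, assuming the first path segment is a good proxy for "section" (true for the /blog/, /products/, /docs/ layout above, but not for every site):

```javascript
// Group extracted URLs by their top-level path section so each
// area of the site can be audited independently.
function bucketBySection(urls) {
  const buckets = {};
  for (const url of urls) {
    const path = new URL(url).pathname;
    const section = "/" + (path.split("/")[1] || "") + "/";
    (buckets[section] ??= []).push(url);
  }
  return buckets;
}

const siteUrls = [
  "https://example.com/blog/post-1",
  "https://example.com/blog/post-2",
  "https://example.com/docs/setup",
];

console.log(bucketBySection(siteUrls));
// { "/blog/": [two URLs], "/docs/": [one URL] }
```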