Extract and list all URLs from XML sitemaps instantly. Paste sitemap XML or enter a URL to pull every loc value for SEO audits and crawls.
Paste XML directly into the editor, or enter a sitemap URL and click Fetch to load it automatically.
Enable sorting, deduplication, or filter by keyword to narrow your URL list.
Click Extract URLs, then copy to clipboard or download as TXT or CSV for your SEO workflow.
The Sitemap URL Extractor parses XML sitemaps and pulls out every <loc> value (the actual page URLs) into a clean, usable list. It works with standard sitemaps, sitemap index files, and news or image sitemaps.
An XML sitemap is a file that lists all important URLs on a website, helping search engines like Google discover and index pages. It follows the Sitemap Protocol and uses <loc> tags to specify each URL.
Yes: enable the "Include sitemap index URLs" option to also extract <loc> entries from <sitemapindex> files. These typically point to sub-sitemaps rather than individual pages.
Yes. Enter a full URL (e.g. https://example.com/sitemap.xml) in the Fetch bar and click Fetch. The server will retrieve the XML and load it into the editor. Note: some servers block external requests.
You can copy all URLs to your clipboard, download as a plain .txt file (one URL per line), or export as a .csv file with a header row, which is useful for spreadsheets and analysis tools.
Yes. The extractor targets all <loc> tags regardless of namespace. This means it works for standard sitemaps, news sitemaps, image sitemaps, and video sitemaps equally.
XML parsing happens entirely in your browser using JavaScript; no data is uploaded. The fetch feature uses a lightweight server-side proxy only to retrieve remote sitemaps that would otherwise be blocked by CORS.
Use the Filter input to show only URLs containing a specific string. For example, type /blog/ to see only blog posts, or .pdf to find document URLs. The filter is applied before counting and export.
The tool runs entirely in your browser so the practical limit depends on your device memory. Most sitemaps (under 50,000 URLs, the Google-recommended max) parse in under a second with no issues.
A sitemap URL extractor is a tool that reads XML sitemaps and pulls out all the <loc> values (the actual page URLs your site tells search engines about). Instead of manually reading through verbose XML, you get a clean, flat list of every URL in seconds.
Whether you're running an SEO audit, building a content inventory, feeding URLs into a crawler, or comparing two versions of a sitemap, this tool eliminates the tedious step of parsing XML by hand.
💡 Looking for SEO-optimized themes to pair with your sitemap work? MonsterONE offers unlimited downloads of templates, UI kits, and assets, so it's worth checking out.
XML sitemaps follow the Sitemap Protocol, a standard supported by Google, Bing, and other major search engines. A standard sitemap looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
Each <url> block contains a <loc> tag, the canonical URL of the page. The other tags (lastmod, changefreq, priority) are optional hints for crawlers. This tool focuses on the <loc> values since those are what actually matter for indexing and auditing.
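The extraction step itself is simple enough to sketch in a few lines of JavaScript. This regex-based version is illustrative only: a browser tool would more likely use a real XML parser such as DOMParser, but the idea is the same, and the namespace-optional pattern mirrors the "reads every <loc> tag regardless of namespace" behavior described elsewhere on this page.

```javascript
// Illustrative sketch: pull every <loc> value out of sitemap XML.
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>`;

// Match <loc> with or without a namespace prefix (e.g. <image:loc>).
const locPattern = /<(?:\w+:)?loc>\s*([^<]+?)\s*<\/(?:\w+:)?loc>/g;
const urls = [...xml.matchAll(locPattern)].map(m => m[1]);

console.log(urls); // ["https://example.com/", "https://example.com/about"]
```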
Not all sitemaps are the same. Here are the main variants you'll encounter:
- Standard sitemap: a <urlset> containing <url> and <loc> elements.
- Sitemap index: a <sitemapindex> that points to individual sub-sitemaps. Commonly used by large sites with thousands of pages.
- Image sitemap: image: namespace tags for image-specific metadata.
- News sitemap: news: namespace tags for publication date and name.
- Video sitemap: video: namespace entries for video content metadata.

This extractor handles all of them: it reads every <loc> tag in the document regardless of namespace, giving you a complete URL list every time.
There are two ways to get your sitemap into the tool. The easiest is the URL fetch bar at the top: enter your sitemap URL (e.g. https://yoursite.com/sitemap.xml) and click Fetch. The tool will retrieve the XML and load it into the editor automatically.
Alternatively, paste your XML directly into the input area. This works great if you've already downloaded a sitemap locally, have it stored in version control, or need to inspect a sitemap that requires authentication to access.
Once your XML is loaded, you can optionally sort URLs alphabetically, remove duplicates (useful if your sitemap was generated with overlapping rules), and filter by keyword to focus on a specific section of your site. Click Extract URLs to see your results.
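Under the hood, these options map to simple list operations. The sketch below shows one plausible pipeline; the exact order the tool applies the options in is an assumption here (the page only states that the filter runs before counting and export), and the function name `processUrls` is invented for illustration.

```javascript
// Sketch of the post-processing options: filter, dedupe, sort.
// The order (filter -> dedupe -> sort) is an assumed implementation choice.
function processUrls(urls, { filter = "", dedupe = false, sort = false } = {}) {
  let out = filter ? urls.filter(u => u.includes(filter)) : [...urls];
  if (dedupe) out = [...new Set(out)];
  if (sort) out = out.sort();
  return out;
}

const raw = [
  "https://example.com/blog/b",
  "https://example.com/blog/a",
  "https://example.com/blog/a",
  "https://example.com/products/x",
];

console.log(processUrls(raw, { filter: "/blog/", dedupe: true, sort: true }));
// ["https://example.com/blog/a", "https://example.com/blog/b"]
```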
The most common use case is a baseline SEO audit: pull all URLs from your sitemap and compare them against your analytics to find pages that are indexed but get no traffic, or pages getting traffic that aren't in your sitemap.
For site migrations, extract URLs from both the old and new sitemaps, then use a diff tool to identify missing pages that need redirects. This catches coverage gaps before they affect your rankings.
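If you'd rather script the comparison than use a diff tool, a set difference does the job. This is a minimal sketch; `missingFrom` is a hypothetical helper name, and in practice you would load the two extracted .txt exports from disk.

```javascript
// Migration check sketch: URLs present in the old sitemap but
// missing from the new one are redirect candidates.
function missingFrom(oldUrls, newUrls) {
  const next = new Set(newUrls);
  return oldUrls.filter(u => !next.has(u));
}

const oldList = ["https://example.com/a", "https://example.com/b", "https://example.com/c"];
const newList = ["https://example.com/a", "https://example.com/c-renamed"];

console.log(missingFrom(oldList, newList));
// ["https://example.com/b", "https://example.com/c"]
```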
Content teams use URL extraction to build inventories. Paste the list into a spreadsheet, add columns for page type, target keyword, and last-updated date, and you have a working content audit framework.
Developers use extracted URL lists to seed integration tests, warm caches after deployment, or verify that their CMS is generating the correct URLs. Feed the list to a tool like wget or a custom crawler to check status codes at scale.
Google recommends keeping each sitemap file under 50,000 URLs and 50MB uncompressed. If your site exceeds that, use a sitemap index file to point to multiple smaller sitemaps; this tool handles both formats.
Only include canonical URLs in your sitemap. Paginated pages, faceted navigation, and filtered views with noindex directives should be excluded. If you extract your sitemap and find URLs you didn't expect (especially with query strings like ?sort= or ?page=2), that's a sign your sitemap generator needs tuning.
Lastmod values should reflect genuine content changes, not template updates or trivial edits. Crawlers use this signal to prioritize re-crawling. If every URL in your sitemap shows today's date, that's a red flag that your CMS is auto-updating timestamps inappropriately.
After extracting your URLs, scan them for common problems: URLs with trailing query strings, protocol mismatches (http vs https), missing trailing slashes where your server expects them, or paths that include development subdirectories. Any of these can cause indexing issues even if the page itself is technically accessible.
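These scans are easy to automate once you have the flat list. The sketch below flags the first two categories plus development leftovers; the list of "suspect" path fragments is an assumption for illustration, and `flagSuspectUrls` is an invented helper name.

```javascript
// Post-extraction sanity checks: flag URLs with query strings,
// plain-http URLs, and paths that look like dev/staging leftovers.
function flagSuspectUrls(urls) {
  const suspectPaths = ["/staging/", "/dev/", "/test/"]; // assumed list
  return urls.filter(u =>
    u.includes("?") ||
    u.startsWith("http://") ||
    suspectPaths.some(p => u.includes(p))
  );
}

const extracted = [
  "https://example.com/blog/post",
  "https://example.com/products?sort=price",
  "http://example.com/about",
  "https://example.com/staging/new-page",
];

console.log(flagSuspectUrls(extracted)); // flags the last three URLs
```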
Run the extracted URLs through a bulk status-code checker to identify 301s, 404s, or 500s hiding in your sitemap. Search engines that encounter repeated errors on sitemap URLs may reduce their crawl budget for your site.
For large sites, filter your extracted list by section (e.g. /blog/, /products/, /docs/) to audit each area independently. This makes it much easier to spot patterns in how URLs are structured across different parts of your site.
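Bucketing by top-level path section can also be scripted instead of re-running the filter per section. A minimal sketch, assuming the first path segment is a good proxy for "section" (true for the /blog/, /products/, /docs/ layout above, but not for every site):

```javascript
// Group extracted URLs by their top-level path section so each
// area of the site can be audited independently.
function bucketBySection(urls) {
  const buckets = {};
  for (const url of urls) {
    const path = new URL(url).pathname;
    const section = "/" + (path.split("/")[1] || "") + "/";
    (buckets[section] ??= []).push(url);
  }
  return buckets;
}

const siteUrls = [
  "https://example.com/blog/post-1",
  "https://example.com/blog/post-2",
  "https://example.com/docs/setup",
];

console.log(bucketBySection(siteUrls));
// { "/blog/": [two URLs], "/docs/": [one URL] }
```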