robots.txt
The robots.txt feature generates a /robots.txt file in your build output during astro build. It is configured through the robotsTxt option on the integration and does not affect the <meta name="robots"> tag, which is handled by the Robots component.
If a robots.txt file already exists in the build output, generation is skipped and a warning is logged. If robotsTxt is omitted from your integration config, a warning is logged recommending you add it.
Options
RobotsTxtOptions

| option | type | default | required | description |
|---|---|---|---|---|
| rules | RobotsTxtRule \| RobotsTxtRule[] | - | Yes | One or more crawl rules defining agent access. |
| sitemap | string \| string[] | - | No | Absolute http(s) URL(s) or site-relative path(s) to include as Sitemap entries. Relative paths require site to be set in your Astro config. |
RobotsTxtRule

| option | type | default | required | description |
|---|---|---|---|---|
| agent | string \| string[] | - | Yes | One or more user-agent names. Must be non-empty. |
| allow | string \| string[] | - | No | Path(s) to allow. Each must start with /. |
| disallow | string \| string[] | - | No | Path(s) to disallow. Each must start with /. |
| noindex | string \| string[] | - | No | Path(s) for the Noindex directive. Each must start with /. |
| cleanParam | string \| string[] | - | No | Query parameter(s) for the Clean-param directive. |
| crawlDelay | number | - | No | Non-negative crawl delay in seconds. |
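
To make the mapping from rule fields to directives concrete, here is a minimal sketch that renders a single rule in the order used by the examples on this page. The names `Rule`, `toList`, and `renderRule` are hypothetical and exist only for illustration. Validation is deliberately left out, so this is not a substitute for the integration's own logic.

```typescript
// Hypothetical types for illustration; they mirror the RobotsTxtRule table.
type OneOrMany<T> = T | T[];

type Rule = {
  agent: OneOrMany<string>;
  allow?: OneOrMany<string>;
  disallow?: OneOrMany<string>;
  noindex?: OneOrMany<string>;
  cleanParam?: OneOrMany<string>;
  crawlDelay?: number;
};

// Normalize "single value, array, or undefined" to an array.
function toList<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}

// Render one rule as a block of robots.txt lines (no validation in this sketch).
function renderRule(rule: Rule): string {
  const lines: string[] = [
    ...toList(rule.agent).map((agent) => `User-agent: ${agent}`),
    ...toList(rule.allow).map((path) => `Allow: ${path}`),
    ...toList(rule.disallow).map((path) => `Disallow: ${path}`),
    ...toList(rule.noindex).map((path) => `Noindex: ${path}`),
    ...toList(rule.cleanParam).map((param) => `Clean-param: ${param}`),
  ];
  if (rule.crawlDelay !== undefined) {
    lines.push(`Crawl-delay: ${rule.crawlDelay}`);
  }
  return lines.join("\n");
}

// Two agents sharing one group of Disallow lines.
const block = renderRule({
  agent: ["Googlebot", "Bingbot"],
  disallow: ["/tmp/", "/cache/"],
});
console.log(block);
```

Every field accepts either a single value or an array, so normalizing to an array first keeps the rendering loop uniform.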
Examples
Input:

    eminence({
      robotsTxt: {
        rules: {
          agent: "*",
          disallow: "/private/",
        },
      },
    });

Output:

    User-agent: *
    Disallow: /private/

With multiple rules
Input:

    eminence({
      robotsTxt: {
        rules: [
          { agent: "*", disallow: "/" },
          { agent: "Googlebot", allow: "/" },
        ],
      },
    });

Output:

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Allow: /

With sitemap reference
Input:

    eminence({
      robotsTxt: {
        rules: { agent: "*" },
        sitemap: "/sitemap-index.xml",
      },
    });

Output at /robots.txt (assuming site is https://example.com):

    User-agent: *

    Sitemap: https://example.com/sitemap-index.xml

Explicit opt-out
Input:

    eminence({
      robotsTxt: false,
    });

Output: No file is generated and no warning is logged. An info-level message records that generation was skipped.
Complete
All options provided explicitly. Assumes site is https://example.com in your Astro config.

Input:

    eminence({
      robotsTxt: {
        rules: [
          {
            agent: "*",
            allow: "/",
            disallow: "/private/",
            noindex: "/drafts/",
            cleanParam: "utm_source /",
            crawlDelay: 2,
          },
          {
            agent: ["Googlebot", "Bingbot"],
            disallow: ["/tmp/", "/cache/"],
          },
        ],
        sitemap: ["/sitemap.xml", "https://cdn.example.com/sitemap-news.xml"],
      },
    });

Output at /robots.txt:

    User-agent: *
    Allow: /
    Disallow: /private/
    Noindex: /drafts/
    Clean-param: utm_source /
    Crawl-delay: 2

    User-agent: Googlebot
    User-agent: Bingbot
    Disallow: /tmp/
    Disallow: /cache/

    Sitemap: https://example.com/sitemap.xml
    Sitemap: https://cdn.example.com/sitemap-news.xml

Decisions Made
Existing file is never overwritten
If a robots.txt already exists in the build output when astro build runs, the integration skips generation and logs a warning. This avoids silently clobbering a manually maintained file.
Relative sitemap values require site to be configured
Relative sitemap paths are resolved against the site value from your Astro config. If site is not set, the integration throws an error rather than producing a broken absolute URL.
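
The resolution rule can be sketched with the standard `URL` constructor. This is a simplified, standalone illustration: `resolveSitemap` is a hypothetical name, and it takes `site` as a plain string rather than reading it from the Astro config.

```typescript
// Hypothetical standalone helper illustrating the rule described above.
function resolveSitemap(entry: string, site: string | undefined): string {
  let absolute: URL | null = null;
  try {
    // The one-argument constructor only succeeds for absolute URLs.
    absolute = new URL(entry);
  } catch {
    // Not absolute: fall through and treat the entry as site-relative.
  }

  if (absolute !== null) {
    if (absolute.protocol !== "http:" && absolute.protocol !== "https:") {
      throw new Error(
        `Invalid sitemap value "${entry}": expected an http(s) URL or relative path.`,
      );
    }
    return absolute.href;
  }

  if (!site) {
    // Fail loudly instead of emitting a Sitemap line crawlers cannot use.
    throw new Error(
      `Invalid sitemap value "${entry}": relative sitemap values require Astro site to be configured.`,
    );
  }

  // The two-argument constructor resolves the path against the site origin.
  return new URL(entry, site).href;
}

const resolved = resolveSitemap("/sitemap-index.xml", "https://example.com");
console.log(resolved); // https://example.com/sitemap-index.xml
```

Absolute http(s) entries pass through without needing site, so a sitemap hosted on another origin still works when site is unset.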
All path directives must start with /
allow, disallow, and noindex values are validated to start with /. This mirrors the robots.txt specification requirement and surfaces misconfiguration at build time rather than at crawl time.
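
As a rough illustration of what this build-time check looks like, the sketch below validates a single value and reports the outcome. `assertPathDirective` and `tryValidate` are hypothetical names used only here. The error wording follows the messages in the source listing on this page.

```typescript
type PathField = "allow" | "disallow" | "noindex";

// Hypothetical validator: returns the path unchanged or throws a descriptive error.
function assertPathDirective(value: string, field: PathField): string {
  if (value.trim().length === 0) {
    throw new Error(`Invalid ${field} value: expected a non-empty string.`);
  }
  if (!value.startsWith("/")) {
    throw new Error(
      `Invalid ${field} value "${value}": expected a path starting with "/".`,
    );
  }
  return value;
}

// Small wrapper so both outcomes can be shown without aborting the script.
function tryValidate(value: string, field: PathField): string {
  try {
    return `ok: ${assertPathDirective(value, field)}`;
  } catch (error) {
    return `error: ${error instanceof Error ? error.message : String(error)}`;
  }
}

const good = tryValidate("/private/", "disallow");
const bad = tryValidate("private/", "disallow");
console.log(good); // ok: /private/
console.log(bad); // error: Invalid disallow value "private/": expected a path starting with "/".
```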
Setting false is the explicit opt-out
Passing robotsTxt: false disables the feature cleanly. Omitting robotsTxt logs a recommendation warning so new projects are nudged to make an intentional choice.
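
Taken together, the decisions about opting out, omitting the option, and existing files form a small decision table. The sketch below models it as a pure function. `decide`, `Action`, and `RobotsTxtInput` are hypothetical names, and the branch that rejects malformed configuration is omitted. The actual integration logs through the logger rather than returning a value.

```typescript
// Hypothetical model of the configuration states described above.
type RobotsTxtInput = { rules: unknown } | false | undefined;

type Action =
  | { kind: "skip"; log: "info" | "warn"; reason: string }
  | { kind: "generate" };

function decide(input: RobotsTxtInput, outputExists: boolean): Action {
  if (input === false) {
    // Explicit opt-out: nothing is written and nothing is flagged as a problem.
    return { kind: "skip", log: "info", reason: "explicit opt-out" };
  }
  if (input === undefined) {
    // Omitted option: skip, but nudge the user toward an intentional choice.
    return { kind: "skip", log: "warn", reason: "robotsTxt is not configured" };
  }
  if (outputExists) {
    // Never overwrite a file that is already in the build output.
    return { kind: "skip", log: "warn", reason: "robots.txt already exists" };
  }
  return { kind: "generate" };
}

console.log(decide(false, false));
console.log(decide(undefined, false));
console.log(decide({ rules: { agent: "*" } }, true));
console.log(decide({ rules: { agent: "*" } }, false));
```

Checking the opt-out first means robotsTxt: false stays quiet even when a hand-written file is present.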
Source code

    import { constants } from "node:fs";
    import { access, mkdir, writeFile } from "node:fs/promises";
    import { dirname, join } from "node:path";
    import { fileURLToPath } from "node:url";
    import type { IntegrationRuntimeContext } from "..";

    export type CrawlerAgent = string;

    export type RobotsTxtRule = {
      agent: CrawlerAgent | CrawlerAgent[];
      allow?: string | string[];
      disallow?: string | string[];
      noindex?: string | string[];
      cleanParam?: string | string[];
      crawlDelay?: number;
    };

    export type RobotsTxtOptions = {
      rules: RobotsTxtRule | RobotsTxtRule[];
      sitemap?: string | string[];
    };

    export const ROBOTS_TXT_RECOMMENDATION =
      "Recommendation: follow eminence-astro-suite.xeffen25.com/recommendations/why-you-should-add-a-robots-txt to learn why adding a robots.txt is important.";

    export const ROBOTS_TXT_RELATIVE_PATH = "/robots.txt";

    const toArray = <T>(value: T | T[] | undefined): T[] => {
      if (value === undefined) {
        return [];
      }

      return Array.isArray(value) ? value : [value];
    };

    const exists = async (path: string): Promise<boolean> => {
      try {
        await access(path, constants.F_OK);
        return true;
      } catch (error) {
        if (
          error &&
          typeof error === "object" &&
          "code" in error &&
          error.code === "ENOENT"
        ) {
          return false;
        }

        throw error;
      }
    };

    const assertNonEmptyString = (value: string, fieldName: string): string => {
      if (value.trim().length === 0) {
        throw new Error(`Invalid ${fieldName} value: expected a non-empty string.`);
      }

      return value;
    };

    const normalizeAgentList = (value: RobotsTxtRule["agent"]): string[] => {
      const agents = toArray(value).map((agent) =>
        assertNonEmptyString(agent, "agent"),
      );

      if (agents.length === 0) {
        throw new Error("Invalid robotsTxt rule: at least one agent is required.");
      }

      return agents;
    };

    const normalizePathDirective = (
      value:
        | RobotsTxtRule["allow"]
        | RobotsTxtRule["disallow"]
        | RobotsTxtRule["noindex"],
      fieldName: "allow" | "disallow" | "noindex",
    ): string[] => {
      return toArray(value).map((entry) => {
        const normalized = assertNonEmptyString(entry, fieldName);

        if (!normalized.startsWith("/")) {
          throw new Error(
            `Invalid ${fieldName} value "${entry}": expected a path starting with "/".`,
          );
        }

        return normalized;
      });
    };

    const normalizeCleanParam = (value: RobotsTxtRule["cleanParam"]): string[] => {
      return toArray(value).map((entry) =>
        assertNonEmptyString(entry, "cleanParam"),
      );
    };

    const normalizeCrawlDelay = (value: number | undefined): number | undefined => {
      if (value === undefined) {
        return undefined;
      }

      if (!Number.isFinite(value) || value < 0) {
        throw new Error(
          `Invalid crawlDelay value "${value}": expected a non-negative number.`,
        );
      }

      return value;
    };

    const normalizeSite = (
      site: IntegrationRuntimeContext["config"]["site"],
    ): URL | undefined => {
      if (!site) {
        return undefined;
      }

      return new URL(site);
    };

    const normalizeSitemapValue = (
      value: string,
      site: URL | undefined,
    ): string => {
      const entry = assertNonEmptyString(value, "sitemap");

      let parsed: URL | null = null;
      try {
        parsed = new URL(entry);
      } catch {
        // not an absolute URL; treat as a site-relative path
      }

      if (parsed !== null) {
        if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
          throw new Error(
            `Invalid sitemap value "${entry}": expected an http(s) URL or relative path.`,
          );
        }

        return parsed.href;
      }

      if (!site) {
        throw new Error(
          `Invalid sitemap value "${entry}": relative sitemap values require Astro site to be configured.`,
        );
      }

      return new URL(entry, site).href;
    };

    const buildRobotsTxt = (
      options: RobotsTxtOptions,
      site: IntegrationRuntimeContext["config"]["site"],
    ): string => {
      const normalizedSite = normalizeSite(site);
      const rules = toArray(options.rules);
      if (rules.length === 0) {
        throw new Error(
          "Invalid robotsTxt configuration: expected at least one rule.",
        );
      }

      const blocks = rules.map((rule) => {
        const lines: string[] = [];
        for (const agent of normalizeAgentList(rule.agent)) {
          lines.push(`User-agent: ${agent}`);
        }

        for (const value of normalizePathDirective(rule.allow, "allow")) {
          lines.push(`Allow: ${value}`);
        }

        for (const value of normalizePathDirective(rule.disallow, "disallow")) {
          lines.push(`Disallow: ${value}`);
        }

        for (const value of normalizePathDirective(rule.noindex, "noindex")) {
          lines.push(`Noindex: ${value}`);
        }

        for (const value of normalizeCleanParam(rule.cleanParam)) {
          lines.push(`Clean-param: ${value}`);
        }

        const crawlDelay = normalizeCrawlDelay(rule.crawlDelay);
        if (crawlDelay !== undefined) {
          lines.push(`Crawl-delay: ${crawlDelay}`);
        }

        return lines.join("\n");
      });

      const sitemaps = toArray(options.sitemap).map((entry) =>
        normalizeSitemapValue(entry, normalizedSite),
      );

      const outputLines: string[] = [];
      for (const block of blocks) {
        if (outputLines.length > 0) {
          outputLines.push("");
        }

        outputLines.push(block);
      }

      if (sitemaps.length > 0) {
        outputLines.push("");
        for (const sitemap of sitemaps) {
          outputLines.push(`Sitemap: ${sitemap}`);
        }
      }

      return `${outputLines.join("\n")}\n`;
    };

    export async function generateRobotsTxt({
      config,
      dir,
      options,
      logger,
    }: IntegrationRuntimeContext): Promise<void> {
      const input = options.robotsTxt;
      const outputPath = join(fileURLToPath(dir), "robots.txt");
      const outputExists = await exists(outputPath);

      if (input === false) {
        if (outputExists) {
          logger.info(
            `No "${ROBOTS_TXT_RELATIVE_PATH}" file was generated nor modified because it already exists.`,
          );
        } else {
          logger.info(
            `No "${ROBOTS_TXT_RELATIVE_PATH}" file exists and no file was generated.`,
          );
        }

        return;
      }

      if (input === undefined) {
        logger.warn(
          `No robots.txt file was generated because robotsTxt is undefined. ${ROBOTS_TXT_RECOMMENDATION}`,
        );
        return;
      }

      if (typeof input !== "object" || input === null) {
        logger.error(
          "Invalid robotsTxt configuration: expected an object with a rules field.",
        );
        throw new Error("Invalid robotsTxt configuration: expected an object.");
      }

      if (outputExists) {
        logger.warn(
          `Could not generate "${ROBOTS_TXT_RELATIVE_PATH}" because it already exists. Disabling robotsTxt generation for this build.`,
        );
        options.robotsTxt = false;
        return;
      }

      try {
        const content = buildRobotsTxt(input, config.site);
        await mkdir(dirname(outputPath), { recursive: true });
        await writeFile(outputPath, content, "utf-8");
        logger.info(`Generated "${ROBOTS_TXT_RELATIVE_PATH}"`);
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        logger.error(
          `Failed to generate "${ROBOTS_TXT_RELATIVE_PATH}": ${message}`,
        );
        throw error;
      }
    }