robots.txt

The robots.txt feature generates a /robots.txt file in your build output during astro build. It is configured through the robotsTxt option on the integration and does not affect the <meta name="robots"> tag, which is handled by the Robots component.

If a robots.txt file already exists in the build output, generation is skipped and a warning is logged. If robotsTxt is omitted from your integration config, a warning is logged recommending you add it.

Options

`RobotsTxtOptions`

option	type	default	required	description
`rules`	`RobotsTxtRule \| RobotsTxtRule[]`	-	Yes	One or more crawl rules defining agent access.
`sitemap`	`string \| string[]`	-	No	Absolute http(s) URL(s) or site-relative path(s) to include as Sitemap entries. Relative paths require `site` to be set in your Astro config.

`RobotsTxtRule`

option	type	default	required	description
`agent`	`string \| string[]`	-	Yes	One or more user-agent names. Must be non-empty.
`allow`	`string \| string[]`	-	No	Path(s) to allow. Each must start with `/`.
`disallow`	`string \| string[]`	-	No	Path(s) to disallow. Each must start with `/`.
`noindex`	`string \| string[]`	-	No	Path(s) for the Noindex directive. Each must start with `/`.
`cleanParam`	`string \| string[]`	-	No	Query parameter(s) for the Clean-param directive.
`crawlDelay`	`number`	-	No	Non-negative crawl delay in seconds.

Examples

Basic

Input:

eminence({
  robotsTxt: {
    rules: {
      agent: "*",
      disallow: "/private/",
    },
  },
});

Output:

User-agent: *
Disallow: /private/

With multiple rules

Input:

eminence({
  robotsTxt: {
    rules: [
      { agent: "*", disallow: "/" },
      { agent: "Googlebot", allow: "/" },
    ],
  },
});

Output:

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

With sitemap reference

Input:

eminence({
  robotsTxt: {
    rules: { agent: "*" },
    sitemap: "/sitemap-index.xml",
  },
});

Output at /robots.txt** (assuming site is https://example.com):

User-agent: *

Sitemap: https://example.com/sitemap-index.xml

Explicit opt-out

Input:

eminence({
  robotsTxt: false,
});

Output: No file is generated and no warning is logged.

Complete

All options provided explicitly. Assumes site is https://example.com in your Astro config.

Input:

eminence({
  robotsTxt: {
    rules: [
      {
        agent: "*",
        allow: "/",
        disallow: "/private/",
        noindex: "/drafts/",
        cleanParam: "utm_source /",
        crawlDelay: 2,
      },
      {
        agent: ["Googlebot", "Bingbot"],
        disallow: ["/tmp/", "/cache/"],
      },
    ],
    sitemap: ["/sitemap.xml", "https://cdn.example.com/sitemap-news.xml"],
  },
});

Output at /robots.txt:

User-agent: *
Allow: /
Disallow: /private/
Noindex: /drafts/
Clean-param: utm_source /
Crawl-delay: 2

User-agent: Googlebot
User-agent: Bingbot
Disallow: /tmp/
Disallow: /cache/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://cdn.example.com/sitemap-news.xml

Decisions Made

Existing file is never overwritten

If a robots.txt already exists in the build output when astro build runs, the integration skips generation and logs a warning. This avoids silently clobbering a manually maintained file.

Relative sitemap values require `site` to be configured

Relative sitemap paths are resolved against Astro.config.site. If site is not set, the integration throws an error rather than producing a broken absolute URL.

All path directives must start with `/`

allow, disallow, and noindex values are validated to start with /. This mirrors the robots.txt specification requirement and surfaces misconfiguration at build time rather than at crawl time.

Setting `false` is the explicit opt-out

Passing robotsTxt: false disables the feature cleanly. Omitting robotsTxt logs a recommendation warning so new projects are nudged to make an intentional choice.

Source code

import { constants } from "node:fs";
import { access, mkdir, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import type { IntegrationRuntimeContext } from "..";

export type CrawlerAgent = string;

export type RobotsTxtRule = {
  agent: CrawlerAgent | CrawlerAgent[];
  allow?: string | string[];
  disallow?: string | string[];
  noindex?: string | string[];
  cleanParam?: string | string[];
  crawlDelay?: number;
};

export type RobotsTxtOptions = {
  rules: RobotsTxtRule | RobotsTxtRule[];
  sitemap?: string | string[];
};

export const ROBOTS_TXT_RECOMMENDATION =
  "Recommendation: follow eminence-astro-suite.xeffen25.com/recommendations/why-you-should-add-a-robots-txt to learn why adding a robots.txt is important.";

export const ROBOTS_TXT_RELATIVE_PATH = "/robots.txt";

const toArray = <T>(value: T | T[] | undefined): T[] => {
  if (value === undefined) {
    return [];
  }

  return Array.isArray(value) ? value : [value];
};

const exists = async (path: string): Promise<boolean> => {
  try {
    await access(path, constants.F_OK);
    return true;
  } catch (error) {
    if (
      error &&
      typeof error === "object" &&
      "code" in error &&
      error.code === "ENOENT"
    ) {
      return false;
    }

    throw error;
  }
};

const assertNonEmptyString = (value: string, fieldName: string): string => {
  if (value.trim().length === 0) {
    throw new Error(`Invalid ${fieldName} value: expected a non-empty string.`);
  }

  return value;
};

const normalizeAgentList = (value: RobotsTxtRule["agent"]): string[] => {
  const agents = toArray(value).map((agent) =>
    assertNonEmptyString(agent, "agent"),
  );

  if (agents.length === 0) {
    throw new Error("Invalid robotsTxt rule: at least one agent is required.");
  }

  return agents;
};

const normalizePathDirective = (
  value:
    | RobotsTxtRule["allow"]
    | RobotsTxtRule["disallow"]
    | RobotsTxtRule["noindex"],
  fieldName: "allow" | "disallow" | "noindex",
): string[] => {
  return toArray(value).map((entry) => {
    const normalized = assertNonEmptyString(entry, fieldName);

    if (!normalized.startsWith("/")) {
      throw new Error(
        `Invalid ${fieldName} value "${entry}": expected a path starting with "/".`,
      );
    }

    return normalized;
  });
};

const normalizeCleanParam = (value: RobotsTxtRule["cleanParam"]): string[] => {
  return toArray(value).map((entry) =>
    assertNonEmptyString(entry, "cleanParam"),
  );
};

const normalizeCrawlDelay = (value: number | undefined): number | undefined => {
  if (value === undefined) {
    return undefined;
  }

  if (!Number.isFinite(value) || value < 0) {
    throw new Error(
      `Invalid crawlDelay value "${value}": expected a non-negative number.`,
    );
  }

  return value;
};

const normalizeSite = (
  site: IntegrationRuntimeContext["config"]["site"],
): URL | undefined => {
  if (!site) {
    return undefined;
  }

  return new URL(site);
};

const normalizeSitemapValue = (
  value: string,
  site: URL | undefined,
): string => {
  const entry = assertNonEmptyString(value, "sitemap");

  let parsed: URL | null = null;
  try {
    parsed = new URL(entry);
  } catch {
    // not an absolute URL; treat as a site-relative path
  }

  if (parsed !== null) {
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      throw new Error(
        `Invalid sitemap value "${entry}": expected an http(s) URL or relative path.`,
      );
    }

    return parsed.href;
  }

  if (!site) {
    throw new Error(
      `Invalid sitemap value "${entry}": relative sitemap values require Astro site to be configured.`,
    );
  }

  return new URL(entry, site).href;
};

const buildRobotsTxt = (
  options: RobotsTxtOptions,
  site: IntegrationRuntimeContext["config"]["site"],
): string => {
  const normalizedSite = normalizeSite(site);
  const rules = toArray(options.rules);
  if (rules.length === 0) {
    throw new Error(
      "Invalid robotsTxt configuration: expected at least one rule.",
    );
  }

  const blocks = rules.map((rule) => {
    const lines: string[] = [];
    for (const agent of normalizeAgentList(rule.agent)) {
      lines.push(`User-agent: ${agent}`);
    }

    for (const value of normalizePathDirective(rule.allow, "allow")) {
      lines.push(`Allow: ${value}`);
    }

    for (const value of normalizePathDirective(rule.disallow, "disallow")) {
      lines.push(`Disallow: ${value}`);
    }

    for (const value of normalizePathDirective(rule.noindex, "noindex")) {
      lines.push(`Noindex: ${value}`);
    }

    for (const value of normalizeCleanParam(rule.cleanParam)) {
      lines.push(`Clean-param: ${value}`);
    }

    const crawlDelay = normalizeCrawlDelay(rule.crawlDelay);
    if (crawlDelay !== undefined) {
      lines.push(`Crawl-delay: ${crawlDelay}`);
    }

    return lines.join("\n");
  });

  const sitemaps = toArray(options.sitemap).map((entry) =>
    normalizeSitemapValue(entry, normalizedSite),
  );

  const outputLines: string[] = [];
  for (const block of blocks) {
    if (outputLines.length > 0) {
      outputLines.push("");
    }

    outputLines.push(block);
  }

  if (sitemaps.length > 0) {
    outputLines.push("");
    for (const sitemap of sitemaps) {
      outputLines.push(`Sitemap: ${sitemap}`);
    }
  }

  return `${outputLines.join("\n")}\n`;
};

export async function generateRobotsTxt({
  config,
  dir,
  options,
  logger,
}: IntegrationRuntimeContext): Promise<void> {
  const input = options.robotsTxt;
  const outputPath = join(fileURLToPath(dir), "robots.txt");
  const outputExists = await exists(outputPath);

  if (input === false) {
    if (outputExists) {
      logger.info(
        `No "${ROBOTS_TXT_RELATIVE_PATH}" file was generated nor modified because it already exists.`,
      );
    } else {
      logger.info(
        `No "${ROBOTS_TXT_RELATIVE_PATH}" file exists and no file was generated.`,
      );
    }

    return;
  }

  if (input === undefined) {
    logger.warn(
      `No robots.txt file was generated because robotsTxt is undefined. ${ROBOTS_TXT_RECOMMENDATION}`,
    );
    return;
  }

  if (typeof input !== "object" || input === null) {
    logger.error(
      "Invalid robotsTxt configuration: expected an object with a rules field.",
    );
    throw new Error("Invalid robotsTxt configuration: expected an object.");
  }

  if (outputExists) {
    logger.warn(
      `Could not generate "${ROBOTS_TXT_RELATIVE_PATH}" because it already exists. Disabling robotsTxt generation for this build.`,
    );
    options.robotsTxt = false;
    return;
  }

  try {
    const content = buildRobotsTxt(input, config.site);
    await mkdir(dirname(outputPath), { recursive: true });
    await writeFile(outputPath, content, "utf-8");
    logger.info(`Generated "${ROBOTS_TXT_RELATIVE_PATH}"`);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    logger.error(
      `Failed to generate "${ROBOTS_TXT_RELATIVE_PATH}": ${message}`,
    );
    throw error;
  }
}

robots.txt

Options

RobotsTxtOptions

RobotsTxtRule

Examples

Basic

With multiple rules

With sitemap reference

Explicit opt-out

Complete

Decisions Made

Existing file is never overwritten

Relative sitemap values require site to be configured

All path directives must start with /

Setting false is the explicit opt-out

Source code

Reference

`RobotsTxtOptions`

`RobotsTxtRule`

Relative sitemap values require `site` to be configured

All path directives must start with `/`

Setting `false` is the explicit opt-out