How to override output of Robots.txt module in Drupal 8 for test/UAT site

As a Drupal developer, I work extensively with a variety of websites and customers. Most of these websites are works in progress, so I need to show completed parts of my work to the customer.

Using Dev and Stage environments is common practice for this. For example, every Acquia Cloud account comes with Dev and Stage environments by default.

I wanted to hide the websites in the Acquia Dev and Stage environments from search engines. The typical approach is to use the Shield module, as described in the article “How to protect UAT or Dev environment from indexing by search engines in Drupal”. However, in some cases a customer can’t use the Drupal Shield module, and I have to rely purely on robots.txt to tell search engines not to index those non-production websites.

The problem is that maintaining different versions of the robots.txt file is too hard. I needed a reliable way to automate it, ideally without touching the robots.txt file or module.

Luckily, there is a Robots.txt module for Drupal 7 and Drupal 8. It moves the content of the robots.txt file into the database and allows it to be managed via the Drupal admin.

For Drupal 8 development I use configuration export and import extensively. Configuration management lets you apply any database configuration changes that result from ongoing development work, and configuration export makes it easy to synchronise those changes with your development team.

This advantage of Drupal 8, however, made it hard to customise the configuration of the Robots.txt module per environment, since the robots.txt content is itself part of the exported configuration (it is stored in a YAML file).

Luckily, Acquia Cloud supports drush, and drush 8 comes with the config-set command.

The problem is how to pass a multi-line string via drush.

Running this drush command

drush config-set robotstxt.settings content "User-agent: *\nDisallow: /" -y

imported the string with a literal \n instead of a new line.

I did some research and went through a variety of pages, such as “How to set complex string variables with Drush vset”, “From $conf to $config and $settings in Drupal 8” and finally “In YAML, how do I break a string over multiple lines?”, but nothing worked. Apparently drush imports the “\n” sequence “as is”, escaping it to preserve the string value.

The solution was simple, though:

Using Acquia Cloud hooks, I created a hook file under `/hooks/dev/post-code-deploy/00_drush.sh` with the following content:

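#!/bin/sh
#
# Cloud Hook: post-code-deploy
#
# Imports the exported configuration, then overrides the robots.txt content
# so that search engines are told to stay away from this environment.

# Map the script inputs to convenient names.
site="$1"
target_env="$2"
drush_alias="$site.$target_env"

# Import configuration first; a later import would overwrite the override below.
drush "@$drush_alias" cim vcs -y

# Override robots.txt. The line break inside the quotes is intentional:
# it is stored as a real new line, whereas "\n" would be stored literally.
drush "@$drush_alias" config-set robotstxt.settings content "User-agent: *
Disallow: /" -y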

The important points are:

  1. Import configuration before altering robots.txt content
  2. Keep the real line break before `Disallow: /`; it is imported “as is”. Do not rely on the “\n” sequence, which is also imported literally rather than being converted to a new line.
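
The second point can be checked in plain shell, without drush. The snippet below is a minimal sketch: the variable names and the `count_lines` helper are made up for illustration. It shows that "\n" inside double quotes stays as two literal characters, while a typed line break (or one generated by `printf`) is a real new line.

```shell
#!/bin/sh
# Inside double quotes the shell keeps "\n" as two literal characters
# (a backslash followed by the letter n).
escaped="User-agent: *\nDisallow: /"

# A line break typed inside the quotes is stored as a real new line.
multiline="User-agent: *
Disallow: /"

# printf interprets \n in its format string, so this builds the same
# two-line value without a hard line break in the script.
generated=$(printf 'User-agent: *\nDisallow: /')

# Hypothetical helper: print the number of lines in its first argument.
# '%s' prints the value verbatim, so backslashes are not interpreted here.
count_lines() {
  printf '%s\n' "$1" | wc -l | tr -d ' '
}

printf 'escaped:   %s line(s)\n' "$(count_lines "$escaped")"    # 1
printf 'multiline: %s line(s)\n' "$(count_lines "$multiline")"  # 2
printf 'generated: %s line(s)\n' "$(count_lines "$generated")"  # 2
```

Since `$multiline` and `$generated` hold the same value, passing `"$generated"` to a command hands it exactly the same argument as the hard-wrapped string used in the hook.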

Copy this file for the other hooks available in Acquia Cloud, such as

  • post-code-update
  • post-db-copy

and add them to every environment you require, ideally all but production.
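
If keeping a copy per environment becomes tedious, one possible variation is a single script that checks the environment name it receives as its second argument. This is only a sketch under assumptions: the `should_block_indexing` function is hypothetical, and it assumes the production environment is named `prod`, which may differ in your setup.

```shell
#!/bin/sh
# Sketch of a shared hook script. Assumption: production is named "prod".

# Hypothetical helper: succeeds (exit status 0) when the given environment
# should be hidden from search engines.
should_block_indexing() {
  case "$1" in
    prod) return 1 ;;  # leave production robots.txt untouched
    *)    return 0 ;;  # every other environment gets the override
  esac
}

# Cloud hooks pass the site name and the target environment as arguments.
site=${1:-}
target_env=${2:-}

# Run this after the configuration import, as in the hook above.
if [ -n "$site" ] && [ -n "$target_env" ] && should_block_indexing "$target_env"; then
  drush "@$site.$target_env" config-set robotstxt.settings content "User-agent: *
Disallow: /" -y
fi
```

The guard keeps the decision in one place, so adding a new non-production environment would not require another copy of the file.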

After deploying new code or copying the database from the production environment, the robots.txt content will always disallow indexing on the non-production sites.
