Fixing a mysterious .ebextensions command timeout (AWS Elastic Beanstalk)

July 29, 2015

Our webshop, nettbutikk.netcom.no, runs on AWS Elastic Beanstalk, and we use .ebextensions/ to customize the environment. I have just been trying to get Gor running on our leader production instance to replay some traffic to our staging environment, so that we get much richer feedback from it. However, the container_command I used caused the instance to time out and trash the environment, against all reason. The documentation doesn't help, and troubleshooting this is both hard, due to the lack of feedback, and time-consuming. Luckily, I have arrived at a solution.

This is the working solution:

files:
  /opt/gor:
    source: "https://s3-eu-west-1.amazonaws.com/elasticbeanstalk-eu-west-1-<our-id>/our_fileserver/gor"
    authentication: S3Access
    mode: "000755"
    owner: root
    group: root
  # Script to start Gor in the background
  # Beware: We need to intercept port 8080 b/c 80 is redirected there via iptables
  /opt/gor-in-background:
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      pidof /opt/gor || nohup /opt/gor --input-raw :8080 --output-http 'https://our-staging-server|1' >/dev/null 2>&1 </dev/null &
# Only container_commands can access env variables configured in EB (ENV)
# Start gor, limit it to copy max 1 req / sec
container_commands:
  "Start Gor on Prod leader":
    command: test "$ENV" = "production" && /opt/gor-in-background >/dev/null 2>&1 # ! w/o the redirect it will time-out
    leader_only: true
    ignoreErrors: true # returns an "error" when not in Prod

(Accessing the private S3 bucket is a story of its own; it requires the addition of AWS::CloudFormation::Authentication and a change to the bucket policy.)

The key points are starting gor in the background with & and, the magic ingredient that took me so long to figure out, redirecting the container command's output to /dev/null. (The redirection inside the script is probably not needed to fix this particular problem; I have it because I don't want any output to accumulate on disk.)
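The effect of the redirect can be reproduced outside Elastic Beanstalk. The sketch assumes (this is not confirmed for EB's command runner) that the runner captures a command's output through a pipe and waits for end-of-file, the way shell command substitution `$(...)` does. End-of-file only arrives once every process holding the pipe's write end has exited, and a background job inherits that pipe unless its streams are redirected. Here `sleep 3` stands in for gor, and the function names are made up for illustration:

```bash
#!/usr/bin/env bash
# Why a backgrounded command can make its caller hang:
# $(...) reads the command's stdout until EOF, and EOF only arrives
# once every process holding the write end of that pipe has exited.

leaky() {
  # The background sleep inherits this function's stdout (the pipe),
  # so the caller keeps waiting for the full 3 seconds.
  sleep 3 &
}

detached() {
  # Same background job, but all three standard streams point at
  # /dev/null, so nothing keeps the caller's pipe open.
  sleep 3 >/dev/null 2>&1 </dev/null &
}

# elapsed_seconds CMD: run CMD inside $(...) and print how long it took.
elapsed_seconds() {
  start=$(date +%s)
  out=$("$@")
  end=$(date +%s)
  echo $((end - start))
}

echo "leaky:    $(elapsed_seconds leaky)s"     # takes about 3 seconds
echo "detached: $(elapsed_seconds detached)s"  # returns almost immediately
```

If the assumption about the runner holds, a long-running gor that inherited the pipe would keep the container command "running" until EB gave up, which would be consistent with the timeouts described in this post.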

I do not know whether I need to redirect both stdout and stderr of /opt/gor-in-background, or why I need to redirect at all, but without the redirect I got the infamous

[time N+1] INFO Command execution completed on all instances. Summary: [Successful: 1, TimedOut: 1].
[time N] WARN The following instances have not responded in the allowed command timeout time (they might still finish eventually on their own): [i-1e35c2b3].

and the instance kept timing out, even when I tried to redeploy a working version, and never managed to deliver its logs.
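As for why the outer redirect matters even though gor's own streams are already redirected inside the script, one possible explanation (a guess, not verified on Elastic Beanstalk): the trailing `&` applies to the whole `pidof /opt/gor || nohup ...` list, so bash runs that list in a background subshell that inherits the script's stdout and stderr, and only the gor command itself gets the /dev/null redirections. If that subshell stays alive while gor runs, it holds the output pipe open. `nohup` does not help here either, since it only redirects output to nohup.out when stdout is a terminal. A hypothetical variant of /opt/gor-in-background that backgrounds only the gor command (`GOR_BIN` is an illustrative override, not part of the original config):

```bash
#!/usr/bin/env bash
# Hypothetical variant of /opt/gor-in-background; not tested on Elastic Beanstalk.
# GOR_BIN is an illustrative override; the original hard-codes /opt/gor.
GOR_BIN=${GOR_BIN:-/opt/gor}

start_gor() {
  # Check in the foreground and discard pidof's output (the PIDs it prints).
  if ! pidof "$GOR_BIN" >/dev/null 2>&1; then
    # Only this one command goes to the background, and its stdin, stdout
    # and stderr are pointed at /dev/null before gor starts, so no process
    # left behind holds the caller's stdout/stderr pipe.
    nohup "$GOR_BIN" --input-raw :8080 \
      --output-http 'https://our-staging-server|1' \
      >/dev/null 2>&1 </dev/null &
  fi
}

start_gor
```

Even with a variant like this, keeping `>/dev/null 2>&1` on the container command itself is a cheap safeguard.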

Troubleshooting tip: Clone the target environment a few times and use the clones to test changes repeatedly, and to test several changes in parallel, to speed up the process.

Summary

If your (container) command leads to a timeout, try redirecting its stdout and/or stderr to /dev/null.

Thanks to João Abrantes for the redirection idea!

Tags: DevOps