Fixing a mysterious .ebextensions command time out (AWS Elastic Beanstalk)
.ebextensions/to customize the environment. I have been just trying to get Gor running on our leader production instance to replay some traffic to our staging environment so that we get a much richer feedback from it. However the
container_commandI used caused the instance to time out and trash the environment, against all reason. The documentation doesn't help and troubleshooting this is hard due to lack of feedback and time-consuming. Luckily I have arrived to a solution.
This is the working solution:
|# Script to start Gor in the background|
|# Beware: We need to intercept port 8080 b/c 80 is redirected there via iptables|
|pidof /opt/gor || nohup /opt/gor --input-raw :8080 --output-http 'https://our-staging-server|1' >/dev/null 2>&1 </dev/null &|
|# Only container_commands can access env variables configured in EB (ENV)|
|# Start gor, limit it to copy max 1 req / sec|
|"Start Gor on Prod leader":|
|command: test "$ENV" = "production" && /opt/gor-in-background >/dev/null 2>&1 # ! w/o the redirect it will time-out|
|ignoreErrors: true # returns an "error" when not in Prod|
(The S3 private bucket access is a story of its own, requiring and addition of AWS::CloudFormation::Authentication and a change of the bucket policy.)
The key points are starting
gorin the background with
&and, the magic ingredient that took me so long to figure out, the redirection of the command's output to
/dev/null(the redirection inside the script likely doesn't need to be there with respect to this problem; I have it because I don't want any output to accumulate on the disk).
I do not know if I need to redirect both stdin and stderr of
/opt/gor-in-backgroundand why I need to do it but without it I got the infamous
[time N+1] INFO Command execution completed on all instances. Summary: [Successful: 1, TimedOut: 1].
[time N] WARN The following instances have not responded in the allowed command timeout time (they might still finish eventually on their own): [i-1e35c2b3].
and the instance continued to time out even when I tried to re-deploy a working version and never managed to deliver logs.
Troubleshooting tip: Clone the target env a few times and use those to test changes multiple times and multiple changes in parallel to speed up the process.
If you (container) command leads to a time out, try to redirect its stdout and/or stderr to
Thanks to João Abrantes for the redirection idea!