Installer or command that hangs? Use /dev/urandom instead of /dev/random, but constrained to a particular process

Okay, so I'm working on making an Ansible role for deploying an Oracle 10g Webgate, and I want it working on RHEL 6 and RHEL 7. I managed to do that (yay; took a bit of persuading), but quickly noticed that if you don't do something to prevent it, the installer (InstallShield on Linux... yuck, and not just because it wraps Java 6), your entropy pool drains very very quickly and the installer just sits there, hanging.

You can verify that its blocking because of entropy by using a command such as:

watch cat /proc/sys/kernel/random/entropy_avail

and if it isn't hovering between 2000 and 3000, then you have a potential issue; if its staying under 1024 or so, then you very likely will be experiencing hanging behaviour, depending on which application is draining the pool (commonly by trying to read a bunch of data from /dev/random)

I should note that this is in a VMWare environment, so no hardware random number generation for me. Instead, what I typically do is to push out a sysctl change to change sys.kernel.random.read_wakeup_threshold from the default of 64 to a more useful 1024, which generally makes things (especially things like WebLogic startup times) much more responsive.

Additionally, for instances where I'm dealing with Oracle Java < 8, I edit java.security (tip. locate java.security) and set the following:

securerandom.source=file:/dev/./urandom

These days, this gets done as part of an Ansible role I have for deploying Oracle Java.

But, in the case of this particular installer, that wasn't really accessible, and although has options for doing a silent install (useful, if insufficiently documented, at least by Oracle), it doesn't have a way of passing in JVM arguments. (ie. -Dsecurerandom.source=...)

In my dev environment (a Vagrant VM), I just deleted /dev/random and recreated it with the same major&minor numbers as /dev/urandom, but I didn't want to do that in my real environments, partly to reduce the potential for surprise later on.

But we can do something much more similar, and more constrained (I like constrained effects), using part of the Linux namespaces system to use an independent mount table -- part of what makes Docker work. It does require root privilege however (or at least the power to modify the mount table).

To demonstrate, I wrote this as a one-liner, here reformatted for a bit of enhanced readability. You can copy-paste this into a root shell to demonstrate. (naturally, read it before pasting anything into a root shell)

echo "Before unsharing"; \
  ls -l /dev/*random ; \
  unshare --mount bash -c '
    mount -o bind /dev/urandom /dev/random;
    echo "After unsharing and bind-mounting random device";
    ls -l /dev/*random;
    echo "... finickity installer goes here"
  '; \
  echo "Back in the real world"; \
  ls -l /dev/*random

Here's the output, with some highlighting to make it easier to see the difference.

Before unsharing

crw-rw-rw- 1 root root 1, 8 Mar 16 23:59 /dev/random
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/urandom
After unsharing and bind-mounting random device
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/random
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/urandom
... finickity installer goes here
Back in the real world
crw-rw-rw- 1 root root 1, 8 Mar 16 23:59 /dev/random
crw-rw-rw- 1 root root 1, 9 Mar 16 23:59 /dev/urandom

Note that if the command you want to run should not run as root (quite likely), then you can change the line in the code above with a command that uses sudo, su, runuser, etc. possibly using a wrapper script. Or just use an interactive shell.

So, let's see how to apply this... this post isn't about Ansible, or about deploying a 10g Webgate using Ansible (perhaps I'll document that later), but it won't hurt to show the key part. In this particular case, because the webgate will be running in Apache httpd (from RHEL package), the installer needs to run as root; if this was for an OHS deployment, then I would have to wrap this command in a 'sudo' command, and probably also make sure requiretty was disabled (at least for transitioning to the target user).

Note that this part of the playbook is running as 'root' already.

Ansible task before integrating this improvement

- name: run the installer ... beware entropy starvation
  command: /root/webgate_installers/{{ webgate_installers_and_patches[0] }}/{{ webgate_installers_and_patches[0] }} -options /root/webgate_installers/install_options.txt -silent -is:silent
  args:
    chdir: /root/webgate_installers/{{ webgate_installers_and_patches[0] }}
    creates: "{{ webgate_install_location }}/oblix"

Ansible task after integrating this improvement

- name: run the installer ... beware entropy starvation
  command: unshare --mount bash -c 'mount -o bind /dev/urandom /dev/random; /root/webgate_installers/{{ webgate_installers_and_patches[0] }}/{{ webgate_installers_and_patches[0] }} -options /root/webgate_installers/install_options.txt -silent -is:silent'
  args:
    chdir: /root/webgate_installers/{{ webgate_installers_and_patches[0] }}
    creates: "{{ webgate_install_location }}/oblix"

Testing

Reverting my change to my Vagrant VM, making /dev/random be what it normally is... (you might reasonably wonder why I didn't just blow it away and restart it, but I had got some other Apache-related work in there), 

Watching entropy_avail while the playbook was running (in a mode that caused it to do the entire webgate install from fresh -- removing the previous install), it showed as stable and bouyant, and the install proceeded at a very healthy rate.

(Note: it would have been faster, but there is a certain, unnecessary, registration activity the install wants to run that takes a little while to time out).

Conclusion

Previously, because this particular installer is such a hog, while running this playbook, I would need to open up a new window onto the (thankfully only one) server being executed on, and run commands such as updatedb, rpm --verify --all and any other command I could think of to generate some disk activity -- that being the only source of entropy in that environment.

Now, instead of that, I have a playbook that runs in a fairly constant and minimal amount of time, rather than timing out and requiring manual intervention. Now that's the kind of speed-up I like to see.


Hope it helps,
Cameron

Comments

Popular posts from this blog

ORA-12170: TNS:Connect timeout — resolved

Getting MySQL server to run with SSL

From DNS Packet Capture to analysis in Kibana