Jul 11, 2015

WebLogic + Chef + Automation in General - thoughts and reflections

This is a brain dump written on an airplane in a rather sleep-deprived state. Since alcohol is not free on domestic flights, I opted for coffee and pounded away on the keyboard for a few minutes before legroom and awkward sleeping positions stopped being an issue.



To start, I will say that I am not an expert on WebLogic, which means I am not burdened by years of learning and perfecting the art of managing it. I started learning it on Monday, and today is my first Friday after being exposed to this technology.

So far I hear that my approach to managing WebLogic would work perfectly with JBoss, JVM, or Tomcat… but would absolutely not work here.

What I do know so far is that WL has a central API that has the ability to manage the entire cluster of boxes. It also has the ability to act as a load balancer, as well as a source of information and a central registry.

That last point is very powerful and, from what I have seen so far, the most underutilized aspect of WL. Everyone is interested in the centralized functionality, completely ignoring the ability to decentralize it.

Let's break it down.

The common approach that I have seen "sold" so far is to run all commands from the Admin (central) server. The central server will take care of all distribution of packages, starting and stopping of the cluster, and all other deployment-related functionality. Great. But just how useful is that in actuality?

WL allows you to have a Domain which is distributed across multiple physical machines. A Domain can have multiple clusters distributed across multiple physical machines. Each cluster can have multiple applications installed. Which means a physical machine can have a whole bunch of Managed Servers (MS, each its own JVM) running on it, each belonging to a different domain and a different app.
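To make that layout concrete, here is a minimal sketch of the domain / cluster / MS hierarchy and a helper that flips it around to answer "what is running on this box?". Every name here (the classes, `servers_by_machine`, the hosts and apps) is made up for illustration and is not part of any WebLogic API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ManagedServer:
    name: str     # hypothetical MS name
    machine: str  # physical host the JVM runs on


@dataclass
class Cluster:
    name: str
    apps: list = field(default_factory=list)
    servers: list = field(default_factory=list)


@dataclass
class Domain:
    name: str
    clusters: list = field(default_factory=list)


def servers_by_machine(domains):
    """Map each physical machine to the (domain, cluster, MS, apps) tuples on it."""
    layout = defaultdict(list)
    for domain in domains:
        for cluster in domain.clusters:
            for ms in cluster.servers:
                layout[ms.machine].append(
                    (domain.name, cluster.name, ms.name, tuple(cluster.apps))
                )
    return dict(layout)


# Two made-up domains that share one physical host (host-a).
domains = [
    Domain("billing", [Cluster("billing-c1", ["invoices.ear"], [
        ManagedServer("bill-ms1", "host-a"),
        ManagedServer("bill-ms2", "host-b"),
    ])]),
    Domain("orders", [Cluster("orders-c1", ["orders.ear"], [
        ManagedServer("ord-ms1", "host-a"),
    ])]),
]
layout = servers_by_machine(domains)
```

Here `layout["host-a"]` ends up holding one MS from each domain, which is exactly the situation that makes host-level automation awkward.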

That's a lot of moving pieces. So when we try to automate a system like that, we will never talk about ONLY WL. We will also talk about patches, modifying property files, modifying port numbers on the host, auto scaling up and down, and a host of other admin functions which have nothing to do with WL, but which have to account for the fact that WL is distributed across machines.

After a week of dissecting WL, I presently believe that the admin server is a fantastic service discovery tool for WL management.

With regard to managing WL with Chef:
Chef-client runs on each physical server.
Physical servers are grouped by environment - prod / dev / test.
Each server's run_list includes the applications which are running on that server or in that environment.
The recipe for an application pulls information for the current environment from some construct, like an attribute, a data bag, or the environment where the node resides.
That info includes:
The admin server for the particular app that the recipe is responsible for.
The application version number.
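
As a rough sketch of what that per-environment data could look like, here is a plain dictionary standing in for a data bag or environment attributes, with a small lookup helper. The structure, the `app_settings` name, the hostnames, and the version strings are all assumptions made for illustration.

```python
# Stand-in for a data bag or environment attributes; every value is made up.
ENVIRONMENTS = {
    "dev": {
        "orders": {
            "admin_server": "t3://admin-dev.example.internal:7001",
            "version": "1.4.0-SNAPSHOT",
        },
    },
    "prod": {
        "orders": {
            "admin_server": "t3://admin-prod.example.internal:7001",
            "version": "1.3.2",
        },
    },
}


def app_settings(environment, app):
    """Return the admin server and version for one app in one environment."""
    try:
        return ENVIRONMENTS[environment][app]
    except KeyError:
        raise LookupError(f"no settings for app {app!r} in environment {environment!r}")


settings = app_settings("prod", "orders")
```

The point is only that the recipe never hardcodes the admin server or the version. It asks for them based on where the node lives.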

The recipe has a set of actions (LWRP) - deploy, undeploy, start, stop, etc.
Each chef-client run executes independently on each of the physical servers.
The run_list is either a list of applications on that server, or an LWRP with a list of admin servers and application data bags.
Each LWRP or recipe hits the API and finds out which MS are running locally on the host where the recipe is being executed.
Each recipe executes an LWRP which hits the API and performs the needed commands (stop, query, etc.).
Chef-client at the machine level pulls down the EAR file locally and tells the API the location of the EAR. This ensures that the EAR is physically located on the host and is accessible to the MS.
It then starts the MS via the API.
It then makes whatever local changes need to be performed on the MS - server-level config - thereby ensuring that all of the changes are in fact done, idempotent, and consistent.
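
The flow above can be sketched roughly as follows. This is not a real WebLogic client: `FakeAdminApi` is an in-memory stand-in I made up so the logic can actually run, and `converge_host`, `fetch_ear`, and all server names and paths are hypothetical. Server-level config changes would follow the start step and are left out here.

```python
class FakeAdminApi:
    """In-memory stand-in for the admin server; NOT a real WebLogic API."""

    def __init__(self, servers):
        # servers: {ms_name: {"host": str, "state": str, "version": str}}
        self.servers = servers
        self.calls = []  # record of every "API call", for inspection

    def servers_on(self, host):
        self.calls.append(("query", host))
        return sorted(n for n, s in self.servers.items() if s["host"] == host)

    def deployed_version(self, name):
        return self.servers[name]["version"]

    def stop(self, name):
        self.calls.append(("stop", name))
        self.servers[name]["state"] = "SHUTDOWN"

    def deploy(self, name, ear_path, version):
        self.calls.append(("deploy", name, ear_path))
        self.servers[name]["version"] = version

    def start(self, name):
        self.calls.append(("start", name))
        self.servers[name]["state"] = "RUNNING"


def converge_host(api, host, app, version, fetch_ear):
    """One host-local pass: touch only the MS on this host that are out of date."""
    changed = []
    ear_path = None
    for name in api.servers_on(host):
        if api.deployed_version(name) == version:
            continue  # already converged, nothing to do (idempotent)
        if ear_path is None:
            ear_path = fetch_ear(app, version)  # pull the EAR onto this host once
        api.stop(name)
        api.deploy(name, ear_path, version)  # tell the API where the EAR lives
        api.start(name)
        changed.append(name)
    return changed


api = FakeAdminApi({
    "ord-ms1": {"host": "host-a", "state": "RUNNING", "version": "1.3.0"},
    "ord-ms2": {"host": "host-b", "state": "RUNNING", "version": "1.3.0"},
})


def fetch_ear(app, version):
    # Hypothetical: a real run would download the artifact to local disk.
    return f"/opt/deploy/{app}-{version}.ear"


first = converge_host(api, "host-a", "orders", "1.4.0", fetch_ear)
second = converge_host(api, "host-a", "orders", "1.4.0", fetch_ear)
```

The first pass changes only the MS on host-a, and the second pass changes nothing, which is the behavior you want from a chef-client run that repeats every few minutes.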

If the API / MS allows multi-version support, this style of deployment would allow zero-downtime deploys.
If the WebLogic API is not built to support a lot of API calls, perhaps there is a way to optimize the MS or load balance multiple Admin servers. However, if having, let's say, 40 physical servers making 20 API calls over the course of a 15 minute deployment is too much for the Admin server to handle, perhaps it's a good time to look at automating WebLogic out of the company.
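
For a sense of scale, here is the back-of-the-envelope arithmetic, assuming the 20 calls are per server (if it is 20 calls in total, the load is smaller still):

```python
servers = 40
calls_per_server = 20        # assumption: 20 calls from each server
window_seconds = 15 * 60     # a 15 minute deployment

total_calls = servers * calls_per_server          # 800 calls
calls_per_second = total_calls / window_seconds   # roughly 0.89 calls per second

print(f"{total_calls} calls, about {calls_per_second:.2f} per second")
```

That averages out to less than one request per second, although real runs would likely be burstier than the average suggests.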