[cia] Towards better container images in ONAP
My colleague Leif Madsen and I have done some research and I'd like to present our conclusions as an opening to discussion. If there's interest in this, we are happy to also do a proof-of-concept to show how this would look in practice.
STATEMENT OF PROBLEM
We all know that ONAP lacks uniformity in images, leading to a broad mix of operating systems, packages, versions, and ultimately an unwieldy repository requiring many gigabytes of storage and network data transfer to stand up. But actually the problem is more serious.
ONAP-provided official images can only ever be considered as reference images. No IT department of any company would approve of deploying complete ready-to-run images downloaded off the Internet, even from a trusted Linux Foundation hosted repository. Remember that these images contain not only our ONAP code, but also many operating system and third-party packages. These include high-level components such as MariaDB/MySQL, language runtimes such as OpenJDK and Python, and low-level system components such as glib and OpenSSL. Companies have their own security and certification processes in place for vetting and patching these packages, and indeed licensing and support contracts for many of them. It's thus important that ONAP provide not only useful reference images, but also a way for companies' IT departments to build their own images in their own integration/testing/staging environments using their own vendored operating systems and packages.
A secondary problem is that we are not currently using open standards. While Docker build configuration and images are a "de facto" standard in the container world, they are problematic outside of Docker products. We should be using the OCI format, which is also endorsed by Docker Inc and natively supported by Docker software. This would allow users not only to build images themselves, but also to choose toolchains other than Docker.
One suggestion is that we move all the image building to a single, centralized ONAP repository. This would allow users to manage images in one place. Unfortunately, this seems unrealistic. It would greatly slow down development work in individual projects and likely lead to painful bugs due to version mismatches between repos. Also, it's not a completely satisfying solution, because users would still have to go over dozens of complex image descriptors, and would have to repeat this process for every release of ONAP. Users who want to track our master repos will find this constant patching to be impossibly cumbersome.
A better solution would be a combination of technology and policy. The idea is that we require all ONAP projects to derive their images from a set of conventionally named base images. This decoupling would allow users to replace these base images with images of their choosing. But for this to work as intended, we must also require that ONAP projects not install additional packages on top of the base image. Adding packages would not only break the decoupling, but would also not be portable across base operating systems.
As an example, let's look at a Dockerfile in the SO project. It begins with this line stating the base image:
Already two important choices are made here: basing it off the latest publicly available version of Alpine Linux, and also using a specific build and distribution (and latest version) of OpenJDK. Both are limiting and problematic choices. Instead we can do something like this:
So, what we are saying here is that the basic "onap/java8" image should contain the operating system and packages to support Java 8 applications. For the ONAP reference images, this could be Alpine Linux if we so choose. It could include a specific supported distribution of OpenJDK (for example, from Red Hat), or Oracle JDK, etc. Users can spin their own base images to provide these requirements. (We would have similar images for Python 3, Python 2, MariaDB, etc.)
Looking further in the Dockerfile we see this line:
RUN apk --no-cache add curl sudo bash
Unfortunately, we cannot allow this. Not only is "apk" an Alpine-specific command that would not work on other operating systems, but assumptions are also being made about 1) what the operating system has or doesn't have, and 2) which versions to download and install (the latest). So we must disallow this and make sure that our base images already contain the required packages.
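The two rules above (derive only from conventionally named base images, install no packages) could be checked mechanically, for example in CI. Here is a minimal sketch of such a check. The function name, the "onap/" prefix rule, and the list of package managers are all assumptions made for illustration; this is not an existing ONAP tool, and it does not handle every Dockerfile form (for example, FROM lines with a --platform flag).

```shell
#!/bin/sh
# Hypothetical policy check for a single Dockerfile.
# Assumption: conventionally named base images all start with "onap/".

# check_dockerfile FILE
# Prints one message per violated rule; returns non-zero if any rule fails.
check_dockerfile() {
    file=$1
    rc=0

    # Rule 1: every FROM line must name an onap/* base image.
    # The second grep succeeds if at least one FROM line does NOT match.
    if grep -iE '^[[:space:]]*FROM[[:space:]]+' "$file" \
        | grep -vqE '^[[:space:]]*[Ff][Rr][Oo][Mm][[:space:]]+onap/'; then
        echo "$file: base image is not an onap/* image"
        rc=1
    fi

    # Rule 2: no package manager invocations (illustrative list only).
    if grep -qE '(^|[[:space:];&|])(apk|apt-get|apt|yum|dnf|microdnf|zypper)[[:space:]]' "$file"; then
        echo "$file: package manager invocation found"
        rc=1
    fi

    return $rc
}
```

Running check_dockerfile on a file that contains the "RUN apk --no-cache add curl sudo bash" line quoted above would report the second violation and return a non-zero status.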
HOW TO GET THERE
We'll go over all ONAP images and come up with a list of required base images. This will require some work to identify commonalities to ensure a small set of base images. In some cases, I imagine we will come up with suggestions for improvements to individual project containerization architecture (e.g., separating MariaDB to its own pod instead of including it in an application container).
Then we build these base images. We can pick CoreOS, Alpine, Intel Clear Linux, Ubuntu Cloud -- doesn't really matter because users can easily swap. To build these base images I strongly recommend we use the Buildah tool. Buildah is a very straightforward builder for OCI images: that's its only job and it does it well. Its advantages are not only standards compliance, but also practicality and usability. For example, if you prefer to use Dockerfiles, it does support them. But it also provides a convenient manual way to build base images from "scratch" by assembling the parts you need. You can mount directories, copy files, do everything you need to do to prepare the image and then build it when you are satisfied. Such basic images are tiny and have no layered history behind them: they contain only and exactly the components you want.
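To give a feel for the "from scratch" workflow described above, here is a rough sketch using Buildah's command-line interface. It assumes a root filesystem (operating system files plus a Java 8 runtime) has already been assembled in a local directory; how that directory gets populated depends on the chosen distribution and is not shown. The function name, the directory argument, and the label are hypothetical.

```shell
#!/bin/sh
# Sketch only: build a base image from "scratch" with Buildah.

# build_base_image ROOTFS_DIR IMAGE_NAME
#   ROOTFS_DIR: directory holding a prepared root filesystem (assumption)
#   IMAGE_NAME: conventional base image name, e.g. onap/java8
build_base_image() {
    rootfs_dir=$1
    image_name=$2

    # Start an empty working container with no parent layers.
    ctr=$(buildah from scratch) || return 1

    # Copy the contents of the prepared directory into the container root.
    buildah copy "$ctr" "$rootfs_dir" / || return 1

    # Record some metadata; this label name is made up for illustration.
    buildah config --label "org.onap.base=java8" "$ctr" || return 1

    # Commit the working container as an OCI-format image, then clean up.
    buildah commit --format oci "$ctr" "$image_name" || return 1
    buildah rm "$ctr"
}
```

A user wanting their own vendored base would call something like "build_base_image ./rootfs onap/java8" with a root filesystem of their choosing.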
Then, one at a time, we'll update all Dockerfiles across ONAP to derive from these base images, and also remove package installations from them. Again, I would recommend that we switch to using Buildah to build these images in our CI workflow, so that we have OCI output instead of Docker images. These would continue working with Kubernetes and Docker and anything else as usual. Since Buildah supports Dockerfiles, we can continue using those, so it should be a drop-in replacement.
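The first half of that migration, pointing each Dockerfile at a conventional base image, is mechanical enough to sketch. The helper below is hypothetical (the name and interface are assumptions), and it only rewrites the first FROM line; it drops anything after the image name, such as an "AS stage" suffix, and it does not remove package installation lines, which need a human to decide what moves into the base image.

```shell
#!/bin/sh
# Hypothetical migration helper.

# rebase_dockerfile FILE BASE
# Prints FILE to stdout with its first FROM line replaced by "FROM BASE".
rebase_dockerfile() {
    awk -v base="$2" '
        # Replace only the first FROM instruction; leave later stages alone.
        !done && toupper($1) == "FROM" { print "FROM " base; done = 1; next }
        { print }
    ' "$1"
}
```

For example, "rebase_dockerfile Dockerfile onap/java8 > Dockerfile.new" would produce a candidate file to review. The result could then be built with Buildah's Dockerfile support, along the lines of "buildah bud --format oci -t <image-name> .", where the image name is whatever the project already uses.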
This is merely a suggestion and an initial stab at solving the problem. Please provide feedback, and hopefully we can move towards a solution that requires minimal effort and cost.