Code of the Week 3 Edit

Published on Jan 14, 2019 by bastelfreak.

This is a new blog series that we would like to introduce. At Vox Pupuli, we receive many Pull Requests. We receive so much awesome code, that isn’t really appreciated. In the past, some of our members sent out tweets if they saw cool code. We now want to start a blog series with regular updates to appreciate the work of our collaborators and contributors.

Puppet supports datatypes since Puppet 4 or since 3.8.7 with the future parser. Back in the days, everything was a string. Or at least it could be a string. We had to use so many functions to validate the input of variables. Since the introduction of datatypes, we can enforce the correct datatype directly when the user provides data, without calling multiple functions. The datatypes are especially useful for Collection datatypes because it’s now possible to very strict types for each element in a collection. The migration to datatypes was a bit painful in some modules. But the result was nice. Not only were the actual code (LOC) and tests shorter, datatypes also allow us to validate way more in detail than the old validate functions. This reduced the amount of possible errors that could happen because of unexpected input. We actually discovered multiple bugs during the transition to datatypes and also many codepieces that were suboptimal. For example a lot of casting from 'true' (a string, valid input data in some modules) to true (as an actual Boolean datatype). Based on the experiences we made, we created review guidelines and documented best practices.

Some of these guidelines are:

Are new parameters introduced? They must have datatypes

Are datatypes from stdlib used? Ensure that lowest supported stdlib version is 4.13.1. Check if a newer version introduced the used datatype

Are there new params with datatype Hash or Array? If possible, they should default to empty Hash/Array instead of undef. You can also enforce the datastructure like Array[String]

Are there new params with datatype Boolean? The default value is a tricky decision which needs careful reviewing. Sometimes a True/False is the better approach, sometimes undef

Does a new param map to an option in a config file of a service? The Parameter should accept the possible values that the service allows. For example 'on' and 'off'. Don’t accept a boolean that will be converted to 'on' or 'off'

If you can supply one or multiple values for an attribute it’s common practice to enforce the datatype for one value and an array of that datatype. An example for string is Variant[String[1],Array[String[1]]]. This can be used in the Puppet code as [$var].flatten()

The summary: Datatypes are useful and important. Always using them for each exposed parameter will help you catch bugs earlier, and the more detailed your datatype is the better it will be at catching bugs.

After a very long introduction, let us come to the actual contribution. In calendar week three we have a contribution from David Hollinger. He invested a huge amount of time over a period of months into our rsyslog module. There are multiple great PRs that we could talk about here. Let us have a look at the latest.

You can create custom datatypes that are very complex/very strict. They are based on multiple scalar types. One very detailed example is this (types/actions/outputs/omelasticsearch.pp):

 type Rsyslog::Actions::Outputs::Omelasticsearch = Struct[{
   server               => Optional[Variant[String[1], Array[String[1]]]],
   serverport           => Optional[Stdlib::Port],
   healthchecktimeout   => Optional[Integer],
   searchindex          => Optional[String[1]],
   dynsearchindex       => Optional[Enum['on', 'off']],
   searchtype           => Optional[String[1]],
   dynsearchtype        => Optional[Enum['on', 'off']],
   pipelinename         => Optional[String[1]],
   dynpipelinename      => Optional[Enum['on', 'off']],
   usehttps             => Optional[Enum['on', 'off']],
   timeout              => Optional[Pattern[/^([0-9]+)(ms|s|m)$/]],
   template             => Optional[String[1]],
   bulkmode             => Optional[Enum['on', 'off']],
   maxbytes             => Optional[Pattern[/^([0-9]+)[kKmMgGtT]$/]],
   parent               => Optional[String[1]],
   dynparent            => Optional[Enum['on', 'off']],
   uid                  => Optional[String[1]],
   pwd                  => Optional[String[1]],
   errorfile            => Optional[Stdlib::Absolutepath],
   'tls.cacert'         => Optional[Stdlib::Absolutepath],
   'tls.mycert'         => Optional[Stdlib::Absolutepath],
   'tls.myprivkey'      => Optional[Stdlib::Absolutepath],
   bulkid               => Optional[String[1]],
   dynbulkid            => Optional[Enum['on', 'off']],
   writeoperation       => Optional[Enum['index', 'create']],
   retryfailures        => Optional[Enum['on', 'off']],
   retryruleset         => Optional[String[1]],
   'ratelimit.interval' => Optional[Integer],
   'ratelimit.burst'    => Optional[Integer],
 }]

The rsyslog configuration has so many options, many of them can be nested. In the beginning of datatypes in Puppet, we would have used Hash for the above requirement. At this level the type is already better than the old validate_hash() function and everybody would be happy. But not David! He did not go one, but thousand steps further and enforces the complete hash structure. And didn’t simply create the custom datatype Rsyslog::Actions::Outputs::Omelasticsearch, he created many more and also tests for them. If we do a quick find types/ -type f -exec cat {} \; | wc -l we notice that there are 390 lines of datatype codes. This is amazing! Also find spec/type_aliases/ -type f -exec cat {} \; | wc -l reveals 2009 lines of tests for datatypes!

Thank you so much David! This work is amazing and the quality of this module is way higher than our average.

Did you spot some cool code as well? Write a short blogpost or write an email to our Project Management committee or just drop a note on our IRC channel #voxpupuli on Libera.