Useful PQL Queries Edit

Published on Jul 1, 2022.

What?

PQL is the Puppet Query Language. You can use it to query the PuppetDB for information.

Why?

The syntax isn’t always very intuitive. That’s why this page exists. It serves as a central place where the community can find and submit PQL queries!

How?

All queries here are documented in the string-based query style, not the AST style. You need the Puppet Client tools (for Open Source or PE) to get the puppet query face. To install the client tools on Open Source, you can do:

puppet resource package puppetdb_cli ensure=present provider=puppet_gem

And afterwards run the query with /opt/puppetlabs/puppet/bin/puppet-query $query. For PE installations you can use puppet query.

On the PuppetServer side, installing the puppetdb-termini package allows use of the puppetdb_query() function in a manifest to generate portions of catalog based on result of queries, e.g. find all Virtual-Hosts on all machines and add the relevant configuration to a monitoring system.

Puppet code for setup

Here is Puppet code for setting up puppetdb_cli on your puppetserver (not needed on PE):

# Provides /opt/puppetlabs/puppet/bin/puppet-query.
# https://www.puppet.com/docs/puppetdb/latest/pdb_client_tools.html#step-2-install-and-configure-the-puppetdb-cli
package { 'puppetdb_cli':
  ensure   => installed,
  provider => 'puppet_gem',
}

[
  'db',
  'query',
].each |String[1] $cmd| {
  file { "/opt/puppetlabs/bin/puppet-${cmd}":
    ensure => link,
    target => "../puppet/bin/puppet-${cmd}",
  }
}

file { '/etc/puppetlabs/client-tools':
  ensure => directory,
  owner  => 'root',
  group  => 'root',
  mode   => '0755',
}

file { '/etc/puppetlabs/client-tools/puppetdb.conf':
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  content => stdlib::to_json_pretty(
    {
      puppetdb => {
        server_urls => "https://${trusted['certname']}:8081",
        cacert      => $settings::localcacert,
        cert        => $settings::hostcert,
        key         => $settings::hostprivkey,
      },
    },
  ),
}

Contribute?!

You can submit your own Queries by editing voxpupuli.github.io/_docs/pql_queries.md on GitHub or by pressing the edit button in the upper right corner.

Query Collection

List all certificate names from all known nodes

puppet query 'nodes[certname] {}'

Result:

[
  {
    "certname": "puppetserver.example.org"
  },
  {
    "certname": "puppetdb.example.org"
  }
]

Get all nodes that have a specific class in their catalog

This is useful if you want to get a list of nodes with a specific profile or role

puppet query 'nodes[certname] {resources {type = "Class" and title = "CapitalizedClassname"}}'

Get a list of all roles / profiles

puppet query 'resources[title] {type = "Class" and title ~ "Role" group by title}'

Result:

[
  {
    "title": "Role::Base"
  },
  {
    "title": "Role::Webserver"
  },
  {
    "title": "Role::Mailserver"
  },
  {
    "title": "Role::Backupnode"
  }
]

Get all nodes that have a specific class in their catalog and start with bla-

puppet query 'nodes[certname] {certname ~ "^bla-" and resources {type = "Class" and title = "CapitalizedClassname"}}'

Get all nodes with changes and a specific resource

puppet query 'nodes[certname] {latest_report_status = "changed" and certname in inventory[certname]{resources { type = "Service" and title = "my_service"}}}'

Get all nodes where one specific resource changed

puppet query 'events[certname] {resource_type = "Service" and resource_title = "apache2" and latest_report? = true and corrective_change = true}'

Get a list of nodes with stale catalogs (date -R to see your time offset)

puppet query 'catalogs[certname,producer_timestamp] {  producer_timestamp < "2022-06-21T07:00:00.000-05:00" }'

Get a list of nodes with a catalog that has failed to compile but used a cached catalog during a specific time window

puppet query 'reports[certname,transaction_uuid,receive_time] { cached_catalog_status="on_failure" and start_time > "2021-10-27T15:36:00-05:00" and end_time < "2021-10-27T16:35:00-05:00"  }'

Get all nodes with corrective changes

It heavily improves the performance when we filter for events from the past hours. PuppetDB uses partitioning and saves reports on a per-day partition. By filtering for the current day we avoid scans on other partitions.

puppet query "events[certname]{corrective_change = true and report_receive_time > \"$(date -u --date='1 day ago' +%Y-%m-%dT%H:%M:%S.000Z)\" group by certname}"

Get all inactive nodes

puppet query 'nodes[certname] { node_state = "inactive" }'

Get all nodes where the last catalog compilation failed

puppet query 'nodes[certname] { latest_report_status = "failed" }'

Get a list of nodes with a specific fact value

puppet query 'inventory[certname] { facts.os.name = "windows" }'

Get a list of nodes with two specific facts

puppet query 'inventory[certname] {facts.os.name = "AlmaLinux" and facts.os.release.major = "8" }'
[
  {
    "certname": "puppet.local"
  }
]
puppet query 'inventory[certname,facts.virtual]{ resources { type="Class" and title ~ "CapitalizedClassname" }}'

Get all nodes and their Puppet Agent version except for version X

This is helpful if you did a Puppet Upgrade and want to identify all nodes that are on a different version than your primary

puppet query 'inventory[certname,facts.aio_agent_build]{facts.aio_agent_build != "7.28.0" }'

Get all resources from one type for one node

puppet query 'resources {type = "File" and certname = "puppet.local"}'

Get one param/attribute for one type of resource for one node

In this case, file is a resource property, it’s the absolute path to the pp file where the resource was declared

puppet query 'resources[file] {type = "File" and certname = "puppet.local"}'

Count all resources of one type

This checks all catalogs in the PuppetDB for this resource type and counts it

puppet query 'resources[count()] {type = "File" }'

Count all resources

puppet query 'resources[count()] { }'

Get a list of nodes for which a fact is not set

puppet query 'inventory[certname] { ! certname in inventory[certname] {  facts.myfactofinterest is not null } }'

Get a list of nodes with a specific structured fact value while using a wildcard in the fact structure

puppet query 'fact_contents { path ~> ["first_level",".*","third_level"] and value = "Y" }'

Get all values for a fact

This fetches a specified fact from all nodes and groups them by value. The result is a unique list of values for the fact:

puppet query 'facts[value]{ name = "domain" group by value}'

Result:

[
  {
    "value": "example.org"
  },
  {
    "value": "example.com"
  }
]

Get value for fact A that have a specific value on fact b

This returns the ip address for all nodes where the trusted fact pp_role is foobar

puppet query 'inventory[certname,facts.networking.ip]{ trusted.extensions.pp_role = "foobar"}'
[
  {
    "certname": "bastel.local",
    "facts.networking.ip": "127.0.0.1"
  },
  {
    "certname": "freak.local",
    "facts.networking.ip": "127.0.0.2"
  }
]

Get all values for a structured fact

The facts endpoint doesn’t support structured facts :(

puppet query 'inventory[facts.networking.ip]{ group by facts.networking.ip }'
[
  {
    "facts.networking.ip": "192.168.178.22"
  },
  {
    "facts.networking.ip": "192.168.178.2"
  }
]

Get all values for a trusted fact

With the following query you can get a unique list of different values from the trusted hash

via inventory

puppet query 'inventory[trusted.extensions.pp_role]{ group by trusted.extensions.pp_role }'

Result:

[
  {
    "trusted.extensions.pp_role": "guacamole"
  },
  {
    "trusted.extensions.pp_role": "gitlab"
  },
  {
    "trusted.extensions.pp_role": "kibana"
  }
]

via facts_contents

See also Get a list of nodes with a specific structured fact value while using a wildcard in the fact structure

puppet query 'fact_contents[value] { path = ["trusted","extensions","pp_role"] group by value }'

Result:

[
  {
    "value": "guacamole"
  },
  {
    "value": "gitlab"
  },
  {
    "value": "kibana"
  }
]

Get a sorted list of all recent event failures

This gives a list of recent event failures from the lastest reports and applies counts to them to show how many of them are occurring across your nodes. In order to sort by the counts, the command has to be combined with jq, which is typically another package that has to be installed on your system.

puppet query 'events[resource_type,resource_title,count()] { latest_report? = true and status = "failure" group by resource_type, resource_title }' | jq 'sort_by(.count) | reverse'

Get class information from latest catalogs from all nodes

PuppetDB stores the last catalog of each node. That contains all classes and their parameters. There might be a usecase you get those information.

puppet query 'resources { type = "Class" and title = "Profile::Core::Node_info"}'

Get class information from latest catalogs from all nodes that had another class in their last catalog

Of course we can make the above query more complicated and restrict it to nodes that had another class in their last catalog

puppet query 'resources { type = "Class" and title = "Profile::Core::Node_info" and certname in resources[certname] { type = "Class" and title = "Prometheus::Node_exporter" }}'

Why is that helpful? Prometheus has native support to query PuppetDB and create a scrape config based on it (introduced in github.com/prometheus/prometheus/commit/8920024323ad8fef353ec2fc495894f8748f0687):

- job_name: node-exporter-dev
  puppetdb_sd_configs:
    - url: "http://puppetdb.local:8080"
      query: |
        resources {
          type = "Class" and title = "Profile::Core::Node_info" and
          certname in resources[certname] {
            type = "Class" and title = "Prometheus::Node_exporter"
          }
        }
      refresh_interval: 30s
      follow_redirects: true
      include_parameters: true
      enable_http2: true
      port: 9100
    relabel_configs:
      - source_labels: [__meta_puppetdb_certname]
        target_label: instance
      - source_labels: [__meta_puppetdb_environment]
        target_label: environment
      - source_labels: [__meta_puppetdb_parameter_site]
        target_label: site
      - source_labels: [__meta_puppetdb_parameter_role]
        target_label: role
      - source_labels: [__meta_puppetdb_parameter_cluster]
        target_label: cluster

And the related Puppet class:

class profile::core::node_info (
  Optional[String[1]] $site = $::site,
  Optional[String[1]] $role = $::role,
  Optional[String[1]] $cluster = $::cluster,
  Optional[String[1]] $variant = $::variant,
  Optional[String[1]] $subvariant = $::subvariant,
) {
}

This will assign top scope variables from an ENC to a class. That in turn will save those topscope variables in the catalog and then they are available for Prometheus.

Thanks to Julien “roidelapluie” Pivotto for writing puppetdb_sd_config and thanks to Joshua Hoblitt for figuring out the PQL Query and the config example.

Endpoints and fields

The available endpoints is a function of which version of puppetdb you are going against. The current list is available at https://puppet.com/docs/puppetdb/7/api/query/v4/entities.html.

If you don’t feel like looking up which fields are available for a given endpoint, use a bogus field name and the puppet query tool will return the valid list.

$ puppet query 'resources[foo] {}'
2022/07/21 10:02:41 ERROR - [GET /pdb/query/v4][400] getQueryBadRequest  Can't extract unknown 'resources' field 'foo'. Acceptable fields are 'resource', 'certname', 'tags', 'exported', 'line', 'title', 'type', 'environment', 'file', and 'parameters'