Parsing complex data structures using #jq

When working with infrastructure-as-code, you often have to process data structures to filter, extract, or transform data. How difficult such a task becomes depends heavily on the complexity of the data structures and on how deeply arrays and hashes are nested. This post shows examples of how to use yq and jq to parse them.

Creating Prometheus metrics from nested YAML

Imagine your management has adopted YAML for the latest project called “company-as-code”. The following YAML file describes the floor plan of your company with the seats reserved for the different teams:

```yaml
buildings:
- building: A
  floors:
  - floor: 0
    plan:
    - team: FOO
      seats: 20
  - floor: 1
    plan:
    - team: BAR
      seats: 25
- building: B
  floors:
  - floor: 0
    plan:
    - team: BAZ
      seats: 20
  - floor: 1
    plan:
    - team: BLARG
      seats: 25
```

Your task is to transform this into Prometheus metrics so that you can monitor the utilization of the floor plan. Once an expression has descended into the deeper levels of a YAML/JSON document, it is rather difficult to access values from fields higher up. The following jq expression transforms the YAML after yq has converted it to JSON:


```bash
yq --output-format=json eval . plan.yaml \
| jq --raw-output '
    .buildings[] | .building as $building |
        .floors[] | .floor as $floor |
            .plan[] |
                "plan{building=\"\($building)\",floor=\"\($floor)\",team=\"\(.team)\"} \(.seats)"
'
```

As soon as .buildings[] is expanded, the name of the building (from .building) is stored in $building for later use. The same applies to the floor. Once the plan array is expanded, the variables declared earlier are still in scope and can be used to include the building and the floor in every metric line.
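To see the variable binding in action without yq, here is a minimal sketch that runs the same filter against inline JSON. The sample in `plan_json` is an assumption: it is building A from the floor plan above, converted to JSON by hand, and the shell variable names are only illustrative.

```bash
# Assumed sample: building A from the floor plan, hand-converted to JSON.
plan_json='{"buildings":[{"building":"A","floors":[
  {"floor":0,"plan":[{"team":"FOO","seats":20}]},
  {"floor":1,"plan":[{"team":"BAR","seats":25}]}
]}]}'

# $building and $floor are bound before descending, so they remain
# available when the context has become a single entry of .plan[].
metrics=$(printf '%s' "$plan_json" | jq --raw-output '
    .buildings[] | .building as $building |
        .floors[] | .floor as $floor |
            .plan[] |
                "plan{building=\"\($building)\",floor=\"\($floor)\",team=\"\(.team)\"} \(.seats)"
')

printf '%s\n' "$metrics"
# plan{building="A",floor="0",team="FOO"} 20
# plan{building="A",floor="1",team="BAR"} 25
```

Each entry of the plan array yields one line, labelled with the building and floor captured on the way down.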

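You can also simplify this by using gojq, which reads YAML directly so that the yq step is no longer needed. The '.' filter below only prints the converted document; replace it with the expression above to produce the metrics: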

```bash
gojq --yaml-input --raw-output '.' plan.yaml
```

Selecting elements based on deeper properties

In this second case you are tasked with filtering pods based on their readiness. The following expression will select the names of all pods that are ready:

```bash
kubectl get pods --output=json \
| jq --raw-output '
    .items[] | . as $pod |
        .status.conditions[] |
            select(.type == "Ready" and .status == "True") |
                $pod.metadata.name
'
```

The tricky part of this expression is selecting pods based on array items deeper down in the data structure. The current pod must be stored in $pod before descending, because once .status.conditions[] is expanded the context is a single condition and no longer the pod. select() then keeps only the condition that matches, and the name of the pod is read from the variable $pod declared earlier.
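The following sketch runs the expression against a small inline pod list, so it can be tried without a cluster. The pod names `web-1` and `web-2` and the trimmed JSON are hypothetical. It also shows a possible alternative, not taken from the original expression, that uses jq's `any(generator; condition)` to test the conditions without leaving the pod context, so no variable is needed.

```bash
# Hypothetical, heavily trimmed stand-in for `kubectl get pods --output=json`.
pods_json='{"items":[
  {"metadata":{"name":"web-1"},"status":{"conditions":[
    {"type":"Initialized","status":"True"},{"type":"Ready","status":"True"}]}},
  {"metadata":{"name":"web-2"},"status":{"conditions":[
    {"type":"Initialized","status":"True"},{"type":"Ready","status":"False"}]}}
]}'

# Variant from the text: remember the pod, descend into its conditions.
with_variable=$(printf '%s' "$pods_json" | jq --raw-output '
    .items[] | . as $pod |
        .status.conditions[] |
            select(.type == "Ready" and .status == "True") |
                $pod.metadata.name
')

# Alternative sketch: stay on the pod and let any() inspect the conditions.
without_variable=$(printf '%s' "$pods_json" | jq --raw-output '
    .items[] |
        select(any(.status.conditions[]; .type == "Ready" and .status == "True")) |
            .metadata.name
')

printf '%s\n' "$with_variable" "$without_variable"
# web-1
# web-1
```

Both variants print only the pod whose Ready condition is "True". The variable form generalizes better when several values from different levels are needed, as in the floor plan example.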