jq

jq, short for JSON query, is a command line tool that is useful for navigating, filtering, and building JSON data structures. This post describes some starting queries for jq that I find helpful for getting going on a query.

I’m not going to try to duplicate the jq manual; it’s good, and you should read it.

All code examples are committed to GitLab: jq gems, if you would like to download them and follow along.

Starting Data

To start things off, I need a little sample JSON data. Here is my starting example. Unless otherwise stated, I’ve saved this in the file data/sample_data_01.json.

[
  {
    "index": 0,
    "value": 0.4099732990490448,
    "category": "that"
  },
  {
    "index": 1,
    "value": 0.4077778399424672,
    "category": "this"
  },
  {
    "index": 2,
    "value": -0.6830016495659132,
    "category": "that"
  },
  {
    "index": 3,
    "value": -0.8812356955272562,
    "category": "that"
  }
]

This is completely arbitrary data, intended only to provide some values that I can operate on.

Pretty print the JSON structure (or not)

The ‘Hello World’ of jq (for me at least) is verifying that a JSON document is valid. jq does this by reading in the JSON document, then pretty printing it. The first parameter to jq is the ‘filter’, the query that will operate over the JSON document; the file containing the document is provided as the second parameter.

jq '.' ../data/sample_data_01.json          
[
  {
    "index": 0,
    "value": 0.4099732990490448,
    "category": "that"
  },
  {
...chopped...

cli_example_01.sh

The pretty printing is great in some cases, but tedious in others. I use the -c option to compact the output while still validating:

jq -c '.' ../data/sample_data_01.json
[{"index":0,"value":0.4099732990490448,"category":"that"},{"index":1,"value":0.4077778399424672,"category":"this"},{"index":2,"value":-0.6830016495659132,"category":"that"},{"index":3,"value":-0.8812356955272562,"category":"that"}]

cli_example_02.sh

Structure errors

I frequently use jq to validate that the structure of a JSON document is correct. For example, a missing quote in a JSON document will return an error message:

jq '.' ../data/sample_data_bad_quote_01.json
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 13, column 5

cli_example_03.sh

An extra comma added to a list inside a JSON structure somewhere will also report an error. This is one of my most frequent errors, because some languages allow trailing commas while JSON does not. Fortunately, jq will give you an idea of where the additional comma is hiding:

jq '.' ../data/sample_data_bad_extra_comma_01.json
parse error: Expected another key-value pair at line 11, column 3

cli_example_04.sh

JSON structure queries

These are just a few of my favorite queries. They have led me to more interesting ones, but again, I’m not trying to rewrite the jq manual.

Counting records in jq

To count the number of records found in a json body:

jq 'length' ../data/sample_data_01.json
4

cli_example_05.sh
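Per the jq manual, length is not limited to arrays: it also returns the length of a string and the number of keys in an object. A quick inline sketch (my own examples, not using the sample file):

```shell
echo '[10, 20, 30]' | jq 'length'       # array length: 3
echo '"hello"' | jq 'length'            # string length: 5
echo '{"a": 1, "b": 2}' | jq 'length'   # object key count: 2
```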

| (pipe) operator

jq uses | to denote the end of one query and the start of a second query that operates on the output of the first, similar to the ‘pipe’ operator in bash. For example, to return the keys of each item in the list: first produce each item from the list, then request the keys of each object.

jq '.[] | keys' ../data/sample_data_01.json
[
  "category",
  "index",
  "value"
]
[
  "category",
  "index",
...truncated...

cli_example_06.sh

Notice that the keys have been sorted. Use keys_unsorted if you want them in their original order.
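A quick inline sketch of the difference (my own example, not using the sample file):

```shell
# keys sorts the key names alphabetically
echo '{"index": 0, "value": 0.41, "category": "that"}' | jq -c 'keys'
# ["category","index","value"]

# keys_unsorted preserves the order they appear in the object
echo '{"index": 0, "value": 0.41, "category": "that"}' | jq -c 'keys_unsorted'
# ["index","value","category"]
```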

Filtering

To see which objects satisfy a predicate, test it directly. Note that this is one of those cases where it’s important to use single quotes around the first parameter to jq: jq expressions use double quotes to represent strings, so using single quotes on the outside matters.

jq '.[]|.category=="that"' ../data/sample_data_01.json
true
false
true
true

cli_example_07.sh

This shows that the second object in the JSON list does not pass the predicate. To actually filter, use select with the predicate.

jq '.[]|select(.category=="that")' ../data/sample_data_01.json
{
  "index": 0,
  "value": 0.4099732990490448,
  "category": "that"
}
{
  "index": 2,
  "value": -0.6830016495659132,
  "category": "that"
}
{
  "index": 3,
  "value": -0.8812356955272562,
  "category": "that"
}

cli_example_08.sh
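select accepts any predicate, not just string equality. As an inline sketch (my own example, with a hypothetical field v), keeping only records with a positive value:

```shell
# select keeps each object for which the predicate is true
echo '[{"v": 0.41}, {"v": -0.68}, {"v": 0.12}]' | jq -c '.[] | select(.v > 0)'
# {"v":0.41}
# {"v":0.12}
```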

Average

You can do a quick average of values.

jq '[.[]|.value]|add/length' ../data/sample_data_01.json
-0.18662155152541438

cli_example_09.sh

Building on this, you can compute the average over a filtered subset by combining select with a predicate:

jq '[ .[]|select(.category=="that") | .value] | add/length' ../data/sample_data_01.json
-0.38475468201470825

cli_example_10.sh
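If you want the average for every category at once, rather than one select per category, jq’s group_by can do it in a single pass. A sketch with small inline values of my own (not the sample file), so the averages are easy to check by hand:

```shell
# group records by category, then average the values inside each group
echo '[{"category":"that","value":2},{"category":"this","value":4},{"category":"that","value":6}]' \
  | jq -c 'group_by(.category) | map({category: .[0].category, avg: (map(.value) | add / length)})'
# [{"category":"that","avg":4},{"category":"this","avg":4}]
```

group_by sorts the groups by the grouping key, so "that" comes out before "this".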

Comma stuff

Deviating from the standard JSON body, here is an example of a JSON body that has an embedded csv list of items that I would like to extract and count. This is the source JSON for the following examples, saved as data/sample_data_csv_01.json.

[
  {
    "index": 0,
    "value": 0.4099732990490448,
    "category": "that"
  },
  {
    "index": 1,
    "value": 0.4077778399424672,
    "category": "this",
    "item_id": "GH3345,QR6654,WAFER556,RETICLE3"
  },
  {
    "index": 2,
    "value": -0.6830016495659132,
    "category": "that"
  },
  {
    "index": 3,
    "value": -0.8812356955272562,
    "category": "that",
    "item_id": "GH3347,WAFER546,RETICLE2"
  }
]

Again, this is completely fake data intended for example purposes. Embedded commas like this are not ideal; the items should be represented as a sub-list in JSON, but sometimes you can’t control the structure of the JSON you’ve got to interpret.

Embedded commas

As a starting point, here is a query to retrieve the comma data:

jq '.[]|.item_id' ../data/sample_data_csv_01.json       
null
"GH3345,QR6654,WAFER556,RETICLE3"
null
"GH3347,WAFER546,RETICLE2"

cli_example_11.sh

Now that we’ve got the csv strings, how do we extract the individual items? Notice that the JSON objects that don’t have an item_id report null for the .item_id attribute. I need to transform the nulls into the same type as the existing strings; if-then-else-end to the rescue here:

jq '.[]|if .item_id == null then "" else .item_id end' ../data/sample_data_csv_01.json
""
"GH3345,QR6654,WAFER556,RETICLE3"
""
"GH3347,WAFER546,RETICLE2"

cli_example_12.sh
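jq also has the alternative operator //, which yields its right-hand side when the left is null or false; it is a more compact way to supply the same default. A sketch with inline data of my own:

```shell
# .item_id // "" substitutes "" wherever .item_id is null or missing
echo '[{"item_id":"A1,B2"},{"index":2}]' | jq -c '[.[] | (.item_id // "")]'
# ["A1,B2",""]
```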

Now that they are all proper strings, I can use the split function to interpret the attribute value as a csv list.

jq '.[]|if .item_id then .item_id else "" end | split(",") ' ../data/sample_data_csv_01.json
[]
[
  "GH3345",
  "QR6654",
  "WAFER556",
  "RETICLE3"
]
[]
[
  "GH3347",
  "WAFER546",
  "RETICLE2"
]

cli_example_13.sh
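The introduction to this section promised counting as well as extraction. One way is to splat every split result into a single array, drop the empty strings, and take the length. A sketch with inline data of my own (three item ids across two records):

```shell
# collect every csv item across all records into one flat list, then count them
echo '[{"item_id":"GH3345,QR6654"},{"index":2},{"item_id":"GH3347"}]' \
  | jq '[.[] | (.item_id // "") | split(",")[] | select(. != "")] | length'
# 3
```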

Transforms

A quick diversion into transforms. If you need to transform a JSON document, you can do so by emitting [] or {} structures. For example, to build a completely new JSON document from scratch you could do this:

jq '{comment:"hello world!"}' --null-input
{
  "comment": "hello world!"
}

cli_example_14.sh

Or you could generate a list of json objects.

jq '[range(4) | . as $foo | {index: $foo}]' --null-input
[
  {
    "index": 0
  },
  {
    "index": 1
  },
  ...truncated...

cli_example_15.sh
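The . as $foo variable binding is not strictly needed here; range emits its numbers as ., so they can be used directly. A shorter equivalent (with -n for --null-input and -c for compact output):

```shell
# each number from range(4) becomes ., which fills the index attribute
jq -nc '[range(4) | {index: .}]'
# [{"index":0},{"index":1},{"index":2},{"index":3}]
```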

Transforms and csv

Building on the csv example file (data/sample_data_csv_01.json), I can transform the csv list into a new JSON attribute to make the document conform more closely to a properly structured JSON document:

jq '[.[] | . + {item_id_list: (if .item_id == null then "" else .item_id end | split(",") )}]' ../data/sample_data_csv_01.json 

[
  {
    "index": 0,
    "value": 0.4099732990490448,
    "category": "that",
    "item_id_list": []
  },
  {
    "index": 1,
    "value": 0.4077778399424672,
    "category": "this",
    "item_id": "GH3345,QR6654,WAFER556,RETICLE3",
    "item_id_list": [
      "GH3345",
      "QR6654",
      "WAFER556",
      "RETICLE3"
    ]
  },
  ...truncated...

cli_example_16.sh

In the previous example, there is a bit of redundancy: both the csv string and the new list are present. If you’re OK with replacing attributes with different structures, you can replace the csv item_id with the new list structure by using the same attribute name on the right side of the +:

jq '[.[] | . + {item_id: (if .item_id == null then "" else .item_id end | split(",") )}]' ../data/sample_data_csv_01.json

[
  {
    "index": 0,
    "value": 0.4099732990490448,
    "category": "that",
    "item_id": []
  },
  {
    "index": 1,
    "value": 0.4077778399424672,
    "category": "this",
    "item_id": [
      "GH3345",
      "QR6654",
      "WAFER556",
      "RETICLE3"
    ]
  },
  ...truncated...

cli_example_17.sh
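A third option, if you want the list under a new attribute name but no leftover csv string, is to add the list and then drop the original attribute with del(). A sketch with inline data of my own:

```shell
# add item_id_list, then remove the original csv item_id attribute
echo '[{"index":1,"item_id":"A1,B2"}]' \
  | jq -c '[.[] | . + {item_id_list: ((.item_id // "") | split(","))} | del(.item_id)]'
# [{"index":1,"item_id_list":["A1","B2"]}]
```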

Conclusion

I’ll call this the conclusion, but I’ve really just scratched the surface of what can be done with jq. It is a very useful command line tool for transforming and filtering JSON (and other data). There is also a python package that wraps the underlying jq library, which will come in very handy if you need to perform these operations on JSON in scripts.

References