---
tags: [ development python ]
categories: [Development ]
---
jq gems
jq, short for json query, is a command line tool that is useful for navigating, filtering and building json data structures. This post describes some starting queries for jq that I find helpful to get myself going on a query. I’m not going to try and duplicate the documentation; it’s good, and you should read it.
All code examples are committed to gitlab: jq gems, if you would like to download them and follow along.
Starting Data
To start things off, I need to have a little sample JSON data. Here is my starting example. Unless otherwise stated, I’ve saved this in a file data/sample_data_01.json.
[
{
"index": 0,
"value": 0.4099732990490448,
"category": "that"
},
{
"index": 1,
"value": 0.4077778399424672,
"category": "this"
},
{
"index": 2,
"value": -0.6830016495659132,
"category": "that"
},
{
"index": 3,
"value": -0.8812356955272562,
"category": "that"
}
]
This is completely arbitrary data but intended to provide some values that I can operate on.
Pretty Print the json structure (or not)
The ‘Hello World’ of jq (for me at least) is to verify that a json structure is valid. jq does this by reading in the json document, then pretty printing it. The first parameter to jq is the ‘filter’ or command that will operate over the json structure provided as the second parameter.
jq '.' ../data/sample_data_01.json
[
{
"index": 0,
"value": 0.4099732990490448,
"category": "that"
},
{
...chopped...
The pretty printing is great in some cases, but tedious in others. I use the -c option to compress output but still validate:
jq -c '.' ../data/sample_data_01.json
[{"index":0,"value":0.4099732990490448,"category":"that"},{"index":1,"value":0.4077778399424672,"category":"this"},{"index":2,"value":-0.6830016495659132,"category":"that"},{"index":3,"value":-0.8812356955272562,"category":"that"}]
Structure errors
I frequently use jq to validate that the structure of a JSON document is correct. For example, a missing quote in a json structure will produce an error message:
jq '.' ../data/sample_data_bad_quote_01.json
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 13, column 5
An extra comma added to a list inside a json structure somewhere will also report an error. This is one of my most frequent errors, because some languages allow trailing commas while others do not. Fortunately jq will give you an idea of where the additional comma is hiding:
jq '.' ../data/sample_data_bad_extra_comma_01.json
parse error: Expected another key-value pair at line 11, column 3
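jq also sets a nonzero exit status when parsing fails, so a script can validate JSON without inspecting any output. A small sketch (using inline data rather than the repo files, so it stands alone) with the built-in empty filter, which consumes the input and prints nothing:

```shell
# 'jq empty' parses the input, emits nothing, and leaves only the
# exit status: 0 for valid JSON, nonzero for a parse error.
echo '[{"index": 0}]' | jq empty && echo "valid"
echo '[{"index": 0,}]' | jq empty 2>/dev/null || echo "invalid"
# valid
# invalid
```

This makes jq a convenient guard step in shell pipelines and CI checks.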
JSON structure queries
Just a few of my favorite queries. These lead me on to more interesting queries, but again, I’m not trying to re-write the jq Manual.
Counting records in jq
To count the number of records found in a json body:
jq 'length' ../data/sample_data_01.json
4
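length is not limited to arrays. Per the jq manual it also counts the codepoints of a string and the keys of an object, and treats null as 0. A quick sketch on inline data:

```shell
echo '"hello"' | jq 'length'         # 5 (codepoints in the string)
echo '{"a":1,"b":2}' | jq 'length'   # 2 (number of keys)
echo 'null' | jq 'length'            # 0
```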
| (pipe) operator
jq uses | to denote the end of one jq query and the start of a second query that operates on the product of the previous, similar to the ‘pipe’ operator in bash. For example, to return the keys of each of the items in the list: first produce each item from the list, then request the keys of each object.
jq '.[] | keys' ../data/sample_data_01.json
[
"category",
"index",
"value"
]
[
"category",
"index",
...truncated...
Notice that the keys have been sorted. Use keys_unsorted if you want them in their original order.
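A small side-by-side sketch on inline data, showing the two behaviors:

```shell
# keys sorts alphabetically; keys_unsorted keeps insertion order
echo '{"b": 1, "a": 2}' | jq -c 'keys, keys_unsorted'
# ["a","b"]
# ["b","a"]
```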
Filtering
Filtering selects the objects that satisfy a predicate. Note, this is one of those cases where it’s important to use single quotes around the first parameter to jq: jq expressions use double quotes to represent strings, so quoting the outer expression with single quotes matters.
jq '.[]|.category=="that"' ../data/sample_data_01.json
true
false
true
true
This shows that the second object in the json list does not pass the predicate. So to filter, use select with a predicate.
jq '.[]|select(.category=="that")' ../data/sample_data_01.json
{
"index": 0,
"value": 0.4099732990490448,
"category": "that"
}
{
"index": 2,
"value": -0.6830016495659132,
"category": "that"
}
{
"index": 3,
"value": -0.8812356955272562,
"category": "that"
}
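The predicate isn’t limited to string equality; any jq expression producing a boolean works, such as a numeric comparison. A sketch on trimmed-down inline data standing in for the sample file:

```shell
# keep only objects with a positive value
echo '[{"index":0,"value":0.41},{"index":2,"value":-0.68}]' \
  | jq -c '.[] | select(.value > 0)'
# {"index":0,"value":0.41}
```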
Average
You can do a quick average of values.
jq '[.[]|.value]|add/length' ../data/sample_data_01.json
-0.18662155152541438
Building on this, you can compute an average with select and a predicate:
jq '[ .[]|select(.category=="that") | .value] | add/length' ../data/sample_data_01.json
-0.38475468201470825
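The same collect-then-aggregate pattern works with jq’s other numeric builtins, such as min, max, and add on their own. A sketch on inline data:

```shell
# wrap the values in an array, then apply aggregate builtins
echo '[{"value": 3}, {"value": 1}, {"value": 2}]' \
  | jq '[.[].value] | min, max, add'
# 1
# 3
# 6
```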
Comma stuff
Deviating from the standard json body, here is an example of a json body that has an embedded csv list of items that I would like to extract and count. Here is the source json that I will use in the following examples. This file is named sample_data_csv_01.json.
[
{
"index": 0,
"value": 0.4099732990490448,
"category": "that"
},
{
"index": 1,
"value": 0.4077778399424672,
"category": "this",
"item_id": "GH3345,QR6654,WAFER556,RETICLE3"
},
{
"index": 2,
"value": -0.6830016495659132,
"category": "that"
},
{
"index": 3,
"value": -0.8812356955272562,
"category": "that",
"item_id": "GH3347,WAFER546,RETICLE2"
}
]
Again, this is completely fake data intended for example purposes. In this case, the embedded commas are not ideal, and the items should be represented as a sub-list in json, but sometimes you can’t control the structure of the json you’ve got to interpret.
Embedded commas
As a starting point, here is a query to retrieve the comma data:
jq '.[]|.item_id' ../data/sample_data_csv_01.json
null
"GH3345,QR6654,WAFER556,RETICLE3"
null
"GH3347,WAFER546,RETICLE2"
Now that we’ve got the csv items, how do we extract the individual items from the comma-separated strings? Notice that the json objects that don’t have an item_id report a null for the .item_id attribute. I need to transform the nulls into the same type as the existing strings. if-then-else-end to the rescue here:
jq '.[]|if .item_id == null then "" else .item_id end' ../data/sample_data_csv_01.json
""
"GH3345,QR6654,WAFER556,RETICLE3"
""
"GH3347,WAFER546,RETICLE2"
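An equivalent, terser spelling uses jq’s // (“alternative”) operator, which produces its right-hand side whenever the left-hand side is null or false. A sketch on inline data:

```shell
# .item_id // "" substitutes "" when .item_id is null or missing
echo '[{"item_id":"A1,B2"},{"index":2}]' | jq -c '.[] | .item_id // ""'
# "A1,B2"
# ""
```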
Now that they are all proper strings, I can use the split function to interpret the attribute value as a csv list. (Here I use the equivalent if .item_id then ... form, since null counts as false in a jq if.)
jq '.[]|if .item_id then .item_id else "" end | split(",") ' ../data/sample_data_csv_01.json
[]
[
"GH3345",
"QR6654",
"WAFER556",
"RETICLE3"
]
[]
[
"GH3347",
"WAFER546",
"RETICLE2"
]
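The inverse of split is join, which is handy if you ever need to rebuild the csv string from a list. A quick sketch:

```shell
# join concatenates array elements with the given separator
echo '["GH3345","QR6654"]' | jq 'join(",")'
# "GH3345,QR6654"
```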
Transforms
A quick diversion to transforms. If you need to transform a json document, you can do so by emitting [] or {} structures. For example, to build a completely new json document from scratch you could do this:
jq '{comment:"hello world!"}' --null-input
{
"comment": "hello world!"
}
Or you could generate a list of json objects.
jq '[range(4) | . as $foo | {index: $foo}]' --null-input
[
{
"index": 0
},
{
"index": 1
},
...truncated...
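A side note: the $foo binding is optional here, since . already holds each number produced by range, and you can interpolate it into strings with \(...). A sketch:

```shell
# -n is shorthand for --null-input; \(.) interpolates the current value
jq -nc '[range(3) | {name: "item-\(.)"}]'
# [{"name":"item-0"},{"name":"item-1"},{"name":"item-2"}]
```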
Transforms and csv
Building on the csv example file (data/sample_data_csv_01.json), I can transform the csv list into a new json object to make it conform more closely to a json structured document:
jq '[.[] | . + {item_id_list: (if .item_id == null then "" else .item_id end | split(",") )}]' ../data/sample_data_csv_01.json
[
{
"index": 0,
"value": 0.4099732990490448,
"category": "that",
"item_id_list": []
},
{
"index": 1,
"value": 0.4077778399424672,
"category": "this",
"item_id": "GH3345,QR6654,WAFER556,RETICLE3",
"item_id_list": [
"GH3345",
"QR6654",
"WAFER556",
"RETICLE3"
]
},
...truncated...
In the previous example, there is a bit of redundancy in the item_id list. If you’re OK with replacing attributes with different structures, then you could replace the csv item_id with the new list structure by using the same attribute name on the right side of the +:
jq '[.[] | . + {item_id: (if .item_id == null then "" else .item_id end | split(",") )}]' ../data/sample_data_csv_01.json
[
{
"index": 0,
"value": 0.4099732990490448,
"category": "that",
"item_id": []
},
{
"index": 1,
"value": 0.4077778399424672,
"category": "this",
"item_id": [
"GH3345",
"QR6654",
"WAFER556",
"RETICLE3"
]
},
...truncated...
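As an aside, the [.[] | ...] wrapper is common enough that jq provides map as a shorthand; it applies a filter to every element and collects the results back into an array. This sketch, on trimmed-down inline data, behaves the same way as the query above:

```shell
# map(f) is equivalent to [.[] | f]; // "" handles the missing item_id
echo '[{"item_id":"A1,B2"},{"index":2}]' \
  | jq -c 'map(. + {item_id: ((.item_id // "") | split(","))})'
# [{"item_id":["A1","B2"]},{"index":2,"item_id":[]}]
```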
Conclusion
I’ll call this the conclusion, but I’ve really just scratched the surface of what can be done in jq. It is a very useful command line tool for transforming and filtering json (and other data). There is a python package that wraps the underlying jq library that will also come in very handy if you need to perform these operations on json in scripts.