Querying data
Now that JSON is being fed into `jq`, recall the query you were briefly introduced to, the one containing only `.`. This is called the identity filter, and it takes the input and passes it through unchanged as output. I think of the identity filter as akin to `this` in JavaScript: the current context.
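As a quick check of the identity filter, here is a minimal sketch. The JSON object is made up for illustration, and `-c` is only there to print compact output on one line.

```shell
# Hypothetical input, invented for this example.
input='{"name":"example","stars":1}'

# The identity filter passes the input through unchanged.
output=$(printf '%s' "$input" | jq -c '.')
echo "$output"
```

Because the input was already compact, the output should be identical to it, character for character.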
So that I can work with a nice chunky bit of JSON, I've captured JSON from GitHub's API: I downloaded all the recent commits for one of my projects and saved them in commits.json.
Assuming I know the path to the data I'm interested in (sometimes I do, sometimes I don't, and we'll look at that in later chapters), I can use that path to extract the data.
The commit data from GitHub is returned with the most recent commits first. So to get the latest commit author, in JavaScript I would use the following code:
commits[0].commit.author
In JavaScript I've assigned the commits to a variable called `commits`. This is also the root of the data. In jq, the root is referred to with the identity filter, `.`.
To view the latest commit author using jq, I use this command:
.[0].commit.author
Notice that the result is returned as JSON too. I might want to copy this or store it in a file. There are also times when I might want to convert from JSON to raw output (such as plain strings), which I'll show you later when we look at the `jq` options.
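To try the path lookup without the real commits.json, here is a minimal sketch using a tiny stand-in. The two commits below are invented and keep only the `commit.author` part of the structure; the real GitHub response has many more fields.

```shell
# Hypothetical two-commit sample standing in for commits.json (made-up data).
commits='[
  {"commit": {"author": {"name": "Ada", "email": "ada@example.com"}}},
  {"commit": {"author": {"name": "Bob", "email": "bob@example.com"}}}
]'

# Index 0 is the first (most recent) commit; then follow the path to the author.
latest_author=$(printf '%s' "$commits" | jq -c '.[0].commit.author')
echo "$latest_author"
```

With the real file, the equivalent would be `jq '.[0].commit.author' commits.json`.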
Finding the latest commit was straightforward enough, but what if we want to iterate over every commit?
The [] iterator
Accessing the commit data already used square brackets (I used `[0]`), but I can also iterate over arrays (and objects) using `.[]`. The result of using `.[]` is "return all the elements of the current context". You can also think of this as unrolling the array into its individual elements. For example:
.[]
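Something to bear in mind is that this also means the result as a whole is no longer a single valid JSON document: each element is printed as its own separate JSON value, one after another.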
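To make the unrolling concrete, here is a small sketch with made-up inputs. With `-c`, each value that `.[]` emits is printed on its own line, so counting lines counts the emitted values.

```shell
# Hypothetical inputs, invented for this example.
array='[{"id":1},{"id":2},{"id":3}]'
object='{"a":10,"b":20}'

# .[] on an array emits each element as a separate JSON value.
from_array=$(printf '%s' "$array" | jq -c '.[]')
echo "$from_array"

# .[] on an object emits each of its values (the keys are dropped).
from_object=$(printf '%s' "$object" | jq -c '.[]')
echo "$from_object"

# Count how many separate values came out of the array.
array_count=$(printf '%s\n' "$from_array" | wc -l | tr -d ' ')
echo "$array_count"
```

Each line of the output is valid JSON by itself, but the lines taken together are a stream of values rather than one document.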
Although a non-JSON result may not be what I want at this point, it does allow me to iterate over arrays and objects and transform them by connecting filters together.
Using the `.[]` syntax, I can use dot notation to access each commit's author. The result is a raw list of objects, but this result will be transformed later on.
.[].commit.author
Next I want to reshape the result so that it is limited to the name and email address, and then finally I'll want the result as valid JSON.
Connecting filters
To reshape the data, I need to use another filter (remember that `.` is a filter itself). Filters can be connected using the pipe operator, `|`. The result from the left side is passed into the filter on the right of the pipe, exactly like a pipe on the command line.
The query `.[].commit.author` will return a list of author objects. I want to run each of these objects through a filter that will reshape the result. I only want the `.email` and `.name`. Again, being familiar with JavaScript, I could pipe the result into a filter that looks like this:
.[].commit.author | { name: .name, email: .email }
Remember that the `.` refers to each iterated author object. The above code does indeed work, but I can make use of object shorthand, again similar to JavaScript (certainly ES6 JavaScript):
.[].commit.author | { name, email }
This will return a raw list of objects with only the name and email.
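To check that the long form and the shorthand behave the same, here is a minimal sketch. The sample commits are invented, and the `date` field is included only so there is something for the filter to drop.

```shell
# Hypothetical two-commit sample standing in for commits.json (made-up data).
commits='[
  {"commit": {"author": {"name": "Ada", "email": "ada@example.com", "date": "2020-01-02"}}},
  {"commit": {"author": {"name": "Bob", "email": "bob@example.com", "date": "2020-01-01"}}}
]'

# Long form: spell out each key and the path it is read from.
long_form=$(printf '%s' "$commits" | jq -c '.[].commit.author | { name: .name, email: .email }')

# Shorthand: { name, email } means the same as { name: .name, email: .email }.
short_form=$(printf '%s' "$commits" | jq -c '.[].commit.author | { name, email }')

echo "$long_form"
echo "$short_form"
```

Both variables should hold the same two lines, one reshaped object per author, with the `date` field gone.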
The final part is that I want the result to be a JSON array, so I'll wrap the entire command in array syntax:
[ .[].commit.author | { name, email } ]
Now the result is correctly formatted as an array containing all the elements.
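Here is a small sketch of the wrapped query, again on invented sample data. Feeding the result back into `jq` with the `type` and `length` filters is one way to confirm it is now a single array.

```shell
# Hypothetical two-commit sample standing in for commits.json (made-up data).
commits='[
  {"commit": {"author": {"name": "Ada", "email": "ada@example.com"}}},
  {"commit": {"author": {"name": "Bob", "email": "bob@example.com"}}}
]'

# Wrapping the whole query in [ ] collects the stream of objects into one array.
result=$(printf '%s' "$commits" | jq -c '[ .[].commit.author | { name, email } ]')
echo "$result"

# The result parses as a single JSON value: an array with one entry per commit.
result_type=$(printf '%s' "$result" | jq -r 'type')
result_length=$(printf '%s' "$result" | jq 'length')
echo "$result_type $result_length"
```

With the real file, the same query would be run as `jq '[ .[].commit.author | { name, email } ]' commits.json`.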
We'll see a lot more connecting filters in later chapters.