Tuesday, May 15, 2018

JSON Parsing with gron

While jq is powerful, its major shortcoming is that it requires you to know the structure of the JSON being parsed. gron is less restrictive: it combines easily with Linux tools such as grep, sed, and awk to build powerful parsing pipelines, without your having to know exactly where to expect a particular structure or value.

Using the Right Tools

As a polyglot programmer, I strive to use the simplest approach: the best tool for the job. I have parsed JSON in Java, Python, and Go, but I think we too often ignore the UNIX/Linux tools (sed, awk, cut, etc.) and instead write hulking data parsers that are just overkill. gron makes it easier to put these strong text editing, manipulation, and filtering tools to work on JSON.

Installing gron

Installation instructions can be found on the gron project page. I used brew install gron. Then, for reasons that will become apparent later, I added the following alias:
alias norg="gron --ungron"
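If you also want norg inside shell scripts, a function may be a better fit than an alias, since bash does not expand aliases in non-interactive shells unless the expand_aliases option is set. A minimal sketch, assuming gron is on your PATH:

```shell
#!/bin/sh
# norg as a function rather than an alias: functions also work in scripts,
# and "$@" passes along any file names or extra gron options unchanged.
norg() {
  gron --ungron "$@"
}
```

It is used exactly like the alias, for example gron ~/ec2.json | norg.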

Make JSON greppable

Obviously, being text-based, JSON is already "greppable". However, the strength of gron comes from its ability to split JSON into lines of what are referred to as "discrete assignments". Consider the JSON snippet below (from an aws ec2 CLI call):

{
    "Reservations": [
        {
            "OwnerId": "<OWNER_ID>",
            "ReservationId": "<RES_ID>",
            "Groups": [],
            "Instances": [
                {
                    "Monitoring": {
                        "State": "disabled"
                    },
                    "PublicDnsName": "",
                    "State": {
                        "Code": 16,
                        "Name": "running"
                    },
                    "EbsOptimized": false,
                    "LaunchTime": "2016-08-31T22:39:37.000Z",
                    "PublicIpAddress": "<PUBLIC_IP>",
                    "PrivateIpAddress": "<PRIVATE_IP>",
                    "ProductCodes": [],
                    "VpcId": "<VPC_ID>",
                    "StateTransitionReason": "",
                    "InstanceId": "<ID>",
                    "ImageId": "<AMI_ID>",
                    "PrivateDnsName": "<PRIVATE_DNS_NAME>",
                    "KeyName": "<KEY_NAME>",
                    "SecurityGroups": [...


Running the JSON through gron (cat ~/ec2.json | gron) converts it to lines of discrete assignments. gron sorts the keys, which is why the order differs from the snippet above:
json = {};
json.Reservations = [];
json.Reservations[0] = {};
json.Reservations[0].Groups = [];
json.Reservations[0].Instances = [];
json.Reservations[0].Instances[0] = {};
json.Reservations[0].Instances[0].AmiLaunchIndex = 0;
json.Reservations[0].Instances[0].Architecture = "x86_64";
json.Reservations[0].Instances[0].BlockDeviceMappings = [];
json.Reservations[0].Instances[0].BlockDeviceMappings[0] = {};
json.Reservations[0].Instances[0].BlockDeviceMappings[0].DeviceName = "/dev/xvda";
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs = {};
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.AttachTime = "2016-08-21T22:00:41.000Z";
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.DeleteOnTermination = true;
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.Status = "attached";
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId = "<VOL_ID>";
json.Reservations[0].Instances[0].ClientToken = "<CLIENT_TOKEN>";
json.Reservations[0].Instances[0].EbsOptimized = false;
json.Reservations[0].Instances[0].Hypervisor = "xen";
json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[0].Instances[0].KeyName = "<KEY_NAME>";
json.Reservations[0].Instances[0].LaunchTime = "2016-08-31T22:39:37.000Z";
json.Reservations[0].Instances[0].Monitoring = {};
json.Reservations[0].Instances[0].Monitoring.State = "disabled";
json.Reservations[0].Instances[0].NetworkInterfaces = [];
json.Reservations[0].Instances[0].NetworkInterfaces[0] = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.IpOwnerId = "<OWNER_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicDnsName = "";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicIp = "<PUBLIC_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.AttachTime = "2016-08-21T22:00:40.000Z";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.AttachmentId = "<ENI_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.DeleteOnTermination = true;
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.DeviceIndex = 0;
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.Status = "attached";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Description = "Primary network interface";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups = [];
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0] = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupId = "<SG_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupName = "Bastion";
json.Reservations[0].Instances[0].NetworkInterfaces[0].MacAddress = "<MAC_ADDRESS>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId = "<ENI_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].OwnerId = "<OWNER_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses = [];
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0] = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.IpOwnerId = "<OWNER_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicDnsName = "";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicIp = "<PUBLIC_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Primary = true;
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].SourceDestCheck = true;
json.Reservations[0].Instances[0].NetworkInterfaces[0].Status = "in-use";
json.Reservations[0].Instances[0].NetworkInterfaces[0].SubnetId = "<SUBNET_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].VpcId = "<VPC_ID>";
json.Reservations[0].Instances[0].Placement = {};
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
json.Reservations[0].Instances[0].Placement.GroupName = "";
json.Reservations[0].Instances[0].Placement.Tenancy = "default";
json.Reservations[0].Instances[0].PrivateDnsName = "<DNS_NAME>";
json.Reservations[0].Instances[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].ProductCodes = [];
json.Reservations[0].Instances[0].PublicDnsName = "";
json.Reservations[0].Instances[0].PublicIpAddress = "<PUBLIC_IP>";
json.Reservations[0].Instances[0].RootDeviceName = "/dev/xvda";
json.Reservations[0].Instances[0].RootDeviceType = "ebs";
json.Reservations[0].Instances[0].SecurityGroups = [];
...
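Because every line carries its full path, gron output also doubles as a map of the document's structure. The sketch below is illustrative (list_leaf_paths and the sample lines are hypothetical, not part of gron): awk skips the {} and [] container lines, sed collapses array indexes to [], and sort -u prints each distinct leaf path once.

```shell
#!/bin/sh
# Hypothetical sample in the format shown above, with two reservations.
sample_gron() {
cat <<'EOF'
json = {};
json.Reservations = [];
json.Reservations[0] = {};
json.Reservations[0].Instances = [];
json.Reservations[0].Instances[0] = {};
json.Reservations[0].Instances[0].InstanceId = "<ID_1>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[1] = {};
json.Reservations[1].Instances = [];
json.Reservations[1].Instances[0] = {};
json.Reservations[1].Instances[0].InstanceId = "<ID_2>";
json.Reservations[1].Instances[0].InstanceType = "t2.micro";
EOF
}

# list_leaf_paths: drop container assignments ({} and []), replace every
# numeric array index with [], and print each resulting path once.
list_leaf_paths() {
  awk -F' = ' '$2 != "{};" && $2 != "[];" { print $1 }' |
    sed 's/\[[0-9][0-9]*]/[]/g' |
    sort -u
}

sample_gron | list_leaf_paths
```

Against real data (for example gron ~/ec2s.json | list_leaf_paths) this gives a quick outline of the fields present across all reservations and instances, without knowing the structure in advance.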


Munging gron Output Through Command Line Pipelining

JSON is more compact than gron output and is well suited to structuring data for transport and integration. gron output, while more verbose, is better suited to text searching, filtering, and manipulation with standard Linux tools such as grep, cut, sed, and awk. For example, consider the following commands:


$ cat ~/ec2.json | gron | grep AvailabilityZone
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
The above pipeline searches the gronned JSON for the text "AvailabilityZone" and returns the matching discrete assignment line.
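Since the full path appears on every line, grepping for a parent key returns the whole sub-tree beneath it. A small sketch, using a few lines copied from the gron output above as a stand-in for a real gron ~/ec2.json call:

```shell
#!/bin/sh
# Lines copied from the gron output above (stand-in for: gron ~/ec2.json).
sample_gron() {
cat <<'EOF'
json.Reservations[0].Instances[0].KeyName = "<KEY_NAME>";
json.Reservations[0].Instances[0].Placement = {};
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
json.Reservations[0].Instances[0].Placement.GroupName = "";
json.Reservations[0].Instances[0].Placement.Tenancy = "default";
json.Reservations[0].Instances[0].PrivateDnsName = "<DNS_NAME>";
EOF
}

# "\.Placement\." (dots escaped) matches only assignments below Placement,
# returning every Placement field without naming them individually.
sample_gron | grep '\.Placement\.'
```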

$ cat ~/ec2.json | gron | grep AvailabilityZone | cut -d\" -f2
us-east-1a
The above pipeline extracts the AvailabilityZone value with the Linux cut command: -d\" splits each line on double quotes, so field 2 (-f2) is the quoted value.
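cut -d\" -f2 works here because the value is a quoted string. For numbers, booleans, and null there is no double quote on the line, and cut passes such lines through unchanged. A sed-based alternative is sketched below; value_of is a hypothetical helper name, and it assumes the path contains no = character and leaves JSON escape sequences inside strings as-is.

```shell
#!/bin/sh
# Lines copied from the gron output above: one string, one number, one boolean.
sample_gron() {
cat <<'EOF'
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
json.Reservations[0].Instances[0].AmiLaunchIndex = 0;
json.Reservations[0].Instances[0].EbsOptimized = false;
EOF
}

# value_of (hypothetical): remove "path = ", the trailing ";", and the
# surrounding double quotes of string values.
value_of() {
  sed -e 's/^[^=]* = //' -e 's/;$//' -e 's/^"\(.*\)"$/\1/'
}

sample_gron | value_of
```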

$ cat ~/ec2s.json | gron | grep InstanceId | cut -d\" -f2
...
<ID_1>
<ID_2>
<ID_3>
...
The above pipeline pulls every EC2 instance ID out of the aws ec2 CLI output, producing a list of IDs, one per line.
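Two refinements are sketched below on sample lines (the IDs are placeholders): anchoring the pattern to the key (\.InstanceId = ) guards against other keys or values that merely contain the text, and paste joins the IDs onto a single line so they can be handed to another command.

```shell
#!/bin/sh
# Sample gron lines with placeholder IDs.
sample_gron() {
cat <<'EOF'
json.Reservations[0].Instances[0].InstanceId = "<ID_1>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[1].Instances[0].InstanceId = "<ID_2>";
json.Reservations[2].Instances[0].InstanceId = "<ID_3>";
EOF
}

# Match only the InstanceId key, keep the quoted value, and join the
# resulting IDs with single spaces.
ids=$(sample_gron | grep '\.InstanceId = ' | cut -d\" -f2 | paste -s -d ' ' -)
echo "$ids"
```

The resulting $ids could then be passed, unquoted, to an AWS CLI option that accepts several IDs, such as aws ec2 describe-instances --instance-ids $ids.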

Transforming JSON with gron and ungron (a.k.a. norg)

Earlier, I defined the norg alias for gron --ungron. With this option, gron transforms discrete assignments back into JSON. Consider the commands below.
Note: cat has been dropped; gron reads the file directly.

$ gron ~/ec2s.json | grep InstanceId | norg
...
    {
      "Instances": [
        {
          "InstanceId": "<ID>"
        }
      ]
    },
    {
      "Instances": [
        {
          "InstanceId": "<ID>"
        }
      ]
    },
...
The above pipeline grons the JSON, greps for the InstanceId field, and then converts the matching discrete assignments (e.g., json.Reservations[999].Instances[0].InstanceId = "<ID>";) back into simplified JSON that contains only that field.
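The grep discards the json.Reservations = []; and similar container lines, yet ungron still produces properly nested JSON, rebuilding the objects and arrays from the paths themselves. The sketch below checks this on two hand-written leaf assignments (ID_1 and ID_2 are placeholder values); it assumes gron is installed and simply reports when it is not.

```shell
#!/bin/sh
# Two leaf assignments only: no "json = {};" or "json.Reservations = [];" lines.
leaf_lines() {
cat <<'EOF'
json.Reservations[0].Instances[0].InstanceId = "ID_1";
json.Reservations[1].Instances[0].InstanceId = "ID_2";
EOF
}

# gron --ungron (what the norg alias runs) reads assignments from stdin.
if command -v gron >/dev/null 2>&1; then
  leaf_lines | gron --ungron
else
  echo "gron not found on PATH" >&2
fi
```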

$ gron ~/ec2s.json | egrep InstanceId\|ImageId | norg
...
    {
      "Instances": [
        {
          "ImageId": "<AMI_ID>",
          "InstanceId": "<ID>"
        }
      ]
    },
    {
      "Instances": [
        {
          "ImageId": "<AMI_ID>",
          "InstanceId": "<ID>"
        }
      ]
    },
...
The above pipeline adds ImageId to the transformed JSON using egrep. (Yes, I know GNU has deprecated egrep in favor of grep -E.)
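The grep -E equivalent is sketched below on sample lines taken from the gron output above. Quoting the pattern means the | no longer needs a backslash to keep the shell from treating it as a pipe.

```shell
#!/bin/sh
# Lines copied from the gron output above.
sample_gron() {
cat <<'EOF'
json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
EOF
}

# grep -E uses extended regular expressions, where "|" means alternation.
sample_gron | grep -E 'InstanceId|ImageId'
```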

sed

sed is a powerful stream editor, handy for find-and-replace operations on text streams and files.
$ gron ~/ec2s.json | egrep InstanceId\|ImageId\|InstanceType | sed -e 's/Instances/node/g;s/ImageId/ami/g;s/InstanceType/type/g;s/InstanceId/id/g' | norg
...
    {
      "node": [
        {
          "ami": "<AMI_ID>",
          "id": "<ID>",
          "type": "t2.small"
        }
      ]
    },
    {
      "node": [
        {
          "ami": "<AMI_ID>",
          "id": "<ID>",
          "type": "t2.micro"
        }
      ]
    },
...
The above pipeline adds sed to perform several string replacements in a single pass, renaming the keys in the gron paths before norg converts them back into JSON.
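Because the replacements in that sed command are not anchored, they also rewrite any value that happens to contain Instances, ImageId, InstanceType, or InstanceId. A variation that only touches path segments is sketched below; rename_keys is a hypothetical helper name, and the Tags line is a hypothetical example of such a value (it would also pass the egrep filter).

```shell
#!/bin/sh
# Sample grepped gron lines; the Tags line is hypothetical.
sample_gron() {
cat <<'EOF'
json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[0].Instances[0].Tags[0].Value = "InstanceId backup";
EOF
}

# Each pattern includes path context (a leading "." plus a following "[" or
# " = "), so key names are renamed while ordinary string values are left alone.
rename_keys() {
  sed -e 's/\.Instances\[/.node[/g' \
      -e 's/\.ImageId = /.ami = /' \
      -e 's/\.InstanceType = /.type = /' \
      -e 's/\.InstanceId = /.id = /'
}

sample_gron | rename_keys
```

The renamed lines can be piped to norg exactly as in the pipeline above.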

$ gron ~/ec2s.json | egrep InstanceId\|ImageId\|InstanceType | sed -e 's/Instances/node/g;s/ImageId/ami/g;s/InstanceType/type/g;s/InstanceId/id/g' | norg | tr -d '\n' | sed "s/ //g"
...
{"node":[{"ami":"<AMI_ID>","id":"<ID>","type":"t2.small"}]},{"node":[{"ami":"<AMI_ID>","id":"<ID>","type":"t2.micro"}]},
...
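The above pipeline adds the translate command, tr, to delete newline characters, and then another sed command to delete the remaining spaces. This is handy for minifying JSON, but be aware that it also deletes spaces inside string values, so it is only safe when no value contains a space.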
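Where values may contain spaces (such as "Primary network interface" in the gron output above), jq's -c option is one alternative: it re-serializes JSON on a single line without touching string contents. The sketch below contrasts the two on a one-field sample; it assumes jq is installed and skips that part otherwise.

```shell
#!/bin/sh
# One field from the gron output above, in the indented form ungron prints.
json='{
  "Description": "Primary network interface"
}'

# tr + sed also strips the spaces inside the value:
printf '%s\n' "$json" | tr -d '\n' | sed "s/ //g"
echo
# -> {"Description":"Primarynetworkinterface"}

# jq -c prints compact JSON and leaves string contents intact.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$json" | jq -c .
  # -> {"Description":"Primary network interface"}
fi
```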

Summary

gron converts structured JSON into lines of discrete assignments. This makes it easy to pipe the text through standard tools like grep and sed for powerful text manipulation. Once manipulated, the discrete assignments can be transformed back into JSON with gron -u (--ungron). This makes gron a natural complement to grep, sed, and friends for munging (a.k.a. manipulating) JSON data.