Tuesday, May 15, 2018

JSON Parsing with gron

While jq is powerful, its major shortcoming is that it requires you to know the structure of the JSON being parsed. gron is less restrictive: it combines easily with Linux tools such as grep, sed, and awk to build very powerful parsing pipelines, without your having to know exactly where to expect a particular structure or value.

Using the Right Tools

As a polyglot programmer, I strive to use the simplest approach: the best tool for the job. I have parsed JSON in Java, Python, and Go, but I think we too often ignore the UNIX/Linux tools (sed, awk, cut, etc.) and write hulking data parsers that are just overkill. With gron, I find it easier to put these strong text editing, manipulation, and filtering tools to work.

Installing gron

Instructions for installing gron can be found on the project's page. I used brew install gron. Then, for reasons that will become apparent later, I added the following alias:
alias norg="gron --ungron"
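One caveat if you also want norg inside shell scripts: bash does not expand aliases in non-interactive shells unless shopt -s expand_aliases is set. A minimal sketch of an equivalent shell function, which could go in ~/.bashrc or at the top of a script (this is my own variant, not something gron ships):

```shell
# norg as a shell function instead of an alias: functions are available in
# scripts, whereas bash only expands aliases in non-interactive shells when
# "shopt -s expand_aliases" is set.
norg() {
  # Pass every argument (file names, extra flags) straight through to gron.
  gron --ungron "$@"
}
```

Usage is the same as the alias, e.g. gron ~/ec2s.json | grep InstanceId | norg.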

Make JSON greppable

Obviously, being text-based, JSON is already "greppable". However, gron's strength comes from its ability to split JSON into lines of what it calls "discrete assignments". Given the JSON snippet below (from an aws ec2 CLI call):

{
    "Reservations": [
        {
            "OwnerId": "<OWNER_ID>",
            "ReservationId": "<RES_ID>",
            "Groups": [],
            "Instances": [
                {
                    "Monitoring": {
                        "State": "disabled"
                    },
                    "PublicDnsName": "",
                    "State": {
                        "Code": 16,
                        "Name": "running"
                    },
                    "EbsOptimized": false,
                    "LaunchTime": "2016-08-31T22:39:37.000Z",
                    "PublicIpAddress": "<PUBLIC_IP>",
                    "PrivateIpAddress": "<PRIVATE_IP>",
                    "ProductCodes": [],
                    "VpcId": "<VPC_ID>",
                    "StateTransitionReason": "",
                    "InstanceId": "<ID>",
                    "ImageId": "<AMI_ID>",
                    "PrivateDnsName": "<PRIVATE_DNS_NAME>",
                    "KeyName": "<KEY_NAME>",
                    "SecurityGroups": [...


gron parses it (cat ~/ec2.json | gron) and converts the JSON into lines of discrete assignments. gron sorts the keys, which is why they appear in a different order than in the original JSON:
json = {};
json.Reservations = [];
json.Reservations[0] = {};
json.Reservations[0].Groups = [];
json.Reservations[0].Instances = [];
json.Reservations[0].Instances[0] = {};
json.Reservations[0].Instances[0].AmiLaunchIndex = 0;
json.Reservations[0].Instances[0].Architecture = "x86_64";
json.Reservations[0].Instances[0].BlockDeviceMappings = [];
json.Reservations[0].Instances[0].BlockDeviceMappings[0] = {};
json.Reservations[0].Instances[0].BlockDeviceMappings[0].DeviceName = "/dev/xvda";
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs = {};
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.AttachTime = "2016-08-21T22:00:41.000Z";
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.DeleteOnTermination = true;
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.Status = "attached";
json.Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId = "<VOL_ID>";
json.Reservations[0].Instances[0].ClientToken = "<CLIENT_TOKEN>";
json.Reservations[0].Instances[0].EbsOptimized = false;
json.Reservations[0].Instances[0].Hypervisor = "xen";
json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[0].Instances[0].KeyName = "<KEY_NAME>";
json.Reservations[0].Instances[0].LaunchTime = "2016-08-31T22:39:37.000Z";
json.Reservations[0].Instances[0].Monitoring = {};
json.Reservations[0].Instances[0].Monitoring.State = "disabled";
json.Reservations[0].Instances[0].NetworkInterfaces = [];
json.Reservations[0].Instances[0].NetworkInterfaces[0] = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.IpOwnerId = "<OWNER_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicDnsName = "";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Association.PublicIp = "<PUBLIC_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.AttachTime = "2016-08-21T22:00:40.000Z";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.AttachmentId = "<ENI_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.DeleteOnTermination = true;
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.DeviceIndex = 0;
json.Reservations[0].Instances[0].NetworkInterfaces[0].Attachment.Status = "attached";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Description = "Primary network interface";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups = [];
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0] = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupId = "<SG_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupName = "Bastion";
json.Reservations[0].Instances[0].NetworkInterfaces[0].MacAddress = "<MAC_ADDRESS>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId = "<ENI_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].OwnerId = "<OWNER_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses = [];
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0] = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.IpOwnerId = "<OWNER_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicDnsName = "";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Association.PublicIp = "<PUBLIC_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].Primary = true;
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].SourceDestCheck = true;
json.Reservations[0].Instances[0].NetworkInterfaces[0].Status = "in-use";
json.Reservations[0].Instances[0].NetworkInterfaces[0].SubnetId = "<SUBNET_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].VpcId = "<VPC_ID>";
json.Reservations[0].Instances[0].Placement = {};
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
json.Reservations[0].Instances[0].Placement.GroupName = "";
json.Reservations[0].Instances[0].Placement.Tenancy = "default";
json.Reservations[0].Instances[0].PrivateDnsName = "<DNS_NAME>";
json.Reservations[0].Instances[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].ProductCodes = [];
json.Reservations[0].Instances[0].PublicDnsName = "";
json.Reservations[0].Instances[0].PublicIpAddress = "<PUBLIC_IP>";
json.Reservations[0].Instances[0].RootDeviceName = "/dev/xvda";
json.Reservations[0].Instances[0].RootDeviceType = "ebs";
json.Reservations[0].Instances[0].SecurityGroups = [];
...
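Because every line carries its full path, gron output can also answer the question jq needs answered up front: what does this JSON look like? A minimal sketch using only sed and sort: strip each value, collapse array indexes to [], and de-duplicate what remains. Assumptions: sample_gron is a hypothetical stand-in for gron ~/ec2.json, fed with lines copied from the output above, and the Reservations[1] line is made up to show the de-duplication.

```shell
# sample_gron: hypothetical stand-in for "gron ~/ec2.json". The first four
# lines are copied from the gron output above; the last one is made up.
sample_gron() {
  cat <<'EOF'
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupId = "<SG_ID>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].Groups[0].GroupName = "Bastion";
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
json.Reservations[1].Instances[0].InstanceId = "<ID_2>";
EOF
}

# 1. Drop the " = value;" part of each assignment, leaving only the path.
# 2. Replace every numeric index such as [0] or [12] with [].
# 3. Sort byte-wise and keep one copy of each distinct path.
schema=$(sample_gron \
  | sed -e 's/ = .*;$//' -e 's/\[[0-9][0-9]*]/[]/g' \
  | LC_ALL=C sort -u)
printf '%s\n' "$schema"
```

Against real data, the same sed and sort stages would simply follow gron ~/ec2s.json.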


Munging gron Output Through Command Line Pipelining

JSON is more compact than gron output and is well suited to structuring data for transport and integration. gron output is more verbose, but it is a more usable format for searching, filtering, and manipulating text with standard Linux tools such as grep, cut, sed, and awk. For example, consider the following commands:


$ cat ~/ec2.json | gron | grep AvailabilityZone
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
The above pipeline searches the gronned JSON for the text "AvailabilityZone" and returns the matching discrete assignment line.

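$ cat ~/ec2.json | gron | grep AvailabilityZone | cut -d\" -f2
us-east-1a
The above pipeline extracts the AvailabilityZone value with the Linux cut command: with a double quote as the delimiter (-d\"), field 2 (-f2) is the text between the first pair of quotes.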
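Since awk splits lines into fields too, it can do the grep and the cut in one step: split on double quotes and print the second field of each matching line. A sketch, assuming sample_gron as a hypothetical stand-in for gron ~/ec2.json, fed with lines copied from the listing above:

```shell
# sample_gron: hypothetical stand-in for "gron ~/ec2.json"; the lines are
# copied from the gron output above, so no gron or AWS CLI is needed.
sample_gron() {
  cat <<'EOF'
json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[0].Instances[0].Placement.AvailabilityZone = "us-east-1a";
json.Reservations[0].Instances[0].Placement.Tenancy = "default";
EOF
}

# -F'"' makes the double quote the field separator, so $2 is the quoted
# value; the /AvailabilityZone/ pattern plays the role of grep.
az=$(sample_gron | awk -F'"' '/AvailabilityZone/ { print $2 }')
printf '%s\n' "$az"
```

With real data this would read gron ~/ec2.json | awk -F'"' '/AvailabilityZone/ { print $2 }'.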

$ cat ~/ec2s.json | gron | grep InstanceId | cut -d\" -f2
...
<ID_1>
<ID_2>
<ID_3>
...
The above pipeline pulls all the EC2 instance IDs out of the aws ec2 CLI output and produces a list of IDs, one per line.
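A plain grep InstanceId does the job above, but some key names recur at several levels. In the gron output earlier in this post, PrivateIpAddress appears on the instance, on the network interface, and inside PrivateIpAddresses (whose name also contains it). Because each gron line starts with its full path, the pattern can be anchored on structure instead. A sketch, assuming sample_gron as a hypothetical stand-in for gron output, with lines copied from that listing:

```shell
# sample_gron: hypothetical stand-in for "gron ~/ec2.json"; the lines are
# copied from the gron output earlier in this post.
sample_gron() {
  cat <<'EOF'
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses = [];
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0] = {};
json.Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddresses[0].PrivateIpAddress = "<PRIVATE_IP>";
json.Reservations[0].Instances[0].PrivateIpAddress = "<PRIVATE_IP>";
EOF
}

# A bare substring match hits all five lines.
sample_gron | grep -c PrivateIpAddress

# Anchored on the path: any reservation index, any instance index, then the
# key immediately followed by " = ". Only the instance-level field matches.
instance_ips=$(sample_gron \
  | grep -E '^json\.Reservations\[[0-9]+]\.Instances\[[0-9]+]\.PrivateIpAddress = ' \
  | cut -d'"' -f2)
printf '%s\n' "$instance_ips"
```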

Transforming JSON with gron and ungron (a.k.a. norg)

Earlier, I mentioned the norg alias, which points to gron --ungron. With this flag, gron transforms discrete assignments back into JSON. Consider the commands below.
Note: cat was dropped, and gron reads the file directly.

$ gron ~/ec2s.json | grep InstanceId | norg
...
    {
      "Instances": [
        {
          "InstanceId": "<ID>"
        }
      ]
    },
    {
      "Instances": [
        {
          "InstanceId": "<ID>"
        }
      ]
    },
...
The above pipeline grons the JSON, greps for the InstanceId field, and then converts the grepped discrete assignments (e.g., json.Reservations[999].Instances[0].InstanceId = "<ID>";) back into usable, simplified JSON. ungron rebuilds the enclosing objects and arrays from each line's path, so only the matched fields survive.

$ gron ~/ec2s.json | egrep InstanceId\|ImageId | norg
...
    {
      "Instances": [
        {
          "ImageId": "<AMI_ID>",
          "InstanceId": "<ID>"
        }
      ]
    },
    {
      "Instances": [
        {
          "ImageId": "<AMI_ID>",
          "InstanceId": "<ID>"
        }
      ]
    },
...
The above pipeline adds ImageId to the transformed JSON by matching either field name with egrep. (Yes, I know GNU has deprecated egrep in favor of grep -E.)
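For reference, a sketch of the same filter without egrep: grep -E with the alternation quoted (so the shell never sees the pipe and no backslash is needed), and the POSIX grep -e form, which needs no extended regex at all. sample_gron is a hypothetical stand-in for gron ~/ec2s.json, using lines copied from the single-instance gron output earlier in this post:

```shell
# sample_gron: hypothetical stand-in for "gron ~/ec2s.json".
sample_gron() {
  cat <<'EOF'
json.Reservations[0].Instances[0].Hypervisor = "xen";
json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
json.Reservations[0].Instances[0].KeyName = "<KEY_NAME>";
EOF
}

# Extended regex alternation, quoted instead of backslash-escaped.
ere=$(sample_gron | grep -E 'InstanceId|ImageId')

# Equivalent basic grep: one -e option per pattern.
fixed=$(sample_gron | grep -e InstanceId -e ImageId)

printf '%s\n' "$ere"
```

With real data: gron ~/ec2s.json | grep -E 'InstanceId|ImageId' | norg.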

sed

sed is a powerful stream editor and is handy for running find/replace operations on text.
$ gron ~/ec2s.json | egrep InstanceId\|ImageId\|InstanceType | sed -e 's/Instances/node/g;s/ImageId/ami/g;s/InstanceType/type/g;s/InstanceId/id/g' | norg
...
    {
      "node": [
        {
          "ami": "<AMI_ID>",
          "id": "<ID>",
          "type": "t2.small"
        }
      ]
    },
    {
      "node": [
        {
          "ami": "<AMI_ID>",
          "id": "<ID>",
          "type": "t2.micro"
        }
      ]
    },
...
The above pipeline adds stream editing with sed to perform several string replacements in a single pass, renaming the keys before norg turns the assignments back into JSON.
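The replacements in the sed command above match anywhere on a line, so a value that happened to contain, say, "Instances" would be rewritten along with the keys. A sketch of a more cautious variant that anchors each replacement on gron's path syntax, assuming sample_gron as a hypothetical stand-in for the grep stage, with lines copied from the gron output earlier in this post:

```shell
# sample_gron: hypothetical stand-in for "gron ~/ec2s.json | egrep ...".
sample_gron() {
  cat <<'EOF'
json.Reservations[0].Instances[0].ImageId = "<AMI_ID>";
json.Reservations[0].Instances[0].InstanceId = "<ID>";
json.Reservations[0].Instances[0].InstanceType = "t2.small";
EOF
}

# ".Instances[" is path syntax, and ".Key = " is where a key meets its value,
# so a quoted value is only touched if it contains that exact path-like text.
renamed=$(sample_gron | sed \
  -e 's/\.Instances\[/.node[/g' \
  -e 's/\.ImageId = /.ami = /' \
  -e 's/\.InstanceType = /.type = /' \
  -e 's/\.InstanceId = /.id = /')
printf '%s\n' "$renamed"
```

With real data, the anchored sed slots into the same place in the pipeline, between the grep and norg.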

$ gron ~/ec2s.json | egrep InstanceId\|ImageId\|InstanceType | sed -e 's/Instances/node/g;s/ImageId/ami/g;s/InstanceType/type/g;s/InstanceId/id/g' | norg | tr -d '\n' | sed "s/ //g"
...
{"node":[{"ami":"<AMI_ID>","id":"<ID>","type":"t2.small"}]},{"node":[{"ami":"<AMI_ID>","id":"<ID>","type":"t2.micro"}]},
...
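The above pipeline adds the translate command, tr, to delete newline characters, and then another sed command to delete the remaining spaces. This is handy for minifying JSON, with one caveat: sed "s/ //g" also deletes spaces inside string values (a value like "Primary network interface" would be mangled), so it is only safe when, as here, the values contain no spaces.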
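If the values may contain spaces, the whitespace removal has to know whether it is inside a string. A minimal sketch in awk (minify_json is a hypothetical helper, not part of gron) that tracks quotes and backslash escapes and drops whitespace and newlines only outside string literals:

```shell
# minify_json: hypothetical helper. Reads JSON on stdin and writes it on one
# line, deleting spaces, tabs, carriage returns, and newlines only when they
# fall outside string literals.
minify_json() {
  awk '
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (in_str) {
          out = out c
          if (esc) esc = 0                  # this char was escaped; keep it
          else if (c == "\\") esc = 1       # next char is escaped
          else if (c == "\"") in_str = 0    # closing quote
        } else if (c == "\"") {
          in_str = 1                        # opening quote
          out = out c
        } else if (c != " " && c != "\t" && c != "\r") {
          out = out c                       # structural char or literal
        }
      }
      printf "%s", out                      # no newline: joins the lines
    }
    END { printf "\n" }
  '
}

# Example: the spaces inside "Primary network interface" are preserved.
printf '{\n  "Description": "Primary network interface",\n  "id": "<ID>"\n}\n' | minify_json
```

In the pipeline above, minify_json would take the place of tr -d '\n' | sed "s/ //g" after norg.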

Summary

gron converts structured JSON into lines of discrete assignments. This makes it easy to pipe the text to native tools like grep and sed for powerful text manipulation. Once manipulated, the discrete assignments can be transformed back into JSON with gron -u (--ungron). This makes gron a complement to existing tools like grep and sed for munging (a.k.a. manipulating) JSON data.
