Skip to content

JSON Output

Eric Busboom edited this page Nov 13, 2017 · 4 revisions

Composing JSON From A Data Package

How can Metatab generate JSON Output, rather than CSV?

For simple cases, we could just convert the row data to a sequence of dictionaries.

But more sophisticated system would allow for creation of complex data structures. For these cases, the system might have each Root.DataFile generate a sequence of dicts, with the key for each dict being specified as query path ( like JSONPath or ObjectPath ). Each row generate from a table would be rendered as a dict, and the path key value would specify where in an existing structure to put the dict.

Or, maybe each row has a path and an optional key. If the Path references a dict, then the key is the key and the remainder of the row is a dict value. If the path references a list, then there must be no key, and the dict is added to the list.

THe path can include values from the input row.

Generating JSON Output

In some cases it would be useful for Metatab to be able to generate JSON format data files, rather than CSV. This document explores ideas for incorporating specifications into Metatab schemas for outputting JSON.

Metatab already has a system for generating JSON, but for Metatab files themselves. The DeclareTerm.ChildPropertyType term, which can take values of None, scalar or sequence, controls how Metatab table-oriented files are converted into JSON. These terms could either be added to Root.Schema, or there could be a special interpretation of column names.

The process we'll use is generally know as "Unflattening". The forward, "flattening" converts a nested data structure into key/value pairs, where the key is really a path through the nested structure. In fact, the key could actually be a JSONPath construct, and most likely, would be, at least, very similar.

For instance, this JSON Example file, can be run through this online flattening tool to transform

{
    "scalar1": "abc",
    "scalar2": "def",
    "obj": {
        "okey1": "abc",
        "okey2": "def"
    },
    "ilist": [
        1.0,
        2.0,
        3.0,
        4.0,
        5.0
    ],
    "mixed": [
        1.0,
        2.0,
        3.0,
        4.0,
        {
            "okey1": "abc",
            "okey2": "def"
        }
    ],
    "olist1": [
        {
            "okey1": "abc",
            "okey2": "def"
        },
        {
            "okey1": "abc",
            "okey2": "def"
        }
    ],
    "olist2": [
        {
            "okey1": "abc",
            "okey2": [
                1.0,
                2.0
            ]
        },
        {
            "okey1": "abc",
            "okey2": [
                1.0,
                2.0
            ]
        }
    ]
}

Is flattened to:

{
	"scalar1": "abc",
	"scalar2": "def",
	"obj.okey1": "abc",
	"obj.okey2": "def",
	"ilist[0]": 1,
	"ilist[1]": 2,
	"ilist[2]": 3,
	"ilist[3]": 4,
	"ilist[4]": 5,
	"mixed[0]": 1,
	"mixed[1]": 2,
	"mixed[2]": 3,
	"mixed[3]": 4,
	"mixed[4].okey1": "abc",
	"mixed[4].okey2": "def",
	"olist1[0].okey1": "abc",
	"olist1[0].okey2": "def",
	"olist1[1].okey1": "abc",
	"olist1[1].okey2": "def",
	"olist2[0].okey1": "abc",
	"olist2[0].okey2[0]": 1,
	"olist2[0].okey2[1]": 2,
	"olist2[1].okey1": "abc",
	"olist2[1].okey2[0]": 1,
	"olist2[1].okey2[1]": 2
}

The flattening process works in reverse, with each of the Paths specifying where in the existing structure to attach a new object.

Since all of the objects should have identity values if they must be referenced, there are only two components to a path:

  1. For dicts, A dict key, such as in glossary.GlossDiv.title
  2. For list items, one of the fields in the objects can be used as if it were a key, for instance collection.books[title='foobar'].authors

The list object reference could be styled to be more explicitly a "virtual key" by adding a '.': collection.books.[title='foobar'].authors. That style can be interpreted as assuming that books is interpreted as a dict where the keys are the title field of the inner object. If that were true, if it really were a dict with the title for keys, then the equivalent path would be: ``collection.books.foobar.authors`

In Metatab, these strings would probably be interpolated, so they would appear as:

    Table.Path: collection.{row.collection_type}.[title={row.title}].authors`

Data Flow Plan

For a complex data flow, the process would likely be to conver the input data to a collection of normalized tables, just as one would do for a properly designed database. Then, these tables are reconstituted into a data structure, in a specified order.

The JSON generation process would start with a top level skeleton. Then, the first output table runs, constructing objects and adding them to the skeleton. Then, the next table runs, adding to the previously updated structure, possibly adding sub-objects to those added to the skeleton in the first pass.

JSON Output from Metatab

The simplest way to define how to output JSON from a table would be to attach a dotted path string to each column of the table. The attachement could be done either as an extra field in the metadata, or as the name of the column. However, since the bracket characters ("[]") are not sensible in a column name, it would be best for the output path to be an additional field in the schema.

Clone this wiki locally