A detailed specification of all fields in the POPxf data format is given below. Each subsection describes the structure, expected data type, and allowed values of the corresponding entries in the JSON object. The data type object mentioned below refers to a JSON object literal and corresponds to a set of key/value pairs representing named subfields. The format is divided into two main components: the metadata and data fields. An additional $schema field is included to specify the version of the POPxf JSON schema used. All quantities defined in this specification refer to a single datafile. They may be indexed by a superscript
The $schema field allows identifying a JSON file as conforming to the POPxf format and specifies the version of the POPxf JSON schema used. It must be set to
"https://json.schemastore.org/popxf-1.0.json"
for files conforming to this version of the specification. The version number will be incremented for future revisions of the JSON schema.
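As a minimal sketch of how a reader might recognise a POPxf file, the snippet below compares the $schema entry of a parsed JSON object with the URL quoted above. The helper name `declares_popxf_1_0` and the toy document are illustrative assumptions and not part of the specification.

```python
import json

# Schema URL quoted in this specification for version 1.0 of the format.
POPXF_SCHEMA_URL = "https://json.schemastore.org/popxf-1.0.json"


def declares_popxf_1_0(document):
    """Return True if the parsed JSON object declares the POPxf 1.0 schema."""
    return document.get("$schema") == POPXF_SCHEMA_URL


# Toy document with empty metadata and data fields (illustration only).
example = json.loads(
    '{"$schema": "https://json.schemastore.org/popxf-1.0.json",'
    ' "metadata": {}, "data": {}}'
)
print(declares_popxf_1_0(example))  # True
```

This only checks the declared schema; validating the full file against the JSON schema would require a schema validator.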
The metadata field contains all contextual and structural information required to interpret the numerical predictions. It is a JSON object with the following subfields:
Array of strings naming the observables.
Example:
"observable_names": ["observable1", "observable2", "observable3"]

Array of strings naming the model parameters.
Example:
"parameters": ["C1", "C2", "C3"]

Defines the parameter basis (e.g. an operator basis in an EFT). At least one of the two subfields wcxf and custom has to be present. If both subfields are present, any element of parameters (see above) not belonging to the wcxf basis is interpreted as belonging to the custom basis. The subfields are defined as follows:
- wcxf (optional, type: object): Specifies an EFT basis defined by the Wilson Coefficient exchange format (WCxf) [@Aebischer:2017ugx]. This object contains the following fields:
  - eft (required, type: string): EFT name defined by WCxf (e.g., "SMEFT")
  - basis (required, type: string): Operator basis name defined by WCxf (e.g., "Warsaw")
  - sectors (optional, type: array of string): Array of renormalisation-group-closed sectors of Wilson coefficients containing the Wilson coefficients given in parameters (see above). The available sectors for each EFT are defined by WCxf.
- custom (optional, type: any): Field of any type and substructure to unambiguously specify any parameter basis not defined by WCxf.
Example:
"basis": {
"wcxf": {
"eft": "SMEFT",
"basis": "Warsaw",
"sectors": ["dB=de=dmu=dtau=0"]
}
}

This field is required to express observables as functions of polynomials. It requires the simultaneous presence of metadata.observable_expressions and data.polynomial_central.
Array of names identifying the polynomials referenced in metadata.observable_expressions (see below). Must contain unique, non-empty strings.
Example:
"polynomial_names": ["polynomial 1", "polynomial 2"]

This field is required to express observables as functions of polynomials. It requires the simultaneous presence of metadata.polynomial_names and data.polynomial_central.
Defines how each observable is constructed from the named polynomials. Must be an array of objects, one per entry of the observable_names field and in the same order. Each object must contain:
- variables (required, type: object): An object where each key is a string that is a Python-compatible variable name (used as variable in the expression field described below), and each value is a string identifying a polynomial name from polynomial_names. For example, {"num": "polynomial 1", "den": "polynomial 2"}.
- expression (required, type: string): A Python-compatible mathematical expression using the variable names defined in variables, e.g. "num/den". Standard mathematical functions like sqrt or cos that are implemented in packages like numpy may be used.
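The following sketch illustrates one way such an entry could be evaluated once numerical values of the polynomials are known. The polynomial values, the function whitelist `ALLOWED_FUNCTIONS` and the helper `evaluate_observable` are assumptions made for illustration; the specification does not prescribe an evaluation method, and the standard-library math module stands in for numpy here.

```python
import math

# Hypothetical numerical values of the named polynomials at some parameter point.
polynomial_values = {"polynomial 1": 2.0, "polynomial 2": 4.0}

observable_expressions = [
    {"variables": {"num": "polynomial 1", "den": "polynomial 2"},
     "expression": "num / den"},
    {"variables": {"p1": "polynomial 1"},
     "expression": "sqrt(p1**2)"},
]

# Functions exposed to the expressions; this whitelist is an assumption.
ALLOWED_FUNCTIONS = {
    "sqrt": math.sqrt, "cos": math.cos, "sin": math.sin,
    "exp": math.exp, "log": math.log,
}


def evaluate_observable(entry, values):
    """Evaluate one observable expression from the polynomial values."""
    namespace = dict(ALLOWED_FUNCTIONS)
    for variable, polynomial_name in entry["variables"].items():
        namespace[variable] = values[polynomial_name]
    # Removing the builtins narrows what an expression can reach, but eval
    # remains unsuitable for files from untrusted sources.
    return eval(entry["expression"], {"__builtins__": {}}, namespace)


observables = [evaluate_observable(e, polynomial_values)
               for e in observable_expressions]
print(observables)  # [0.5, 2.0]
```

A production reader would more likely parse the expression into a syntax tree and accept only arithmetic and whitelisted function calls.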
Example:
"observable_expressions": [
{
"variables": {
"num": "polynomial 1",
"den": "polynomial 2"
},
"expression": "num / den"
},
{
"variables": {
"num": "polynomial 2",
"den": "polynomial 1"
},
"expression": "num / den"
},
{
"variables": {
"p1": "polynomial 1"
},
"expression": "sqrt(p1**2)"
}
]

The renormalisation scale in GeV at which the parameter vector $\vec{C}$ is defined.
This field can take one of two forms:

- single number: A common scale $\mu$ at which all polynomial coefficients $\vec p_k$ or observable coefficients $\vec o_m$ are defined.
  - If the observables $O_m$ are expressed in terms of polynomials $P_k$, the polynomials are functions of the parameters evolved to the common scale $\mu$:
    $$P_k = a_{k} + \vec{C}(\mu) \cdot \vec{b}_{k}(\mu) + \dots$$
  - If the observables $O_m$ are themselves polynomials, they are themselves functions of the parameters evolved to the common scale $\mu$:
    $$O_m = a_m + \vec{C}(\mu) \cdot \vec{b}_m(\mu) + \dots$$
- array of numbers: An array defining separate scales $\mu_k$ of polynomial coefficients $\vec p_k$ if metadata.polynomial_names is present, or separate scales $\mu_m$ of observable coefficients $\vec o_m$ if metadata.polynomial_names is absent.
  - If metadata.polynomial_names is present, the observables $O_m$ are expressed in terms of polynomials $P_k$ and each polynomial is a function of the parameters evolved to its corresponding scale $\mu_k$:
    $$P_k = a_{k} + \vec{C}(\mu_k) \cdot \vec{b}_{k}(\mu_k) + \dots$$
    The length and order of the array defining the scales $\mu_k$ must match those of the field metadata.polynomial_names. To avoid ambiguities, the following restrictions apply to this case:
    - data.observable_central must be absent;
    - data.observable_uncertainties must be absent or only define uncertainties for the parameter-independent terms (i.e. only the SM uncertainties in EFT applications).
  - If metadata.polynomial_names is absent, the observables $O_m$ are themselves polynomials and each observable is a function of the parameters evolved to its corresponding scale $\mu_m$:
    $$O_m = a_m + \vec{C}(\mu_m) \cdot \vec{b}_m(\mu_m) + \dots$$
    The length and order of the array defining the scales $\mu_m$ must match those of the field metadata.observable_names.
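The two forms of the scale field can be brought to a common shape as sketched below: one scale per polynomial if metadata.polynomial_names is present, otherwise one per observable. The helper name `scales_per_entry` and the toy metadata objects are illustrative assumptions.

```python
def scales_per_entry(metadata):
    """Return one renormalisation scale per polynomial or per observable."""
    # With polynomial_names present the scales refer to the polynomials,
    # otherwise to the observables themselves.
    if "polynomial_names" in metadata:
        names = metadata["polynomial_names"]
    else:
        names = metadata["observable_names"]
    scale = metadata["scale"]
    if isinstance(scale, (int, float)):
        # A single number is a common scale shared by all entries.
        return [float(scale)] * len(names)
    if len(scale) != len(names):
        raise ValueError("scale array must match the length of the name array")
    return [float(mu) for mu in scale]


common = {
    "observable_names": ["observable1", "observable2", "observable3"],
    "scale": 91.1876,
}
separate = {
    "observable_names": ["observable1", "observable2", "observable3"],
    "polynomial_names": ["polynomial 1", "polynomial 2"],
    "scale": [100.0, 200.0],
}
print(scales_per_entry(common))    # [91.1876, 91.1876, 91.1876]
print(scales_per_entry(separate))  # [100.0, 200.0]
```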
Examples:
"scale": 91.1876
"scale": [100.0, 200.0, 300.0, 400.0, 500.0]

Specifies the maximum degree of polynomial terms included in the expansion. If omitted, the default value is 2 (i.e., quadratic polynomial). Values higher than 2 may be used to represent observables involving higher-order terms in the model parameters. The current implementation of the JSON schema defining the data format supports values up to 5. Higher degrees are not prohibited in principle but are currently unsupported to avoid excessively large data structures.
Example:
"polynomial_degree": 2

Collects relevant data that may be required by a third party to reproduce the prediction. Each element of the array should be an object that corresponds to a step in the workflow and has three predefined fields: description, tool and inputs, specified below. In addition, any additional fields containing data deemed useful in this context can be included.
Schematic example:
"reproducibility": [
{
"description": "Description of the first step",
"tool": { ... },
"inputs": { ... }
},
{
"description": "Description of the second step",
"tool": { ... },
"inputs": { ... }
},
...
]

The predefined fields are as follows:
- description (optional, type: string): Free-form text description of the method and tool used in this step of obtaining the predictions.
- inputs (optional, type: object): Specifies the numerical values of input parameters used by the tool in producing the numerical values of the polynomial coefficients. Each entry maps an input name (a string) or a group of names (a stringified tuple such as "('m1','m2')") to one of the following:
  - A single number: interpreted as the central value of a single, uncorrelated input parameter without uncertainty;
  - An object representing a uni- or multi-variate normal distribution describing one or more possibly correlated input parameters with uncertainties. This object can contain the subfields mean, std, and corr. If the key of the object is a stringified tuple of $N$ input names (e.g., "('m1','m2')" with $N = 2$), describing a group of $N$ possibly correlated input parameters, then mean and (if present) std must be arrays of length $N$, and (if present) corr must be an $N \times N$ matrix, expressed as an array of $N$ arrays of $N$ numbers. The subfields are defined as follows:
    - mean (required, type: number, array): central value / mean; a single number for a single input name, or an array of numbers for a group of input names;
    - std (optional, type: number, array): uncertainty / standard deviation; a single number for a single input name, or an array of numbers for a group of input names;
    - corr (optional, type: array of array): correlation matrix; must only be used if a group of input names is given and requires the presence of std.
  - An object representing an arbitrary user-defined uni- or multi-variate probability distribution describing one or more input parameters. This object contains the following subfields:
    - distribution_type (required, type: string): a user-defined name identifying the probability distribution (e.g. "uniform");
    - distribution_parameters (required, type: object): an object where each key is a user-defined name of a parameter of the probability distribution, and each value is a single number in the univariate case, or an array of numbers or arrays in the multivariate case (e.g. {"a":0, "b":1} for a uniform distribution with boundaries $a$ and $b$).
    - distribution_description (required, type: string): Description of the custom distribution implemented, defining the fields in distribution_parameters.

  Example:

  In the example below, "m1" is an input parameter with no associated uncertainty, "m2" and "m3" are a pair of input parameters with correlated, Gaussian uncertainties, and "m4" is a parameter that is uniformly distributed between 0 and 1.

  "inputs": { "m1": 1.0, "('m2','m3')": { "mean": [1.0, 2.0], "std": [0.1, 0.1], "corr": [ [1.0, 0.3], [0.3, 1.0] ] }, "m4": { "distribution_type": "uniform", "distribution_parameters": { "a": 0, "b": 1 }, "distribution_description": "Uniform distribution with boundaries $a$ and $b$." } }
- tool (optional, type: object): Provides free-form information about the tool, software or technique used in a particular step of the workflow. The predefined subfields are name, version, and settings. Any number of additional fields may be included to record or link to supplementary metadata, such as model information/configuration, perturbative order, scale choice, PDF sets, simulation settings, input parameter cards, etc. The predefined subfields are as follows:
  - name (required, type: string): name of tool, e.g. "MadGraph5_aMC@NLO", "POWHEG", "SHERPA", "WHIZARD", "flavio", "FeynCalc", "analytical calculation", ...
  - version (optional, type: string): version of the tool, e.g. "1.2"
  - settings (optional, type: object): object containing information about the tool settings with free-form substructure. For example:
    - perturbative_order (e.g. "LO", "NLO", "NLOQCD", ...)
    - PDF: name, version, and set of the PDF used.
    - UFO: name and version of UFO model used, as well as any other relevant information such as flavor schemes or webpage link.
    - cuts: Information about kinematical cuts specifying the phase space region over which the observable is computed (e.g. acceptance effects, signal region definition, ...).
    - scale_choice: Nominal scale choice employed when computing the predictions. This could be an array of fixed scales or a string describing a dynamical scale choice like "dynamical:HT/2". This field is particularly relevant when RGE effects are folded into the prediction, see the description of metadata.scale above.
    - renormalization_scheme: details of the renormalization scheme used in the computation.
    - covariant_derivative_sign: sign convention used for the covariant derivative ("+" or "-").
    - gamma5_scheme: scheme used for $\gamma_5$ in dimensional regularization ("BMHV", "KKS", ...).
    - evanescent: details of the treatment of evanescent operators, e.g. a reference to the scheme used.
    - approximations: Any relevant approximations used, such as the use of the first leading-logarithmic approximation for RG evolution.
    - any other relevant settings specific to the tool or calculation.

  Examples:

  "tool": { "name": "EFTTool", "version": "1.0.0" }

  "tool": { "name": "MadGraph5_aMC@NLO", "version": "3.6.2", "settings": { "UFO": { "name": "SMEFTUFO", "version": "1.0.0", "webpage": "https://smeftufo.io" }, "PDF": { "name": "LHAPDF", "version": "6.5.5", "set": "331700" }, "perturbative_order": "NLOQCD", "scale_choice": [91.1876, 125.0] } }

  "tool": { "name": "AnalysisTool", "version": "1.0.0", "settings": { "cuts": { "pT_min": 20.0, "eta_max": 2.5 }, "code": "https://coderepository.com/analysis/example" } }

  "tool": { "name": "analytical calculation", "settings": { "gamma5_scheme": "KKS", "covariant_derivative_sign": "-", "renormalization_scheme": "MSbar (WCs), On-shell (mass, aS, aEW)", "evanescent": "https://doi.org/10.1016/0550-3213(90)90223-Z" } }

  "tool": { "name": "RGEtool", "version": "1.0.0", "settings": { "perturbative_order": "one-loop", "method": "evolution matrix formalism" } }
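For the normal-distribution form of an inputs entry described above, a consumer may want the covariance matrix rather than separate std and corr arrays. The sketch below assumes the usual relation $\Sigma_{ij} = \sigma_i \rho_{ij} \sigma_j$ and treats a missing corr as the identity matrix; the helper names `input_names` and `covariance` are hypothetical, and keys are assumed to be stringified tuples only when they start with an opening parenthesis.

```python
import ast

inputs = {
    "m1": 1.0,
    "('m2','m3')": {
        "mean": [1.0, 2.0],
        "std": [0.1, 0.1],
        "corr": [[1.0, 0.3], [0.3, 1.0]],
    },
}


def input_names(key):
    """Split an inputs key into a tuple of input names."""
    if key.startswith("("):
        # Stringified tuple such as "('m2','m3')".
        return tuple(ast.literal_eval(key))
    return (key,)


def covariance(entry):
    """Build the covariance matrix of a mean/std/corr entry, or None."""
    std = entry.get("std")
    if std is None:
        return None
    std = std if isinstance(std, list) else [std]
    n = len(std)
    # Assumption: a missing corr means uncorrelated inputs.
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    corr = entry.get("corr", identity)
    return [[std[i] * corr[i][j] * std[j] for j in range(n)] for i in range(n)]


for key, value in inputs.items():
    names = input_names(key)
    if isinstance(value, dict) and "mean" in value:
        print(names, covariance(value))
    else:
        print(names, "fixed value", value)
```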
Optional free-form metadata for documentation purposes. May include fields such as authorship, contact information, date, description of the observable, information identifying the associated correlation file (e.g. hash value or filename), or external references. The format is unrestricted, allowing any JSON-encodable content.
Example:
"misc": {
"author": "John Doe",
"contact": "john.doe@example.com",
"description": "Example dataset",
"URL": "johndoe.com/exampledata",
"correlation_file": "correlations.json",
"correlation_file_hash": "AB47BG3F11DA7DCAA5726008BAAFE176"
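}

The data field contains the numerical representation of all polynomial terms, which define the polynomials $P_k$ and the observables $O_m$. Each component of the coefficient vectors $\vec p_k$ and $\vec o_m$ is labelled by a monomial key built from the parameter names in metadata.parameters. For example, the key "('C1', 'C2')" corresponds to the monomial $C_1 C_2$. The keys follow these conventions: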
- Each key is a string representation of a Python-style tuple: a comma-separated array of strings enclosed in parentheses.
- The length of the tuple is determined by the polynomial degree $d$, as defined by the metadata field polynomial_degree (default value: $d=2$, i.e. quadratic polynomial, if polynomial_degree is omitted). The tuple length equals $d$, unless a real/imaginary tag is included (see below), in which case the length is $d+1$.
- The first $d$ entries in the tuple are model parameter names, as defined in the metadata field parameters. These names must be sorted alphabetically to ensure unique monomial keys (assuming the same sorting rules as Python's sort() method, which sorts alphabetically according to ASCII or Unicode value, where all upper-case letters come before all lower-case letters, and shorter strings take precedence). Empty strings '' are used to represent constant terms (equivalent to $1$) and to pad monomials of lower degree. For example, for a quadratic polynomial in real parameters (see below for how complex parameters are handled):
  - A constant $1$ is written as "('','')",
  - A linear term $C_1$ is written as "('', 'C1')",
  - A quadratic term $C_1 C_2$ is written as "('C1', 'C2')".
- To handle complex parameters, the tuple may optionally include a real/imaginary tag as its final element. This tag consists of R (real) and I (imaginary) characters, and its length must match the polynomial degree $d$. It indicates whether each parameter refers to its real or imaginary part. For example:
  - "('', 'C1', 'RI')" corresponds to $\mathrm{Im}(C_1)$;
  - "('C1', 'C2', 'IR')" corresponds to $\mathrm{Im}(C_1)\mathrm{Re}(C_2)$.
- If the real/imaginary tag is omitted, the parameters are assumed to be real. For example:
  - "('', 'C1')" corresponds to $\mathrm{Re}(C_1)$;
  - "('C1', 'C2')" corresponds to $\mathrm{Re}(C_1)\mathrm{Re}(C_2)$.
These conventions ensure a canonical and unambiguous representation of polynomial terms while offering flexibility in the naming of model parameters. Missing monomials are implicitly treated as having zero coefficients.
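The key conventions above can be sketched in a few lines of Python. The helpers `make_monomial_key` and `parse_monomial_key` are hypothetical names introduced for illustration; the sketch assumes $d \geq 2$, pads with empty strings tagged R, and leaves the relative order of identical parameter names with different tags as supplied by the caller, since that case is not covered by the examples above.

```python
import ast


def make_monomial_key(names, degree=2, parts=None):
    """Build a monomial key from parameter names.

    names: parameter names entering the monomial (at most `degree`).
    parts: optional string or list of 'R'/'I', one character per name.
    """
    names = list(names)
    if len(names) > degree:
        raise ValueError("more parameter names than the polynomial degree")
    tags = list(parts) if parts is not None else ["R"] * len(names)
    if len(tags) != len(names):
        raise ValueError("need one R/I tag per parameter name")
    # Pad with empty strings, then sort by name with Python's string ordering.
    pairs = [("", "R")] * (degree - len(names)) + list(zip(names, tags))
    pairs.sort(key=lambda pair: pair[0])
    items = [name for name, _ in pairs]
    if parts is not None:
        items.append("".join(tag for _, tag in pairs))
    return "(" + ", ".join(repr(item) for item in items) + ")"


def parse_monomial_key(key, degree=2):
    """Return a list of (parameter name, 'R' or 'I') pairs for a key."""
    items = ast.literal_eval(key)
    if len(items) == degree + 1:
        return list(zip(items[:degree], items[degree]))
    return [(name, "R") for name in items]


print(make_monomial_key([]))                 # ('', '')
print(make_monomial_key(["C2", "C1"]))       # ('C1', 'C2')
print(make_monomial_key(["C1"], parts="I"))  # ('', 'C1', 'RI')
print(parse_monomial_key("('C1', 'C2', 'IR')"))
```

Parsing with ast.literal_eval avoids executing the key as code, which matters when files come from third parties.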
The data field is a JSON object with the following subfields:
This field is required to express observables as functions of polynomials. It requires the simultaneous presence of metadata.polynomial_names and metadata.observable_expressions.
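An object representing the central values of the polynomial coefficients $\vec p_k$. Each key must be a monomial key as defined above. Each value must be an array of numbers whose length and order match those of metadata.polynomial_names.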
Example:
Specifying two polynomials:
"polynomial_central": {
"('', '', 'RR')": [1.0, 1.1],
"('', 'C1', 'RI')": [1.2, 1.3],
"('C1', 'C2', 'RR')": [0.8, 0.85],
"('C1', 'C2', 'RI')": [0.5, 0.55],
"('C1', 'C2', 'II')": [0.2, 0.25]
}

An object representing the central values of the observable coefficients $\vec o_m$ (cf. metadata.observable_expressions). Each key must be a monomial key as defined above. Each value must be an array of numbers whose length and order match those of metadata.observable_names.
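As an illustration of how these coefficients might be used, the sketch below sums coefficient times monomial for every key to obtain one prediction per observable. The function `evaluate_observables` and the parameter point are assumptions for illustration; the coefficients are those of the example for this field, and missing monomials contribute zero as stated above.

```python
import ast

observable_central = {
    "('', '')": [1.0, 1.1, 2.3],
    "('', 'C1')": [1.2, 1.3, 0.3],
    "('C1', 'C2')": [1.4, 1.5, 0.7],
    "('C1', 'C3')": [1.6, 1.7, 0.5],
}


def evaluate_observables(central, parameter_values, degree=2):
    """Evaluate all observables at the given (possibly complex) parameters."""
    n_obs = len(next(iter(central.values())))
    result = [0.0] * n_obs
    for key, coefficients in central.items():
        items = ast.literal_eval(key)
        names = items[:degree]
        # Without a real/imaginary tag all factors are real parts.
        tags = items[degree] if len(items) > degree else "R" * degree
        monomial = 1.0
        for name, tag in zip(names, tags):
            if name == "":
                continue  # empty strings stand for the constant 1
            value = complex(parameter_values[name])
            monomial *= value.real if tag == "R" else value.imag
        for m, coefficient in enumerate(coefficients):
            result[m] += coefficient * monomial
    return result


# Hypothetical parameter point.
parameter_values = {"C1": 0.5, "C2": 2.0, "C3": 0.0}
predictions = evaluate_observables(observable_central, parameter_values)
print(predictions)
```

For this parameter point the three predictions are approximately 3.0, 3.25 and 3.15.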
Example:
Specifying three observable predictions:
"observable_central": {
"('', '')": [1.0, 1.1, 2.3],
"('', 'C1')": [1.2, 1.3, 0.3],
"('C1', 'C2')": [1.4, 1.5, 0.7],
"('C1', 'C3')": [1.6, 1.7, 0.5]
}

An object representing the uncertainties on the observable coefficients $\vec o_m$ (cf. metadata.observable_expressions). The top-level fields specify the nature of the quoted uncertainty. In many cases there may only be a single top-level field, "total", but multiple fields can be used to specify a breakdown into several sources of uncertainty (e.g., statistical, scale, PDF, ...). To avoid mistakes, the names of the top-level fields must not have the format of a monomial key (i.e., stringified tuples as defined above). The value of each top-level field can either be an object or an array of floats. Objects must have the same structure as observable_central; arrays must have the same length as metadata.observable_names and specify the uncertainties of the parameter-independent terms only (i.e. the monomial key "('','')" for a quadratic polynomial).
Examples:
"observable_uncertainties": {
"total": {
"('', '')": [0.05, 0.06, 0.01],
"('', 'C1')": [0.1, 0.12, 0.01],
"('C1', 'C2')": [0.02, 0.03, 0.02],
"('C1', 'C3')": [0.05, 0.06, 0.01]
}
}

Specifying only the SM uncertainties:
"observable_uncertainties": {
"total": [0.05, 0.06, 0.01]
}

Specifying an uncertainty breakdown:
"observable_uncertainties": {
"MC_stats": {
"('', '')": [0.002, 0.0012, 0.001],
"('', 'C1')": [0.001, 0.0015, 0.0001]
},
"scale": {
"('', '')": [0.04, 0.05, 0.06],
"('', 'C1')": [0.1, 0.12, 0.01]
},
"PDF": {
"('', '')": [0.03, 0.04, 0.05],
"('', 'C1')": [0.02, 0.08, 0.01]
}
}

Specifying a breakdown for SM uncertainties only:
"observable_uncertainties": {
"MC_stats": [0.002, 0.0012, 0.001],
"scale": [0.04, 0.05, 0.06],
"PDF": [0.03, 0.04, 0.05]
}
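Since each top-level field may be either an object or an array, a reader may find it convenient to bring both forms to the object form. The sketch below does this under the assumption that an array refers to the parameter-independent term only; the helpers `is_monomial_key` and `normalise_uncertainties` are hypothetical names, and the check on top-level names simply tests whether the name parses as a Python tuple.

```python
import ast


def is_monomial_key(name):
    """Return True if the string parses as a Python-style tuple."""
    try:
        return isinstance(ast.literal_eval(name), tuple)
    except (ValueError, TypeError, SyntaxError):
        return False


def normalise_uncertainties(uncertainties, degree=2):
    """Map every uncertainty source to an object keyed by monomial keys."""
    # Key of the constant term, e.g. "('', '')" for a quadratic polynomial.
    constant_key = "(" + ", ".join(["''"] * degree) + ")"
    normalised = {}
    for source, value in uncertainties.items():
        if is_monomial_key(source):
            raise ValueError("top-level name looks like a monomial key: " + source)
        if isinstance(value, list):
            # Array form: uncertainties of the parameter-independent term.
            normalised[source] = {constant_key: list(value)}
        else:
            normalised[source] = dict(value)
    return normalised


observable_uncertainties = {
    "MC_stats": [0.002, 0.0012, 0.001],
    "scale": {"('', '')": [0.04, 0.05, 0.06], "('', 'C1')": [0.1, 0.12, 0.01]},
}
print(normalise_uncertainties(observable_uncertainties))
```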