from textwrap import dedent
import check_datapackage as cdp
exclusion_required = cdp.Exclusion(type="required")
exclusion_required
Exclusion(jsonpath=None, type='required')
You can configure how your datapackage.json is checked by using the Config class, such as by excluding certain checks or adding your own.
You can pass a Config object to check() to customise the checks done on your Data Package’s properties. The following configuration options are available:
version: The version of the Data Package standard to check against. Defaults to v2.
exclusions: A list of checks to exclude.
extensions: A list of extensions, which are additional checks that supplement those specified by the Data Package standard.
strict: Whether to include “SHOULD” checks in addition to “MUST” checks. Defaults to False.
The Data Package standard uses language from RFC 2119 to define its specifications. It uses “MUST” for required properties and “SHOULD” for properties that should be included but are not strictly required. We try to match this language in check-datapackage by using the terms “MUST” and “SHOULD”, though we also use “required” for “MUST” in our documentation.
You can exclude checks based on their type and the fields they apply to.
The Data Package standard defines a range of check types (e.g., required or pattern) and it is also possible to create your own. For example, to exclude checks flagging missing fields, you would exclude the required check by defining an Exclusion object with this type:
Exclusion(jsonpath=None, type='required')
To exclude checks of a specific field or fields, you can use a JSON path in the jsonpath attribute of an Exclusion object. For example, you can exclude all checks on the name field of the Data Package properties by writing:
Exclusion(jsonpath='$.name', type=None)
Or you can use the wildcard JSON path selector to exclude checks on the path field of all Data Resource properties:
Exclusion(jsonpath='$.resources[*].path', type=None)
The type and jsonpath arguments can also be combined, so we can ignore an Issue of a specific type on a specific field. For example, to exclude checks of whether the created field is in a specific format (type="format"), we can use:
Exclusion(jsonpath='$.created', type='format')
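To make the matching rule concrete, here is a minimal, library-independent sketch of how an exclusion with an optional type and an optional jsonpath could be compared against an issue. The Exclusion and Issue classes below are simplified stand-ins, not the check-datapackage classes, and path_matches() and is_excluded() are hypothetical helpers that only understand the [*] wildcard. The library's actual JSON path handling may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Exclusion:
    """Stand-in with the same two fields as the Exclusion output shown above."""

    jsonpath: Optional[str] = None
    type: Optional[str] = None


@dataclass(frozen=True)
class Issue:
    """Stand-in holding only the two fields needed for matching."""

    jsonpath: str
    type: str


def path_matches(pattern: str, concrete: str) -> bool:
    """Compare a concrete path to a pattern in which [*] means any array index."""
    # Escape the whole pattern, then turn the escaped wildcard into "any index".
    regex = re.escape(pattern).replace(re.escape("[*]"), r"\[\d+\]")
    return re.fullmatch(regex, concrete) is not None


def is_excluded(issue: Issue, exclusion: Exclusion) -> bool:
    """An unset (None) field places no restriction; set fields must all match."""
    type_ok = exclusion.type is None or exclusion.type == issue.type
    path_ok = exclusion.jsonpath is None or path_matches(
        exclusion.jsonpath, issue.jsonpath
    )
    return type_ok and path_ok


# Illustrative issues: two on the same field with different types, one on a resource.
issues = [
    Issue(jsonpath="$.created", type="format"),
    Issue(jsonpath="$.created", type="type"),
    Issue(jsonpath="$.resources[0].path", type="pattern"),
]
exclusions = [
    Exclusion(jsonpath="$.created", type="format"),  # one type on one field
    Exclusion(jsonpath="$.resources[*].path"),  # every type on every resource path
]
remaining = [
    issue
    for issue in issues
    if not any(is_excluded(issue, exclusion) for exclusion in exclusions)
]
print(remaining)
```

In this sketch only the type issue on $.created remains, because the combined exclusion is limited to format issues on that field, while the wildcard exclusion removes every issue on any resource path.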
To apply your exclusions when running check(), add them to the Config object passed to the check() function. First, let’s make an example that has three Issue items: the package name is a number, the created field is not a date, and the resource path doesn’t point to a data file (it isn’t a valid path). So we’ll modify the example package_properties from example_package_properties() to make these Issues appear:
{
    'name': 123,
    'title': 'Hibernation Physiology of the Woolly Dormouse: A Scoping Review.',
    'description': '\nThis scoping review explores the hibernation physiology of the\nwoolly dormouse, drawing on data collected over a 10-year period\nalong the Taurus Mountain range in Turkey.\n',
    'id': '123-abc-123',
    'created': 'not-a-date',
    'version': '1.0.0',
    'licenses': [{'name': 'odc-pddl'}],
    'resources': [
        {
            'name': 'woolly-dormice-2015',
            'title': 'Body fat percentage in the hibernating woolly dormouse',
            'path': '\\not/a/path',
            'schema': {
                'fields': [
                    {
                        'name': 'eye-colour',
                        'type': 'string',
                        'title': 'Woolly dormouse eye colour'
                    }
                ]
            }
        }
    ]
}
When we run check() on these properties, we get the three expected issues:
[
    Issue(
        jsonpath='$.created',
        type='format',
        message="'not-a-date' is not a 'date-time'",
        instance='not-a-date'
    ),
    Issue(
        jsonpath='$.name',
        type='type',
        message="123 is not of type 'string'",
        instance=123
    ),
    Issue(
        jsonpath='$.resources[0].path',
        type='pattern',
        message="'\\\\not/a/path' does not match '^((?=[^./~])(?!file:)((?!\\\\/\\\\.\\\\.\\\\/)(?!\\\\\\\\)(?!:\\\\/\\\\/).)*|(http|ftp)s?:\\\\/\\\\/.*)$'",
        instance='\\not/a/path'
    )
]
Now let’s exclude these Issues so that check() finds no issues by adding our exclusions to a Config object and giving it to check():
It is possible to add checks in addition to the ones defined in the Data Package standard. We call these additional checks extensions. There are currently two types of extensions supported: CustomCheck and RequiredCheck. You can add as many CustomChecks and RequiredChecks to your Config as you want to fit your needs.
Let’s say your organisation only accepts Data Packages licensed under MIT. You can express this CustomCheck as follows:
For more details on what each parameter means, see the CustomCheck documentation. Specific to this example, the type is setting the identifier of the check to only-mit and the jsonpath is indicating to only check the name property of each license in the licenses property of the Data Package.
To register your custom checks with the check() function, you add them to the Config object passed to the function:
[
    Issue(
        jsonpath='$.created',
        type='format',
        message="'not-a-date' is not a 'date-time'",
        instance='not-a-date'
    ),
    Issue(
        jsonpath='$.licenses[0].name',
        type='only-mit',
        message='\nData Packages may only be licensed under MIT. Please review\nthe licenses listed in the Data Package.\n',
        instance=None
    ),
    Issue(
        jsonpath='$.name',
        type='type',
        message="123 is not of type 'string'",
        instance=123
    ),
    Issue(
        jsonpath='$.resources[0].path',
        type='pattern',
        message="'\\\\not/a/path' does not match '^((?=[^./~])(?!file:)((?!\\\\/\\\\.\\\\.\\\\/)(?!\\\\\\\\)(?!:\\\\/\\\\/).)*|(http|ftp)s?:\\\\/\\\\/.*)$'",
        instance='\\not/a/path'
    )
]
We can see that the custom check was applied: in addition to the three issues from before, check() returned one only-mit issue flagging the first license attached to the Data Package.
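Conceptually, the only-mit check walks over every entry in licenses and flags each one whose name is not MIT. The sketch below reproduces that logic in plain Python without using the CustomCheck class. The function only_mit_issues() is hypothetical, issues are plain dictionaries rather than Issue objects, and the sketch assumes MIT is written as "mit" in the name field, which may not match how your packages spell it.

```python
from typing import Any

MESSAGE = (
    "Data Packages may only be licensed under MIT. Please review "
    "the licenses listed in the Data Package."
)


def only_mit_issues(properties: dict[str, Any]) -> list[dict[str, str]]:
    """Return one issue per license whose name is not 'mit' (assumed spelling)."""
    issues = []
    # Mirrors the path $.licenses[*].name: visit the name of every license.
    for index, license_entry in enumerate(properties.get("licenses", [])):
        if license_entry.get("name") != "mit":
            issues.append(
                {
                    "jsonpath": f"$.licenses[{index}].name",
                    "type": "only-mit",
                    "message": MESSAGE,
                }
            )
    return issues


# Illustrative properties: the first license is not MIT, the second is.
package_properties = {"licenses": [{"name": "odc-pddl"}, {"name": "mit"}]}
found = only_mit_issues(package_properties)
print([issue["jsonpath"] for issue in found])
```

Because the check is evaluated per license, a package with several licenses can produce several only-mit issues, each with its own index in the JSON path.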
With a RequiredCheck, you can also set specific properties in the datapackage.json file to be required, even when the Data Package standard doesn’t require them. For example, if you want to make the description field of the Data Package a required field, you can define a RequiredCheck like this:
See the RequiredCheck documentation for more details on its parameters.
To apply this RequiredCheck, add it to the Config object passed to check(), as shown below. We’ll create a package_properties without a description field to see the effect of this check:
[
    Issue(
        jsonpath='$.description',
        type='required',
        message="The 'description' field is required in the Data Package properties.",
        instance=None
    )
]
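The idea behind a required check on description can be sketched in a few lines of plain Python. This is only an illustration of the concept under simplifying assumptions: required_issues() is a hypothetical helper, it handles top-level fields only, it returns plain dictionaries instead of Issue objects, and the package name in the example is made up.

```python
from typing import Any


def required_issues(properties: dict[str, Any], field: str) -> list[dict[str, str]]:
    """Return a 'required' issue if a top-level field is absent."""
    if field in properties:
        return []
    return [
        {
            "jsonpath": f"$.{field}",
            "type": "required",
            "message": f"The '{field}' field is required in the Data Package properties.",
        }
    ]


# Illustrative properties with no description field.
package_properties = {
    "name": "woolly-dormice",
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
}
found = required_issues(package_properties, "description")
print(found)
```

Adding a description key to the dictionary makes the returned list empty, which is the behaviour you would expect from the check once the field is present.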
The Data Package standard includes properties that “MUST” and “SHOULD” be included and/or have a specific format in a compliant Data Package. By default, check() only includes “MUST” checks. To include “SHOULD” checks, set the strict argument to True in the Config object.
For example, the name field of a Data Package “SHOULD” not contain special characters. So running check() in strict mode (strict=True) on the following properties would output an Issue:
[
    Issue(
        jsonpath='$.name',
        type='pattern',
        message="'data-package!@#' does not match '^[a-z0-9._-]+$'",
        instance='data-package!@#'
    )
]
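The difference between the default mode and strict mode can be illustrated with a small stand-alone sketch for the name field. It reuses the pattern from the output above and assumes that the string type requirement is a “MUST” check, consistent with the type issue reported for a numeric name in the non-strict run earlier on this page. The function check_name() is hypothetical and is not how check-datapackage is implemented.

```python
import re
from typing import Any

# Pattern taken from the issue message shown above.
NAME_PATTERN = r"^[a-z0-9._-]+$"


def check_name(name: Any, strict: bool = False) -> list[dict[str, str]]:
    """Apply a MUST check always and a SHOULD check only when strict is True."""
    # MUST: the name has to be a string, regardless of mode.
    if not isinstance(name, str):
        return [
            {
                "jsonpath": "$.name",
                "type": "type",
                "message": f"{name!r} is not of type 'string'",
            }
        ]
    # SHOULD: the name should match the pattern; only enforced in strict mode.
    if strict and re.match(NAME_PATTERN, name) is None:
        return [
            {
                "jsonpath": "$.name",
                "type": "pattern",
                "message": f"{name!r} does not match {NAME_PATTERN!r}",
            }
        ]
    return []


default_issues = check_name("data-package!@#")
strict_issues = check_name("data-package!@#", strict=True)
print(default_issues)
print(strict_issues)
```

In this sketch the same name passes in the default mode and produces a pattern issue in strict mode, while a non-string name is flagged in both modes.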