Include "simple data types" from NIST Metaschema #107

Open
ronaldtse opened this issue Oct 19, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@ronaldtse
Contributor

We need to implement the "simple data types" defined by NIST Metaschema in lutaml-model:

The simple data types are provided here:

The documentation for these types is at:

This task involves implementing the datatypes, adding specs (just use the samples provided on the page) and documenting them in the README.

We already cover most of these data types, but we need to add a number of them, such as IPv4 and IPv6 (we used to have them but removed them).

@ronaldtse ronaldtse added the enhancement New feature or request label Oct 19, 2024
@ronaldtse
Contributor Author

This task is needed by:

@opoudjis
Contributor

Note that XSD does not define these as primitives, but with regexes. If I were architecting this (which I'm not), I would not be implementing these as primitives in lutaml-model, but rather supporting the way XSD defines types...

@ronaldtse
Contributor Author

When you say "XSD", do you mean "the metaschema XSD" or "the XSD standard"?

@opoudjis
Contributor

I mean the XSD Standard. The NIST metaschema uses:

<xs:simpleType name="IPV4AddressDatatype">
		<xs:annotation>
			<xs:documentation>An Internet Protocol version 4 address represented using
				dotted-quad syntax as defined in section 3.2 of RFC2673.</xs:documentation>
		</xs:annotation>
		<xs:restriction base="StringDatatype">
			<xs:pattern value="((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])" />
		</xs:restriction>
	</xs:simpleType>

	<xs:simpleType name="StringDatatype">
		<xs:annotation>
			<xs:documentation>A non-empty string of Unicode characters with leading and trailing whitespace
				disallowed. Whitespace is: U+9, U+10, U+32 or [ \n\t]+</xs:documentation>
		</xs:annotation>
		<xs:restriction base="xs:string">
			<xs:annotation>
				<xs:documentation>The 'string' datatype restricts the XSD type by prohibiting leading 
					and trailing whitespace, and something (not only whitespace) is required.</xs:documentation>
			</xs:annotation>
			<xs:whiteSpace value="preserve" />
			<xs:pattern value="\S(.*\S)?">
				<xs:annotation>
					<xs:documentation>This pattern ensures that leading and trailing whitespace is
						disallowed. This helps to even the user experience between implementations
						related to whitespace.</xs:documentation>
				</xs:annotation>
			</xs:pattern>
		</xs:restriction>
	</xs:simpleType>

so it defines this IPV4AddressDatatype type as a restriction ultimately on xs:string. Maybe Lutaml-model should define the primitives of the XSD standard. (I'm not convinced, but maybe.) But it should certainly not be defining composite types of arbitrary schemas as primitives. It too should be realising XSD restrictions.
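One way to read this suggestion: a derived type is its base type plus extra facets, and a value is valid only if it satisfies every pattern in the chain. Below is a minimal plain-Ruby sketch of that idea. It is not lutaml-model's API; `RestrictedString`, `restrict` and `valid?` are hypothetical names, and the dot in the IPv4 pattern is escaped here, unlike in the quoted schema.

```ruby
# Hypothetical sketch: a string type defined as a chain of pattern
# restrictions, mirroring xs:restriction. Not lutaml-model API.
class RestrictedString
  attr_reader :name, :base, :pattern

  def initialize(name, base: nil, pattern: nil)
    @name = name
    @base = base
    # XSD patterns match the whole value, so anchor with \A and \z.
    @pattern = pattern && /\A(?:#{pattern})\z/
  end

  # Derive a new type that adds one more pattern on top of this one.
  def restrict(name, pattern:)
    self.class.new(name, base: self, pattern: pattern)
  end

  # A value is valid when it satisfies this pattern and every ancestor's.
  def valid?(value)
    return false unless value.is_a?(String)
    return false if pattern && !pattern.match?(value)

    base.nil? || base.valid?(value)
  end
end

OCTET = "25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]"

STRING_DATATYPE = RestrictedString.new("StringDatatype", pattern: '\S(.*\S)?')
IPV4_DATATYPE = STRING_DATATYPE.restrict(
  "IPV4AddressDatatype",
  pattern: "((#{OCTET})\\.){3}(#{OCTET})",
)
```

With this shape, `IPV4_DATATYPE.valid?("192.168.0.1")` checks the IPv4 pattern first and then falls through to the `StringDatatype` pattern, so neither type needs to be hardcoded as a primitive.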

Making lutaml-model support XSD, with all its baroque and by now legacy functionality, is going to make our life very uncomfortable. We should not be compounding it by hardcoding types from XSD instances.

@ronaldtse
Contributor Author

@opoudjis Metaschema data types are actually defined in the Metaschema language itself, so the XSD here is informative, being one implementation language of the Metaschema. I would not say that Metaschema is an "XSD instance", since it does not depend on the existence of XSD.

The point for incorporating Metaschema simple data types into lutaml-model is an alignment between two "information modeling languages".

Defining a regex pattern restriction is acceptable for lutaml-model models, given that we already have enum. However, patterns like <xs:pattern value="\S(.*\S)?"> are just for stripping whitespace, which is very weird.
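For reference, here is a small plain-Ruby check of what that pattern does, assuming XSD's implicit whole-value matching is reproduced with `\A` and `\z`. Strictly speaking it rejects padded or empty values; it does not strip anything. The constant and helper names are hypothetical.

```ruby
# XSD's "\S(.*\S)?" with explicit anchors (XSD patterns match the whole value).
NON_PADDED = /\A\S(.*\S)?\z/

# Returns the value unchanged when it matches, otherwise nil; nothing is stripped.
def check_non_padded(value)
  NON_PADDED.match?(value) ? value : nil
end

check_non_padded("lutaml model")  # accepted: inner whitespace is fine
check_non_padded(" lutaml")       # rejected: leading whitespace
check_non_padded("lutaml ")       # rejected: trailing whitespace
check_non_padded("")              # rejected: something is required
```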

@HassanAkbar
Member

@ronaldtse We are now supporting regex patterns for strings in lutaml-model (added in #158). Is there anything else that needs to be done for this ticket?

@ronaldtse
Contributor Author

@suleman-uzair do we support xs:pattern this way in the XSD functionality?

@suleman-uzair
Member

@ronaldtse, we support xs:simpleType by generating a separate class that inherits from Lutaml::Model::Type::Value. The currently supported restrictions are the following:

  1. Lengths
    a. Length
    b. MinLength
    c. MaxLength
    d. MinInclusive
    e. MaxInclusive
    f. MinExclusive
    g. MaxExclusive
  2. Enumerations
  3. Patterns

The generated class from an xs:simpleType will have custom methods and conditions based on the provided restrictions.
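To illustrate how such restrictions could translate into checks, here is a plain-Ruby sketch. It is not the code the XSD functionality generates; `FacetError` and `check_facets` are hypothetical names, and only a subset of the restrictions listed above is shown.

```ruby
# Hypothetical illustration of facet checks; not generated lutaml-model code.
class FacetError < StandardError; end

# Validates +value+ against a hash of XSD-style facets and returns it unchanged.
def check_facets(value, facets)
  if facets[:min_length] && value.length < facets[:min_length]
    raise FacetError, "shorter than #{facets[:min_length]}"
  end
  if facets[:max_length] && value.length > facets[:max_length]
    raise FacetError, "longer than #{facets[:max_length]}"
  end
  if facets[:min_inclusive] && value < facets[:min_inclusive]
    raise FacetError, "below #{facets[:min_inclusive]}"
  end
  if facets[:max_exclusive] && value >= facets[:max_exclusive]
    raise FacetError, "not below #{facets[:max_exclusive]}"
  end
  if facets[:enumeration] && !facets[:enumeration].include?(value)
    raise FacetError, "not one of #{facets[:enumeration].join(', ')}"
  end
  if facets[:pattern] && !facets[:pattern].match?(value)
    raise FacetError, "does not match #{facets[:pattern].inspect}"
  end

  value
end

check_facets("abc", min_length: 1, max_length: 5, pattern: /\A[a-z]+\z/)
check_facets(7, min_inclusive: 0, max_exclusive: 10)
check_facets("red", enumeration: %w[red green blue])
```

Each facet is checked independently, so a generated class would only need to carry the facets its xs:restriction actually declares.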

Regarding "do we support xs:pattern this way in the XSD functionality?": yes, we do support xs:pattern.

Below are the classes generated (along with some other classes) from the simpleType examples provided by @opoudjis in the above comment.
This is an example class to complete the given schema.

# frozen_string_literal: true
require "lutaml/model"

class StringDatatype < Lutaml::Model::Type::String
  def self.cast(value)
    return nil if value.nil?

    value = super(value)
    value
  end
end

# frozen_string_literal: true
require "lutaml/model"
require_relative 'string_datatype'

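class IPV4AddressDatatype < StringDatatype
  def self.cast(value)
    return nil if value.nil?

    value = super(value)
    # XSD patterns match the whole value, so the regex is anchored with \A and \z.
    # The "." separators are left unescaped, exactly as in the source schema.
    pattern = %r{\A(((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]))\z}
    raise_pattern_error(value, pattern) unless value.match?(pattern)
    value
  end

  def self.raise_pattern_error(value, pattern)
    raise Lutaml::Model::Type::InvalidValueError, "The value #{value} does not match the required pattern: #{pattern}"
  end
  # A bare `private` does not apply to `def self.` methods, so hide the helper explicitly.
  private_class_method :raise_pattern_error
end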

# frozen_string_literal: true
require "lutaml/model"

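class WhiteSpaces < Lutaml::Model::Type::String
  def self.cast(value)
    return nil if value.nil?

    value = super(value)
    # Anchored with \A and \z so leading or trailing whitespace is actually rejected.
    pattern = %r{\A(\S(.*\S)?)\z}
    raise_pattern_error(value, pattern) unless value.match?(pattern)
    value
  end

  def self.raise_pattern_error(value, pattern)
    raise Lutaml::Model::Type::InvalidValueError, "The value #{value} does not match the required pattern: #{pattern}"
  end
  # A bare `private` does not apply to `def self.` methods, so hide the helper explicitly.
  private_class_method :raise_pattern_error
end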

Let me know if we require any changes in this implementation.


4 participants