Add support for dfdl:lengthKind="endOfParent"
#1652
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -269,6 +269,26 @@ trait Term | |
| .getOrElse(false) | ||
| } | ||
|
|
||
| final lazy val immediatelyEnclosingElementParent: Option[ElementBase] = { | ||
| val p = optLexicalParent.flatMap { | ||
|
Member
Note that I think lexical parents do not extend past global decls, so if a global decl has endOfParent then I'm not sure we will correctly check EOP restrictions for anything that references that decl. I'm wondering if the checks need to go down instead of up? For example, maybe an element needs to check whether it has properties that would disallow children with lengthKind EOP and, if so, check whether any children are EOP? Or check if any immediate children have EOP, and if so, check whether they are compatible? |
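The "check downward" idea can be sketched with a small, self-contained model. Everything here (`Elem`, `Group`, `checkEop`, the simplified `LengthKind` objects) is hypothetical and is not Daffodil's DSOM; group references are assumed to be already inlined. The parent inspects its own element children, so a shared ("global") declaration is checked once per use site instead of through an upward lexical lookup.

```scala
object EopTopDownSketch {
  // Hypothetical stand-ins for DSOM terms; these are NOT Daffodil classes.
  sealed trait LengthKind
  case object Explicit extends LengthKind
  case object Implicit extends LengthKind
  case object Delimited extends LengthKind
  case object EndOfParent extends LengthKind

  sealed trait Term
  final case class Elem(name: String, lengthKind: LengthKind, children: List[Term] = Nil)
      extends Term
  // A sequence or choice; group references are assumed to be already inlined.
  final case class Group(children: List[Term]) extends Term

  // Element children whose content lies directly inside `ts`: descend through
  // model groups, but stop at elements, which own their own length.
  def elementChildren(ts: List[Term]): List[Elem] = ts.flatMap {
    case e: Elem   => List(e)
    case Group(cs) => elementChildren(cs)
  }

  // The parent checks its own children, then recurses into each child element.
  def checkEop(parent: Elem): List[String] = {
    val kids = elementChildren(parent.children)
    val lastIdx = kids.length - 1
    val errors = kids.zipWithIndex.flatMap {
      case (k, i) if k.lengthKind == EndOfParent =>
        val badParent =
          if (parent.lengthKind == Implicit || parent.lengthKind == Delimited)
            List(s"${k.name}: endOfParent child of ${parent.name}, which is implicit/delimited")
          else Nil
        val notLast =
          if (i != lastIdx) List(s"${k.name}: endOfParent but not last in ${parent.name}")
          else Nil
        badParent ++ notLast
      case _ => Nil
    }
    errors ++ kids.flatMap(checkEop)
  }

  // Usage: the same EOP declaration referenced from two different parents.
  val field = Elem("field", EndOfParent)
  val ok = Elem("ok", Explicit, List(Group(List(Elem("a", Explicit), field))))
  val bad = Elem("bad", Implicit, List(Group(List(field, Elem("b", Explicit)))))
}
```

Here `checkEop(ok)` reports nothing, while `checkEop(bad)` reports two problems for the same `field` declaration: an implicit-length parent and a following sibling.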
||
| case e: ElementBase => Some(e) | ||
| case ge: GlobalElementDecl => Some(ge.asRoot) | ||
| case s: SequenceTermBase => s.immediatelyEnclosingElementParent | ||
| case c: ChoiceTermBase => c.immediatelyEnclosingElementParent | ||
|
Member
Do we need to return the choice in some cases? It looks like the logic for EOP sometimes cares about the choice, so I'm not sure we can bypass it. |
||
| case ct: ComplexTypeBase => { | ||
| ct.optLexicalParent.flatMap { | ||
| case e: ElementBase => Some(e) | ||
| case ge: GlobalElementDecl => Some(ge.asRoot) | ||
| case _ => { | ||
| None | ||
| } | ||
| } | ||
| } | ||
| case _ => None | ||
| } | ||
| p | ||
| } | ||
|
|
||
| final lazy val immediatelyEnclosingGroupDef: Option[GroupDefLike] = { | ||
| optLexicalParent.flatMap { lexicalParent => | ||
| val res: Option[GroupDefLike] = lexicalParent match { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -456,7 +456,11 @@ trait AlignedMixin extends GrammarMixin { self: Term => | |
| } | ||
| case LengthKind.Delimited => encodingLengthApprox | ||
| case LengthKind.Pattern => encodingLengthApprox | ||
| case LengthKind.EndOfParent => LengthMultipleOf(1) // NYI | ||
| case LengthKind.EndOfParent => | ||
| eb.immediatelyEnclosingElementParent match { | ||
| case Some(parent) => parent.elementSpecifiedLengthApprox | ||
|
Member
I don't think this is quite right. The length of this element isn't the same as the parent length; it's whatever is left over of the parent after the previous siblings. So this element's length is kind of That said, I wonder if we don't really need to get this element's approx length perfect, because no elements come after it, and the endingAlignApprox of the parent won't need this specific is going to be known since it has an explicit length? Maybe this just becomes LengthMultipleOf 1 or 8 depending on length units? This might need some more thought...
Contributor
Author
Hmm, length units don't apply for endOfParent, so we might be fine leaving it as LengthMultipleOf(1).
Contributor
Author
But then, if its parent has siblings, does our not approximating this length have an impact?
Member
Yeah, I think you're right that we can just say the length approx of an EOP element is a multiple of 1. In the case of a parent of an EOP element having siblings, I think the parent will calculate contentEndAlignment to figure out where it ends, and from that what alignment is needed for its sibling. And we have this code that handles that:
case eb: ElementBase => {
val res = if (eb.isComplexType && eb.lengthKind == LengthKind.Implicit) {
eb.complexType.group.contentEndAlignment
} else {
// simple type or complex type with specified length
contentStartAlignment + elementSpecifiedLengthApprox
}
res
}
I don't think implicit is allowed to be a parent of an EOP child, so we would fall into the And I think So I think we are okay and will correctly detect any needed alignment? |
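To make the alignment argument concrete, here is a simplified, hypothetical model. The class names echo Daffodil's `AlignmentMultipleOf` and `LengthMultipleOf`, but the gcd-based arithmetic below is an assumption for illustration, not Daffodil's implementation. It shows that an explicit-length parent's end alignment comes from its own specified length, so approximating the EOP child as `LengthMultipleOf(1)` only affects positions after the child, and inside the parent there are none.

```scala
object AlignApproxSketch {
  // Simplified, hypothetical model of approximate alignment/length in bits;
  // the names echo Daffodil's but this is NOT its implementation.
  final case class LengthMultipleOf(mBits: Long)

  final case class AlignmentMultipleOf(nBits: Long) {
    // Starting at a multiple of n bits and advancing by a multiple of m bits,
    // the end position is only guaranteed to be a multiple of gcd(n, m).
    def +(len: LengthMultipleOf): AlignmentMultipleOf =
      AlignmentMultipleOf(gcd(nBits, len.mBits))
  }

  @annotation.tailrec
  def gcd(a: Long, b: Long): Long = if (b == 0) a else gcd(b, a % b)

  // An explicit-length parent ends at contentStart + its own specified length,
  // independent of how its EOP child's length is approximated.
  def explicitParentEnd(
      contentStart: AlignmentMultipleOf,
      parentSpecifiedLength: LengthMultipleOf
  ): AlignmentMultipleOf = contentStart + parentSpecifiedLength

  // The EOP child's approximation: nothing inside the parent follows it.
  val eopChildApprox: LengthMultipleOf = LengthMultipleOf(1)
}
```

For example, a byte-aligned parent with an explicit length of 10 bytes (80 bits) still ends byte-aligned, whereas adding the child's multiple-of-1 approximation to the same start would only guarantee bit alignment.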
||
| case _ => LengthMultipleOf(1) | ||
| } | ||
| // If an element is lengthKind="prefixed", the element's length is the length | ||
| // of the value of the prefix element, which can't be known till runtime | ||
| case LengthKind.Prefixed => LengthMultipleOf(1) // NYI (see DAFFODIL-3066) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,10 +19,14 @@ package org.apache.daffodil.core.grammar | |
|
|
||
| import java.lang.Long as JLong | ||
|
|
||
| import org.apache.daffodil.core.dsom.ChoiceTermBase | ||
| import org.apache.daffodil.core.dsom.ElementBase | ||
| import org.apache.daffodil.core.dsom.ExpressionCompilers | ||
| import org.apache.daffodil.core.dsom.InitiatedTerminatedMixin | ||
| import org.apache.daffodil.core.dsom.ModelGroup | ||
| import org.apache.daffodil.core.dsom.PrefixLengthQuasiElementDecl | ||
| import org.apache.daffodil.core.dsom.Root | ||
| import org.apache.daffodil.core.dsom.SequenceTermBase | ||
| import org.apache.daffodil.core.grammar.primitives.* | ||
| import org.apache.daffodil.core.runtime1.ElementBaseRuntime1Mixin | ||
| import org.apache.daffodil.lib.exceptions.Assert | ||
|
|
@@ -52,6 +56,7 @@ trait ElementBaseGrammarMixin | |
|
|
||
| requiredEvaluationsIfActivated(checkPrefixedLengthElementDecl) | ||
| requiredEvaluationsIfActivated(checkDelimitedLengthEVDP) | ||
| requiredEvaluationsIfActivated(checkEndOfParentElem) | ||
|
|
||
| private val context = this | ||
|
|
||
|
|
@@ -252,6 +257,134 @@ trait ElementBaseGrammarMixin | |
| } | ||
| final lazy val prefixedLengthBody = prefixedLengthElementDecl.parsedValue | ||
|
|
||
| final lazy val parentEffectiveLengthUnits: LengthUnits = | ||
| immediatelyEnclosingElementParent match { | ||
| case Some(parent: ElementBase) => { | ||
| parent.lengthKind match { | ||
| case LengthKind.Explicit | LengthKind.Prefixed => parent.lengthUnits | ||
| case LengthKind.Pattern => LengthUnits.Characters | ||
| case _ | ||
| if parent.isInstanceOf[ChoiceTermBase] && (parent | ||
|
Member
I think a choice might not necessarily be a direct parent? It's not uncommon for a choice branch to be a sequence (or nested sequences). I think we need to look up the parent chain until we find a choice or an element. |
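The suggested lookup can be sketched as a loop that skips any number of enclosing sequences and stops at the first choice or element. The `ElementNode`/`ChoiceNode`/`SequenceNode` types and `nearestChoiceOrElement` are hypothetical and only model a lexical-parent chain; they are not Daffodil's DSOM.

```scala
object NearestEnclosingSketch {
  // Hypothetical lexical-parent chain; NOT Daffodil's DSOM.
  sealed trait Node { def parent: Option[Node] }
  final case class ElementNode(name: String, parent: Option[Node]) extends Node
  final case class ChoiceNode(name: String, parent: Option[Node]) extends Node
  final case class SequenceNode(name: String, parent: Option[Node]) extends Node

  // Skip over any number of (possibly nested) sequences and return the first
  // enclosing choice or element, if there is one.
  @annotation.tailrec
  def nearestChoiceOrElement(start: Option[Node]): Option[Node] = start match {
    case Some(s: SequenceNode) => nearestChoiceOrElement(s.parent)
    case other                 => other
  }

  // Usage: an EOP element inside nested sequences that form a choice branch.
  val root = ElementNode("root", None)
  val choice = ChoiceNode("c", Some(root))
  val branch = SequenceNode("branch", Some(choice))
  val inner = SequenceNode("inner", Some(branch))
  val eop = ElementNode("eop", Some(inner))
}
```

Starting from `eop.parent`, the walk passes `inner` and `branch` and returns the choice, whose choiceLengthKind would then decide the effective length units.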
||
| .asInstanceOf[ChoiceTermBase] | ||
| .choiceLengthKind == ChoiceLengthKind.Explicit) => | ||
| LengthUnits.Bytes | ||
| case LengthKind.EndOfParent => parent.parentEffectiveLengthUnits | ||
| case _ => | ||
| Assert.invariantFailed( | ||
| s"Could not figure effective length unit of parents of ${context}" | ||
| ) | ||
| } | ||
| } | ||
| case None if this.isInstanceOf[Root] => LengthUnits.Characters | ||
| case _ => | ||
| Assert.invariantFailed( | ||
|
Member
I feel like this invariant might break if we have something like a global element decl with a child with EOP. That EOP element will want to reach up to find where it's used, but won't be able to find a parent because it only looks lexically.
Contributor
Author
Do you mean a child that's an element reference? Do we have any other way to look at a parent besides optLexicalParent? Even immediatelyEnclosingGroupDef uses optLexicalParent.
Contributor
Author
Added the test below, and it works as expected where the lengthKind is EOP.
Member
Maybe I'm just forgetting how optLexicalParent works. Refreshing my memory, it looks like the But we also have an "ElementRef" DSOM object that represents the local element ref to that global definition. And the "ElementRef" is what is in the DSOM tree. So as long as these functions are run within the context of ElementRef, maybe these invariants hold. But I think things might get tricky if we try to recursively look up multiple parents? For example, if we are in the context of the Similarly, say we have a group like this:
<group name="foo">
<sequence>
<element name="someEopElement" ... />
</sequence>
</group>
Asking for the optLexicalParent of And that makes sense, because multiple different things could reference the group, so we don't really know which parent to examine. So I'm not really sure how recursively looking up parents can work. The recursion essentially ends at the global definition. It works fine if the thing you are looking for is within the scope of your global element (e.g. an ElementRef is always inside an element), but I think it breaks once groups get involved. Unless those are handled some other way, and maybe I'm just looking at the wrong spot when I'm inspecting values of optLexicalParent? |
||
| s"Could not figure effective length unit of parents of ${context}" | ||
| ) | ||
| } | ||
| final lazy val checkEndOfParentElem: Unit = { | ||
| if (lengthKind != LengthKind.EndOfParent) () | ||
| else { | ||
| schemaDefinitionWhen( | ||
| hasTerminator, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but specifies a dfdl:terminator.", | ||
| context | ||
|
Member
I don't think we need to include context in the error string. I believe the error context is captured and output as part of the SDE. |
||
| ) | ||
| schemaDefinitionWhen( | ||
| trailingSkip != 0, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but specifies a non-zero dfdl:trailingSkip.", | ||
| context | ||
| ) | ||
| schemaDefinitionWhen( | ||
| maxOccurs > 1, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but specifies a maxOccurs greater than 1.", | ||
| context | ||
| ) | ||
| schemaDefinitionWhen( | ||
| nextSibling.isDefined && nextSibling.get.isInstanceOf[ModelGroup], | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but a model group is defined between this element and the end of the enclosing component", | ||
| context | ||
| ) | ||
|
Member
I think nextSibling is lexical, so I don't think it will detect errors if this is a global element decl that is referenced in a manner where it has siblings? Or maybe the context will be the element reference rather than the global decl, so it will work? Do we have tests for this?
Contributor
Author
Yeah, the context ends up being the ElementRef, which seems to accurately give the next sibling. I added tests with and without siblings to confirm.
Member
What if we have an example like this:
<group name="foo">
<sequence>
<element name="eopElement" ... />
</sequence>
</group>
<element name="bar">
<complexType>
<sequence>
<group ref="foo" />
<element name="laterSibling" ... />
</sequence>
</complexType>
</element>
In this case there is effectively a sibling after eopElement, but I'm not sure we would detect that, since I'm not sure optLexicalParent sees past the global group def. Though maybe we have logic to allow group refs to have parents? I seem to remember something where we copy groups, but I might be thinking of something else. |
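One way to picture the concern: a sibling check evaluated inside the group definition sees no next sibling, while the same check evaluated at the use site, with group references inlined, does. The model below (`El`, `GroupRef`, `flatten`, `effectiveNextSibling`) is hypothetical and only illustrates the use-site view; it does not describe how Daffodil actually resolves group refs.

```scala
object GroupRefSiblingSketch {
  // Hypothetical model: sequence children are elements or references to
  // named global groups. NOT Daffodil's DSOM.
  sealed trait Child
  final case class El(name: String) extends Child
  final case class GroupRef(name: String) extends Child

  type GroupDefs = Map[String, List[Child]]

  // Inline group references so sibling relationships reflect the use site
  // rather than the lexical group definition.
  def flatten(children: List[Child], defs: GroupDefs): List[El] = children.flatMap {
    case e: El        => List(e)
    case GroupRef(gn) => flatten(defs(gn), defs)
  }

  // Lexically, eopElement has no next sibling inside group "foo"...
  val defs: GroupDefs = Map("foo" -> List(El("eopElement")))
  // ...but inside "bar" the group ref is followed by laterSibling.
  val barSequence: List[Child] = List(GroupRef("foo"), El("laterSibling"))

  def effectiveNextSibling(name: String, children: List[Child], defs: GroupDefs): Option[El] = {
    val flat = flatten(children, defs)
    flat.indexWhere(_.name == name) match {
      case -1 => None
      case i  => flat.lift(i + 1)
    }
  }
}
```

Checked against `defs("foo")` alone the element looks last, but checked against `barSequence` it has `laterSibling` after it, which is the case an EOP sibling check would need to reject.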
||
| schemaDefinitionWhen( | ||
| nextSibling.isDefined && nextSibling.get.isRepresented, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but a represented element is defined between this element and the end of the enclosing component", | ||
| context | ||
| ) | ||
| immediatelyEnclosingElementParent match { | ||
| case Some(parent: ElementBase) => | ||
| parent.lengthKind match { | ||
| case LengthKind.Implicit | LengthKind.Delimited => | ||
| schemaDefinitionError( | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but its parent is an element with dfdl:lengthKind 'implicit' or 'delimited'.", | ||
| context | ||
| ) | ||
| case _ => // do nothing | ||
| } | ||
| case _ => // do nothing | ||
| } | ||
| schemaDefinitionWhen( | ||
| representation == Representation.Text && knownEncodingWidthInBits != 8 && parentEffectiveLengthUnits != LengthUnits.Characters, | ||
|
Member
I did some testing and changed the CSV schema to this: And I got this stack trace: I think the issue is that the default encoding used by CSV is UTF-8, which seems to cause problems here.
Contributor
Author
I'm not able to replicate this. And I think CSV's default encoding is ASCII?
Member
It looks like it uses the https://github.com/DFDLSchemas/CSV/blob/master/src/csv-base-format.dfdl.xsd#L48 That's a relatively new change to CSV; maybe you're using an older version?
Contributor
Author
Looks like the issue is specifically due to the use of the dfdl:encoding variable; replacing it with UTF-8 works as expected. Will investigate.
<xs:element name="file" dfdl:lengthKind="explicit" dfdl:length="10" dfdl:terminator="%NL;" dfdl:encoding="{$dfdl:encoding}">
<xs:complexType>
<xs:sequence>
<xs:element name="field" type="xs:string" dfdl:lengthKind="endOfParent" dfdl:encoding="{$dfdl:encoding}" />
</xs:sequence>
</xs:complexType>
</xs:element> |
||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but the element has text representation, does not have a single-byte character set encoding, and the effective length units of the parent is not 'characters'.", | ||
| context | ||
| ) | ||
|
|
||
| immediatelyEnclosingModelGroup match { | ||
| case Some(s: SequenceTermBase) => { | ||
| schemaDefinitionWhen( | ||
| s.separatorPosition == SeparatorPosition.Postfix, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but is in a sequence with dfdl:separatorPosition defined as 'postfix'.", | ||
| context | ||
| ) | ||
| schemaDefinitionWhen( | ||
| s.sequenceKind != SequenceKind.Ordered, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but is in a sequence with dfdl:sequenceKind defined as 'unordered'.", | ||
| context | ||
| ) | ||
| schemaDefinitionWhen( | ||
| s.hasTerminator, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but is in a sequence with a dfdl:terminator.", | ||
| context | ||
| ) | ||
| schemaDefinitionWhen( | ||
| s.elementChildren.exists(e => e.floating == YesNo.Yes), | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but is in a sequence with elements defining dfdl:floating='yes'.", | ||
| context | ||
| ) | ||
| schemaDefinitionWhen( | ||
| s.trailingSkip != 0, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but is in a sequence with a non-zero dfdl:trailingSkip.", | ||
| context | ||
| ) | ||
| } | ||
| case Some(c: ChoiceTermBase) if c.choiceLengthKind == ChoiceLengthKind.Implicit => { | ||
| schemaDefinitionWhen( | ||
| c.hasTerminator, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but is in a choice with a dfdl:terminator.", | ||
| context | ||
| ) | ||
| schemaDefinitionWhen( | ||
| c.trailingSkip != 0, | ||
| "%s is specified as dfdl:lengthKind=\"endOfParent\", but is in a choice with a non-zero dfdl:trailingSkip.", | ||
| context | ||
| ) | ||
| } | ||
| case _ => // do nothing | ||
| } | ||
|
|
||
| if (isSimpleType) { | ||
| schemaDefinitionUnless( | ||
| (primType eq PrimType.String) | ||
| || (representation == Representation.Text) | ||
| || (primType eq PrimType.HexBinary) | ||
| || (representation == Representation.Binary | ||
| && Seq(BinaryNumberRep.Packed, BinaryNumberRep.Bcd, BinaryNumberRep.Ibm4690Packed) | ||
| .contains(binaryNumberRep)), | ||
| "%s is a simple type specified as dfdl:lengthKind=\"endOfParent\", but isn't a string type, doesn't have text representation, isn't a hexbinary type, or doesn't have binary representation with packed decimal representation.", | ||
| context | ||
| ) | ||
| } | ||
|
|
||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Quite tricky when we add padding or fill | ||
| * | ||
|
|
@@ -648,10 +781,7 @@ trait ElementBaseGrammarMixin | |
| Assert.invariant(pt == PrimType.String) | ||
| StringOfSpecifiedLength(this) | ||
| } | ||
| case LengthKind.EndOfParent if isComplexType => | ||
| notYetImplemented("lengthKind='endOfParent' for complex type") | ||
| case LengthKind.EndOfParent => | ||
| notYetImplemented("lengthKind='endOfParent' for simple type") | ||
| case LengthKind.EndOfParent => StringOfSpecifiedLength(this) | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -667,6 +797,10 @@ trait ElementBaseGrammarMixin | |
| new HexBinaryLengthPrefixed(this) | ||
| } | ||
|
|
||
| private lazy val hexBinaryLengthEndOfParent = prod("hexBinaryLengthEndOfParent") { | ||
| new HexBinaryLengthEndOfParent(this) | ||
| } | ||
|
|
||
| private lazy val hexBinaryValue = prod("hexBinaryValue") { | ||
| schemaDefinitionWhen( | ||
| lengthUnits == LengthUnits.Characters, | ||
|
|
@@ -678,10 +812,7 @@ trait ElementBaseGrammarMixin | |
| case LengthKind.Delimited => hexBinaryDelimitedEndOfData | ||
| case LengthKind.Pattern => hexBinaryLengthPattern | ||
| case LengthKind.Prefixed => hexBinaryLengthPrefixed | ||
| case LengthKind.EndOfParent if isComplexType => | ||
| notYetImplemented("lengthKind='endOfParent' for complex type") | ||
| case LengthKind.EndOfParent => | ||
| notYetImplemented("lengthKind='endOfParent' for simple type") | ||
| case LengthKind.EndOfParent => hexBinaryLengthEndOfParent | ||
| } | ||
| } | ||
|
|
||
|
|
@@ -1280,10 +1411,7 @@ trait ElementBaseGrammarMixin | |
| case LengthKind.Implicit => | ||
| LiteralValueNilOfSpecifiedLength(this) | ||
| case LengthKind.Prefixed => LiteralValueNilOfSpecifiedLength(this) | ||
| case LengthKind.EndOfParent if isComplexType => | ||
| notYetImplemented("lengthKind='endOfParent' for complex type") | ||
| case LengthKind.EndOfParent => | ||
| notYetImplemented("lengthKind='endOfParent' for simple type") | ||
| case LengthKind.EndOfParent => LiteralValueNilOfSpecifiedLength(this) | ||
| } | ||
| } | ||
| case NilKind.LiteralCharacter => { | ||
|
|
@@ -1396,10 +1524,7 @@ trait ElementBaseGrammarMixin | |
| case LengthKind.Implicit | ||
| if isSimpleType && impliedRepresentation == Representation.Binary => | ||
| new SpecifiedLengthImplicit(this, body, implicitBinaryLengthInBits) | ||
| case LengthKind.EndOfParent if isComplexType => | ||
| notYetImplemented("lengthKind='endOfParent' for complex type") | ||
| case LengthKind.EndOfParent => | ||
| notYetImplemented("lengthKind='endOfParent' for simple type") | ||
| case LengthKind.EndOfParent => new SpecifiedLengthEndOfParent(this, body) | ||
|
Member
I think this is only needed for complex types? My thinking is that simple types with lengthKind EOP should have a parent parser (either a complex type or an explicit-length choice) that already set the bit limit via one of these specified length parsers. So the bit limit has already been set correctly, and we don't need another parser to do that. But this is needed for complex types with EOP, since they need to skip any bits up to their parent's bit limit that their children might not have consumed. So I think this wants to be:
case LengthKind.EndOfParent if isComplexType => new SpecifiedLengthEndOfParent(this, body)
And then
Contributor
Author
Hmm, what if the simpleType is a root element with lengthKind EOP (where the user intends it to go to the end of the data stream)?
Member
Good point. So maybe it instead wants to be something like this?
case LengthKind.EndOfParent => {
Assert.invariant(isComplexType || isRoot)
new SpecifiedLengthEndOfParent(this, body)
} |
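The decision being discussed, whether to wrap an EOP body in a "skip to the parent's bit limit" combinator, can be modeled as a tiny grammar choice. `Body`, `SpecifiedLengthEndOfParent` and `eopGram` below are hypothetical stand-ins, not Daffodil's `prod`/`Gram` API; the sketch encodes the reasoning from this thread that complex types and the root need the wrapper, while a simple, non-root EOP element already runs inside a parent that set the bit limit.

```scala
object EopCombinatorSketch {
  // Hypothetical grammar nodes; NOT Daffodil's Gram/prod API.
  sealed trait Gram
  final case class Body(name: String) extends Gram
  // Parses `body`, then skips any unconsumed bits up to the parent's bit limit.
  final case class SpecifiedLengthEndOfParent(body: Gram) extends Gram

  // Complex types may leave bits unconsumed, and a root has no enclosing
  // parser to set a limit, so both get the wrapper; a simple, non-root EOP
  // element is assumed to already run inside a parent that set the bit limit.
  def eopGram(isComplexType: Boolean, isRoot: Boolean, body: Gram): Gram =
    if (isComplexType || isRoot) SpecifiedLengthEndOfParent(body)
    else body
}
```

Returning the bare body in the simple, non-root case is the alternative to asserting that the case is unreachable; which one fits better depends on whether that path can actually be reached in the grammar.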
||
| case LengthKind.Delimited | LengthKind.Implicit => | ||
| Assert.impossibleCase( | ||
| "Delimited and ComplexType Implicit cases should not be reached" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -72,6 +72,20 @@ final class SpecifiedLengthExplicitImplicitUnparser( | |
| } | ||
| } | ||
|
|
||
| final class SpecifiedLengthEndOfParentUnparser( | ||
| eUnparser: Unparser, | ||
| erd: ElementRuntimeData | ||
| ) extends CombinatorUnparser(erd) { | ||
|
|
||
| override def runtimeDependencies = Vector() | ||
|
|
||
| override def childProcessors = Vector(eUnparser) | ||
|
|
||
| override final def unparse(state: UState): Unit = { | ||
| eUnparser.unparse1(state) | ||
| } | ||
| } | ||
|
Member
If this unparser doesn't do anything, I would suggest we shouldn't even have it, and the SpecifiedLengthEndOfParent primitive should just return eUnparser. It looks like the pattern parser does something similar, for example. |
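A minimal sketch of why the pass-through wrapper can be dropped, using a hypothetical `Unparser` trait (not Daffodil's `Unparser`/`UState` API): a combinator whose only action is to delegate produces exactly the same output as its child, so the primitive can hand back the child unparser directly.

```scala
object UnparserElisionSketch {
  // Hypothetical minimal unparser interface; NOT Daffodil's Unparser/UState API.
  trait Unparser { def unparse(out: StringBuilder): Unit }

  final class Literal(s: String) extends Unparser {
    def unparse(out: StringBuilder): Unit = { out.append(s); () }
  }

  // A combinator that only delegates adds a layer but no behavior...
  final class PassThrough(inner: Unparser) extends Unparser {
    def unparse(out: StringBuilder): Unit = inner.unparse(out)
  }

  // ...so the EOP primitive could return the child unparser itself.
  def endOfParentUnparser(body: Unparser): Unparser = body

  def render(u: Unparser): String = {
    val sb = new StringBuilder
    u.unparse(sb)
    sb.toString
  }
}
```

Besides removing a class, eliding the wrapper also removes one level from the processor tree that debugging and trace output would otherwise show.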
||
|
|
||
| /** | ||
| * This trait is to be used with prefixed length unparsers where the length | ||
| * must be calculated based on the content length of the data. This means the | ||
|
|
||
This comment is a bit confusing to me. This function is supposed to return whether or not we statically know that this element must have non-zero length. I imagine we can rarely know that statically for endOfParent elements, so I think returning false here is correct. But the comment kind of makes it sound like the length is always zero, which contradicts that.
Reading this portion of the spec (which this comment copies), I think the spec is talking about the runtime evaluation of whether or not a field is zero length. I believe the spec is just saying that an endOfParent element has a zero-length representation if it is already at the parent's end (i.e. bitLimit == bitPosition). Since this is more about runtime, I'm not sure this comment belongs here, and removing it might avoid that confusion.
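The distinction between the static and runtime questions can be sketched as follows. `State`, `staticallyKnownNonZeroLength` and `zeroLengthAtRuntime` are hypothetical names (not Daffodil's `PState` API or the function under review); the runtime check encodes the reading above that an EOP element is zero length exactly when the current bit position already equals the parent's bit limit.

```scala
object EopZeroLengthSketch {
  // Hypothetical parse-state snapshot; NOT Daffodil's PState API.
  // Positions are 0-based bit offsets; bitLimit0b is the parent's end, if known.
  final case class State(bitPos0b: Long, bitLimit0b: Option[Long])

  // Compile-time question: can we prove the element is never zero length?
  // For endOfParent the answer is generally "no", hence false.
  val staticallyKnownNonZeroLength: Boolean = false

  // Runtime question: the EOP element has a zero-length representation exactly
  // when the parse position has already reached the parent's bit limit.
  // With no limit (e.g. a root running to end of data) this sketch answers
  // false; a real check would have to consult the data stream instead.
  def zeroLengthAtRuntime(s: State): Boolean =
    s.bitLimit0b.contains(s.bitPos0b)
}
```

For instance, a parent ending at bit 80 with the parse position also at bit 80 yields a zero-length EOP element, while a position of 72 leaves one byte for it.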