[saxon] what causes the difference
Rolf Schumacher
2017-04-07 10:20:45 UTC
I am using Saxon-HE-9.7.0-15.jar and am writing a transformation to
anonymize the input.

As a first step I was collecting all distinct words in the input. I was
not sure whether using the mode attribute on templates speeds things up,
and while testing that I came across a result that puzzles me.

I boiled it down to these transformation rules:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="fn"
    exclude-result-prefixes="xs fn">

  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:variable name="allwords" as="xs:string+">
      <xsl:apply-templates select="*" mode="lookup"/>
    </xsl:variable>
    <xsl:variable name="words" select="distinct-values($allwords)"/>
    <root>
      <xsl:attribute name="allwords" select="count($allwords)"/>
      <xsl:attribute name="words" select="count($words)"/>
    </root>
  </xsl:template>

  <xsl:template match="*">
    <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" />
    <xsl:apply-templates select="*" />
  </xsl:template>

  <xsl:template match="*" mode="lookup">
    <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" />
    <xsl:apply-templates select="*" />
  </xsl:template>

</xsl:stylesheet>

For a certain input (~30 MB) this produces the result:

<?xml version="1.0" encoding="UTF-8"?><root allwords="696831" words="7617"/>

However, when I comment out the second template (the match="*" template
without a mode), I get a different result from the very same input:

<?xml version="1.0" encoding="UTF-8"?><root allwords="531375" words="7620"/>

To be clear, here are the transformation rules that produce the second
result:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="fn"
    exclude-result-prefixes="xs fn">

  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:variable name="allwords" as="xs:string+">
      <xsl:apply-templates select="*" mode="lookup"/>
    </xsl:variable>
    <xsl:variable name="words" select="distinct-values($allwords)"/>
    <root>
      <xsl:attribute name="allwords" select="count($allwords)"/>
      <xsl:attribute name="words" select="count($words)"/>
    </root>
  </xsl:template>

  <!-- <xsl:template match="*"> -->
  <!--   <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" /> -->
  <!--   <xsl:apply-templates select="*" /> -->
  <!-- </xsl:template> -->

  <xsl:template match="*" mode="lookup">
    <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" />
    <xsl:apply-templates select="*" />
  </xsl:template>

</xsl:stylesheet>

Question: what semantic difference between the two stylesheets could
explain the difference in the results?

Kind Regards


Rolf
Rolf Schumacher
2017-04-07 11:26:02 UTC
Sorry, the moment I sent the mail I saw the difference: mode="lookup" is
missing on the xsl:apply-templates inside the last template.

---
Viele Grüße, Best Regards

Rolf Schumacher
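
For readers following the thread, the effect comes down to how modes are selected. In XSLT 2.0 an xsl:apply-templates without a mode attribute uses the default (unnamed) mode, not the mode of the template that contains it. The annotated copy of the last template below is a sketch of that dispatch. The comments are editorial and not part of the original post.

```xml
<!-- Selected by <xsl:apply-templates select="*" mode="lookup"/> in match="/",
     so it handles the document element only. -->
<xsl:template match="*" mode="lookup">
  <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" />
  <!-- No mode attribute: the child elements are processed in the default
       (unnamed) mode, so this template is not used for them. -->
  <xsl:apply-templates select="*" />
</xsl:template>
```

In the first stylesheet the child elements are therefore handled by the unmoded match="*" template. With that template commented out, the built-in template rules take over: they descend through the elements and copy each text node unchanged, without tokenizing. That would explain why the two runs produce different counts.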
Martin Honnen
2017-04-07 12:56:44 UTC
Post by Rolf Schumacher
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="fn"
    exclude-result-prefixes="xs fn">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <xsl:variable name="allwords" as="xs:string+">
      <xsl:apply-templates select="*" mode="lookup"/>
    </xsl:variable>
    <xsl:variable name="words" select="distinct-values($allwords)"/>
    <root>
      <xsl:attribute name="allwords" select="count($allwords)"/>
      <xsl:attribute name="words" select="count($words)"/>
    </root>
  </xsl:template>
  <xsl:template match="*">
    <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" />
    <xsl:apply-templates select="*" />
  </xsl:template>
  <xsl:template match="*" mode="lookup">
    <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" />
    <xsl:apply-templates select="*" />
I think here you want to continue to use the mode 'lookup', no? So
change that to
<xsl:apply-templates select="*" mode="#current"/>
or
<xsl:apply-templates select="*" mode="lookup"/>
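
A sketch of the last template with the first of the two suggestions applied, assuming the rest of the stylesheet stays as posted:

```xml
<xsl:template match="*" mode="lookup">
  <xsl:value-of select="tokenize(text(),'[^A-Za-z0-9äöüßÄÖÜ]+')" />
  <!-- mode="#current" keeps processing the child elements in the mode this
       template was invoked in, which here is "lookup". -->
  <xsl:apply-templates select="*" mode="#current"/>
</xsl:template>
```

With this change every element is processed in mode lookup. Assuming the transformation starts at the document node in the default mode, the unmoded match="*" template is then no longer invoked, so commenting it out should not change the result.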
