Ticket #330 (closed clarification: fixed)

Opened 2 years ago

Last modified 1 year ago

xslt basic training for n00bs

Reported by: russ Assigned to: russ
Priority: medium Milestone:
Component: topaz Version:
Keywords: Cc:
Blocking: Blocked By:

Description (Last modified by russ)

please help us use XSLT to split the author name into given name and surname using whitespace.


Change History

(follow-up: ↓ 2 ) 04/26/07 17:27:10 changed by ronald

Do they require first and last, or given and surname? There's a difference...

(in reply to: ↑ 1 ; follow-up: ↓ 4 ) 05/01/07 13:04:49 changed by russ

Replying to ronald:

Do they require first and last, or given and surname? There's a difference...

the snippet i provided is taken from their examples, so yes, they are asking for given and surname.

however, i can handle transformations in XSLT. i'm hoping that we are currently keeping more than one field for author name (given-surname, or first-middle-last?) that we can output using the RSS tool and i can mix and match if necessary. i think it makes sense for the native output of the RSS tool to match the internal representation of data with XSLT to transform.

if it's the case that we have a single name field (say it isn't so!) then i suppose i can do it all in XSLT, but i'm hoping that's not the case...

05/01/07 13:06:58 changed by russ

  • summary changed from break up author name into first, last, suffix in rss tool to break up author name into give, surname, suffix in rss tool.

(in reply to: ↑ 2 ) 05/01/07 16:18:46 changed by ronald

Replying to russ:

however, i can handle transformations in XSLT. i'm hoping that we are currently keeping more than one field for author name (given-surname, or first-middle-last?) that we can output using the RSS tool and i can mix and match if necessary. i think it makes sense for the native output of the RSS tool to match the internal representation of data with XSLT to transform.

if it's the case that we have a single name field (say it isn't so!) then i suppose i can do it all in XSLT, but i'm hoping that's not the case...

Currently the author names are extracted and stored as a single string in RDF, and the rss tool does not grab the full articles but relies on what's in RDF. So all you get is the formatted name; i.e., things suck right now.

If you want to try to parse the name in XSLT, then I suggest the following, which splits the name at its last space (this may get it wrong in some cases, but will probably be right in 98% of the cases right now):

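<!-- Assumes $name holds the full formatted name, e.g. "John Q Smith" -->
<xsl:variable name="lastSpace">
  <xsl:call-template name="findLastSpace">
    <xsl:with-param name="str" select="$name"/>
    <xsl:with-param name="pos" select="0"/>
  </xsl:call-template>
</xsl:variable>
<!-- Everything before the last space; empty if there is no space -->
<xsl:variable name="givenName" select="substring($name, 1, $lastSpace - 1)"/>
<!-- Everything after the last space; the whole name if there is no space -->
<xsl:variable name="surname" select="substring($name, $lastSpace + 1)"/>

<!-- Returns the 1-based position of the last space in $str, or 0 if there is none.
     $pos is the position of the last space found so far.
     A trailing space is not handled, so pass normalize-space() output. -->
<xsl:template name="findLastSpace">
  <xsl:param name="str"/>
  <xsl:param name="pos"/>

  <xsl:variable name="rest" select="substring-after($str, ' ')"/>
  <xsl:choose>
    <xsl:when test="string-length($rest) &gt; 0">
      <!-- Skip past the next space and keep looking in the remainder -->
      <xsl:call-template name="findLastSpace">
        <xsl:with-param name="str" select="$rest"/>
        <xsl:with-param name="pos" select="$pos + string-length(substring-before($str, ' ')) + 1"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$pos"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>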
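To see the template above in context, here is a minimal sketch of a complete XSLT 1.0 stylesheet that wires it up. It assumes a hypothetical input in which each <author> element contains the full formatted name as text (not necessarily what the rss tool actually emits), copies everything else through with an identity template, and feeds normalize-space() output to findLastSpace so that doubled or trailing spaces do not throw off the split.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything else through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical input: each <author> holds the full formatted name as text -->
  <xsl:template match="author">
    <!-- Collapse runs of whitespace and trim the ends before splitting -->
    <xsl:variable name="name" select="normalize-space(.)"/>
    <xsl:variable name="lastSpace">
      <xsl:call-template name="findLastSpace">
        <xsl:with-param name="str" select="$name"/>
        <xsl:with-param name="pos" select="0"/>
      </xsl:call-template>
    </xsl:variable>
    <author>
      <!-- Everything before the last space (empty if there is no space) -->
      <given><xsl:value-of select="substring($name, 1, $lastSpace - 1)"/></given>
      <!-- Everything after the last space (the whole name if there is no space) -->
      <surname><xsl:value-of select="substring($name, $lastSpace + 1)"/></surname>
    </author>
  </xsl:template>

  <!-- Same recursion as above: 1-based position of the last space, or 0 -->
  <xsl:template name="findLastSpace">
    <xsl:param name="str"/>
    <xsl:param name="pos"/>
    <xsl:variable name="rest" select="substring-after($str, ' ')"/>
    <xsl:choose>
      <xsl:when test="string-length($rest) &gt; 0">
        <xsl:call-template name="findLastSpace">
          <xsl:with-param name="str" select="$rest"/>
          <xsl:with-param name="pos" select="$pos + string-length(substring-before($str, ' ')) + 1"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$pos"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With this sketch, an author of "John Q Smith" comes out as given name "John Q" and surname "Smith"; a single-word name such as "Plato" gets an empty given name and the whole string as the surname.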

Well, after writing this, I realized that you can actually use XSLT 2.0 here, in which case it becomes much simpler:

<xsl:variable name="givenName" select="replace($name, '(.*) .*', '$1')"/>
<xsl:variable name="surname" select="replace($name, '.* ', '')"/>
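Another XSLT 2.0 option, sketched here as a complete stylesheet under the same hypothetical input (each <author> element holding the full name as text), is to split the name with tokenize() and treat the last token as the surname. This swaps the two regex replace() calls for tokenize() and string-join(); like them, it needs an XSLT 2.0 processor and a stylesheet declared with version="2.0".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything else through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical input: each <author> holds the full formatted name as text -->
  <xsl:template match="author">
    <!-- Split on single spaces after collapsing runs of whitespace -->
    <xsl:variable name="parts" select="tokenize(normalize-space(.), ' ')"/>
    <author>
      <!-- Every token except the last is treated as the given name -->
      <given><xsl:value-of select="string-join($parts[position() lt last()], ' ')"/></given>
      <!-- The last token is treated as the surname -->
      <surname><xsl:value-of select="$parts[last()]"/></surname>
    </author>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the two replace() calls, a single-word name here yields an empty given name rather than repeating the whole name in both fields, which matches the behavior of the XSLT 1.0 template.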

05/03/07 11:42:41 changed by russ

  • owner changed from somebody to russ.

awesome. thanks for the regex suggestions. i was thinking something along the same lines, and i'll take this ticket.

so, i know that in the article XML we have <author><given></given><surname></surname></author>

are we ignoring that information on ingest? should we be thinking about changing this before we do the big re-ingest of every article? (is that still even happening?)

05/21/07 14:20:10 changed by russ

it looks like 2.0 is not supported by the RSS tool (or perhaps by my operating system? maybe there's a way to install 2.0 compatible libraries?)

Error on line 1 column 20 of file:///var/local/xslt/PLoSONE.Google.xsl:
  SXXP0003: Error reported by XML parser: XML version "2.0" is not supported, only XML 1.0 is supported.

05/29/07 17:07:23 changed by amit

Looks like you entered 2.0 for the *version* of the XML file. Look at line 1 of your XSLT: that should be 1.0 for XML (2.0 is for the XSLT standard).

06/04/07 13:49:38 changed by russ

yes, at that time i was using 2.0. then i reverted to 1.0, and it works fine, although i now get another interesting warning:

[ruman@plosfail01 ~]$ /usr/local/topaz/bin/rss -baseURL http://plostopaz01.localdomain -startDate 2006-12-01T00:00:00 -endDate 2006-12-31T23:59:59 -xslt /var/local/xslt/PLoSONE.Google.xsl -out 200612.xml
log4j:WARN No appenders could be found for logger (org.apache.axis.i18n.ProjectResourceBundle).
log4j:WARN Please initialize the log4j system properly.
Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor

the real question, which is totally off topic for this ticket and it's all my fault, is what do i need to do to get 2.0 working on my server?! 2.0 sucks much less than 1.0. please ignore, i'll figure it out...

06/04/07 13:51:36 changed by russ

  • owner changed from russ to jsuttor.
  • priority changed from high to medium.

the original summary and description are still valid. the entire discussion so far on this ticket, regarding work arounds and xslt 1.0 vs 2.0 is irrelevant i think :)

06/04/07 22:54:47 changed by amit

The replace() code Ronald gave will only work in an XSLT 2.0 processor, I believe. The warning you are getting says that your stylesheet declares XSLT version 1.0, which is different from the XML version.

06/04/07 23:02:05 changed by amit

This is what the two requisite lines should look like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Note: the first line declares the XML version, which should be 1.0; the second declares the XSLT version, which is 2.0.
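Putting those two lines together with Ronald's replace() expressions gives a minimal XSLT 2.0 stylesheet, sketched below. The input shape (a document containing <author> elements with the full name as text) and the <authors>/<author>/<given>/<surname> output elements are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The declaration above is the XML version, which stays 1.0. -->
<!-- The version attribute below is the XSLT version, 2.0, so replace() is available. -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input: any document whose <author> elements hold full names -->
  <xsl:template match="/">
    <authors>
      <xsl:for-each select="//author">
        <!-- Collapse doubled spaces so the greedy regexes split cleanly -->
        <xsl:variable name="name" select="normalize-space(.)"/>
        <author>
          <!-- Greedy (.*) keeps everything up to the last space -->
          <given><xsl:value-of select="replace($name, '(.*) .*', '$1')"/></given>
          <!-- Greedy '.* ' removes everything through the last space -->
          <surname><xsl:value-of select="replace($name, '.* ', '')"/></surname>
        </author>
      </xsl:for-each>
    </authors>
  </xsl:template>
</xsl:stylesheet>
```

Because replace() returns its input unchanged when the pattern does not match, a single-word name comes out as both the given name and the surname; the normalize-space() call keeps a doubled space from leaving a trailing space on the given name.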

06/04/07 23:02:28 changed by amit

  • owner changed from jsuttor to russ.

06/05/07 11:54:46 changed by russ

  • status changed from new to assigned.
  • type changed from enhancement to clarification.
  • description changed.
  • summary changed from break up author name into give, surname, suffix in rss tool to xslt basic training for n00bs.

thanks amit, i'm a n00b. i'll give it a try.

so, there's a real issue this ticket was supposed to address: having the given name and surname in the article xml feed.

since i've had no luck getting this ticket to address that, i'm going to update summary and description, and will open a new ticket for given name, surname, and suffix.

06/19/07 15:26:36 changed by russ

  • status changed from assigned to closed.
  • resolution set to fixed.

changing the xslt version fixes the error.

08/07/07 16:25:51 changed by

  • milestone deleted.

Milestone Bugs deleted