Ticket #835 (closed clarification: fixed)

Opened 10 months ago

Last modified 6 months ago

how can i run pmc2obj.xslt from the command line?

Reported by: russ
Assigned to: russ
Priority: medium
Milestone:
Component: ambra
Version: 0.8.2.1
Keywords: ingest, stylesheet
Cc:
Blocking:
Blocked By:

Description

we have an article that's failing ingest for unknown reasons with pmc2obj.xslt

i'd like to run the transform by hand from the command line to get more info about what data is problematic in article.xml, but i get a ton of compilation errors running it with xsltproc

i unjarred WEB-INF/lib/article-util-0.8.2.1_rc1.jar and then ran xsltproc org/plos/article/util/pmc2obj.xslt pcbi.1000034.xml

Attachments

pcbi.1000034.xml (159.5 kB) - added by russ on 03/10/08 16:08:06.

Change History

03/10/08 16:08:06 changed by russ

  • attachment pcbi.1000034.xml added.

03/10/08 18:55:30 changed by ronald

You have to create an xml doc describing the zip you're ingesting and feed that to pmc2obj.xslt. Furthermore, pmc2obj.xslt is an xslt 2.0 stylesheet, so you need to use an xslt 2.0 processor which xsltproc is not (you can use the saxon jar, though: java -jar saxon-8.7.jar zip.xml pmc2obj.xslt).

Here's an example zip.xml (the crc, size, compressedSize, and time attributes are optional, but you must list all files in the zip by their proper name):

<?xml version="1.0"?>
<ZipInfo name="8.zip">
<ZipEntry name="pone.0000008.g001.tif" crc="1594473181" size="95192" compressedSize="87028" time="1158217234000"/>
<ZipEntry name="pone.0000008.g002.tif" crc="3425310675" size="9977736" compressedSize="4328311" time="1158344256000"/>
<ZipEntry name="pone.0000008.pdf" crc="2715934205" size="285091" compressedSize="263041" time="1158602538000"/>
<ZipEntry name="pone.0000008.s001.doc" crc="4126565061" size="57344" compressedSize="5554" time="1161000346000"/>
<ZipEntry name="pone.0000008.s002.doc" crc="1118072849" size="40960" compressedSize="4322" time="1161000346000"/>
<ZipEntry name="pone.0000008.t001.tif" crc="101364059" size="4657900" compressedSize="73624" time="1161012866000"/>
<ZipEntry name="pone.0000008.t002.tif" crc="2816243467" size="2190884" compressedSize="87337" time="1161012992000"/>
<ZipEntry name="pone.0000008.t003.tif" crc="3195878945" size="8058896" compressedSize="144230" time="1161013084000"/>
<ZipEntry name="pone.0000008.t004.tif" crc="1000231337" size="10617808" compressedSize="216749" time="1161013220000"/>
<ZipEntry name="pone.0000008.xml" crc="4019816705" size="93400" compressedSize="20896" time="1161011776000"/>
</ZipInfo>
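
Writing zip.xml by hand gets tedious for larger archives. The following is a minimal sketch, not part of the project, of how the listing could be generated from the zip itself with Python's standard zipfile module. The function name zip_info_xml and the script name used below are hypothetical. The optional time attribute is left out because zip entries store a local timestamp without a time zone, so converting it to epoch milliseconds would be a guess.

```python
import os
import sys
import zipfile
from xml.sax.saxutils import quoteattr


def zip_info_xml(zip_path):
    """Return a ZipInfo XML document listing every entry in zip_path.

    Hypothetical helper: the element and attribute names follow the
    example zip.xml above; the optional 'time' attribute is omitted.
    """
    lines = ['<?xml version="1.0"?>']
    with zipfile.ZipFile(zip_path) as zf:
        # The ZipInfo name is taken from the archive's file name (assumption).
        lines.append("<ZipInfo name=%s>" % quoteattr(os.path.basename(zip_path)))
        for info in zf.infolist():
            # Every entry must be listed under the exact name stored in the zip.
            lines.append(
                '<ZipEntry name=%s crc="%d" size="%d" compressedSize="%d"/>'
                % (quoteattr(info.filename), info.CRC,
                   info.file_size, info.compress_size)
            )
        lines.append("</ZipInfo>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Write the listing for the zip given on the command line to stdout.
    sys.stdout.write(zip_info_xml(sys.argv[1]))
```

Saved as, say, make_zip_xml.py, it could be run as "python make_zip_xml.py 8.zip > zip.xml", and the resulting zip.xml passed to saxon as described above.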

03/11/08 11:55:15 changed by jsuttor

(In [4953]) addresses #835: how can i run pmc2obj.xslt from the command line?

# validateArticleZip.groovy -help
usage: validateArticleZip [-c config-overrides.xml] [-t file:/tmp] -f file:/tmp/article.zip
  -f file:/tmp/article.zip   - article.zip
  -t file:/tmp               - location of tmp dir
  -c config-overrides.xml    - overrides /etc/topaz.xml
  -h,--help                  help (this message)

mimics actual webapp Ingest logic right up to start of transaction

  • same stylesheets
  • same transformer/settings/etc.
  • same Source/Result
  • etc.

03/11/08 11:56:54 changed by jsuttor

  • keywords set to ingest, stylesheet.
  • owner changed from jsuttor to russ.

assigned to Russ to ask for more or close.

03/12/08 11:40:17 changed by russ

(In [4962]) tmp dir path for transformed xml does not add trailing slash if missing. changing defaults and usage string. addresses #835
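
The commit message suggests the output path for the transformed xml was built by appending a file name to the tmp dir value, which breaks when the value (e.g. file:/tmp) has no trailing slash. The actual change is in the Groovy script and is not shown in this ticket; the Python snippet below is only an illustrative sketch of that kind of guard, and the function name join_tmp_dir is hypothetical.

```python
def join_tmp_dir(tmp_dir, file_name):
    """Append file_name to tmp_dir, adding a '/' only when it is missing.

    Illustrative only: not the code from the commit.
    """
    if not tmp_dir.endswith("/"):
        tmp_dir += "/"
    return tmp_dir + file_name
```

With this guard, "file:/tmp" and "file:/tmp/" yield the same output location.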

03/12/08 13:33:00 changed by russ

  • status changed from new to closed.
  • resolution set to fixed.

i discovered that the issues reported with the article in question were a red herring, so closing this.

tested validateArticleZip and it works! the next time we get an article with real pmc2obj.xslt issues, i'll test it to see if we get more verbose error messages from the command line...

04/02/08 16:20:10 changed by alex

(In [5265]) Merged revisions 4953 via svnmerge from http://gandalf.topazproject.org/svn/branches/0.8.2.2

........

r4953 | jsuttor | 2008-03-11 11:55:15 -0700 (Tue, 11 Mar 2008) | 16 lines

addresses #835: how can i run pmc2obj.xslt from the command line?

# validateArticleZip.groovy -help
usage: validateArticleZip [-c config-overrides.xml] [-t file:/tmp] -f file:/tmp/article.zip
  -f file:/tmp/article.zip   - article.zip
  -t file:/tmp               - location of tmp dir
  -c config-overrides.xml    - overrides /etc/topaz.xml
  -h,--help                  help (this message)

mimics actual webapp Ingest logic right up to start of transaction

  • same stylesheets
  • same transformer/settings/etc.
  • same Source/Result
  • etc.

........

04/02/08 16:27:11 changed by alex

(In [5271]) Merged revisions 4962 via svnmerge from http://gandalf.topazproject.org/svn/branches/0.8.2.2

........

r4962 | russ | 2008-03-12 11:40:17 -0700 (Wed, 12 Mar 2008) | 1 line

tmp dir path for transformed xml does not add trailing slash if missing. changing defaults and usage string. addresses #835

........

07/16/08 11:01:33 changed by

  • milestone deleted.

Milestone pubApp_0.8.2.2 deleted