Ticket #652 (closed clarification: fixed)

Opened 1 year ago

Last modified 6 months ago

ingest errors on 0.8 for pone.0000949

Reported by: russ Assigned to: russ
Priority: critical Milestone:
Component: ambra Version: 0.8
Keywords: Cc:
Blocking: Blocked By:

Description

Error ingesting: pone.0000949.zip - org.topazproject.otm.OtmException?: Too many values for 'private int org.plos.models.DublinCore?.copyrightYear'

2007-09-20 12:53:16,240 ERROR IngestArchivesAction(PLoSONE)> Error ingesting articles: pone.0000949.zip [TP-Processor3 org.plos.ad\
min.action.IngestArchivesAction]
org.plos.admin.service.ImageResizeException: org.topazproject.otm.OtmException: Too many values for 'private int org.plos.models.D\
ublinCore.copyrightYear'
...
Caused by: org.topazproject.otm.OtmException: Too many values for 'private int org.plos.models.DublinCore.copyrightYear'
    topazproject.otm.mapping.FunctionalMapper.set(FunctionalMapper.java:78)

this article was left in an inconsistent state - there's a publishable link, but the zip is still in the ingestion queue

i only see one instance of <copyright-year> in the xml and it doesn't look suspicious to me:

<copyright-year>2007</copyright-year>

you can find these in /var/spool/plosone on plosstage01

Dependency Graph

Change History

09/20/07 13:18:18 changed by amit

  • owner changed from jsuttor to pradeep.
  • milestone set to 0.8.

I think the database is messed up. Pradeep?

09/20/07 13:21:41 changed by russ

hmm. nothing unusual happened to the mulgara db on stage - it's never been backed up and restored like the live database - just a vanilla upgrade/migrate/create journal (of course, i'm sure that could still cause corruption :)

09/20/07 13:24:03 changed by amit

It could be:

a) The '()' problem we have seen (ITQL grammer problem) b) Article delete left some stuff back and a reingest caused creation of duplication predicate.

or something else.

(follow-up: ↓ 5 ) 09/20/07 13:25:31 changed by amit

Pradeep, can you check that article delete is getting rid of some of the new predicates?

(in reply to: ↑ 4 ) 09/20/07 14:01:45 changed by pradeep

  • owner changed from pradeep to russ.

Replying to amit:

Pradeep, can you check that article delete is getting rid of some of the new predicates?

It should. This is all done generically by OTM.

The problem may have been at insert/update time. Somehow an additional triple was inserted. It quite likely came from the xml or the mulgara database already contained a statement with a different value. The only way to recover now is to:

1. Delete the statement manually from mulgara using an Itql command interpreter 2. Delete the article from admin window 3. Re ingest.

If none of those work, I'll try ingesting the same article here and see what happens.

09/20/07 14:38:13 changed by russ

it's possible that these articles were ingested then deleted on stage prior to 0.8 upgrade (i asked bonnie to clear out the publishable queue).

i'll delete and then see what i can find left over in mulgara to delete...

09/20/07 14:38:54 changed by russ

mmm. same error on delete. will delete manually...

09/20/07 14:51:37 changed by amit

I just ingested a Pone article and deleted it using the article_delete script. I took a mulgara dump before and after to do a comparison and there were no differences. While a small test, it at least means the article_delete is (in general) not leaving a predicate hanging around. Not that I was expecting this (it is the same code), but just in case this was a corner condition. I susect it has to do with rollbacks during ingestion not cleaning up properly as that is not using OTM. Can you provide the two articles to us on gandalf so we can test them?

09/20/07 14:54:43 changed by amit

Russ, you will get the same error now with any OTM call to that article. You have to bypass OTM to get rid of this and this means manually. Please do give the two offending articles though.

(follow-up: ↓ 12 ) 09/20/07 14:58:08 changed by russ

sorry, i can't seem to figure out the correct itql statement to delete the left over bits...can someone give me a clue?

itql> delete select $s $p $o from <local:///topazproject#ri> where $s $p $o and $s $p <info:doi/10.1371/journal.pone.0000949>; Error running query: org.mulgara.itql.parser.ParserException?: [0,0] expecting: 'from'

09/20/07 14:58:47 changed by russ

you can grab the article from /var/spool/plosone on plosstage01...i have my hands too full with launch issues today to provide excellent customer service, sorry...

(in reply to: ↑ 10 ) 09/20/07 15:24:33 changed by pradeep

Replying to russ:

sorry, i can't seem to figure out the correct itql statement to delete the left over bits...can someone give me a clue? itql> delete select $s $p $o from <local:///topazproject#ri> where $s $p $o and $s $p <info:doi/10.1371/journal.pone.0000949>; Error running query: org.mulgara.itql.parser.ParserException?: [0,0] expecting: 'from'

delete select $s $p $o from <local:///topazproject#ri> where $s $p $o and
$s <mulgara:is> <info:doi/10.1371/journal.pone.0000949> and 
$p <mulgara:is> <http://purl.org/dc/terms/dateCopyrighted> from <local:///topazproject#ri>;

Try first a select to confirm that multiple values exist:

select $o from <local:///topazproject#ri> where 
<info:doi/10.1371/journal.pone.0000949> <http://purl.org/dc/terms/dateCopyrighted> $o;

(follow-up: ↓ 14 ) 09/20/07 15:31:21 changed by amit

I managed to ingest 949 successfully on black.topazproject.org using the latest version from head. I don't think there is anything wrong with this, so it must have been an earlier ingested article.

(in reply to: ↑ 13 ; follow-up: ↓ 16 ) 09/20/07 15:36:19 changed by pradeep

Replying to amit:

I managed to ingest 949 successfully on black.topazproject.org using the latest version from head. I don't think there is anything wrong with this, so it must have been an earlier ingested article.

moody is running 0.8. Can you try it over there too?

(follow-up: ↓ 22 ) 09/20/07 15:42:11 changed by russ

even with plosone stopped and mulgara freshly restarted i get the error "cannot get transaction for commit" with this delete query:

[root@plosstage01 topaz]# bin/runitql
Itql Interpreter. Run ".help" for more information.
itql> delete
itql> select $s $p $o from <local:///topazproject#ri>
itql> where $s $p $o
itql> and $s <mulgara:is> <info:doi/10.1371/journal.pone.0000949>
itql> and $p <mulgara:is> <http://purl.org/dc/terms/dateCopyrighted>
itql> from <local:///topazproject#ri>;
Error running query:
javax.xml.soap.SOAPException: ItqlInterpreter error - org.mulgara.query.QueryException: Unable to modify local:///topazproject#ri
Caused by: (QueryException) Unable to modify local:///topazproject#ri
Caused by: (IllegalStateException) Cannot get Transaction for commit

just the select part works fine, and confirms that there are two entries...

mulgara log is pretty identical:

2007-09-20 15:39:41,081 DEBUG ItqlInterpreterBeanWrapper> executeQueryToString on org.mulgara.itql.ItqlInterpreterBean@3852eef4 failed. [http-9091-Processor25 org.topazproject.mulgara.ws.ItqlInterpreterBeanWrapper]
javax.xml.soap.SOAPException: ItqlInterpreter error - org.mulgara.query.QueryException: Unable to modify local:///topazproject#ri
Caused by: (QueryException) Unable to modify local:///topazproject#ri
Caused by: (IllegalStateException) Cannot get Transaction for commit
   at org.mulgara.itql.ItqlInterpreterBean.execute(ItqlInterpreterBean.java:410)
   at org.mulgara.itql.ItqlInterpreterBean.executeQueryToString(ItqlInterpreterBean.java:444)
   at org.topazproject.mulgara.ws.ItqlInterpreterBeanWrapper.executeQueryToString(ItqlInterpreterBeanWrapper.java:176)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:397)
   at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:186)
   at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:323)
   at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
   at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
   at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
   at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:454)
   at org.apache.axis.server.AxisServer.invoke(AxisServer.java:281)
   at org.apache.axis.transport.http.AxisServlet.doPost(AxisServlet.java:699)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
   at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:327)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
   at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
   at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
   at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
   at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
   at java.lang.Thread.run(Thread.java:595)

(in reply to: ↑ 14 ) 09/20/07 15:43:55 changed by pradeep

Replying to pradeep:

Replying to amit:

I managed to ingest 949 successfully on black.topazproject.org using the latest version from head. I don't think there is anything wrong with this, so it must have been an earlier ingested article.

Tried it on moody.topazproject.org running 0.8 and worked fine.

09/20/07 15:46:43 changed by amit

I tried ingesting 943 and it failed as expected. What was even nicer was the the rollback cleaned everything up, so no spurious predicates were left hanging around. Time to take a dump of the mulgara database on stage.

09/20/07 16:03:11 changed by russ

you think i should backup and remove the offending entries manually?

09/20/07 16:05:18 changed by amit

No, I don't understand why delete is failing. Ronald might know better, but can you take a backup of mulgara and put it on stage or gandalf?

09/20/07 16:19:24 changed by russ

i uploaded stage-serverbackup.gz to ~russ on gandalf.

09/20/07 16:19:50 changed by russ

  • owner changed from russ to ronald.

(in reply to: ↑ 15 ) 09/20/07 18:07:06 changed by pradeep

Replying to russ:

even with plosone stopped and mulgara freshly restarted i get the error "cannot get transaction for commit" with this delete query:

Hmm. Looks like the runItql tool is not meant for updates since it is executing all commands using the doQuery() method rather than the doUpdate(). Does anyone have better info on why that is?

(follow-up: ↓ 25 ) 09/20/07 18:39:33 changed by pradeep

This may or may not be a mulgara bug. Seems like an explicit transaction is required to do a 'delete select' query. With auto-commit this fails.

btw: figured out how to do updates using the runItql tool. You need to escape into groovy mode.

  %itql.beginTxn('mytxn');
  %itql.doUpdate('itql statement', null);  
  %itql.commitTxn('mytxn');

WARN: be very very careful. Make sure your update statement works and there are no typos that would cause wiping out more than it should when executing delete. So I would try the select first and then bracket it in 'delete <select> from <local:///topazproject#ri>;'. Also I would do a %itql.rollbackTxn('mytxn') while you practice. And lastly try experimenting with a test database first and copy paste the statements in the runitql console so that there is minimal delay between the beginTxn and commitTxn. And lastly lastly, do a backup before - just in case there is some typo.

09/20/07 19:28:30 changed by amit

Hold off on doing anything till we give you the more precise list of steps.

(in reply to: ↑ 23 ) 09/20/07 19:32:37 changed by ronald

Replying to pradeep:

This may or may not be a mulgara bug. Seems like an explicit transaction is required to do a 'delete select' query. With auto-commit this fails.

Yes, this would be a bug in mulgara. However, ain't gonna fix it, for two reasons: 1) we don't use autocommit mode anywhere, and 2) the whole tx stuff has changed for mulgara 1.1.

btw: figured out how to do updates using the runItql tool. You need to escape into groovy mode.

  %itql.beginTxn('mytxn');
  %itql.doUpdate('itql statement', null);  
  %itql.commitTxn('mytxn');

There's a simpler way:

set autocommit off;
delete select ...;
commit;

The escape mode is cool, though :-)

09/20/07 21:55:01 changed by amit

  • owner changed from ronald to russ.

Russ, we restored your backup and found that the offending triple had been deleted. So odds are that the database you have is clean of this triple. We checked both by restoring and running appropriate queries and also by hand examining the backup file. Still trying to figure out how the triple got created and left behind in the first place. We are going to grab some data off stage and let you move ahead.

09/21/07 09:36:43 changed by amit

Russ, Pradeep and I spent a bit of time yesterday looking at the logs on stage and something really strange is going on. According to the logs you managed to ingest the same article twice (without a delete in the middle). Unless there is a bug, the only way we know of that you can pull that off, is if Fedora and Mulgara have different states. Did someone go in and delete an article in Fedora using the Fedora admin application? We also found a bug in the ingest XSLT that caused this problem (but that can still only happen if you manage to ingest the same article twice in a row). The order is

ingest, publish, (some magic to divert state between Fedora and Mulgara), ingest again.

Do you keep admin logs on your actions?

09/21/07 09:46:47 changed by russ

no, we don't keep manual admin logs.

most likely possiblilty is that:

949 got ingested, but failed (perhaps due to the crash described in 650) without proper rollback, and was in an in between state (the ingest application really likes to leave articles in an inconsitent state)

perhaps 949 was in the ingestable and publishable queues, perhaps not, and it got ingested again.

i'll run the select again and see if i still have the offending triples in my database...

09/21/07 09:56:49 changed by russ

  • status changed from new to closed.
  • resolution set to fixed.

all gone. i have no idea why though. i'll try to delete and ingest again on stage. wow. very strange.

either one of the deletes worked even though i got an error (i've actually had that happen to me before with the itql interpreter) or backing up did something.

09/21/07 14:18:58 changed by amit

Basic analysis was still correct, except one clarification. There was a delete attemtped before the ingest. The delete first deletes from lucene, then fedora and last mulgara. Delete from lucene and fedora was a success, but failed in Mulgara. Since we don't have a commont transaction interface (the big goal we have been proposing to Fedora Commons), unfortunately it exposed the second bug which the mismatch of predicate type between ingest and new model code. I have talked with Ronald and we decided we will change the delete code to first delete from Mulgara, then fedora and lucene. In this case a failure will rollback most optimally.

07/16/08 11:00:34 changed by

  • milestone deleted.

Milestone 0.8 deleted