Ticket #436 (closed defect: fixed)

Opened 1 year ago

Last modified 1 year ago

browse broken on production servers

Reported by: russ Assigned to: russ
Priority: critical Milestone:
Component: topaz Version: 0.6
Keywords: Cc:
Blocking: Blocked By:

Description

we've rolled everything back to 0.6 and it *seems* to be working okay.

we ingested new articles today, and so i need to restart the plosone servers to get browse to update (this is a bug that is fixed in 0.7)

i restarted plosone01, and browse didn't come back up - i get an error in plosone.log:

2007-07-03 12:21:23,939 INFO  LoggingInterceptor> Starting execution stack for action /article/browse [TP-Processor3 com.opensymphony.xwork.interceptor.LoggingInterceptor]
2007-07-03 12:21:23,939 DEBUG EnsureUserAccountInterceptor> ensure user account interceptor called [TP-Processor3 org.plos.user.EnsureUserAccountInterceptor]
2007-07-03 12:21:23,939 DEBUG EnsureUserAccountInterceptor> no single sign on user key [TP-Processor3 org.plos.user.EnsureUserAccountInterceptor]
2007-07-03 12:21:23,939 DEBUG EnsureUserAccountInterceptor> ticket is: null [TP-Processor3 org.plos.user.EnsureUserAccountInterceptor]
2007-07-03 12:21:23,940 WARN  UnhandledException>  [TP-Processor3 org.plos.UnhandledException]
java.lang.ArrayIndexOutOfBoundsException

i tried restarting topaz01 and plosone01, but still had no luck.

browse on the second stack is working just fine, and i'm not seeing errors in topaz01, fedora, or mulgara logs.

currently, i have both webheads pointing at the second stack. you can take a look at this problem on the first stack by using the mirror site:

thanks!

Dependency Graph

Change History

07/03/07 12:44:09 changed by pradeep

  • owner changed from pradeep to stevec.

Steve, This is on 0.6. Can you take a look? Thanks.

07/03/07 13:23:17 changed by stevec

  • owner changed from stevec to ebrown.

There is another error in the log which looks like it has to do with the data returned in the call to get the article info. A character that the XML parser can't deal with apparently. Is this a search index issue? Reassigning to Eric since I think this getArticleInfos is his method.

2007-07-03 11:52:49,641 ERROR HomePageAction> Could not retrieve the most recent articles [TP-Processor2 org.plos.action.HomePageAction]
AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.generalException
 faultSubcode:
 faultString: java.lang.IllegalArgumentException: Failed to validate: <sec><title>Background</title><p>We reasoned that by identifying genetic
 markers on human X chromosome regions where recombination is rare or absent, we should be able to construct X chromosome genealogies analogous to those based on Y ch
romosome and mitochondrial DNA polymorphisms, with the advantage of providing information about both male and female components of the population.</p></sec&g
t;<sec><title>Methodology/Principal Findings</title><p>We identified a 47 Kb interval containing an <italic>Alu</italic> insertion
 polymorphism (<italic>DXS225</italic>) and four microsatellites in complete linkage disequilibrium in a low recombination rate region of the long arm of
the human X chromosome. This haplotype block was studied in 667 males from the HGDP-CEPH Human Genome Diversity Panel. The haplotypic diversity was highest in Africa
(0.992±0.0025) and lowest in the Americas (0.839±0.0378), where no insertion alleles of <italic>DXS225</italic> were observed. Africa shared few haplotypewith other geographical areas, while those exhibited significant sharing among themselves. Median joining networks revealed that the African haplotypes were numerou
s, occupied the periphery of the graph and had low frequency, whereas those from the other continents were few, central and had high frequency. Altogether, our data s
upport a single origin of modern man in Africa and migration to occupy the other continents by serial founder effects. Coalescent analysis permitted estimation of the
 time of the most recent common ancestor as 182,000 years (56,700â479,000) and the estimated time of the <italic>DXS225 Alu</italic> insertion of 94,400 y
ears (24,300â310,000). These dates are fully compatible with the current widely accepted scenario of the origin of modern mankind in Africa within the last 195,000 ye
ars and migration out-of-Africa <italic>circa</italic> 55,000â65,000 years ago.</p></sec><sec><title>Conclusions/Significance</
title><p>A haplotypic block combining an <italic>Alu</italic> insertion polymorphism and four microsatellite markers on the human X chromosome is
 a useful marker to evaluate genetic diversity of human populations and provides a highly informative tool for evolutionary studies.</p></sec>
 faultActor:
 faultNode:
 faultDetail:
        {http://xml.apache.org/axis/}hostname:plostopaz01.localdomain

java.lang.IllegalArgumentException: Failed to validate: <sec><title>Background</title><p>We reasoned that by identifying genetic markers on human X chromosome regions
 where recombination is rare or absent, we should be able to construct X chromosome genealogies analogous to those based on Y chromosome and mitochondrial DNA polymor
phisms, with the advantage of providing information about both male and female components of the population.</p></sec><sec><title>Methodology/Principal Findings</titl
e><p>We identified a 47 Kb interval containing an <italic>Alu</italic> insertion polymorphism (<italic>DXS225</italic>) and four microsatellites in complete linkage d
isequilibrium in a low recombination rate region of the long arm of the human X chromosome. This haplotype block was studied in 667 males from the HGDP-CEPH Human Gen
ome Diversity Panel. The haplotypic diversity was highest in Africa (0.992±0.0025) and lowest in the Americas (0.839±0.0378), where no insertion alleles of <italic>DX25</italic> were observed. Africa shared few haplotypes with other geographical areas, while those exhibited significant sharing among themselves. Median joining ne
tworks revealed that the African haplotypes were numerous, occupied the periphery of the graph and had low frequency, whereas those from the other continents were few
, central and had high frequency. Altogether, our data support a single origin of modern man in Africa and migration to occupy the other continents by serial founder
effects. Coalescent analysis permitted estimation of the time of the most recent common ancestor as 182,000 years (56,700â479,000) and the estimated time of the <ital
ic>DXS225 Alu</italic> insertion of 94,400 years (24,300â310,000). These dates are fully compatible with the current widely accepted scenario of the origin of modern
mankind in Africa within the last 195,000 years and migration out-of-Africa <italic>circa</italic> 55,000â65,000 years ago.</p></sec><sec><title>Conclusions/Significa
nce</title><p>A haplotypic block combining an <italic>Alu</italic> insertion polymorphism and four microsatellite markers on the human X chromosome is a useful marker
 to evaluate genetic diversity of human populations and provides a highly informative tool for evolutionary studies.</p></sec>
        at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
        at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:129)
        at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:633)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanEndElement(XMLNSDocumentScannerImpl.java:719)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1685)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
        at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
        at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
        at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
        at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
        at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
        at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
        at org.apache.axis.client.Call.invoke(Call.java:2767)
        at org.apache.axis.client.Call.invoke(Call.java:2443)
        at org.apache.axis.client.Call.invoke(Call.java:2366)
        at org.apache.axis.client.Call.invoke(Call.java:1812)
        at org.topazproject.ws.article.ArticleServicePortSoapBindingStub.getArticleInfos(ArticleServicePortSoapBindingStub.java:815)
        at org.plos.article.service.ArticleWebService.getArticleInfos(ArticleWebService.java:159)
        at org.plos.article.service.ArticleWebService$$FastClassByCGLIB$$c1438d02.invoke(<generated>)



2007-07-03 11:52:49,863 ERROR BrowseService> Could not retrieve the all articles [TP-Processor8 org.plos.article.service.BrowseService]
AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.generalException
 faultSubcode:
 faultString: java.lang.IllegalArgumentException: Failed to validate: Wnt, Hedgehog and Junctional Armadillo/β-Catenin Establish Planar Polarity in the &lt;italic&gt;
Drosophila&lt;/italic&gt; Embryo
 faultActor:
 faultNode:
 faultDetail:
        {http://xml.apache.org/axis/}hostname:plostopaz01.localdomain

java.lang.IllegalArgumentException: Failed to validate: Wnt, Hedgehog and Junctional Armadillo/β-Catenin Establish Planar Polarity in the <italic>Drosophila</italic>
Embryo
        at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:222)
        at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:129)
        at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:633)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanEndElement(XMLNSDocumentScannerImpl.java:719)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1685)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
        at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
        at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
        at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
        at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
        at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
        at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
        at org.apache.axis.client.Call.invoke(Call.java:2767)
        at org.apache.axis.client.Call.invoke(Call.java:2443)
        at org.apache.axis.client.Call.invoke(Call.java:2366)
        at org.apache.axis.client.Call.invoke(Call.java:1812)
        at org.topazproject.ws.article.ArticleServicePortSoapBindingStub.getArticleInfos(ArticleServicePortSoapBindingStub.java:815)
        at org.plos.article.service.ArticleWebService.getArticleInfos(ArticleWebService.java:159)
        at org.plos.article.service.ArticleWebService$$FastClassByCGLIB$$c1438d02.invoke(<generated>)
        at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
        at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:674)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:154)
        at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:126)
        at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:114)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
        at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:615)
        at org.plos.article.service.ArticleWebService$$EnhancerByCGLIB$$d2cc089c.getArticleInfos(<generated>)
        at org.plos.article.service.BrowseService.getAllArticles(BrowseService.java:80)
        at org.plos.article.service.BrowseService.createBrowseObjects(BrowseService.java:136)
        at org.plos.article.service.BrowseService.populateArticlesAndCategories(BrowseService.java:115)
        at org.plos.article.service.BrowseService.getArticlesByCategory(BrowseService.java:234)
        at org.plos.article.action.BrowseArticlesAction.execute(BrowseArticlesAction.java:66)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at s


07/03/07 14:17:22 changed by russ

aha, so we're dying because of one of the articles that has been ingested. sigh.

can you tell me what log the above is from? did i just miss that in plosone.log?

thanks!

07/03/07 14:31:31 changed by ebrown

This is not a search issue. Search is not invoked to browse.

The ArrayIndexOutOfBoundsException? in the plosone.log is exactly what I was seeing when browse was broken in head (0.8) a week or two ago (before I'd done the merge).

I don't think one exception is related to the other. It is possible there are two problems -- I don't know yet.

07/03/07 14:33:11 changed by stevec

yes, in the plosone log. Just search for the timestamp and you can see the trail of errors starting with the ones above. This brings up a couple questions, though... Why does stack 2 work (I wonder if it has a cached, stale version), and how come this didn't show up on the staging servers? Odd.

07/03/07 14:39:02 changed by ebrown

So it was #383 that fixed the bug in milestone:0.7. I doubt [2967] though would impact what's going on with plosone01.

07/03/07 14:41:40 changed by stevec

To clarify, there are two causes for the array exception. I believe the current one is directly related to the Axis problem since it is getting a zero length array from the ArticleInfo? call. The fix I checked in was meant to address an issue where a browseService bean was getting reused between users and thus could contain stale data. I made the bean session scoped and made it check for updated data every time.

07/03/07 14:45:41 changed by ebrown

There were some socket timeouts in the log too -- in relation to browse. I wonder if mulgara or fedora was busy and we just failed. (I don't understand why Java's socket and connection related exceptions don't include the host and port. Stupid....)

07/03/07 15:26:46 changed by ebrown

I believe the validation error is carried across axis from topaz and mulgara regarding an encoding issue in article info:doi/10.1371/journal.pone.0000009. I don't really understand the issue as it seems it must be something very specific as we've not seen this problem before and I can find no reference to the same exception on plosone02 or plostopaz02. I could be wrong, but I don't think it has anything to do with the browse issue.

My suspicion is that the browse issue is due to some network problems. My first inclination is to try restarting and see if that solves the problem.

07/03/07 15:37:00 changed by amit

  • owner changed from ebrown to russ.

That looks like an old article. Russ, did we do a manual cleanup on this anytime?

07/03/07 16:33:35 changed by russ

happy after another restart. thanks for looking into it.

let me know if you want me to give you tickets for things like this in the future, or if i should just keep restarting until it works.

i don't want to hassle you with unfixable random issues, but if you're interested in looking for gremlins i'm happy to leave a stack broken for you to look at when things like this come up...

07/03/07 17:13:03 changed by amit

  • status changed from new to closed.
  • resolution set to fixed.

No, please file tickets after you have done you bit of debugging. It is not a good idea asking folks to restart for a fix.

07/03/07 17:27:43 changed by ronald

This is a bug in mulgara/JRDF. It's trying to validate an XMLLiteral that is already stored in mulgara (and hence was already validated on insert), and that validation is failing. This happens intermittently.

We first saw this a long time ago, and I even added extra logging to mulgara for exactly this. Unfortunately, I can't find any problem with the code. I'm thinking it may be bug in JRDF's RdfXmlParser, but I've looked at the code and haven't seen any problem there either.

Btw., if you look at the logs you'll notice this pops up every now and then (grep for ParseException).