i came home around 10:30 to find nagios alerts waiting for me.
plosone was unresponsive. lots of timeouts, connection refused, etc. in plosone.log
in mulgara.log, nothing since 8pm.
[root@ploskow01 plos]# /sbin/service mulgara status
mulgara (17165) is running from /usr/local/topaz/mulgara
PID USER RSS %MEM ELAPSED TIME %CPU COMMAND
17165 topaz 1886768 11.4 05:44:24 00:21:31 6.2 java
java 17165 topaz 5u IPv6 6357470 TCP *:9091 (LISTEN)
java 17165 topaz 11u IPv6 6357512 TCP 127.0.0.1:9291 (LISTEN)
java 17165 topaz 13u IPv6 6357522 TCP *:60618 (LISTEN)
[root@ploskow01 plos]# tail /var/log/topaz/mulgara.log
2007-07-10 19:54:06,404 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:05:08,756 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:05:08,757 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:05:08,757 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:08:09,380 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:09:09,583 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:10:09,784 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:10:09,784 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:11:09,987 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
2007-07-10 20:11:09,987 INFO DatabaseSession> Closing session [Finalizer org.mulgara.resolver.DatabaseSession]
[root@ploskow01 plos]# /sbin/service mulgara stop
Stopping mulgara (17165)
[root@ploskow01 plos]#
after a mulgara restart, everything is chipper.
i understand that not much has changed in mulgara for 0.7, and it does feel like a network issue. certainly, things have changed now that plosone talks to mulgara directly.
it feels like the good old days all over again.
i'm editing iptables on the mulgara and plosone boxes to accept all traffic from the other servers in the stacks. (previously i was using -m state --state NEW and then accepting all ESTABLISHED and RELATED traffic, but i know we had some ip_conntrack issues in the past before we figured out the right AGP settings...)
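
for anyone reviewing the firewall change, here is a rough sketch of the two rule shapes. it is not the actual ruleset: the peer addresses are placeholders from the documentation range, the INPUT chain is assumed, and 9091 is just the mulgara listen port from the lsof output above. the script only prints the rules so they can be eyeballed before anything is applied.

```shell
#!/bin/sh
# Hypothetical peer addresses (192.0.2.0/24 is a documentation range);
# substitute the real servers in the stack.
PEERS="192.0.2.10 192.0.2.11"

# Previous, stateful shape (shown for comparison only):
#   iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
#   iptables -A INPUT -s <peer> -p tcp --dport 9091 -m state --state NEW -j ACCEPT

# New shape: accept everything from each trusted peer, with no state match.
# This prints the commands instead of running them.
print_rules() {
    for peer in $PEERS; do
        echo "iptables -A INPUT -s $peer -j ACCEPT"
    done
}

print_rules
```

one caveat: if the conntrack module stays loaded, connections are still tracked; these rules just stop depending on that state to decide whether a packet is accepted.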
if someone could look over the tomcat settings with a fine-toothed comb, i'd appreciate it.
also, please tell me what kind of logging you want when these kinds of things happen.
thanks!