Ticket #303 (closed task: fixed)

Opened 2 years ago

Last modified 1 year ago

document cache behavior

Reported by: russ
Assigned to: somebody
Priority: low
Milestone:
Component: documentation
Version:
Keywords:
Cc:
Blocking:
Blocked By:

Description

since cache behavior has implications for stability of the site (eg, if we lose all plosone cache info we're looking at at least an hour of downtime while the cache rebuilds) it seems like we should understand a little more about how the caches work and what data is exchanged.

one specific question: we've been working under the assumption (backed by experience) that the plosone caches exchange data. but apparently that's not the case according to a note by steve in #302. if the plosone caches only exchange dirty notices, is there a benefit to restarting the plosone servers in a round-robin fashion? do we, in fact, preserve the cache this way?

thanks!

Change History

(follow-up: ↓ 5 ) 03/09/07 09:04:28 changed by stevec

Yes, there is a benefit. You reduce the load on mulgara if you restart them one at a time and allow the security cache to be rebuilt on that side of things.

03/12/07 18:35:50 changed by russ

so we don't preserve the plosone cache at all by restarting the plosone service at different times?

is it the case that the plosone cache is blown away on restart?

why are there such dramatic differences in the time it takes the wget script to run after restart? it seems like it's *much* faster when the other plosone server is up, although i've had instances where it's slow and the other server is up, and instances where it's fast and the other server is down...

we need comprehensive documentation on this.

(follow-up: ↓ 4 ) 03/13/07 09:52:39 changed by stevec

Pradeep will have to fill you in on the Topaz on down side of things. The plosone cache is fairly simple. There are two types that I use. One is persistent across restarts, the other is not. There are reasons why I chose one over the other.

The only things persistently cached to disk are the article images. The reason being those should never change once they are published. User information, articles, annotations, etc. are cached non-persistently. The article XML itself is not cached because it was fairly difficult to integrate that in before launch due to the objects we were using. Instead, I actually cache the HTML for each article after the annotations have been applied. Thus, it can't be persisted across restarts because the cache wouldn't know to refresh if things changed between restarts. The same is true of the other objects cached in this manner.

The caches are fairly dumb between servers. They exist independently from one another, but they will send dirty event notices for a given cache key to other caches in the group.

The speed increases when starting one after the other are probably because a) you're putting half the load on mulgara and b) the mulgara and topaz caches are populated when the first plosone server starts. I believe that after the caches at the mulgara/topaz level are populated, things are much quicker because all the access checks are already done. But Pradeep will have more details on that.
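To make the "independent caches that only exchange dirty notices" behavior concrete, here is a minimal toy model in Python. It is a sketch under assumptions, not the plosone code: the class name `PeerCache`, its methods, and the `loader` callback are all hypothetical, and the real caches live in the Java application.

```python
class PeerCache:
    """Toy non-persistent cache: peers share dirty notices, never data.

    Hypothetical illustration only; not the actual plosone implementation.
    """

    def __init__(self, name):
        self.name = name
        self.entries = {}   # in-memory only, lost on restart
        self.peers = []

    def join(self, other):
        # Put two caches in the same group (symmetric link).
        self.peers.append(other)
        other.peers.append(self)

    def get(self, key, loader):
        # A miss is always filled locally via loader, never from a peer.
        if key not in self.entries:
            self.entries[key] = loader(key)
        return self.entries[key]

    def mark_dirty(self, key):
        # Drop the key locally and send a dirty notice to every peer.
        self.entries.pop(key, None)
        for peer in self.peers:
            peer.entries.pop(key, None)

    def restart(self):
        # Nothing survives a restart; every entry has to be rebuilt.
        self.entries.clear()
```

In this model, a value loaded on one server never appears on the other; each server pays its own rebuild cost, and the only cross-server traffic is the invalidation of a key.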

(in reply to: ↑ 3 ; follow-up: ↓ 6 ) 03/13/07 13:17:49 changed by pradeep

Replying to stevec:

Pradeep will have to fill you in on the Topaz on down side of things.

Topaz uses EhCache. It is used for permissions caching only. Nothing else is cached. When we did our performance measurements, the XACML rule evaluations were the ones that took about 85% of the time. That is because each permission evaluation requires a bunch of ITQL queries. So we decided to cache it and chose a somewhat aggressive caching scheme, i.e., if one topaz web-app caches something, it'll share it with all the others. Multicast is used for peer discovery, but the cache information itself is sent via reliable (TCP) channels.

Also for invalidating the cache, we chose a scheme equivalent to 'database triggers' for mulgara. If a cache entry is affected by an update, a filter-resolver on mulgara will send 'remove' commands to all peers (again via TCP to each peer).

Also note that the fedora tomcat is hosting the fedora web-app and the topaz access service. The access service is part of the EhCache peer network.

That is about it. Look for ehcache.xml in the WEB-INF/classes directory of any topaz web-app to see how EhCache is configured. If any change is made, make sure all the peer configs are updated, including those of mulgara and the fedora/access service.
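The replication and invalidation scheme described in this comment can be modeled roughly as follows. This is a toy Python sketch, not EhCache's API and not the Topaz code: `PermissionCache`, `connect`, and `on_store_update` are hypothetical names, peer discovery is replaced by a direct list of peers, and TCP delivery is replaced by direct method calls.

```python
class PermissionCache:
    """Toy replicated cache: every put is copied to all peers.

    Hypothetical illustration only; not the EhCache API.
    """

    def __init__(self, name):
        self.name = name
        self.entries = {}
        self.peers = []

    def put(self, key, allowed):
        # Store locally, then replicate the entry itself to every peer.
        self.entries[key] = allowed
        for peer in self.peers:
            peer.entries[key] = allowed

    def get(self, key):
        # Returns None on a miss.
        return self.entries.get(key)


def connect(caches):
    # Stand-in for multicast peer discovery: everyone learns everyone else.
    for cache in caches:
        cache.peers = [c for c in caches if c is not cache]


def on_store_update(caches, affected_keys):
    # Stand-in for the trigger-like filter-resolver: an update to the
    # store sends a 'remove' for each affected key to every peer.
    for cache in caches:
        for key in affected_keys:
            cache.entries.pop(key, None)
```

Unlike the plosone caches, the entry itself travels between peers here, so an expensive permission evaluation done by one web-app is immediately available to the others.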

(in reply to: ↑ 1 ) 03/13/07 13:22:01 changed by pradeep

Replying to stevec:

Yes, there is a benefit. You reduce the load on mulgara if you restart them one at a time and allow the security cache to be rebuilt on that side of things.

Yes. Once the security cache is built, there should be about a 6x benefit. If the wget scripts on the second plosone instance are faster by around this much, then this would very well explain it.

(in reply to: ↑ 4 ; follow-up: ↓ 8 ) 03/14/07 10:51:40 changed by russ

The only things persistently cached to disk are the article images. The reason being those should never change once they are published.

is it possible that these are, in fact, cached on ingest and not publication? if we ingest an article, and then want to make image changes before publication, we have to manually delete the cache in /var/cache/plosone/application before the new images are visible on reingest. i think i have a mantis bug on this...

(follow-up: ↓ 9 ) 03/14/07 10:59:48 changed by russ

okay, i think things make sense now.

in all cases, plosone has to rebuild its html cache on restart. what makes it faster or slower is whether it has the xacml access rules cached in topaz.

in fact, if i restart both plosone servers, without restarting topaz/fedora/mulgara, the cache rebuilds quickly on both servers - no difference between the first start and second started server.

so it's really about keeping the xacml cache in place during plosone restarts.

i concur there's about a 6x difference between the time for a slow cache vs fast cache rebuild on plosone.

it still makes sense to restart the two plosone servers serially to reduce the load on mulgara. but that has nothing to do with the caching.

pradeep, can you tell me what causes the xacml cache to be wiped out? if we restart mulgara but not topaz, for example, do we keep the cache?

thanks!

(in reply to: ↑ 6 ) 03/14/07 11:20:25 changed by stevec

Replying to russ:

The only things persistently cached to disk are the article images. The reason being those should never change once they are published.

is it possible that these are, in fact, cached on ingest and not publication? if we ingest an article, and then want to make image changes before publication, we have to manually delete the cache in /var/cache/plosone/application before the new images are visible on reingest. i think i have a mantis bug on this...

No, that is not possible (at least not the way you're thinking). The cache is actually happening at the http request level. When an image is requested by a browser, it is cached by a layer sitting above webwork. Subsequent requests pull it directly from disk without any request getting processed by the application.

If you're referring to trac ticket #267, a fix was checked in on 2/12/07 with [2346]
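The request-level image cache described in this comment can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the class `DiskImageCache`, the `app_handler` callback, and the hashed file naming are hypothetical, and the real layer sits above webwork in the Java application.

```python
import hashlib
import os


class DiskImageCache:
    """Toy request-level cache that persists response bodies to disk.

    Hypothetical illustration only; not the actual plosone layer.
    """

    def __init__(self, cache_dir, app_handler):
        self.cache_dir = cache_dir
        self.app_handler = app_handler  # called only on a cache miss
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url):
        # One file per request URL, named by a hash of the URL.
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def handle(self, url):
        path = self._path(url)
        if os.path.exists(path):
            # Hit: serve straight from disk, the application never runs.
            with open(path, "rb") as f:
                return f.read()
        # Miss: ask the application once, then persist the result.
        body = self.app_handler(url)
        with open(path, "wb") as f:
            f.write(body)
        return body
```

Because the files outlive the process, a cache like this keeps serving the old image after a restart or a reingest until the file is removed, which is consistent with having to clear the cache directory by hand.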

(in reply to: ↑ 7 ) 03/15/07 16:11:12 changed by pradeep

Replying to russ:

pradeep, can you tell me what causes the xacml cache to be wiped out? if we restart mulgara but not topaz, for example, do we keep the cache?

Yes. All caches are eternal and are set up to bootstrap from each other. So you would have to kill both the topaz servers and fedora to wipe out the cache, provided there aren't any network problems.
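Why the cache survives as long as one peer stays up can be shown with a toy Python model. This is a sketch under assumptions, not EhCache itself: `EternalCache` and `put_everywhere` are hypothetical names, entries never expire, and a starting peer simply copies the entries of the first running peer it finds.

```python
class EternalCache:
    """Toy cache whose entries never expire and that bootstraps from peers.

    Hypothetical illustration only; not the EhCache API.
    """

    def __init__(self, name):
        self.name = name
        self.entries = {}
        self.running = False

    def start(self, group):
        # On startup, copy the contents of the first running peer, if any.
        self.running = True
        for peer in group:
            if peer is not self and peer.running:
                self.entries = dict(peer.entries)
                break

    def stop(self):
        # The cache is in-memory only, so stopping loses local entries.
        self.running = False
        self.entries = {}


def put_everywhere(group, key, value):
    # Stand-in for replication: a new entry reaches every running peer.
    for cache in group:
        if cache.running:
            cache.entries[key] = value
```

In this model, restarting a single peer brings its entries back from the survivors, and the contents are lost only when every peer in the group is down at the same time.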

03/15/07 16:33:56 changed by russ

  • status changed from new to closed.
  • resolution set to fixed.

thank you!!!

08/07/07 16:25:51 changed by

  • milestone deleted.

Milestone Bugs deleted