Ticket #10 (closed defect: wontfix)

Opened 6 years ago

Last modified 5 years ago

Race-condition in filter-resolver/resource-index-module

Reported by: ronald Assigned to: ronald
Priority: critical Milestone:
Component: topaz-mulgara Version:
Keywords: Cc:
Blocking: Blocked By:

Description

There's a serious race-condition with the filter-resolver: if any statements for a given subject are added or removed in the time after the resolver has sent the mods for that subject to fedora but before fedora's resource-index-module has turned around and updated kowari, then those mods will get clobbered by the resource-index-module's update.

The problem is that fedora doesn't support incremental updates to datastreams (in particular the RELS-EXT), and hence the filter-resolver has to grab the whole list of statements and shove them to fedora.

The easiest solution involves having the filter-resolver set a flag (add a statement) and having the resource-index-module not update kowari if it sees that flag.
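
A minimal sketch of the flag idea, in Java to match the interface sketch later in this ticket. The marker predicate `urn:example:topaz#writtenByFilterResolver`, the `Triple` record and both helper methods are hypothetical; none of them are Fedora or Kowari APIs. The filter-resolver appends the marker to the statements it sends, and the resource-index-module skips the kowari update whenever the incoming datastream carries it (records need Java 16+):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the flag approach; not Fedora or Kowari code.
public class FlagSketch {
    // Hypothetical marker predicate added to every RELS-EXT the filter-resolver writes.
    static final String MARKER = "urn:example:topaz#writtenByFilterResolver";

    // Stand-in triple type (assumption).
    record Triple(String subject, String predicate, String object) {}

    // Filter-resolver side: copy the subject's statements and append the flag.
    static List<Triple> markForFedora(String subject, List<Triple> statements) {
        List<Triple> out = new ArrayList<>(statements);
        out.add(new Triple(subject, MARKER, "true"));
        return out;
    }

    // Resource-index side: a flagged datastream came from kowari in the first place,
    // so re-indexing it would only risk overwriting newer statements.
    static boolean shouldSkipKowariUpdate(List<Triple> relsExt) {
        return relsExt.stream().anyMatch(t -> MARKER.equals(t.predicate()));
    }

    public static void main(String[] args) {
        String pid = "info:fedora/demo:1";
        List<Triple> fromResolver =
            markForFedora(pid, List.of(new Triple(pid, "urn:example:p", "a")));
        List<Triple> fromOtherClient = List.of(new Triple(pid, "urn:example:p", "b"));
        System.out.println(shouldSkipKowariUpdate(fromResolver));    // true
        System.out.println(shouldSkipKowariUpdate(fromOtherClient)); // false
    }
}
```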

Change History

04/20/06 23:06:27 changed by ronald

  • status changed from new to assigned.

04/24/06 21:36:08 changed by amit

  • milestone changed from dodo to Nirvana.

Changed to Nirvana.

05/10/06 12:40:20 changed by amit

  • milestone changed from Nirvana to newton.

Needs to be fixed before October. Changed milestone to newton.

05/18/06 14:24:30 changed by amit

  • milestone changed from newton to topaz_newton.

Since we are splitting the projects, move this to Topaz milestone.

06/11/06 13:28:26 changed by ronald

The race condition is easily solved with the flag idea above. The problem is the following scenario:

  1. update to kowari
  2. update to fedora by non filter-resolver
  3. update to fedora by filter-resolver (for changes from 1)

Because the resource-indexer does a drop-all-insert-all (for a given fedora-object/rdf-subject), step 2 will clobber the changes from step 1.
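
To make the clobbering concrete, here is a toy replay of steps 1 and 2, assuming each store can be reduced to a map from subject to a set of `predicate=object` strings (class and method names are illustrative only). `reindex` models the resource-indexer's drop-all-insert-all for one subject:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (assumption): each store maps a subject to "predicate=object" strings.
public class ClobberSketch {
    // Resource-indexer behaviour: drop every statement for the subject in kowari,
    // then insert whatever the Fedora datastream currently holds.
    static void reindex(Map<String, Set<String>> kowari,
                        Map<String, Set<String>> relsExt, String subject) {
        kowari.put(subject, new HashSet<>(relsExt.getOrDefault(subject, Set.of())));
    }

    // Runs steps 1 and 2; returns kowari's statements for the subject afterwards.
    static Set<String> runScenario() {
        Map<String, Set<String>> kowari = new HashMap<>();
        Map<String, Set<String>> relsExt = new HashMap<>();
        String pid = "info:fedora/demo:1";
        relsExt.put(pid, new HashSet<>(Set.of("p=a")));
        reindex(kowari, relsExt, pid);

        // 1. update to kowari; the filter-resolver has not pushed it to Fedora yet
        kowari.get(pid).add("p=b");
        // 2. another client modifies the datastream, so the resource-indexer reindexes
        relsExt.get(pid).add("q=c");
        reindex(kowari, relsExt, pid);
        // Step 3 (the filter-resolver's push) is not modelled: kowari already lost p=b.
        return kowari.get(pid);
    }

    public static void main(String[] args) {
        System.out.println(runScenario().contains("p=b")); // false: step 1 was clobbered
    }
}
```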

After some discussion, Pradeep came up with what seems the best idea. The only way to avoid problems is to avoid having two entities write the same data. The first idea was to disable the Resource Indexer completely, and to make the ingester and anybody else creating fedora objects send all RDF to kowari/mulgara directly (i.e. if you stick something into DC or RELS-EXT it won't appear in the RDF anymore). The only drawback is the loss of Fedora's meta-info in the RDF (such as the list of datastreams, etc.). So instead we'll modify the Resource Indexer to write only Fedora system predicates and leave DC and RELS-EXT alone. This way there's a clean division of who writes what. The only problem with this is that the Fedora system predicates become read-only in the RDF (well, you can change them, but they'll get clobbered on the next fedora mod), but I think that makes sense anyway. If desired, the filter-resolver could be modified to detect attempted mods to Fedora system predicates and either generate a warning or throw an exception.
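
A sketch of the optional filter-resolver check, assuming all Fedora system predicates share the prefix `info:fedora/fedora-system:def/` (to be verified against what the Resource Indexer actually writes). `SystemPredicateGuard`, `Policy` and `checkModification` are hypothetical names, not part of the filter-resolver:

```java
// Hypothetical guard; the prefix below is an assumption about Fedora's system predicates.
public class SystemPredicateGuard {
    static final String FEDORA_SYSTEM_PREFIX = "info:fedora/fedora-system:def/";

    enum Policy { WARN, REJECT }

    static boolean isSystemPredicate(String predicate) {
        return predicate.startsWith(FEDORA_SYSTEM_PREFIX);
    }

    // Called before the filter-resolver accepts an insert or delete. REJECT throws;
    // WARN logs and lets the change through, knowing the Resource Indexer will
    // overwrite it on the next Fedora modification.
    static void checkModification(String predicate, Policy policy) {
        if (!isSystemPredicate(predicate)) {
            return; // DC, RELS-EXT and app statements are owned by the filter-resolver
        }
        String msg = "Fedora system predicate is read-only in the RDF: " + predicate;
        if (policy == Policy.REJECT) {
            throw new IllegalArgumentException(msg);
        }
        System.err.println("warning: " + msg);
    }

    public static void main(String[] args) {
        checkModification("http://purl.org/dc/elements/1.1/title", Policy.REJECT); // allowed
        try {
            checkModification("info:fedora/fedora-system:def/model#state", Policy.REJECT);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```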

The above requires a change to the Resource Indexer. We should add the capability of configuring a filter that can both control which statements are sent to kowari/mulgara and modify statements if necessary (such as changing the URIs). Rough idea for the interface:

  interface RITripleFilter {
    Triple doFilter(Triple t);
  }

where doFilter can return the original triple (no mods), a new triple (modified statement), or null (don't add the statement).

One likely place for this filter is the RIQueue.
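
A possible implementation of the proposed interface, in which the Resource Indexer keeps only Fedora system predicates. This is a sketch under assumptions: `Triple` is a stand-in record rather than the RI's real triple class, the system-predicate prefix is assumed, and `applyFilter` only illustrates how a hook in the RIQueue might use the filter. The rewriting lambda shows the "modified statement" case with a made-up target namespace:

```java
import java.util.ArrayList;
import java.util.List;

public class RIFilterSketch {
    // Stand-in for the Resource Indexer's triple type (assumption).
    record Triple(String subject, String predicate, String object) {}

    // The interface proposed above.
    interface RITripleFilter {
        Triple doFilter(Triple t);
    }

    // Passes Fedora system predicates unchanged and drops everything else,
    // leaving DC and RELS-EXT statements to the filter-resolver and ingester.
    static final class SystemPredicatesOnly implements RITripleFilter {
        private static final String PREFIX = "info:fedora/fedora-system:def/"; // assumed
        @Override
        public Triple doFilter(Triple t) {
            return t.predicate().startsWith(PREFIX) ? t : null;
        }
    }

    // Hypothetical RIQueue hook: filter each queued triple before it reaches the store.
    static List<Triple> applyFilter(RITripleFilter filter, List<Triple> queued) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : queued) {
            Triple kept = filter.doFilter(t);
            if (kept != null) { // null means "don't add the statement"
                out.add(kept);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String pid = "info:fedora/demo:1";
        List<Triple> queued = List.of(
            new Triple(pid, "info:fedora/fedora-system:def/model#state", "Active"),
            new Triple(pid, "http://purl.org/dc/elements/1.1/title", "Demo"));
        System.out.println(applyFilter(new SystemPredicatesOnly(), queued).size()); // 1

        // Rewriting filter: maps the Fedora PID URI to a hypothetical application URI.
        RITripleFilter rewrite = t -> new Triple(
            t.subject().replace("info:fedora/", "http://example.org/objects/"),
            t.predicate(), t.object());
        // prints http://example.org/objects/demo:1
        System.out.println(applyFilter(rewrite, queued).get(0).subject());
    }
}
```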

08/17/06 02:51:12 changed by ronald

(In [469]) Addresses #10 and #72 (indirectly): disable all triple-manipulation by the resource-indexer. Note that this is a temporary "fix", to be replaced with the full fix described in #72.

Also note that we disable the triple updates this way instead of disabling the whole RI outright via fedora.fcfg because we still want Fedora to start up and initialize the triple-store for us.

09/03/06 15:02:02 changed by ronald

Note that when the resource-indexer and filter-resolver are re-enabled, the Fedora-PID to RDF-URI mapping needs to be addressed too.

10/02/06 21:18:01 changed by ronald

(In [728]) Addresses #4, #10 and #72: multiple fixes and updates to FilterResolver:

  • Handle multiple models. The datastream that a model's statements are written to is part of the model definition, so as to avoid having to change the configuration and restart the server for new models.
  • Flush outstanding queued items on server shutdown.
  • Make various hardcoded params configurable.
  • Made fedora-updater transaction-aware so only committed changes trigger an update.
  • Delete empty datastreams.
  • Avoid creating fedora objects unless really necessary.
  • Updated fedora-updater for Fedora 2.1.1 (error message for no-such-datastream changed).
  • Remove obsolete/dead code.
  • Added configurable filters for fedora-update to control which statements are written and whether any URI-rewriting should occur. Provided three different implementations: a very simple one that only handles Fedora URIs; a filter that tries to put things into Fedora in their proper places with the fewest mods, but falls back to shortening and escaping URIs where necessary; and lastly a filter that writes all statements to separate Fedora objects (i.e. not the object indicated by the subject-uri). None of these is satisfactory, though, as they either don't fully fix the original problems in #10 or they invalidate the purpose of the updater (#4). They are being checked in here for historical purposes.
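
The transaction-aware bullet can be illustrated with a small queue that buffers affected subjects per transaction and releases them only on commit. This is a hypothetical, single-threaded sketch (`TxAwareQueue` and its methods are not the classes checked in with [728]); `drain()` stands in for what a periodic updater or the shutdown flush would call:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded sketch of a transaction-aware update queue.
public class TxAwareQueue {
    // Subjects touched by each open transaction, keyed by a transaction id.
    private final Map<String, Set<String>> pendingByTx = new HashMap<>();
    // Committed subjects waiting to be pushed to Fedora, deduplicated, in commit order.
    private final Set<String> ready = new LinkedHashSet<>();

    // A statement about `subject` was inserted or deleted inside transaction `txId`.
    public void statementChanged(String txId, String subject) {
        pendingByTx.computeIfAbsent(txId, k -> new LinkedHashSet<>()).add(subject);
    }

    // Only committed subjects become eligible for a Fedora update.
    public void commit(String txId) {
        Set<String> subjects = pendingByTx.remove(txId);
        if (subjects != null) {
            ready.addAll(subjects);
        }
    }

    // Rolled-back changes never reach Fedora.
    public void rollback(String txId) {
        pendingByTx.remove(txId);
    }

    // Returns everything ready to push and empties the queue.
    public List<String> drain() {
        List<String> out = new ArrayList<>(ready);
        ready.clear();
        return out;
    }

    public static void main(String[] args) {
        TxAwareQueue q = new TxAwareQueue();
        q.statementChanged("tx1", "info:fedora/demo:1");
        q.statementChanged("tx2", "info:fedora/demo:2");
        q.commit("tx1");
        q.rollback("tx2");
        System.out.println(q.drain()); // [info:fedora/demo:1]
    }
}
```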

10/02/06 21:32:55 changed by ronald

  • status changed from assigned to closed.
  • resolution set to wontfix.

While the original race-condition can be solved with the solution above (ensure that each datastream has exactly one "owner"), there is a related race-condition which can't be: the filter-resolver needs to create fedora objects if they don't exist, and the following can happen:

  1. a fedora object is created and the triples associated with it are inserted
  2. the updater fires and gathers the list of triples to push into that object
  3. the app deletes the fedora object and triples again
  4. the updater (re)creates the object in order to store the triples

This results in objects "reappearing". Note that while the resolver could in theory check whether all datastreams are empty and in that case delete the object (adding a step 5 above), that would A) be quite expensive, and B) still not completely solve the problem, as there's still a time-window in which the app could validly assume the object doesn't exist.
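
A toy replay of the four steps, assuming Fedora can be modeled as a map from PID to the statements stored in the object and kowari as a map from subject to statements (all names are illustrative). Step 4's create-if-missing is what brings the deleted object back:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (assumption): plain maps stand in for Fedora and kowari.
public class ReappearSketch {
    // Replays steps 1-4 and returns the final Fedora state.
    static Map<String, Set<String>> runScenario() {
        Map<String, Set<String>> fedora = new HashMap<>();
        Map<String, Set<String>> kowari = new HashMap<>();
        String pid = "demo:1";

        // 1. a fedora object is created and its triples inserted
        fedora.put(pid, new HashSet<>());
        kowari.put(pid, new HashSet<>(Set.of("p=a")));

        // 2. the updater fires and gathers the triples to push
        Set<String> gathered = new HashSet<>(kowari.get(pid));

        // 3. the app deletes the object and its triples
        fedora.remove(pid);
        kowari.remove(pid);

        // 4. the updater creates the object if it is missing, then stores the triples
        fedora.computeIfAbsent(pid, k -> new HashSet<>()).addAll(gathered);
        return fedora;
    }

    public static void main(String[] args) {
        System.out.println(runScenario().containsKey("demo:1")); // true: the object is back
    }
}
```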

Hence this whole approach is being abandoned and the ticket closed.

08/07/07 16:25:51 changed by

  • milestone deleted.

Milestone Bugs deleted