Ticket #946 (new enhancement)

Opened 6 months ago

Last modified 6 hours ago

support very large assets (supporting info, etc.)

Reported by: russ Assigned to: pradeep
Priority: critical Milestone: 0.9.1
Component: ambra Version: 0.9-SNAPSHOT
Keywords: Cc:
Blocking: Blocked By:

Description

Currently, when ambra returns a file for download (pdf, xml, tif, supporting info), it loads the whole file into memory and caches it.

This does not make sense for very large files.

We have one bio article with a number of ~250MB supporting info files.

We should find a way to avoid caching large files, and just return them directly from fedora without loading into memory on the ambra side.
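As a rough illustration of what "return them directly without loading into memory" could look like, here is a minimal sketch of a chunked stream copy. The class name StreamCopy and the 8KB buffer size are assumptions for illustration, not existing ambra or fedora code; the point is only that memory use stays bounded by the buffer regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper (not part of ambra): copies a stream in fixed-size
// chunks so that at most one buffer's worth of data is held in memory.
class StreamCopy {
    private static final int BUFFER_SIZE = 8192; // assumed chunk size

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        // read() returns -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In a download handler, the input would be the stream obtained from the store and the output would be the HTTP response stream; neither needs to be fully buffered.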

I think it would be ideal to switch based on asset type. We want to cache thumbnails and equations on the article page, medium-sized PNGs on the slideshow page, and XML files. These are the objects with type PNG, PNG_S, PNG_M, and XML on the fedora side. Everything else - including PNG_L - would be a straight download.

Alternatively, we could set a filesize limit for the cache (only cache files smaller than 1MB or 10MB?).
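The two options above could be sketched roughly as follows. The class and method names are hypothetical, and the 1MB threshold is just one of the two figures floated above, not a decided value; the type names are the ones listed in this ticket.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical policy class (not existing ambra code) illustrating the two
// proposals: decide by asset type, or decide by file size.
class AssetCachePolicy {
    // Types proposed for caching in this ticket; everything else,
    // including PNG_L, would be a straight download.
    private static final Set<String> CACHEABLE_TYPES =
        new HashSet<String>(Arrays.asList("PNG", "PNG_S", "PNG_M", "XML"));

    // Assumed limit: 1MB (10MB was the other figure suggested).
    static final long MAX_CACHEABLE_BYTES = 1024L * 1024L;

    /** Option 1: cache only the listed asset types. */
    static boolean shouldCacheByType(String type) {
        return type != null && CACHEABLE_TYPES.contains(type);
    }

    /** Option 2: cache only files strictly smaller than the limit. */
    static boolean shouldCacheBySize(long sizeBytes) {
        return sizeBytes >= 0 && sizeBytes < MAX_CACHEABLE_BYTES;
    }
}
```

Under either option, a ~250MB supporting info file would bypass the cache and be streamed straight through.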

This is a requirement for the plos bio/med ambra migration.

Change History

09/08/08 15:35:50 changed by amit

  • milestone set to 0.9.1.

09/08/08 16:02:07 changed by amit

  • owner changed from amit to ronald.
  • priority changed from medium to critical.

10/27/08 16:51:13 changed by amit

  • owner changed from ronald to pradeep.

11/20/08 21:36:34 changed by pradeep

(In [6733]) Added support for streaming Blobs. Fedora and SimpleBlobStore have been updated to keep txn data in temporary files instead of in memory as before. This now allows streaming of large Blobs through OTM. In addition, the OTM blob support has been revamped to provide both 'managed' and 'unmanaged/copy' Blob fields.

Managed Blob fields must be used for large blobs. Unmanaged ones are for simplicity: OTM copies the contents of the blobs directly into the fields, tracks their state, and writes any changes back to the BlobStore.

There are some restrictions on how Managed Blob fields can be used. Since OTM manages these fields, the application has no control over when or how they are created. OTM creates them on a load from the store or on a saveOrUpdate(), i.e. OTM is the 'factory' for these fields. Once OTM has created them, the application can read from and write to the data streams of these blobs, or even delete the blobs.

Because OTM is the factory for Managed Blobs, it can only create the following types of objects:

Others may be added later. Note that java.io.InputStream essentially provides read-only access to the Blob.

For Unmanaged/Copy Blob fields, support has also been added for fields that are Serializable. This makes it possible to define things like: '@Blob String getText();'. The Strings are expected to be in UTF-8 encoding. (Other encodings may be added later by modifying the @Blob annotation.)
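To make the UTF-8 expectation concrete, here is a small sketch of the String-to-bytes round trip that a copy-style String field implies. The BlobText helper is hypothetical and is not OTM's actual implementation; it only shows the encoding step using standard JDK calls.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not OTM code): converts between a String field value
// and the UTF-8 bytes that would be stored in the blob.
class BlobText {
    /** Encodes text as UTF-8 bytes, as it would be written to the store. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes read from the store back into a String. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Note that the stored byte length can exceed the String's character count, since non-ASCII characters take more than one byte in UTF-8.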

So far, no config changes are needed to support this. The restriction of byte[] on @Blob fields is now lifted.

Addresses #946.