Ticket #848 (closed enhancement: fixed)

Opened 10 months ago

Last modified 6 months ago

ingest resized images in zip file on ingest

Reported by: rich Assigned to: jkirton
Priority: critical Milestone:
Component: ambra Version: 0.8.2.1
Keywords: Cc:
Blocking: Blocked By:

Description

Ingest needs to determine whether resized images already exist in a zip file (from processimages.groovy) and store those image files.

DocumentManagementService.resizeImages() around line 440 would be a place to test for pre-existing S/M/L images. The whole block that handles images/figs/formulas/etc. could be redone to be aware of pre-conversion.
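
A minimal sketch of such a test, not the actual Ambra code. It assumes the pre-resized files follow the S_/M_/L_ naming seen in the file names quoted later in this ticket (e.g. L_pcbi.0030151.g001.png) and that they are PNGs; the class name, method name and base-name parameter are hypothetical.

```java
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class PreResizedImageCheck {
    // Assumption: size prefixes taken from file names quoted in this ticket.
    private static final String[] SIZE_PREFIXES = {"S_", "M_", "L_"};

    /** Returns true if the archive holds S_, M_ and L_ PNGs for the given base name. */
    public static boolean hasAllSizes(ZipFile zip, String baseName) {
        Set<String> names = new HashSet<>();
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            String name = entries.nextElement().getName();
            // Keep only the file name, dropping any directory part of the entry.
            names.add(name.substring(name.lastIndexOf('/') + 1));
        }
        for (String prefix : SIZE_PREFIXES) {
            if (!names.contains(prefix + baseName + ".png")) {
                return false; // at least one size is missing, so resizing is still needed
            }
        }
        return true;
    }
}
```

If this returns true for an image, ingest could store the three entries as-is and skip resizing for that image; otherwise it would fall back to the existing resize path.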

Attachments

ingest_equation_error.txt (74.7 kB) - added by rich on 03/28/08 14:02:51.
pcbi.0030029.zip (4.0 MB) - added by rich on 03/28/08 14:03:58.
doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_L+PNG_L.0 (365.0 kB) - added by russ on 04/03/08 09:49:31.
ingested L PNG for g001
doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_M+PNG_M.0 (232.2 kB) - added by russ on 04/03/08 09:49:48.
ingested M PNG for g001
doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_S+PNG_S.0 (29.1 kB) - added by russ on 04/03/08 09:50:00.
ingested S PNG for g001

Change History

03/16/08 02:10:57 changed by ronald

I would suggest that whole piece of code be removed completely, i.e. that ingest do no image generation/resizing whatsoever.

03/19/08 10:43:26 changed by rich

  • milestone changed from pubApp_0.8.2.2 to pubApp_0.8.2.3.

03/21/08 16:34:41 changed by jkirton

  • status changed from new to closed.
  • resolution set to fixed.

Initial impl complete per r5138, r5139, r5140 (eclipse borked on my attempt for a single commit - it isolates changes per project)

03/24/08 10:15:50 changed by russ

  • milestone changed from pubApp_0.8.2.3 to pubApp_0.8.2.2.

03/28/08 13:50:41 changed by rich

  • status changed from closed to reopened.
  • resolution deleted.

03/28/08 14:02:30 changed by rich

Tried to ingest a compbiol article with equation TIFFs and received the following ERRORs (full trace attached):

2008-03-28 13:48:02,766 INFO  Ingester(PLoSCompBiol)> Successfully ingested 'info:doi/10.1371/journal.pcbi.0030029' [http-8080-Processor24 org.plos.article.util.Ingester]
2008-03-28 13:48:05,392 INFO  DenyBiasedPEP(PLoSCompBiol)> 'permit-admin' permits 'info:doi/10.1371/account/2abf9176-5dca-46c7-bca1-c7d4d6aaee18' to do 'articles:readMetaData' on 'info:doi/10.1371/journal.pcbi.0030029' [http-8080-Processor24 org.plos.xacml.DenyBiasedPEP]
2008-03-28 13:48:05,453 ERROR DocumentManagementService(PLoSCompBiol)> Unable to retrieve Article URI='info:doi/10.1371/journal.pcbi.0030029' [http-8080-Processor24 org.plos.admin.service.DocumentManagementService]
org.plos.ApplicationException: org.plos.article.util.NoSuchArticleIdException: (id = 'info:doi/10.1371/journal.pcbi.0030029')
2008-03-28 13:48:05,457 WARN  DocumentManagementService(PLoSCompBiol)> fetchArticleService.getArticleInfo() returnednull for article URI: 'info:doi/10.1371/journal.pcbi.0030029' so using default image set [http-8080-Processor24 org.plos.admin.service.DocumentManagementService]
2008-03-28 13:48:05,458 INFO  DocumentManagementService(PLoSCompBiol)> Ingested: /var/spool/plosone/ingestion-queue/pcbi.0030029.zip [http-8080-Processor24 org.plos.admin.service.DocumentManagementService]
2008-03-28 13:48:05,463 INFO  DenyBiasedPEP(PLoSCompBiol)> 'permit-admin' permits 'info:doi/10.1371/account/2abf9176-5dca-46c7-bca1-c7d4d6aaee18' to do 'articles:listSecondaryObjects' on 'info:doi/10.1371/journal.pcbi.0030029' [http-8080-Processor24 org.plos.xacml.DenyBiasedPEP]
2008-03-28 13:48:05,970 ERROR DocumentManagementService(PLoSCompBiol)> Resize images failed for article info:doi/10.1371/journal.pcbi.0030029 [http-8080-Processor24 org.plos.admin.service.DocumentManagementService]
org.plos.article.util.NoSuchArticleIdException: (id = 'info:doi/10.1371/journal.pcbi.0030029')
2008-03-28 13:48:05,986 ERROR IngestArchivesAction(PLoSCompBiol)> Error ingesting articles: pcbi.0030029.zip [http-8080-Processor24 org.plos.admin.action.IngestArchivesAction]
org.plos.article.util.ImageResizeException: org.plos.article.util.NoSuchArticleIdException: (id = 'info:doi/10.1371/journal.pcbi.0030029')
2008-03-28 13:48:05,989 INFO  DenyBiasedPEP(PLoSCompBiol)> 'permit-admin' permits 'info:doi/10.1371/account/2abf9176-5dca-46c7-bca1-c7d4d6aaee18' to do 'articles:deleteArticle' on 'info:doi/10.1371/journal.pcbi.0030029' [http-8080-Processor24 org.plos.xacml.DenyBiasedPEP]
2008-03-28 13:48:06,038 ERROR IngestArchivesAction(PLoSCompBiol)> Could not delete article: info:doi/10.1371/journal.pcbi.0030029 [http-8080-Processor24 org.plos.admin.action.IngestArchivesAction]
org.plos.article.util.NoSuchArticleIdException: (id = 'info:doi/10.1371/journal.pcbi.0030029')

03/28/08 14:02:51 changed by rich

  • attachment ingest_equation_error.txt added.

03/28/08 14:03:58 changed by rich

  • attachment pcbi.0030029.zip added.

03/31/08 14:00:17 changed by rich

  • priority changed from high to critical.

04/01/08 10:21:56 changed by jkirton

Upon clearing *all* cache on branch and ingesting a test article zip file containing equation images, all is well. Could this be a stale-cache issue, especially since the package name was changed for ArticleType and related classes with the r5151 move?

04/01/08 16:43:23 changed by jkirton

(In [5216]) addresses #847, #848 Changed the package name for ArticleType.java to retain the validity of existing java serializations that contain this type.

04/03/08 09:47:23 changed by russ

some images are getting pickled on ingest.

processed article pcbi.0030151.

ingested.

http://ploscompbiol-branch.plos.org:8080/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0030151

medium sized images in the slideshow are all corrupt.

checked the processed image files in the zip - all are good.

diffed the processed image files to the files in fedora. S matches, M and L do not match.

[root@sfweb02 tmp]# diff L_pcbi.0030151.g001.png doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_L+PNG_L.0
Binary files L_pcbi.0030151.g001.png and doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_L+PNG_L.0 differ
[root@sfweb02 tmp]# diff S_pcbi.0030151.g001.png doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_S+PNG_S.0

saw another case on multi where some of the S files were corrupt as well, and some of the M files were okay, so i don't think it's specific to M and L files.

not sure if the same files are corrupt on each ingest or not.

i'll attach both original and processed zip as well as the ingested images for g001.
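
diff only says that the binaries differ. A small helper that reports where they first differ can help tell truncation apart from garbled bytes. This is a sketch only; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FirstDifference {
    /** Returns the offset of the first differing byte, or -1 if the files are identical. */
    public static long firstDifference(Path a, Path b) throws IOException {
        byte[] x = Files.readAllBytes(a);
        byte[] y = Files.readAllBytes(b);
        int common = Math.min(x.length, y.length);
        for (int i = 0; i < common; i++) {
            if (x[i] != y[i]) {
                return i;
            }
        }
        // Same content up to the shorter length: either identical or one is a prefix of the other.
        return x.length == y.length ? -1 : common;
    }
}
```

Run against the processed file from the zip and the matching file from fedora, an offset equal to the shorter file's length would point to truncation, while a mismatch partway through files of the same size would point to bytes being garbled during the copy.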

04/03/08 09:49:31 changed by russ

  • attachment doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_L+PNG_L.0 added.

ingested L PNG for g001

04/03/08 09:49:48 changed by russ

  • attachment doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_M+PNG_M.0 added.

ingested M PNG for g001

04/03/08 09:50:00 changed by russ

  • attachment doi_10.1371%2Fjournal.pcbi.0030151.g001+PNG_S+PNG_S.0 added.

ingested S PNG for g001

04/03/08 09:50:39 changed by russ

article zips were too big to attach - you can find them on branch in /var/spool/plosone/ingested/proc.pcbi.0030151.zip and /var/spool/plosone/ingestion-queue/pcbi.0030151.bak

04/03/08 17:30:54 changed by jkirton

(In [5341]) addresses #848 Changed the code that reads processed image bytes from a zip file entry (in PreProcessedArticleImageProvider.java), as the previous code made some bad assumptions about the InputStream. I'm hoping this fixes the image corruption issue, but I'm not able to test it at the moment because a NoSuchArticleIdException now occurs when a call for an article's secondary objects is made in the DocumentManagementService.processimages() method. I cleared *all* cache but the exception persists. I'm committing these changes anyway so they are not lost. Beyond this code change, logging messages were improved in DocumentManagementService.java for easier debugging. Lastly, some code cleanup was also done, mainly in DocumentManagementService.java.
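
For illustration only, since the ticket does not say which assumptions were wrong: a common pitfall when reading a zip entry is to assume that a single InputStream.read(byte[]) call fills the whole buffer. It may return fewer bytes than requested, which leaves the rest of the buffer unset and can produce corrupt images. The sketch below (hypothetical class and method names, not the code from r5341) reads until end of stream instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamBytes {
    /** Reads the stream to end-of-stream, however many read() calls that takes. */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        // read() may return fewer bytes than requested; -1 signals end of stream.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // copy only the bytes actually read
        }
        return out.toByteArray();
    }
}
```

The stream returned by ZipFile.getInputStream(entry) could be passed to readFully() directly; this also avoids relying on ZipEntry.getSize(), which can return -1 when the size is not known.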

04/03/08 18:00:38 changed by alex

(In [5346]) Recorded merge of revisions 5216 via svnmerge from http://gandalf.topazproject.org/svn/branches/0.8.2.2

........

r5216 | jkirton | 2008-04-01 16:43:23 -0700 (Tue, 01 Apr 2008) | 2 lines

addresses #847, #848 Changed the package name for ArticleType.java to retain the validity of existing java serializations that contain this type.

........

04/04/08 13:00:10 changed by jkirton

  • status changed from reopened to closed.
  • resolution set to fixed.

(In [5370]) fixes #848 r5341 actually contains the fix; after local testing, the fix has now been verified. This commit refactors DocumentManagementService so that we are now more explicit about image contexts and their associated processed image mime-types.

04/04/08 15:38:23 changed by russ

looks good to me with n=1 article tested. will do some more in-depth testing as we approach 0.8.2.2 release.

05/16/08 14:49:18 changed by russ

(In [5764]) creating a branch from 0.8.2.2 at r4863. will attempt to remove article serialization changes while preserving pre-ingest processing changes. addresses #806, #847, #848

07/16/08 11:01:33 changed by

  • milestone deleted.

Milestone pubApp_0.8.2.2 deleted