Ticket #35 (closed task: worksforme)

Opened 3 years ago

Last modified 1 year ago

Review process changes to allow author to specify the abstract of the article in triples/pseudo triples format

Reported by: amit
Assigned to: chris
Priority: high
Milestone:
Component: review-process
Version:
Keywords: abstract triples
Cc: amit
Blocking:
Blocked By:

Description

Please see ticket:34 for background on this. We need to know and document the changes that will be needed to the upstream process to require authors to specify the abstract in triples/pseudo triples format.

Change History

05/03/06 02:55:50 changed by chris

  • status changed from new to assigned.

05/03/06 03:17:44 changed by chris

This will take some thinking about. It may be too much editorial work upfront, although it is an important ideal.

For reference, an email exchange between Chris and Amit giving fuller background:


One question. What's 'triples format'? Am I being completely dumb for not understanding?

Currently PLoS Abstracts are split into 3 components "Background", "Methodology/Principal Findings" and "Conclusions/Significance" but I'm guessing that isn’t what you and John W mean.

Chris


Not at all. Sorry, sometimes we take our own language for granted. Triples format is based on the idea that you can describe most things in the world through a (subject, predicate, object) relationship. Hence the term triples. For example, if I were to describe you, I would start by giving you an internal identity, say chris, and then say the following:

chris worksFor PLoS
chris fullName "Chris Surridge"
chris citizenOf "U.K"
chris email "csurridge@plos.org"
.
.
.

Some of this can get more abstract. For example, we can start grouping items into classes and define relationships on the classes, which makes adding new information easier: we add a smaller number of statements/triples but can still deduce/infer the same meaning. This type of structure allows a computer to pretend to understand human knowledge. There are various syntaxes you can use to define these, but the W3C standards are RDF/RDFS/OWL. If you are interested in more details, please see http://www.w3.org/TR/rdf-primer/. It is RDF specific, but if you ignore the syntax stuff, the basic underlying concept is triples.
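
To make the class-based inference above concrete, here is a minimal Python sketch. It is not RDF/RDFS itself and uses no RDF library; the `type` and `subClassOf` predicates and the `Editor` and `Person` classes are assumptions invented for illustration, loosely modelled on how RDFS treats class membership.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Two triples from the example above, plus two hypothetical class statements.
facts = {
    Triple("chris", "worksFor", "PLoS"),
    Triple("chris", "fullName", "Chris Surridge"),
    Triple("chris", "type", "Editor"),         # assumed class membership
    Triple("Editor", "subClassOf", "Person"),  # assumed class hierarchy
}


def infer_types(triples):
    """Add (x type C2) whenever (x type C1) and (C1 subClassOf C2) are present."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for t in list(inferred):
            if t.predicate != "type":
                continue
            for s in list(inferred):
                if s.predicate == "subClassOf" and s.subject == t.obj:
                    new = Triple(t.subject, "type", s.obj)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(t.obj for t in triples
                  if t.subject == subject and t.predicate == predicate)
```

Calling `objects(infer_types(facts), "chris", "type")` yields both `Editor` and `Person`, although only the first was stated directly; that is the sense in which fewer statements can carry the same meaning.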

Since this was John's idea, I had sent him an email earlier asking for details and got back a response saying he was busy but would respond in detail later. I am still waiting.

My question for you: are the 3 components/predicates of the abstract you describe above PLoS specific, or are they the industry standard? If the former, then we could potentially do:

abstract hasBackground "..."
abstract methodology "..."
abstract conclusion "..."

regards

Amit


Amit,

Thanks. I now understand. The subheadings of our abstracts aren't going to be any help at all. What I think John is suggesting is the seemingly sensible idea that abstracts should be deconstructed into a series of arguments and that these arguments should be rendered in triples. I think it would go something like this:

A current abstract in PLoS Biol is:

Stem cell function during organogenesis is a key issue in developmental biology. The transcription factor SHORT-ROOT (SHR) is a critical component in a developmental pathway regulating both the specification of the root stem cell niche and the differentiation potential of a subset of stem cells in the Arabidopsis root. To obtain a comprehensive view of the SHR pathway, we used a statistical method called meta-analysis to combine the results of several microarray experiments measuring the changes in global expression profiles after modulating SHR activity. Meta-analysis was first used to identify the direct targets of SHR by combining results from an inducible form of SHR driven by its endogenous promoter, ectopic expression, followed by cell sorting and comparisons of mutant to wild-type roots. Eight putative direct targets.................

In triples this would become:

[Stem cells] [act in] [organogenesis]
[organogenesis] [is a subset of] [development]
[SHORT-ROOT (SHR)] [is a] [transcription factor]
[SHORT-ROOT (SHR)] [acts in] [Arabidopsis root development]
[SHORT-ROOT (SHR)] [regulates] [stem cell specification]
[SHORT-ROOT (SHR)] [regulates] [differentiation potential of stem cells]
[meta-analysis] [is a] [statistical method]
[meta-analysis] [was applied to] [several microarray experiments]
[meta-analysis] [measured] [changes in global expression profiles]
[SHR] [modulates] [global expression profiles]
[meta-analysis] [identified] [direct targets of SHR]

Etc.

The vocabulary can of course be greatly restricted to remove synonyms.
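
As a rough illustration of how bracketed pseudo triples like those above might be read by a program and mapped onto a restricted vocabulary, here is a small Python sketch. The `PREDICATES` table and the canonical names in it (`actsIn`, `isA`, `subsetOf`) are assumptions made up for this example; no agreed vocabulary exists in this ticket.

```python
import re

# One line of the form "[subject] [predicate] [object]".
LINE = re.compile(r"^\[([^\]]+)\]\s*\[([^\]]+)\]\s*\[([^\]]+)\]$")

# Hypothetical controlled vocabulary: several surface forms, one canonical name.
PREDICATES = {
    "act in": "actsIn",
    "acts in": "actsIn",
    "is a": "isA",
    "is a subset of": "subsetOf",
    "regulates": "regulates",
}


def parse_line(line):
    """Parse one bracketed pseudo triple and normalise its predicate."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a bracketed triple: {line!r}")
    # Collapse stray whitespace inside each bracketed part.
    subject, predicate, obj = (" ".join(part.split()) for part in match.groups())
    # Predicates outside the vocabulary are passed through unchanged.
    return subject, PREDICATES.get(predicate.lower(), predicate), obj
```

With this table, `[Stem cells] [act in] [organogenesis]` and `[SHORT-ROOT (SHR)] [acts in] [Arabidopsis root development]` both end up with the predicate `actsIn`, while a predicate outside the table, such as `was applied to`, is left as written.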

Getting authors to put stuff in this format would be a big job unless John has a cunning plan.

Chris


Hi Chris,

Yes, precisely. But I think he wanted to take very small steps before trying to get into the development of specialized vocabularies for various domains of knowledge (but this is my reading and I could be off base, hence my email to him for clarification).

John said that we should be more focused on modifying certain social behaviours, as the technology is ephemeral. The social change he said we could potentially focus on is requiring authors to provide the abstract in triples or something close to triples (which we can translate programmatically).

I have asked him to provide me with the format so that it gives me a precise idea, and I am waiting for it. Assuming (and it is a big assumption) that we get that, the questions in my mind are:

1. What will it take to get authors to do something like this, or do we (in PLoS) read the author's abstract and translate it manually?

2. I suspect this will be done during the review process and needs to be inserted in some shape or form into the document sent to Allen Press for translation into PMC format. We need to know what it will take to do this, where in the PMC document Topaz will be able to locate the abstract triples and store them in the repository, and how we will expose the same via search etc.

I talked to Rich about it and he said to bring you in, as you understand the upstream process better. Hopefully this gives the full context. I am wide open here for your suggestions.

regards

Amit

05/11/06 06:28:39 changed by chris

I've been thinking about this and I think that the upstream editorial work to achieve this for all papers would be more than we could handle. I'm not sure what resources would be needed to 'check and correct' this metadata but I know that we won't have them. That said, it is a good idea and something that it would be good to try and support, if not enforce.

There are some obvious questions to my mind.

1) Would the 'abstract in triples' be incorporated as metadata in the XML of an article or be a separate file associated with the document? My answer to that would be that it would be better to include this in the metadata of the XML. This will make it far more accessible for machine reading, which is the primary driver for this as I see it.

2) Will the 'abstract in triples' be a) part of the package supplied to the JPS from the production system or b) merged with the document within the JPS?

a) is the most logical: it is better that as many matters to do with editing as possible are done before the documents are handed on to the JPS; that is what production processes are for. Equally, though, this isn't something that the AP and PLoS ONE production system are designed to handle, so this kind of functionality really becomes an item for consideration when specifying the features of a new Manuscript Tracking System, which I guess is the last step in producing an end-to-end publishing system.

b) This would entail a way of editing the XML metadata of a document after publication and providing a facility by which both staff and authors could have access to the editing tools. An "Insert and/or edit metadata" tool would be a nice feature to add to the wish list; it could also be a way of handling typos and other minor errors in a text. A system of versioning would again be needed, and security to restrict access to certain (possibly dynamic) roles, e.g. authors. This isn't something that needs building for October but it would be very good for the framework to potentially support it, i.e. support the editing of XML/insertion of additional metadata.
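
As a minimal sketch of the option in question 1 above (triples carried inside the article XML's metadata), the following Python uses the standard library's `xml.etree.ElementTree`. The element names `abstract-triples`, `triple`, `subject`, `predicate` and `object`, and the `article`/`article-meta` skeleton used here, are invented for illustration; they are not taken from the PMC format, and a real design would need to fit whatever that format allows.

```python
import xml.etree.ElementTree as ET


def add_abstract_triples(article_xml, triples):
    """Return article_xml with the triples added under a metadata element."""
    root = ET.fromstring(article_xml)
    meta = root.find("article-meta")
    if meta is None:
        # Create the (assumed) metadata element if the document lacks one.
        meta = ET.SubElement(root, "article-meta")
    container = ET.SubElement(meta, "abstract-triples")  # hypothetical name
    for subject, predicate, obj in triples:
        triple = ET.SubElement(container, "triple")
        ET.SubElement(triple, "subject").text = subject
        ET.SubElement(triple, "predicate").text = predicate
        ET.SubElement(triple, "object").text = obj
    return ET.tostring(root, encoding="unicode")


def read_abstract_triples(article_xml):
    """Recover the list of (subject, predicate, object) tuples from the XML."""
    root = ET.fromstring(article_xml)
    return [
        (t.findtext("subject"), t.findtext("predicate"), t.findtext("object"))
        for t in root.iter("triple")
    ]
```

A downstream system could then call something like `read_abstract_triples` on the delivered document to load the triples into a repository, without needing a second file alongside the article.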

Are there technical considerations that I'm missing?

05/22/06 11:16:38 changed by amit

Jimmy Wales of Wikipedia has given us a pretty good solution which John Wilbanks has agreed with. Please see ticket:34 for details. This idea will not involve any changes to the upstream process and will have no additional impact on PLoS resources. Chris, please take a look and if you are okay with it, please close this ticket.

06/19/06 04:35:20 changed by chris

  • status changed from assigned to closed.
  • resolution set to worksforme.

I'm fine with this.

10/29/07 21:13:02 changed by

  • milestone deleted.

Milestone snowcrash deleted