Ticket #520 (assigned enhancement)

Opened 1 year ago

Last modified 2 weeks ago

Line wrap for lengthy gene sequences

Reported by: susanne Assigned to: josowski (accepted)
Priority: medium Milestone: 0.9.2
Component: ambra Version: 0.8
Keywords: text Cc:
Blocking: Blocked By:

Description

Charlesworth has begun, with article e459 published May 23, 2007, to tag gene sequences in the XML with the <named-content content-type="gene"></named-content> tag.

Topaz is rendering these in the HTML as: <span class="capture-id"></span>. At the moment, there is no style defined for capture-id, but I imagine Topaz has put it in as a placeholder for something we might do in the future.

Perhaps one of the first things we can do is to use this somehow to control the problem that we are having with lengthy gene sequences not wrapping. Here is some background on this problem:

DNA sequences listed in a paper are often lengthy, as in this example from PLoS ONE:

http://www.plosone.org/doi/pone.0000299#s4

ATGACTGAGACCCTCCCACCCGTGACTGAAAGCGCCGTCGCTCTGCAAGCAGAGGTTACCCAGCGGGAGCTGTTCGAGTTCGTCCTCAACGACCCCCTCCTGGCTTCTAGCCTCTACATCAACATTGCTCTGGCAGGCCTGTCTATACTGCTGTTCGTCTTCATGACCAGGGGACTCGATGACCCTAGGGCTAAACTGATTGCAGTGAGCACAATTCTGGTTCCCGTGGTCTCTATCGCTTCCTACACTGGGCTGGCATCTGGTCTCACAATCAGTGTCCTGGAAATGCCAGCTGGCCACTTTGCCGAAGGGAGTTCTGTCATGCTGGGAGGCGAAGAGGTCGATGGGGTTGTCACAATGTGGGGTCGCTACCTCACCTGGGCTCTCAGTACCCCCATGATCCTGCTGGCACTCGGACTCCTGGCCGGAAGTAACGCCACCAAACTCTTCACTGCTATTACATTCGATATCGCCATGTGCGTGACCGGGCTCGCAGCTGCCCTCACCACCAGCAGCCATCTGATGAGATGGTTTTGGTATGCCATCTCTTGTGCCTGCTTTCTGGTGGTGCTGTATATCCTGCTGGTGGAGTGGGCTCAGGATGCCAAGGCTGCAGGGACAGCCGACATGTTTAATACACTGAAGCTGCTCACTGTGGTGATGTGGCTGGGTTACCCTATCGTTTGGGCACTCGGCGTGGAGGGAATCGCAGTTCTGCCTGTTGGTGTGACAAGCTGGGGCTACTCCTTCCTGGACATTGTGGCCAAGTATATTTTTGCCTTTCTGCTGCTGAATTATCTGACTTCCAATGAGTCCGTGGTGTCCGGCTCCATACTGGACGTGCCATCCGCCAGCGGCACACCTGCCGATGACTGA

The string of letters does not wrap at the end of the line; it just extends off the edge of the screen, behind other layers.

In other PLoS journals, our vendor has resolved this in the HTML display by placing a <wbr alt="&#8203;" style="content: attr(alt);" /> after every five characters in the string, for example:

http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0040366#toclink4

From a semantic perspective, this is a bad solution. Do you have a better one?
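
For reference, the vendor approach described above (a break marker after every five characters) amounts to roughly the following. This is only a sketch in Python, not the vendor's code; the function name insert_wbr and the plain <wbr> marker are assumptions made for illustration.

```python
def insert_wbr(sequence: str, every: int = 5, marker: str = "<wbr>") -> str:
    """Place a break-opportunity marker between each run of `every` characters.

    Hypothetical helper: the marker is joined between chunks, so no
    trailing marker is emitted after the final chunk.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    chunks = [sequence[i:i + every] for i in range(0, len(sequence), every)]
    return marker.join(chunks)


# First 15 bases of the example sequence from the ticket.
print(insert_wbr("ATGACTGAGACCCTC"))  # ATGAC<wbr>TGAGA<wbr>CCCTC
```

The markers end up interleaved with the sequence data in the markup, which is the semantic objection raised above.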

Change History

08/02/07 15:42:23 changed by amit

  • keywords set to text.
  • owner changed from pradeep to stevec.
  • version set to 0.8.
  • component changed from topaz to publishing-app.
  • priority changed from unassigned to medium.

Steve, any ideas?

08/02/07 19:39:30 changed by stevec

I think we've spoken about this in the past. There is no canonical way to wrap long strings. The stuff that Margaret, you, and I looked at last year and early this year had various hacks, but no clean solution. It's a limitation in HTML at the moment.

It is good news that they are getting wrapped with a tag, so that we can identify it. capture-id was just a default class defined by the XSL that I was working from. Don't read anything into it.

08/07/07 16:25:51 changed by

  • milestone deleted.

Milestone Bugs deleted

10/29/07 20:38:19 changed by amit

  • owner changed from stevec to alex.

11/05/07 17:56:27 changed by alex

How about if we wrap <named-content content-type="gene"></named-content> in a scrollable text box that's only the width of the screen? If somebody really wants to look at this long sequence, they can scroll the text box, or cut / paste it. At least this way they can find it as one continuous sequence when searching the page.

02/13/08 13:01:34 changed by alex

I noticed that a block quote in the wiki results in a non-wrapping and auto scrolling block of text. We could base the style sheet for gene sequences on this.

02/13/08 13:02:50 changed by alex

like this?

ATGACTGAGACCCTCCCACCCGTGACTGAAAGCGCCGTCGCTCTGCAAGCAGAGGTTACCCAGCGGGAGCTGTTCGAGTTCGTCCTCAACGACCCCCTCCTGGCTTCTAGCCTCTACATCAACATTGCTCTGGCAGGCCTGTCTATACTGCTGTTCGTCTTCATGACCAGGGGACTCGATGACCCTAGGGCTAAACTGATTGCAGTGAGCACAATTCTGGTTCCCGTGGTCTCTATCGCTTCCTACACTGGGCTGGCATCTGGTCTCACAATCAGTGTCCTGGAAATGCCAGCTGGCCACTTTGCCGAAGGGAGTTCTGTCATGCTGGGAGGCGAAGAGGTCGATGGGGTTGTCACAATGTGGGGTCGCTACCTCACCTGGGCTCTCAGTACCCCCATGATCCTGCTGGCACTCGGACTCCTGGCCGGAAGTAACGCCACCAAACTCTTCACTGCTATTACATTCGATATCGCCATGTGCGTGACCGGGCTCGCAGCTGCCCTCACCACCAGCAGCCATCTGATGAGATGGTTTTGGTATGCCATCTCTTGTGCCTGCTTTCTGGTGGTGCTGTATATCCTGCTGGTGGAGTGGGCTCAGGATGCCAAGGCTGCAGGGACAGCCGACATGTTTAATACACTGAAGCTGCTCACTGTGGTGATGTGGCTGGGTTACCCTATCGTTTGGGCACTCGGCGTGGAGGGAATCGCAGTTCTGCCTGTTGGTGTGACAAGCTGGGGCTACTCCTTCCTGGACATTGTGGCCAAGTATATTTTTGCCTTTCTGCTGCTGAATTATCTGACTTCCAATGAGTCCGTGGTGTCCGGCTCCATACTGGACGTGCCATCCGCCAGCGGCACACCTGCCGATGACTGA

02/13/08 22:19:26 changed by ronald

You probably want the CSS overflow property - overflow: auto gives you a scrollbar whenever the content would be too long.
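
A rough sketch of what the overflow suggestion could produce in the rendered page, written in Python purely for illustration. The real change would belong in the XSL or the style sheet; the function name and the gene-sequence class are hypothetical.

```python
from html import escape


def scrollable_sequence(sequence: str) -> str:
    """Wrap a sequence in a block that scrolls instead of overflowing.

    Hypothetical helper: the class name "gene-sequence" is an assumption.
    The inline style stands in for a rule that would normally live in the
    site's style sheet.
    """
    style = "overflow: auto;"
    return (
        f'<div class="gene-sequence" style="{style}">'
        f"{escape(sequence)}"
        "</div>"
    )


print(scrollable_sequence("ATGACTGAGACCCTC"))
```

The sequence stays one continuous string in the markup, so searching the page and copy / paste still work on it.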

10/23/08 09:28:26 changed by rich

  • owner changed from alex to josowski.
  • blocking changed.
  • blockedby changed.

10/27/08 16:44:49 changed by josowski

  • status changed from new to assigned.

10/28/08 13:54:44 changed by josowski

I really like Alex's proposal, perhaps also using a dojo control:

http://dojocampus.org/explorer/#Dojox_Layout_ScrollPane_Horizontal

BUT, I am betting this will cause problems with the commenting system. I'll have to experiment to know for sure.

I wonder if we can do something similar to what I did for the a tags inside comments. If the content in question is tagged with <named-content content-type="gene"></named-content>, can we control this a bit inside the XML to HTML transform? (Is this an XSL sheet?)

That is, hide content beyond N characters but give an option to display / get the full sequence.

sorta like: <span title="FULLSEQUENCE">SHORTENEDSEQUENCE *icon*</span>

Clicking on icon would put the sequence into the clipboard
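
A minimal sketch of the truncation idea in Python, only to make the proposal concrete. The real change would go in the XML to HTML transform; the function name, the default limit of 6, and the "...." suffix are assumptions, and the icon / clipboard part is left out.

```python
from html import escape


def truncated_span(sequence: str, limit: int = 6, suffix: str = "....") -> str:
    """Show at most `limit` characters; keep the full sequence in `title`.

    Hypothetical helper: sequences at or under the limit are emitted
    unchanged, without a title attribute.
    """
    if len(sequence) <= limit:
        return f"<span>{escape(sequence)}</span>"
    shortened = sequence[:limit] + suffix
    # escape() also escapes quotes by default, so the value is attribute-safe.
    return f'<span title="{escape(sequence)}">{escape(shortened)}</span>'


print(truncated_span("ATGACTGAGACCCTC"))
# <span title="ATGACTGAGACCCTC">ATGACT....</span>
```

One thing to check with this approach: the page text then contains only the shortened form, so searching the page for the full sequence would not find it.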

10/28/08 14:29:13 changed by amit

I would try to keep it as simple as possible. For example, can we do this using the CSS overflow property as outlined by Ronald? Do we have to use JavaScript?

Yes, we use XSL to transform XML to XHTML.

Not sure about the idea of using a truncated string with an icon.

10/28/08 15:33:03 changed by josowski

Example:

ATGACT....

10/28/08 15:42:32 changed by amit

Still not sure it conveys this clearly to a reader. The scope is also not clear to me. Are we worried only about long DNA sequences, or about long strings in general? If it is DNA sequences only, are there other options we can use to display them? Any innovative ideas on the web in this area?

Maybe this is off, but I think of a DNA sequence as somewhat similar to source code in HTML.

11/03/08 16:01:53 changed by josowski

I've modified the viewlm-v2.xsl to insert zero width space characters into any word longer than N characters (currently set to 40).

I'm not committing until after release.
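
The idea behind the change, sketched in Python for illustration. This is not the XSL from viewlm-v2.xsl; the function name and the interval parameter (how often a break is inserted inside a long word) are assumptions, since only the 40-character threshold is stated above.

```python
import re

ZWSP = "\u200b"  # zero width space, U+200B


def break_long_words(text: str, max_len: int = 40, interval: int = 40) -> str:
    """Insert zero width spaces into any word longer than `max_len`.

    Hypothetical helper: a "word" is any run of non-whitespace, and a
    break is inserted between each run of `interval` characters.
    """
    def add_breaks(match):
        word = match.group(0)
        if len(word) <= max_len:
            return word  # short words are left untouched
        pieces = [word[i:i + interval] for i in range(0, len(word), interval)]
        return ZWSP.join(pieces)

    return re.sub(r"\S+", add_breaks, text)


sample = "gene: " + "ATGAC" * 9  # one 45-character word
broken = break_long_words(sample)
print(len(sample), len(broken))  # 51 52
```

The inserted characters are invisible but still part of the text, so a copied sequence will contain them unless they are stripped out.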

11/08/08 09:09:21 changed by rich

  • milestone set to 0.9.2.