Linux in Science Report #9

16 Apr 2001--

Spring...

As the rejuvenation of Spring returns to these frigid lands, and exams are written and grades returned, the prospect of Free Time seems less and less a myth. Ideas start to flow, papers start to take shape. Spring is indeed a wonderful time. With this in mind, this report deals with writing...

Linux and Research Writing

One of the primary tasks in research is communicating research, and the context it provides, in the primary literature. Many journals accept a number of different formats, but many of these are commercial formats, leaving open source proponents wondering what tools they should use for research writing.

One good option is a text-based format called LaTeX, which uses markup codes (not unlike the 'reveal codes' feature of WordPerfect) to delineate various attributes of the text. Usually, writing with a word processor is an iterative process, where formatting and writing happen at the same time. A common argument against that approach is that it tends to make the writer spend more time working on the presentation of the text than on the writing itself. Typically, writing with LaTeX involves three stages: writing, markup and fine-tuning. The writing stage focuses on the content of the manuscript, without worrying about how it actually looks; many writers leave extensive commented phrases at this stage to help with revisiting the text at a later time. Separating the writing process from the markup process allows the writer to focus completely on formatting the document in a single pass, making it easier to achieve a consistent look and feel. Minor aesthetic changes are made only after all of the major formatting has been completed. While many are somewhat hesitant to use a text-based markup system, a graphical front end for making documents in LaTeX is available: LyX.
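The three stages described above might look something like the following minimal sketch. The section titles, sentences and table values are placeholders invented for illustration, not taken from any real manuscript; only standard LaTeX commands are used.

```latex
% Stage 1 (writing): plain content, with comments left as reminders.
\documentclass{article}
\begin{document}

\section{Introduction}
% TODO: revisit this paragraph once the results section is drafted.
Placeholder text for the opening paragraph of the manuscript.

% Stage 2 (markup): structure and emphasis are added in one pass.
\subsection{Methods}
Each site was sampled \emph{weekly}; the counts are summarized in
Table~\ref{tab:sites}.

\begin{table}[ht]
  \centering
  \caption{Placeholder data, for illustration only.}
  \label{tab:sites}
  \begin{tabular}{lr}
    \hline
    Site & Samples \\
    \hline
    A & 12 \\
    B & 10 \\
    \hline
  \end{tabular}
\end{table}

% Stage 3 (fine-tuning): small aesthetic adjustments come last,
% for example forcing a page break with \newpage where needed.
\end{document}
```

Lines beginning with % are comments and never appear in the output, which is what makes them handy for notes-to-self during the writing stage.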

Of course, writing in a particular format is of little benefit if the files cannot be used. Several organizations already cheerfully accept submissions in LaTeX format, however: a short list would include ACS Publications, American Mathematical Society, Annual Reviews, Elsevier, Institute of Physics / IOP Publishing, Kluwer, NRC Press (Canadian Institute for Scientific and Technical Information), and Springer-Verlag, to name but a few. LaTeX is a great tool for publishing companies because of its flexibility across a range of printing tasks, so expect to see more and more journals accept input in this format.

The ability to create documents with complex tables and equations is quite important, but research publications also require substantial references to the primary literature. Maintaining these by hand in larger documents is time consuming. Fortunately, LaTeX has a bibliographic database management system called BibTeX, which is used to generate reference lists based on the citations in the text. BibTeX is able to use different citation styles to produce both the appropriate cite in the text and the matching entry in the works cited section of the document. Thus, one can draw from a common set of references and use them in papers destined for different journals having widely varying citation formats. Managing bibliographic databases can be onerous, and one tool that can help in maintaining them is gBib (a project hosted here at SEUL).
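As a rough sketch of how this fits together, the file below carries its own tiny reference database and cites from it. The entry (key, authors, title and journal) is a made-up placeholder, not a real reference, and the file name refs.bib is likewise just an assumption for the example.

```latex
% Placeholder reference, written out to refs.bib when LaTeX runs.
% It is NOT a real citation; substitute your own database.
\begin{filecontents}{refs.bib}
@article{placeholder2001,
  author  = {Author, A. and Writer, B.},
  title   = {A Placeholder Title},
  journal = {Journal of Examples},
  year    = {2001},
  volume  = {1},
  pages   = {1--10}
}
\end{filecontents}

\documentclass{article}
\begin{document}

Earlier work addressed this question~\cite{placeholder2001}.

% Changing the style name here (plain, unsrt, alpha, abbrv, or a
% journal-supplied .bst file) changes the citation format without
% touching the text or the database.
\bibliographystyle{plain}
\bibliography{refs}

\end{document}
```

Building this typically means running latex, then bibtex, then latex twice more so that the citations and the reference list are resolved.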

One advantage of using a text-based form of markup (e.g. LaTeX, TeX and HTML) is the ability to use existing tools for managing file versions across different sites. Programmers have been using the Concurrent Versions System (CVS) to manage source code (which is text after all) across many programmers so that each coder has access to the most updated versions of each file, wherever they are. However, it can just as easily be used to manage manuscripts being worked on by several individuals or even over different sites. For instance, a single author working on a manuscript may not have to worry about exchanging files with colleagues, but an article being written by several authors is more difficult to manage and keep updated as each author makes changes, and this is where CVS helps.

A common practice in many lab courses is for the Department to provide a lab manual to students. The work required to maintain this manual tends to be substantial, and changes and editing often get put off, partially because of the time required to make and verify the changes, but perhaps mostly because reprinting the current manual for the present students is prohibitively expensive. A markup language like LaTeX, however, presents a different avenue when combined with CVS: several people (lab coordinators, faculty and TAs) could be permitted to add their changes to the manual as the year progresses. LaTeX is flexible enough to permit many different presentation formats (paper, PDF, web pages) from the same files. And with the increasing use of computers in teaching labs, it becomes possible to give the students access to the lab manual online through web pages which they can print if needed: updating these is far less expensive than updating printed materials. Also, in cases where paper printouts are indispensable, PostScript or PDF provide excellent quality output. And of course, web pages and PDF files are readable on virtually all operating systems.

Perhaps the supreme utility of CVS is the extra level of protection it provides to work-in-progress. I've seen colleagues accidentally delete critical manuscript files, and lose a great deal of time rebuilding and rewriting them. If a local file under revision control is accidentally deleted, however, recovery is simply a matter of updating the local working copy from the repository; missing files will be restored to their current version. While this does not obviate the need for backups, it certainly reduces the potential for lost time.

In my own case, while I was working on my thesis manuscript (which was in LaTeX), I found CVS to be particularly valuable. Since none of the lab machines were running Linux at the time, I could not work on the manuscript directly. However, as I had a number of shell accounts elsewhere that I could access from the lab, the fact that I had the thesis under revision control meant that I was able to maintain local copies of the manuscript files on these various machines. Thus, whichever of these shell accounts I was using at the time, I always had the most up-to-date files.

Here are some more links and updates for some scientific software that we've found...

EMBOSS - http://www.uk.embnet.org/Software/EMBOSS/
I received an email from David Martin recently about EMBOSS, the European Molecular Biology Open Software Suite of applications for sequence analysis. From their web page, "EMBOSS is a new, free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages." (License: GPL)

kpl - http://frsl06.physik.uni-freiburg.de/privat/stille/kpl/
kpl is a KDE application used to plot two-dimensional graphical representations of data sets and functions. Multidimensional nonlinear parameter fits of functions to data sets can be performed using the Levenberg-Marquardt algorithm. General linear least squares parameter fits are also possible. A DCOP interface allows other applications and scripts to control kpl. Current language support includes English and German. (License: GPL)

Stone Soupercomputer - http://stonesoup.esd.ornl.gov/
Although not a software application, per se, this is an excellent example of scientific ingenuity overcoming financial limitations by creating a Beowulf-class cluster from Linux and Alpha-based computers that have been discarded from normal usage.

Ted - http://www.nllgg.nl/Ted/
Ted is a word processor for writing and editing RTF files, and it fills that niche very well. I'm quite pleased with it, and it has made my life much simpler when exchanging files with my Windows-using colleagues. It has other features as well, including conversion of RTF documents to PDF format. (License: GPL)


As always, I look forward to receiving your comments and suggestions for links or future feature articles.

-- Pete St. Onge (pete@seul.org)
