SEUL-sci Linux in Science Report #8


02 Apr 2001--

Post-Linux World Expo Report

Another Linux World Expo has come and gone, and life has returned to normal. This was my first visit to Linux World Expo, and it was quite an eye-opening experience. I was pleased to see the .ORG pavilion located in the center of the convention hall, which has not always been the case at past Expos. The SEUL group was there en masse, and despite our best efforts, someone did manage to obtain photographic evidence of our presence. Thanks to Daniel Markle for organizing the photos, many of which were taken by Harry McGregor. I would also like to thank MandrakeSoft for its generous financial support of our trip; without it, most of us would not have been able to attend. MandrakeSoft also lent us a computer for the occasion so that we could demonstrate SEUL applications to passers-by. And hey, their penguins are darned cute too.

There were a number of interesting displays. Among them were the dual-processor Athlon system demonstrated by ASL Labs at the MandrakeSoft booth; a very cool display by members of Brookhaven National Laboratory showing how Linux is used in physics research there; the FlightGear flight simulator project, with a running demo; and the Etherboot project, which has made important strides in making it easier to boot x86 PCs over a network.

If the commercial hype around Linux is any indication, I have little doubt that Linux is indeed moving into the mainstream of computing. More heartening is the fact that I can now count a significant number of my colleagues as Linux users. Software choices are far wider than even a year ago, and existing software keeps getting better. For example, current versions of the R Statistical System can download and install or upgrade R packages with a single command. There has been some discussion about creating a similar capability for the GRASS GIS, and I am sure it will happen soon enough.

About three years ago, my first graduate supervisor passed away. While going through copies of old disks from his research, I was struck by the fact that I could still read the media. I also recalled that many labs I know still rely on a rather old program for a particular task, one that runs only under DOS, if at all. These experiences prompted the 'feature' for this report, namely ...

Linux and Obsolescence (or 'Old files never die...')

(Note: thus far, these "special features" have focused more on the mechanics of the use of Linux in Science rather than on the science done using Linux. Subsequent features will focus on how people are using Linux in their science. I *want* to know how *you* are using Linux in your own work, whether you're a tenured prof doing leading-edge research, or a student working on assignments. Join the seul-sci mailing list, and share your experiences!)

A few months ago, I saw a sheet of paper posted by a lab workstation with an archaic 8" floppy disk taped to it and the message "Always transfer your files; Formats change, and you never know when you won't be able to read them anymore." Most of us have, at one time or another, had to deal with incompatible physical media. More often, though, incompatible data file formats cause the bigger headaches.

Much Linux software has substantial built-in safeguards against this sort of obsolescence. Many programs store data in plain text files, and even those with complex formats (like Gnumeric and SciGraphica) use XML, which is itself text. Although it may sometimes be inconvenient to parse text formats to get data into a newer version of an application, common editors like vi and Emacs both support regular expressions, which simplify large-scale data taming. Combining text-processing tools like sed and awk with simple shell scripts can also make short work of large data tasks.
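As a small illustration of this kind of data taming, here is a minimal Python sketch (Python standing in for the sed/awk/shell combination mentioned above) that converts a whitespace-delimited text data file into CSV. The input layout, with '#' comment lines and free-form spacing between columns, is an assumption chosen for illustration, as is the function name `legacy_to_csv`.

```python
import csv
import io
import re

# Assumed legacy layout: columns separated by any run of spaces/tabs,
# lines starting with '#' are comments.
FIELD_SEP = re.compile(r"\s+")

def legacy_to_csv(text: str) -> str:
    """Convert whitespace-delimited text data to CSV, dropping comments and blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        writer.writerow(FIELD_SEP.split(stripped))
    return out.getvalue()

if __name__ == "__main__":
    sample = "# site  temp  rain\nA1   12.5   3.0\n\nB2\t9.75  0.0\n"
    print(legacy_to_csv(sample), end="")
```

Because the csv module handles quoting, fields that happen to contain commas are still written safely, which is easy to get wrong in a one-line sed substitution.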

From a system perspective, Linux has several advantages over most commercial systems in warding off obsolescence. The Linux kernel supports a wide range of filesystem types (minix, msdos, vfat, hfs, ntfs, ext2, iso9660, reiserfs and ext3; I'm sure I've missed some!), so data and other files stored on older filesystems are easily read by modern Linux workstations. I know of many labs that still use rather old DOS-based programs for some analyses, either because nothing more recent exists for other platforms or because the equivalents are not yet known. In these cases, it is possible to use an emulator (like plex86) to run these applications.
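On a running system, /proc/filesystems lists the filesystem types the kernel currently has registered, with a 'nodev' flag marking virtual filesystems that need no block device. Here is a minimal Python sketch of checking for a few legacy types; the helper name `parse_proc_filesystems` is hypothetical. Types built as modules appear only once the module is loaded, so absence from the list does not prove the kernel cannot read that filesystem.

```python
from __future__ import annotations

from pathlib import Path

def parse_proc_filesystems(text: str) -> dict[str, bool]:
    """Map each filesystem name to True if it is backed by a block device.

    Each line of /proc/filesystems has the form '<flag>\\t<name>', where the
    flag is 'nodev' for virtual filesystems and empty otherwise.
    """
    result: dict[str, bool] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        flag, _, name = line.partition("\t")
        result[name.strip()] = flag.strip() != "nodev"
    return result

if __name__ == "__main__":
    proc = Path("/proc/filesystems")
    if proc.exists():
        registered = parse_proc_filesystems(proc.read_text())
        for fs in ["minix", "msdos", "vfat", "hfs", "ntfs", "iso9660"]:
            status = "registered" if fs in registered else "not registered (module may not be loaded)"
            print(f"{fs}: {status}")
```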

Linux also still supports 5.25" floppy drives and other legacy devices, so reading the files themselves remains possible. With a simple shell script and two older drives on an older machine, one could make short work of a large pile of ancient floppies. Whether the files are copied directly or imaged with dd, they can then be burned to CD for future reference or later conversion to other formats.
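The imaging step amounts to a byte-for-byte copy from the device to a file. Here is a minimal Python sketch of what `dd if=... of=...` does, with a SHA-256 checksum added so each image can be verified after it is burned to CD. The device path `/dev/fd0`, the script name `image_floppy.py`, and the 64 KiB chunk size are illustrative assumptions. Unlike `dd conv=noerror,sync`, this sketch stops with an error on an unreadable sector rather than padding it and continuing.

```python
import hashlib
import sys

def image_device(src: str, dst: str, block_size: int = 64 * 1024) -> str:
    """Copy src byte-for-byte into dst and return the SHA-256 hex digest of the data.

    A read error (e.g. a bad sector) raises OSError; no error recovery is attempted.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:  # end of device or file
                break
            fout.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # Example: python image_floppy.py /dev/fd0 disk042.img  (paths are illustrative)
    if len(sys.argv) != 3:
        sys.exit("usage: image_floppy.py SOURCE_DEVICE IMAGE_FILE")
    print(image_device(sys.argv[1], sys.argv[2]))
```

Running it once per disk, alternating between two drives, is one way to work through a pile of floppies; recording the printed checksum alongside each image makes later verification straightforward.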

Finally, Roberto Di Cosmo has written an interesting perspective paper on computing in Europe and on the quality of some typical commercial computing tools. Dr. Di Cosmo makes some very topical arguments that remain valid today, even though the article is now over four years old.

Here are some more links and updates on scientific software we've found...

GDIS -
GDIS, a recent addition to SEUL/sci, is a GTK program for displaying and manipulating isolated molecules and periodic systems. It can read several common file formats (BIOSYM, XYZ, XTL, MARVIN, and GULP), can animate BIOSYM files, and draws on other programs (like GPeriodic and POV-Ray) for specialized functions. Welcome aboard, Sean!

GGobi Data Visualization System -
Some time ago, I became acquainted with the xgobi data exploration system, which let me view the large volume of data used in my thesis. According to their web page, "GGobi is a data visualization system for viewing high-dimensional data and is the next edition of xgobi. It provides a new interface to many of the features of xgobi, built using Gtk, the GIMP toolkit." Its features include a new interface, direct access from R, Perl and Python, a new XML-based input format, and database (MySQL) support. (License varies by component: GPL, LGPL and AT&T Open Source license)

OpenReference -
"OpenReference is a Servlet/JSP based database application to manage your or your group's research references. It is ideal for academic and business researchers! Best of all, it is completely free and comes with source code under GPL. Since I use it extensively myself, I will continue to add in more features and improvements." (GPL)

One Wire Weather -
OWW is a RISC OS/Linux interface to the Dallas Semiconductor One Wire Weather Station. For researchers who need to capture meteorological data in field experiments, it offers an affordable alternative to large commercial-grade weather stations. The software can read and log numerous instruments (wind speed and direction, temperature, precipitation amounts and rates). (License: Free for non-commercial use)

OpenVRML -
"OpenVRML is a free cross-platform runtime for VRML available under the GNU Lesser General Public License. The basic OpenVRML distribution includes libraries you can use to add VRML support to an application, and 'Lookat', a simple stand-alone VRML browser."

pybibliographer - Pybibliographer is a bibliographic database management tool that supports several common formats: BibTeX, Medline, Ovid, and Refer. It is a Python-based package that can search, edit, and reformat bibliographic entries. A GNOME interface is available and allows references to be inserted directly into LyX (1.0.x).

Scientific Image Database -
"SIDB archives 2-D, 3-D images. Image files are stored unchanged in a central directory (archive). Users of the system are subdivided in groups, and whoever owns an image (by uploading it) can determine who else on the system is allowed to view and use the image. Files can be uploaded through HTML, or using a mounted (smb, NFS, etc.) drive. Entering meta-data is facilitated by user-definable templates. The meta-data fields currently in use have been designed for images derived by (confocal) microscopy. When combined with cheap, large hard drives and a fail-safe backup mechanism, SIDB provides a perfect means to archive images within the setting of small to medium-sized research groups. However, it might be of use wherever people collaborate on images." (GPL)

As always, I look forward to receiving your comments and suggestions for links or future feature articles.

-- Pete St. Onge
