Linux in Science Report #10

24 July 2001

Research Graphics in Linux

My recent Linux in Science reports have focused primarily on various aspects of scientific communication. This report centers on one aspect that is critical to quantitative science: research graphics.

Graphics tasks in research range from the simple (like xy plots) to the increasingly complex (contour maps, PCA plots, etc.). A common misconception among researchers is that Linux and *BSD lack high-quality graphics and graphing tools. This week's report will hopefully help dispel that myth.

Before I get too far into this report, I would like to thank Aaron Malone for the paragraph on web graphics. Aaron is a long-time member of SEUL and SEUL/sci.

Linux and Research Graphics

One of the primary tasks in research is the communication of results in a transparent way so that independent assessments can be made as to the applicability and generality of that research. Peer review is, after all, the cornerstone of the scientific method.

The specialized nature of the software required to produce research graphics, be it for image analysis or the graphical summarization of results, tends to make commercial versions rather expensive to obtain. Certain types of research graphics are typically only created by statistical software (e.g. principal component analysis plots) or are otherwise time-consuming to make by hand. Another consideration, particularly for exploratory data analysis or visual verification of structured data, is the ability to create a large number of graphs through some batch-processing facility in combination with simple scripts.

The breadth of graphing software for Linux and other Unix platforms has expanded considerably over the last few years. I've mentioned some of these applications in previous Linux in Science reports, and they bear a fresh mention. Scigraphica and Grace are both well-known GUI graphing packages. Scigraphica has improved substantially in the last few months, and will be familiar to users with experience of a program like Origin or SigmaPlot. Grace is a mature package offering lots of functionality, several graph types, and, perhaps most importantly for large datasets, a command-line interface well suited to scripting. Further, Grace has a Python interface to allow 'live' scripting.

As command-line programs, Guppi and Gnuplot are well-known and quite capable. Given its flexibility, Gnuplot has a long history of use for dynamic web graphs. Guppi has become the standard graphing tool in many other applications, most notably the Gnumeric spreadsheet. The R statistical package probably has the most developed graphing facility, with a great deal of flexibility as to the placement and presentation of data. For mathematical modelling, the similarity of Octave to the commercial MATLAB program makes it ideal for graphing mathematical functions.
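
The batch-style graphing mentioned earlier can be sketched with a short Python script that writes one Gnuplot command file per dataset. This is only a minimal sketch: the data file names (site_a.dat and so on), the two-column data layout, the axis labels and the build_gnuplot_script helper are all assumptions made for illustration.

```python
from pathlib import Path


def build_gnuplot_script(data_file: str, title: str) -> str:
    """Return Gnuplot commands that plot columns 1 and 2 of data_file to EPS."""
    output = Path(data_file).with_suffix(".eps").name
    return "\n".join([
        "set terminal postscript eps enhanced color",
        f'set output "{output}"',
        'set xlabel "x"',
        'set ylabel "y"',
        f'plot "{data_file}" using 1:2 with linespoints title "{title}"',
        "",
    ])


# Hypothetical data files; substitute your own.
datasets = ["site_a.dat", "site_b.dat", "site_c.dat"]
scripts = {name: build_gnuplot_script(name, Path(name).stem) for name in datasets}

for name, script in scripts.items():
    print(f"# --- {Path(name).with_suffix('.gp').name} ---")
    print(script)
```

Each printed script could be saved to a file and passed to gnuplot on the command line, which is what makes this approach practical when there are dozens or hundreds of datasets to inspect.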

Diagramming is another critical aspect of research graphics. Conventional graphing programs tend to be less useful in this area because of their complexity for most diagrams. Two excellent offerings for Linux users are xFig and Dia. xFig has a long history of use in science and engineering and is easily one of the most mature and capable Unix applications I've had the pleasure of using in a very long time, as I recount below. Dia is a more recent but similarly well-developed application, similar to the commercial Visio. Alternatively, there is also a vector graphic format used by the LaTeX typesetting system, and some LaTeX packages are very useful when trying to create certain types of diagrams (chemical or electrical structures, musical notation, etc.) that might otherwise be difficult to create using a typical diagramming program.

Perhaps one of the most important considerations when presenting research graphics is the format in which that information will be presented, taking into account the inherent differences between screen and paper output. I've often been misled by the appearance of a graph on the screen, thinking that since the screen is graphical, the resulting image should be as well-defined on paper or on a web page. This is often not the case, however, as the resolution of the screen is far poorer than that of most paper documents.
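
A rough calculation shows the size of the gap. The values below (a 5 inch wide graph, a 96 dpi screen and a 600 dpi laser printer) are assumed, typical figures rather than measurements, and dots_across is just a helper name chosen for this example.

```python
def dots_across(width_inches: float, dpi: int) -> int:
    """Number of device dots (pixels) available across a figure of given width."""
    return round(width_inches * dpi)


FIGURE_WIDTH_IN = 5.0   # assumed figure width in inches
SCREEN_DPI = 96         # assumed screen resolution
PRINTER_DPI = 600       # assumed laser printer resolution

screen = dots_across(FIGURE_WIDTH_IN, SCREEN_DPI)
paper = dots_across(FIGURE_WIDTH_IN, PRINTER_DPI)
print(f"screen: {screen} dots, paper: {paper} dots, ratio: {paper / screen:.2f}")
```

With these assumed values the printer has roughly six times as many dots across the same figure, so a bitmap captured at screen resolution has far too few pixels to print cleanly at the same physical size.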

There is a plethora of graphic formats for print (EPS, BMP, JPG, WMF, WPG, etc.), the bulk of which are commercial in nature. Generally, these files are either line-based (vector graphics) or bitmaps (raster graphics). Raster-based formats (e.g. JPEG or PNG) are best suited for digital representations of images and photographs. Graphs and diagrams are almost always best stored in vector formats for printing. These formats tend to scale better on paper without losing fine detail from graphical elements or textual annotations. Perhaps the most popular format for these graphics is Encapsulated PostScript (EPS). Most journals are amenable to this format, as EPS is a well-known, standard format that is easily imported by desktop publishing applications. The LaTeX vector graphic format described above is also widely used.
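
To illustrate why vector formats scale well, the sketch below builds a tiny EPS file by hand. The graph is stored as drawing instructions (move to, line to, stroke) rather than as pixels, so a renderer can redraw it at any size. The polyline_to_eps helper and the sample points are hypothetical, chosen only for this example.

```python
def polyline_to_eps(points, width, height):
    """Return a minimal EPS document drawing a polyline through points.

    Coordinates and the bounding box are in PostScript points (1/72 inch).
    """
    (x0, y0), rest = points[0], points[1:]
    lines = [
        "%!PS-Adobe-3.0 EPSF-3.0",
        f"%%BoundingBox: 0 0 {width} {height}",
        "1 setlinewidth",
        "newpath",
        f"{x0} {y0} moveto",
    ]
    lines += [f"{x} {y} lineto" for x, y in rest]
    lines += ["stroke", "showpage", "%%EOF", ""]
    return "\n".join(lines)


# Hypothetical data: five (x, y) positions on a 200 x 100 point canvas.
sample = [(10, 10), (50, 60), (100, 40), (150, 90), (190, 20)]
eps_text = polyline_to_eps(sample, 200, 100)
print(eps_text)
```

The whole graph is a handful of text lines, and the file size depends on the number of data points rather than on the size at which the graph is eventually printed.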

For web graphics, choices are limited to three formats: GIF, JPEG (often notated "jpg" because of the filename limits on some obsolete operating systems), and PNG. GIF, however, is considered to be a deprecated format and is often completely unusable in open-source work because of patent encumbrance. JPEG is a "lossy" format, meaning that detail is sacrificed in favour of compression efficiency. This makes it ideal for photographic data, where some colour-blending will not be noticed. PNG, on the other hand, is a "lossless" format. File size will depend on the complexity of the image; line-art and simple diagrams compress to extremely small sizes in PNG, whereas detailed photographs can end up quite large, and would often be better served by JPEG compression. If you're not sure which format to use, the best idea is to just try both, and compare the resulting quality and file size.
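
The effect of image complexity on PNG file size can be sketched with nothing more than the Python standard library. The minimal encoder below is an illustration only: it writes 8-bit greyscale PNG data without the per-row filtering that real encoders apply, and it uses random noise as a crude stand-in for a detailed photograph. The function names and the 256 x 256 test images are my own assumptions. It says nothing about JPEG, which needs an imaging library to test.

```python
import random
import struct
import zlib


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, then a CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray_png(rows):
    """Encode equal-length rows of 8-bit grey values as PNG file bytes."""
    height = len(rows)
    width = len(rows[0])
    # IHDR: width, height, bit depth 8, colour type 0 (greyscale),
    # compression method 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row for row in rows)
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw, 9))
        + png_chunk(b"IEND", b"")
    )


SIZE = 256

# Line art: white background with a single black diagonal line.
line_art = []
for y in range(SIZE):
    row = bytearray([255] * SIZE)
    row[y] = 0
    line_art.append(bytes(row))

# Stand-in for a detailed photograph: random grey levels (fixed seed).
rng = random.Random(0)
noise = [bytes(rng.randrange(256) for _ in range(SIZE)) for _ in range(SIZE)]

line_png = encode_gray_png(line_art)
noise_png = encode_gray_png(noise)
print(f"line art: {len(line_png)} bytes, noise: {len(noise_png)} bytes")
```

Random noise is close to a worst case for lossless compression, so real photographs will usually fare somewhat better than the second image, but the gap between line art and detailed imagery remains large.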

Although these three formats are typically accepted for print publications, unless the graphic is an image (photograph, gradient or similar) it is almost always best to use PostScript (PS or EPS), LaTeX (TEX) or a similar line-type graphic format. Raster formats will often render unsatisfactorily on paper, giving blocky and poorly defined images. Line-type formats will scale better, take less space and generally give far clearer output. As mentioned in the last Linux in Science Report, most publishers will gladly accept these formats for graphics, especially since they are the most versatile to work with.

One advantage of using standard, open formats is the flexibility that is offered. One situation I often see is the properly cited use and adaptation of graphics from other articles (sometimes called 'meta-analysis'). A case in point was the study area map used in my own M.Sc. thesis. Having found a suitable study area map in a published article, I noted that not only were there a number of errors (albeit slight ones) in the original map, but the symbology used to indicate certain features was completely inconsistent with that used in the thesis. Once I had converted the PostScript file to fig format using pstoedit, I was easily able to modify the graphic in xfig without losing the original information contained in the map, while correcting the errors and altering the colour and appearance of the relevant symbols. I was also able to add an inset map I had previously made and correct some aesthetic issues I didn't like in the original map.

Another useful aspect of pstoedit lies in its ability to "sanitize" PostScript files of questionable quality. I had previously made a number of graphs in a Win32 graphing application and printed them to PostScript files. A couple of years later, when I went to use them in a LaTeX document, the images didn't come out properly. Sent individually to a PostScript printer (or even the PostScript filter used by my own Linux box), the graphs came out fine. What I hadn't realized was that these graphic files were malformed (missing some rendering information required to output properly) when included as part of a larger document. Converting to fig format and back to Encapsulated PostScript (EPS) resolved all of the problems.
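
For those who want to script this round trip, here is a minimal sketch in Python. It assumes that pstoedit is called with its -f option to select the fig output format, and that the return trip to EPS is done with fig2dev from the transfig package; the latter is my assumption for illustration, since other routes (such as exporting from within xfig) would work too. The file names are hypothetical, and you should check the documentation of your installed versions for the formats they support.

```python
import shutil
import subprocess


def ps_to_fig_command(ps_file, fig_file):
    """Command line for converting PostScript to fig format with pstoedit."""
    return ["pstoedit", "-f", "fig", ps_file, fig_file]


def fig_to_eps_command(fig_file, eps_file):
    """Command line for exporting a fig file to EPS with fig2dev (assumed tool)."""
    return ["fig2dev", "-L", "eps", fig_file, eps_file]


def sanitize(ps_file, fig_file, eps_file):
    """Run the round trip if both tools are installed; return True if it ran."""
    commands = [
        ps_to_fig_command(ps_file, fig_file),
        fig_to_eps_command(fig_file, eps_file),
    ]
    if any(shutil.which(cmd[0]) is None for cmd in commands):
        return False  # a tool is missing, so nothing is run
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises if a conversion fails
    return True


# Hypothetical file names; this only prints the commands, it runs nothing.
for cmd in (ps_to_fig_command("graph.ps", "graph.fig"),
            fig_to_eps_command("graph.fig", "graph.eps")):
    print(" ".join(cmd))
```

Keeping the intermediate fig file is worthwhile, since it can be opened in xfig later if the graph needs further changes.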

In summary, although no single tool will satisfy all graphing and diagramming tasks, I have found that learning a couple of the many available programs will cover most research needs.

Here are some more links and updates for the software mentioned in this report ...

Dia http://www.lysator.liu.se/~alla/dia/
Grace http://plasma-gate.weizmann.ac.il/Grace/
Gnumeric http://www.gnome.org/projects/gnumeric/
Gnuplot http://www.gnuplot.org/
Guppi http://www.gnome.org/projects/guppi/
LaTeX http://www.latex-project.org/
Octave http://www.octave.org/
pstoedit http://www.geocities.com/SiliconValley/Network/1958/pstoedit/
R http://www.r-project.org/
Scigraphica http://scigraphica.sourceforge.net/
xFig http://www.xfig.org/

As always, I look forward to receiving your comments and suggestions for links or future feature articles.

-- Pete St. Onge (pete@seul.org)
