gEDA-dev: SPICE raw waveform file

Anthony J Bybell netracurse at nc.rr.com
Tue Dec 4 19:48:48 EST 2007


On Tue, 4 Dec 2007, John Griessen wrote:

> I imagine a combination of both arrangements of data would be good for retrieving from disk drives.
> First, record waveforms so they have values in a string for each wave, up to some time interval that is manageable, with time
> segment ID at top of string, then switch to the next waveform until all done, then start the first wave for the next time
> interval, repeat.  That would let you find the chunks of data for wave 989 out of 3400 waves much quicker than
> data arranged as all values per tiniest time step, repeat.

LXT2 (in "partial mode") and VZT file formats in gtkwave more-or-less do
that: they break the file internally into blocks that each span a
user-specified number of timesteps.
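
To make the blocking idea concrete, here is a minimal in-memory sketch. It is
not the LXT2 or VZT on-disk layout; the function names, the dict-of-dicts
structure, and the use of a fixed time span per block (rather than a count of
timesteps) are all assumptions made for illustration. Times are assumed to be
non-negative integers.

```python
def build_blocks(samples, block_span):
    """Group (time, wave, value) records into blocks of block_span time units.

    Hypothetical layout: blocks[block_index][wave] -> list of (time, value).
    """
    blocks = {}
    for t, wave, v in samples:
        per_wave = blocks.setdefault(t // block_span, {})
        per_wave.setdefault(wave, []).append((t, v))
    return blocks


def read_wave(blocks, wave, t_start, t_end, block_span):
    """Fetch one wave over [t_start, t_end], touching only the blocks it needs."""
    out = []
    for idx in range(t_start // block_span, t_end // block_span + 1):
        for t, v in blocks.get(idx, {}).get(wave, []):
            if t_start <= t <= t_end:
                out.append((t, v))
    return out
```

Reading one wave over a short window only visits the blocks overlapping that
window, which is the property John's proposal is after.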

I agree that arranging the data the way VCD does is a mistake w.r.t. speed, as
you have to parse the whole file to extract all the value changes for a
given waveform.  There's also the fact that tagging the data with an
identifier (think of the VCD ID codes) is highly inefficient, since which
nodes switch in a circuit is unpredictable from the point of view of a
data compressor (i.e., more information is contained in the identifiers
than in the timestep data).  You're better off grouping the data for a
given wave over a time range and emitting the time deltas (into a master
index) for it.  You can do all kinds of weird transforms like linear
prediction, etc. at that point, as your data for a given wave is all in one
place.  Also, if the data is continuous, you really have no timesteps
to store beyond the start time and the step width from sample to sample.
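
As a rough illustration of storing time deltas per wave, the sketch below
encodes one wave's timestamps as a first time plus deltas, and rebuilds
uniformly sampled time from just a start and a step. The function names and
the tuple layout are hypothetical, and the "master index" mentioned above is
not modelled here.

```python
from itertools import accumulate


def encode_wave(times, values):
    """Pack one wave's block as (first_time, time_deltas, values)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if not times:
        return (0, [], [])
    deltas = [b - a for a, b in zip(times, times[1:])]
    return (times[0], deltas, list(values))


def decode_wave(block):
    """Inverse of encode_wave: rebuild absolute times from the deltas."""
    first, deltas, values = block
    if not values:
        return [], []
    return list(accumulate(deltas, initial=first)), values


def uniform_times(start, step, count):
    """Continuous (uniformly sampled) data needs only start and step."""
    return [start + i * step for i in range(count)]
```

Small deltas tend to be friendlier to a general-purpose compressor than
absolute timestamps, though how much that helps depends on the data.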

Note that I have actually seen one efficient dumpfile implementation that
uses sorted identifiers and emits the skip value between IDs, but that
method is patented by Novas.


> Perhaps the only non-database method to have speed for both kinds of searches is to record it both ways in
> the same data file?

It depends on what you're trying to do but that would lead to some huge(r)
files as you're saving the data twice.


> There is an assumption here that you can't have access to disk drive details like sector size, but maybe disk tools like LVM could
> help.  When you create a merge of two disks, can a tool like LVM create a new "virtual block size" that is smaller than the
> physical ones?  If so, you could create a storage method that uses the small block size as in "start all data chunks at a new
> block, with the first line being a time segment and wave number tag."  Then some of the work done by a database would be done in
> mapping actual disk block zones of data to the small virtual blocks that let you _randomly_ access data better.

You probably wouldn't care about disk block size as you'd have no idea of
what the underlying filesystem or OS buffer cache is doing (and if you
filter the data through a compressor such as zlib it's irrelevant).  Note
that randomly accessing the data is beyond the realm of what visualization
tools typically would do as they're going to cache viewable data off on
the side for easy access.  Otherwise, you'd have the viewer chugging
whenever you move the scrollbars.

Keep in mind that if a user needs to extract/mine the data so it
looks like it came from a regular SPICE waveform file, it's still possible
as all you have to do is throw the individual waves' time indices on a
heap and heapify through it (or simply use a radix sort).  Of course, for
continuous-time data, it's a no-brainer.
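
One way to sketch the heap-based merge is with Python's standard
`heapq.merge`, which performs a k-way merge of already-sorted streams using a
heap. The `merge_waves` name and the dict-of-lists input are assumptions for
illustration; each wave's samples are assumed to be sorted by time already.

```python
import heapq


def merge_waves(waves):
    """Merge per-wave sample lists back into one time-ordered stream.

    waves: dict mapping wave name -> list of (time, value), sorted by time.
    Yields (time, name, value); ties in time are ordered by wave name.
    """
    def tagged(name, samples):
        for t, v in samples:
            yield (t, name, v)

    streams = [tagged(name, samples) for name, samples in waves.items()]
    # Compare on (time, name) only, so values never need to be comparable.
    return heapq.merge(*streams, key=lambda rec: rec[:2])
```

For example, `list(merge_waves({"a": [(0, 1), (5, 0)], "b": [(2, 1)]}))`
gives the records for times 0, 2 and 5 in that order.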


> And the data could be just easy access text and still get the benefit I am thinking of above.

Text is good for ease but it still has to be parsed and converted to a
floating point representation (or vice-versa during writing).  For
enormous traces, the amount of time doing that is probably not
insignificant.  *shrugs*
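
To show what the conversion step involves, here is a small sketch contrasting
a text encoding with a packed binary one, using only Python's standard
`struct` module. The helper names and the choice of little-endian 64-bit
doubles are assumptions; no timing claims are made here, and the relative cost
would need to be measured on real traces.

```python
import struct


def to_text(values):
    """One value per line; repr() keeps enough digits to round-trip a float."""
    return "\n".join(repr(v) for v in values)


def from_text(text):
    """Every value has to be parsed back from its decimal form."""
    return [float(line) for line in text.splitlines()]


def to_binary(values):
    """Pack as little-endian IEEE 754 doubles, 8 bytes per value."""
    return struct.pack(f"<{len(values)}d", *values)


def from_binary(blob):
    """Unpack without any decimal parsing."""
    count = len(blob) // 8
    return list(struct.unpack(f"<{count}d", blob))
```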

BTW, one interesting aspect of block-based approaches (or any
non-sequential ones) is that they can be processed by multiple pthreads.
In gtkwave, the VZT loader can prefetch and reformat future blocks while
the current one is being processed.  Given that processors are multi-core
these days, having an SMP-capable algorithm/format is something to think
about.
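
The prefetching idea can be sketched as follows. This is not the gtkwave VZT
loader; it is a hypothetical generator that decompresses independently
zlib-compressed blocks in worker threads while the caller processes earlier
ones. The names `load_block` and `iter_blocks` and the `lookahead` parameter
are assumptions, and `lookahead` must be at least 1.

```python
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def load_block(blob):
    """Stand-in for 'fetch and reformat one block'; here it only inflates it."""
    return zlib.decompress(blob)


def iter_blocks(blobs, lookahead=2):
    """Yield decoded blocks in order while future blocks load in the background."""
    with ThreadPoolExecutor(max_workers=lookahead) as pool:
        pending = deque()
        for blob in blobs:
            pending.append(pool.submit(load_block, blob))
            if len(pending) > lookahead:
                # Oldest block first, so output order matches input order.
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()
```

This only works because each block can be decoded without the ones before it,
which is the property a block-based format gives you.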

A lot of this is a black-art thing and requires experimentation.  The most
important thing is developing a writer/reader API first so you can plug
various formats into test code and not be tightly coupled to your tools.
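
A minimal sketch of what such a writer/reader API might look like is below.
Every class and method name here is hypothetical and does not correspond to
any existing gtkwave or gEDA interface; the in-memory backend exists only so
test code has something to plug in before a real format is chosen.

```python
from abc import ABC, abstractmethod


class WaveWriter(ABC):
    @abstractmethod
    def add_sample(self, wave, time, value):
        """Record one value change or sample for the named wave."""

    @abstractmethod
    def close(self):
        """Finish writing; a real backend would flush blocks to disk here."""


class WaveReader(ABC):
    @abstractmethod
    def wave_names(self):
        """Return the names of all stored waves."""

    @abstractmethod
    def samples(self, wave, t_start, t_end):
        """Return (time, value) pairs for one wave within [t_start, t_end]."""


class MemoryStore(WaveWriter, WaveReader):
    """Trivial backend for experiments: keeps everything in a dict."""

    def __init__(self):
        self._data = {}

    def add_sample(self, wave, time, value):
        self._data.setdefault(wave, []).append((time, value))

    def close(self):
        for samples in self._data.values():
            samples.sort(key=lambda s: s[0])

    def wave_names(self):
        return sorted(self._data)

    def samples(self, wave, t_start, t_end):
        return [(t, v) for t, v in self._data.get(wave, [])
                if t_start <= t <= t_end]
```

Tools written against `WaveWriter` and `WaveReader` would not need to change
when a different backend is swapped in.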

-Tony


