Integers are 16 bit two's complement and stored in big-endian format as on Sun Sparc and opposite to the Dec VAX.
Single precision real values are 32 bits long, in big-endian format. The high bit is the sign bit, followed by a 7 bit excess 64 exponent (power to which 16 must be raised) then a 24 bit hexadecimally normalized mantissa with the decimal point to the left of the most significant bit. Double precision values just have another 32 bits tacked on the mantissa and the same exponent format.
    Sign
    |<-->|<------ Exponent ------>|<--------- Mantissa -------->|
     ______________ ______________ ______________ ______________
    |              |              |              |              |
    |______________|______________|______________|______________|
     31          28 27          24 23          20 19          16

    |<----------------------- Mantissa ------------------------>|
     ______________ ______________ ______________ ______________
    |              |              |              |              |
    |______________|______________|______________|______________|
     15          12 11           8 7            4 3            0
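Here is a little piece of C++ code that converts Data General floats to whatever the host's floating point format is. Note that it copies the bytes straight into a bit-field structure, so it relies on the compiler allocating bit-fields from the most significant bit down, as is usual on big-endian hosts such as the Sparc; on other hosts the fields will need to be extracted by shifting and masking instead.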
    double value;
    unsigned char sign;
    Uint16 exponent;
    Uint32 mantissa;

    typedef struct {
        unsigned sign     : 1;
        unsigned exponent : 7;
        unsigned mantissa : 24;
    } DG_FLOAT;

    DG_FLOAT number;
    unsigned char buffer[4];

    instream.read(buffer,4);
    if (instream) {
        // DataGeneral is a Big Endian machine
        memcpy ((char *)(&number),buffer,4);
        sign = number.sign;
        exponent = number.exponent;
        mantissa = number.mantissa;
        value = (double) mantissa / (1 << 24) *
                pow (16.0, (long)(exponent) - 64);
        value = (sign == 0) ? value : -value;
    }
    else {
        cerr << "read failed\n" << flush;
        value=0;
    }
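The bit layout described above can also be decoded without a bit-field structure, by assembling the four bytes into a 32 bit word and masking out the fields. What follows is only a minimal sketch under that assumption; the function name dgFloatToDouble is made up for this example and is not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a 4 byte big-endian Data General single precision value.
// dgFloatToDouble is a hypothetical name chosen for this sketch.
double dgFloatToDouble(const unsigned char *b)
{
    assert(b != nullptr);
    // Assemble the word most significant byte first (big-endian)
    std::uint32_t word = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
                       | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
    unsigned      sign     = (word >> 31) & 0x1;        // bit 31
    int           exponent = int((word >> 24) & 0x7f);  // bits 30..24, excess 64
    std::uint32_t mantissa = word & 0x00ffffff;         // bits 23..0

    // The mantissa is a 24 bit fraction, scaled by 16^(exponent-64)
    double value = double(mantissa) / double(1UL << 24)
                 * std::pow(16.0, double(exponent - 64));
    return sign ? -value : value;
}
```

Because the fields are extracted with shifts and masks, the result does not depend on the byte order or bit-field layout of the host.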
Used on the GE CT 9800 family. Severely primitive, but then it is running on an old machine that can only map 64 KB of memory at a time, after all. It is apparently multitasking. Documentation may still be available from Data General (try DG Direct) but is not supplied with the scanner by GE. If anyone knows where I can find it at a reasonable price let me know. Here is a brief command summary culled from a nifty pocket book from GE for SunOS/Genesis users that compares commands:
    CHATR  - file attributes
    CRAND  - create randomly organized file
    CDIR   - create directory
    DELETE - files or directories
    DIR    - change directory
    DISK   - free space
    FILCOM - compare files
    GDIR   - show working directory name
    GTOD   - show date and time
    LINK   - files (symbolic)
    LIST   - directory contents
    MOVE   - a file
    RENAME - a file
    SDAT   - set date
    STOD   - set time
    SDUMP  - write files to a device
    SLOAD  - read dumped files
    SPEED  - text editor
    TYPE   - contents of file
    XFER   - copy a file

    wildcards: '-' is series, '*' is single character
Used on the GE Signa 3X and 4X family. Quite a nice operating system with multi-tasking and hierarchical directories. Here is a brief command summary again culled from a nifty pocket book from GE for SunOS/Genesis users that compares commands:
    ACL         - access control list (ownership)
    BYE         - exit command process
    COPY        - a file
    CREATE      - a text file
    CREATE/DIR  - a directory
    CREATE/LINK - link files
    DELETE      - files & directories
    DIR         - display or change working directory
    DUMP        - to peripheral
    F/AS/S      - directory listing with file status
    DATE        - show or set
    HELP
    LOAD        - DUMPed files
    MOVE        - a file
    RENAME      - a file
    PATH        - show pathname of a file
    PAUSE       - the command line interpreter
    SUPERU ON   - enable superuser
    SED         - text editor
    TIME        - show or set
    TYPE        - contents of text file
    ?           - list processes running

    wildcards: '+' is series, '*' is single character
Other useful hints include the use of "^" to refer to the next directory up (like ".." in Unix) in DIR commands. Command options follow the command name without any spaces and are indicated by a slash. COPY operations specify the destination name first and then the source name. Devices like the mag tape are indicated by "@", for example "@MTB0" is tape drive zero. Files on the tape can be referred to as "@MTB0:nn" which is very handy. For example to read a file off a CT 9800 tape under AOS/VS:
COPY/V/IMTRSIZE=8192 B038040101.YP @MTB0:18
Perhaps most importantly, there is an extensive online help system ... use the HELP command.
If you have a GE Signa based on a DG then you can get the so-called "High Speed Network" card and software from GE. From memory it is pretty pricey, and there used to be a "slower" network interface that was cheaper, but I don't think this is available anymore.
If you have a CT 9800 based on the DG S/140 and you need to get it connected there are a number of solutions:
    $2,850 - EC-10 ethernet controller
    $1,645 - RDOS TCP/IP software (telnet client, ftp client/server)
I have not personally tried either of these approaches, and I am sure there are others (talk to Merge or DeJarnette), but I am getting really tired of carrying 9-track tapes around so perhaps I will bite the bullet soon (and upgrade to a HighSpeed Advantage !).
Truly one of the world's most irritating operating systems to use, especially if you are a unix fan. Still, it works, has a great online help system that saves one's butt almost often enough to be useful, and if you can remember the directory where kermit is stored and the weird command to invoke it, one can get by (barely).
If you don't know VMS and the vendor doesn't supply the manuals, get them from DEC ... you need them bad ... real bad. If (like me) you throw them out every time you move and then encounter another piece of archaic equipment, you need the "vaxbook" which is available via ftp from decoy.uoregon.edu, written by Joseph E St Sauver, which summarizes commands, files and all sorts of application specific stuff, though it is no substitute for the real thing.
Recent VMS update: goddamn file formats ! Why can't VMS behave like a real operating system and forget this file format crap ! I have some Philips S5 MR images exported in ACR/NEMA format and I can't get the things off the host's Vax using Kermit, because though they have fixed length 512 byte records, some cretinous program sets the "carriage return carriage control" record attributes, which causes kermit to send with all the '0A' characters scrubbed out amongst other atrocities.
I am getting desperate and about to try using the Hex/Dehex utility that came with Kermit to get the stuff off and then decode the hex format ! Or perhaps even use "dump" to make a textfile, transfer, and decipher that. (No I don't have a C compiler for the Vax so I guess I can't use uuencode unless someone wants to mail me a hex'ed executable). Any hints, or instructions as to how to use FDL and Convert, to change it to a normal format would be appreciated. (Why can't they just have a "set file record attribute xxx" command like all the other millions of set commands ? Grrrr.).
More recent VMS update: finally had an inspiration while staring at hex dumps of these files - why not use the VMS "DUMP" utility which produces hex dumps as a "poor man's uuencode" by saving the dump to a file, transferring it as an ascii file, and then decoding it at the destination ? Of course there are no nifty line checksums or anything, but a transfer protocol such as kermit takes care of this.
The DUMP output defaults to eight 32 bit long words per line, displayed as hex and separated by spaces, then an ascii string (32 bytes) and then a 24 bit hex address offset from the start of the fixed length record. All the lines containing data start with a single space, whereas the descriptions at the start of each record begin in the first column, hence the data lines can be easily selected out. By the way, the hex version of the data is listed in reverse order (the first byte is at the right hand end of the hex portion of the line) ! VMS is so bizarre ! For example, here is a fixed length 512 byte record file from a Philips S5 MRI (some of the hex words elided to make the line fit on the page):
Dump of file SYS$SYSROOT:[GYROSCAN]ABAALKHAIL02010201010001.ANI;1 ...
File ID (2419,301,0)   End of file block 198 / Allocated 200

Virtual block number 1 (00000001), 512 (0200) bytes

 0000000C 00100008 ... 00000008 ........¶...........ð........... 000000
 00083932 2E36302E ... 2D524341 ACR-NEMA 1.0.. .....1994.06.29.. 000020
 00600008 4D5F4553 ... 00000030 0.......@.........A.....SE_M..`. 000040
 494B0000 00100080 ... 00000002 ....MR..p.....Philips ........KI 000060
 00183148 00000002 ... 32200000 .. 2........63865375........H1.. 0001E0
^L
Dump of file SYS$SYSROOT:[GYROSCAN]ABAALKHAIL02010201010001.ANI;1 ...
File ID (2419,301,0)   End of file block 198 / Allocated 200

Virtual block number 2 (00000002), 512 (0200) bytes

 40000018 45424F52 ... 00161250 P.....AGACQ_PT_SURFACE_PROBE...@ 000000
And so on ... you get the idea. This ugly little C++ utility written quickly during this moment of inspiration will take saved DUMP output and make it binary again:
    #include <iostream>
    #include "MainCmd.h"    // the author's own command line handling (supplies CCOMMAND)

    using namespace std;

    // Convert one hex digit to its value, or -1 if it is not a hex digit
    signed char
    hextobin(char c)
    {
        signed char r;
        switch (c) {
            case '0':           r=0;   break;
            case '1':           r=1;   break;
            case '2':           r=2;   break;
            case '3':           r=3;   break;
            case '4':           r=4;   break;
            case '5':           r=5;   break;
            case '6':           r=6;   break;
            case '7':           r=7;   break;
            case '8':           r=8;   break;
            case '9':           r=9;   break;
            case 'A': case 'a': r=0xa; break;
            case 'B': case 'b': r=0xb; break;
            case 'C': case 'c': r=0xc; break;
            case 'D': case 'd': r=0xd; break;
            case 'E': case 'e': r=0xe; break;
            case 'F': case 'f': r=0xf; break;
            default:            r=-1;  break;
        }
        return r;
    }

    int
    main(int argc,char **argv)
    {
        CCOMMAND(argc,argv);

        while (1) {
            const int linemax=132;  // only needs 113
            char line[linemax];
            cin.getline(line,linemax);
            if (!cin || cin.eof()) {
                // cerr << "Bad or eof\n" << flush;
                break;
            }
            // gcount() includes the line terminator that getline() consumed
            unsigned count=cin.gcount();
            // skip everything except data lines, which start with a space
            if (count == 0 || line[0] != ' ') continue;
            if (count != 113) {
                cerr << "Line length " << count << "\n" << flush;
                break;
            }
            unsigned i;
            // start just past the last of the 8 long words ...
            char *ptr = line + 8*(1+8);
            // line is in reverse order ...
            for (i=0; i<8; ++i) {
                unsigned j;
                for (j=0; j<4; ++j) {
                    // 2 hex digits -> 1 byte
                    char bytelo = *--ptr;
                    char bytehi = *--ptr;
                    unsigned char byte =
                        (hextobin(bytehi)<<4) + hextobin(bytelo);
                    cout.put(byte);
                }
                --ptr;  // space between long words
            }
        }
        return 0;
    }
Note that the nature of fixed length records under VMS means that the last record will be padded out to 512 bytes without any indication of the "real" end-of-file. This means you have to cope with trailing garbage gracefully.
Hot VMS/Philips news: neelin@pet.mni.mcgill.ca (Peter Neelin) tells me there is an extremely useful tool for fiddling binary files called FILE from DECUS. It allows you to change a file's header information without modifying the content of the file. This then permits ftp, kermit, etc. to do the right thing with Philips .ANI files. It also permits wildcards and does not make a copy of the file (so it is fast). He says also that someone has told him that they succeeded in using convert to fix these files, but his general experience with it is not positive (it will often change the content of the file and it doesn't allow wildcards, in addition to promoting the use of the horrible fdl editor!). If you are interested, you can get FILE through gopher from decus.org (look for the DECUS software library archives, under essential tools). The binary is provided in case you don't have a compiler. FILE, and many other useful things are also available from the sites listed in Vax VMS Tools.
Some other useful hints:
    UNIX FTP server     Vax/VMS FTP server

    cd dir              cd [.dir]
    cd dir/subdir       cd [.dir.subdir]
    cd ..               cd [-]
The sun3 and sun4 architectures use much the same formats. Even though the processors are different both are big-endian and the float formats are IEEE. See the Sparc Architecture Manual - Chapter 3 - Data Formats for more details.
One very important difference, though, is that the sun3 convention is not to align 32 bit and 64 bit data types on 4 and 8 byte boundaries respectively, whereas the sparc (sun4) architecture usually does, as dictated by a compile time option. Be very careful when using the same header files on one architecture or the other. This drove me nuts when trying to figure out why the well described Genesis (sun3) layout did not match the unknown Advantage Windows (sun4) data. It was pretty obvious when it was pointed out though :).
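One way to sidestep the alignment difference is to decode each field from the raw bytes at its documented offset rather than overlaying a struct on the data. The following is only a sketch of that idea; ExampleHeader and its two fields are invented for illustration and do not correspond to any real Genesis or Advantage Windows header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical two field header, invented for this sketch.
struct ExampleHeader {
    std::uint16_t count;
    std::uint32_t offset;
};

// Big-endian 16 bit value from two bytes
inline std::uint16_t readBE16(const unsigned char *p)
{
    return std::uint16_t((p[0] << 8) | p[1]);
}

// Big-endian 32 bit value from four bytes
inline std::uint32_t readBE32(const unsigned char *p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
         | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Decode the header as stored WITHOUT alignment padding (the sun3
// convention): count at byte 0, offset immediately after it at byte 2.
ExampleHeader decodeUnalignedHeader(const unsigned char *raw)
{
    assert(raw != nullptr);
    ExampleHeader h;
    h.count  = readBE16(raw);
    h.offset = readBE32(raw + 2);
    return h;
}
```

On a compiler that aligns 32 bit fields, offsetof(ExampleHeader, offset) is typically 4 rather than 2, which is exactly why copying the six stored bytes straight into the struct would give the wrong answer.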
Integers are 8, 16, 32, or 64 bit unsigned or signed two's complement and stored in big-endian format as on Data General and opposite to the Dec VAX. Most C compilers treat short as 16 bits, and int and long as 32 bits.
Formats conform to the IEEE 754-1985 Standard for Binary Floating-Point Arithmetic. Single precision real values are 32 bits long, in big-endian format. The high bit is the sign bit, followed by an 8 bit excess 127 exponent (power to which 2 must be raised), then a 23 bit normalized mantissa with the binary point to the left of the most significant bit and an implied leading 1 (that is, 1.0 has been subtracted from the stored value). Double precision values have an 11 bit excess 1023 exponent and a 52 bit mantissa. Quad precision values have a 15 bit excess 16383 exponent and a 112 bit mantissa.
    Sign
    |<-->|<-------- Exponent -------->|<------- Mantissa ------>|
     ______________ ______________ ______________ ______________
    |              |              |              |              |
    |______________|______________|______________|______________|
     31          28 27          24 23          20 19          16

    |<----------------------- Mantissa ------------------------>|
     ______________ ______________ ______________ ______________
    |              |              |              |              |
    |______________|______________|______________|______________|
     15          12 11           8 7            4 3            0
Here is a little piece of C++ code that should run on anything and convert Sun IEEE floats to whatever the host's floating point format is. It probably should take into account a few special cases to be strictly correct:
    unsigned char buffer[4];

    instream.read(buffer,4);
    if (instream) {
    #ifdef USESUN4NATIVEFLOAT
        float fvalue;
        memcpy ((char *)(&fvalue),buffer,4);
        value=fvalue;
    #else /* USESUN4NATIVEFLOAT */
        unsigned char sign;
        Uint16 exponent;
        Uint32 mantissa;

        typedef struct {
            unsigned sign     : 1;
            unsigned exponent : 8;
            unsigned mantissa : 23;
        } IEEE_FLOAT_SINGLE;

        IEEE_FLOAT_SINGLE number;

        // Sparc is a Big Endian machine
        memcpy ((char *)(&number),buffer,4);
        sign = number.sign;
        exponent = number.exponent;
        mantissa = number.mantissa;

        if (exponent) {
            // normalized: implied leading 1
            value = (1.0 + (double)mantissa / (1 << 23)) *
                    pow (2.0, (long)(exponent) - 127);
        }
        else {
            if (mantissa) {
                // denormalized: no implied leading 1
                value = (double)mantissa / (1 << 23) *
                        pow (2.0, (long)(-126));
            }
            else {
                value=0;
            }
        }
        value = (sign == 0) ? value : -value;
    #endif /* USESUN4NATIVEFLOAT */
    }
    else {
        cerr << "read failed\n" << flush;
        value=0;
    }
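The special cases mentioned above are an all-ones exponent, which encodes infinity (zero mantissa) or NaN (non-zero mantissa). A sketch of a decoder that handles them, and that extracts the fields by shifting and masking so that it does not depend on the host's bit-field layout, might look like the following; the name ieeeSingleToDouble is made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a 4 byte big-endian IEEE 754 single precision value.
// ieeeSingleToDouble is a hypothetical name chosen for this sketch.
double ieeeSingleToDouble(const unsigned char *b)
{
    assert(b != nullptr);
    std::uint32_t word = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
                       | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
    unsigned      sign     = (word >> 31) & 0x1;        // bit 31
    int           exponent = int((word >> 23) & 0xff);  // bits 30..23, excess 127
    std::uint32_t mantissa = word & 0x007fffff;         // bits 22..0

    double fraction = double(mantissa) / double(1UL << 23);
    double value;
    if (exponent == 0xff) {
        // all-ones exponent: infinity if the mantissa is zero, else NaN
        value = mantissa ? std::numeric_limits<double>::quiet_NaN()
                         : std::numeric_limits<double>::infinity();
    }
    else if (exponent != 0) {
        // normalized: implied leading 1
        value = std::ldexp(1.0 + fraction, exponent - 127);
    }
    else {
        // denormalized (or zero): no implied leading 1
        value = std::ldexp(fraction, -126);
    }
    return sign ? -value : value;
}
```

Every single precision value is exactly representable as a double, so no rounding takes place in the conversion.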
Strings obey the usual C convention of null terminated strings without a length preamble.
In DICOM, compression (both reversible and irreversible) is achieved by specifying a particular "transfer syntax" either during negotiation of the network connection (association) or in the media application profile for files stored on media (and specified in the meta information header so the reader knows which transfer syntax to switch to).
The compressed data stream is actually encoded as an "encapsulated" data stream as defined in Part 5 of DICOM. Uncompressed data (unencapsulated) is sent in DICOM as a series of raw bytes or words (little or big endian) in the Value field of the Pixel Data element (7FE0,0010). Encapsulated data on the other hand is sent not as raw bytes or words but as Fragments contained in Items that are the Value field of Pixel Data. The encoding of these Items follows the same pattern as is used to specify Sequences in DICOM, though the VR (Value Representation) field of the Pixel Data is OB not SQ.
The encapsulated compressed data may be a single frame or it may contain multiple frames for those SOP Classes that allow multiframe images (such as XA, XRF, US and NM). The rules in Part 5 further specify that the first Item will either be empty or contain a list of offsets to the beginning of the Item containing each frame (or the only frame for a single frame image). A frame may be split into multiple fragments, but each fragment may contain data for only one frame; that is, a fragment may not span different frames. The reason for the fragments in the first place is that each fragment (each Item) must have a fixed, known length, so unless one buffers the entire compressed frame before encoding it, one doesn't know in advance how long it will be. In practice, most encoders do send one frame per fragment, but all decoders must be prepared to handle the case where a frame spans fragments. Furthermore, all fragments have to be of even length, and there are padding rules in Part 5 for the last fragment of a frame (that are consistent with the definition of padding in the JPEG standard).
Part 5 contains several examples of how to fill in the various fields in Items of the encapsulated sequence-like value for Pixel Data, so these will not be repeated here. However the overall strategy looks something like this for an image with two frames, the first split across two fragments, and an empty offset table:
    (7FE0,0010) VR=OB VL=FFFFFFFF   Pixel Data
    (FFFE,E000) VR=   VL=00000000   Item (empty offset table, hence zero length)
    (FFFE,E000) VR=   VL=000004C6   Item (first fragment of first frame)
        .... compressed byte stream here (4C6 bytes)
    (FFFE,E000) VR=   VL=0000024A   Item (second fragment of first frame)
        .... compressed byte stream here (24A bytes)
    (FFFE,E000) VR=   VL=00000628   Item (first fragment of second frame)
        .... compressed byte stream here (628 bytes)
    (FFFE,E0DD) VR=   VL=00000000   Sequence Delimiter
Note that the Item and Sequence Delimiter tags have no VR, that the Item Delimiter tag is never used, since Items are required to be of fixed not undefined length, and that the Sequence Delimiter tag is always used, since the Pixel Data is always of undefined length (that is FFFFFFFF) for encapsulated data.
If one is trying to decode a DICOM image encoded with an encapsulated transfer syntax, one therefore has to get to the Pixel Data tag, and start parsing the sequence-like structure. One cannot just pass the entire Value field of Pixel Data to a conventional JPEG decoder for instance. One needs to strip out the embedded Item tags and the trailing Sequence Delimiter. For an example of how to do this see the source code from dicom3tools in "libsrc/include/pixeldat/unencap.h", a simplified version of which (without the GE bug handling) is reproduced here.
    size_t
    read(void)
    {
        // - non-pixel data is always LE, including fragment
        //   delimiters and lengths
        // - 1st item is offset table, may have zero VL
        // - other items are fragments
        // - finally sequence delimitation tag (with zero VL)
        // - each delimiter is 2 byte group, 2 byte element,
        //   4 byte VL, little endian
        // - Item tag is (0xfffe,0xe000)
        // - Seq delimiter is (0xfffe,0xe0dd)

        length=0;
        while (!lefttoreadthisfragment && !finished && !bad) {
            Uint16 group=read16();
            Uint16 element=read16();
            Uint32 vl=read32();
            if (group == 0xfffe) {
                if (element == 0xe0dd) {
                    // Sequence Delimiter Tag
                    Assert(vl == 0);
                    finished=true;
                }
                else /* if (element == 0xe000) */ {
                    // Item Tag
                    bool vlbyteorderwrong=false;
                    if (++fragmentnumber > 0) {
                        // Zero length fragments thought not to be legal
                        Assert(vl);
                        lefttoreadthisfragment=vl;
                    }
                    else {
                        // skip the offset table
                        Assert(vl%4 == 0);
                        unsigned i=0;
                        while (vl) {
                            Uint32 offset=read32();
                            vl-=4;
                            ++i;
                        }
                    }
                }
            }
            else {
                // bad tag group in encapsulated data
                bad=true;
            }
        }
        if (lefttoreadthisfragment && !bad) {
            length=unsigned(lefttoreadthisfragment > maxlength
                ? maxlength : lefttoreadthisfragment);
            if (istr->read(buffer,length)) {
                length=istr->gcount();
            }
            else {
                bad=true;
                length=0;
            }
            lefttoreadthisfragment-=length;
        }
        return length;
    }
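The fragment above is a member function that depends on the rest of its class. For experimenting, the same parsing can be sketched as a free-standing function working on a buffer in memory. This is not the dicom3tools code; extractFragments and the Bytes typedef are names invented here, and the buffer is assumed to hold only the Value field of Pixel Data (everything after the (7FE0,0010) tag, VR and VL), with Item tags and lengths in little endian order as described in the comments above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Little endian 16 and 32 bit values at index i
static std::uint16_t le16(const Bytes &d, std::size_t i)
{
    return std::uint16_t(d[i] | (d[i + 1] << 8));
}

static std::uint32_t le32(const Bytes &d, std::size_t i)
{
    return std::uint32_t(d[i]) | (std::uint32_t(d[i + 1]) << 8)
         | (std::uint32_t(d[i + 2]) << 16) | (std::uint32_t(d[i + 3]) << 24);
}

// Collect the fragments of an encapsulated Pixel Data value.
// The first Item (the offset table) is skipped, not returned.
// Returns false if the structure is malformed or the Sequence
// Delimiter is missing.
bool extractFragments(const Bytes &value, std::vector<Bytes> &fragments)
{
    fragments.clear();
    std::size_t pos = 0;
    bool firstItem = true;
    while (pos + 8 <= value.size()) {
        std::uint16_t group   = le16(value, pos);
        std::uint16_t element = le16(value, pos + 2);
        std::uint32_t vl      = le32(value, pos + 4);
        pos += 8;
        if (group != 0xfffe) return false;          // bad tag group
        if (element == 0xe0dd) return vl == 0;      // Sequence Delimiter
        if (element != 0xe000) return false;        // not an Item tag
        if (vl > value.size() - pos) return false;  // Item runs off the end
        if (firstItem) {
            firstItem = false;                      // offset table: skip it
        }
        else {
            fragments.push_back(Bytes(value.begin() + pos,
                                      value.begin() + pos + vl));
        }
        pos += vl;
        assert(pos <= value.size());
    }
    return false;   // ran out of data before the Sequence Delimiter
}
```

Concatenating the returned fragments that belong to one frame gives the byte stream to hand to the decoder; deciding which fragments belong to which frame (from the offset table, or from the compressed stream itself) is left out of this sketch.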
An application that will take a DICOM dataset and write a pure byte stream (having stripped off the DICOM encapsulation) is also in dicom3tools, "dctoraw". One can feed the output of this utility straight to a JPEG decoder such as the Stanford PVRG utility "jpeg -d". If any padding is present at the end of each frame, it should have been encoded in a manner consistent with JPEG padding defined in ISO 10918-1 so that the JPEG decoder won't fail if it encounters padding between the image frames.
Note also that the terms "image" and "frame" are used slightly differently in DICOM than in JPEG, so be careful when comparing the two standards.
When using images with more than one component (that is a color image rather than a grayscale image), take care about the color space. One of the features of the ISO 10918-1 JPEG standard is that it specifies only a compressed bitstream, and not a file format. Even if there are three components specified in the compressed bitstream, that does not mean they are RGB or YBR or whatever. This has to be signalled outside the bitstream, and in DICOM this is done in Photometric Interpretation (this is somewhat controversial however, and one should look at recent proposed DICOM CPs on the matter, such as CP 143).
In the non-DICOM world, the color space is specified in the file header such as the commonly used JFIF header, or its superset, the SPIFF header as defined in ISO 10918-3. Be especially careful that one does not assume during decoding that a JFIF header is present in the DICOM compressed bit stream ... it is not. If one wants to feed the extracted bitstream to a JPEG decoder that needs a JFIF header (like the IJG code), then you need to add one. Conversely, never create an encapsulated DICOM image with a bitstream that contains the JFIF header ... strip it off first or use an encoder like Stanford PVRG JPEG that doesn't create JFIF headers.
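As an illustration of stripping a JFIF header, here is a sketch that removes a leading JFIF APP0 segment from a bitstream held in memory. It assumes the usual JFIF arrangement, in which the APP0 marker (FFE0) immediately follows the SOI marker (FFD8) and carries the identifier "JFIF" followed by a zero byte; any further extension segments are left alone. The function name stripJFIF is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Remove a leading JFIF APP0 segment from a JPEG bitstream, if present.
// Anything that does not look like SOI + JFIF APP0 is returned unchanged.
// stripJFIF is a hypothetical name chosen for this sketch.
std::vector<unsigned char> stripJFIF(const std::vector<unsigned char> &in)
{
    // Layout: FF D8 (SOI), FF E0 (APP0), 2 byte big-endian segment
    // length (which counts the length bytes themselves), "JFIF\0", ...
    if (in.size() < 11) return in;
    if (in[0] != 0xff || in[1] != 0xd8) return in;
    if (in[2] != 0xff || in[3] != 0xe0) return in;
    if (std::memcmp(&in[6], "JFIF", 5) != 0) return in;

    std::size_t seglen = (std::size_t(in[4]) << 8) | in[5];
    std::size_t next = 4 + seglen;      // first byte after the segment
    if (seglen < 2 || next > in.size()) return in;

    std::vector<unsigned char> out(in.begin(), in.begin() + 2);  // keep SOI
    out.insert(out.end(), in.begin() + next, in.end());
    // SOI and the rest are kept; only marker (2) + segment (seglen) is gone
    assert(out.size() + seglen + 2 == in.size());
    return out;
}
```

Going the other way (adding a JFIF header for a decoder that insists on one) means inserting such a segment immediately after the SOI marker.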
Here JPEG has been discussed, but the same principle applies to other encapsulated data sets in DICOM, including the RLE compression scheme popular in Ultrasound images (which is equivalent to the TIFF PackBits compression scheme). The compression scheme to interpret the encapsulated bitstream is different, but the encapsulation mechanism using Item tags and fragments is identical.
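For reference, PackBits decoding itself is simple enough to sketch in a few lines. The function below (unpackBits is a name made up here) decodes one run of PackBits encoded bytes; it deliberately ignores the header and segment structure that Part 5 defines around the encoded data for the DICOM RLE transfer syntax, so it is an illustration of the byte encoding only, not a complete RLE decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Decode a PackBits encoded byte sequence.
// unpackBits is a hypothetical name chosen for this sketch.
std::vector<unsigned char> unpackBits(const std::vector<unsigned char> &in)
{
    std::vector<unsigned char> out;
    std::size_t i = 0;
    while (i < in.size()) {
        // The control byte is a signed 8 bit count
        int n = in[i++];
        if (n > 127) n -= 256;
        assert(n >= -128 && n <= 127);
        if (n >= 0) {
            // literal run: copy the next n+1 bytes unchanged
            std::size_t count = std::size_t(n) + 1;
            if (count > in.size() - i) break;   // truncated input
            out.insert(out.end(), in.begin() + i, in.begin() + i + count);
            i += count;
        }
        else if (n != -128) {
            // replicate run: repeat the next byte 1-n times
            if (i >= in.size()) break;          // truncated input
            out.insert(out.end(), std::size_t(1 - n), in[i++]);
        }
        // n == -128 is a no-op
    }
    return out;
}
```

The Item and fragment parsing is the same as for JPEG; only this last step, interpreting the extracted bytes, differs.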
This mechanism has been widely used in the cardiac angiography world on the DICOM CDs that these devices make, on Ultrasound 90 mm MODs, and on GE's more recent CT and MR scanners that use the CT and MR media application profile on 130 mm MODs. Note that early implementations of the encapsulation mechanism and the JPEG lossless encoding contain some bugs which are described in detail in the section on GE CTI.
Nine-track half-inch tapes were the old medium of choice for archiving and image exchange and many older pieces of equipment will have these. Unfortunately most people don't have such a drive on their workstation or personal computer. There are several possibilities:
The Qualstar 1054 is one such drive, that attaches to a SCSI port, and works with the regular SunOS SCSI tape driver, once a few tables in the kernel have been updated as follows, and the kernel rebuilt:
{root}% pwd
/usr/kvm/sys/scsi/targets
{root}% diff -c stdef.h.prequalstar stdef.h
*** stdef.h.prequalstar Tue Aug 30 19:32:24 1994
--- stdef.h     Tue Aug 30 19:32:24 1994
***************
*** 43,48 ****
--- 43,49 ----
  #define ST_TYPE_FUJI          0x21    /* Fujitsu - (not tested) */
  #define ST_TYPE_KENNEDY       0x22    /* Kennedy */
  #define ST_TYPE_HP            0x23    /* HP */
+ #define ST_TYPE_QUALSTAR      0x24    /* Qualstar */
  #define ST_TYPE_HIC           0x26    /* Generic 1/2" Cartridge */
  #define ST_TYPE_REEL          0x27    /* Generic 1/2" Reel Tape */
{root}% diff -c st_conf.c.prequalstar st_conf.c
*** st_conf.c.prequalstar       Tue Aug 30 19:32:22 1994
--- st_conf.c   Tue Aug 30 19:32:22 1994
***************
*** 153,158 ****
--- 153,174 ----
   * so our best guess as to their capabilities is
   * included herein.
   */
+ /* Qualstar 1054 or 1260s scsi 9-track with 64KB buffer */
+       {
+               "Qualstar 1054/1260s 1/2\" Reel", 7, "NCR ADP-53", ST_TYPE_QUALSTAR, 10240,
+               (ST_REEL | ST_VARIABLE | ST_BSF | ST_BSR),
+               300, 300,
+               { 0x00, 0x02, 0x06, 0x03},
+               { 0, 0, 0, 0 }
+       },
+ /* Qualstar 1054 scsi 9-track with 256KB buffer */
+       {
+               "Qualstar 1054 1/2\" Reel", 10, "QUALSTAR10", ST_TYPE_QUALSTAR, 10240,
+               (ST_REEL | ST_VARIABLE | ST_BSF | ST_BSR),
+               300, 300,
+               { 0x00, 0x02, 0x06, 0x06},
+               { 0, 0, 0, 0 }
+       },
  /* Wangtek QIC-150 1/4" cartridge */
        {
                "Wangtek QIC-150", 14, "WANGTEK 5150ES", ST_TYPE_WANGTEK, 512,
                (ST_QIC | ST_AUTODEN_OVERRIDE),
I got my Qualstar 1054 from Bill Power at Power Computer Services for only $750 and have successfully read GE 9800 CT and Philips S15 MR tapes with it so far. See the "Sources" section for where to get one.
Once you have such a tape drive connected to the SCSI port, you can either write simple programs to read files (easiest if the tape has variable length records) or use shell scripts and the "dd" command with whatever the correct block size is. See dd(1), mt(1), and mtio(3) for more information. Remember that the read(2) call reads one fixed or variable length record at a time and returns 0 bytes read for a tape mark, and that two tape marks in a row (normally) indicate the end of the tape. If you encounter short files with a series of records 80 bytes long, chances are you are dealing with header/end markers. This is what ANSI standard tapes off VAX VMS seem to look like.
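A "simple program" of the kind mentioned above could be built around a loop like the following sketch, which collects the records of one tape file until read(2) returns 0. The names readTapeFile, Record and maxRecord are made up for this example, and the 64 KB default buffer is only a guess at a size large enough for the longest record; the buffer passed to read(2) needs to be at least as long as the record being read.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>
#include <vector>

typedef std::vector<unsigned char> Record;

// Read records from an open descriptor until read(2) returns 0, which
// on a tape is a tape mark (and on an ordinary file is end of file).
// Returns false on a read error. All names here are hypothetical.
bool readTapeFile(int fd, std::vector<Record> &records,
                  std::size_t maxRecord = 65536)
{
    assert(maxRecord > 0);
    std::vector<unsigned char> buffer(maxRecord);
    records.clear();
    for (;;) {
        // one read(2) call returns at most one record
        ssize_t n = read(fd, &buffer[0], buffer.size());
        if (n < 0) return false;    // read error
        if (n == 0) return true;    // tape mark: end of this tape file
        records.push_back(Record(buffer.begin(), buffer.begin() + n));
    }
}
```

Calling this repeatedly on the same descriptor walks through the files on the tape; a call that succeeds but returns no records means two tape marks in a row, that is (normally) the end of the tape.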
Anyone who has any further information about tape formats and handling, especially references to standard or on-line documents please let me know.
The next part is part7 - information sources.
END OF PART 6