Files
Matt Newville, ravel@u.washington.edu, and Bruce Ravel, ravel@u.washington.edu
10 December, 1995

Data Handling for the UWXAFS analysis package
______________________________________________________________________

Table of Contents:

1. Introduction
2. ASCII Format Data Files
   2.1. XMU files
   2.2. CHI files
   2.3. RSP files
   2.4. ENV files
3. UWXAFS Binary Format Data Files
   3.1. XMU files
   3.2. CHI files
   3.3. RSP files
   3.4. ENV files
4. Utility Programs for Data Files
   4.1. Reform
   4.2. copydf
   4.3. listdf
______________________________________________________________________

1. Introduction

The UWXAFS data analysis programs use external files to store the data to be analyzed. The data files used in XAFS analysis are fairly small, with a single set of data rarely exceeding a few thousand real numbers and twenty lines of associated text. But there need to be some formatting rules for the data files so that their contents can be understood. To keep the UWXAFS programs coherent, flexible, and portable, a fairly simple system for handling small XAFS data files has been developed over the years. There are some peculiarities and possibly outdated constructs in this system, but it's not too bad. Most importantly, it's stable.

For the UWXAFS programs, all data files contain both text lines and numbers representing the data. The meaning of the numbers, including units, must be known ahead of time. The text lines give some documentation in the data file, presumably to tell what the data is, where it came from, etc.

There are currently two formats for the data files that the UWXAFS programs can use: ASCII and UWXAFS binary. The file format indicates how the file was written by one analysis program, and how it is to be read by the next. The two available formats have advantages and disadvantages that complement each other.

ASCII files are easy for humans to create, read, edit, and transport between machines (by e-mail, for example), and easy for many other programs (such as general-purpose plotting and spreadsheet packages) to deal with. They are relatively large files (on the order of 10 to 50 Kb for each set of data), storing the data in an inefficient way with no compression. Each ASCII format file contains exactly one set of data.

UWXAFS binary files offer more efficient storage than ASCII files, and include some management of related data by allowing more than one set of data to be stored in separate "records" of a single file. There is also a slight improvement in the reading and writing speed of these files. The UWXAFS binary files cannot be transported between machines with different operating systems or architectures, and cannot be used by any applications other than those written especially for them. There are a few such utility programs for data in the UWXAFS format, including copydf to copy data records from one file to another and listdf to list the data records contained in a single file. Most importantly, the primitive graphical display program idp requires UWXAFS binary files.

Data can be easily converted between files with different formats with the UWXAFS program reform, so that you can take advantage of the best features of each file format. reform is an interactive program which can be used to convert one data set at a time. It can also dump all the data records from a single UWXAFS binary file into separate ASCII files, or put several sequentially numbered ASCII files into a single UWXAFS binary file.

2. ASCII Format Data Files

ASCII format data files are plain text files which can be manipulated by many programs, including text editors and stand-alone plotting routines. These files can be easily moved between different programs, different operating systems, and different machines. They can be sent by e-mail.

The ASCII files used by the UWXAFS data analysis programs have a minimal but well-defined layout, based on the lines of the text file. Document lines are placed at the top of the file. There can be any number of these lines, though most programs will store only the first 20 lines and ignore the rest. After the document lines, there is a line with minus signs (----) to indicate that all the document lines have been written. (If you want to get picky, the second through sixth non-blank characters in this line must be minus signs.) After this line of minus signs, there is an ignored line, which is typically used for labeling the columns of the numerical data which follow.

All of these lines of text may have a number sign # in the first column. It is put there to mark the line as a comment for other programs that use text files. When read by UWXAFS programs, this leading # will not be read as part of the document line.

After the ignored line for column labels comes the numerical data. Each data point occupies a single line. There can be between two and five columns of numbers. Columns of data are separated by one or more spaces or tab characters. The first column contains the abscissa, and the second column contains the real part of the ordinate. The third through fifth columns (if given) contain the imaginary part, the magnitude, and the phase of the ordinate, respectively. Listing all forms of complex data is redundant but quite useful. The UWXAFS programs will write all five columns for any complex data (as for chi(R) and chi(q)). The UWXAFS data analysis programs limit the data size to 2048 data points, which should be sufficient for all XAFS analysis.

The leading # in the first column of text lines and the general layout of the ASCII format work well with the general-purpose, stand-alone graphics program gnuplot, which is freely available, works well on essentially every platform, and has a much larger user-base than any XAFS program.
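
As an illustration only, the ASCII layout described in this section (document lines, a line of minus signs, one ignored label line, then two to five numeric columns) can be read with a few lines of Python. This is a sketch and not part of the UWXAFS distribution: the function names read_uwxafs_ascii and is_separator are hypothetical, and the script assumes that blank lines can be skipped and that every data line has the same number of columns.

```python
def is_separator(line):
    """True if the 2nd through 6th non-blank characters are all minus signs."""
    chars = "".join(line.split())
    return chars[1:6] == "-----"


def read_uwxafs_ascii(text, max_docs=20):
    """Return (document lines, list of columns) from UWXAFS-style ASCII text."""
    # Assumption: blank lines carry no information and can be dropped.
    lines = [ln for ln in text.splitlines() if ln.strip()]

    docs = []
    for sep_index, line in enumerate(lines):
        if is_separator(line):
            break
        # A leading '#' in the first column is not part of the document line.
        docs.append(line[1:].strip() if line.startswith("#") else line.strip())
    else:
        raise ValueError("no line of minus signs found")

    # The line right after the separator is ignored (column labels).
    columns = []
    for line in lines[sep_index + 2:]:
        values = [float(token) for token in line.split()]
        if not 2 <= len(values) <= 5:
            raise ValueError("expected two to five columns: " + line)
        if not columns:
            columns = [[] for _ in values]
        elif len(values) != len(columns):
            raise ValueError("inconsistent number of columns: " + line)
        for column, value in zip(columns, values):
            column.append(value)
    # Keep only the first max_docs document lines, as most programs do.
    return docs[:max_docs], columns


sample = """\
# Cu foil, 10K
# data taken at NSLS beamline X-11A Sept 1992
#----------------------------------------------------------
# energy xmu
0.8968871E+04 0.9484839E+00
0.8969347E+04 0.9510049E+00
"""
docs, cols = read_uwxafs_ascii(sample)
print(docs)
print(cols[0], cols[1])
```

The returned columns follow the order given above: abscissa first, then the real part of the ordinate, then (for complex data) the imaginary part, magnitude, and phase.
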
The UWXAFS project has no association with the gnuplot project, but we heartily recommend this program. It has its own oddities and is by no means perfect, but it does essentially everything needed for plotting simple data files, and is well-documented and well-supported. It can write to a wide variety of output formats, including Postscript and nearly any terminal type. It can be run interactively or in batch mode. If you do not have it on your system, ask your system manager to install it. If you are the system manager, source code, documentation, and executables for some systems are available by anonymous ftp to ftp.dartmouth.edu. Support is provided through the usenet group comp.graphics.gnuplot.

As with the UWXAFS binary files, the ASCII files have a file type associated with them (see section 3 for more details), though the only purpose of the file type of ASCII files is to tell how many columns to use. For ASCII files with file type xmu or chi, only the first two columns are used, and the rest are ignored. The other file types (rsp and env) indicate complex data, so that all five columns are used. Examples of each of the file types are given in the following sections.

2.1. XMU files

Files with xmu File Type contain absorption data on an energy grid which does not need to be evenly spaced. The units of energy are eV. This file type is used for raw absorption data input to autobk and for the output data of the background function from autobk. The second column in xmu files contains values of absorption. The units for the data in the second column are unimportant. Raw synchrotron data of detector intensities must be converted to xmu format before being used in autobk. Any data in columns past the second will be ignored.

The Cu absorption data distributed with autobk are examples of ASCII xmu data files.
Here is part of one of them:

    # Cu foil, 10K
    # data taken at NSLS beamline X-11A Sept 1992
    # foil from 99.999% Cu rolled and annealed to ~12 microns
    #----------------------------------------------------------
    #   energy          xmu
      0.8968871E+04   0.9484839E+00
      0.8969347E+04   0.9510049E+00
      0.8969909E+04   0.9537250E+00
      0.8970386E+04   0.9559226E+00
      0.8970862E+04   0.9591411E+00

2.2. CHI files

Files with chi File Type contain chi(k) data. The first column contains k-values in units of inverse Angstroms. These values should be evenly spaced in k. All UWXAFS programs will write chi files with a grid spacing of delta k = 0.05 inverse Angstroms, and will interpolate any input chi files onto this grid. The second column of a chi file contains the chi(k) values. It does not contain k-weighted chi(k). Any data past the second column will be ignored.

The Cu chi(k) data output from autobk and those distributed as input for feffit (which came from autobk) are examples of ASCII chi data files. Here is part of one of them:

    # data : cu 10k background by autobk
    # chi: skey ASCII of cu010k.dat using skey ASCII of chi.dat
    # e0 = 8982.61; pre-edge range =[ -50.0 -200.0]; edge step = 2.257
    #---------------------------------------------------------------------
    #      k           chi(k)
      .5000000E+00  -.1540712E+00
      .5500000E+00  -.1576023E+00
      .6000000E+00  -.1621443E+00
      .6500000E+00  -.1669036E+00
      .7000000E+00  -.1723104E+00
      .7500000E+00  -.1756163E+00
      .8000000E+00  -.1719365E+00
      .8500000E+00  -.1712734E+00
      .9000000E+00  -.1738329E+00
      .9500000E+00  -.1679564E+00
      .1000000E+01  -.1598812E+00

2.3. RSP files

Files with rsp File Type contain chi(R) data. The first column contains R-values in units of Angstroms. These values will be evenly spaced in R. Though the size of the grid can vary, it will typically be delta R = 0.031 Angstroms. The second, third, fourth, and fifth columns of an rsp file contain the real part, imaginary part, amplitude, and phase of chi(R), respectively.

The R-space outputs from feffit are examples of ASCII rsp data files.
Here is part of one of them:

    # data : cu 10k background by autobk
    # chi: skey ASCII of cu010k.dat using skey ASCII of chi.dat
    # e0 = 8982.61; pre-edge range =[ -50.0 -200.0]; edge step = 2.257
    #---------------------------------------------------------------------
    #      r        real(chi(r))   imag(chi(r))   ampl(chi(r))  phase(chi(r))
      .0000000E+00   .6142655E-01   .0000000E+00   .6142655E-01   .0000000E+00
      .3067962E-01   .2903621E-01  -.5033424E-01   .5810884E-01  -.1047559E+01
      .6135923E-01  -.2831294E-01  -.4127166E-01   .5004970E-01  -.2172074E+01
      .9203885E-01  -.4082767E-01   .1267206E-01   .4274903E-01  -.3442544E+01
      .1227185E+00   .3410820E-02   .4164495E-01   .4178440E-01  -.4794109E+01
      .1533981E+00   .4358866E-01   .1037124E-01   .4480551E-01  -.6049594E+01
      .1840777E+00   .2678454E-01  -.3571878E-01   .4464575E-01  -.7210562E+01
      .2147573E+00  -.2032813E-01  -.3259343E-01   .3841307E-01  -.8411637E+01
      .2454369E+00  -.2877361E-01   .1295394E-01   .3155512E-01  -.9847800E+01
      .2761165E+00   .1427407E-01   .3249273E-01   .3548981E-01  -.1140950E+02

2.4. ENV files

Files with env File Type contain backtransformed EXAFS data, chi(q). The first column contains k-values in units of inverse Angstroms. These values will be evenly spaced in k, with delta k = 0.05 inverse Angstroms. The second, third, fourth, and fifth columns of an env file contain the real part, imaginary part, amplitude, and phase of chi(q), respectively.

The backtransformed k-space outputs from feffit are examples of ASCII env data files.
Here is part of one of them:

    # data : cu 10k background by autobk
    # chi: skey ASCII of cu010k.dat using skey ASCII of chi.dat
    # e0 = 8982.61; pre-edge range =[ -50.0 -200.0]; edge step = 2.257
    #---------------------------------------------------------------------
    #      k        real(chi(k))   imag(chi(k))   ampl(chi(k))  phase(chi(k))
      .5000000E+00   .7191563E-01  -.2600794E-01   .7647399E-01   .5936174E+01
      .5500000E+00   .6291400E-01   .1202853E-01   .6405354E-01   .6472096E+01
      .6000000E+00   .3789376E-01   .4072316E-01   .5562655E-01   .7104558E+01
      .6500000E+00   .2566442E-02   .5404260E-01   .5410350E-01   .7806528E+01
      .7000000E+00  -.3530601E-01   .4896256E-01   .6036428E-01   .8478718E+01
      .7500000E+00  -.6737898E-01   .2606567E-01   .7224505E-01   .9055658E+01
      .8000000E+00  -.8631523E-01  -.1053911E-01   .8695626E-01   .9546277E+01
      .8500000E+00  -.8719383E-01  -.5396802E-01   .1025442E+00   .9979011E+01
      .9000000E+00  -.6847088E-01  -.9584310E-01   .1177886E+00   .1037525E+02
      .9500000E+00  -.3230590E-01  -.1278652E+00   .1318832E+00   .1074810E+02
      .1000000E+01   .1581111E-01  -.1433793E+00   .1442484E+00   .1110541E+02

3. UWXAFS Binary Format Data Files

XAFS data can be stored efficiently and conveniently in the UWXAFS binary format. These files use less disk space than ASCII data files. Since they cannot be easily edited, accidentally deleting some of the data in a UWXAFS binary file is essentially impossible. But the biggest advantage over ASCII data files is that more than one set of data can be held in a single data file, which makes organizing data easier.

The drawbacks of the UWXAFS binary files are that they cannot be transported between machines (unless both the architecture and operating system are the same) and that special routines must be used every time data is to be accessed. This second point means that general-purpose programs will be unable to handle data in the UWXAFS binary format.

The UWXAFS file handling system allows a single binary file to store several data sets in different "records".
Each file has a file type associated with it that determines the format of the data held in the file. Different file types cannot be mixed in a single file. Usually the file type will also be the extension of the file name. This is not required (and is violated by autobk), but it is recommended for most data files. Up to 191 records can be held in a file. Each of these records has the following information associated with it:

1. An integer key, called the nkey, which is the "address" for this record in the data file. This is independent of the content of the record, and depends only on the location in the file.

2. A symbolic key, called the skey, which is a unique "address" for the data in a record no matter where in the data file it is, or even which data file it's in. If the data in one record is copied to another file, the skey will move with the data, whereas the nkey can change.

3. The numerical values, stored in a single array. This array has entries which depend on the file type. These will be described below, but it's not essential that you know them.

4. Documentation lines. Up to 20 can be used.

The data from a record in a UWXAFS binary file can be retrieved by specifying the file name and either the nkey or skey of the record.

Each of the file types stores numerical data in a different way. Usually you don't need to worry about this. But if you're using reform to convert ASCII data into UWXAFS data, it's important to use the correct file type. If you follow the recommended convention that the file extension is the same as the file type (and that files with extension bkg have file type xmu), there should not be any problems.

The following sections describe how each of the file types stores the numerical data. Each stores a single buffer of real numbers, which we'll call Buff, which has N_buff elements.

3.1. XMU files

Files with xmu File Type contain absorption data on an energy grid which does not need to be evenly spaced. The units of energy are eV.
This file type is used for raw absorption data input to autobk and for the background function output data from autobk. The first N_buff/2 elements of Buff contain the energy values (in eV). The second N_buff/2 elements of Buff contain the values of absorption, whose units are not important. Raw synchrotron data of detector intensities must be converted to xmu format before being used in autobk.

3.2. CHI files

Files with chi File Type contain chi(k) data. The data is evenly spaced in k, and this is exploited in the storage of the data. The first element of Buff is k_min, the lowest value for which chi(k) is stored (in inverse Angstroms). The second element of Buff is delta k (also in inverse Angstroms), which will normally be 0.05. The remaining N_buff - 2 elements of Buff contain the values of chi(k) themselves.

3.3. RSP files

Files with rsp File Type contain chi(R) data. The data is evenly spaced in R, and this is exploited in the storage of the data. The first element of Buff is R_min, the lowest value for which chi(R) is stored (in Angstroms). The second element of Buff is delta R (also in Angstroms), which will typically be about 0.031. The remaining N_buff - 2 elements of Buff contain the complex values of chi(R) themselves, stored in successive pairs of real and imaginary parts of chi(R).

3.4. ENV files

Files with env File Type contain backtransformed EXAFS data (chi(q)). The data is evenly spaced in k, and this is exploited in the storage of the data. The first element of Buff is k_min, the lowest value for which chi(q) is stored (in inverse Angstroms). The second element of Buff is delta k (also in inverse Angstroms), which will normally be 0.05. The remaining N_buff - 2 elements of Buff contain the complex values of chi(q) themselves, stored in successive pairs of amplitude and phase of chi(q).

4. Utility Programs for Data Files

There are three utility programs in the UWXAFS distribution for the manipulation of the data files.
These programs are interactive and are fairly straightforward to use. The only real complication is that UWXAFS binary files can contain more than one record, so that the record key must be given in addition to the file name.

As discussed in section 3, the record of a UWXAFS binary file can be specified by either the numeric or symbolic key. For the data in a single file, the nkey and skey are redundant. The skey remains with the data record even when moved between data files. Furthermore, the skey is essentially unique to the data in the record, so that two records with identical skeys are almost guaranteed to contain exactly the same data and documents. In contrast, the nkey is just the index of where in the data file that record is kept.

All of these utility programs should be fairly self-explanatory, but a synopsis of each is given below. We should also say that other, similar utility programs are available for free from the XAFS Database at IIT. If you're using UWXAFS format files, we highly recommend using the utilities from this database, including the programs rftoasc and asctorf. If you want idp, get it from there. You may even like the versions of copydf and listdf from this database better than the ones we distribute.

4.1. Reform

reform will reformat, or convert, files from ASCII to UWXAFS format or from UWXAFS to ASCII format. The program will prompt you for the input file name. When converting from UWXAFS files to ASCII files, you will then be asked which record to convert to ASCII format, with this prompt:

    record (nkey or skey)

At this prompt, you can type the nkey or skey of a single record to convert. To reformat more than one record from a UWXAFS binary file, there are two choices. You can type two nkeys separated by a comma, such as 1, 5, which will convert all records with nkeys between 1 and 5, inclusive. Or you can type all, which will select all records in the file.
Typing lis will give a listing of all records in the file, and typing help or ? will list the above possible responses.

After the record(s) to convert have been determined, you'll be asked for the output file name. When converting more than one record, the output files will have the prefix you specify and sequentially numbered extensions, so that you'll get file names like cu.001, cu.002, cu.003.

When converting from ASCII to UWXAFS files, there is no need to ask for the record, so you'll be asked for the output file name. If this file already exists, the data in the ASCII file will be put in the next available record. If the output UWXAFS file does not exist, the data is put in the record with nkey=1, and you will be asked to specify the file type. Usually this will be the same as the file extension (that is, cu.xmu probably has file type xmu), and you can just hit carriage return. See section 3 for the best file type to use for each type of data.

After the reformatted file has been written, you'll be asked if you want to continue to do more reformatting of data files. The file names from the previous time through will be used as defaults, so you should be able to just hit return. The principal advantage of this comes when you have reformed a UWXAFS file to ASCII format with all and now want to put all the ASCII files back together into a single UWXAFS file. If, for example, you converted cu.001 to cu.xmu and want to continue, the default for the next input file will be cu.002, and you can just hit return until the program ends (when it gets to a file it can't find) to reconstruct the original file. This is necessary if the data is to be transported between machines, since the binary format is highly dependent on operating system and machine architecture. (The programs rftoasc and asctorf from the IIT database will also dump a binary file for transport, and then reassemble it.)

4.2. copydf

This will copy part (or all) of the contents of a UWXAFS format file to another UWXAFS format file.

4.3. listdf

This will list the records in a UWXAFS format file, and write out part (or all) of the documentation for the records in the file. The outputs can be written to the screen or to a file.
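
For readers who want to handle record buffers in their own code, the Buff layouts of section 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the UWXAFS distribution: the function name unpack_buffer is hypothetical, and the buffer is assumed to be in memory already as a list of numbers, since the on-disk binary layout is not described in this document.

```python
def unpack_buffer(buff, file_type):
    """Split a record buffer into (abscissa, ordinate) lists by file type.

    Assumption: buff is a plain list of real numbers already read from a
    record; how a record is read from disk is not covered here.
    """
    if file_type == "xmu":
        # First half: energies in eV. Second half: absorption values.
        half = len(buff) // 2
        return list(buff[:half]), list(buff[half:2 * half])

    if file_type not in ("chi", "rsp", "env"):
        raise ValueError("unknown file type: " + file_type)

    # chi, rsp, env: lowest grid value, grid spacing, then the data.
    start, step, values = buff[0], buff[1], buff[2:]
    if file_type == "chi":
        ordinate = list(values)
    else:
        # Successive pairs: (real, imaginary) for rsp,
        # (amplitude, phase) for env.
        ordinate = list(zip(values[0::2], values[1::2]))

    # Rebuild the evenly spaced k or R grid from start and step.
    grid = [start + i * step for i in range(len(ordinate))]
    return grid, ordinate


k, chi = unpack_buffer([0.5, 0.05, -0.154, -0.158, -0.162], "chi")
r, chi_r = unpack_buffer([0.0, 0.03, 1.0, 2.0, 3.0, 4.0], "rsp")
energy, xmu = unpack_buffer([8968.0, 8969.0, 0.94, 0.95], "xmu")
print(k, chi)
print(r, chi_r)
print(energy, xmu)
```

Storing only the lowest grid value and the spacing is what makes the chi, rsp, and env records compact: the abscissa is rebuilt on demand instead of being stored point by point.
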