AUTOBK
Automated background removal for XAFS data

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Contents

  1  Introduction
  2  Input and Output Files
  3  Keywords and Controls for autobk.inp
  4  E_0, Pre-Edge and Normalization
  5  Post-Edge Background Spline
  A  Examples
  B  Program Notes

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Important Note: The Postscript version of this document is much more readable, includes figures referred to in the text, and is the official supported version of the document. This ascii version is incomplete and not necessarily up-to-date.
                                                     --Matt Newville

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Autobk was written with the guidance and encouragement of Edward A Stern. The principal idea for the main algorithm used (matching the low-frequency components of chi(k) to a model calculation) was first brought to my attention by Yizhak Yacoby. Autobk grew out of the development of this idea, and the desire for a computer algorithm to easily and reliably separate the XAFS signal and background, with much help from Peteris Livins, Ed Stern, and Yizhak Yacoby. I also thank Daniel Haskel, Maoxu Qian, John Rehr, Bruce Ravel, and Yanjun Zhang for many useful discussions and helpful suggestions.

  Matthew Newville
  Department of Physics, FM-15
  University of Washington
  Seattle, Washington USA 98195
  newville@phys.washington.edu
  (206) 543-0435

  autobk version 2.61
  updated: Jan 25, 1995

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Chapter 1 Introduction

Autobk will remove the background from x-ray absorption data in a reliable and reasonably easy-to-use manner. The XAFS is formed using the relation

    chi(E) = { mu(E) - mu_0(E) } / Delta_mu_0(E_0)                 (1.1)

where E_0 is the absorption edge energy, mu_0(E) is the atomic-like absorption past the edge, and Delta_mu_0(E_0) is the jump at the edge step.
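As a minimal sketch of Eq. (1.1), assuming mu(E) and mu_0(E) are already available on the same energy grid and the edge step is a single number, chi(E) could be formed as follows. The function name chi_of_energy is hypothetical and is not part of autobk.

```python
def chi_of_energy(mu, mu0, edge_step):
    """Apply Eq. (1.1) point by point.

    mu, mu0   : sequences of mu(E) and mu_0(E) on the same energy grid
    edge_step : Delta_mu_0(E_0), a single constant
    """
    if edge_step == 0:
        raise ValueError("edge step must be non-zero")
    if len(mu) != len(mu0):
        raise ValueError("mu and mu0 must have the same length")
    return [(m - b) / edge_step for m, b in zip(mu, mu0)]
```

For example, chi_of_energy([1.1, 0.9], [1.0, 1.0], 0.5) gives values close to 0.2 and -0.2.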
chi(E) is converted into chi(k), where k = sqrt{ 2*m*(E-E_0)/hbar^2 } is the momentum of the photo-electron. The autobk approach to background removal has the advantage that very little prior knowledge of the system being studied is required to extract chi(k) from mu(E). The resulting chi(k) has the atomic-like absorption contributions removed, but retains essentially all the local structural information about the near-neighbor environment of the absorbing atom. It is then ready for a more careful analysis of the effect of the local structure on the XAFS.

The important steps of background removal can be seen from Eq. (1.1) to be:

  1. Determine the edge energy E_0.
  2. Determine the normalization constant Delta_mu_0(E_0).
  3. Find an approximation for the post-edge background function mu_0(E).

Steps 1 and 2 are pretty simple, and will be discussed further in Chapter 4. Step 3 is the hard part. The problem is that the true atomic-like absorption (that is, the non-XAFS absorption) will have some smooth energy dependence, but nobody knows its form. The absorption of an isolated central atom isn't good enough. mu_0(E) (the so-called embedded atom absorption) is the absorption of the central atom in the electronic environment of the solid but with all the scattering from the neighboring atoms turned off. Since mu_0(E) is essentially impossible to measure, it is approximated by a smooth function which has some flexibility and which can be adjusted to give some sort of fit to the measured absorption data.

The background function mu_0(E) is found in autobk by using concepts from basic Fourier signal analysis to assist the fundamental physical ideas behind the separation of XAFS and background. mu_0(E) is approximated by a piecewise polynomial (or spline) that can be adjusted so that the low-R components of the resulting XAFS chi(R) (that is, after a Fourier Transform into R-space) are optimized.
This optimization is discussed further in Chapter 5, but the basic idea is to eliminate the non-structural parts of chi(R) at low-R.

The stiffness of mu_0(E) in autobk is controlled internally and depends only on the size of the low-R range chosen as the background range. This enables a clear definition of the background (as that part of the absorption with dominant R components in the low-R range), and eliminates most of the subjectivity inherent in background removal schemes. The result is that autobk will find a reasonably good background without a lot of playing around with the data. In fact, only one parameter in the user's control has a profound effect on chi(k), and this (the endpoint of the low-R range) has at least some physical significance.

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Chapter 2 Input and Output Files

===Section 2.1 Input Files================================================

To run autobk you need an input file named autobk.inp which will control the running of the program and a data file containing measured absorption data mu(E). The form and contents of autobk.inp will be further discussed in Chapter 3. The file containing mu(E) will be discussed in section 2.3. An input data file containing a standard chi(k) can also be given. The purpose of this standard chi(k) will be discussed in Chapter 5. The input data file names can be any valid filenames up to 70 characters long allowed on your system (usually subdirectory paths can be given) and will be given in autobk.inp. In summary, there are three inputs:

  1. autobk.inp, the input file for the program.
  2. A data file containing the mu(E) for which the background is to be found.
  3. A data file containing the standard chi(k). (optional)

===Section 2.2 Output Files===============================================

After the background is found, autobk will write autobk.log, containing a synopsis of the program inputs and outputs.
An output data file for the chi(k) found from the background removal will also be written. There are five additional data files that can be written. Each of these optional files contains different ways of representing data in the background removal process, and each has its own keyword to select whether or not that form of the data is written. The naming conventions for the output files will be discussed in the next section.

The most useful of the optional outputs is mu_0(E), which will be written by default (i.e., unless you put "bkgout = false" in autobk.inp). This will give the mu_0(E) at the same energy points as the input mu(E) data. Outside the energy region used in the background removal (e.g., in the pre-edge region), this function will take the same values as the input absorption data. The background can also be written to k-space using "bkgksp = true". The data itself can be written in R-space (that is, after the Fourier transform with the same window and k-weighting on chi(k) used for the background removal) using "datrsp = true". If a standard chi(k) is used in the background removal, it can be written to k-space using "theksp = true" and to R-space using "thersp = true". This will contain the standard after its amplitude has been altered in the fit.

All outputs in k-space will be written between k_min and k_max. Outputs in R-space will be written between 0 and 10.0 Angstroms. Since the Fourier Transform Window (which smooths the data and reduces ``ringing'' in chi(R)) used in background removal is typically much sharper than for analysis, the R-space outputs from autobk are not intended for general use, but only for diagnostic checks of the background removal.

===Section 2.3 Data File Formats==========================================

As for all uwxafs3.0 data analysis programs, there are two options for the format of the data files.
The data may be in either a specially formatted binary file known as a UWXAFS binary file (also called an RDF file), or in a specially formatted ASCII column file. More information on these file formats, including the format specifications and a discussion of the relative merits of the two file formats, can be found in the uwxafs3.0 document files.doc. The two file handling formats can be mixed in autobk, so that the input data can be in the UWXAFS format and the output data can be in the ASCII format, or vice versa.

If the input data are in UWXAFS format, they must be in a file with file type `XMU'. Both the file name and record key (either nkey or skey) must be specified for the input. As further explained in files.doc, a UWXAFS format file can hold more than one data scan in separate ``records'' of the file. The nkey and skey (the numeric key and symbolic key, respectively) are both ways to address the separate records, and either of them can be used to access a particular data set.

If the input data file is in ASCII format, it must contain mu(E) data for exactly one scan and it must be in a file with the following format (see files.doc). The ASCII data file begins with document lines (any number of such lines are allowed, and the first 15 will be kept and written to the output files). These are followed by a required line of minus signs (`-'). The line following this will be ignored (so that column labels can be put in). After that, columns of numerical data for E (in eV) and mu(E) will be read. Each data pair must be on its own line. Any data past the second column will be ignored (but will not cause any side-effects).

Also please note that the first column needs to be energy in eV, not keV and not the angle of the monochromator. The second column must be mu(E), not the intensity of some detector. You must convert raw data from the synchrotron yourself, as there is no common format for such data. If you find this requirement a hardship, contact us and we may be able to help.
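The ASCII layout described above can be sketched as a small reader. This is only an illustration under stated assumptions, not the code autobk uses: the function name read_ascii_mu is hypothetical, the separator is assumed to be a line made up only of `-' characters, and blank lines in the data block are assumed to be skipped.

```python
def read_ascii_mu(text, max_doc_lines=15):
    """Parse an ASCII column file: doc lines, minus-sign line, label line, data."""
    lines = text.splitlines()
    docs = []
    i = 0
    # Document lines run until the required line of minus signs.
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped and set(stripped) == {"-"}:
            break
        docs.append(lines[i].rstrip())
        i += 1
    else:
        raise ValueError("no line of minus signs found")

    energies, mus = [], []
    # lines[i] is the separator and lines[i + 1] is the ignored label line.
    for line in lines[i + 2:]:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: blank or short lines are skipped
        energies.append(float(fields[0]))  # column 1: E in eV
        mus.append(float(fields[1]))       # column 2: mu(E); later columns ignored
    # Only the first 15 document lines are kept, as described in the text.
    return docs[:max_doc_lines], energies, mus
```

Called on the text of a file with two document lines, a separator, a label line, and three-column data rows, this returns the two document lines and the first two columns only.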
The example file cu10k.dat is an ASCII column data file that contains mu(E).

Output files will be written with names that depend on the contents of the file, the output file format (either UWXAFS or ASCII), and the user-chosen name. If the user specifies the output file name by "out = test" in autobk.inp, the output files will be named according to the table below.

  contents of file    filename     filename    file type   keyword in
                      (ascii)      (uwxafs)                autobk.inp
  data chi(k)         testk.chi    test.chi    chi         -
  mu_0(E)             teste.bkg    test.bkg    xmu         bkgout
  mu_0(k)             testk.bkg    test.chi    chi         bkgksp
  standard chi(k)     testk.stn    test.chi    chi         theksp
  standard chi(R)     testr.stn    test.rsp    rsp         thersp
  data chi(R)         testr.dat    test.rsp    rsp         datrsp

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Chapter 3 AUTOBK.INP

===Section 3.1 General Format of AUTOBK.INP===============================

Input commands to autobk will be read from the file autobk.inp. These inputs will name the data file to use for the mu(E) data and set all user options and controls. Autobk uses keywords to describe and assign values to all program parameters. The use of keywords allows the input file to be easily read and the values of the program parameters to be easily modified.

The keywords have fairly transparent meanings, and are assigned values with keyword ``sentences'' with syntax: keyword delimiter value. The keyword must be one of the valid keywords listed below. The delimiter is an equal sign or white space (a blank or TAB) surrounded by any number of white spaces. The value is provided by the user and will be interpreted as a number, a logical flag, or a character string, depending on the nature of the keyword; the list below will indicate what kind of value each keyword takes. Logical flags all have values true or false (t and f will work, too).
If a keyword's value is a number or logical flag (but not a character string), the assigning keyword sentence can be put on the same line as other numerical and logical keyword sentences. Keywords that take character strings as their value must occur on their own line.

Autobk does not distinguish keywords by case. But to accommodate many operating systems, it does distinguish the names of external files by case. Keyword sentences are allowed to occur in any order in the file. Internal comments can be written anywhere in autobk.inp.

===Section 3.2 Summary of Keywords========================================

Here is a brief list of all the keywords for all the program controls and parameters in autobk with a brief description of the meaning of their values. The form of the value taken by each keyword is indicated by c, n, or l for character strings, numbers, and logical flags, respectively. Where appropriate, valid options for the values are given in parentheses and default values are given in brackets. The following sections give more detailed explanations for each keyword.

General/Miscellaneous:
  %/!      End-of-Line Comment: ignore everything on the line after % or !
  #        Comment Line: ignore line if # is in 1st column
  ----     End of Data set: stop reading inputs for this data set.

Data Input and Output:
  Title    c  User chosen title line                                          [none]
  Formin   c  File Format of input data file (uw/ascii)                       [found from input data]
  Formout  c  File Format of output data file (uw/ascii)                      [same as input format]
  Format   c  File Format of both input and output data file (uw/ascii)       [none]
  Data     c  Name of input data file containing mu(E)                        [none]
  Theory   c  Name of input data file with chi(k) for standard                [none]
  Fixamp   l  Flag for not fitting the amplitude of the standard chi(k) (T/F) [F]
  Fixe0    l  Flag for not fitting the value of E_0 (T/F)                     [F]
  Out      c  Name of output data file                                        [same as input]
  Bkgout   l  Flag for writing mu_0(E) to output file (T/F)                   [T]
  Bkgksp   l  Flag for writing mu_0(k) to output file (T/F)                   [F]
  Theksp   l  Flag for writing chi(k) of standard to output file (T/F)        [F]
  Thersp   l  Flag for writing chi(R) of standard to output file (T/F)        [F]
  Datrsp   l  Flag for writing chi(R) of data to output file (T/F)            [F]
  Allout   l  Flag for writing all of the above outputs (T/F)                 [F]

Pre-Edge, E_0, and Normalization:
  E0       n  Edge energy in eV                                               [found from data]
  Pre1     n  Low energy limit of pre-edge range, relative to E_0             [-200]
  Pre2     n  High energy limit of pre-edge range, relative to E_0            [-50]
  Nor1     n  Low energy limit of normalization range, relative to E_0        [+100]
  Nor2     n  High energy limit of normalization range, relative to E_0       [+300]
  Step     n  Edge step                                                       [found]

Fitting Ranges and Fourier Transform Windows:
  Rbkg     n  R_bkg, the maximum R value to fit the background                [1.0]
  R1st     n  R_1st for fit of standard to the first shell                    [rbkg+2.0]
  Emin     n  Starting E for the background spline                            [0.0]
  Emax     n  Ending E for the background spline                              [last data point]
  Kmin     n  Starting k for the background spline                            [0.0]
  Kmax     n  Ending k for the background spline                              [last data point]
  Kweight  n  k-weight for Fourier Transform                                  [1.0]
  Dk1      n  Low-k Fourier Transform Window Parameter                        [0.0]
  Dk2      n  High-k Fourier Transform Window Parameter                       [0.0]
  Dk       n  Both Dk1 and Dk2                                                [0.0]
  Iwindo   n  Integer to select Fourier Transform Window Function             [0]

===Section 3.3 General and Miscellaneous Keywords=========================

% or !
    indicates a comment anywhere in autobk.inp, including end-of-line comments.

* or #
    indicates a comment line in autobk.inp if it is the first character on the line.

----
    Stop reading inputs for this data set, and begin background removal on this data set. autobk will return to this place after the background is found to read inputs for another data set. In this way, more than one background removal can be done with a single autobk.inp file.

===Section 3.4 Data Input and Output Keywords=============================

Title
    User-chosen title line which will be written to the output files. This must be on its own line.

Formin
    File format to use for the input data files. The choices are UWXAFS and ASCII. See Chapter 2 and the uwxafs3.0 document on data files for more details. The default is for autobk to find the input format itself, from the input data. This does not need to be on its own line.

Formout
    File format to use for the output data files. The choices are the same as for Formin, and the default is to use the same format as the input. This does not need to be on its own line.

Format
    Sets both Formin and Formout.

Data
    Name of input data file containing mu(E). For UWXAFS format files, this file must have file type `XMU', and either the nkey or skey must also be given, so that the syntax must be something like: Data = cu.xmu, 1 or Data = cu.xmu, TROUT. For ASCII input data format, only the input file name is needed. This should be on its own line. See Chapter 2 for more details.

Theory
    Name of the input data file containing chi(k) for the standard. The naming conventions will be the same as for the mu(E) data file above. If the UWXAFS format is used, this must have file type `CHI'. This should be on its own line. See Chapter 5 for more on the use of this standard chi(k).

Fixamp
    Logical flag to prevent the amplitude of the standard chi(k) from being rescaled automatically in the fit. The default is false, so that the amplitude of the standard will be adjusted so as to match the first shell chi(R) of the data between R_bkg and R_1st.

Fixe0
    Logical flag to prevent E_0 from being varied in the fit. The default is false, so that E_0 of the data chi(k) will be varied.

Out
    Prefix for the output file name. See Chapter 2 for more details, and an explanation of the file name suffixes. This does not need to be on its own line.

Bkgout
    Logical flag for writing output data for mu_0(E). This data will be written at exactly the same energy points as the input mu(E) data. The default is true.

Bkgksp
    Logical flag for writing out mu_0(k). The default is false.

Theksp
    Logical flag for writing out chi(k) of the standard. The default is false.

Thersp
    Logical flag for writing out chi(R) of the standard. The default is false.

Datrsp
    Logical flag for writing out chi(R) of the data. The default is false.

Allout
    Logical flag for writing all of the above outputs. The default is false.

===Section 3.5 Pre-Edge, E_0, and Normalization Keywords==================

E0
    Edge Energy in eV. The default value for the starting E_0 is set near the point of maximum derivative on the absorption edge. If a standard chi(k) is used, this value will be adjusted unless the flag fixe0 is set to true.

Pre1
    The low energy limit of the pre-edge range, over which a line is fit to help determine the normalization constant, as discussed in Chapter 4. The value is relative to E_0, and the default is -200.0 eV.

Pre2
    The high energy limit of the pre-edge range, over which a line is fit to help determine the normalization constant, as discussed in Chapter 4. The value is relative to E_0, and the default is -50.0 eV.

Nor1
    The low energy limit of the post-edge range, over which a quadratic polynomial is fit to mu_0(E) to determine the normalization constant, as discussed in Chapter 4. The value is relative to E_0, and the default is 100.0 eV.

Nor2
    The high energy limit of the post-edge range, over which a quadratic polynomial is fit to mu_0(E) to determine the normalization constant, as discussed in Chapter 4. The value is relative to E_0, and the default is 300.0 eV.

Step
    The value of the edge step normalization constant, Delta_mu_0(E_0) in Eq. (1.1). Specifying this value will override the default normalization constant found as discussed in Chapter 4.

===Section 3.6 Fourier Transform Parameters and Fitting Ranges============

For further information on the meaning and effect of the Fourier Transform Parameters, see Chapter 6 of the feffit document.

Rbkg
    R_bkg, the maximum R over which the background function mu_0(E) is fit. Note that this value is not corrected for any phase-shifts, and so corresponds to chi(R), not the absolute interatomic distance. The default is 1.0 Angstroms.

R1st
    R_1st, the maximum R of the first shell to use in making the amplitude of the standard and data chi(R) equal. As for Rbkg, this is not corrected for any phase shifts. The default is Rbkg + 2.0 Angstroms.

Emin
    Low-E value (relative to E_0) of the region over which the background function mu_0(E) is fitted. This value will correspond exactly to Kmin below, and only one needs to be specified. The default is 0. eV.

Emax
    High-E value (relative to E_0) of the region over which the background function mu_0(E) is fitted. This value will correspond exactly to Kmax below, and only one needs to be specified. The default is the last data point.

Kmin
    Low-k value of the region over which the background function mu_0(E) is fitted. This value will correspond exactly to Emin above, and only one needs to be given. The default is 0. Ang^(-1).

Kmax
    High-k value of the region over which the background function mu_0(E) is fitted. This value will correspond exactly to Emax above, and only one needs to be given. The default is the k of the last data point.

Kweight
    k-weighting for the Fourier Transform. The default is 1.

Dk1
    Low-k Fourier Transform Window Parameter (window ``sill'') for the Fourier Transform. The default is 0.0.

Dk2
    High-k Fourier Transform Window Parameter (window ``sill'') for the Fourier Transform. The default is 0.0.

Dk
    Sets both Dk1 and Dk2 to the same value. The default is 0.0.

Iwindo
    Integer index to specify which of the possible Window Types to use for the Fourier Transform. The default is 0, indicating Hanning Windows. See Chapter 6 of the feffit document for details of XAFS Fourier transforms.

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Chapter 4 E_0, Pre-Edge, and Normalization

In addition to fitting the background function mu_0(E), the data must be properly normalized in order to construct the chi(E) of Eq. (1.1). Both the normalization process and the conversion from chi(E) to chi(k) require a good estimate of the threshold energy E_0. Both of these topics will be discussed in this chapter, and the solutions found by autobk will be explained.

The value of E_0 can be specified by the user, or autobk can make a reasonable guess at it. The value autobk finds will rarely be extremely poor, but it can easily be off by a few eV from where you might pick it. By default the value of E_0 will be chosen as an energy point in the edge, near where dmu/dE, the derivative of mu(E), has a maximum. Numerical derivatives are not very trustworthy, and the maximum might land on a glitch in the data, so the value is chosen more cautiously than by taking the bare maximum.

The initial value of E_0 (either entered by the user or found by autobk) will be varied if a standard chi(k) is used for the background removal, unless the logical flag fixe0 is explicitly set to true. The fitting of E_0 isn't very sensitive, because the important part of autobk is to get mu_0(E), which doesn't depend much on E_0. The fit to the standard chi(R) over the first shell will depend slightly on E_0, but this is much less important than the fit to the low-R region. It is rare for E_0 to be adjusted by more than a few eV. The fitted value of E_0 probably shouldn't be trusted very much, and E_0 should be more carefully and accurately determined in the analysis of chi(k).

The normalization in autobk is done by a single constant number, Delta_mu_0(E_0). Because the energy-dependences of x-ray detectors are usually comparable in size to the energy-dependence of mu_0(E), there is little point in normalizing by an energy-dependent background. In any event, the primary energy-dependences of the detectors and mu_0(E) are not difficult to estimate (as with the uwxafs3.0 program atoms), so that these corrections can be put into the analysis later. See the example of pure Cu in the feffit document for how this can be done.

The constant value of Delta_mu_0(E_0) can be set in autobk using the keyword step.
If it is not given, this constant will be found by taking the difference in the extrapolation of smooth functional fits to the pre-edge total absorption mu(E) and post-edge background absorption mu_0(E) (after it is found, of course) at the threshold energy, E_0, so that

    Delta_mu_0(E_0) = mu^+_0(E_0) - mu^-(E_0).                     (4.1)

The measured absorption below the edge step (the so-called pre-edge region) is fit to a straight line over the energy region [E_0 + E_pre1, E_0 + E_pre2]. Both E_pre1 and E_pre2 can be set by the user with keywords pre1 and pre2, and have default values of -200 and -50 eV, respectively. These two numbers are relative to E_0, so they should be negative numbers that are in the measured pre-edge region of the data. This fitted line is then extrapolated to E_0, giving mu^-(E_0). The values of the slope and intercept of this pre-edge line will be written to autobk.log.

The background function mu_0(E) (found as discussed in Chapter 5) is fit to a quadratic polynomial in E over the energy region [E_0 + E_nor1, E_0 + E_nor2]. Both E_nor1 and E_nor2 can be set by the user with keywords nor1 and nor2, and have default values of 100 and 300 eV, respectively. These two numbers are relative to E_0, and should be positive numbers that are in the measured region of the data. This fitted polynomial is then extrapolated to E_0, giving mu^+_0(E_0).

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Chapter 5 Post-Edge Background Function

The mu_0(E) term in Eq. (1.1) represents the absorption due to the deep core level of an isolated absorbing atom in the solid. This so-called embedded atom absorption will differ from the absorption for a truly isolated atom because of the overlap of electron orbitals of neighboring atoms. It should still be a smoothly varying function of energy, so that it can be well-separated from the oscillatory part of the total absorption, but its actual form is poorly known.
Current state-of-the-art calculations of mu_0(E) can get only qualitative agreement with experiments, and their use in analysis is not yet reliable. The reluctantly accepted practice in XAFS analysis is to use as the background function mu_0(E) ``some reasonably smooth function which, in some manner, approximates the original mu(E)''. All background removal techniques use this approach, and the differences between techniques lie in how this rather qualitative criterion is interpreted. In this chapter I'll explain how autobk interprets the background criterion and why we think it's a good interpretation.

The main principle of autobk is to apply the ideas of information theory to the separation of the background and XAFS. This allows the qualitative background criteria to be made quantitative and codified. The bearing of information theory on XAFS will be discussed in section 5.2.

===Section 5.1 Overview===================================================

Autobk uses a piecewise polynomial, or spline, to approximate mu_0(E). The spline is chosen to optimize the R-components of chi(R), the Fourier Transform of chi(k), below R_bkg. The stiffness of the spline is controlled by the number of knots, the points at which the different polynomial pieces meet, and where there can be a discontinuity in some high order derivative. The number of knots in the background spline is chosen to be the number of independent points in the low-R range of chi(R), R = [0.0, R_bkg]. This number is

    N_bkg = 1 + 2 * Delta_k * R_bkg / pi,                          (5.1)

where R_bkg is an estimate of the low-R edge of the first peak in the resulting chi(R), and Delta_k is the k-range of the data. N_bkg is the number of degrees of freedom in the data below R_bkg.
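A worked sketch of Eq. (5.1) may help fix the scale of N_bkg. The code below is an illustration only, with several assumptions that are not taken from autobk itself: the function names are hypothetical, the constant 2m/hbar^2 is taken as 0.2624682 in units of 1/(eV Angstrom^2), N_bkg is truncated to an integer, and the equally spaced knots are assumed to include both ends of the k-range.

```python
import math

# Assumption: 2m/hbar^2 in 1/(eV * Angstrom^2), so that k^2 = ETOK * (E - E_0).
ETOK = 0.2624682


def energy_to_k(e_rel):
    """Photo-electron k (1/Angstrom) for an energy e_rel (eV) above E_0."""
    return math.sqrt(ETOK * e_rel)


def n_bkg(kmin, kmax, rbkg):
    """Eq. (5.1): number of independent points below R_bkg."""
    return 1.0 + 2.0 * (kmax - kmin) * rbkg / math.pi


def knots_in_k(kmin, kmax, rbkg):
    """Equally k-spaced knot positions (assumes truncation and both endpoints)."""
    n = int(n_bkg(kmin, kmax, rbkg))
    if n < 2:
        raise ValueError("k-range or R_bkg too small for a spline")
    return [kmin + (kmax - kmin) * i / (n - 1) for i in range(n)]
```

With k running from 0 to 15 inverse Angstroms and the default R_bkg of 1.0 Angstroms, n_bkg gives about 10.5, so this sketch would place 10 knots.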
From the ideas of information theory, the knots of the spline are required to be equally spaced in k, which will minimize the spectral leakage of the background into the region above R_bkg. Autobk uses fourth order splines (i.e., cubic splines) to ensure that no more than one full oscillation of the spline can occur between knots. This means that the highest measurable R value (the so-called Nyquist critical frequency) is R_bkg, and that all components of the background above R_bkg come from spectral leakage due to the finite k-range.

===Section 5.2 Information theory and XAFS================================

There is a fundamental result of information theory that limits the number of frequencies that can be distinguished in a signal with finite time duration. This is essentially a restatement of the standard uncertainty relation for Fourier conjugate components. For signal analysis, this result (often attributed to Shannon) can be stated quantitatively: for a signal with time duration Delta_t, two frequencies cannot be distinguished if they differ by less than Delta_omega = pi / Delta_t. Since only a finite frequency range (or bandwidth) can be used for any real signal, there is a finite limit on the number of distinguishable frequencies measurable in a signal. In this sense there is an upper limit on the amount of information that can be transmitted in a signal through its different frequencies. This limit is just the maximum number of independently detectable frequencies in the signal, given by

    N =~ Delta_t * Delta_omega / pi,                               (5.2)

where Delta_t is the time duration of the signal and Delta_omega is the measurable frequency range.

We can eliminate the approximate nature of Eq. (5.2) by looking at the Fourier series upon which the sampling theory is based. The interpretation from this approach is that the information is taken as the values of the Fourier coefficients, which come in pairs, spaced at intervals in omega of pi / (2 Delta_t).
This gives the amount of information a clear interpretation and immediately leads to

    N = 1 + ( Delta_t * Delta_omega / pi ).                        (5.3)

The 1 represents the constant term in the Fourier series expansion, and corresponds to there being one Fourier coefficient at omega = 0 and pairs of coefficients at all non-zero multiples of pi / (2 Delta_t).

For XAFS, the conjugate variables are k and 2R: we sample chi(k) between k_min and k_max, and want to get structural information between R_min and R_max. The amount of information we can get out of an XAFS measurement is therefore given by Eq. (5.3) as

    N = 2 * (R_max - R_min) * (k_max - k_min) / pi + 1.            (5.4)

This is valid for the whole R-range, even the low-R region where there is no structural information! Since XAFS analysis is intended to give structural information, and since the low-R region has none, the interpretation in autobk is that all the information below the first peak in chi(R) can be used to determine the background. We then have a clear definition of how much information can be used in getting the background, given by Eq. (5.1). R_bkg is the only term in this definition which depends at all on the physical details of the atomic distribution of the system, and is easily interpreted as the low-R cut-off below which the data will not be analyzed for its structural content. Finally, since we're interpreting the background information as coefficients in a Fourier series, we know the information must be equally spaced in both k- and R-space.

===Section 5.3 Using Splines to Approximate mu_0(E)=======================

Most XAFS background routines use stiff splines to estimate mu_0(E). A spline is a piecewise polynomial, a function made up of several contiguous polynomial sections. Using cubic polynomial pieces (so-called cubic splines) is very common. The places where two polynomial pieces meet are called ``knots''.
At these knots, the function must be continuous in its value, but some of its derivatives might have discontinuities. Usually only its highest non-trivial derivative is discontinuous, so that one degree of freedom is associated with each knot. Splines are commonly used to approximate functions that are expected to be fairly smooth, but whose actual form is not completely known. They are especially easy to use because they can be made arbitrarily flexible and are very easy to calculate in terms of a small number of degrees of freedom (typically, one for each knot). Endpoints of the spline need to be dealt with as special cases.

Autobk uses something very similar to standard cubic splines (it actually uses b-splines of fourth order, but the distinction is not important for the discussion here). One free coefficient is associated with each of the N_bkg knots of the spline. The value of this free coefficient is optimized as discussed in the next section. Good initial values for the free coefficients turn out to be easy to get by guessing that the spline goes through the mu(E) values at the knot locations. Aside from giving the initial guesses of the spline coefficients, the mu(E) data is not explicitly used for evaluating the background function. The spline is not forced in any way to go through any mu(E) points, including either of the endpoints.

===Section 5.4 The Optimization of mu_0(E)================================

The background function mu_0(E) is chosen in autobk to be a spline with N_bkg free coefficients, where N_bkg, given by Eq. (5.1), depends only on R_bkg and the k-range of the data. These free coefficients are the ordinate values for the spline at each of its knots (which are evenly k-spaced), and completely specify the spline. In this sense mu_0(E) depends only on the N_bkg free spline coefficients, which I'll denote as the vector y (with N_bkg components), so that mu_0(E) = mu_0(E, y).
The criterion for choosing the best mu_0(E) is then reduced to finding the N_bkg components of y which will minimize the non-structural components of the resulting chi(R) below R_bkg. This expression of the ``smoothness'' argument for mu_0(E) can easily be solved using a least-squares minimization algorithm. The function to minimize in the least-squares sense in autobk is a function of R, but really depends only on the N_bkg spline coefficients, and is

    f(R, y) = FT[ ( mu(k) - mu_0(k, y) ) / Delta_mu_0(E_0) - chi_standard(k) ],   R < R_bkg,   (5.5)

where FT represents the Fourier transform. More precisely, this is the XAFS Fourier transform as described in Chapter 6 of the feffit documentation, which gives further details and the effect of the Fourier parameters on this Fourier transform. See Appendix A for suggested values for the Fourier transform parameters to use in background removal.

The non-linear least-squares minimization of f(R, y) is done using the Levenberg-Marquardt algorithm. f(R, y) is complex, and both its real and imaginary components are minimized. Since f(R, y) is minimized only over the region below R_bkg, there is no danger of mistakenly removing ``real'' data above R_bkg, even though N_bkg may seem like a fairly large number of knots. In this way, all of the components of mu(E) below R_bkg are used to give mu_0(E), and none of the components above R_bkg are used.

To ensure that only the non-structural components of mu(E) are removed in the background removal, the expected spectral leakage from the first shell XAFS from a standard chi(k) should be included in the determination of mu_0(E), as is indicated by the term chi_standard(k) in Eq. (5.5). If no standard chi(k) is given, chi_standard(k) will be set to zero, and the minimization of f(R, y) in Eq. (5.5) will be equivalent to minimizing just the low-R components of mu(E).
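As a rough illustration of Eq. (5.5), the following Python sketch evaluates the real and imaginary parts of f(R, y) on a grid of R values below R_bkg. It is not the autobk source: the function names, the plain rectangle-rule transform with no window function, and the R-grid spacing dr are all assumptions made for this example. A least-squares routine would adjust the spline coefficients behind mu_0 to make the returned list as small as possible.

```python
import cmath


def xafs_ft(k, chi, r, kweight=1):
    """Rectangle-rule estimate of the integral of
    k**kweight * chi(k) * exp(2ikR) dk at a single R.

    Assumes k is a uniform grid and applies no window function."""
    dk = k[1] - k[0]
    total = 0j
    for ki, ci in zip(k, chi):
        total += (ki ** kweight) * ci * cmath.exp(2j * ki * r)
    return total * dk


def low_r_residuals(k, mu, mu0, edge_step, chi_std, rbkg, dr=0.05, kweight=1):
    """Real and imaginary parts of f(R, y) for R = 0, dr, 2*dr, ... below rbkg.

    mu, mu0 and chi_std are lists sampled on the same k grid;
    pass a list of zeros for chi_std if there is no standard."""
    # Quantity inside the Fourier transform of Eq. (5.5).
    diff = [(m - b) / edge_step - s for m, b, s in zip(mu, mu0, chi_std)]
    resid = []
    for j in range(int(round(rbkg / dr))):
        f = xafs_ft(k, diff, j * dr, kweight)
        resid.extend([f.real, f.imag])  # both components enter the fit
    return resid
```

If mu(k) is exactly mu_0(k) plus the edge step times the standard chi(k), every returned value is zero to within rounding error, which is the condition the fit tries to approach.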
Since only the spectral leakage (due to the finite k-range of the data) in the low-R region will be used, this standard chi(k) needs to be only a rough estimate of the first shell XAFS. Usually either a calculation from feff of the expected first shell XAFS spectra or an experimentally measured standard chi(k) (one for which the background removal is trusted) can be used as the standard chi(k). If you're planning to compare different chi(k) data in later analysis, we recommend using a single standard chi(k) (either from feff or experiment) for all background removals. Our experience is that the resulting background is not extremely sensitive to the details of the standard chi(k), though it is important to have the right backscattering atom at roughly the right distance.

If a standard chi(k) is used in the optimization of mu_0(E), the value of E_0 and the amplitude of the standard chi(k) can be altered in the fit. The fit of E_0 is not very accurate because there isn't much information in the low-R region that depends on E_0, so E_0 rarely gets moved by more than 5 eV. The fitting of E_0 can be turned off by saying "fixe0 = true" in autobk.inp.

The amplitude of the standard chi(k) will normally be scaled by a constant factor so that the first shell of the standard chi(R) is the same size as that of the data chi(R). This is done to make the leakage into the low-R region roughly the right size. The scale factor is chosen to make the amplitudes of the standard and data chi(R) equal in size over the first shell region, R = [R_bkg + pi/Delta_k, R_1st]. Note that this region begins at the pair of independent points after the background R-region, so as to prevent significant correlation of background parameters with this scale factor. R_1st can be selected using the keyword R_1st in autobk.inp, and has a default value of R_bkg + 2.0 Angstroms. This adjustment of the amplitude of the standard chi(k) can be turned off by putting "fixamp = true" in autobk.inp.
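The bookkeeping in this chapter can be checked with a few lines of arithmetic. The Python sketch below evaluates Eq. (5.4) and the lower edge of the first shell region, R_bkg + pi/Delta_k. The function names are made up for this example, and using Eq. (5.4) with R_min = 0 and R_max = R_bkg to count the background parameters is an assumption about the form of Eq. (5.1). How the program turns the result into a whole number of knots is not covered here.

```python
import math


def n_independent(r_min, r_max, k_min, k_max):
    """Number of independent points from Eq. (5.4); R in Angstroms, k in inverse Angstroms."""
    return 2.0 * (r_max - r_min) * (k_max - k_min) / math.pi + 1.0


def first_shell_start(rbkg, k_min, k_max):
    """Lower edge of the first shell region, R_bkg + pi/Delta_k."""
    return rbkg + math.pi / (k_max - k_min)


# Illustrative numbers only: data from k = 0 to 15 with R_bkg = 1.0.
rbkg, k_min, k_max = 1.0, 0.0, 15.0
print("points below R_bkg :", n_independent(0.0, rbkg, k_min, k_max))
print("first shell starts :", first_shell_start(rbkg, k_min, k_max))
print("default R_1st      :", rbkg + 2.0)
```

For these numbers the background region holds about 10.5 independent points, and the amplitude scaling of the standard would be done between roughly 1.21 and 3.0 Angstroms.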
===Section 5.5 References=================================================

Most of the topics in this chapter are further discussed in

>> W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in FORTRAN. Cambridge University Press, Cambridge, 2nd edition, 1992.

The following books offer more on specialized topics.

Information Theory:

>> L. Brillouin. Science and Information Theory. Academic Press, New York, 1962.

Nonlinear Least-Squares Fitting:

>> P. R. Bevington. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, New York, 1969.

Splines (the routines used in autobk are from this book):

>> C. deBoor. A Practical Guide to Splines. Springer-Verlag, New York, 1978.

The physics literature also has a few items of interest.

Autobk (the preferred reference to the algorithms used in this program):

>> M. Newville, P. Livins, Y. Yacoby, J. J. Rehr, and E. A. Stern. Near-edge x-ray-absorption fine structure of Pb: A comparison of theory and experiment. Phys. Rev. B 47(21):14126--14131, 1993.

Some other background-removal algorithms:

>> G. Li, F. Bridges, and G. S. Brown. Multielectron x-ray photoexcitation observations in x-ray-absorption fine-structure background. Phys. Rev. Lett. 68(10):1609--1612, 1992.

>> E. A. Stern, P. Livins, and Z. Zhang. Thermal vibration and melting from a local perspective. Phys. Rev. B 43:8850--8860, 1991.

>> J. W. Cook Jr. and D. E. Sayers. Criteria for automatic x-ray absorption fine structure background removal. J. Appl. Phys. 52(8):5024--5031, 1981.

Attempts to calculate mu_0(E) from first principles:

>> J. J. Rehr, C. H. Booth, F. Bridges, and S. I. Zabinsky. X-ray-absorption fine structure in embedded atoms. Phys. Rev. B 49(17):12347--12350, 1994.

Information Theory and XAFS:

>> E. A. Stern. Number of relevant independent points in x-ray-absorption fine-structure spectra. Phys. Rev. B 48(13):9825--9827, 1993.
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Appendix A Examples

All the files mentioned in this chapter should have been included in your distribution of the uwxafs3.0 programs. If you don't have these files, contact us and we'll get them to you. All files and examples use the ASCII file type and have been renamed so that programs in the uwxafs3.0 distribution cannot easily overwrite them.

The distributed files include atoms-cu.inp which, if copied to atoms.inp, can be run through atoms. This will write a feff.inp, which can be run through feff. The cu-feff.chi file used in the example below should be exactly the same as chi.dat generated by feff. Running atoms and feff and examining the outputs for this simple example is probably a worthwhile exercise.

===Section A.1 Pure Cu example============================================

Shown below is auto1.inp. This file will need to be copied to autobk.inp to run the program. This is about as simple as autobk gets.

%------------------% autobk.inp %-------------------%
  title = Cu 10K, with standard from feff
  data = cu10k.dat            % xmu data
  output = cu.dat
  standard = cu-feff.chi      % = chi.dat from feff
%------------------% end of autobk.inp %-------------------%

Running this example will write autobk.log, which contains a brief summary of what happened during the running of the program, including the values of the k and R ranges used, and the value of E_0 used. Output data files will be cuk.chi, containing the chi(k) data, and cue.bkg, containing the mu_0(E) data on the same grid as the input data in cu10k.dat.

===Section A.2 Further Examples===========================================

The above example is about all there is to running autobk. You may want to add a few more of the parameters listed in Chapter 3, but you really shouldn't need too many of them besides rbkg and E0. In any case, here are some more examples, showing how most of the important keywords are used.
This should give you an idea of what kinds of inputs are needed for autobk. The next section will have some more concrete suggestions. But go ahead and play with the Cu data until you have a reasonably good feel for how the program parameters affect the background removal, and decide what you like best.

Actually, this one autobk.inp (auto2.inp in the uwxafs3.0 distribution) has 3 different background removals, which is a convenient way to test changing some parameter, and is also useful for processing many scans at once.

%------------------% autobk.inp %-------------------%
  title = Cu 10K, no standard, e0 set to 8980., rbkg =1.5
  data = cu10k.dat            % xmu data
  output = ab-1.dat
  e0 = 8980.0
  rbkg = 1.5
  --------------------
  title = Cu 50K, w/ standard
  data = cu50k.dat
  output = ab-2.dat
  e0 = 8980.0
  fixe0 = true                % don't fit e0
  standard = cu-feff.chi
  fixamp = true               % fix amp of standard
  rbkg = 1.5
  kweight = 0
  --------------------
  title = Cu 50K, no standard, rbkg =1.5, kw = 1
  data = cu50k.dat            % xmu data
  output = ab-3.dat
  rbkg = 1.5
  kweight = 1
%------------------% end of autobk.inp %-------------------%

===Section A.3 Suggestions================================================

Except for E_0 and R_bkg, I strongly suggest using the program defaults for the numerical parameters of autobk. Most importantly, do not use the same Fourier transform parameters that you would use for analysis (say, with feffit). The hard part of background removal is the low-k (or near-edge) part of the spectrum, so you want to emphasize this region. So use a small k-weighting, like 1 (the default) or maybe even 0. Also, turn the Fourier window ``sills'' off, so that dk = 0. Otherwise the endpoints of the k-range will get no weight in the fit and mu_0(E) will be unstable near the endpoints. Also, at least try to start at low k, such as k_min = 0.00. If there is a strong white line you'll probably need to move k_min just above the white line. The autobk background won't be able to follow most white lines.
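To choose a k_min just above a white line, it helps to convert the energy where the white line ends into k, using k = sqrt{ 2*m*(E-E_0)/hbar^2 } from Chapter 1. The Python sketch below does that conversion. The function name is made up for this example, the constant is 2m/hbar^2 for the electron expressed in eV and Angstroms, and the 20 eV white-line width is an illustrative number, not a property of the Cu data above.

```python
import math

# 2*m/hbar^2 for the electron, in 1/(eV * Angstrom^2)
ETOK = 0.2624683


def k_from_energy(energy, e0):
    """Photo-electron wavenumber (inverse Angstroms) for an energy in eV above e0."""
    de = energy - e0
    if de < 0:
        raise ValueError("energy is below the edge, k is not defined")
    return math.sqrt(ETOK * de)


# Hypothetical case: the white line has died away 20 eV above E_0 = 8980 eV.
print("k_min candidate:", k_from_energy(9000.0, 8980.0))
```

For this case the white line ends near k = 2.3 inverse Angstroms, so a k_min slightly above that would be a reasonable first try.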
The value of R_bkg is the most important parameter, and the hardest to pick. Start with 1.0 Angstroms or half the near-neighbor distance. You may then want to adjust it to a value where chi(R) is a small fraction of the maximum of the chi(R) for the first peak. Remember that the region you use for background-removal cannot be used in analysis. In fact the analysis of the first shell should begin on the next independent data point after R_bkg, namely R_bkg + pi/Delta_k, where Delta_k is the k-range of the data. Don't go too far into the first shell or you won't have any data to analyze!

Suggested values for a few important parameters in autobk:

  keyword    Suggested values     notes
  ----------------------------------------------------------------
  Kmin       0.00 - 0.20          unless there is a ``white line''
  Kweight    1, or even 0         3 is probably too big
  Dk         0                    not bigger than 1.
  Rbkg       1.0 or half R_nn     keep |chi(R_bkg)| << max{|chi(R)|}
  ----------------------------------------------------------------

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

Appendix B Program Notes

This appendix is intended for those who want or need to deal with the source code of autobk, probably to change some of it because it doesn't work on their machine, or because they've thought of some way to make the code better fit their needs. If you are setting out to change the source code, feel free to contact me.

===Section B.1 Code Portability and Code Compilation======================

The 1977 ANSI Standard for FORTRAN has been followed closely, so that autobk should easily compile on any machine and run without any problems. The only significant departures from FORTRAN 77 are the assumption of the ASCII character set and the use of INTEGER*2 variables for the UWXAFS binary file handling routines. There are, unfortunately, aspects of FORTRAN which are machine- and compiler-dependent by design.
One such aspect occurs in autobk in the form of a compiler-dependent dimension for the ``word-length'' of the data in the UWXAFS binary files. The code cannot easily be made truly standard without significant changes to the UWXAFS binary file handling routines. The distributed code will, however, work on most machines, with the notable exception of a Vax. Changing the first executable statement of autobk from "vaxflg = .false." to "vaxflg = .true." will make the code work on a Vax.

The UWXAFS binary file handling routines also use character strings which are 2048 characters long. Though standard, some compilers need to be told to accept character strings this long. The notable example of such a compiler is xlf (for AIX, IBM's Unix flavor), which needs the compiler switch ``-qcharlen=2048''.

While compiling on any machine, we recommend including some form of array bounds checking. And if you have any problems with the compilation, it may be worthwhile to turn off compiler optimization flags.

There may be some persistent, benign compiler warnings when you compile autobk. There may be an ``inconsistent variable type'' warning in the routines from fftpack (routines with names like passf3 and cffti). There may also be ``comparison is always false'' warnings when using f2c. These can both be safely ignored.

===Section B.2 Adding More Data Types to AUTOBK===========================

If the two data file formats (UWXAFS, ASCII) are not acceptable or convenient to your needs (that is, if you prefer using some other format), other choices could be added with a minimal amount of coding. The input and output of data files is fairly well-isolated, with subroutines inpdat and outdat controlling which data format to use. If you'd like another file format either contact us about it or follow the example of the routines inpcol and outcol, which read and write files in the ASCII column data format.

The AUTOBK document is finished. Have a nice day.