This article explains how to compile data from different sources - spectrometer, lab analysis spreadsheet, time information - for the subsequent creation of analysis models. It is intended for users of PEAXACT - Software for Quantitative Spectroscopy from S-PACT.
You will end up with a clean data set that is ready for modeling.
Creating and training a chemometric model requires the raw spectra, but also involves associated properties of the measured material:
- timestamps
- lab analysis values (reference data)
- categorical labels
- metadata
Typically, these data originate from various sources, such as spectrum files, spreadsheets, or lab/process information management systems, and need to be linked to the spectra. Only then can the data set reasonably be used for calibration, classification, and analysis purposes. This article describes the typical steps for creating such a data set.
Loading Spectra
PEAXACT can load spectra from single-spectrum files or from files containing multiple spectra, such as time-resolved measurements. Consider using a flat folder structure to organize your files (e.g. one folder per experiment or per date), and take a look at our recommendations on how to handle spectra from multiple levels of subfolders.
To load spectra into PEAXACT, drag individual files or whole folders from Windows Explorer and drop them into the main window's Samples Panel. The panel is then filled with a list of recognized spectra, each named after its file plus an identifier (#1, #2, ...), which is useful for multi-spectrum files.
To inspect and visualize the data set as a whole, and to add further information to the spectra, use the PEAXACT Data Inspector.
Adding Timestamps
Most spectrum file formats carry additional information about the parameters of the measurement. If you have a choice, consider saving your spectra in a format that supports such metadata. The acquisition timestamp is of particular interest, because it is the typical key for linking sample information from other sources, e.g. results from offline analysis. Use the Load Timestamp button in the Data Inspector to automatically pull these timestamps into PEAXACT.
If you are interested in relative times rather than absolute timestamps, e.g. when analyzing the time evolution of a batch process, use Timestamp > Time conversion. It lets you select the spectrum that marks time = 0 and the unit (seconds, minutes, hours, ...) of the relative time.
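The arithmetic behind such a conversion is simple and can be sketched outside PEAXACT in a few lines of Python. This is only an illustration of the idea, not PEAXACT's implementation: the function name `relative_times`, its parameters, and the example timestamps are all hypothetical.

```python
from datetime import datetime

def relative_times(timestamps, zero_index=0, unit="minutes"):
    """Return elapsed times relative to timestamps[zero_index] in the given unit."""
    seconds_per_unit = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0}
    t0 = timestamps[zero_index]
    return [(t - t0).total_seconds() / seconds_per_unit[unit] for t in timestamps]

# Hypothetical acquisition timestamps of three spectra from one batch
stamps = [
    datetime(2024, 1, 15, 9, 0, 0),
    datetime(2024, 1, 15, 9, 5, 0),
    datetime(2024, 1, 15, 9, 12, 30),
]

elapsed = relative_times(stamps, zero_index=0, unit="minutes")
print(elapsed)  # [0.0, 5.0, 12.5]
```

Spectra acquired before the chosen zero point simply get negative relative times in this sketch.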
Adding Labels
Samples are often labelled or grouped, e.g. by a batch ID, material supplier name, sample code or the like. It is always useful to combine these values with the spectra, in order to elucidate the specific differences between groups, or to later train a Classification Model.
If the folder path or the spectrum filename can already be used for grouping, great: just use a filter expression in the Data Inspector to filter the table. If you want to use a separate text label, you first need to add a new categorical feature to the table and insert some values. In PEAXACT, a feature is recognized as categorical if its name is enclosed in curly brackets (e.g. {Batch ID}).
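If you prefer to prepare such labels before loading anything, for instance in a spreadsheet you intend to join later, a small script can derive them from the file names. The following is a minimal sketch under stated assumptions: the naming scheme (a batch ID of the form `B` followed by digits), the file paths, and the helper `batch_label` are hypothetical, and the dictionary key `{Batch ID}` merely mirrors the curly-bracket convention described above.

```python
import re
from pathlib import PureWindowsPath

def batch_label(path):
    """Extract a batch ID such as 'B017' from a file name, or None if absent."""
    stem = PureWindowsPath(path).stem  # file name without folder and extension
    match = re.search(r"B\d+", stem)
    return match.group(0) if match else None

# Hypothetical spectrum files; the naming scheme is an assumption
files = [
    r"C:\spectra\2024-01-15\B017_run1.spc",
    r"C:\spectra\2024-01-15\B017_run2.spc",
    r"C:\spectra\2024-01-16\B018_run1.spc",
]

# One table row per file, with the label stored as a categorical feature
table = [{"file": f, "{Batch ID}": batch_label(f)} for f in files]
for row in table:
    print(row["{Batch ID}"], row["file"])
```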
Joining Tables
A crucial step in putting together your data set is adding reference values, which are often stored in external spreadsheets or can be exported from databases into spreadsheets. Rather than copying values over one by one to the related spectra, you can use the PEAXACT Data Inspector to join tables automatically. If your current table in PEAXACT and your reference table have at least one feature in common, that feature can be used to match up the rows of both tables.
For example, you could join tables by matching timestamps. This even works if the acquisition times of spectra and measurement times of reference values do not agree perfectly, because PEAXACT allows you to specify a tolerance and joins the closest match.
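To make the idea of a tolerance-based join concrete, here is a minimal Python sketch of nearest-timestamp matching. It illustrates the general technique only and makes no claim about how PEAXACT implements it; the function `join_nearest`, the column names `time` and `concentration`, and all values are hypothetical.

```python
from datetime import datetime, timedelta

def join_nearest(spectra, references, tolerance):
    """Attach to each spectrum row the reference value closest in time,
    or None if no reference lies within the tolerance."""
    joined = []
    for spec in spectra:
        # Reference row with the smallest absolute time difference
        best = min(references, key=lambda ref: abs(ref["time"] - spec["time"]), default=None)
        row = dict(spec)
        if best is not None and abs(best["time"] - spec["time"]) <= tolerance:
            row["concentration"] = best["concentration"]
        else:
            row["concentration"] = None
        joined.append(row)
    return joined

# Hypothetical spectra and lab results; the times do not agree perfectly
spectra = [
    {"name": "run #1", "time": datetime(2024, 1, 15, 9, 0)},
    {"name": "run #2", "time": datetime(2024, 1, 15, 9, 5)},
    {"name": "run #3", "time": datetime(2024, 1, 15, 9, 30)},
]
references = [
    {"time": datetime(2024, 1, 15, 9, 1), "concentration": 1.2},
    {"time": datetime(2024, 1, 15, 9, 4), "concentration": 1.5},
]

result = join_nearest(spectra, references, tolerance=timedelta(minutes=2))
for row in result:
    print(row["name"], row["concentration"])
```

With a two-minute tolerance, the first two spectra receive the values 1.2 and 1.5, while the third has no reference close enough and stays empty. Note that this sketch lets one reference row match several spectra; whether that is desirable depends on your sampling scheme.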
Marking Bad Samples
To spot problematic samples in your data set, e.g. measurement errors or deficient reference values, a visual inspection is often sufficient, possibly supported by data pretreatments that highlight systematic patterns in the data. You can use the Selection Tool in any plot to select spectra and then set their Quality to Bad. Bad samples will be ignored by future modeling and analysis steps.
Preserving the Data Set
After you have put so much work into gathering all pieces of information in one place, you certainly wouldn't want to lose it. With the PEAXACT Data Inspector you can export the new table to a spreadsheet file and preserve it for the upcoming modeling challenges. Next time, just load the table file and PEAXACT will automatically reload all the spectra and the associated features - without the need to repeat any of the steps above. Nice, isn't it?
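The same principle, one table file that holds the file references together with all associated features, can be illustrated with a plain CSV round trip in Python. This is a sketch only: the column names, values, and helper functions are hypothetical and do not represent the layout of the file that PEAXACT exports.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical table: one row per spectrum with its associated features
rows = [
    {"file": "B017_run1.spc", "time_min": "0.0", "{Batch ID}": "B017", "quality": "Good"},
    {"file": "B017_run2.spc", "time_min": "5.0", "{Batch ID}": "B017", "quality": "Bad"},
]

def save_table(path, rows):
    """Write the rows to a CSV file, using the keys of the first row as header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

def load_table(path):
    """Read the CSV file back into a list of dictionaries (all values as text)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))

table_path = Path(tempfile.mkdtemp()) / "dataset.csv"
save_table(table_path, rows)
restored = load_table(table_path)
print(restored == rows)
```

Because CSV stores everything as text, numeric columns such as `time_min` would need to be converted back to numbers after loading.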