Working with Data from Multiple Subfolders

This article answers a frequently asked question about how to load data files from multiple subfolders into PEAXACT.

It is often said that 80% of a data scientist's work is preparing the data. I personally feel that a lot of this time is spent on structuring the data in folders. In an ideal PEAXACT world, the data of each measurement campaign would sit in one folder with one data file per spectrum. Those files would use a standardized file format for a specific data type (e.g. XY data of a spectrum) and include a timestamp. Back in the real world, there are cases where files are spread across numerous subfolders and may even share identical file names across different data sets.

So what are these cases, and how can we restructure the files to make our lives easier in PEAXACT?

Case 1: Multiple subfolders with differing file names

Well, that is an easy one. Simply drag and drop the entire folder into the PEAXACT window.

Case 2: Multiple subfolders with multiple file formats

If all files of interest share the same file name (Hello NMR folks!), jump to Case 3.

Assuming that you are only interested in one file format, we suggest using the search field in Windows File Explorer. You can filter the files in a selected folder and its subfolders for a specific format (e.g. search for ".spc") and then drag and drop the search results into the PEAXACT window.
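If you prefer a script over the Explorer search, the same filtering can be sketched in Python. This is only a minimal illustration and not part of PEAXACT: the function name find_files_by_extension is hypothetical, and the folder "." and the extension ".spc" are placeholders.

```python
from pathlib import Path


def find_files_by_extension(top_folder, extension):
    """Return all files below top_folder (subfolders included) with the given extension."""
    wanted = extension.lower()
    return sorted(
        path
        for path in Path(top_folder).rglob("*")  # walks all subfolders
        if path.is_file() and path.suffix.lower() == wanted
    )


if __name__ == "__main__":
    # Placeholder values: search the current folder for .spc files.
    for path in find_files_by_extension(".", ".spc"):
        print(path)
```

The script only lists the matching files. You could then copy them to a single folder and drag that folder into the PEAXACT window.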

Case 3: Multiple subfolders with identical file names

The problem here is that samples with identical file names are difficult to distinguish in the PEAXACT window. For real-time applications (e.g. using PEAXACT ProcessLink), identical file names are no problem because we only look for the latest file in a folder and its result. For model development in PEAXACT, however, we prefer to identify files by their name. To do so, we should copy and rename the data files before loading them into PEAXACT. Below, we explain how you can do this with Siren. It is free of charge, distributed under a General Public License, and can be run without prior installation.

  1. Select the top-level folder path which contains your data
  2. Include subdirectories
  3. Filter for a file name of interest, e.g. *\\spectrum.1d
  4. Enter an expression to rename the files
    e.g. %P\\RenamedFiles\\%p4-%p3-%p2 %p1 %f
    In this specific case, the files are assumed to be located four subfolder levels deep. They will be copied to a single folder RenamedFiles (within the top-level directory %P) and renamed after the four subfolders, where %p1 is the folder that directly contains the file and %p4 is the outermost one. E.g., a file from \2020\08\25\1032\spectrum.1d gets copied and renamed to \RenamedFiles\2020-08-25 1032 spectrum.1d
  5. Select all files
  6. Create copies

Finally, you can load the renamed files into PEAXACT via drag and drop.
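For readers who would rather script this step, here is a minimal Python sketch of the same copy-and-rename idea. It is not how Siren works internally. The function name copy_with_folder_names and its parameters are hypothetical, and the naming scheme simply mirrors the example expression above (outer folders joined by "-", then the innermost folder, then the file name).

```python
import shutil
from pathlib import Path


def copy_with_folder_names(top_folder, file_name, levels=4, target_name="RenamedFiles"):
    """Copy every file_name found below top_folder into one folder,
    naming each copy after the `levels` folders that contain it."""
    top = Path(top_folder)
    target = top / target_name
    copied = []
    # sorted() collects all matches before any copy is made
    for source in sorted(top.rglob(file_name)):
        folders = source.relative_to(top).parts[:-1]
        if not source.is_file() or target in source.parents or len(folders) < levels:
            continue  # skip directories, earlier copies, and files that are not deep enough
        folders = folders[-levels:]
        # e.g. ("2020", "08", "25", "1032") -> "2020-08-25 1032 spectrum.1d"
        pieces = ["-".join(folders[:-1]), folders[-1], source.name]
        destination = target / " ".join(piece for piece in pieces if piece)
        if destination.exists():
            raise FileExistsError(f"{destination} already exists")
        target.mkdir(exist_ok=True)
        shutil.copy2(source, destination)  # copy2 also keeps the file's timestamps
        copied.append(destination)
    return copied
```

A call such as copy_with_folder_names(".", "spectrum.1d") would copy \2020\08\25\1032\spectrum.1d to \RenamedFiles\2020-08-25 1032 spectrum.1d, leaving the original files untouched.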

Notes:

  • Siren can be called from the command line to automate the steps above:
    "siren.exe" --dir "." --filter "*\\spectrum.1d" 
                --expression "%P\\RenamedFiles\\%p4-%p3-%p2 %p1 %f" 
                --subdir --select "*.*" --copy --nosaveparam --quit
  • You must use double \\ for path separators.
  • If you want to call Siren from a Windows batch file, you must also double each %.
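To illustrate the last note, the following Python sketch doubles each % in the command line shown above so that the result can be pasted into a batch file as a single line. The helper name escape_for_batch is hypothetical. The command string is taken from this article and is only treated as text here, not executed.

```python
def escape_for_batch(command_line):
    """Double every % so that a Windows batch file passes it on literally."""
    return command_line.replace("%", "%%")


# Raw strings keep the doubled backslashes exactly as written in the article.
siren_command = (
    r'"siren.exe" --dir "." --filter "*\\spectrum.1d" '
    r'--expression "%P\\RenamedFiles\\%p4-%p3-%p2 %p1 %f" '
    r'--subdir --select "*.*" --copy --nosaveparam --quit'
)

if __name__ == "__main__":
    print(escape_for_batch(siren_command))
```

In the printed line, the expression reads %%P\\RenamedFiles\\%%p4-%%p3-%%p2 %%p1 %%f, while the rest of the command is unchanged.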
 
