Closed
Description
Currently, file query functions such as file() support basic glob patterns, like *, for matching file paths. This is useful for selecting all files with a certain extension, but it lacks the flexibility to select files based on more complex or non-contiguous patterns.
For example, if I have a directory with daily data files named in a YYYYMMDD.parquet format, I might want to query data for specific days of the month (e.g., the 1st, 2nd, and 21st). With the current glob-only matching, my only option is to match all files and then filter them, which is inefficient.
```sql
-- This would select ALL parquet files in the directory
SELECT * FROM file('/home/data/*.parquet', Parquet);
```
An ideal syntax would look something like this:
```sql
-- This query should match and process only the specified files:
-- /home/data/20250901.parquet
-- /home/data/20250902.parquet
-- /home/data/20250921.parquet
SELECT * FROM file('/home/data/202509(01|02|21).parquet', Parquet);
```
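To make the requested semantics concrete, here is a minimal Python sketch of one possible interpretation. It is not the project's code, and all function names are hypothetical. It assumes each parenthesized group is a finite list of literal alternatives separated by `|`, that `*` keeps its usual glob meaning within a single path segment, and that every other character (including `.`) is literal. Because the alternatives are finite, a pattern without `*` could be expanded into concrete paths up front, like shell brace expansion, instead of listing and filtering the whole directory.

```python
import itertools
import re

# Hypothetical helpers: "(a|b|c)" is a group of literal alternatives.
# Nested groups and escaped "(" or "|" are NOT supported in this sketch.
_GROUP = re.compile(r"\(([^()]*)\)")


def expand_alternatives(pattern):
    """Expand every (a|b|...) group into concrete strings, like shell brace expansion."""
    # With a capturing group, re.split puts literal text at even indices
    # and the contents of each group at odd indices.
    parts = _GROUP.split(pattern)
    choices = [[p] if i % 2 == 0 else p.split("|") for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def pattern_to_regex(pattern):
    """Translate the pattern into a regex: groups become alternations,
    '*' matches within one path segment, everything else is literal."""
    out = []
    for i, part in enumerate(_GROUP.split(pattern)):
        if i % 2 == 1:
            out.append("(?:" + "|".join(re.escape(a) for a in part.split("|")) + ")")
        else:
            out.append("[^/]*".join(re.escape(s) for s in part.split("*")))
    return re.compile("".join(out))


def select(paths, pattern):
    """Keep only the paths that match the whole pattern."""
    regex = pattern_to_regex(pattern)
    return [p for p in paths if regex.fullmatch(p)]


files = [
    "/home/data/20250901.parquet",
    "/home/data/20250902.parquet",
    "/home/data/20250903.parquet",
    "/home/data/20250921.parquet",
]
pattern = "/home/data/202509(01|02|21).parquet"

print(expand_alternatives(pattern))
# ['/home/data/20250901.parquet', '/home/data/20250902.parquet', '/home/data/20250921.parquet']
print(select(files, pattern))
# ['/home/data/20250901.parquet', '/home/data/20250902.parquet', '/home/data/20250921.parquet']
```

Expansion avoids a directory scan entirely when the pattern has no `*`; the regex form covers patterns that mix groups with wildcards. How to escape a literal `(` or `|` in a file name would still need to be decided.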
Thank you for considering this enhancement.