[Feature Request] Support Regular Expressions in File Path Matching #372

@yidigo

Description

Currently, file query functions such as file() support only basic glob patterns like * for file path matching. This works well for selecting every file with a given extension, but it cannot select files by more complex or non-contiguous patterns.

For example, if I have a directory with daily data files named in a YYYYMMDD.parquet format, I might want to query data for specific days of the month (e.g., the 1st, 2nd, and 21st). With the current glob-only matching, my only option is to match all files and then filter them, which is inefficient.

-- This would select ALL parquet files in the directory
SELECT * FROM file('/home/data/*.parquet', Parquet);

An ideal syntax would look something like this:

-- This query should match and process only the specified files:
-- /home/data/20250901.parquet
-- /home/data/20250902.parquet
-- /home/data/20250921.parquet
SELECT * FROM file('/home/data/202509(01|02|21).parquet', Parquet);
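To make the intended semantics concrete, here is a minimal Python sketch of the matching I have in mind: the pattern is treated as a regular expression and must match the whole path. The helper name `match_paths` and the candidate list are hypothetical and only illustrate the behavior; this is not how file() is implemented. Note that in a real regular expression the `.` before `parquet` would need to be escaped to match a literal dot, so the sketch uses `\.`.

```python
import re


def match_paths(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidate paths that the regex matches in full, sorted."""
    compiled = re.compile(pattern)
    # fullmatch requires the entire path to match, not just a prefix or substring.
    return sorted(p for p in candidates if compiled.fullmatch(p))


# Hypothetical directory listing, used only for illustration.
candidates = [
    "/home/data/20250901.parquet",
    "/home/data/20250902.parquet",
    "/home/data/20250903.parquet",
    "/home/data/20250921.parquet",
]

# Regex form of the requested pattern, with the literal dot escaped.
selected = match_paths(r"/home/data/202509(01|02|21)\.parquet", candidates)
for path in selected:
    print(path)
```

With this listing, only the files for the 1st, 2nd, and 21st are selected, and 20250903.parquet is never opened.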

Thank you for considering this enhancement.
