Releases: StarRocks/starrocks

3.3.15

19 Jun 11:42
0f56e12

Bug Fixes

Fixed the following issues:

  • Missing double quotes for string parameters in statistics INSERT statements. #59713
  • Downgrade failure caused by Rollup tasks. #59735
  • Incorrect function parameters in the result of SHOW CREATE VIEW. #59714
  • A security issue where SQL statements with syntax errors exposed sensitive information in the Audit Log. #59442
  • Error "Query version not found". #59194
  • Failure to change data distribution using the ALTER TABLE statement. #59360
  • An issue where root user processes were still visible when admin protection was enabled. #59435
  • Failure of INSERT OVERWRITE into Hive. #59469
  • Missing Tablet ID in the max_tablet_rowset_num log item. #59467
  • An error caused by misconfigured Persistent Index parameters on a Duplicate table. #56040
  • TaskRun history being archived on FE Follower nodes. #59393
  • External catalog-based materialized view refresh errors. #59369
  • Missing minimum version in Tablet information on shared-data clusters. #59373
  • Abnormal maximum column unique ID in native tables of shared-data clusters due to version compatibility logic errors. #59190
  • Materialized view refresh failure on Iceberg catalogs when the source Iceberg table is dropped and recreated, with manual refresh also failing after the materialized view is set to active. #59287
  • Contamination of parameters in materialized view refresh tasks. #59052
  • Data loss caused by Persistent Index when snapshot loading fails. #59247
  • Issues caused when subcolumns of STRUCT appear in multiple predicates. #59216
  • Query failure after renaming columns. #59178
  • Loading failure due to multiple Stream Load requests. #59181
  • Inability to refresh Hive table-based materialized views at the partition level in Unified Catalog. #59139
  • Incorrect UNION plan causing FE out-of-memory (OOM). #59030
  • Version loss during data loading. #59006
  • Predicate loss when queries are rewritten to synchronous materialized views. #58831
  • Issues with BITMAP/HLL/PERCENTILE data types in window functions. #58776
  • Failure to refresh metadata changes to external tables in Hive Catalog. #54596

Behavior Changes

  • Introduced the FE configuration parameter task_runs_max_history_number to control the number of historical TaskRuns retained in the information_schema.task_runs view, reducing memory usage. #59161
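The retention behavior can be pictured as a bounded first-in, first-out history. The following is a minimal Python sketch, not StarRocks code: TaskRunHistory is a hypothetical class, and it assumes that once task_runs_max_history_number is reached, the oldest TaskRuns are the ones discarded.

```python
from collections import deque


class TaskRunHistory:
    """Hypothetical model of a TaskRun history capped at max_history entries."""

    def __init__(self, max_history: int) -> None:
        # deque(maxlen=...) evicts the oldest entry automatically once full.
        self._runs = deque(maxlen=max_history)

    def record(self, run_id: str) -> None:
        self._runs.append(run_id)

    def visible(self) -> list:
        # What a query on the view would return under this model, oldest first.
        return list(self._runs)
```

For example, with a limit of 3, recording r1 through r5 leaves only r3, r4, and r5 visible.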

3.5.0

16 Jun 07:45
10d7323

Release Date: June 13, 2025

Upgrade Notes

  • JDK 17 or later is required from StarRocks v3.5.0 onwards.
    • To upgrade a cluster from v3.4 or earlier, you must upgrade the JDK version that StarRocks depends on and remove options that are incompatible with JDK 17 from the configuration item JAVA_OPTS in the FE configuration file fe.conf, for example, options related to CMS and GC. The default value of JAVA_OPTS in the v3.5 configuration file is recommended.
    • For clusters using external catalogs, you need to add --add-opens=java.base/java.util=ALL-UNNAMED to the JAVA_OPTS configuration item in the BE configuration file be.conf.
    • In addition, as of v3.5.0, StarRocks no longer provides JVM configurations for specific JDK versions. All versions of JDK use JAVA_OPTS.
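The JAVA_OPTS cleanup described above can be scripted. The following Python sketch is hypothetical (migrate_java_opts and its flag list are not part of StarRocks) and relies on a heuristic, non-exhaustive list of options that are obsolete or removed by JDK 17: the CMS collector was removed in JDK 14 (JEP 363), -XX:+UseParNewGC no longer exists, and several legacy GC logging flags were superseded by unified logging (-Xlog) in JDK 9. The recommended path is still the v3.5 default JAVA_OPTS.

```python
import shlex

# Heuristic, non-exhaustive markers for options that are obsolete or removed
# by JDK 17 (ignored with a warning or rejected at startup).
REMOVED_OPTION_MARKERS = (
    "CMS",                  # e.g. -XX:CMSInitiatingOccupancyFraction=80
    "ConcMarkSweep",        # e.g. -XX:+UseConcMarkSweepGC
    "UseParNewGC",
    "PrintGCDateStamps",    # legacy GC logging, replaced by -Xlog in JDK 9
    "PrintGCTimeStamps",
    "UseGCLogFileRotation",
    "NumberOfGCLogFiles",
    "GCLogFileSize",
)

# Required in be.conf for clusters that use external catalogs.
ADD_OPENS = "--add-opens=java.base/java.util=ALL-UNNAMED"


def migrate_java_opts(java_opts: str, add_external_catalog_opens: bool = False) -> str:
    """Return JAVA_OPTS without the flagged options.

    Set add_external_catalog_opens=True only when migrating be.conf for a
    cluster that uses external catalogs; the flag is added at most once.
    """
    kept = [
        opt
        for opt in shlex.split(java_opts)
        if not any(marker in opt for marker in REMOVED_OPTION_MARKERS)
    ]
    if add_external_catalog_opens and ADD_OPENS not in kept:
        kept.append(ADD_OPENS)
    return " ".join(kept)
```

Matching on substrings keeps the sketch short but is deliberately coarse; any option it drops should be checked by hand.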

Shared-data Enhancement

  • Shared-data clusters support generated columns. #53526
  • Cloud-native Primary Key tables in shared-data clusters support rebuilding specific indexes. The performance of the indexes is also optimized. #53971 #54178
  • Optimized the execution logic of large-scale data loading operations to avoid generating too many small files in Rowsets due to memory limitations. During loading, the system merges temporary data blocks to reduce the number of small files. This improves query performance after loading and reduces subsequent Compaction operations, improving system resource utilization. #53954

Data Lake Analytics

  • [Beta] Supports creating Iceberg views in Iceberg Catalogs with Hive Metastore integration, and supports adding or modifying the dialect of an Iceberg view using the ALTER VIEW statement for better syntax compatibility with external systems. #56120
  • Supports nested namespaces for Iceberg REST Catalog. #58016
  • Supports using IcebergAwsClientFactory to create AWS clients in Iceberg REST Catalog to support vended credentials. #58296
  • Parquet Reader supports filtering data with Bloom Filter. #56445
  • Supports automatically creating global dictionaries for low-cardinality columns in Parquet-formatted Hive/Iceberg tables during queries. #55167

Performance Improvement and Query Optimization

  • Statistics optimization:
    • Supports Table Sample, which improves statistics accuracy and query performance by sampling data blocks in physical files. #52787
    • Supports recording the predicate columns in queries for targeted statistics collection. #53204
    • Supports partition-level cardinality estimation. The system reuses the system-defined view _statistics_.column_statistics to record the NDV of each partition. #51513
    • Supports multi-column Joint NDV collection to optimize the query plan generated by CBO in the scenario where columns correlate with each other. #56481 #56715 #56766 #56836
    • Supports using histograms to estimate Join node cardinality and in_predicate selectivity, improving estimation accuracy when data is skewed. #57874 #57639
    • Optimized Query Feedback. Queries with identical structure but different parameter values are categorized as the same type and share the same tuning guide for plan execution optimization. #58306
  • Supports Runtime Bitset Filter as an optimized alternative to Bloom Filter in specific scenarios. #57157
  • Supports pushing down Join Runtime Filter to the storage layer. #55124
  • Supports Pipeline Event Scheduler. #54259
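Conceptually, a bitset filter replaces the Bloom filter's hashing with a direct bit-per-value map, which works when the build-side join keys are integers in a narrow range. The following Python sketch illustrates that general technique under that assumption; RuntimeBitsetFilter is a hypothetical name, and the conditions under which StarRocks chooses a bitset over a Bloom filter are not described here.

```python
class RuntimeBitsetFilter:
    """Exact membership filter for integer join keys within a small range.

    Each build-side key k sets bit (k - min_key). A probe key outside
    [min_key, max_key], or whose bit is 0, cannot match any build row, so the
    probe row can be skipped before the join. build_keys must be non-empty.
    """

    def __init__(self, build_keys: list) -> None:
        self.min_key = min(build_keys)
        self.max_key = max(build_keys)
        span = self.max_key - self.min_key + 1
        self.bits = bytearray((span + 7) // 8)  # one bit per value in the range
        for k in build_keys:
            offset = k - self.min_key
            self.bits[offset // 8] |= 1 << (offset % 8)

    def contains(self, key: int) -> bool:
        if key < self.min_key or key > self.max_key:
            return False
        offset = key - self.min_key
        return bool(self.bits[offset // 8] & (1 << (offset % 8)))
```

The trade-off: unlike a Bloom filter, the bitset has no false positives and needs no hashing, but its memory grows with the key range rather than the number of keys, which is why it only suits specific scenarios.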

Partition Management

Cluster Management

  • Upgraded FE compile target from Java 11 to Java 17 for better system stability and performance. #53617 #57030

Security and Authentication

  • Supports SSL-encrypted connections over the MySQL protocol. #54877
  • Enhanced authentication using external systems:
    • Supports Group Provider to obtain user group information from external authentication services. The group information can then be used in authentication and authorization. Group Provider supports acquiring group information from LDAP, operating systems, or files. Users can query the user groups they belong to using the function current_group(). #56670

Materialized Views

  • Supports creating materialized views with multiple partition columns to allow users to partition the data with a more flexible strategy. #52576
  • Supports setting query_rewrite_consistency to force_mv to force the system to use the materialized view for query rewrite, keeping performance stable at the cost of some data timeliness. #53819

Loading and Unloading

  • Supports pausing Routine Load jobs on JSON parse errors by setting the property pause_on_json_parse_error to true. #56062
  • [Beta] Supports transactions with multiple SQL statements (currently, only INSERT is supported). Users can start, commit, or roll back a transaction to guarantee the ACID (atomicity, consistency, isolation, and durability) properties of multiple loading operations. #53978

Functions

3.4.4

10 Jun 10:11
bc987bb

Release Date: June 10, 2025

Improvements

  • Storage Volume now supports ADLS2 using Managed Identity as the credential. #58454
  • Partition pruning works well for partitions based on complex time function expressions, including most DATETIME-related functions.
  • Supports loading Avro data files from Azure using the FILES function. #58131
  • When Routine Load encounters invalid JSON data, the consumed partition and offset information is logged in the error log to facilitate troubleshooting. #55772

Bug Fixes

The following issues have been fixed:

  • Concurrent queries accessing the same partition in a partitioned table caused Hive Metastore to hang. #58089
  • Abnormal termination of INSERT tasks caused the job to remain in the QUEUEING state. #58603
  • After upgrading the cluster from v3.4.0 to v3.4.2, a large number of tablet replicas encountered exceptions. #58518
  • FE OOM caused by incorrect UNION execution plans. #59040
  • Invalid database IDs during partition recycling could cause FE startup to fail. #59666
  • After a failed FE CheckPoint operation, the process could not exit properly, resulting in blocking. #58602

3.3.14

16 May 06:59
2ce6e68

Release Date: May 14, 2025

Improvements

Bug Fixes

Fixed the following issues:

  • Issues with the JSON data type in first_value/last_value/lead/lag window functions. #58697
  • Deadlock caused by table-level locks from base tables during materialized view writes (after the bug fix, DB-level locks are used). #58615
  • INSERT tasks hang when the target table is deleted. #58603
  • Failure to change active/inactive state of materialized views with List partitions. #58575
  • Incorrect streaming_load_current_processing metric. #58565
  • Data version update errors caused by continuous loading and replica clone tasks. #58513
  • Failed to refresh materialized views on external tables. #58506
  • Incorrect if() results on ARM architecture. #58455
  • Materialized view rewriting generated incorrect query plans. #58487
  • Iceberg table metadata did not refresh automatically. #58490
  • Incorrect query plan generated by group_concat. #57908
  • Mass Tablet load failures caused by unhandled exceptions during loading. #58393
  • Constant folding failed due to type mismatches while pruning List partitions with generated columns (after the bug fix, an implicit cast rule was added). #54543
  • Mismatch between aggregate function return type and original column type (after the bug fix, the column type is cast to the function output type). #58407
  • broadcast_row_limit set to 0 or below failed to prevent BROADCAST JOIN generation. #58307
  • Broker Load used BE nodes that had already been blacklisted. #58350
  • Asynchronous tasks persisted in the background and could not be dropped after materialized view refresh tasks were manually cancelled. #58310
  • Failed to create expression partitions with month or year granularity. #58182
  • ngram_search generated invalid query plans. #58190

3.4.3

10 Jun 10:12
a01aa59

Release Date: April 30, 2025

Improvements

  • Routine Load and Stream Load support the use of Lambda expressions in the columns parameter for complex column data extraction. array_filter/map_filter can be used to filter and extract ARRAY/MAP data. Complex filtering and extraction of JSON data can be achieved by combining these functions with the cast function, which converts JSON arrays/JSON objects to ARRAY and MAP types. For example, COLUMNS (js, col=array_filter(i -> json_query(i, '$.type')=='t1', cast(js as Array<JSON>))[1]) can extract the first JSON object from the JSON array js where type is t1. #58149
  • Supports converting JSON objects to MAP type using the cast function, combined with map_filter to extract items from the JSON object that meet specific conditions. For example, map_filter((k, v) -> json_query(v, '$.type') == 't1', cast(js AS MAP<String, JSON>)) can extract the JSON object from js where type is t1. #58045
  • LIMIT is now supported when querying the information_schema.task_runs view. #57404
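To make the two Lambda examples above concrete, the following Python sketch mirrors what the expressions are described as returning. It is an analogy only, not how StarRocks evaluates them: first_of_type and entries_of_type are hypothetical helpers, Python's 0-based [0] stands in for SQL's 1-based [1], and returning None when nothing matches is an assumption.

```python
import json


def first_of_type(js: str, wanted: str):
    """Analogy for array_filter(i -> json_query(i, '$.type') == wanted, cast(js AS ARRAY<JSON>))[1]."""
    items = json.loads(js)  # the JSON array, like cast(js AS ARRAY<JSON>)
    matches = [i for i in items if isinstance(i, dict) and i.get("type") == wanted]
    # SQL's [1] is the first element; None stands in for "no match".
    return matches[0] if matches else None


def entries_of_type(js: str, wanted: str) -> dict:
    """Analogy for map_filter((k, v) -> json_query(v, '$.type') == wanted, cast(js AS MAP<STRING, JSON>))."""
    obj = json.loads(js)  # the JSON object, like cast(js AS MAP<STRING, JSON>)
    return {k: v for k, v in obj.items() if isinstance(v, dict) and v.get("type") == wanted}
```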

Bug Fixes

The following issues have been fixed:

  • Queries against ORC-format Hive tables returned the error "OrcChunkReader::lazy_seek_to failed. reason = bad read in RleDecoderV2::readByte". #57454
  • RuntimeFilter from the upper layer could not be pushed down when querying Iceberg tables that contain Equality Delete files. #57651
  • Enabling the spill-to-disk pre-aggregation strategy causes queries to fail. #58022
  • Queries returned the error "ConstantRef-cmp-ConstantRef not supported here, null != 111 should be eliminated earlier". #57735
  • Queries timed out based on the query_queue_pending_timeout_second parameter even when the Query Queue feature was not enabled. #57719

3.3.13

22 Apr 07:49
0279646

Release Date: April 22, 2025

Improvements

  • Added FE query memory consumption metrics to audit logs and the QueryDetail interface. #57731
  • Optimized the strategy for concurrent creation of expression partitions. #57899
  • Added monitoring metrics for the number of active FE nodes. #57857
  • The information_schema.task_runs view supports pushdown of the LIMIT clause. #57404
  • Fixed several CVE issues. #57705 #57620
  • Primary Key tables support retry during the PUBLISH stage, enhancing system disaster recovery capabilities. #57354
  • Reduced memory consumption of Flat JSON. #57357
  • The information_schema.routine_load_jobs view adds the timestamp_progress column, consistent with the output of the SHOW ROUTINE LOAD statement. #57123
  • Disallowed unauthorized behaviors from StarRocks to LDAP. #57131
  • Supports returning an error when the schema of an AVRO file does not match the schema of the Hive table. #57296
  • Materialized views support the excluded_refresh_tables property. #56428

Bug Fixes

Fixed the following issues:

  • Flat JSON does not support the get_json_bool function. #58077
  • SHOW AUTHENTICATION statement returns the password. #58072
  • The percentile_cont function returns incorrect values. #58038
  • Issues caused by spilling strategies. #58022
  • After a BE is blacklisted, Stream Load still dispatches tasks to the BE, causing task failures. #57919
  • Issues when using the cast function with semi-structured data types. #57804
  • The array_map function returns incorrect values. #57756
  • In the scenario of a single tablet, using multiple distinct functions on the same column with a single-column GROUP BY clause leads to incorrect query results. #57690
  • MIN/MAX values in the profiles of big queries are inaccurate. #57655
  • Non-partitioned materialized views based on Delta Lake data cannot rewrite queries. #57686
  • A Routine Load deadlock issue. #57430
  • Predicate pushdown issues with DATE/DATETIME columns. #57576
  • An issue when the percentile_disc function has an empty input. #57572
  • When modifying the bucket distribution of a table with the statement ALTER TABLE {table} PARTITIONS (p1, p1) DISTRIBUTED BY ..., specifying duplicate partition names could result in failure to delete internally generated temporary partitions. #57005
  • ALTER TABLE MODIFY COLUMN fails with expression partitioned tables based on str2date function. #57487
  • CACHE SELECT issue with semi-structured columns. #57448
  • Upgrade compatibility issue caused by hadoop-lib. #57436
  • Case sensitivity errors when creating partitions. #54867
  • Some columns generate incorrect sort keys during updates. #57375
  • Unknown issues caused by nested window functions. #57216

3.4.2

11 Apr 14:40
c15ba7c

Release Date: April 10, 2025

Improvements

  • FE supports graceful shutdown to improve system availability. When exiting FE via ./stop_fe.sh -g, FE first returns a 500 status code to the front-end Load Balancer via the /api/health API to indicate that it is preparing to shut down, allowing the Load Balancer to switch to other available FE nodes. Meanwhile, FE continues to run ongoing queries until they finish or time out (default timeout: 60 seconds). #56823
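From the Load Balancer's side, the health check described above amounts to treating a 500 from /api/health as "stop routing new traffic here". The following Python sketch assumes that a healthy FE answers with a 2xx status; fe_is_routable and should_route are hypothetical helpers, not part of StarRocks or any particular load balancer.

```python
import urllib.error
import urllib.request


def should_route(status_code: int) -> bool:
    """Keep an FE in rotation only while /api/health answers with a 2xx status."""
    return 200 <= status_code < 300


def fe_is_routable(health_url: str, timeout_s: float = 2.0) -> bool:
    """Poll one FE; the 500 sent during graceful shutdown takes it out of rotation."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout_s) as resp:
            return should_route(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including the 500.
        return should_route(err.code)
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as not routable.
        return False
```

For example, fe_is_routable("http://fe-host:8030/api/health") polls one FE, where fe-host is a placeholder and 8030 is the default FE HTTP port. Queries already running on a draining FE are unaffected; only new connections move elsewhere.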

Bug Fixes

The following issues have been fixed:

  • Partition pruning might not work if the partition column is a generated column. #54543
  • Incorrect parameter handling in the concat function could cause a BE crash during query execution. #57522
  • The ssl_enable property did not take effect when using Broker Load to load data. #57229
  • When NULL values exist, querying subfields of STRUCT-type columns could cause a BE crash. #56496
  • When modifying the bucket distribution of a table with the statement ALTER TABLE {table} PARTITIONS (p1, p1) DISTRIBUTED BY ..., specifying duplicate partition names could result in failure to delete internally generated temporary partitions. #57005
  • In a shared-data cluster, running SHOW PROC '/current_queries' resulted in the error "Error 1064 (HY000): Sending collect query statistics request fails". #56597
  • Running INSERT OVERWRITE loading tasks in parallel caused the error "ConcurrentModificationException: null", resulting in loading failure. #56557
  • After upgrading from v2.5.21 to v3.1.17, running multiple Broker Load tasks concurrently could cause exceptions. #56512

Behavior Changes

  • The default value of the BE configuration item avro_ignore_union_type_tag has been changed to true, enabling the direct parsing of ["NULL", "STRING"] as STRING type data, which better aligns with typical user requirements. #57553
  • The default value of the session variable big_query_profile_threshold has been changed from 0 to 30 (seconds). #57177
  • A new FE configuration item enable_mv_refresh_collect_profile has been added to control whether to collect Profile information during materialized view refresh. The default value is false (previously, the system collected Profile by default). #56971
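To see what the type tag is: Avro's JSON encoding writes null as plain null and wraps any other union value in a single-key object naming its branch, for example {"string": "abc"}. The following Python sketch models the true setting of avro_ignore_union_type_tag as stripping that wrapper; strip_union_tag is a hypothetical helper, and the exact parsing StarRocks performs may differ.

```python
def strip_union_tag(value, union_branches):
    """Unwrap one Avro union value from its JSON-encoding type tag.

    union_branches lists the union's type names, e.g. ["null", "string"].
    """
    if isinstance(value, dict) and len(value) == 1:
        [(tag, inner)] = value.items()
        # Only unwrap when the key really is one of the union's branch types,
        # so an ordinary single-key record is left untouched.
        if tag in union_branches:
            return inner
    return value
```

Checking the key against the schema's branch names, rather than unwrapping every single-key object, avoids mangling genuine record values.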

3.3.12

08 Apr 03:47
4cd554b

Release date: April 3, 2025

New Features

  • Supports the percentile_approx_weighted function. #56654
  • Supports modifying properties of Hive Catalog and Hudi Catalog. #56212
  • Paimon Catalog supports manifest cache. #55788
  • Supports SHOW PARTITIONS for tables in Paimon Catalog. #55785
  • Supports statistics collection for Paimon Catalog. #55757

Improvements

  • Various improvements and bug fixes related to statistics. #57147 #57238 #57170 #57154 #57124 #57047 #56956 #57031 #56904 #56950 #56671 #55922
  • Optimized error messages when table creation fails. #57055
  • Enhanced retry mechanism for Broker Load. #56987
  • Improved performance of array_generate. #57252
  • Aborted ongoing Compaction tasks for deleted partitions. #56943
  • Optimized error messages when ALTER TABLE fails. #57054
  • Removed unnecessary reverse step from array_agg() to improve performance. #56958
  • Added checksum verification for replicas in Primary Key tables. #56519
  • Masked sensitive information in the FILES function output. #56684
  • Reduced noisy logs related to materialized views. #56672
  • Upgraded Iceberg version to 1.7.1. #55271

Bug Fixes

  • INSERT INTO FILES did not support CSV delimiter conversion. #57126
  • Issues with Iceberg REST Catalog. #55416
  • Predicate was lost during rewrite for view-based materialized views. #57153
  • Paimon Catalog failed to read tables with schema changes. #56796
  • Timezone conversion issue in Paimon Catalog. #56879
  • SHOW MATERIALIZED VIEWS did not display default_catalog information. #56362
  • In Trino dialect mode, time strings containing 'T' were not accepted. (Solution: replaced parse_datetime with str_to_jodatime.) #56565
  • Incorrect result of first_value function. #56467
  • Incorrect result of concat_ws function. #56384

Behavior Changes

  • Added authentication to the FE Profile interface. #56914
  • Changed the default value of the session variable big_query_profile_threshold from 0 to 30 (seconds). #56520

3.4.1

13 Mar 02:36
2f78e09

Release Date: March 12, 2025

New Features and Enhancements

  • Data lake analytics supports Deletion Vector in Delta Lake.
  • Supports secure views. By creating a secure view, you can prevent users without the SELECT privilege on the referenced base tables from querying the view (even if they have the SELECT privilege on the view).
  • Supports Sketch HLL (ds_hll_count_distinct). Compared to approx_count_distinct, this function provides higher-precision approximate deduplication.
  • Storage Volume in the shared-data clusters supports Azure Data Lake Storage Gen2.
  • Supports SSL authentication for connections to StarRocks via the MySQL protocol, ensuring that data transmitted between the client and the StarRocks cluster cannot be read by unauthorized users.

Bug Fixes

The following issues have been fixed:

  • An issue where OLAP views affected the materialized view processing logic. #52989
  • Write transactions would fail if one replica was not found, regardless of how many replicas had successfully committed. (After the fix, the transaction succeeds as long as a majority of replicas succeed.) #55212
  • Stream Load failed when a node with an Alive status of false was scheduled. #55371
  • Files in cluster snapshots were mistakenly deleted. #56338
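The post-fix write rule no longer requires every replica. Assuming "majority" means strictly more than half of the replicas (the usual quorum definition; the note does not spell it out), the rule reduces to a one-line check. write_succeeds is a hypothetical helper, not StarRocks code.

```python
def write_succeeds(committed_replicas: int, total_replicas: int) -> bool:
    """Majority rule: a write succeeds once more than half of the replicas commit."""
    if total_replicas <= 0:
        raise ValueError("total_replicas must be positive")
    return committed_replicas >= total_replicas // 2 + 1
```

With the common three-replica setup, two successful commits are enough even if the third replica is missing.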

Behavior Changes

  • Graceful shutdown is now enabled by default (previously it was disabled). The default value of the related BE/CN parameter loop_count_wait_fragments_finish has been changed to 2, meaning that the system will wait up to 20 seconds for running queries to complete. #56002

3.3.11

20 Mar 03:36
bc77e6b

Release date: March 7, 2025

Improvements

  • FILES supports exporting JSON-type data to Parquet files. #56406
  • Optimized Data Cache WarmUp performance for cloud-native tables in shared-data clusters. #56190
  • Supports parsing AT TIME ZONE expressions and the from_iso8601_timestamp function in Trino. #56311 #55573
  • Partial Updates for Primary Key tables within shared-data clusters support Condition Updates. #56132
  • Extended support for statistics collection across all types of SQL statements. #56257
  • Supports configuring the maximum number of returned rows for SHOW PROC '/transaction'. #55933
  • Supports creating asynchronous materialized views on Oracle-type JDBC Catalog tables. #55372
  • MemTracker on BE WebUI supports pagination with 25 rows per page. #56206

Bug Fixes

Fixed the following issues:

  • FE does not support casting constant TIME data types into DATETIME. #55804
  • Stream Load transaction interface does not support the starrocks_fe_table_load_rows and starrocks_fe_table_load_bytes metrics. #44991
  • Changes to automatic statistics collection do not take effect. #56173
  • Materialized views in abnormal states caused issues with SHOW MATERIALIZED VIEWS. #55995
  • Text-based materialized view rewrite does not work across different databases. #56001
  • Metadata compatibility issues in JDBC Catalogs. #55993
  • Issues with handling the JSON data type in JDBC Catalogs. #56008
  • Incorrect Sort Key settings during Schema Change. #55902
  • Credential information leak issue in Broker Load. #55358

Behavior Changes

  • Added authentication to the query_detail interface in FE. #55919