[Good First Issue]StarRocks Hands-on Tasks 2024 #40894

Open
16 of 51 tasks
wangsimo0 opened this issue Feb 6, 2024 · 15 comments · May be fixed by #50586
@wangsimo0
Contributor

wangsimo0 commented Feb 6, 2024

Hi Rockstars,

This is a list of proposed hands-on tasks. If you're new to StarRocks and eager to engage with the community, these issues are a good place to dive in :) They are suitable for gaining hands-on experience and becoming familiar with StarRocks development. This is also an open list, so you are welcome to propose more tasks.

Please mention @kateshaowanjou or @wangsimo0 to book an issue, and add a comment in the issue you picked so that it won't be assigned to others. Always discuss the design with the community before you start developing. Some of the issues are really big, so don't hesitate to seek help from the community.

External Catalog related issues

Information Schema

External Catalog

In version 3.2 and later, StarRocks enhances compatibility with more BI tools by supporting the information_schema database in External Catalog. This feature serves as a valuable tool for obtaining structured information. Several views within information_schema currently return empty results, and work is underway to support these views more completely.
StarRocks follows the MySQL protocol, so its information_schema support aligns with MySQL's pattern. We should maintain compatibility with MySQL, provide as much information as we can, and optimize for efficiency to minimize the time consumed.

  • Columns view
  • Views view

Default Catalog

Trino's Compatibility Issues

In version 3.0 and later, StarRocks supports Trino's SQL_dialect mode; however, ongoing enhancements are necessary to further optimize this functionality.

New Functions

Function Mapping

| Trino's function/expression | StarRocks' function/expression | comment | assignee |
| --- | --- | --- | --- |
| map_agg(key, value) → map<K,V> | map() | | @Jcnessss |
| show schemas from <catalog_name> | Show databases from <catalog_name> | #40868 | |
| array_sort(array(T), function(T, T, int)) -> array(T) | array_sortby(, array0 [, array1...]) | This one needs to pay attention to the input order. | |
| sequence(start, stop), sequence(start, stop, step) (integer data types) | array_generate([start,] end [, step]) | | |
| last_day_of_month(x) → date | last_day(x,'month'); | | |
| map_from_entries(array(row(K, V))) -> map(K, V) | map_from_arrays | This one needs to pay attention to the transformation. SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); is equivalent to SELECT map_from_arrays([1,2],['x','y']); | |
| current_catalog | catalog() | | thanks to @macroguo-ghy |
| current_schema | database() | | thanks to @macroguo-ghy |
| slice(x, start, length) → array | array_slice(input, offset, length) | | |
| approx_set(x) → HyperLogLog | HLL_HASH(column_name) | | |
| empty_approx_set() → HyperLogLog | HLL_EMPTY() | | |
| merge(HyperLogLog) → HyperLogLog | HLL_RAW_AGG(hll) | | |
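
To make the mapping above more concrete, here is a minimal Python sketch of the two kinds of rewrite it describes: plain renames, and the map_from_entries transformation that splits an array of (key, value) rows into a key array and a value array. This is only an illustration and not how StarRocks' Trino dialect support is implemented. The names SIMPLE_RENAMES, rename_functions and map_from_entries_args are hypothetical, and the regex-based rewrite is a toy that ignores string literals and comments.

```python
import re

# Renames from the table that keep the same argument order.
# Entries that need extra arguments or reordering (last_day_of_month,
# array_sort, map_from_entries) are intentionally not listed here.
SIMPLE_RENAMES = {
    "slice": "array_slice",
    "sequence": "array_generate",
    "approx_set": "hll_hash",
    "empty_approx_set": "hll_empty",
}

# An identifier immediately followed by an opening parenthesis.
_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def rename_functions(sql: str) -> str:
    """Replace known Trino function names with their StarRocks counterparts."""
    def swap(match: re.Match) -> str:
        target = SIMPLE_RENAMES.get(match.group(1).lower())
        # Leave unknown function names untouched.
        return match.group(0) if target is None else target + "("
    return _CALL.sub(swap, sql)


def map_from_entries_args(entries):
    """Split [(k, v), ...] into ([k, ...], [v, ...]), the shape map_from_arrays takes."""
    keys = [k for k, _ in entries]
    values = [v for _, v in entries]
    return keys, values


if __name__ == "__main__":
    print(rename_functions("SELECT slice(a, 1, 2), empty_approx_set() FROM t"))
    print(map_from_entries_args([(1, "x"), (2, "y")]))
```

A real implementation would work on the parsed expression tree rather than on SQL text, which is also where argument reordering for array_sort would have to happen.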

Other Enhancements

  • Apache Ranger's policy translator

StarRocks supports using the Hive service in Ranger to control access to Hive tables. However, we have found that some community users still want to manage all privileges in the StarRocks Ranger service, so we need a translator (maybe a script).

  • Add catalog information in FE's query_detail @happut

After enabling query detail collection with admin set frontend config("enable_collect_query_detail_info"="true"), a user can get query details using curl -uroot: http://172.26.81.138:8030/api/query_detail?event_time=<unixtimestamp_value> . The returned information looks like ...."database":"simo","sql":"insert into abc values (1,2),(2,3)","user":"root"....
There is no catalog information, such as "catalog":"default_catalog".
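
As a rough illustration of the requested result, the Python sketch below builds the query_detail URL quoted above and shows a record before and after a catalog field is added. It assumes each entry is a JSON object with the fields quoted in this task; the real response may contain more fields, and the actual change would be made in the FE code, not on the client. The helper names query_detail_url and with_catalog are hypothetical.

```python
import json
from urllib.parse import urlencode


def query_detail_url(fe_host: str, http_port: int, event_time: int) -> str:
    """Build the FE query_detail URL for a given unix timestamp."""
    query = urlencode({"event_time": event_time})
    return f"http://{fe_host}:{http_port}/api/query_detail?{query}"


def with_catalog(detail: dict, catalog: str = "default_catalog") -> dict:
    """Return a copy of a query detail record that also carries the catalog name."""
    enriched = dict(detail)
    enriched.setdefault("catalog", catalog)
    return enriched


# Sample record using only the fields quoted in the task description.
sample = json.loads(
    '{"database": "simo", "sql": "insert into abc values (1,2),(2,3)", "user": "root"}'
)

if __name__ == "__main__":
    print(query_detail_url("172.26.81.138", 8030, 1700000000))
    print(json.dumps(with_catalog(sample)))
```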

Apache Hudi & Delta Lake Capabilities

  • Add Hudi sink (✨ HIGH priority)
  • Add Delta Lake sink (✨ HIGH priority)

More Connectors

  • Oracle catalog
  • Kudu catalog @predator4ann
  • StarRocks catalog
  • Greenplum catalog
  • SQL Server catalog
  • ClickHouse catalog
  • Trino catalog
  • DB2 catalog
  • Druid catalog
  • OceanBase catalog
  • SAP HANA catalog

More Capabilities

  • Hive UDF compatible
  • Spark SQL compatible structure
  • Hive SQL compatible structure
  • Impala SQL compatible structure
@alberttwong
Contributor

I'd add Iceberg tagging and branch query

@alberttwong
Contributor

#37959

@241600489

I want to pick #38989 @wangsimo0

DataScientistSamChan added a commit to DataScientistSamChan/starrocks that referenced this issue Mar 21, 2024
DataScientistSamChan added a commit to DataScientistSamChan/starrocks that referenced this issue Mar 22, 2024
DataScientistSamChan added a commit to DataScientistSamChan/starrocks that referenced this issue Mar 24, 2024
DataScientistSamChan added a commit to DataScientistSamChan/starrocks that referenced this issue Mar 25, 2024
mergify bot pushed a commit that referenced this issue May 11, 2024
wanpengfei-git pushed a commit that referenced this issue May 11, 2024
@kateshaowanjou kateshaowanjou self-assigned this May 14, 2024
node pushed a commit to vivo/starrocks that referenced this issue May 17, 2024
@yangzho12138

I want to pick #37089 @wangsimo0

@kateshaowanjou
Contributor

I want to pick #37089 @wangsimo0
You need to also comment under the issue #37089 so I can assign it to you.
If you have any issues during the development process, I can introduce you to the relevant discussion group. https://853921.ma3you.cn/articles/b12e90J/

happut added a commit to happut/starrocks that referenced this issue Jun 17, 2024
happut added a commit to happut/starrocks that referenced this issue Jun 26, 2024
stephen-shelby pushed a commit that referenced this issue Jun 26, 2024
mergify bot pushed a commit that referenced this issue Jun 26, 2024
…ay_of_month Support(#40894) (#47529)

Signed-off-by: happut <[email protected]>
(cherry picked from commit e6c7c3d)
wanpengfei-git pushed a commit that referenced this issue Jun 26, 2024
@yangzho12138

I want to pick #46105 @wangsimo0

@FLAYhhh

FLAYhhh commented Aug 6, 2024

@wangsimo0 Hi, I want to add Delta Lake Compatibilities. Has this requirement been resolved?

@kateshaowanjou
Contributor

@wangsimo0 Hi, I want to add Delta Lake Compatibilities. Has this requirement been resolved?
Are you referring to the "Add Delta Lake sink" function? There's no one working on it at the moment and it'd be awesome if you are willing to give it a try!😎

@FLAYhhh

FLAYhhh commented Aug 8, 2024

Sure thing! I'd be happy to take this on.

@amoghmargoor

@kateshaowanjou @wangsimo0 Can I pick this issue, #38989, if it's not being worked on by anyone?

@kateshaowanjou
Contributor

Sure thing! I'd be happy to take this on.

This issue is not the easiest one so feel free to add my WeChat:wanjoushao if you need help!

@Jcnessss
Contributor

Jcnessss commented Aug 22, 2024

@kateshaowanjou @wangsimo0 We are migrating from Trino to StarRocks and working on the functions. Can I pick the map_agg issue?

@SoraNimi

I want to pick this issue #46060 @wangsimo0 @kateshaowanjou

@colagy

colagy commented Jan 22, 2025

Trino (formerly known as PrestoSQL) indeed supports the RANGE BETWEEN clause in window functions. RANGE BETWEEN is used to define a window frame based on a range of values in the ordering column. This is different from ROWS BETWEEN, which is based on a physical number of rows.

Basic Usage of RANGE BETWEEN

RANGE BETWEEN is typically used in conjunction with the ORDER BY clause to define a window frame based on the range of values in the ordering column. Here are some common usage examples:

Example 1: Calculating Aggregations Over a Value Range

Suppose you have a table sales with columns date and amount, and you want to calculate the sales amount for each date along with the total sales amount for the previous 7 days.

SELECT
    date,
    amount,
    SUM(amount) OVER (
        ORDER BY date
        RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
    ) AS total_amount_7_days
FROM
    sales;

In this example, RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW defines a window that includes the current row and all rows within the previous 7 days.
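
To show what the value-based frame computes, here is a small Python sketch that emulates the query above on in-memory rows. It is a reference emulation of the semantics only, not how any engine evaluates window functions, and the function name range_sum_7_days and the sample data are made up for illustration. For each row it sums every amount whose date lies between the current date minus 7 days and the current date, inclusive, so rows sharing the current date are all counted.

```python
from datetime import date, timedelta


def range_sum_7_days(rows):
    """Emulate SUM(amount) OVER (ORDER BY date
    RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW).

    rows is a list of (date, amount) pairs; returns (date, amount, total) tuples.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    result = []
    for current_date, amount in ordered:
        lower = current_date - timedelta(days=7)
        # The frame is defined by values, not row counts: every row whose date
        # falls in [lower, current_date] belongs to it, including peers.
        total = sum(a for d, a in ordered if lower <= d <= current_date)
        result.append((current_date, amount, total))
    return result


if __name__ == "__main__":
    sales = [
        (date(2024, 1, 1), 10),
        (date(2024, 1, 5), 20),
        (date(2024, 1, 8), 30),
        (date(2024, 1, 9), 40),
    ]
    for row in range_sum_7_days(sales):
        print(row)
```

With ROWS BETWEEN 7 PRECEDING AND CURRENT ROW, the same data would instead sum the current row and up to seven physically preceding rows, regardless of how far apart their dates are.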
