Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Help] - Floating point operations with Integer columns #879

Open
praveentiru opened this issue Aug 14, 2021 · 5 comments
Open

[Help] - Floating point operations with Integer columns #879

praveentiru opened this issue Aug 14, 2021 · 5 comments

Comments

@praveentiru
Copy link

praveentiru commented Aug 14, 2021

Posted from stackoverflow: Original post
I am calculating fraction of orders that were not filled from a large list of order lines. I am using datafusion crate to perform analysis. I want to build a table that looks as shown below:

+--------+--------------+---------------+--------------+
| Month  | Total Orders | Missed Orders | Missed Ratio |
+--------+--------------+---------------+--------------+
| 201803 | 10           | 3             | 0.3          |
+--------+--------------+---------------+--------------+

To achieve this I have return following code:

    let result = record_count
        .select(vec![col("Month"), 
            col("Total Orders"), 
            col("Missed Orders"),
            (col("Missed Orders").cast_to(&DataType::Float64, &m_order_schema).unwrap() / col("Total Orders").cast_to(&DataType::Float64, &t_order_schema).unwrap()).alias("Missed Ratio")])?;

The total orders and missed orders column as integers so, I am casting them to float to get fraction. But, Missed Ratio column comes out as integer with all zeros. Result looks as shown below:

+--------+--------------+---------------+--------------+
| Month  | Total Orders | Missed Orders | Missed Ratio |
+--------+--------------+---------------+--------------+
| 201803 | 10           | 3             | 0            |
+--------+--------------+---------------+--------------+

Question: How to perform floating point operations with integer columns?

@alamb alamb added question Further information is requested and removed question Further information is requested labels Oct 2, 2021
@alamb
Copy link
Contributor

alamb commented Oct 2, 2021

Sorry for the late response

Service Level column comes out as integer with all zeros.

I am not sure I understand the question -- there is no Service Level column in the output

If you could provide a self contained reproducer we might be able to help more

@praveentiru
Copy link
Author

@alamb My bad. Service level = 1/Missed Ratio. I have edited the question to replace Service Level with Missed Ratio.

@alamb
Copy link
Contributor

alamb commented Oct 3, 2021

@praveentiru I don't normally use the dataframe API (and for your case, the SQL interface might work better).

But in any event, I tried to reproduce the problem you are having, and I was not able to. Here is the program I used:

async fn test_cast() {
    let mut ctx = ExecutionContext::new();

    let month: Date32Array = vec![Some(1000)].into_iter().collect();
    let total_orders: Int64Array = vec![Some(10)].into_iter().collect();
    let missed_orders: Int64Array = vec![Some(3)].into_iter().collect();

    let batch = RecordBatch::try_from_iter(vec![
        ("Month", Arc::new(month) as ArrayRef),
        ("Total Orders", Arc::new(total_orders) as ArrayRef),
        ("Missed Orders", Arc::new(missed_orders) as ArrayRef),
    ]).unwrap();

    let m_order_schema = DFSchema::try_from_qualified_schema(
        "m_orders",
        batch.schema().as_ref()
    ).unwrap();

    let t_order_schema = DFSchema::try_from_qualified_schema(
        "t_orders",
        batch.schema().as_ref()
    ).unwrap();

    let table = MemTable::try_new(batch.schema(), vec![vec![batch]]).unwrap();

    let record_count = ctx.read_table(Arc::new(table)).unwrap();

    let result = record_count
        .select(vec![col("Month"),
            col("Total Orders"),
            col("Missed Orders"),
                     (col("Missed Orders").cast_to(&DataType::Float64, &m_order_schema).unwrap() / col("Total Orders").cast_to(&DataType::Float64, &t_order_schema).unwrap()).alias("Service Level")]).unwrap();

    result.show().await.unwrap();

}

And when I ran that code, it appears to produce the output you expected:

Starting tests
+------------+--------------+---------------+---------------+
| Month      | Total Orders | Missed Orders | Service Level |
+------------+--------------+---------------+---------------+
| 1972-09-27 | 10           | 3             | 0.3           |
+------------+--------------+---------------+---------------+

@houqp
Copy link
Member

houqp commented Oct 3, 2021

@praveentiru did you install your python binding from source? if not, it might be really out of date.

@praveentiru
Copy link
Author

@houqp I am working in Rust directly. I will try the code from @alamb and get back. I could not investigate over weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants