Hi, hopefully this is the right place to ask this.
I'm trying to consume multiple CDC Avro topics. Each topic's schema defines a nested source schema, but the source definitions differ between topics. I haven't found a way to generate the C# classes from the .avsc files such that all of the consumers work.
Note that I'm not in control of producing to or managing these topics. I can't change anything about them.
The schemas and other code used in the example below are fictional but roughly represent the real schemas I'm dealing with.
Generating the C# classes for companies with `avrogen -s schema-companies.avsc .` creates value.cs and Source.cs files. The value.cs file isn't problematic, so I'm leaving it out.
Source.cs
```csharp
// ------------------------------------------------------------------------------
// <auto-generated>
//    Generated by avrogen, version 1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e
//    Changes to this file may cause incorrect behavior and will be lost if code
//    is regenerated
// </auto-generated>
// ------------------------------------------------------------------------------
namespace io.debezium.connector.postgresql
{
    using System;
    using System.Collections.Generic;
    using System.Text;
    using global::Avro;
    using global::Avro.Specific;

    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("avrogen", "1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e")]
    public partial class Source : global::Avro.Specific.ISpecificRecord
    {
        public static global::Avro.Schema _SCHEMA = global::Avro.Schema.Parse("""{"type":"record","name":"Source","namespace":"io.debezium.connector.postgresql","fields":[{"name":"version","type":"string"},{"name":"ts_ms","type":"long"},{"name":"name","type":"string"}],"connect.name":"io.debezium.connector.postgresql.Source"}""");
        private string _version;
        private long _ts_ms;
        private string _name;

        public virtual global::Avro.Schema Schema
        {
            get { return Source._SCHEMA; }
        }

        public string version
        {
            get { return this._version; }
            set { this._version = value; }
        }

        public long ts_ms
        {
            get { return this._ts_ms; }
            set { this._ts_ms = value; }
        }

        public string name
        {
            get { return this._name; }
            set { this._name = value; }
        }

        public virtual object Get(int fieldPos)
        {
            switch (fieldPos)
            {
                case 0: return this.version;
                case 1: return this.ts_ms;
                case 2: return this.name;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Get()");
            };
        }

        public virtual void Put(int fieldPos, object fieldValue)
        {
            switch (fieldPos)
            {
                case 0: this.version = (System.String)fieldValue; break;
                case 1: this.ts_ms = (System.Int64)fieldValue; break;
                case 2: this.name = (System.String)fieldValue; break;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Put()");
            };
        }
    }
}
```
Generating the C# classes for projects with `avrogen -s schema-projects.avsc .` replaces Source.cs with:
Source.cs
```csharp
// ------------------------------------------------------------------------------
// <auto-generated>
//    Generated by avrogen, version 1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e
//    Changes to this file may cause incorrect behavior and will be lost if code
//    is regenerated
// </auto-generated>
// ------------------------------------------------------------------------------
namespace io.debezium.connector.postgresql
{
    using System;
    using System.Collections.Generic;
    using System.Text;
    using global::Avro;
    using global::Avro.Specific;

    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("avrogen", "1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e")]
    public partial class Source : global::Avro.Specific.ISpecificRecord
    {
        public static global::Avro.Schema _SCHEMA = global::Avro.Schema.Parse("""{"type":"record","name":"Source","namespace":"io.debezium.connector.postgresql","fields":[{"name":"version","type":"string"},{"name":"name","type":"string"}],"connect.name":"io.debezium.connector.postgresql.Source"}""");
        private string _version;
        private string _name;

        public virtual global::Avro.Schema Schema
        {
            get { return Source._SCHEMA; }
        }

        public string version
        {
            get { return this._version; }
            set { this._version = value; }
        }

        public string name
        {
            get { return this._name; }
            set { this._name = value; }
        }

        public virtual object Get(int fieldPos)
        {
            switch (fieldPos)
            {
                case 0: return this.version;
                case 1: return this.name;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Get()");
            };
        }

        public virtual void Put(int fieldPos, object fieldValue)
        {
            switch (fieldPos)
            {
                case 0: this.version = (System.String)fieldValue; break;
                case 1: this.name = (System.String)fieldValue; break;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Put()");
            };
        }
    }
}
```
This overwrites the companies' Source class with the projects' Source class, because both records share the same name and namespace in the Avro schema. The remaining Source class has no ts_ms field, which makes the companies consumer fail on deserialization with an error similar to:
Confluent.Kafka.ConsumeException: Local: Value deserialization error
---> Avro.AvroException: Unable to cast object of type 'System.Nullable`1[System.Int64]' to type 'System.String'. in field schema in field source
---> Avro.AvroException: Unable to cast object of type 'System.Nullable`1[System.Int64]' to type 'System.String'. in field schema
---> System.InvalidCastException: Unable to cast object of type 'System.Nullable`1[System.Int64]' to type 'System.String'.
at io.debezium.connector.postgresql.Source.Put(int fieldPos, object fieldValue) in Whatever/Source.cs:line 227
at Avro.Specific.SpecificDefaultReader.ReadRecord(object reuse, RecordSchema writerSchema, Schema readerSchema, Decoder dec)
--- End of inner exception stack trace ---
at Avro.Specific.SpecificDefaultReader.ReadRecord(object reuse, RecordSchema writerSchema, Schema readerSchema, Decoder dec)
at Avro.Specific.SpecificDefaultReader.ReadRecord(object reuse, RecordSchema writerSchema, Schema readerSchema, Decoder dec)
--- End of inner exception stack trace ---
at Avro.Specific.SpecificDefaultReader.ReadRecord(object reuse, RecordSchema writerSchema, Schema readerSchema, Decoder dec)
at Avro.Generic.DefaultReader.Read<T>(T reuse, Decoder decoder)
at Avro.Specific.SpecificReader<T>.Read(T reuse, Decoder dec)
at Confluent.SchemaRegistry.Serdes.SpecificDeserializerImpl<T>.Read(DatumReader<T> datumReader, Decoder decoder)
at Confluent.SchemaRegistry.Serdes.SpecificDeserializerImpl<T>+<Deserialize>d__7.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task, ConfigureAwaitOptions options)
at Confluent.SchemaRegistry.Serdes.SpecificDeserializerImpl<T>+<DeserializeAsync>d__6.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task, ConfigureAwaitOptions options)
at Confluent.SchemaRegistry.Serdes.AvroDeserializer<T>+<DeserializeAsync>d__7.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task, ConfigureAwaitOptions options)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable<TResult>+ConfiguredTaskAwaiter.GetResult()
at Confluent.Kafka.SyncOverAsync.SyncOverAsyncDeserializer<T>.Deserialize(ReadOnlySpan<T> data, bool isNull, SerializationContext context)
at Confluent.Kafka.Consumer<TKey, TValue>.Consume(int millisecondsTimeout)
--- End of inner exception stack trace ---
at Confluent.Kafka.Consumer<TKey, TValue>.Consume(int millisecondsTimeout)
at Confluent.Kafka.Consumer<TKey, TValue>.Consume(CancellationToken cancellationToken)
...
My guess is that the parser, which is unaware of the ts_ms field, tries to read name at the position where the message actually contains ts_ms, which comes before name.
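The stack trace ends in `Source.Put`, which suggests one way this failure can arise: the values arrive in the companies field order (with `ts_ms` at position 1), while the only `Source` class left on disk is the projects version, whose `Put` expects `name` at position 1. Below is a minimal sketch of that mismatch using only the base class library. `ProjectsSource` and `PutMismatchDemo` are hypothetical stand-ins, not the generated code or the Apache.Avro reader, and the field values are made up.

```csharp
using System;

// Hypothetical stand-in for the projects' generated Source class:
// position 0 is version, position 1 is name, and there is no ts_ms.
public class ProjectsSource
{
    public string version { get; set; } = "";
    public string name { get; set; } = "";

    public void Put(int fieldPos, object fieldValue)
    {
        switch (fieldPos)
        {
            case 0: this.version = (string)fieldValue; break;
            case 1: this.name = (string)fieldValue; break;
            default: throw new InvalidOperationException("Bad index " + fieldPos + " in Put()");
        }
    }
}

public static class PutMismatchDemo
{
    // Feeds values in the companies field order (version, ts_ms, name) into the
    // two-field class. Returns the cast error message, or "" if no cast fails.
    public static string FillWithCompaniesOrder(ProjectsSource target)
    {
        object[] companiesValues = { "2.5", 1700000000000L, "companies-db" };
        try
        {
            for (int pos = 0; pos < companiesValues.Length; pos++)
            {
                target.Put(pos, companiesValues[pos]);
            }
            return "";
        }
        catch (InvalidCastException ex)
        {
            // Position 1 carries a long, but this class casts it to string.
            return ex.Message;
        }
    }

    public static void Main()
    {
        var source = new ProjectsSource();
        Console.WriteLine(FillWithCompaniesOrder(source));
    }
}
```

In this sketch the first field is stored normally and the cast fails at position 1, which is the same kind of `InvalidCastException` as in the trace above.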
Replacing Avro namespaces
I tried to adjust the namespaces of the generated C# classes so that each topic and its dependent schemas live under a dedicated namespace. For example, the namespace Coffeeandco.Schemas.Companies would contain the value.cs and Source.cs for the companies topic, and so on. The command:
This produces a Source.cs that looks like:
```csharp
// ------------------------------------------------------------------------------
// <auto-generated>
//    Generated by avrogen, version 1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e
//    Changes to this file may cause incorrect behavior and will be lost if code
//    is regenerated
// </auto-generated>
// ------------------------------------------------------------------------------
namespace Coffeeandco.Schemas.Companies
{
    using System;
    using System.Collections.Generic;
    using System.Text;
    using global::Avro;
    using global::Avro.Specific;

    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("avrogen", "1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e")]
    public partial class Source : global::Avro.Specific.ISpecificRecord
    {
        // even this string now contains the Coffeeandco.Schemas.Companies namespace... huh...
        public static global::Avro.Schema _SCHEMA = global::Avro.Schema.Parse("""{"type":"record","name":"Source","namespace":"Coffeeandco.Schemas.Companies","fields":[{"name":"version","type":"string"},{"name":"ts_ms","type":"long"},{"name":"name","type":"string"}],"connect.name":"io.debezium.connector.postgresql.Source"}""");
        private string _version;
        private long _ts_ms;
        private string _name;

        public virtual global::Avro.Schema Schema
        {
            get { return Source._SCHEMA; }
        }

        public string version
        {
            get { return this._version; }
            set { this._version = value; }
        }

        public long ts_ms
        {
            get { return this._ts_ms; }
            set { this._ts_ms = value; }
        }

        public string name
        {
            get { return this._name; }
            set { this._name = value; }
        }

        public virtual object Get(int fieldPos)
        {
            switch (fieldPos)
            {
                case 0: return this.version;
                case 1: return this.ts_ms;
                case 2: return this.name;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Get()");
            };
        }

        public virtual void Put(int fieldPos, object fieldValue)
        {
            switch (fieldPos)
            {
                case 0: this.version = (System.String)fieldValue; break;
                case 1: this.ts_ms = (System.Int64)fieldValue; break;
                case 2: this.name = (System.String)fieldValue; break;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Put()");
            };
        }
    }
}
```
Notice that even the namespace in the Avro schema string (_SCHEMA field) changed to Coffeeandco.Schemas.Companies. That seems to be a problem, because the consumer now fails with:
Confluent.Kafka.ConsumeException: Local: Value deserialization error
---> Avro.AvroException: Schema mismatch
...
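My guess is that renaming the namespace changed the record's full name in the reader schema, so it no longer matches the schema the messages were written with, and the Avro parser understandably complains.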
Revert _SCHEMA namespaces
Next, I tried reverting the namespace in the generated class's _SCHEMA field back to what is in schema registry.
I manually changed the Source.cs file to:
```csharp
// ------------------------------------------------------------------------------
// <auto-generated>
//    Generated by avrogen, version 1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e
//    Changes to this file may cause incorrect behavior and will be lost if code
//    is regenerated
// </auto-generated>
// ------------------------------------------------------------------------------
namespace Coffeeandco.Schemas.Companies
{
    using System;
    using System.Collections.Generic;
    using System.Text;
    using global::Avro;
    using global::Avro.Specific;

    [global::System.CodeDom.Compiler.GeneratedCodeAttribute("avrogen", "1.12.0+8c27801dc8d42ccc00997f25c0b8f45f8d4a233e")]
    public partial class Source : global::Avro.Specific.ISpecificRecord
    {
        // changed namespace here in the string...
        public static global::Avro.Schema _SCHEMA = global::Avro.Schema.Parse("""{"type":"record","name":"Source","namespace":"io.debezium.connector.postgresql","fields":[{"name":"version","type":"string"},{"name":"ts_ms","type":"long"},{"name":"name","type":"string"}],"connect.name":"io.debezium.connector.postgresql.Source"}""");
        private string _version;
        private long _ts_ms;
        private string _name;

        public virtual global::Avro.Schema Schema
        {
            get { return Source._SCHEMA; }
        }

        public string version
        {
            get { return this._version; }
            set { this._version = value; }
        }

        public long ts_ms
        {
            get { return this._ts_ms; }
            set { this._ts_ms = value; }
        }

        public string name
        {
            get { return this._name; }
            set { this._name = value; }
        }

        public virtual object Get(int fieldPos)
        {
            switch (fieldPos)
            {
                case 0: return this.version;
                case 1: return this.ts_ms;
                case 2: return this.name;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Get()");
            };
        }

        public virtual void Put(int fieldPos, object fieldValue)
        {
            switch (fieldPos)
            {
                case 0: this.version = (System.String)fieldValue; break;
                case 1: this.ts_ms = (System.Int64)fieldValue; break;
                case 2: this.name = (System.String)fieldValue; break;
                default: throw new global::Avro.AvroRuntimeException("Bad index " + fieldPos + " in Put()");
            };
        }
    }
}
```
Now there's no schema mismatch, but the consumer still fails:
Confluent.Kafka.ConsumeException: Local: Value deserialization error
---> Avro.AvroException: Unable to find type 'staging.cdc.procore.public.companies.compact.Value' in all loaded assemblies in field before
---> Avro.AvroException: Unable to find type 'staging.cdc.procore.public.companies.compact.Value' in all loaded assemblies
...
For whatever reason, the code in the Apache.Avro NuGet package looks up the class in the loaded assemblies using the namespace from the _SCHEMA field, and it fails when that namespace differs from the C# class namespace.
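One plausible reading of this error is that the specific reader resolves each record's class by the full name found in the schema, roughly by asking every loaded assembly for a type with that name. The sketch below mimics such a lookup with plain reflection. `TypeLookup.Exists` is a hypothetical helper and an assumption about the behaviour, not the Apache.Avro implementation, and the `Source` class here is an empty placeholder.

```csharp
using System;
using System.Reflection;

namespace Coffeeandco.Schemas.Companies
{
    // Empty placeholder for a generated class moved to a custom C# namespace.
    public class Source { }
}

public static class TypeLookup
{
    // Returns true when any loaded assembly defines a type with this full name.
    public static bool Exists(string fullName)
    {
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (assembly.GetType(fullName) != null)
            {
                return true;
            }
        }
        return false;
    }
}

public static class TypeLookupDemo
{
    public static void Main()
    {
        // Full name as written in the Avro schema: no C# class lives there.
        Console.WriteLine(TypeLookup.Exists("io.debezium.connector.postgresql.Source"));
        // Full name of the C# class after the namespace change.
        Console.WriteLine(TypeLookup.Exists("Coffeeandco.Schemas.Companies.Source"));
    }
}
```

Under this assumption, a lookup keyed on the schema's full name cannot find a class that only exists under the remapped C# namespace, which would match the "Unable to find type" message.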
What next?
I know I can always use a generic consumer instead, but I really do want to take advantage of schema registry and the fact that our company uses it. How can I make this work? Thanks in advance for any advice.
michaldivisprocore changed the title from "Question: consuming Avro topics with nested schema differences" to "Question: avrogen for topics with nested schema differences" on Feb 9, 2025