Open
Description
We encountered an issue while generating data in Databricks using the following schema configuration:
[{"column_name":"id","data_type":"int","options":{"minValue":1,"step":1,"random":false,"percentNulls":0}},{"column_name":"name","data_type":"string","options":{"words":[2,10],"random":true,"percentNulls":0}},{"column_name":"dob","data_type":"date","options":{"begin":"1990-02-20","end":"2023-01-10","random":false,"percentNulls":0}}]
An exception was raised in the Python worker while calling the pandasGenerateText method in dbldatagen (version 0.4.0). The error occurs when the numpy.clip function attempts to cast output from float64 to uint8 using the same_kind casting rule, leading to a UFuncOutputCastingError.
Stack Trace:
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbldatagen/text_generators.py", line 881, in pandasGenerateText
    results = self.generateText(rows, rows.size)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbldatagen/text_generators.py", line 768, in generateText
    para_stats = np.clip(para_stats_raw, self._minValues, self._maxValues, out=stats_array)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2169, in clip
    return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 68, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 45, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/numpy/core/_methods.py", line 99, in _clip
    return um.clip(a, min, max, out=out, **kwargs)
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'clip' output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'
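Expected behavior: the numpy.clip call in generateText should either write into an output array whose dtype can hold the float64 result, or cast the clipped values to the target type explicitly, so that data generation does not fail with a casting error.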
Environment
Databricks: 14.3 LTS (Apache Spark 3.5.0, Scala 2.12)
Python: 3.10
dbldatagen: 0.4.0
numpy: 1.26.4