My Dataflow pipeline with runtime arguments runs fine with DirectRunner, but raises an argparse ArgumentError when I switch to DataflowRunner:
File "/home/user/miniconda3/lib/python3.8/site-packages/apache_beam/options/pipeline_options.py", line 124, in add_value_provider_argument
self.add_argument(*args, **kwargs)
File "/home/user/miniconda3/lib/python3.8/argparse.py", line 1386, in add_argument
return self._add_action(action)
File "/home/user/miniconda3/lib/python3.8/argparse.py", line 1749, in _add_action
self._optionals._add_action(action)
File "/home/user/miniconda3/lib/python3.8/argparse.py", line 1590, in _add_action
action = super(_ArgumentGroup, self)._add_action(action)
File "/home/user/miniconda3/lib/python3.8/argparse.py", line 1400, in _add_action
self._check_conflict(action)
File "/home/user/miniconda3/lib/python3.8/argparse.py", line 1539, in _check_conflict
conflict_handler(action, confl_optionals)
File "/home/user/miniconda3/lib/python3.8/argparse.py", line 1548, in _handle_conflict_error
raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --bucket_input: conflicting option string: --bucket_input
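For reference, the underlying argparse behaviour the traceback ends in is generic: registering the same option string twice on one parser raises exactly this error. A standalone sketch (not my pipeline code):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--bucket_input')
# The second registration of the same option string raises
# argparse.ArgumentError: argument --bucket_input: conflicting option string: --bucket_input
parser.add_argument('--bucket_input')

So the real question is why --bucket_input ends up being registered twice under DataflowRunner only.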
Here is how the argument is defined and used:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class CustomPipelineOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Runtime-provided argument (ValueProvider) for the input bucket.
        parser.add_value_provider_argument(
            '--bucket_input',
            default="device-file-dev",
            help='Raw device file bucket')


# pipeline_options is a PipelineOptions instance built from the command-line args.
pipeline = beam.Pipeline(options=pipeline_options)
custom_options = pipeline_options.view_as(CustomPipelineOptions)
_ = (
    pipeline
    | 'Initiate dataflow' >> beam.Create(["Start"])
    | 'Create P collection with file paths' >> beam.ParDo(
        CreateGcsPCol(input_bucket=custom_options.bucket_input))
)
Note that this only happens with DataflowRunner. Since --bucket_input is registered only once in my code, it looks like something re-registers it when the job is submitted. Does anyone know how to solve it? Thanks a lot.
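In case it helps, here is the structure I would try next, on the (unverified) guess that the module defining CustomPipelineOptions gets executed twice when the job is submitted, so _add_argparse_args registers --bucket_input twice. This is only a sketch: run() and argv are my own naming, and CreateGcsPCol is the DoFn from above.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(argv=None):
    # Build the options once, from a single entry point.
    pipeline_options = PipelineOptions(argv)
    custom_options = pipeline_options.view_as(CustomPipelineOptions)
    with beam.Pipeline(options=pipeline_options) as pipeline:
        _ = (
            pipeline
            | 'Initiate dataflow' >> beam.Create(["Start"])
            | 'Create P collection with file paths' >> beam.ParDo(
                CreateGcsPCol(input_bucket=custom_options.bucket_input))
        )


# Guard so the pipeline is only constructed when the script is the entry point,
# not when the module is re-imported during job submission.
if __name__ == '__main__':
    run()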
question from:
https://stackoverflow.com/questions/65945352/gcp-dataflow-argparse-argumenterror-using-dataflowrunner-but-not-directrunner