1、全量抽取模板
sqoop import \
-Doracle.sessionTimeZone=Asia/Shanghai \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
-Dmapreduce.map.cpu.vcores=4 \
-Dmapreduce.map.memory.mb=3096 \
-Dmapred.child.java.opts=-Xmx3096m \
--null-string '\\N' \
--null-non-string '\\N' \
--hive-drop-import-delims \
--connect jdbc:oracle:thin:@//xxxxx.xx.xxx.xx:1521/orcl \
--username jkxxda \
--password xxx\
--table JKDA.HYPERTENSIONVISITING \
--delete-compile-dir \
--hive-overwrite \
--hcatalog-database stage \
--hcatalog-table stg_xxx_d \
--hcatalog-partition-keys day \
--hcatalog-partition-values 2022-11-08 \
--hcatalog-storage-stanza 'stored as orc' \
-m 2
2、按条件增量抽取模板
hive -e "
alter table ${lake_database}.${lake_table} drop partition(day='${biz_date}');
"
sqoop import \
-Doracle.sessionTimeZone=Asia/Shanghai \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
-Dmapreduce.map.cpu.vcores=4 \
-Dmapreduce.map.memory.mb=3096 \
-Dmapred.child.java.opts=-Xmx3096m \
--null-string '\\N' \
--null-non-string '\\N' \
--connect jdbc:oracle:thin:@//xxxxx.xx.xxx.xx:1521/orcl \
--query "SELECT ${oracle_columns} FROM JKDA.UP WHERE rownum<=10 AND \$CONDITIONS" \
--split-by '${split-by}' \
--username xxx\
--password xxxxx\
--hive-overwrite \
--hcatalog-database '${lake_database}' \
--hcatalog-table '${lake_table}' \
--hcatalog-partition-keys day \
--hcatalog-partition-values '${biz_date}' \
--hcatalog-storage-stanza 'stored as orc' \
-m 1
注意
- 1、当 -m 参数大于1是后要与 split-by 配合使用,split-by 字段尽量选择int 数值类型
- 2、$CONDITIONS 为sqoop脚本固定书写格式,'' 是转义符号
- 3、--hive-overwrite 再测试时不生效,建议单独写一个 HQL处理前删除对应分区数据