I am developing a Spring Boot v2.2.5.RELEASE and Spring Batch example. In this example, I am reading 5 million records with JdbcPagingItemReader from a Postgres system in one data center and writing them into MongoDB in another data center.
This migration is too slow, and I need to improve the performance of this batch job. I am not sure how to use partitioning, because the table's primary key holds UUID values, so ColumnRangePartitioner is not an option. Is there a best way to achieve this?
Approach 1:
@Bean
public JdbcPagingItemReader<Customer> customerPagingItemReader() {
    // reading database records using JDBC in a paging fashion
    JdbcPagingItemReader<Customer> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(this.dataSource);
    reader.setFetchSize(1000);
    // the default page size is 10; align it with the fetch size so each page is one query
    reader.setPageSize(1000);
    reader.setRowMapper(new CustomerRowMapper());

    // sort keys (paging requires a unique, ordered key)
    Map<String, Order> sortKeys = new HashMap<>();
    sortKeys.put("cust_id", Order.ASCENDING);

    // POSTGRES implementation of a PagingQueryProvider using database-specific features
    PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
    queryProvider.setSelectClause("*");
    queryProvider.setFromClause("from customer");
    queryProvider.setSortKeys(sortKeys);

    reader.setQueryProvider(queryProvider);
    return reader;
}
Then for the Mongo writer, I am using a custom writer built on Spring Data Mongo:
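For reference, it is roughly like the sketch below; the bean name customerWriter, the "customer" collection name, and the injected MongoOperations are placeholders rather than my exact code:

import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.data.mongodb.core.MongoOperations;

@Bean
public ItemWriter<Customer> customerWriter() {
    // this.mongoOperations is assumed to be injected into the configuration class
    return items -> {
        // insert the whole chunk in one batch instead of saving item by item
        this.mongoOperations.insert(items, "customer");
    };
}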
Job details:
@Bean
public Job multithreadedJob() {
    return this.jobBuilderFactory.get("multithreadedJob")
            .start(step1())
            .build();
}
@Bean
public Step step1() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4);
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();
    return this.stepBuilderFactory.get("step1")
            .<Customer, Customer>chunk(100)
            .reader(customerPagingItemReader())
            .writer(customerWriter()) // the custom Spring Data Mongo writer
            .taskExecutor(taskExecutor)
            .build();
}
Approach 2: Would AsyncItemProcessor and AsyncItemWriter be a better option, given that I would still have to read with the same JdbcPagingItemReader?
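What I have in mind is roughly the sketch below (AsyncItemProcessor and AsyncItemWriter come from spring-batch-integration; the pass-through delegate, the thread name prefix, and the asyncStep bean are my assumptions, untested, inside the same configuration class):

import java.util.concurrent.Future;

import org.springframework.batch.core.Step;
import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public AsyncItemProcessor<Customer, Customer> asyncItemProcessor() {
    AsyncItemProcessor<Customer, Customer> asyncProcessor = new AsyncItemProcessor<>();
    // pass-through delegate: there is no transformation, the goal is only to
    // hand each item off to a separate thread before writing
    asyncProcessor.setDelegate(item -> item);
    asyncProcessor.setTaskExecutor(new SimpleAsyncTaskExecutor("async-"));
    return asyncProcessor;
}

@Bean
public AsyncItemWriter<Customer> asyncItemWriter() {
    AsyncItemWriter<Customer> asyncWriter = new AsyncItemWriter<>();
    asyncWriter.setDelegate(customerWriter()); // the custom Spring Data Mongo writer
    return asyncWriter;
}

@Bean
public Step asyncStep() {
    // note that the output type of the chunk becomes Future<Customer>
    return this.stepBuilderFactory.get("asyncStep")
            .<Customer, Future<Customer>>chunk(100)
            .reader(customerPagingItemReader())
            .processor(asyncItemProcessor())
            .writer(asyncItemWriter())
            .build();
}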
Approach 3: partitioning. How can I use it when the PK holds UUID values?
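The only idea I have come up with is to partition on a hash of the UUID instead of a range, for example with Postgres's hashtext() function and a modulo. This is an untested sketch; hashtext(), the partitionIndex/gridSize context keys, and the step-scoped partitionedReader variant are all my assumptions:

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class UuidHashPartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // no key ranges: each partition is identified only by its bucket index
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("partitionIndex", i);
            context.putInt("gridSize", gridSize);
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}

Each worker step would then use a step-scoped copy of the reader whose where clause pins it to one hash bucket:

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.support.PostgresPagingQueryProvider;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;

@Bean
@StepScope
public JdbcPagingItemReader<Customer> partitionedReader(
        @Value("#{stepExecutionContext['partitionIndex']}") Integer partitionIndex,
        @Value("#{stepExecutionContext['gridSize']}") Integer gridSize) {
    JdbcPagingItemReader<Customer> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(this.dataSource);
    reader.setFetchSize(1000);
    reader.setPageSize(1000);
    reader.setRowMapper(new CustomerRowMapper());

    Map<String, Order> sortKeys = new HashMap<>();
    sortKeys.put("cust_id", Order.ASCENDING);

    PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
    queryProvider.setSelectClause("*");
    queryProvider.setFromClause("from customer");
    // hashtext() can return negative values, hence abs(); every row lands in
    // exactly one of the gridSize buckets
    queryProvider.setWhereClause(
            "where mod(abs(hashtext(cust_id::text)), " + gridSize + ") = " + partitionIndex);
    queryProvider.setSortKeys(sortKeys);

    reader.setQueryProvider(queryProvider);
    return reader;
}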