Sort
The Sort component sorts the data in ascending or descending order according to the specified key.
By default, sorting is done in ascending order. To sort the flow in descending order, select the descending radio button.
A value for the max-core parameter must be specified. Although there is a default value, it is recommended to use a $ variable defined in the system [$MAX_CORE, $MAX_CORE_HALF, etc.].
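The ordering behaviour described above can be illustrated with a small Python sketch. This is only an analogy, not Ab Initio code: the record layout, the `sort_records` helper and its `descending` flag are assumptions made for illustration, and the max-core memory limit is not modelled.

```python
from operator import itemgetter

# Hypothetical sample records; "id" plays the role of the sort key.
records = [
    {"id": 3, "name": "carol"},
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]

def sort_records(records, key, descending=False):
    """Return a new list ordered by the given key field (ascending by default)."""
    return sorted(records, key=itemgetter(key), reverse=descending)

ascending = sort_records(records, "id")
descending = sort_records(records, "id", descending=True)
```

Here the `descending` flag stands in for the descending radio button: leaving it at its default gives ascending order.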
Sort within groups
Sort within Groups refines the sorting of data records already sorted according to one key specifier: it sorts the records within the groups formed by the first sort according to a second key specifier.
The parameters include two sort keys:
1) major key: the main key on which the records are already sorted.
2) minor key: the key by which the records are re-sorted within each major key group.
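The relationship between the two keys can be sketched in Python as follows. This is a minimal illustration and not the component's implementation: the function name `sort_within_groups` and the sample fields `dept` and `salary` are hypothetical, and the input is assumed to be already ordered by the major key.

```python
from itertools import groupby
from operator import itemgetter

def sort_within_groups(records, major_key, minor_key):
    """Re-sort each run of equal major-key records by the minor key.

    Assumes the input is already ordered by major_key, so records with the
    same major-key value are adjacent.
    """
    result = []
    for _, group in groupby(records, key=itemgetter(major_key)):
        result.extend(sorted(group, key=itemgetter(minor_key)))
    return result

# Already sorted by "dept" (major key), but not by "salary" (minor key).
records = [
    {"dept": "A", "salary": 300},
    {"dept": "A", "salary": 100},
    {"dept": "B", "salary": 250},
    {"dept": "B", "salary": 200},
]
result = sort_within_groups(records, "dept", "salary")
```

Only one major-key group needs to be held in memory at a time, which is why this is cheaper than sorting the whole flow again on both keys.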
Partition by key and sort
As mentioned previously, a Partition by Key component is generally followed by a Sort component. If the partitioning key and the sorting key are the same, the Partition by Key and Sort component should be used instead of those two components.
In this component too, the key and max-core value must be specified, following the same rules as for the Sort component.
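The combined step can be pictured with the Python sketch below. It is an assumption-laden analogy: the hash function (`zlib.crc32`), the `n_partitions` argument and the record fields are chosen only for illustration and say nothing about how the real component assigns records to partitions.

```python
import zlib
from operator import itemgetter

def partition_by_key_and_sort(records, key, n_partitions):
    """Hash-partition records on `key`, then sort each partition on the same key."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        # A deterministic hash sends equal key values to the same partition.
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        partitions[h % n_partitions].append(rec)
    return [sorted(p, key=itemgetter(key)) for p in partitions]

records = [
    {"cust": "c2", "amt": 5},
    {"cust": "c1", "amt": 7},
    {"cust": "c2", "amt": 1},
    {"cust": "c3", "amt": 9},
]
parts = partition_by_key_and_sort(records, "cust", n_partitions=2)
```

Because partitioning and sorting use the same key, every record with a given key value ends up in one partition, and that partition is ordered by the key.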
Dedup Sorted
Dedup Sorted separates one specified data record in each group of data records from the rest of the records in the group, i.e. it removes duplicate records from the flow according to the specified key.
The component can remove duplicate records from a flow in three ways, selected through the keep parameter:
1) first: This is the default value. The first record of each set of duplicates (i.e. records with the same key value) is kept.
2) last: The last record of each set of duplicates (i.e. records with the same key value) is kept.
3) unique-only: All records that have duplicates are removed, so only records whose key value occurs exactly once are kept.
The picture above shows where to set the different parameters of the Dedup Sorted component.
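The three keep options can be compared side by side with a short Python sketch. This is an illustration under assumptions, not the component itself: the `dedup_sorted` function and the sample records are hypothetical, and the input is assumed to be already sorted on the key so that duplicates are adjacent.

```python
from itertools import groupby
from operator import itemgetter

def dedup_sorted(records, key, keep="first"):
    """Remove duplicates from key-sorted records according to `keep`."""
    result = []
    for _, group in groupby(records, key=itemgetter(key)):
        group = list(group)
        if keep == "first":
            result.append(group[0])
        elif keep == "last":
            result.append(group[-1])
        elif keep == "unique-only":
            # Keep a record only if no other record shares its key value.
            if len(group) == 1:
                result.append(group[0])
        else:
            raise ValueError(f"unknown keep option: {keep!r}")
    return result

records = [
    {"id": 1, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "c"},
    {"id": 3, "v": "d"},
    {"id": 3, "v": "e"},
]
kept_first = dedup_sorted(records, "id")
kept_last = dedup_sorted(records, "id", keep="last")
kept_unique = dedup_sorted(records, "id", keep="unique-only")
```

With this input, first and last each keep one record per id, while unique-only keeps only the record with id 2, because ids 1 and 3 both have duplicates.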
Checkpointed Sort
Checkpointed Sort sorts and merges data records according to the specified key, inserting a checkpoint between the sorting and merging phases.
It is a subgraph containing two components, “Partial Sort” and “Merge Runs”. Neither of these components works on its own.
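The two phases can be sketched in Python as a sort of fixed-size runs followed by a merge. This is a conceptual analogy only: the function names mirror the two component names, but the `run_size` parameter, the in-memory lists and the place where the checkpoint is marked are assumptions, not a description of the real subgraph's internals.

```python
import heapq
from operator import itemgetter

def partial_sort(records, key, run_size):
    """Phase 1: cut the input into runs and sort each run independently."""
    return [
        sorted(records[i:i + run_size], key=itemgetter(key))
        for i in range(0, len(records), run_size)
    ]

def merge_runs(runs, key):
    """Phase 2: merge the already-sorted runs into one sorted sequence."""
    return list(heapq.merge(*runs, key=itemgetter(key)))

records = [{"id": n} for n in (5, 2, 9, 1, 7, 3)]

runs = partial_sort(records, "id", run_size=2)
# A checkpoint would sit here: the sorted runs are the saved intermediate
# state, so a failure during merging would not require sorting again.
merged = merge_runs(runs, "id")
```

The sketch also shows why neither phase is useful alone: the runs are only sorted internally, and the merge step only produces a correct result when its inputs are already sorted runs.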