AB INITIO TUTORIALS


Ab Initio

The term partition has two meanings: 1. A partition is a file that is a portion of a multifile. 2. A partition is a segment of a parallel computation. To partition data is to divide it into segments so that the segments can be processed in parallel. Some components partition data. There are a number of partition components, described below.


Partition by key

Partition by Key reads records from the in port and distributes data records to its output flow partitions according to key values.



The key must be specified in the key parameter.

A Partition by Key component is generally followed by a Sort component (discussed later).

See the example below



[In the above example, the sort parameter is used in the Join component because its input must be sorted.]
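As a rough illustration of key-based partitioning, the following Python sketch sends each record to a partition chosen by hashing its key value. It is not Ab Initio code: the function name partition_by_key, the use of CRC32 as the hash, and the sample field cust_id are all assumptions made for this example, and the hash used by the real component is not stated in this tutorial.

```python
import zlib


def partition_by_key(records, key, num_partitions):
    """Group records into num_partitions lists by hashing the key field."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # CRC32 is a stand-in hash (assumption); it gives the same value on every run.
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        partitions[h % num_partitions].append(rec)
    return partitions


# Hypothetical sample data: key "A" occurs three times.
records = [{"cust_id": c, "amount": a}
           for c, a in [("A", 10), ("B", 20), ("A", 30), ("C", 40), ("A", 50)]]
parts = partition_by_key(records, "cust_id", 3)
```

In this sketch, records with the same key value always land in the same partition, so a frequent key makes its partition larger than the others.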


Partition by Round Robin

Partition by Round Robin distributes blocks of data records evenly to each output flow in round-robin fashion. A partitioning key is not required.
The difference between Partition by Key and Partition by Round Robin is that the former may not distribute data uniformly across all partitions of a multifile system, whereas the latter does.
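The round-robin behavior can be sketched in Python as follows. This is only an illustration: the function name round_robin and the block_size argument are hypothetical names chosen for the example, not a reproduction of the component's parameters.

```python
def round_robin(records, num_partitions, block_size=1):
    """Deal consecutive blocks of block_size records to the partitions in turn."""
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        block_index = i // block_size          # which block this record belongs to
        partitions[block_index % num_partitions].append(rec)
    return partitions


one_by_one = round_robin(list(range(6)), 3)               # [[0, 3], [1, 4], [2, 5]]
in_blocks = round_robin(list(range(8)), 2, block_size=2)  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```

Because the target partition depends only on a record's position in the input, the partition sizes differ by at most one block, whatever the key values are.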

Partition by Expression

Partition by Expression distributes data records to its output flow partitions according to a specified expression.



The required expression must be specified in the function parameter.
For example, the expression
((next_in_sequence()*number_of_partition() + this_partition())/number_of_partition())/1000
distributes the records in blocks of 1000 records in round-robin fashion across all partitions.
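To make the arithmetic of the block expression easier to follow, here is a small Python sketch of it. It rests on several assumptions that this tutorial does not confirm: the divisions are integer divisions, the sequence number counts the records within one partition, and the result is wrapped modulo the number of output partitions. The function name block_partition is hypothetical.

```python
def block_partition(seq, this_partition, num_partitions, block=1000):
    """Mirror ((seq*N + p)/N)/block using integer division (assumption)."""
    # Because p < N, (seq*N + p) // N is simply seq again.
    recovered_seq = (seq * num_partitions + this_partition) // num_partitions
    block_index = recovered_seq // block       # changes once every 'block' records
    # Assumption: the expression result is wrapped to a valid partition number.
    return block_index % num_partitions


first = block_partition(999, 3, 4)     # still in block 0
second = block_partition(1000, 0, 4)   # first record of block 1
wrapped = block_partition(4000, 2, 4)  # block 4 wraps back to partition 0
```

Under these assumptions the target partition changes once every 1000 records and cycles through the partitions in order.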
As another example, the expression
if (record_sub_typ=="cg1") 0
else if (record_sub_typ=="cg2") 1
else 2
sends all records whose record_sub_typ value is "cg1" to flow 0, all records whose record_sub_typ value is "cg2" to flow 1, and the rest of the records to flow 2.
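The same routing rule can be written as a short Python sketch. The field name record_sub_typ and the values "cg1" and "cg2" come from the example above; the function names and the sample records are hypothetical. The sketch follows the description, which sends the remaining records to flow 2.

```python
def flow_for(record):
    """Return the output flow number for one record."""
    sub_type = record["record_sub_typ"]
    if sub_type == "cg1":
        return 0
    elif sub_type == "cg2":
        return 1
    else:
        return 2  # everything else goes to the last of the three flows


def partition_by_expression(records, expression, num_flows):
    """Place each record in the flow selected by the expression."""
    flows = [[] for _ in range(num_flows)]
    for rec in records:
        flows[expression(rec)].append(rec)
    return flows


# Hypothetical sample records.
sample = [{"record_sub_typ": t} for t in ["cg1", "cg2", "xx", "cg1"]]
flows = partition_by_expression(sample, flow_for, 3)
```

With three output flows, every value the expression can return (0, 1 or 2) names an existing flow.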


Partition by Range

Partition by Range distributes data records to its output flow partitions according to the ranges of key values specified for each partition. This component is not frequently used.
When Partition by Range is used together with Find Splitters:
Use the same key specifier for both components.
Make the number of partitions on the flow connected to the out port of Partition by Range the same as the value (n) in the num_partitions parameter of Find Splitters.

This component:
1. Reads splitter records from the split port, and assumes that these records are sorted according to the key parameter.
2. Determines whether the number of flows connected to the out port is equal to n (where n-1 represents the number of splitter records). If not, Partition by Range writes an error message and stops the execution of the graph.
3. Reads data records from the flows connected to the in port in arbitrary order.
4. Distributes the data records to the flows connected to the out port according to the values of the key field(s), as follows:
a) Assigns records with key values less than or equal to the first splitter record to the first output flow.
b) Assigns records with key values greater than the first splitter record, but less than or equal to the second splitter record to the second output flow, and so on.
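The steps above can be sketched in Python with a binary search over the splitter values. This is an illustration only: the function name partition_by_range, its arguments, and the sample data are hypothetical, and a ValueError stands in for the error message that stops the graph.

```python
import bisect


def partition_by_range(records, key, splitters, num_flows):
    """Route records to flows by comparing the key with sorted splitter values."""
    # n output flows require n-1 splitters; otherwise stop with an error.
    if num_flows != len(splitters) + 1:
        raise ValueError("number of output flows must be number of splitters + 1")
    flows = [[] for _ in range(num_flows)]
    for rec in records:
        # splitters is assumed to be sorted already, as the component assumes.
        # bisect_left returns the first index i with rec[key] <= splitters[i],
        # or len(splitters) when the key is greater than every splitter.
        flows[bisect.bisect_left(splitters, rec[key])].append(rec)
    return flows


# Hypothetical data: two splitters define three ranges.
data = [{"k": v} for v in [5, 10, 11, 20, 21]]
ranges = partition_by_range(data, "k", [10, 20], 3)
keys_per_flow = [[r["k"] for r in flow] for flow in ranges]  # [[5, 10], [11, 20], [21]]
```

A key equal to a splitter value stays in the lower range, which matches the "less than or equal to" rule in a) and b).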



Partition with Load Balance

Partition with Load Balance distributes data records to its output flow partitions, writing more records to the flow partitions that consume records faster. This component is not frequently used.
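One way to picture this behavior is a small simulation in which each record goes to whichever output flow becomes ready first. The sketch below is a toy model, not the component's actual mechanism: the function name, the cost_per_record list (time each consumer needs per record), and the numbers are all assumptions.

```python
import heapq


def partition_with_load_balance(num_records, cost_per_record):
    """Count how many records each flow receives when faster consumers free up sooner."""
    # Heap of (time at which the flow can accept its next record, flow index).
    ready = [(0, i) for i in range(len(cost_per_record))]
    heapq.heapify(ready)
    counts = [0] * len(cost_per_record)
    for _ in range(num_records):
        t, i = heapq.heappop(ready)            # flow that is ready earliest
        counts[i] += 1
        heapq.heappush(ready, (t + cost_per_record[i], i))
    return counts


# Flow 0 consumes a record in 1 time unit, flow 1 in 2, flow 2 in 4 (hypothetical).
counts = partition_with_load_balance(700, [1, 2, 4])
```

In this model the fastest consumer receives the most records and the slowest the fewest, while every record is still delivered exactly once.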
