Hi,
While working on [1], I observed that the 'create_list_bounds' function
allocates extra memory that can be avoided. The attached patch removes these
extra memory allocations and also removes the unused variable 'cell'.
In the existing code, create_list_bounds() does the following:
1. Iterates through all the partitions and, for each partition, iterates through the list of datums named 'listdatums'.
   - For each non-null value in 'listdatums', it allocates memory for 'list_value' (of type 'PartitionListValue') and stores the value and index information.
   - Appends 'list_value' to a list named 'non_null_values'.
2. Allocates memory for the 'all_values' variable, which holds the list bounds of all the partitions. The number of entries allocated is the total number of non-null values found in step 1.
3. Iterates through each item of the 'non_null_values' list.
   - For each item, it allocates memory for 'all_values[i]' (of type 'PartitionListValue') and copies the information from 'list_value'.
The patch changes this logic to the following:
1. Calls the function 'get_non_null_count_list_bounds()', which iterates through all the partitions and, for each partition, iterates through its list of datums to count all the non-null bound values.
2. Allocates memory for the 'all_values' variable, which holds the list bounds of all the partitions. The number of entries allocated is the total number of non-null values counted in step 1.
3. Iterates through all the partitions and, for each partition, iterates through the list of datums named 'listdatums'.
   - For each non-null value in 'listdatums', it allocates memory for 'all_values[i]' (of type 'PartitionListValue') and stores the value and index information directly.
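
To illustrate the count-then-fill approach outside of PostgreSQL, here is a minimal standalone sketch. It is not the patch itself: the Demo* types and demo_* functions are hypothetical stand-ins, and it uses malloc() instead of palloc() so it can be compiled on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the PostgreSQL structures (not the real ones). */
typedef struct DemoDatum
{
    long        value;
    bool        isnull;
} DemoDatum;

typedef struct DemoPartition
{
    const DemoDatum *listdatums;    /* bound values of this partition */
    int         ndatums;
} DemoPartition;

typedef struct DemoListValue
{
    int         index;              /* partition the bound belongs to */
    long        value;              /* the bound value itself */
} DemoListValue;

/* Pass 1: count the non-null bound values across all partitions. */
static int
demo_count_non_null(const DemoPartition *parts, int nparts)
{
    int         count = 0;

    for (int i = 0; i < nparts; i++)
        for (int j = 0; j < parts[i].ndatums; j++)
            if (!parts[i].listdatums[j].isnull)
                count++;
    return count;
}

/* Free the first n entries of the array, then the array itself. */
static void
demo_free_values(DemoListValue **values, int n)
{
    for (int i = 0; i < n; i++)
        free(values[i]);
    free(values);
}

/*
 * Pass 2: allocate the pointer array once, then allocate and fill each
 * entry directly. No intermediate list of temporary entries is built.
 * Returns NULL on allocation failure; the caller frees the result with
 * demo_free_values().
 */
static DemoListValue **
demo_collect_values(const DemoPartition *parts, int nparts, int *nvalues)
{
    int         count = demo_count_non_null(parts, nparts);
    int         k = 0;
    DemoListValue **all_values;

    /* Allocate at least one slot so a zero count does not call malloc(0). */
    all_values = malloc(sizeof(DemoListValue *) * (count > 0 ? count : 1));
    if (all_values == NULL)
        return NULL;

    for (int i = 0; i < nparts; i++)
    {
        for (int j = 0; j < parts[i].ndatums; j++)
        {
            if (parts[i].listdatums[j].isnull)
                continue;

            all_values[k] = malloc(sizeof(DemoListValue));
            if (all_values[k] == NULL)
            {
                demo_free_values(all_values, k);
                return NULL;
            }
            all_values[k]->index = i;
            all_values[k]->value = parts[i].listdatums[j].value;
            k++;
        }
    }

    assert(k == count);             /* both passes must agree */
    *nvalues = count;
    return all_values;
}
```

Each non-null bound gets exactly one 'DemoListValue' allocation here, at the cost of walking the datums twice.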
This change removes the extra memory allocations. Consider an example with
10 partitions, each containing 11 bounds, one of which is a NULL value.
Parameter                                  | Existing code   | With patch
-------------------------------------------+-----------------+----------------
Memory allocations of 'PartitionListValue' | 100 + 100 = 200 | 100
Total number of iterations                 | 110 + 100 = 210 | 110 + 110 = 220
As the table shows, the total number of iterations increases slightly (only
when NULL values are present; otherwise it is unchanged), while the number of
memory allocations is halved. Since memory allocations are costly operations,
I think the existing code is worth changing.
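
The figures in the example can be reproduced with a small standalone calculation. This is only a sketch of the arithmetic: the DemoCost type and the demo_cost_* functions are hypothetical names, and it counts only the 'PartitionListValue' allocations and loop iterations described above.

```c
#include <assert.h>

typedef struct DemoCost
{
    int         allocations;    /* 'PartitionListValue' allocations */
    int         iterations;     /* total loop iterations */
} DemoCost;

/* Existing code: one scan of all bounds, then a walk of the non-null list. */
static DemoCost
demo_cost_existing(int nparts, int bounds_per_part, int nulls_per_part)
{
    DemoCost    c;
    int         total;
    int         non_null;

    assert(nulls_per_part <= bounds_per_part);
    total = nparts * bounds_per_part;
    non_null = nparts * (bounds_per_part - nulls_per_part);

    c.allocations = non_null + non_null;    /* list_value, then all_values[i] */
    c.iterations = total + non_null;
    return c;
}

/* With patch: a counting pass and a fill pass, both over all bounds. */
static DemoCost
demo_cost_patched(int nparts, int bounds_per_part, int nulls_per_part)
{
    DemoCost    c;
    int         total;
    int         non_null;

    assert(nulls_per_part <= bounds_per_part);
    total = nparts * bounds_per_part;
    non_null = nparts * (bounds_per_part - nulls_per_part);

    c.allocations = non_null;               /* all_values[i] only */
    c.iterations = total + total;
    return c;
}
```

For 10 partitions with 11 bounds each and one NULL per partition, demo_cost_existing(10, 11, 1) gives 200 allocations and 210 iterations, and demo_cost_patched(10, 11, 1) gives 100 allocations and 220 iterations.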
Please share your thoughts.
Thanks & Regards,
Nitin Jadhav