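_tc_free_children_internal() calls talloc_parent_chunk() before it tries to free a child, whenever that child has references. talloc_parent_chunk() walks back through the sibling list to reach the parent, so its cost grows with the number of siblings. If the call can be moved so that it only runs after freeing the child fails, the common case becomes much faster when there are (e.g.) 100,000 siblings, as can happen in Python because of the large number of pytalloc objects.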
Bug #14710 needs this; otherwise its test is just a CPU spinner.
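
To illustrate the idea, here is a minimal sketch in C. It is not the talloc source: struct chunk, parent_chunk(), try_free(), free_eager(), free_lazy(), link_siblings() and the walk_steps counter are all hypothetical names made up for this example. It only assumes what is described above, namely that finding the parent means walking back to the first sibling, and that the parent is only needed when the free attempt fails.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a talloc chunk: only the first
 * sibling in the list carries a valid parent pointer. */
struct chunk {
	struct chunk *parent;
	struct chunk *prev;
	struct chunk *next;
};

/* Counts sibling hops, purely so the cost can be observed. */
static unsigned long walk_steps;

/* Walk back to the first sibling, then read the parent from it.
 * Cost is proportional to the number of preceding siblings. */
static struct chunk *parent_chunk(struct chunk *c)
{
	while (c->prev != NULL) {
		c = c->prev;
		walk_steps++;
	}
	return c->parent;
}

/* Hypothetical free attempt: returns 0 on success, -1 if the free is
 * refused (for example by a destructor). Nothing is actually freed. */
static int try_free(struct chunk *c, int refuse)
{
	(void)c;
	return refuse ? -1 : 0;
}

/* Eager variant: pays for the sibling walk on every call. */
static struct chunk *free_eager(struct chunk *c, int refuse)
{
	struct chunk *p = parent_chunk(c);

	if (try_free(c, refuse) == -1) {
		return p; /* parent only needed on this path */
	}
	return NULL;
}

/* Lazy variant: the sibling walk happens only if the free fails. */
static struct chunk *free_lazy(struct chunk *c, int refuse)
{
	if (try_free(c, refuse) == -1) {
		return parent_chunk(c);
	}
	return NULL;
}

/* Link n chunks into one sibling list under the given parent. */
static void link_siblings(struct chunk *arr, size_t n, struct chunk *parent)
{
	for (size_t i = 0; i < n; i++) {
		arr[i].parent = (i == 0) ? parent : NULL;
		arr[i].prev = (i > 0) ? &arr[i - 1] : NULL;
		arr[i].next = (i + 1 < n) ? &arr[i + 1] : NULL;
	}
}
```

With n siblings, free_eager() on the last sibling always performs n - 1 hops, while free_lazy() performs none unless the free is refused. Whether the real code can defer the lookup this simply depends on details this sketch leaves out, such as what is still valid to inspect after a failed free.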