A Fun Threading Situation With da.Walk

This page was created programmatically. To read the article in its original location, you can go to the link below:
https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/td-p/1696136/jump-to/first-unread-message
If you want to remove this article from our site, please contact us.


So I have some library code that handles loading in file databases and indexing the contained data. The contents are then grouped out into dictionaries, so I iteratively use da.Walk to extract each datatype from the dataset.

This can of course be pretty slow, especially when loading in a database that is on a network fileserver. No problem though, that's something that can be solved pretty simply by creating some threads using concurrent.futures.ThreadPoolExecutor:

from concurrent.futures import ThreadPoolExecutor, as_completed
from arcpy.da import Walk
from pathlib import Path

def stroll(ds: str, dtype: str | None = None):
    """Walk a dataset, filtering on the supplied datatype"""
    paths: list[Path] = []
    # Walk yields (dirpath, dirnames, filenames) tuples, like os.walk
    for root, _, items in Walk(ds, datatype=dtype):
        for itm in items:
            paths.append(Path(root)/itm)
    return paths

def extract_types(ds: str, dtypes: list[str]) -> dict[str, list[Path]]:
    """Extract paths from a dataset grouped by type"""
    data: dict[str, list[Path]] = {}
    # One worker thread per datatype, so every Walk runs concurrently
    with ThreadPoolExecutor(max_workers=len(dtypes)) as executor:
        futures = {executor.submit(stroll, ds, dtype): dtype for dtype in dtypes}
        for future in as_completed(futures):
            data[futures[future]] = future.result()
    return data

 

Seems simple enough: spool up one thread per Walk call and await the results so they can all run concurrently. Let's run it:

>>> extract_types("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [], 'Table': []}

 

Hmmm. There's no output, but there are definitely both Tables and Feature Classes in that gdb... Let's try a synchronous extract method:

def extract_types_sync(ds: str, dtypes: list[str]) -> dict[str, list[Path]]:
    """Extract paths from a dataset grouped by type"""
    return {
        dtype: stroll(ds, dtype)
        for dtype in dtypes
    }

 

And run that:

>>> extract_types_sync("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [Path("My_GDB/FC1"), Path("My_GDB/FC2")], 
'Table': [Path("My_GDB/Table1"), Path("My_GDB/Table2")]}
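As a sanity check on the thread pool pattern itself, here is a rough arcpy-free sketch of the same fan-out, using os.walk as a stand-in for da.Walk. The names walk_suffix and extract_suffixes and the temporary directory layout are made up for illustration, and os.walk is of course not equivalent to da.Walk. If the threaded and sequential results agree here, the empty output above is more likely down to Walk than to the executor code.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def walk_suffix(root: str, suffix: str) -> list[Path]:
    """Collect files under root with the given suffix (stand-in for a filtered Walk)."""
    found: list[Path] = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found.append(Path(dirpath) / name)
    return sorted(found)


def extract_suffixes(root: str, suffixes: list[str]) -> dict[str, list[Path]]:
    """Same fan-out as extract_types: one worker per filter, results keyed by filter."""
    results: dict[str, list[Path]] = {}
    with ThreadPoolExecutor(max_workers=len(suffixes)) as executor:
        futures = {executor.submit(walk_suffix, root, s): s for s in suffixes}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


# Build a throwaway directory with a few empty files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.shp", "b.shp", "c.dbf"):
        (Path(tmp) / name).touch()
    threaded = extract_suffixes(tmp, [".shp", ".dbf"])
    sequential = {s: walk_suffix(tmp, s) for s in (".shp", ".dbf")}
    names = {s: [p.name for p in paths] for s, paths in threaded.items()}
```
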

 

Okay, so there IS data in the database, and Walk is able to find it. Let's try the concurrent version again:

>>> extract_types("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [Path("My_GDB/FC1"), Path("My_GDB/FC2")], 
'Table': [Path("My_GDB/Table1"), Path("My_GDB/Table2")]}

 

So now the concurrent version is able to find the data, but only **after** running a Walk synchronously? This little bug seems to persist across interpreter sessions. So let's see if warming up the Walk function can fix it:

def extract_types(ds: str, dtypes: list[str]) -> dict[str, list[Path]]:
    """Extract paths from a dataset grouped by type"""
    # Warm-up: start one Walk in the calling thread before spawning workers
    for _ in Walk(ds): break
    data: dict[str, list[Path]] = {}
    with ThreadPoolExecutor(max_workers=len(dtypes)) as executor:
        futures = {executor.submit(stroll, ds, dtype): dtype for dtype in dtypes}
        for future in as_completed(futures):
            data[futures[future]] = future.result()
    return data

 

And run it one more time:

>>> extract_types("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [Path("My_GDB/FC1"), Path("My_GDB/FC2")], 
'Table': [Path("My_GDB/Table1"), Path("My_GDB/Table2")]}

 

Now it works! This is really odd though. I'm guessing that da.Walk relies on some global state that isn't initialized in a sub thread and must be initialized in the main thread. This is definitely odd behavior, and I figured that I'd share it here in case anyone else happens to run into it. I'm also curious how this pattern will work when 3.14 is adopted and we have access to the InterpreterPoolExecutor. Will the arcpy global state need to be shared for functions as simple as da.Walk?
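To make that guess concrete, here is a toy model of the behavior in pure Python. It is only an illustration of the hypothesis, not how arcpy is actually implemented: fake_walk, walk_in_worker, _STATE and _OWNER_THREAD are all invented names, and the "state" is just a flag that can only be set from the thread that loaded the module (normally the main thread).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for process-wide state; NOT arcpy's real internals.
_OWNER_THREAD = threading.get_ident()  # thread that loaded this module
_STATE = {"ready": False}


def fake_walk(items: list[str]) -> list[str]:
    """Return items only once the shared state has been set up."""
    if not _STATE["ready"]:
        if threading.get_ident() != _OWNER_THREAD:
            # Worker thread and nothing set up yet: silently yield nothing.
            return []
        # First call from the owning thread sets up the shared state.
        _STATE["ready"] = True
    return list(items)


def walk_in_worker(items: list[str]) -> list[str]:
    """Run fake_walk on a pool thread and wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(fake_walk, items).result()


cold = walk_in_worker(["FC1", "FC2"])  # worker call before any warm-up
fake_walk([])                          # warm-up call from the owning thread
warm = walk_in_worker(["FC1", "FC2"])  # same worker call after warm-up
```

In this model the cold call comes back empty and the warm call returns both items, which matches the symptoms above. Whether the real cause looks anything like this is an open question.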

