Use the copy activity to create a file of all BLOB file names in a container

Baker, Rick 0 Reputation points
2024-05-08T21:00:20.2666667+00:00

I have a blob container with approximately 100 image files. I want to get the name of each of those image files and copy the names to a single file.

I have tried building pipelines with the Get Metadata, Set Variable, ForEach, and Copy activities, but I can't get them to work. The same approach works on a container of delimited files, but not on a container of images.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. hossein jalilian 4,040 Reputation points
    2024-05-08T21:10:42.4+00:00

    Thanks for posting your question in the Microsoft Q&A forum.

    You can follow these steps:

    1. Create a new pipeline in Azure Data Factory.
    2. Add a Get Metadata activity to the pipeline. Configure it with the following settings:
      • Field: "Child items"
      • Source dataset: The dataset pointing to your BLOB container
      • Recursive: True
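    3. Add a ForEach activity to the pipeline and set its Items to the Get Metadata output, for example @activity('Get Metadata1').output.childItems (assuming the Get Metadata activity is named "Get Metadata1").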
    4. Inside the For Each loop, add a Copy Data activity.
      • Source dataset: A dataset pointing to a text file in your BLOB container ("file_names.txt")
      • Sink dataset: The same dataset as the source
      • Append data: True
    5. In the Copy Data activity, click on the Source tab, select Add dynamic content, and enter the expression:
         @item().name
    6. Save the pipeline and trigger a run.

    After the pipeline run completes, you should find a file named "file_names.txt" in your BLOB container containing the names of all the image files.
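
As a rough illustration of what the loop over "Child items" works with, the sketch below models the Get Metadata output in plain Python. The sample names and the `names_file_content` helper are made up for illustration; only the `name`/`type` shape of each child item comes from the activity. Sub-folders also show up in that list, so the sketch skips them.

```python
# Hypothetical stand-in for output.childItems of a Get Metadata activity.
# Each entry carries a "name" and a "type" ("File" or "Folder").
child_items = [
    {"name": "cat.jpg", "type": "File"},
    {"name": "dog.png", "type": "File"},
    {"name": "archive", "type": "Folder"},
]


def names_file_content(items):
    """Return one file name per line, skipping sub-folders."""
    names = [item["name"] for item in items if item["type"] == "File"]
    return "\n".join(names)


if __name__ == "__main__":
    # Inside a ForEach, @item().name corresponds to item["name"] here.
    print(names_file_content(child_items))
```

This is only a model of the data, not ADF code; it shows the text the finished file is expected to contain.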


    Please remember to close the thread by upvoting and accepting this answer if it was helpful.


  2. Smaran Thoomu 10,405 Reputation points Microsoft Vendor
    2024-05-13T14:07:20.3266667+00:00

    Hi @Baker, Rick

    Thanks for the question and for using the MS Q&A platform.

    To accomplish your objective, you'll need to begin by copying the list of files as an array. Then, you can utilize a data flow to transform it into a format that lists each file name row by row.

    First, use the Get Metadata activity to retrieve the list of files in your blob container. Then, use a Filter activity to keep only the image files. Next, use a ForEach loop to iterate through the filtered list and add each file name to an array variable with the Append Variable activity.
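
To make the Get Metadata, Filter, and ForEach/Append Variable chain concrete, here is a minimal Python sketch of the same logic. The function names and sample items are hypothetical; the substring check mirrors the `@contains(item().name,'.jpg')` condition in the pipeline JSON further down, which is case-sensitive.

```python
# Hypothetical sample of what Get Metadata returns under childItems.
sample_items = [
    {"name": "a.jpg", "type": "File"},
    {"name": "notes.csv", "type": "File"},
    {"name": "b.jpg", "type": "File"},
    {"name": "c.JPG", "type": "File"},
]


def filter_activity(items, needle=".jpg"):
    """Model of the Filter activity: keep items whose name contains needle."""
    return [item for item in items if needle in item["name"]]


def for_each_append(items):
    """Model of ForEach + Append Variable on an Array variable."""
    filenames = []  # plays the role of the 'filenames' pipeline variable
    for item in items:  # sequential, like "isSequential": true
        filenames.append(item["name"])
    return filenames


def run_pipeline(items):
    return for_each_append(filter_activity(items))


if __name__ == "__main__":
    print(run_pipeline(sample_items))
```

Note that `c.JPG` is dropped in this sketch because the match is case-sensitive.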


    Create a dummy file containing a single value, then use the Copy activity to add the array variable's value to it as an additional column.
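
The dummy file matters because an additional column is attached to every source row, so a one-row source gives a one-row output. The sketch below models that in Python. It assumes that `@string(variables('filenames'))` renders the array as compact JSON text; the helper names and the `dummy` column are hypothetical.

```python
import json


def stringify_array(filenames):
    """Assumed approximation of @string(variables('filenames'))."""
    return json.dumps(filenames, separators=(",", ":"))


def copy_with_additional_column(source_rows, filenames):
    """Model of the Copy activity: one output row per source row.

    Only the additional column is mapped to the sink, where it is
    named "Images" (as in the pipeline's translator mappings).
    """
    image_names = stringify_array(filenames)
    return [{"Images": image_names} for _ in source_rows]


if __name__ == "__main__":
    dummy_rows = [{"dummy": "1"}]  # the single-value dummy file
    print(copy_with_additional_column(dummy_rows, ["a.jpg", "b.jpg"]))
```

With a dummy file of N rows, the same stringified array would be written N times, which is why a single value is enough.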


    Finally, use a data flow to transform the array of file names into a row-wise format. Use the Derived Column transformation to convert the stringified array back into a real array using the expression split(replace(replace(Images,'[',''),']',''),','). Then, use the Flatten transformation to unroll the array into one row per file name.
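
The Derived Column and Flatten steps can be sketched in Python as follows. This is a model of the expression, not data flow code, and the input string is an assumed example. If the stored string still contains the double quotes around each name, the expression above leaves them in place, so the sketch also shows an optional (hypothetical) cleanup step.

```python
def derived_column(images):
    """Model of split(replace(replace(Images,'[',''),']',''),',')."""
    return images.replace("[", "").replace("]", "").split(",")


def flatten(rows, column):
    """Model of the Flatten transformation: one output row per array element."""
    out = []
    for row in rows:
        for value in row[column]:
            out.append({**row, column: value})
    return out


def strip_quotes(rows, column):
    """Optional cleanup (an assumption, not in the original expression)."""
    return [{**row, column: row[column].strip('"')} for row in rows]


if __name__ == "__main__":
    source = [{"Images": '["a.jpg","b.jpg"]'}]  # assumed input string
    with_array = [{"Images": derived_column(r["Images"])} for r in source]
    rows = flatten(with_array, "Images")
    print(rows)
    print(strip_quotes(rows, "Images"))
```

Because the split is on commas, a file name that itself contains a comma would be broken into two rows.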

    Below is the pipeline JSON code:

    {
        "name": "pipeline1",
        "properties": {
            "activities": [
                {
                    "name": "Get Metadata1",
                    "type": "GetMetadata",
                    "dependsOn": [],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "Binary1",
                            "type": "DatasetReference"
                        },
                        "fieldList": [
                            "childItems"
                        ],
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "BinaryReadSettings"
                        }
                    }
                },
                {
                    "name": "Filter1",
                    "type": "Filter",
                    "dependsOn": [
                        {
                            "activity": "Get Metadata1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('Get Metadata1').output.childItems",
                            "type": "Expression"
                        },
                        "condition": {
                            "value": "@contains(item().name,'.jpg')",
                            "type": "Expression"
                        }
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "Filter1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('Filter1').output.value",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Append variable1",
                                "type": "AppendVariable",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "filenames",
                                    "value": {
                                        "value": "@item().name",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                },
                {
                    "name": "Copy data1",
                    "type": "Copy",
                    "dependsOn": [
                        {
                            "activity": "ForEach1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "source": {
                            "type": "DelimitedTextSource",
                            "additionalColumns": [
                                {
                                    "name": "ImageNames",
                                    "value": {
                                        "value": "@string(variables('filenames'))",
                                        "type": "Expression"
                                    }
                                }
                            ],
                            "storeSettings": {
                                "type": "AzureBlobStorageReadSettings",
                                "recursive": true,
                                "enablePartitionDiscovery": false
                            },
                            "formatSettings": {
                                "type": "DelimitedTextReadSettings"
                            }
                        },
                        "sink": {
                            "type": "DelimitedTextSink",
                            "storeSettings": {
                                "type": "AzureBlobStorageWriteSettings",
                                "copyBehavior": "MergeFiles"
                            },
                            "formatSettings": {
                                "type": "DelimitedTextWriteSettings",
                                "quoteAllText": true,
                                "fileExtension": ".txt"
                            }
                        },
                        "enableStaging": false,
                        "translator": {
                            "type": "TabularTranslator",
                            "mappings": [
                                {
                                    "source": {
                                        "name": "ImageNames",
                                        "type": "String"
                                    },
                                    "sink": {
                                        "name": "Images",
                                        "physicalType": "String"
                                    }
                                }
                            ],
                            "typeConversion": true,
                            "typeConversionSettings": {
                                "allowDataTruncation": true,
                                "treatBooleanAsNumber": false
                            }
                        }
                    },
                    "inputs": [
                        {
                            "referenceName": "DelimitedText1",
                            "type": "DatasetReference"
                        }
                    ],
                    "outputs": [
                        {
                            "referenceName": "DelimitedText2",
                            "type": "DatasetReference"
                        }
                    ]
                },
                {
                    "name": "Data flow1",
                    "type": "ExecuteDataFlow",
                    "dependsOn": [
                        {
                            "activity": "Copy data1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "dataflow": {
                            "referenceName": "dataflow1",
                            "type": "DataFlowReference"
                        },
                        "compute": {
                            "coreCount": 8,
                            "computeType": "General"
                        },
                        "traceLevel": "Fine"
                    }
                }
            ],
            "variables": {
                "filenames": {
                    "type": "Array"
                }
            },
            "annotations": []
        }
    }
    
    

    In the example above, the Filter activity keeps only image files in JPG format.
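
If the container holds more than one image format, the filter condition needs widening. The Python sketch below shows one way to think about it: match on the file extension, case-insensitively, against a list of extensions. The extension list and the `is_image` helper are assumptions for illustration; in the pipeline itself this would presumably be expressed with the `endsWith`, `toLower`, and `or` expression functions instead of `contains`.

```python
# Assumed set of image extensions; adjust to what the container holds.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def is_image(name):
    """True if the name ends with a known image extension, ignoring case."""
    return name.lower().endswith(IMAGE_EXTENSIONS)


def contains_jpg(name):
    """Model of the original condition @contains(item().name,'.jpg')."""
    return ".jpg" in name


if __name__ == "__main__":
    for name in ["photo.JPG", "scan.png", "photo.jpg.bak", "notes.csv"]:
        print(name, is_image(name), contains_jpg(name))
```

A suffix check also avoids picking up names such as `photo.jpg.bak`, which a plain substring match would accept.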

    This approach should allow you to create a file containing the names of all the image files in your blob container in a row-wise format.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?".
