Put Blob REST API Content-Length error

Jez Walters 20 Reputation points
2024-05-07T10:06:05.8133333+00:00

I'm trying to write the XML response obtained from a REST API GET request to a file in Azure Blob Storage, using Azure Data Factory.

I'd prefer to use the Copy activity to achieve this, but it doesn't support XML sinks. I've therefore been forced into using separate Web activities, to get the response and then write it to Blob storage.

I have 3 activities in my pipeline:

  1. A Web activity, to issue my REST API GET request
  2. A Set Variable activity, to assign the response obtained from step 1 to the variable called "Body"
  3. A Web activity, to issue a REST API PUT request (using the "Body" variable from step 2)
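For readers who want to see the shape of step 3 outside Data Factory, here is a minimal sketch of a Put Blob request built with Python's standard library. It is not the pipeline's actual configuration: the container name, blob name, SAS placeholder and `x-ms-version` value are assumptions for illustration, and the request is only constructed, never sent.

```python
import urllib.request


def build_put_blob_request(blob_url: str, body_text: str) -> urllib.request.Request:
    """Build (but do not send) a Put Blob request carrying an XML payload."""
    # The body is sent as bytes; the text is encoded once and reused below.
    body = body_text.encode("utf-8")
    headers = {
        "x-ms-blob-type": "BlockBlob",      # required by Put Blob
        "x-ms-version": "2021-08-06",       # assumed service version
        "Content-Type": "application/xml",
        "Content-Length": str(len(body)),   # length of the encoded bytes
    }
    return urllib.request.Request(blob_url, data=body, headers=headers, method="PUT")


# Hypothetical blob URL; authentication (for example a SAS token) is assumed.
request = build_put_blob_request(
    "https://mystorageaccount.blob.core.windows.net/mycontainer/response.xml?sas-token-placeholder",
    "<root><item>1</item></root>",
)
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)`, which is left out here so the sketch runs without a storage account.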

This approach works fine for small responses, but when I try the same with larger responses (around 1 MB), I get the following error:

Error calling the endpoint 'https://mystorageaccount.blob.core.windows.net'. Response status code: 'NA - Unknown'. More details: Exception message: 'NA - Unknown [ClientSideException] Bytes to be written to the stream exceed the Content-Length bytes size specified.'. Request didn't reach the server from the client. This could happen because of an underlying issue such as network connectivity, a DNS failure, a server certificate validation or a timeout

What am I doing wrong, or is there a better way of achieving my goal (but still using Azure Data Factory)?

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

6 answers

  1. Jez Walters 20 Reputation points
    2024-05-08T11:16:16.8766667+00:00

    I may be misunderstanding you or doing something wrong, but I've just created a new Azure Blob Storage Dataset (selecting the XML format), and I've also created a new Azure Blob Storage Linked Service for the Dataset to connect through. However, the new Dataset isn't available to select as a Sink for the Copy activity.

    The following article states that the XML format "is supported as source but not sink" in ADF pipelines:

    https://learn.microsoft.com/en-us/azure/data-factory/format-xml#xml-connector-behavior

    Please bear in mind that I'm not trying to use ADF to shred the XML in the response to my REST API GET request. I only want to store the response "as-is", as the shredding will be performed downstream (by a stored procedure in my Azure SQL Database).


  2. Jez Walters 20 Reputation points
    2024-05-08T11:22:32.8433333+00:00

    Given how I've configured my REST API "Put Blob" PUT request, how can the number of bytes in the Body (to be written to the stream) exceed the specified Content-Length in the Header, if they're both derived from the same ADF pipeline variable?

    Please can you also cite where the Content-Length limit you mentioned is documented?
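One way a body and a Content-Length derived from the same string can still disagree, offered here only as a possibility and not confirmed anywhere in this thread: if the header value is a character count while the body is transmitted as UTF-8 bytes, any non-ASCII character makes the byte count larger than the character count. The sketch below illustrates that difference in Python; the sample XML and the helper name are hypothetical.

```python
from typing import Tuple


def char_and_byte_lengths(text: str, encoding: str = "utf-8") -> Tuple[int, int]:
    """Return (number of characters, number of encoded bytes) for the text."""
    return len(text), len(text.encode(encoding))


# Hypothetical sample containing one non-ASCII character (U+00EB).
sample = "<name>Zo\u00eb</name>"
chars, nbytes = char_and_byte_lengths(sample)
print(chars, nbytes)  # 16 characters, 17 bytes in UTF-8
```

A larger response is simply more likely to contain at least one such character, which would be consistent with small payloads succeeding and larger ones failing.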


  3. Jez Walters 20 Reputation points
    2024-05-08T11:33:30.3733333+00:00

    I've tried increasing the timeout for the REST API "Put Blob" PUT request, but I'm still getting the same error as previously.

    Although it is technically possible for me to chop the XML response from my REST API GET request into smaller chunks, this isn't straightforward, as each file still needs to be well-formed XML.


  4. Anand Prakash Yadav 6,390 Reputation points Microsoft Vendor
    2024-05-09T10:31:33.0966667+00:00

    Hello Jez Walters

    Thank you for posting your query here!

    As per the documentation, XML format is supported as a source but not as a sink in ADF pipelines. Regarding the Content-Length limit, the documentation on Azure Blob Storage does not specify a maximum Content-Length, but it does mention that the maximum block size for a BlockBlob is 100 MB, and the maximum size for a BlockBlob is approximately 4.75 TB.

    To address the issue of the Content-Length limit, you may need to split the XML response into smaller chunks before writing them to Azure Blob Storage. Each chunk can still be well-formed XML.

    If your XML response is within these limits, the issue might not be with Azure Blob Storage's limitations but rather with how the Content-Length is being computed or how the data is being streamed during the PUT operation.

    Unfortunately, without direct access to your Azure Data Factory configuration and pipeline code, it's challenging to provide a specific solution. If you have already reviewed your pipeline configuration, including any mappings and transformations applied to the response data, and confirmed that the Content-Length header is calculated accurately, a deeper investigation into the issue may be needed.

    If the issue persists, we will require some details from your end. I have requested the same in the 'Private Message' section. Kindly check it and get back to us.

    I hope this helps! Please let me know if you have any other questions or need further clarification.


  5. Jez Walters 20 Reputation points
    2024-05-09T11:26:22.7233333+00:00

    Thanks for your response, but I'm nowhere near the Blob size limits you've mentioned, as I'm only trying to write around 1 MB of XML data to Blob storage.

    Although the problem appears to be related to the size of the Blob (as I can successfully write smaller blobs), it isn't related to me exceeding the maximum Blob size. It may be an issue with how the Blob is being streamed (which I have no control over) - although this is only a wild guess at this stage.

    Splitting the XML data I'm receiving from my REST API GET request into smaller chunks isn't trivial. I would need to ensure that each chunk is well-formed, so it wouldn't be a simple matter of cutting the XML text after some arbitrary number of characters. I also note that the chunk size I would need isn't documented anywhere, so I don't know when this would even be required (apart from noting failures).
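To illustrate why splitting has to follow the element structure instead of a character offset, here is a minimal Python sketch that groups the root's child elements into chunks, each wrapped in a copy of the root element so it parses on its own. It assumes the response is a single root containing repeated child elements; the function name and chunk size are hypothetical, and namespaces, comments and text placed directly under the root are not preserved faithfully.

```python
import xml.etree.ElementTree as ET
from typing import List


def split_xml_by_children(xml_text: str, children_per_chunk: int) -> List[str]:
    """Split an XML document into well-formed chunks of root-level children."""
    root = ET.fromstring(xml_text)
    children = list(root)
    chunks = []
    for start in range(0, len(children), children_per_chunk):
        # Each chunk gets its own root with the same tag and attributes.
        part = ET.Element(root.tag, dict(root.attrib))
        part.extend(children[start:start + children_per_chunk])
        chunks.append(ET.tostring(part, encoding="unicode"))
    return chunks


chunks = split_xml_by_children("<items><i>1</i><i>2</i><i>3</i></items>", 2)
for chunk in chunks:
    print(chunk)
```

Because the split is by element count and not by size, a suitable `children_per_chunk` would still have to be found by experiment, which matches the point above that the required chunk size is not documented.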

    As I've already commented, I don't see how the Body of the REST API PUT request and the value of the Content-Length Header can't match, as they're both derived from the same pipeline variable (see my first follow-up post above).

    As I see it, this is a very simple scenario: issue a REST API GET request and write the XML response to Blob storage. There are no mappings or transformations in my pipeline, so this too is not the source of my problem.
