Azure Data Factory: Copy data from an on-prem SFTP server to Azure Blob Storage

Sangam Dubey
Jul 14, 2020 · 7 min read

Azure Data Factory is a powerful integration tool that provides many options for working with your data.

In this article we will see how easily we can copy data from an on-prem SFTP server to Azure Blob Storage. Below are the steps for completing this activity.

  1. Integration Runtime creation
  2. Dataset creation
  3. Copy pipeline creation

1. Integration Runtime creation:

Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory to provide the data integration capabilities across different network environments.

Data Factory offers three types of Integration Runtime (IR), and we need to choose one based on our requirements.

Whenever we perform a data operation within the Azure public network, we can use the Azure-provided IR, but when we are dealing with a private network (an on-prem data source or any other private network environment) we have to create a self-hosted IR.

The reason for this is that the IR is the component that acts as a mediator between the Data Factory source and destination. Every request/response travels through the IR, and the IR provides the native compute for any data movement and transformation.

The IR itself is a big topic to discuss, but here I will keep it simple and focus on our main goal: copying data from SFTP to Azure Blob Storage.

Different data copy scenarios and Integration Runtime selection

In our scenario the data source is an on-prem SFTP server, so we need to set up a self-hosted Integration Runtime. To do so we need a Windows machine where we can install the IR. This can be an existing Windows server, or you can launch a new Windows VM in Azure and set up your IR there.

To set up the IR on your Windows machine, follow the steps below:

a. On the Let’s get started page of Azure Data Factory UI, select the Manage tab from the leftmost pane.

b. Select Integration runtimes on the left pane, and then select +New.

c. On the Integration runtime setup page, select Azure, Self-Hosted, and then select Continue.

d. Enter a name for your IR, and select Create.

e. On the Integration runtime setup page, select the link under Option 1 to open the express setup on your computer. Or follow the steps under Option 2 to set up manually.

For Option 1, click the link to download the .exe file on the Windows machine where you want to set up your IR, then run the executable.

For Option 2, copy the key and then select Download and install integration runtime.

Run the installer, paste the key, and click Register.

At the end, click Finish and your setup is done.

You can also refer to the official Microsoft documentation for detailed IR installation steps.

Note: Once your IR is up and running, make sure you can connect to your private network resource from this machine. In our scenario the source is an on-prem SFTP server, so we first need to test that the Windows machine running our IR can reach the SFTP server. This is a very important step, as all requests to the SFTP server will be routed through this machine. So make the necessary changes at the network level to allow this machine to connect to the SFTP server.
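If you prefer to script this step instead of clicking through the UI, the sketch below shows roughly how the same self-hosted IR can be created with the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory and IR names are placeholders I made up for illustration; the IR still has to be registered on your Windows machine using one of the keys printed at the end. The later sketches in this article reuse the same adf_client, rg_name and df_name.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

# Placeholder names -- replace with your own subscription, resource group and factory.
subscription_id = "<subscription-id>"
rg_name = "my-rg"
df_name = "my-data-factory"
ir_name = "MySelfHostedIR"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the self-hosted IR definition inside the factory.
adf_client.integration_runtimes.create_or_update(
    rg_name,
    df_name,
    ir_name,
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(description="IR for on-prem SFTP access")
    ),
)

# Fetch the authentication keys; paste one of these into the IR installer on the Windows machine.
keys = adf_client.integration_runtimes.list_auth_keys(rg_name, df_name, ir_name)
print(keys.auth_key1)
```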

2. Dataset creation:

In our scenario our datasets point to two data stores: the on-prem SFTP server and Azure Blob Storage.

To create these datasets in Azure Data Factory, we first need to create linked services. So we will create two linked services, one for SFTP and one for Azure Blob Storage.

Linked service for SFTP:

a. On the Let’s get started page of Azure Data Factory UI, select the Manage tab from the leftmost pane.

b. On the Linked services page, select +New to create a new linked service.

c. Select SFTP and click continue.

d. Enter the details of your SFTP server and choose your self-hosted IR from the dropdown.

Once all the information is provided, click "Test connection". If your setup is correct, the connection will succeed and you will be able to connect to your SFTP server.
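For reference, here is a rough SDK equivalent of this linked service (reusing adf_client, rg_name and df_name from the IR sketch above). The host, port, user name and linked service name are assumptions, so replace them with your own values.

```python
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    SftpServerLinkedService,
    SecureString,
    IntegrationRuntimeReference,
)

# SFTP linked service with placeholder connection details.
sftp_ls = LinkedServiceResource(
    properties=SftpServerLinkedService(
        host="sftp.mycompany.local",
        port=22,
        authentication_type="Basic",
        user_name="sftpuser",
        password=SecureString(value="<password>"),
        # Route the connection through the self-hosted IR created in step 1.
        connect_via=IntegrationRuntimeReference(
            type="IntegrationRuntimeReference", reference_name="MySelfHostedIR"
        ),
    )
)
adf_client.linked_services.create_or_update(rg_name, df_name, "SftpLinkedService", sftp_ls)
```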

Linked service for Azure Blob:

a. From the Linked services page, select +New to create a new linked service and then select Azure Blob Storage.

b. Enter the details of the Blob Storage account where you need to copy your data and click Test connection to verify the connection works.
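A similar sketch for the Blob Storage linked service, again with a placeholder connection string and linked service name (you could also use a service principal or managed identity instead of an account key):

```python
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureBlobStorageLinkedService,
)

# Blob Storage linked service; replace the connection string with your account's values.
blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    )
)
adf_client.linked_services.create_or_update(rg_name, df_name, "BlobLinkedService", blob_ls)
```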

SFTP DataSet:

a. To create your SFTP dataset, go to the pipeline section and then click Datasets -> New dataset.

b. Select SFTP and click continue.

c. Select the file type for your SFTP source. In my case I am copying .pgp files, which are currently not supported by Azure Data Factory, so I am choosing Binary to copy my files as-is.

d. Finally, give your dataset a name and select the SFTP linked service we created earlier, which connects through the Integration Runtime from step 1.
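In SDK terms, the same Binary dataset over SFTP looks roughly like the sketch below; the folder path and dataset name are made-up examples.

```python
from azure.mgmt.datafactory.models import (
    DatasetResource,
    BinaryDataset,
    SftpLocation,
    LinkedServiceReference,
)

# Binary dataset pointing at a folder on the SFTP server.
sftp_ds = DatasetResource(
    properties=BinaryDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="SftpLinkedService"
        ),
        location=SftpLocation(folder_path="outbound/files"),
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "SftpBinaryDataset", sftp_ds)
```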

Blob Storage DataSet:

In a similar way, we will create a dataset for the Azure Blob Storage account.
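And a corresponding sketch for the Blob Storage dataset, with a placeholder container and folder:

```python
from azure.mgmt.datafactory.models import (
    DatasetResource,
    BinaryDataset,
    AzureBlobStorageLocation,
    LinkedServiceReference,
)

# Binary dataset pointing at the target container/folder in Blob Storage.
blob_ds = DatasetResource(
    properties=BinaryDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="BlobLinkedService"
        ),
        location=AzureBlobStorageLocation(container="landing", folder_path="sftp-files"),
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "BlobBinaryDataset", blob_ds)
```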

3. Data Factory copy pipeline:

Once your datasets are ready, you have a source configured (from where you want to copy your data) and a destination. Now we need to create a copy pipeline that will copy data from the source to the destination.

To create your pipeline, follow the steps below.

a. From the Data Factory pipeline option, click New pipeline.

b. In the right-side panel, give your pipeline a name.

c. From the Activities tab on the left, expand the "Move & transform" option and drag and drop the Copy data activity.

d. Click on your "Copy data" activity and below it you will see the details of the activity. First, in the "General" section, provide general information about your activity such as name and description.

e. Now click on Source and select your source dataset, which is the SFTP dataset. We have multiple options for selecting files from the source. I have selected "Wildcard file path", given the directory name, and am copying all .pgp files from that directory.

f. Now click on Sink and select your Azure Blob Storage dataset.

You are now ready to start your copy.
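If you would rather define the pipeline in code, here is a minimal sketch of the same copy activity with a wildcard source, using the placeholder dataset names assumed earlier; adjust the folder path and file pattern to your environment.

```python
from azure.mgmt.datafactory.models import (
    PipelineResource,
    CopyActivity,
    DatasetReference,
    BinarySource,
    BinarySink,
    SftpReadSettings,
    AzureBlobStorageWriteSettings,
)

# Copy activity reading all .pgp files (wildcard) from SFTP and writing them to blob as-is.
copy_activity = CopyActivity(
    name="CopySftpToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SftpBinaryDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="BlobBinaryDataset")],
    source=BinarySource(
        store_settings=SftpReadSettings(
            recursive=False,
            wildcard_folder_path="outbound/files",
            wildcard_file_name="*.pgp",
        )
    ),
    sink=BinarySink(store_settings=AzureBlobStorageWriteSettings()),
)

adf_client.pipelines.create_or_update(
    rg_name, df_name, "SftpToBlobPipeline", PipelineResource(activities=[copy_activity])
)
```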

Now, to test your pipeline, click the Debug button at the top. If your setup is correct, the pipeline will run successfully.

Once everything is tested and validated, publish your pipeline; you can then create triggers to execute it on a schedule.
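To kick off and monitor a run programmatically instead of from the portal, a small sketch (again using the assumed pipeline name and the client from the earlier sketches) could look like this:

```python
import time

# Start a one-off pipeline run and poll until it finishes.
run = adf_client.pipelines.create_run(rg_name, df_name, "SftpToBlobPipeline", parameters={})
while True:
    pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
    if pipeline_run.status not in ("InProgress", "Queued"):
        break
    time.sleep(15)
print("Pipeline run finished with status:", pipeline_run.status)
```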

Happy Learning :) !!!
