How and Why to Add a Source Code Repository to Azure Data Factory
For developers, a source code repository is very beneficial. It keeps a history of all your changes, helps you manage tasks and branches, lets you share code with a team and, simply put, keeps your work in a safe place.
In this post, I’ll tell you why you should connect your Azure Data Factories to a source code repository, and I’ll demo how to do so:
- To do this, I’ll start in my data factory inside my Azure portal.
- When I go into Author & Monitor, I can either:
  - set up a code repository from the landing page, or
  - go directly into my data factory, open the menu in the upper left-hand corner and select ‘Set up code repository’.
- One thing to note: Data Factory has supported code repositories for a while, but with the recent release of Data Flows it now supports GitHub as well (previously it was only Azure DevOps).
- So, I select my GitHub account and fill in the information. If you’re doing this for the first time, you’ll be prompted to log in to your GitHub account. In my case, I’ve already connected it, so Data Factory already knows about my repositories.
- Next, I select my repository name and choose my existing playground branch.
- One field asks ‘Branch to import resources into:’. If I’m importing resources, I can select an existing branch or create a new one. For this demo, I pick my playground branch.
- Before I hit Save, notice that the left-hand side shows zero pipelines, one data set and zero data flows. When I connect to my playground branch, Data Factory brings in everything I’ve previously saved or checked into that branch. Now I have 6 pipelines, 23 data sets and 4 data flows.
- Another nice benefit of source control shows up when I add a new data set. I select the SQL Server I was previously connected to, leave the defaults for now, and connect.
- I then select one of the tables, the Product Category table.
- At the top, you’ll see the options Save All and Publish. Save All saves the changes across all open tabs, and you can tell when a data set, pipeline or data flow has changed because a star appears next to its name.
- Now, instead of publishing every time you’re doing development, you can simply save your changes to the branch. Rather than publishing the entire pipeline, which runs error checking and requires the pipeline to be in good standing, you can save partway through. This is a huge advantage, because publishing the entire pipeline every time can cause challenges and isn’t efficient for development.
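If you prefer to script the same setup, the fields in the Git configuration form roughly correspond to the `repoConfiguration` property on the factory’s ARM resource. The Python sketch below only builds that JSON and deploys nothing; the factory name, region, GitHub account and repository name are hypothetical placeholders, and you should check the property names against the current Microsoft.DataFactory ARM reference before relying on them.

```python
import json


def github_repo_configuration(account_name: str, repository_name: str,
                              collaboration_branch: str, root_folder: str = "/") -> dict:
    """Build a GitHub repoConfiguration block for a Data Factory ARM resource (sketch)."""
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account_name,                  # GitHub user or organization
        "repositoryName": repository_name,            # repository picked in the form
        "collaborationBranch": collaboration_branch,  # e.g. the playground branch
        "rootFolder": root_folder,                    # folder holding the factory's JSON files
    }


# Hypothetical factory definition; every name and the region are placeholders.
factory = {
    "type": "Microsoft.DataFactory/factories",
    "apiVersion": "2018-06-01",
    "name": "my-data-factory",
    "location": "eastus",
    "properties": {
        "repoConfiguration": github_repo_configuration(
            account_name="my-github-account",
            repository_name="adf-playground",
            collaboration_branch="playground",
        )
    },
}

# Print the resource as it might appear inside an ARM template's "resources" array.
print(json.dumps(factory, indent=2))
```

An Azure DevOps repository would use a different configuration type (`FactoryVSTSConfiguration`) with extra fields such as the project name, so treat this as a GitHub-only illustration.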
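Because saved changes land in the branch as JSON files, you can also inspect a local clone of the repository to see roughly what the factory will import. This sketch assumes the common layout of one JSON file per resource in `pipeline`, `dataset` and `dataflow` folders under the root folder; those folder names, the `count_resources` helper and the demo file names are assumptions for illustration, not taken from this post.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Assumed layout: one JSON file per resource, grouped into these folders
# under the repository's root folder.
RESOURCE_FOLDERS = {"pipeline": "pipelines", "dataset": "data sets", "dataflow": "data flows"}


def count_resources(root_folder: Path) -> Counter:
    """Count the JSON resource definitions in each assumed folder of a local clone."""
    counts = Counter()
    for folder, label in RESOURCE_FOLDERS.items():
        directory = root_folder / folder
        counts[label] = sum(1 for _ in directory.glob("*.json")) if directory.is_dir() else 0
    return counts


def demo() -> Counter:
    """Build a throwaway folder tree that mimics a clone, then count it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("pipeline/CopyProducts.json", "pipeline/LoadSales.json",
                    "dataset/ProductCategory.json"):
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("{}")
        # Count before the temporary directory is cleaned up.
        return count_resources(root)


print(demo())
```

Against a real clone you would call `count_resources(Path("path/to/clone"))` on the configured root folder instead of running the demo.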
The addition of GitHub support gives us another great option if you didn’t previously use Azure DevOps (formerly known as Visual Studio Team Services). I’ll be writing more upcoming Azure Every Day blogs about Data Factory that cover some of its nuances, since the product has evolved greatly with the release of Data Flows.