Working with Microsoft Azure for 20 hours and why I will not use it again

Microsoft Azure Anger

Last weekend I attended a Hackathon at Microsoft. Overall it was an awesome experience and I had a lot of fun, so this post has nothing to do with the event itself and neither does it reflect my overall opinion on Microsoft. They do awesome stuff in a lot of fields, but with Azure, they are definitely underdelivering.

During the event, I started to get in contact with the Azure platform. Our project idea was to create a website where you can search for news and then via sentiment analysis this news would be sorted by “happiness”. The news search and sentiment analysis are offered via Azures so-called cognitive services that abstract the ML models away and you let you simply use an API for accessing those services....so far so good. With this premise most of you coders out there will have the thought: “This sounds too easy to fill 24h of programming”. Exactly what I thought...and was already thinking about also coding an Alexa skill and so on to fill the time. With two experienced developers, we thought the backend would be done in about 4h (conservative calculation) as it would only be stitching together three APIs and delivering that info to a JSON REST API for our frontend team. For keeping the fun up and having more learnings during the project we decided to do the backend as a serverless function. But then Azure got into our way...

In the end, it took us ~9h to develop the backend as a serverless function consisting of mainly of a 40 line JavaScript file we had to develop in the in-browser “editor” that Azure offers as all the other approaches we tried didn't work out and we ended up abandoning them. Once again: 9 hours for 40 lines of JS code stitching together three APIs...that is insane. (Btw at 3 am we decided to switch to GCP (google cloud platform) and that did the job in about 45 minutes)

So for sure we did things wrong and it could have been done faster, but this blog post is about the hard onboarding and overall bad structure of Azure. Please also keep in mind that Azure is still in a more-or-less early stage and not all of it is broken. In the following, I will walk you through the timeline of this disaster and suggestions I would have in mind to fix some of the most confusing steps. Actually, I will try to avoid these mistakes in my own future projects, so thanks Microsoft by showing me a way how not to do things xD

Just a bit more background: My partner in the backend had some experience with GCP and I do most of my current projects with AWS, so we did know how things work there...couldn't be too hard to transfer that knowledge to the Azure platform.

Start of the project

So first of all creating a new Azure account, that is not that hard and after entering credit card info you get 100$ of free credit. I actually like how Microsoft solved that here: You have two plans. You start with the 100$ free tier and if you spend all of that money you manually have to change to the pay-as-you-go plan. So that protects you of opening up an account, doing some testing, forgetting about it and then a month later you get a huge bill (happened to me with AWS). So that is nice for protecting new users that just start to test the system. Good job here Microsoft!

After setting up the account I created a new project and added some of the resources we needed. Creating a serverless function I recognized the tag “(Preview)” on the function I created but didn't think more about it...but actually, that sign should be something like Experimental/Do not use/Will most likely not work properly. We created a Python serverless function (apparently Python functions are still beta there) and tried to get some code in there.

There are three ways to get code into an azure function:

...for full-featured functions. As we selected the experimental/beta/preview functionality Python we only had the latter two options. Not that bad as it is the same for AWS and I am used to deploying my code via the AWS cmd...shouldn't be way harder with Azure.

My suggesting: Do not do publish functionality that is obviously not ready yet. Do internal testing instead of using your users for that task.

Azure plugins for VS code

Microsoft overs a wide range of VS code plugins for Azure. As that is my main editor anyways I wanted to give them a try. So for the functionality of serverless functions, you need the functions plugin and about 9 other mandatory ones that are some sort of base plugins. 50MB and three VS Code crashes later the required plugins were finally installed properly. The recommended login method did not work and I had to choose the method of authenticating via the browser instead. Not that big of a deal, but as they recommend the inline method one would think that should work. (Didn't work for the other folks in my team either...so it had nothing to do with my particular machine)

You would think that 500MB should be enough for finally being able to deploy some code...but you still need 200MB more for the Azure cli that is required for the plugins to work properly.

Finally having installed all of it you can see all your Azure functions and resources in VS code. I started to get a bit excited as it looked like from now on the development would be straight forward and easier as I am used to from AWS.

But that 700mb of code did not work properly....the most important function “deploy” failed without any detailed error message...AAAAAAARRRG. Why do I have to install all that crap and then it can't do the most basic task it has to do: get my code into their cloud.

Keep your tooling modular and try to do fewer things, but do them right

Code templates

A nice idea is that on creating a new serverless function Azure greets you with a basic boilerplate code example showing you how to handle the basic data interfaces.

It might have been because we selected the alpha functionality “Python”, that we didn't actually get Python code here but JavaScript. So your function is prepopulated with code that is not able to run because it is the wrong programming language. We were lucky and recognized that right away, but you could get really confusing error messages here if you then start developing in JS but actually having a Python runtime.

Better no boilerplate code than one in the wrong programming language

But at least it is colorful

So next try with the Azure CLI. The first thing that you recognize is that the CLI has all sorts of different colors...but that does not help if you are annoyed and want to get things done.

That is a thing you also see in the Azure web interface...it has got quite a few UX issues but they do have over five color themes that you can choose from for styling the UI...Microsoft I'm not sure if you set your priorities right here ;)

Also, the CLI did not get us where we wanted....either due to our own incompetence or due to the CLI itself, no clue. Either way, I would blame Azure as it is their job to help developers onboard and at least get basic tasks (we still only want to deploy a simple “hello world”) done in an acceptable time.

Focus less on making your UI shine in every color of the rainbow and try to improve documentation and onboarding examples

Full ownership of a resource still does not give you full privileges

After finally being able to deploy at least the “hello world” we wanted to go a step further...work concurrently on that project. Yes until now we mainly did pair programming on a single machine.

As I was the owner of that resource I also wanted to give my teammate full access to it, so that he could work on the resource and add functions if required. I granted him “owner” access rights (the highest that were available) but he was still not able to work properly with that function. In the web UI it did work more or less but than again in VS code there's no chance to do anything (adding a function or deploying it). I ended up doing something that goes against everything I learned about security: I logged in with my credentials on his machine.

So imagine yourself now already sitting in front of your laptop for about 4 ½ hours and you did not manage to do any of the actual work you set out to do.

Ditching Azure Functions and switching to GCP

That was the moment when we ditched the idea of doing the backend as an Azure function. We switched to GCP where we started all over again. As I've never worked with that platform either I expected a similar hard start, as I already had in the last few hours with Azure. But then about 25 minutes later we achieved more on GCP than with Azure until then.

Something both Azure and GCP do better than AWS is that they have the logs of a serverless function in the same window as the function itself. AWS has a different approach here and you have to change to the cloud logs when you want to get info about your function and how it worked. Props to both Google and Microsoft for solving this a lot better!

Actually a hint for AWS: Give your user all controls and info at a single place

Cognitive services

The prices you could win at the Hackathon were attached to using Azure and thereby we stuck to the cognitive services for doing the news search and the sentiment analysis. Overall the API is straight forward: Send your data and get the results back.

One thing we got told in a presentation and that you should keep in mind when using the cognitive services: You do not control the model and it could change at any moment in time. So if you use the cognitive services for productive use, you should continuously check that the API didn't change its behavior in a way that influences your product in a bad way. But most of the time it is still a lot cheaper and better than building the model yourself

The problem that we did have with the services were again authentication issues. Quite confusing some of the cognitive services (e.g. the sentiment analysis) have different API base URLs depending on where you register that cognitive service and others do not. As I assume they need that manual setting of data centers for a particular (unknown to me) reason. Indeed I would propose to have all the cognitive services bound to a location.

The news search, for example, is not bound to a location and so we had two different behaviors of the API base URLs in our so short and easy application:

Pointing to the wrong location is pure incompetence on the developer side but it would help a lot if there would be a distinct error code/message for that scenario.

Have the same base URL behavior for all cognitive services

Return some sort of 'wrong location'-error if you have a valid API token but you are pointing to the wrong location

Insufficiently documented SDKs

Azure offers SDKs for using their services. We gave the JS SDK for the cognitive services a try. Here we had both ups and downs: First, props to the developers coding the SDKs, as they are straight forward and do what they should. Even the code itself looks good...but why the hell do I have to look into the code of the SDKs to get all the options the functions offer? When you stick to the documentation provided via the GitHub readme or NPM you only get a fraction of the functionality. We were confused that Microsoft's own SKDs seemed not to be API complete. Looking into the code we saw they are actually API complete and do offer a lot more options than documented.

Please Microsoft: Properly document your functionalities!

IMO there must be deep problems with the internal release processes at Azure. It is not acceptable that an IT company that's been in the industry for so long allows itself such a basic mistake. You should not release your products (and I see the SDKs as such) without proper documentation.

“Code Examples”

During our trial and error period of trying to get the JS SDK running, we stumbled upon the quickstart guide for the cognitive services Quickstart: Analyze a remote image using the REST API with Node.js in Computer Vision

Instead of using their own SDK and explaining how to use it they show you how to manually build an HTTP request in JS. Sure that can be helpful for new JS coders, but if you have an SDK for that particular reason...why are you not using it? Looks like the left hand is not knowing what the right-hand does.

Stick to one way of doing things. If you have an SKD, also use it in your quickstart guides for being consistent

Conclusion

In the end, we did port the code back from GCP to an Azure function (again ~1h of work). We selected JS instead of Python and coded completely in the web UI...that did work. I now know how real Microsoft business developers do their daily business...never leave the web UI and just accept that life is hard.

Microsoft failed to deliver a adequate experience here and lost me as a potential customer. How can it be that I was able to do the same things in a fraction of the time in GCP? (And keep in mind: it was already 3am in the morning, I was super tired and I also never worked with GCP before)

None of the three major players are perfect and sure I understand it is hard to deliver fast and keeping good quality in this highly competitive market. But maybe actually going the step further will help to win in the end.

Once again: This is me only rating the onboarding experience of Azure in particular! No general opinion on Microsoft.

Last one: The Azure web UI didn't work in Chrome. So if you have issues with that, Firefox did the trick for us ;)