Embedded Google Maps not working with Google App Engine


This one was actually pretty easy to solve, but was interesting anyway.

Developing locally I was OK using http requests for my script files as my local server has now issues with using either http or https for requests. However, App Engine has other ideas. In my scripts file I was including the below reference:


This worked no problem on my machine, but when I published to Google App Engine I saw the below:


Once I changed the script to use HTTPS:


I also needed to add my new site to the list of allowed referrers, by going to the Google Cloud Dashboard > APIs and Services > Selecting my project from the dropdown > clicking Maps Javascript API > Credentials > Editing the credential and adding in a record for my site.

Then I was presented with what I wanted to see:


Data Type Changing in UNION


An interesting issue came up recently where a union was creating a rounding problem. One side of the UNION contained a field with data type NUMBERIC(38,2) and the other side, (the equivalent field) was DECIMAL(24,9). One field having the bigger precision (number of digits to the left of the decimal point) and the other have a bigger scale (number of digits to the right of the decimal point)..

Let’s  see it in action..







And when we union them together: 






The answer lies in the fact that SQL Server will preserve the biggest scale over the biggest precision. After all I’d be far more annoyed if I lost £1000 that .0120144414324p..

The solution to this problem is to ensure that you cast to the same data type in both of your queries to ensure you know the data types that will come out of the other end.

SQL Server – Find all non empty tables


So there are a couple of ways to do this, one is to go to every table in your database and check whether there is data, or you can go to the indexes and check the row count, here I’ll show you how to do both:

1: hit every table (don’t do this in a production environment or your DBAs will go mental..

IF OBJECT_ID('tempdb..#Tables') IS NOT NULL
	DROP TABLE #Tables;

--Create some temp tables and declare variables
TableName VARCHAR(100),
HasRows BIT,
Processed BIT

Result INT

DECLARE @TableCount INT = 0, @i INT = 1;

DECLARE @CurrentTable NVARCHAR(150);

--Get the name of each table in the database wrapped in square brackets.
INSERT INTO #Tables (TableName, HasRows, Processed)
SELECT '[' + s.name + '].[' + o.name + ']', 0, 0
FROM sys.objects o
INNER JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE type = 'u'
--AND o.name IN (N'Tbl_Name','Tbl_OtherTable');

SELECT @TableCount = COUNT(1) FROM #tables;

--Loop through all of the tables and check if there are rows
WHILE @i <= @TableCount

SELECT @CurrentTable = 'SELECT TOP 1 1 FROM ' + TableName FROM #Tables WHERE TableId = @i;

--PRINT @CurrentTable;

INSERT INTO @Result EXEC sp_executesql @CurrentTable;

--Populate the result table
UPDATE t SET t.HasRows = Result
FROM #Tables t
INNER JOIN @Result r ON t.TableId = @i;

SET @i = @i + 1;


--Get the rows we're interested in
FROM #Tables
WHERE HasRows = 1


The other way is to hit the indexes and check the row counts:

SELECT DISTINCT s.name [Schema], o.name [Table]--, i.name [IndexName], p.[rows] [RowCount]
FROM sys.objects o
INNER JOIN sys.schemas s	ON s.schema_id = o.schema_id
INNER JOIN sys.indexes i	ON i.object_id = o.object_id
INNER JOIN sys.partitions p ON p.index_id = i.index_id
								AND p.object_id = i.object_id
WHERE o.type = 'U'
AND p.[rows] > 0
ORDER BY s.name, o.name


You can remove the DISTINCT and uncomment i.name and p.rows to get the count of rows in each index.



SSIS Cache Transform – Failed to prepare the cache for new data


Encountered this error today when using a cache transform to populate a lookup cache with some data that I’d look up against multiple times:


I hadn’t set up the cache transform to use a .caw file (more info here), so there were no permissions issues with writing to disk, all that should be happening is writing to memory. Here’s what my SSIS package looked like when the error was received:


In the above we get the number of days worth of data to load and then load each day separately into the destination table (we’re loading each day as we hit some external limits if we don’t). The issue here was you can’t overwrite or append data to the cache. So all that was required was to move the ‘Load the lookup cache’ step outside of the for loop.


After the cache was moved out of the loop, it loaded successfully.


Using Microsoft Azure Machine Learning


There are many new technologies arriving all the time, but unless you’ve been living under a rock, you’ll have already heard about the impact the cloud is having on
business day to day.

One of the most exciting cloud offerings is the Microsoft Azure platform, based in data centres across Europe. Here you’ll be able to find everything from Linux Virtual Machines to web servers to HD Insight Clusters.

I’ve had the opportunity to travel to the Microsoft data centre in Dublin, and it is impressive as it is large. Moving to commodity hardware as the backbone of the platform allows normal users to gain access to massive clusters of compute power priced on usage. Imagine asking your boss if you could purchase a 100 node cluster  for your server room just to play with for a few days? The Microsoft Azure platform grants you that ability for a fraction of the cost.

One of the most powerful offerings however is Microsoft’s cloud based Machine Learning workspace. This offering allows users to make use of powerful machine learning algorithms to solve the kind of problems procurement have been wanting to solve for years, namely how can we quickly improve our forecasts and predict outcomes?

Given that I work for an airline, the easiest place to start is with our very own type of product, flights. One of the things we need to make sure of as a business is that our flights arrive on time and our customers arrive safely and happily at their destinations without any delays. With this in mind is there any way we can use machine learning to predict flight delays? Of course there is. I’ll show you how below.

Step 1: Get your Azure Subscription

You can sign up for a free account here. This will allow you to test out Microsoft Azure and more specifically to this blog, Microsoft Azure Machine Learning.

Step 2: Sign in

Once you’ve signed into the portal you’ll see the dashboard.


Now go to Browse > Machine Learning


At the moment you’ll get taken back to the old Azure Portal (for everything other than ML always use the new one, it’s leaps forward)

Step 3: Create a new experiment by clicking the big + sign that says NEW


You’ll be offered the below screen and you can go ahead and click Quick Create to set up your new experiment. You’ll need to add a new workspace name, select a location (West Europe is closest to us here in Leeds of the options). You will also need to create a new storage account to store your experiment data and the results of your experiments.


Azure will then go ahead and create you a new Azure Machine Learning workspace, pretty simple huh!

Step 4: Once your workspace has been created you can access it by clicking the Sign-in to ML Studio button:


Step 5: Create a new experiment

On the Experiments tab click the + NEW button and select Blank Experiment


Now you’ll be presented with a blank experiment, and now we can get started!


Pause to reflect

Let’s take a minute to explore the ML workspace. On the left of the canvas you’ll see Saved Datasets, Data Format Conversions, Feature Selection and many other areas, these areas contain modules we can use during our experiment. Machine learning in the format we’re using consists of a small number of parts:

1: Acquire data – Here we get the data we want to use
2: Select Columns – Here we select which values in our data set are important
3: Split data – This is an especially important concept to understand. In this step we need to split our dataset into two parts,1 for training the model and 1 for scoring the model.

4: Training Set: This set contains the thing we’re attempting to predict, in this case it will contain the columns used to do the predicting along with the value we’re trying to predict. It’s this set that the ML algorithm (we’ll learn about algorithms later) will use to work out which columns are important and how they affect the outcome, in our case we’re going to predict whether a flight is delayed by more than 15 minutes or not, i.e. delayed (1) or not delayed (0).

5: Scoring Set: In this set we don’t provide the outcome, i.e. whether the flight was delayed or not, rather we let the algorithm determine this value. Now because we already know the outcome we can test the predicted value created by the model against the value we already know. This is how we score our model and work out how accurate it is.

Using a very common business concept, we’ll split our data 80/20. 80 for training the model and 20 for scoring the model.

6:  Algorithm Selection. This is where we pick which algorithm we’re going to use. There are a whole bunch to choose from and you can use this cheat sheet to work out which one is best suited to your problem. Because we’re only predicting whether the flight was delayed (1) or not delayed (0) we’ll be using a Classification algorithm.

!Remember! This is SCIENCE! This means we don’t have to get the algorithm selection correct the first time. It’s called an experiment for a reason, so run as many different algorithms as you can and mark down all of the results until you find the best fit. Maybe what you’re trying to predict can’t be predicted, that’s fine, science is all about having a hypothesis and testing it to see whether it’s true or not.

7: Evaluate model. This is where we test how well the algorithm has predicted what we asked it to. We’ll learn more about this later.

Now let’s get started!

Step 6: Get your data

Just to make things so much simpler for us, Microsoft have included an On Time Performance dataset that we can use. You can find it by clicking Saved Datasets > Samples > Flight Delays Data


Drag this over to the first space on your experiment canvas.

Now you can right click on the circle at the bottom and select Visualize. This will show you the data and give you some distribution graphs. Note that the ArrDel15 column contains 1’s and 0’s. This is boolean logic, 1 meaning true (is delayed more than 15 minutes) and 0 meaning false (isn’t delayed)


Step 7: Select columns.

Here we need to go to the Manipulation menu and select ‘Select Columns in Dataset’

















Next you’ll need to grab the small circle at the bottom of the Flight Delays Data dataset and drag down to connect it with the top circle of the Select Columns in Dataset transformation


Now click the Launch Column Selector button to pick the columns we want to use


Don’t select all of the columns as selecting any of the delay columns will let the algorithm cheat by giving it the answers, i.e. if [ArrDelay] is over 15 then the chance of [ArrDel15] being 1 is 100% and the algorithm will pick up on this. It also makes sense that there’s a higher chance of the flight having an arrival delay if it it had a departure delay, so let’s exclude [DepDelay] and [DepDel15]. We only need to include [ArrDel15] as this is the column we’re trying to predict



Step 8: Split the data

This is where we split our data into training and scoring datasets. From the menu on the left, select Data Transformation > Sample and Split


Now click on Split Data and you’ll see the properties pop up in Properties panel. Set the ‘Fraction of rows in the first output dataset’ to be 0.8, which is the 80% we mentioned before.

Step 9: Train the model

The Train Model transformation takes two inputs and has one output. The inputs are one side of the split data (this is our training set), and the algorithm we want to use.

From menu select Train > Train Model, drag it onto your canvas and join the right hand input to the output of the Split Data transformation. Then click on Train Model and in the properties panel on the right click the ‘Launch column selector’. This is where we pick the field we want to predict, in our case [ArrDel15], which is the field that shows if the flight was late arriving by more than 15 minutes.


Now from the menu select Initialize Model > Classification > Two Class Boosted Decision Tree, drag it onto the canvas and join it up to the left input of the Train Model transformation. Why I’ve chosen this algorithm is beyond the scope of this blog post and you can find more information here. For now you can test all of the Classification algorithms to see which one gives you the best fit.



Step 10: Score the model

This step will test how well our model has predicted our outcome.

From the menu select Machine Learning > Score > Score Model and drag it onto the canvas. Now we can connect the right hand output from the Split Data transformation up to the right hand input of the Score Model transform and the output of the Train Model up to the left hand input.


Step 11: Evaluate Model

This will show us the results of our experiment and whether we managed to predict our outcome. Remember this is an experiment and we may well find out we’re missing something or we need to find out more about our data, which is all fine.

From the menu select Machine Learning > Evaluate > Evaluate Model and connect it to the output of the Score Model transform.


Step 12: Run our experiment!!

Now we can run out experiment by clicking the run button


How did we do? Right click on the output of the Evaluate Model and select Visualize to find out. Here you can see an output of how well the model was able to predict whether each flight would arrive within 15 minutes or not. Below are the results we’ve achieved.


If you pay attention to the bottom left graph you’ll see the mix of predictions and results that the algorithm achieved


True Positive: The number of records the algorithm predicted would be true (delayed) that were true (delayed).
False Positive: The number of records the algorithm predicted would be true (delayed) that were actually false (on time).
False Negative: The number of records the algorithm predicted would be false (on time) that were true (delayed).
True Negative: The number of records the algorithm predicted would be false (on time) that were false (on time).
Accuracy: This is a ratio of correctly predicted results. This is the (True Positives + True Negatives) / The total number of records.
You can find out more information about Precision, Recall, and F1 Score here.

It’s obvious that we need more information to be able to accurately predict whether a flight will be delayed or not, but it gives a great introduction into how to use Azure Machine Learning, and we can now go on to augment our dataset with other predictor values, maybe weather or strike information might help, the possibilities are out there, it’s your job as a data scientist to find them! Good luck.





SSMS – Running scripts in SQLCMD Mode


According to MSDN, SQLCMD can be used to process Windows System commands and Transact-SQL statements in the same script.

One of the most useful things you can do is pass parameters into your script from the command line by using the :setvar command and use them to replace sections of your script, such as user names. Variables set when using :setvar cannot be updated as they are passed into the script when it is called.

You can turn on SQLCMD in SQL Server Management Studio by selecting SQLCMD Mode form the Query menu:

USE [BestDatabase]

:setvar ENV INT

	CREATE USER [BestDomain\BD_Svc$(ENV)] FROM LOGIN [BestDomain\BD_Svc$(ENV)];


The above script accepts the parameter ENV and uses it to replace $(ENV) with the value of the parameter, in this case INT. As you can see the above script, when executed, will add the SQL Server login BestDomain\BD_SvcINT, to the BestDatabase database.

I’m sure you can imagine the rest of the possiblities!

Login failed for user ‘NT AUTHORITY\ANONYMOUS USER’


I needed to manually run a project deployment model SSIS package (on box A) that talks to a SQL Server instance on another machine (box B), the package runs a stored procedure and returns some results. Box A and Box B are not linked servers, however the SQL Agent service account and the SQL Server service account from Box A both have permissions to execute the procedure on box B.

When I run the package manually through the SSIS project


it fails with the below error:


What’s going on? I’ve got permission to run the stored procedure on Box B as well, but my user seems to be getting replaced with NT AUTHORITY\ANONYMOUS LOGIN. The issue here is that as of SQL Server needs credential delegation to be turned on. You can do this either through a linked server, or as I did it, by creating a SQL Agent job to handle the delegation for me.

There’s more learning to do here as we may have impersonation handling the delegation for me, but for the purposes of recording how an error message relates to a problem, I think this suffices for the moment and will allow further investigation.