Monday, May 19, 2014

Data warehouse performance principles.

Lately I have been involved in a discussion about which technology performs best for data warehousing: do we use table partitioning, or do we move to in-memory databases?

This topic has been hashed and rehashed thousands and millions of times, with everyone / every vendor pushing their own solution: use this / don't do that; apply this method / build it like that.
It makes me want to go back to the basics and try to reapply them to the more modern world.

Most of the techniques developed over the years are tools to work around the fact that we don't have infinite resources to process our business questions. Some of the newer techniques have lulled us into thinking we DO HAVE INFINITE resources; we are no longer in the old days, but that is still not true. This means that we will still need to compromise on a lot of things.

Over the years there are a few truths that I continue to use as rules of thumb.

Rule 1. Optimise for your goal.
OLTP systems are optimized for updates; decision support is optimized for reporting. Star schemas, snowflake schemas and OLAP are illustrations of that, where the goal is to report a KPI by an axis of reporting. Other models, like graph-based decision support, will be more difficult to implement in that shape. This is where the mathematical rules of complexity apply: correct modelling results in algorithms of lower complexity and thus better performance, within the boundaries set by rule 2.
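To make rule 1 concrete, here is a minimal sketch in Python with pandas (the table and column names are made up for illustration) of what "report a KPI by an axis of reporting" looks like in a star shape: a fact table with the measures, a dimension table with the axis, and a join plus group-by to get the KPI.

```python
# A minimal star-schema sketch: fact table (measures) + dimension table (axis).
# Table and column names are hypothetical, purely for illustration.
import pandas as pd

# Fact table: one row per sale, with a foreign key to the dimension.
sales_fact = pd.DataFrame({
    "product_key": [1, 2, 1, 2],
    "amount":      [100.0, 250.0, 75.0, 300.0],
})

# Dimension table: the axis of reporting.
product_dim = pd.DataFrame({
    "product_key": [1, 2],
    "category":    ["hardware", "software"],
})

# KPI (total revenue) by axis of reporting (product category).
kpi = (
    sales_fact
    .merge(product_dim, on="product_key")  # join fact to dimension
    .groupby("category")["amount"]
    .sum()
)
print(kpi)
```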

Rule 2. Keep the gap between the data and the place it is processed as small as possible.
The more distance there is between the data and its processing, the more power is lost.
The concept of distance was once explained to me as :
  • same machine / same process.
  • same machine / other processes / IPC.
  • same machine / other processes / local network.
  • different machine / same cluster or LAN.
  • different machine / WAN.
Every step down that list adds round trips between the process and the data (to illustrate it, picture the performance of shipping a million rows to the other side of the world just to count them).
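To feel the cost of those round trips, here is a minimal sketch using an in-memory SQLite database as a stand-in for a remote warehouse (the table is made up): counting at the source sends one number back, while fetching every row drags the whole million across before you can count.

```python
# A minimal sketch of the round-trip point, with SQLite standing in for a
# remote server (locally the difference is small; over a WAN it is enormous).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    ((i, float(i)) for i in range(1_000_000)),
)

# Far: ship a million rows to the process, then count them there.
rows = conn.execute("SELECT * FROM sales").fetchall()
print(len(rows))

# Near: push the count to where the data lives; only one number travels back.
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```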
Distance can also be eliminated by partitioning data: based on metadata, you skip the blocks that you know don't need to be processed.
Processing data in memory likewise eliminates the distance between process and data, so again rule 2.
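A minimal sketch of the block-skipping idea (partition names and date ranges are made up): keep min/max metadata per partition and only scan the partitions whose range can contain matching rows.

```python
# A minimal partition-pruning sketch: per-partition min/max metadata lets a
# query skip whole blocks of data without reading them. Names are made up.
partitions = {
    "sales_2014_q1": {"min_date": "2014-01-01", "max_date": "2014-03-31"},
    "sales_2014_q2": {"min_date": "2014-04-01", "max_date": "2014-06-30"},
    "sales_2014_q3": {"min_date": "2014-07-01", "max_date": "2014-09-30"},
}

def partitions_to_scan(query_from, query_to):
    """Return only the partitions whose date range overlaps the query."""
    return [
        name
        for name, meta in partitions.items()
        if meta["min_date"] <= query_to and meta["max_date"] >= query_from
    ]

# A query for May 2014 touches only Q2; Q1 and Q3 are skipped on metadata alone.
print(partitions_to_scan("2014-05-01", "2014-05-31"))  # ['sales_2014_q2']
```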

There are exceptions of course, and they are dictated by rule 3.

Rule 3. When you want scalability, remove as many shared components as possible.
Most systems show a scalability graph that flattens out after some load; the idea of a scalable system is to be as linear as possible, meaning that performance keeps pace with system load.
Everywhere the system is shared at a place where the distance between data and process is big, you take a very big hit on performance.
Someone once used the following to illustrate this:

When a dbserver with one CPU and one DISK takes 1 minute to process a query:
How much time does a dbserver with one CPU and three DISKS take to process the same query?
How much time does a dbserver with 4 CPUs and one DISK take to process the same query?
How much time does a dbserver with 4 CPUs and three DISKS take to process the same query?
The answer is: you don't know.

The fact that you don't know means that somewhere, at an unknown time, your system will hit a wall and you will never know what hit you. That can be a very nasty place to be.
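One back-of-the-envelope way to see why shared components make the curve flatten is Amdahl's law: if even a small fraction of the work has to go through a shared component, the speedup from adding machines levels off. A minimal sketch:

```python
# Amdahl's law: `shared` is the fraction of work serialized on a shared
# component, `n` is the number of CPUs / disks / servers thrown at it.
def speedup(shared, n):
    return 1.0 / (shared + (1.0 - shared) / n)

# With only 10% of the work on a shared component, 64 workers get nowhere
# near a 64x speedup -- the curve flattens out.
for n in (1, 2, 4, 8, 16, 32, 64):
    print(n, round(speedup(0.10, n), 1))
# roughly: 1.0, 1.8, 3.1, 4.7, 6.4, 7.8, 8.8
```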

There are a lot of systems currently trying to get around those issues by creating an inexpensive array of servers and splitting the load over those different machines. Those are the massively parallel systems.
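As a toy illustration of that split (Python's multiprocessing pool on one machine stands in for the array of servers, and the data is made up): each worker scans only its own shard, and only the small partial results travel back to be combined.

```python
# A toy sketch of the massively parallel idea: shard the data, let each
# worker process only its own shard, combine the small partial results.
from multiprocessing import Pool

def partial_sum(shard):
    """Each 'server' scans only its own shard of the data."""
    return sum(shard)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    shards = [data[i::n_workers] for i in range(n_workers)]

    with Pool(n_workers) as pool:
        partials = pool.map(partial_sum, shards)

    # Only four numbers come back; combining them is cheap.
    print(sum(partials))
```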

But that is theory; in practice the company where you work will have made the choices for you, and then you need to be creative and learn to think outside the box...

G



Tuesday, March 13, 2012

A lost memory

A few weeks ago my grandfather died, from the last stages of his disease.

His disease made him forget: himself, the people around him, his place in the world.

His disease made him lose his place as a man in this world. Each time his world broke down a bit further, the way he wanted to be seen in the world, the way he wanted to be perceived, was shown for what it was at its core.
Pretence fell and the truth was laid bare. Strengths and weaknesses gone.

As a grandfather he was not there to play and fill my childhood memories, good and bad. Later on he hoped to buy into that position, to no avail. No, that was not his role. His role was putting my father in his role. Being the Adam, father of Cain and Abel. Being the Adam thrown out of paradise without knowing why. Trying to recreate it with cardboard boxes. His piece of paradise was with his wife, now joined back together in heaven.

In memory of George Maerten.
His book is closed, rested on the shelf.


Monday, August 22, 2011

The death of minor versions.

Oh no, it's Firefox 242 coming out, but is it compatible with [whatever tool] version 452! Will it run on my Windose 666 and Office 500?

More and more of the products I am following are abandoning minor releases.
What is it with the world?
We used to have major releases that really announced great new features and great advancements in functionality. Not anymore: revenue streams require everyone to put out a major release every 6 months, so that you have to upgrade every 6 months. Just above the free upgrade period and just below a year.

Before, you needed to buy new software every 3-4 years and do minor updates in the meanwhile, but as the pace of upgrades accelerates and the incompatibilities creep in, we are forced to upgrade.

Yes, absolutely mister customer.
As vendors more or less push the upgrade-often model, they say: don't buy a product, buy a solution.
The vendor's answer is in fact to transmogrify the product into a solution. Change the hammer into a nail-thrusting-tool service, which adapts to the newer nail technology. Currently we are in the middle of that process, where most vendors offer yearly subscriptions with an entrance fee.
We are not yet at the level SAS software has attained with its business model, where the tool is rented, but I think the industry will get there soon. The advantage of renting is that the vendor always needs to be on its toes to keep up with the world around it. It will be a necessity and not a luxury. The platforms continue to accelerate, and if you can't keep up, your tools will just fail on you.

My 10-year-old tells me programming is from the Middle Ages.

This morning at the breakfast table we are discussing computer camps. My son tells me that it would be great: you find a game you like and you play all day!
Shocked and in awe, I tell him about how I went to a computer camp when I was young where they taught programming. And I really loved it!
More shock and awe was coming in my direction.

"Yes Daddy , but everything has already been programmed . You use stuff nowadays. Programming was a thing of the middle ages!"

What to do with such statements.

Thursday, August 18, 2011

Umbrella apocalypse

Umbrella apocalypse is the result of huge rainfall and great winds.
Umbrellas broken, thrown away, useless to modern society.
The more pictures I took of those umbrellas, the more I imagined that they were not umbrellas but an ersatz for regular people.

People also go through hardship; those who make it through are all around us. Those who don't get thrown away on the street. For each umbrella I imagined someone lying there, broken and lost.

Maybe it is the cowardice of not wanting to take pictures of people in misery, or maybe the umbrellas make the reality more bearable. Which one it is, I don't know.

I edited those pictures on my iPad and uploaded them using local wifi; I described in a previous post how I executed my mobile workflow.

Photo workflow on the road

Shooting my umbrella apocalypse pictures in Amsterdam made me wonder if I could publish while on vacation with less computing power than what I have at home.

My hardware for the trip:
- my camera with SD cards
- my iPad
- iPad camera connection kit

And of course wifi at our local apartment.

Step 1: importing the pictures onto the iPad with the connection kit.

The photo app starts up and imports all pictures flawlessly.
I could complain about the lack of picture management, like grouping, and the absence of multiple delete, but that would be picky.

Step 2: editing in Photogene.

Excellent color workflow, great intuitive interface. The export to Flickr was not. So save the edits and on we go.

Step 3: using the iPhone Flickr app on the iPad, imo still the best interface.

Upload, add comments and tags.

Voilà, it's even better than the real thing, when you can do that while enjoying the setting sun.

- Posted using BlogPress from my iPhone

Friday, July 22, 2011

Code coverage?

Someone mutters in a dark corner about code coverage, and I immediately think about the olden days, when applications could only follow one path, like the old role-playing games.

Code coverage analysis seems to me like something local, which can only cover a very narrow scope in modern-day programming with all its variables: objects being modified by gazillions of aspects at the same time, threads tearing down the variables in your cores. How can the coverage of one traced path say anything about all of that?
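A minimal sketch of that narrow scope, with a made-up function in plain Python: two tests can execute every line, yet exercise only two of the four possible paths through two independent branches, and every extra variable, aspect or thread multiplies the paths further.

```python
# Two tests give 100% line coverage of this made-up function while
# exercising only 2 of its 4 possible paths.
def discount_percent(amount, is_vip):
    if amount > 100:      # branch 1
        discount = 10
    else:
        discount = 0
    if is_vip:            # branch 2, independent of branch 1
        discount += 5
    return discount

assert discount_percent(200, True) == 15   # path: >100, vip
assert discount_percent(50, False) == 0    # path: <=100, not vip
# Every line has now run, but (>100, not vip) and (<=100, vip) were never
# tested -- and state, aspects and threads only multiply the combinations.
```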
 
Either that, or it might be a reflection on how we program: look at how we program and remove as many dependencies from our code as possible, so that any traced code only has a limited number of variables.
If that were possible, then we could prove that each nugget of code is correct and build a solid toolbox of code, but would that be against the nature of reality?