Friday, July 16, 2010

Autofac Part 3 - Module registration and event handlers

This time we take a look at registration using modules, and using the event handlers provided by Autofac.

Registration using modules
Registration using modules allows more flexibility, reduced complexity, allows configuration parameters to be explicit and better abstraction and type safety.

To use module you have to create a class, which inherits from Autofac.Module and overrides the Load method, fx.:
class ModuleReg : Module
{
  protected override void Load(ContainerBuilder builder)
  {
    base.Load(builder);
    builder.RegisterType<object>();
  }
}
To register a module the following are done:
var builder = new ContainerBuilder();
builder.RegisterModule(new ModuleReg());
var container = builder.Build();
Now you can get all instances, types and everything else registered in the module, resolved just as normal. Modules can register anything you can outside a module, for example you can register more modules.

Modules can are more dynamic as you can set properties and such to indicate different scenarios. More information about module registration can be found here

Autofac events
There are a couple for events which can be handled. These event as described below.
Preparing
Fired before the activation process to allow parameters to be changed or an alternative instance to be provided.
Activating
Fires after the construction of an instance but before it is shared with any other or any members invoked on it.
Activated
Fire when the activation process for a new instance is complete.
Registered
Information about the occurrence of a component being registered with a container.
All events can be handlede by using the methods provided when registering something.
builder.RegisterType<object>().OnPreparing(somedelegate);
builder.RegisterType<object>().OnActivated(somedelegate);
builder.RegisterType<object>().OnActivating(somedelegate);
builder.RegisterType<object>().OnRegistered(somedelegate);
Multiple events can be handled like this:
builder.RegisterType<object>().OnPreparing(somedelegate)
  .OnActivated(somedelegate);
These events are poorly documented, so here is a quick intro. The OnPreparing method, takes this parameter:
Action<PreparingEventArgs>
And you can changed the instance parameters like this:
IEnumerable<Parameter> parameters = new Parameter[] { new NamedParameter("n", 1) };
builder.RegisterType<object>().OnPreparing(e => e.Parameters = parameters);
The OnActivating method take this parameter:
Action<IActivatingEventArgs<TLimit>>
And you can change the instance provided like this:
builder.RegisterType<object>().OnActivation(e => e.ReplaceInstance(new object());
The OnActivated method take this parameter:
Action<IActivatedEventArgs<TLimit>>
And you can use the instance provided like this:
builder.RegisterType<object>().OnActivated(e => e.Instance.Method());
And at last the OnRegistered method, which takes this parameters:
Action<ComponentRegisteredEventArgs>
In my opinion this event aren't very useful, but it is still included here for completeness. You can only get the ComponentRegistry and the ComponentRegistration from this event.
builder.RegisterType<object>().OnRegistered(e =>
  {
    registry = e.ComponentRegistry;
    cr = e.ComponentRegistration;
  });

Thursday, July 1, 2010

Autofac Part 2 - Ordering, Keyed, Named & Metadata Registration

As promised we are going to look into registration ordering, keyed and name registration and registration with metadata.

Registration ordering
When you register something ordering are by default preserved, meaning that the last registration are returned when resolving.
var builder = new ContainerBuilder();
object inst1 = new object();
object inst2 = new object();
builder.RegisterInstance(inst1);
builder.RegisterInstance(inst2);
object resolved = builder.Build().Resolve<object>();
// Here resolved are the same instance as inst2
If you want another behavior it can be done the following way:
object inst1 = new object();
object inst2 = new object();
builder.RegisterInstance(inst1);
builder.RegisterInstance(inst2).PreserveExistingDefaults();
object resolved = builder.Build().Resolve<object>();
// Here resolved are the same instance as inst1
Named and keyed registration
Named and keyed registration are a very handy feature, which allow you to have multiple instances of the same object return if different situations. Here is an example for named registration:
string name = "object.registration";
var cb = new ContainerBuilder();
cb.RegisterType<object>().Named<object>(name);
var c = cb.Build();
object o1 = c.Resolve<object>(name);
// But this will fail unless an registration has been made without
// a name
o1 = c.Resolve<object>();
Keyed registration are basically done the same way:
object key = new object();
var cb = new ContainerBuilder();
cb.RegisterType<object>().Keyed<object>(key);
var c = cb.Build();
object o1 = c.Resolve<object>(key);
// But this will fail unless an registration has been made without
// a key
o1 = c.Resolve<object>();
You are free to mix normal, named and keyed registration as you want, which gives you a lot of flexibility.
Registration with metadata
Metadata allows you to place extra information alongside a registration.
var p1 = new KeyValuePair<string, object>("p1", "p1Value");
var p2 = new KeyValuePair<string, object>("p2", "p2Value");
var builder = new ContainerBuilder();
builder.RegisterType<object>()
    .WithMetadata(p1.Key, p1.Value)
    .WithMetadata(p2.Key, p2.Value);
var container = builder.Build();

Meta<object> objectWithMetadata = container.Resolve<Meta<object>>();
// The metadata can now be accessed through
// objectWithMetadata.Metadata
// The resolved instance can be accessed through
// objectWithMetadata.Value

In the next part I'll look into module registration and using the event handlers provided when registering and resolving.

Tuesday, June 29, 2010

Autofac Part 1 - ContainerBuilder: Simple registration

During my recent adventures, I've started looking into a IoC container an stumbled upon Autofac. The main problem are the lack of proper documentation, and that gives a somewhat bad impression, of a great IoC container.
Therefore I've decided to create some post about the different features present in Autofac, they are not meant to be a tutorial on how to use Autofac, only shed some light on the features Autofac provides.

The ContainerBuilder
One of the most important parts of Autofac, are the ContainerBuilder. It contains all the component registration, and has a lot of features.

Simple registration
Simple registration are easily, done by registering a type.
var cb = new ContainerBuilder();
cb.RegisterType<Abc>();
var c = cb.Build();
var a = c.Resolve<Abc>();

Simple interface registration
Registering a type as an interface, this are of course only allowed in the registered type implements the specified interface.
var cb = new ContainerBuilder();
cb.RegisterType<Abc>().As<IA>();
var c = cb.Build();
var a = c.Resolve<IA>();

External factory
When the container are disposed, all objects in the supplied by the container are also disposed. When using objects from a external factory, this may not be what you want, this is the reason why you can define that some objects shouldn't be disposed at the same time as the container.
var cb = new ContainerBuilder();
cb.RegisterType<Abc>().As<IA>().ExternallyOwned();
var c = cb.Build();
var a = c.Resolve<IA>();

Singleton
By default all dependent component or calls to Resolve, provides you with a new instance, but sometimes you want the same instance to be returned every time, just like a singleton.

var cb = new ContainerBuilder();
cb.RegisterType<Abc>().As<IA>().SingleInstance();
var c = cb.Build();
var a = c.Resolve<IA>();
var b = c.Resolve<IA>();
// a and b are the same object


Stay tuned for the next part where we take a look at registering ordering, keyed and name registration, and metadata.

Tuesday, March 30, 2010

PSSCOR2 .NET Debugger Extension

PSSCOR2 which is a .NET extension for WinDbg has been released, and can be download here. Currently the download is a 64-bit executable, which means you cannot run it on a 32-bit, but you can extract it using you favorite ZIP program. The ZIP file includes both 32-bit and 64-bit versions.

PSSCOR2 it SOS and more, it contains all the commands SOS does, and a lot more. It is especially useful when debugging ASP.NET.

Here is a short list and description of the command available in PSSCOR2:

Object inspection

DumbObj
Dumps the fields and other important object information like EEClass, the MethodTable and the size.

DumpArray
Allows you to examine elements of an array object.

DumpStackObjects
Displays any managed objects within the bounds of the current stack.

DumpAllExceptions
Displays any objects derived from System.Exception located on the managed heap.

DumpHeap
Traverses the garbage collected heap, and collect statistics about objects. It has a lot of various options, useful for restricting the output.

DumpVC
Allows you to examine fields of a value class.

GCRoot
Displays references to an object.

ObjSize
Displays the total size of an object.

FinalizeQueue
Lists all objects registered for finalization.

PrintException
Provides a nice output for Exception objects.

TraverseHeap
Creates a file in a format understood by the CLR profiler, which then can be used to create a graphical display of the GC heap.

DumpField
Dumps the object that is in a given field of an object. Allows you to avoid a !DumpObj on the parent object first.

DumpDynamicAssemblies
Provides you with a list of all dynamically created assemblies and their size.

GCRef
Prints a list of the parents and children of a given object. This allows you to see who holds a reference to an object, even if the object aren't rooted to a thread or handle table anymore.

DumpColumnNames
Dumps the column names of a given DataTable.

DumpRequestQueue
Dumps the queues for ASP.NET, threadpool and RequestQueue objects.

DumpUMService
Displays m_sipStackConnection objects for a UMService (Exchange).

Examininig CLR data structures

DumpDomain
List AppDomains a related information, such as assemblies loaded in the AppDomain.

EEHeap
Enumerates process memory consumed by internal CLR data structures. This can fx. be used to see if it is managed or unmanaged objects which are leaking memory.

Name2EE
Turns a class name into a MethodTable and EEClass.

SyncBlk
Helps you to detect managed deadlocks.

DumpThreadConfig
Dumps some thread values from config files, the values dumped are maxWorkerThreads, maxIoThreads, minFreeThreads, minLocalRequestFreeThreads, maxConnection and autoConfig.

DumpMT
Examines a MethodTable.

DumpClass
Allows you to see attributes and a list of static fields of a type.

DumpMD
Lists information about a MethodDesc.

Token2EE
Turns a metadata token into a MethodTable or MethodDesc.

EEVersion
Prints the CLR version and if the code is running in Workstation on Server mode.

DumpModule
Dumps module information, especially metadata tokens to CLR data structures.

ThreadPool
Lists basic information about the ThreadPool.

DumpHttpRuntime
Displays the HttpRuntime objects and prints some common properties for them.

DumpIL
Prints the IL code associated with a managed method.

PrintDateTime
Prints the time of the DateTime object passed to it.

DumpDataTables
Dumps the DataTables in the heap and shows how many rows and columns each table has.

DumpAssembly
Lists all modules associated with an assembly.

RCWCleanupList
Displays a list of runtime callable wrappers awaiting cleanup.

PrintIPAddress
Prints IP the address of a IPAddress object.

DumpHttpContext
Dumps the HttpContext in the heap.

ASPXPages
Prints information on ASPX pages running on threads.

DumpASPNETCache
Shows objects in the ASP.NET cache.

DumpSig
Displays information about a Sig structure.


DumpMethodSig
Displays information about a MethodSig structure.

DumpRuntimeTypes
Finds all System.RuntimeType objects in the GC heap.

ConvertVTDateToDate
Converts a VALUETYPE DateTime to a date.

ConvertTicksToDate
Converts a DateTime.Ticks or FILETIME to a date.

DumpRequestTable
Lists all request in the ISS ASP.NET request table.

DumpHistoryTable
Dumps the aspnet_wp history table.

DumpBuckets
Dumps entire request table buckets.

GetWorkItems
Displays request and work items associated with a CLinkListNode.

DumpXmlDocument
Prints out a System.Xml.XmlDocument.

DumpCollection
Dumps the contents of a collection.

Examining the GC history

HistInit
Initializes the PSSCOR structures from the stress log save in the debuggee.

HistStats
Provides a number of GC statistics obtained from the stress log.

HistRoot
Provides information related to promotions and relocations of the specified root.

HistObj
Displays the chain of GC relocations that may have led to the address specified.

HistObjFind
Displays all entries that reference the specified object.

HistClear
Releases any resources used by the Hist commands.

Examining code and stacks

Threads
Lists all managed threads in the process.

CLRStack
Displays the stack of the current thread.

IP2MD
Displays the MethodDesc from a code address.

BPMD
Provides managed breakpoint support.

U
Provides an annotated disassembly of a managed method.

EEStack
Displays the stack of all threads in the running process.

GCInfo
Displays internal information about the GC.

EHInfo
Displays exception handling blocks in a jitted method.

ComState
Lists the COM apartment model for each thread.

Diagnostics Utilities

VerifyHeap
Checks the heap for signs of corruption.

DumpLog
Dumps the stress log to a file.

FindAppDomain
Print the AppDomain of a given object.

SaveModule
Saves a in-memory module to disk.

SaveAllModuels
Save all in-memory modules to disk.

GCHandles
Provide statistics about the GC handles in the process.

GCHandleLeaks
Tracks down GC handle leaks.

VMMap
List the virtual address space and the type of protection applied.

VMStat
Shows a sumamry view of the virtual address space.

ProcInfo
Lists process information.

StopOnException
Allows you to defined exceptions where the debugger should stop the process.

MinidumpMode
Not all PSSCOR commands can be used on minidumps, this command allows you to restrict commands that don't work on minidumps.

FindDebugTrue
Find AppDomains which have debug=true in their web.config files.

FindDebugModules
Prints out all modules that are Debug builds.

Analysis
Runs common analysis commands and save the output to a file.

CLRUsage
Prints the memory that .NET is using.

CheckCurrentException
Check if the current exception are the one specified, and saves the result in the pseudo register defined.

CurrentExceptionName
Print the name of the managed exception on the current stack.

VerifyObj
Checks a object for signs of corruption.

HeapStat
Prints heap statistics.

GCWhere
Displays the location in the GC heap of the specified argument.

ListNearObj
Displays objects preceding and succeeding the address specified.

Monday, March 1, 2010

New Infobright blog

I've decided to create a new blog, called Xharze's Infobright blog.
This blog are going to contain all Infobright related posts, instead of this one. Old posts and patches are still available here, but all new posts and patches are going to be posted on http://infobright.blogspot.com instead.

Thursday, February 25, 2010

Infobright ICE ini settings

The brighthouse.ini file contains all Infobright specific settings. But what settings are actually available, and what do they do? When I looked at the source code, I saw settings I had no idea existed.
The list below applies to ICE 3.3.1.

BufferingLevel
Default value: 2
Used when the loader heap can't hold the all the data, and defines the size of the buffer. The size can be calculated like this LoaderMainHeapSize / BufferingLevel.

CacheFolder
Default value: "cache"
As the name implies this defines the location of the Infobright cache folder.

CachingLevel
Default value: 1
Used to define the type of caching level used on the uncompressed heap. If the value are 1 or 2, Infobright removes data packs which aren't used from the heap, if it runs out of space. This is not done if the value are anything else than 1 or 2. If this isn't enough or another value are defined, it tries to release memory and it tries to compress the data and move it to the compressed heap.

LicenseFile
Default value: ""
This points to the Infobright license file, and are only used in the IEE evaluation edition.

ClusterSize
Default value: 2000
The maximum size of a data file, in MB.

ControlMessages
Default value: 0
Enables control messages, which usually needed when investigating performance. The value 2 enables it, values above 2, enables it and also produces timestamps.

HugeFileDir
Default value: ""
Only used on non-Windows machines. Used to specify the location of a memory mapped file containing the server main heap. If the setting aren't defined, no memory mapped file are used, instead it is placed directly in memory.

InternalMessages
Default value: false
Used to print FET - Function Execution Times - to a log file called development.log. Only used if compiled with --with-FET.

KNFolder
Default value: BH_RSI_Repository
The folder containing the Knowledge Grid data.

KNLevel
Default value: 99
If the value are 0, Knowledge Grid aren't used. values above 0 enables the Knowledge Grid, but doesn't seem to do anything else in ICE.

LoaderMainHeapSize
Default value: 320
Size of the memory heap in the loader process, in MB.

LoaderSaveThreadNumber
Default value: 16
Number of parallel load threads. Only used in IEE.

PushDown
Default value: true
Enables or disables Engine Condition Pushdown. Should not be disabled.

ServerCompressedHeapSize
Default value: 256
Size of the compressed memory heap in the server process, in MB.

ServerMainHeapSize
Default value: 600
Size of the uncompressed memory heap in the server process, in MB.

UseCharSet
Default value: false
Used to enable or disable UTF8 support. As this isn't fully implemented in the current version, 3.3.1, it shouldn't be enabled.

AllowMySQLQueryPath
Default value: false
Allows queries to fallback to standard MySQL execution if necessary.

Wednesday, February 17, 2010

Infobright join algorithms

After I looked through the different sorter algorithms here, I've decided to take a look at the joining algorithms. They're a bit more complicated, but I was a very informative process.
First we need to understand the difference between a simple and a complex join.


Simple Join
A simple join is a join which only uses two tables, and isn't a BETWEEN operation.

Complex Join
Is a join which uses more than two tables, or a BETWEEN operation.

Now that we know the difference, we can take a closer look at the 3 different join algorithms.
  • Hash join
  • Sort join
  • General join
Hash join
This algorithm can only be used if it is a simple join and the condition is a equal operation. It uses a temporary table to store the hash for all key values and the matching tuples for the first table.
If enough memory are available all the hash values are placed in the temporary table, otherwise only the values which fit in memory are placed in the table. Then the values in the second table are iterate to find matching key.
If all key values from the first table weren't placed in the hash table, the next portion of keys are placed in the table, and the second table are iterated again, this is done until all keys have been processed.

Sort join
This algorithm can only be used if it is a simple join and the condition are <=, <, > or >=.
It works by inserting all keys and dimensions, into two sorters, one for each table. Using the Knowledge Grid irrelevant values are removed, and then the keys are sorted. Both sorters are then traversed in parallel and matched. The final step is to check additional constraints before the rows are committed to the output.

General join
This is the general joiner algorithm, and it is used when no other algorithm can be used. It is also the slowest algorithm available, it iterates through all the join dimensions and remove tuples not matching the defined join conditions.

How to determine to algorithm used?
If you enable ControlMessage, you can easily see which algorithm are used. There will be a line like this:
Tuples after inner/outer join noOfdimensions sort/mix/hash/loop noOfTuplesAfterJoin
Sort is the sort join, mix is mixed algorithm (if one fails, and general are used instead), hash is the hash join and loop is the general join.

Infobright ICE importer IGNORE LINES patch

The ICE importer in any of the current versions of ICE, newest 3.3.1, doesn't support IGNORE LINES in the LOAD DATA statement. For example, this this data file would fail, as we can't tell it to skip the first line:

Test data
1;2;3
4;5;6

It is possible to issue the LOAD DATA statement with the IGNORE LINES option, but it is ignored by the loader.
It is very easy to implement in the loader, and the attached patched does exactly this.

The patch is tested in ICE 3.3.1, but should also work in earlier versions. It must be applied in the ICE source code root.

I've created a bug report here.

Patch file:
ignoreLines331.patch (4 KB)

Tuesday, February 16, 2010

Infobright ICE MySQL Query Cache patch

Infobright ICE doesn't currently use the MySQL Query Cache for queries executed by the Infobright engine.
The attached patch enables the MySQL Query Cache for Infobright queries also.

There are currently a bug report about this here.

Patch file for ICE 3.3.1, should also work for earlier version:
queryCache331.patch (1 KB)

Infobright ICE importer default escape character patch

In the current, 3.3.1, and earlier version of ICE there is a problem with the importer. It doesn't handle escape characters in strings correctly. This is because the default escape character in the importer is nothing (ASCII code: 0). This patch sets the default escape character to '\', which is the default in MySQL.
You'll only run into this problem if you haven't define an escape character in the LOAD DATA statement, and try to load a string containing a ".

Patch file for ICE 3.3.1, but should also work in earlier version, has to be applied in the source code root:
escapedCharacter331.patch

Infobright ICE RENAME TABLE support

Currently no version of ICE supports the RENAME TABLE statement, but I'm about to change this.
At least from version 3.3.0, the RENAME TABLE statement has been almost fully implemented, to only thing missing is connecting the rename function, to the statement. I have know idea why this hasn't been done a long time ago.
I've created a forum post, trying to answer this question, it can be found here.
The patch enables you to give ICE the following statement:
RENAME TABLE oldName TO newName
The attached patch are tested in version 3.3.1, but should also work in 3.3.0, and should be applied in the root ICE source directory.

Patch file:
renameTable331.patch (1 KB)

UPDATE:
If you issue the RENAME TABLE command in the current implementation, the table file structure are corrupted. Therefore do not try to rename a Infobright table without applying this patch. I've create a bug report here.

UPDATE 2:
I got a response to the forum post I created. The answer to why it wasn't implemented was that renaming tables in ICE aren't officially supported, and therefore this wasn't a bug. See the answer here.

ASP.NET LoadControl overloads

Yesterday, one my friends asked me a ASP.NET question about LoadControl. According to MSDN, http://msdn.microsoft.com/en-us/library/system.web.ui.templatecontrol.loadcontrol.aspx, the LoadControl method are overloaded.
He had created a user control, with a constructor he would like to initialize, he also had a empty constructor. I've created my own code which describes the situation. The user control looked something like this:
UserControl.asx
<%@ Control Language="C#" AutoEventWireup="true" CodeBehind="UserControl.ascx.cs" Inherits="LoadControl.UserControl1" %>
<p>TextBox:</p>
<asp:TextBox runat="server" ID="TextBox1" />
UserControl.asx.cs
public partial class UserControl1 : System.Web.UI.UserControl
   private string text = "TextBox";
   public UserControl1()
   public UserControl1(string text) {
      this.text = text;
   }
}
The page look something like this:
Default.aspx
<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="Default.aspx.cs" Inherits="LoadControl._Default" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
   <head runat="server">
      <title></title>
   </head>
   <body>
      <form id="form1" runat="server">
         <div>
            <asp:Panel ID="Panel1" runat="server" />
         </div>
      </form>
  </body>
</html>
Default.aspx.cs
public partial class _Default : System.Web.UI.Page {
   protected void Page_Load(object sender, EventArgs e)  {
      Panel1.Controls.Add(LoadControl("~/UserControl.ascx"));
      Panel1.Controls.Add(LoadControl(typeof(UserControl1), new object[] { "Test TextBox"}));
   }
}
When the page was accessed it surprisingly look like this:
The second instance of the user control, was never added to the page. The reason for this is the way ASP.NET handles user controls. When you use LoadControl(string), .NET processes the acsx file. When you use LoadControl(Type, Object[]), .NET does not process the ascx file, instead it uses the controls Render method. If this method doesn't exists nothing are rendered. To get code above to work like it was intended a Render method have to be added, the method could look something like this:
protected override void Render(HtmlTextWriter writer) {
   writer.Write("<p>" + text + "</p>");
   writer.Write("<input type='text' />");
}
The important lesson here is that user controls only uses the acsx markup file, if the markup file are specified to load. If only the type are used, the user control knows nothing about the ascx file.

Infobright ICE importer FIELDS ESCAPED BY patch

None of the Infobright ICE versions support NULLs escaped anything other than the default escape character '\'. This is probably because the default way to handle NULLs in the ICE exporter and importer are an empty field. In this example fields are terminated by ';', and NULLs are indicated the ICE default way.
1;;2
2;2;2
The problem is that ICE importer also supports NULLs defined the default MySQL way:
1;\N;2
2;2;2
But not with any other escape character, like this:
1;~N;2
2;2;2
In ICE you can define a custom NULL indication by setting the BH_NULL session variable, but for some reason this is not handled in the importer.

There are currently a bug report concerning this here.

I've created a patch to solve this "problem", which means that you can define another escape character like this:
LOAD DATA INFILE '/tmp/outfile.csv' INTO TABLE ibtest FIELDS TERMINATED BY '~';

The patch has to be applied in the root of the ICE source code directory.
The patch is tested in ICE 3.3.1, but should also work in any earlier versions.

Patch file:
escapedBy331.patch (2 KB)

Saturday, February 13, 2010

Infobright ICE sorter algorithms

In the current version of ICE, 3.3.1, there are 4 types of sorting algorithms, with the internal names:
  • SorterLimit
  • SorterCounting
  • SorterOnePass
  • SorterMultiPass
Which one that are used depends on the Infobright memory settings and type of sort to be done.
Before going in depth about the different algorithms, we need to understand how Infobright determines the size of the keys to order by. The method is actually pretty smart, from the Knowledge Grid Infobright can determine the biggest value of a column, and that value are used to determine the key size. For example a column can be declared as a integer which is 4 bytes, but if the column only contains values less than 255 bytes, the key only has to be 1 byte long, less than 65535, 2 bytes, etc.
The size of a row are calculate the same way as the key length, meaning that we get the smallest possible buffer.

The memory used by the Infobright sorter, are defined by the ServerMainHeapSize in the brighthouse.ini configuration file. The memory allocated to sorting are defined in the following heap size intervals.

ServerMainHeapSize Memory available for sorting
Less than 0.5 GB 64 MB
0.5 - 1.2 GB 128 MB
1.2 - 2.5 GB 256 MB
2.5 - 5 GB 512 MB
5 - 10 GB 1 GB
More than 10 GB 2 GB

SorterLimit
This algorithm uses a single memory buffer.
The criterias for using this algorithm criteria are as follows:
  • The number of rows retrieved must be less than a third of the total number of rows.
  • The number of rows retrieved must be less than a third of the maximum rows allowed in the memory.
To fulfill these requirements you have to define a LIMIT clause.
This algorithm does the sorting on-the-fly, meaning that when the values are loaded into the sort buffer, it is sorted right away, the other algorithms sorts when the values are retrieved. This means that a much smaller buffer are used, but because the sorting occures on-the-fly, it takes longer to sort many rows. Therefore it makes sense that this algorithm are chosen only on small datasets.

SorterCounting
This is a Counting Sort algorithm, which uses two memory buffers.
The criterias for using this algorithm criteria are as follows:
  • The memory must be able to hold twice as many rows as the table contains.
  • The key size must be less than 3 bytes long.
  • If the key is 1 byte long, the total number of rows must be above 1024. And if its 2 bytes long, the number of rows must be above 256000.
As the Counting Sort algorithm are impractical for large ranges, it is only for low-cardinality keys.

SorterOnePass
This algorithm uses either a Bubble SortQuick Sort or a combination of both, and uses a single memory buffer.
The criteria for using this algorithm criteria are as follows:
  • The memory must be able to hold the all the rows in the table.
If we are ordering less than 20 values are Bubble Sort are used. If we are ordering more than 20 values, most of the values, are sorted using Quick Sort, but depending on the distribution of values in the table smaller intervals are sorted using Bubble Sort.

SorterMultiPass
This is uses same sorting algorithm as SorterOnePass, but uses multiple buffers.
The criteria for using this algorithm criteria are as follows:
  • The memory cannot hold all the rows in the memory.
The number of buffers used are defined by the available memory, buffer has the size of the available memory, so if three times the memory are needed, three buffers are created. When a buffer has been filled, it is sorted and save to disk. When a value are retrieved the current buffer are sorted, and then the values, both from disk and memory, are returned.
Because of the multiple buffers and sort passes, this is the most costly sorting algorithm.

How to determine which algorithm are used?
It is pretty easy to determine which algorithm that are going to be used. Just look at the key length, row size, the number of rows in the table, the number of rows to get and the available memory. If ControlMessages have been enabled in the brighthouse.ini configuration file. You can also look at the contents of bh.err log file, and look for the Sorter initialized line, to get the number of rows, the key size and total row size. The line could look like this:
2010-02-12 20:05:33 [1] Sorter initialized for 28 rows, 1+8 bytes each.
Meaning that the table contains 28 rows, the key is 1 byte long and the total row size are 1+8 bytes long. With the information in hand, look at the criterias above to figure which algorithm are chosen.

Friday, February 12, 2010

Infobright ICE 3.3.1 released follow up - The biggest little-known secret

Before updating my Infobright patches, I decided to to a closer a look at what actually changed from 3.3.0 to 3.3.1, and to my surprisem, a new very big feature has been implemented, namely UTF-8 support.

Why would they implement something like this, even partially, without telling the public?
Even a tool called CharSetMigrationTool have been included in the source distribution, don't know about the binary distributions. The tool seems to update the Knowledge Grid to support UTF-8.

Just by looking at the source code, it seems like UTF-8 support has been fully implemented with a few exceptions, like CMAPs for UTF8. CMAPs are explained in more detail in this forum post.

On the ICE forum I posted the question, has it been implemented or not, the post can be found here.

Other than that, no untold changes seems to be hiding in 3.3.1 release.

UPDATE: One of the Infobright developers replied to my forum post, and he confirms that UTF-8 support has been partially implemented in ICE 3.3.1. The next release of ICE should contain UTF-8 support for the ICE loader process. See the reply here.

Thursday, February 4, 2010

Infobright ICE 3.3.1 released

Today Infobright 3.3.1 has been released.
The most important change are that MySQL have been upgraded from 5.1.14 to 5.1.40 GA, which fixes a bunch of different bugs, both performance, query syntax, etc., in the MySQL engine.
Because of the MySQL upgrade it is necessary to run the MySQL Updater program, as the data file structure has been changed in MySQL 5.1.40 GA. Upgrade instructions are available here:
Windows Upgrade Instructions
Linux Upgrade Instructions
Tar Upgrade Instructions
There are also a known issue with the 3.3.1 version.
Please note: There is a known issue with 3.3.1. A server crash may occur for queries of the form:

SELECT distinct(a1), max(a2) FROM t1 GROUP BY a3;

Where a1, a2, a3 may or may not be the same. It is caused by the use of max and min and the distinct clause, and may be easily worked around by removing the distinct and adding an additional clause to the GROUP BY.

This issue has been resolved and will be included in the next release. A full list of known and fixed issues is included at the end of this document.
All the information about the new release can be found here and can be downloaded here.
The patches previously posted on this blog, will be updated to work in 3.3.1. The updated patches will be added to the old post.

Debugging .NET application startup

When you start a .NET application where it can't find all assembly dependencies, you'll get an error. But sometimes it isn't a very describing one, for example the last time, I got error placing kernel32.dll as the fault module, no more than this. This error doesn't say anything about the actual error, what to do?
I fired up WinDbg, with the SOS extension. Then I went to File -> Open Executable, and chose the application to start. As WinDbg breaks before the program are actually run, I pressed F5, to continue the application. Then WinDbg breaks later on because of an exception, with an output something like this:
ModLoad: 77650000 77716000   C:\Windows\system32\ADVAPI32.dll
ModLoad: 777a0000 77863000   C:\Windows\system32\RPCRT4.dll
ModLoad: 72d70000 72dd6000   C:\Windows\Microsoft.NET\Framework\v4.0.30109\mscoreei.dll
ModLoad: 765c0000 76619000   C:\Windows\system32\SHLWAPI.dll
ModLoad: 77cc0000 77d0b000   C:\Windows\system32\GDI32.dll
ModLoad: 76620000 766bd000   C:\Windows\system32\USER32.dll
ModLoad: 77870000 7791a000   C:\Windows\system32\msvcrt.dll
ModLoad: 766c0000 766de000   C:\Windows\system32\IMM32.DLL
ModLoad: 76410000 764d8000   C:\Windows\system32\MSCTF.dll
ModLoad: 76350000 76359000   C:\Windows\system32\LPK.DLL
ModLoad: 76390000 7640d000   C:\Windows\system32\USP10.dll
ModLoad: 750e0000 7527e000   C:\Windows\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.6002.18005_none_5cb72f96088b0de0\comctl32.dll
ModLoad: 72250000 727e0000   C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll
ModLoad: 73e90000 73f2b000   C:\Windows\WinSxS\x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.4053_none_d08d7da0442a985d\MSVCR80.dll
ModLoad: 768b0000 773c0000   C:\Windows\system32\shell32.dll
ModLoad: 773c0000 77505000   C:\Windows\system32\ole32.dll
ModLoad: 70c50000 71748000   C:\Windows\assembly\NativeImages_v2.0.50727_32\mscorlib\894183c0c47bd4772fbfad4c1a7e3b71\mscorlib.ni.dll
ModLoad: 60340000 60348000   C:\Windows\Microsoft.NET\Framework\v2.0.50727\culture.dll
ModLoad: 72ea0000 72eec000   image72ea0000
ModLoad: 00a70000 00abc000   image00a70000
ModLoad: 72ea0000 72eec000   C:\Windows\assembly\GAC_MSIL\mscorlib.resources\2.0.0.0_da_b77a5c561934e089\mscorlib.resources.dll
(1090.1160): C++ EH exception - code e06d7363 (first chance)
(1090.1160): C++ EH exception - code e06d7363 (first chance)
(1090.1160): C++ EH exception - code e06d7363 (first chance)
(1090.1160): C++ EH exception - code e06d7363 (first chance)
(1090.1160): C++ EH exception - code e06d7363 (first chance)
(1090.1160): CLR exception - code e0434f4d (first chance)
(1090.1160): CLR exception - code e0434f4d (!!! second chance !!!)
eax=002cfb50 ebx=e0434f4d ecx=00000001 edx=00000000 esi=002cfbd8 edi=000f79a8
eip=7651fbae esp=002cfb50 ebp=002cfba0 iopl=0         nv up ei pl nz ac pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000216
KERNEL32!RaiseException+0x58:
7651fbae c9              leave
I start by looking at the stack of the thread throwing the exception, using kb:
0:000> kb
ChildEBP RetAddr  Args to Child              
002cfba0 722ae774 e0434f4d 00000001 00000001 KERNEL32!RaiseException+0x58
002cfc00 7230dc9c 01f91444 00000000 00000000 mscorwks!RaiseTheExceptionInternalOnly+0x2a8
002cfc38 723e5a54 ffffffff 00126d28 cdf37c13 mscorwks!UnwindAndContinueRethrowHelperAfterCatch+0x70
002cfc84 72d76210 722c3d8f 002cfca0 72e57f16 mscorwks!_CorExeMain+0x1ac
002cfc90 72e57f16 00000000 72d70000 002cfcb4 mscoreei!_CorExeMain+0x38
002cfca0 72e54de3 00000000 7652d0e9 7ffda000 mscoree!ShellShim__CorExeMain+0x99
002cfca8 7652d0e9 7ffda000 002cfcf4 77af19bb mscoree!_CorExeMain_Exported+0x8
002cfcb4 77af19bb 7ffda000 72824509 00000000 KERNEL32!BaseThreadInitThunk+0xe
002cfcf4 77af198e 72e54ddb 7ffda000 00000000 ntdll!__RtlUserThreadStart+0x23
002cfd0c 00000000 72e54ddb 7ffda000 00000000 ntdll!_RtlUserThreadStart+0x1b
Here I can see that it is a .NET exception cause the application crash. This means that I can use the SOS command !pe, to look close at the exception. First I load SOS, using .load sos, and then the !pe command.
0:000> .load sos
0:000> !pe
Exception object: 01f91444
Exception type: System.IO.FileNotFoundException
Message: The file or assembly 'xxx.dll', Version=1.0.3672.38321, Culture=neutral, PublicKeyToken=cfb1f46daa2d0af2' or one of its dependencies could not be found. The file could not be found.
InnerException: 
StackTrace (generated):

StackTraceString: 
HResult: 80070002
Now the real error message appears, and the problem can be solved.
I'm pretty sure that the error message are supposed to be shown in the Windows Event Log, and that it has before, but for some reason this was not the case in the situation above.

Wednesday, February 3, 2010

Infobright ICE importer LINES TERMINATED BY patch

As a continuation of my previous post, Infobright also doesn't allow importing with a user defined line terminator.There are currently a bug report here.
This patch fixes this, please note that the patch from this post, http://kimsnetblog.blogspot.com/2010/02/infobright-exporter-lines-terminated-by.html, have to be applied first.

This patch allows you to use the Infobright importer while defining a line terminator, like this:
LOAD DATA INFILE '/tmp/outfile.csv' INTO TABLE ibtest LINES TERMINATED BY '\t'
By default the Infobright importer supports \r\n and \n as line terminators, and figures it out automatically, but if another line terminator are used, importing the file will fail.

The patch has to be applied in the Infobright source root.

Patch file:
importLinesTerminated.patch (4 KB)

UPDATE:
Patch file for ICE 3.3.1:
importLinesTerminatedBy331.patch (4 KB)

Infobright ICE exporter LINES TERMINATED BY patch

In all current versions, newest 3.3.0, of Infobright you cannot define LINES TERMINATED BY when exporting data to a file. Well, you can define it but it isn't used when the Infobright exporter are used.
The default line terminator are \r\n, which works great, but sometimes you want to use another line terminator, which are not supported in any of the current versions of Infobright.

There are currently two bug reports concerning this, http://bugs.infobright.org/ticket/1235 and http://bugs.infobright.org/ticket/1072.

This patch fixes this, it works in Infobright 3.3.0 and allows you to define LINES TERMINATED BY like this:
SELECT * FROM ibtest INTO OUTFILE '/tmp/outfile.csv' LINES TERMINATED BY '\n'
The patch has to be applied at the root of the Infobright source.

Patch file:
exporterLinesTerminated.patch (8 KB)

UPDATE:
Patch file for ICE 3.3.1:
exporterLinesTerminated331.patch (8 KB)

Monday, January 25, 2010

Reflection on dynamic variables in .NET 4.0

One of the new things in .NET 4.0, are the ability to create dynamic variables, http://msdn.microsoft.com/en-us/library/dd264741(VS.100).aspx.
How does reflection handle this? I decided to go test it.
I started by testing how it handles the dynamic nature of the variable, like this:
dynamic dyn = "string";
Console.WriteLine(dyn.GetType());
dyn = 1;
Console.WriteLine(dyn.GetType());
dyn = 1L;
Console.WriteLine(dyn.GetType());
And happily it turns out we get the correct type after each assignment, the output will look like this:
System.String
System.Int32
System.Int64
The next thing I tested were how dynamic method parameters were handled. I created a method like this:
public void Test(dynamic dyn) { }
What type would reflection return for the dyn parameter?
The following code gets type parameter type of the method, and prints it to the console:
MethodInfo mi = typeof(Program).GetMethod("Test");
ParameterInfo[] pi = mi.GetParameters();
foreach (ParameterInfo info in pi)
   Console.WriteLine(info.ParameterType);
The output will be:
System.Object
Which also are the only possible assumption which can be made about the parameter, as it can be any type deriving from Object.
So how do you see, using reflection, if a variable are declared dynamic? Well at the time of this writing, it does not look like it is possible. This is suspicion are further strengthened by the following paragraph taken from the MSDN article linked in the beginning of the post:
Type dynamic behaves like type object in most circumstances. However, operations that contain expressions of type dynamic are not resolved or type checked by the compiler. The compiler packages together information about the operation, and that information is later used to evaluate the operation at run time. As part of the process, variables of type dynamic are compiled into variables of type object. Therefore, type dynamic exists only at compile time, not at run time.
As all variables of type dynamic are compiled into object types, then reflection returns the correct result by returning Object, when the variable hasn't be assigned another value.

Default Infobright ICE NULL string patch

One thing about Infobright that annoys me a lot, are their exporters default way to indicate NULL values, which is nothing, NULL are indicated by no character. This poses a viewability problem when fields are terminated by for example tabs, then you cannot easily see the null values.

To get around this problem you can use the session variable BH_NULL, and set it to fx. the default MySQL way, \N. But this has to be done for every session/connection.

As I did not want to do this, I created a trivial patch which sets the default NULL string to \N, as in standard MySQL.

The patch can be downloaded below, and has to be applied in the root of the Infobright source files, and works of Infobright 3.3.0.

Patch file:
DefaultNullStr.patch (1 KB)

UPDATE: The patch also works for ICE 3.3.1

Sunday, January 24, 2010

Default Infobright ICE exporter patch

When you export files in the current Infobright version, 3.3.0 and earlier. There are some things you should be aware of. The default exporter are the standard MySQL exporter, but this exporter are not compatible with the default Infobright importer.

For example if you do this, you may run into problems, because of the differences in the exporter and importer:
SELECT * FROM infobrightTable INTO OUTFILE '/tmp/outfile.txt';
LOAD DATA INFILE '/tmp/outfile.txt' INTO TABLE infobrightTable;
Currently there is a bug report about this here. Unfortunately, you'll have to create a user to access the contents.

As described in the bug report, and in other places, you'll have to set the BH_DATAFORMAT session variable to txt_variable, to use the Infobright exporter by default, otherwise you'll may run into format compability problems.

The 'default' exporter will only be used if you are accessing Infobright tables, therefore in my opinion the default exporter should be the Infobright exporter, defined by BH_DATAFORMAT=txt_variable, but at the moment it isn't.

To work around this I've created a very simple patch, which does exactly this. What this patch does is that instead of using the default MySQL exporter, when the BH_DATAFORMAT variable aren't defined, it uses the Infobright exporter instead.

The patch file can be download at the bottom this post, and works for the current version 3.3.0. It has to applied to the file src/storage/brighthouse/core/RCEngine_tools.cpp directly.

Patch file for 3.3.0:
DefaultTxtVariable.patch (1 KB)

UPDATE:
Patch file for 3.3.1:
DefaultTxtVariable331.patch (1 KB)

MySQL and Infobright

At my work place we have used the Infobright Community Edition, http://www.infobright.org/, for a while now, and with great success.
For those who don't know what Infobright is, I recommend that you take a look at this page, What is ICE?, ICE stands for Infobright Community Edition, which is the open source version.
One of Infobrights great strengths are compression, which is top-notch. And for certain analytic queries it performs very, very good.
But there are certainly room for improvements, fx. as Infobright uses its own optimizer, not the default MySQL optimizer, certain queries aren't optimized very well yet, and some aren't supported at all. Unsupported queries will fall back to the standard MySQL optimizer, which doesn't work well with Infobright tables.
Also they are very slow to implement contributed patches, because of this, I've decided to post patches in this blog. Patches which are committed to they're bug tracker, but also patches which are developed specifically for use at my company.

Saturday, January 23, 2010

New MySQL .NET connection string parameter

From MySQL .NET Connector 6.0 a new connection string parameter has been added. The new parameter are called keepAlive, and are used to indicated server ping intervals in seconds.
This parameter are used to ensure database connection aren't dropped, fx. by firewalls due to inactivity. If the connection are dropped, the connection timeout are not obeyed.
A longer discussion and explanation can be found in this bug report, http://bugs.mysql.com/bug.php?id=40684. Currently the parameter aren't documented anywhere in the online .NET Connector documentation.