KNIME: the four types of joins explained

  •  
  •  
  •  
  •  
  •  
  •  

There are four types of joins you can do in KNIME; they are the inner join, left outer join, right outer join and full outer join.  I’ll go over what they mean and follow up with an example for each.

 

There are some places that claim there are over twenty types of joins.  In my view, those extras are simply variations on these four with some added conditions.

 

The 4 types of joins

 

The inner join is what most programs default to when you select a join, and is sometimes referred to as just a join.  This selects only the records that match in both A and B.

 

The left outer join, or sometimes just left join, selects all of the records of the left, or first database in the join AND the records in the right, or second database in the join, for which there is a match on your key.

 

The right outer join, or just right join, selects all of the records of the right, or second database in the join AND the records in the left, or first database in the join for which there is a match on your key.

 

The full outer join, or just outer join, selects all of the records of the left and right databases and matches when possible.

 

Examples of each

 

We have two tables, Table A is customer demographics info, and Table B is sales on a given day.  The key that we will match on is CustomerID.

 

Inner Join

The Inner join gives us all the demographic info on customers who made a purchase that day when we had demographic info for them.  If someone in our database did not purchase anything, or the sale did not have demographic info, we get nothing.

Notice how there are repeat entries for customer 2.  They bought two items, so they will get two rows.  These repeated rows can be the bane of new analysts or people who forgot to consider multiple entries.

 

Left Join

Left outer join gives us all of the customer demographic info, plus the info on sales where we had the customer demographics.  Any sales where we did not have demographic info would be excluded.  This might be useful for evaluating why some customers did, or did not buy.

 

Right join

Right outer join gives us all of the sales info, plus the info on customer demographics where they had sales.  Any customers who did not buy anything would be excluded.  This might be useful to identify which items might be bringing in new customers that we do not have info for.

 

Outer Join

Full outer joins give you all of the info for both tables, matched where possible.

 

This sounds simple enough, and it is, but wait until you have to join more than two tables together.  That is the sort of thing where a visual workflow really comes in handy.

Leave a Reply

Your email address will not be published. Required fields are marked *