Why do we need joins?
The ability to join data tables together is a foundation of any analytics platform, KNIME included. You need to have some way to not only join two or more tables together, but to ensure the records are matched based on some key, perhaps on a specific item or customer.
This is where KNIME can come in handy. The join is incredibly simple. Load two data sources, then connect them via a join node. It really is that simple. If you need to join another data source…well, then just use another join node.
Excel on the other hand, has serious issues with joins. There are two common ways to do a join in Excel, Vlookup and Index-Match are both popular. However, both are error prone, not intuitive to read or re-create what someone else wrote, and extremely slow with large amounts of data. 20,000 records will take forever to run, and 200,000 or more may crash your computer entirely.
The four types of joins
There are four primary types of joins you will see in analysis. While some people claim there are upwards of twenty, I believe that all of the others are simply added conditions on these four.
Inner joins only give you data when there is a match.
The left join gives you all of your first table and the corresponding info from the second table when it matches something in the first.
The right join gives you all of the second table and corresponding info from the first when it matches something in the second.
The outer join, or full join gives you everything, matched where appropriate. It sounds simple, and it is, but you need to master this.
Anyone can join two tables, but what do you do when you need to join 3? Or what about 5? It is the same process, just with a lot more to keep track of. That is why you need to understand exactly which type of join is appropriate.
Performing joins in KNIME
This video is a tutorial on how to perform a join in KNIME.
The files for this exercise are:
I hope you enjoy and get started on performing joins of your own!