Several times in my daily work with Salesforce environments I have faced the requirement of integrating a batch job with Salesforce. For this purpose there are many ETL tools such as Informatica, Oracle Data Integrator, Talend… But with these we only cover half of the equation: where should I deploy my batch process? Informatica Cloud has its own infrastructure, but sometimes it is not easy (or cheap) to have that infrastructure. So in this post we are going to deploy our process on a cloud PaaS, Heroku in our case, and we will use Talend Open Studio for Data Integration to build the process, because it is free. This way we will quickly have our process ready to run, and for very little money ;)
Heroku is a PaaS, which is the acronym of Platform as a Service, and it enables you to run your own applications. In this post we are going to use the Java buildpack, due to the ability of Talend to export batch processes as a jar file, but many other languages can be used on Heroku. Please refer to this link if you would like to know more about Heroku.
The fundamental part of Heroku are the Dynos, which are system processes able to run web servers and batch processes. In our example we are going to create a
worker dyno to run the Talend process and a
web dyno to hold a REST web service that, when invoked, will launch the Talend job; in other words, it will call the worker dyno.
As I have mentioned before, we are going to use the Java
buildpack, which relies on Maven to deploy the dynos into Heroku. So we need to be familiar with Maven and Java concepts before we start. The main Maven file is the
pom.xml, where we will define several things like dependencies, repositories, the build cycle… For more information please refer to the Maven documentation.
Coding in Heroku
Before we start coding our application we need some dependencies. The piece we are going to use for the dynos to communicate with each other is a
RabbitMQ queue, where the web dyno will publish messages and the worker dyno will be subscribed, launching the batch when a message arrives. Another dependency we will use in this post is an embedded
Tomcat to serve the
Jersey web services. In order to add these dependencies to our project we need to add them to the pom.xml file.
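The dependencies section could look something like this; the version numbers are illustrative, not a recommendation:

```xml
<!-- RabbitMQ Java client, embedded Tomcat and Jersey (versions are illustrative) -->
<dependencies>
  <dependency>
    <groupId>com.rabbitmq</groupId>
    <artifactId>amqp-client</artifactId>
    <version>3.6.5</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tomcat.embed</groupId>
    <artifactId>tomcat-embed-core</artifactId>
    <version>8.5.4</version>
  </dependency>
  <dependency>
    <groupId>org.glassfish.jersey.containers</groupId>
    <artifactId>jersey-container-servlet</artifactId>
    <version>2.23.2</version>
  </dependency>
</dependencies>
```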
The full pom.xml is in the GitHub repository.
The web service is very simple: it exposes a single method that publishes a message to the RabbitMQ queue. Here is the code that publishes to the RabbitMQ queue.
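A minimal sketch of such a resource could look like this. The resource path and the queue name ("jobs") are assumptions; CLOUDAMQP_URL is the connection URL that the CloudAMQP add-on sets in the environment:

```java
// Hypothetical Jersey resource; path and queue name are assumptions.
import javax.ws.rs.GET;
import javax.ws.rs.Path;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

@Path("/launch")
public class LaunchResource {

    private static final String QUEUE_NAME = "jobs";

    @GET
    public String launch() throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        // CLOUDAMQP_URL is provided by the CloudAMQP add-on
        factory.setUri(System.getenv("CLOUDAMQP_URL"));
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        // Declare the queue (idempotent) and publish a simple start message
        channel.queueDeclare(QUEUE_NAME, false, false, false, null);
        channel.basicPublish("", QUEUE_NAME, null, "start".getBytes("UTF-8"));
        channel.close();
        connection.close();
        return "Job launched";
    }
}
```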
Finally, the worker needs to consume the messages published to the queue and, when that happens, launch the Talend job. Here is the code snippet in charge of the consumption part in the worker.
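A sketch of the worker entry point could look like this. The queue name and the generated Talend class name (accountload.AccountLoad) are assumptions; Talend generates one main class per exported job:

```java
// Hypothetical worker entry point: waits on the queue and runs the Talend job.
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class Worker {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setUri(System.getenv("CLOUDAMQP_URL"));
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("jobs", false, false, false, null);

        // Auto-ack consumer: each message triggers one run of the Talend job
        channel.basicConsume("jobs", true, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties properties, byte[] body) {
                // Launch the exported Talend job in-process via its main class
                accountload.AccountLoad.main(new String[0]);
            }
        });
        // basicConsume returns immediately; the client's consumer thread
        // keeps the worker dyno alive waiting for messages
    }
}
```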
The glue that puts all the pieces together is the
Procfile, where we define the processes and their types. So first we need two runnable processes: one for the web dyno and another for the worker dyno. We can configure the
pom.xml to generate executable files from a Java class.
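One way to do this is the appassembler-maven-plugin, which generates a launch script per main class during the build. The main class names and script ids below are illustrative:

```xml
<!-- Generates target/bin/web and target/bin/worker launch scripts;
     class names and ids are illustrative -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>appassembler-maven-plugin</artifactId>
  <version>1.10</version>
  <configuration>
    <assembleDirectory>target</assembleDirectory>
    <programs>
      <program>
        <mainClass>com.example.WebMain</mainClass>
        <id>web</id>
      </program>
      <program>
        <mainClass>com.example.Worker</mainClass>
        <id>worker</id>
      </program>
    </programs>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>assemble</goal></goals>
    </execution>
  </executions>
</plugin>
```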
So finally our
Procfile looks like this:
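A minimal Procfile, assuming the Maven build generates launch scripts named web and worker under target/bin (the script names are illustrative):

```
web: sh target/bin/web
worker: sh target/bin/worker
```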
Talend Open Studio for Data Integration is an ETL (Extract, Transform and Load) tool that has a free version. With this tool we can create batch processes to load data into Salesforce. In this post we are not going to go very deep into the Talend process; it is enough to know that it takes a CSV and loads the data of the CSV into Salesforce. An interesting feature of Talend is that a job can be exported as a runnable jar, so this jar can be imported into the Heroku project to do the job when requested.
A Talend job looks like this:
Maven has a central repository where you can find and reference the most common jars, for instance the RabbitMQ client that we have used before, but as you can understand our personal jars won't be found in this central repository. So, to upload the jar with the Talend job, and the other jars that the Talend job references, we can create a local repo. Again, we must define this local repo in the
pom.xml with this snippet of code:
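The local repository definition could look like this; the repository id and the lib/ path are illustrative:

```xml
<!-- Project-local repository for jars not in Maven Central (path is illustrative) -->
<repositories>
  <repository>
    <id>local-repo</id>
    <url>file://${project.basedir}/lib</url>
  </repository>
</repositories>
```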
And then we can reference the dependency in the local repository:
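For example, with illustrative coordinates for the exported Talend job jar:

```xml
<!-- groupId/artifactId/version of the Talend job jar are illustrative -->
<dependency>
  <groupId>com.example.talend</groupId>
  <artifactId>accountload</artifactId>
  <version>0.1</version>
</dependency>
```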
The Talend job that we have built for this post is very simple: it reads a CSV file that contains one single Account and, using the Salesforce connector of Talend, uploads it into a Salesforce org. For this job to work it needs three environment variables: the Salesforce user, the
password and the path that contains the CSV file; the name of this last variable is
files_path. To export the variables into our Heroku project we need to run this command from the Heroku CLI:
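Something like the following, where files_path comes from the post and the credential variable names and values are illustrative:

```shell
# Variable names other than files_path are illustrative
heroku config:set salesforce_user=user@example.com \
  salesforce_password=secret \
  files_path=/app/data
```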
Earlier in this post we mentioned that we will use RabbitMQ for the communication between dynos. RabbitMQ has an add-on for Heroku, so before we can run our example we have to add this
add-on. Here is the command that we need to run with the Heroku CLI:
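The command would be something like this, with lemur being CloudAMQP's free plan:

```shell
heroku addons:create cloudamqp:lemur
```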
For more info about the RabbitMQ add-on you can visit the CloudAMQP page.
Now the last step before we can run the example is to start the dyno with the worker. In Heroku, web dynos are automatically scaled to 1 dyno, but for workers we need to do it manually. It is as simple as running this command:
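```shell
heroku ps:scale worker=1
```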
And finally we can call the REST service that will start the Talend process. To do this, just open your favourite browser and go to this URL:
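For example, assuming the app is called my-talend-app and the web service is mapped at /launch (both names are illustrative):

```shell
curl https://my-talend-app.herokuapp.com/launch
```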
You can see a video of this example running here on YouTube.
All the source code of this post is in this GitHub repository