<p>Nithril's blog (http://nithril.github.io), by Nicolas Labrot, updated 2018-03-04T18:02:37+00:00</p>
<p>Elasticsearch Source size impact, 2018-02-15T10:06:01+00:00, http://nithril.github.io/benchmark/2018/02/15/elasticsearch-sources</p>
<blockquote>
<p>The _source field contains the original JSON document body that was passed at index time. The _source field itself is not indexed (and thus is not searchable), but it is stored so that it can be returned when executing fetch requests, like get or search.</p>
</blockquote>
<p><code class="highlighter-rouge">_source</code> is parsed and loaded in memory at index time and, most of the time, at query time.
To build the result list, the coordinating node fetches the documents: they are transferred over the network, loaded in memory, parsed, aggregated, then returned.</p>
<p>This quick post focuses only on the impact of the <code class="highlighter-rouge">_source</code> size on performance, not on memory or network pressure.</p>
<!--more-->
<h1 id="protocol">Protocol</h1>
<p>For this quick post I will not benchmark ES itself. Instead I will microbenchmark the method ES uses to parse the JSON (<code class="highlighter-rouge">org.elasticsearch.search.lookup.SourceLookup#sourceAsMapAndType</code>),
then compare the results to Jackson and to msgpack.</p>
<p>The input dataset is a 10MB JSON file composed of 2 fields, <code class="highlighter-rouge">isbn</code> and <code class="highlighter-rouge">content</code>, with <code class="highlighter-rouge">content</code> taking 99.99% of the size.</p>
<p>You may ask why such an unbalanced JSON: it was the performance issue I had to analyze ;).</p>
<p>The code is fairly simple:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nd">@State</span><span class="o">(</span><span class="n">Scope</span><span class="o">.</span><span class="na">Thread</span><span class="o">)</span>
<span class="nd">@BenchmarkMode</span><span class="o">(</span><span class="n">Mode</span><span class="o">.</span><span class="na">AverageTime</span><span class="o">)</span>
<span class="nd">@OutputTimeUnit</span><span class="o">(</span><span class="n">TimeUnit</span><span class="o">.</span><span class="na">MILLISECONDS</span><span class="o">)</span>
<span class="nd">@Warmup</span><span class="o">(</span><span class="n">iterations</span> <span class="o">=</span> <span class="mi">5</span><span class="o">)</span>
<span class="nd">@Measurement</span><span class="o">(</span><span class="n">iterations</span> <span class="o">=</span> <span class="mi">5</span><span class="o">)</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">ParseTest</span> <span class="o">{</span>
<span class="kd">private</span> <span class="n">BytesArray</span> <span class="n">jsonBytesArray</span><span class="o">;</span>
<span class="kd">private</span> <span class="n">ObjectMapper</span> <span class="n">jsonMapper</span><span class="o">;</span>
<span class="kd">private</span> <span class="n">ObjectMapper</span> <span class="n">msgPackMapper</span><span class="o">;</span>
<span class="kd">private</span> <span class="kt">byte</span><span class="o">[]</span> <span class="n">msgPackByte</span><span class="o">;</span>
<span class="kd">private</span> <span class="kt">byte</span><span class="o">[]</span> <span class="n">jsonBytes</span><span class="o">;</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">RunnerException</span> <span class="o">{</span>
<span class="n">Options</span> <span class="n">opt</span> <span class="o">=</span> <span class="k">new</span> <span class="n">OptionsBuilder</span><span class="o">()</span>
<span class="o">.</span><span class="na">include</span><span class="o">(</span><span class="n">ParseTest</span><span class="o">.</span><span class="na">class</span><span class="o">.</span><span class="na">getSimpleName</span><span class="o">())</span>
<span class="o">.</span><span class="na">forks</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span>
<span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="k">new</span> <span class="nf">Runner</span><span class="o">(</span><span class="n">opt</span><span class="o">).</span><span class="na">run</span><span class="o">();</span>
<span class="o">}</span>
<span class="nd">@Setup</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">prepare</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span>
<span class="n">jsonMapper</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ObjectMapper</span><span class="o">();</span>
<span class="n">msgPackMapper</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ObjectMapper</span><span class="o">(</span><span class="k">new</span> <span class="n">MessagePackFactory</span><span class="o">());</span>
<span class="n">jsonBytes</span> <span class="o">=</span> <span class="n">Resources</span><span class="o">.</span><span class="na">toByteArray</span><span class="o">(</span><span class="n">getClass</span><span class="o">().</span><span class="na">getResource</span><span class="o">(</span><span class="s">"/foo.txt"</span><span class="o">));</span>
<span class="n">jsonBytesArray</span> <span class="o">=</span> <span class="k">new</span> <span class="n">BytesArray</span><span class="o">(</span><span class="n">jsonBytes</span><span class="o">);</span>
<span class="n">msgPackByte</span> <span class="o">=</span> <span class="n">msgPackMapper</span><span class="o">.</span><span class="na">writeValueAsBytes</span><span class="o">(</span><span class="n">jsonMapper</span><span class="o">.</span><span class="na">readValue</span><span class="o">(</span><span class="n">jsonBytes</span><span class="o">,</span> <span class="n">Map</span><span class="o">.</span><span class="na">class</span><span class="o">));</span>
<span class="o">}</span>
<span class="nd">@Benchmark</span>
<span class="kd">public</span> <span class="n">Tuple</span><span class="o"><</span><span class="n">XContentType</span><span class="o">,</span> <span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">>></span> <span class="nf">elasticBench</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">SourceLookup</span><span class="o">.</span><span class="na">sourceAsMapAndType</span><span class="o">(</span><span class="n">jsonBytesArray</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Benchmark</span>
<span class="kd">public</span> <span class="n">Map</span> <span class="nf">jacksonBench</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">jsonMapper</span><span class="o">.</span><span class="na">readValue</span><span class="o">(</span><span class="n">jsonBytesArray</span><span class="o">.</span><span class="na">array</span><span class="o">(),</span> <span class="n">Map</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Benchmark</span>
<span class="kd">public</span> <span class="n">Map</span> <span class="nf">msgpackBench</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">msgPackMapper</span><span class="o">.</span><span class="na">readValue</span><span class="o">(</span><span class="n">msgPackByte</span><span class="o">,</span> <span class="n">Map</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<h1 id="results">Results</h1>
<p>Run on a Core i5.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Benchmark</th>
<th style="text-align: left">Average time (ms/op)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">elasticBench</td>
<td style="text-align: left">27.826 ± 1.002</td>
</tr>
<tr>
<td style="text-align: left">jacksonBench</td>
<td style="text-align: left">27.423 ± 3.140</td>
</tr>
<tr>
<td style="text-align: left">msgpackBench</td>
<td style="text-align: left">11.442 ± 2.181</td>
</tr>
</tbody>
</table>
<p><code class="highlighter-rouge">elasticBench</code> and <code class="highlighter-rouge">jacksonBench</code> results are nearly identical. This is no surprise, as ES uses Jackson internally.
<code class="highlighter-rouge">msgpackBench</code> is 2.4 times faster than <code class="highlighter-rouge">elasticBench</code>. Most of the <code class="highlighter-rouge">elasticBench</code>
(and therefore Jackson) deserialization time is spent in the UTF-8 decoder, whereas the <code class="highlighter-rouge">msgpack</code> format is cheaper to decode.</p>
<p>The interesting part is the throughput, and therefore the impact of the <code class="highlighter-rouge">_source</code> size on the search timing.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Benchmark</th>
<th style="text-align: left">throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">elasticBench</td>
<td style="text-align: left">359 MB/s</td>
</tr>
<tr>
<td style="text-align: left">msgpackBench</td>
<td style="text-align: left">873 MB/s</td>
</tr>
</tbody>
</table>
<p>With a 10MB <code class="highlighter-rouge">_source</code>, a search request returning 20 documents will take at least 556ms (20 × 27.826 ms) for the response
building alone. Without doubt, the performance issue I had to analyze was caused by the <code class="highlighter-rouge">_source</code> size.</p>
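<p>As a rough cross-check of the figures above, here is a minimal sketch of the arithmetic. It assumes the 20 documents are parsed one after the other and that each parse costs the measured average. The class and method names are made up for this illustration.</p>

```java
// Back-of-the-envelope figures derived from the benchmark results above.
// SourceCost and its methods are hypothetical names used only for this sketch.
final class SourceCost {

    /** Throughput in MB/s for a payload of sizeMb parsed in millisPerOp milliseconds. */
    static double throughputMbPerSec(double sizeMb, double millisPerOp) {
        return sizeMb / (millisPerOp / 1000.0);
    }

    /** Lower bound, in ms, of the parsing time for `hits` documents parsed sequentially. */
    static double parseMillis(int hits, double millisPerOp) {
        return hits * millisPerOp;
    }

    public static void main(String[] args) {
        // Measured averages for one 10MB document: 27.826 ms (ES/Jackson), 11.442 ms (msgpack).
        System.out.println("elastic throughput (MB/s): " + throughputMbPerSec(10, 27.826));
        System.out.println("msgpack throughput (MB/s): " + throughputMbPerSec(10, 11.442));
        System.out.println("20 hits, parsing only (ms): " + parseMillis(20, 27.826));
    }
}
```

<p>The same two functions can be fed with your own measurements to estimate the floor of the response building time for a given page size.</p>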
<p><code class="highlighter-rouge">_source</code> handling is not necessarily free: before ingesting large fields, you should measure the impact on performance by benchmarking
real use case scenarios.</p>
<p>The goal is to avoid the case where <code class="highlighter-rouge">_source</code> handling takes a significant percentage of the total ES processing time.
Even if large documents are not common in an index, their impact will be visible in the latency percentiles.</p>
<p>This quick post does not address memory and network pressure:</p>
<ul>
<li>_source is stored in a Lucene stored field (compressed): impact on the mapped file</li>
<li>20 documents of 10MB will take at least 200MB (and even 400MB with UTF-16): loaded in memory then transferred through the network</li>
</ul>
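<p>The 400MB figure presumably comes from the decoded text being held as UTF-16, i.e. two bytes per character for mostly-ASCII content. This is an assumption about the reasoning, and a JVM with compact strings may behave differently. A minimal sketch comparing the two encoded sizes, with a made-up class name:</p>

```java
import java.nio.charset.StandardCharsets;

// EncodedSize is a hypothetical helper used only to illustrate the size ratio.
final class EncodedSize {

    /** Number of bytes needed to encode the text as UTF-8. */
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes needed to encode the text as UTF-16 (little endian, no byte order mark). */
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String sample = "{\"isbn\":\"123\",\"content\":\"plain ASCII text\"}";
        System.out.println("UTF-8 bytes:  " + utf8Bytes(sample));
        System.out.println("UTF-16 bytes: " + utf16Bytes(sample));
    }
}
```

<p>For pure ASCII input the UTF-16 size is exactly twice the UTF-8 size, which matches the 200MB versus 400MB estimate.</p>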
<p>Note that by using a more efficient storage format, e.g. msgpack, ES could parse the <code class="highlighter-rouge">_source</code> up to 2.4 times faster. Technically this is low-hanging fruit, but
the cost is potentially high because of the migration.</p>
<p>There are some workarounds:</p>
<ul>
<li>Exclude <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude">the large field from the _source</a> with the associated downside.</li>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-store.html">Store</a> the fields and never return the <code class="highlighter-rouge">_source</code>.</li>
</ul>
<p>And some open issues to address the point:</p>
<ul>
<li><a href="https://github.com/elastic/elasticsearch/issues/9034">Better storage of <code class="highlighter-rouge">_source</code></a></li>
<li><a href="https://github.com/elastic/elasticsearch/issues/25168">Memory efficient source filtering</a></li>
</ul>
<p>Analysis Takipi Benchmark 'How Misusing Streams Can Make Your Code 5 Times Slower', 2016-01-15T10:06:01+00:00, http://nithril.github.io/benchmark/2016/01/15/analysis-benchmark-how-misusing-java-8-lambdas-and-streams-can-make-your-code-5-times-slower</p>
<p>The Takipi benchmark <a href="http://blog.takipi.com/benchmark-how-java-8-lambdas-and-streams-can-make-your-code-5-times-slower/">‘How Misusing Streams Can Make Your Code 5 Times Slower’</a> contains interesting
but unexplained results:</p>
<ul>
<li>The first one is the autoboxing/unboxing issue, as stated in <a href="http://blog.takipi.com/benchmark-how-java-8-lambdas-and-streams-can-make-your-code-5-times-slower/#comment-2377268130">Sergey Kuksenko's comment</a>.</li>
<li>The second one is the difference between the “lambda” and “stream” benchmarks: the former is 5 times slower than the latter although the code is quite similar.</li>
</ul>
<!--more-->
<h1 id="first-the-results-on-my-environment">First the results on my environment</h1>
<h2 id="the-original-code">The original code</h2>
<p>The code is available on <a href="https://github.com/takipi/loops-jmh-playground/">Takipi's GitHub</a>. I have taken the code from both the <code class="highlighter-rouge">fixes</code> branch (optimized version)
and the <code class="highlighter-rouge">master</code> branch (with the boxing issue). The modified code for this article is available <a href="https://github.com/nithril/loops-jmh-playground/tree/fixes">on GitHub</a>.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="kd">public</span> <span class="kt">int</span> <span class="nf">streamBoxingMaxInteger</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="nl">Integer:</span><span class="o">:</span><span class="n">max</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">lambdaBoxingMaxInteger</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-></span> <span class="n">Integer</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">));</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">streamMaxInteger</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">mapToInt</span><span class="o">(</span><span class="nl">Integer:</span><span class="o">:</span><span class="n">intValue</span><span class="o">).</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="nl">Integer:</span><span class="o">:</span><span class="n">max</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">lambdaMaxInteger</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">mapToInt</span><span class="o">(</span><span class="nl">Integer:</span><span class="o">:</span><span class="n">intValue</span><span class="o">).</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-></span> <span class="n">Integer</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">));</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<h2 id="the-results">The results</h2>
<p>Core i5, 3 GHz, JDK 8u66</p>
<table>
<thead>
<tr>
<th style="text-align: left">Benchmark</th>
<th style="text-align: left">ms/op</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">forMax2Integer</td>
<td style="text-align: left">0.094 ± 0.002</td>
</tr>
<tr>
<td style="text-align: left">lambdaBoxingMaxInteger</td>
<td style="text-align: left">0.500 ± 0.018</td>
</tr>
<tr>
<td style="text-align: left">lambdaMaxInteger</td>
<td style="text-align: left">0.494 ± 0.253</td>
</tr>
<tr>
<td style="text-align: left">streamBoxingMaxInteger</td>
<td style="text-align: left">0.503 ± 0.005</td>
</tr>
<tr>
<td style="text-align: left">streamMaxInteger</td>
<td style="text-align: left">0.107 ± 0.001</td>
</tr>
</tbody>
</table>
<h1 id="autoboxing--unboxing">Autoboxing / Unboxing</h1>
<p><code class="highlighter-rouge">static int Integer#max(int a, int b)</code> takes <code class="highlighter-rouge">int</code> primitive arguments and returns an <code class="highlighter-rouge">int</code> primitive.
The stream, on the other hand, consumes a list of <code class="highlighter-rouge">Integer</code>. Thus each argument is unboxed with <code class="highlighter-rouge">Integer#intValue</code> and the result of <code class="highlighter-rouge">Integer#max</code> is boxed from an <code class="highlighter-rouge">int</code> to an <code class="highlighter-rouge">Integer</code>, i.e. an implicit
<code class="highlighter-rouge">Integer#valueOf</code> is inserted by the compiler.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre> <span class="n">LINENUMBER</span> <span class="mi">191</span> <span class="n">L0</span>
<span class="n">ALOAD</span> <span class="mi">0</span>
<span class="n">INVOKEVIRTUAL</span> <span class="n">java</span><span class="o">/</span><span class="n">lang</span><span class="o">/</span><span class="n">Integer</span><span class="o">.</span><span class="na">intValue</span> <span class="o">()</span><span class="n">I</span>
<span class="n">ALOAD</span> <span class="mi">1</span>
<span class="n">INVOKEVIRTUAL</span> <span class="n">java</span><span class="o">/</span><span class="n">lang</span><span class="o">/</span><span class="n">Integer</span><span class="o">.</span><span class="na">intValue</span> <span class="o">()</span><span class="n">I</span>
<span class="n">INVOKESTATIC</span> <span class="n">java</span><span class="o">/</span><span class="n">lang</span><span class="o">/</span><span class="n">Integer</span><span class="o">.</span><span class="na">max</span> <span class="o">(</span><span class="n">II</span><span class="o">)</span><span class="n">I</span>
<span class="n">INVOKESTATIC</span> <span class="n">java</span><span class="o">/</span><span class="n">lang</span><span class="o">/</span><span class="n">Integer</span><span class="o">.</span><span class="na">valueOf</span> <span class="o">(</span><span class="n">I</span><span class="o">)</span><span class="n">Ljava</span><span class="o">/</span><span class="n">lang</span><span class="o">/</span><span class="n">Integer</span><span class="o">;</span></pre></td></tr></tbody></table></code></pre></figure>
<p>As suggested in <a href="http://blog.takipi.com/benchmark-how-java-8-lambdas-and-streams-can-make-your-code-5-times-slower/#comment-2377268130">Sergey Kuksenko's comment</a>, to get rid of the
autoboxing, the max function must operate on <code class="highlighter-rouge">Integer</code>.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="kd">private</span> <span class="kd">static</span> <span class="n">Integer</span> <span class="nf">max</span><span class="o">(</span><span class="n">Integer</span> <span class="n">a</span><span class="o">,</span> <span class="n">Integer</span> <span class="n">b</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="o">(</span><span class="n">a</span> <span class="o">>=</span> <span class="n">b</span><span class="o">)</span> <span class="o">?</span> <span class="n">a</span> <span class="o">:</span> <span class="n">b</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">streamBoxingMaxInteger</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="nl">LoopBenchmarkMain:</span><span class="o">:</span><span class="n">max</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">lambdaBoxingMaxInteger</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-></span> <span class="n">max</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">));</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
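<p>The difference between the two reducers can be observed from plain Java, without JMH. The sketch below is only an illustration (the class name <code>BoxingDemo</code> is made up): <code>Integer::max</code> has to box its <code>int</code> result, so it is expected to hand back a fresh <code>Integer</code>, whereas the <code>Integer</code>-based max hands back one of its arguments. It assumes the default <code>Integer</code> cache range (-128 to 127).</p>

```java
import java.util.function.BinaryOperator;

// BoxingDemo is a hypothetical class used only for this illustration.
final class BoxingDemo {

    // Same shape as the max above: the operands are unboxed only for the comparison,
    // and one of the two existing Integer objects is returned as is.
    static Integer max(Integer a, Integer b) {
        return (a >= b) ? a : b;
    }

    // Adapting Integer.max(int, int) to Integer operands: intValue() twice, then Integer.valueOf().
    static final BinaryOperator<Integer> BOXING = Integer::max;

    // No boxing of the result: the returned reference is one of the inputs.
    static final BinaryOperator<Integer> NO_BOXING = BoxingDemo::max;

    public static void main(String[] args) {
        // Values outside the default Integer cache, so valueOf() allocates.
        Integer a = 1000;
        Integer b = 2000;
        System.out.println("Integer::max returns the same object:    " + (BOXING.apply(a, b) == b));
        System.out.println("BoxingDemo::max returns the same object: " + (NO_BOXING.apply(a, b) == b));
    }
}
```

<p>Reference comparison with <code>==</code> is used here on purpose, to reveal the allocation. It is not how <code>Integer</code> values should be compared in regular code.</p>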
<h2 id="results">Results</h2>
<p>Results are far better once the autoboxing issue is taken care of:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Benchmark</th>
<th style="text-align: left">Before ms/op</th>
<th style="text-align: left">After ms/op</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">lambdaBoxingMaxInteger</td>
<td style="text-align: left">0.500 ± 0.018</td>
<td style="text-align: left">0.266 ± 0.268</td>
</tr>
<tr>
<td style="text-align: left">streamBoxingMaxInteger</td>
<td style="text-align: left">0.503 ± 0.005</td>
<td style="text-align: left">0.085 ± 0.013</td>
</tr>
</tbody>
</table>
<p>When working with boxed primitives, a way to avoid this issue is to use an <code class="highlighter-rouge">IntStream</code>, for example through the <code class="highlighter-rouge">mapToInt</code> transformation.</p>
<blockquote>
<p>IntStream Stream#mapToInt(ToIntFunction<? super T> mapper)
Returns an IntStream consisting of the results of applying the given function to the elements of this stream.</p>
</blockquote>
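<p>A minimal sketch of that approach, assuming a <code>List&lt;Integer&gt;</code> as input as in the benchmark. The class name is made up, and <code>IntStream#max</code> is used here instead of the <code>reduce</code> call of the benchmark:</p>

```java
import java.util.Arrays;
import java.util.List;

// IntStreamMax is a hypothetical class used only for this illustration.
final class IntStreamMax {

    /** Maximum of the list, computed on primitives; Integer.MIN_VALUE for an empty list. */
    static int max(List<Integer> integers) {
        return integers.stream()
                .mapToInt(Integer::intValue) // unbox once per element, then stay on int
                .max()                       // OptionalInt, empty when the list is empty
                .orElse(Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 42, 7)));
    }
}
```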
<h1 id="difference-between-the-lambda-and-stream-benchmark">Difference between the “lambda” and “stream” benchmark</h1>
<h2 id="analysis">Analysis</h2>
<p>These two snippets are very similar once compiled:</p>
<ul>
<li>streamMaxInteger</li>
</ul>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">mapToInt</span><span class="o">(</span><span class="nl">Integer:</span><span class="o">:</span><span class="n">intValue</span><span class="o">).</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="nl">Integer:</span><span class="o">:</span><span class="n">max</span><span class="o">);</span></pre></td></tr></tbody></table></code></pre></figure>
<ul>
<li>lambdaMaxInteger</li>
</ul>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre> <span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">mapToInt</span><span class="o">(</span><span class="nl">Integer:</span><span class="o">:</span><span class="n">intValue</span><span class="o">).</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-></span> <span class="n">Integer</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">));</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The difference is in the <code class="highlighter-rouge">reduce</code> function:</p>
<ul>
<li><code class="highlighter-rouge">streamMaxInteger</code> contains a reference to a static method. The compiled code uses an <code class="highlighter-rouge">INVOKESTATIC</code> to the <code class="highlighter-rouge">Integer#max</code> method.</li>
<li><code class="highlighter-rouge">lambdaMaxInteger</code> uses a lambda. Once compiled, this lambda is converted to a static method similar to:</li>
</ul>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="kd">private</span> <span class="kd">static</span> <span class="kt">int</span> <span class="n">lambda$lambdaMaxInteger</span><span class="err">$</span><span class="mi">0</span><span class="o">(</span><span class="kt">int</span> <span class="n">a</span><span class="o">,</span> <span class="kt">int</span> <span class="n">b</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">Integer</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">);</span>
<span class="o">}</span>
<span class="n">integers</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">mapToInt</span><span class="o">(</span><span class="nl">Integer:</span><span class="o">:</span><span class="n">intValue</span><span class="o">).</span><span class="na">reduce</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">,</span> <span class="nl">LoopBenchmarkMain:</span><span class="o">:</span><span class="n">lambda$lambdaMaxInteger</span><span class="err">$</span><span class="mi">0</span><span class="o">);</span></pre></td></tr></tbody></table></code></pre></figure>
<p>Once compiled, both use <code class="highlighter-rouge">INVOKESTATIC</code>, but <code class="highlighter-rouge">lambdaMaxInteger</code> has a call depth one level deeper than <code class="highlighter-rouge">streamMaxInteger</code>. The performance difference
could be explained if the call to <code class="highlighter-rouge">lambda$lambdaMaxInteger$0</code> is not inlined.</p>
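<p>The extra method can be made visible with reflection. The sketch below is only an illustration with made-up names: it lists the compiler-generated (synthetic) methods of a class holding one method reference and one lambda. The <code>lambda$...</code> naming is what javac produces in practice, not something the language specification guarantees.</p>

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

// LambdaShape is a hypothetical class used only for this illustration.
final class LambdaShape {

    // Method reference: the operator targets Integer.max directly, no extra method is generated.
    static IntBinaryOperator viaReference() {
        return Integer::max;
    }

    // Lambda: the body is compiled into a synthetic static method of this class.
    static IntBinaryOperator viaLambda() {
        return (a, b) -> Integer.max(a, b);
    }

    /** Signatures of the synthetic methods declared by this class. */
    static List<String> syntheticMethods() {
        List<String> result = new ArrayList<>();
        for (Method method : LambdaShape.class.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                result.add(method.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Expected to list a single lambda$... method taking (int,int), produced for viaLambda().
        syntheticMethods().forEach(System.out::println);
    }
}
```

<p>Both operators compute the same value. Only the lambda adds one more frame between the reducing sink and <code>Integer#max</code>.</p>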
<p>To analyse the JVM inlining, the benchmark is launched with the <a href="http://stas-blogspot.blogspot.fr/2011/07/most-complete-list-of-xx-options-for.html#UnlockDiagnosticVMOptions"><code class="highlighter-rouge">-XX:+UnlockDiagnosticVMOptions</code></a>
and <a href="http://stas-blogspot.blogspot.fr/2011/07/most-complete-list-of-xx-options-for.html#PrintInlining"><code class="highlighter-rouge">-XX:+PrintInlining</code></a> arguments.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="err">@</span> <span class="mi">32</span> <span class="n">java</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">ArrayList</span><span class="n">$ArrayListSpliterator</span><span class="o">::</span><span class="n">forEachRemaining</span> <span class="o">(</span><span class="mi">129</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">inline</span> <span class="o">(</span><span class="n">hot</span><span class="o">)</span>
<span class="err">@</span> <span class="mi">51</span> <span class="n">java</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">ArrayList</span><span class="o">::</span><span class="n">access</span><span class="err">$</span><span class="mi">100</span> <span class="o">(</span><span class="mi">5</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">accessor</span>
<span class="err">@</span> <span class="mi">99</span> <span class="n">java</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">stream</span><span class="o">.</span><span class="na">ReferencePipeline</span><span class="err">$</span><span class="mi">4</span><span class="err">$</span><span class="mi">1</span><span class="o">::</span><span class="n">accept</span> <span class="o">(</span><span class="mi">23</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">inline</span> <span class="o">(</span><span class="n">hot</span><span class="o">)</span>
<span class="err">@</span> <span class="mi">12</span> <span class="n">com</span><span class="o">.</span><span class="na">takipi</span><span class="o">.</span><span class="na">oss</span><span class="o">.</span><span class="na">benchmarks</span><span class="o">.</span><span class="na">jmh</span><span class="o">.</span><span class="na">loops</span><span class="o">.</span><span class="na">OptimizedLoopBenchmarkMain</span><span class="err">$</span><span class="n">$Lambda</span><span class="err">$</span><span class="mi">1</span><span class="o">/</span><span class="mi">1699549388</span><span class="o">::</span><span class="n">applyAsInt</span> <span class="o">(</span><span class="mi">8</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">inline</span> <span class="o">(</span><span class="n">hot</span><span class="o">)</span>
<span class="err">\</span><span class="o">-></span> <span class="n">TypeProfile</span> <span class="o">(</span><span class="mi">12880</span><span class="o">/</span><span class="mi">12880</span> <span class="n">counts</span><span class="o">)</span> <span class="o">=</span> <span class="n">com</span><span class="o">/</span><span class="n">takipi</span><span class="o">/</span><span class="n">oss</span><span class="o">/</span><span class="n">benchmarks</span><span class="o">/</span><span class="n">jmh</span><span class="o">/</span><span class="n">loops</span><span class="o">/</span><span class="n">OptimizedLoopBenchmarkMain</span><span class="err">$</span><span class="n">$Lambda</span><span class="err">$</span><span class="mi">1</span>
<span class="err">@</span> <span class="mi">4</span> <span class="n">java</span><span class="o">.</span><span class="na">lang</span><span class="o">.</span><span class="na">Integer</span><span class="o">::</span><span class="n">intValue</span> <span class="o">(</span><span class="mi">5</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">accessor</span>
<span class="err">@</span> <span class="mi">17</span> <span class="n">java</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">stream</span><span class="o">.</span><span class="na">ReduceOps</span><span class="err">$</span><span class="mi">5</span><span class="nl">ReducingSink:</span><span class="o">:</span><span class="n">accept</span> <span class="o">(</span><span class="mi">19</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">inline</span> <span class="o">(</span><span class="n">hot</span><span class="o">)</span>
<span class="err">\</span><span class="o">-></span> <span class="n">TypeProfile</span> <span class="o">(</span><span class="mi">12880</span><span class="o">/</span><span class="mi">12880</span> <span class="n">counts</span><span class="o">)</span> <span class="o">=</span> <span class="n">java</span><span class="o">/</span><span class="n">util</span><span class="o">/</span><span class="n">stream</span><span class="o">/</span><span class="n">ReduceOps</span><span class="err">$</span><span class="mi">5</span><span class="n">ReducingSink</span>
<span class="err">@</span> <span class="mi">10</span> <span class="n">com</span><span class="o">.</span><span class="na">takipi</span><span class="o">.</span><span class="na">oss</span><span class="o">.</span><span class="na">benchmarks</span><span class="o">.</span><span class="na">jmh</span><span class="o">.</span><span class="na">loops</span><span class="o">.</span><span class="na">OptimizedLoopBenchmarkMain</span><span class="err">$</span><span class="n">$Lambda</span><span class="err">$</span><span class="mi">2</span><span class="o">/</span><span class="mi">813872125</span><span class="o">::</span><span class="n">applyAsInt</span> <span class="o">(</span><span class="mi">6</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">inline</span> <span class="o">(</span><span class="n">hot</span><span class="o">)</span>
<span class="err">\</span><span class="o">-></span> <span class="n">TypeProfile</span> <span class="o">(</span><span class="mi">13602</span><span class="o">/</span><span class="mi">13602</span> <span class="n">counts</span><span class="o">)</span> <span class="o">=</span> <span class="n">com</span><span class="o">/</span><span class="n">takipi</span><span class="o">/</span><span class="n">oss</span><span class="o">/</span><span class="n">benchmarks</span><span class="o">/</span><span class="n">jmh</span><span class="o">/</span><span class="n">loops</span><span class="o">/</span><span class="n">OptimizedLoopBenchmarkMain</span><span class="err">$</span><span class="n">$Lambda</span><span class="err">$</span><span class="mi">2</span>
<span class="err">@</span> <span class="mi">2</span> <span class="n">com</span><span class="o">.</span><span class="na">takipi</span><span class="o">.</span><span class="na">oss</span><span class="o">.</span><span class="na">benchmarks</span><span class="o">.</span><span class="na">jmh</span><span class="o">.</span><span class="na">loops</span><span class="o">.</span><span class="na">OptimizedLoopBenchmarkMain</span><span class="o">::</span><span class="n">lambda$lambdaMaxInteger</span><span class="err">$</span><span class="mi">0</span> <span class="o">(</span><span class="mi">6</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">inline</span> <span class="o">(</span><span class="n">hot</span><span class="o">)</span>
<span class="err">@</span> <span class="mi">2</span> <span class="n">java</span><span class="o">.</span><span class="na">lang</span><span class="o">.</span><span class="na">Integer</span><span class="o">::</span><span class="n">max</span> <span class="o">(</span><span class="mi">6</span> <span class="n">bytes</span><span class="o">)</span> <span class="n">inlining</span> <span class="n">too</span> <span class="n">deep</span></pre></td></tr></tbody></table></code></pre></figure>
<p>Gotcha: <strong><code class="highlighter-rouge">@ 2 java.lang.Integer::max (6 bytes) inlining too deep</code></strong>. The performance difference comes from the JIT inlining depth limit, not from lambda/stream misuse.
<a href="http://blog.takipi.com/benchmark-how-java-8-lambdas-and-streams-can-make-your-code-5-times-slower/#comment-2379774095">Yan Bonnel is right</a>:</p>
<blockquote>
<p>For lambda, try change Integer.max by Math.max. I think you hit a limit of jit so it doesn’t inline code?</p>
</blockquote>
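<p>A minimal sketch of the change suggested in the comment. The class and method names are hypothetical, and the shape of the reduction is only inferred from the inlining log above, so it is not the original benchmark code. <code class="highlighter-rouge">Integer.max</code> simply delegates to <code class="highlighter-rouge">Math.max</code>, so calling <code class="highlighter-rouge">Math.max</code> directly removes one call level. Whether that is enough to stay under the inlining limit depends on the JVM.</p>

```java
import java.util.List;

final class MaxReduceSketch {
    // Reduction through Integer.max, which itself calls Math.max (one extra call level).
    static int maxWithIntegerMax(List<Integer> integers) {
        return integers.stream()
                .mapToInt(Integer::intValue)
                .reduce(Integer.MIN_VALUE, (a, b) -> Integer.max(a, b));
    }

    // Same reduction calling Math.max directly, as suggested in the comment.
    static int maxWithMathMax(List<Integer> integers) {
        return integers.stream()
                .mapToInt(Integer::intValue)
                .reduce(Integer.MIN_VALUE, (a, b) -> Math.max(a, b));
    }
}
```

<p>Both methods return the same value. Only the call depth seen by the JIT differs.</p>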
<h2 id="results-1">Results</h2>
<p>Let’s increase the max inlining level from 9 to 10 with <code class="highlighter-rouge">-XX:MaxInlineLevel=10</code>.</p>
<p>Results are again far better:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Benchmark</th>
<th style="text-align: left">Before ms/op</th>
<th>After ms/op</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">lambdaMaxInteger</td>
<td style="text-align: left">0.494 ± 0.253</td>
<td>0.109 ± 0.025</td>
</tr>
<tr>
<td style="text-align: left">streamMaxInteger</td>
<td style="text-align: left">0.107 ± 0.001</td>
<td>0.107 ± 0.001</td>
</tr>
</tbody>
</table>
<h1 id="conclusion">Conclusion</h1>
<table>
<thead>
<tr>
<th style="text-align: left">Benchmark</th>
<th style="text-align: left">Before ms/op</th>
<th>After ms/op</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">forMax2Integer</td>
<td style="text-align: left">0.094 ± 0.002</td>
<td>0.094 ± 0.002</td>
</tr>
<tr>
<td style="text-align: left">lambdaBoxingMaxInteger</td>
<td style="text-align: left">0.500 ± 0.018</td>
<td><span style="color: green;"><strong>0.080 ± 0.001</strong></span></td>
</tr>
<tr>
<td style="text-align: left">lambdaMaxInteger</td>
<td style="text-align: left">0.494 ± 0.253</td>
<td><span style="color: green;"><strong>0.108 ± 0.003</strong></span></td>
</tr>
<tr>
<td style="text-align: left">streamBoxingMaxInteger</td>
<td style="text-align: left">0.503 ± 0.005</td>
<td><span style="color: green;"><strong>0.080 ± 0.001</strong></span></td>
</tr>
<tr>
<td style="text-align: left">streamMaxInteger</td>
<td style="text-align: left">0.107 ± 0.001</td>
<td>0.107 ± 0.001</td>
</tr>
</tbody>
</table>
<p>In my view, the title <code class="highlighter-rouge">How Misusing Streams</code> is wrong. These issues are not related to lambdas/streams but to some Java subtleties.
The autoboxing performance issue could just as well have occurred with a <code class="highlighter-rouge">for</code> statement:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="code"><pre> <span class="kd">public</span> <span class="kt">int</span> <span class="nf">forMaxInteger</span><span class="o">()</span> <span class="o">{</span>
<span class="n">Integer</span> <span class="n">max</span> <span class="o">=</span> <span class="n">Integer</span><span class="o">.</span><span class="na">MIN_VALUE</span><span class="o">;</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">size</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
<span class="n">max</span> <span class="o">=</span> <span class="n">Integer</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">max</span><span class="o">,</span> <span class="n">integers</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">i</span><span class="o">));</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">max</span><span class="o">;</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<p>In my opinion a better title would be <code class="highlighter-rouge">Benchmark: How Misusing Autoboxing Can Make Your Code 5 Times Slower</code>.</p>
<p>Java is a subtle language: optimizing can be complex and benchmarking can be hard.
For wrapper classes, the Java library does not provide counterparts of the methods that operate on primitives.
It relies on autoboxing and unboxing, and the Java compiler and the JIT may not be able to fully optimize the resulting code.</p>
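<p>To make the boxing cost concrete, here is a small sketch contrasting a boxed accumulator with a primitive one. The class and method names are hypothetical and the code is not taken from the benchmark. It only shows where the conversions happen and makes no claim about timings.</p>

```java
import java.util.List;

final class BoxingSketch {
    // Boxed accumulator: each iteration unboxes max and the element,
    // then boxes the int returned by Integer.max back into an Integer.
    static int forMaxBoxed(List<Integer> integers) {
        Integer max = Integer.MIN_VALUE;
        for (int i = 0; i < integers.size(); i++) {
            max = Integer.max(max, integers.get(i));
        }
        return max;
    }

    // Primitive accumulator: only the list element is unboxed, nothing is re-boxed.
    static int forMaxPrimitive(List<Integer> integers) {
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < integers.size(); i++) {
            max = Math.max(max, integers.get(i));
        }
        return max;
    }
}
```

<p>The two methods differ only in the declared type of <code class="highlighter-rouge">max</code>, which is easy to miss in a code review.</p>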
<p>Anyway, for this use case (max of a list), streams and lambdas are not slower than a <code class="highlighter-rouge">for</code> statement.
On the contrary, they seem faster. I have not yet analysed why; maybe in a part 2.</p>
Quicky: Query Benchmark Using jOOQ / Hibernate / JDBC2015-10-10T10:18:46+00:00http://nithril.github.io/jpa/2015/10/10/quicky-join-query-using-jooq-hibernate-jdbc<p>This quicky tests how <a href="http://jooq.org/">jOOQ</a>, <a href="http://hibernate.org/">Hibernate</a> and JDBC perform against each other on a simple query / scenario
involving Plain Old SQL, jOOQ, Hibernate Named Query and Spring Data JPA.</p>
<!--more-->
<h1 id="edit-20151017">EDIT 2015/10/17</h1>
<p>YourKit profiling confirms my impression of the jOOQ results: jOOQ spends its time building the query, allocating new <code class="highlighter-rouge">Record</code> instances and processing the result set. The code path is far from direct.</p>
<p><img src="/assets/2015-10-10-quicky-join-query-using-jooq-hibernate-jdbc/profiling.png" alt="Profiling" /></p>
<h1 id="source">Source</h1>
<p>Sources are available <a href="https://github.com/nithril/sandbox-query-benchmark-jooq-hibernate-jdbc/tree/article-quicky-query-benchmark-jooq-hibernate-jdbc">here</a>.</p>
<h1 id="the-database">The database</h1>
<p>The database used is <code class="highlighter-rouge">H2 1.4.188</code>. The DB schema contains an <code class="highlighter-rouge">AUTHOR</code> table with a one to many relation to a <code class="highlighter-rouge">BOOK</code> table. For simplicity, an author has at least one book.</p>
<p>The query involves a left outer join on <code class="highlighter-rouge">BOOK</code> from <code class="highlighter-rouge">AUTHOR</code>.</p>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
</pre></td><td class="code"><pre><span class="k">SELECT</span> <span class="n">AUTHOR</span><span class="p">.</span><span class="o">*</span><span class="p">,</span> <span class="n">BOOK</span><span class="p">.</span><span class="o">*</span> <span class="k">FROM</span> <span class="n">AUTHOR</span> <span class="k">LEFT</span> <span class="k">OUTER</span> <span class="k">JOIN</span> <span class="n">BOOK</span> <span class="k">ON</span> <span class="n">AUTHOR</span><span class="p">.</span><span class="n">ID</span> <span class="o">=</span> <span class="n">BOOK</span><span class="p">.</span><span class="n">AUTHOR_ID</span></pre></td></tr></tbody></table></code></pre></figure>
<p>Every query must return a POJO containing the author together with their books:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="code"><pre><span class="kd">public</span> <span class="kd">class</span> <span class="nc">AuthorWithBooks</span> <span class="o">{</span>
<span class="kd">private</span> <span class="n">Author</span> <span class="n">author</span><span class="o">;</span>
<span class="kd">private</span> <span class="n">List</span><span class="o"><</span><span class="n">Book</span><span class="o">></span> <span class="n">books</span><span class="o">;</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The DB is fed with 100 authors with a mean of 5 books per author.</p>
<h1 id="jdbc--jooq">JDBC / jOOQ</h1>
<h2 id="plain-old-jdbc">Plain Old JDBC</h2>
<p>The mapping is done by hand without <code class="highlighter-rouge">Stream</code>:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
</pre></td><td class="code"><pre><span class="nd">@Transactional</span><span class="o">(</span><span class="n">readOnly</span> <span class="o">=</span> <span class="kc">true</span><span class="o">)</span>
<span class="kd">public</span> <span class="n">Collection</span><span class="o"><</span><span class="n">AuthorWithBooks</span><span class="o">></span> <span class="nf">findAuthorsWithBooksJdbc</span><span class="o">()</span> <span class="o">{</span>
<span class="n">Map</span><span class="o"><</span><span class="n">Long</span><span class="o">,</span> <span class="n">AuthorWithBooks</span><span class="o">></span> <span class="n">booksMap</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o"><>();</span>
<span class="n">jdbcTemplate</span><span class="o">.</span><span class="na">query</span><span class="o">(</span><span class="s">"SELECT AUTHOR.*, BOOK.* FROM AUTHOR LEFT OUTER JOIN BOOK ON AUTHOR.ID = BOOK.AUTHOR_ID"</span><span class="o">,</span> <span class="n">r</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">Long</span> <span class="n">authorId</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="s">"AUTHOR.ID"</span><span class="o">);</span>
<span class="n">AuthorWithBooks</span> <span class="n">authorWithBooks</span> <span class="o">=</span> <span class="n">booksMap</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">authorId</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">authorWithBooks</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">authorWithBooks</span> <span class="o">=</span> <span class="k">new</span> <span class="n">AuthorWithBooks</span><span class="o">();</span>
<span class="n">authorWithBooks</span><span class="o">.</span><span class="na">setAuthor</span><span class="o">(</span><span class="k">new</span> <span class="n">Author</span><span class="o">(</span><span class="n">authorId</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="s">"AUTHOR.NAME"</span><span class="o">)));</span>
<span class="n">authorWithBooks</span><span class="o">.</span><span class="na">setBooks</span><span class="o">(</span><span class="k">new</span> <span class="n">ArrayList</span><span class="o"><>());</span>
<span class="n">booksMap</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">authorId</span><span class="o">,</span> <span class="n">authorWithBooks</span><span class="o">);</span>
<span class="o">}</span>
<span class="n">Book</span> <span class="n">book</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Book</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">getLong</span><span class="o">(</span><span class="s">"BOOK.ID"</span><span class="o">),</span> <span class="n">r</span><span class="o">.</span><span class="na">getString</span><span class="o">(</span><span class="s">"BOOK.TITLE"</span><span class="o">),</span> <span class="n">authorId</span><span class="o">);</span>
<span class="n">authorWithBooks</span><span class="o">.</span><span class="na">getBooks</span><span class="o">().</span><span class="na">add</span><span class="o">(</span><span class="n">book</span><span class="o">);</span>
<span class="o">});</span>
<span class="k">return</span> <span class="n">booksMap</span><span class="o">.</span><span class="na">values</span><span class="o">();</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
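<p>The get / null-check / put sequence above can also be written with <code class="highlighter-rouge">Map.computeIfAbsent</code>. The sketch below is self-contained and does not touch JDBC: <code class="highlighter-rouge">Row</code> and <code class="highlighter-rouge">AuthorBooks</code> are hypothetical stand-ins for a result set row and for <code class="highlighter-rouge">AuthorWithBooks</code>.</p>

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupRowsSketch {
    // Hypothetical flat row, standing in for one line of the joined result set.
    static final class Row {
        final long authorId;
        final String authorName;
        final long bookId;
        final String title;

        Row(long authorId, String authorName, long bookId, String title) {
            this.authorId = authorId;
            this.authorName = authorName;
            this.bookId = bookId;
            this.title = title;
        }
    }

    // Hypothetical stand-in for AuthorWithBooks, reduced to what the grouping needs.
    static final class AuthorBooks {
        final long authorId;
        final String authorName;
        final List<String> titles = new ArrayList<>();

        AuthorBooks(long authorId, String authorName) {
            this.authorId = authorId;
            this.authorName = authorName;
        }
    }

    // Groups the rows by author id; the holder is created the first time an id is seen.
    static Collection<AuthorBooks> group(List<Row> rows) {
        Map<Long, AuthorBooks> byAuthor = new LinkedHashMap<>();
        for (Row row : rows) {
            byAuthor.computeIfAbsent(row.authorId, id -> new AuthorBooks(id, row.authorName))
                    .titles.add(row.title);
        }
        return byAuthor.values();
    }
}
```

<p>A <code class="highlighter-rouge">LinkedHashMap</code> is used here only so that authors come out in first-seen order; the original code uses a <code class="highlighter-rouge">HashMap</code>.</p>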
<h2 id="jooq">jOOQ</h2>
<h3 id="jooq-into-group">jOOQ Into Group</h3>
<p>The jOOQ <code class="highlighter-rouge">intoGroups</code> function returns a Map with the result grouped by the given key table (here Author).
The returned map contains instances of <a href="http://www.jOOQ.org/javadoc/3.7.x/org/jOOQ/Record.html">Record</a>,
a database result row which is not a POJO but an array of objects wrapped in an adapter class. <code class="highlighter-rouge">Record</code> instances are converted to POJOs
using the jOOQ <a href="http://www.jOOQ.org/javadoc/3.7.x/index.html?org/jOOQ/RecordMapper.html">RecordMapper</a>.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code"><pre><span class="nd">@Transactional</span><span class="o">(</span><span class="n">readOnly</span> <span class="o">=</span> <span class="kc">true</span><span class="o">)</span>
<span class="kd">public</span> <span class="n">Collection</span><span class="o"><</span><span class="n">AuthorWithBooks</span><span class="o">></span> <span class="nf">findAuthorsWithBooksjOOQIntoGroup</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">dslContext</span><span class="o">.</span><span class="na">select</span><span class="o">()</span>
<span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">AUTHOR</span><span class="o">.</span><span class="na">leftOuterJoin</span><span class="o">(</span><span class="n">BOOK</span><span class="o">).</span><span class="na">on</span><span class="o">(</span><span class="n">BOOK</span><span class="o">.</span><span class="na">AUTHOR_ID</span><span class="o">.</span><span class="na">equal</span><span class="o">(</span><span class="n">AUTHOR</span><span class="o">.</span><span class="na">ID</span><span class="o">)))</span>
<span class="o">.</span><span class="na">fetch</span><span class="o">().</span><span class="na">intoGroups</span><span class="o">(</span><span class="n">TAuthor</span><span class="o">.</span><span class="na">AUTHOR</span><span class="o">)</span>
<span class="o">.</span><span class="na">entrySet</span><span class="o">()</span>
<span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">e</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">Author</span> <span class="n">author</span> <span class="o">=</span> <span class="n">authorRepository</span><span class="o">.</span><span class="na">mapper</span><span class="o">().</span><span class="na">map</span><span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getKey</span><span class="o">());</span>
<span class="n">List</span><span class="o"><</span><span class="n">Book</span><span class="o">></span> <span class="n">books</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="na">getValue</span><span class="o">().</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">r</span> <span class="o">-></span> <span class="n">bookRepository</span><span class="o">.</span><span class="na">mapper</span><span class="o">().</span><span class="na">map</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">into</span><span class="o">(</span><span class="n">TBook</span><span class="o">.</span><span class="na">BOOK</span><span class="o">))).</span><span class="na">collect</span><span class="o">(</span><span class="n">Collectors</span><span class="o">.</span><span class="na">toList</span><span class="o">());</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">AuthorWithBooks</span><span class="o">(</span><span class="n">author</span><span class="o">,</span> <span class="n">books</span><span class="o">);</span>
<span class="o">}).</span><span class="na">collect</span><span class="o">(</span><span class="n">Collectors</span><span class="o">.</span><span class="na">toList</span><span class="o">());</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
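<p>As a plain-Java analogy of this "group first, map afterwards" shape, the sketch below groups flat (author, title) pairs with <code class="highlighter-rouge">Collectors.groupingBy</code>. It is not jOOQ code: the class name and the use of <code class="highlighter-rouge">Map.Entry</code> as a row are assumptions made for illustration.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class IntoGroupsAnalogy {
    // Groups flat (author, title) rows into author -> titles, keeping first-seen author order.
    static Map<String, List<String>> groupTitlesByAuthor(List<Map.Entry<String, String>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                LinkedHashMap::new,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }
}
```

<p>As in the jOOQ version, the grouped map is an intermediate structure: a second pass is still needed to turn each entry into the final POJO.</p>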
<h3 id="jooq-with-hand-made-group-by--mapping">jOOQ with hand made group by / mapping</h3>
<p>This function makes it possible to measure the cost of the jOOQ <code class="highlighter-rouge">groupBy</code> and mapper. The grouping is done by hand,
using the same logic as the <code class="highlighter-rouge">Plain Old JDBC</code> one.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</pre></td><td class="code"><pre><span class="nd">@Transactional</span><span class="o">(</span><span class="n">readOnly</span> <span class="o">=</span> <span class="kc">true</span><span class="o">)</span>
<span class="kd">public</span> <span class="n">Collection</span><span class="o"><</span><span class="n">AuthorWithBooks</span><span class="o">></span> <span class="nf">findAuthorsWithBooksjOOQOldFashionGroupBy</span><span class="o">()</span> <span class="o">{</span>
<span class="n">Result</span><span class="o"><</span><span class="n">Record</span><span class="o">></span> <span class="n">records</span> <span class="o">=</span> <span class="n">dslContext</span><span class="o">.</span><span class="na">select</span><span class="o">()</span>
<span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">AUTHOR</span><span class="o">.</span><span class="na">leftOuterJoin</span><span class="o">(</span><span class="n">BOOK</span><span class="o">).</span><span class="na">on</span><span class="o">(</span><span class="n">BOOK</span><span class="o">.</span><span class="na">AUTHOR_ID</span><span class="o">.</span><span class="na">equal</span><span class="o">(</span><span class="n">AUTHOR</span><span class="o">.</span><span class="na">ID</span><span class="o">)))</span>
<span class="o">.</span><span class="na">fetch</span><span class="o">();</span>
<span class="n">Map</span><span class="o"><</span><span class="n">Long</span><span class="o">,</span> <span class="n">AuthorWithBooks</span><span class="o">></span> <span class="n">booksMap</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o"><>();</span>
<span class="n">records</span><span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">forEach</span><span class="o">(</span><span class="n">r</span> <span class="o">-></span> <span class="o">{</span>
<span class="n">Long</span> <span class="n">authorId</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="na">getValue</span><span class="o">(</span><span class="n">TAuthor</span><span class="o">.</span><span class="na">AUTHOR</span><span class="o">.</span><span class="na">ID</span><span class="o">);</span>
<span class="n">AuthorWithBooks</span> <span class="n">authorWithBooks</span> <span class="o">=</span> <span class="n">booksMap</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">authorId</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">authorWithBooks</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">authorWithBooks</span> <span class="o">=</span> <span class="k">new</span> <span class="n">AuthorWithBooks</span><span class="o">();</span>
<span class="n">authorWithBooks</span><span class="o">.</span><span class="na">setAuthor</span><span class="o">(</span><span class="k">new</span> <span class="n">Author</span><span class="o">(</span><span class="n">authorId</span><span class="o">,</span> <span class="n">r</span><span class="o">.</span><span class="na">getValue</span><span class="o">(</span><span class="n">TAuthor</span><span class="o">.</span><span class="na">AUTHOR</span><span class="o">.</span><span class="na">NAME</span><span class="o">)));</span>
<span class="n">authorWithBooks</span><span class="o">.</span><span class="na">setBooks</span><span class="o">(</span><span class="k">new</span> <span class="n">ArrayList</span><span class="o"><>());</span>
<span class="n">booksMap</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">authorId</span><span class="o">,</span> <span class="n">authorWithBooks</span><span class="o">);</span>
<span class="o">}</span>
<span class="n">Book</span> <span class="n">book</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Book</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">getValue</span><span class="o">(</span><span class="n">TBook</span><span class="o">.</span><span class="na">BOOK</span><span class="o">.</span><span class="na">ID</span><span class="o">),</span> <span class="n">r</span><span class="o">.</span><span class="na">getValue</span><span class="o">(</span><span class="n">TBook</span><span class="o">.</span><span class="na">BOOK</span><span class="o">.</span><span class="na">TITLE</span><span class="o">),</span> <span class="n">authorId</span><span class="o">);</span>
<span class="n">authorWithBooks</span><span class="o">.</span><span class="na">getBooks</span><span class="o">().</span><span class="na">add</span><span class="o">(</span><span class="n">book</span><span class="o">);</span>
<span class="o">});</span>
<span class="k">return</span> <span class="n">booksMap</span><span class="o">.</span><span class="na">values</span><span class="o">();</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<h1 id="jpa">JPA</h1>
<p>Because of the join, JPA returns a list of authors with one entry per returned row, so this list contains duplicate authors.
All JPQL queries use the function below to transform the list of duplicated <code class="highlighter-rouge">Author</code> entries into a list of distinct <code class="highlighter-rouge">AuthorWithBooks</code>:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="kd">private</span> <span class="n">List</span><span class="o"><</span><span class="n">AuthorWithBooks</span><span class="o">></span> <span class="nf">toAuthor</span><span class="o">(</span><span class="n">List</span><span class="o"><</span><span class="n">Author</span><span class="o">></span> <span class="n">authors</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">authors</span><span class="o">.</span><span class="na">stream</span><span class="o">()</span>
<span class="o">.</span><span class="na">distinct</span><span class="o">()</span>
<span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">author</span> <span class="o">-></span> <span class="k">new</span> <span class="n">AuthorWithBooks</span><span class="o">(</span><span class="n">author</span><span class="o">,</span> <span class="n">author</span><span class="o">.</span><span class="na">getBooks</span><span class="o">())).</span><span class="na">collect</span><span class="o">(</span><span class="n">Collectors</span><span class="o">.</span><span class="na">toList</span><span class="o">());</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
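<p><code class="highlighter-rouge">distinct()</code> compares elements with <code class="highlighter-rouge">equals()</code>. The sketch below assumes that duplicated rows yield the same <code class="highlighter-rouge">Author</code> instance (which is what a JPA persistence context normally provides for a given id) or that <code class="highlighter-rouge">Author</code> defines <code class="highlighter-rouge">equals</code>/<code class="highlighter-rouge">hashCode</code>. The <code class="highlighter-rouge">Author</code> class here is a hypothetical stand-in, not the entity from the sources.</p>

```java
import java.util.List;
import java.util.stream.Collectors;

final class DistinctSketch {
    // Hypothetical entity without equals/hashCode overrides:
    // distinct() then falls back to reference identity.
    static final class Author {
        final String name;

        Author(String name) {
            this.name = name;
        }
    }

    // Removes repeated references while keeping the first occurrence of each author.
    static List<Author> dedupe(List<Author> authors) {
        return authors.stream().distinct().collect(Collectors.toList());
    }
}
```

<p>If two distinct instances represented the same author and <code class="highlighter-rouge">equals</code> were not overridden, both would survive the <code class="highlighter-rouge">distinct()</code> call.</p>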
<h2 id="hibernate-named-query">Hibernate Named Query</h2>
<p>The named query declared on the <code class="highlighter-rouge">Author</code> entity:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="code"><pre><span class="nd">@NamedQueries</span><span class="o">(</span>
<span class="nd">@NamedQuery</span><span class="o">(</span><span class="n">name</span> <span class="o">=</span> <span class="s">"Author.findAllWithBooks"</span> <span class="o">,</span> <span class="n">query</span> <span class="o">=</span> <span class="s">"FROM Author a LEFT JOIN FETCH a.books"</span><span class="o">)</span>
<span class="o">)</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The associated query:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="nd">@Transactional</span><span class="o">(</span><span class="n">readOnly</span> <span class="o">=</span> <span class="kc">true</span><span class="o">)</span>
<span class="kd">public</span> <span class="n">List</span><span class="o"><</span><span class="n">AuthorWithBooks</span><span class="o">></span> <span class="nf">findAuthorsWithBooksUsingNamedQuery</span><span class="o">()</span> <span class="o">{</span>
<span class="n">TypedQuery</span><span class="o"><</span><span class="n">Author</span><span class="o">></span> <span class="n">query</span> <span class="o">=</span> <span class="n">entityManager</span><span class="o">.</span><span class="na">createNamedQuery</span><span class="o">(</span><span class="s">"Author.findAllWithBooks"</span><span class="o">,</span> <span class="n">Author</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="k">return</span> <span class="nf">toAuthor</span><span class="o">(</span><span class="n">query</span><span class="o">.</span><span class="na">getResultList</span><span class="o">());</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<h2 id="spring-data-jpa">Spring Data JPA</h2>
<p>The method from the repository interface:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
</pre></td><td class="code"><pre><span class="nd">@Query</span><span class="o">(</span><span class="s">"FROM Author a LEFT JOIN FETCH a.books"</span><span class="o">)</span>
<span class="n">List</span><span class="o"><</span><span class="n">Author</span><span class="o">></span> <span class="nf">findAllWithBooks</span><span class="o">();</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The method from the query service:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="code"><pre><span class="nd">@Transactional</span><span class="o">(</span><span class="n">readOnly</span> <span class="o">=</span> <span class="kc">true</span><span class="o">)</span>
<span class="kd">public</span> <span class="n">List</span><span class="o"><</span><span class="n">AuthorWithBooks</span><span class="o">></span> <span class="nf">findAuthorsWithBooksUsingSpringData</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">toAuthor</span><span class="o">(</span><span class="n">authorRepository</span><span class="o">.</span><span class="na">findAllWithBooks</span><span class="o">());</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<h1 id="results">Results</h1>
<ul>
<li>Out of the box, the JPA-based queries are the cleanest: only a distinct statement is needed to filter the result.</li>
<li>The JDBC query and the result mapping hold no surprises: plain SQL, and the result mapping must be done by hand.</li>
<li>The jOOQ query is neat, thanks to the DSL. In my use case the results must be mapped from Record to Pojo, which adds a little burden to the code. My use case may be borderline, since the <code class="highlighter-rouge">Record</code> is the first-class result.</li>
</ul>
<p>The benchmark is done using <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a>:</p>
<ul>
<li>25 s of warmup, 25 s of measurement.</li>
<li>1 thread (Core i5 @ 3.1 GHz)</li>
</ul>
<p>The reference scenario involves a no-op mapping.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Scenario</th>
<th style="text-align: left">ops/s</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Reference</td>
<td style="text-align: left">12446.552 ± 328.210</td>
</tr>
<tr>
<td style="text-align: left">Plain Jdbc</td>
<td style="text-align: left">11887.212 ± 254.889</td>
</tr>
<tr>
<td style="text-align: left">Hibernate Named Query</td>
<td style="text-align: left">1015.088 ± 16.014</td>
</tr>
<tr>
<td style="text-align: left">Hibernate Spring Data</td>
<td style="text-align: left">1017.145 ± 17.038</td>
</tr>
<tr>
<td style="text-align: left">jOOQ IntoGroup</td>
<td style="text-align: left">1186.168 ± 11.805</td>
</tr>
<tr>
<td style="text-align: left">jOOQ hand made groupBy</td>
<td style="text-align: left">3217.562 ± 31.897</td>
</tr>
</tbody>
</table>
<p>I did not expect such a difference between plain JDBC and the others (it is suspicious; a factor of 3 would have been acceptable).
I expected it even less between plain JDBC and jOOQ, especially when using the jOOQ groupBy and mapper.</p>
<p>My benchmark may be wrong, I may have missed THE fetch method to use, or the jOOQ code path is less direct than I expected, as it seems to involve a bunch of object allocations per row (Record, Pojo) and (in my case) the use of two mappers.
Either way, comments and pull requests are welcome to improve this quicky.</p>
Fair Consuming With RabbitMQ2015-07-05T10:18:46+00:00http://nithril.github.io/amqp/2015/07/05/fair-consuming-with-rabbitmq<p><img style="float: left;margin-right:20px;" src="/assets/2015-07-05-fair-consuming-with-rabbitmq/rabbitmq_logo.png" /></p>
<p>This article presents a pattern to achieve fair consuming with RabbitMQ,
an AMQP implementation, using a deficit weighted round robin scheduler.</p>
<p>This article is not about RabbitMQ <a href="https://www.rabbitmq.com/priority.html">Priority Queue Support</a>.
A consumer bound to a RabbitMQ priority queue will always consume the messages with the highest priority first.
Nothing ensures that a low-priority message will ever be processed: high-priority messages may monopolize the processing slots.</p>
<!--more-->
<h1 id="toc">ToC</h1>
<ul id="markdown-toc">
<li><a href="#toc" id="markdown-toc-toc">ToC</a></li>
<li><a href="#the-pattern" id="markdown-toc-the-pattern">The Pattern</a></li>
<li><a href="#implementation" id="markdown-toc-implementation">Implementation</a> <ul>
<li><a href="#slot" id="markdown-toc-slot">Slot</a></li>
<li><a href="#the-consumer" id="markdown-toc-the-consumer">The consumer</a></li>
<li><a href="#the-main-loop" id="markdown-toc-the-main-loop">The main loop</a></li>
</ul>
</li>
<li><a href="#rabbitmq-consumer-prefetch" id="markdown-toc-rabbitmq-consumer-prefetch">RabbitMQ Consumer Prefetch</a></li>
<li><a href="#results" id="markdown-toc-results">Results</a></li>
</ul>
<h1 id="the-pattern">The Pattern</h1>
<p>We will implement the <a href="https://en.wikipedia.org/wiki/Deficit_round_robin">Deficit Weighted Round Robin</a> algorithm (DWRR).
This algorithm is simple and effective:</p>
<ul>
<li>It does not require knowledge of the current content of the queues.</li>
<li>In the long term (not so long in human time), it reaches the targeted flow rate.</li>
</ul>
<p>It does require knowledge of the past content, hence the word <code class="highlighter-rouge">Deficit</code>: past messages increase the deficit and thus decrease the scheduling priority.</p>
<p>DWRR is a scheduling algorithm for network schedulers, where the unit of resource is the network bandwidth. Packets are prioritized according to their size and the incoming/outgoing flow slot.
In our case, the unit of resource is the processing time. The higher the share of processing time allocated to a queue, the higher its message rate. This share is the <code class="highlighter-rouge">quantum</code>:
it defines how much of the processing time we allocate per queue/slot. A quantum is noted <code class="highlighter-rouge">Q</code>.</p>
<p>The scheduling is done on queues. A priority range of <code class="highlighter-rouge">[1..N]</code> involves <code class="highlighter-rouge">N</code> queues <code class="highlighter-rouge">Qe[1..N]</code>. We assign a quantum <code class="highlighter-rouge">Q[i]</code> to each queue <code class="highlighter-rouge">Qe[i]</code>.
Quanta may be normalized to a ratio <code class="highlighter-rouge">Ratio[i] = Q[i] / Sum(Q[1..N])</code>. This ratio is the share of the resource allocated to a queue.</p>
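<p>As a minimal sketch of this normalization, the ratios can be computed directly from the quanta. The class and method names below are hypothetical and are not part of the article's repository.</p>

```java
/** Hypothetical helper: normalizes quanta into per-queue resource shares. */
final class QuantumRatio {

    /** Returns Ratio[i] = Q[i] / Sum(Q[1..N]) for each queue. */
    static double[] normalize(int[] quanta) {
        long sum = 0;
        for (int q : quanta) {
            sum += q;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("quanta must sum to a positive value");
        }
        double[] ratio = new double[quanta.length];
        for (int i = 0; i < quanta.length; i++) {
            ratio[i] = (double) quanta[i] / sum;
        }
        return ratio;
    }

    public static void main(String[] args) {
        // Two queues with quanta 3 and 1: the first one gets three quarters of the resource.
        double[] ratio = normalize(new int[] {3, 1});
        System.out.printf("%.2f %.2f%n", ratio[0], ratio[1]);
    }
}
```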
<p>Past messages increase the deficit. Thus messages must be weighted: a processing time cost must be computed per message.
It can be constant or dynamic, depending on the resource type:</p>
<ul>
<li>Constant if the message processing time is constant</li>
<li>Dynamic if it varies significantly from one message to another</li>
</ul>
<p>For this article I will use a fixed weight.</p>
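<p>For the dynamic case, one possible approach (an assumption on my side, not what the article's code does) is to derive the cost from the measured processing time, rounded up to a chosen cost unit. The names <code class="highlighter-rouge">MessageCost</code>, <code class="highlighter-rouge">dynamicCost</code> and <code class="highlighter-rouge">nanosPerCostUnit</code> are hypothetical.</p>

```java
/** Hypothetical helper: turns a measured processing time into a message cost. */
final class MessageCost {

    /**
     * Returns the cost of a message whose processing took processingNanos,
     * expressed in units of nanosPerCostUnit and rounded up. The cost is at
     * least 1 so that every message consumes some of the slot's budget.
     */
    static int dynamicCost(long processingNanos, long nanosPerCostUnit) {
        if (processingNanos < 0 || nanosPerCostUnit <= 0) {
            throw new IllegalArgumentException("invalid duration or cost unit");
        }
        // Ceiling division written so that it cannot overflow.
        long units = processingNanos / nanosPerCostUnit
                + (processingNanos % nanosPerCostUnit == 0 ? 0 : 1);
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, units));
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... process the message here ...
        long elapsed = System.nanoTime() - start;
        System.out.println("cost = " + dynamicCost(elapsed, 100_000));
    }
}
```

<p>With such a cost the slot can only be charged once the message has been processed, since the duration is not known beforehand.</p>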
<p><strong>Example</strong></p>
<p>We have two queues: Q1, Q2.
Q1 has a quantum of 10 and Q2 a quantum of 1. Q1 should have a processing rate 10 times faster than Q2.
With a message weight of 1, Q1 dequeues 10 messages for each message Q2 dequeues.</p>
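<p>This example can be checked with a small broker-free simulation. It is only a sketch: it assumes queues that are never empty, and the class and method names are hypothetical.</p>

```java
import java.util.Arrays;

/** Hypothetical simulation of the scheduling rule, with always-backlogged queues. */
final class DwrrExample {

    /**
     * Runs the given number of scheduling rounds and returns how many
     * messages each queue dequeued.
     */
    static long[] simulate(int[] quanta, int messageWeight, int rounds) {
        int[] deficit = new int[quanta.length];
        long[] consumed = new long[quanta.length];
        for (int round = 0; round < rounds; round++) {
            for (int i = 0; i < quanta.length; i++) {
                // Each round, a queue is credited with its quantum...
                deficit[i] += quanta[i];
                // ...and spends it one message weight at a time.
                while (deficit[i] >= messageWeight) {
                    deficit[i] -= messageWeight;
                    consumed[i]++;
                }
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        // Q1 has a quantum of 10, Q2 a quantum of 1, every message weighs 1.
        System.out.println(Arrays.toString(simulate(new int[] {10, 1}, 1, 100)));
        // Same quanta with a weight of 10: Q2 needs 10 rounds to afford one message.
        System.out.println(Arrays.toString(simulate(new int[] {10, 1}, 10, 100)));
    }
}
```

<p>In both runs Q1 dequeues ten times more messages than Q2. The weight only changes how many rounds a queue has to wait before it can afford a message.</p>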
<h1 id="implementation">Implementation</h1>
<p>The code is available on <a href="https://github.com/nithril/article-fair-consuming-with-rabbitmq">github</a>.</p>
<h2 id="slot">Slot</h2>
<p>Per queue we will define a consumer slot. RabbitMQ will push messages to each consumer according to its dequeue rate.<br />
First we define the slot. It contains the <code class="highlighter-rouge">quantum</code>, the <code class="highlighter-rouge">deficit</code> and the RabbitMQ queue consumer.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</pre></td><td class="code"><pre><span class="kd">public</span> <span class="kd">class</span> <span class="nc">DwrrSlot</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">quantum</span><span class="o">;</span>
<span class="kd">private</span> <span class="kt">int</span> <span class="n">deficit</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">DwrrBlockingQueueConsumer</span> <span class="n">consumer</span><span class="o">;</span>
<span class="kd">public</span> <span class="nf">DwrrSlot</span><span class="o">(</span><span class="n">DwrrBlockingQueueConsumer</span> <span class="n">consumer</span><span class="o">,</span> <span class="kt">int</span> <span class="n">quantum</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">consumer</span> <span class="o">=</span> <span class="n">consumer</span><span class="o">;</span>
<span class="k">this</span><span class="o">.</span><span class="na">quantum</span> <span class="o">=</span> <span class="n">quantum</span><span class="o">;</span>
<span class="o">}</span>
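<span class="c1">//The slot is empty: forget the accumulated budget</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">reset</span><span class="o">(){</span>
<span class="n">deficit</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="o">}</span>
<span class="c1">//Called once per round: credit the slot with its quantum</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">reduceDeficit</span><span class="o">(){</span>
<span class="n">deficit</span> <span class="o">=</span> <span class="n">deficit</span> <span class="o">+</span> <span class="n">quantum</span><span class="o">;</span>
<span class="o">}</span>
<span class="c1">//Called once per processed message: charge the message cost</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">increaseDeficit</span><span class="o">(</span><span class="kt">int</span> <span class="n">cost</span><span class="o">){</span>
<span class="n">deficit</span> <span class="o">=</span> <span class="n">deficit</span> <span class="o">-</span> <span class="n">cost</span><span class="o">;</span>
<span class="o">}</span>
<span class="c1">//Accessors used by the main loop</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">getDeficit</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">deficit</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">DwrrBlockingQueueConsumer</span> <span class="nf">getConsumer</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">consumer</span><span class="o">;</span>
<span class="o">}</span>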
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<h2 id="the-consumer">The consumer</h2>
<p>This class subscribes to a RabbitMQ queue and stores delivered messages into a collection (<code class="highlighter-rouge">deliveries</code>).
A token is released on the <code class="highlighter-rouge">consumerToken</code> semaphore for each delivered message, so the main loop thread
is notified when a new message is available.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
</pre></td><td class="code"><pre><span class="kd">public</span> <span class="kd">class</span> <span class="nc">DwrrBlockingQueueConsumer</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">BlockingDeque</span><span class="o"><</span><span class="n">QueueingConsumer</span><span class="o">.</span><span class="na">Delivery</span><span class="o">></span> <span class="n">deliveries</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">queue</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">Channel</span> <span class="n">channel</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">Semaphore</span> <span class="n">consumerToken</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">InternalConsumer</span> <span class="n">consumer</span><span class="o">;</span>
<span class="kd">public</span> <span class="nf">DwrrBlockingQueueConsumer</span><span class="o">(</span><span class="n">String</span> <span class="n">queue</span><span class="o">,</span> <span class="n">Channel</span> <span class="n">channel</span><span class="o">,</span> <span class="n">Semaphore</span> <span class="n">consumerToken</span><span class="o">,</span> <span class="kt">int</span> <span class="n">prefetch</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">queue</span> <span class="o">=</span> <span class="n">queue</span><span class="o">;</span>
<span class="k">this</span><span class="o">.</span><span class="na">channel</span> <span class="o">=</span> <span class="n">channel</span><span class="o">;</span>
<span class="k">this</span><span class="o">.</span><span class="na">consumerToken</span> <span class="o">=</span> <span class="n">consumerToken</span><span class="o">;</span>
<span class="k">this</span><span class="o">.</span><span class="na">deliveries</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LinkedBlockingDeque</span><span class="o"><>(</span><span class="n">prefetch</span><span class="o">);</span>
<span class="k">this</span><span class="o">.</span><span class="na">consumer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">InternalConsumer</span><span class="o">(</span><span class="n">channel</span><span class="o">);</span>
<span class="o">}</span>
<span class="cm">/**
* Start to consume messages from the queue
* @throws IOException
*/</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">start</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span>
<span class="c1">//Subscribe to the queue and enable the acknowledgement</span>
<span class="n">channel</span><span class="o">.</span><span class="na">basicConsume</span><span class="o">(</span><span class="n">queue</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">consumer</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">BlockingDeque</span><span class="o"><</span><span class="n">QueueingConsumer</span><span class="o">.</span><span class="na">Delivery</span><span class="o">></span> <span class="nf">getDeliveries</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">deliveries</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="kd">class</span> <span class="nc">InternalConsumer</span> <span class="kd">extends</span> <span class="n">DefaultConsumer</span> <span class="o">{</span>
<span class="kd">public</span> <span class="nf">InternalConsumer</span><span class="o">(</span><span class="n">Channel</span> <span class="n">channel</span><span class="o">)</span> <span class="o">{</span>
<span class="kd">super</span><span class="o">(</span><span class="n">channel</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">handleDelivery</span><span class="o">(</span><span class="n">String</span> <span class="n">consumerTag</span><span class="o">,</span> <span class="n">Envelope</span> <span class="n">envelope</span><span class="o">,</span> <span class="n">AMQP</span><span class="o">.</span><span class="na">BasicProperties</span> <span class="n">properties</span><span class="o">,</span> <span class="kt">byte</span><span class="o">[]</span> <span class="n">body</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">//Queue the message</span>
<span class="n">deliveries</span><span class="o">.</span><span class="na">offer</span><span class="o">(</span><span class="k">new</span> <span class="n">QueueingConsumer</span><span class="o">.</span><span class="na">Delivery</span><span class="o">(</span><span class="n">envelope</span><span class="o">,</span> <span class="n">properties</span><span class="o">,</span> <span class="n">body</span><span class="o">));</span>
<span class="c1">//Release a consumer token</span>
<span class="n">consumerToken</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<h2 id="the-main-loop">The main loop</h2>
<p><code class="highlighter-rouge">consumerToken#tryAcquire</code> parks the main loop thread while no message is available.
The release of a <code class="highlighter-rouge">consumerToken</code> by <code class="highlighter-rouge">DwrrBlockingQueueConsumer.InternalConsumer#handleDelivery</code> wakes up the main loop thread with minimal delay.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
</pre></td><td class="rouge-code"><pre><span class="k">while</span> <span class="o">(</span><span class="kc">true</span><span class="o">)</span> <span class="o">{</span>
<span class="kt">int</span> <span class="n">processedMessageCounter</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="c1">//Try to acquire a consumer token, wait for 5 seconds</span>
<span class="k">if</span> <span class="o">(</span><span class="n">consumerToken</span><span class="o">.</span><span class="na">tryAcquire</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">5</span><span class="o">,</span> <span class="n">TimeUnit</span><span class="o">.</span><span class="na">SECONDS</span><span class="o">))</span> <span class="o">{</span>
<span class="c1">//We got a token, a delivery is available</span>
<span class="k">for</span> <span class="o">(</span><span class="n">DwrrSlot</span> <span class="n">slot</span> <span class="o">:</span> <span class="n">slots</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">//The deficit is reduced per iteration</span>
<span class="n">slot</span><span class="o">.</span><span class="na">reduceDeficit</span><span class="o">();</span>
<span class="c1">//Loop until the slot does not contain any deliveries or until the slot deficit is below the message weight</span>
<span class="k">while</span> <span class="o">(!</span><span class="n">slot</span><span class="o">.</span><span class="na">getConsumer</span><span class="o">().</span><span class="na">getDeliveries</span><span class="o">().</span><span class="na">isEmpty</span><span class="o">()</span> <span class="o">&&</span> <span class="n">slot</span><span class="o">.</span><span class="na">getDeficit</span><span class="o">()</span> <span class="o">>=</span> <span class="n">MESSAGE_WEIGHT</span><span class="o">)</span> <span class="o">{</span>
<span class="n">QueueingConsumer</span><span class="o">.</span><span class="na">Delivery</span> <span class="n">delivery</span> <span class="o">=</span> <span class="n">slot</span><span class="o">.</span><span class="na">getConsumer</span><span class="o">().</span><span class="na">getDeliveries</span><span class="o">().</span><span class="na">poll</span><span class="o">();</span>
<span class="n">slot</span><span class="o">.</span><span class="na">increaseDeficit</span><span class="o">(</span><span class="n">MESSAGE_WEIGHT</span><span class="o">);</span>
<span class="c1">//Simulate processing time</span>
<span class="n">Thread</span><span class="o">.</span><span class="na">sleep</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">1000</span><span class="o">);</span>
<span class="c1">//Finally ack the message, so RabbitMQ will push a new one to the consumer</span>
<span class="n">channel</span><span class="o">.</span><span class="na">basicAck</span><span class="o">(</span><span class="n">delivery</span><span class="o">.</span><span class="na">getEnvelope</span><span class="o">().</span><span class="na">getDeliveryTag</span><span class="o">(),</span> <span class="kc">false</span><span class="o">);</span>
<span class="c1">//Increment the number of processed message</span>
<span class="n">processedMessageCounter</span><span class="o">++;</span>
<span class="o">}</span>
<span class="c1">//If the slot does not contain any deliveries, reset the deficit</span>
<span class="k">if</span> <span class="o">(</span><span class="n">slot</span><span class="o">.</span><span class="na">getConsumer</span><span class="o">().</span><span class="na">getDeliveries</span><span class="o">().</span><span class="na">isEmpty</span><span class="o">())</span> <span class="o">{</span>
<span class="n">slot</span><span class="o">.</span><span class="na">reset</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">if</span> <span class="o">(</span><span class="n">processedMessageCounter</span> <span class="o">></span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">//If we have processed messages, we must acquire one token per processed message, minus the one already acquired</span>
<span class="n">consumerToken</span><span class="o">.</span><span class="na">acquire</span><span class="o">(</span><span class="n">processedMessageCounter</span> <span class="o">-</span> <span class="mi">1</span><span class="o">);</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="c1">//If we have not processed any message (because every deficit was below the weight) we must release the token</span>
<span class="n">consumerToken</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="n">LOG</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"No message"</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>
<hr />
<p>That’s it. Pretty straightforward.</p>
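<p>The token bookkeeping of the loop can be tried in isolation. The sketch below is a broker-free stand-in that assumes plain strings instead of deliveries; its names (<code class="highlighter-rouge">TokenAccountingExample</code>, <code class="highlighter-rouge">deliver</code>, <code class="highlighter-rouge">drainOnce</code>) are hypothetical. Unlike the consumer above, it only releases a token when the message was actually buffered.</p>

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the consumer/main loop token accounting. */
final class TokenAccountingExample {

    /** Buffers a message and releases one token, as handleDelivery does. */
    static void deliver(BlockingDeque<String> deliveries, Semaphore tokens, String message) {
        if (deliveries.offer(message)) {
            tokens.release();
        }
    }

    /** One pass of the main loop; returns the number of processed messages. */
    static int drainOnce(BlockingDeque<String> deliveries, Semaphore tokens, int maxMessages)
            throws InterruptedException {
        // Park until at least one message is buffered (or the timeout expires).
        if (!tokens.tryAcquire(1, 100, TimeUnit.MILLISECONDS)) {
            return 0;
        }
        int processed = 0;
        while (processed < maxMessages && deliveries.poll() != null) {
            processed++;
        }
        if (processed > 0) {
            // One token per processed message, minus the one taken to wake up.
            tokens.acquire(processed - 1);
        } else {
            // Nothing was processed: give the wake-up token back.
            tokens.release();
        }
        return processed;
    }

    /** Returns {processed, remaining permits, remaining buffered messages}. */
    static int[] demo() {
        BlockingDeque<String> deliveries = new LinkedBlockingDeque<>(10);
        Semaphore tokens = new Semaphore(0);
        for (int i = 0; i < 5; i++) {
            deliver(deliveries, tokens, "message-" + i);
        }
        try {
            int processed = drainOnce(deliveries, tokens, 3);
            return new int[] {processed, tokens.availablePermits(), deliveries.size()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println(result[0] + " processed, " + result[1] + " permits, " + result[2] + " buffered");
    }
}
```

<p>After each pass the number of available permits equals the number of messages still buffered, which is what lets the next <code class="highlighter-rouge">tryAcquire</code> park only when there is really nothing to process.</p>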
<h1 id="rabbitmq-consumer-prefetch">RabbitMQ Consumer Prefetch</h1>
<p>AMQP supports an acknowledgement feature. When activated, the consumer must acknowledge (to the broker) all the messages it receives.
RabbitMQ stops pushing new messages to a consumer once it holds too many unacked messages. The consumer prefetch (aka QoS) configures how many unacked messages a consumer can hold.</p>
<p><a href="https://www.rabbitmq.com/consumer-prefetch.html">Consumer prefetch</a> is an important RabbitMQ concept.</p>
<blockquote>
<p>AMQP specifies the basic.qos method to allow you to limit the number of unacknowledged messages on a channel (or connection) when consuming (aka “prefetch count”).</p>
</blockquote>
<p>RabbitMQ will push messages to the consumer until the prefetch count of unacked messages is reached.</p>
<p>The prefetch value must be set according to the processing speed. The priority ratio will be flattened if the processing rate is greater than the RabbitMQ push rate, because consumers will wait for messages most of the time.</p>
<blockquote>
<p>The goal is to keep the consumers saturated with work, but to minimise the client’s buffer size so that more messages stay in Rabbit’s queue and are thus available for new consumers or to just be sent out to consumers as they become free.</p>
</blockquote>
<p>See this in-depth article: <a href="https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/"><em>Some queuing theory: throughput, latency and bandwidth</em></a>.</p>
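<p>The flattening effect can be illustrated with a toy model. This is only a sketch under strong assumptions: the broker is modelled as topping up each client buffer by a fixed number of messages per scheduling round, capped by the prefetch, and all names are hypothetical. It does not talk to RabbitMQ.</p>

```java
import java.util.Arrays;

/** Hypothetical toy model of DWRR consuming from prefetch-limited buffers. */
final class PrefetchFlatteningExample {

    /** Returns the number of messages consumed per queue after the given rounds. */
    static long[] simulate(int[] quanta, int weight, int prefetch, int pushedPerRound, int rounds) {
        int[] buffered = new int[quanta.length];
        int[] deficit = new int[quanta.length];
        long[] consumed = new long[quanta.length];
        for (int round = 0; round < rounds; round++) {
            for (int i = 0; i < quanta.length; i++) {
                // The broker tops up the client buffer, never beyond the prefetch.
                buffered[i] = Math.min(prefetch, buffered[i] + pushedPerRound);
                deficit[i] += quanta[i];
                while (buffered[i] > 0 && deficit[i] >= weight) {
                    buffered[i]--;
                    deficit[i] -= weight;
                    consumed[i]++;
                }
                // Same rule as the main loop: an empty slot loses its budget.
                if (buffered[i] == 0) {
                    deficit[i] = 0;
                }
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        int[] quanta = {4, 40};
        // Broker faster than the consumer: the quantum ratio is respected.
        System.out.println(Arrays.toString(simulate(quanta, 4, 100, 100, 1000)));
        // Broker slower than the consumer: both queues end up with the same rate.
        System.out.println(Arrays.toString(simulate(quanta, 4, 100, 1, 1000)));
    }
}
```

<p>In the second run every buffer is drained on each round, so the deficit is reset each time and the quanta no longer influence the outcome.</p>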
<h1 id="results">Results</h1>
<p><strong>10 queues, a message weight of 4, each queue contains 40000 messages. The message processing time is set to 100µs</strong></p>
<p>The <code class="highlighter-rouge">p9</code> dequeue rate is about ten times the <code class="highlighter-rouge">p0</code> one, which matches the ratio of their quanta.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Queue</th>
<th>Quantum</th>
<th>Consumed</th>
<th>Rate</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">p0</td>
<td>4</td>
<td>3741</td>
<td>136.29408</td>
</tr>
<tr>
<td style="text-align: center">p1</td>
<td>8</td>
<td>7377</td>
<td>268.76276</td>
</tr>
<tr>
<td style="text-align: center">p2</td>
<td>12</td>
<td>11031</td>
<td>401.8872</td>
</tr>
<tr>
<td style="text-align: center">p3</td>
<td>16</td>
<td>14556</td>
<td>530.3119</td>
</tr>
<tr>
<td style="text-align: center">p4</td>
<td>20</td>
<td>18195</td>
<td>662.88983</td>
</tr>
<tr>
<td style="text-align: center">p5</td>
<td>24</td>
<td>21828</td>
<td>795.2492</td>
</tr>
<tr>
<td style="text-align: center">p6</td>
<td>28</td>
<td>25408</td>
<td>925.6777</td>
</tr>
<tr>
<td style="text-align: center">p7</td>
<td>32</td>
<td>29024</td>
<td>1057.4176</td>
</tr>
<tr>
<td style="text-align: center">p8</td>
<td>36</td>
<td>32605</td>
<td>1187.8826</td>
</tr>
<tr>
<td style="text-align: center">p9</td>
<td>40</td>
<td>36250</td>
<td>1320.6791</td>
</tr>
</tbody>
</table>
<p><strong>10 queues, a message weight of 4, each queue contains 40000 messages. The message processing time is set to 1µs</strong></p>
<p>The rate is flattened. My RabbitMQ instance cannot keep up with the processing rate.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Queue</th>
<th>Quantum</th>
<th>Consumed</th>
<th>Rate</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">p0</td>
<td>4</td>
<td>19546</td>
<td>1970.959</td>
</tr>
<tr>
<td style="text-align: center">p1</td>
<td>8</td>
<td>19323</td>
<td>1948.4723</td>
</tr>
<tr>
<td style="text-align: center">p2</td>
<td>12</td>
<td>20445</td>
<td>2061.6113</td>
</tr>
<tr>
<td style="text-align: center">p3</td>
<td>16</td>
<td>19750</td>
<td>1991.5297</td>
</tr>
<tr>
<td style="text-align: center">p4</td>
<td>20</td>
<td>20075</td>
<td>2024.3018</td>
</tr>
<tr>
<td style="text-align: center">p5</td>
<td>24</td>
<td>19780</td>
<td>1994.5548</td>
</tr>
<tr>
<td style="text-align: center">p6</td>
<td>28</td>
<td>20040</td>
<td>2020.7725</td>
</tr>
<tr>
<td style="text-align: center">p7</td>
<td>32</td>
<td>20227</td>
<td>2039.6289</td>
</tr>
<tr>
<td style="text-align: center">p8</td>
<td>36</td>
<td>20643</td>
<td>2081.5771</td>
</tr>
<tr>
<td style="text-align: center">p9</td>
<td>40</td>
<td>20188</td>
<td>2035.6963</td>
</tr>
</tbody>
</table>
Continuous Release With Maven2015-06-28T10:18:46+00:00http://nithril.github.io/cr/2015/06/28/continuous-release-with-maven<p><img style="float: left;margin-right:20px;" src="/assets/2015-06-28-continuous-release-with-maven/maven_logo.png" /></p>
<p>A Continuous Release process allows an artifact to be continuously <strong>releasable</strong> and promotable to the upper environment.</p>
<p>The topic has been covered by a large number of articles, so why a new one? The continuous release of an Application (eg. executable jar, war)
is indeed well covered, but not the continuous release of the dependencies (eg. jar library) of an Application.</p>
<p>In this article we will cover both topics and get rid of the release pipeline I described in my previous article <code class="highlighter-rouge">Jenkins Workflow - Pipeline de release</code> to
rely solely on a slightly modified continuous integration pipeline. I will end the article with a description of the Continuous Integration pressure paradigm.</p>
<!--more-->
<h1 id="toc">ToC</h1>
<ul id="markdown-toc">
<li><a href="#toc" id="markdown-toc-toc">ToC</a></li>
<li><a href="#before-few-rules-must-be-followed" id="markdown-toc-before-few-rules-must-be-followed">Before few rules must be followed</a></li>
<li><a href="#scenario" id="markdown-toc-scenario">Scenario</a></li>
<li><a href="#the-process" id="markdown-toc-the-process">The process</a></li>
<li><a href="#fix-application-and-dependency-versions" id="markdown-toc-fix-application-and-dependency-versions">Fix <strong>Application</strong> and <strong>Dependency</strong> versions</a> <ul>
<li><a href="#using-a-build-number" id="markdown-toc-using-a-build-number">Using a Build Number</a></li>
<li><a href="#and-the-plugin-versions" id="markdown-toc-and-the-plugin-versions">And the plugin versions</a></li>
</ul>
</li>
<li><a href="#fix-application-dependency-version" id="markdown-toc-fix-application-dependency-version">Fix <strong>Application</strong> dependency version</a> <ul>
<li><a href="#the-available-options" id="markdown-toc-the-available-options">The available options</a></li>
<li><a href="#range-to-the-rescue" id="markdown-toc-range-to-the-rescue">Range to the rescue</a></li>
</ul>
</li>
<li><a href="#continuous-integration-pressure" id="markdown-toc-continuous-integration-pressure">Continuous Integration Pressure</a> <ul>
<li><a href="#using-a-version-range" id="markdown-toc-using-a-version-range">Using a version range</a></li>
<li><a href="#using-a-fixed-version" id="markdown-toc-using-a-fixed-version">Using a fixed version</a> <ul>
<li><a href="#continuous-release-without-build-number" id="markdown-toc-continuous-release-without-build-number">Continuous release without build number</a></li>
<li><a href="#continuous-release-with-build-number" id="markdown-toc-continuous-release-with-build-number">Continuous release with build number</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li>
</ul>
<h1 id="before-few-rules-must-be-followed">Before few rules must be followed</h1>
<p>Because a non-snapshot build must be reproducible, no modifications are allowed once a project is released: neither to the project itself nor
to its dependencies. Thus the project version must be fixed, and so must the dependency versions.</p>
<h1 id="scenario">Scenario</h1>
<p>We have two projects: <strong>Application</strong> and <strong>Dependency</strong>. Developments of <strong>Application</strong> and <strong>Dependency</strong> are coupled for an iteration
because <strong>Application</strong> needs a feature of <strong>Dependency</strong>.</p>
<ul>
<li><strong>Application</strong> 1.0.0-SNAPSHOT</li>
</ul>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="nt"><project></span>
<span class="nt"><groupId></span>org.nlab.article.release<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>application<span class="nt"></artifactId></span>
<span class="nt"><version></span>1.0.0-SNAPSHOT<span class="nt"></version></span>
<span class="nt"></project></span></pre></td></tr></tbody></table></code></pre></figure>
<ul>
<li><strong>Dependency</strong> 1.2.0-SNAPSHOT</li>
</ul>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="nt"><project></span>
<span class="nt"><groupId></span>org.nlab.article.release<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>dependency<span class="nt"></artifactId></span>
<span class="nt"><version></span>1.2.0-SNAPSHOT<span class="nt"></version></span>
<span class="nt"></project></span></pre></td></tr></tbody></table></code></pre></figure>
<p><strong>Application</strong> depends on <strong>Dependency</strong>:</p>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.nlab.article.release<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>dependency<span class="nt"></artifactId></span>
<span class="nt"><version></span>1.2.0-SNAPSHOT<span class="nt"></version></span>
<span class="nt"></dependency></span></pre></td></tr></tbody></table></code></pre></figure>
<h1 id="the-process">The process</h1>
<p>The target process is a slightly modified continuous integration pipeline:</p>
<ul>
<li>Fix versions:
<ul>
<li>Project version</li>
<li>Dependency versions</li>
</ul>
</li>
<li>Compile, Test, Package, Deploy</li>
<li>If the project passes all the tests:
<ul>
<li>Promote the project artifact (ie. deploy to the upper environment)</li>
<li>Increase the project version</li>
</ul>
</li>
</ul>
<p>This process does not involve the common release steps: no need to remove the snapshot qualifier, commit and wait
for the Jenkins pipeline to finish. The usual continuous integration pipeline is self-sufficient and generates a releasable
artifact.</p>
<h1 id="fix-application-and-dependency-versions">Fix <strong>Application</strong> and <strong>Dependency</strong> versions</h1>
<h2 id="using-a-build-number">Using a Build Number</h2>
<p>To fix the version I will use the build number for three reasons:</p>
<ul>
<li>It is a best practice to deploy a release version only once.</li>
<li>A build number relates to meaningful information (Jenkins build number, SCM revision…).</li>
<li>It is not always possible to use the incremental or minor part of the version to do continuous release.</li>
</ul>
<h2 id="and-the-plugin-versions">And the Versions plugin</h2>
<p>Before compilation we fix the version using the <code class="highlighter-rouge">versions:set</code> goal:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre>mvn build-helper:parse-version versions:set <span class="nt">-DnewVersion</span><span class="o">=</span><span class="k">${</span><span class="nv">parsedVersion</span><span class="p">.majorVersion</span><span class="k">}</span>.<span class="k">${</span><span class="nv">parsedVersion</span><span class="p">.minorVersion</span><span class="k">}</span>.<span class="k">${</span><span class="nv">parsedVersion</span><span class="p">.incrementalVersion</span><span class="k">}</span>-<span class="k">${</span><span class="nv">BUILD_NUMBER</span><span class="k">}</span></pre></td></tr></tbody></table></code></pre></figure>
<p>Where <a href="http://www.mojohaus.org/versions-maven-plugin/version-rules.html">BUILD_NUMBER</a> can be a number coming from the SCM or the Jenkins job build number.
Each build must have a BUILD_NUMBER greater than that of any previous build.</p>
<p><code class="highlighter-rouge">build-helper:parse-version</code> is a convenient goal for extracting the version components.</p>
<p><strong>Application and Dependency versions are now fixed.</strong></p>
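<p>As a rough illustration of what this command line computes, here is a minimal Java sketch. It assumes a POM version of the form <code class="highlighter-rouge">major.minor.incremental</code> with an optional qualifier. The <code class="highlighter-rouge">FixedVersion</code> class and its <code class="highlighter-rouge">fixedVersion</code> method are hypothetical names used for illustration only; they are not part of any Maven plugin.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedVersion {

    // Matches "major.minor.incremental" followed by an optional qualifier such as "-SNAPSHOT".
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-.*)?");

    // Drops the qualifier and appends the build number, as the versions:set call above does.
    static String fixedVersion(String pomVersion, long buildNumber) {
        Matcher m = VERSION.matcher(pomVersion);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported version: " + pomVersion);
        }
        return m.group(1) + "." + m.group(2) + "." + m.group(3) + "-" + buildNumber;
    }

    public static void main(String[] args) {
        System.out.println(fixedVersion("1.2.0-SNAPSHOT", 123)); // prints 1.2.0-123
    }
}</code></pre></figure>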
<h1 id="fix-application-dependency-version">Fix <strong>Application</strong> dependency version</h1>
<h2 id="the-available-options">The available options</h2>
<ul>
<li>
<p><a href="http://www.mojohaus.org/versions-maven-plugin/use-releases-mojo.html"><code class="highlighter-rouge">versions:use-releases</code></a>: <em>searches the pom for all -SNAPSHOT versions which have been released and replaces them with the corresponding release version.</em></p>
<p>It only updates to versions that have no qualifier or build number, so it is
not suitable here.</p>
</li>
<li>
<p><a href="http://www.mojohaus.org/versions-maven-plugin/use-latest-releases-mojo.html"><code class="highlighter-rouge">versions:use-latest-releases</code></a>: <em>searches the pom for all versions which have been a newer version and replaces them with the latest version.</em></p>
<p>Updates versions to the latest release. The update scope is configurable (eg. <a href="http://www.mojohaus.org/versions-maven-plugin/use-latest-releases-mojo.html#allowIncrementalUpdates"><code class="highlighter-rouge">allowIncrementalUpdates</code></a>).
May be useful if continuous release is done using the incremental version.</p>
</li>
<li>
<p><a href="http://www.mojohaus.org/versions-maven-plugin/resolve-ranges-mojo.html"><code class="highlighter-rouge">versions:resolve-ranges</code></a>: <em>finds dependencies using version ranges and resolves the range to the specific version being used.</em></p>
<p>Suitable if <strong>Dependency</strong> version is defined as a range.</p>
</li>
<li>
<p><a href="http://www.mojohaus.org/versions-maven-plugin/update-properties-mojo.html"><code class="highlighter-rouge">versions:update-properties</code></a>: <em>updates properties defined in a project so that they correspond to the latest available version of specific dependencies. This can be useful if a suite of dependencies must all be locked to one version.</em></p>
<p>Suitable if the <strong>Dependency</strong> version property is defined as a range. As we will see, this goal adds another level of flexibility.</p>
</li>
</ul>
<p>Note: not all of these goals support Maven properties defined in the root pom and used in a module. For this case <code class="highlighter-rouge">dependencyManagement</code> may be used in the root pom.</p>
<h2 id="range-to-the-rescue">Range to the rescue</h2>
<p>The only suitable option is the <code class="highlighter-rouge">versions:resolve-ranges</code> goal. <strong>Application</strong> must depend on <strong>Dependency</strong> using a
<a href="https://maven.apache.org/enforcer/enforcer-rules/versionRanges.html">version range</a> instead of a SNAPSHOT.
In our case the range is <code class="highlighter-rouge">[1.2.0,1.2.0-99999]</code>:</p>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.nlab.article.release<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>dependency<span class="nt"></artifactId></span>
<span class="nt"><version></span>[1.2.0,1.2.0-99999]<span class="nt"></version></span>
<span class="nt"></dependency></span></pre></td></tr></tbody></table></code></pre></figure>
<p>The <strong>Application</strong> now depends only on deployed versions of <strong>Dependency</strong> and no longer on a SNAPSHOT dependency.</p>
<p>To fix the range we call the <code class="highlighter-rouge">versions:resolve-ranges</code> goal:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre>mvn versions:resolve-ranges</pre></td></tr></tbody></table></code></pre></figure>
<p>The dependency is now:</p>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.nlab.article.release<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>dependency<span class="nt"></artifactId></span>
<span class="nt"><version></span>1.2.0-123<span class="nt"></version></span>
<span class="nt"></dependency></span></pre></td></tr></tbody></table></code></pre></figure>
<p><strong>The dependency version is now fixed.</strong></p>
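<p>The effect of resolving the range can be sketched in a few lines of Java. This is a simplified model, not Maven’s actual version ordering: it only considers deployed versions of the form <code class="highlighter-rouge">base-N</code> and picks the highest build number that fits the range. The <code class="highlighter-rouge">RangeResolution</code> class and the list of deployed versions are hypothetical.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">public class RangeResolution {

    // Build number of a "base-N" version such as "1.2.0-123", or -1 when the
    // version has another base or a non-numeric qualifier (e.g. "-SNAPSHOT").
    static int buildNumber(String base, String version) {
        String prefix = base + "-";
        if (!version.startsWith(prefix)) {
            return -1;
        }
        String qualifier = version.substring(prefix.length());
        return qualifier.matches("\\d{1,9}") ? Integer.parseInt(qualifier) : -1;
    }

    // Highest deployed "base-N" version with N no greater than maxBuild, or null if none matches.
    static String resolve(String base, int maxBuild, String[] deployed) {
        String best = null;
        int bestBuild = -1;
        for (String version : deployed) {
            int build = buildNumber(base, version);
            if (build >= 0 && build <= maxBuild && build > bestBuild) {
                best = version;
                bestBuild = build;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical repository content.
        String[] deployed = {"1.2.0-121", "1.2.0-123", "1.2.0-SNAPSHOT", "1.3.0-5"};
        System.out.println(resolve("1.2.0", 99999, deployed)); // prints 1.2.0-123
    }
}</code></pre></figure>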
<h1 id="continuous-integration-pressure">Continuous Integration Pressure</h1>
<p>The Continuous Integration Pressure concept is simple: when a new version of <strong>Dependency</strong> is deployed, it must trigger the pipeline of <strong>Application</strong> in order to
test the integration of <strong>Dependency</strong> into <strong>Application</strong>.</p>
<p>The concept could go further: <strong>Application</strong> must test the integration prior to integrating the new version of <strong>Dependency</strong>. If the tests fail, the integration
should be reviewed, but the failure must not block the developers or the pipeline.</p>
<p>On the CI side, a new version of <strong>Dependency</strong> will automatically trigger the integration tests of every <strong>Application</strong> that depends on it.</p>
<p>There are different levels:</p>
<ul>
<li><strong>Application</strong> depends on <strong>Dependency</strong> using a version range</li>
<li><strong>Application</strong> depends on <strong>Dependency</strong> using a fixed version</li>
</ul>
<h2 id="using-a-version-range">Using a version range</h2>
<p>On the CI side, a new version of <strong>Dependency</strong> will automatically trigger the <strong>Application</strong> pipeline. As <strong>Application</strong> is using a range, the build will
use the latest build number. No modification of the above process is needed.</p>
<h2 id="using-a-fixed-version">Using a fixed version</h2>
<p>The drawback of SNAPSHOTs and version ranges is their volatile nature. A build or the tests of <strong>Application</strong> may fail because of
a new version of <strong>Dependency</strong>. A fixed version resolves this issue.</p>
<p>On the CI side, a new version of <strong>Dependency</strong> will automatically trigger the <strong>Application</strong> integration pipeline.
This integration pipeline:</p>
<ul>
<li>Updates the fixed version of <strong>Dependency</strong></li>
<li>Launches the test</li>
<li>If tests are ok, the version change is committed</li>
</ul>
<p>It involves some modification of the process.</p>
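<p>The three steps of this integration pipeline can be sketched as follows. This is only an illustration of the decision logic under simple assumptions: the <code class="highlighter-rouge">IntegrationPipeline</code> class is hypothetical, and the test run is modeled as a plain boolean supplier rather than a real Maven build.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.util.function.BooleanSupplier;

public class IntegrationPipeline {

    // Returns the Dependency version that ends up committed in the Application POM.
    static String integrate(String committedVersion, String newVersion, BooleanSupplier tests) {
        // 1. Update the fixed version in the working copy (nothing is committed yet).
        String candidate = newVersion;
        // 2. Launch the tests against the candidate version.
        boolean testsPassed = tests.getAsBoolean();
        // 3. Commit the change only if the tests pass; otherwise keep the previous
        //    version so that developers and the main pipeline are not blocked.
        return testsPassed ? candidate : committedVersion;
    }

    public static void main(String[] args) {
        System.out.println(integrate("1.2.0-123", "1.2.0-124", () -> true));  // prints 1.2.0-124
        System.out.println(integrate("1.2.0-123", "1.2.0-124", () -> false)); // prints 1.2.0-123
    }
}</code></pre></figure>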
<h3 id="continuous-release-without-build-number">Continuous release without build number</h3>
<p>If the incremental or minor part of the version is used to do continuous release, the <strong>Dependency</strong> version can be updated using <code class="highlighter-rouge">versions:use-latest-releases</code>
and its properties (eg. <code class="highlighter-rouge">allowIncrementalUpdates</code>). That’s it: the version is updated.</p>
<h3 id="continuous-release-with-build-number">Continuous release with build number</h3>
<p>Vincent Latombe suggested this process to me <a href="https://groups.google.com/d/msg/lescastcodeurs/yig2NTbr6vo/l1jxrLV5wHsJ">on the CastCodeur mailing list</a> (in French).</p>
<p>If a range is used, more configuration is involved: we must keep the range somewhere in the POM to be able to fix the <strong>Dependency</strong> version.</p>
<p>For this purpose we use the <code class="highlighter-rouge">versions:update-properties</code> goal with its <code class="highlighter-rouge">properties</code> parameter, which allows adding
restrictions that apply to specific properties.
We create a <code class="highlighter-rouge">dependency.version</code> Maven property which holds the current version of <strong>Dependency</strong>.<br />
The <strong>Dependency</strong> version is set to this property.</p>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><table class="rouge-table"><tbody><tr><td class="code"><pre> <span class="nt"><properties></span>
<span class="nt"><dependency.version></span>1.2.0-123<span class="nt"></dependency.version></span>
<span class="nt"></properties></span>
<span class="nt"><dependencies></span>
<span class="nt"><dependency></span>
<span class="nt"><groupId></span>org.nlab.article.release<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>dependency<span class="nt"></artifactId></span>
<span class="nt"><version></span>${dependency.version}<span class="nt"></version></span>
<span class="nt"></dependency></span>
<span class="nt"></dependencies></span>
<span class="nt"><build></span>
<span class="nt"><plugins></span>
<span class="nt"><plugin></span>
<span class="nt"><groupId></span>org.codehaus.mojo<span class="nt"></groupId></span>
<span class="nt"><artifactId></span>versions-maven-plugin<span class="nt"></artifactId></span>
<span class="nt"><version></span>2.2<span class="nt"></version></span>
<span class="nt"><configuration></span>
<span class="nt"><properties></span>
<span class="nt"><property></span>
<span class="nt"><name></span>dependency.version<span class="nt"></name></span>
<span class="nt"><version></span>[1.2.0,1.2.0-9999]<span class="nt"></version></span>
<span class="nt"></property></span>
<span class="nt"></properties></span>
<span class="nt"></configuration></span>
<span class="nt"></plugin></span>
<span class="nt"></plugins></span>
<span class="nt"></build></span></pre></td></tr></tbody></table></code></pre></figure>
<p>When we execute the goal:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre>mvn versions:update-properties</pre></td></tr></tbody></table></code></pre></figure>
<p>The versions plugin updates <code class="highlighter-rouge">dependency.version</code> to the latest version allowed by the configured range restriction.</p>
<h1 id="conclusion">Conclusion</h1>
<p>We get continuous release for application and dependencies with continuous integration pressure.
Every artifact is releasable without the burden of a release process.
The same pipeline can be used to do CI and to generate a promotable artifact.</p>
Introduction to Splunk2015-06-02T14:58:46+00:00http://nithril.github.io/splunk/2015/06/02/introduction-a-splunk<p><img style="float: left;margin-right:20px;" src="/assets/2015-06-02-operational-intelligence-splunk/splunk.png" /></p>
<p>Splunk is an operational intelligence application. It extracts and indexes data and offers data mining features: extraction, exploitation, and visualization.</p>
<p>It is closed-source software with a paid business model tied to the volume of data indexed per day.</p>
<p>In this article we will install Splunk, configure the extraction of logs and metrics, use the logs in a simple search, display metrics
on a chart, and then create an associated alert. I will conclude by drawing a parallel with open source solutions.</p>
<!--more-->
<h1 id="splunk-en-quelques-mots">Splunk in a few words</h1>
<p>Splunk, in the words of its <a href="http://www.splunk.com/">marketing</a>:</p>
<blockquote>
<p>You see servers and devices, apps and logs, traffic and clouds. We see data—everywhere. Splunk® offers the leading platform for Operational Intelligence. It enables the curious to look closely at what others ignore—machine data—and find what others never see: insights that can help make your company more productive, profitable, competitive and secure. What can you do with Splunk? Just ask.</p>
</blockquote>
<p>Splunk’s licensing and pricing model is based on the daily log volume and <a href="http://www.splunk.com/en_us/products/pricing.html">on the editions/features</a>.
It comes in several editions, <a href="http://www.splunk.com/en_us/products/splunk-enterprise/free-vs-enterprise.html">Enterprise / Cloud</a> <a href="http://www.splunk.com/en_us/products/splunk-light/splunk-light-vs-splunk-enterprise.html">/ Light</a>,
including a free version that restricts the features and limits the log volume to 500 MB/day.
Splunk supports <a href="http://www.splunk.com/en_us/download/splunk-enterprise.html">Linux, Windows, Solaris, Mac OS, FreeBSD, AIX</a>.</p>
<p>Apart from the free but limited version, Splunk is a paid, closed-source product.
Its strength is that extraction, indexing, and exploitation are integrated in a single tool, whereas
an open source equivalent would require integrating more than one tool.</p>
<h1 id="scenario">Scenario</h1>
<p>The goal is to exploit the application logs and the metrics of a Java application:</p>
<ul>
<li>Displaying the logs</li>
<li>Displaying used heap memory and max heap memory on the same chart</li>
<li>Creating an alert on used memory</li>
</ul>
<p>Logs and metrics will be written to log files using logback. There will be two types of output files (and therefore two appenders).
The first stores the logs in a loosely structured text format meant to be read by humans. The second stores them as JSON,
to be read by machines. I will publish an article about logging that explains this choice.</p>
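<p>To make the two formats concrete, here is a minimal Java sketch that renders the same event both ways. It is not the logback configuration used by the application: the <code class="highlighter-rouge">LogFormats</code> class is hypothetical, the exact text layout is an assumption, and the JSON escaping is deliberately minimal. The field names are those used by the search queries later in this article.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">public class LogFormats {

    // Loosely structured text line, meant to be read by humans (assumed layout).
    static String human(String timestamp, String level, String thread, String logger, String message) {
        return timestamp + " " + level + " [" + thread + "] " + logger + " - " + message;
    }

    // Minimal JSON string escaping: backslash and double quote only.
    // Control characters are not handled in this sketch.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // One JSON object per line, meant to be read by machines.
    static String json(String timestamp, String level, String thread, String logger, String message) {
        return "{\"timestamp\":\"" + escape(timestamp)
                + "\",\"level\":\"" + escape(level)
                + "\",\"thread\":\"" + escape(thread)
                + "\",\"logger\":\"" + escape(logger)
                + "\",\"message\":\"" + escape(message) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(human("2015-06-02T14:58:46", "ERROR", "main", "org.nlab.App", "boom"));
        System.out.println(json("2015-06-02T14:58:46", "ERROR", "main", "org.nlab.App", "boom"));
    }
}</code></pre></figure>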
<p>The log files will be stored in <code class="highlighter-rouge">/var/log/myapp</code>. To keep this article to a reasonable size,
Splunk and the application will have access to the same log directory, so no forwarder is needed.</p>
<p>The sources for this article are available <a href="https://github.com/nithril/article-splunk-operational-intelligence">on GitHub</a>.</p>
<h1 id="installation-de-splunk">Installing Splunk</h1>
<h2 id="installation-du-package">Installing the package</h2>
<p>I use Docker to minimize the impact on my machine. Note that downloading the package requires being registered on their site.
The Dockerfile boils down to installing the Debian package:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"><table class="rouge-table"><tbody><tr><td class="code"><pre>FROM ubuntu:14.04
ADD package/splunk-6.2.3-264376-linux-2.6-amd64.deb /tmp/splunk-6.2.3-264376-linux-2.6-amd64.deb
RUN sudo dpkg -i /tmp/splunk-6.2.3-264376-linux-2.6-amd64.deb</pre></td></tr></tbody></table></code></pre></figure>
<p>Building the image:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nb">sudo </span>docker build <span class="nt">-t</span> nlab/splunk .</pre></td></tr></tbody></table></code></pre></figure>
<p>For finer control, I start the container with the bash command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nb">sudo </span>docker run <span class="nt">-t</span> <span class="nt">-p</span> 8000:8000 <span class="nt">-v</span> <span class="nv">$HOME</span>/splunk/var:/opt/splunk/var/lib/splunk <span class="nt">-v</span> <span class="nv">$HOME</span>/splunk/apps:/opt/splunk/etc/apps <span class="nt">-v</span> <span class="nv">$HOME</span>/splunk/log:/var/log/myapp <span class="nt">-i</span> nlab/splunk /bin/bash</pre></td></tr></tbody></table></code></pre></figure>
<p>Splunk listens on port 8000 by default. Its installation directory is <code class="highlighter-rouge">/opt/splunk/</code>. I map the following directories:</p>
<ul>
<li><code class="highlighter-rouge">$HOME/splunk/var => /opt/splunk/var/lib/splunk</code>: holds the data (indexes…)</li>
<li><code class="highlighter-rouge">$HOME/splunk/apps => /opt/splunk/etc/apps</code>: holds the Splunk applications</li>
<li><code class="highlighter-rouge">$HOME/splunk/log => /var/log/myapp</code>: holds the logs to index</li>
</ul>
<p>Then, inside the container, I start Splunk with this command:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="code"><pre>/opt/splunk/bin/splunk start <span class="nt">--accept-license</span> <span class="nt">--answer-yes</span></pre></td></tr></tbody></table></code></pre></figure>
<p>And that’s it: Splunk is running and reachable at <code class="highlighter-rouge">http://localhost:8000</code>.</p>
<h2 id="configuration">Configuration</h2>
<p>Splunk can be configured in several ways: command line, configuration files, REST interface, web interface.
Splunk’s configuration files use the <a href="http://en.wikipedia.org/wiki/INI_file">INI format</a>.</p>
<p>It is recommended to centralize additions in a Splunk application rather than directly modifying the configuration files under <code class="highlighter-rouge">$SPLUNK/etc/system</code>.
Creating an application is simple and can be done from the command line or through <a href="http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/BuildApp">the web interface</a>.</p>
<h3 id="application">Application</h3>
<p>We will create the <code class="highlighter-rouge">nlab</code> application through the web interface; it lives in <code class="highlighter-rouge">/opt/splunk/etc/apps/nlab/</code>.</p>
<p><img src="/assets/2015-06-02-operational-intelligence-splunk/create-app.png" alt="Splunk" /></p>
<h3 id="indexes">Indexes</h3>
<p>We create two indexes, <code class="highlighter-rouge">NLAB_LOGS</code> and <code class="highlighter-rouge">NLAB_METRICS</code>, which will store the logs and the metrics respectively.</p>
<p>The index configuration is stored in the <a href="http://docs.splunk.com/Documentation/Splunk/latest/Admin/Indexesconf"><code class="highlighter-rouge">indexes.conf</code></a> file,
located in the application directory <code class="highlighter-rouge">/opt/splunk/etc/apps/nlab/local</code>.
No frills here, we only set the bare minimum: the storage directories for the data according to its state.</p>
<figure class="highlight"><pre><code class="language-ini" data-lang="ini"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nn">[NLAB_LOGS]</span>
<span class="py">homePath</span> <span class="p">=</span> <span class="s">$SPLUNK_DB/nlab_logs/db</span>
<span class="py">coldPath</span> <span class="p">=</span> <span class="s">$SPLUNK_DB/nlab_logs/colddb</span>
<span class="py">thawedPath</span> <span class="p">=</span> <span class="s">$SPLUNK_DB/nlab_logs/thaweddb</span>
<span class="nn">[NLAB_METRICS]</span>
<span class="py">homePath</span> <span class="p">=</span> <span class="s">$SPLUNK_DB/nlab_metrics/db</span>
<span class="py">coldPath</span> <span class="p">=</span> <span class="s">$SPLUNK_DB/nlab_metrics/colddb</span>
<span class="py">thawedPath</span> <span class="p">=</span> <span class="s">$SPLUNK_DB/nlab_metrics/thaweddb</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The section name is the name of the index. The <code class="highlighter-rouge">homePath</code>, <code class="highlighter-rouge">coldPath</code>, and <code class="highlighter-rouge">thawedPath</code> properties define the storage directory of the data for each of its states.
For more information on staging and buckets, see <a href="http://docs.splunk.com/Documentation/Splunk/6.2.3/Indexer/HowSplunkstoresindexes">How the indexer stores indexes</a>.</p>
<h3 id="inputs">Inputs</h3>
<p>The application will generate two log files: <code class="highlighter-rouge">app.json.log</code> and <code class="highlighter-rouge">metrics.json.log</code>.</p>
<p>The input configuration is stored in the <a href="http://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf"><code class="highlighter-rouge">inputs.conf</code></a> file,
located in the application directory <code class="highlighter-rouge">/opt/splunk/etc/apps/nlab/local</code>.</p>
<p>The files to monitor are defined in the section name.</p>
<blockquote>
<p>[monitor://&lt;path&gt;]</p>
<ul>
<li>This directs Splunk to watch all files in &lt;path&gt;.</li>
<li>&lt;path&gt; can be an entire directory or just a single file.</li>
<li>You must specify the input type and then the path, so put three slashes in your path if you are starting
at the root (to include the slash that goes before the root directory).</li>
</ul>
</blockquote>
<p>Wildcards can be used in the path to monitor the logs of applications that follow the same conventions, or to group logs by type, eg. <code class="highlighter-rouge">/var/log/httpd/*_access</code>.</p>
<figure class="highlight"><pre><code class="language-ini" data-lang="ini"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nn">[monitor:///var/log/myapp/app.json.log]</span>
<span class="py">index</span><span class="p">=</span><span class="s">NLAB_LOGS</span>
<span class="py">sourcetype</span><span class="p">=</span><span class="s">NLAB_JSON</span>
<span class="nn">[monitor:///var/log/myapp/metrics.json.log]</span>
<span class="py">index</span><span class="p">=</span><span class="s">NLAB_METRICS</span>
<span class="py">sourcetype</span><span class="p">=</span><span class="s">NLAB_JSON</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The <code class="highlighter-rouge">index</code> property defines the destination index, and the <code class="highlighter-rouge">sourcetype</code> property characterizes the type of processing to apply.</p>
<blockquote>
<p>Primarily used to explicitly declare the source type for this data, as opposed
to allowing it to be determined via automated methods. This is typically
important both for searchability and for applying the relevant configuration for this
type of data during parsing and indexing.</p>
</blockquote>
<p>It is good practice to tell Splunk which log type / processing to apply rather than letting it infer it (even though it is fairly good at that game).</p>
<h3 id="processing-properties-props">Processing Properties (Props)</h3>
<p>Processing properties are used, among other things, to create <code class="highlighter-rouge">sourcetype</code> definitions.</p>
<p>The props configuration is stored in the <a href="http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf"><code class="highlighter-rouge">props.conf</code></a> file,
which is once again located in the application directory <code class="highlighter-rouge">/opt/splunk/etc/apps/nlab/local</code>.</p>
<figure class="highlight"><pre><code class="language-ini" data-lang="ini"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="nn">[NLAB_JSON]</span>
<span class="py">INDEXED_EXTRACTIONS</span><span class="p">=</span><span class="s">JSON</span>
<span class="py">KV_MODE</span><span class="p">=</span><span class="s">none</span>
<span class="py">AUTO_KV_JSON</span><span class="p">=</span><span class="s">false</span></pre></td></tr></tbody></table></code></pre></figure>
<ul>
<li><code class="highlighter-rouge">INDEXED_EXTRACTIONS: Tells Splunk the type of file and the extraction and/or parsing method Splunk should use on the file.</code></li>
<li><code class="highlighter-rouge">KV_MODE: Used for search-time field extractions only. Specifies the field/value extraction mode for the data.</code></li>
<li><code class="highlighter-rouge">AUTO_KV_JSON: Used for search-time field extractions only. Specifies whether to try json extraction automatically.</code></li>
</ul>
<p>Field extraction is therefore done at index time.</p>
<h1 id="exploitation">Exploitation</h1>
<p>We are now ready to exploit our logs.</p>
<h2 id="affichage-des-events">Displaying events</h2>
<p>We can now search events by index:</p>
<ul>
<li>Query: <code class="highlighter-rouge">index="nlab_logs"</code>
<img src="/assets/2015-06-02-operational-intelligence-splunk/events-logs.png" alt="Splunk" /></li>
<li>Query: <code class="highlighter-rouge">index="nlab_metrics"</code>
<img src="/assets/2015-06-02-operational-intelligence-splunk/events-metrics.png" alt="Splunk" /></li>
</ul>
<h2 id="affichage-des-logs">Displaying logs</h2>
<p>The default event display is raw. A dedicated table can be created that shows only the fields of interest. For this we use the <a href="http://docs.splunk.com/Documentation/Splunk/6.2.3/SearchReference/Table"><code class="highlighter-rouge">table</code></a> command,
which takes a list of fields: <code class="highlighter-rouge">index="nlab_logs" | table timestamp level thread logger message</code>:</p>
<p><img src="/assets/2015-06-02-operational-intelligence-splunk/events-logs-table.png" alt="Splunk" /></p>
<p>This display could be refined further by searching only for <code class="highlighter-rouge">ERROR</code>-level logs: <code class="highlighter-rouge">index="nlab_logs" level=ERROR | table timestamp level thread logger message</code>.</p>
<h2 id="graphing">Graphing</h2>
<p>The first chart we will display is the used memory.
To do so we can use the following query, <code class="highlighter-rouge">index="nlab_metrics" | timechart max("args.heap.HeapMemoryUsage.used")</code>,
which relies on the <a href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Timechart"><code class="highlighter-rouge">timechart</code></a> command.</p>
<p><img src="/assets/2015-06-02-operational-intelligence-splunk/vizualize_used.png" alt="Splunk" /></p>
<p>Logically, the query <code class="highlighter-rouge">index="nlab_metrics" | timechart max("args.heap.HeapMemoryUsage.max")</code> does the same for the max memory.</p>
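<p>The field names used in these queries suggest that each metric event is a nested JSON object. The following Java sketch shows how such a line could be produced from the JVM’s <code class="highlighter-rouge">MemoryMXBean</code>. The nesting <code class="highlighter-rouge">args &gt; heap &gt; HeapMemoryUsage</code> is an assumption inferred from the field paths, and the <code class="highlighter-rouge">HeapMetricLine</code> class is hypothetical; the application itself writes its metrics through logback and may do it differently.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapMetricLine {

    // Builds one JSON metric line whose flattened field paths are
    // args.heap.HeapMemoryUsage.used and args.heap.HeapMemoryUsage.max.
    static String toJson(long used, long max) {
        return "{\"args\":{\"heap\":{\"HeapMemoryUsage\":{\"used\":" + used + ",\"max\":" + max + "}}}}";
    }

    public static void main(String[] args) {
        // Current heap usage of the running JVM, in bytes.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(toJson(heap.getUsed(), heap.getMax()));
    }
}</code></pre></figure>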
<p>Things get trickier when it comes to displaying <a href="http://docs.splunk.com/Documentation/Splunk/6.2.3/Search/Chartmultipledataseries">both series on the same chart</a>:</p>
<blockquote>
<p>Splunk Enterprise transforming commands do not support a direct way to define multiple data series in your charts (or timecharts). However, you CAN achieve this using a combination of the stats and xyseries commands.</p>
</blockquote>
<p>Translated to our data, this gives the following query:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"><table class="rouge-table"><tbody><tr><td class="code"><pre>index=nlab_metrics | stats max("args.heap.HeapMemoryUsage.used") as memoryUsed, max("args.heap.HeapMemoryUsage.max") as memoryMax by _time,source
| eval s1="args.heap.HeapMemoryUsage.used args.heap.HeapMemoryUsage.max" | makemv s1 | mvexpand s1
| eval yval=case(s1=="args.heap.HeapMemoryUsage.used",memoryUsed,s1=="args.heap.HeapMemoryUsage.max",memoryMax)
| eval series=source+":"+s1 | xyseries _time,series,yval</pre></td></tr></tbody></table></code></pre></figure>
<p>Which produces the following chart:</p>
<p><img src="/assets/2015-06-02-operational-intelligence-splunk/visualize-combined.png" alt="Splunk" /></p>
<p>The query is complicated for a seemingly trivial need. Combining graphs with Graphite comes down to <a href="http://graphite.readthedocs.org/en/latest/functions.html">defining a list of functions separated by an ampersand</a>.
That would give the following (HTTP) query: <code class="highlighter-rouge">alias(args.heap.HeapMemoryUsage.used, 'Used')&alias(args.heap.HeapMemoryUsage.max, 'Max')</code>.
Simple and very effective.</p>
<p>On the other hand, Splunk can combine charts according to a grouping criterion. For example, the following query displays the free memory by host:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"><table class="rouge-table"><tbody><tr><td class="code"><pre>index=nlab_metrics "args.heap.HeapMemoryUsage.used"="*" earliest=-60s| eval free=('args.heap.HeapMemoryUsage.max' - 'args.heap.HeapMemoryUsage.used')
| timechart avg(free) by host</pre></td></tr></tbody></table></code></pre></figure>
<p>Even though in this case we only have one host:</p>
<p><img src="/assets/2015-06-02-operational-intelligence-splunk/visualize-group-by.png" alt="Splunk" /></p>
<h2 id="alerting">Alerting</h2>
<p>An alert is created by defining a search query and then creating an alert from it.</p>
<p>The alert we define is on the remaining JVM memory:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text"><table class="rouge-table"><tbody><tr><td class="code"><pre>index=nlab_metrics "args.heap.HeapMemoryUsage.used"="*" earliest=-60s| eval free=('args.heap.HeapMemoryUsage.max' - 'args.heap.HeapMemoryUsage.used')
| eval threshold=free - 'args.heap.HeapMemoryUsage.max' * 0.15 | search threshold < 0</pre></td></tr></tbody></table></code></pre></figure>
<ul>
<li><code class="highlighter-rouge">"args.heap.HeapMemoryUsage.used"="*"</code>: returns all events in which this field has a value</li>
<li><code class="highlighter-rouge">earliest=-60s</code>: between now and 60 seconds in the past</li>
<li><code class="highlighter-rouge">eval free=('args.heap.HeapMemoryUsage.max' - 'args.heap.HeapMemoryUsage.used')</code>: defines the <code class="highlighter-rouge">free</code> field as the free memory</li>
<li><code class="highlighter-rouge">eval threshold=free - 'args.heap.HeapMemoryUsage.max' * 0.15</code>: defines the <code class="highlighter-rouge">threshold</code> field as the free memory minus the alert threshold, set to 15% of the max memory</li>
<li><code class="highlighter-rouge">search threshold < 0</code>: keeps only the results whose <code class="highlighter-rouge">threshold</code> is negative</li>
</ul>
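<p>To make the arithmetic of the query explicit, here is a minimal Python sketch of the same three stages (<code class="highlighter-rouge">eval free</code>, <code class="highlighter-rouge">eval threshold</code>, <code class="highlighter-rouge">search threshold < 0</code>). The function names, the host names and the sample values are assumptions made for illustration; only the formula comes from the query above.</p>

```python
# Sketch of the alert logic; sample events and names are hypothetical.

def free_memory(heap_max, heap_used):
    """Mirrors: eval free = max - used."""
    return heap_max - heap_used


def threshold(heap_max, heap_used, ratio=0.15):
    """Mirrors: eval threshold = free - max * 0.15."""
    return free_memory(heap_max, heap_used) - heap_max * ratio


def should_alert(heap_max, heap_used, ratio=0.15):
    """Mirrors: search threshold < 0."""
    return threshold(heap_max, heap_used, ratio) < 0


# Hypothetical events, one per host.
events = [
    {"host": "host-a", "max": 1000, "used": 700},  # 30% free: no alert
    {"host": "host-b", "max": 1000, "used": 900},  # 10% free: alert
]

alerts = [e["host"] for e in events if should_alert(e["max"], e["used"])]
print(alerts)  # -> ['host-b']
```

<p>The alert fires as soon as the free memory drops below 15% of the max heap, which is what a negative <code class="highlighter-rouge">threshold</code> expresses.</p>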
<p>We save it as an alert:</p>
<p><img src="/assets/2015-06-02-operational-intelligence-splunk/alert-create-1.png" alt="Splunk" /></p>
<p><img src="/assets/2015-06-02-operational-intelligence-splunk/alert-create-2.png" alt="Splunk" /></p>
<p>Because the alerting system is built on the index, we could also have created an alert on the logs index, tied to the ERROR log level, with a query such as
<code class="highlighter-rouge">index=nlab_logs level=ERROR earliest=-60s</code></p>
<h1 id="conclusion">Conclusion</h1>
<p>In this article we have only skimmed over part of what Splunk offers:</p>
<ul>
<li>Splunk can be <a href="http://docs.splunk.com/Documentation/Splunk/6.2.3/Indexer/Aboutclusters">clustered and replicated</a> following different topologies
(e.g. indexer nodes, searcher nodes).</li>
<li>Log extraction on a remote server can, and should, be done with a <a href="http://www.splunk.com/en_us/download/universal-forwarder.html"><code class="highlighter-rouge">Splunk Universal Forwarder</code></a>,
which is configured with the same mechanisms as the ones we have seen.</li>
<li>In its Enterprise edition, Splunk can manage a set of forwarders and their configurations, which are therefore centralized and pushed from the server to the forwarders.
A configuration is associated with forwarders through a classifier system (by host, IP…)</li>
<li>Splunk has an ecosystem <a href="https://splunkbase.splunk.com/">of apps and add-ons</a>, notably the <a href="https://splunkbase.splunk.com/app/273/">Splunk App for Unix and Linux</a></li>
<li>It supports dashboard creation and also provides an SDK</li>
<li>Log analysis can be taken further with a set of <a href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/WhatsInThisManual">commands and functions</a></li>
<li>[…]</li>
</ul>
<p>What about the existing open source tools?</p>
<h2 id="existant-open-source">Existing Open Source Tools</h2>
<p>The presentation <a href="http://fr.slideshare.net/cyrille.leclerc/open-source-monitoring-for-java-with-graphite">Monitoring Open Source pour Java avec JmxTrans, Graphite et Nagios</a>
is a good starting point that covers the open source tools required for our needs.</p>
<h3 id="metrics--graphing">Metrics / Graphing</h3>
<ul>
<li><a href="http://www.jmxtrans.org/">Jmxtrans</a>: extracts the metrics exposed through JMX</li>
<li><a href="http://graphite.wikidot.com/">Graphite</a>: storage and processing of metrics (computations…), rendering (image, text…). Graphite is made of three Python applications: carbon (listener), graphite (UI) and whisper (RRD-like storage)</li>
<li><a href="http://grafana.org/">Grafana</a>: builds dashboards on top of Graphite metrics (among others); this project is a gem.</li>
</ul>
<h3 id="logs">Logs</h3>
<ul>
<li><a href="https://www.elastic.co/products/logstash">Logstash</a>, <a href="https://flume.apache.org/">Flume</a>: log collection</li>
<li><a href="https://www.elastic.co/products/elasticsearch">Elasticsearch</a>: storage, querying</li>
<li><a href="https://www.elastic.co/products/kibana">Kibana</a>: log analysis, visualization and extraction.</li>
</ul>
<h3 id="alerting-1">Alerting</h3>
<ul>
<li><a href="https://github.com/scobal/seyren">Seyren</a>: an alerting application that plugs into Graphite. It supports a fair number of notification channels. It requires MongoDB.</li>
</ul>
<hr />
<p>The list is not exhaustive and there are of course variations: <a href="http://influxdb.com/">InfluxDB</a> instead of Graphite, Nagios instead of Seyren…
Put end to end, the feature set of the projects listed above is <strong>substantial</strong>. However, the number of applications involved in the chain is large,
and making each of these components highly available could be a topic of its own
(e.g. for Graphite, <a href="https://grey-boundary.io/the-architecture-of-clustering-graphite/">The architecture of clustering Graphite</a>).</p>
<p>Having set up this open source stack myself, <em>puppetizing</em> it along the way, I can say that it was not necessarily quick or simple (and that is without even considering
high availability). The result was satisfying, especially for monitoring. It was hard, though, to justify installing MongoDB for the sole purpose of storing alerts.</p>
<p>The impression remained that the operational dashboard was made of three applications (Grafana, Kibana, Seyren) that were not integrated.</p>
<h2 id="pour-conclure">Wrapping up</h2>
<p><strong>In my opinion the major point is how integrated and homogeneous the solution is</strong>. This comes at a cost, of course.</p>
<p>Splunk did its job with a minimum of installation and handling, and the same principles apply from the server to the forwarder.</p>
<p>Log analysis is satisfying at first sight. It remains to be validated that this way of accessing logs comes close to browsing them with <code class="highlighter-rouge">vi</code>.</p>
<p>My reservation for now is about graphing which, <a href="http://docs.splunk.com/Documentation/Splunk/6.2.3/Viz/Visualizationreference#Charts">rich as it is</a>, does not seem to me
to match the possibilities and the ease of use offered by Graphite.</p>
Jenkins Workflow - Release pipeline2015-05-14T14:58:46+00:00http://nithril.github.io/ci/2015/05/14/jenkins-workflow-release-part2<p><img style="float: left;margin-right:20px;" src="/assets/2015-04-22-jenkins-job-dsl-pipeline-part1/jenkins.png" />
Your project is ready to be released. This traditionally involves a set of steps that should be orchestrated
with Jenkins, reusing the <code class="highlighter-rouge">compilation -> test -> package</code> pipeline where possible while providing a satisfying level of automation.</p>
<p>Let's see how the <code class="highlighter-rouge">Workflow</code> plugin can solve this problem.</p>
<!--more-->
<h2 id="introduction">Introduction</h2>
<p>Your project is ready to be released. This traditionally involves:</p>
<ul>
<li>A user input giving the next version</li>
<li>Removing the <code class="highlighter-rouge">SNAPSHOT</code> qualifiers and checking that all dependencies are non-<code class="highlighter-rouge">SNAPSHOT</code> versions</li>
<li>Running the <code class="highlighter-rouge">compilation -> test -> package</code> pipeline</li>
<li>When everything goes well, moving to the next version</li>
<li>In case of error, rolling back to the current version</li>
</ul>
<p>That is: <code class="highlighter-rouge">prepare release -> compilation -> test -> package -> (next iteration | rollback)</code></p>
<p>These steps can be done <em>simply</em> with <a href="http://maven.apache.org/maven-release/maven-release-plugin/">the Maven release plugin</a>.
<strong>The problem is that it is a special case that completely bypasses our <code class="highlighter-rouge">compilation -> test -> package</code> pipeline</strong>, along with all the specifics that pipeline may contain.</p>
<blockquote>
<p>The difficulty with Jenkins is to add the two steps mentioned above (preparing the release, then moving to the next iteration or rolling back) at the head and at the tail of our usual pipeline.
With an upstream/downstream system based on post-build triggers, Jenkins cannot wait for the end of a pipeline (that is, one with a depth > 1).</p>
</blockquote>
<p>This has to go through one of the following:</p>
<ul>
<li>A freestyle job with a blocking build step of type “Trigger/call builds on other projects”</li>
<li>A <a href="https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin">MultiJob</a> job</li>
<li>A <a href="https://wiki.jenkins-ci.org/display/JENKINS/Build+Flow+Plugin">Build flow plugin</a> job</li>
<li>A <a href="https://github.com/jenkinsci/workflow-plugin">Workflow plugin</a> job, which looks set to eventually replace the Build flow plugin</li>
<li>Something else?</li>
</ul>
<p>For this article I will use the last one: <a href="https://github.com/jenkinsci/workflow-plugin">Workflow plugin</a>.</p>
<h2 id="workflow-plugin">Workflow plugin</h2>
<p>The Workflow plugin adds a new job type named <code class="highlighter-rouge">Workflow</code>.</p>
<p><img src="/assets/2015-05-14-jenkins-workflow-release-part2/newjob-workflow.png" alt="JobDsl" /></p>
<p>A <code class="highlighter-rouge">Freestyle</code> job describes the steps that make up a job through a user interface.
The UI approach has the advantage of being visual and of guiding the user, but the drawback of a certain rigidity.
A Workflow job describes the steps that make up a job through a Groovy DSL. We therefore find the steps of a classic job in the form of a DSL,
augmented with the power of a programming language (variables, conditions, loops…). The approach is thus more flexible, but more technical.
As with a DSL job, the script can be stored in the job itself or in an SCM.</p>
<p><img src="/assets/2015-05-14-jenkins-workflow-release-part2/newjob-workflow-script.png" alt="JobDsl" /></p>
<p>The documentation is sparse, if not nonexistent; fortunately the UI provides a snippet generator:</p>
<p><img src="/assets/2015-05-14-jenkins-workflow-release-part2/newjob-workflow-generator.png" alt="JobDsl" /></p>
<p>For more information on the big picture of this plugin, I invite you to look at the CloudBees presentation
<a href="http://www.slideshare.net/cloudbees/jenkins-workflow-webinar-dec-10-2014">Jenkins Workflow Webinar - Dec 10, 2014</a></p>
<h2 id="pipeline-de-release">Release pipeline</h2>
<p>The release pipeline is built around 5 steps: <code class="highlighter-rouge">prepare release -> compilation -> test -> package -> (next iteration | rollback)</code>.
This pipeline takes a <code class="highlighter-rouge">NEXT_VERSION</code> parameter, the version of the next iteration, which is entered by the user.</p>
<h3 id="step-1--prepare-release">Step 1 : Prepare release</h3>
<p>This step clones the project, removes the SNAPSHOT qualifier from <strong>all versions</strong>, dependencies included, then commits the changes.
To do so I use a shell script. Why not use the Maven Versions plugin?</p>
<p>That plugin can neither remove the SNAPSHOT qualifier from the project version, nor remove it in a multi-module project that defines
a version through a property declared in the root POM.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="code"><pre><span class="n">sh</span> <span class="s1">'rm -Rf * .git'</span>
<span class="n">git</span> <span class="nl">url:</span><span class="n">REPOSITORY</span>
<span class="n">sh</span> <span class="s1">'git checkout master'</span>
<span class="n">sh</span> <span class="s1">'find . -name "pom.xml" | xargs -I file sed -i.bak file -e "s/-SNAPSHOT//"'</span>
<span class="n">sh</span> <span class="s1">'git commit --allow-empty -am "Release"'</span>
<span class="n">sh</span> <span class="s1">'git push'</span></pre></td></tr></tbody></table></code></pre></figure>
<p>This snippet uses two DSL commands, <code class="highlighter-rouge">sh</code> and <code class="highlighter-rouge">git</code>, whose names are explicit enough to need no explanation.</p>
<p>Line 1 is the equivalent of a workspace cleanup. Lines 2 and 3 are unusual. Why not simply clone the repository with a git clone?
Jenkins creates <code class="highlighter-rouge">dot</code> directories in the current workspace, and git does not like cloning into a non-empty directory.
The DSL <code class="highlighter-rouge">git</code> command, on the other hand, checks out the master reference in detached mode, hence the explicit <code class="highlighter-rouge">git checkout master</code> on line 3.
Anyway…</p>
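<p>To illustrate what the <code class="highlighter-rouge">sed</code> substitution in this step does to a POM, here is a small Python sketch. It is not part of the Jenkins job: the function name and the sample POM are assumptions, and the sketch only mimics the behaviour of <code class="highlighter-rouge">s/-SNAPSHOT//</code>, which removes the first occurrence on each line.</p>

```python
# Sketch of the effect of: sed -e "s/-SNAPSHOT//" on a pom.xml.
# The sample POM below is hypothetical.

def strip_snapshot(pom_text):
    """Remove the first '-SNAPSHOT' of each line, as sed's s/-SNAPSHOT// does."""
    lines = pom_text.split("\n")
    return "\n".join(line.replace("-SNAPSHOT", "", 1) for line in lines)


sample = """<project>
  <version>1.0.0-SNAPSHOT</version>
  <properties>
    <lib.version>2.3-SNAPSHOT</lib.version>
  </properties>
</project>"""

released = strip_snapshot(sample)
print(released)
```

<p>Because the substitution is purely textual, it also reaches versions held in properties, which is the case the step is meant to cover. The flip side is that it rewrites every line containing <code class="highlighter-rouge">-SNAPSHOT</code>, whatever its meaning.</p>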
<h3 id="step-234--compilation---test---package">Step 2,3,4 : compilation -> test -> package</h3>
<p>The jobs built in part 1 of this series use upstream/downstream relationships based on post-build triggers.
As stated in the introduction, this relationship must be dismantled in favor of one defined in the release job. Apart from this difference,
the jobs remain unchanged.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="code"><pre><span class="n">build</span> <span class="s1">'Project 1 - Compile'</span>
<span class="n">build</span> <span class="s1">'Project 1 - Test'</span>
<span class="n">build</span> <span class="s1">'Project 1 - Package'</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The <code class="highlighter-rouge">build</code> DSL command triggers a build. It accepts several parameters such as <code class="highlighter-rouge">Wait for completion</code> and <code class="highlighter-rouge">Propagate errors</code>, as well as the parameters of the job.
For example, and to my knowledge this is the only way to do it:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
</pre></td><td class="code"><pre><span class="n">build</span> <span class="nl">job:</span><span class="s1">'Foo'</span><span class="o">,</span> <span class="nl">parameters:</span> <span class="o">[[</span><span class="n">$class</span><span class="o">:</span> <span class="s1">'StringParameterValue'</span><span class="o">,</span> <span class="nl">name:</span> <span class="s1">'FOO'</span><span class="o">,</span> <span class="nl">value:</span> <span class="s1">'BAR'</span><span class="o">]]</span></pre></td></tr></tbody></table></code></pre></figure>
<p>Being able to pass <code class="highlighter-rouge">[['FOO' : 'BAR']]</code> would be more elegant.</p>
<p>The next step, <code class="highlighter-rouge">(next iteration | rollback)</code>, depends on the result of the present step. If the three jobs pass, the project is released; if one of the three
fails, the project is rolled back to its current version.</p>
<p>This can be expressed simply with a <code class="highlighter-rouge">try-catch</code> block, because a failed build throws an exception:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="code"><pre><span class="kt">def</span> <span class="n">success</span> <span class="o">=</span> <span class="kc">true</span>
<span class="k">try</span> <span class="o">{</span>
<span class="n">build</span> <span class="s1">'Project 1 - Compile'</span>
<span class="n">build</span> <span class="s1">'Project 1 - Test'</span>
<span class="n">build</span> <span class="s1">'Project 1 - Package'</span>
<span class="o">}</span> <span class="k">catch</span><span class="o">(</span><span class="n">e</span><span class="o">){</span>
<span class="n">success</span> <span class="o">=</span> <span class="kc">false</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<p>There are different ways to proceed, for example by using the return value of <code class="highlighter-rouge">build</code>, which is of type
<a href="https://github.com/jenkinsci/workflow-plugin/blob/master/support/src/main/java/org/jenkinsci/plugins/workflow/support/steps/build/RunWrapper.java"><code class="highlighter-rouge">RunWrapper</code></a>:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
</pre></td><td class="code"><pre> <span class="kt">def</span> <span class="n">compileBuild</span> <span class="o">=</span> <span class="n">build</span> <span class="nl">job:</span> <span class="s1">'Project 1 - Compile'</span><span class="o">,</span> <span class="nl">propagate:</span> <span class="kc">false</span>
<span class="kt">def</span> <span class="n">success</span> <span class="o">=</span> <span class="s1">'SUCCESS'</span> <span class="o">==</span> <span class="n">compileBuild</span><span class="o">.</span><span class="na">result</span> </pre></td></tr></tbody></table></code></pre></figure>
<p>Setting <code class="highlighter-rouge">propagate</code> to false is required to prevent an exception from being thrown on failure.</p>
<p>Note also the <code class="highlighter-rouge">catchError</code> command, which has a broader scope covering the whole block being executed; see <a href="https://github.com/jenkinsci/workflow-plugin/blob/master/basic-steps/src/main/resources/org/jenkinsci/plugins/workflow/steps/CatchErrorStep/help.html">the snippet generator help</a></p>
<h3 id="job--next-iteration">Job : Next Iteration</h3>
<p>This step changes the version to the one entered by the user. To do so I simply use
the Maven Versions plugin:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="code"><pre><span class="kt">def</span> <span class="n">mvnHome</span> <span class="o">=</span> <span class="n">tool</span> <span class="s1">'Maven 3.2.2'</span>
<span class="n">sh</span> <span class="s2">"${mvnHome}/bin/mvn versions:set -DnewVersion=${NEXT_VERSION} -DgenerateBackupPoms=false"</span>
<span class="n">sh</span> <span class="s1">'git commit --allow-empty -am "Next Version"'</span>
<span class="n">sh</span> <span class="s1">'git push'</span></pre></td></tr></tbody></table></code></pre></figure>
<p>Line 1 declares and uses a tool (here Maven) previously defined in the Jenkins settings</p>
<p><img src="/assets/2015-05-14-jenkins-workflow-release-part2/settings-maven.png" alt="JobDsl" /></p>
<h3 id="job--rollback">Job : Rollback</h3>
<p>This step rolls the version back to the initial one.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="code"><pre><span class="kt">def</span> <span class="n">mvnHome</span> <span class="o">=</span> <span class="n">tool</span> <span class="s1">'Maven 3.2.2'</span>
<span class="n">sh</span> <span class="s2">"${mvnHome}/bin/mvn versions:set -DnewVersion=${currentVersion} -DgenerateBackupPoms=false"</span>
<span class="n">sh</span> <span class="s1">'git commit --allow-empty -am "Rollback"'</span>
<span class="n">sh</span> <span class="s1">'git push'</span></pre></td></tr></tbody></table></code></pre></figure>
<h3 id="orchestrateur">Orchestrateur</h3>
<p>The 5 steps are brought together by a <code class="highlighter-rouge">Workflow</code> job that takes care of the orchestration.
This job has one parameter to be entered by the user: <code class="highlighter-rouge">NEXT_VERSION</code>, the version of the next iteration.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
</pre></td><td class="code"><pre><span class="n">node</span> <span class="o">{</span>
<span class="c1">//workspace cleanup</span>
<span class="n">sh</span> <span class="s1">'rm -Rf * .git'</span>
<span class="c1">//checkout the repository</span>
<span class="n">git</span> <span class="nl">url:</span> <span class="s1">'https://github.com/nithril/jenkins-jobdsl-project1.git'</span>
<span class="n">sh</span> <span class="s1">'git checkout master'</span>
<span class="c1">//extract the current version</span>
<span class="kt">def</span> <span class="n">pom</span> <span class="o">=</span> <span class="n">readFile</span> <span class="s1">'pom.xml'</span>
<span class="kt">def</span> <span class="n">currentVersion</span> <span class="o">=</span> <span class="k">new</span> <span class="n">XmlParser</span><span class="o">().</span><span class="na">parseText</span><span class="o">(</span><span class="n">pom</span><span class="o">).</span><span class="na">version</span><span class="o">.</span><span class="na">text</span><span class="o">()</span>
<span class="c1">//remove the snapshot qualifier</span>
<span class="n">sh</span> <span class="s1">'find . -name "pom.xml" | xargs -I file sed -i.bak file -e "s/-SNAPSHOT//"'</span>
<span class="c1">//push the change</span>
<span class="n">commitAndPush</span><span class="o">(</span><span class="s2">"Release ${currentVersion}"</span><span class="o">)</span>
<span class="kt">def</span> <span class="n">success</span> <span class="o">=</span> <span class="kc">true</span>
<span class="c1">//compile -> test -> package</span>
<span class="k">try</span> <span class="o">{</span>
<span class="n">build</span> <span class="s1">'Project 1 - Compile'</span>
<span class="n">build</span> <span class="s1">'Project 1 - Test'</span>
<span class="n">build</span> <span class="s1">'Project 1 - Package'</span>
<span class="o">}</span>
<span class="k">catch</span> <span class="o">(</span><span class="n">e</span><span class="o">)</span> <span class="o">{</span>
<span class="n">success</span> <span class="o">=</span> <span class="kc">false</span>
<span class="n">echo</span> <span class="s2">"Error during the compile -> test -> package : ${e}"</span>
<span class="n">echo</span> <span class="s1">'The release will be rollbacked'</span>
<span class="o">}</span>
<span class="k">if</span> <span class="o">(</span><span class="n">success</span><span class="o">)</span> <span class="o">{</span>
<span class="n">mavenSetVersion</span><span class="o">(</span><span class="n">NEXT_VERSION</span><span class="o">)</span>
<span class="n">commitAndPush</span><span class="o">(</span><span class="s2">"Next Version ${NEXT_VERSION}"</span><span class="o">)</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="n">mavenSetVersion</span><span class="o">(</span><span class="n">currentVersion</span><span class="o">)</span>
<span class="n">commitAndPush</span><span class="o">(</span><span class="s2">"Rollback to ${currentVersion}"</span><span class="o">)</span>
<span class="n">error</span> <span class="s1">'Error during the compile -> test -> package'</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kt">def</span> <span class="nf">commitAndPush</span><span class="o">(</span><span class="n">message</span><span class="o">)</span> <span class="o">{</span>
<span class="n">sh</span> <span class="s2">"git commit --allow-empty -am \"${message}\""</span>
<span class="n">sh</span> <span class="s1">'git push'</span>
<span class="o">}</span>
<span class="kt">def</span> <span class="nf">mavenSetVersion</span><span class="o">(</span><span class="n">newVersion</span><span class="o">)</span> <span class="o">{</span>
<span class="kt">def</span> <span class="n">mvnHome</span> <span class="o">=</span> <span class="n">tool</span> <span class="s1">'Maven 3.2.2'</span>
<span class="n">sh</span> <span class="s2">"${mvnHome}/bin/mvn versions:set -DnewVersion=${newVersion} -DgenerateBackupPoms=false"</span>
<span class="o">}</span></pre></td></tr></tbody></table></code></pre></figure>
<p>The <code class="highlighter-rouge">node</code> command allocates an <code class="highlighter-rouge">executor</code> and a <code class="highlighter-rouge">workspace</code> on a Jenkins node.
The current version is extracted from the <code class="highlighter-rouge">POM</code> through a Groovy XML parse of the result of the <code class="highlighter-rouge">readFile</code> command (the two lines following the <code class="highlighter-rouge">extract the current version</code> comment).
The snippets developed in the steps above then follow. For the needs of this job I created two functions, <code class="highlighter-rouge">commitAndPush</code> and <code class="highlighter-rouge">mavenSetVersion</code>.</p>
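<p>For readers who want to try the version extraction outside Jenkins, here is a rough Python equivalent of the <code class="highlighter-rouge">readFile</code> plus XML parsing pair. It is a sketch under assumptions: the function name and the sample POM are made up, and it relies on Python's standard <code class="highlighter-rouge">xml.etree.ElementTree</code> rather than on Groovy's <code class="highlighter-rouge">XmlParser</code>.</p>

```python
import xml.etree.ElementTree as ET

# Namespace declared by standard Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def current_version(pom_text):
    """Return the text of the top-level <version> element, or None if absent."""
    root = ET.fromstring(pom_text)
    # Only the direct child of <project> is looked up, so the versions
    # declared inside <dependencies> are ignored.
    node = root.find("m:version", POM_NS)
    if node is None:
        node = root.find("version")  # POM written without a namespace
    if node is None or not node.text:
        return None
    return node.text.strip()


# Hypothetical POM used for the example.
sample_pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>project1</artifactId>
  <version>1.2.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
  </dependencies>
</project>"""

print(current_version(sample_pom))  # -> 1.2.0-SNAPSHOT
```

<p>In the workflow script, this value is captured before the SNAPSHOT qualifier is removed, which is what makes the rollback to the initial version possible.</p>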
<h2 id="résultat">Result</h2>
<p>Running the job produces <a href="https://gist.github.com/nithril/c9bec727e22a48cc3464">the following console output</a>.</p>
<p>The <code class="highlighter-rouge">Running Steps</code> menu shows the steps that were executed.</p>
<p><img src="/assets/2015-05-14-jenkins-workflow-release-part2/orchestrator-job-steps.png" alt="JobDsl" /></p>
<p>Clicking the console icon associated with a step displays the logs of that step. One small drawback: the output of a <code class="highlighter-rouge">build</code> step boils down
to <code class="highlighter-rouge">Starting building project: Project 1 - Compile</code> rather than the output of the underlying job.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This plugin made me rethink my classic view of Jenkins pipelines built on non-blocking upstream/downstream relationships.
The code is relatively concise and, above all, kept in one place and self-sufficient: the whole workflow can be understood without navigating through downstream relationships or
relying on plugins to implement conditions.</p>
<p>The lack of documentation makes the design work tedious. The plugin is still young, and compatibility with existing plugins is not automatic,
<a href="https://github.com/jenkinsci/workflow-plugin/blob/master/COMPATIBILITY.md">but it is improving</a></p>
<blockquote>
<p>For architectural reasons, plugins providing various extensions of interest to builds cannot be made automatically compatible with Workflow. Typically they require use of some newer APIs, large or small.</p>
</blockquote>
<p>The overall visualization could be improved. The <code class="highlighter-rouge">Running Steps</code> view could complement a higher-level view in which the user would be able
to define high-level steps, in the manner of the <code class="highlighter-rouge">prepare release -> compilation -> test -> package -> (next iteration | rollback)</code> pipeline and of the Build Pipeline plugin.</p>
<p>In part 3, I will revisit part 1 and part 2 following this new paradigm to generate the nominal pipeline
<code class="highlighter-rouge">compilation -> test -> package</code> and the release pipeline <code class="highlighter-rouge">prepare release -> compilation -> test -> package -> (next iteration | rollback)</code></p>
Jenkins Job DSL - Automated pipeline creation2015-04-22T21:45:46+00:00http://nithril.github.io/ci/2015/04/22/jenkins-job-dsl-pipeline-part1<p><img style="float: left;margin-right:20px;" src="/assets/2015-04-22-jenkins-job-dsl-pipeline-part1/jenkins.png" />
Jenkins can quickly turn into a job factory. Without necessarily being Netflix with its 1001 projects, applying certain paradigms can
significantly increase the number of jobs.</p>
<p>Let's see how the <code class="highlighter-rouge">Job DSL</code> plugin can solve this problem.</p>
<!--more-->
<h2 id="introduction">Introduction</h2>
<p>Applying certain paradigms can significantly increase the number of jobs:</p>
<ul>
<li>Application modularization</li>
<li>Reuse of technical building blocks</li>
<li>The utilities one may end up developing</li>
<li>Splitting a project's pipeline into Jenkins jobs matching the continuous integration and continuous deployment stages: we start from <code class="highlighter-rouge">compilation</code>, <code class="highlighter-rouge">test</code> and <code class="highlighter-rouge">deployment</code>
and end up with projects each associated with N jobs.</li>
</ul>
<p>Following the last point, 10 projects will generate at least 30 jobs. Enjoy.</p>
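<p>The multiplication is easy to picture with a short Python sketch that crosses a list of projects with the three stages named above. The project names and the <code class="highlighter-rouge">project - stage</code> naming scheme are assumptions chosen for the example.</p>

```python
from itertools import product

# The three stages named in the list above.
JOB_TYPES = ["compilation", "test", "deployment"]


def job_names(projects, job_types=JOB_TYPES):
    """One job per (project, stage) pair; the naming scheme is hypothetical."""
    return [f"{project} - {job_type}" for project, job_type in product(projects, job_types)]


# Ten hypothetical projects: project1 ... project10.
projects = [f"project{i}" for i in range(1, 11)]
jobs = job_names(projects)

print(len(jobs))  # -> 30
```

<p>Each extra stage adds one job per project, so the number of jobs to create and maintain by hand grows with both lists.</p>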
<h2 id="objectif">Goal</h2>
<p>Faced with this situation, the idea is to maintain a <strong>controlled</strong> pool of job types and not to multiply projects with specific configurations.</p>
<p>The problem of creating and maintaining the jobs still remains.
Even if creating a job <em>from</em> an existing one stays simple (although it feeds the copy/paste syndrome), configuring everything as a pipeline remains tedious.
Then comes the maintenance of these jobs, where a change to a job type affects every job of that type.
Not to mention adding a new job type that slots in between two existing types.</p>
<p>There are several tools that provide solutions at different levels:</p>
<ul>
<li><a href="https://wiki.jenkins-ci.org/display/JENKINS/Template+Project+Plugin">Template project plugin</a> lets you share the builders of a job</li>
<li><a href="https://www.cloudbees.com/products/jenkins-enterprise/plugins/templates-plugin">Templates Plugin</a> lets you define reusable builder, job, folder and
auxiliary templates. It is available in the CloudBees Enterprise edition</li>
<li><a href="https://wiki.jenkins-ci.org/display/JENKINS/Job+DSL+Plugin">Job DSL Plugin</a></li>
<li>And surely other plugins</li>
</ul>
<h2 id="dont-celui-qui-nous-interesse-le-job-dsl-plugin">The one we are interested in: the <code class="highlighter-rouge">Job DSL Plugin</code></h2>
<blockquote>
<p>The job-dsl-plugin allows the programmatic creation of projects using a DSL. Pushing job creation into a script allows you to automate and standardize
your Jenkins installation, unlike anything possible before.</p>
</blockquote>
<p>All of it is expressed in the <a href="http://www.groovy-lang.org/">Groovy</a> language, which offers
<a href="http://docs.groovy-lang.org/docs/latest/html/documentation/core-domain-specific-languages.html">advanced features for building DSLs</a>.</p>
<p>For example, the script below (taken from the plugin page)
generates one job per branch of the GitHub project:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="kt">def</span> <span class="n">project</span> <span class="o">=</span> <span class="s1">'quidryan/aws-sdk-test'</span>
<span class="kt">def</span> <span class="n">branchApi</span> <span class="o">=</span> <span class="k">new</span> <span class="n">URL</span><span class="o">(</span><span class="s2">"https://api.github.com/repos/${project}/branches"</span><span class="o">)</span>
<span class="kt">def</span> <span class="n">branches</span> <span class="o">=</span> <span class="k">new</span> <span class="n">groovy</span><span class="o">.</span><span class="na">json</span><span class="o">.</span><span class="na">JsonSlurper</span><span class="o">().</span><span class="na">parse</span><span class="o">(</span><span class="n">branchApi</span><span class="o">.</span><span class="na">newReader</span><span class="o">())</span>
<span class="n">branches</span><span class="o">.</span><span class="na">each</span> <span class="o">{</span>
    <span class="kt">def</span> <span class="n">branchName</span> <span class="o">=</span> <span class="n">it</span><span class="o">.</span><span class="na">name</span>
    <span class="n">job</span> <span class="o">{</span>
        <span class="n">name</span> <span class="s2">"${project}-${branchName}"</span><span class="o">.</span><span class="na">replaceAll</span><span class="o">(</span><span class="s1">'/'</span><span class="o">,</span><span class="s1">'-'</span><span class="o">)</span>
        <span class="n">scm</span> <span class="o">{</span>
            <span class="n">git</span><span class="o">(</span><span class="s2">"git://github.com/${project}.git"</span><span class="o">,</span> <span class="n">branchName</span><span class="o">)</span>
        <span class="o">}</span>
        <span class="n">steps</span> <span class="o">{</span>
            <span class="n">maven</span><span class="o">(</span><span class="s2">"test -Dproject.name=${project}/${branchName}"</span><span class="o">)</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>Clear and concise: the configuration fits in a third of a screen and contains only what actually needs to be configured.</p>
<h3 id="prerequisite">Prerequisites</h3>
<p>Obviously, a Jenkins instance (at least the LTS release) and the <code class="highlighter-rouge">Job DSL Plugin</code>.</p>
<h2 id="définition-des-job-types">Defining the job types</h2>
<p>I will take the simplest case here, 3 different job types: <code class="highlighter-rouge">compilation</code>, <code class="highlighter-rouge">test</code>, <code class="highlighter-rouge">package</code> (to leave deployment out of the picture).
The first goal is to characterize these types and to define the structure that describes the jobs.</p>
<p>We therefore have a list of projects, each characterized by an identifier, a name and an SCM URL.</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">[</span><span class="w">
  </span><span class="p">{</span><span class="w">
    </span><span class="s2">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"project1"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Project 1"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"scm"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://github.com/nithril/jenkins-jobdsl-project1.git"</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="p">{</span><span class="w">
    </span><span class="s2">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"project2"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Project 2"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"scm"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://github.com/nithril/jenkins-jobdsl-project2.git"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">]</span></code></pre></figure>
<ul>
<li>The <code class="highlighter-rouge">compilation</code> type is a Maven job that runs a compilation.</li>
<li>The <code class="highlighter-rouge">test</code> type is a Maven job that runs the tests.</li>
<li>The <code class="highlighter-rouge">package</code> type is a freestyle job that packages the Maven artifact and copies the result to <code class="highlighter-rouge">/dev/null</code>.</li>
</ul>
<h2 id="création-du-job-de-génération">Creating the generation job</h2>
<ol>
<li>I create a job of type <code class="highlighter-rouge">Freestyle project</code> and name it <code class="highlighter-rouge">Generate Jobs</code>.</li>
<li>I add a build step of type <code class="highlighter-rouge">Process Job DSLs</code>.
A first interesting option: the script can either be written directly in the job configuration or be read from the filesystem, for example after cloning its repository.</li>
<li>I add the Git SCM pointing at the URL of <a href="https://github.com/nithril/jenkins-jobdsl.git">the project</a> containing the Groovy script
<img src="/assets/2015-04-22-jenkins-job-dsl-pipeline-part1/job-dsl-scm.png" alt="JobDsl" /></li>
<li>And I set the path to the script file
<img src="/assets/2015-04-22-jenkins-job-dsl-pipeline-part1/jobstep.png" alt="JobDsl" /></li>
</ol>
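<p>The generation job itself could also be described with the Job DSL rather than created by hand. The sketch below is only an illustration of the four steps above: the script path <code class="highlighter-rouge">src/main/groovy/jobs.groovy</code> and the <code class="highlighter-rouge">master</code> branch are assumptions, since the actual values are not given in the text.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy">// Seed job sketch; the script path and branch below are assumptions
freeStyleJob('Generate Jobs') {
    scm {
        // Repository holding the Groovy DSL script
        git('https://github.com/nithril/jenkins-jobdsl.git', 'master')
    }
    steps {
        dsl {
            // Process a DSL script read from the checked-out workspace
            external('src/main/groovy/jobs.groovy')
        }
    }
}</code></pre></figure>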
<h2 id="description-du-script">Description of the script</h2>
<h3 id="chargement-de-la-structure-des-projets">Loading the project structure</h3>
<p>For this I use the <a href="http://docs.groovy-lang.org/latest/html/gapi/groovy/json/JsonSlurper.html">JsonSlurper</a>: <code class="highlighter-rouge">JSON slurper parses text or reader content into a data structure of lists and maps.</code></p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="kn">import</span> <span class="nn">groovy.json.JsonSlurper</span>
<span class="kt">def</span> <span class="n">projects</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JsonSlurper</span><span class="o">().</span><span class="na">parseText</span><span class="o">(</span><span class="n">readFileFromWorkspace</span><span class="o">(</span><span class="s2">"src/main/groovy/project.json"</span><span class="o">))</span></code></pre></figure>
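<p>To give an idea of what the slurper hands back, here is a minimal sketch that can be run outside Jenkins. It parses an inline string instead of calling <code class="highlighter-rouge">readFileFromWorkspace</code>, which is only available inside a Job DSL script; the inline JSON simply mirrors the project file shown earlier.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy">import groovy.json.JsonSlurper

// Inline copy of the project description, standing in for the workspace file
def json = '''
[
  {"id": "project1", "name": "Project 1", "scm": "https://github.com/nithril/jenkins-jobdsl-project1.git"},
  {"id": "project2", "name": "Project 2", "scm": "https://github.com/nithril/jenkins-jobdsl-project2.git"}
]
'''

// parseText returns plain lists and maps, so fields are reachable with dot notation
def projects = new JsonSlurper().parseText(json)

projects.each { project ->
    println "${project.id}: ${project.name} (${project.scm})"
}</code></pre></figure>
<p>Because the result is made of ordinary lists and maps, the rest of the script can iterate over it with <code class="highlighter-rouge">each</code> and read <code class="highlighter-rouge">project.name</code> or <code class="highlighter-rouge">project.scm</code> directly.</p>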
<h3 id="iteration-sur-les-projets-et-création-des-jobs">Iterating over the projects and creating the jobs</h3>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="n">projects</span><span class="o">.</span><span class="na">each</span> <span class="o">{</span> <span class="n">project</span> <span class="o">-></span>
    <span class="c1">//Define projects name</span>
    <span class="kt">def</span> <span class="n">compileProjectName</span> <span class="o">=</span> <span class="s2">"${project.name} - Compile"</span>
    <span class="kt">def</span> <span class="n">testProjectName</span> <span class="o">=</span> <span class="s2">"${project.name} - Test"</span>
    <span class="kt">def</span> <span class="n">packageProjectName</span> <span class="o">=</span> <span class="s2">"${project.name} - Package"</span>
    <span class="c1">//Compile Job</span>
    <span class="n">mavenJob</span><span class="o">(</span><span class="n">compileProjectName</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">projectScm</span><span class="o">(</span><span class="n">delegate</span><span class="o">,</span> <span class="n">project</span><span class="o">)</span>
        <span class="n">goals</span> <span class="s2">"compile"</span>
        <span class="n">publishers</span> <span class="o">{</span>
            <span class="n">downstream</span> <span class="n">testProjectName</span>
        <span class="o">}</span>
    <span class="o">}</span>
    <span class="c1">//Test Job</span>
    <span class="n">mavenJob</span><span class="o">(</span><span class="n">testProjectName</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">projectScm</span><span class="o">(</span><span class="n">delegate</span><span class="o">,</span> <span class="n">project</span><span class="o">)</span>
        <span class="n">goals</span> <span class="s2">"test"</span>
        <span class="n">publishers</span> <span class="o">{</span>
            <span class="n">downstream</span> <span class="n">packageProjectName</span>
        <span class="o">}</span>
    <span class="o">}</span>
    <span class="c1">//Package Job</span>
    <span class="n">freeStyleJob</span><span class="o">(</span><span class="n">packageProjectName</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">projectScm</span><span class="o">(</span><span class="n">delegate</span><span class="o">,</span> <span class="n">project</span><span class="o">)</span>
        <span class="n">steps</span> <span class="o">{</span>
            <span class="n">maven</span> <span class="o">{</span>
                <span class="n">goals</span> <span class="s2">"package"</span>
                <span class="n">mavenInstallation</span> <span class="s2">"Maven 3.2.2"</span>
            <span class="o">}</span>
            <span class="n">shell</span> <span class="s2">"cp submodule/target/*.jar /dev/null"</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>Each of the 2 projects now has its 3 jobs defined, along with their <code class="highlighter-rouge">downstream</code> relationships.
<img src="/assets/2015-04-22-jenkins-job-dsl-pipeline-part1/list-jobs.png" alt="JobDsl" /></p>
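<p>To make the resulting job list concrete, the following plain Groovy sketch (runnable outside Jenkins) recomputes the job names from the two project names and the three suffixes used by the script. The variable names are illustrative only and do not come from the original script.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy">// Project names as they appear in the description file
def projectNames = ['Project 1', 'Project 2']
// Suffixes used by the script, in pipeline order
def stages = ['Compile', 'Test', 'Package']

// One job per (project, stage) pair, named like in the script
def jobNames = projectNames.collectMany { name ->
    stages.collect { stage -> "${name} - ${stage}".toString() }
}

jobNames.each { println it }</code></pre></figure>
<p>Two projects times three stages gives the six jobs listed in the screenshot; within each project, every job triggers the next stage through <code class="highlighter-rouge">downstream</code>.</p>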
<h3 id="description-de-projectscm">Description of <code class="highlighter-rouge">projectScm</code></h3>
<p><code class="highlighter-rouge">projectScm</code> factors out the SCM configuration block below, which would otherwise have to be repeated in every job:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="n">scm</span> <span class="o">{</span>
    <span class="n">git</span> <span class="o">{</span>
        <span class="n">remote</span> <span class="o">{</span>
            <span class="n">url</span> <span class="n">project</span><span class="o">.</span><span class="na">scm</span>
        <span class="o">}</span>
        <span class="n">branch</span> <span class="s2">"master"</span>
        <span class="n">createTag</span> <span class="kc">false</span>
    <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>I therefore extracted this block into a Groovy closure so that it can be used from the Jenkins DSL.
However, the closure's evaluation scope has to be set through delegation: its delegate is set to the Jenkins DSL context that is currently executing.
For more information, see this excellent article: <a href="http://java.dzone.com/articles/groovy-closures-owner-delegate">Groovy Closures: this, owner, delegate Let’s Make a DSL</a>.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy">
<span class="kt">def</span> <span class="n">projectScm</span> <span class="o">=</span> <span class="o">{</span> <span class="n">owner</span><span class="o">,</span> <span class="n">project</span> <span class="o">-></span>
    <span class="n">delegate</span> <span class="o">=</span> <span class="n">owner</span>
    <span class="n">scm</span> <span class="o">{</span>
        <span class="n">git</span> <span class="o">{</span>
            <span class="n">remote</span> <span class="o">{</span>
                <span class="n">url</span> <span class="n">project</span><span class="o">.</span><span class="na">scm</span>
            <span class="o">}</span>
            <span class="n">branch</span> <span class="s2">"master"</span>
            <span class="n">createTag</span> <span class="kc">false</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
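<p>The delegation mechanism can be tried in isolation with plain Groovy. In the sketch below, <code class="highlighter-rouge">FakeJobContext</code> is a hypothetical stand-in for the context object that the Job DSL exposes as <code class="highlighter-rouge">delegate</code> inside a job block; it is not part of the plugin, and <code class="highlighter-rouge">configureScm</code> is an illustrative name.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy">// Hypothetical stand-in for the context object that Job DSL passes as delegate
class FakeJobContext {
    def calls = []
    void scm(String url) { calls.add("scm " + url) }
}

// Same pattern as projectScm: the first argument becomes the closure's delegate
def configureScm = { target, project ->
    delegate = target
    // scm is not defined on the script (the owner), so the call falls through to the delegate
    scm(project.scm)
}

def context = new FakeJobContext()
configureScm(context, [scm: 'https://example.org/repo.git'])
println context.calls</code></pre></figure>
<p>With the default resolution strategy, a method call inside a closure is looked up on the owner first and then on the delegate, which is why assigning the delegate is enough for <code class="highlighter-rouge">scm</code> to reach the right object.</p>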
<h1 id="et-pour-finir">And finally</h1>
<p>I create a view of type Pipeline, which I place at the end of each iteration:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy">
<span class="c1">//Create the Pipeline view</span>
<span class="n">buildPipelineView</span><span class="o">(</span><span class="s2">"${project.name}"</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">selectedJob</span> <span class="n">compileProjectName</span>
    <span class="n">displayedBuilds</span> <span class="mi">3</span>
    <span class="n">showPipelineParameters</span> <span class="kc">true</span>
    <span class="n">showPipelineParametersInHeaders</span> <span class="kc">true</span>
    <span class="n">showPipelineDefinitionHeader</span> <span class="kc">true</span>
<span class="o">}</span></code></pre></figure>
<p><img src="/assets/2015-04-22-jenkins-job-dsl-pipeline-part1/pipeline-project1.png" alt="JobDsl" /></p>
<h1 id="en-conclusion">In conclusion</h1>
<p>With 55 lines of code and a 12-line description file, I was able to define a process that generates a project pipeline;
in our example it created 6 jobs and 2 pipeline views.</p>
<p>In the next article, I will use a Job DSL to generate the release jobs.</p>