Jekyll2018-04-19T10:01:01+08:00https://leohacker.github.io/Divergent MindBlog about machine learning, distributing system, java and programming.Leo JiangNative code in Java2017-04-05T09:41:24+08:002017-04-05T09:41:24+08:00https://leohacker.github.io/java/Native-code-in-Java<p>Parts of the Java source code are implemented in native code, which in practice means C/C++. In Java code,
the native keyword marks a method as native code. The native methods you are most likely to run into first are
hashCode and clone in <code class="highlighter-rouge">Object.java</code>, the ones that show up in all the classic books.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@HotSpotIntrinsicCandidate</span>
<span class="kd">protected</span> <span class="kd">native</span> <span class="n">Object</span> <span class="nf">clone</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">CloneNotSupportedException</span><span class="o">;</span>
</code></pre></div></div>
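<p>To see which methods of a class carry the native modifier, reflection is enough. The following is a minimal sketch; the class name NativeMethodLister is made up for illustration, and the exact list it prints differs between JDK releases.</p>

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
class NativeMethodLister {
    // Returns the sorted names of the methods declared by `type` that have the `native` modifier.
    static List<String> nativeMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // The list depends on the JDK version, so no expected output is shown.
        System.out.println(nativeMethodNames(Object.class));
    }
}
```

<p>Running it against Object.class should at least list clone and hashCode, the two methods mentioned above.</p>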
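<p>clone() is an interesting method. A class that implements the Cloneable interface only needs to call super.clone() to get a cloned
instance of itself. The interesting part is that the code doing the work lives in Object, yet the object it creates is a real instance of the calling class, and the clone
correctly contains the calling class's instance fields. When you get curious and go looking for how this is done, you find that the Java code only
declares a protected native method that returns Object. On top of that, this native method can throw a Java exception. Interesting, isn't it?</p>
<p>For how to implement a Cloneable class correctly, see Effective Java.</p>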
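<p>The behaviour described above can be checked with a small experiment. This is only a sketch; CloneDemo, Point and NotCloneable are hypothetical names chosen for illustration.</p>

```java
// Hypothetical demo classes; only Object.clone() and Cloneable come from the JDK.
class CloneDemo {
    // Opts in to cloning by implementing the Cloneable marker interface.
    static class Point implements Cloneable {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public Point clone() {
            try {
                // Object.clone() creates an instance of the runtime class (Point), not a bare Object.
                return (Point) super.clone();
            } catch (CloneNotSupportedException e) {
                // Cannot happen here because Point implements Cloneable.
                throw new AssertionError(e);
            }
        }
    }

    // Calls super.clone() without implementing Cloneable.
    static class NotCloneable {
        Object tryClone() throws CloneNotSupportedException {
            return super.clone();
        }
    }

    public static void main(String[] args) {
        Point copy = new Point(1, 2).clone();
        System.out.println(copy.getClass().getSimpleName() + " " + copy.x + "," + copy.y);
        try {
            new NotCloneable().tryClone();
        } catch (CloneNotSupportedException e) {
            System.out.println("CloneNotSupportedException thrown");
        }
    }
}
```

<p>The clone is a Point with the same field values, and the class that does not implement Cloneable gets the Java exception thrown from inside the native method.</p>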
<p>So where is the native code that implements clone()?</p>
<ul>
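<li>First, work out how OpenJDK organizes its native code. Java's native code is built on JNI, and by convention the native source
file is named after the Java class, with a <code class="highlighter-rouge">.c</code> extension.</li>
<li>That leads to the several native directories in the java.base module: platform-specific native code, plus <code class="highlighter-rouge">share/native</code>.</li>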
</ul>
<p>clone() is handled in <code class="highlighter-rouge">Object.c</code>. The actual implementation is not in this file, but it does give us a lead.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">JNINativeMethod</span> <span class="n">methods</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span><span class="s">"hashCode"</span><span class="p">,</span> <span class="s">"()I"</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">JVM_IHashCode</span><span class="p">},</span>
<span class="p">{</span><span class="s">"wait"</span><span class="p">,</span> <span class="s">"(J)V"</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">JVM_MonitorWait</span><span class="p">},</span>
<span class="p">{</span><span class="s">"notify"</span><span class="p">,</span> <span class="s">"()V"</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">JVM_MonitorNotify</span><span class="p">},</span>
<span class="p">{</span><span class="s">"notifyAll"</span><span class="p">,</span> <span class="s">"()V"</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">JVM_MonitorNotifyAll</span><span class="p">},</span>
<span class="p">{</span><span class="s">"clone"</span><span class="p">,</span> <span class="s">"()Ljava/lang/Object;"</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">JVM_Clone</span><span class="p">},</span>
<span class="p">};</span>
<span class="n">JNIEXPORT</span> <span class="kt">void</span> <span class="n">JNICALL</span>
<span class="nf">Java_java_lang_Object_registerNatives</span><span class="p">(</span><span class="n">JNIEnv</span> <span class="o">*</span><span class="n">env</span><span class="p">,</span> <span class="n">jclass</span> <span class="n">cls</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">(</span><span class="o">*</span><span class="n">env</span><span class="p">)</span><span class="o">-></span><span class="n">RegisterNatives</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">cls</span><span class="p">,</span>
<span class="n">methods</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">methods</span><span class="p">)</span><span class="o">/</span><span class="k">sizeof</span><span class="p">(</span><span class="n">methods</span><span class="p">[</span><span class="mi">0</span><span class="p">]));</span>
<span class="p">}</span>
</code></pre></div></div>
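<p>As the code above shows, this C file does not define JVM_Clone itself; it refers to the function and registers it as the native implementation of the clone method. So how do we find JVM_Clone?
A code search engine such as OpenGrok can look it up.</p>
<p>The search shows that the real implementation is in <code class="highlighter-rouge">hotspot/src/share/vm/prims/jvm.cpp</code>, that is, jvm.cpp in the hotspot repository.</p>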
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// java.lang.Object ///////////////////////////////////////////////
</span>
<span class="n">JVM_ENTRY</span><span class="p">(</span><span class="n">jobject</span><span class="p">,</span> <span class="n">JVM_Clone</span><span class="p">(</span><span class="n">JNIEnv</span><span class="o">*</span> <span class="n">env</span><span class="p">,</span> <span class="n">jobject</span> <span class="n">handle</span><span class="p">))</span>
<span class="n">JVMWrapper</span><span class="p">(</span><span class="s">"JVM_Clone"</span><span class="p">);</span>
<span class="n">Handle</span> <span class="n">obj</span><span class="p">(</span><span class="n">THREAD</span><span class="p">,</span> <span class="n">JNIHandles</span><span class="o">::</span><span class="n">resolve_non_null</span><span class="p">(</span><span class="n">handle</span><span class="p">));</span>
<span class="k">const</span> <span class="n">KlassHandle</span> <span class="n">klass</span> <span class="p">(</span><span class="n">THREAD</span><span class="p">,</span> <span class="n">obj</span><span class="o">-></span><span class="n">klass</span><span class="p">());</span>
<span class="n">JvmtiVMObjectAllocEventCollector</span> <span class="n">oam</span><span class="p">;</span>
<span class="cp">#ifdef ASSERT
</span> <span class="c1">// Just checking that the cloneable flag is set correct
</span> <span class="k">if</span> <span class="p">(</span><span class="n">obj</span><span class="o">-></span><span class="n">is_array</span><span class="p">())</span> <span class="p">{</span>
<span class="n">guarantee</span><span class="p">(</span><span class="n">klass</span><span class="o">-></span><span class="n">is_cloneable</span><span class="p">(),</span> <span class="s">"all arrays are cloneable"</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">guarantee</span><span class="p">(</span><span class="n">obj</span><span class="o">-></span><span class="n">is_instance</span><span class="p">(),</span> <span class="s">"should be instanceOop"</span><span class="p">);</span>
<span class="n">bool</span> <span class="n">cloneable</span> <span class="o">=</span> <span class="n">klass</span><span class="o">-></span><span class="n">is_subtype_of</span><span class="p">(</span><span class="n">SystemDictionary</span><span class="o">::</span><span class="n">Cloneable_klass</span><span class="p">());</span>
<span class="n">guarantee</span><span class="p">(</span><span class="n">cloneable</span> <span class="o">==</span> <span class="n">klass</span><span class="o">-></span><span class="n">is_cloneable</span><span class="p">(),</span> <span class="s">"incorrect cloneable flag"</span><span class="p">);</span>
<span class="p">}</span>
<span class="cp">#endif
</span>
<span class="c1">// Check if class of obj supports the Cloneable interface.
</span> <span class="c1">// All arrays are considered to be cloneable (See JLS 20.1.5)
</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">klass</span><span class="o">-></span><span class="n">is_cloneable</span><span class="p">())</span> <span class="p">{</span>
<span class="n">ResourceMark</span> <span class="n">rm</span><span class="p">(</span><span class="n">THREAD</span><span class="p">);</span>
<span class="n">THROW_MSG_0</span><span class="p">(</span><span class="n">vmSymbols</span><span class="o">::</span><span class="n">java_lang_CloneNotSupportedException</span><span class="p">(),</span> <span class="n">klass</span><span class="o">-></span><span class="n">external_name</span><span class="p">());</span>
<span class="p">}</span>
<span class="c1">// Make shallow object copy
</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">size</span> <span class="o">=</span> <span class="n">obj</span><span class="o">-></span><span class="n">size</span><span class="p">();</span>
<span class="n">oop</span> <span class="n">new_obj_oop</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">obj</span><span class="o">-></span><span class="n">is_array</span><span class="p">())</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">length</span> <span class="o">=</span> <span class="p">((</span><span class="n">arrayOop</span><span class="p">)</span><span class="n">obj</span><span class="p">())</span><span class="o">-></span><span class="n">length</span><span class="p">();</span>
<span class="n">new_obj_oop</span> <span class="o">=</span> <span class="n">CollectedHeap</span><span class="o">::</span><span class="n">array_allocate</span><span class="p">(</span><span class="n">klass</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">length</span><span class="p">,</span> <span class="n">CHECK_NULL</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">new_obj_oop</span> <span class="o">=</span> <span class="n">CollectedHeap</span><span class="o">::</span><span class="n">obj_allocate</span><span class="p">(</span><span class="n">klass</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">CHECK_NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// 4839641 (4840070): We must do an oop-atomic copy, because if another thread
</span> <span class="c1">// is modifying a reference field in the clonee, a non-oop-atomic copy might
</span> <span class="c1">// be suspended in the middle of copying the pointer and end up with parts
</span> <span class="c1">// of two different pointers in the field. Subsequent dereferences will crash.
</span> <span class="c1">// 4846409: an oop-copy of objects with long or double fields or arrays of same
</span> <span class="c1">// won't copy the longs/doubles atomically in 32-bit vm's, so we copy jlongs instead
</span> <span class="c1">// of oops. We know objects are aligned on a minimum of an jlong boundary.
</span> <span class="c1">// The same is true of StubRoutines::object_copy and the various oop_copy
</span> <span class="c1">// variants, and of the code generated by the inline_native_clone intrinsic.
</span> <span class="n">assert</span><span class="p">(</span><span class="n">MinObjAlignmentInBytes</span> <span class="o">>=</span> <span class="n">BytesPerLong</span><span class="p">,</span> <span class="s">"objects misaligned"</span><span class="p">);</span>
<span class="n">Copy</span><span class="o">::</span><span class="n">conjoint_jlongs_atomic</span><span class="p">((</span><span class="n">jlong</span><span class="o">*</span><span class="p">)</span><span class="n">obj</span><span class="p">(),</span> <span class="p">(</span><span class="n">jlong</span><span class="o">*</span><span class="p">)</span><span class="n">new_obj_oop</span><span class="p">,</span>
<span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">align_object_size</span><span class="p">(</span><span class="n">size</span><span class="p">)</span> <span class="o">/</span> <span class="n">HeapWordsPerLong</span><span class="p">);</span>
<span class="c1">// Clear the header
</span> <span class="n">new_obj_oop</span><span class="o">-></span><span class="n">init_mark</span><span class="p">();</span>
<span class="c1">// Store check (mark entire object and let gc sort it out)
</span> <span class="n">BarrierSet</span><span class="o">*</span> <span class="n">bs</span> <span class="o">=</span> <span class="n">Universe</span><span class="o">::</span><span class="n">heap</span><span class="p">()</span><span class="o">-></span><span class="n">barrier_set</span><span class="p">();</span>
<span class="n">assert</span><span class="p">(</span><span class="n">bs</span><span class="o">-></span><span class="n">has_write_region_opt</span><span class="p">(),</span> <span class="s">"Barrier set does not have write_region"</span><span class="p">);</span>
<span class="n">bs</span><span class="o">-></span><span class="n">write_region</span><span class="p">(</span><span class="n">MemRegion</span><span class="p">((</span><span class="n">HeapWord</span><span class="o">*</span><span class="p">)</span><span class="n">new_obj_oop</span><span class="p">,</span> <span class="n">size</span><span class="p">));</span>
<span class="n">Handle</span> <span class="n">new_obj</span><span class="p">(</span><span class="n">THREAD</span><span class="p">,</span> <span class="n">new_obj_oop</span><span class="p">);</span>
<span class="c1">// Special handling for MemberNames. Since they contain Method* metadata, they
</span> <span class="c1">// must be registered so that RedefineClasses can fix metadata contained in them.
</span> <span class="k">if</span> <span class="p">(</span><span class="n">java_lang_invoke_MemberName</span><span class="o">::</span><span class="n">is_instance</span><span class="p">(</span><span class="n">new_obj</span><span class="p">())</span> <span class="o">&&</span>
<span class="n">java_lang_invoke_MemberName</span><span class="o">::</span><span class="n">is_method</span><span class="p">(</span><span class="n">new_obj</span><span class="p">()))</span> <span class="p">{</span>
<span class="n">Method</span><span class="o">*</span> <span class="n">method</span> <span class="o">=</span> <span class="p">(</span><span class="n">Method</span><span class="o">*</span><span class="p">)</span><span class="n">java_lang_invoke_MemberName</span><span class="o">::</span><span class="n">vmtarget</span><span class="p">(</span><span class="n">new_obj</span><span class="p">());</span>
<span class="c1">// MemberName may be unresolved, so doesn't need registration until resolved.
</span> <span class="k">if</span> <span class="p">(</span><span class="n">method</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">methodHandle</span> <span class="n">m</span><span class="p">(</span><span class="n">THREAD</span><span class="p">,</span> <span class="n">method</span><span class="p">);</span>
<span class="c1">// This can safepoint and redefine method, so need both new_obj and method
</span> <span class="c1">// in a handle, for two different reasons. new_obj can move, method can be
</span> <span class="c1">// deleted if nothing is using it on the stack.
</span> <span class="n">m</span><span class="o">-></span><span class="n">method_holder</span><span class="p">()</span><span class="o">-></span><span class="n">add_member_name</span><span class="p">(</span><span class="n">new_obj</span><span class="p">(),</span> <span class="nb">false</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Caution: this involves a java upcall, so the clone should be
</span> <span class="c1">// "gc-robust" by this stage.
</span> <span class="k">if</span> <span class="p">(</span><span class="n">klass</span><span class="o">-></span><span class="n">has_finalizer</span><span class="p">())</span> <span class="p">{</span>
<span class="n">assert</span><span class="p">(</span><span class="n">obj</span><span class="o">-></span><span class="n">is_instance</span><span class="p">(),</span> <span class="s">"should be instanceOop"</span><span class="p">);</span>
<span class="n">new_obj_oop</span> <span class="o">=</span> <span class="n">InstanceKlass</span><span class="o">::</span><span class="n">register_finalizer</span><span class="p">(</span><span class="n">instanceOop</span><span class="p">(</span><span class="n">new_obj</span><span class="p">()),</span> <span class="n">CHECK_NULL</span><span class="p">);</span>
<span class="n">new_obj</span> <span class="o">=</span> <span class="n">Handle</span><span class="p">(</span><span class="n">THREAD</span><span class="p">,</span> <span class="n">new_obj_oop</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">JNIHandles</span><span class="o">::</span><span class="n">make_local</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="n">new_obj</span><span class="p">());</span>
<span class="n">JVM_END</span>
</code></pre></div></div>
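<p>The "Make shallow object copy" step in the listing above has a visible consequence on the Java side: primitive fields are copied by value, while reference fields end up pointing at the same object. A minimal sketch, with hypothetical class names ShallowCloneDemo and Basket:</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo classes used only to illustrate shallow copying.
class ShallowCloneDemo {
    static class Basket implements Cloneable {
        int count;                               // primitive field: the value is copied
        List<String> items = new ArrayList<>();  // reference field: only the reference is copied

        @Override
        public Basket clone() {
            try {
                return (Basket) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    public static void main(String[] args) {
        Basket original = new Basket();
        original.count = 1;
        original.items.add("apple");

        Basket copy = original.clone();
        copy.count = 2;          // does not affect original.count
        copy.items.add("pear");  // visible through original.items, because the list is shared

        System.out.println(original.count + " " + original.items); // prints "1 [apple, pear]"
    }
}
```

<p>A class that needs independent copies of such fields has to copy them itself after calling super.clone().</p>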
<p>The implementation is much longer than those of Object's other native methods, probably because of all the follow-up work after the copy. The JVM_ENTRY macro is used to wrap
the native method, the memory copy itself is done in a very direct way, and the code that throws the exception is easy to find.</p>
<p>With clone() as an example, we now have a rough approach for locating and analyzing native code in the future.</p>Leo JiangEssential Z Shell and Oh My Zsh2016-12-07T22:29:17+08:002016-12-07T22:29:17+08:00https://leohacker.github.io/linux/essential-zsh<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> </h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#completion" id="markdown-toc-completion">Completion</a> <ul>
<li><a href="#hippie-complete" id="markdown-toc-hippie-complete">Hippie Complete</a></li>
<li><a href="#path-complete" id="markdown-toc-path-complete">Path Complete</a></li>
<li><a href="#directory-navigation" id="markdown-toc-directory-navigation">Directory Navigation</a></li>
</ul>
</li>
<li><a href="#command-and-environment" id="markdown-toc-command-and-environment">Command and Environment</a></li>
<li><a href="#alias" id="markdown-toc-alias">Alias</a></li>
<li><a href="#globbing" id="markdown-toc-globbing">Globbing</a></li>
<li><a href="#kill-completion" id="markdown-toc-kill-completion">Kill Completion</a></li>
<li><a href="#oh-my-zsh" id="markdown-toc-oh-my-zsh">Oh-my-zsh</a> <ul>
<li><a href="#zsh-syntax-highlighting" id="markdown-toc-zsh-syntax-highlighting">zsh-syntax-highlighting</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
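<p>As an interactive shell, Zsh appeals to users mainly because it makes input more concise, changes how you interact with the shell, and surfaces more useful
information. In essence Zsh only changes the input process; what is finally handed to the shell to interpret and execute is still the real command. That is why so many
of its features hang off TAB, letting you type or expand arguments quickly.</p>
<p>Other frameworks for Zsh exist, but Oh My Zsh is so well known that it is almost synonymous with zsh, and it is not complicated to use.
It has no plugin manager, but that does not seem to be much of a problem.</p>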
<p>References:</p>
<ul>
<li><a href="https://github.com/robbyrussell/oh-my-zsh/">Oh My Zsh</a></li>
<li><a href="http://reasoniamhere.com/2014/01/11/outrageously-useful-tips-to-master-your-z-shell/">Blog</a></li>
<li><a href="http://www.slideshare.net/jaguardesignstudio/why-zsh-is-cooler-than-your-shell-16194692">Slide</a></li>
</ul>
<h2 id="completion">Completion</h2>
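<p>Zsh offers better completion than Bash. In Bash, TAB completes commands and the directories and files under the current directory. Zsh completes
commands, options, and paths, and after pressing TAB twice the cursor moves through the candidates so you can pick one with Enter.
Completion of options and subcommands usually comes from a plugin for the tool in question, and such plugins generally provide some aliases too.</p>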
<h3 id="hippie-complete">Hippie Complete</h3>
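<p>In Bash I usually use <code class="highlighter-rouge">Alt + .</code> to insert the last argument of the previous command. Zsh adds <code class="highlighter-rouge">Alt + /</code>, hippie completion, which completes
the partially typed argument from your input history. It is a more general completion strategy.</p>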
<h3 id="path-complete">Path Complete</h3>
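<p>Path completion is the most basic and most frequently used kind. In Zsh, typing part of a string narrows the set of matches. The match does not have to be a prefix;
it can be a substring starting anywhere, which makes it feel very smart. Zsh can also complete paths on a remote server. Amazing!
For example, the file for this post is 2016-12-07-essential-zsh.md; to open it in my editor I can type <code class="highlighter-rouge">atom zsh</code>
and press TAB, and Zsh expands it to the full file name.</p>
<p>Path completion also supports path expansion, so you do not have to TAB through a path one level at a time. If you know the path exactly,
you can type just the first letter of each component, or a prefix long enough to be unambiguous. For example, <code class="highlighter-rouge">ls /u/l/b</code> is meant to match <code class="highlighter-rouge">/usr/local/bin</code>, but because <code class="highlighter-rouge">/usr/lib</code> also exists
you need to type <code class="highlighter-rouge">/u/lo/b</code>; press TAB and the abbreviated path expands to the real one.</p>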
<h3 id="directory-navigation">Directory Navigation</h3>
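<p>Zsh has the notion of a directory stack; the <code class="highlighter-rouge">d</code> command lists recently visited directories. Entries in the stack are numbered,
and typing a number switches to that directory. There is also autocd: instead of typing cd you type the directory name and press Enter. Done!
Aliases such as <code class="highlighter-rouge">...</code> and <code class="highlighter-rouge">....</code> let you move up quickly. If that is still not enough, the autojump and z plugins
take you straight to frequently used directories. Anyone watching your screen will be lost, which is very satisfying.</p>
<p>There is an even less expected feature: path replacement. Suppose you want to go to /usr/local/share
but your fingers typed <code class="highlighter-rouge">cd /u/lo/b</code>, so the current directory is now <code class="highlighter-rouge">/usr/local/bin</code>. You can fix it with <code class="highlighter-rouge">cd bin share</code>.
The replaced component does not have to be the last one either, for example:</p>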
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd /srv/www/site1/current/log
cd site1 site2
pwd => /srv/www/site2/current/log
</code></pre></div></div>
<p>OK, that really is powerful.</p>
<h2 id="command-and-environment">Command and Environment</h2>
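<p><code class="highlighter-rouge">C-r</code> is the history search I use most to find commands I have run before. In Zsh you can also type the first few characters and then
use the arrow keys to step back through matching history entries. Unlike Bash, Zsh history can be shared across all shell sessions (the <code class="highlighter-rouge">share_history</code> option), which is also convenient.
zsh-history-substring-search adds fish-shell-like history search.</p>
<p>You can also turn on auto correction: Zsh checks what you type and, when a command is wrong, suggests the likely correct one.</p>
<p>For a very long command, C-x C-e opens the current command line in <code class="highlighter-rouge">$EDITOR</code> for editing.</p>
<p>Environment variables can be edited with the <code class="highlighter-rouge">vared</code> command. Zsh can also expand an environment variable with TAB, so you do not need <code class="highlighter-rouge">echo $ENV</code>.</p>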
<h2 id="alias">Alias</h2>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># suffix alias: open files with a given suffix in the specified editor.</span>
<span class="nv">$ </span><span class="nb">alias</span> <span class="nt">-s</span> <span class="nv">cpp</span><span class="o">=</span>vim
<span class="nv">$ </span><span class="nb">alias</span> <span class="nt">-s</span> <span class="nv">log</span><span class="o">=</span><span class="s2">"less -MN"</span>
<span class="nv">$ </span>test.cpp
<span class="nv">$ </span>dev.log
<span class="c"># global alias: expanded anywhere on the command line, not only in command position</span>
<span class="nv">$ </span><span class="nb">alias</span> <span class="nt">-g</span> ...<span class="o">=</span><span class="s1">'../..'</span>
<span class="nv">$ </span><span class="nb">cd</span> ...
<span class="nv">$ </span><span class="nb">alias</span> <span class="nt">-g</span> <span class="nv">X</span><span class="o">=</span><span class="s1">'| xargs'</span>
<span class="nv">$ </span>find <span class="nb">.</span> <span class="nt">-name</span> <span class="s2">"*.pyc"</span> <span class="nt">-type</span> f <span class="nt">-print</span> X /bin/rm <span class="nt">-f</span>
<span class="nv">$ </span><span class="nb">alias</span> <span class="nt">-g</span> <span class="nv">gp</span><span class="o">=</span><span class="s1">'| grep -i'</span>
<span class="nv">$ </span>ps ax gp ruby
<span class="c"># Flag Description</span>
<span class="c"># L print each alias in the form of calls to alias</span>
<span class="c"># g list or define global aliases</span>
<span class="c"># m print aliases matching specified pattern</span>
<span class="c"># r list or define regular aliases</span>
<span class="c"># s list or define suffix aliases</span>
</code></pre></div></div>
<h2 id="globbing">Globbing</h2>
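<p>In zsh you can even skip the find command and use <code class="highlighter-rouge">ls **/filename</code> instead. The special <code class="highlighter-rouge">**</code> matches any number of directory levels, while <code class="highlighter-rouge">*</code>
matches within a single level only. Globbing expands to all matching file paths on the command line, so it works with any command that accepts
multiple arguments. For example, <code class="highlighter-rouge">wc -l **/*.md</code> counts the lines of the blog posts.</p>
<p>Zsh also supports globbing with a regex flavor.</p>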
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># list text files that end in a number from 1 to 10
ls -l zsh_demo/**/*<1-10>.txt
# list text files that start with the letter a
ls -l zsh_demo/**/[a]*.txt
# list text files that start with either ab or bc
ls -l zsh_demo/**/(ab|bc)*.txt
# list text files that don't start with a lower or uppercase c
ls -l zsh_demo/**/[^cC]*.txt
</code></pre></div></div>
<p>Zsh globbing also has glob qualifiers, a suffix on the pattern that filters matches by file attributes.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
# show only directories
print -l zsh_demo/**/*(/)
# show only regular files
print -l zsh_demo/**/*(.)
# show empty files
ls -l zsh_demo/**/*(L0)
# show files greater than 3 KB
ls -l zsh_demo/**/*(Lk+3)
# show files modified in the last hour
print -l zsh_demo/**/*(mh-1)
# sort files from most to least recently modified and show the first 3
ls -l zsh_demo/**/*(om[1,3])
</code></pre></div></div>
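<p>The examples above come from this <a href="http://reasoniamhere.com/2014/01/11/outrageously-useful-tips-to-master-your-z-shell/">Blog</a>,
which also covers the modifiers and flags for parameter expansion. To be honest, things that cannot be remembered are of little use.</p>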
<h2 id="kill-completion">Kill Completion</h2>
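<p>Zsh also improves the kill command: type <code class="highlighter-rouge">kill &lt;TAB&gt;</code> and it tries to match all processes, although that list is usually too long.
As with other completions, typing the first few characters of the command that started the process narrows the list, and, just like completing files in the current directory,
you can cycle through the candidates. That is very handy when you have to kill a process by hand.</p>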
<h2 id="oh-my-zsh">Oh-my-zsh</h2>
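<p>Mac users often go a long time without shutting down their MBP, so Oh-my-zsh rarely gets a chance to update itself. You can update manually with <code class="highlighter-rouge">upgrade_oh_my_zsh</code>.
Plugins that do not ship with Oh-my-zsh should be installed under <code class="highlighter-rouge">$ZSH_CUSTOM/plugins</code>, and your own themes go under <code class="highlighter-rouge">$ZSH_CUSTOM/themes</code>.</p>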
<p>Plugins:</p>
<ul>
<li>autoenv</li>
<li>z</li>
<li>docker</li>
<li>git</li>
<li>jira https://github.com/robbyrussell/oh-my-zsh/tree/master/plugins/jira</li>
<li>mercurial https://github.com/robbyrussell/oh-my-zsh/tree/master/plugins/mercurial</li>
<li>zsh-autosuggestions</li>
<li>zsh-history-substring-search</li>
<li>zsh-syntax-highlighting</li>
</ul>
<h3 id="zsh-syntax-highlighting">zsh-syntax-highlighting</h3>
<p>Clone <code class="highlighter-rouge">https://github.com/zsh-users/zsh-syntax-highlighting.git</code> into the <code class="highlighter-rouge">~/.oh-my-zsh/custom/plugins</code> directory,
then add zsh-syntax-highlighting as the last plugin. The plugin depends on other parts of zsh, so it must come last.</p>Leo JiangIntroduction to Z Shell and Oh My ZshJava Microbenchmark Tool2016-12-02T00:12:00+08:002016-12-02T00:12:00+08:00https://leohacker.github.io/java/Java-Microbenchmark-Tool<p>This post is just a short note on a concept I did not know before, the microbenchmark, and two tools for it in Java: JMH (Java Microbenchmark Harness) and Caliper.</p>
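<p>At first I did not understand what microbenchmark meant, or what exactly was "micro" about it. An answer on Stack Overflow cleared it up: it means benchmarking operations so small
that your timing code is bigger than the thing being timed. At that point the observer is already affecting the result being observed, a bit like Schrödinger's cat. That is why dedicated tools are needed to run this kind of benchmark.</p>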
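<p>The problem can be made concrete with a naive hand-rolled timer. The sketch below is not JMH and not a proper benchmark; TimerOverheadDemo and its method names are made up for illustration, and the numbers it prints vary by machine and JVM.</p>

```java
// Hypothetical demo class showing why hand-timed microbenchmarks are unreliable.
class TimerOverheadDemo {
    // Receives results so the JIT cannot discard the measured work entirely.
    static volatile long sink;

    // Average cost in nanoseconds of one System.nanoTime() call, measured with nanoTime itself.
    static double averageTimerCostNanos(int calls) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / calls;
    }

    // Naive timing of a tiny operation: one addition wrapped in two timer calls.
    static long naiveAdditionNanos(int a, int b) {
        long start = System.nanoTime();
        int sum = a + b;
        long elapsed = System.nanoTime() - start;
        sink = sum;
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("one timer call: ~" + averageTimerCostNanos(1_000_000) + " ns");
        System.out.println("naive a + b:    " + naiveAdditionNanos(1, 2) + " ns");
    }
}
```

<p>The second figure mostly reflects the cost of reading the clock rather than the addition, which is the kind of distortion a harness is meant to deal with.</p>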
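<p><a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> is the tool from the OpenJDK project for running microbenchmarks.
Follow the official instructions to set up the project, build the jar, and run the benchmark code. Google also released a tool, Caliper, which seems to have come out earlier. I will probably not have much occasion to use either, but I am recording them here as a type of testing
together with its solutions and tools.</p>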
<p>References:</p>
<ul>
<li>https://adoptopenjdk.gitbooks.io/adoptopenjdk-getting-started-kit/content/en/openjdk-projects/jmh/jmh.html</li>
<li>http://nitschinger.at/Using-JMH-for-Java-Microbenchmarking/</li>
<li><a href="https://github.com/google/caliper">Caliper on Github</a></li>
</ul>Leo JiangMicrobenchmarkEssential Rsync2016-11-27T00:12:47+08:002016-11-27T00:12:47+08:00https://leohacker.github.io/linux/essential-rsync<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> </h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#基础" id="markdown-toc-基础">Basics</a> <ul>
<li><a href="#原理" id="markdown-toc-原理">How it works</a></li>
<li><a href="#语法" id="markdown-toc-语法">Syntax</a></li>
</ul>
</li>
<li><a href="#选项options" id="markdown-toc-选项options">Options</a> <ul>
<li><a href="#一般选项" id="markdown-toc-一般选项">General options</a></li>
<li><a href="#比较算法" id="markdown-toc-比较算法">Comparison algorithm</a></li>
<li><a href="#传输方式" id="markdown-toc-传输方式">Transfer modes</a></li>
<li><a href="#处理大文件" id="markdown-toc-处理大文件">Handling large files</a></li>
<li><a href="#备份" id="markdown-toc-备份">Backup</a></li>
<li><a href="#符号链接" id="markdown-toc-符号链接">Symbolic links</a></li>
<li><a href="#删除多余文件" id="markdown-toc-删除多余文件">Deleting extraneous files</a></li>
<li><a href="#拷贝文件列表" id="markdown-toc-拷贝文件列表">Copying a file list</a></li>
<li><a href="#文件过滤" id="markdown-toc-文件过滤">File filtering</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
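<h2 id="基础">Basics</h2>
<h3 id="原理">How it works</h3>
<p>rsync is a mirroring, synchronization, and backup tool whose main job is to copy files between two directories. Its algorithm transfers only the parts that changed,
which makes it very efficient. Roughly, it compares the directories and files on the sending and receiving sides, uses file size, modification time, and so on to decide whether
a file needs updating, then compares the differing files and transfers only the blocks that differ.</p>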
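<p>The size-and-time decision can be sketched in a few lines. This is only an illustration of the idea and not rsync's actual implementation; the class QuickCheckSketch and the method needsTransfer are hypothetical names.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch of a size + modification-time check; not taken from rsync.
class QuickCheckSketch {
    // Returns true when dst is missing, or differs from src in size or modification time.
    // File contents are never read, which is what keeps the check cheap.
    static boolean needsTransfer(Path src, Path dst) throws IOException {
        if (!Files.exists(dst)) {
            return true;
        }
        boolean sameSize = Files.size(src) == Files.size(dst);
        boolean sameTime = Files.getLastModifiedTime(src).toMillis()
                == Files.getLastModifiedTime(dst).toMillis();
        return !(sameSize && sameTime);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("quickcheck");
        Path src = Files.write(dir.resolve("src.txt"), "hello".getBytes());
        Path dst = dir.resolve("dst.txt");
        System.out.println(needsTransfer(src, dst)); // true: dst does not exist yet

        Files.write(dst, "world".getBytes());        // same size, different content
        FileTime stamp = FileTime.fromMillis(1_500_000_000_000L);
        Files.setLastModifiedTime(src, stamp);
        Files.setLastModifiedTime(dst, stamp);
        System.out.println(needsTransfer(src, dst)); // false: size and time match
    }
}
```

<p>The second call returns false even though the contents differ, which shows the trade-off of a check based only on metadata.</p>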
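<p>In terms of use cases, the design takes into account file systems, symbolic links, disk space, network bandwidth, interrupted transfers, large files, and more.
Once the list of files to copy has been built, user-specified filter rules make it possible to transfer exactly the files that should be synchronized. There are also convenient options for deleting
files that are no longer needed.</p>
<p>From a usage point of view, we can learn the basics by synchronizing a local directory with a server directory, or look at rsync's features from a system administrator's point of view, with backup
as the goal. rsync used to be used for backing up systems, restoring systems, and keeping multiple systems in sync, but DevOps today
mostly relies on virtual machines, configuration management tools (puppet), and Docker, so there is no need to use rsync in those critical areas. After all, an rsync run can fail partway,
is not atomic, and can leave the two sides in an inconsistent state.</p>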
<h3 id="语法">语法</h3>
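<p>The syntax is <code class="highlighter-rouge">rsync [options...] src... [dest]</code>. There can be several sources, though usually there is just one. Either the sender
or the receiver can be a remote host written as <code class="highlighter-rouge">user@host:path</code>, but rsync cannot sync between two remote hosts: when one side is remote, the other must be local.</p>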
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># copy the src/bar to /data/dest/bar, will create a new directory 'bar' in dest.</span>
rsync <span class="nt">-avz</span> src/bar /data/dest
<span class="c"># copy the content of bar to /data/dest</span>
rsync <span class="nt">-avz</span> src/bar/ /data/dest
<span class="c"># copy the content of bar into /data/dest/bar</span>
rsync <span class="nt">-avz</span> src/bar/ /data/dest/bar
<span class="c"># list files if no dest specified.</span>
rsync <span class="nt">-avz</span> somehost:
</code></pre></div></div>
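<p>One way to think of it: whatever follows the last path separator in the source, directories and files alike, is what gets sent to the receiver. If no destination is given, rsync lists the files instead.</p>
<h2 id="选项options">Options</h2>
<p>The options are designed to cover the various ways rsync gets used. First, a small gotcha to watch for when passing options.
A leading ~ is expanded by the shell to the user's home directory, but in <code class="highlighter-rouge">--option=~/foo</code> it is not,
so in that case use the form <code class="highlighter-rouge">--option ~/foo</code>.</p>
<p>The notes below are grouped by option category and cover only some of the options; the very common ones (vz) and the more obscure ones are left out. They are compiled from the rsync
man page.</p>
<h3 id="一般选项">General options</h3>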
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># General
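-q, --quiet
Quiet mode; suitable for cron jobs.
-n, --dry-run
Simulate a run without making changes; use together with verbose and itemize-changes.
-i, --itemize-changes
List every changed file along with what changed.
--no-OPTION
Turn off an option while using others, which is especially useful with -a. For example, -a --no-o preserves everything except the owner.
Order matters here.
-M, --remote-option=OPTION
rsync -av -M --log-file=foo -M--fake-super src/ dest/ It is best to repeat -M for each option that should apply on the remote side.
rsync -av -x -M--no-x src/ dest/ Some options take effect on both sides; sending the negated option to the remote side limits the option to the local side.
--log-file=FILE
rsync -av --remote-option=--log-file=/tmp/rlog src/ dest/ Keeps a log on the server side, which is especially useful when debugging why rsync closed unexpectedly.
log-file-format sets the format of the log entry written for each updated file.
--stats
Print transfer statistics.
-h, --human-readable
Human-readable number format with three levels. The default is level 1 (digit separators); -h gives units of 1000 and -hh gives units of 1024.
For plain unformatted numbers, use --no-h.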
</code></pre></div></div>
<h3 id="比较算法">Comparison algorithm</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-I, --ignore-times
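Turn off the timestamp condition of the quick check, i.e. do not skip files whose size and time match, so every file gets updated.
--size-only
If another sync tool that does not preserve modification times well was used before rsync, this switches the quick check to compare file size only.
-c, --checksum
Compare files by checksum before the transfer instead of using the quick check.
Whichever check is used, rsync verifies each transferred file with a checksum after the transfer. At the time of writing, rsync computed this checksum with MD5.
-O, --omit-dir-times
omit directories from --times
-J, --omit-link-times
omit symlinks from --times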
</code></pre></div></div>
<h3 id="传输方式">Transfer modes</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-a, --archive
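Equivalent to -rlptgoD: recursive, copy symlinks as symlinks, preserve permissions, modification times,
group, owner, and copy device and special files. Related options include numeric-ids, usermap, groupmap,
chown, and others.
-r, --recursive
Since version 3.0.0 rsync uses an incremental scan and starts transferring once a few directories have been scanned. Some options need the complete file list, for example --delete-before,
--delete-after, --prune-empty-dirs, and --delay-updates.
-d, --dirs
By default rsync only handles files and skips directories. To transfer a directory named on the command line, use -d, --dirs or -r.
--dirs is like --recursive but without recursion: it transfers only one level, the files and subdirectories directly inside the directory.
-R, --relative
Send the src path given on the command line to the server side and use it as a relative path.
rsync -avR /foo/bar/baz.c remote:/tmp/ produces remote:/tmp/foo/bar/baz.c
rsync -avR /foo/./bar/baz.c remote:/tmp/ produces remote:/tmp/bar/baz.c; the point is that ./ marks where the sent path starts.
--no-implied-dirs
With -R and a source path of path/foo/bar, path and path/foo are called implied directories. With this option the attributes of implied directories are not transferred.
If the directory exists on the destination, it is used as is; if not, a new one is created with default attributes. If path or path/foo on the destination is a symlink, normally that
symlink would be deleted and a real directory path/foo created in its place.
To keep existing symlinked directories on the destination, use no-implied-dirs. The keep-dirlinks option works similarly.
--skip-compress=LIST
By default rsync does not compress certain file types, because their formats are already compressed and compressing again gains nothing.
7z ace avi bz2 deb gpg gz iso jpeg jpg lz lzma lzo mov mp3 mp4 ogg png rar rpm rzip tbz tgz tlz txz xz z zip
-y, --fuzzy
A rather magical option. When a file is missing on the receiving side, it is normally transferred in full. But if the file was merely renamed on the receiving side, there is no need to resend it. The fuzzy
algorithm looks in the destination directory for a file with identical size and modification time, or one with a similar name, and uses it as a basis to speed up the transfer and creation of the file.
-S, --sparse
Some files contain long runs of null bytes, such as unused space in virtual machine images. Use this option for such sparse files; otherwise the backup copy may end up larger than the source.
-T, --temp-dir=DIR
Directory in which temporary files are created.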
</code></pre></div></div>
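<p>The effect of --relative and the <code>/./</code> marker on the destination path can be modelled with a small helper. This is a sketch of the path arithmetic only, covering the two examples above; <code>relative_destination</code> is a hypothetical name and the function does not call rsync.</p>

```python
import posixpath


def relative_destination(source, dest_dir):
    """Model where -R places a file: keep the path after '/./' if present,
    otherwise keep the whole source path below dest_dir."""
    marker = "/./"
    if marker in source:
        kept = source.split(marker, 1)[1]
    else:
        kept = source.lstrip("/")
    return posixpath.join(dest_dir, kept)


print(relative_destination("/foo/bar/baz.c", "/tmp/"))
print(relative_destination("/foo/./bar/baz.c", "/tmp/"))
```

<p>The first call models <code>rsync -avR /foo/bar/baz.c remote:/tmp/</code>, the second the variant with the <code>/./</code> marker.</p>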
<h3 id="处理大文件">Handling large files</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--inplace
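    Updating files in place is clearly risky, but it is useful for large files, for keeping hard links intact, and on copy-on-write filesystems.
--append
--append-verify
    Assume the start of the file is unchanged and only new data was appended at the end. This obviously fits only certain kinds of data files.
    The verify variant checks the whole file afterwards and, if the check fails, resends it using a normal --inplace transfer.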
--max-size=SIZE
</code></pre></div></div>
<h3 id="备份">Backup</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-b, --backup
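    Keep a backup of every file that gets updated; --backup-dir sets the backup directory and --suffix the backup suffix.
-u, --update
    Update mode: skip any file that is newer on the destination than on the source.
--existing, --ignore-non-existing
    Update mode: only update files that already exist on the destination.
--ignore-existing
    Do not update files that already exist on the destination. This helps to resume an interrupted run, typically in backups built on --link-dest.
--compare-dest=DIR
    Use DIR as an additional tree to compare against. Typical case: the receiver holds an older backup in DIR and the sender has a newer tree.
    With DIR as the comparison tree, the destination given to rsync receives only the files that differ from DIR. Since 3.1.0, a file in a non-empty destination that is identical to its copy in DIR is removed.
--copy-dest=DIR
    Like --compare-dest, but unchanged files are copied into the destination as well. The result equals a full copy, yet unchanged files are copied locally from DIR, which is faster.
    The point is to build a new backup directory without disturbing the existing one, and to switch over only once every file is in place.
--link-dest=DIR
    Goes one step further than --copy-dest: unchanged files are hard-linked from DIR instead of copied.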
</code></pre></div></div>
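<p>A common use of --link-dest is a series of dated snapshot directories, where each run hard-links unchanged files against the previous snapshot. The helper below only builds the command line, as a sketch under assumptions: the directory layout (<code>backup_root/DATE/</code>), the function name <code>snapshot_command</code> and the example paths are all hypothetical, and <code>backup_root</code> is assumed to be an absolute path.</p>

```python
def snapshot_command(source, backup_root, today, previous=None):
    """Build an rsync argv for a dated snapshot (hypothetical layout).

    With a previous snapshot, unchanged files become hard links into it.
    """
    argv = ["rsync", "-a", "--delete"]
    if previous is not None:
        argv.append(f"--link-dest={backup_root}/{previous}/")
    argv += [source, f"{backup_root}/{today}/"]
    return argv


first = snapshot_command("/data/", "/backups", "2016-11-09")
second = snapshot_command("/data/", "/backups", "2016-11-10", previous="2016-11-09")
print(" ".join(first))
print(" ".join(second))
```

<p>The resulting list can be handed to <code>subprocess.run</code>; adding --dry-run to it first is a cheap way to check what a run would do.</p>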
<h3 id="符号链接">Symbolic links</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-l, --links
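    Copy symlinks as symlinks.
-L, --copy-links
    Copy the file or directory a symlink points to instead of the link itself.
--copy-unsafe-links
    For symlinks that point outside the copied tree, copy the referent instead of the link.
--safe-links
    Ignore unsafe links, that is symlinks pointing outside the copied tree; all absolute symlinks are ignored as well. Avoid combining this with --relative.
--munge-links
    Munge symlinks on the receiving side into an unusable but recoverable state (they point into a directory that does not exist), or restore symlinks that were stored in the munged state.
    To apply this on the remote side, pass it through --remote-option.
-k, --copy-dirlinks
    Treat a symlink to a directory on the sending side as a real directory. Without it, when the sender has a symlinked directory where the receiver has a real one, the receiver's directory is deleted.
-K, --keep-dirlinks
    Treat a symlink to a directory on the receiving side as a real directory. Without it, when the sender has a real directory where the receiver has a symlink, the receiver's symlink is deleted.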
</code></pre></div></div>
<h3 id="删除多余文件">Deleting extraneous files</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--delete
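    Delete files on the receiving side that do not exist on the sending side. Check with --dry-run first to see what would be deleted.
--delete-before
    Delete before the transfer starts, which frees disk space on the receiver.
--delete-during, --del
    Delete incrementally during the transfer, scanning each directory right before it is updated.
--delete-excluded
    Used with --exclude: besides files missing on the sender, also delete files on the receiver that match the exclude rules.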
</code></pre></div></div>
<h3 id="拷贝文件列表">File list</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--files-from=FILE
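    Read the exact list of files to transfer from FILE. This implies --relative and --dirs. It can be combined with -a, but -a then does not imply -r, so add -r explicitly if recursion is wanted.
--ignore-missing-args
    If a listed file does not exist, ignore it without reporting an error.
--delete-missing-args
    If a listed file does not exist, delete it on the receiving side.
-m, --prune-empty-dirs
    Remove empty directories from the file list. Because the list itself is pruned, an active --delete also removes such a directory from the receiver. To avoid that, use exclude rules
    to filter the files and directories out of the file list, which also protects them from --delete.
--list-only
    List the source files instead of copying them.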
</code></pre></div></div>
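<h3 id="文件过滤">File filtering</h3>
<p>Filter rules select which files are transferred (include) and which are left out (exclude). While building the file list, rsync checks each file or directory
against the rules in order, and the first matching rule wins. If that rule is an exclude pattern the file is excluded; if it is an include pattern the file is kept; if no rule
matches, the file is kept as well. The order of the filter rules on the command line therefore matters.</p>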
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-C, --cvs-exclude
    Ignore version-control files and directories as well as backup files:
RCS SCCS CVS CVS.adm RCSLOG cvslog.* tags TAGS .make.state .nse_depinfo *~ #* .#* ,* _$* *$ *.old *.bak *.BAK
*.orig *.rej .del-* *.a *.olb *.o *.obj *.so *.exe *.Z *.elc *.ln core .svn/ .git/ .hg/ .bzr/
--exclude=PATTERN
--exclude-from=FILE
--include=PATTERN
--include-from=FILE
</code></pre></div></div>
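<p>With --filter you can write filter rules directly. The FILTER RULES section of the manual explains how. Rule modifiers can, among other things,
hide or show files for the transfer, protect files from deletion, and merge rules from other files. The INCLUDE/EXCLUDE PATTERN RULES section explains how to write patterns.
The + and - prefixes are the common ones, but there are further modifiers, so patterns can become quite complex.</p>
<p>Each --filter, --include or --exclude option carries a single rule. The option can be repeated, and for a longer list of rules --include-from/--exclude-from read them from a file.</p>
<p>Examples:</p>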
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># won't work as the parent directory "some" is excluded by the '*' rule.
+ /some/path/this-file-will-not-be-found
+ /file-is-included
- *
# workaround: list the parent directory first.
+ /some/
+ /some/path/
+ /some/path/this-file-is-found
+ /file-also-included
- *
</code></pre></div></div>
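<p>Why the first rule set fails becomes clearer with a toy model of first-match-wins filtering. The sketch below is a deliberately simplified approximation, not rsync's matcher: it supports only anchored patterns (leading <code>/</code>), a trailing <code>/</code> for directories, and unanchored patterns matched against the last path component; <code>**</code> and rule modifiers are ignored. All function names are made up for illustration.</p>

```python
from fnmatch import fnmatchcase


def rule_matches(pattern, path, is_dir):
    """Simplified pattern match; '*' never crosses a '/' here."""
    if pattern.endswith("/"):            # trailing slash: directories only
        if not is_dir:
            return False
        pattern = pattern.rstrip("/")
    if pattern.startswith("/"):          # anchored at the transfer root
        want = pattern.strip("/").split("/")
        have = path.strip("/").split("/")
        return len(want) == len(have) and all(
            fnmatchcase(h, w) for h, w in zip(have, want))
    # Unanchored: compare against the final path component.
    return fnmatchcase(path.strip("/").split("/")[-1], pattern)


def is_included(path, is_dir, rules):
    """The first matching rule decides; no match means included."""
    for sign, pattern in rules:
        if rule_matches(pattern, path, is_dir):
            return sign == "+"
    return True


def transferred(path, rules):
    """A file is reached only if every parent directory is included."""
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts)):
        if not is_included("/".join(parts[:depth]), True, rules):
            return False
    return is_included(path, False, rules)


broken = [("+", "/some/path/this-file-will-not-be-found"),
          ("+", "/file-is-included"),
          ("-", "*")]
fixed = [("+", "/some/"),
         ("+", "/some/path/"),
         ("+", "/some/path/this-file-is-found"),
         ("+", "/file-also-included"),
         ("-", "*")]

print(transferred("some/path/this-file-will-not-be-found", broken))
print(transferred("some/path/this-file-is-found", fixed))
```

<p>In the broken rule set the directory <code>some</code> matches no include rule and falls through to <code>- *</code>, so the traversal never gets as far as the file; listing the parent directories first fixes that.</p>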
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"- *.o" would exclude all names matching *.o
"- /foo" would exclude a file (or directory) named foo in the transfer-root directory
"- foo/" would exclude any directory named foo
"- /foo/*/bar" would exclude any file named bar which is at two levels below a directory named foo in the transfer-root directory
"- /foo/**/bar" would exclude any file named bar two or more levels below a directory named foo in the transfer-root directory
The combination of "+ */", "+ *.c", and "- *" would include all directories and C source files but nothing else (see also the --prune-empty-dirs option)
The combination of "+ foo/", "+ foo/bar.c", and "- *" would include only the foo directory and foo/bar.c (the foo directory must be explicitly included or it would be excluded by the "*")
</code></pre></div></div>Leo Jiangrsync的基本使用Essential Docker2016-11-10T21:30:00+08:002016-11-10T21:30:00+08:00https://leohacker.github.io/devops/essential-docker<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> </h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
<li><a href="#install" id="markdown-toc-install">Install</a></li>
<li><a href="#custom-docker-daemon-options" id="markdown-toc-custom-docker-daemon-options">Custom docker daemon options</a></li>
<li><a href="#image-management" id="markdown-toc-image-management">Image Management</a></li>
<li><a href="#container-management" id="markdown-toc-container-management">Container Management</a> <ul>
<li><a href="#info-in-container" id="markdown-toc-info-in-container">Info in Container</a></li>
<li><a href="#network" id="markdown-toc-network">Network</a></li>
<li><a href="#data" id="markdown-toc-data">Data</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
<p>This article is compiled from the official Docker documentation.</p>
<h3 id="introduction">Introduction</h3>
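<p>Docker provides virtualization at the operating-system level. It uses the Linux kernel's cgroups and kernel namespaces, together with a union filesystem, to build an isolated runtime instance on a Linux host, called a container. An image is the filesystem snapshot a container starts from; a container is the running instance. The official page <a href="https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/">Images, containers and storage drivers</a> introduces images and the layers they consist of. Layers are read-only filesystems, and a running container adds one writable layer on top, based on a copy-on-write strategy. Layers are shared between images, so any single Docker image can be small. If several images are built on the same base image, the base layers are downloaded only once, which is very efficient.</p>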
<blockquote>
<p>The Linux kernel’s support for namespaces mostly isolates an application’s view of the operating environment, including process trees, network, user IDs and mounted file systems, while the kernel’s cgroups provide resource limiting, including the CPU, memory, block I/O and network. Since version 0.9, Docker includes the libcontainer library as its own way to directly use virtualization facilities provided by the Linux kernel, in addition to using abstracted virtualization interfaces via libvirt, LXC (Linux Containers) and systemd-nspawn. – from Wikipedia</p>
</blockquote>
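<p>Much like GitHub, <a href="https://hub.docker.com">Docker Hub</a> hosts and manages Docker container images. Its usage and concepts resemble those of a Git repository: an image can exist in several versions, and each version is identified by a tag.</p>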
<h3 id="install">Install</h3>
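<p>The package to install is Docker CE (Community Edition), formerly called Docker Engine, whose main job is to run Docker images. Docker CE exposes a RESTful API, and the Docker CLI provides the user interface by calling that API.</p>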
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># remove the old versions.</span>
<span class="nv">$ </span><span class="nb">sudo </span>apt-get remove docker docker-engine docker.io
<span class="nv">$ </span><span class="nb">sudo </span>apt-get update
<span class="nv">$ </span><span class="nb">sudo </span>apt-get install apt-transport-https ca-certificates curl software-properties-common
<span class="nv">$ </span><span class="nb">sudo </span>apt-get install linux-image-extra-<span class="k">$(</span>uname <span class="nt">-r</span><span class="k">)</span> linux-image-extra-virtual
<span class="c"># pls follow the install guide if key changed.</span>
<span class="c"># https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#install-using-the-repository</span>
<span class="c">#</span>
<span class="nv">$ </span>curl <span class="nt">-fsSL</span> https://download.docker.com/linux/ubuntu/gpg | <span class="nb">sudo </span>apt-key add -
<span class="nv">$ </span><span class="nb">sudo </span>apt-key fingerprint 0EBFCD88
<span class="nv">$ </span><span class="nb">sudo </span>add-apt-repository <span class="se">\</span>
<span class="s2">"deb [arch=amd64] https://download.docker.com/linux/ubuntu </span><span class="se">\</span><span class="s2">
</span><span class="k">$(</span>lsb_release <span class="nt">-cs</span><span class="k">)</span><span class="s2"> </span><span class="se">\</span><span class="s2">
stable"</span>
<span class="c"># or use the below command</span>
<span class="c"># $ echo "deb https://apt.dockerproject.org/repo ubuntu-xenial main" | sudo tee /etc/apt/sources.list.d/docker.list</span>
<span class="nv">$ </span><span class="nb">sudo </span>apt-get update
<span class="nv">$ </span><span class="nb">sudo </span>apt-get install docker-ce
<span class="c"># start the docker engine</span>
<span class="nv">$ </span><span class="nb">sudo </span>systemctl start docker
<span class="c"># or</span>
<span class="nv">$ </span><span class="nb">sudo </span>service docker start
<span class="c"># Start docker on boot (systemd)</span>
<span class="nv">$ </span><span class="nb">sudo </span>systemctl <span class="nb">enable </span>docker
<span class="c"># Upstart starts docker on boot by default; the override below disables that</span>
<span class="nv">$ </span> <span class="nb">echo </span>manual | <span class="nb">sudo </span>tee /etc/init/docker.override
<span class="c"># Start docker on boot (chkconfig)</span>
<span class="nv">$ </span><span class="nb">sudo </span>chkconfig docker on
<span class="c"># let the current user run docker without sudo (log out and back in to apply).</span>
<span class="nv">$ </span><span class="nb">sudo </span>groupadd docker
<span class="nv">$ </span><span class="nb">sudo </span>usermod <span class="nt">-aG</span> docker <span class="nv">$USER</span>
<span class="c"># show the info of docker engine.</span>
<span class="nv">$ </span>docker info
</code></pre></div></div>
<h3 id="custom-docker-daemon-options">Custom docker daemon options</h3>
<p>There are a number of ways to configure the daemon flags and environment variables for your Docker daemon. The recommended way is to use the platform-independent daemon.json file, which is located in /etc/docker/ on Linux by default. See <a href="https://docs.docker.com/engine/reference/commandline/dockerd//#daemon-configuration-file">Daemon configuration file</a>.</p>
<p>You can configure nearly all daemon configuration options using daemon.json. The following example configures two options. One thing you cannot configure using the daemon.json mechanism is an HTTP proxy.</p>
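<p>The example from the Docker documentation is not reproduced here. As a stand-in, the following sketch renders a daemon.json with two options; the choice of <code>debug</code> and <code>log-level</code> is an assumption made for illustration, and the helper <code>render_daemon_config</code> is hypothetical. It only produces the text and does not write to /etc/docker/.</p>

```python
import json


def render_daemon_config(options):
    """Render daemon options as the JSON text expected in daemon.json."""
    return json.dumps(options, indent=2, sort_keys=True) + "\n"


# Two illustrative options; pick whichever your setup actually needs.
example = render_daemon_config({"debug": True, "log-level": "info"})
print(example)
```

<p>After saving such a file as /etc/docker/daemon.json, the daemon has to be restarted for the settings to take effect.</p>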
<h3 id="image-management">Image Management</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># login into docker hub.</span>
docker login <span class="nt">-u</span> username
<span class="c"># search the images</span>
docker search sinatra
<span class="c"># push the local image to docker hub.</span>
docker push user/image-name
<span class="c"># pull the docker image from docker hub.</span>
docker pull training/sinatra
<span class="c"># Run a container, update the software inside it, then commit the container as a new image</span>
docker commit <span class="nt">-m</span> <span class="s2">"Comment"</span> <span class="nt">-a</span> <span class="s2">"author"</span> container-id user/image-name:tag
<span class="c"># tag the image with version.</span>
docker tag image-id user/image-name:tag
<span class="c"># list local images</span>
docker images
<span class="c"># remove local image</span>
docker rmi <span class="nt">-f</span> image-id
<span class="c"># show image layers</span>
docker <span class="nb">history </span>image:tag
</code></pre></div></div>
<h2 id="container-management">Container Management</h2>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># run ubuntu image with interactive mode(-i) pseudo tty(-t).</span>
docker run <span class="nt">-t</span> <span class="nt">-i</span> ubuntu /bin/bash
<span class="c"># run daemonized docker</span>
docker run <span class="nt">-d</span> ubuntu /bin/bash <span class="nt">-c</span> <span class="s2">"while true; do echo hello world; sleep 1; done"</span>
<span class="c"># run a daemonized container for a webapp; -P maps every exposed container port to a random host port.</span>
docker run <span class="nt">-d</span> <span class="nt">-P</span> user/webapp python app.py
<span class="c"># map host port 80 to container port 5000.</span>
docker run <span class="nt">-d</span> <span class="nt">-p</span> 80:5000 user/webapp python app.py
<span class="c"># name the container</span>
<span class="nv">$ </span>docker run <span class="nt">-d</span> <span class="nt">-P</span> <span class="nt">--name</span> web training/webapp python app.py
<span class="c"># start the container</span>
docker start container-name
<span class="c"># stop the container</span>
docker stop container-name
<span class="c"># restart</span>
docker restart container-name
<span class="c"># kill</span>
docker <span class="nb">kill </span>container-name
<span class="c"># remove</span>
docker rm container-name
<span class="c"># remove all containers</span>
docker rm <span class="sb">`</span>docker ps <span class="nt">-a</span> <span class="nt">-q</span><span class="sb">`</span>
<span class="c"># check docker container status</span>
docker ps
<span class="c"># check container size</span>
docker ps <span class="nt">-s</span>
<span class="c"># attach to a container</span>
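<span class="c"># If bash is the foreground process you get an interactive shell; exiting bash also exits the container.</span>
<span class="c"># If a service is the foreground process you see the log output of that service.</span>
docker attach container-name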
</code></pre></div></div>
<h3 id="info-in-container">Info in Container</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># inspect: list all the information of docker container/image in json format.</span>
docker inspect container-name
<span class="c"># check the process output in docker container.</span>
docker logs container-name
<span class="c"># tail -f like</span>
docker logs <span class="nt">-f</span> container-name
<span class="c"># show the host mapping for container port</span>
docker port container-name 5000
<span class="c"># check the process status in container.</span>
docker top container-name
</code></pre></div></div>
<h3 id="network">Network</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># list all the networks (defaults: none, host, bridge)</span>
<span class="nv">$ </span>docker network <span class="nb">ls</span>
<span class="c"># show the info of network 'bridge'</span>
<span class="nv">$ </span>docker network inspect bridge
<span class="c"># disconnect the container 'networktest' from network 'bridge'</span>
<span class="nv">$ </span>docker network disconnect bridge networktest
<span class="c"># create own network, -d specify the driver type.</span>
<span class="nv">$ </span>docker network create <span class="nt">-d</span> bridge my-bridge-network
<span class="c"># run the container and add to network.</span>
<span class="nv">$ </span>docker run <span class="nt">-d</span> <span class="nt">--network</span><span class="o">=</span>my-bridge-network <span class="nt">--name</span> db training/postgres
<span class="c"># run the interactive shell for container db.</span>
<span class="nv">$ </span>docker <span class="nb">exec</span> <span class="nt">-it</span> db bash
</code></pre></div></div>
<h3 id="data">Data</h3>
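<p>Docker best practice recommends that layers contain only programs, not data. The reason is natural: data changes, while an image is meant to be shared by many projects.
At the same time programs and data belong together, because a program with no data to process is pointless. So where does the data go? Layers are read-only and cannot hold it, and the outermost writable layer
is not persistent: once the container is removed, it is gone. With containers as a cloud virtualization technology, their lifetime is entirely dynamic, so the data needs an external,
persistent storage location.</p>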
<blockquote>
<p>A data volume is a directory or file in the Docker host’s filesystem that is mounted directly into a container. Data volumes are not controlled by the storage driver. Reads and writes to data volumes bypass the storage driver and operate at native host speeds. You can mount any number of data volumes into a container. Multiple containers can also share one or more data volumes.</p>
</blockquote>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Anonymous volume: give only a path inside the container and let Docker allocate a directory under volumes.</span>
<span class="c"># /webapp mount point inside the container, you can use inspect to find the location folder on host.</span>
<span class="c"># /var/lib/docker/volumes/437841e70eaf07782366ba554ce7782b5805cf496256220ae3187946a0815639/_data</span>
docker run <span class="nt">-d</span> <span class="nt">-P</span> <span class="nt">--name</span> web <span class="nt">-v</span> /webapp training/webapp python app.py
</code></pre></div></div>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Named volume: Docker allocates the directory under volumes using the given name.</span>
docker run <span class="nt">-d</span> <span class="nt">-P</span> <span class="nt">--name</span> web <span class="nt">-v</span> webapp_data:/webapp training/webapp python app.py
</code></pre></div></div>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s2">"Mounts"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="s2">"Name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"webapp_data"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Source"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/var/lib/docker/volumes/webapp_data/_data"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Destination"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/webapp"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Driver"</span><span class="p">:</span><span class="w"> </span><span class="s2">"local"</span><span class="p">,</span><span class="w">
</span><span class="s2">"Mode"</span><span class="p">:</span><span class="w"> </span><span class="s2">"z"</span><span class="p">,</span><span class="w">
</span><span class="s2">"RW"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="s2">"Propagation"</span><span class="p">:</span><span class="w"> </span><span class="s2">"rprivate"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span></code></pre></div></div>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Use a directory on the host as the volume directory.</span>
<span class="c"># mount a host directory as data volume. -v host_path:container_path</span>
<span class="nv">$ </span>docker run <span class="nt">-d</span> <span class="nt">-P</span> <span class="nt">--name</span> web <span class="nt">-v</span> /src/webapp:/webapp training/webapp python app.py
</code></pre></div></div>
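<p>In essence, attaching a data volume is a mount: the directory inside the container is covered by another directory, and after unmounting the original directory shows through again, exactly as with mount.
So -v can also mount a single file. However, editing a file may change its inode, which is not allowed inside the container, so mounting files that need to be written is not recommended. For practical
purposes, data volumes are meant for mounting data directories.</p>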
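<p>The inode issue can be observed without Docker. Many editors and tools save a file by writing a temporary file and renaming it over the original, which gives the path a new inode, whereas a bind mount of a single file stays attached to the inode that existed at mount time. The sketch below shows the two save strategies on a plain temporary directory; the function names and the file name <code>app.conf</code> are made up for illustration.</p>

```python
import os
import tempfile


def inode_after_edit(path, new_text, atomic):
    """Rewrite path either in place or via write-to-temp-then-rename."""
    if atomic:
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as handle:
            handle.write(new_text)
        os.replace(tmp_path, path)       # the path now refers to a new inode
    else:
        with open(path, "w") as handle:  # truncates and reuses the same inode
            handle.write(new_text)
    return os.stat(path).st_ino


def demo():
    """Return (inode kept by in-place write, inode kept by rename)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.conf")
        with open(path, "w") as handle:
            handle.write("v1")
        original = os.stat(path).st_ino
        in_place = inode_after_edit(path, "v2", atomic=False)
        replaced = inode_after_edit(path, "v3", atomic=True)
        return original == in_place, original == replaced


print(demo())
```

<p>Writing in place keeps the inode, while the rename-based save does not, which is why mounting a whole directory is the more robust choice for files that get edited.</p>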
<p>Docker also supports data volume containers:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># create a data volume container named dbstore
$ docker create -v /dbdata --name dbstore training/postgres /bin/true
# db1 and db2 use the data directory /dbdata from dbstore
$ docker run -d --volumes-from dbstore --name db1 training/postgres
$ docker run -d --volumes-from dbstore --name db2 training/postgres
# volume references can be chained
$ docker run -d --name db3 --volumes-from db1 training/postgres
</code></pre></div></div>
<p>Volumes are not bound to containers, so removing a container does not remove its data volumes.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># find dangling volumes
docker volume ls -f dangling=true
docker volume rm <volume name>
# with --rm, the docker daemon cleans up anonymous volumes when the container is removed:
# the anonymous volume for /foo is deleted, the named volume awesome is kept.
$ docker run --rm -v /foo -v awesome:/bar busybox top
</code></pre></div></div>
<p>Backing up and restoring a data volume:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Back up /dbdata from dbstore to /backup/backup.tar; the mounted /backup makes the archive land in the current host directory.
$ docker run --rm --volumes-from dbstore -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
# Mount the current host directory at /backup in the container, then restore the archive into /dbdata.
$ docker run -v /dbdata --name dbstore2 ubuntu /bin/bash
$ docker run --rm --volumes-from dbstore2 -v $(pwd):/backup ubuntu bash -c "cd /dbdata && tar xvf /backup/backup.tar --strip 1"
</code></pre></div></div>Leo JiangDocker的基本概念和使用Read the JDK 9 source code in Intellij IDEA2016-11-10T00:00:00+08:002016-11-10T00:00:00+08:00https://leohacker.github.io/java/read-jdk-source-code-intellij<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> </h4></header>
</nav>
</aside>
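<p>IntelliJ IDEA is widely used in the JDK developer community as well. Since working on the JDK itself needs no support for other frameworks,
the Community Edition is enough. The catch is that JDK 9 has a modular structure and its source is spread over quite a few repositories, so it is not easy to open as IntelliJ modules.</p>
<p>Within the JDK community, AdoptOpenJDK once published a script, <a href="https://github.com/AdoptOpenJDK/BuildHelpers/blob/master/buildIntelliJModules.sh">buildIntelliJModules.sh</a>, in its BuildHelpers repository.
In 2015 Maurizio Cimadamore and Chris Hegarty provided the official OpenJDK <a href="https://bugs.openjdk.java.net/browse/JDK-8074716">answer</a>.</p>
<p>Using it is simple:</p>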
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># clone openjdk source code forest
hg clone http://hg.openjdk.java.net/jdk9/dev 9dev.src
cd 9dev.src
sh ./get_source.sh
# auto-configure; Ant has to be installed in a standard location,
# otherwise run the configure command with --with-ant-home <ANT_HOME>
bash configure
# run the script to build intellij project files.
sh common/bin/idea.sh
# output folder: the .idea hidden folder under the toplevel.
</code></pre></div></div>
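<p>Besides generating the IntelliJ project, the script provides a few build commands. They fail, however, because BSF Manager (a JavaScript engine manager) is missing.
I am not familiar with Ant yet, so I have left that aside for now.</p>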
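<p>Once the IntelliJ project has been generated, we can open it and read the code of any class, and navigation between classes works well. Excellent!</p>Leo Jiang · How to set up Intellij IDEA to read the modular JDK 9 source code · Build a blog with Jekyll on Github · 2016-11-04T10:50:00+08:00 · 2016-11-04T10:50:00+08:00 · https://leohacker.github.io/programmer/jekyll-blog-on-github<aside class="sidebar__right">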
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> </h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#安装" id="markdown-toc-安装">Installation</a></li>
<li><a href="#定制" id="markdown-toc-定制">Customization</a></li>
<li><a href="#运行" id="markdown-toc-运行">Running</a></li>
<li><a href="#编写" id="markdown-toc-编写">Writing</a></li>
</ul>
</nav>
</aside>
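<p>This blog is built with the static blog system <a href="https://jekyllrb.com/">Jekyll</a>. Its content is not managed by a database but stored as markdown files under the _posts directory, so it is hard to lose and easy to migrate. The theme is <a href="https://mmistakes.github.io/minimal-mistakes/">Minimal mistakes</a>, a responsive two-column design that offers several attractive layouts and fairly flexible configuration.</p>
<h2 id="安装">Installation</h2>
<p>Jekyll provides the basic framework for turning markdown files into a static site, but it does not, and cannot, supply the design of your site. That is what a theme is for. The usual route is to find a suitable theme and fork its project on GitHub. minimal-mistakes offers a clean single-page style, pages with a full-width header image, and a layout for feature articles, which suits a blog very well. To simplify management and upgrades, Minimal mistakes 4.0 and later is also available as a gem, but GitHub Pages does not support third-party plugins, so I still use the fork.</p>
<p>A Jekyll blog manages its dependencies with a Gemfile, so Bundler installs Jekyll and all required Jekyll plugins. RVM or Ruby has to be installed first. I usually set up a dedicated gemset (2.3@Jekyll) for the blog with RVM.</p>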
<ul>
<li>Ruby: 2.3</li>
<li>Gemset: Jekyll</li>
<li>Jekyll: 3.3</li>
<li>Minimal mistakes version: 4.0.4</li>
</ul>
<p>The <a href="https://mmistakes.github.io/minimal-mistakes/docs/quick-start-guide/">Minimal mistakes Quick Start Guide</a> lists the steps for installing the theme.</p>
<ul>
<li>Fork minimal-mistakes repository</li>
<li>Rename the repository to yourname.github.io</li>
<li>Fill your Gemfile, use bundler to install the gems into Jekyll gemset.</li>
<li>Remove the gh-pages branches and folders in demo site.</li>
</ul>
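<p>To upgrade later, add the upstream repository as a remote. Because some content has been modified and customized, conflicts have to be resolved after every upgrade.</p>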
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git remote add upstream https://github.com/mmistakes/minimal-mistakes.git
$ git pull upstream master
</code></pre></div></div>
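<p>Jekyll's basic principle is to turn the files under _posts and _pages into static files under _site. The theme repository does not contain these directories, so create them or migrate earlier posts into them. Jekyll knows _drafts and _posts by default, but not _pages. The include setting in _config.yml lists _pages, so the articles under _pages are read and used directly.</p>
<h2 id="定制">Customization</h2>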
<ul>
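<li>In <code class="highlighter-rouge">_config.yml</code>, set the site and author information and the comment system, and define the default front matter.</li>
<li>Edit _data/navigation.yml to change the site's navigation bar. Each url corresponds to a file under _pages, which can be HTML or markdown.</li>
<li>Create assets/images for images, add an avatar and a default post image, and point the favicon path to avatar.png.</li>
<li>Set up MathJax. Include the MathJax script from its CDN in <code class="highlighter-rouge">_includes/head/custom.html</code>, and note that the URL must use https. GitHub serves pages over https and the site url in _config.yml is configured as https as well, so the CDN has to be https too. In the kramdown section of the configuration, set <code class="highlighter-rouge">math_engine: mathjax</code>. For MathJax to work, the Compress HTML feature cannot be used, because the MathJax JavaScript needs its line breaks preserved.</li>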
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><script type="text/javascript" async src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</code></pre></div></div>
<ul>
<li>CSS classes can be set in a post's YAML front matter, which gives a great deal of flexibility for custom styling.</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> layout: splash
classes:
- landing
- dark-theme
<body class="layout-splash landing dark-theme">
</code></pre></div></div>
<ul>
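<li>The original minimal-mistakes style suits English, and its font sizes work well on small screens. Chinese text at the same size looks a bit large, though, and I mainly want to write math and program code to be read on a large desktop screen. I adjusted the CSS so that list text (including the toc) has a consistent size, paragraphs are more compact, and heading sizes fit better. The Single layout uses the full width and shows no sidebar.</li>
<li>Syntax highlighting uses the Monokai style. I took a Jekyll-compatible highlighting CSS definition from GitHub, although it lacks a few entries. Setting the default color of code to lime (bright green) works well. See <code class="highlighter-rouge">_sass/_syntax.scss</code> for the exact changes.</li>
<li>On the Category page, add Category buttons at the top.</li>
<li>Add search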
<ul>
<li>http://victorvoid.github.io/</li>
<li>https://github.com/victorvoid/space-jekyll-template</li>
<li>http://jekyll.tips/jekyll-casts/jekyll-search-using-lunr-js/</li>
<li>https://github.com/christian-fei/Simple-Jekyll-Search</li>
<li>https://github.com/slashdotdash/jekyll-lunr-js-search</li>
<li>http://mathayward.com/jekyll-search/</li>
</ul>
</li>
</ul>
<h2 id="运行">Running</h2>
<p><code class="highlighter-rouge">jekyll build</code> generates the site, and <code class="highlighter-rouge">jekyll serve</code> starts a local server reachable at <code class="highlighter-rouge">127.0.0.1:4000</code>. You may run into a GitHub API authentication warning. To get rid of it, create a personal token on GitHub with public repo access and set the environment variable JEKYLL_GITHUB_TOKEN.</p>
<h2 id="编写">Writing</h2>
<ul>
<li>To make a post link to another URL, add <code class="highlighter-rouge">link: http://url-you-want-linked</code> to its YAML front matter.</li>
<li>Embedding video</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><iframe width="640" height="360" src="https://www.youtube-nocookie.com/embed/l2Of1-d5E5o?controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>
</code></pre></div></div>
<ul>
<li>Images
<ul>
<li><a href="https://mmistakes.github.io/minimal-mistakes/post%20formats/post-image-standard/">Inserting images</a></li>
<li><a href="https://mmistakes.github.io/minimal-mistakes/markup/markup-image-alignment/">Image alignment</a></li>
<li><a href="https://mmistakes.github.io/minimal-mistakes/markup-more-images/">Featured images</a></li>
</ul>
</li>
<li><a href="https://mmistakes.github.io/minimal-mistakes/markup/markup-html-tags-and-formatting/">HTML tags reference</a></li>
</ul>Leo JiangBuild a blog powered with Jekyll and Minimal mistake, hosted on Github page.CLDR - Locale2016-08-06T00:50:00+08:002016-08-06T00:50:00+08:00https://leohacker.github.io/textprocessing/CLDR-Locale<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> </h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#引言" id="markdown-toc-引言">Introduction</a></li>
<li><a href="#locale" id="markdown-toc-locale">Locale</a> <ul>
<li><a href="#locale的概念" id="markdown-toc-locale的概念">The concept of a locale</a></li>
<li><a href="#locale-identifier" id="markdown-toc-locale-identifier">Locale Identifier</a> <ul>
<li><a href="#bcp47" id="markdown-toc-bcp47">BCP47</a></li>
<li><a href="#bcp47-language-tag-conversion" id="markdown-toc-bcp47-language-tag-conversion">BCP47 Language Tag Conversion</a></li>
<li><a href="#locale-identifier-on-unix" id="markdown-toc-locale-identifier-on-unix">Locale Identifier on Unix</a></li>
<li><a href="#locale-identifier-on-windows-platforms" id="markdown-toc-locale-identifier-on-windows-platforms">Locale Identifier on Windows Platforms</a></li>
<li><a href="#locale-identifier-on-web" id="markdown-toc-locale-identifier-on-web">Locale Identifier on Web</a></li>
</ul>
</li>
<li><a href="#locale-inheritance-and-matching" id="markdown-toc-locale-inheritance-and-matching">Locale Inheritance and Matching</a> <ul>
<li><a href="#locale-inheritance" id="markdown-toc-locale-inheritance">Locale Inheritance</a></li>
<li><a href="#locale-lookup-fallback" id="markdown-toc-locale-lookup-fallback">Locale Lookup Fallback</a></li>
<li><a href="#locale-matching" id="markdown-toc-locale-matching">Locale Matching</a></li>
</ul>
</li>
<li><a href="#cldr-data" id="markdown-toc-cldr-data">CLDR Data</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
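<h2 id="引言">Introduction</h2>
<p>The Unicode Common Locale Data Repository (<a href="http://cldr.unicode.org/">CLDR</a>) is a cornerstone of software internationalization. As an international standard it provides the definitions and data needed to build internationalized software. This article assumes a basic understanding of internationalization, namely that it is the umbrella term for software techniques dealing with user settings such as language, region, time, numbers and time zones. English uses the word Locale for the central concept. The word is hard to translate accurately into Chinese, so this article uses the English term throughout; please follow the explanation below to understand exactly what it means. The discussion of CLDR is based on the <a href="http://www.unicode.org/reports/tr35/">LDML technical report</a>.</p>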
<h2 id="locale">Locale</h2>
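<h3 id="locale的概念">The concept of a locale</h3>
<p>For beginners, locale is a vague concept; for people with some internationalization background, it is one that is easy to misunderstand. A section near the start of the LDML report explains very well what a <strong>locale</strong> is. The key point is that a locale is not the same as a language, a region, or a combination of the two: it stands for a collection of data associated with the user's locale settings. This is the clearest explanation I have seen so far. I cannot condense several paragraphs into one sentence, so please read the following text carefully.</p>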
<blockquote>
<p>The first issue is basic: what is a locale? In this model, a locale is an identifier (id) that refers to a set of user preferences that tend to be shared across significant swaths of the world. Traditionally, the data associated with this id provides support for formatting and parsing of dates, times, numbers, and currencies; for measurement units, for sort-order (collation), plus translated names for time zones, languages, countries, and scripts. The data can also include support for text boundaries (character, word, line, and sentence), text transformations (including transliterations), and other services.</p>
</blockquote>
<blockquote>
<p>Locale data is not cast in stone: the data used on someone’s machine generally may reflect the US format, for example, but preferences can typically set to override particular items, such as setting the date format for 2002.03.15, or using metric or Imperial measurement units. In the abstract, locales are simply one of many sets of preferences that, say, a website may want to remember for a particular user. Depending on the application, it may want to also remember the user’s time zone, preferred currency, preferred character set, smoker/non-smoker preference, meal preference (vegetarian, kosher, and so on), music preference, religion, party affiliation, favorite charity, and so on.</p>
</blockquote>
<blockquote>
<p>Locale data in a system may also change over time: country boundaries change; governments (and currencies) come and go: committees impose new standards; bugs are found and fixed in the source data; and so on. Thus the data needs to be versioned for stability over time.</p>
</blockquote>
<blockquote>
<p>In general terms, the locale id is a parameter that is supplied to a particular service (date formatting, sorting, spell-checking, and so on). The format in this document does not attempt to represent all the data that could conceivably be used by all possible services. Instead, it collects together data that is in common use in systems and internationalization libraries for basic services. The main difference among locales is in terms of language; there may also be some differences according to different countries or regions. However, the line between locales and languages, as commonly used in the industry, are rather fuzzy. Note also that the vast majority of the locale data in CLDR is in fact language data; all non-linguistic data is separated out into a separate tree. For more information, see Section 3.10 Language and Locale IDs.</p>
</blockquote>
<blockquote>
<p>We will speak of data as being “in locale X”. That does not imply that a locale is a collection of data; it is simply shorthand for “the set of data associated with the locale id X”. Each individual piece of data is called a resource or field, and a tag indicating the key of the resource is called a resource tag.</p>
</blockquote>
<h3 id="locale-identifier">Locale Identifier</h3>
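<p>Because a locale refers to a set of data, an id is needed to name a particular set. Common POSIX locale ids include <code class="highlighter-rouge">en_US</code> and <code class="highlighter-rouge">zh_CN</code>; BCP47 provides a richer set of subtags for expressing the concepts in a locale. CLDR builds on BCP47 and is a superset of it. The guiding principle when writing a Locale ID is to keep it as short as possible.</p>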
<blockquote>
<p>Unicode LDML uses stable identifiers based on <a href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt">BCP47</a> for distinguishing among languages, locales, regions, currencies, time zones, transforms, and so on.</p>
</blockquote>
<h4 id="bcp47">BCP47</h4>
<p>BCP47 consists of two RFC documents. BCP stands for "Best Current Practice".</p>
<ul>
<li><strong>Tags for Identifying Languages</strong> <a href="http://w3c.github.io/ltli/#bib-RFC5646">RFC5646</a> defines the syntax, forms, and terminology of language tags.</li>
<li><strong>Matching of Language Tags</strong> <a href="http://w3c.github.io/ltli/#bib-RFC4647">RFC4647</a> describes several schemes for matching, comparing, and selecting language tags.</li>
</ul>
<p>Both the W3C and Java use BCP47: both accept BCP47-style language tags, and both adopt the BCP47 language tag matching schemes.</p>
<p>First, let's look at the EBNF grammar of the language id in CLDR.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>unicode_language_id
="root"
| (unicode_language_subtag
(sep unicode_script_subtag)?
| unicode_script_subtag)
(sep unicode_region_subtag)?
(sep unicode_variant_subtag)* ;
</code></pre></div></div>
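<p>unicode_language_id is CLDR's counterpart of the BCP47 language tag. A common POSIX-style locale id can be regarded as language subtag + region subtag. All subtags are <a href="http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry">registered</a> with IANA. CLDR keeps these subtags in files under the <code class="highlighter-rouge">common/validity/</code> directory: language, script, region, and variant, as well as language-independent subtags such as currency and unit. These files are meant to be machine-readable, so the definitions in them are not easy for a person to read.</p>
<p>For quick reference, here are the subtag values to use when a field is undefined or unknown.</p>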
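<p>As a rough illustration of the grammar above, the following Python sketch splits a language id into its subtags. The function name <code class="highlighter-rouge">parse_language_id</code> and the regular expressions are assumptions made for this example; they cover only the common subtag shapes and do not check the values against the CLDR validity files.</p>

```python
import re

# Simplified subtag shapes (an assumption: only the common cases are covered).
LANGUAGE = re.compile(r"[A-Za-z]{2,3}|[A-Za-z]{5,8}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def parse_language_id(tag):
    """Split a language id into language, script, region and variant subtags."""
    if tag.lower() == "root":
        return {"language": "root", "script": None, "region": None, "variants": []}
    rest = re.split(r"[-_]", tag)  # "-" and "_" are equivalent separators
    result = {"language": None, "script": None, "region": None, "variants": []}
    if rest and LANGUAGE.fullmatch(rest[0]):
        result["language"] = rest.pop(0)
    if rest and SCRIPT.fullmatch(rest[0]):
        result["script"] = rest.pop(0)
    # The grammar requires a language subtag, a script subtag, or both.
    if result["language"] is None and result["script"] is None:
        raise ValueError("expected a language or script subtag: " + tag)
    if rest and REGION.fullmatch(rest[0]):
        result["region"] = rest.pop(0)
    while rest and VARIANT.fullmatch(rest[0]):
        result["variants"].append(rest.pop(0))
    if rest:
        raise ValueError("unexpected subtag in: " + tag)
    return result


print(parse_language_id("zh-Hant-TW"))
```

<p>For instance, <code class="highlighter-rouge">parse_language_id("zh-Hant-TW")</code> yields the language <code class="highlighter-rouge">zh</code>, the script <code class="highlighter-rouge">Hant</code>, and the region <code class="highlighter-rouge">TW</code>, while extensions such as <code class="highlighter-rouge">-u-</code> are deliberately rejected because they are not part of the language id.</p>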
<table>
<thead>
<tr>
<th style="text-align: left">Code Type</th>
<th style="text-align: left">Value</th>
<th style="text-align: left">Description in Referenced Standards</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Language</td>
<td style="text-align: left">und</td>
<td style="text-align: left">Undetermined language</td>
</tr>
<tr>
<td style="text-align: left">Script</td>
<td style="text-align: left">Zzzz</td>
<td style="text-align: left">Code for uncoded script, Unknown [UAX24]</td>
</tr>
<tr>
<td style="text-align: left">Region</td>
<td style="text-align: left">ZZ</td>
<td style="text-align: left">Unknown or Invalid Territory</td>
</tr>
<tr>
<td style="text-align: left">Currency</td>
<td style="text-align: left">XXX</td>
<td style="text-align: left">The codes assigned for transactions where no currency is involved</td>
</tr>
<tr>
<td style="text-align: left">Time Zone</td>
<td style="text-align: left">unk</td>
<td style="text-align: left">Unknown or Invalid Time Zone</td>
</tr>
<tr>
<td style="text-align: left">Subdivision</td>
<td style="text-align: left">ZZZZ</td>
<td style="text-align: left">Unknown or Invalid Subdivision</td>
</tr>
</tbody>
</table>
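<p><a href="https://www.w3.org/International/articles/language-tags/">Language tags in HTML and XML</a> explains these subtags very well; it is an excellent document and worth reading carefully. From the CLDR point of view, the EBNF above defines the Language ID, while the full Locale ID has the form <code class="highlighter-rouge">language-extlang-script-region-variant-extension-privateuse</code>; of course, not every field has to be specified. The extensions supported by Unicode are covered briefly below. The BCP47 Conformance section of the CLDR documentation states explicitly that extlang is not supported.</p>
<p>On separators and letter case in a Language ID:</p>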
<blockquote>
<p>The identifiers can vary in case and in the separator characters. The <code class="highlighter-rouge">-</code> and <code class="highlighter-rouge">_</code> separators are treated as equivalent. All identifier field values are case-insensitive. Although case distinctions do not carry any special meaning, an implementation of LDML should use the casing recommendations in [BCP47], especially when a Unicode locale identifier is used for locale data exchange in software protocols. The recommendation is that: the region subtag is in uppercase, the script subtag is in title case, and all other subtags are in lowercase.</p>
</blockquote>
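<p>The casing recommendation quoted above can be sketched in a few lines of Python. The function name <code class="highlighter-rouge">canonical_case</code> is made up for this example, and the rule that everything from the first single-character subtag onward stays lowercase is a simplifying assumption about extensions and private-use subtags.</p>

```python
import re


def canonical_case(tag):
    """Apply the recommended casing: region upper, script title, the rest lower."""
    subtags = re.split(r"[-_]", tag)  # both separators are accepted on input
    result = []
    in_extension = False
    for index, subtag in enumerate(subtags):
        lowered = subtag.lower()
        # A single-character subtag after the first position starts an extension.
        if index > 0 and len(subtag) == 1:
            in_extension = True
        if index == 0 or in_extension:
            result.append(lowered)
        elif len(subtag) == 2:
            result.append(lowered.upper())       # region, e.g. "TW"
        elif len(subtag) == 4 and subtag.isalpha():
            result.append(lowered.capitalize())  # script, e.g. "Hant"
        else:
            result.append(lowered)
    return "-".join(result)


print(canonical_case("ZH_hant_tw"))
```

<p>The sketch always emits <code class="highlighter-rouge">-</code> as the separator; since both separators are treated as equivalent, that choice is only a convention of this example.</p>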
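<p>Script subtags should be used only when the written form of a language needs to be stated explicitly. Typical script subtags are Hans and Hant for Chinese, and also Bopomofo. In the past, <code class="highlighter-rouge">zh_CN</code> implied Simplified Chinese and <code class="highlighter-rouge">zh_TW</code> implied Traditional Chinese; using Hans and Hant is the correct approach.</p>
<p>What if you want to state that there is no script at all, for example when assigning a locale to audio material?</p>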
<blockquote>
<p>If you specifically want to indicate that content is not written, there is a subtag for that. For example, you could use <strong>en-Zxxx</strong> to make it clear that an audio recording in English is not written content.</p>
</blockquote>
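<p>BCP47 has an extension mechanism in which a single character identifies an extension; for example, 'x' marks the private-use extension. CLDR maintains the 'u' and 't' extensions.</p>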
<ul>
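<li>BCP47 U Extension, the locale extension: locale information such as the Chinese lunar calendar or phonebook collation can be appended to a regular locale id.</li>
<li>BCP47 T Extension, the transformations extension: it specifies text transformations such as transliteration (for example, the romanization of a script).</li>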
</ul>
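<p>The Unicode locale extension covers definitions for calendars, currencies, collation, numbers, line breaking, measurement, time zones, and more, so it deserves careful study. The corresponding CLDR files are in the bcp47 directory; see section 3.6.1, Key And Type Definitions, of LDML. Time zone data comes from the <a href="http://www.iana.org/time-zones">tz database</a>. Because tz database ids do not conform to BCP47 syntax, CLDR defines shortened ids in bcp47/timezone.xml. Where possible, a short id has five characters: country (2) + location (3). An id that is not five characters long indicates that this country-plus-location form does not apply; for example, utcw01 corresponds to Etc/GMT+1. The documentation notes that a time zone does not belong to a country, so the first two characters must not be assumed to name the country the zone belongs to, and it gives an amusing example: if Hawaii left the United States and joined Canada, its CLDR time zone id would still be ushnl.</p>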
<blockquote>
<p>The ‘u’ extension data is stored in multiple XML files located under common/bcp47 directory in CLDR. Each file contains the locale extension key/type values and their backward compatibility mappings appropriate for a particular domain. common/bcp47/collation.xml contains key/type values for collation, including optional collation parameters and valid type values for each key.</p>
</blockquote>
<blockquote>
<p>The ‘t’ extension data is stored in common/bcp47/transform.xml.</p>
</blockquote>
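<p>To make the 'u' extension more concrete, here is a small Python sketch that pulls the key/type pairs out of a locale id. The helper name <code class="highlighter-rouge">parse_u_extension</code> is hypothetical, and the parsing is simplified: it assumes keys are two characters long, ignores attributes, and does not validate anything against the files under common/bcp47.</p>

```python
def parse_u_extension(locale_id):
    """Return the key/type pairs of the 'u' extension as a dict."""
    subtags = locale_id.lower().replace("_", "-").split("-")
    try:
        # Search from position 1 so that the language subtag itself is skipped.
        start = subtags.index("u", 1) + 1
    except ValueError:
        return {}  # the locale id has no 'u' extension
    keywords = {}
    key = None
    for subtag in subtags[start:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if len(subtag) == 2:
            key = subtag  # a two-character subtag is a key
            keywords[key] = []
        elif key is not None:
            keywords[key].append(subtag)  # longer subtags belong to the type
    return {k: "-".join(v) for k, v in keywords.items()}


print(parse_u_extension("de-u-co-phonebk"))
```

<p>With this sketch, <code class="highlighter-rouge">de-u-co-phonebk</code> maps the collation key <code class="highlighter-rouge">co</code> to <code class="highlighter-rouge">phonebk</code>, and a time zone key such as <code class="highlighter-rouge">tz-ushnl</code> would be picked up in the same way.</p>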
<p>extlang and variant are rarely used, so they are not covered here. For the same reason, neither Subdivision in the U Extension nor the T Extension is covered.</p>
<p>References:</p>
<ul>
<li><a href="https://tools.ietf.org/html/bcp47">BCP47</a></li>
<li><a href="http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry">IANA Language Subtag Registry</a></li>
</ul>
<h4 id="bcp47-language-tag-conversion">BCP47 Language Tag Conversion</h4>
<p>A BCP47 language tag is not exactly equivalent to a CLDR Language ID.</p>
<p>A valid [BCP 47] language tag can be converted to a valid Unicode language/locale identifier by performing the following transformation.</p>
<ol>
<li>Canonicalize the language tag (afterwards, there will be no extlang subtag)</li>
<li>Replace the BCP 47 primary language subtag “und” with “root” if no script, region, or variant subtags are present</li>
<li>If the BCP 47 primary language subtag matches the type attribute of a languageAlias element in <a href="http://www.unicode.org/reports/tr35/tr35-info.html#Supplemental_Data">Supplemental Data</a>, replace the language subtag with the replacement value.
<ol>
<li>If there are additional subtags in the replacement value, add them to the result, but only if there is no corresponding subtag already in the tag.</li>
</ol>
</li>
<li>If the BCP 47 region subtag matches the type attribute of a territoryAlias element in Supplemental Data, replace the region subtag with the replacement value, as follows:
<ol>
<li>If there is a single territory in the replacement, use it.</li>
<li>If there are multiple territories:
<ol>
<li>Look up the most likely territory for the base language code (and script, if there is one).</li>
<li>If that likely territory is in the list, use it.</li>
<li>Otherwise, use the first territory in the list.</li>
</ol>
</li>
</ol>
</li>
</ol>
<p>Examples:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Original</th>
<th style="text-align: left">Result</th>
<th style="text-align: left">Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">en-US</td>
<td style="text-align: left">en-US</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">und</td>
<td style="text-align: left">root</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">und-US</td>
<td style="text-align: left">und-US</td>
<td style="text-align: left">no changes, because region subtag is present</td>
</tr>
<tr>
<td style="text-align: left">und-u-cu-USD</td>
<td style="text-align: left">root-u-cu-usd</td>
<td style="text-align: left">changed, because no script, region, or variant subtag is present</td>
</tr>
<tr>
<td style="text-align: left">cmn-TW</td>
<td style="text-align: left">zh-TW</td>
<td style="text-align: left">language alias</td>
</tr>
<tr>
<td style="text-align: left">sr-CS</td>
<td style="text-align: left">sr-RS</td>
<td style="text-align: left">territory alias</td>
</tr>
<tr>
<td style="text-align: left">sh</td>
<td style="text-align: left">sr-Latn</td>
<td style="text-align: left">multiple replacement subtags, 3.1 above</td>
</tr>
<tr>
<td style="text-align: left">sh-Cyrl</td>
<td style="text-align: left">sr-Cyrl</td>
<td style="text-align: left">no replacement with multiple subtags, 3.1 above</td>
</tr>
<tr>
<td style="text-align: left">hy-SU</td>
<td style="text-align: left">hy-AM</td>
<td style="text-align: left">multiple territory values, 4.2 above. <code class="highlighter-rouge">&lt;territoryAlias type="SU" replacement="RU AM AZ BY EE GE KZ KG LV LT MD TJ TM UA UZ" …/&gt;</code></td>
</tr>
</tbody>
</table>
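<p>languageAlias is defined in the file <code class="highlighter-rouge">common/supplemental/supplementalMetadata.xml</code>. The entry <code class="highlighter-rouge">&lt;languageAlias type="sh" replacement="sr_Latn" reason="legacy"/&gt;</code> defines an alias replacement: sh is an alias of sr, so it is replaced, and because the tag has no script subtag, Latn is added as well. In the next row the tag already carries Cyrl, so only the language is replaced with sr. The hy-SU example is a little more involved and relies on the concept of likely subtags, which are defined in <code class="highlighter-rouge">common/supplemental/likelySubtags.xml</code>. The likely subtag entry for hy is <code class="highlighter-rouge">&lt;likelySubtag from="hy" to="hy_Armn_AM"/&gt;</code>, and AM appears in the replacement list for SU, so the result is <code class="highlighter-rouge">hy-AM</code>.</p>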
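<p>The conversion steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the three tables are tiny hand-written excerpts based on the examples in this section rather than data loaded from the CLDR XML files, the input is assumed to be canonical already (step 1 is skipped), and the likely-subtag lookup uses the language alone.</p>

```python
import re

# Hand-written excerpts for illustration only; real data lives in
# supplementalMetadata.xml and likelySubtags.xml.
LANGUAGE_ALIAS = {"sh": "sr_Latn", "cmn": "zh"}
TERRITORY_ALIAS = {
    "CS": ["RS"],
    "SU": ["RU", "AM", "AZ", "BY", "EE", "GE", "KZ", "KG",
           "LV", "LT", "MD", "TJ", "TM", "UA", "UZ"],
}
LIKELY_SUBTAGS = {"hy": "hy_Armn_AM"}


def split_tag(tag):
    """Return (language, script, region, rest) for an already canonical tag."""
    parts = re.split(r"[-_]", tag)
    language = parts.pop(0).lower()
    script = region = None
    if parts and len(parts[0]) == 4 and parts[0].isalpha():
        script = parts.pop(0).title()
    if parts and (len(parts[0]) == 2 and parts[0].isalpha()
                  or len(parts[0]) == 3 and parts[0].isdigit()):
        region = parts.pop(0).upper()
    return language, script, region, [p.lower() for p in parts]


def bcp47_to_cldr(tag):
    language, script, region, rest = split_tag(tag)
    # Step 2: "und" becomes "root" unless script, region or variant is present.
    has_variant = bool(rest) and len(rest[0]) != 1
    if language == "und" and not (script or region or has_variant):
        language = "root"
    # Step 3: language alias; extra subtags are added only if missing.
    if language in LANGUAGE_ALIAS:
        new_language, new_script, new_region, _ = split_tag(LANGUAGE_ALIAS[language])
        language = new_language
        script = script or new_script
        region = region or new_region
    # Step 4: territory alias; prefer the likely territory if it is listed.
    if region in TERRITORY_ALIAS:
        candidates = TERRITORY_ALIAS[region]
        likely = LIKELY_SUBTAGS.get(language)
        likely_region = split_tag(likely)[2] if likely else None
        region = likely_region if likely_region in candidates else candidates[0]
    return "-".join(p for p in [language, script, region, *rest] if p)


for original in ["und", "und-US", "cmn-TW", "sh", "sh-Cyrl", "hy-SU"]:
    print(original, bcp47_to_cldr(original))
```

<p>Run against the rows of the table above, the sketch reproduces the Result column, which is a reasonable sanity check for the simplified logic but not a substitute for a real CLDR implementation such as ICU.</p>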
<h4 id="locale-identifier-on-unix">Locale Identifier on Unix</h4>
<blockquote>
<p>On POSIX platforms such as Unix, Linux and others, locale identifiers are defined by <a href="https://en.wikipedia.org/wiki/ISO_15897">ISO 15897</a>, which is similar to the BCP 47 definition of language tags, but the locale variant modifier is defined differently, and the character set is included as a part of the identifier. It is defined in this format: <code class="highlighter-rouge">[language[_territory][.codeset][@modifier]]</code>. — from wikipedia</p>
</blockquote>
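<p>One major difference from CLDR is that the character encoding is part of the id. The term POSIX locale is equivalent to the C locale, that is, the default behavior that glibc functions follow when no actual locale has been specified. In CLDR the counterpart of the C locale is the root locale, which corresponds to the BCP47 primary language tag <code class="highlighter-rouge">und</code>.</p>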
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Output of the command 'locale' on Ubuntu 16.04
# enumerate all posix locale environment variables.
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
</code></pre></div></div>
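<p>The format <code class="highlighter-rouge">[language[_territory][.codeset][@modifier]]</code> quoted above maps naturally onto a regular expression. The following Python sketch is an illustration only; the function name <code class="highlighter-rouge">parse_posix_locale</code> and the exact character classes are assumptions, not part of any standard API.</p>

```python
import re

# language, then optional _territory, .codeset and @modifier, in that order.
POSIX_LOCALE = re.compile(
    r"^([A-Za-z]+)(?:_([A-Za-z0-9]+))?(?:\.([^@]+))?(?:@(.+))?$"
)


def parse_posix_locale(name):
    """Split a POSIX locale name into its four optional components."""
    match = POSIX_LOCALE.match(name)
    if match is None:
        raise ValueError("not a POSIX locale name: " + name)
    language, territory, codeset, modifier = match.groups()
    return {
        "language": language,
        "territory": territory,
        "codeset": codeset,
        "modifier": modifier,
    }


print(parse_posix_locale("en_US.UTF-8"))
```

<p>Applied to the value <code class="highlighter-rouge">en_US.UTF-8</code> from the listing above, the sketch separates the language <code class="highlighter-rouge">en</code>, the territory <code class="highlighter-rouge">US</code>, and the codeset <code class="highlighter-rouge">UTF-8</code>; components that are absent come back as <code class="highlighter-rouge">None</code>.</p>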
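<p>The <a href="http://pubs.opengroup.org/onlinepubs/007908799/xbd/locale.html#tag_005_002">Locale</a> chapter of The Single Unix Specification, a standard from The Open Group, describes the locale environment variables and locale definition files on Unix platforms in detail. The standard defines fewer environment variables than Ubuntu implements. In C code, the function <code class="highlighter-rouge">setlocale()</code> sets the program's locale. The <a href="http://pubs.opengroup.org/onlinepubs/007908799/xbd/envvar.html">Environment Variables</a> chapter describes which functions each variable affects and is a useful reference.</p>
<p>Locale definition files are source files that must be compiled into binary form with the <code class="highlighter-rouge">localedef</code> command. The sources usually live in <code class="highlighter-rouge">/usr/share/i18n/locales</code> and the compiled binaries in <code class="highlighter-rouge">/usr/lib/locale</code>; see the localedef manpage for details. Both references below describe the locale definition file in detail and largely overlap, but the Base Specification also provides the definition of the POSIX locale. The directory <code class="highlighter-rouge">/usr/share/i18n/charmaps</code> holds the definition files for the various character sets (charsets), that is, the encoding definitions.</p>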
<p>References:</p>
<ul>
<li><a href="http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html">Locale in The Open Group Base Specification</a></li>
<li><a href="http://pubs.opengroup.org/onlinepubs/007908799/xbd/locale.html#tag_005_002">Locale in The Single Unix Specification</a></li>
</ul>
<h4 id="locale-identifier-on-windows-platforms">Locale Identifier on Windows Platforms</h4>
<blockquote>
<p>Windows uses specific language and territory strings. The locale identifier (LCID) for unmanaged code on Microsoft Windows is a number such as 1033 for English (United States) or 1041 for Japanese (Japan). These numbers consist of a language code (lower 10 bits) and culture code (upper bits) and are therefore often written in hexadecimal notation, such as 0x0409 or 0x0411.</p>
</blockquote>
<blockquote>
<p>Starting with Windows Vista, new functions that use BCP 47 locale names have been introduced to replace nearly all LCID-based APIs.</p>
</blockquote>
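<p>The bit layout described in the quote can be checked with a little arithmetic. The Python sketch below follows the quoted description (language code in the lower 10 bits, culture code in the bits above); the function names are invented for this example, and it does not model the full LCID structure documented by Microsoft.</p>

```python
def split_lcid(lcid):
    """Split an LCID into (language code, culture code) as described above."""
    # The lower 10 bits hold the language code; 2 ** 10 == 1024.
    culture_code, language_code = divmod(lcid, 2 ** 10)
    return language_code, culture_code


def lcid_to_hex(lcid):
    """Format an LCID in the usual four-digit hexadecimal notation."""
    return "0x{:04X}".format(lcid)


for lcid in (1033, 1041):
    print(lcid, lcid_to_hex(lcid), split_lcid(lcid))
```

<p>For 1033 (English, United States) this gives the hexadecimal form 0x0409 with language code 9 and culture code 1; for 1041 (Japanese, Japan) it gives 0x0411 with language code 17 and culture code 1.</p>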
<p>References:</p>
<ul>
<li><a href="https://msdn.microsoft.com/en-us/library/ms912047(WinEmbedded.10).aspx">Microsoft Locale ID Chart with decimal equivalents</a></li>
<li><a href="https://msdn.microsoft.com/en-us/library/cc233968.aspx">LCID Structure</a></li>
</ul>
<h4 id="locale-identifier-on-web">Locale Identifier on Web</h4>
<p>The W3C adopts the BCP47 standard.</p>
<blockquote>
<p>Identification of language and locale has a broad range of applications within the World Wide Web. Existing standards which make use of language identification include the xml:lang attribute in <a href="http://w3c.github.io/ltli/#bib-XML10">XML10</a>, the lang and hreflang attributes in <a href="http://w3c.github.io/ltli/#bib-HTML">HTML</a>, the language property in <a href="http://w3c.github.io/ltli/#bib-XSL10">XSL10</a>, and the :lang pseudo-class in CSS <a href="http://w3c.github.io/ltli/#bib-CSS3-SELECTORS">CSS3-SELECTORS</a>. Language tags are also used to identify locales, such as in the Unicode Common Locale Data Repository or “CLDR” project <a href="http://w3c.github.io/ltli/#bib-CLDR">CLDR</a>.</p>
</blockquote>
<blockquote>
<p>Many other W3C and Web-related specifications use language tags:</p>
<ul>
<li>XHTML 1.0 uses language tags in the HTML lang attribute and the XML xml:lang attribute, as well as the hreflang attribute.</li>
<li>HTTP uses language tags in the Accept-Language and Content-Language headers.</li>
<li>SMIL and SVG can use language tags in the switch statement.</li>
<li>CSS and XSL use language tags for detailed style control.</li>
</ul>
<p>Note also that language information can be attached to objects such as images and included audio files.</p>
</blockquote>
<p>References:</p>
<ul>
<li><a href="https://www.w3.org/TR/ltli/">Language Tags and Locale Identifiers for the World Wide Web</a></li>
</ul>
<h3 id="locale-inheritance-and-matching">Locale Inheritance and Matching</h3>
<h4 id="locale-inheritance">Locale Inheritance</h4>
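<p>Locale data is organized primarily around language, so locales that share a language but differ in script or region have most of their data in common. The data is therefore stored incrementally: the bulk lives in the general language locale, and a specific locale stores only what is particular to it. This is locale inheritance. In some special cases a child locale can declare that it does not have a piece of data that is present in a parent locale. Inheritance is also not as simple as stripping the region subtag: the parent of zh_Hant is root, not zh (although the collation of zh_Hant still follows zh). For this reason CLDR defines parentLocale to override the normal inheritance chain; the definition can be found in LDML Part 6: Supplemental.</p>
<p>Default values</p>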
<blockquote>
<p>For identifiers, such as language codes, script codes, region codes, variant codes, types, keywords, currency symbols or currency display names, the default value is the identifier itself whenever no value is found in the root. Thus if there is no display name for the region code ‘QA’ in root, then the display name is simply ‘QA’.</p>
</blockquote>
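<p>Some locale data depends only on the region, such as currency, measurement system, and week conventions. LDML gives an example: fr_US is a locale that does not exist, yet when it is used for currency, the currency symbol follows the language part fr and the currency amount format follows the region part US. If a locale does not specify a region, the region-dependent settings are inferred with <a href="http://www.unicode.org/reports/tr35/#Likely_Subtags">Likely Subtags</a>.</p>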
<p>References:</p>
<ul>
<li><a href="http://www.unicode.org/cldr/charts/latest/supplemental/likely_subtags.html">Likely Subtags</a></li>
</ul>
<h4 id="locale-lookup-fallback">Locale Lookup Fallback</h4>
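<p>Section 4.1.1 of LDML describes Bundle Lookup and Item Lookup. Bundle Lookup searches for a suitable locale in the order zh_CN, zh, default locale, root. Item Lookup does not use the default locale: when the item is not found in the requested language bundle zh, it falls back to root_alias* (root for the language, following aliases wherever possible for the rest). Falling back to the default locale works for displaying messages, and the region can perhaps be derived from the language, but it is unsuitable for collation, text segmentation, and similar services, which is why those use root.</p>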
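<p>The two fallback orders can be written down as a short Python sketch. It is a simplification under explicit assumptions: parents are obtained by plain truncation of the locale id, so parentLocale overrides (such as zh_Hant inheriting from root) and aliases are ignored, and the default locale is assumed to contribute its own truncated parents. All function names are hypothetical.</p>

```python
def truncation_chain(locale_id):
    """Return the locale id followed by the ids left after dropping trailing subtags."""
    parts = locale_id.split("_")
    chain = []
    while parts:
        chain.append("_".join(parts))
        parts.pop()
    return chain


def bundle_lookup_chain(requested, default_locale):
    """Bundle lookup: requested locale, its parents, the default locale, then root."""
    chain = truncation_chain(requested)
    for candidate in truncation_chain(default_locale):
        if candidate not in chain:
            chain.append(candidate)
    chain.append("root")
    return chain


def item_lookup_chain(requested):
    """Item lookup: requested locale and its parents, then root; no default locale."""
    return truncation_chain(requested) + ["root"]


print(bundle_lookup_chain("zh_CN", "en_US"))
print(item_lookup_chain("zh_CN"))
```

<p>With <code class="highlighter-rouge">en_US</code> as the default locale, the bundle chain for <code class="highlighter-rouge">zh_CN</code> is zh_CN, zh, en_US, en, root, whereas the item chain is just zh_CN, zh, root.</p>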
<h4 id="locale-matching">Locale Matching</h4>
<p>RFC4647 defines several concepts used for selecting a locale:</p>
<p>A <strong>language range</strong> is a string similar in structure to a language tag that is used for “identifying sets of language tags that share specific attributes”.</p>
<p>A <strong>language priority list</strong> is a collection of one or more language ranges identifying the user’s language preferences for use in matching. As the name suggests, such lists are normally ordered or weighted according to the user’s preferences. The HTTP <a href="http://w3c.github.io/ltli/#bib-RFC2616">RFC2616</a> Accept-Language <a href="http://w3c.github.io/ltli/#bib-RFC3282">RFC3282</a> header is an example of one kind of language priority list.</p>
<p>A <strong>basic language range</strong> is simply a language tag used to express a language preference. An <strong>extended language range</strong> allows a more expressive set of language preference through the use of a wildcard subtag <code class="highlighter-rouge">*</code>.</p>
<p>Section 4.4, Language Matching, of the CLDR specification describes the corresponding algorithm, and the <a href="https://docs.oracle.com/javase/tutorial/i18n/locale/matching.html">Java Tutorial</a> gives fairly clear examples of how to use these concepts.</p>
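<p>To show how a language priority list is used, here is a minimal Python sketch of the Lookup scheme from RFC4647, in which each range is progressively truncated until it matches an available tag. It is not the CLDR Language Matching algorithm and not the Java API; the function name <code class="highlighter-rouge">lookup</code> and its parameters are chosen for this example only.</p>

```python
def lookup(priority_list, available, default=None):
    """Return the first available tag matched by truncating each language range."""
    by_lower = {tag.lower(): tag for tag in available}  # matching ignores case
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard range matches nothing specific in lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A single-character subtag left at the end (such as "x") is
            # removed together with the subtag that followed it.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


print(lookup(["zh-Hant-CN", "en"], ["en", "zh-Hant", "zh"]))
```

<p>In this example the range <code class="highlighter-rouge">zh-Hant-CN</code> is not available as such, so it is shortened to <code class="highlighter-rouge">zh-Hant</code>, which matches before the lower-priority range <code class="highlighter-rouge">en</code> is even considered.</p>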
<h3 id="cldr-data">CLDR Data</h3>
<p>Numbers: bcp47/number.xml or supplemental/numberingSystems.xml</p>
<p>References:</p>
<ul>
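<li><a href="http://demo.icu-project.org/icu-bin/locexp">ICU Locale Demo</a> probably does not contain everything, but it lets you look up the data for a given locale, which is especially handy when you need the translated names of languages, scripts, or regions.</li>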
</ul>
<p>Leo Jiang · Interpreting CLDR: Locale</p>
<p>Markdown, GFM and Kramdown in Jekyll · 2016-06-22T22:45:00+08:00 · https://leohacker.github.io/programmer/markdown-GFM-kramdown</p>
<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> </h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#markdown-syntax" id="markdown-toc-markdown-syntax">Markdown Syntax</a> <ul>
<li><a href="#header" id="markdown-toc-header">Header</a></li>
<li><a href="#blockquotes" id="markdown-toc-blockquotes">Blockquotes</a></li>
<li><a href="#list" id="markdown-toc-list">List</a></li>
<li><a href="#code-block" id="markdown-toc-code-block">Code block</a></li>
<li><a href="#link" id="markdown-toc-link">Link</a></li>
<li><a href="#styling" id="markdown-toc-styling">Styling</a></li>
<li><a href="#inline-html" id="markdown-toc-inline-html">Inline <abbr title="HyperTextMarkupLanguage">HTML</abbr></a></li>
</ul>
</li>
<li><a href="#gfm" id="markdown-toc-gfm">GFM</a> <ul>
<li><a href="#gfm-code-block" id="markdown-toc-gfm-code-block">GFM Code block</a></li>
<li><a href="#gfm表格" id="markdown-toc-gfm表格">GFM Tables</a></li>
<li><a href="#gfm-autolink" id="markdown-toc-gfm-autolink">GFM Autolink</a></li>
<li><a href="#github专用" id="markdown-toc-github专用">GitHub-only Features</a></li>
</ul>
</li>
<li><a href="#kramdown" id="markdown-toc-kramdown">Kramdown</a> <ul>
<li><a href="#image" id="markdown-toc-image">Image</a></li>
<li><a href="#table" id="markdown-toc-table">Table</a></li>
<li><a href="#math" id="markdown-toc-math">Math</a></li>
<li><a href="#footnotes" id="markdown-toc-footnotes">Footnotes</a></li>
<li><a href="#definition-list" id="markdown-toc-definition-list">Definition List</a></li>
<li><a href="#abbreviations" id="markdown-toc-abbreviations">Abbreviations</a></li>
</ul>
</li>
<li><a href="#jekyll-theme--minimal-mistake" id="markdown-toc-jekyll-theme--minimal-mistake">Jekyll Theme – Minimal Mistakes</a> <ul>
<li><a href="#teaser-image" id="markdown-toc-teaser-image">Teaser Image</a></li>
<li><a href="#utility-classes" id="markdown-toc-utility-classes">Utility Classes</a></li>
</ul>
</li>
<li><a href="#customization-todo" id="markdown-toc-customization-todo">Customization TODO</a></li>
</ul>
</nav>
</aside>
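<p>Wherever Kramdown supports the syntax, the examples in this article are shown as rendered output; see the source of the post for how the syntax is actually written.</p>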
<h2 id="markdown-syntax">Markdown Syntax</h2>
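<p>The basic building blocks of an article are sections, paragraphs, quotations, lists, and, for programmers, code blocks; most articles also embed links, images, and some light formatting. The main features of Markdown include:</p>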
<ul>
<li>inline <abbr title="HyperTextMarkupLanguage">HTML</abbr></li>
<li>automatic paragraphs</li>
<li>headers</li>
<li>blockquotes</li>
<li>lists</li>
<li>code block</li>
<li>links</li>
<li>images</li>
</ul>
<h3 id="header">Header</h3>
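<p>A header is written with one or more <code class="highlighter-rouge">#</code> characters. Paragraphs are separated by a blank line (two consecutive newlines), as in LaTeX; a single newline does not start a new paragraph.
If you really want a hard line break without starting a new paragraph, end the line with two spaces followed by a newline.</p>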
<h3 id="blockquotes">Blockquotes</h3>
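<p>A blockquote uses <code class="highlighter-rouge">&gt;</code> as the paragraph prefix; you can prefix every line or only the first line of the paragraph. Blockquotes can be nested by using several <code class="highlighter-rouge">&gt;</code> characters as the prefix.
Markdown syntax works inside a blockquote, including headers, lists, and code blocks.</p>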
<h3 id="list">List</h3>
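<p>Lists are either unordered or ordered: unordered lists use any of the markers <code class="highlighter-rouge">*+-</code>, and ordered lists use numbers. A list item can span multiple paragraphs.
Blockquotes, code blocks, and other syntax can be used inside a list item.</p>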
<h3 id="code-block">Code block</h3>
<p>Indenting by four spaces produces a code block; Markdown syntax is not interpreted inside a code block.</p>
<h3 id="link">Link</h3>
<p>Markdown links come in two forms, inline and reference, and a link can be a relative path.</p>
<ul>
<li><code class="highlighter-rouge">[Text](real link "optional title")</code></li>
<li><code class="highlighter-rouge">[Text][link definition]</code></li>
<li><code class="highlighter-rouge">[link definition]: real link "title"</code></li>
</ul>
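<p>A Link Definition exists only to make the Markdown file easier to write and process; it does not appear in the <abbr title="HyperTextMarkupLanguage">HTML</abbr> output. A Link Definition may be indented by up to three spaces.
If no name is given for the Link Definition, the link Text itself is used as the name.</p>
<p>To turn a URL or an email address into a link automatically, wrap it in angle brackets.</p>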
<ul>
<li><a href="http://www.google.com">http://www.google.com</a></li>
<li><a href="mailto:username@example.com">username@example.com</a></li>
</ul>
<p>The image syntax is similar to the link syntax.</p>
<ul>
<li><code class="highlighter-rouge">![Text](/path/to/img "optional title")</code></li>
<li><code class="highlighter-rouge">![Text][id]</code></li>
<li><code class="highlighter-rouge">[id]: /path/to/img "title"</code></li>
</ul>
<h3 id="styling">Styling</h3>
<p>Markdown also provides basic formatting syntax:</p>
<ul>
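<li>One <code class="highlighter-rouge">*</code> or <code class="highlighter-rouge">_</code> on each side: italic</li>
<li>Two <code class="highlighter-rouge">*</code> or <code class="highlighter-rouge">_</code> on each side: bold</li>
<li>Three or more <code class="highlighter-rouge">*</code>, <code class="highlighter-rouge">-</code>, or <code class="highlighter-rouge">_</code> on a line by themselves: horizontal rule</li>
<li>` (backtick): inline code</li>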
</ul>
<h3 id="inline-html">Inline <abbr title="HyperTextMarkupLanguage">HTML</abbr></h3>
<p>Markdown is compatible with <abbr title="HyperTextMarkupLanguage">HTML</abbr> syntax, so <abbr title="HyperTextMarkupLanguage">HTML</abbr> can be written directly in a Markdown file, although this is not recommended.
Markdown escapes special characters correctly: in <code class="highlighter-rouge">&amp;copy;</code> the <code class="highlighter-rouge">&amp;</code> character is preserved, which produces © in the <abbr title="HyperTextMarkupLanguage">HTML</abbr> output;
in A &amp; B it is escaped to <code class="highlighter-rouge">&amp;amp;</code>, so the text displays correctly as A &amp; B. <code class="highlighter-rouge">&lt;</code> is a similar special character.</p>
<h2 id="gfm">GFM</h2>
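<p>GFM improves code blocks and links, adds task lists and tables, and, building on GitHub's own features, provides links to issues, pull requests, and commits. GFM does not support footnotes, however. Among its formatting improvements, two <code class="highlighter-rouge">~</code> characters on each side mark strikethrough text.</p>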
<h3 id="gfm-code-block">GFM Code block</h3>
<p>In GFM a code block can be fenced with three backticks, and a language can be specified. See the <a href="https://github.com/github/linguist/blob/master/lib/linguist/languages.yml">languages supported</a> by GFM.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>```bash
git push
```
</code></pre></div></div>
<h3 id="gfm表格">GFM Tables</h3>
<table>
<thead>
<tr>
<th style="text-align: left">Left-aligned</th>
<th style="text-align: center">Center-aligned</th>
<th style="text-align: right">Right-aligned</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">git status</td>
<td style="text-align: center">git status</td>
<td style="text-align: right">git status</td>
</tr>
<tr>
<td style="text-align: left">git diff</td>
<td style="text-align: center">git diff</td>
<td style="text-align: right">git diff</td>
</tr>
</tbody>
</table>
<h3 id="gfm-autolink">GFM Autolink</h3>
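<p>Kramdown supports GFM input, but the autolink feature does not seem to be supported.</p>
<h3 id="github专用">GitHub-only Features</h3>
<p>GFM also provides some features that work only on GitHub. In GFM, a URL is linked automatically even without angle brackets,
for example by writing http://www.google.com directly. It also supports references to issues, pull requests, and commit SHAs,
as well as @mentions of a person or an organization.</p>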
<ul>
<li><code class="highlighter-rouge">#number</code></li>
<li><code class="highlighter-rouge">username#number</code></li>
<li><code class="highlighter-rouge">username/Repository#number</code></li>
<li><code class="highlighter-rouge">username/Repository@SHA</code></li>
<li><code class="highlighter-rouge">@github/support</code></li>
</ul>
<p>Task lists</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- [ ] task description
- [X] completed task
</code></pre></div></div>
<h2 id="kramdown">Kramdown</h2>
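<p>When writing in the Atom editor, I preview with Markdown Preview Plus.
Markdown Preview Plus is a fork of Atom's built-in Markdown Preview and does not fully support Kramdown syntax,
so much Kramdown-specific syntax is not rendered correctly.</p>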
<h3 id="image">Image</h3>
<p>Attributes can be used to set the width and height of an image.</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Here is an inline !<span class="p">[</span><span class="nv">smiley</span><span class="p">](</span><span class="sx">smiley.png</span><span class="p">)</span>{:height="36px" width="36px"}.
And here is a referenced !<span class="p">[</span><span class="nv">smile</span><span class="p">]</span>
<span class="p">[</span><span class="ss">smile</span><span class="p">]:</span> <span class="sx">smile.png</span>
{: height="36px" width="36px"}
</code></pre></div></div>
<h3 id="table">Table</h3>
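<p>Judging by the actual rendering, GitHub does not support this.</p>
<table>
<thead>
<tr>
<th>Default aligned</th>
<th style="text-align: left">Left aligned</th>
<th style="text-align: center">Center aligned</th>
<th style="text-align: right">Right aligned</th>
</tr>
</thead>
<tbody>
<tr>
<td>First body part</td>
<td style="text-align: left">Second cell</td>
<td style="text-align: center">Third cell</td>
<td style="text-align: right">fourth cell</td>
</tr>
<tr>
<td>Second line</td>
<td style="text-align: left">foo</td>
<td style="text-align: center"><strong>strong</strong></td>
<td style="text-align: right">baz</td>
</tr>
<tr>
<td>Third line</td>
<td style="text-align: left">quux</td>
<td style="text-align: center">baz</td>
<td style="text-align: right">bar</td>
</tr>
</tbody>
<tbody>
<tr>
<td>Second body</td>
<td style="text-align: left"> </td>
<td style="text-align: center"> </td>
<td style="text-align: right"> </td>
</tr>
<tr>
<td>2 line</td>
<td style="text-align: left"> </td>
<td style="text-align: center"> </td>
<td style="text-align: right"> </td>
</tr>
</tbody>
<tfoot>
<tr>
<td>Footer row</td>
<td style="text-align: left"> </td>
<td style="text-align: center"> </td>
<td style="text-align: right"> </td>
</tr>
</tfoot>
</table>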
<h3 id="math">Math</h3>
<p>An inline formula: <script type="math/tex">a_i</script></p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
& \phi(x,y) = \phi \left(\sum_{i=1}^n x_ie_i, \sum_{j=1}^n y_je_j \right)
= \sum_{i=1}^n \sum_{j=1}^n x_i y_j \phi(e_i, e_j) = \\
& (x_1, \ldots, x_n) \left( \begin{array}{ccc}
\phi(e_1, e_1) & \cdots & \phi(e_1, e_n) \\
\vdots & \ddots & \vdots \\
\phi(e_n, e_1) & \cdots & \phi(e_n, e_n)
\end{array} \right)
\left( \begin{array}{c}
y_1 \\
\vdots \\
y_n
\end{array} \right)
\end{align*} %]]></script>
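<p>MathJax places formatting requirements on the generated <abbr title="HyperTextMarkupLanguage">HTML</abbr> (the line breaks inside CDATA), so <abbr title="HyperTextMarkupLanguage">HTML</abbr> compression cannot be used. When designing or using a Jekyll theme, this means Jekyll-Compress-<abbr title="HyperTextMarkupLanguage">HTML</abbr> cannot be used.</p>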
<h3 id="footnotes">Footnotes</h3>
<p>This is some text.<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> Other text.<sup id="fnref:footnote"><a href="#fn:footnote" class="footnote">2</a></sup></p>
<h3 id="definition-list">Definition List</h3>
<p>The result looks like this:</p>
<dl>
<dt>ES6/ES2015</dt>
<dd>The new version of the popular JavaScript language</dd>
</dl>
<h3 id="abbreviations">Abbreviations</h3>
<p>This is some text not written in <abbr title="HyperTextMarkupLanguage">HTML</abbr> but in <abbr title="It's called Markdown">another language</abbr>!</p>
<h2 id="jekyll-theme--minimal-mistake">Jekyll Theme – Minimal Mistakes</h2>
<h3 id="teaser-image">Teaser Image</h3>
<p>A default <code class="highlighter-rouge">teaser: "500x300.png"</code> can be set in _config.yml, and a per-post teaser can be set in the Front Matter:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>header:
  teaser: my-awesome-post-teaser.jpg
</code></pre></div></div>
<h3 id="utility-classes">Utility Classes</h3>
<p>https://mmistakes.github.io/minimal-mistakes/docs/utility-classes/</p>
<h2 id="customization-todo">Customization TODO</h2>
<ul>
<li>FancyBox for displaying images http://fancyapps.com/fancybox/</li>
<li>Github Syntax Highlight https://github.com/mojombo/tpw/blob/master/css/syntax.css</li>
</ul>
<p>Images
… which is shown in the screenshot below:
<img src="https://leohacker.github.io/assets/screenshot.jpg" alt="My helpful screenshot" /></p>
<p>Download
… you can <a href="https://leohacker.github.io/assets/mydoc.pdf">get the PDF</a> directly.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>First footnote <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:footnote">
<p>Second footnote <a href="#fnref:footnote" class="reversefootnote">↩</a></p>
</li>
</ol>
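</div>
<p>Leo Jiang · Notes on standard Markdown syntax, the GFM extensions, Kramdown syntax, and the Liquid tags used when writing in Jekyll.</p>
<p>Java on MacOSX · 2014-11-26T11:19:00+08:00 · https://leohacker.github.io/macosx/java/java-on-macosx</p>
<h2 id="java-on-macosx">Java on MacOSX</h2>
<p>Starting with some release, MacOSX no longer installs a Java virtual machine by default, and naturally no JDK either. Even the 2014 edition of Java released by Apple contains only Java 1.6, and Apple will probably not release newer versions.</p>
<p>So we need to install Oracle's JDK. Managing multiple Java versions on MacOSX used to be painful for me: lots of symbolic links and several Library directories in different places. For someone who does not develop for MacOS and does not know what <code class="highlighter-rouge">/System/Library/Frameworks</code>, <code class="highlighter-rouge">/System/Library/Java</code>, and <code class="highlighter-rouge">/Library/Java/JavaVirtualMachines</code> mean, it is a quagmire in which the relationships are hard to untangle.</p>
<p>After the recent upgrade to Yosemite I looked at the Java environment again, and it really is clean now. The system removed the many symbolic links to old versions that Apple used to install and keep around, so <code class="highlighter-rouge">/System/Library/Frameworks/JavaVM.framework</code> now keeps only the Java 1.8 I upgraded to. A framework on MacOSX plays a role similar to a shared library, but it can also contain non-code files such as documentation and resources. Each framework can contain multiple versions; there used to be many old Java versions here pointing at the <code class="highlighter-rouge">Versions/CurrentJDK</code> directory and using the current version's virtual machine environment. Oracle's JDK 1.8 and the earlier Java 1.6 are both installed under <code class="highlighter-rouge">/Library/Java/JavaVirtualMachines</code>, while <code class="highlighter-rouge">/System/Library/Java</code> contains almost nothing.</p>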
<h2 id="library-on-macosx">Library on MacOSX</h2>
<p>According to the official document <a href="https://developer.apple.com/library/mac/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/FileSystemOverview/FileSystemOverview.html">File System Programming Guide</a>, Library directories exist at several levels. Each user's home directory has a Library that stores user-specific data such as Preferences. The JDK is stored in <code class="highlighter-rouge">/Library/Java/JavaVirtualMachines</code> and <code class="highlighter-rouge">/System/Library/Frameworks</code>; the difference is that <code class="highlighter-rouge">/System/Library/Frameworks</code> holds the framework, while <code class="highlighter-rouge">/Library/Java/JavaVirtualMachines</code> holds JDK 1.8 itself.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The Library directory is where apps and other code modules store their custom data files. Regardless of whether you are writing code for iOS or OS X, understanding the structure of the Library directory is important. You use this directory to store data files, caches, resources, preferences, and even user data in some specific situations.
There are several Library directories throughout the system but only a few that your code should ever need to access:
Library in the current home directory—This is the version of the directory you use the most because it is the one that contains all user-specific files. In iOS, Library is placed inside the apps data bundle. In OS X, it is the app’s sandbox directory or the current user’s home directory (if the app is not in a sandbox).
/Library (OS X only)—Apps that share resources between users store those resources in this version of the Library directory. Sandboxed apps are not permitted to use this directory.
/System/Library (OS X only)—This directory is reserved for use by Apple.
</code></pre></div></div>
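<p>To check which JDK bundles actually sit in each of these Library locations, a small Java program can list them. This is only a minimal sketch: the class name <code class="highlighter-rouge">ListJdkBundles</code> and the helper <code class="highlighter-rouge">jdkBundles</code> are hypothetical, and it assumes that JDKs are installed as <code class="highlighter-rouge">*.jdk</code> bundles in the two directories named in this post. A directory that does not exist simply yields an empty list.</p>

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListJdkBundles {
    // Returns the sorted names of *.jdk entries directly under dir.
    // A missing directory yields an empty list instead of an error.
    static List<String> jdkBundles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return names;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.jdk")) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: these are the two install locations discussed in this post.
        String[] roots = {
            "/Library/Java/JavaVirtualMachines",
            "/System/Library/Java/JavaVirtualMachines"
        };
        for (String root : roots) {
            System.out.println(root + " -> " + jdkBundles(Paths.get(root)));
        }
        // The JVM running this program reports its own home directory.
        System.out.println("java.home = " + System.getProperty("java.home"));
    }
}
```

<p>Comparing the printed <code class="highlighter-rouge">java.home</code> with the listed bundles shows which installation the current JVM came from.</p>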
<h2 id="intellij-idea-still-use-jdk16">IntelliJ IDEA still uses JDK1.6</h2>
<p>After upgrading the system JDK I ran into a problem: IntelliJ IDEA would not start, because it still insists on JDK 1.6. Stack Overflow offers two fixes: edit the plist, or install JDK 1.6.</p>
<p>When an application requires a specific Java version, we can find Info.plist in <code class="highlighter-rouge">/Applications/the-application.app/Contents</code> and change the version given for <code class="highlighter-rouge">JVMVersion</code>.</p>
<p>We can download Apple's official JDK 1.6, <a href="http://support.apple.com/kb/dl1572">Java for OSX 2014-001</a>, or Oracle's official release, <a href="http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html">Oracle JDK1.6 Download</a>. After installing Apple JDK 1.6, <code class="highlighter-rouge">/System/Library/Frameworks/JavaVM.framework/Versions</code> looks like this:</p>
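<p>The plist edit can be sketched in Java as a plain text substitution. Everything here is an assumption made for illustration: the class <code class="highlighter-rouge">PlistJvmVersion</code>, its two helpers, and the sample values <code class="highlighter-rouge">1.6*</code> and <code class="highlighter-rouge">1.8*</code> are hypothetical, and the sketch only handles an XML-format plist in which the <code class="highlighter-rouge">JVMVersion</code> key is followed directly by a string value. It works on a string rather than on a real application's file.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlistJvmVersion {
    // Matches <key>JVMVersion</key> followed by its <string> value (group 1).
    private static final Pattern JVM_VERSION =
        Pattern.compile("<key>JVMVersion</key>\\s*<string>([^<]*)</string>");

    // Returns the current JVMVersion value, or null if the key is absent.
    static String readJvmVersion(String plistXml) {
        Matcher m = JVM_VERSION.matcher(plistXml);
        return m.find() ? m.group(1) : null;
    }

    // Returns a copy of the plist text with the JVMVersion value replaced.
    // The input is returned unchanged when the key is absent.
    static String withJvmVersion(String plistXml, String newVersion) {
        Matcher m = JVM_VERSION.matcher(plistXml);
        if (!m.find()) {
            return plistXml;
        }
        return plistXml.substring(0, m.start(1)) + newVersion + plistXml.substring(m.end(1));
    }

    public static void main(String[] args) {
        // Hypothetical fragment of an Info.plist; real files contain many more keys.
        String sample = "<dict>\n"
            + "  <key>JVMVersion</key>\n"
            + "  <string>1.6*</string>\n"
            + "</dict>\n";
        System.out.println("before: " + readJvmVersion(sample));
        System.out.println("after:  " + readJvmVersion(withJvmVersion(sample, "1.8*")));
    }
}
```

<p>In practice it is safer to back up Info.plist before changing it, since an application update may overwrite the file anyway.</p>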
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lrwxr-xr-x 1 root wheel 10B Nov 26 14:45 1.4 -> CurrentJDK
lrwxr-xr-x 1 root wheel 10B Nov 26 14:45 1.4.2 -> CurrentJDK
lrwxr-xr-x 1 root wheel 10B Nov 26 14:45 1.5 -> CurrentJDK
lrwxr-xr-x 1 root wheel 10B Nov 26 14:45 1.5.0 -> CurrentJDK
lrwxr-xr-x 1 root wheel 10B Nov 26 14:45 1.6 -> CurrentJDK
lrwxr-xr-x 1 root wheel 10B Nov 26 14:45 1.6.0 -> CurrentJDK
drwxr-xr-x 7 root wheel 238B Nov 26 14:45 A
lrwxr-xr-x 1 root wheel 1B Nov 26 14:45 Current -> A
lrwxr-xr-x 1 root wheel 59B Nov 26 14:45 CurrentJDK -> /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents
</code></pre></div></div>
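<p>All those symbolic links are back, but this time we have some experience and can easily tell JDK 1.6 and JDK 1.8 apart. We can now summarize how Apple and Oracle install their JDKs:</p>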
<ul>
<li><code class="highlighter-rouge">/System/Library/Java/JavaVirtualMachines/1.6.0.jdk</code> Apple installs under the system-level Library directory.</li>
<li><code class="highlighter-rouge">/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk</code> Oracle installs under the application-level Library directory.</li>
<li><code class="highlighter-rouge">/System/Library/Frameworks/JavaVM.framework/Versions</code> contains the framework support for both versions.</li>
</ul>
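<p>To see where the version symlinks in the framework directory point, the links can be resolved from Java. This is a minimal sketch under stated assumptions: the class <code class="highlighter-rouge">ShowVersionLinks</code> and the helper <code class="highlighter-rouge">symlinkTargets</code> are hypothetical names, and the default directory is the <code class="highlighter-rouge">Versions</code> path from this post. Another directory can be passed as the first argument.</p>

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class ShowVersionLinks {
    // Maps each symbolic link directly under dir to its (unresolved) target.
    // Regular files and directories are skipped; a missing dir yields an empty map.
    static Map<String, String> symlinkTargets(Path dir) throws IOException {
        Map<String, String> targets = new TreeMap<>();
        if (!Files.isDirectory(dir)) {
            return targets;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isSymbolicLink(entry)) {
                    targets.put(entry.getFileName().toString(),
                                Files.readSymbolicLink(entry).toString());
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default to the framework Versions directory named in this post.
        String dir = args.length > 0
            ? args[0]
            : "/System/Library/Frameworks/JavaVM.framework/Versions";
        for (Map.Entry<String, String> link : symlinkTargets(Paths.get(dir)).entrySet()) {
            System.out.println(link.getKey() + " -> " + link.getValue());
        }
    }
}
```

<p>The target is printed as stored in the link, so a relative target such as <code class="highlighter-rouge">CurrentJDK</code> is shown unchanged rather than expanded to an absolute path.</p>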
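<p>We can still use /usr/libexec/java_home to find the HOME directory of the system's default JDK. Tools such as Eclipse will also discover the multiple JDK versions installed on the system.</p>Leo JiangWhen multiple versions of Java are on MacOSX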