zsh + tmux + vim = awesome

I have frequently switched between working in IDEs like IntelliJ and PyCharm and working on the plain old command line. While I usually settled on bash and vim, I realized that there is a lot of cool tweaking you can do to make the overall development experience much more productive and awesome.

zsh is awesome!

First of all, let's start with the shell. I used to just go with bash, until I saw that zsh is way more loaded (for instance, cycling through completions when you cd, spelling-correction suggestions, … I am still discovering all the cool stuff, but here's a great HN discussion). So I switched to zsh as my default shell. I also got a cool theme by using the oh-my-zsh framework – which you should definitely check out! The default 'robbyrussell' theme was good enough for me; I didn't want anything too fancy that required me to add powerline fonts or color schemes. Here's what my shell prompt looks like. It prints out my directory, followed by which git branch I am currently on, which is super useful. There's other stuff you can add on the right, like your battery status, if you're interested.
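
If you want to try the same setup, this is roughly all it takes (the install one-liner is the one from the oh-my-zsh README as I remember it, so do double-check it against the current instructions before piping anything into sh):

chsh -s $(which zsh)   # make zsh your login shell
sh -c "$(curl -fsSL https://raw.githubusercontent.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"
# then pick a theme in ~/.zshrc:
ZSH_THEME="robbyrussell"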

[Screenshot: my shell prompt, showing the current directory and git branch]

Making Vim more awesome!

While I definitely do not want to start another flame war on vim vs emacs, here's my take. I used emacs initially and I really loved it. Being able to run a terminal and gdb from within the editor without doing anything special was super convenient. But a lot of my work requires ssh-ing to a bunch of different servers, and most of them don't come with emacs installed. I figured that if I was using vim most of the time, I might as well use it all of the time, so that the keystrokes would come quicker.

I have installed some really useful plugins and remapped some of the keys, and it has made working in vim a lot more efficient. I'll include my vimrc at the bottom of the post. I use Vundle for plugin management. I use the gruvbox dark theme, because it looks nice and works right out of the box without needing to install new palettes. I found a lot of the plugins by looking at http://vimawesome.com/. So far I have tuned this setup for working on C++; I am still tweaking vim for Python and Clojure. Basically, with the right set of plugins, your vim is as powerful as, and much more lightweight than, any IDE – IntelliJ used to take a crazy amount of memory on OS X. I remapped C-d to easily switch between vim and the shell, and also have a mapping that keeps the current search match centered on the screen. I am using a minimal set of plugins – NERDTree for directory navigation, Tagbar for quickly capturing the file structure (needs ctags installed), Airline for a nice status line at the bottom, Syntastic, which quickly highlights any compile-time errors much like an IDE, and a couple of others. I am yet to try YouCompleteMe for code completion, but it's definitely at the top of my list.

One super useful feature in vim is that it allows opening files in different tabs, something I only discovered recently. Simply do :tabe <file> and the file will be opened in a new tab. You can cycle between tabs using gT, or jump directly to tab i using {i}gT. I installed Tabline, a nice plugin that adds a number to every tab title, making it much easier to jump to tabs directly.

Here’s the final effect:

[Screenshot: the final vim setup, with NERDTree, Tagbar and Airline in action]

tmux saves the day!

Finally, tmux! I was initially quite scared of using tmux – I thought it was too complicated for a noob like me, but it turns out there are really very few keystrokes with which you can do most of the useful stuff. I prefer tmux for splitting panes and such because it works on both Linux and OS X, unlike iTerm.

The biggest advantage of tmux is definitely that it keeps all your sessions alive, exactly as you left them. No more worrying when you accidentally drop off the wifi, or snap your MacBook shut. I typically find it very useful when building a large project: I can come back the next day and check on the build status by reattaching my session, or reattach and continue my debugging where I left off during the commute.
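
If you are tmux-curious, this is genuinely most of what I use day to day (the default prefix is C-b):

tmux new -s build     # start a new named session
tmux ls               # list running sessions, e.g. after ssh-ing back in
tmux attach -t build  # reattach exactly where you left off
# C-b d detaches, C-b % splits the pane vertically, C-b " horizontally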

There are some other cool projects like tmuxinator, which will actually fire up various sessions for you, tile your panes, and keep everything ready for you each day! I haven't had a chance to look at it so far, but it's definitely something I want to try out.


All in all, I am definitely more productive with all these tweaks. If there are any plugins or tools that have served you well, I would be very excited to try those out, so do let me know. Finally, here's my vimrc:


set nocompatible " be iMproved, required; conventionally the first line of the vimrc
set backspace=indent,eol,start
filetype off " required
" set the runtime path to include Vundle and initialize
set rtp+=~/.vim/bundle/Vundle.vim
call vundle#begin()
" alternatively, pass a path where Vundle should install plugins
"call vundle#begin('~/some/path/here')
" The following are examples of different formats supported.
" Keep Plugin commands between vundle#begin/end.
" let Vundle manage Vundle, required
Plugin 'VundleVim/Vundle.vim'
Plugin 'majutsushi/tagbar'
Plugin 'morhetz/gruvbox'
Plugin 'scrooloose/nerdtree'
Plugin 'scrooloose/syntastic' "Syntax checking hacks for vim, displays errors on left after save
Plugin 'luochen1990/rainbow'
Plugin 'terryma/vim-multiple-cursors' "Sublime-style multiple text selection
Plugin 'Raimondi/delimitMate' "Auto-close quotes, parentheses, brackets, etc.
Plugin 'mkitt/tabline.vim'
Plugin 'bling/vim-airline'
" All of your Plugins must be added before the following line
call vundle#end() " required
filetype plugin indent on " required
" To ignore plugin indent changes, instead use:
"filetype plugin on
"
" Brief help
" :PluginList - lists configured plugins
" :PluginInstall - installs plugins; append `!` to update or just :PluginUpdate
" :PluginSearch foo - searches for foo; append `!` to refresh local cache
" :PluginClean - confirms removal of unused plugins; append `!` to auto-approve removal
"
" see :h vundle for more details or wiki for FAQ
" Put your non-Plugin stuff after this line
nmap <F8> :TagbarToggle<CR>
nmap <F10> :NERDTreeToggle<CR>
nmap <C-d> :sh<CR>
" emacs binding for begin and end
cnoremap <c-a> <home>
cnoremap <c-e> <end>
" Keep search matches in the middle of the window.
nnoremap n nzzzv
nnoremap N Nzzzv
set background=dark
colorscheme gruvbox
let g:rainbow_active = 1
syntax on
set laststatus=2
set statusline=%F
set cursorline
set tags+=$OSCAR_HOME/tags
set number
"Use spaces instead of tabs
set tabstop=2 softtabstop=2 expandtab shiftwidth=2


Tech Book for this month: Computer Architecture

It's now a year since I left grad school and almost a year into my first job. One good part about grad school was that it automatically meant doing a lot of reading and learning every week, but it's difficult to maintain that routine once you are out of the system. So early in 2016, I decided to set aside some time and try reading a new technical book every month. I will start with the books that have been sitting on my bookshelf for some time.

I'll be starting this month with Computer Architecture: A Quantitative Approach. To make sure I do, I have placed it right next to my cosy futon!

[Photo: the book ready for me to grab!]


Segmentation and Paging in Linux

The goal of both paging and segmentation is ultimately the same – to provide a way to translate virtual or logical addresses into physical addresses that actually exist in RAM.

Segmentation

Segmentation is not really used as the main address translation technique in Linux anymore; it is kept mainly as a relic of the past.

There is a segment register for the code segment and the data segment in the program, as well as a bunch of other segments which we won't discuss. Each of these segments is described by a segment descriptor, which is an 8-byte entity. There is obviously a bunch of stuff in each segment descriptor, but it mainly holds the base address of the segment, its bounds, and so on. The list of all the segment descriptors is maintained by the kernel in the Global Descriptor Table (GDT). There is one GDT per processor. The kernel can also use a per-process descriptor table called the Local Descriptor Table (LDT), but this is rarely used.

So how do we get to the segment descriptors? Each segment register stores a segment selector. A segment selector is a 16-bit entity. The 2 least significant bits hold the Requested Privilege Level (RPL), so it can represent 4 different privilege levels. Privilege levels 0-2 are all considered kernel mode, whereas level 3 is user mode; this provides an easy way to prevent user mode from performing kernel-mode operations. The next bit indicates whether we should be using the GDT or the LDT. Finally, the remaining 13 most significant bits are an index into the GDT (or LDT) for the segment descriptor corresponding to the segment. In practice though, it isn't necessary to access the GDT each time we need to look up a segment descriptor. Whenever there is a context switch, the kernel will initialize the segment registers with the corresponding segment selectors for the new process. At the same time, corresponding non-programmable registers store the segment descriptor for each segment. Thus the GDT needs to be looked up just once, during the context switch.
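
To make that layout concrete, here's a tiny C sketch (mine, not from any kernel source) that pulls the three fields out of a selector – the selector value itself is made up:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t selector = 0x002B;            /* a made-up selector value */
    unsigned rpl   = selector & 0x3;       /* bits 0-1: Requested Privilege Level */
    unsigned table = (selector >> 2) & 1;  /* bit 2: 0 = GDT, 1 = LDT */
    unsigned index = selector >> 3;        /* bits 3-15: index into the descriptor table */
    printf("RPL=%u table=%s index=%u\n", rpl, table ? "LDT" : "GDT", index);
    return 0;
}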


Paging

Paging in Linux provides a means to translate virtual page numbers (VPNs) to the corresponding physical page numbers (PPNs). The typical way to do this is to maintain a page table, which simply provides a mapping from each VPN to the corresponding PPN. Unfortunately, even for 32-bit architectures with 4 KB pages, a flat table would mean having 2^20 entries per process in main memory.

The CR3 register stores the base address of the page table. Obviously, since each process has a different VPN -> PPN mapping, there is a separate page table for each process. Whenever there is a context switch, the kernel saves the page table base for the old process (it lives in the mm_struct, reachable from the task_struct) and loads CR3 with the one for the new process.

Thus, in Linux, the page table is usually multi-level. First let's talk about 32-bit architectures. In this case, CR3 points to a page directory, and the page directory entries in turn point to page tables. Each virtual address is thus split into 3 parts: the 10 most significant bits are an index into the page directory, the next 10 bits are an index into the page table, and the last 12 bits are the offset within the page. On 64-bit architectures, there are even more levels of indirection.
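
Here's a small C sketch of how the three pieces fall out of a 32-bit virtual address (the address is made up; note that 10 + 10 + 12 = 32):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t vaddr = 0x08048F2C;               /* a made-up virtual address */
    uint32_t pd_index = vaddr >> 22;           /* top 10 bits: page directory index */
    uint32_t pt_index = (vaddr >> 12) & 0x3FF; /* next 10 bits: page table index */
    uint32_t offset   = vaddr & 0xFFF;         /* low 12 bits: offset within the 4 KB page */
    printf("pd=%u pt=%u offset=0x%03x\n", pd_index, pt_index, offset);
    return 0;
}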


Static vs Shared libraries and Dynamic Linking

What is a (static) library?

A static library is essentially an archive of object files, obtained by pre-compiling several source files together. These source files offer common functionality that may be re-used by a lot of other source files. Instead of compiling them over and over, alongside every source file that requires them, it is more efficient to pre-compile them once, ready to be linked into whichever program needs them. Static libraries end with a ".a" extension.

How does the compiler know that a particular static library must be linked to some program?

Well, this is specified by using the -l option with gcc. For instance, if your hello.c program needs to use the libmath.a library, then it would be compiled as

gcc -c hello.c

gcc -o hello hello.o -lmath

Also, if the libmath.a library isn't located in a standard path like /usr/lib, then it is necessary to specify the path at which it may be found, using the -L switch with gcc.
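
And in case you were wondering how libmath.a gets created in the first place, it's just an ar invocation away (mathutils.c is a made-up source file here):

gcc -c mathutils.c              # produces mathutils.o
ar rcs libmath.a mathutils.o    # archive the object file(s) into a static library
gcc -o hello hello.o -L. -lmath # link against it; -L. adds the current dir to the search path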

Ok, so what are shared libraries then?

So static libraries are not bad. They keep commonly used code pre-compiled and ready to use, so we save on compile time. But there may be more than one program using the libmath library, and each such program simply links in its own copy. This means that multiple programs running in memory may hold multiple copies of the same library, which clearly wastes space. What we want is a way of sharing the same library amongst the several programs that may be resident in memory at the same time. Shared libraries offer this solution. They typically end with the ".so" extension.

Well then, how do shared libraries work?

First of all, shared libraries are usually compiled with the -fPIC option (position-independent code) – this makes it possible to place them at any virtual memory location at runtime.
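
Building one looks almost the same as the static case, with -fPIC at compile time and -shared at link time (mathutils.c is again a made-up source file):

gcc -fPIC -c mathutils.c              # compile as position-independent code
gcc -shared -o libmath.so mathutils.o # link it into a shared library
gcc -o hello hello.o -L. -lmath       # the linker prefers libmath.so over libmath.a
LD_LIBRARY_PATH=. ./hello             # tell the dynamic linker where to find it at run time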

Executables (which are usually in the ELF format, btw) that need to use shared libraries aren't directly linked to them; instead, the ELF file records which shared libraries the executable will require.

It is then the job of the dynamic linker to locate these libraries at run time (if some other program is already using them, they might already be in memory; otherwise they will need to be loaded) and perform all the symbol resolution. Again, if the libraries aren't in some standard location like /usr/lib, the path needs to be specified by setting the LD_LIBRARY_PATH variable. Shared libraries are pretty much the norm in most use cases today, with static libraries rarely used.

Some useful tools while working with shared libraries?

One very useful tool is ldd. Run it as ldd <name of your executable>. This will tell you all the shared libraries that the executable uses, and also the paths to which they resolve. Often useful in figuring out some weird linking error.

ldconfig is another tool. It keeps track of all the possible directories in which shared libraries may be located. The paths can be specified in /etc/ld.so.conf, and it also searches standard paths like /usr/lib. It then creates a list of all the different libraries in /etc/ld.so.cache. This helps the dynamic linker resolve paths quickly without wasting time searching all the possible directories.
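
For instance (libmath being the made-up library from earlier):

sudo ldconfig              # rebuild /etc/ld.so.cache after installing a library
ldconfig -p | grep libmath # check whether the cache knows about it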

And finally what is dynamic linking?

Not going into it much here, but essentially, instead of loading all the shared libraries that a program depends on upfront, this is a way of lazily loading a library only when it is required. The dlopen API is provided for achieving this.
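
Here's a minimal C sketch of the dlopen dance – the library name and its square function are made up for illustration, and you'd link this with -ldl:

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    /* open the library; RTLD_LAZY defers symbol resolution until first use */
    void *handle = dlopen("./libmath.so", RTLD_LAZY);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* look up a symbol by name; the cast is the usual POSIX contortion */
    double (*square)(double) = (double (*)(double)) dlsym(handle, "square");
    if (!square) { fprintf(stderr, "%s\n", dlerror()); dlclose(handle); return 1; }

    printf("square(3.0) = %f\n", square(3.0));
    dlclose(handle);
    return 0;
}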

Dynamic programming via memoization

I personally find it much easier to make a recursive function efficient by caching previous solutions, a.k.a. the top-down approach, than to build up the solution iteratively, a.k.a. the bottom-up approach.

There are so many interesting resources that go through some cool problems to which DP can be applied. The most interesting ones I found are the MIT 6.006 series on YouTube by Erik Demaine. There, Erik discusses some nice basic properties of DP, such as it being equivalent to a DAG traversal, and then goes on to apply it to some pretty unconventional problems like playing the guitar or blackjack. Here's the link: Video Lectures. They focus mainly on the bottom-up approach.

Another compendium of some classic DP problems is DP-Practice. Here again, the flash animations mainly focus on the bottom-up approach.

In this blog post I’ll focus on some of the main techniques/ gotchas that I’ve found useful while using the top-down / memoization approach on recursive solutions.

 

  • If you are planning to apply memoization to your recursive function, make sure that it returns a value (the answer to the subproblem). Often recursive functions are void, with results accumulated in some shared state instead, and it takes much more effort to convert such recursive functions to memoized DPs.
  • Make sure that your recursive solution is solving sub-problems, i.e. reducing the state space across all the parameters. While writing recursive solutions, just because the approach is already very "brute-forcey", it is easy to carry along state that is unnecessary. Make sure to consistently prune the state at every call, and pass strictly the state that is required for that particular call. If you skip doing this, it will be impossible to memoize your solution.
  • Remember to pass in only the "independent" state and not the "dependent" state as parameters to your recursive function. For instance, let's say you have a problem where you want to check if a string s1 is formed by some interleaving of strings s2 and s3. One possible state you could pass to the recursive function is the set {i, j, k} of indices into s1, s2 and s3 where you are currently positioned. However, notice that in this case the variable i is not really required; it is dependent on j and k – basically i = j + k in any valid solution. Thus you really need to pass in just the 2 variables {j, k}.
  • Once you have decided the minimum set of parameters that make up your state space, the memoization is just about adding an array or hashmap that stores, for each valid combination of the parameters, the solution that the function computed. Here's why the previous step is important: the space complexity of your solution grows with the number of parameters.
  • Now that you have a cache in place for storing solutions, you just need to modify your function (at the logical level; there may of course be some other simple code changes you'll need to make) so that (see the C sketch after this list):
    • on entering the function, you check whether the particular combination of parameters already has a solution in the cache; if so, simply return that solution.
    • if not, continue the rest of the recursive function as before, but just before returning the result, make sure to store it in the cache against the current set of parameter values.
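
To make the recipe concrete, here's a minimal C sketch of it applied to the interleaving example above (the strings, the MAXN bound and the global buffers are all just for illustration):

#include <stdio.h>
#include <string.h>

/* Memoized check: is s1 an interleaving of s2 and s3?
 * The state is just {j, k} (indices into s2 and s3); i = j + k is implied. */
#define MAXN 100
static char s1[MAXN], s2[MAXN], s3[MAXN];
static int cache[MAXN][MAXN]; /* -1 = unknown, 0 = no, 1 = yes */

static int interleave(int j, int k) {
    if (cache[j][k] != -1) return cache[j][k];  /* cache hit: return immediately */
    int ans;
    if (s2[j] == '\0' && s3[k] == '\0')
        ans = (s1[j + k] == '\0');              /* all three strings consumed together */
    else
        ans = (s2[j] != '\0' && s2[j] == s1[j + k] && interleave(j + 1, k)) ||
              (s3[k] != '\0' && s3[k] == s1[j + k] && interleave(j, k + 1));
    return cache[j][k] = ans;                   /* store before returning */
}

int main(void) {
    strcpy(s1, "aabcc");
    strcpy(s2, "abc");
    strcpy(s3, "ac");
    memset(cache, -1, sizeof cache);            /* mark every {j, k} as not yet computed */
    printf("%s\n", interleave(0, 0) ? "yes" : "no");
    return 0;
}

Without the cache this recursion is exponential; with it, each {j, k} pair is computed once, so the whole thing is O(|s2| * |s3|).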

And that should be it. This is usually enough to quickly convert that CPU-hogging sloth of a recursive function into its much slicker DP cousin!

Of course a lot of caveats apply, and this is just the tip of the iceberg. Top-down approaches may still not be slick enough to pass the time limits in some fancy competitive programming contest. But if you just want to be happy and satisfied about owning a toolkit to cook up efficient solutions to DP problems, this is definitely not bad.

Great Engineers

Here is my (possibly wrong and definitely still a WIP) opinion on what makes engineers, and in particular programmers, really really good.

  1. They are infinitely curious.

    • Can we do better than this? (They are always asking this)
    • What was the reason for a particular design decision?
    • Why did changing the code in this way make the bug disappear? (In particular, they are never satisfied that a fix makes the bug go away until they have really reached the root cause.)
    • Under what conditions would my code break or behave unexpectedly?
    • Most importantly, they never blindly trust any assumptions about their code, but verify them before proceeding.
  2. They are fearless:

    • They are never too afraid or too lazy to dive deep into any complex code to make sure they understand what it does (see point 1).
    • They aren't afraid of making potentially large-impact changes to a code base, if it will make it better in the long run.


man page of the week: mincore

There's always some nifty command you can find lurking in the Linux manpages, which then helps you instantly and magically do whatever you were trying to do. The caveat is that to do the trick, you sometimes need to remember some elusive combination of options. For instance, I have completely forgotten the neat combination of options I used with iostat during my intern project, which fetched me just the right level of detail for checking read/write performance.

But before I ramble too far, here's the point: whenever I can remember, I will let some neat manpage enjoy some limelight on this blog, as a tribute to all the times manpages have saved my life :).

So here I present mincore. The syscall that lets you check whether a page is in memory or will need to be fetched from the disk! No magic combinations for this one so far! Go man it now!
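
Here's a minimal C sketch of it in action (the four-page anonymous mapping is just for illustration):

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 4 * page;

    /* map four anonymous pages; nothing is resident until it is touched */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    buf[0] = 'x'; /* touch only the first page */

    unsigned char vec[4]; /* one status byte per page; the LSB means "resident" */
    if (mincore(buf, len, vec) != 0) { perror("mincore"); return 1; }

    for (int i = 0; i < 4; i++)
        printf("page %d: %s\n", i, (vec[i] & 1) ? "in memory" : "would fault");
    return 0;
}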

Hello World!

Welcome to my new blog! This is where I’ll blog about neat stuff I discover as I explore computer systems, though there might be a fair bit of overlap with other related areas that also interest me.

So I just managed to bag a free domain from Namecheap, and it is: objdump.me

Why objdump?

First of all, let's get clear on the name. I learnt about objdump regrettably quite late (when I was interning at Amazon), and it then came back again while taking grad OS at UT Austin. This is a very neat shell command that will show you the contents of your assembled code (e.g. a.out) in all the gory detail – you can actually view the segments and all. It's pretty exciting. Jessica McKellar has a couple of very, very useful blog posts that explain this in a fun exercise. Highly recommended :
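
And if you want a quick taste before diving into those, here are the invocations I find myself reaching for (on whatever a.out you have lying around):

objdump -h a.out  # list the sections with their sizes and addresses
objdump -d a.out  # disassemble the executable sections
objdump -t a.out  # dump the symbol table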