The Gluster Blog

Gluster blog stories provide high-level spotlights on our users all over the world

A lightning tour of the thingy that runs your code

Gluster
2012-11-13
What happens when you press “go” ? 

We think in abstractions… we code in mega-abstractions… but…

I recently read about Amdahl’s law in this awesome article about caching for high performance web sites: http://architects.dzone.com/articles/role-caching-large-scale.  The emphasis was on detection of the true bottleneck in a system.  The Linux operating system’s transparent and universally accepted organization lends itself quite well to recursive bottleneck analysis.  Nevertheless, its never easy, when given a slow or stalled program, to tell exactly whats wrong.

Why is this hard?

There is no right answer to this question – but part of the problem is that as developers we deal in abstractions which do not map directly to the work done by a system.  This is (kind of) a good thing – it frees us to focus on business logic, modularity, and horizontal scalability.  Language makers have done an amazing job at separating programmers from machines, and we should applaud them wholeheartedly.  But when software runs into scalability or performance problems, abstraction can really come back to bite you.  Disks might be slow, resources could be scarce, networking performance may be dismal, or simple libraries may be corrupted or missing.  In any case – you’ve got to be able to find out why – and your core programming language chops won’t always help. 


But what will help… is an understanding of your operating system. So the following is a quick tour of the Linux innards I always find useful, followed by several links which shed some much needed light on the internals of how programming languages ultimately connect to the kernel.

1) First, the directory structure in linux.  Very basic: 
/bin: These are binaries that anybody can run. 
/sbin: These are binaries that only root runs (ifup, ifconfig, …) 
/home: This is where all your data goes. 
/tmp: This is where all your temporary files go. Its cleared automatically by the OS, periodically. 
/usr: This is where user create/built programs typically go. For example, if you compile your own version of a particular C library because it didn’t work out of the box on your machine, you would put it in /usr/local. 
/opt: Applications and programs go here, in particular, the ones that you get from external sources, in one peice. For example, an executable version of google chrome would go in /opt. 
/var: System related files go here that are created by programs. Log files (/var/log), mail related stuff, cached (/log/cache/) data from programs that run.
/etc: Configuration files for stuff like networking, disks, etc… Linux runs several services: When you change something, you need to restart the corresponding service… Otherwise you sill see no results. 
/dev: The files in this directory represent devices (audio, keyboards, disk drives, etc..)

2) Some quick notes about networking:

  • the dhclient command: This command will renew your ip address (if its dynamic).  Importantly, this command also tells your DHCP server what your computer’s name is.  
  • /etc/init.d/networking restart: This command will restart the networking services, reload configuration files.
  • (file) /etc/network/interfaces: This has incantations that define the network cards, while defining their corresponding attributes – for example, what IP address to assign to it, and what IP address should be used to reach the outside world (the gateway), etc…  There is, however *no DNS information* in this file.  DNS info is in another file called … 
  • (file) /etc/resolv.conf : This is where your DNS information resides (its a map of domain names, to ip addresses).
3) Disks ~ like any OS, linux mounts and partitions disks for you.  It is these disks, partitions, and ultimately, the filesystems on those partitions, which contain the data you see when you type “ls”.  Some of the basic commands available on Linux to play with this are:

  • The fdisk command: This will show you how many disks you have on your machine, including partitions (for example, one partition is for the kernel, another for swap space, etc…).
  • The df and du commands for looking at disks/directory usage: These commands tell you the filesystem and directory usage statistics.  
  • The rsync command for synchronization: This command synchronizes directories.  Notably, you can use the -e option to sync files over ssh in different places.
  • (file) /etc/fstab: This file defines defines filesystem mounting that occurs when your OS loads.
  • A revealing and accessible comparison of the linux file system, compared with others: http://www.howtogeek.com/115229/htg-explains-why-linux-doesnt-need-defragmenting/ 

Do disks in the cloud work the same? 

Yes:  They are still mounted as devices, and the device abstraction is sufficiently robust to support extremely diverse disk implementations.  Its instructive to consider the way, in the cloud, the “/dev” directory maps to disks.  As an experiment, we can create an AMI instance with no external storage, and run the following command:

ls /dev/ 

Subsequently, we can attach an EBS (Elastic Block Storage – a fancy new-fangled cloud storage solution) drive to that same instance using the AWS console.  Shortly thereafter, when running the same command (ls /dev), we will see a new entry:  This entry represents the raw disk device.  By mounting it as a filesystem, we can then write to that detached storage device.  

4) How your code works in the Linux environment:

  • When you run your programs, they ultimately call system functions.  
  • Those system functions request operations of the kernel OR, they simply write instructions that the CPU can access.  
  • Kernel operations access devices or do other privileged tasks: Accessing a disk, connecting to a socket, etc… 
  • Library operations don’t require access to the kernel: Simple calculations can be done without accessing the kernel.  Finally, you should know that the Linux operating system is written in C.  So when a higher level language requests a service (i.e. opening a file), it ultimately does it by a C binding of one sort or another.  

5) Finally – a list of articles that will help you to understand your code, in order of increasing abstraction.

If you’re really adventurous, you can see the way the whole thing fits together here in this clickable visualization of the entire dependency hierarchy of a linux OS.  Very useful for getting a picture of whats going on under the hood: http://www.makelinux.net/kernel_map/.

BLOG

  • 06 Dec 2020
    Looking back at 2020 – with g...

    2020 has not been a year we would have been able to predict. With a worldwide pandemic and lives thrown out of gear, as we head into 2021, we are thankful that our community and project continued to receive new developers, users and make small gains. For that and a...

    Read more
  • 27 Apr 2020
    Update from the team

    It has been a while since we provided an update to the Gluster community. Across the world various nations, states and localities have put together sets of guidelines around shelter-in-place and quarantine. We request our community members to stay safe, to care for their loved ones, to continue to be...

    Read more
  • 03 Feb 2020
    Building a longer term focus for Gl...

    The initial rounds of conversation around the planning of content for release 8 has helped the project identify one key thing – the need to stagger out features and enhancements over multiple releases. Thus, while release 8 is unlikely to be feature heavy as previous releases, it will be the...

    Read more