Bash One-Liners Explained, Part II: Working with strings

2012-08-24 04:36

by Peteris Krumins

at 2012-08-23 20:36:46

original http://feedproxy.google.com/~r/catonmat/~3/HL3b-lAPWk8/bash-one-liners-explained-part-two

This is the second part of the Bash One-Liners Explained article series. In this part I'll show you how to do various string manipulations with bash. I'll use only the best bash practices, various bash idioms and tricks. I want to illustrate how to get various tasks done with just bash built-in commands and bash programming language constructs.

See the first part of the series for introduction. After I'm done with the series I'll release an ebook (similar to my ebooks on awk, sed, and perl), and also bash1line.txt (similar to my perl1line.txt).

Also see my other articles about working fast in bash from 2007 and 2008.

Let's start.

Part II: Working With Strings

1. Generate the alphabet from a-z

$ echo {a..z}

This one-liner uses brace expansion. Brace expansion is a mechanism for generating arbitrary strings. This one-liner uses a sequence expression of the form {x..y}, where x and y are single characters. The sequence expression expands to each character lexicographically between x and y, inclusive.

If you run it, you get all the letters from a-z:

$ echo {a..z}
a b c d e f g h i j k l m n o p q r s t u v w x y z

2. Generate the alphabet from a-z without spaces between characters

$ printf "%c" {a..z}

This is an awesome bash trick that 99.99% of bash users don't know about. If you supply more arguments to the printf command than the format consumes, it reuses the format in a loop until the argument list is empty! printf as a loop! There is nothing more awesome than that!

In this one-liner the printf format is "%c", which means "a character" (the first character of the argument), and the arguments are all the letters from a to z, each passed as a separate argument. So printf iterates over the list, outputting one character per argument, until it runs out of letters.

Here is the output if you run it:

abcdefghijklmnopqrstuvwxyz

This output is without a terminating newline because the format string was "%c" and it doesn't include \n. To have it newline terminated, just add $'\n' to the list of chars to print:

$ printf "%c" {a..z} $'\n'

$'\n' is the idiomatic bash way to represent a newline character. printf then just prints chars a to z, and the newline character.

Another way to add a trailing newline character is to echo the output of printf:

$ echo $(printf "%c" {a..z})

This one-liner uses command substitution, which runs printf "%c" {a..z} and replaces the command with its output. Then echo prints this output and adds a newline itself.

Want to output all letters in a column instead? Add a newline after each character!

$ printf "%c\n" {a..z}

Output:

a
b
...
z

Want to put the output from printf in a variable quickly? Use the -v argument:

$ printf -v alphabet "%c" {a..z}

This puts abcdefghijklmnopqrstuvwxyz in the $alphabet variable.
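
The format is also reused when it contains more than one conversion: printf then consumes that many arguments per pass. A minimal sketch, with made-up key/value pairs, using -v to capture the result:

```bash
# The format has two %s conversions, so each pass consumes two arguments.
# The format is repeated until the argument list is exhausted.
# The key/value pairs are made up for illustration.
printf -v pairs '%s=%s\n' a 1 b 2 c 3
printf '%s' "$pairs"
# a=1
# b=2
# c=3
```

Unlike command substitution, printf -v keeps the trailing newline in the variable.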

Similarly you can generate a list of numbers. Let's say from 1 to 100:

$ echo {1..100}

Output:

1 2 3 ... 100

Alternatively, if you forget this method, you can use the external seq utility to generate a sequence of numbers:

$ seq 1 100
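
One caveat: brace expansion is performed before variable expansion, so a range such as {1..$n} is not expanded when a bound is held in a variable. A sketch of a builtin-only alternative follows; n and nums are illustrative names, not something from the one-liners above:

```bash
n=5   # illustrative upper bound

# Brace expansion runs before parameter expansion, so the braces stay literal.
literal=$(echo {1..$n})
echo "$literal"              # -> {1..5}

# A C-style for loop handles variable bounds without calling seq.
nums=()
for ((i = 1; i <= n; i++)); do
    nums+=("$i")
done
echo "${nums[@]}"            # -> 1 2 3 4 5
```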

3. Pad numbers 0 to 9 with a leading zero

$ printf "%02d " {0..9}

Here we use the looping abilities of printf again. This time the format is "%02d ", which means "zero pad the integer up to two positions", and the items to loop through are the numbers 0-9, generated by the brace expansion (as explained in the previous one-liner).

Output:

00 01 02 03 04 05 06 07 08 09

If you use bash 4, you can do the same with the new feature of brace expansion:

$ echo {00..09}

Older bashes don't have this feature.
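
The printf approach does not depend on the bash 4 brace feature, and it also works when the numbers come from variables. A small sketch that builds zero-padded file names; the img-NNN.png naming scheme is hypothetical:

```bash
# Build names such as img-007.png; the naming scheme is just an example.
names=()
for i in 7 42 100; do
    printf -v name 'img-%03d.png' "$i"   # %03d pads to three digits
    names+=("$name")
done
printf '%s\n' "${names[@]}"
# img-007.png
# img-042.png
# img-100.png
```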

4. Produce 30 English words

$ echo {w,t,}h{e{n{,ce{,forth}},re{,in,fore,with{,al}}},ither,at}

This is an abuse of brace expansion. Just look at what this produces:

when whence whenceforth where wherein wherefore wherewith wherewithal whither what then thence thenceforth there therein therefore therewith therewithal thither that hen hence henceforth here herein herefore herewith herewithal hither hat

Crazy awesome!

Here is how it works - brace expansion produces every combination of the alternatives in adjacent brace groups. For example, if you do this,

$ echo {a,b,c}{1,2,3}

It will produce the result a1 a2 a3 b1 b2 b3 c1 c2 c3. It takes the first a, and combines it with {1,2,3}, producing a1 a2 a3. Then it takes b and combines it with {1,2,3}, and then it does the same for c.

So this one-liner is just a smart combination of braces that when expanded produce all these English words!
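
To see the mechanism on a smaller scale, here is a cut-down sketch of the same idea. The words array is an illustrative name; capturing the expansion in an array makes it easy to count the results:

```bash
# Each brace group is expanded left to right. The empty alternative in
# {w,t,} contributes no prefix at all, which is how "hat" and "here" appear.
words=( {w,t,}h{at,ere} )
echo "${words[@]}"      # -> what where that there hat here
echo "${#words[@]}"     # -> 6 (3 prefixes x 2 suffixes)
```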

5. Produce 10 copies of the same string

$ echo foo{,,,,,,,,,}

This one-liner uses the brace expansion again. What happens here is foo gets combined with 10 empty strings (nine commas separate ten empty alternatives), so the output is 10 copies of foo:

foo foo foo foo foo foo foo foo foo foo
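
When the number of copies gets large, counting commas becomes error-prone. One alternative, sketched here, leans on the printf looping behaviour from one-liner 2: the %.0s conversion consumes an argument but prints zero characters of it, so only the literal text in the format is output, once per argument. The variable name copies is illustrative:

```bash
# "foo " is emitted once for each of the ten arguments produced by {1..10}.
# %.0s swallows the argument itself without printing any of it.
printf -v copies 'foo %.0s' {1..10}
echo "$copies"
```

Note that this variant leaves a trailing space after the last copy.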

6. Join two strings

$ echo "$x$y"

This one-liner simply concatenates two variables together. If the variable x contains foo and y contains bar then the result is foobar.

Notice that "$x$y" is quoted. Without the quotes, the expanded value undergoes word splitting, so it can reach echo as several separate arguments, and echo treats a leading argument that looks like an option as an option. So if $x contains something beginning with -, it can end up as a command line switch rather than as text for echo to print:

x=-n
y=" foo"
echo $x$y

Output:

foo

Versus the correct way:

x=-n
y=" foo"
echo "$x$y"

Output:

-n foo

If you need to put the two joined strings in a variable, you can omit the quotes:

var=$x$y
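
Quoting does not help when the joined string is itself exactly an echo option such as -n. A more defensive sketch uses printf with an explicit format, so the data is never parsed as an option. The values of x and y, and the variable name joined, are chosen only to show the edge case:

```bash
x=-n
y=""

# echo "$x$y" would receive the single argument -n and print nothing.
# With printf the data is an argument for the %s conversion, not an option.
printf '%s\n' "$x$y"          # -> -n

# The same approach works when capturing the joined string in a variable.
printf -v joined '%s' "$x$y"
```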

7. Split a string on a given character

Let's say you have a string foo-bar-baz in the variable $str and you wish to split it on the dash and iterate over it. You can simply combine IFS with read to do it:

$ IFS=- read -r x y z <<< "$str"

Here we use the read command, which reads a line from stdin and puts the data in the x, y, and z variables. We set IFS to - because read uses this variable for field splitting. If multiple variable names are given to read, IFS is used to split the line of input so that each variable gets a single field of the input (the last variable receives whatever remains of the line).

In this one-liner $x gets foo, $y gets bar, $z gets baz.

Also notice the use of <<< operator. This is the here-string operator that allows strings to be passed to stdin of commands easily. In this case string $str is passed as stdin to read.

You can also put the split fields in an array:

$ IFS=- read -ra parts <<< "foo-bar-baz"

The -a argument to read makes it put the split words in the given array. In this case the array is parts. You can access array elements through ${parts[0]}, ${parts[1]}, and ${parts[2]}. Or just access all of them through "${parts[@]}".
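
Once the fields are in an array, iterating over them is a plain for loop. A minimal sketch that puts the pieces together; the loop body is only a placeholder:

```bash
str="foo-bar-baz"

# The IFS assignment applies only to this read command, so the shell's
# normal word splitting is unaffected afterwards.
IFS=- read -ra parts <<< "$str"

# Quote "${parts[@]}" so each element stays a separate word.
for part in "${parts[@]}"; do
    echo "field: $part"
done

echo "${#parts[@]}"    # -> 3
```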

8. Process a string character by character

$ while IFS= read -rn1 c; do
    # do something with $c
done <<< "$str"

Here we use the -n1 argument to the read command to make it read the input one character at a time. Similarly we can use -n2 to read two chars at a time, etc.
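
An alternative sketch, which is a different technique from the read loop above, walks the string by index using substring expansion (covered in one-liner 13). Reversing the string serves as the "do something" step here purely for illustration:

```bash
str="hello"
reversed=""

# ${str:i:1} is the single character at offset i; ${#str} is the length.
for ((i = 0; i < ${#str}; i++)); do
    c=${str:i:1}
    reversed="$c$reversed"    # prepend to build the reversed string
done

echo "$reversed"    # -> olleh
```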

9. Replace "foo" with "bar" in a string

$ echo ${str/foo/bar}

This one-liner uses parameter expansion of the form ${var/find/replace}. It finds the first occurrence of find in var and replaces it with replace. Really simple!

To replace all occurrences of "foo" with "bar", use the ${var//find/replace} form:

$ echo ${str//foo/bar}
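
The same expansion has a few more variants that are easy to overlook. The following sketch compares them side by side on one sample string; the variable names are illustrative:

```bash
str="foo-foo-foo"

first=${str/foo/bar}       # bar-foo-foo  (first match only)
all=${str//foo/bar}        # bar-bar-bar  (every match)
prefix=${str/#foo/bar}     # bar-foo-foo  (only if str starts with foo)
suffix=${str/%foo/bar}     # foo-foo-bar  (only if str ends with foo)
deleted=${str//foo}        # --           (no replacement: matches are deleted)

printf '%s\n' "$first" "$all" "$prefix" "$suffix" "$deleted"
```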

10. Check if a string matches a pattern

$ if [[ $file = *.zip ]]; then
    # do something
fi

Here the one-liner does something if $file matches *.zip. This is simple glob pattern matching, and you can use the symbols *, ?, and [...] to do the matching. The * matches any string, ? matches a single char, and [...] matches any one character listed in ... or belonging to a character class.

Here is another example that matches if answer starts with Y or y:

$ if [[ $answer = [Yy]* ]]; then
    # do something
fi
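
One detail that trips people up: inside [[ ]], the right-hand side is treated as a pattern only when it is unquoted. A short sketch showing the difference, with a made-up file name:

```bash
file="backup.zip"   # made-up file name

# Unquoted right-hand side: *.zip is treated as a glob pattern.
if [[ $file = *.zip ]]; then glob_match=yes; else glob_match=no; fi

# Quoted right-hand side: "*.zip" is compared as a literal string.
if [[ $file = "*.zip" ]]; then literal_match=yes; else literal_match=no; fi

echo "$glob_match $literal_match"    # -> yes no
```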

11. Check if a string matches a regular expression

$ if [[ $str =~ [0-9]+\.[0-9]+ ]]; then
    # do something
fi

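This one-liner tests if the string $str matches the regex [0-9]+\.[0-9]+, which means one or more digits, followed by a dot, followed by one or more digits. The regex is not anchored, so it matches anywhere in $str. Bash uses POSIX extended regular expressions; the matching functions are described in man 3 regex, and on Linux the syntax itself is described in man 7 regex.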
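
After a successful match, bash stores the matched text and any parenthesized groups in the BASH_REMATCH array. A sketch that extracts both numbers; the sample string and the variable names are illustrative:

```bash
str="version 3.14"
re='([0-9]+)\.([0-9]+)'    # kept in a variable to avoid quoting surprises

if [[ $str =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # entire match: 3.14
    major=${BASH_REMATCH[1]}   # first group:  3
    minor=${BASH_REMATCH[2]}   # second group: 14
    echo "$whole $major $minor"
fi
```

Keeping the regex in a variable and expanding it unquoted is a common way to sidestep the quoting rules that apply on the right-hand side of =~.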

12. Find the length of the string

$ echo ${#str}

Here we use parameter expansion ${#str} which returns the length of the string in variable str. Really simple.

13. Extract a substring from a string

$ str="hello world"
$ echo ${str:6}

This one-liner extracts world from hello world. It uses the substring expansion. In general substring expansion looks like ${var:offset:length}, and it extracts length characters from var starting at index offset (counting from 0). In our one-liner we omit the length, which makes it extract all characters from offset 6 to the end of the string.

Here is another example:

$ echo ${str:7:2}

Output:

or
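
The offset can also be negative, in which case it counts from the end of the string. A brief sketch, with illustrative variable names:

```bash
str="hello world"

# The space after the colon is required; without it bash would read ":-"
# as the "use default value" expansion instead.
last5=${str: -5}       # world
middle=${str: -5:3}    # wor  (three characters, starting five from the end)

echo "$last5 $middle"
```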

14. Uppercase a string

$ declare -u var
$ var="foo bar"

The declare command in bash declares variables and/or gives them attributes. In this case we give the variable var attribute -u, which upper-cases its content whenever it gets assigned something. Now if you echo it, the contents will be upper-cased:

$ echo $var
FOO BAR

Note that the -u argument was introduced in bash 4. Similarly you can use another feature of bash 4, which is the ${var^^} parameter expansion that upper-cases a string in var:

$ str="zoo raw"
$ echo ${str^^}

Output:

ZOO RAW

15. Lowercase a string

$ declare -l var
$ var="FOO BAR"

Similar to the previous one-liner, the -l argument to declare sets the lower-case attribute on var, which makes its content always lower-case:

$ echo $var
foo bar

The -l argument is also available only in bash 4 and later.

Another way to lowercase a string is to use ${var,,} parameter expansion:

$ str="ZOO RAW"
$ echo ${str,,}

Output:

zoo raw
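
Two practical uses of these bash 4 case expansions, sketched with made-up values: normalizing input for a case-insensitive comparison, and capitalizing only the first character with the single-caret form ${var^}:

```bash
answer="YeS"   # made-up user input

# Normalize to lower case before comparing (needs bash 4 or later).
if [[ ${answer,,} = yes ]]; then confirmed=1; else confirmed=0; fi

name="peteris"
capitalized=${name^}     # Peteris  (only the first character is upper-cased)

echo "$confirmed $capitalized"    # -> 1 Peteris
```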

Enjoy!

Enjoy the article and let me know in the comments what you think about it! If you think that I forgot some interesting bash one-liners related to string operations, let me know in the comments also!