2014/09/08: grep and binaries

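grep(1) behaves differently on files it considers "binary": instead of printing the matching lines, it only reports that the file matches. A single "strange" character, such as the NUL byte in the example below, is enough for the whole file to be considered binary. This differs from the traditional g/re/p in ed(1), which simply prints every matching line.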


/tmp>printf "This line contains a bad character\0\nHere everything is fine.\n" > foo.txt
/tmp>cat foo.txt | xxd
0000000: 5468 6973 206c 696e 6520 636f 6e74 6169  This line contai
0000010: 6e73 2061 2062 6164 2063 6861 7261 6374  ns a bad charact
0000020: 6572 000a 4865 7265 2065 7665 7279 7468  er..Here everyth
0000030: 696e 6720 6973 2066 696e 652e 0a         ing is fine..
/tmp>grep re foo.txt
Binary file foo.txt matches
/tmp>ed foo.txt
61
g/re/p
Here everything is fine.
Q
/tmp>

Note that the offending character does not even occur in the line that matches. Therefore, the string "Binary file foo.txt matches" can come as quite a surprise in scripts using constructions like grep re foo.txt | ..., especially as that string usually does not match the regular expression searched for.
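A script that wants to notice this situation before piping grep's output onward could first check whether the file contains a NUL byte, the character used in the example above. The following is a minimal POSIX sh sketch, assuming NUL is the only trigger worth checking; depending on the grep version and locale, other bytes may also make grep consider a file binary. The helper name has_nul is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given file contains a NUL byte.
# tr deletes every NUL; if the result differs from the original file,
# cmp fails, and the leading ! turns that failure into success.
has_nul() {
    ! tr -d '\000' < "$1" | cmp -s - "$1"
}

# Recreate the sample file from the transcript above.
printf 'This line contains a bad character\0\nHere everything is fine.\n' > foo.txt

if has_nul foo.txt; then
    echo "foo.txt contains a NUL byte; plain grep may report it as binary"
else
    echo "foo.txt looks like text"
fi
```

The check only looks for NUL, so it can miss other cases grep treats as binary; it is meant as a cheap guard, not a reimplementation of grep's heuristic.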

Fortunately, there is the -a option, which tells grep to treat the input unconditionally as text.


/tmp>grep -a re foo.txt
Here everything is fine.
/tmp>
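In a script, the option simply goes into the pipeline. A minimal sketch, reusing foo.txt from the transcript; GNU grep documents -a as equivalent to the long form --binary-files=text, which other grep implementations may not accept.

```shell
#!/bin/sh
# Recreate the sample file from the transcript above.
printf 'This line contains a bad character\0\nHere everything is fine.\n' > foo.txt

# With -a, the matching line itself (not a "Binary file ... matches"
# notice) reaches the next command in the pipeline.
grep -a re foo.txt | tr 'a-z' 'A-Z'

# The same behaviour spelled as a GNU grep long option.
grep --binary-files=text re foo.txt
```

Only the matching line is passed on; the line containing the NUL byte does not match re and is therefore not printed.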