I have a huge csv file (on order of terabytes).
Now I want to insert one row, a header, at the top.
For example if input.csv looks like this:
1,2,3,4
22,3,23,1
I want it to look like
id1,id2,id3,id4
1,2,3,4
and so on
How do I do this from the shell (terminal, awk, bash)?
In place, using sed:
sed -i 1i"id1,id2,id3,id4" file.csv
edit:
As @Ed Morton points out, the -i switch makes sed edit the file in place, which can be dangerous with large files. If you supply a suffix after the -i option, sed creates a backup. So something like this would be safer:
sed -i.bak 1i"id1,id2,id3,id4" file.csv
The original file will then be located in file.csv.bak
It is as simple as:
{ echo "id1,id2,id3,id4"; cat file.csv; } > newfile.csv
using simple shell concatenation.
EDIT
After the discussion in the thread below, I propose this:
Create a file containing your header, say head.txt.
Then:
cat head.txt file.csv > newfile.csv
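A minimal sketch of the same idea with the rename step included, assuming there is enough free disk space for a second copy of the file. The scratch directory and the name input.csv.tmp are placeholders for this demonstration, not part of the original answer.

```shell
# Demo in a scratch directory; input.csv stands in for the real (huge) file.
workdir=$(mktemp -d)
printf '1,2,3,4\n22,3,23,1\n' > "$workdir/input.csv"

# Stream the header followed by the original data into a temporary file
# in the same directory, then rename it over the original.
# The rename only happens if the copy succeeded.
{ printf '%s\n' 'id1,id2,id3,id4'; cat "$workdir/input.csv"; } > "$workdir/input.csv.tmp" &&
  mv "$workdir/input.csv.tmp" "$workdir/input.csv"

cat "$workdir/input.csv"
```

Keeping the temporary file in the same directory as the original means the final mv is a rename rather than a second copy.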
Edit. When I wrote this answer, I overlooked the "terabyte" part of the question. Hence, do not use the method presented here. I am still leaving this post up, as it advertises the use of this wonderful tool, ed, the standard text editor.
As usual, ed is the standard text editor. The solution using sed -i doesn't actually "edit the file in place", despite what it says. Instead, it writes the new content to a temporary file and then renames that file over the original. That's really not good for large files!
Using ed instead really edits the file. Something along the following lines:
#!/bin/bash
file="input.csv"
{
ed -s "$file" <<EOF
1
i
id1,id2,id3,id4
.
wq
EOF
} > /dev/null
Explanation: 1 goes to the first line, i goes into insert mode, then we insert id1,id2,id3,id4 then . to go back to normal mode, and wq to write and quit.
With this method, you're really editing the file, and it's twice as fast as the sed method. Also, ed is known to be "large file safe"!
Done.
There's no easy way; you're going to have to rewrite the file. Probably the safest way is:
( echo "id1,id2,id3,id4" ; cat file ) > newFile && rm file
IHTH
echo "id1,id2,id3,id4" >> data.csv
I have a problem: I have a file that I would like to edit from the command line, if I knew how. I would like to locate the line to edit by its content.
I am in CyberPatriot, and my team is second in my state. I know one of the people on the number one team. It kills me, so I want to make a list of commands that I can work from to be faster and more efficient.
Imagine I had this file:
example
oof
goo
random
yes
and I wanted to change it to this:
example
oof
goo
random 'added text'
yes
How do I do so?
I know I can use the echo command to add text to the end of a file, but I don't know how to add text to the end of a specific line.
Thanks, Owen
You can use sed for this purpose.
sed 's/random/& Hello World/' file
to append text to the matched string.
You can use ^random$ to make sure the entire line is matched, before appending.
If you need to modify the file directly, you can use the -i flag, which facilitates in-place editing. Further, using -i.bak creates a backup of the original file first before modifying it, as in
sed -i.bak 's/random/& Hello World/' file
The original copy of the file can be found in file.bak
More about sed : https://www.gnu.org/software/sed/manual/sed.html
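A small sketch of the anchored variant applied to the sample file from the question. The temporary file is only there for the demonstration; the output is printed rather than written back, so it does not depend on how -i behaves on a given sed.

```shell
f=$(mktemp)
printf 'example\noof\ngoo\nrandom\nyes\n' > "$f"

# ^ and $ anchor the pattern so only a line that is exactly "random" matches.
# & in the replacement stands for the text that was matched.
result=$(sed "s/^random\$/& 'added text'/" "$f")
printf '%s\n' "$result"
```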
Use something like the following:
sed '4!d' file | xargs -I{} sed -i "4s/{}/{} 'added text'/" file
Basically, the command above gets the 4th line of the file with sed '4!d' file, then uses that line as the search pattern and replaces it with the same text plus the new text ('added text').
I am on macOS High Sierra, and I've had a few issues with sed.
After spending a day on Google and this site, I've honestly tried everything I could think of and everything that was suggested.
I have an example.txt file with 3 lines.
line1
line2
line3
The 3 lines can be anything; I do not know them up front.
At some point I do know what the new line is going to be.
All I wish to achieve is a one-liner, using whatever tool, that basically says:
in that file, replace line 2 with the new data.
sed -i "2s/.*/$NEW_DATA/" "$FILENAME"
This should work, but on macOS, this does not.
sed -ie "2s/.*/$NEW_DATA/" "$FILENAME"
Note the e? Someone on here suggested adding an e, and yes, that works, but it means sed uses e as the backup suffix and appends it after .txt. I end up with 2 files, example.txt and example.txte
sed -i'' "2s/.*/$NEW_DATA/" "$FILENAME"
Note the '' directly behind -i, without a space? This should work too.
It doesn't; sed complains that command c requires \
I could go on, but the request is quite simple:
on macOS, in a bash shell script, how do I simply replace a specified line of text in a .txt file, with a completely new line of text -- not knowing what the line of text was before?
If this can be a simple macOS one-liner with awk or whatever, that's fine. It doesn't have to be sed, but when I search this site, sed seems to be the recommended tool for this.
From the sed man page in macOS:
-i extension
Edit files in-place, saving backups with the specified extension.
If a zero-length extension is given, no backup will be saved.
Therefore this can be used to replace line 2 without keeping a backup:
sed -i '' "2s/.*/$NEW_DATA/" testfile.txt
If the goal is just to replace the contents of line 2, awk could also be used, for example:
awk 'NR==2{$0="your content"}1' testfile.txt > testfile.tmp && mv testfile.tmp testfile.txt
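One thing to watch with the sed form: if $NEW_DATA contains a /, an & or a backslash, sed interprets it as part of the s command. The following is a sketch that sidesteps this by handing the text to awk through the environment, which works the same on macOS and Linux. The sample content and the temporary file are made up for the demonstration.

```shell
file=$(mktemp)
printf 'line1\nline2\nline3\n' > "$file"

# Replacement text containing characters that would trip up sed's s command.
NEW_DATA='path/to/new & improved'
export NEW_DATA

# ENVIRON passes the string through untouched (no escape processing),
# so line 2 becomes exactly the value of NEW_DATA.
awk 'NR == 2 { $0 = ENVIRON["NEW_DATA"] } 1' "$file" > "$file.tmp" &&
  mv "$file.tmp" "$file"

cat "$file"
```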
I have a logfile that is starting to grow in size, and I need to remove certain lines that match a given pattern from it. I used grep -nr to extract the target lines and copied them into a temp file, but I can't figure out how to tell sed to delete those lines from the log file.
I have found something similar here: "Delete line from text file with line numbers from another file", but this doesn't actually delete the lines; it only prints the wanted output.
Can anyone give me a hint?
Thank you!
I think what you really need is sed -i '/pattern/d' filename.
But to answer your question:
How to delete lines matching the line numbers from another file:
(Assuming that there are no special characters in the line_numbers file, just numbers one per line...)
awk 'NR==FNR{a[$0]=1; next}; !(FNR in a)' line_numbers input.log
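A worked example of the two-file awk idiom above, using made-up file contents. While awk reads the first file, NR equals FNR, so the line numbers are stored; for the second file, only lines whose number was not stored are printed.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$workdir/input.log"
printf '2\n4\n' > "$workdir/line_numbers"

# First file: remember each listed line number. Second file: print only
# the lines whose number (FNR) was not remembered.
awk 'NR == FNR { skip[$0] = 1; next } !(FNR in skip)' \
  "$workdir/line_numbers" "$workdir/input.log" > "$workdir/filtered.log"

cat "$workdir/filtered.log"
```

Note that if line_numbers is empty, NR == FNR is also true while awk reads input.log, so nothing is printed at all.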
If you already have a way of printing what you want to standard output, there's no reason why you can't just overwrite the original file. For example, to only print lines that don't match a pattern, you could use:
grep -v 'pattern' original > tmp && mv tmp original
This redirects the output of the grep command to a temporary file, then overwrites the original file. Any other solution that does this "in-place" is only pretending to do so, after all.
There are numerous other ways to do this, using sed as suggested in the comments, or awk:
awk '!/pattern/' original > tmp && mv tmp original
If you want to use sed and your file is growing continuously, then you will have to execute sed -i '/REGEX/d' FILENAME more frequently.
Instead, you can make use of syslog-ng. You just have to edit /etc/syslog-ng/syslog-ng.conf, where you need to create or edit an appropriate filter (somewhat like: f_example { not match(REGEX); }; ), save the file, restart the service, and you're done.
The messages containing that particular pattern will not be dumped in the log file. In this way, your file would not only stop growing, but also you need not process it periodically using sed or grep.
To remove a line with sed, you can do:
sed "${line}d" <originalLogF >tmpF
If you want to remove several lines, you can pass a sed script. Here I delete the first and the second lines:
sed '1d;2d' <originalLogF >tmpF
If your log file is big, you will probably need two passes: the first to generate the sed script in a file, and the second to apply it. But a single pass is more efficient if you are able to recognize the pattern directly (and do not use "${line}d" at all). See Tom Fenech's or anishsane's answers; I think that is what you really need.
By the way, you have to preserve the inode (not only the file name), because most loggers keep the file open. So the final step should overwrite the file's contents rather than rename a new file over it (which is also what sed -i does):
cat tmpF >originalLogF
By the way, the -i option of sed is NOT magic: sed writes to a temporary file and then renames it over the original, so if something appends to the log file concurrently, you can lose some lines.
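A small sketch of the inode point, using made-up file names and a made-up pattern: the filtered copy is written back with cat, and the inode number reported by ls -i is compared before and after.

```shell
workdir=$(mktemp -d)
log="$workdir/app.log"
printf 'keep\ndrop me\nkeep too\n' > "$log"

# Record the inode number of the log file before editing.
inode_before=$(ls -i "$log" | awk '{ print $1 }')

# Filter into a temporary file, then overwrite the contents of the
# original instead of renaming the temporary file over it.
grep -v 'drop' "$log" > "$workdir/tmpF"
cat "$workdir/tmpF" > "$log"
rm "$workdir/tmpF"

inode_after=$(ls -i "$log" | awk '{ print $1 }')
echo "inode before: $inode_before, after: $inode_after"
```

The window between filtering and writing back still exists, so lines appended by a logger in that interval can be lost with this approach as well.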
I am looking for a way to insert a line to a specific position in a file.
Kind of like this, using sed.
But the problem is that I want to write the output into the same file I read the input from, and I need to do it with more than one file.
Sadly, this does not work: sed '3iline 3' input.txt > input.txt
This would work: sed '3iline 3' input.txt > tmp.txt && cat tmp.txt > input.txt
but it doesn't work with find and exec anymore...
I hoped something like this would be possible:
find /usr/local/share/ -iname 'xyz.htm' -exec sed '19i<p>TEXT</p>' {} > {} \;
but it doesn't work like this, so I ended up writing a short script, which worked. But it still bothers me, because I keep thinking it should be possible to do it in a short and easy way (maybe a one-liner).
I hope someone can point me in the right direction.
If I understood correctly, you need the -i option,
which makes the changes in the original file. It does not require any other piping or redirection.
sed -i.bak '3iline 3' input.txt
This keeps a backup of the original file with a .bak extension.
From man sed :
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
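For the find case from the question, redirection cannot be written directly inside -exec, because the calling shell handles the > before find ever runs. One possible sketch wraps the temp-file approach in sh -c, so each file gets its own redirection. The directory layout and file contents are invented for the demonstration, and awk is used for the insert so the example does not depend on a particular sed.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b"
printf 'one\ntwo\nthree\n' > "$workdir/a/xyz.htm"
printf 'uno\ndos\ntres\n' > "$workdir/b/xyz.htm"

# find passes the matching files as arguments to the inner shell, which
# inserts a new line before line 3 of each file via a temporary file.
find "$workdir" -iname 'xyz.htm' -exec sh -c '
  for f in "$@"; do
    awk "NR == 3 { print \"<p>TEXT</p>\" } 1" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
' sh {} +

cat "$workdir/a/xyz.htm"
```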
Specifically, I'm using a combination of >> and tee in a custom alias to store new Homebrew updates in a text file, as well as output on screen:
alias bu="echo `date "+%Y-%m-%d at %H:%M"` \
>> ~/Documents/Homebrew\ Updates.txt && \
brew update | tee -a ~/Documents/Homebrew\ Updates.txt"
Question: What if I wish to prepend this output in my textfile, i.e. placed at the beginning of the file as opposed to appending it to the end?
Edit1: As someone reported in the answers below, the use of temp files might be a good approach, which at least helped me partially:
targetLog="~/Documents/Homebrew\ Updates.txt"
alias bu="(brew update | cat - $targetLog \
> /tmp/out1 && mv /tmp/out1 $targetLog \
&& echo `date "+%Y-%m-%d at %H:%M":%S` | \
cat - $targetLog > /tmp/out2 \
&& mv /tmp/out2 $targetLog)"
But the problem is the output to STDOUT (previously made possible by tee), which I'm not sure can be incorporated in this tempfile approach …?
GNU sed will happily do that for you, using -i to edit in place, e.g.
sed -i -e "1i $(date "+%Y-%m-%d at %H:%M")" some_file
This works by creating an output file:
Let's say we have the initial contents on file.txt
echo "first line" > file.txt
echo "second line" >> file.txt
So, file.txt is our 'bottom' text file. Now prepend into a new 'output' file
echo "add new first line" | cat - file.txt > output.txt # <--- Just this command
Now, output has the contents the way we want. If you need your old name:
mv output.txt file.txt
cat file.txt
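To also keep the output visible on screen, as asked in the question, the temp-file approach can be combined with tee. The following is only a sketch: prepend_output is a hypothetical helper name, and echo stands in for brew update so the example runs anywhere.

```shell
# prepend_output TARGET COMMAND [ARGS...]
# Runs COMMAND, shows its output on stdout, and puts a timestamp plus
# that output at the top of TARGET (which must already exist).
prepend_output() {
  target=$1
  shift
  tmp=$(mktemp) || return 1
  date "+%Y-%m-%d at %H:%M" > "$tmp"
  "$@" | tee -a "$tmp"          # on screen and appended to the temp file
  cat "$target" >> "$tmp" && mv "$tmp" "$target"
}

log=$(mktemp)
printf 'older entry\n' > "$log"

# echo is a stand-in for the real command, e.g. brew update.
shown=$(prepend_output "$log" echo "Already up-to-date.")
printf 'shown on screen: %s\n' "$shown"
cat "$log"
```

Because the result is moved into place with mv, the target ends up with the permissions of the temporary file; use cat "$tmp" > "$target" instead if that matters.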
The only simple and safe way to modify an input file using bash tools is to use a temp file; e.g. sed -i uses a temp file behind the scenes (but to be robust, sed needs more).
Some of the methods have a subtle "can break things" trap when, rather than running your command on the real data file, you run it on a symbolic link to the file you intend to modify. Unless catered for correctly, this can break the link and convert it into a real file, which receives the modifications, and leave the original file unmodified and without its symlink (with no error exit code).
To avoid this with sed, you need to use the --follow-symlinks option.
For other methods, just be aware that they need to follow symlinks (when you act on such a link).
Using a temp file, then rm temp file works only if "file" is not a symlink.
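A sketch of one way to keep a symlink intact with the temp-file approach: write the new contents back through the link with a redirection instead of renaming the temp file over it. The file names are placeholders for the demonstration.

```shell
workdir=$(mktemp -d)
printf 'old line\n' > "$workdir/real.txt"
ln -s "$workdir/real.txt" "$workdir/link.txt"

tmp=$(mktemp)
{ printf 'new first line\n'; cat "$workdir/link.txt"; } > "$tmp"

# Redirecting into the link writes to the file it points at;
# mv "$tmp" "$workdir/link.txt" would replace the link itself.
cat "$tmp" > "$workdir/link.txt"
rm "$tmp"

ls -l "$workdir/link.txt"
cat "$workdir/real.txt"
```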
One safe way is to use sponge from the moreutils package:
Unlike a shell redirect, sponge soaks up all its input before opening the output file.
This allows for constructing pipelines that read from and write to the same file.
sponge is a good general way to handle this type of situation.
Here is an example, using sponge
hbu=~/'Documents/Homebrew Updates.txt'
{ date "+%Y-%m-%d at %H:%M"; cat "$hbu"; } | sponge "$hbu"
Simplest way IMO would be to use echo and cat:
echo "Prepend" | cat - inputfile > outputfile
Or for your example basically replace the tee -a ~/Documents/Homebrew\ Updates.txt with cat - ~/Documents/Homebrew\ Updates.txt > ~/Documents/Homebrew\ Updates.txt
Edit: As stated by hasturkun this won't work, try:
echo "Prepend" | cat - file | tee file
But this isn't the most efficient way of doing it any more...
Similar to the accepted answer, however if you are coming here because you want to prepend to the first line - rather than prepend an entirely new line - then use this command.
sed -i "1 s/^/string_replacement/" some_file
The -i flag will do a replacement within the file (rather than creating a new file).
Then the 1 will only do the replacement on line 1.
Finally, the s command is used which has the following syntax s/find/replacement/flags.
In our case we don't need any flags. The ^ is called a caret and it is used to represent the very start of a string.
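A quick sketch showing that the address 1 limits the substitution to the first line. The prefix text and the temporary file are made up, and the result is printed instead of edited in place so it does not depend on how -i behaves on a given sed.

```shell
f=$(mktemp)
printf 'first\nsecond\n' > "$f"

# 1 restricts the s command to line 1; ^ matches the empty string at
# the start of the line, so the text is inserted in front of it.
result=$(sed '1 s/^/>> /' "$f")
printf '%s\n' "$result"
```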
Try this http://www.unix.com/shell-programming-scripting/42200-add-text-beginning-file.html
There is no direct operator or command, AFAIK. You use echo, cat, and mv to get the effect.
{ date; brew update |tee /dev/tty; cat updates.txt; } >updates.txt.new
mv updates.txt.new updates.txt
I've no idea why you want to do this. It's pretty standard that logs like this have later entries appearing, well, later in the file.