How to read CSV file stored in variable - bash

I want to read a CSV file using Shell,
But for some reason it doesn't work.
I use this to locate the latest added csv file in my csv folder
lastCSV=$(ls -t csv-output/ | head -1)
and this to count the lines.
wc -l $lastCSV
Output
wc: drupal_site_livinglab.csv: No such file or directory
If I echo the file it says: drupal_site_livinglab.csv

Your issue is that you're one directory up from the path you are trying to read. The quick fix would be wc -l "csv-output/$lastCSV".
Bear in mind that parsing ls -t, though convenient, isn't completely robust, so you should consider something like this to protect yourself from awkward file names:
last_csv=$(find csv-output/ -mindepth 1 -maxdepth 1 -printf '%T@\t%p\0' |
sort -znr | head -zn1 | cut -zf2-)
wc -l "$last_csv"
GNU find lists all files along with their last modification time, separating the output using null bytes to avoid problems with awkward filenames.
If you remove -maxdepth 1, this becomes a recursive search.
GNU sort arranges the files from newest to oldest, with -z to accept null byte-delimited input.
GNU head -z returns the first record from the sorted list.
GNU cut -z at the end discards the timestamp, leaving you with only the filename.
You can also replace find with stat (again, this assumes that you have GNU coreutils):
last_csv=$(stat csv-output/* --printf '%Y\t%n\0' | sort -znr | head -zn1 | cut -zf2-)
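If the GNU -z options are not available, one possible fallback is a plain shell loop built on the -nt ("newer than") file test. This is a minimal sketch rather than part of the answer above: newest_file is a hypothetical helper name, and it only considers regular files directly inside the given directory.

```shell
# Hypothetical helper: print the most recently modified regular file in a directory.
# Relies on the -nt test, which bash and most sh implementations provide.
newest_file() {
    local dir=$1 newest= f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # skip directories and an unmatched glob
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f                        # remember the newest file seen so far
        fi
    done
    # Print the result; return non-zero when the directory held no regular files.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage (csv-output is the directory from the question):
#   last_csv=$(newest_file csv-output) && wc -l "$last_csv"
```

Because the file name never passes through a line-oriented pipeline, spaces and newlines in names are not a problem here.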

Related

How to grab the last result of a find command?

The result of my find command produces the following result
./alex20_0
./alex20_1
./alex20_2
./alex20_3
I saved this result as a variable. Now the only part I really need is the last one, essentially the highest number or "latest version".
So from the above string all I need to extract is ./alex20_3 and save that as a variable. Is there a way to extract just the last directory output by the find command?
I would take the last n characters to extract it, since it's already in order, but it wouldn't be the same number of characters once we get to version ./alex20_10 etc.
Try this:
your_find_command | tail -n 1
find can list your files in any order. To extract the latest version you have to sort the output of find. The safest way to do this is
find . -maxdepth 1 -name "string" -print0 | sort -zV | tail -zn1
If your implementation of sort or tail does not support -z and you are sure that the filenames are free of line-breaks you can also use
find . -maxdepth 1 -name "string" -print | sort -V | tail -n1
There are multiple ways to achieve this:
Using the 'tail' command (as suggested by @Roadowl)
find branches -name 'alex*' | tail -n1
Using the 'awk' command
find branches -name 'alex*' | awk 'END{print}'
Using the 'sed' command
find branches -name 'alex*' | sed -e '$!d'
Other possible options are to use a bash script, perl or any other language. Your best bet is whichever one you find most convenient.
Since you want the file name sorted by the highest version, you can try as follows
$ ls
alex20_0 alex20_1 alex20_2 alex20_3
$ find . -iname "*alex*" -print | sort -V | tail -n 1
./alex20_3
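Where sort -V is not available, a numeric sort on the part after the underscore is one alternative. The sketch below is an illustration under assumptions: latest_version is a hypothetical name, the names must contain exactly one underscore followed by the version number, and no name may contain a newline.

```shell
# Hypothetical helper: read names like ./alex20_3 on stdin and print the one
# with the highest number after the underscore.
latest_version() {
    # -t_ splits fields on "_"; -k2,2n sorts numerically on the second field,
    # so 10 sorts after 3 instead of before 2.
    sort -t_ -k2,2n | tail -n 1
}

# Usage:
#   latest=$(find . -maxdepth 1 -name 'alex20_*' | latest_version)
```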

How to create argument variable in bash script

I am trying to write a script such that I can identify number of characters of the n-th largest file in a sub-directory.
I was trying to assign n and the name of sub-directory into arguments like $1, $2.
Current directory: Greetings
Sub-directory: language_files, others
Sub-directory: English, German, French
Files: Goodmorning.csv, Goodafternoon.csv, Goodevening.csv ….
I would run the script from the directory “Greetings” and pass it a subdirectory (English, German, or French); it should find the nth-largest file in that subdirectory and report its number of characters.
For instance, if I am trying to figure out number of characters of 2nd largest file in English, I did:
langs=$1
n=$2
for langs in language_files/;
Do count=$(find language_files/$1 name "*.csv" | wc -m | head -n -1 | sort -n -r | sed -n $2(p))
Done | echo "The file has $count bytes!"
The result I wanted was:
$ ./script1.sh English 2
The file has 1100 bytes!
My main problem is that I don't understand how variables and loops work in a bash script.
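There is no need for a loop:
find language_files/"$1" -name "*.csv" | xargs wc -m | grep -v ' total$' | sort -nr | sed -n "$2{p;q}"
For byte counting you should use -c, since -m counts characters (the two may be the same for your files). When wc is given more than one file it also prints a "total" line, which would otherwise sort to the top; the grep -v drops it.
You don't use the loop variable in the script anyway.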
Bash loops are interesting. You are encouraged to learn more about them when you have some time. However, this particular problem might not need a loop. Set lang (you can call it langs if you prefer) and n appropriately, and then try this:
count=$(stat -c'%s %n' language_files/"$lang"/* | sort -nr | head -n"$n" | tail -n1 | sed -re 's/^[[:space:]]*([[:digit:]]+).*/\1/')
That should give you the $count you need. Then you can echo it however you like.
EXPLANATION
If you wish to learn how it works:
The stat command outputs various statistics about the named file (or files), in this case %s, the file's size, and %n, the file's name.
The head and tail commands output the first and last several lines of their input, respectively. Together, they select one specific line.
The sed command keeps only the leading number (the size) from that line. (You can use cut, instead, if you prefer.)
If you wish to be cleverer, then you can optimize as @karafka has done.
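Putting the pieces together, the whole of script1.sh could look roughly like the sketch below. It is not taken from either answer: nth_largest_size is a hypothetical helper name, sizes are counted in bytes with wc -c, and only *.csv files directly inside the given directory are considered.

```shell
# Hypothetical helper: print the size in bytes of the n-th largest *.csv file
# in a directory. Prints nothing if there are fewer than n such files.
nth_largest_size() {
    local dir=$1 n=$2 f
    for f in "$dir"/*.csv; do
        [ -f "$f" ] || continue                        # skip an unmatched glob
        printf '%s\t%s\n' "$(wc -c < "$f")" "$f"       # one "size<TAB>name" line per file
    done |
        sort -nr |                                     # largest first
        sed -n "${n}p" |                               # keep only line n
        awk '{print $1}'                               # keep only the size
}

# Usage inside script1.sh, called as ./script1.sh English 2:
#   count=$(nth_largest_size "language_files/$1" "$2")
#   echo "The file has $count bytes!"
```

Reading each file through wc -c < "$f" one at a time avoids the "total" line that wc prints when it is given several files at once.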

find - grep taking too much time

First of all, I'm a newbie with bash scripting, so forgive me if I'm making easy mistakes.
Here's my problem. I needed to download my company's website. I did this with wget without problems, but some file names contain the ? symbol and Windows doesn't allow ? in file names, so I had to write a script that renames those files and also updates the source of every file that references a renamed file.
To accomplish this I use the following code:
find . -type f -name '*\?*' | while read -r file ; do
SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
mv "$file" "${file//\?/-}"
grep -rl "$SUBSTRING" * | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
done
This has 2 problems.
It is taking way too long: I've waited more than 5 hours and it is still going.
It looks like it is appending to the source code, because when I stop the script and search for changes, the URL is repeated 4 times (or more).
Thanks all for your comments. I will try the 2 separate steps and see. Also, just as an FYI, there are 3291 files that were downloaded with wget. Do you still think bash scripting is preferable to other tools for this?
Seems odd that a file would have ? in it. Website URLs have ? to indicate passing of parameters. wget from a website also doesn't guarantee you're getting the site, especially if server side execution takes place, like php files. So, I suspect as wget does its recursiveness, it's finding url's passing parameters and thus creating them for you.
To really get the site, you should have direct access to the files.
If I were you, I'd start over and not use wget.
You may also be having issues with files or directories with spaces in their name.
Instead of that line with xargs, you're already doing one file at a time, but grepping for all recursively. Just do the sed on the new file itself.
Ok, here's the idea (untested):
In the first loop, just move the files and build up one global sed script of replacements.
Once that is done, scan all the files and apply sed with all the patterns at once, which saves a lot of read/write operations; those are likely the cause of the performance issue here.
I would avoid putting the script itself in the directory being processed, or sed will rewrite it too, so I assume below that the files to process are not in the current directory but in a data directory.
code:
sedfile=/tmp/tmp.sed
data=data
rm -f $sedfile
# locate ourselves in the subdir to preserve the naming logic
cd $data
# rename the files and compose the big sedfile
find . -type f -name '*\?*' | while read -r file ; do
SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
mv "$file" "${file//\?/-}"
echo "s/$SUBSTRING/$NEWSTRING/g" >> $sedfile
done
# now apply the big sedfile once on all the files:
# if you need to go recursive:
find . -type f | xargs sed -i -f $sedfile
# if you don't:
sed -i -f $sedfile *
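One thing to watch in the generated sed file: each file name is used as a regular expression, so a "." in a name matches any character and a "/" would end the pattern early. A possible refinement, offered only as a sketch, is to escape the name before writing it out. sed_escape is a hypothetical helper; it assumes basic regular expressions and "/" as the delimiter of the generated s command, and it does not escape the replacement side.

```shell
# Hypothetical helper: escape a string so sed treats it literally as the
# pattern of an s/pattern/replacement/ command (basic regular expressions).
sed_escape() {
    # Inside the bracket expression: ] [ \ . * ^ $ and / are each prefixed
    # with a backslash. "|" is used as the delimiter here so that "/" can
    # appear in the bracket expression without ending the command.
    printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Usage inside the rename loop:
#   echo "s/$(sed_escape "$SUBSTRING")/$NEWSTRING/g" >> "$sedfile"
```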
Instead of using grep, you can use the find command or ls command to list the files and then operate directly on them.
For example, you could do:
ls -1 /path/to/files/* | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
Here's where I got the idea based on another question where grep took too long:
Linux - How to find files changed in last 12 hours without find command

Using 'find' to select unknown patterns in file names with bash

Let's say I have a directory with 4 files in it.
path/to/files/1_A
path/to/files/1_B
path/to/files/2_A
path/to/files/2_B
I want to create a loop, which on each iteration, does something with two files, a matching X_A and X_B. I need to know how to find these files, which sounds simple enough using pattern matching. The problem is, there are too many files, and I do not know the prefixes aka patterns (1_ and 2_ in the example). Is there some way to group files in a directory based on the first few characters in the filename? (Ultimately to store as a variable to be used in a loop)
You could get all the 3-character prefixes by printing out all the file names, trimming them to three characters, and then getting the unique strings.
find -printf '%f\n' | cut -c -3 | sort -u
Then if you wanted to loop over each prefix, you could write a loop like:
find -printf '%f\n' | cut -c -3 | sort -u | while IFS= read -r prefix; do
echo "Looking for $prefix*..."
find -name "$prefix*"
done
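If the prefix is everything before the first underscore rather than a fixed number of characters, shell parameter expansion can extract it. The following is a sketch under that assumption; list_prefixes is a hypothetical name, and the _A/_B suffixes are taken from the example in the question.

```shell
# Hypothetical helper: print each distinct prefix (the text before the first
# underscore) of the entries in a directory, one per line.
list_prefixes() {
    local dir=$1 f name
    for f in "$dir"/*_*; do
        [ -e "$f" ] || continue          # skip an unmatched glob
        name=${f##*/}                    # strip the directory part
        printf '%s\n' "${name%%_*}"      # keep what precedes the first "_"
    done | sort -u
}

# Usage: handle each matching pair once.
#   list_prefixes path/to/files | while IFS= read -r prefix; do
#       echo "pair: path/to/files/${prefix}_A path/to/files/${prefix}_B"
#   done
```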

Selecting single directory that satisfies certain pattern

I would like to be able to get name of the first directory that matches a certain pattern, say:
~/dir-a/dir-b/dir-*
That is, if the directory dir-b contained directories dir-1, dir-2, and dir-3, I would get dir-1 (or, alternatively, dir-3).
The option listed above works if there is only one subdirectory in dir-b, but obviously fails when there are more of them.
You can use bash arrays, like:
content=(~/dir-a/dir-b/dir-*) #stores the content of a directory into array "content"
echo "${content[0]}" #echoes the 1st
echo "${content[${#content[@]}-1]}" #echoes the last element of array "content"
#or, according to @konsolebox's comments
echo "${content[@]:(-1)}"
Another method, make a bash function like:
first() { set "$@"; echo "$1"; }
#and call it
first ~/dir-a/dir-b/dir-*
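The glob also matches plain files and, when nothing matches, is passed through unchanged. A slightly more defensive variant, sketched here with the hypothetical names first_dir and last_dir, checks each argument with -d before printing it.

```shell
# Hypothetical helpers: print the first (or last) argument that is an existing
# directory. Glob results arrive sorted by name, so "first" means first by name.
first_dir() {
    local d
    for d in "$@"; do
        [ -d "$d" ] && { printf '%s\n' "$d"; return 0; }
    done
    return 1                              # no directory among the arguments
}

last_dir() {
    local d last=
    for d in "$@"; do
        [ -d "$d" ] && last=$d            # keep overwriting; the final match wins
    done
    [ -n "$last" ] && printf '%s\n' "$last"
}

# Usage:
#   first_dir ~/dir-a/dir-b/dir-*
#   last_dir ~/dir-a/dir-b/dir-*
```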
If you want to sort files not by name but by modification time, you can use the following script:
where=~/dir-a/dir-b
find "$where" -type f -print0 | xargs -0 stat -f "%m %N" | sort -rn | head -1 | cut -f2- -d" "
Decomposed:
find finds files by the given criteria
xargs runs the stat command for every found file and prints the result as "modification_time filename"
sort sorts the results by time, newest first
head gets the first of them
cut removes the unwanted time field
You can add -mindepth 1 -maxdepth 1 to the find so that it does not descend deeper.
On Linux it can be shorter (using find's -printf format), but this works on OS X too...

Resources