How to create dynamic substring with awk [closed] - bash

Let's say I have a file named like this:
ABC_DEF_G-1_P-249_8.CSV
I want to cut it down to this:
ABC_DEF_G-1_P-249_
I use this command to do that:
ls -lrt | grep -i .CSV | tail -1 | awk -F ' ' '{print $8}' | cut -c 1-18
The question is: if the number after G- (here 1) grows, how can I make the substring length dynamic?
For example:
ABC_DEF_G-1_P-249_
....
ABC_DEF_G-10_P-249_
ABC_DEF_G-11_P-249_
...
ABC_DEF_G-1000_P-249_

To display the file names of all .CSV without everything after the last underscore, you can do this:
for fname in *.CSV; do echo "${fname%_*}_"; done
This removes the last underscore and everything that follows it (${fname%_*}), and then appends an underscore again. You can assign the result to another variable, for example.
For an example file list of
ABC_DEF_G-1_P-249_9.CSV
ABC_DEF_G-10_P-249_8.CSV
ABC_DEF_G-1000_P-249_4.CSV
ABC_DEF_G-11_P-249_7.CSV
ABC_DEF_G-11_P-249_7.txt
this results in
$ for fname in *.CSV; do echo "${fname%_*}_"; done
ABC_DEF_G-1_P-249_
ABC_DEF_G-10_P-249_
ABC_DEF_G-1000_P-249_
ABC_DEF_G-11_P-249_

You can do this with just ls and grep
ls -1rt | grep -oP ".*(?=_\d{1,}\.CSV)"
If you are concerned about the output of ls -1, as mentioned in the comments you can use find as well
find -type f -printf "%f\n" | grep -oP ".*(?=_\d{1,}\.CSV)"
Outputs:
ABC_DEF_G-1_P-249
ABC_DEF_G-1000_P-249
This assumes you want everything except the _number.CSV suffix. If the match needs to be case insensitive, add the -i flag to grep. The \d{1,} (equivalent to \d+) allows the number between _ and .CSV to grow from one to many digits. Because the match is tied to the suffix rather than to a fixed width, it also keeps working when the number 1 in your example increases:
ABC_DEF_G-1_P-249

You should not be parsing ls. Perhaps you are looking for something like this:
base=$(printf "%s\n" * | grep -i '\.CSV$' | tail -1 | cut -c 1-18)
However, that's a useless use of grep you want to get rid of right there -- Awk does everything grep does, and everything tail does, too, and actually, everything cut does as well. The grep can also be avoided by using a better wildcard, though:
base=$(printf "%s\n" *.[Cc][Ss][Vv] | awk 'END { print substr($0, 1, 18) }')
In the shell itself, you can do much the same thing with no external processes at all. Proposing a suitable workaround would perhaps require a better understanding of what you are trying to accomplish, though.
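As a rough illustration of the shell-only approach, the sketch below picks the most recently modified .CSV file with a loop and trims it with parameter expansion, so the result follows the number however many digits it has. The function names csv_prefix and newest_csv are made up for this example, and it assumes the part to drop always starts at the last underscore.

```bash
#!/usr/bin/env bash

# Hypothetical helper: drop everything after the last underscore,
# keeping the underscore itself.
csv_prefix() {
    printf '%s\n' "${1%_*}_"
}

# Hypothetical helper: print the most recently modified *.CSV file
# (any letter case) in the current directory, without calling ls.
newest_csv() {
    local f newest=
    for f in *.[Cc][Ss][Vv]; do
        [[ -e $f ]] || continue                 # the glob matched nothing
        if [[ -z $newest || $f -nt $newest ]]; then
            newest=$f
        fi
    done
    [[ -n $newest ]] && printf '%s\n' "$newest"
}

# Usage: base=$(csv_prefix "$(newest_csv)")
```

Unlike cut -c 1-18, nothing here depends on a fixed width, so G-1, G-10 and G-1000 are all handled the same way.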

Related

How can I delete empty lines from my output with grep? [duplicate]

Is there a way to remove empty lines with cat myfile | grep -w #something ?
I am looking for a simple way to remove empty lines from my output, in the style shown above.
This really belongs on the codegolfing stackexchange because it's not related to how anyone would ever write a script. However, you can do it like this:
cat myfile | grep -w '.*..*'
It's equivalent to the more canonical grep ., but adds explicit .*s on either side so that it will always match the complete line, thereby satisfying the word boundary conditions imposed by -w
You can pipe your output to awk to easily remove empty lines
cat myfile | grep -w 'something' | awk NF
EDIT: so... you just want cat myfile | awk NF?
If you have to use grep, you can do grep -v '^[[:blank:]]*$' myfile
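To make the difference between these variants concrete: grep . keeps any line with at least one character, including lines that hold only spaces or tabs, while awk NF (like grep -v '^[[:blank:]]*$') drops those as well. A small sketch, with function names chosen just for this illustration:

```bash
#!/usr/bin/env bash

# Keep lines that contain at least one character (whitespace counts).
drop_empty() {
    grep . "$@"
}

# Keep lines that contain at least one non-blank character:
# awk sets NF to 0 for lines made up only of spaces and tabs.
drop_blank() {
    awk 'NF' "$@"
}

# Usage: drop_blank myfile
#        some_command | drop_blank
```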

How to number output in list in bash? [closed]

I want to number each line that gets outputted when I list a directory, so that instead of typing out the full name of an object, I can pick it by its number in the list. This is in Bash.
Example: os-list is a directory I use to store numerous objects that are ever changing.
os1-xxx.iso is an object name.
From
ls os-list
os1-xxx.iso
os2-xxx.iso
What is the path?: os1-xxx.iso
To
ls os-list
[1]os1-xxx.iso
[2]os2-xxx.iso
What is the path? 1
What is the term for this kind of operation in bash?
The command select can be used:
files=(os-list/*)                          # glob into an array instead of parsing ls
select choice in "${files[@]##*/}"; do     # show the base names only
    break
done
echo "${choice}"
You can modify this to your needs, just look for more examples with select.
I would change the prompt (PS3="What is the path: ") and replace the break in the select loop (check for a valid response).
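A minimal sketch of those two changes, assuming bash. The function name choose_file is invented for this example, and the prompt text is taken from the question. select leaves its variable empty when the reply is not one of the listed numbers, so the loop keeps asking until a valid number is entered or input ends.

```bash
#!/usr/bin/env bash

# Hypothetical helper: show a numbered menu of the entries in a directory
# and print the path of the one the user picks.
choose_file() {
    local dir=$1 choice
    local -a files=("$dir"/*)
    [[ -e ${files[0]} ]] || return 1        # empty or missing directory
    local PS3="What is the path? "          # prompt used by select
    select choice in "${files[@]##*/}"; do
        if [[ -n $choice ]]; then           # a valid number was entered
            printf '%s\n' "$dir/$choice"
            return 0
        fi
        echo "Invalid selection: $REPLY" >&2
    done
    return 1                                # end of input without a choice
}

# Usage: path=$(choose_file os-list)
```

The menu and prompt go to standard error, so capturing the function's output with $(...) yields only the chosen path.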
I want to number each line that gets outputted when i list a directory
For your exact desired output format:
ls | nl | sed 's/^[ \t]*//' | sed -r 's/^[0-9]*()/[\0]/' | sed 's/\t//'
or
ls | cat -n | sed 's/^[ \t]*//' | sed -r 's/^[0-9]*()/[\0]/' | sed 's/\t//'
If you only want to have number as reference, and not in your exact desired format then simply:
ls | cat -n
or
ls | nl
would suffice. The sed pipeline is added to enclose the number in square brackets and to remove the leading whitespace and the tab, so that the output conforms to your desired format. Admittedly, this could be done with awk as well; the pipeline is not optimized, just given as a reference point.
Edit:
with awk like so:
ls | awk '{print "[" NR "]" $0}'
Selecting a filename based on its index (example with index 12 given):
ls | awk '{print "[" NR "]" $0}' | grep "^\[12\]" | sed 's/^\[12\]//'
Note of caution: this approach assumes that no file is added between listing and selecting. If a file is added in between and the sort order changes, the 12th file in the listing and the 12th file in the selection might not be the same file.
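One way to sidestep that caveat is to capture the listing once in an array and use the same array for both printing and selecting. The sketch below assumes bash; print_numbered and nth_item are names made up for this illustration.

```bash
#!/usr/bin/env bash

# Print each argument prefixed with its 1-based position, e.g. "[1]name".
print_numbered() {
    local i=0 name
    for name in "$@"; do
        i=$((i + 1))
        printf '[%d]%s\n' "$i" "$name"
    done
}

# Print the n-th of the remaining arguments; fail if n is out of range.
nth_item() {
    local n=$1
    shift
    [[ $n =~ ^[1-9][0-9]*$ ]] && (( n <= $# )) || return 1
    printf '%s\n' "${!n}"                   # indirect expansion: the n-th argument
}

# Usage (snapshot the listing once, then reuse it):
#   cd os-list && files=(*)
#   print_numbered "${files[@]}"
#   read -r -p 'What is the path? ' n
#   path=$(nth_item "$n" "${files[@]}")
```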

CSV find blank value in third column KSH

Hi, my data set is simple, as shown below:
4,a,1.5
t,6,,
6,t,h
I am trying to use awk or grep to count the rows in which there is a blank in the third column. In this case it would be 1, since only the middle row has a blank in that column. What I have tried so far is below. The logic is to use awk to search for a blank string and count the matches, and likewise to use grep to find the rows with a blank third column and count them.
COUNT=$('awk '' $DATAFILE | wc -l')
COUNT=$('grep -e '.*,.*,,' $DATAFILE' | wc -l)
awk -F, '$3==""{c++} END{print c+0}' file
Your grep has too many quotes:
count=$(grep -E ".*,.*,," "$DATAFILE" | wc -l)
would work to a degree, but it also matches lines where a later field, such as the fourth, is the empty one.
This seems better:
count=$(grep -E "^[^,]*,[^,]*,," "$DATAFILE" | wc -l)
This will still give problems with input like
field1,"field 2 with , insides quotes",,
Your question said nothing about this situation, what do you consider to be the third field here? That would be another question.
Edit:
@Sundeep commented correctly that you could use grep -c and avoid wc -l. I tried to show what was wrong in the OP's attempt, but I should have added the advice to use -c.
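Putting the suggestions from this thread side by side, here is a small sketch of both counts as functions. The function names are invented for this example, and the (,|$) alternative in the grep pattern is a variation added here so that an empty third field at the end of a line is counted too. Neither version understands quoted fields that contain commas.

```bash
#!/usr/bin/env bash

# Count rows whose third comma-separated field is empty, using awk.
# Rows with fewer than three fields also count, since $3 is then empty.
count_blank_third_awk() {
    awk -F, '$3 == "" { c++ } END { print c + 0 }' "$@"
}

# The same idea with grep -c: two fields without commas, then an empty
# third field followed by either another comma or the end of the line.
count_blank_third_grep() {
    grep -cE '^[^,]*,[^,]*,(,|$)' "$@"
}

# Usage: count=$(count_blank_third_awk "$DATAFILE")
```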

How to extract the exact api call from failed stack

I need to process the stack similar to below which has a string "oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver" but I need the exact method call that failed. e.g. in this case the exact one is "oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver.attachFile". There are multiple stacks that I have as part of html pages which I need to process and there will be multiple methods failing. How can I do it in shell script? This entire stack is in one line html entry which is causing grep to return the whole stack itself. I tried multiple things but nothing worked clearly. I guess awk or similar regex tool could be a way to go but not sure.
at oracle.adf.view.rich.automation.test.selenium.RichWebDriverTest.getElement(RichWebDriverTest.java:1414)
at oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver.attachFile(ApplcoreWebdriver.java:1460)
at oracle.apps.fnd.applcore.attachments.ui.util.accessor.FndManageAttachmentsPopupAccessor.attachFile(FndManageAttachmentsPopupAccessor.java:1475)
at oracle.apps.fnd.applcore.attachments.ui.util.accessor.FndManageAttachmentsPopupAccessor.updateRowFileAttachment(FndManageAttachmentsPopupAccessor.java:550)
at oracle.apps.fnd.applcore.attachments.ui.AttachmentsBaseSelenium.testLMultiFileAdd_22108390(AttachmentsBaseSelenium.java:1782)
at oracle.javatools.test.WebDriverRunner.run(WebDriverRunner.java:122)
I cannot guarantee this is the most efficient solution, but it suits the need. I am using grep and sed combined.
Split the single big HTML line into multiple lines using cat and tr:
cat file | tr ' ' '\n' | grep -w "$search_string" | sed 's/(.*)//'
For testing purposes am using the stack snippet which you shared in the OP.
$ cat file
at oracle.adf.view.rich.automation.test.selenium.RichWebDriverTest.getElement(RichWebDriverTest.java:1414)
at oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver.attachFile(ApplcoreWebdriver.java:1460)
at oracle.apps.fnd.applcore.attachments.ui.util.accessor.FndManageAttachmentsPopupAccessor.attachFile(FndManageAttachmentsPopupAccessor.java:1475)
at oracle.apps.fnd.applcore.attachments.ui.util.accessor.FndManageAttachmentsPopupAccessor.updateRowFileAttachment(FndManageAttachmentsPopupAccessor.java:550)
at oracle.apps.fnd.applcore.attachments.ui.AttachmentsBaseSelenium.testLMultiFileAdd_22108390(AttachmentsBaseSelenium.java:1782)
at oracle.javatools.test.WebDriverRunner.run(WebDriverRunner.java:122)
Running the above command for the search string oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver
$ cat file | tr ' ' '\n' | grep -w "oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver" | sed 's/(.*)//'
oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver.attachFile
Any suggestions simplifying it are welcome.
Well, I think I found a working solution:
for stack in $all_stacks
do
    words=$(echo $stack | tr ' ' '\n')
    for word in $words
    do
        search_string="oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver"
        if [[ $word == *${search_string}* ]];
        then
            echo $word
        fi
    done
done
The idea was to split the line (stack) into words separated by spaces and match each word against the required string. The matching word is then printed, which gives the required value (including the API call).
Update based on the answer from @Inian:
Below is an updated single command that does this across multiple HTML files containing the error stacks to be processed.
find . -name '*-errors.html' | xargs cat | tr ' ' '\n' | grep -w "oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver" | sed 's/(.*)//' | cut -d '<' -f1 | sort -u
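A possibly simpler variant, sketched under the assumption that your grep supports -o (GNU and BSD grep do): instead of splitting on spaces and trimming afterwards, let grep print only the class name plus the method name that follows it. The function name failed_calls is made up for this example.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print each distinct "<class>.<method>" found in the
# input, where <class> is the fully qualified class name given as $1.
failed_calls() {
    local class=$1
    shift
    # Escape the dots in the class name, then require ".methodName" after it.
    local pattern=${class//./\\.}'\.[A-Za-z_$][A-Za-z0-9_$]*'
    grep -ohE -- "$pattern" "$@" | sort -u
}

# Usage:
#   find . -name '*-errors.html' -exec cat {} + |
#       failed_calls oracle.apps.fnd.applcore.test.selenium.ApplcoreWebdriver
```

Because the match stops at the first character that cannot be part of a method name, the trailing "(ApplcoreWebdriver.java:1460)" and any HTML markup are left out without extra sed or cut steps.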

Get the newest file based on timestamp

I am new to shell scripting, so I need some help with how to go about this problem.
I have a directory which contains files in the following format. The files are in a directory called /incoming/external/data
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see, the name of each file includes a timestamp, i.e. [RANGE]_[YYYYMMDD].dat
What I need to do is find out which of these files has the newest date, using the timestamp in the filename rather than the system timestamp, store that filename in a variable, move that file to one directory, and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the AWK command. It allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the fields (which just adds extra complexity, so don't do it), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[@]}" otherdir
It won't work if there are spaces in the filenames although you could modify the IFS variable to affect that.
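For names with spaces, one alternative is to skip ls entirely and compare the date stamps in bash. The following is a sketch only: newest_by_name is a name invented here, and it assumes every file follows the [RANGE]_[YYYYMMDD].dat pattern from the question, so that comparing the stamps as strings is the same as comparing dates.

```bash
#!/usr/bin/env bash

# Print the argument whose name carries the latest YYYYMMDD stamp,
# assuming names of the form [RANGE]_[YYYYMMDD].dat.
newest_by_name() {
    local f stamp best= best_stamp=
    for f in "$@"; do
        stamp=${f##*_}          # strip everything up to the last underscore
        stamp=${stamp%.dat}     # strip the extension, leaving YYYYMMDD
        if [[ -z $best || $stamp > $best_stamp ]]; then
            best=$f
            best_stamp=$stamp
        fi
    done
    [[ -n $best ]] && printf '%s\n' "$best"
}

# Usage, with newdir and otherdir as in the commands above:
#   cd /incoming/external/data || exit
#   files=(*_*.dat)
#   newest=$(newest_by_name "${files[@]}")
#   for f in "${files[@]}"; do
#       if [[ $f == "$newest" ]]; then mv -- "$f" newdir/; else mv -- "$f" otherdir/; fi
#   done
```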
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* | sort | tail -1
Due to the naming convention of the files, alphabetical order is the same as date order. In bash, '*' expands to an alphabetically sorted list of names (the Pathname Expansion section of the manual says so), and ls sorts the same way by default, so the file with the newest date is the last one alphabetically.
Therefore, in bash
mv $(ls | tail -1) first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
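What it actually does is rely on ls to sort the output by modification time (-t, oldest first because of -r), then use tail to get the last listed file name. Note that this is the file system timestamp, not the date in the file name.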
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.
