Task Scheduler? Command-line? Need "Export URLs" to csv file daily. - google-search-appliance

Is there a way to export all indexed URLs to a CSV or ZIP file every day?
In other words, a daily scheduled task that does:
Index => Diagnostics => Export URLs => Generate the gzip file => Download
Thank you

There is no API for this. You would need to use some sort of screen capture/macro recording software to do this, from what I've seen.

Related

Use Spring Integration to merge two files into a single file via SFTP

I have two large files on an SFTP server: folder_A/A.txt and folder_B/B.txt. I want to append the contents of B.txt to A.txt and store the result as folder_C/C.txt on the same server. One way is to download both files, combine their contents into a new file locally, and upload it to folder_C/C.txt. Is there a more efficient way to do this with Spring Boot, without downloading the files first?

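Something like this (the bytes still flow through the client session, but no local temporary files are needed):
RemoteFileTemplate<LsEntry> template = new RemoteFileTemplate<>(sftpSessionFactory);
template.execute((SessionCallbackWithoutResult<LsEntry>) session -> {
    for (String source : new String[] {"folder_A/A.txt", "folder_B/B.txt"}) {
        // Stream the source into C.txt; each readRaw() must be paired with finalizeRaw()
        try (InputStream in = session.readRaw(source)) {
            session.append(in, "folder_C/C.txt");
        }
        session.finalizeRaw();
    }
});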
See more info in docs: https://docs.spring.io/spring-integration/docs/current/reference/html/sftp.html#sftp-rft
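The same streaming idea can be sketched in Python. This is a minimal sketch, not part of the Spring answer: it appends each source to the destination in fixed-size chunks with shutil.copyfileobj, so neither file is ever held fully in memory. The helper name concat_streams is hypothetical, and it assumes your SFTP client hands out binary file-like objects (paramiko's SFTPClient.open, for example, returns such objects).

```python
import shutil
from typing import BinaryIO, Iterable


def concat_streams(sources: Iterable[BinaryIO], dest: BinaryIO, chunk_size: int = 64 * 1024) -> None:
    """Append every source stream to dest, one chunk at a time (hypothetical helper)."""
    for src in sources:
        # copyfileobj moves at most chunk_size bytes per read instead of loading the whole file
        shutil.copyfileobj(src, dest, chunk_size)


if __name__ == "__main__":
    import io

    a = io.BytesIO(b"contents of A\n")
    b = io.BytesIO(b"contents of B\n")
    c = io.BytesIO()
    concat_streams([a, b], c)
    print(c.getvalue())  # b'contents of A\ncontents of B\n'
```

Because the function only calls read and write, the same code works whether the streams are local files, in-memory buffers, or remote file handles.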

Shell script to divide csv files automatically using thread group count

Currently I divide the CSV files manually for distributed JMeter testing from 3 machines. I need a shell script that automatically divides the CSV files based on the thread group count.

If you want to give unique values for each thread, you can simply do this by changing some values on CSV Data Set Config:
Recycle on EOF : False
Stop Thread on EOF : True
Sharing mode: All Threads
With these settings, each thread in your .jmx will get unique values from your CSV file.

Normally it's not necessary to split the CSV file by the number of Thread Groups; you just need to choose the appropriate Sharing Mode in the CSV Data Set Config.
If you still need to, here is an example shell script which splits the CSV file by the number of thread groups in the .jmx script:
#!/usr/bin/env bash
# Count the thread groups in the test plan
threadGroups=$(grep -c '"ThreadGroup"' test.jmx)
# Split test.csv into that many files without breaking lines: ./000.csv, ./001.csv, ...
split --suffix-length=3 --additional-suffix=.csv -d --number="l/${threadGroups}" "test.csv" "./"
Replace test.jmx and test.csv with the names/locations of your .jmx and .csv files.
It will generate files named 000.csv, 001.csv, etc.
More information: Split Command in Linux with Examples
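Where GNU split is not available, a rough Python equivalent could look like the sketch below. It assumes thread groups can be counted by the testclass="ThreadGroup" attribute in the .jmx (an assumption about the file layout; disabled groups are counted too) and distributes lines into N contiguous chunks of nearly equal line count, which is not byte-for-byte the same as split --number=l/N. All function names are hypothetical, and a header row, if present, is treated like any other line.

```python
import re
from pathlib import Path
from typing import List


def count_thread_groups(jmx_text: str) -> int:
    # Count elements whose testclass attribute is exactly "ThreadGroup"
    return len(re.findall(r'testclass="ThreadGroup"', jmx_text))


def split_lines(lines: List[str], parts: int) -> List[List[str]]:
    # Contiguous chunks whose sizes differ by at most one line
    size, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def split_csv(jmx_path: str, csv_path: str, out_dir: str = ".") -> List[Path]:
    parts = count_thread_groups(Path(jmx_path).read_text())
    if parts == 0:
        raise ValueError("no ThreadGroup found in " + jmx_path)
    lines = Path(csv_path).read_text().splitlines(keepends=True)
    written = []
    for i, chunk in enumerate(split_lines(lines, parts)):
        target = Path(out_dir) / f"{i:03d}.csv"  # 000.csv, 001.csv, ...
        target.write_text("".join(chunk))
        written.append(target)
    return written
```

Usage: split_csv("test.jmx", "test.csv") writes 000.csv, 001.csv, ... into the current directory, one file per thread group.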

How to use airflow for real time data processing

I have a scenario where I want to process a CSV file and load it into another database:
Steps:
Pick a CSV file and load it into a MySQL table with the same name as the CSV file.
Then modify the loaded rows using a Python task.
After that, extract the data from MySQL and load it into another database.
The CSV files arrive from a remote server in a folder on the Airflow server.
We have to pick up these CSV files and process them with a Python script.
Suppose I pick one CSV file; I then need to pass it to the rest of the operators in dependency order, like:
filename: abc.csv
task1 >> task2 >> task3 >> task4
So abc.csv should be available to all the tasks.
How should I proceed?

Your scenario doesn't have anything to do with real time; this is ingestion on a schedule/interval. Alternatively, you could use a sensor (for example FileSensor) to detect when data is available.
Implement each of your requirements as functions and call them from operator instances.
Add the operators to a DAG with a schedule appropriate for your incoming feed.
You can pass and access parameters via:
- kwargs to the python_callable (op_kwargs) when initializing an operator
- the context dict (e.g. context['params']) in the execute method when extending an operator
- Jinja templates
Related:
airflow pass parameter from cli
execution_date in airflow: need to access as a variable

Tasks in Airflow communicate through XCom, but it is meant for small values, not for file contents.
If you want your tasks to work with the same CSV file, save it in some location and pass the path to that location via XCom.
We are using the LocalExecutor, so the local file system is fine for us.
We decided to create a folder for each DAG, named after the DAG. Inside it we create a folder for each execution date (we do this in the first task, which we always call start_task). Then we pass the path of this folder to the subsequent tasks via XCom.
Example code for the start_task:
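import os
import shutil
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

DATE_FORMAT = '%Y-%m-%dT%H-%M-%S'  # assumed value; any filesystem-safe format works

def _create_folder_delete_if_exists(path):
    # Minimal version of the helper: start each run with an empty folder
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)

def start(share_path, **context):
    execution_date_as_string = context['execution_date'].strftime(DATE_FORMAT)
    execution_folder_path = os.path.join(share_path, 'my_dag_name', execution_date_as_string)
    _create_folder_delete_if_exists(execution_folder_path)
    # Push only the folder path (a small value) to XCom, not the file contents
    task_instance = context['task_instance']
    task_instance.xcom_push(key="execution_folder_path", value=execution_folder_path)

start_task = PythonOperator(
    task_id='start_task',
    provide_context=True,  # Airflow 1.x: pass the context as **kwargs to start()
    python_callable=start,
    op_args=[share_path],
    dag=dag
)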
The share_path is the base directory for all dags, we keep it in the Airflow variables.
Subsequent tasks can get the execution folder with:
execution_folder_path = task_instance.xcom_pull(task_ids='start_task', key='execution_folder_path')
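A downstream task could then read and write files inside that folder. The sketch below is a hypothetical task callable: it assumes the incoming file was copied into the execution folder as abc.csv (the name from the question) and writes a processed copy next to it. The function name, the output file name, and the upper-casing transformation are placeholders. It only relies on xcom_pull, so it works with any object exposing that method.

```python
import csv
import os


def transform_task(**context):
    """Hypothetical task callable: read abc.csv from the shared folder, write a processed copy."""
    task_instance = context['task_instance']
    folder = task_instance.xcom_pull(task_ids='start_task', key='execution_folder_path')
    source = os.path.join(folder, 'abc.csv')
    target = os.path.join(folder, 'abc_processed.csv')
    with open(source, newline='') as src, open(target, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Placeholder transformation; replace with the real per-row logic
            writer.writerow([value.upper() for value in row])
    # Returning the path lets PythonOperator push it to XCom as the return value
    return target
```

Passing only paths keeps XCom payloads small, which is the point of the folder-per-execution-date layout described above.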

How to convert .csv to .xls

I have a simple .csv file.
Is it possible to convert it to .xls using the command line tool ssconvert?
I would also need to specify the name of the sheet.

ln -s input.csv MySheetName
ssconvert MySheetName output.xls
The OP asked how to convert csv to xls while controlling the sheet name in the output.
The generated .xls file will use the name of the input CSV file as the sheet name, so you can symlink the .csv to anything you want (or rename the input file) to produce the desired result.
The previous answer implies that --list-exporters leads to a solution, but it merely lists exporter names with no information about their options, and no options are documented in the man page for xls-exporters. Experimentally, none of the exporters which can create .xls accept options (they fail with "The file saver does not take options" if you use -O).

Yes, it is possible.
You must specify the input and output file names with their extensions.
For example:
ssconvert in.csv out.xls
Use the --list-importers and --list-exporters options to see the available formats.
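The symlink trick from the first answer can be wrapped in a small Python helper. This is a sketch under assumptions: it relies on ssconvert taking the sheet name from the input file name, as that answer reports, and on the platform supporting symlinks. The function name and its run parameter (there so the command can be stubbed out in tests) are hypothetical.

```python
import os
import subprocess
import tempfile


def csv_to_xls(csv_path: str, xls_path: str, sheet_name: str, run=subprocess.run) -> None:
    """Convert csv_path to xls_path with ssconvert, naming the sheet sheet_name."""
    with tempfile.TemporaryDirectory() as tmp:
        # ssconvert reportedly uses the input file name as the sheet name,
        # so expose the CSV under the desired name via a temporary symlink
        link = os.path.join(tmp, sheet_name)
        os.symlink(os.path.abspath(csv_path), link)
        run(["ssconvert", link, os.path.abspath(xls_path)], check=True)
```

Usage: csv_to_xls("input.csv", "output.xls", "MySheetName") mirrors the ln -s / ssconvert pair above, and the temporary directory is cleaned up afterwards.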

How can I modify .xfdl files? (Update #1)

The .XFDL file extension identifies XFDL Formatted Document files, which follow an XML-based standard for document and template formatting. The format is like XML, but contains a level of encryption for use in secure communications.
I know how to view XFDL files using a file viewer I found here. I can also modify and save these files with File: Save/Save As. I'd like, however, to modify these files on the fly. Any suggestions? Is this even possible?
Update #1: I have now successfully decoded and unzipped a .xfdl into an XML file, which I can then edit. Now I am looking for a way to re-encode the modified XML file back into base64-gzip (using Ruby or the command line).

If the encoding is base64 then this is the solution I've stumbled upon on the web:
"Decoding XFDL files saved with 'encoding=base64'.
Files saved with:
application/vnd.xfdl;content-encoding="base64-gzip"
are simple base64-encoded gzip files. They can be easily restored to XML by first decoding and then unzipping them. This can be done as follows on Ubuntu:
sudo apt-get install uudeview
uudeview -i yourform.xfdl
gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xfdl
The first command will install uudeview, a package that can decode base64, among others. You can skip this step once it is installed.
Assuming your form is saved as 'yourform.xfdl', the uudeview command will decode the contents as 'UNKNOWN.001', since the xfdl file doesn't contain a file name. The '-i' option makes uudeview non-interactive; remove that option for more control.
The last command gunzips the decoded file into a file named 'yourform-unpacked.xfdl'.
Another possible solution - here
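The decoding steps above, and the re-encoding asked about in Update #1, can also be done without uudeview using Python's base64 and gzip modules. A minimal sketch, assuming the file consists of the application/vnd.xfdl;content-encoding="base64-gzip" header on its own first line followed by the base64 payload (check this against your own files); the function names are hypothetical.

```python
import base64
import gzip

HEADER = 'application/vnd.xfdl;content-encoding="base64-gzip"'


def decode_xfdl(raw: str) -> bytes:
    """Return the XML bytes from a base64-gzip XFDL document."""
    lines = raw.splitlines()
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]  # drop the MIME header line (assumed layout)
    return gzip.decompress(base64.b64decode("".join(lines)))


def encode_xfdl(xml: bytes) -> str:
    """Gzip and base64-encode XML, prefixed with the assumed header line."""
    payload = base64.encodebytes(gzip.compress(xml)).decode("ascii")
    return HEADER + "\n" + payload
```

Decoding, editing the XML, and then calling encode_xfdl gives the round trip the question asks for; base64.encodebytes wraps the payload at 76 characters per line.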

The only answer I can think of right now is: read the manual for uudeview.
As much as I would like to help you, I am not an expert in this area, so you'll have to wait for someone more knowledgeable to come along and help you.
Meanwhile, here are links to some documents that might help you:
UUDeview Home Page
Using XFDLengine
Getting started with the XFDL Engine
Sorry if this doesn't help you.

You don't have to leave Ruby to do this; you can use the Base64 module to encode the document like this:
irb(main):005:0> require 'base64'
=> true
irb(main):007:0> Base64.encode64("Hello World")
=> "SGVsbG8gV29ybGQ=\n"
irb(main):008:0> Base64.decode64("SGVsbG8gV29ybGQ=\n")
=> "Hello World"
And you can call gzip/gunzip using Kernel#system:
system("gzip foo.something")
system("gunzip foo.something.gz")
